
Final course project – 2023

For the final course project (fcp), you will handle a real-world dataset. In particular, you are required to store, manipulate, and analyze bug data for the Mozilla project. Mozilla is a peculiar example of open-source software (OSS) development, as it originated from Netscape, a computer services company.

OSS challenges common managerial assumptions about how organizations are structured and how they function (Gulati et al., 2012), attracting the interest of researchers from several disciplines (e.g., information systems, management, and sociology). The OSS phenomenon is also extremely relevant from a business perspective. Consider, for example, the Python project or the father of all, Linux (powering NASA projects, Chrome OS, Android hardware, and the largest share of worldwide servers). The OSS experience keeps offering fresh business and research insights and may guide us in the next phase of organizing technologies based on remote work.

Mozilla has employed the Bugzilla software as a bug-tracking system since its early stages. For this project, you will handle bugs for the 1997-2003 development window.

Tasks

You are required to choose your preferred DBMS – PostgreSQL or MongoDB – and:

  1. Clean, manipulate, and structure data. The expected result is a well-designed dataset that complies with the specific approach of the chosen DBMS.
  2. Provide valuable descriptive insights. The expected result is a set of descriptive statistics that depicts some interesting trends or noteworthy data characteristics.
  3. [optional] Perform insightful data analysis. For example, you can try to uncover how organizational problems are allocated to participants. You may want to check the recent work of Tonellato et al. (2023) to get a sense of how to perform such a task. If this topic does not fit your interests, you may want to skim through the reference list provided to get some inspiration.

To perform tasks 1 and 2, you need to use either SQL or MQL (MongoDB query language). Alternatively, if you prefer using Python, you can leverage psycopg2 or pymongo. For task 3, you should use PySpark (e.g., you may want to leverage PySpark's MLlib library).
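As an illustration of task 2 through pymongo, the sketch below counts bugs per opening year with an MQL aggregation pipeline. It is only a sketch under stated assumptions: the field name `opened_at` (assumed to be stored as a date), the output field `n_bugs`, and the database and collection names in the usage note are hypothetical and must be adapted to the schema you design in task 1.

```python
# Hypothetical schema: each bug document stores its opening time as a
# date under "opened_at". Rename the field to match your own design.
BUGS_PER_YEAR_PIPELINE = [
    # Group documents by the year of the opening date and count them.
    {"$group": {"_id": {"$year": "$opened_at"}, "n_bugs": {"$sum": 1}}},
    # Sort the yearly counts chronologically.
    {"$sort": {"_id": 1}},
]


def bugs_per_year(collection):
    """Run the pipeline on a pymongo collection; return (year, count) pairs."""
    return [
        (doc["_id"], doc["n_bugs"])
        for doc in collection.aggregate(BUGS_PER_YEAR_PIPELINE)
    ]


# Usage, assuming a local MongoDB server and hypothetical names:
# from pymongo import MongoClient
# bugs = MongoClient("mongodb://localhost:27017")["mozilla"]["bugs"]
# print(bugs_per_year(bugs))
```

Keeping the pipeline as a plain Python list makes it easy to paste the same stages into the MongoDB shell if you prefer to work in MQL directly.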

Data

The fcp is based on bug data for the Mozilla project hosted at Bugzilla. A bug is a defect in the design, manufacture or operation of software generating undesired results or impeding operation (Wikipedia definition).

The data can be retrieved at this link. In particular, you can find:

| folder | content | time frame | size | #bugs |
|---|---|---|---|---|
| archive-mozilla-bugs | bug history | 1997-03-19 - 2003-08-05 | 2.9 GB | 215,173 |

Data are stored in pickle format. Please check load_data.py for an example of how to load pickle files.
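For orientation, here is a minimal sketch of reading pickle files with the Python standard library. It is not the content of load_data.py: the helper names `load_pickle` and `iter_pickles` and the default `*.pickle` file pattern are assumptions made for illustration, so check the actual file names in the archive folder. Keep in mind that unpickling can execute arbitrary code, so only load files from a source you trust.

```python
import pickle
from pathlib import Path


def load_pickle(path):
    """Deserialize one pickle file and return the stored Python object."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)


def iter_pickles(folder, pattern="*.pickle"):
    """Yield (file name, object) pairs for each matching file in a folder.

    The default pattern is a guess; pass the one that matches the archive.
    Files are loaded one at a time, so the whole archive never has to sit
    in memory at once.
    """
    for path in sorted(Path(folder).glob(pattern)):
        yield path.name, load_pickle(path)


# Usage, assuming the archive was extracted into the working directory:
# for name, obj in iter_pickles("archive-mozilla-bugs"):
#     print(name, type(obj))
```

Iterating lazily is a deliberate choice here, since the archive is close to 3 GB and loading every file up front may not fit in memory.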

To get a sense of the data structure, I suggest you explore the available data either in MongoDB or Python. Also, check the Bugzilla@Mozilla documentation to get further info.
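If you explore the data in Python, a small helper that prints the shape of a nested object can save time. The sketch below is one possible approach, not part of the course material: `summarize` is a hypothetical helper, and the sample record in the usage note uses invented field names, since the real structure is what you are trying to discover.

```python
from itertools import islice


def summarize(obj, depth=0, max_depth=3, max_keys=20):
    """Return a compact description of a nested object's shape.

    Dicts become {key: summary} (first `max_keys` keys only), lists and
    tuples become [type name, length, summary of first item], and
    everything else becomes its type name.
    """
    if depth >= max_depth:
        return type(obj).__name__
    if isinstance(obj, dict):
        return {
            key: summarize(value, depth + 1, max_depth, max_keys)
            for key, value in islice(obj.items(), max_keys)
        }
    if isinstance(obj, (list, tuple)):
        first = summarize(obj[0], depth + 1, max_depth, max_keys) if obj else None
        return [type(obj).__name__, len(obj), first]
    return type(obj).__name__


# Usage with an invented record (field names are placeholders):
# record = {"id": 1, "history": [{"who": "a", "when": "b"}]}
# print(summarize(record))
```

Only the first element of each list is inspected, so heterogeneous lists will need a closer look by hand.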

If you are interested in expanding the data collected, you can consider the following library:

Deliverables

By July 21st (4:00 PM, London time), groups have to upload:

Please keep the supporting document within 3,000 words excluding tables and figures.

References

Here you can find some academic articles dealing with open-source software:

Here you can find some further readings:

A cool documentary on the transition at Netscape from proprietary to open-source software:

An amazing documentary on the early stages of open source: