Final course project – 2021

For the final course project (fcp), you will work with a real-world dataset. In particular, you are required to collect, store, manipulate, and analyze Airbnb data. These data are publicly available via Inside Airbnb.

Data

The data are available via Inside Airbnb under the “get the data” section. Collect the latest available data for London, using the following CSVs (see note 1):

data              content
listings.csv.gz   Detailed Listings
calendar.csv.gz   Detailed Calendar Data
reviews.csv.gz    Detailed Review Data
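
Before designing a schema, it can help to look at the header and a few rows of each file without decompressing it to disk. The following is a minimal sketch using only the Python standard library; the function name `preview_gzipped_csv` and the default of five rows are illustrative choices, not part of the assignment.

```python
import csv
import gzip
from itertools import islice


def preview_gzipped_csv(path, limit=5):
    """Return the header and the first `limit` rows of a gzip-compressed CSV."""
    # newline="" lets the csv module handle quoted fields that span several
    # lines, which free-text columns (e.g. review comments) may contain.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, limit))
        return reader.fieldnames, rows
```

For example, `header, rows = preview_gzipped_csv("listings.csv.gz")` would return the column names and five rows as dictionaries, assuming the file sits in the working directory and is UTF-8 encoded.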

Tasks

You are required to choose your preferred DBMS – PostgreSQL or MongoDB – and:

  1. Clean, manipulate, and structure data. The expected result is a well-designed dataset that complies with the specific approach of the chosen DBMS.
  2. Provide valuable descriptive insights. The expected result is a set of descriptive statistics that depicts some interesting trends or noteworthy data characteristics.
  3. [optional] Perform an insightful data analysis. For example, you can use the available features to classify the offerings or predict their value. You may want to skim through the reference list provided to get some inspiration.
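
As an illustration of the cleaning step in task 1, the sketch below converts a price stored as text into a numeric value. It assumes that prices appear as strings with a currency symbol and thousands separators (for instance "$1,250.00"); check this against the file you actually download. The helper name `parse_price` is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def parse_price(raw):
    """Convert a price string such as "$1,250.00" to Decimal, or None if unusable."""
    if raw is None:
        return None
    # Assumed format: optional "$" prefix and "," as thousands separator.
    cleaned = raw.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Reject special values such as NaN or Infinity.
    return value if value.is_finite() else None
```

Returning `None` for unusable values keeps missing prices explicit, so they can be stored as NULL (or omitted from a document) rather than silently becoming zero.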

To perform tasks 1 and 2, you need to use either SQL or MQL (MongoDB query language; see note 2). Alternatively, if you prefer using Python, you can leverage psycopg2 or pymongo. For task 3, you should use PySpark (e.g., you may want to leverage PySpark's MLlib library).
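
A descriptive statistic for task 2 can be expressed as a plain SQL aggregation and run from Python through any DB-API 2.0 connection. The sketch below is one possible shape, not a required design: the table name `listings` and the columns `room_type` and `price` are assumptions about your schema, and the standard-library `sqlite3` module with three made-up rows stands in for PostgreSQL only so that the example runs without a server.

```python
import sqlite3

# Hypothetical table and column names; adapt them to your own schema.
ROOM_TYPE_SUMMARY = """
    SELECT room_type,
           COUNT(*)   AS n_listings,
           AVG(price) AS avg_price
    FROM listings
    GROUP BY room_type
    ORDER BY room_type
"""


def fetch_all(connection, sql):
    """Run a read-only query on a DB-API 2.0 connection and return every row."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


def demo_connection():
    """Build a tiny in-memory database (toy rows, not real Airbnb data)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE listings (id INTEGER, room_type TEXT, price REAL)"
    )
    connection.executemany(
        "INSERT INTO listings VALUES (?, ?, ?)",
        [
            (1, "Entire home/apt", 120.0),
            (2, "Private room", 45.0),
            (3, "Private room", 55.0),
        ],
    )
    return connection


if __name__ == "__main__":
    for row in fetch_all(demo_connection(), ROOM_TYPE_SUMMARY):
        print(row)
```

With PostgreSQL, the same `fetch_all` helper should work on a connection obtained from `psycopg2.connect(...)` with your own connection parameters, since psycopg2 follows the same DB-API interface.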

Deliverables

By July 16th (8:00 PM, London time), groups have to upload:

References

Here you can find some academic articles dealing with Airbnb:

Here you can find some further readings:


Notes

1: You can find further information on data structure here.

2: If you choose MongoDB, you may be interested in $lookup; see further info here.
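
To make the $lookup hint concrete, here is a minimal sketch of an aggregation pipeline written as a Python list, as it could be passed to pymongo. It assumes two collections named `listings` and `reviews`, with each review carrying a `listing_id` that matches the `id` field of a listing; these names are assumptions about how you load the data, and the function name is hypothetical.

```python
def reviews_per_listing_pipeline(limit=10):
    """Build an aggregation pipeline that counts the reviews joined to each listing."""
    return [
        # Left outer join: for every listing, collect the review documents
        # whose listing_id equals the listing's id.
        {
            "$lookup": {
                "from": "reviews",
                "localField": "id",
                "foreignField": "listing_id",
                "as": "reviews",
            }
        },
        # Keep only the listing identifier and the number of joined reviews.
        {"$project": {"_id": 0, "id": 1, "n_reviews": {"$size": "$reviews"}}},
        # Most-reviewed listings first, truncated to `limit` results.
        {"$sort": {"n_reviews": -1}},
        {"$limit": limit},
    ]
```

Given a pymongo database handle `db`, the pipeline would be run as `db.listings.aggregate(reviews_per_listing_pipeline())`. An index on `listing_id` in the `reviews` collection is likely to help, since the join looks up that field for every listing.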