Final course project – 2021
For the final course project (fcp), you will work with a real-world dataset. In particular, you are required to collect, store, manipulate, and analyze Airbnb data. These data are publicly available via Inside Airbnb.
Data
The data are available via Inside Airbnb under the “get the data” section, where you have to collect the latest available data for London. You must use the following CSV files (see note 1):
| data | content |
|---|---|
| listings.csv.gz | Detailed Listings |
| calendar.csv.gz | Detailed Calendar Data |
| reviews.csv.gz | Detailed Review Data |
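As a starting point for the collection step, the following minimal sketch reads one of the compressed files row by row using only the Python standard library. The helper name `read_csv_gz` is hypothetical, and the sketch assumes the files are UTF-8 encoded and start with a header row; check this against the files you actually download.

```python
import csv
import gzip
from typing import Dict, Iterator


def read_csv_gz(path: str) -> Iterator[Dict[str, str]]:
    """Yield each row of a gzip-compressed CSV file as a dict keyed by header.

    Assumption: the file is UTF-8 encoded and its first row holds column names.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields,
    # which free-text columns (for example review comments) may contain.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle)
```

Because the function is a generator, rows are streamed rather than loaded all at once, which may help with the larger calendar and review files. For example, `next(read_csv_gz("listings.csv.gz"))` would return the first listing as a dict, assuming the file sits in the working directory.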
Tasks
You are required to choose your preferred DBMS – PostgreSQL or MongoDB – and:
1. Clean, manipulate, and structure data. The expected result is a well-designed dataset that complies with the specific approach of the chosen DBMS.
2. Provide valuable descriptive insights. The expected result is a set of descriptive statistics that depicts some interesting trends or noteworthy data characteristics.
3. [optional] Perform an insightful data analysis. For example, you can use the available features to classify the offerings or predict their value. You may want to skim through the reference list provided to get some inspiration.
To perform tasks 1 and 2, you need to use either SQL or MQL (MongoDB Query Language; see note 2).
Alternatively, if you prefer Python, you can use psycopg2 or pymongo.
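For the Python route with PostgreSQL, a minimal sketch of tasks 1 and 2 might look like the following. It is an illustration under stated assumptions, not a prescribed design: it assumes a table `listings(id, room_type, price)` already exists with a numeric `price` column, that the raw rows carry `id`, `room_type`, and `price` fields, and that prices arrive as strings such as `$1,250.00`. The function names are hypothetical.

```python
from decimal import Decimal
from typing import Iterable, Mapping, Optional


def parse_price(raw: str) -> Optional[Decimal]:
    """Turn a price string such as "$1,250.00" into a Decimal; None if blank.

    Assumption: prices use a leading "$" and "," as thousands separator.
    """
    cleaned = raw.replace("$", "").replace(",", "").strip()
    return Decimal(cleaned) if cleaned else None


# Assumed target table: listings(id, room_type, price), created beforehand.
INSERT_SQL = "INSERT INTO listings (id, room_type, price) VALUES (%s, %s, %s)"

# Descriptive statistic: number of listings and average price per room type.
AVG_PRICE_SQL = """
    SELECT room_type, COUNT(*) AS n_listings, ROUND(AVG(price), 2) AS avg_price
    FROM listings
    GROUP BY room_type
    ORDER BY avg_price DESC
"""


def load_listings(conn, rows: Iterable[Mapping[str, str]]) -> None:
    """Clean raw CSV rows and insert them through a psycopg2 connection."""
    params = [
        (int(row["id"]), row["room_type"], parse_price(row["price"]))
        for row in rows
    ]
    with conn.cursor() as cur:
        # Placeholders keep the values out of the SQL string itself.
        cur.executemany(INSERT_SQL, params)
    conn.commit()


def average_price_by_room_type(conn):
    """Return (room_type, n_listings, avg_price) tuples, most expensive first."""
    with conn.cursor() as cur:
        cur.execute(AVG_PRICE_SQL)
        return cur.fetchall()
```

Both functions take the connection as an argument, so you would create it yourself, for instance with `psycopg2.connect(...)` and your own connection parameters. For the full dataset, a bulk-loading approach is likely to be faster than `executemany`; the sketch favors readability.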
For task 3, you should use PySpark (for example, through its MLlib
machine-learning library).
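To give an idea of what the optional analysis could involve, here is a minimal sketch of a price regression with PySpark's DataFrame-based MLlib API. It is only one possible approach. The function name `train_price_model` is hypothetical, and the sketch assumes you have already loaded the listings into a Spark DataFrame whose feature columns and label column are numeric.

```python
def train_price_model(df, feature_cols, label_col="price"):
    """Fit a linear regression on a Spark DataFrame; return (model, test RMSE).

    Assumptions: `df` is a pyspark.sql.DataFrame, and every column named in
    `feature_cols` as well as `label_col` is already numeric.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression

    columns = list(feature_cols)

    # VectorAssembler rejects nulls by default, so drop incomplete rows first.
    complete = df.dropna(subset=columns + [label_col])

    # MLlib estimators expect all predictors packed into one vector column.
    assembler = VectorAssembler(inputCols=columns, outputCol="features")
    assembled = assembler.transform(complete)

    # Hold out 20% of the rows to estimate out-of-sample error.
    train, test = assembled.randomSplit([0.8, 0.2], seed=42)
    model = LinearRegression(featuresCol="features", labelCol=label_col).fit(train)

    evaluator = RegressionEvaluator(
        labelCol=label_col, predictionCol="prediction", metricName="rmse"
    )
    return model, evaluator.evaluate(model.transform(test))
```

A call such as `train_price_model(df, ["accommodates", "bedrooms"])` would then return the fitted model and its root-mean-square error on the held-out rows. The two column names are assumptions about the listings file and should be checked against the data.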
Deliverables
By July 16th (8:00 PM, London time), groups have to upload:
- SQL, JS, or Python scripts;
- Supporting documentation (accepted formats: .md, .docx, or .pdf) containing:
  - a detailed justification of your design choices;
  - a clear and concise description of the insights obtained from the descriptive statistics;
  - [optional] a clear and concise description of further insights and results obtained by analyzing the data through PySpark.
References
Here you can find some academic articles dealing with Airbnb:
- Barron, K., Kung, E., & Proserpio, D. (2021). The effect of home-sharing on house prices and rents: Evidence from Airbnb. Marketing Science, 40(1), 23-47. [link]
- Sun, S., Zhang, S., & Wang, X. (2021). Characteristics and influencing factors of Airbnb spatial distribution in China’s rapid urbanization process: A case study of Nanjing. PLoS ONE, 16(3). [link]
- Chang, H. H., & Sokol, D. D. (2020). How incumbents respond to competition from innovative disruptors in the sharing economy — The impact of Airbnb on hotel performance. Strategic Management Journal. [link]
- Boon, W. P., Spruit, K., & Frenken, K. (2019). Collective institutional work: the case of Airbnb in Amsterdam, London and New York. Industry and Innovation, 26(8), 898-919. [link]
- Deboosere, R., Kerrigan, D. J., Wachsmuth, D., & El-Geneidy, A. (2019). Location, location and professionalization: a multilevel hedonic analysis of Airbnb listing prices and revenue. Regional Studies, Regional Science, 6(1), 143-156. [link]
- Ye, P., Qian, J., Chen, J., Wu, C., Zhou, Y., De Mars, S., Yang, F., & Zhang, L. (2018). Customized Regression Model for Airbnb Dynamic Pricing. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD '18). Association for Computing Machinery, New York, NY, USA, 932–940. [link]
- Abrahao, B., Parigi, P., Gupta, A., & Cook, K. S. (2017). Reputation offsets trust judgments based on social biases among Airbnb users. Proceedings of the National Academy of Sciences, 114(37), 9848-9853. [link]
Here you can find some further readings:
- Lee, Dave (2021). Airbnb claims ‘resilience’ as bookings begin to bounce back. Financial Times, Feb. 25. [link]
- The Economist (2020). Airbnb guests seek out cleaner properties in the pandemic. The Economist, Dec. 8. [link]
- Glusac, Elaine (2020). Hotels vs Airbnb: Has Covid-19 Disrupted the Disrupter? The New York Times, Nov. 16. [link]
Notes
1: You can find further information on the data structure here.
2: If you choose MongoDB, you may be interested in the $lookup aggregation stage; see further info here.
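As an illustration of note 2, the sketch below builds a `$lookup` pipeline that joins each listing to its reviews and counts them. It makes several assumptions that you should verify against your own design: collections named `listings` and `reviews`, an `id` field on listings, a `listing_id` field on reviews, and a `name` field on listings. The function name is hypothetical.

```python
def reviews_per_listing_pipeline(limit: int = 10) -> list:
    """Build an aggregation pipeline ranking listings by number of reviews.

    Assumptions: it runs on a `listings` collection whose documents have `id`
    and `name`, and a `reviews` collection whose documents have `listing_id`.
    """
    return [
        # Left outer join: attach the matching reviews to each listing.
        {
            "$lookup": {
                "from": "reviews",
                "localField": "id",
                "foreignField": "listing_id",
                "as": "reviews",
            }
        },
        # Keep only the identifying fields and the size of the joined array.
        {
            "$project": {
                "_id": 0,
                "id": 1,
                "name": 1,
                "n_reviews": {"$size": "$reviews"},
            }
        },
        {"$sort": {"n_reviews": -1}},
        {"$limit": limit},
    ]
```

With pymongo, the pipeline would be passed to the `aggregate` method of the listings collection, for example `db.listings.aggregate(reviews_per_listing_pipeline())`, where `db` is a database handle you have opened yourself. An index on the joined field of the reviews collection is likely to speed up the join considerably.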