
Data Management Systems

dms-smm695

Databases are always there, even if you do not notice them. Whether you are searching for a product on an e-commerce site, writing a message to a friend, or looking for a paper to cite in your thesis, you are interacting with a database. For this reason, databases are a fundamental component of the digital era, and it is worth knowing how they work.

Over the years, the world of databases has seen several developments, with new approaches to structuring, storing, and interacting with data. From the relational model to more flexible systems, databases are in constant evolution.

Instructor

Name: Matteo Devigili

Contacts: matteo.devigili.2@city.ac.uk

Webinar: Friday — 11:00 - 12:50 (Zoom)

Office hours: Friday — 15:00 - 17:00 (Zoom)

Module Overview

This module focuses on storing, querying, and manipulating data. In particular, we will discuss PostgreSQL (a prominent, advanced, open-source relational database) and MongoDB (a schema-free database that is especially useful with evolving streams of data). In the last week, a more exploratory lecture (not strictly required to complete the final coursework) will walk you through Apache Spark (a cluster-computing framework that can scale SQL, machine learning, and network analysis pipelines) using PySpark.

Materials & Readings

For this course, you do not have to buy any books, but you need to go through the following:

Furthermore, I will provide some optional, ungraded homework to test your understanding of the lectures.

The following references concern additional material you may be interested in:

Learning Objectives and Assessment

At the end of the module, students should be able to:

In terms of assessment, students are required to deliver one group-level coursework project (so, no final examination or individual assignments).

The final course project will be launched in week 5. Submissions will be evaluated on a rolling basis and are due by July 19 (4:00 PM London time). Students will be required to work with real-world data from scratch, applying what they have learned during this module.

The project will be evaluated against the following criteria: i) appropriate use of notions and frameworks discussed in class; ii) effectiveness of the proposed answer or solution; iii) appropriate explanation of the proposed solution; iv) organization and clarity of submitted materials. All criteria carry equal weight in the final mark.

Organization of the Module

The following table shows the schedule of the module. Depending on students’ progress throughout the module, the topics may undergo minor changes.

All the pre-recorded material is already available on the course GitHub page or GitHub repo.

An interactive webinar will be held each Friday from 11:00 to 12:50 London time. Students are expected to go through the weekly pre-recorded material in advance. In the first part of the class, I will provide a recap of the video recording and answer students’ questions concerning the topics covered. Note: students are invited to share their questions via email the day before the webinar (by 8:00 PM London time). In the second part, I will discuss some further applications of the topic covered.

To recap:

| Week (dd-mm) | Agenda | Topics | Material |
|---|---|---|---|
| 1 (24-05) | PostgreSQL | Introduction to RDBMS; PostgreSQL (psql and pgAdmin4); Installation; Create (Database, Schema, Table); Data types (Numeric, Monetary, Character, Date and time); Drop (Database, Schema, Table) | Lecture, Webinar |
| 2 (31-05) | | Constraints (Not Null, Unique, Primary Key, Check); Import data; Basic SQL; Aggregate functions; Grouping | Lecture, Webinar |
| 3 (07-06) | | Foreign Key; Joins (Inner, Left/Right/Full Outer, Cross); Export data | Lecture, Webinar |
| 4 (13-06) | MongoDB | Introduction to MongoDB; Installation; CRUD operations (Insert, Find, Update/Replace, Delete/Drop) | Lecture, Webinar |
| 5 (21-06) | | Load data; Query and Projection Operators; Introduction to the Aggregation Framework; Data Export | Lecture, Webinar |
| 6 (28-06) | PySpark | Introduction to PySpark; Connection to PostgreSQL and MongoDB; Regression module; NLP examples | Lecture, Webinar |

Software requirements

During the course, students will be guided to install:

Check the environment folder for further info.

We will also interact with Amazon RDS and MongoDB Atlas, so please be sure to have a stable internet connection.

To follow the lectures in weeks 2, 4, and 6, as well as the webinars, you need Python >= 3.7. The easiest way to get it is to install Anaconda.
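If you are unsure which Python version your environment runs, a quick check such as the one below may help. It is only a minimal sketch: the helper name `meets_requirement` is hypothetical and not part of the course materials, and the 3.7 threshold is taken from the requirement stated above.

```python
import sys

# Minimum version stated in the software requirements (Python >= 3.7).
REQUIRED = (3, 7)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`.

    `meets_requirement` is a hypothetical helper written for this example.
    """
    return tuple(version_info[:2]) >= required


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_requirement():
        print(f"Python {current} satisfies the >= 3.7 requirement.")
    else:
        print(f"Python {current} is too old; please install 3.7 or newer.")
```

Run it with the interpreter you plan to use for the course (for example, from within your Anaconda environment) so that the check reflects the right installation.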

Jump to

| Lectures | Webinars | Utils |
|---|---|---|
| 1 | 1 | Python environment set-up |
| 2 | 2 | Tutorials |
| 3 | 3 | Past assignments |
| 4 | 4 | Final-Course-Project |
| 5 | 5 | |
| 6 | 6 | |