Spring 2024
When: TR 2:30PM - 4:00PM
Where: 32-155
Instructors: Samuel R. Madden (madden AT csail.mit.edu)
Michael Cafarella (michjc AT csail.mit.edu)
TAs: Matthew Russo (mdrusso AT mit.edu)
Xinjing Zhou (xinjing AT mit.edu)
Office Hours:
Tuesday 5:00PM - 6:00PM - G32 9th Floor Lounge - Matthew
Wednesday 5:00PM - 6:00PM - G32 9th Floor Lounge - Xinjing
Please post questions/comments/concerns to Piazza.


This class will survey techniques and systems for ingesting, efficiently processing, analyzing, and visualizing large data sets. Topics will include data cleaning, data integration, scalable systems (relational databases, NoSQL, Spark, etc.), analytics (data cubes, scalable statistics and machine learning), fundamental statistics and machine learning, and scalable visualization of large data sets. The goal of the class is to give students working experience alongside in-depth discussion of the topics covered. Students should have a background in programming and algorithms. There will be a semester-long project, as well as paper and hands-on labs designed to give experience with state-of-the-art data processing tools.

Classes consist of lectures and readings related to course topics. Grades are based on a semester-long project, about 6-8 labs of varying length, and two exams. For more information about the schedule and assignments, use the links at the top of the page.

Last change: 2/06/2024.