Spring 2022
Units: 3-0-9
When: MW 2:30PM - 4:00PM
Where: 37-212
Instructors: Tim Kraska (kraska AT csail.mit.edu)
Samuel R. Madden (madden AT csail.mit.edu)
TAs: Markos Markakis (markakis AT mit.edu)
Amadou Ngom (ngom AT mit.edu)
Office hours: Monday 4:00PM - 5:00PM - 37-212 - Markos
Tuesday 6:30PM - 7:30PM - Online (link on Piazza) - Markos
Friday 11:00AM - 12:00PM - 24-323 - Amadou
Saturday 11:00AM - 12:00PM - Online (link on Piazza) - Amadou
Please post questions/comments/concerns to Piazza.


This class will survey techniques and systems for ingesting, efficiently processing, analyzing, and visualizing large data sets. Topics will include data cleaning, data integration, scalable systems (relational databases, NoSQL, Spark, etc.), analytics (data cubes, scalable statistics and machine learning), fundamental statistics and machine learning, and scalable visualization of large data sets. The goal of the class is to combine working experience with in-depth discussion of the topics covered. Students should have a background in programming and algorithms. There will be a semester-long project and paper, and hands-on labs designed to give experience with state-of-the-art data processing tools.

Classes consist of lectures and readings related to course topics. Grades in 6.S079 are assigned based on a semester-long project, 6 labs of varying length, and 2 quizzes. For more information about the schedule and assignments, use the links at the top of the page.

Last change: 1/27/2022.