Spring 2022
Units: 3-0-9
When: MW 2:30-4:00
Where: 37-212
Instructors: Tim Kraska (kraska AT csail.mit.edu)
Samuel R. Madden (madden AT csail.mit.edu)
Instructor office hours: by appointment
TAs: Markos Markakis (markakis AT mit.edu)
Amadou Ngom (ngom AT mit.edu)
TA office hours: TBD
Please post questions/comments/concerns to Piazza


This class will survey techniques and systems for ingesting, efficiently processing, analyzing, and visualizing large data sets. Topics will include data cleaning, data integration, scalable systems (relational databases, NoSQL, Spark, etc.), analytics (data cubes, scalable statistics and machine learning), fundamental statistics and machine learning, and scalable visualization of large data sets. The goal of the class is for students to gain working experience alongside in-depth discussion of the topics covered. Students should have a background in programming and algorithms. There will be a semester-long project and paper, and hands-on labs designed to give experience with state-of-the-art data processing tools.

Classes consist of lectures and readings related to course topics. Grades in 6.S079 are assigned based on a semester-long project and about 10 labs of varying length. For more information about the readings and assignments, use the links at the top of the page.

Last change: 12/16/2021.