This is the website for the Fall 2019 iteration of this course. For the current iteration, please click here.
|Instructors:||Tim Kraska (kraska AT csail.mit.edu)|
|Samuel Madden (madden AT csail.mit.edu)|
|Instructor office hours:||by appointment|
|TAs:||Matthew Perron (mperron AT csail.mit.edu)|
|Joana M. F. da Trindade (jmf AT csail.mit.edu)|
|TA office hours:||Monday 11AM-12PM in the 32-G9 Lounge|
|Tuesday 2-3PM in the 32-G9 Lounge|
|Please post questions/comments/concerns to Piazza|
This class will survey techniques and systems for ingesting, efficiently processing, analyzing, and visualizing large data sets. Topics will include data cleaning, data integration, scalable systems (relational databases, NoSQL, Spark, etc.), analytics (data cubes, scalable statistics and machine learning), fundamental statistics and machine learning, and scalable visualization of large data sets. The goal of the class is for students to gain working experience with these tools alongside in-depth discussion of the topics covered. Students should have a background in programming and algorithms. There will be a semester-long project and paper, and hands-on labs designed to give experience with state-of-the-art data processing tools.
Classes consist of lectures and readings related to course topics. Grades in 6.S080 are based on a semester-long project and about 10 labs of varying length. For more information about the readings and assignments, use the links at the top of the page.