Data Systems Group (DSG) @ MIT

Over the past decade, AI has made substantial methodological advances in learning complex relationships in data. In addition, “deep learning” has excelled at a number of perceptual tasks, including image recognition and speech processing. These advances have enabled applications ranging from personal digital assistants to autonomous vehicles. An open question, however, is: How far can AI technology be pushed into other application domains?

We founded the Data Systems and AI Lab (DSAIL) to explore this frontier. Rather than using AI only to automate simple perceptual tasks, we investigate how learned components synthesized with AI can enhance and optimize large-scale data systems and enterprise applications. This includes applying AI to the construction of traditional data structures such as indexes; to database methods such as query optimization, schema design, and logical and physical database design; and to system algorithms such as load balancing and scheduling. In addition, large-scale enterprise applications, including data integration and predictive modelling, are already benefiting from AI technology. However, applying AI at enterprise scale is hampered by a lack of support tools and scalable algorithms.

To achieve our goals, several things need to happen. First, we need new AI algorithms that are efficient enough to operate in the inner loop of large-scale systems. Second, before AI can be widely used in mission-critical enterprise applications (as opposed to inherently imprecise applications like web search and information retrieval), we need new systems that systematically manage the process of collecting, cleaning, and preparing data, as well as the process of building models and integrating them into deployed systems. Third, software systems and AI algorithms need to co-evolve to take advantage of emerging hardware trends, including specialized accelerators, new high-speed interconnects, and advanced memory technologies. If successful, this research will change the way we build the large-scale systems of the future and the way we use AI techniques inside the modern enterprise.

DSAIL was launched in 2018. Over this period, DSAIL has become the leading research group for ML for Systems research, producing over 100 publications, 40 of which involved at least one co-author from our industry partners. We engaged deeply with our industry partners, with 15 summer internships and 2 of our PhD graduates joining our lab sponsors as full-time employees (many of the lab's students have yet to graduate). We organized 3 DSAIL retreats and 5 company-specific workshops to exchange ideas on ML for Systems problems, educate, and learn from each other's problems and solutions. We also open-sourced many of our projects and actively engaged with our industry partners in technology transfer. For example, both Microsoft and Google have incorporated ideas from our lab into their commercial products.

DSAIL phase 2, launched in 2022, aims to continue this trajectory of deep collaboration with our industry partners, coupled with leading-edge research in applying learning to data systems.

Industry partners, researchers and students came together to share research progress and ideas at the first annual DSAIL Research Retreat, held November 1, 2019, in Cambridge, Mass. (Missing from photo: co-director Sam Madden, who’s taking the photo.)

DSAIL is also supported by NSF award 1900933 III: Medium: Learning-based Synthesis of Data Processing Engines.