Data Systems Group at MIT

Alpine Meadow - Interactive Auto ML

Statistical knowledge and domain expertise are key to extracting actionable insights from data, yet the two rarely coexist in the same person. In Machine Learning (ML), high-quality results are only attainable through careful data preprocessing, hyperparameter tuning, and model selection. Domain experts are often overwhelmed by this complexity, which de facto inhibits wider adoption of ML techniques. Existing libraries claim to solve this problem but still require well-trained practitioners. Such frameworks involve heavy data preparation and are often too slow to give the user interactive feedback, which severely limits their scope.

Alpine Meadow is an interactive Automated Machine Learning (AutoML) tool. What makes our system unique is not only its focus on interactivity but also its combined systems and algorithmic design. On the one hand, we leverage ideas from query optimization; on the other hand, we have devised novel selection and pruning strategies that combine cost-based Multi-Armed Bandits and Bayesian Optimization.
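To make the idea of a cost-based bandit more concrete, here is a minimal Python sketch of a generic cost-aware UCB rule. It is not Alpine Meadow's algorithm: the arms (stand-ins for candidate pipeline templates), the reward (a validation score), the cost (for example, seconds of training), and the names `Arm`, `select_arm`, and `run` are all hypothetical. The Bayesian Optimization half of the design, such as tuning hyperparameters within a chosen arm, is omitted; each pull simply returns a simulated score and cost.

```python
from __future__ import annotations

import math
import random
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Arm:
    """A candidate pipeline template (hypothetical stand-in for a real pipeline)."""
    name: str
    pulls: int = 0
    total_reward: float = 0.0  # sum of observed validation scores
    total_cost: float = 0.0    # sum of observed costs, e.g. seconds of training

    def mean_reward(self) -> float:
        return self.total_reward / self.pulls

    def mean_cost(self) -> float:
        return self.total_cost / self.pulls


def select_arm(arms: list[Arm], t: int, c: float = 1.0) -> Arm:
    """Cost-aware UCB: optimistic reward estimate divided by mean observed cost."""
    # Pull every arm once before trusting any estimate.
    for arm in arms:
        if arm.pulls == 0:
            return arm

    def score(arm: Arm) -> float:
        bonus = c * math.sqrt(2.0 * math.log(t) / arm.pulls)  # standard UCB1 bonus
        return (arm.mean_reward() + bonus) / arm.mean_cost()  # assumes costs > 0

    return max(arms, key=score)


Evaluate = Callable[[Arm], Tuple[float, float]]  # returns (reward, cost)


def run(arms: list[Arm], evaluate: Evaluate, budget: float) -> Arm:
    """Pull arms until the cost budget is spent; return the arm with the best mean reward."""
    spent, t = 0.0, 0
    while spent < budget:
        t += 1
        arm = select_arm(arms, t)
        reward, cost = evaluate(arm)
        arm.pulls += 1
        arm.total_reward += reward
        arm.total_cost += cost
        spent += cost
    # Rank only arms that were actually tried, by quality alone (not cost).
    return max((a for a in arms if a.pulls > 0), key=lambda a: a.mean_reward())


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical (mean score, cost in seconds) for three pipeline templates.
    profiles = {"linear": (0.70, 1.0), "forest": (0.85, 2.0), "deep_net": (0.80, 10.0)}

    def evaluate(arm: Arm) -> tuple[float, float]:
        quality, cost = profiles[arm.name]
        return quality + rng.gauss(0.0, 0.02), cost  # noisy validation score

    arms = [Arm(name) for name in profiles]
    best = run(arms, evaluate, budget=60.0)
    print(best.name, {a.name: a.pulls for a in arms})
```

Dividing the optimistic score by the mean cost steers exploration toward cheap pipelines, which is one way to keep early feedback fast; the final pick is still made on reward alone, so an expensive but accurate arm can win once it has been tried.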

We have evaluated our system on over 300 datasets and compared it against other AutoML tools, including the current NIPS winner, as well as against expert solutions. Not only does Alpine Meadow significantly outperform the other AutoML systems while providing interactive latencies, but it also outperforms expert solutions in 80% of the cases on datasets it has never seen before. As a result, since March 2018, Alpine Meadow has led the DARPA D3M Automatic Machine Learning competition.

Going forward, the project aims to further improve system performance by refining these techniques.

Participants