Data preparation has been seen as "janitor work" yet essential in data-to-insight pipelines. The increasing liberality of data is followed by an explosion in the diversity of data consumers. However, the required technical and domain expertise prevents many from performing extensive data preparation. Further, many seem to be stuck in a vicious cycle of writing one-off programs to process data. Recently, automating data preparation programs has been shown to improve many aspects of the pipeline, including data quality, research reproducibility, and user productivity. We propose a novel approach to automatically improve data preparation programs.


Eugenie Lai, Yuze Lou, Brit Youngmann, Mike Cafarella