Data partitioning is crucial to improving query performance, and several workload-based partitioning techniques have been proposed in the database literature. However, many modern analytic applications involve ad-hoc or exploratory analysis in which users do not have a representative query workload a priori. Static workload-based data partitioning techniques are therefore not suitable for such settings.
To address this problem, we developed Amoeba, a distributed storage system that uses adaptive multi-attribute data partitioning to efficiently support ad-hoc as well as recurring queries. Amoeba requires zero set-up and tuning effort, allowing analysts to get the benefits of partitioning without providing a query workload upfront.