Data Systems Group at MIT

Database Designs: The Case for Co-evolution of Applications and Data

The traditional wisdom for logical database design, found in any DBMS textbook, is as follows:

Form an entity-relationship (E-R) model of your data. When you are satisfied with your E-R model, push a button that runs an E-R-to-third-normal-form (3NF) translation algorithm. Create the 3NF schema and code the application logic for this schema.

When business conditions change (and they do at least once a quarter), update the E-R model, update the schema, move the data to the new schema, and perform application maintenance.
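
To make the translation step concrete, here is a minimal Python sketch of the standard textbook mapping rules, not the algorithm of any particular tool: each entity becomes a table keyed by its identifier, a 1:N relationship becomes a foreign key on the "many" side, and an M:N relationship becomes a junction table. The classes `Entity` and `Relationship`, the function `er_to_tables`, the example entities, and the `<entity>_<key>` column-naming convention are all hypothetical. Other cardinalities (such as 1:1) are deliberately not handled. The resulting schema is in 3NF only under the assumption that every entity's attributes depend on its key alone, which is what a well-formed E-R model is meant to ensure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str              # entity name, e.g. "Customer"
    key: str               # identifying attribute, e.g. "id"
    attributes: list[str]  # non-key attributes, assumed to depend only on the key


@dataclass
class Relationship:
    name: str
    left: str              # entity on the "one" side (for 1:N)
    right: str             # entity on the "many" side (for 1:N)
    cardinality: str       # only "1:N" or "M:N" are handled in this sketch
    attributes: list[str] = field(default_factory=list)


def er_to_tables(entities: list[Entity],
                 relationships: list[Relationship]) -> dict[str, list[str]]:
    """Map an E-R model to tables (name -> column list) using textbook rules."""
    by_name = {e.name: e for e in entities}
    # Rule 1: every entity becomes a table whose first column is its key.
    tables = {e.name: [e.key, *e.attributes] for e in entities}
    for r in relationships:
        left, right = by_name[r.left], by_name[r.right]
        left_fk = f"{left.name.lower()}_{left.key}"
        right_fk = f"{right.name.lower()}_{right.key}"
        if r.cardinality == "1:N":
            # Rule 2: add a foreign key (plus any relationship attributes)
            # to the table on the "many" side.
            tables[right.name] += [left_fk, *r.attributes]
        elif r.cardinality == "M:N":
            # Rule 3: create a junction table holding both foreign keys.
            tables[r.name] = [left_fk, right_fk, *r.attributes]
        else:
            raise ValueError(f"unsupported cardinality: {r.cardinality!r}")
    return tables


# Hypothetical example model (illustrative names only).
ENTITIES = [
    Entity("Customer", "id", ["name", "city"]),
    Entity("Order", "id", ["placed_at"]),
    Entity("Product", "id", ["title", "price"]),
]
RELATIONSHIPS = [
    Relationship("places", "Customer", "Order", "1:N"),
    Relationship("OrderLine", "Order", "Product", "M:N", ["quantity"]),
]

for table, columns in er_to_tables(ENTITIES, RELATIONSHIPS).items():
    print(table, columns)
```

In this example the `places` relationship adds a `customer_id` column to `Order`, while `OrderLine` becomes its own table with `order_id`, `product_id`, and `quantity`.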

Applying these principles guarantees that the schema is always in 3NF and is therefore a “good” schema. However, extensive application maintenance may be required. Also, repeatedly patching application code may cause the application to “decay,” i.e., become more convoluted and harder to maintain. In other words, the traditional wisdom ensures no database decay but possibly large application decay.

In the real world, NO SERIOUS DEVELOPERS use the traditional wisdom. Some may use it for initial development, but none use it for evolution. Specifically, developers are almost always interested in minimizing application maintenance, and will endure large amounts of data decay to achieve this goal. To minimize application maintenance, they change the schema as little as possible (preferably not at all), typically by introducing data redundancy. Hence, data is allowed to decay in order to minimize application decay. Sooner or later, the applications (or the database) become so degraded that the project must be completely re-implemented.

The thesis of this project is that one should co-evolve code and data. Specifically, one should use a holistic metric that combines data decay and code decay, and minimize that combined decay. Sometimes this means focusing on application decay and sometimes on data decay. We have obtained six years of application and data evolution history from B2W, a large Brazilian e-tailer, encompassing about 70 iterations. Moreover, we have largely built an evolution tool that can quantify the decay that any modification will entail. On top of this tool, we hope to build a machine-learning-based recommendation engine that will suggest which evolution tactic to use.
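
One way to picture such a holistic metric is as a weighted sum of the two kinds of decay, minimized over the candidate tactics for a given change. The Python sketch below is purely illustrative: this page does not describe how the project's tool measures decay, so the `Tactic` class, the decay scores, the `app_weight` parameter, and the weighted-sum form are all assumptions, not the project's method. Varying `app_weight` shows how the preferred tactic can flip between tolerating data decay and paying for application changes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tactic:
    """One candidate way to absorb a business change (hypothetical model)."""
    name: str
    app_decay: float   # assumed score: how much the change degrades the code
    data_decay: float  # assumed score: how much it degrades the schema/data


def combined_decay(tactic: Tactic, app_weight: float = 0.5) -> float:
    """Hypothetical holistic metric: a weighted sum of both decay scores."""
    if not 0.0 <= app_weight <= 1.0:
        raise ValueError("app_weight must be between 0 and 1")
    return app_weight * tactic.app_decay + (1.0 - app_weight) * tactic.data_decay


def choose_tactic(tactics: list[Tactic], app_weight: float = 0.5) -> Tactic:
    """Return the candidate with the smallest combined decay."""
    return min(tactics, key=lambda t: combined_decay(t, app_weight))


# Made-up scores for illustration; not measurements from any real system.
NORMALIZE = Tactic("update schema and application code", app_decay=8.0, data_decay=0.0)
REDUNDANT = Tactic("add a redundant column, leave code alone", app_decay=1.0, data_decay=6.0)

for w in (0.2, 0.5, 0.8):
    print(w, choose_tactic([NORMALIZE, REDUNDANT], app_weight=w).name)
```

With these made-up scores, the sketch prefers updating the schema and code only when application decay is weighted lightly (`app_weight=0.2`); at 0.5 and 0.8 it prefers the redundant column, mirroring the developer behavior described above.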

Participants