Read the following paper:
- Patricia Selinger, M. Astrahan, D. Chamberlin, Raymond Lorie, and T. Price. Access Path Selection in a Relational Database Management System. Proceedings of ACM SIGMOD, 1979. Pages 22-34. Red Book. [PDF]
- Optionally, you may also wish to look at: Michael Mannino, Paichen Chu, and Thomas Sager. Statistical Profile Estimation in Database Systems. ACM Computing Surveys 20(3), 1988. Pages 191-221. [PDF] Requires an MIT IP for access. This paper discusses many of the techniques that used to make query optimization and cost estimation practical in modern database systems. Will will cover some of the ideas at a high level in class.
- The Selinger paper claims to be 'optimal'. Under what assumptions is this optimality true? Can you think of a situation in which Selinger will definitely be non-optimal?
- Query optimization is highly dependent on the effectiveness of cost estimation. The cost metrics that Selinger proposes are very simple; how would you make them more sophisticated? What is the impact of more sophisticated cost metrics on the performance of a database system?