2026 (3 publications)
Consistency and Correctness in Data-Oriented Workflow Systems. CIDR, 2026

Michael Stonebraker, Xinjing Zhou, Peter Kraft, Qian Li

AgentSM: Semantic Memory for Agentic Text-to-SQL. CoRR, 2026

Asim Biswal, Chuan Lei, Xiao Qin, Aodong Li, Balakrishnan Narayanaswamy, Tim Kraska

2025 (43 publications)
Improving DBMS Scheduling Decisions with Accurate Performance Prediction on Concurrent Queries. Proc. VLDB Endow., 2025

Ziniu Wu, Markos Markakis, Chunwei Liu, Peter Baile Chen, Balakrishnan Narayanaswamy, Tim Kraska, Samuel Madden

PBench: Workload Synthesizer with Real Statistics for Cloud Analytics Benchmarking. Proc. VLDB Endow., 2025

Yan Zhou, Chunwei Liu, Bhuvan Urgaonkar, Zhengle Wang, Magnus Mueller, Chao Zhang, Songyue Zhang, Pascal Pfeil, Dominik Horn, Zhengchun Liu, Davide Pagano, Tim Kraska, Samuel Madden, Ju Fan

Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing. CIDR, 2025

Chunwei Liu, Matthew Russo, Michael J. Cafarella, Lei Cao, Peter Baile Chen, Zui Chen, Michael J. Franklin, Tim Kraska, Samuel Madden, Rana Shahout, Gerardo Vitagliano

Virtualizing Cloud Data Infrastructures with BRAD. SIGMOD Conference Companion, 2025

Geoffrey X. Yu, Ziniu Wu, Ferdi Kossmann, Tianyu Li, Markos Markakis, Amadou Ngom, Sophie Zhang, Tim Kraska, Samuel Madden

Improving DBMS Scheduling Decisions with Fine-grained Performance Prediction on Concurrent Queries - Extended. CoRR, 2025

Ziniu Wu, Markos Markakis, Chunwei Liu, Peter Baile Chen, Balakrishnan Narayanaswamy, Tim Kraska, Samuel Madden

The Cambridge Report on Database Research. CoRR, 2025

Anastasia Ailamaki, Samuel Madden, Daniel Abadi, Gustavo Alonso, Sihem Amer-Yahia, Magdalena Balazinska, Philip A. Bernstein, Peter Boncz, Michael J. Cafarella, Surajit Chaudhuri, Susan B. Davidson, David J. DeWitt, Yanlei Diao, Xin Luna Dong, Michael J. Franklin, Juliana Freire, Johannes Gehrke, Alon Y. Halevy, Joseph M. Hellerstein, Mark D. Hill, Stratos Idreos, Yannis E. Ioannidis, Christoph Koch, Donald Kossmann, Tim Kraska, Arun Kumar, Guoliang Li, Volker Markl, Renée J. Miller, C. Mohan, Thomas Neumann, Beng Chin Ooi, Fatma Ozcan, Aditya G. Parameswaran, Ippokratis Pandis, Jignesh M. Patel, Andrew Pavlo, Danica Porobic, Viktor Sanca, Michael Stonebraker, Julia Stoyanovich, Dan Suciu, Wang-Chiew Tan, Shivaram Venkataraman, Matei Zaharia, Stanley B. Zdonik

Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation. CoRR, 2025

Peter Baile Chen, Yi Zhang, Dan Roth, Samuel Madden, Jacob Andreas, Michael J. Cafarella

Abacus: A Cost-Based Optimizer for Semantic Operator Systems. CoRR, 2025

Matthew Russo, Sivaprasad Sudhir, Gerardo Vitagliano, Chunwei Liu, Tim Kraska, Samuel Madden, Michael J. Cafarella

KramaBench: A Benchmark for AI Systems on Data-to-Insight Pipelines over Data Lakes. CoRR, 2025

Eugenie Lai, Gerardo Vitagliano, Ziyu Zhang, Sivaprasad Sudhir, Om Chabra, Anna Zeng, Anton A. Zabreyko, Chenning Li, Ferdi Kossmann, Jialin Ding, Jun Chen, Markos Markakis, Matthew Russo, Weiyang Wang, Ziniu Wu, Michael J. Cafarella, Lei Cao, Samuel Madden, Tim Kraska

PBench: Workload Synthesizer with Real Statistics for Cloud Analytics Benchmarking. CoRR, 2025

Yan Zhou, Chunwei Liu, Bhuvan Urgaonkar, Zhengle Wang, Magnus Mueller, Chao Zhang, Songyue Zhang, Pascal Pfeil, Dominik Horn, Zhengchun Liu, Davide Pagano, Tim Kraska, Samuel Madden, Ju Fan

CONCUR: A Framework for Continual Constrained and Unconstrained Routing. CoRR, 2025

Peter Baile Chen, Weiyue Li, Dan Roth, Michael J. Cafarella, Samuel Madden, Jacob Andreas

Causal DAG Summarization. Proc. VLDB Endow., 2025

Anna Zeng, Michael J. Cafarella, Batya Kenig, Markos Markakis, Brit Youngmann, Babak Salimi

Toward Standardized Data Preparation: A Bottom-Up Approach. EDBT, 2025

Eugenie Y. Lai, Yuze Lou, Brit Youngmann, Michael J. Cafarella

CausaLens: A System for Summarizing Causal DAGs. SIGMOD Conference Companion, 2025

Noam Chen, Anna Zeng, Michael J. Cafarella, Batya Kenig, Markos Markakis, Oren Mishali, Brit Youngmann, Babak Salimi

SeerCuts: Explainable Attribute Discretization. SIGMOD Conference Companion, 2025

Eugenie Y. Lai, Inbal Croitoru, Noam Bitton, Ariel Shalem, Brit Youngmann, Sainyam Galhotra, El Kindi Rezig, Michael J. Cafarella

CauSumX: Summarized Causal Explanations For Group-By-Average Queries. SIGMOD Conference Companion, 2025

Nativ Levy, Michael J. Cafarella, Amir Gilad, Sudeepa Roy, Brit Youngmann

PalimpChat: Declarative and Interactive AI analytics. SIGMOD Conference Companion, 2025

Chunwei Liu, Gerardo Vitagliano, Brandon Rose, Matthew Printz, David Andrew Samson, Michael J. Cafarella

PalimpChat: Declarative and Interactive AI analytics. CoRR, 2025

Chunwei Liu, Gerardo Vitagliano, Brandon Rose, Matt Prinz, David Andrew Samson, Michael J. Cafarella

EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline. CoRR, 2025

Peter Baile Chen, Tomer Wolfson, Michael J. Cafarella, Dan Roth

Causal DAG Summarization (Full Version). CoRR, 2025

Anna Zeng, Michael J. Cafarella, Batya Kenig, Markos Markakis, Brit Youngmann, Babak Salimi

DBOS Network Sensing: A Web Services Approach to Collaborative Awareness. HPEC, 2025

Sophia Lockton, Jeremy Kepner, Michael Stonebraker, Hayden Jananthan, LaToya Anderson, William Arcand, David Bestor, William Bergeron, Alex Bonn, Daniel Burrill, Chansup Byun, Timothy Davis, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Piotr Luszczek, Peter Michaleas, Lauren Milechin, Chasen Milner, Guillermo Morales, Julie Mullen, Michel Pelletier, Alex Poliakov, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee, Alex Pentland

DBOS Network Sensing: A Web Services Approach to Collaborative Awareness. CoRR, 2025

Sophia Lockton, Jeremy Kepner, Michael Stonebraker, Hayden Jananthan, LaToya Anderson, William Arcand, David Bestor, William Bergeron, Alex Bonn, Daniel Burrill, Chansup Byun, Timothy Davis, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Piotr Luszczek, Peter Michaleas, Lauren Milechin, Chasen Milner, Guillermo Morales, Julie Mullen, Michel Pelletier, Alex Poliakov, Andrew Prout, Albert Reuther, Antonio Rosa, Charles Yee, Alex Pentland

Practical DB-OS Co-Design with Privileged Kernel Bypass. Proc. ACM Manag. Data, 2025

Xinjing Zhou, Viktor Leis, Jinming Hu, Xiangyao Yu, Michael Stonebraker

DBOS: three years later. VLDB J., 2025

Qian Li, Peter Kraft, Christos Kozyrakis, Matei Zaharia, Michael Stonebraker

Tiered-Indexing: Optimizing Access Methods for Skew. VLDB J., 2025

Xinjing Zhou, Xiangpeng Hao, Xiangyao Yu, Michael Stonebraker

Parachute: Single-Pass Bi-Directional Information Passing. Proc. VLDB Endow., 2025

Mihail Stoian, Andreas Zimmerer, Skander Krid, Amadou Ngom, Jialin Ding, Tim Kraska, Andreas Kipf

Insert-Optimized Implementation of Streaming Data Sketches. DaMoN, 2025

Pascal Pfeil, Dominik Horn, Orestis Polychroniou, George Erickson, Zhe Heng Eng, Mengchu Cai, Tim Kraska

Utilizing Past User Feedback for More Accurate Text-to-SQL. HILDA@SIGMOD, 2025

Matthias Urban, Jialin Ding, David Kernert, Kapil Vaidya, Tim Kraska

PipeRAG: Fast Retrieval-Augmented Generation via Adaptive Pipeline Parallelism. KDD, 2025

Wenqi Jiang, Shuai Zhang, Boran Han, Jie Wang, Bernie Wang, Tim Kraska

ODIN: A NL2SQL Recommender to Handle Schema Ambiguity. CoRR, 2025

Kapil Vaidya, Abishek Sankararaman, Jialin Ding, Chuan Lei, Xiao Qin, Balakrishnan Narayanaswamy, Tim Kraska

TailorSQL: An NL2SQL System Tailored to Your Query Workload. CoRR, 2025

Kapil Vaidya, Jialin Ding, Sebastian Kosak, David Kernert, Chuan Lei, Xiao Qin, Abhinav Tripathy, Ramesh Balan, Balakrishnan Narayanaswamy, Tim Kraska

SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL. CoRR, 2025

Yue Gong, Chuan Lei, Xiao Qin, Kapil Vaidya, Balakrishnan Narayanaswamy, Tim Kraska

Parachute: Single-Pass Bi-Directional Information Passing. CoRR, 2025

Mihail Stoian, Andreas Zimmerer, Skander Krid, Amadou Latyr Ngom, Jialin Ding, Tim Kraska, Andreas Kipf

Recursive Language Models. CoRR, 2025

Alex L. Zhang, Tim Kraska, Omar Khattab

2024 (49 publications)
Symphony: Towards Trustworthy Question Answering and Verification using RAG over Multimodal Data Lakes. IEEE Data Eng. Bull., 2024

Nan Tang, Chenyu Yang, Zhengxuan Zhang, Yuyu Luo, Ju Fan, Lei Cao, Sam Madden, Alon Y. Halevy

RITA: Group Attention is All You Need for Timeseries Analytics. Proc. ACM Manag. Data, 2024

Jiaming Liang, Lei Cao, Samuel Madden, Zack Ives, Guoliang Li

Outlier Summarization via Human Interpretable Rules. Proc. VLDB Endow., 2024

Yuhao Deng, Yu Wang, Lei Cao, Lianpeng Qiao, Yuping Wang, Xu Jingzhe, Yizhou Yan, Samuel Madden

Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL. Proc. VLDB Endow., 2024

Ju Fan, Zihui Gu, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Samuel Madden, Xiaoyong Du, Nan Tang

Databases Unbound: Querying All of the World's Bytes with AI. Proc. VLDB Endow., 2024

Samuel Madden, Michael J. Cafarella, Michael J. Franklin, Tim Kraska

Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD. Proc. VLDB Endow., 2024

Geoffrey X. Yu, Ziniu Wu, Ferdinand Kossmann, Tianyu Li, Markos Markakis, Amadou Latyr Ngom, Samuel Madden, Tim Kraska

MetaStore: Analyzing Deep Learning Meta-Data at Scale. Proc. VLDB Endow., 2024

Huayi Zhang, Binwei Yan, Lei Cao, Samuel Madden, Elke A. Rundensteiner

Serverless State Management Systems. CIDR, 2024

Tianyu Li, Badrish Chandramouli, Sebastian Burckhardt, Samuel Madden

Kairos: Efficient Temporal Graph Analytics on a Single Machine. CoRR, 2024

Joana M. F. da Trindade, Julian Shun, Samuel Madden, Nesime Tatbul

A Declarative System for Optimizing AI Workloads. CoRR, 2024

Chunwei Liu, Matthew Russo, Michael J. Cafarella, Lei Cao, Peter Baile Chen, Zui Chen, Michael J. Franklin, Tim Kraska, Samuel Madden, Gerardo Vitagliano

CascadeServe: Unlocking Model Cascades for Inference Serving. CoRR, 2024

Ferdi Kossmann, Ziniu Wu, Alex Turk, Nesime Tatbul, Lei Cao, Samuel Madden

Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD - Extended Version. CoRR, 2024

Geoffrey X. Yu, Ziniu Wu, Ferdi Kossmann, Tianyu Li, Markos Markakis, Amadou Ngom, Samuel Madden, Tim Kraska

Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs. CoRR, 2024

Ferdi Kossmann, Bruce Fontaine, Daya Khudia, Michael J. Cafarella, Samuel Madden

Distributed Speculative Execution for Resilient Cloud Applications. CoRR, 2024

Tianyu Li, Badrish Chandramouli, Philip A. Bernstein, Samuel Madden

Increasing Forest Cover and Connectivity Both Inside and Outside of Protected Areas in Southwestern Costa Rica. Remote. Sens., 2024

Hilary Brumberg, Samuel Furey, Marie G. Bouffard, María José Mata Quirós, Hikari Murayama, Soroush Neyestani, Emily Pauline, Andrew Whitworth, Marguerite Madden

BEAVER: An Enterprise Benchmark for Text-to-SQL. CoRR, 2024

Peter Baile Chen, Fabian Wenz, Yi Zhang, Moe Kayali, Nesime Tatbul, Michael J. Cafarella, Çagatay Demiralp, Michael Stonebraker

Summarized Causal Explanations For Aggregate Views. Proc. ACM Manag. Data, 2024

Brit Youngmann, Michael J. Cafarella, Amir Gilad, Sudeepa Roy

LucidScript: Bottom-up Standardization for Data Preparation. Proc. VLDB Endow., 2024

Eugenie Y. Lai, Yuze Lou, Brit Youngmann, Michael J. Cafarella

From Logs to Causal Inference: Diagnosing Large Systems. Proc. VLDB Endow., 2024

Markos Markakis, Brit Youngmann, Trinity Gao, Ziyu Zhang, Rana Shahout, Peter Baile Chen, Chunwei Liu, Ibrahim Sabek, Michael J. Cafarella

MDCR: A Dataset for Multi-Document Conditional Reasoning. EMNLP, 2024

Peter Baile Chen, Yi Zhang, Chunwei Liu, Sejal Gupta, Yoon Kim, Mike Cafarella

Press ECCS to Doubt (Your Causal Graph). GUIDE-AI@SIGMOD, 2024

Markos Markakis, Ziyu Zhang, Rana Shahout, Trinity Gao, Chunwei Liu, Ibrahim Sabek, Michael J. Cafarella

Sawmill: From Logs to Causal Diagnosis of Large Systems. SIGMOD Conference Companion, 2024

Markos Markakis, An Bo Chen, Brit Youngmann, Trinity Gao, Ziyu Zhang, Rana Shahout, Peter Baile Chen, Chunwei Liu, Ibrahim Sabek, Michael J. Cafarella

MDCR: A Dataset for Multi-Document Conditional Reasoning. CoRR, 2024

Peter Baile Chen, Yi Zhang, Chunwei Liu, Sejal Gupta, Yoon Kim, Michael J. Cafarella

Summarized Causal Explanations For Aggregate Views (Full version). CoRR, 2024

Brit Youngmann, Michael J. Cafarella, Amir Gilad, Sudeepa Roy

Variable Extraction for Model Recovery in Scientific Literature. CoRR, 2024

Chunwei Liu, Enrique Noriega-Atala, Adarsh Pyarelal, Clayton T. Morrison, Mike Cafarella

Towards Buffer Management with Tiered Main Memory. Proc. ACM Manag. Data, 2024

Xiangpeng Hao, Xinjing Zhou, Xiangyao Yu, Michael Stonebraker

FlexpushdownDB: rethinking computation pushdown for cloud OLAP DBMSs. VLDB J., 2024

Yifei Yang, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, Michael Stonebraker

Humboldt: Metadata-Driven Extensible Data Discovery. VLDB Workshops, 2024

Alex Bäuerle, Çagatay Demiralp, Michael Stonebraker

Making LLMs Work for Enterprise Data Tasks. CoRR, 2024

Çagatay Demiralp, Fabian Wenz, Peter Baile Chen, Moe Kayali, Nesime Tatbul, Michael Stonebraker

Humboldt: Metadata-Driven Extensible Data Discovery. CoRR, 2024

Alex Bäuerle, Çagatay Demiralp, Michael Stonebraker

Stage: Query Execution Time Prediction in Amazon Redshift. SIGMOD Conference Companion, 2024

Ziniu Wu, Ryan Marcus, Zhengchun Liu, Parimarjan Negi, Vikram Nathan, Pascal Pfeil, Gaurav Saxena, Mohammad Rahman, Balakrishnan Narayanaswamy, Tim Kraska

Stage: Query Execution Time Prediction in Amazon Redshift. CoRR, 2024

Ziniu Wu, Ryan Marcus, Zhengchun Liu, Parimarjan Negi, Vikram Nathan, Pascal Pfeil, Gaurav Saxena, Mohammad Rahman, Balakrishnan Narayanaswamy, Tim Kraska

Resource Management in Aurora Serverless. Proc. VLDB Endow., 2024

Bradley Barnhart, Marc Brooker, Daniil Chinenkov, Tony Hooper, Jihoun Im, Prakash Chandra Jha, Tim Kraska, Ashok Kurakula, Alexey Kuznetsov, Grant Mcalister, Arjun Muthukrishnan, Aravinthan Narayanan, Douglas Terry, Bhuvan Urgaonkar, Jiaming Yan

Why TPC Is Not Enough: An Analysis of the Amazon Redshift Fleet. Proc. VLDB Endow., 2024

Alexander van Renen, Dominik Horn, Pascal Pfeil, Kapil Vaidya, Wenjian Dong, Murali Narayanaswamy, Zhengchun Liu, Gaurav Saxena, Andreas Kipf, Tim Kraska

Panda: Performance Debugging for Databases using LLM Agents. CIDR, 2024

Vikramank Y. Singh, Kapil Vaidya, Vinayshekhar Bannihatti Kumar, Sopan Khosla, Balakrishnan Narayanaswamy, Rashmi Gangadharaiah, Tim Kraska

Forecasting Algorithms for Intelligent Resource Scaling: An Experimental Analysis. SoCC, 2024

Yanlei Diao, Dominik Horn, Andreas Kipf, Oleksandr Shchur, Ines Benito, Wenjian Dong, Davide Pagano, Pascal Pfeil, Vikram Nathan, Balakrishnan Narayanaswamy, Tim Kraska

Vista: Machine Learning based Database Performance Troubleshooting Framework in Amazon RDS. SoCC, 2024

Vikramank Y. Singh, Zhao Song, Balakrishnan (Murali) Narayanaswamy, Kapil Eknath Vaidya, Tim Kraska

Automated Multidimensional Data Layouts in Amazon Redshift. SIGMOD Conference Companion, 2024

Jialin Ding, Matt Abrams, Sanghita Bandyopadhyay, Luciano Di Palma, Yanzhu Ji, Davide Pagano, Gopal Paliwal, Panos Parchas, Pascal Pfeil, Orestis Polychroniou, Gaurav Saxena, Aamer Shah, Amina Voloder, Sherry Xiao, Davis Zhang, Tim Kraska

Intelligent Scaling in Amazon Redshift. SIGMOD Conference Companion, 2024

Vikram Nathan, Vikramank Y. Singh, Zhengchun Liu, Mohammad Rahman, Andreas Kipf, Dominik Horn, Davide Pagano, Gaurav Saxena, Balakrishnan Narayanaswamy, Tim Kraska

PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design. CoRR, 2024

Wenqi Jiang, Shuai Zhang, Boran Han, Jie Wang, Bernie Wang, Tim Kraska

2023 (38 publications)
AutoOD: Automatic Outlier Detection. Proc. ACM Manag. Data, 2023

Lei Cao, Yizhou Yan, Yu Wang, Samuel Madden, Elke A. Rundensteiner

Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning. Proc. ACM Manag. Data, 2023

Zihui Gu, Ju Fan, Nan Tang, Lei Cao, Bowen Jia, Sam Madden, Xiaoyong Du

SeeSaw: Interactive Ad-hoc Search Over Image Databases. Proc. ACM Manag. Data, 2023

Oscar R. Moll, Manuel Favela, Samuel Madden, Vijay Gadepally, Michael J. Cafarella

Cackle: Analytical Workload Cost and Performance Stability With Elastic Pools. Proc. ACM Manag. Data, 2023

Matthew Perron, Raul Castro Fernandez, David J. DeWitt, Michael J. Cafarella, Samuel Madden

FactorJoin: A New Cardinality Estimation Framework for Join Queries. Proc. ACM Manag. Data, 2023

Ziniu Wu, Parimarjan Negi, Mohammad Alizadeh, Tim Kraska, Samuel Madden

Extract-Transform-Load for Video Streams. Proc. VLDB Endow., 2023

Ferdinand Kossmann, Ziniu Wu, Eugenie Lai, Nesime Tatbul, Lei Cao, Tim Kraska, Sam Madden

Check Out the Big Brain on BRAD: Simplifying Cloud Data Processing with Learned Automated Data Meshes. Proc. VLDB Endow., 2023

Tim Kraska, Tianyu Li, Samuel Madden, Markos Markakis, Amadou Ngom, Ziniu Wu, Geoffrey X. Yu

Robust Query Driven Cardinality Estimation under Changing Workloads. Proc. VLDB Endow., 2023

Parimarjan Negi, Ziniu Wu, Andreas Kipf, Nesime Tatbul, Ryan Marcus, Sam Madden, Tim Kraska, Mohammad Alizadeh

Pando: Enhanced Data Skipping with Logical Data Partitioning. Proc. VLDB Endow., 2023

Sivaprasad Sudhir, Wenbo Tao, Nikolay Pavlovich Laptev, Cyrille Habis, Michael J. Cafarella, Samuel Madden

Future of Database System Architectures. SIGMOD Conference Companion, 2023

Gustavo Alonso, Natassa Ailamaki, Sailesh Krishnamurthy, Sam Madden, Swami Sivasubramanian, Raghu Ramakrishnan

Interpretable Outlier Summarization. CoRR, 2023

Yu Wang, Lei Cao, Yizhou Yan, Samuel Madden

RITA: Group Attention is All You Need for Timeseries Analytics. CoRR, 2023

Jiaming Liang, Lei Cao, Samuel Madden, Zachary G. Ives, Guoliang Li

Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation. CoRR, 2023

Zihui Gu, Ju Fan, Nan Tang, Songyue Zhang, Yuxin Zhang, Zui Chen, Lei Cao, Guoliang Li, Sam Madden, Xiaoyong Du

SEED: Simple, Efficient, and Effective Data Management via Large Language Models. CoRR, 2023

Zui Chen, Lei Cao, Sam Madden, Ju Fan, Nan Tang, Zihui Gu, Zeyuan Shang, Chunwei Liu, Michael J. Cafarella, Tim Kraska

Extract-Transform-Load for Video Streams. CoRR, 2023

Ferdinand Kossmann, Ziniu Wu, Eugenie Lai, Nesime Tatbul, Lei Cao, Tim Kraska, Samuel Madden

R3: Record-Replay-Retroaction for Database-Backed Applications. Proc. VLDB Endow., 2023

Qian Li, Peter Kraft, Michael J. Cafarella, Çagatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Xiangyao Yu, Matei Zaharia

Transactions Make Debugging Easy. CIDR, 2023

Qian Li, Peter Kraft, Michael J. Cafarella, Çagatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Matei Zaharia

Causal Data Integration. Proc. VLDB Endow., 2023

Brit Youngmann, Michael J. Cafarella, Babak Salimi, Anna Zeng

On Explaining Confounding Bias. ICDE, 2023

Brit Youngmann, Michael J. Cafarella, Yuval Moskovitch, Babak Salimi

NEXUS: On Explaining Confounding Bias. SIGMOD Conference Companion, 2023

Brit Youngmann, Michael J. Cafarella, Yuval Moskovitch, Babak Salimi

Causal Data Integration. CoRR, 2023

Brit Youngmann, Michael J. Cafarella, Babak Salimi, Anna Zeng

Epoxy: ACID Transactions Across Diverse Data Stores. Proc. VLDB Endow., 2023

Peter Kraft, Qian Li, Xinjing Zhou, Peter Bailis, Michael Stonebraker, Xiangyao Yu, Matei Zaharia

Two is Better Than One: The Case for 2-Tree for Skewed Data Sets. CIDR, 2023

Xinjing Zhou, Xiangyao Yu, Goetz Graefe, Michael Stonebraker

Joint Proceedings of Workshops at the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, Canada, August 28 - September 1, 2023. VLDB Workshops, CEUR Workshop Proceedings, 2023

Rajesh Bordawekar, Cinzia Cappiello, Vasilis Efthymiou, Lisa Ehrlinger, Vijay Gadepally, Sainyam Galhotra, Sandra Geisler, Sven Groppe, Le Gruenwald, Alon Y. Halevy, Hazar Harmouch, Oktie Hassanzadeh, Ihab F. Ilyas, Ernesto Jiménez-Ruiz, Sanjay Krishnan, Tirthankar Lahiri, Guoliang Li, Jiaheng Lu, Wolfgang Mauerer, Umar Farooq Minhas, Felix Naumann, M. Tamer Özsu, El Kindi Rezig, Kavitha Srinivas, Michael Stonebraker, Satyanarayana R. Valluri, Maria-Esther Vidal, Haixun Wang, Jiannan Wang, Yingjun Wu, Xun Xue, Mohamed Zaït, Kai Zeng

Enhancing Computation Pushdown for Cloud OLAP Databases. CoRR, 2023

Yifei Yang, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, Michael Stonebraker

Unshackling Database Benchmarking from Synthetic Workloads. ICDE, 2023

Parimarjan Negi, Laurent Bindschaedler, Mohammad Alizadeh, Tim Kraska, Jyoti Leeka, Anja Gruenheid, Matteo Interlandi

Auto-WLM: Machine Learning Enhanced Workload Management in Amazon Redshift. SIGMOD Conference Companion, 2023

Gaurav Saxena, Mohammad Rahman, Naresh Chainani, Chunbin Lin, George Caragea, Fahim Chowdhury, Ryan Marcus, Tim Kraska, Ippokratis Pandis, Balakrishnan (Murali) Narayanaswamy

CorBit: Leveraging Correlations for Compressing Bitmap Indexes. VLDB Workshops, 2023

Xi Lyu, Andreas Kipf, Pascal Pfeil, Dominik Horn, Jana Giceva, Tim Kraska

Hyperspecialized Compilation for Serverless Data Analytics. VLDB Workshops, 2023

Leonhard F. Spiegelberg, Tim Kraska, Malte Schwarzkopf

2022 (36 publications)
ExSample: Efficient Searches on Video Repositories through Adaptive Sampling. ICDE, 2022

Oscar R. Moll, Favyen Bastani, Sam Madden, Mike Stonebraker, Vijay Gadepally, Tim Kraska

A Demonstration of AutoOD: A Self-tuning Anomaly Detection System. Proc. VLDB Endow., 2022

Dennis M. Hofmann, Peter M. VanNostrand, Huayi Zhang, Yizhou Yan, Lei Cao, Samuel Madden, Elke A. Rundensteiner

Self-Organizing Data Containers. CIDR, 2022

Samuel Madden, Jialin Ding, Tim Kraska, Sivaprasad Sudhir, David E. Cohen, Timothy G. Mattson, Nesime Tatbul

Ad-hoc Searches on Image Databases. Poly/DMAH@VLDB, 2022

Oscar R. Moll Thomae, Sam Madden, Vijay Gadepally

Tile-based Lightweight Integer Compression in GPU. SIGMOD Conference, 2022

Anil Shanbhag, Bobbi W. Yogatama, Xiangyao Yu, Samuel Madden

SeeSaw: interactive ad-hoc search over image databases. CoRR, 2022

Oscar R. Moll, Manuel Favela, Samuel Madden, Vijay Gadepally

FactorJoin: A New Cardinality Estimation Framework for Join Queries. CoRR, 2022

Ziniu Wu, Parimarjan Negi, Mohammad Alizadeh, Tim Kraska, Samuel Madden

Nonintrusive Measurements for Detecting Progressive Equipment Faults. IEEE Trans. Instrum. Meas., 2022

Daisy H. Green, Devin W. Quinn, Samuel Madden, Peter A. Lindahl, Steven B. Leeb

A Progress Report on DBOS: A Database-oriented Operating System. CIDR, 2022

Qian Li, Peter Kraft, Kostis Kaffes, Athinagoras Skiadopoulos, Deeptaanshu Kumar, Jason Li, Michael J. Cafarella, Goetz Graefe, Jeremy Kepner, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Matei Zaharia

Apiary: A DBMS-Backed Transactional Function-as-a-Service Framework. CoRR, 2022

Peter Kraft, Qian Li, Kostis Kaffes, Athinagoras Skiadopoulos, Deeptaanshu Kumar, Danny Cho, Jason Li, Robert Redmond, Nathan W. Weckwerth, Brian S. Xia, Peter Bailis, Michael J. Cafarella, Goetz Graefe, Jeremy Kepner, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Xiangyao Yu, Matei Zaharia

Transactions Make Debugging Easy. CoRR, 2022

Qian Li, Peter Kraft, Michael J. Cafarella, Çagatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Matei Zaharia

Infrastructure for Rapid Open Knowledge Network Development. AI Mag., 2022

Michael J. Cafarella, Michael R. Anderson, Iz Beltagy, Arie Cattan, Sarah E. Chasins, Ido Dagan, Doug Downey, Oren Etzioni, Sergey Feldman, Tian Gao, Tom Hope, Kexin Huang, Sophie Johnson, Daniel King, Kyle Lo, Yuze Lou, Matthew D. Shapiro, Dinghao Shen, Shivashankar Subramanian, Lucy Lu Wang, Yuning Wang, Yitong Wang, Daniel S. Weld, Jenny M. Vo-Phamhi, Anna Zeng, Jiayun Zou

Building a Shared Conceptual Model of Complex, Heterogeneous Data Systems: A Demonstration. CIDR, 2022

Michael R. Anderson, Yuze Lou, Jiayun Zou, Michael J. Cafarella, Sarah E. Chasins, Doug Downey, Tian Gao, Kexin Huang, Dinghao Shen, Jenny M. Vo-Phamhi, Yitong Wang, Yuning Wang, Anna Zeng

Debugging the OmniTable Way. OSDI, 2022

Andrew Quinn, Jason Flinn, Michael J. Cafarella, Baris Kasikci

On Explaining Confounding Bias. CoRR, 2022

Brit Youngmann, Michael J. Cafarella, Yuval Moskovitch, Babak Salimi

The Seattle report on database research. Commun. ACM, 2022

Daniel Abadi, Anastasia Ailamaki, David G. Andersen, Peter Bailis, Magdalena Balazinska, Philip A. Bernstein, Peter Boncz, Surajit Chaudhuri, Alvin Cheung, AnHai Doan, Luna Dong, Michael J. Franklin, Juliana Freire, Alon Y. Halevy, Joseph M. Hellerstein, Stratos Idreos, Donald Kossmann, Tim Kraska, Sailesh Krishnamurthy, Volker Markl, Sergey Melnik, Tova Milo, C. Mohan, Thomas Neumann, Beng Chin Ooi, Fatma Ozcan, Jignesh M. Patel, Andrew Pavlo, Raluca A. Popa, Raghu Ramakrishnan, Christopher Ré, Michael Stonebraker, Dan Suciu

Applying Machine Learning and Data Fusion to the "Missing Person" Problem. Computer, 2022

K. M. A. Solaiman, Tao Sun, Alina Nesen, Bharat K. Bhargava, Michael Stonebraker

Kyrix-J: Visual Discovery of Connected Datasets in a Data Lake. CIDR, 2022

Wenbo Tao, Adam Sah, Leilani Battle, Remco Chang, Michael Stonebraker

Machine Learning with DBOS. CoRR, 2022

Robert Redmond, Nathan W. Weckwerth, Brian S. Xia, Qian Li, Peter Kraft, Deeptaanshu Kumar, Çagatay Demiralp, Michael Stonebraker

Research Report: Progress on Building a File Observatory for Secure Parser Development. SP, 2022

Tim Allison, Wayne Burke, Dustin Graf, Chris Mattmann, Anastasija Mensikova, Mike Milano, Philip Southam, Ryan Stonebraker

SageDB: An Instance-Optimized Data Analytics System. Proc. VLDB Endow., 2022

Jialin Ding, Ryan Marcus, Andreas Kipf, Vikram Nathan, Aniruddha Nrusimha, Kapil Vaidya, Alexander van Renen, Tim Kraska

Can Learned Models Replace Hash Functions? Proc. VLDB Endow., 2022

Ibrahim Sabek, Kapil Vaidya, Dominik Horn, Andreas Kipf, Michael Mitzenmacher, Tim Kraska

SNARF: A Learning-Enhanced Range Filter. Proc. VLDB Endow., 2022

Kapil Vaidya, Tim Kraska, Subarna Chatterjee, Eric R. Knorr, Michael Mitzenmacher, Stratos Idreos

TreeLine: An Update-In-Place Key-Value Store for Modern Storage. Proc. VLDB Endow., 2022

Geoffrey X. Yu, Markos Markakis, Andreas Kipf, Per-Åke Larson, Umar Farooq Minhas, Tim Kraska

Bao: Making Learned Query Optimization Practical. SIGMOD Rec., 2022

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, Tim Kraska

LSI: a learned secondary index structure. aiDM@SIGMOD, 2022

Andreas Kipf, Dominik Horn, Pascal Pfeil, Ryan Marcus, Tim Kraska

LSI: A Learned Secondary Index Structure. CoRR, 2022

Andreas Kipf, Dominik Horn, Pascal Pfeil, Ryan Marcus, Tim Kraska

2021 (51 publications)
Inferring and improving street maps with data-driven automation. Commun. ACM, 2021

Favyen Bastani, Songtao He, Satvat Jagwani, Edward Park, Sofiane Abbar, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, Mohammad Amin Sadeghi

Replicated Layout for In-Memory Database Systems. Proc. VLDB Endow., 2021

Sivaprasad Sudhir, Michael J. Cafarella, Samuel Madden

RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation. Proc. VLDB Endow., 2021

Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, Xiaoyong Du, Guoliang Li, Samuel Madden, Mourad Ouzzani

LANCET: Labeling Complex Data at Scale. Proc. VLDB Endow., 2021

Huayi Zhang, Lei Cao, Samuel Madden, Elke A. Rundensteiner

Updating Street Maps using Changes Detected in Satellite Imagery. SIGSPATIAL/GIS, 2021

Favyen Bastani, Songtao He, Satvat Jagwani, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, Mohammad Amin Sadeghi

Inferring high-resolution traffic accident risk maps based on satellite imagery and GPS trajectories. ICCV, 2021

Songtao He, Mohammad Amin Sadeghi, Sanjay Chawla, Mohammad Alizadeh, Hari Balakrishnan, Samuel Madden

ELITE: Robust Deep Anomaly Detection with Meta Gradient. KDD, 2021

Huayi Zhang, Lei Cao, Peter M. VanNostrand, Samuel Madden, Elke A. Rundensteiner

SkyQuery: an aerial drone video sensing platform. Onward, 2021

Favyen Bastani, Songtao He, Ziwen Jiang, Osbert Bastani, Sam Madden

Asynchronous Prefix Recoverability for Fast Distributed Stores. SIGMOD Conference, 2021

Tianyu Li, Badrish Chandramouli, Jose M. Faleiro, Samuel Madden, Donald Kossmann

TagMe: GPS-Assisted Automatic Object Annotation in Videos. CoRR, 2021

Songtao He, Favyen Bastani, Mohammad Alizadeh, Hari Balakrishnan, Michael J. Cafarella, Tim Kraska, Sam Madden

SkyQuery: An Aerial Drone Video Sensing Platform. CoRR, 2021

Favyen Bastani, Songtao He, Ziwen Jiang, Osbert Bastani, Michael J. Cafarella, Tim Kraska, Sam Madden

Updating Street Maps using Changes Detected in Satellite Imagery. CoRR, 2021

Favyen Bastani, Songtao He, Satvat Jagwani, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, Mohammad Amin Sadeghi

DBOS: A DBMS-oriented Operating System. Proc. VLDB Endow., 2021

Athinagoras Skiadopoulos, Qian Li, Peter Kraft, Kostis Kaffes, Daniel Hong, Shana Mathew, David Bestor, Michael J. Cafarella, Vijay Gadepally, Goetz Graefe, Jeremy Kepner, Christos Kozyrakis, Tim Kraska, Michael Stonebraker, Lalith Suresh, Matei Zaharia

Data Governance in a Database Operating System (DBOS). Poly/DMAH@VLDB, 2021

Deeptaanshu Kumar, Qian Li, Jason Li, Peter Kraft, Athinagoras Skiadopoulos, Lalith Suresh, Michael J. Cafarella, Michael Stonebraker

Technical Report on Data Integration and Preparation. CoRR, 2021

El Kindi Rezig, Michael J. Cafarella, Vijay Gadepally

ML-In-Databases: Assessment and Prognosis. IEEE Data Eng. Bull., 2021

Tim Kraska, Umar Farooq Minhas, Thomas Neumann, Olga Papaemmanouil, Jignesh M. Patel, Christopher Ré, Michael Stonebraker

DICE: Data Discovery by Example. Proc. VLDB Endow., 2021

El Kindi Rezig, Anshul Bhandari, Anna Fariha, Benjamin Price, Allan Vanterpool, Vijay Gadepally, Michael Stonebraker

Horizon: Scalable Dependency-driven Data Cleaning. Proc. VLDB Endow., 2021

El Kindi Rezig, Mourad Ouzzani, Walid G. Aref, Ahmed K. Elmagarmid, Ahmed R. Mahmood, Michael Stonebraker

FlexPushdownDB: Hybrid Pushdown and Caching in a Cloud DBMS. Proc. VLDB Endow., 2021

Yifei Yang, Matt Youill, Matthew E. Woicik, Yizhou Liu, Xiangyao Yu, Marco Serafini, Ashraf Aboulnaga, Michael Stonebraker

Kyrix-S: Authoring Scalable Scatterplot Visualizations of Big Data. IEEE Trans. Vis. Comput. Graph., 2021

Wenbo Tao, Xinli Hou, Adam Sah, Leilani Battle, Remco Chang, Michael Stonebraker

Flow-Loss: Learning Cardinality Estimates That Matter. Proc. VLDB Endow., 2021

Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh

Davos: A System for Interactive Data-Driven Decision Making. Proc. VLDB Endow., 2021

Zeyuan Shang, Emanuel Zgraggen, Benedetto Buratti, Philipp Eichmann, Navid Karimeddiny, Charlie Meyer, Wesley Runnels, Tim Kraska

Towards a Benchmark for Learned Systems. ICDE Workshops, 2021

Laurent Bindschaedler, Andreas Kipf, Tim Kraska, Ryan Marcus, Umar Farooq Minhas

Partitioned Learned Bloom Filters. ICLR, 2021

Kapil Vaidya, Eric Knorr, Michael Mitzenmacher, Tim Kraska

LEA: A Learned Encoding Advisor for Column Stores. aiDM@SIGMOD, 2021

Lujing Cen, Andreas Kipf, Ryan Marcus, Tim Kraska

Instance-Optimized Data Layouts for Cloud Analytics Workloads. SIGMOD Conference, 2021

Jialin Ding, Umar Farooq Minhas, Badrish Chandramouli, Chi Wang, Yinan Li, Ying Li, Donald Kossmann, Johannes Gehrke, Tim Kraska

Bao: Making Learned Query Optimization Practical. SIGMOD Conference, 2021

Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, Tim Kraska

Steering Query Optimizers: A Practical Take on Big Data Workloads. SIGMOD Conference, 2021

Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc T. Friedman, Alekh Jindal

Tuplex: Data Science in Python at Native Code Speed. SIGMOD Conference, 2021

Leonhard F. Spiegelberg, Rahul Yesantharao, Malte Schwarzkopf, Tim Kraska

Flow-Loss: Learning Cardinality Estimates That Matter. CoRR, 2021

Parimarjan Negi, Ryan Marcus, Andreas Kipf, Hongzi Mao, Nesime Tatbul, Tim Kraska, Mohammad Alizadeh

LEA: A Learned Encoding Advisor for Column Stores. CoRR, 2021

Lujing Cen, Andreas Kipf, Ryan Marcus, Tim Kraska

When Are Learned Models Better Than Hash Functions? CoRR, 2021

Ibrahim Sabek, Kapil Vaidya, Dominik Horn, Andreas Kipf, Tim Kraska

PLEX: Towards Practical Learned Indexing. CoRR, 2021

Mihail Stoian, Andreas Kipf, Ryan Marcus, Tim Kraska

Bounding the Last Mile: Efficient Learned String Indexing. CoRR, 2021

Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, Tim Kraska