Augmenting AI Methods With Data Management Systems
Speaker
Michael Cafarella
CSAIL MIT
Host
Daniel Jackson
CSAIL MIT
Title: Augmenting AI Methods With Data Management Systems
Abstract: Over the past decade-plus, AI advances have yielded wonder after wonder. However, AI methods often require an expensive scaffolding of data curation and data engineering before they can be fully exploited. This talk will describe two separate projects that build data systems around AI methods and thereby make them more practical, scalable, and widely applicable. The first data system applies causal reasoning methods to the problem of debugging distributed systems using log data. We show that the system can find root causes of problems 1.6-18x more accurately than competing methods, while reducing human effort by a factor of 2.5-16.8 compared to more conventional curation practices in causal inference. The second data system enables declarative optimization for foundation model programs. We show that, compared with conventional AI program implementations across several benchmark tasks, even single-threaded optimized programs can reduce runtime by 50-83% and model costs by 40-86%, while retaining output quality roughly comparable to that of the original programs.
Bio:
Michael Cafarella is a Principal Research Scientist in the MIT Computer Science and Artificial Intelligence Laboratory. His research interests include databases, information extraction, data integration, and applying data-intensive methods to economics. He has published extensively in venues such as SIGMOD, VLDB, and elsewhere. His academic awards include the NSF CAREER award, the Sloan Research Fellowship, and the VLDB Ten-Year Best Paper award. In addition to his academic work, he co-created the widely used Hadoop open-source project and co-founded Lattice Data, a research-based startup that was later acquired by Apple. From 2009 to 2019 he was a professor of Computer Science and Engineering at the University of Michigan.