Project
Aurum: Large Scale Data Discovery
Organizations face a data discovery problem when their analysts spend more time looking for relevant data than analyzing it. This problem has become commonplace in modern organizations because: i) data is stored across multiple storage systems, from databases to data lakes; and ii) data scientists do not operate within the limits of well-defined schemas or a small number of data sources. Instead, to answer complex questions they must access data spread across thousands of data sources. To address this problem we are building AURUM, a data discovery system. AURUM introduces a new discovery algebra, called the Source Retrieval Query Language (SRQL), that lets users declaratively search for relevant data sources through a set of primitives that expose the relations of the underlying data. We are investigating new abstractions to represent all data assets within an organization and methods to find them efficiently.
Group
Data Systems Group
Contact us
If you would like to contact us about our work, please refer to our members below and reach out to one of the group leads directly.
Last updated Aug 16 '17