Error Diagnosis and Data Profiling with Data X-Ray
Abstract
The problem of identifying and repairing data errors has been an area of persistent focus in data management research. However, while traditional data cleaning techniques can be effective at identifying several data discrepancies, they disregard the fact that many errors are systematic, inherent to the process that produces the data, and thus will keep occurring unless the root cause is identified and corrected. In this demonstration, we will present a large-scale diagnostic framework called DataXRay. Like a medical X-ray that aids the diagnosis of medical conditions by revealing problems underneath the surface, DataXRay reveals hidden connections and common properties among data errors. Thus, in contrast to traditional cleaning methods, which treat the symptoms, our system investigates the underlying conditions that cause the errors. The core of DataXRay combines an intuitive and principled cost model derived by Bayesian analysis, and an efficient, highly-parallelizable dia
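To make the idea of "common properties among data errors" concrete, the following is a minimal sketch, not the DataXRay algorithm or its Bayesian cost model. It assumes each record is a dictionary of feature values paired with a flag saying whether the record is known to be erroneous, and it simply reports the (feature, value) pairs whose records are almost all errors. The function names, the thresholds `min_rate` and `min_support`, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def error_rates_by_property(records, min_support=2):
    """Map each (feature, value) pair to (errors, total, error_rate).

    Only pairs shared by at least `min_support` records are kept.
    """
    counts = defaultdict(lambda: [0, 0])  # (feature, value) -> [errors, total]
    for features, is_error in records:
        for prop in features.items():
            counts[prop][1] += 1
            if is_error:
                counts[prop][0] += 1
    return {
        prop: (errs, total, errs / total)
        for prop, (errs, total) in counts.items()
        if total >= min_support
    }


def suspicious_properties(records, min_rate=0.8, min_support=2):
    """List properties whose error rate is at least `min_rate`,
    ordered by the number of errors they cover (largest first)."""
    rates = error_rates_by_property(records, min_support)
    hits = [(prop, stats) for prop, stats in rates.items() if stats[2] >= min_rate]
    return sorted(hits, key=lambda h: (-h[1][0], h[0]))


# Hypothetical sample data: (feature values, is_error)
records = [
    ({"source": "sensor_a", "unit": "celsius"}, False),
    ({"source": "sensor_a", "unit": "celsius"}, False),
    ({"source": "sensor_b", "unit": "fahrenheit"}, True),
    ({"source": "sensor_b", "unit": "fahrenheit"}, True),
    ({"source": "sensor_b", "unit": "celsius"}, True),
    ({"source": "sensor_c", "unit": "celsius"}, False),
]

for (feature, value), (errs, total, rate) in suspicious_properties(records):
    print(f"{feature}={value}: {errs}/{total} errors ({rate:.0%})")
```

In this toy data, every record from `sensor_b` is erroneous, so the shared property `source=sensor_b` points at a likely systematic cause rather than three unrelated mistakes. A principled approach such as the one the abstract describes would weigh competing explanations with a cost model instead of a fixed threshold.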