Error diagnosis and data profiling with data x-ray

Xiaolan Wang, Mary Feng, Yue Wang, Xin Dong, Alexandra Meliou - 2015-08-01 - Proceedings of the VLDB Endowment
8

Abstract

Identifying and repairing data errors has been a persistent focus of data management research. However, while traditional data cleaning techniques can be effective at identifying several data discrepancies, they disregard the fact that many errors are systematic, inherent to the process that produces the data, and thus will keep occurring unless the root cause is identified and corrected. In this demonstration, we will present a large-scale diagnostic framework called Data X-Ray. Like a medical X-ray that aids the diagnosis of medical conditions by revealing problems underneath the surface, Data X-Ray reveals hidden connections and common properties among data errors. Thus, in contrast to traditional cleaning methods, which treat the symptoms, our system investigates the underlying conditions that cause the errors. The core of Data X-Ray combines an intuitive and principled cost model derived by Bayesian analysis, and an efficient, highly-parallelizable dia
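To make the idea of "common properties among data errors" concrete, the following is a minimal sketch and not the system's actual cost model or diagnostic algorithm, neither of which is described in the text above. It assumes each data element carries a set of feature values and an error flag, and it simply counts how often each feature value co-occurs with an error. The function name `error_rates_by_feature`, the feature names, and the toy records are all hypothetical.

```python
from collections import defaultdict


def error_rates_by_feature(records):
    """Return {(feature, value): (errors, total, rate)} over all records.

    `records` is an iterable of (features, is_error) pairs, where
    `features` is a dict of feature name -> value. This is a naive
    frequency count, not a Bayesian cost model.
    """
    counts = defaultdict(lambda: [0, 0])  # (feature, value) -> [errors, total]
    for features, is_error in records:
        for key in features.items():
            counts[key][1] += 1
            if is_error:
                counts[key][0] += 1
    return {key: (e, t, e / t) for key, (e, t) in counts.items()}


# Hypothetical toy data: (features, is_error)
records = [
    ({"source": "A", "attribute": "birth_date"}, True),
    ({"source": "A", "attribute": "birth_date"}, True),
    ({"source": "A", "attribute": "name"}, False),
    ({"source": "B", "attribute": "birth_date"}, False),
    ({"source": "B", "attribute": "name"}, False),
]

rates = error_rates_by_feature(records)

# Feature values ranked by error rate; high-rate entries are candidates
# for a shared (systematic) cause worth inspecting further.
ranked = sorted(rates.items(), key=lambda kv: kv[1][2], reverse=True)
for (feature, value), (errors, total, rate) in ranked:
    print(f"{feature}={value}: {errors}/{total} errors ({rate:.2f})")
```

A per-feature count like this cannot tell whether the errors are better explained by `source=A`, by `attribute=birth_date`, or by their combination; weighing such competing explanations is the role the abstract assigns to the cost model.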


Keywords

Medical diagnosis, Parallelizable manifold, Computer science, Profiling (computer programming), Data mining, Process (computing), Focus (optics), Bayesian probability
