Open Access Article

ACE: Agile, Contingent and Efficient Similarity Joins Using MapReduce

Mahalakshmi Lakshminarayanan, 2013-01-01, OhioLink ETD Center (Ohio Library and Information Network)

Abstract

Similarity join is an important data-mining operation with a diverse range of real-world applications. This thesis proposes three efficient MapReduce algorithms for performing similarity joins between multisets. Filtering techniques for similarity joins minimize the number of pairs of entities that have to be joined, so they are vital to the efficiency of the algorithm. Multisets represent real-world data better than sets because they take the frequency of each element into account. Prior serial algorithms incorporate filtering techniques only for sets, not for multisets, while prior MapReduce algorithms either incorporate no filtering technique at all or incorporate prefix filtering inefficiently, with poor scalability. This work extends the prefix, size, positional and suffix filters to multisets, and also achieves the challenging task of incorporating them efficiently in the shared-nothing MapReduce model. Adeptly incorporating the filtering techniques in a strategic sequen
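The abstract names the filters but not their mechanics. As a rough illustration only, the Python sketch below shows how two of them, the size and prefix filters, could be applied to multisets and arranged as map, shuffle and reduce steps inside a single process. It assumes weighted Jaccard similarity (sum of minimum counts over sum of maximum counts), which the abstract does not specify, and it handles multisets by expanding each element occurrence into a distinct `(element, occurrence)` token, a common trick that may differ from the thesis's own extension. The positional and suffix filters are omitted, and all function names (`multiset_jaccard`, `expand`, `prefix`, `similarity_join`) are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import ceil


def multiset_jaccard(a: Counter, b: Counter) -> float:
    """Weighted (multiset) Jaccard: sum of min counts / sum of max counts."""
    inter = sum((a & b).values())  # Counter & keeps the minimum count
    union = sum((a | b).values())  # Counter | keeps the maximum count
    return inter / union if union else 1.0


def expand(ms: Counter) -> list:
    """Rewrite a multiset as a set of (element, occurrence) tokens.

    Set overlap of two expansions equals their multiset overlap, so the
    set-based filters apply unchanged (an illustrative assumption, not
    necessarily the thesis's method).
    """
    return [(x, i) for x, k in ms.items() for i in range(1, k + 1)]


def prefix(tokens: list, order: dict, t: float) -> list:
    """First |A| - ceil(t*|A|) + 1 tokens under a fixed global order."""
    ranked = sorted(tokens, key=lambda tok: order[tok])
    n = len(ranked)
    # Small epsilon keeps float error from shortening the prefix.
    return ranked[: n - ceil(t * n - 1e-9) + 1]


def similarity_join(records: dict, t: float) -> list:
    """Return (id1, id2, sim) for every pair with multiset Jaccard >= t."""
    expanded = {rid: expand(ms) for rid, ms in records.items()}
    # Global order: rarest tokens first, ties broken by repr for determinism.
    freq = Counter(tok for toks in expanded.values() for tok in toks)
    order = {tok: (freq[tok], repr(tok)) for tok in freq}

    # "Map" + "shuffle": group record ids under each of their prefix tokens.
    groups = defaultdict(list)
    for rid, toks in expanded.items():
        for tok in prefix(toks, order, t):
            groups[tok].append(rid)

    # "Reduce": size-filter each candidate pair, then verify it exactly.
    results = set()  # a pair can meet in several groups; the set dedupes it
    for rids in groups.values():
        for r1, r2 in combinations(sorted(rids), 2):
            n1, n2 = len(expanded[r1]), len(expanded[r2])
            if min(n1, n2) < t * max(n1, n2) - 1e-9:  # size filter
                continue
            sim = multiset_jaccard(records[r1], records[r2])
            if sim >= t:
                results.add((r1, r2, round(sim, 3)))
    return sorted(results)


records = {
    "d1": Counter("aabbc"),   # {a: 2, b: 2, c: 1}
    "d2": Counter("aabbcc"),  # {a: 2, b: 2, c: 2}
    "d3": Counter("xyz"),
}
print(similarity_join(records, 0.8))  # [('d1', 'd2', 0.833)]
```

In a real MapReduce job the reducers would emit a pair once for every prefix token it shares, so a separate deduplication step would be needed; here a Python set stands in for it. Ordering tokens from rarest to most frequent keeps the prefixes made of uncommon tokens, so fewer unrelated records end up in the same group.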

Keywords

Joins, Similarity (geometry), Computer science, Agile software development, Data mining, Information retrieval, Data science, Artificial intelligence