Scaling data mining activities on very large datasets

Alberto Grand, 2013-01-01, PORTO Publications Open Repository TOrino (Politecnico di Torino)

Abstract

This thesis addresses the scalability of data mining techniques, with specific emphasis on association rule and frequent itemset mining. In particular, it proposes a scalable itemset mining approach relying on (i) a persistent (disk-based) representation of the transactional data, (ii) ad hoc data retrieval techniques, and (iii) strategies for the integration of existing itemset mining algorithms. A parallel design based on the same approach, which performs itemset extraction in a parallel and/or distributed environment, is also described. To address the manageability of frequent itemsets, a concise disk-based representation, together with a set of querying techniques, is proposed. This work has been preliminarily validated in the Semantic Web domain, to identify semantic relationships from textual collections with a semi-automatic approach. As a minor topic, the extraction of frequent itemsets from streams of data, modelled as a set of transactional data windows, has also been
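
For readers unfamiliar with the task, frequent itemset mining finds every set of items that occurs together in at least a given number of transactions. The following is a minimal in-memory, Apriori-style sketch in Python. It is not the disk-based approach proposed in the thesis; the function name `frequent_itemsets` and the `min_support` parameter are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for itemsets seen in >= min_support transactions.

    Illustrative level-wise (Apriori-style) search over in-memory data;
    names and structure are assumptions, not taken from the thesis.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in t)
    current = {
        frozenset([item]): n for item, n in item_counts.items() if n >= min_support
    }
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c
            for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Support counting: one pass over the transactions per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result
```

Each level of this sketch rescans all transactions held in memory, which is the cost that becomes prohibitive on very large datasets and that persistent representations and targeted data retrieval aim to reduce.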

Keywords

Computer science, Data mining, Scalability, Association rule learning, Data stream mining, Information retrieval, Set (abstract data type), Field (mathematics)
