Scaling data mining activities on very large datasets
Abstract
This thesis addresses the issue of enhancing the scalability of data mining techniques, with specific emphasis on association rule and frequent itemset mining. In particular, it proposes a scalable itemset mining approach relying on (i) a persistent (disk-based) representation of the transactional data, (ii) ad-hoc data retrieval techniques, and (iii) strategies for integrating existing itemset mining algorithms. A parallel design based on the same approach, for performing itemset extraction in parallel and/or distributed environments, is also described. To address the manageability of the extracted frequent itemsets, a concise disk-based representation, together with a set of querying techniques, is proposed. This work has been preliminarily validated in the Semantic Web domain, where it is used to identify semantic relationships in textual collections through a semi-automatic approach. As a minor topic, the extraction of frequent itemsets from data streams, modelled as a set of transactional data windows, has also been
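To make concrete what frequent itemset extraction computes, here is a minimal in-memory sketch of a level-wise (Apriori-style) miner, together with a helper that mines a transaction stream one window at a time. It is only an illustration of the general problem, not the disk-based approach proposed in the thesis: the function names `frequent_itemsets` and `mine_windows`, the use of absolute support counts, and the choice of consecutive, non-overlapping windows are all assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def frequent_itemsets(transactions: Sequence[Itemset], min_support: int) -> Dict[Itemset, int]:
    """Hypothetical in-memory, level-wise (Apriori-style) miner.

    Returns every itemset contained in at least `min_support` transactions,
    mapped to its absolute support count.
    """
    result: Dict[Itemset, int] = {}
    # Level 1: the candidates are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    k = 1
    while candidates:
        # Count the support of every candidate with one scan of the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate is a subset of the transaction
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Build (k+1)-candidates by joining frequent k-itemsets, keeping only
        # those whose k-subsets are all frequent (the Apriori property).
        k += 1
        keys = list(frequent)
        candidates = set()
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                union = keys[i] | keys[j]
                if len(union) == k and all(
                    frozenset(s) in frequent for s in combinations(union, k - 1)
                ):
                    candidates.add(union)
    return result


def mine_windows(stream: List[Itemset], window_size: int, min_support: int) -> List[Dict[Itemset, int]]:
    """Hypothetical helper: split a transaction stream into consecutive,
    non-overlapping windows and mine each window independently."""
    return [
        frequent_itemsets(stream[start:start + window_size], min_support)
        for start in range(0, len(stream), window_size)
    ]


if __name__ == "__main__":
    stream = [
        frozenset({"bread", "milk"}),
        frozenset({"bread", "butter"}),
        frozenset({"bread", "milk", "butter"}),
        frozenset({"milk"}),
    ]
    for itemset, support in frequent_itemsets(stream, min_support=2).items():
        print(sorted(itemset), support)
    print(mine_windows(stream, window_size=2, min_support=2))
```

With `min_support=2`, the four example transactions yield five frequent itemsets: `{bread}` and `{milk}` with support 3, `{butter}` with support 2, and the pairs `{bread, milk}` and `{bread, butter}` with support 2. The triple is never even counted, because its subset `{milk, butter}` is infrequent. An in-memory loop like this one rescans the whole dataset at every level, which is exactly the cost that becomes problematic on very large datasets.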