A novel two stage scheme utilizing the test set for model selection in text classification

Open Access Article

Bernhard Pfahringer, Peter Reutemann, Michael Mayo. 2005-01-01. Research Commons (University of Waikato)

Abstract

Text classification is a natural application domain for semi-supervised learning: labeling documents is expensive, whereas unlabeled documents are usually abundant. We describe a novel, simple two-stage scheme based on dagging that allows the test set to be used in model selection. The dagging ensemble can also be used on its own in place of the original classifier. We evaluate the performance of a meta classifier that chooses between various base learners and their respective dagging ensembles. The selection process seems to perform robustly, especially when only a small percentage of labels is available for training.
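
For readers unfamiliar with dagging, the following is a minimal sketch of the general technique only: the labeled data is split into disjoint, stratified folds, one copy of a base learner is trained per fold, and predictions are combined by majority vote. It does not reproduce the authors' two-stage selection procedure, which the abstract does not spell out. The names `disjoint_stratified_folds`, `NearestCentroid`, and `Dagging`, the toy two-dimensional data, and the choice of base learner are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def disjoint_stratified_folds(labels, n_folds, seed=0):
    """Split example indices into n_folds disjoint, roughly class-balanced folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_folds)]
    k = 0
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)
        for i in indices:
            # Deal indices round-robin so every fold sees every class.
            folds[k % n_folds].append(i)
            k += 1
    return folds


class NearestCentroid:
    """Hypothetical stand-in base learner: predicts the class with the closest mean."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, value in enumerate(x):
                acc[j] += value
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


class Dagging:
    """One base model per disjoint fold; majority vote at prediction time."""

    def __init__(self, make_base, n_folds=3, seed=0):
        self.make_base = make_base
        self.n_folds = n_folds
        self.seed = seed

    def fit(self, X, y):
        self.models = []
        for fold in disjoint_stratified_folds(y, self.n_folds, self.seed):
            model = self.make_base()
            model.fit([X[i] for i in fold], [y[i] for i in fold])
            self.models.append(model)
        return self

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.models)
        return votes.most_common(1)[0][0]


# Toy data (assumed): two well-separated clusters standing in for document vectors.
ham = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)]
spam = [(a + 10.0, b + 10.0) for a, b in ham]
X = ham + spam
y = ["ham"] * len(ham) + ["spam"] * len(spam)

ensemble = Dagging(NearestCentroid, n_folds=3).fit(X, y)
print(ensemble.predict_one((0.3, 0.4)))
print(ensemble.predict_one((10.2, 10.6)))
```

Unlike bagging, each ensemble member here sees a disjoint slice of the data rather than a bootstrap sample, so the members are trained on non-overlapping examples. Any learner exposing `fit` and `predict_one` could be substituted for the stand-in base learner.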

Keywords

Classifier (UML), Computer science, Artificial intelligence, Machine learning, Test set, Training set, Scheme (mathematics), Selection (genetic algorithm)
