Skip to main content

Research Repository

Advanced Search

Cost-sensitive decision tree learning using a multi-armed bandit framework

Lomax, SE

Authors

SE Lomax



Contributors

Abstract

Decision tree learning is one of the main methods of learning from data. It has been applied to a variety of different domains over the past three decades. In the real world, accuracy is not enough; there are costs involved, those of obtaining the data and those when classification errors occur. A comprehensive survey of cost-sensitive decision tree learning has identified over 50 algorithms, developing a taxonomy in order to classify the algorithms by the way in which cost has been incorporated, and a recent comparison shows that many cost-sensitive algorithms can process balanced, two class datasets well, but produce lower accuracy rates in order to achieve lower costs when the dataset is less balanced or has multiple classes.

This thesis develops a new framework and algorithm concentrating on the view that cost-sensitive decision tree learning involves a trade-off between costs and accuracy. Decisions arising from these two viewpoints can often be incompatible resulting in the reduction of the accuracy rates.

The new framework builds on a specific Game Theory problem known as the multi-armed bandit. This problem concerns a scenario whereby exploration and exploitation are required to solve it. For example, a player in a casino has to decide which slot machine (bandit) from a selection of slot machines is likely to pay out the most. Game Theory proposes a solution of this problem which is solved by a process of exploration and exploitation in which reward is maximized. This thesis utilizes these concepts from the multi-armed bandit game to develop a new algorithm by viewing the rewards as a reduction in costs, utilizing the exploration and exploitation techniques so that a compromise between decisions based on accuracy and decisions based on costs can be found. The algorithm employs the adapted multi-armed bandit game to select the attributes during decision tree induction, using a look-ahead methodology to explore potential attributes and exploit the attributes which maximizes the reward.

The new algorithm is evaluated on fifteen datasets and compared to six well-known algorithms J48, EG2, MetaCost, AdaCostM1, ICET and ACT. The results obtained show that the new multi-armed based algorithm can produce more cost-effective trees without compromising accuracy. The thesis also includes a critical appraisal of the limitations of the developed algorithm and proposes avenues for further research.

Citation

Lomax, S. Cost-sensitive decision tree learning using a multi-armed bandit framework. (Thesis). University of Salford

Thesis Type Thesis
Deposit Date Jul 16, 2013
Publicly Available Date Jul 16, 2013

Files


D9_AnalysisOfResultsFile.csv (41.6 Mb)
Spreadsheet

Version
Appendix Item D9 (CSV file of results)


D8_Tables_showing_datasets_processed_using_unpruned_versions.xlsx (61 Kb)
Spreadsheet

Version
Appendix Item D8


D7_Tables_showing_datasets_processed_using_pruned_versions.xlsx (67 Kb)
Spreadsheet

Version
Appendix Item D7


D6_Annotated_graphs_showing_datasets_processed_using_unpruned_versions_of_the_cost.docx (275 Kb)
Document

Version
Appendix Item D6


D5_Annotated_graphs_showing_datasets_processed_using_pruned_versions_of_the_cost.docx (294 Kb)
Document

Version
Appendix Item D5


D4_Comparing_mean_values_obtained_with_values_obtained_by_a_given_strategy.xlsx (161 Kb)
Spreadsheet

Version
Appendix Item D4


D4_Comparing_mean_values_obtained_with_values_obtained_by_a_given_strategy.docx (968 Kb)
Document

Version
Appendix Item D4


D3_Best_results_obtained_for_each_dataset_strategycost.xlsx (50 Kb)
Spreadsheet

Version
Appendix Item D3


D3_Best_results_obtained_for_each_dataset_strategyacc.xlsx (39 Kb)
Spreadsheet

Version
Appendix Item D3


D3_Best_results_obtained_for_each_dataset.docx (207 Kb)
Document

Version
Appendix Item D3


D2_Graphs_showing_datasets_with_misclassification_costs_reduced_into_groups.xlsx (39 Kb)
Spreadsheet

Version
Appendix Item D2


D2_Graphs_showing_datasets_with_misclassification_costs_reduced_into_groups.docx (192 Kb)
Document

Version
Appendix Item D2


D1_Do_nothing_cost_accuracy_versus_cost_accuracy_obtained_by_number_of_classes.docx.xlsx (54 Kb)
Spreadsheet

Version
Appendix Item D1


D1_Do_nothing_accuracy_versus_accuracy_obtained_by_number_of_classes.docx (147 Kb)
Document

Version
Appendix Item D1





You might also like



Downloadable Citations