
Cost sensitive meta-learning


Authors

SA Shilbayeh



Abstract

Classification is one of the primary tasks of data mining and aims to assign a class label to unseen examples using a model learned from a training dataset. Most established classifiers are designed to minimize the error rate, but in practice data mining involves costs, such as the cost of obtaining the data and the cost of making an error. Hence the following question arises:
Among all the available classification algorithms, and considering a specific type of data and cost, which is the best algorithm for my problem?
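To make the notion of misclassification cost concrete, here is a minimal sketch of cost-sensitive prediction in Python; the two-class setup and the cost values are illustrative assumptions, not figures from the thesis. Rather than predicting the most probable class, the classifier predicts the class with the lowest expected cost.

```python
import numpy as np

# cost[i][j]: cost of predicting class j when the true class is i.
# Illustrative assumption: missing class 1 is five times worse than
# a false alarm; correct predictions cost nothing.
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])

def cost_sensitive_predict(class_probs, cost_matrix):
    """Pick, for each example, the class with the lowest expected cost.

    class_probs has shape (n_samples, n_classes), e.g. the output of a
    probabilistic classifier's predict_proba().
    """
    expected_cost = class_probs @ cost_matrix   # (n_samples, n_classes)
    return expected_cost.argmin(axis=1)

probs = np.array([[0.7, 0.3]])              # classifier is 70% sure of class 0
print(cost_sensitive_predict(probs, cost))  # -> [1]; the costly error is avoided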
It is well known to the machine learning community that no single algorithm performs best across all domains. This observation motivates the need for an "algorithm selector": a means of automating the process of choosing between different algorithms for a specific domain of application.
Thus, this research develops a new meta-learning system for recommending cost-sensitive classification methods. The system is based on the idea of applying machine learning to discover knowledge about the performance of different data mining algorithms. It includes components that repeatedly apply different classification methods to data sets and measure their performance. The characteristics of the data sets, combined with the algorithm used and the performance achieved, provide the training examples. A decision tree algorithm is applied to these training examples to induce knowledge that can then be used to recommend algorithms for new data sets, and active learning is then used to automate the choice of the most informative data set to enter the learning process.
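As a rough illustration of that loop, the following Python sketch uses scikit-learn in place of WEKA; the meta-features computed, the candidate algorithms, and the use of cross-validated accuracy as the performance measure are assumptions made for the example, not the thesis's actual configuration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def meta_features(X, y):
    """A few simple dataset characteristics (illustrative choices):
    number of examples, number of attributes, number of classes."""
    return [X.shape[0], X.shape[1], len(np.unique(y))]

def build_meta_examples(datasets, candidates):
    """Run every candidate algorithm on every dataset; each dataset's
    meta-features, labelled with its best-performing algorithm, become
    one meta-level training example."""
    meta_X, meta_y = [], []
    for X, y in datasets:
        scores = {name: cross_val_score(clf, X, y, cv=3).mean()
                  for name, clf in candidates.items()}
        meta_X.append(meta_features(X, y))
        meta_y.append(max(scores, key=scores.get))  # name of the winner
    return meta_X, meta_y

def train_recommender(meta_X, meta_y):
    """A decision tree induced over the meta-examples; given a new
    dataset's meta-features, it recommends an algorithm."""
    return DecisionTreeClassifier().fit(meta_X, meta_y)
```

A new dataset would then be routed with `recommender.predict([meta_features(X_new, y_new)])`, without running every candidate algorithm on it.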
This thesis makes contributions to both the field of meta-learning and the field of cost-sensitive learning, in that it develops a new meta-learning approach for recommending cost-sensitive methods.
Although meta-learning is not new, the task of accelerating the learning process remains an open problem, and the thesis develops a novel active learning strategy based on clustering that gives the learner the ability to choose which data to learn from and, accordingly, to speed up the meta-learning process.
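A rough sketch of that clustering-based selection idea, under stated assumptions: datasets are clustered by their meta-feature vectors, and only the dataset nearest each cluster centre enters the meta-learning process. KMeans and the nearest-to-centre rule are illustrative choices here, not necessarily the thesis's exact strategy.

```python
import numpy as np
from sklearn.cluster import KMeans

def select_informative(meta_feature_matrix, n_queries):
    """Return indices of the datasets closest to each cluster centre.

    meta_feature_matrix: np.ndarray of shape (n_datasets, n_meta_features).
    One representative dataset per cluster is queried, so the
    meta-learner trains on n_queries datasets instead of the full pool.
    """
    km = KMeans(n_clusters=n_queries, n_init=10).fit(meta_feature_matrix)
    chosen = []
    for centre in km.cluster_centers_:
        dists = np.linalg.norm(meta_feature_matrix - centre, axis=1)
        chosen.append(int(dists.argmin()))
    return sorted(set(chosen))
```

The intuition is that datasets within one meta-feature cluster tend to yield similar meta-examples, so labelling one representative per cluster preserves most of the information at a fraction of the cost.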
Both the meta-learning system and the use of active learning are implemented in the WEKA system and evaluated by applying them to different datasets and comparing the results with existing studies in the literature. The results show that the meta-learning system developed produces better results than METAL, a well-known meta-learning system, and that the use of clustering and active learning has a positive effect on accelerating the meta-learning process, with all tested datasets showing a 75% reduction in the prediction error rate.

Citation

Shilbayeh, S. (in press). Cost sensitive meta-learning. (Thesis). University of Salford

Thesis Type: Thesis
Acceptance Date: Sep 4, 2015
Deposit Date: Nov 9, 2015
Publicly Available Date: Nov 9, 2015
