Measuring performance when positives are rare: relative advantage versus predictive accuracy - a biological case-study

Muggleton, SH; Bryant, CH; Srinivasan, A

Authors

SH Muggleton

A Srinivasan

CH Bryant



Contributors

RL de Mántaras
Editor

E Plaza
Editor

Abstract

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
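
The abstract's point about predictive accuracy can be illustrated with a small numeric sketch. The counts below (50 positives, 10,000 negatives, and the two models' hits) are hypothetical and are not taken from the paper. The `enrichment` method is a simple stand-in (hit rate among predicted positives divided by the hit rate of random selection); it is an assumption for illustration and is not the paper's definition of Relative Advantage.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts for one recogniser (hypothetical data)."""
    tp: int  # positives the model accepts
    fp: int  # negatives the model accepts
    fn: int  # positives the model rejects
    tn: int  # negatives the model rejects

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def accuracy(self) -> float:
        # Fraction of all examples classified correctly.
        return (self.tp + self.tn) / self.total

    def enrichment(self) -> float:
        # Illustrative stand-in, NOT the paper's RA: how much more often
        # a protein accepted by the model is a positive, compared with
        # a protein picked at random.
        base_rate = (self.tp + self.fn) / self.total
        precision = self.tp / (self.tp + self.fp)
        return precision / base_rate


def make_model(tp: int, fp: int, positives: int = 50,
               negatives: int = 10_000) -> Confusion:
    """Build counts for a model on an assumed, heavily imbalanced test set."""
    return Confusion(tp=tp, fp=fp, fn=positives - tp, tn=negatives - fp)


# Two hypothetical models that reject the same negatives but cover
# very different numbers of the rare positives.
model_a = make_model(tp=10, fp=20)
model_b = make_model(tp=40, fp=20)

for name, m in (("A", model_a), ("B", model_b)):
    print(f"model {name}: accuracy={m.accuracy():.4f} "
          f"enrichment={m.enrichment():.1f}")
```

Under these assumed counts, both models score above 0.99 on accuracy and differ by less than 0.01, while the enrichment ratio of model B is twice that of model A. A cost-based measure separates the models where accuracy does not, which is the behaviour the abstract describes.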

Citation

Muggleton, S., Bryant, C., & Srinivasan, A. (2000). Measuring performance when positives are rare: relative advantage versus predictive accuracy - a biological case-study. In R. de Mántaras, & E. Plaza (Eds.), Machine learning: ECML 2000: 11th European conference on machine learning, Barcelona, Catalonia, Spain, May 31-June 2 2000 (pp. 300-312). Springer. https://doi.org/10.1007/3-540-45164-1_32

Publication Date Jan 1, 2000
Deposit Date Feb 17, 2009
Publicly Available Date Feb 17, 2009
Publisher Springer
Pages 300-312
Series Title Lecture notes in computer science
Series Number 1810
Book Title Machine learning: ECML 2000: 11th European conference on machine learning, Barcelona, Catalonia, Spain, May 31-June 2 2000
ISBN 9783540676027
Keywords inductive logic programming
Publisher URL https://doi.org/10.1007/3-540-45164-1_32
Additional Information Paper originally presented at the 11th European Conference on Machine Learning, Barcelona, Catalonia, Spain, May 31 – June 2, 2000.
