Measuring performance when positives are rare: relative advantage versus predictive accuracy - a biological case-study

Muggleton, SH; Bryant, CH; Srinivasan, A

Authors

SH Muggleton

A Srinivasan



Contributors

RL de Mántaras (Editor)

E Plaza (Editor)

Abstract

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. Predictive accuracy is not a good measure of performance for this domain because it does not discriminate well between NPP recognition models: despite covering varying numbers of (the rare) positives, all the models are awarded a similar (high) score by predictive accuracy because they all exclude most of the abundant negatives.
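The abstract's point about predictive accuracy can be illustrated with a small sketch. All counts below are hypothetical and are not taken from the paper. The `enrichment` function is an assumed stand-in (the ratio of the hit rate among predicted positives to the base rate); it is not the paper's definition of Relative Advantage, which the abstract does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one recogniser (hypothetical data)."""
    tp: int  # positives the model accepts
    fp: int  # negatives the model accepts
    fn: int  # positives the model rejects
    tn: int  # negatives the model rejects

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def base_rate(self) -> float:
        return (self.tp + self.fn) / self.total


def enrichment(c: Confusion) -> float:
    """Assumed illustration only: how much more often a protein picked
    from the model's accepted set is a positive, compared with a protein
    picked at random. NOT the paper's RA formula."""
    return c.precision / c.base_rate


# Hypothetical setting: 100,000 proteins, of which 50 are positives.
model_a = Confusion(tp=10, fp=40, fn=40, tn=99_910)  # covers few positives
model_b = Confusion(tp=40, fp=40, fn=10, tn=99_910)  # covers most positives

for name, m in (("A", model_a), ("B", model_b)):
    print(f"model {name}: accuracy={m.accuracy:.4f} "
          f"recall={m.recall:.2f} enrichment={enrichment(m):.0f}x")
```

In this made-up setting the two models differ fourfold in the positives they cover, yet their accuracies differ only in the fourth decimal place, because the abundant true negatives dominate the score. A ratio against random selection separates them clearly.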

Presentation Conference Type: Conference Paper (published)
Publication Date: Jan 1, 2000
Deposit Date: Feb 17, 2009
Publicly Available Date: Feb 17, 2009
Publisher: Springer
Pages: 300-312
Series Title: Lecture notes in computer science
Series Number: 1810
Book Title: Machine learning: ECML 2000: 11th European conference on machine learning, Barcelona, Catalonia, Spain, May 31-June 2 2000
ISBN: 9783540676027
Keywords: inductive logic programming
Publisher URL: https://doi.org/10.1007/3-540-45164-1_32
Additional Information: Paper originally presented at the 11th European Conference on Machine Learning Barcelona, Catalonia, Spain, May 31 – June 2, 2000 Proceedings.
