Learning Chomsky-like grammars for biological sequence families

Muggleton, SH; Bryant, CH; Srinivasan, A

Authors

SH Muggleton

A Srinivasan



Contributors

P Langley
Editor

Abstract

This paper presents a new method of measuring performance when positives are rare and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). As far as these authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the positive-only learning framework of CProgol. Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using our best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. The highest RA was achieved by a model which includes grammar-derived features. This RA is significantly higher than the best RA achieved without the use of the grammar-derived features.

Presentation Conference Type: Conference Paper (published)
Start Date: Jun 29, 2000
End Date: Jul 2, 2000
Publication Date: Jul 2, 2000
Deposit Date: Feb 16, 2009
Publicly Available Date: Feb 16, 2009
Pages: 631-638
Book Title: Proceedings of the 17th International Conference on Machine Learning
ISBN: 1-55860-707-2
Publisher URL: https://dl.acm.org/doi/10.5555/645529.658131
Additional Information: Event Type: Conference
