
Learning Chomsky-like grammars for biological sequence families

Muggleton, SH; Bryant, CH; Srinivasan, A

Authors

SH Muggleton

CH Bryant

A Srinivasan



Contributors

P Langley
Editor

Abstract

This paper presents a new method of measuring performance when positives are rare, and investigates whether Chomsky-like grammar representations are useful for learning accurate, comprehensible predictors of members of biological sequence families. The positive-only learning framework of the Inductive Logic Programming (ILP) system CProgol is used to generate a grammar for recognising a class of proteins known as human neuropeptide precursors (NPPs). As far as the authors are aware, this is both the first biological grammar learnt using ILP and the first real-world scientific application of the positive-only learning framework of CProgol. Performance is measured using both predictive accuracy and a new cost function, Relative Advantage (RA). The RA results show that searching for NPPs by using the best NPP predictor as a filter is more than 100 times more efficient than randomly selecting proteins for synthesis and testing them for biological activity. The highest RA was achieved by a model that includes grammar-derived features, and this RA is significantly higher than the best RA achieved without them.

Citation

Muggleton, S., Bryant, C., & Srinivasan, A. (2000). Learning Chomsky-like grammars for biological sequence families. In P. Langley (Ed.), Proceedings of the 17th International Conference on Machine Learning (pp. 631-638).

Start Date Jun 29, 2000
End Date Jul 2, 2000
Publication Date Jul 2, 2000
Deposit Date Feb 16, 2009
Publicly Available Date Feb 16, 2009
Pages 631-638
Book Title Proceedings of the 17th International Conference on Machine Learning
ISBN 1-55860-707-2
Publisher URL https://dl.acm.org/doi/10.5555/645529.658131
Additional Information Event Type: Conference
