Is automatic detection of hidden knowledge an anomaly?

Preiss, J

Authors

J Preiss



Abstract

Background: The quantity of documents being published requires researchers to
specialize in a narrower field, meaning that inferable connections between
publications (particularly from different domains) can be missed. This has given
rise to automatic literature based discovery (LBD). However, unless heavily
filtered, LBD generates more potential new knowledge than can be manually
verified, and another form of selection is required before the results can be
passed on to a user. Since a large proportion of the automatically generated
hidden knowledge is valid but generally known, we investigate the hypothesis that
non-trivial, interesting hidden knowledge can be treated as an anomaly and
identified using anomaly detection approaches.

Results: Two experiments are conducted: (1) to avoid errors arising from
incorrect extraction of relations, the hypothesis is validated using manually
annotated relations appearing in a thesaurus, and (2) automatically extracted
relations are used to investigate the hypothesis on publication abstracts.
Together, these allow a potential upper bound to be investigated and the
limitations introduced by automatic relation extraction to be identified.

Conclusion: We apply one-class SVM and isolation forest anomaly detection
algorithms to a set of hidden connections to rank connections by identifying
outlying (interesting) ones and show that the approach increases the F1 measure
by a factor of 10 while greatly reducing the quantity of hidden knowledge to
manually verify. We also demonstrate the statistical significance of this result.
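
The ranking step described in the conclusion can be illustrated with a minimal sketch. It assumes scikit-learn's IsolationForest and OneClassSVM as the two detectors and uses synthetic two-dimensional feature vectors in place of real hidden connections. The feature representation, the parameter values (n_estimators, nu, gamma), the helper name rank_by_anomaly, and the cut-off top_k are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM


def rank_by_anomaly(features, detector):
    """Return row indices ordered from most to least anomalous."""
    detector.fit(features)
    # For both detectors, decision_function is lower for more abnormal rows.
    scores = detector.decision_function(features)
    return np.argsort(scores)


# Synthetic stand-in for hidden-connection feature vectors (hypothetical data):
# 200 "generally known" connections in one cluster plus one outlying connection.
rng = np.random.default_rng(0)
common = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outlier = np.array([[8.0, 8.0]])
features = np.vstack([common, outlier])  # the outlier is row 200

# Parameter values below are illustrative assumptions, not the paper's settings.
forest_ranking = rank_by_anomaly(
    features, IsolationForest(n_estimators=100, random_state=0)
)
svm_ranking = rank_by_anomaly(
    features, OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
)

# Only the top of each ranking would be passed on for manual verification.
top_k = 5
print("isolation forest top candidates:", forest_ranking[:top_k])
print("one-class SVM top candidates:", svm_ranking[:top_k])
```

In this toy setting the point of ranking is that only the first few rows need manual inspection, rather than all 201. How well that carries over to real hidden connections depends on the features chosen to represent each connection.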

Keywords: literature based discovery; anomaly detection; unified medical
language system

Citation

Preiss, J. (2019). Is automatic detection of hidden knowledge an anomaly? BMC Bioinformatics, 20(Sup 10), 251. https://doi.org/10.1186/s12859-019-2815-4

Journal Article Type: Article
Acceptance Date: Aug 24, 2018
Online Publication Date: May 29, 2019
Publication Date: May 29, 2019
Deposit Date: Feb 1, 2019
Publicly Available Date: Jun 12, 2019
Journal: BMC Bioinformatics
Publisher: Springer Verlag
Volume: 20
Issue: Sup 10
Pages: 251
DOI: https://doi.org/10.1186/s12859-019-2815-4
Publisher URL: https://doi-org.salford.idm.oclc.org/10.1186/s12859-019-2815-4
Related Public URLs: https://bmcbioinformatics.biomedcentral.com/
