M Arguello Casteleiro
Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature
Arguello Casteleiro, M; Demetriou, G; Read, W; Fernandez-Prieto, MJ; Maroto, N; Maseda Fernandez, D; Nenadic, G; Klein, J; Keane, J; Stevens, R
Authors
G Demetriou
W Read
MJ Fernandez-Prieto
N Maroto
D Maseda Fernandez
G Nenadic
J Klein
J Keane
R Stevens
Abstract
Background
Automatic identification of term variants or acceptable alternative free-text terms for gene and protein names from the millions of biomedical publications is a challenging task. Ontologies, such as the Cardiovascular Disease Ontology (CVDO), capture domain knowledge in a computational form and can provide context for gene/protein names as written in the literature. This study investigates: 1) if word embeddings from Deep Learning algorithms can provide a list of term variants for a given gene/protein of interest; and 2) if biological knowledge from the CVDO can improve such a list without modifying the word embeddings created.
Methods
We manually annotated 105 gene/protein names from 25 PubMed titles/abstracts and mapped them to 79 unique UniProtKB entries corresponding to gene and protein classes from the CVDO. Using more than 14 million PubMed articles (titles and available abstracts), word embeddings were generated with CBOW and Skip-gram. We set up two experiments for a synonym detection task, each with four raters and 3,672 pairs of terms (target term and candidate term) from the word embeddings created. For Experiment I, the target terms for 64 UniProtKB entries were those that appear in the titles/abstracts. Experiment II involved 63 UniProtKB entries, and its target terms combined terms from PubMed titles/abstracts with terms (i.e. increased context) from the CVDO protein class expressions and labels.
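As a rough illustration of how candidate terms for a target term can be read off a set of word embeddings, the sketch below ranks vocabulary entries by cosine similarity to a query vector. It is not the authors' implementation: the vocabulary entries (`gene_a`, `gene_a_protein`, `protein`, `unrelated_drug`), the three-dimensional vectors, and the function names are hypothetical, and summing the vectors of several target terms to add context is an assumption made here for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def candidate_terms(target_terms, embeddings, top_n=3):
    """Rank vocabulary terms by similarity to the summed target vectors.

    Summing the target vectors is an illustrative assumption, not a
    method confirmed by the abstract.
    """
    dims = len(next(iter(embeddings.values())))
    query = [0.0] * dims
    for term in target_terms:
        for i, x in enumerate(embeddings[term]):
            query[i] += x
    # Score every vocabulary term except the target terms themselves.
    scored = [
        (term, cosine(query, vec))
        for term, vec in embeddings.items()
        if term not in target_terms
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy vectors; real embeddings would come from a trained
# CBOW or Skip-gram model and have far more dimensions.
toy_embeddings = {
    "gene_a": [0.9, 0.1, 0.0],
    "gene_a_protein": [0.8, 0.2, 0.1],
    "protein": [0.3, 0.3, 0.3],
    "unrelated_drug": [0.0, 0.1, 0.9],
}

# Target term alone (in the spirit of Experiment I).
without_context = candidate_terms(["gene_a"], toy_embeddings)

# Target term plus an extra context term (in the spirit of Experiment II).
with_context = candidate_terms(["gene_a", "protein"], toy_embeddings)
```

In this toy setting each returned pair (target term, candidate term) would then go to human raters, who judge whether the candidate is a full or partial term variant.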
Results
In Experiment I, Skip-gram finds term variants (full and/or partial) for 89% of the 64 UniProtKB entries, while CBOW finds term variants for 67%. In Experiment II (with the aid of the CVDO), Skip-gram finds term variants for 95% of the 63 UniProtKB entries, while CBOW finds term variants for 78%. Combining the results of both experiments, Skip-gram finds term variants for 97% of the 79 UniProtKB entries, while CBOW finds term variants for 81%.
Conclusions
This study shows performance improvements for both CBOW and Skip-gram on a gene/protein synonym detection task by adding knowledge formalised in the CVDO and without modifying the word embeddings created. Hence, the CVDO supplies context that is effective in inducing term variability for both CBOW and Skip-gram while reducing ambiguity. Skip-gram outperforms CBOW and finds more pertinent term variants for gene/protein names annotated from the scientific literature.
Keywords: Semantic deep learning; Ontology; Deep learning; CBOW; Skip-gram; Cardiovascular disease ontology; PubMed
Citation
Arguello Casteleiro, M., Demetriou, G., Read, W., Fernandez-Prieto, M., Maroto, N., Maseda Fernandez, D., … Stevens, R. (2018). Deep learning meets ontologies: experiments to anchor the cardiovascular disease ontology in the biomedical literature. Journal of Biomedical Semantics, 9(13), https://doi.org/10.1186/s13326-018-0181-1
Field | Value |
---|---|
Journal Article Type | Article |
Acceptance Date | Mar 6, 2018 |
Online Publication Date | Apr 12, 2018 |
Publication Date | Apr 12, 2018 |
Deposit Date | May 16, 2018 |
Publicly Available Date | May 16, 2018 |
Journal | Journal of Biomedical Semantics |
Publisher | Springer Verlag |
Volume | 9 |
Issue | 13 |
DOI | https://doi.org/10.1186/s13326-018-0181-1 |
Keywords | Ontology, PubMed, Deep Learning, Skip-gram, Cardiovascular Disease Ontology, Semantic Deep Learning, CBOW |
Publisher URL | https://doi.org/10.1186/s13326-018-0181-1 |
Related Public URLs | https://jbiomedsem.biomedcentral.com/ |
Additional Information | Projects : sysVASC |
Files
PMC5896136.pdf (PDF, 2.4 MB)
Licence
http://creativecommons.org/licenses/by/4.0/
Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/