An approach for measuring semantic similarity between words using multiple information sources

Li, Y; Bandar, ZA; McLean, D

doi:10.1109/TKDE.2003.1209005

Authors

Y Li

ZA Bandar

D McLean

Abstract

Semantic similarity between words is becoming a generic problem for many applications of computational linguistics and artificial intelligence. This paper explores the determination of semantic similarity by a number of information sources, which consist of structural semantic information from a lexical taxonomy and information content from a corpus. To investigate how information sources could be used effectively, a variety of strategies for using various possible information sources are implemented. A new measure is then proposed which combines information sources nonlinearly. Experimental evaluation against a benchmark set of human similarity ratings demonstrates that the proposed measure significantly outperforms traditional similarity measures.
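
The abstract does not state the form of the nonlinear combination. As a minimal sketch only, the snippet below assumes the form commonly attributed to the published paper: an exponential decay in the shortest path length between two words in the taxonomy, multiplied by a hyperbolic tangent of the depth of their common subsumer. The function name `combined_similarity`, its argument names, and the default values `alpha=0.2` and `beta=0.6` are illustrative assumptions, not taken from this record.

```python
import math


def combined_similarity(path_length, depth, alpha=0.2, beta=0.6):
    """Nonlinearly combine two taxonomy-derived quantities into one score.

    path_length: shortest path length between the two words (assumed >= 0).
    depth: depth of their common subsumer in the hierarchy (assumed >= 0).
    alpha, beta: scaling parameters; the defaults are assumptions.
    """
    if path_length < 0 or depth < 0:
        raise ValueError("path_length and depth must be non-negative")
    # Similarity decays exponentially as the path between the words grows.
    path_term = math.exp(-alpha * path_length)
    # Similarity grows with subsumer depth and saturates towards 1.
    depth_term = math.tanh(beta * depth)
    return path_term * depth_term


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the trend.
    for length, depth in [(1, 8), (4, 8), (4, 2), (12, 1)]:
        print(length, depth, round(combined_similarity(length, depth), 4))
```

Under these assumptions the score stays in the range [0, 1], falls as the path lengthens, and rises as the common subsumer sits deeper in the taxonomy.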

Citation

Li, Y., Bandar, Z., & McLean, D. (2003). An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, 15(4), 871-882. https://doi.org/10.1109/TKDE.2003.1209005

Journal Article Type	Article
Publication Date	Jul 1, 2003
Deposit Date	Jul 28, 2015
Journal	IEEE Transactions on Knowledge and Data Engineering
Print ISSN	1041-4347
Publisher	Institute of Electrical and Electronics Engineers
Peer Reviewed	Peer Reviewed
Volume	15
Issue	4
Pages	871-882
DOI	https://doi.org/10.1109/TKDE.2003.1209005
Publisher URL	http://dx.doi.org/10.1109/TKDE.2003.1209005
Related Public URLs	http://ieeexplore.ieee.org/xpl/aboutJournal.jsp?punumber=69
Additional Information	This paper rigorously investigates the contributions of different information sources to the similarity between words. It presents word similarity measures that nonlinearly combine structural semantic information from a lexical taxonomy with information content from a corpus. Our approach outperforms previously published measures: against the Rubenstein-Goodenough benchmark set of human similarity ratings for word pairs, the best previously published correlation was 0.8484, whilst ours is 0.8914. As of Jan 2010, the paper had been cited over 70 times (SCI) and 300 times (Google Scholar). It was selected as advanced reading material for CIS526 Machine Learning at Temple University, Philadelphia, and has been adopted by other researchers in real system developments, e.g., Bibster, a semantics-based bibliographic peer-to-peer system.
