Exploring Topic Coherence with PCC-LDA and BERT for Contextual Word Generation
Authors
Rachamadugu, Sandeep Kumar; Pushphavathi, T.P.; Khan, Surbhi Bhatia; Alojail, Mohammad
Abstract
In natural language processing (NLP), topic modeling and word generation are crucial for understanding and producing human-like text. Key-phrase extraction is an essential task that aids document summarization, information retrieval, and topic classification. Topic modeling significantly enhances our understanding of the latent structure of textual data. Latent Dirichlet Allocation (LDA) is a popular topic-modeling algorithm that assumes every document is a mixture of several topics and that each topic is a distribution over multiple words. An improved variant of LDA, Probabilistic Correlated Clustering Latent Dirichlet Allocation (PCC-LDA), was recently introduced. BERT, by contrast, is an advanced bidirectional pre-trained language model that interprets each word in a sentence from its full context, which lets it generate more precise and contextually appropriate words. Topic modeling is a useful way to discover hidden themes within a collection of documents; the aim here is to extract better-tuned topics from the corpus and improve the topic-modeling implementation. The experiments indicate a significant improvement in performance when PCC-LDA and BERT are combined. Coherence criteria are used to judge whether the words in each topic agree with prior knowledge, which helps ensure that topics are interpretable and meaningful. The topic-level analysis shows that PCC-LDA topics achieve higher coherence than LDA and NMF (non-negative matrix factorization) by at least 15.4% and 12.9%, respectively, for k = 5, and by up to nearly 12.5% and 11.8%, respectively, for k = 10, where k is the number of topics.
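The abstract does not name the exact coherence criterion used, so the following is only an illustrative sketch of how a topic-coherence score can be computed, assuming the widely used UMass measure: for a topic's top words ordered by probability, it sums log((D(w_m, w_l) + 1) / D(w_l)) over all word pairs with l < m, where D counts documents containing the given word(s). The function name `umass_coherence` and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from typing import Iterable, Sequence


def umass_coherence(top_words: Sequence[str], documents: Iterable[Iterable[str]]) -> float:
    """UMass coherence of a single topic (illustrative sketch, not the paper's code).

    top_words: the topic's top words, most probable first.
    documents: tokenized reference corpus (e.g. the training documents).
    Higher (less negative) scores mean the top words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                # Word absent from the corpus: skip the pair to avoid dividing by zero.
                continue
            # +1 smoothing keeps the log defined when the pair never co-occurs.
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


if __name__ == "__main__":
    corpus = [
        ["cat", "dog", "pet"],
        ["cat", "mouse"],
        ["dog", "bone", "pet"],
        ["stock", "market"],
    ]
    print(umass_coherence(["dog", "pet"], corpus))     # ≈ 0.405 (words co-occur)
    print(umass_coherence(["dog", "market"], corpus))  # ≈ -0.693 (never co-occur)
```

Averaging this score over all k topics gives one number per model, which is the kind of quantity on which a comparison between topic models such as LDA, NMF, and PCC-LDA could be based.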
Citation
Rachamadugu, S. K., Pushphavathi, T., Khan, S. B., & Alojail, M. (2024). Exploring Topic Coherence with PCC-LDA and BERT for Contextual Word Generation. IEEE Access, 12, 175252 - 175267. https://doi.org/10.1109/access.2024.3477992
Journal Article Type | Article |
---|---|
Acceptance Date | Jan 1, 2024 |
Publication Date | Oct 25, 2024 |
Deposit Date | Jan 7, 2025 |
Publicly Available Date | Jan 7, 2025 |
Journal | IEEE Access |
Electronic ISSN | 2169-3536 |
Publisher | Institute of Electrical and Electronics Engineers |
Peer Reviewed | Peer Reviewed |
Volume | 12 |
Pages | 175252 - 175267 |
DOI | https://doi.org/10.1109/access.2024.3477992 |
Files
Published Version (PDF, 1.3 Mb)
Publisher Licence URL
http://creativecommons.org/licenses/by-nc-nd/4.0/
You might also like
Enhancing Image Security via Block Cyclic Construction and DNA Based LFSR
(2024)
Journal Article