Clustering Medical Transcriptions Using K-Means

Salloum, Said; Tahat, Dina; Tahat, Khalaf; Alfaisal, Raghad; Salloum, Ayham

Abstract

The clustering of medical transcriptions is an essential task for the categorization and summarization of large volumes of medical records. This paper explores the efficacy of k-means clustering, a well-known unsupervised machine learning algorithm, to discern patterns and segregate medical transcriptions into distinct clusters. We processed a dataset comprising various medical reports, systematically cleaning and preparing the text for analysis. By employing a Term Frequency-Inverse Document Frequency (TF-IDF) approach, we converted the textual data into a vectorized format amenable to machine learning methods. Subsequent dimensionality reduction through Principal Component Analysis (PCA) facilitated the visualization and interpretation of the high-dimensional data in two-dimensional space. The k-means algorithm was then applied, revealing five distinct clusters. Each cluster was characterized by examining the prevalence of key terms, uncovering thematic consistencies that may correspond to particular medical procedures or specialties. The resulting clusters demonstrate the algorithm's potential to automatically categorize medical documentation in a way that mirrors clinical relevance, thereby providing a foundation for improved information management systems in healthcare settings.
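The pipeline described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy reports, the whitespace tokenizer, the smoothed IDF formula, the farthest-point centroid initialization, and the helper names (`tfidf`, `kmeans`, `top_terms`) are all assumptions made for illustration. The PCA visualization step is omitted to keep the example self-contained, and the toy data uses two clusters rather than the five reported in the paper.

```python
import math
from collections import Counter


def tfidf(token_docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized documents."""
    vocab = sorted({t for doc in token_docs for t in doc})
    n_docs = len(token_docs)
    doc_freq = Counter(t for doc in token_docs for t in set(doc))
    # Smoothed IDF (an assumed variant; the paper does not state which one it uses).
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in token_docs:
        counts = Counter(doc)
        raw = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0
        vectors.append([x / norm for x in raw])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialization (assumed)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(centroid, vocab, n=3):
    """Terms with the largest centroid weights, used to characterize a cluster."""
    ranked = sorted(zip(centroid, vocab), key=lambda pair: -pair[0])
    return [term for _, term in ranked[:n]]


# Hypothetical toy reports standing in for the medical transcription dataset.
reports = [
    "patient underwent knee surgery arthroscopy knee",
    "knee arthroscopy surgery performed on patient",
    "chest pain ecg cardiology heart",
    "heart ecg showed chest pain cardiology",
]
vocab, vectors = tfidf([r.lower().split() for r in reports])
labels, centroids = kmeans(vectors, k=2)
for c in range(2):
    print(c, top_terms(centroids[c], vocab))
```

In practice, a library such as scikit-learn provides `TfidfVectorizer`, `PCA`, and `KMeans`, which would replace these hand-written helpers and add the two-dimensional projection used for visualization.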

Presentation Conference Type: Conference Paper (published)
Conference Name: 2024 International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS)
Start Date: Sep 24, 2024
End Date: Sep 27, 2024
Publication Date: Sep 24, 2024
Deposit Date: Feb 5, 2025
Peer Reviewed: Peer Reviewed
Pages: 291-294
Book Title: 2024 International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS)
ISBN: 9798350354706
DOI: https://doi.org/10.1109/iccns62192.2024.10776237