Leveraging K-Means Clustering for Analysis of Arabic Hate Speech Tweets

Salloum, Said; Tahat, Khalaf; Mansoori, Ahmed; Alfaisal, Raghad; Tahat, Dina

Abstract

As hate speech becomes more common on social media platforms, detecting and curbing it is important for providing a better and safer online environment. Because hate speech detection has relied heavily on manual methods, researchers have increasingly turned to automated, machine-learning-based approaches. Many available datasets and models for hate speech detection are largely inadequate for Arabic hate speech because of the complexity of the language and its cultural nuances. To address these difficulties, this paper applies K-Means clustering to Arabic hate speech tweets in the L-HSAB dataset. The methodology consists of data exploration and pre-processing, Term Frequency-Inverse Document Frequency (TF-IDF) vectorization, dimensionality reduction through Principal Component Analysis (PCA), and clustering via K-Means. These steps helped to identify distinct groups of hate speech tweets and to understand their common topics and themes. The results offer new insight into how Arabic hate speech circulates online and open the potential for tailored interventions. Automated hate speech analysis via machine learning could allow policymakers to formulate tailored strategies aimed at making the Internet safer and communities more harmonious.
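
The pipeline named in the abstract (TF-IDF, then PCA, then K-Means) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn is available, and the function name `cluster_texts`, the parameter values, and the short English placeholder texts are all hypothetical. The abstract does not state the number of clusters, the number of PCA components, or the pre-processing steps, and the L-HSAB data is not reproduced here.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_texts(texts, n_clusters=2, n_components=2, seed=0):
    """Return one cluster label per text via TF-IDF -> PCA -> K-Means.

    All parameter defaults are illustrative assumptions.
    """
    # Turn each text into a TF-IDF weighted bag-of-words vector.
    tfidf = TfidfVectorizer().fit_transform(texts)
    # PCA runs on a dense array here; this is fine for a toy corpus only.
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(
        tfidf.toarray()
    )
    # Group the reduced vectors into n_clusters clusters.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(reduced).tolist()


# Neutral placeholder texts standing in for pre-processed tweets.
placeholder_texts = [
    "football match football team",
    "football team match win",
    "rain wind rain storm",
    "storm wind rain cold",
]
labels = cluster_texts(placeholder_texts)
print(labels)  # cluster ids are arbitrary; only the grouping is meaningful
```

On a real corpus, the TF-IDF matrix is usually too large to densify, so a sparse-friendly reducer such as scikit-learn's `TruncatedSVD` is a common substitute for PCA; that would be a different technique from the one the abstract names.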

Presentation Conference Type: Conference Paper (published)
Conference Name: Global Congress on Emerging Technologies (GCET-2024)
Start Date: Dec 9, 2024
End Date: Dec 11, 2024
Publication Date: Dec 9, 2024
Deposit Date: Apr 16, 2025
Peer Reviewed: Yes
Pages: 282-285
Book Title: Global Congress on Emerging Technologies (GCET-2024)
ISBN: 979-8-3315-4261-0
DOI: https://doi.org/10.1109/gcet64327.2024.10934641