Said Salloum
Enhancing Cybersecurity: Machine Learning and Natural Language Processing for Arabic Phishing Email Detection
Abstract
Phishing is a significant threat in the modern world and causes considerable financial losses. Although electronic mail has proven to be a valuable asset worldwide, facilitating communication for everyone from large corporations to individuals in their everyday lives, it has also brought its own set of problems. Scammers exploit these problems by sending bogus emails to susceptible people in order to gain access to their personal information. Phishing email detection is considered an important research field, and the research community has worked hard to address the problem in widely used languages such as English. Other important languages, such as Arabic, have received little attention when it comes to phishing detection. Arabic is the native language of more than 300 million people and is ranked as the fifth most widely used language in the world. In terms of content-based phishing email detection, there has been relatively little research on Arabic-language phishing emails. To narrow this gap, this study presents an English-Arabic Phishing Detection (EAPD) model built on word-level representations (Term Frequency-Inverse Document Frequency (TF-IDF), Document-Term Matrix (DTM), and FastText embedding) and a character-level convolutional neural network (CharEmbedding). It is one of the first studies to explore the extent to which machine learning (ML) and natural language processing (NLP) methods can be used to develop models for detecting English/Arabic phishing attacks. An English-Arabic parallel phishing email corpus was developed using the English and Arabic text provided by the leading security and privacy analytics anti-phishing shared task (IWSPA-AP 2018). To evaluate the effectiveness of the EAPD model, a balanced collection of 1,258 emails in Arabic and English, with equal proportions of legitimate and phishing emails, was used.
The experiments indicate that, when the Multilayer Perceptron (MLP) classifier was combined with TF-IDF, the EAPD model achieved an accuracy of 95.3% on Arabic datasets. The English text, on the other hand, reached 95.7% accuracy when paired with the Support Vector Machine (SVM) classifier and TF-IDF. Salloum's list, a new set of Arabic stop words, was also introduced. With this extended list, traditional ML classifiers remained largely unaffected, whereas deep learning (DL) models with FastText embedding, especially LSTM, showed a significant 14% variance. Overall, this study presents a promising approach for detecting phishing emails in both English and Arabic, with high accuracy and efficiency.
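To make the word-level TF-IDF representation mentioned in the abstract concrete, the following is a minimal sketch in plain Python. It is not the thesis implementation: the toy emails, the `STOP_WORDS` set, and the helper names `tokenize` and `tf_idf` are all hypothetical, and the weighting shown (raw term frequency times the natural log of inverse document frequency) is only one common textbook variant.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; these are NOT emails from the thesis dataset.
EMAILS = [
    "verify your account now",
    "your account statement is attached",
    "meeting agenda is attached",
]

# Hypothetical stop-word list for illustration only (not Salloum's list).
STOP_WORDS = {"is", "your"}


def tokenize(text, stop_words=frozenset()):
    """Lowercase the text, extract word tokens, and drop stop words."""
    # \w is Unicode-aware in Python 3, so Arabic letters are matched as well.
    return [t for t in re.findall(r"\w+", text.lower()) if t not in stop_words]


def tf_idf(docs, stop_words=frozenset()):
    """Return one {term: weight} dict per document.

    Assumes tf = count / document length and idf = ln(N / df). Libraries
    often add smoothing and normalisation on top of this basic form.
    """
    tokenized = [tokenize(d, stop_words) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


VECTORS = tf_idf(EMAILS, STOP_WORDS)
```

In this sketch, a term that appears in only one email (such as "verify") receives a higher weight than a term shared across emails (such as "account"), which is the property that makes TF-IDF features useful as input to classifiers like the MLP and SVM named above.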
Citation
Salloum, S. (2024). Enhancing Cybersecurity: Machine Learning and Natural Language Processing for Arabic Phishing Email Detection. (Thesis). University of Salford
Thesis Type | Thesis |
---|---|
Deposit Date | Jan 17, 2024 |
Publicly Available Date | Feb 27, 2024 |
Award Date | Jan 26, 2024 |
Files
Published Version (PDF, 5.4 MB)