A New English/Arabic Parallel Corpus for Phishing Emails

Salloum, Said; Gaber, Tarek; Vadera, Sunil; Shaalan, Khaled

Authors

Said Salloum

Tarek Gaber

Khaled Shaalan



Abstract

Phishing is a malicious activity in which phishers, disguised as legitimate entities, obtain illegitimate access to victims’ personal and private information, usually through emails. Currently, phishing attacks and threats are handled effectively by the latest phishing email detection solutions. Most current phishing detection systems assume that phishing attacks are in English, though attacks in other languages are growing. In particular, Arabic is a widely used language and therefore represents a vulnerable target. However, there is a significant shortage of corpora that can be used to develop Arabic phishing detection systems. This paper presents a new English-Arabic parallel phishing email corpus developed from the anti-phishing shared task text (IWSPA-AP 2018). The email content was translated by 10 volunteers who had a university background and were English and Arabic language experts. To evaluate the effectiveness of the new corpus, we develop phishing email detection models using Term Frequency–Inverse Document Frequency (TF-IDF) and a Multilayer Perceptron on 1258 emails in Arabic and English that have equal ratios of legitimate and phishing emails. The experimental findings show that accuracy reaches 96.82% for the Arabic dataset and 94.63% for the emails in English, providing some assurance of the potential value of the parallel corpus developed.
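
For readers unfamiliar with the feature representation named in the abstract, the following is a minimal pure-Python sketch of TF-IDF weighting. It is only an illustration under stated assumptions: the abstract does not specify the tokenization, the exact TF-IDF variant, or the Multilayer Perceptron settings, so the whitespace tokenizer, the `tf * (ln(N / df) + 1)` formula, the function name `tfidf_vectors`, and the toy emails are all hypothetical and not taken from the paper or the corpus.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * (ln(N / df) + 1) weighting.

    Assumption: whitespace tokenization and this particular TF-IDF variant
    are illustrative choices, not the ones used in the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append([
            (counts[t] / total) * (math.log(n_docs / df[t]) + 1.0) if total else 0.0
            for t in vocab
        ])
    return vocab, vectors


# Toy, made-up messages (not drawn from the corpus).
emails = [
    "verify your account password now",
    "meeting agenda for monday",
    "تحقق من كلمة المرور الآن",
]
vocab, vectors = tfidf_vectors(emails)
```

Each email becomes one numeric vector over the shared vocabulary; a classifier such as a Multilayer Perceptron would then be trained on these vectors together with the legitimate/phishing labels.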

Citation

Salloum, S., Gaber, T., Vadera, S., & Shaalan, K. (in press). A New English/Arabic Parallel Corpus for Phishing Emails. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3606031

Journal Article Type: Article
Acceptance Date: Jun 12, 2023
Online Publication Date: Jun 28, 2023
Deposit Date: Jul 10, 2023
Publicly Available Date: Jul 10, 2023
Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Peer Reviewed: Peer Reviewed
DOI: https://doi.org/10.1145/3606031
Keywords: General Computer Science

Files

Accepted Version (562 Kb)
PDF

Copyright Statement
© Owner/Author | ACM 2023. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ACM Transactions on Asian and Low-Resource Language Information Processing, http://dx.doi.org/10.1145/3606031.



