
Offensive Language Detection in Arabic Social Media Using Machine Learning With TF-IDF Technique

EasyChair Preprint 15865

7 pages. Date: February 24, 2025

Abstract

With the rise of social media, effective communication has become more crucial than ever. However, Arabic text is complex and spans several dialects, making offensive content difficult to detect. To address this challenge, this work applies several machine learning models, namely support vector machines, random forests, decision trees, and logistic regression classifiers, to offensive language identification. We evaluated these classifiers through a comprehensive testing procedure on 4,505 tweets from the "ArCybC" dataset. Our results show that model performance improves significantly over successive runs, particularly in terms of recall and precision.
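As a rough illustration of the TF-IDF weighting named in the title, the sketch below computes a textbook variant (length-normalized term frequency times log(N/df)) over whitespace tokens. The abstract does not state the preprint's exact weighting scheme, tokenization, or preprocessing, so the function name `tfidf`, the formula variant, and the placeholder tokens are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper).

    Assumes whitespace tokenization and the textbook weighting
    tf = count / document length, idf = log(N / document frequency).
    """
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Placeholder tokens stand in for preprocessed tweets; they are not ArCybC data.
docs = ["w1 w2 w3", "w1 w4", "w1 w2 w5 w5"]
weights = tfidf(docs)
```

Under this variant, a term that occurs in every document (here `w1`) receives weight zero, while a term concentrated in a single document (here `w5`) is weighted most heavily. Vectors of this kind would then be passed to a classifier.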

Notably, the Random Forest (RF) and Decision Tree (DT) classifiers both showed improved recall with more runs, while the DT classifier achieved higher precision than the other classifiers. The results demonstrate that machine learning can detect offensive language in Arabic social media with high accuracy, offering developers and academics useful information for enhancing the security of online communication.
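For readers less familiar with the two metrics emphasized above, the following minimal sketch shows how precision and recall are computed for a binary offensive/not-offensive labeling. The helper name `precision_recall` and the toy labels are hypothetical and do not reproduce any result from the preprint.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels (1 = offensive, 0 = not offensive), invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
```

Precision measures how many tweets flagged as offensive really are offensive, while recall measures how many of the truly offensive tweets were flagged; a classifier can be strong on one and weak on the other.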

Keyphrases: ArCybC, machine learning, offensive language

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15865,
  author    = {Saleem Abu Lehyeh and Mahmoud Omari and Fatima Shannag and Ghaith Jaradat},
  title     = {Offensive Language Detection in Arabic Social Media Using Machine Learning With TF-IDF Technique},
  howpublished = {EasyChair Preprint 15865},
  year      = {EasyChair, 2025}}