AI system achieves 93% accuracy in detecting hate speech on social media

CO-EDP, VisionRI | Updated: 25-03-2025 14:31 IST | Created: 25-03-2025 14:31 IST

A research team at Qassim University has developed an advanced artificial intelligence framework capable of detecting and classifying Arabic-language hate speech on social media platform X, achieving a detection accuracy of 92% and a classification accuracy of 93%. The study, titled "Protecting Intellectual Security Through Hate Speech Detection Using an Artificial Intelligence Approach" and published in the journal Algorithms, marks a major leap forward in digital content moderation.

The novel system is designed around a two-layer structure. In the first phase, it detects whether a tweet qualifies as hate speech. If so, the second layer classifies it into one of three targeted categories: political, social, or religious hate speech. The classification system addresses the growing threat posed by online hate speech, especially in the Middle East and North Africa, where regional tensions, dialectal complexity, and lack of tailored moderation tools have left digital platforms vulnerable to misuse.
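
In outline, that two-layer flow fits in a few lines of Python. The sketch below is not the authors' released code: `detect_model`, `classify_model`, and `vectorize` are hypothetical stand-ins for the trained first-layer detector, the second-layer classifier, and the shared feature pipeline.

```python
# Hypothetical sketch of the two-layer decision flow described above.
HATE_CATEGORIES = ["political", "social", "religious"]

def moderate_tweet(text, detect_model, classify_model, vectorize):
    """Layer 1: flag hate speech; Layer 2: assign a target category."""
    features = vectorize(text)                     # shared preprocessing step
    if detect_model.predict([features])[0] == 0:   # layer 1: not hate speech
        return {"hate": False, "category": None}
    label = classify_model.predict([features])[0]  # layer 2: categorize
    return {"hate": True, "category": HATE_CATEGORIES[label]}
```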

Arabic remains one of the most linguistically rich and structurally complex languages, posing considerable challenges for natural language processing. The researchers designed their model to specifically address these challenges. They built a dataset of 7,923 Arabic tweets, manually annotated by native Arabic linguists with domain expertise. Through careful preprocessing - diacritic removal, stemming, lemmatization, and normalization - the team prepared the content for deep learning analysis while mitigating dialectal noise and syntactic ambiguity.
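
Two of the listed steps, diacritic removal and normalization, can be approximated with standard string handling, as in the sketch below. The exact rules are assumptions rather than the paper's pipeline, and stemming or lemmatization would need an Arabic-specific tool such as NLTK's ISRIStemmer or Farasa.

```python
import re

# Rough approximation of common Arabic text normalization; rules are
# illustrative, not the paper's exact preprocessing.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # tashkeel marks
TATWEEL = "\u0640"                                  # elongation character

def normalize_arabic(text: str) -> str:
    text = DIACRITICS.sub("", text)          # strip diacritics
    text = text.replace(TATWEEL, "")         # remove elongation
    text = re.sub("[إأآا]", "ا", text)       # unify alef variants
    text = re.sub("ى", "ي", text)            # unify alef maqsura / yaa
    text = re.sub("ة", "ه", text)            # unify taa marbuta / haa
    return re.sub(r"\s+", " ", text).strip()
```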

The research introduces a convolutional neural network (CNN)-based detection mechanism that significantly outperforms transformer-based architectures in handling the short, localized expressions of hate typical of tweets. When combined with both trainable and pre-trained word embeddings, the CNN model achieved high accuracy without incurring the computational burden typical of transformer-based models like AraBERT.
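
A text CNN of the kind described might look as follows in Keras. The layer sizes, kernel width, and dropout rate here are illustrative assumptions, not the paper's reported hyperparameters; the embedding layer can be trained from scratch or seeded with pre-trained vectors (e.g., via `embeddings_initializer`, with `trainable=False` to freeze them).

```python
import tensorflow as tf

# Illustrative text CNN in the spirit of the model described;
# all hyperparameters below are assumptions, not the paper's settings.
VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 50_000, 300, 2

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM),   # trainable by default
    tf.keras.layers.Conv1D(128, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),               # suits short texts
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```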

In the first layer of detection, the CNN model achieved a precision of 92.59%, recall of 93.36%, and an F1 score of 92.96%. In the second layer, which classifies hate speech into categories, the CNN again outperformed other models with an F1 score of 92.36%. These figures were validated through stratified five-fold cross-validation and reinforced by confidence intervals, with the mean accuracy for CNN estimated at 92% and a 95% confidence interval ranging from 91.5% to 92.5%.
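
That validation protocol is straightforward to reproduce in outline. The sketch below runs stratified five-fold cross-validation and derives a normal-approximation 95% confidence interval from the fold scores; the logistic-regression classifier and random data are placeholders for the paper's CNN and tweet features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 20))          # placeholder features
y = rng.integers(0, 2, 200)        # placeholder binary labels

# Stratified five-fold CV, as reported in the study.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

# Normal-approximation 95% CI over the fold accuracies.
mean = scores.mean()
half_width = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
print(f"accuracy: {mean:.3f} +/- {half_width:.3f} (95% CI)")
```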

Benchmarking experiments were conducted using two external datasets, L-HSAB and OSACT4, to evaluate model generalizability. CNN achieved an F1 score of 88.3% on L-HSAB and 85.0% on OSACT4. AraBERT, a transformer-based model, slightly outperformed CNN on these datasets in terms of contextual understanding but lagged behind in efficiency and precision for short-text detection.

The study identifies several key limitations that impact AI-driven hate speech moderation in Arabic. First, the system struggles to detect sarcasm and implicit hate speech, accounting for about 15% of misclassifications. These cases often include ironic or culturally coded language that is not flagged by standard algorithms. Dialectal variation - particularly in Gulf, Levantine, and North African Arabic - also poses a challenge. Despite preprocessing and normalization, around 12% of errors stem from regional linguistic nuances that are poorly captured by generalized embeddings.

Additionally, class imbalance affected model recall, particularly in underrepresented categories such as political hate speech. While religious hate speech was detected with high accuracy (precision at 91%, recall at 89%), political hate speech yielded lower performance (precision at 79%, recall at 74%). This disparity underscores the need for a more balanced dataset and advanced augmentation techniques.

The system's real-time processing capabilities offer a practical advantage. Processing 10,000 tweets takes only a few seconds, making it suitable for deployment in content moderation pipelines operating at scale. Its low computational complexity compared to transformer models positions it as a feasible option for platforms with limited processing capacity.
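
A throughput figure of that kind can be sanity-checked with a simple timing loop. The sketch below reuses the `model` from the earlier CNN example and feeds it dummy token sequences; the resulting numbers depend entirely on hardware and batch size and are not the paper's benchmark.

```python
import time
import numpy as np

# Assumes `model` from the CNN sketch above; 10,000 dummy token
# sequences stand in for preprocessed tweets.
batch = np.random.randint(0, 50_000, size=(10_000, 48))

start = time.perf_counter()
model.predict(batch, batch_size=512, verbose=0)
elapsed = time.perf_counter() - start
print(f"10,000 tweets in {elapsed:.2f}s ({10_000 / elapsed:.0f} tweets/s)")
```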

Despite its strong performance, the study acknowledges ethical risks. Overreliance on AI moderation can lead to false positives, silencing legitimate political or cultural expression. To mitigate these risks, the researchers advocate for human-in-the-loop oversight, transparency in algorithmic decision-making, and ongoing model refinement using culturally adaptive datasets.

To enhance future iterations, the researchers propose integrating multimodal data, including emojis, images, and videos, which frequently accompany hateful messages. They also recommend combining CNN architectures with transformer layers to capture both local and global semantic features. Oversampling, cost-sensitive learning, and dialect-specific embeddings are being explored as strategies to reduce classification bias and improve robustness.
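
Of the strategies named above, cost-sensitive learning is the simplest to illustrate: weight each class inversely to its frequency so that rare labels such as political hate speech contribute more to the training loss. The label counts below are made up for the example.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Made-up label counts mirroring the imbalance described:
# 0 = social, 1 = political (underrepresented), 2 = religious.
y_train = np.array([0] * 500 + [1] * 120 + [2] * 380)

weights = compute_class_weight("balanced",
                               classes=np.unique(y_train), y=y_train)
class_weight = dict(enumerate(weights))
print(class_weight)   # ~{0: 0.67, 1: 2.78, 2: 0.88}

# In Keras: model.fit(X, y, class_weight=class_weight)
```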

The full dataset and framework have been made publicly available on GitHub, inviting further research and collaboration in combating hate speech across Arabic-speaking digital communities.

The system provides an actionable model for social media companies, governments, and civil society actors looking to monitor and counter hate speech in one of the world’s most widely spoken and digitally engaged languages. Its dual-layer design, adaptability, and empirical validation position it as a next-generation solution for maintaining intellectual security and promoting safer online spaces.

Full Paper: https://www.mdpi.com/1999-4893/18/4/179

First published in: Devdiscourse