LLMs and persuasive tech join forces to curb cyberhate

CO-EDP, VisionRI | Updated: 03-05-2025 18:22 IST | Created: 03-05-2025 18:22 IST
Representative Image. Credit: ChatGPT

Hate speech and cyberhate continue to poison digital conversations, prompting developers and researchers to explore new ways to counter the growing toxicity. A new study published in Computers, titled “Combining the Strengths of LLMs and Persuasive Technology to Combat Cyberhate”, presents a timely, empirically grounded strategy: combine the detection power of large language models (LLMs) with the behavioral guidance of persuasive technology (PT) not just to flag harmful comments but also to suggest more constructive alternatives.

The study introduces a prototype feature called the Comment Analysis Feature, which integrates Google’s Gemini LLM with prompt-engineered interventions. This feature actively monitors user-generated comments on a news platform, determines whether a comment contains cyberhate, and, if so, suggests a less harmful rephrasing. Through a real-world case study involving 122 participants, the researchers assessed user reactions, suggestion acceptance, perceived censorship, and the potential for long-term behavioral change. The results reveal both promise and complexity in deploying AI-powered moderation tools that aim to transform, not just police, online speech.

How does the AI system work and what does it aim to change?

The Comment Analysis Feature operates through a dual-prompt system. First, it classifies a user’s comment using the question: “Does this sentence contain inappropriate language or hate speech?” If flagged, it then activates a second prompt asking the LLM to suggest more polite alternatives based on the original input. Participants were then given the freedom to accept or reject these suggestions. This intervention represents a self-monitoring persuasive strategy, where users are gently nudged toward more positive behavior without overt censorship.
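To make the flow concrete, the sketch below shows how such a dual-prompt pipeline might be wired in Python. Only the two prompt intents come from the study; the ask_llm helper, the yes/no answer format, and the return structure are illustrative assumptions, not the authors’ implementation.

```python
# Illustrative sketch of the dual-prompt flow described in the study.
# ask_llm() is a hypothetical helper that sends one prompt to the LLM and
# returns its text reply; the yes/no parsing is an assumption.

CLASSIFY_PROMPT = (
    'Does this sentence contain inappropriate language or hate speech? '
    'Answer "yes" or "no". Sentence: "{comment}"'
)
REWRITE_PROMPT = (
    'Suggest more polite alternatives for the following comment, '
    'keeping its original meaning: "{comment}"'
)

def analyze_comment(comment: str, ask_llm) -> dict:
    """Classify a comment and, if flagged, request polite rephrasings."""
    verdict = ask_llm(CLASSIFY_PROMPT.format(comment=comment)).strip().lower()
    if not verdict.startswith("yes"):
        # Not flagged: the comment is simply published unchanged.
        return {"flagged": False, "suggestions": []}
    # Flagged: offer alternatives, but leave the final choice to the user.
    suggestions = ask_llm(REWRITE_PROMPT.format(comment=comment))
    return {"flagged": True, "suggestions": suggestions.splitlines()}
```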

The researchers chose template-based prompting over few-shot or zero-shot learning to improve consistency in responses. Gemini was configured with a balanced level of creativity using a temperature of 0.7 and sampling thresholds to ensure responses remained coherent and relevant. The feature was embedded into a custom-built interactive news website, offering participants a safe space to engage in comment-driven discourse without platform algorithmic interference.
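A minimal configuration sketch is shown below, assuming the google-generativeai Python SDK. The model name and the top_p/top_k values are illustrative assumptions; the study reports only a temperature of 0.7 and unspecified sampling thresholds. A helper like this could also serve as the ask_llm callable in the earlier sketch.

```python
# Sketch of a Gemini setup with temperature 0.7 and sampling thresholds,
# assuming the google-generativeai Python SDK.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder credential

model = genai.GenerativeModel(
    "gemini-1.5-flash",  # assumed model name; the study only says "Gemini"
    generation_config=genai.GenerationConfig(
        temperature=0.7,  # balanced creativity, as reported in the study
        top_p=0.9,        # assumed nucleus-sampling threshold
        top_k=40,         # assumed top-k threshold
    ),
)

def ask_llm(prompt: str) -> str:
    """Send one templated prompt and return the model's text reply."""
    return model.generate_content(prompt).text
```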

Importantly, users retained agency. If a comment was not flagged as hateful, it was simply published. If flagged, suggestions were offered, but not imposed. This opt-in design aligns with persuasive technology principles that emphasize voluntary behavior change through prompts rather than punitive enforcement.

How did users respond to AI-generated suggestions?

Most users (81%) found the AI-generated suggestions clear and easy to understand. However, only 35% of suggestions were accepted on average across users. While this shows moderate success, it also signals resistance among a significant portion of participants. Among those who rejected suggestions, 51% felt the rewrites did not reflect their true opinions, 38% thought they strayed too far from their original intent, and 23% did not consider their original comment hateful.

About 44% of participants said the suggestions increased their willingness to express opinions, suggesting the tool may boost confidence among users who worry about crossing the line online. Yet, 11% reported the opposite, feeling silenced or overly constrained by the AI. This balance between promoting civility and preserving freedom of expression is a central theme in the study, and a known concern in moderation system design.

The study also explored how often users triggered the suggestion system. While most participants received only a few suggestions, about 10% triggered five or more, indicating either a higher rate of inflammatory language or differences in expression style. Importantly, acceptance ratios were significantly higher among users who reported finding the suggestions helpful, confirming that perceived usefulness is closely tied to compliance.

The system also shaped how users perceived their own tone. Nearly 41% said the suggestions improved their commenting tone, with 24% reporting significant improvement. Only a small minority (2.5%) believed their tone worsened. These results point to the potential of LLM interventions to serve as real-time nudges that increase self-awareness and constructive communication.

Can AI shape behavior without sacrificing free expression?

Perhaps the most critical question addressed by the study is whether LLM-based moderation tools can improve discourse without acting as covert censors. The findings were nuanced. A solid 68% of participants believed the intervention could reduce cyberhate long-term, and 47% said it did not infringe on their freedom of speech. However, 36% felt it bordered on censorship, and 16% were unsure. This tension underscores the complexity of designing moderation systems that balance platform health and user rights.

The system’s detection accuracy was also rigorously tested. Using a binary classification task with 600 labeled tweets, the Gemini-based model achieved a precision of 0.93, recall of 0.93, and F1 score of 0.93, alongside an accuracy of 94%. A manual post-hoc analysis of flagged comments showed a 92% true-positive rate, lending credibility to the model’s reliability. However, the authors acknowledged limitations, including missed nuances in user intent and a tendency to misclassify certain expressions as hate when they were not.
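For readers unfamiliar with these metrics, the sketch below shows how precision, recall, F1, and accuracy would be computed for such a binary classification task using scikit-learn. The labels are placeholders for illustration only, not the authors’ 600-tweet dataset.

```python
# Illustrative evaluation of a binary cyberhate classifier with standard
# metrics; y_true/y_pred are placeholders, not the study's labeled tweets.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # gold labels: 1 = cyberhate, 0 = benign
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]  # model verdicts from the classification prompt

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("accuracy: ", accuracy_score(y_true, y_pred))
```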

Demographic diversity among the 122 participants, who represented a range of ages, educational levels, and cultural backgrounds, strengthens the external validity of the findings. Nonetheless, the researchers caution that the study’s news platform environment doesn’t replicate full social media dynamics, such as recommendation algorithms and virality, which could influence comment behavior differently.

Users expressed varied levels of comfort and adaptability to the suggestions. Some saw them as valuable feedback tools for improving civility, while others questioned their interpretive fairness. Future directions suggested by the authors include adaptive intervention frequency, deeper contextual modeling, and longitudinal studies to test sustained impact over time.

This dual approach, detection plus behavioral nudge, distinguishes the Comment Analysis Feature from traditional moderation tools. Rather than punishing users or removing content, it empowers individuals to take ownership of their language and reconsider how they communicate in digital spaces. It’s a step toward moderation by design rather than by force.

FIRST PUBLISHED IN: Devdiscourse