Federated learning under siege: The silent war between poisoning attacks and security defenses

CO-EDP, VisionRI | Updated: 03-02-2025 16:27 IST | Created: 03-02-2025 16:27 IST

Federated Learning (FL) has revolutionized machine learning by enabling multiple clients to collaboratively train a global model without exposing their raw data. However, its decentralized nature makes it vulnerable to poisoning attacks, where malicious clients inject harmful updates to manipulate the global model. To counteract these threats, researchers have introduced Federated Unlearning (FU), a method designed to efficiently remove the influence of compromised clients from the global model without requiring a full retraining process.
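As a refresher, the sketch below illustrates the basic FL round the article refers to: clients train locally and send only model updates, and the server averages them in the style of FedAvg. The function names and dimensions are hypothetical and chosen for brevity.

```python
# Minimal sketch of one FedAvg-style round (illustrative only; names such as
# `client_update` and `aggregate` are hypothetical, not from the paper).
import numpy as np

def client_update(global_weights: np.ndarray, local_gradient: np.ndarray,
                  lr: float = 0.1) -> np.ndarray:
    """Each client trains locally and returns only a model update, never raw data."""
    return global_weights - lr * local_gradient

def aggregate(updates: list[np.ndarray]) -> np.ndarray:
    """The server averages the client updates (FedAvg) to form the new global model."""
    return np.mean(np.stack(updates), axis=0)

# One communication round with three clients.
global_weights = np.zeros(4)
local_grads = [np.random.randn(4) for _ in range(3)]  # stand-in for local training
updates = [client_update(global_weights, g) for g in local_grads]
global_weights = aggregate(updates)
```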

A recent study titled “Poisoning Attacks and Defenses to Federated Unlearning” by Wenbin Wang, Qiwen Ma, Zifan Zhang, Yuchen Liu, Zhuqing Liu, and Minghong Fang, published in the Companion Proceedings of the ACM Web Conference 2025 (WWW Companion '25), sheds light on a critical security loophole in FU. The researchers present BadUnlearn, the first poisoning attack that specifically targets federated unlearning, and introduce UnlearnGuard, a robust defense mechanism to counter such threats.

The study and its key findings

BadUnlearn: A Novel Poisoning Attack on Federated Unlearning

The research introduces BadUnlearn, a sophisticated attack that exploits vulnerabilities in federated unlearning. Unlike traditional poisoning attacks that occur during FL training, BadUnlearn targets the FU phase, strategically injecting malicious model updates so that the unlearned model remains close to the poisoned one. This effectively neutralizes the benefits of unlearning and allows adversarial influence to persist. BadUnlearn is designed to work under various knowledge settings.

In a full-knowledge scenario, the attacker is aware of all client updates and the server’s aggregation rule, while in a partial-knowledge setting, only the updates of malicious clients are known. Even in a black-box environment, where the attacker does not know the server’s aggregation rule, the attack remains highly effective. The attack is formulated as an optimization problem, enabling adversarial clients to manipulate the FU process systematically. By injecting malicious updates crafted to survive the server’s aggregation step, BadUnlearn ensures that FU fails to effectively unlearn the corrupted model.
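For intuition, the following is a heavily simplified sketch of the attack’s core idea under a full-knowledge, plain-averaging assumption; the paper’s actual optimization covers other aggregation rules and knowledge settings, and all names here are hypothetical. The malicious clients choose updates so that the server’s aggregate during unlearning lands on the previously poisoned model.

```python
# Simplified illustration of the attack idea (not the paper's exact formulation):
# malicious clients pick updates so that the server's average during unlearning
# equals a previously poisoned model, neutralizing the unlearning step.
import numpy as np

def craft_malicious_updates(benign_updates, poisoned_model, n_malicious):
    """Choose malicious updates so that mean(benign + malicious) == poisoned_model
    (assumes a plain-averaging server and full knowledge of benign updates)."""
    benign_sum = np.sum(benign_updates, axis=0)
    n_total = len(benign_updates) + n_malicious
    target_sum = poisoned_model * n_total - benign_sum
    # Split the required mass evenly across the malicious clients.
    return [target_sum / n_malicious] * n_malicious

poisoned_model = np.ones(4)                              # model the attacker wants to keep
benign_updates = [np.random.randn(4) for _ in range(8)]  # honest clients' unlearning updates
malicious = craft_malicious_updates(benign_updates, poisoned_model, n_malicious=2)
aggregate = np.mean(benign_updates + malicious, axis=0)
print(np.allclose(aggregate, poisoned_model))            # True: unlearning is neutralized
```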

UnlearnGuard: A Defense Mechanism Against BadUnlearn

To counteract BadUnlearn, the study proposes UnlearnGuard, a federated unlearning framework designed to maintain the integrity of the global model. Unlike existing FU methods that focus on efficiency, UnlearnGuard prioritizes security by leveraging historical model updates stored by the server. These past updates allow the server to estimate what each client’s update during FU should look like, so that malicious modifications can be identified and filtered out. The framework comes in two variants: UnlearnGuard-Dist and UnlearnGuard-Dir.

UnlearnGuard-Dist measures the distance between a client’s new update and its historical updates, rejecting updates that deviate significantly. Distance-based filtering alone can be insufficient, however, because attackers can manipulate updates within a small magnitude. UnlearnGuard-Dir therefore adds directional verification, checking that updates align with the original direction of learning. Together, the two checks let UnlearnGuard detect and filter out poisoned updates, preserving the integrity of the unlearning process. Theoretically, the study proves that models unlearned through UnlearnGuard closely resemble those trained from scratch, validating its robustness.
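To make the two checks concrete, here is a hedged sketch of how distance and direction filtering could look in code. It reflects a simplified reading of the description above rather than the authors’ exact algorithm, and the thresholds `tau_dist` and `tau_cos` are hypothetical.

```python
# Hedged sketch of the two filtering ideas (simplified, not the paper's algorithm):
# compare each client's unlearning update with the historical update the server
# stored for that client, first by distance, then by direction.
import numpy as np

def passes_dist_check(new_update, historical_update, tau_dist=1.0):
    """UnlearnGuard-Dist idea: reject updates that drift too far from history."""
    return np.linalg.norm(new_update - historical_update) <= tau_dist

def passes_dir_check(new_update, historical_update, tau_cos=0.5):
    """UnlearnGuard-Dir idea: additionally require the update to point in roughly
    the same direction as the historical update (cosine similarity)."""
    denom = np.linalg.norm(new_update) * np.linalg.norm(historical_update)
    if denom == 0:
        return False
    return float(np.dot(new_update, historical_update) / denom) >= tau_cos

def filter_updates(new_updates, historical_updates):
    """Keep only updates that satisfy both checks before aggregation."""
    return [u for u, h in zip(new_updates, historical_updates)
            if passes_dist_check(u, h) and passes_dir_check(u, h)]

# Usage: filter, then aggregate only the surviving updates.
history = [np.ones(4) for _ in range(5)]
incoming = [np.ones(4) + 0.1 * np.random.randn(4) for _ in range(4)] + [10 * np.ones(4)]
clean = filter_updates(incoming, history)   # the outlier update is dropped
```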

Experimental analysis and key results

The researchers conducted extensive experiments on the MNIST dataset, testing different federated learning and unlearning methods under various attack conditions. The findings reveal that BadUnlearn significantly compromises existing FU methods. Standard aggregation techniques like FedAvg, Median, and Trimmed-Mean were particularly vulnerable, as they failed to remove the influence of malicious clients. Furthermore, FedRecover, a commonly used unlearning method, proved ineffective against BadUnlearn, as it failed to restore model integrity. In contrast, UnlearnGuard demonstrated strong resilience against poisoning attacks, maintaining model performance even when subjected to sophisticated adversarial strategies. Both UnlearnGuard-Dist and UnlearnGuard-Dir outperformed traditional FU techniques, successfully mitigating BadUnlearn’s effects.
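For readers unfamiliar with the baseline aggregators named above, the sketch below shows the standard coordinate-wise Median and Trimmed-Mean rules in their textbook form; it is independent of the paper’s experimental setup and included only for context.

```python
# Standard robust aggregation baselines mentioned above (textbook form, for context).
import numpy as np

def median_aggregate(updates):
    """Coordinate-wise median of the client updates."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean_aggregate(updates, trim_k=1):
    """Coordinate-wise trimmed mean: drop the trim_k largest and trim_k smallest
    values in each coordinate, then average the remainder."""
    stacked = np.sort(np.stack(updates), axis=0)
    return np.mean(stacked[trim_k:len(updates) - trim_k], axis=0)
```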

The study also examined advanced aggregation rules and attack models. It was found that even when using complex aggregation methods like Bulyan, Krum, and FLAME, BadUnlearn could still compromise the FU process. However, UnlearnGuard remained effective across these scenarios, highlighting its robustness. Additional tests under partial-knowledge and black-box attacks confirmed that even when attackers had limited information, BadUnlearn could still disrupt FU, while UnlearnGuard remained a reliable defense.

Moreover, adaptive attacks, where the attacker continuously adjusts poisoning strategies based on filtering mechanisms, were evaluated. Despite these challenges, UnlearnGuard successfully countered these attacks, proving its adaptability. The researchers also assessed storage overhead, finding that while UnlearnGuard requires additional storage for historical updates, its computational efficiency far surpasses full retraining, making it a practical security solution.

Implications and future directions

This study provides crucial insights into the security vulnerabilities of federated unlearning and emphasizes the importance of robust countermeasures. The introduction of BadUnlearn highlights a previously unaddressed security risk, demonstrating that FU alone is not a guaranteed solution to removing poisoned influences. The research also underscores that new attack vectors require advanced defense mechanisms, as traditional unlearning methods are insufficient against adversarial manipulation. With the development of UnlearnGuard, the study sets a new benchmark for secure federated unlearning, showing that it is possible to effectively remove adversarial influence without the need for complete retraining.

Looking ahead, future research should focus on expanding UnlearnGuard to handle larger and more complex datasets while maintaining efficiency. There is also potential for integrating privacy-preserving techniques to enhance security further without increasing computational burdens. Additionally, real-time detection mechanisms could improve the ability to identify and eliminate malicious activities before they affect the model. As federated learning continues to be deployed in critical applications such as healthcare, finance, and autonomous systems, ensuring the security of federated unlearning will be a top priority.

  • FIRST PUBLISHED IN: Devdiscourse