Neural networks are widely adopted across fields for their ability to model complex patterns and relationships. However, they remain critically vulnerable to adversarial attacks: small, malicious input changes that cause unpredictable outputs. This vulnerability poses significant challenges to the reliability and security of machine learning models across applications. While several defense methods, such as adversarial training and purification, have been developed, they often fail to provide robust protection against sophisticated attacks. The rise of diffusion models has led to diffusion-based adversarial purification methods that enhance robustness, but these methods still face challenges, including high computational cost and the risk of new attack strategies that weaken model defenses.
Existing approaches to adversarial attacks build on Denoising Diffusion Probabilistic Models (DDPMs), a class of generative models that add noise to input signals during training and then learn to denoise the resulting noisy signal. Diffusion models have been used as adversarial purifiers in two main families: Markov-based (DDPM-based) purification and score-based purification. Related methods include guided purification, which introduces a guidance term to preserve sample semantics, and DensePure, which uses multiple reverse-diffusion samples and majority voting for the final prediction. Lastly, Tucker decomposition, a method for analyzing high-dimensional data arrays, has shown potential in feature extraction, suggesting a path for enhancing adversarial purification techniques.
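For intuition, the core diffusion-purification loop can be sketched in a few lines of PyTorch: the input is pushed forward to a noisy state and then denoised back with the learned reverse process, so that small adversarial perturbations are absorbed by the injected Gaussian noise. The `denoiser` network, noise schedule, and stopping time below are illustrative placeholders, not the exact setup from the paper.

```python
import torch

def ddpm_purify(x, denoiser, betas, t_star):
    """Minimal sketch of one DDPM-style diffusion-denoise purification pass.

    x:        input image batch, shape (B, C, H, W)
    denoiser: trained noise-prediction network eps_theta(x_t, t)
    betas:    DDPM noise schedule, shape (T,)
    t_star:   forward-diffusion stopping time (hyperparameter)
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Forward: jump directly to x_{t*} via the closed-form marginal
    # q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I).
    abar = alpha_bars[t_star]
    x_t = abar.sqrt() * x + (1 - abar).sqrt() * torch.randn_like(x)

    # Reverse: ancestral sampling from t* back down to 0.
    for t in range(t_star, 0, -1):
        t_batch = torch.full((x.shape[0],), t, device=x.device)
        eps = denoiser(x_t, t_batch)
        coef = betas[t] / (1 - alpha_bars[t]).sqrt()
        mean = (x_t - coef * eps) / alphas[t].sqrt()
        noise = torch.randn_like(x) if t > 1 else torch.zeros_like(x)
        x_t = mean + betas[t].sqrt() * noise
    return x_t
```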
Researchers from the Theoretical Division and Computational Sciences at Los Alamos National Laboratory, Los Alamos, NM, have proposed LoRID, a novel Low-Rank Iterative Diffusion purification method designed to remove adversarial perturbations with low intrinsic purification errors. LoRID overcomes the limitations of current diffusion-based purification methods by providing a theoretical understanding of the purification errors associated with Markov-based diffusion methods. Moreover, it uses a multistage purification process that integrates multiple rounds of diffusion-denoising loops at the early time steps of diffusion models with Tucker decomposition (see the sketch below). This integration removes adversarial noise in high-noise regimes and strengthens robustness against strong adversarial attacks.
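A rough sketch of this multistage idea, assuming the `ddpm_purify` helper from above and TensorLy’s Tucker decomposition, might look as follows; the loop count, early timestep, and Tucker ranks are hypothetical hyperparameters chosen purely for illustration, not the paper’s settings.

```python
import tensorly as tl
from tensorly.decomposition import tucker

tl.set_backend('pytorch')

def lorid_purify(x, denoiser, betas, t_early=100, n_loops=4, ranks=None):
    # Stage 1: several shallow diffusion-denoise loops at an early
    # time step, instead of one long diffusion pass.
    for _ in range(n_loops):
        x = ddpm_purify(x, denoiser, betas, t_early)

    # Stage 2: low-rank Tucker projection of the purified batch.
    if ranks is None:
        # Illustrative choice: keep batch and channel dimensions intact,
        # compress the spatial dimensions to a low-rank core.
        ranks = [x.shape[0], x.shape[1], x.shape[2] // 2, x.shape[3] // 2]
    core, factors = tucker(x, rank=ranks)
    return tl.tucker_to_tensor((core, factors))
```

Here the low-rank reconstruction acts as an extra filter: perturbation components that survive the diffusion loops are discarded when the tensor is rebuilt from its compressed Tucker core.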
LoRID is evaluated on multiple datasets, including CIFAR-10/100, CelebA-HQ, and ImageNet, and compared against state-of-the-art (SOTA) defense methods, using WideResNet classifiers and reporting both standard and robust accuracy. Its performance is tested under two threat models: black-box attacks, in which the attacker knows only the classifier, and white-box attacks, in which the attacker has full knowledge of both the classifier and the purification scheme. The proposed method is evaluated against AutoAttack for CIFAR-10/100 and BPDA+EOT for CelebA-HQ in black-box settings, and against AutoAttack and PGD+EOT in white-box settings.
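As a hedged illustration of this evaluation protocol, robust accuracy in the black-box setting could be measured with a harness like the one below, using the torchattacks implementation of AutoAttack; the classifier, purifier, data loader, and ε value are placeholders rather than the authors’ code.

```python
import torch
import torchattacks

def robust_accuracy(classifier, purify, loader, eps=8/255, device='cuda'):
    """Fraction of adversarial examples still classified correctly."""
    # Black-box threat model: the attack sees only the classifier,
    # not the purification step applied before it.
    attack = torchattacks.AutoAttack(classifier, norm='Linf', eps=eps)
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(x, y)                    # craft adversarial examples
        with torch.no_grad():
            preds = classifier(purify(x_adv)).argmax(dim=1)  # purify, then classify
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```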
The results demonstrate LoRID’s superior performance across multiple datasets and attack scenarios. It significantly improves both standard and robust accuracy against AutoAttack in black-box and white-box settings on CIFAR-10; for example, it improves black-box robust accuracy by 23.15% on WideResNet-28-10 and 4.27% on WideResNet-70-16. On CelebA-HQ, LoRID outperforms the best baseline by 7.17% in robust accuracy while maintaining high standard accuracy against BPDA+EOT attacks. At high noise levels (ϵ = 32/255), its robustness exceeds SOTA performance at standard noise levels (ϵ = 8/255) by 12.8%, showing its strength in handling severe adversarial perturbations.
In conclusion, the researchers have introduced LoRID, an innovative defense against adversarial attacks that applies multiple diffusion-denoising loops in the early stages of diffusion models to purify adversarial examples, further enhanced by Tucker decomposition, which is especially effective in high-noise regimes. LoRID’s effectiveness has been validated through theoretical analysis and detailed experimental evaluations across diverse datasets, including CIFAR-10/100, ImageNet, and CelebA-HQ. These results establish LoRID as a promising advance in adversarial defense, offering stronger protection for neural networks against a wide range of sophisticated attack strategies.
Check out the Paper. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a tech enthusiast, he delves into the practical applications of AI, with a focus on understanding AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.