Deep learning advances face spoof detection for biometric authentication

Face spoofing remains one of the most pressing threats in biometric authentication, with implications spanning mobile device access, financial services, and surveillance systems. As digital forgeries grow more sophisticated, traditional defenses have proved inadequate, prompting the rise of deep learning-based detection systems that can identify subtle visual anomalies.


CO-EDP, VisionRI | Updated: 27-03-2025 18:24 IST | Created: 27-03-2025 18:24 IST

A new comparative study in image-based biometric security, titled "Face Spoofing Detection using Deep Learning", has revealed that MobileNetV2, a lightweight convolutional neural network, outperforms more complex architectures such as Vision Transformer (ViT) and ResNet50 at detecting face spoofing attempts. The findings suggest that MobileNetV2 offers a more balanced and robust solution for real-world deployment, particularly in environments with limited computational resources.

Conducted by researchers from Karakoram International University and Pukyong National University, the study tested the three vision-based models on a spoof detection dataset comprising over 150,000 images. The research aimed to evaluate each model’s ability to distinguish between genuine facial images and presentation attacks such as printed photos, 3D masks, or video replays.

Researchers benchmarked MobileNetV2, ViT (ViT-L/16), and ResNet50 using accuracy, precision, recall, and F1 score metrics across both test and validation datasets. MobileNetV2 demonstrated the most consistent performance, with a test accuracy of 91.59%, outpacing ViT at 86.54% and ResNet50 at 42.96%. On the validation dataset, MobileNetV2 scored 97.17%, slightly ahead of ViT’s 96.36%.
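
As a point of reference for these numbers, the four reported metrics can be computed directly from model predictions. The snippet below is a minimal Python sketch using scikit-learn on toy labels (treating 1 = genuine face and 0 = spoofed is an assumed convention), not the authors’ code.

    # Minimal sketch: computing the four reported metrics for a binary
    # spoof-detection task, assuming label 1 = genuine face, 0 = spoofed.
    from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # ground-truth labels (toy data)
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # model predictions (toy data)

    print("accuracy :", accuracy_score(y_true, y_pred))
    print("precision:", precision_score(y_true, y_pred))
    print("recall   :", recall_score(y_true, y_pred))
    print("f1 score :", f1_score(y_true, y_pred))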

“While transformer-based models like ViT show strong training accuracy, they often underperform on unseen data due to overfitting,” said co-author Maaz Salman. “MobileNetV2 achieved better generalization, which is vital for real-world anti-spoofing systems.”

The analysis included training, testing, and validation phases. Each model was exposed to a balanced distribution of real and fake samples, with 140,002 images used for training and 10,984 for testing (150,986 images in total), alongside a validation set of 39,574 images. This allowed for a rigorous evaluation of model behavior across different operating scenarios.

MobileNetV2’s strength lies in its architectural efficiency. Designed for mobile and embedded applications, the model uses depthwise separable convolutions to detect localized inconsistencies, such as irregular edges or unnatural shading, that often signify spoofing. Its ability to operate with low latency and limited resources makes it ideal for deployment on edge devices.
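
The study does not publish its training code, but adapting a pretrained MobileNetV2 to a two-class real-versus-spoof task typically requires only replacing the classification head. The sketch below uses torchvision and assumes a 224x224 RGB input; it is an illustration of the architecture, not the authors’ setup.

    # Minimal sketch: adapting a pretrained MobileNetV2 to a two-class
    # real-vs-spoof head (the paper's exact training setup is not specified here).
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.mobilenet_v2(weights="IMAGENET1K_V1")
    model.classifier[1] = nn.Linear(model.last_channel, 2)  # replace the 1000-class head

    dummy = torch.randn(1, 3, 224, 224)   # one 224x224 RGB image
    logits = model(dummy)                  # shape (1, 2): real vs. spoof scores
    print(logits.shape)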

By contrast, ViT, which processes images as sequences of patches using self-attention mechanisms, excels at capturing global context. This makes it suitable for complex image classification tasks but less efficient in real-time systems due to higher computational demands. ViT’s performance on the test set reflected this trade-off: while it achieved near-perfect accuracy on the training set (99.85%), it fell short on test generalization, with a relatively high false positive rate of 24.11%.
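
To illustrate the patch-sequence idea, the sketch below instantiates torchvision’s ViT-L/16 as a stand-in for the model evaluated in the study; the two-class head and 224x224 input size are assumptions made for illustration.

    # Minimal sketch: how ViT turns an image into a sequence of patches
    # before self-attention, using torchvision's ViT-L/16 as a stand-in.
    import torch
    from torchvision import models

    vit = models.vit_l_16(weights=None, num_classes=2)  # 2 classes: real vs. spoof
    x = torch.randn(1, 3, 224, 224)                      # 224x224 input image
    # Internally the image is split into 16x16 patches: (224/16)^2 = 196 tokens,
    # each embedded and processed by multi-head self-attention layers.
    logits = vit(x)
    print(logits.shape)   # (1, 2)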

The confusion matrix for MobileNetV2 revealed a true negative rate of 94.36% on the test set, meaning it correctly flagged most spoofed images. ViT, while better at identifying genuine images (true positive rate: 97.20%), showed vulnerability in detecting fakes, with a true negative rate of only 75.89%.
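
The true positive and true negative rates quoted above fall straight out of a confusion matrix. The following sketch shows the calculation on toy predictions, again assuming 1 = genuine and 0 = spoofed.

    # Minimal sketch: deriving true positive / true negative rates from a
    # confusion matrix, with 1 = genuine face and 0 = spoofed (assumed convention).
    from sklearn.metrics import confusion_matrix

    y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # toy ground truth
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]   # toy predictions

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)   # share of genuine faces correctly accepted
    tnr = tn / (tn + fp)   # share of spoofed images correctly rejected
    print(f"TPR={tpr:.2%}  TNR={tnr:.2%}")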

ResNet50, a traditional convolutional model with deeper architecture, significantly lagged behind in all metrics. It posted a test accuracy of just 42.96% and a validation accuracy of 43.42%, with a corresponding F1 score below 41%, indicating poor generalization and overfitting despite achieving 95.47% training accuracy.

Further comparisons between the test and validation datasets revealed notable discrepancies, particularly for ViT. While both MobileNetV2 and ViT scored over 96% on the validation set, the sharp performance drop on the test set exposed challenges in adapting to data with varying lighting, resolution, or presentation styles.

Researchers also emphasized the interpretability of model decisions. MobileNetV2 supports visualization tools like Grad-CAM that highlight the areas of an image influencing classification, aiding in transparency and system auditing. ViT, on the other hand, remains a black box due to the complexity of its attention mechanisms.
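
As an illustration of this kind of auditing, the sketch below produces a Grad-CAM heatmap for MobileNetV2 using the third-party pytorch-grad-cam package (installed as grad-cam); the choice of target layer is an assumption rather than a detail taken from the paper.

    # Minimal sketch: a Grad-CAM heatmap over MobileNetV2's last convolutional
    # block, using the third-party pytorch-grad-cam package (pip install grad-cam).
    # The target layer below is an assumption, not taken from the paper.
    import torch
    from torchvision import models
    from pytorch_grad_cam import GradCAM
    from pytorch_grad_cam.utils.model_targets import ClassifierOutputTarget

    model = models.mobilenet_v2(weights="IMAGENET1K_V1").eval()
    target_layers = [model.features[-1]]          # last conv block of MobileNetV2

    cam = GradCAM(model=model, target_layers=target_layers)
    image = torch.randn(1, 3, 224, 224)           # stand-in for a face image
    heatmap = cam(input_tensor=image, targets=[ClassifierOutputTarget(0)])
    print(heatmap.shape)                          # (1, 224, 224) saliency map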

Despite its strong results, MobileNetV2 did show a higher false negative rate (11.18%) on the test set, meaning it occasionally misclassified genuine images as fake. However, the researchers argue that this is less damaging in security-sensitive applications than the alternative of accepting spoofed images as real.

The study recommends MobileNetV2 for practical deployment in face spoof detection systems, especially in mobile phones, ATM authentication, and surveillance networks where real-time inference and energy efficiency are paramount. ViT may still hold promise if augmented with additional regularization or ensemble learning to reduce its false positive rate.

Future work should focus on exploring hybrid models combining ViT’s global pattern recognition with MobileNetV2’s efficient local detection. Data augmentation, advanced loss functions, and diversity in training data are also proposed to improve model resilience.
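
By way of example, a standard augmentation pipeline targeting the lighting and resolution shifts mentioned above might look like the torchvision sketch below; the specific transforms and parameters are illustrative assumptions, not the authors’ recipe.

    # Minimal sketch: an augmentation pipeline simulating lighting and
    # resolution variation (illustrative choices, not the paper's settings).
    from torchvision import transforms

    train_transforms = transforms.Compose([
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
        transforms.ColorJitter(brightness=0.3, contrast=0.3),   # lighting variation
        transforms.RandomHorizontalFlip(),
        transforms.GaussianBlur(kernel_size=3),                  # resolution/blur variation
        transforms.ToTensor(),
    ])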

  • FIRST PUBLISHED IN: Devdiscourse