A Lightweight Multimodal Deep Fusion Network for Face Antis Poofing with Cross-Axial Attention and Deep Reinforcement Learning Technique

2026-02-04

Diyar Wirya Omar Ameenulhakeem, Osman Nuri Uçan,
A Lightweight Multimodal Deep Fusion Network for Face Antis Poofing with Cross-Axial Attention and Deep Reinforcement Learning Technique,
Computers, Materials and Continua,
Volume 85, Issue 3,
2025,
Pages 5671-5702,
ISSN 1546-2218,
https://doi.org/10.32604/cmc.2025.070422.
(https://www.sciencedirect.com/science/article/pii/S1546221825009762)
Abstract: Face antispoofing has received a lot of attention because it plays a role in strengthening the security of face recognition systems. Face recognition is commonly used for authentication in surveillance applications. However, attackers try to compromise these systems by using spoofing techniques such as using photos or videos of users to gain access to services or information. Many existing methods for face spoofing face difficulties when dealing with new scenarios, especially when there are variations in background, lighting, and other environmental factors. Recent advancements in deep learning with multi-modality methods have shown their effectiveness in face antispoofing, surpassing single-modal methods. However, these approaches often generate several features that can lead to issues with data dimensionality. In this study, we introduce a multimodal deep fusion network for face anti-spoofing that incorporates cross-axial attention and deep reinforcement learning techniques. This network operates at three patch levels and analyzes images from modalities (RGB, IR, and depth). Initially, our design includes an axial attention network (XANet) model that extracts deeply hidden features from multimodal images. Further, we use a bidirectional fusion technique that pays attention to both directions to combine features from each mode effectively. We further improve feature optimization by using the Enhanced Pity Beetle Optimization (EPBO) algorithm, which selects the features to address data dimensionality problems. Moreover, our proposed model employs a hybrid federated reinforcement learning (FDDRL) approach to detect and classify face anti-spoofing, achieving a more optimal tradeoff between detection rates and false positive rates. We evaluated the proposed approach on publicly available datasets, including CASIA-SURF and GREATFASD-S, and realized 98.985% and 97.956% classification accuracy, respectively. In addition, the current method outperforms other state-of-the-art methods in terms of precision, recall, and F-measures. Overall, the developed methodology boosts the effectiveness of our model in detecting various types of spoofing attempts.
Keywords: Face antispoofing; lightweight; multimodal; deep feature fusion; feature extraction; feature optimization