Application of YOLOv10 Integrated with Attention Mechanism in the Senseless Monitoring of Students' Classroom Psychological State

Shaochong Yao1*
1 School of Information Engineering, Xi'an Mingde Institute of Technology, Xi'an, Shaanxi, China
* Corresponding author: Shaochong Yao, Email: yaoshaochong@163.com
International Scientific Technical and Economic Research 2026, Vol. 4, No. 2, pp. 186-205
DOI: 10.71451/ISTAER2621
Received: 7 February 2026; Revised: 15 March 2026; Accepted: 24 April 2026; Published: 8 May 2026
Abstract

This paper proposes a YOLOv10 model integrated with an attention mechanism for the senseless monitoring of students' psychological states in class, aiming at high-precision, real-time, and non-invasive psychological state recognition. The method introduces multi-layer attention modules along both the channel and spatial dimensions to strengthen the representation of key features, and combines lightweight feature enhancement with an end-to-end psychological state classification network to jointly optimize detection and state recognition. The model is validated on a large-scale real classroom dataset (561,200 images spanning multiple disciplines, lighting conditions, and occlusion levels). It achieves an mAP@0.5 of 0.873, a psychological state classification accuracy of 0.835, and an F1-score of 0.812 while sustaining real-time performance at 69 FPS. Ablation experiments show that the attention module and the feature enhancement module improve mAP by 4.4% and 5.3%, respectively, demonstrating the model's robustness in complex scenes. Deployment experiments in 50 real classrooms further verify the system's stability and long-term monitoring capability. The results show that the proposed method delivers high-precision, real-time, and deployable monitoring of students' psychological states in intelligent education scenarios, providing quantifiable data support for classroom management and teaching optimization.

Keywords
YOLOv10; Attention mechanism; Classroom psychological state; Senseless monitoring; Real-time target detection
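The channel- and spatial-dimension attention described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's implementation: the shared two-layer MLP for channel attention follows the common squeeze-and-excitation pattern, and the spatial gate here averages the channel-wise mean and max maps in place of the small convolution a full module would use. All names (`channel_attention`, `spatial_attention`, the reduction ratio `r`) are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Gate each channel of x (shape (C, H, W)) using pooled descriptors.

    A shared MLP (w1: reduction, w2: expansion) is applied to both the
    average- and max-pooled channel descriptors; their sum is squashed
    to per-channel weights in (0, 1).
    """
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                  + w2 @ np.maximum(w1 @ mx, 0.0))  # (C,)
    return x * att[:, None, None]

def spatial_attention(x):
    """Gate each spatial location of x (shape (C, H, W)).

    Channel-wise mean and max maps are merged into one saliency map;
    a real module would pass their concatenation through a small conv.
    """
    avg = x.mean(axis=0)                           # (H, W)
    mx = x.max(axis=0)                             # (H, W)
    att = sigmoid(0.5 * (avg + mx))                # (H, W), stand-in for a conv
    return x * att[None, :, :]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
w1 = rng.standard_normal((C // r, C)) * 0.1        # reduction layer weights
w2 = rng.standard_normal((C, C // r)) * 0.1        # expansion layer weights
feat = rng.standard_normal((C, H, W))

out = spatial_attention(channel_attention(feat, w1, w2))
print(out.shape)  # (8, 4, 4) — attention preserves the feature-map shape
```

Applying the channel gate before the spatial gate, as above, mirrors the sequential ordering commonly used for this family of attention modules; the output keeps the input shape, so the module can be dropped between backbone stages without altering the surrounding architecture.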