MAPFusion: Enhancing RoBERTa for News Classification with Multi-Head Attention Pooling and Feature Fusion
DOI:
https://doi.org/10.71451/ISTAER2555
Keywords:
MAPFusion, News text classification, Multi-head attention pooling, RoBERTa, Attention mechanism
Abstract
This paper proposes MAPFusion, a novel approach that enhances RoBERTa for news text classification through multi-head attention pooling. Traditional methods often rely on a static [CLS] token representation or simple mean pooling of token embeddings, which can overlook subtle contextual information spread across token positions. The proposed method addresses this limitation with multi-head attention pooling (MAP), which captures context-aware representations by aggregating token-level embeddings with learned attention weights across multiple representation subspaces. The outputs of the attention heads are then integrated through a feature fusion layer to form a comprehensive sentence representation. This component attaches to the output layer of RoBERTa, preserving its pre-trained weights and adding only a small number of parameters during fine-tuning. Experiments on benchmark datasets demonstrate that MAPFusion consistently outperforms baseline models, achieving significant improvements in classification accuracy. The framework is computationally efficient, broadly applicable to text classification tasks, and provides a principled way to exploit RoBERTa's latent representations more effectively. Our work highlights the importance of adaptive feature aggregation in Transformer-based models and offers insights for future representation-learning research. The code for this paper is available from the corresponding authors upon request.
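To make the pooling-and-fusion pipeline concrete, the sketch below shows one plausible PyTorch realization of the architecture the abstract describes: per-head attention weights over RoBERTa's token embeddings, a weighted sum per head, and a fusion layer that projects the concatenated head outputs into a single sentence vector. All layer names, sizes, and the tanh-based scoring function are illustrative assumptions (in the spirit of structured self-attention, Lin et al., 2017), not the authors' released implementation.

# Hypothetical sketch of multi-head attention pooling (MAP) with feature
# fusion on top of RoBERTa, reconstructed from the abstract. Layer names,
# sizes, and the scoring function are assumptions, not the paper's code.
import torch
import torch.nn as nn
from transformers import RobertaModel

class MAPFusionHead(nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, num_classes: int):
        super().__init__()
        # Shared nonlinear projection of token embeddings, followed by one
        # learned scoring vector per attention head.
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.scorers = nn.Linear(hidden_size, num_heads)
        # Feature fusion: concatenate the per-head pooled vectors and
        # project them into a single sentence representation.
        self.fusion = nn.Linear(num_heads * hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.scorers(torch.tanh(self.proj(hidden_states)))   # (B, L, heads)
        # Mask padding tokens so they receive zero attention weight.
        scores = scores.masked_fill(attention_mask.unsqueeze(-1) == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)                        # attention over tokens
        # Weighted sum of token embeddings per head: (B, heads, hidden)
        pooled = torch.einsum("blh,bld->bhd", weights, hidden_states)
        fused = torch.tanh(self.fusion(pooled.flatten(start_dim=1)))  # (B, hidden)
        return self.classifier(fused)

class RobertaMAPFusion(nn.Module):
    def __init__(self, num_heads: int = 4, num_classes: int = 4):
        super().__init__()
        # Pre-trained RoBERTa weights are kept intact; only the MAP head adds parameters.
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        self.head = MAPFusionHead(self.roberta.config.hidden_size, num_heads, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.roberta(input_ids=input_ids, attention_mask=attention_mask)
        return self.head(out.last_hidden_state, attention_mask)

In use, such a wrapper would be fine-tuned end to end with a standard cross-entropy loss on the returned logits; the default values of num_heads and num_classes above are placeholders, not settings reported in the paper.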
License
Copyright (c) 2025 International Scientific Technical and Economic Research

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.