MAPFusion: Enhancing RoBERTa for News Classification with Multi-Head Attention Pooling and Feature Fusion

DOI:

https://doi.org/10.71451/ISTAER2555

Keywords:

MAPFusion, News text classification, Multi-head attention pooling, RoBERTa, Attention mechanism

Abstract

This paper proposes MAPFusion, a novel approach to enhance RoBERTa for news text classification using multi-head attention pooling. Traditional methods often rely on the static [CLS] token or simple averaging of token embeddings, which can overlook subtle contextual information across token positions. The proposed method addresses this limitation with multi-head attention pooling (MAP), which captures context-aware representations by aggregating token-level embeddings with learned attention weights across multiple representation subspaces. The outputs from the attention heads are then combined through a feature fusion layer to create a comprehensive sentence representation. This component is seamlessly integrated into the output layer of RoBERTa, preserving its pre-trained weights and requiring minimal additional parameters during fine-tuning. Experiments on benchmark datasets demonstrate that MAPFusion consistently outperforms baseline models, achieving significant improvements in classification accuracy. The framework is computationally efficient, broadly applicable to various text classification tasks, and provides a principled approach to utilizing the latent representations of RoBERTa more effectively. Our work highlights the importance of adaptive feature aggregation in Transformer-based models, offering insights for future representation learning research. The code for this paper is available from the corresponding authors upon request.
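
As a concrete illustration of the architecture described above, the following is a minimal sketch in PyTorch (using the Hugging Face transformers library) of how multi-head attention pooling and feature fusion could be attached to RoBERTa's token-level outputs. The class names (MAPFusionHead, RobertaMAPFusion) and the design details (one learned scoring vector per head, concatenation of head summaries followed by a linear fusion layer with a tanh activation) are assumptions made for exposition, not the authors' released implementation.

import torch
import torch.nn as nn
from transformers import RobertaModel

class MAPFusionHead(nn.Module):
    """Multi-head attention pooling over token embeddings, followed by
    a feature fusion layer that combines the per-head summaries."""

    def __init__(self, hidden_size: int, num_heads: int, num_classes: int):
        super().__init__()
        # One learned scoring vector per head: each token receives num_heads scores.
        self.scorer = nn.Linear(hidden_size, num_heads)
        # Fusion layer: concatenated head summaries -> one sentence vector.
        self.fusion = nn.Linear(num_heads * hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        scores = self.scorer(token_embeddings)  # (batch, seq_len, num_heads)
        # Mask out padding positions before normalizing.
        scores = scores.masked_fill(attention_mask.unsqueeze(-1) == 0, float("-inf"))
        weights = torch.softmax(scores, dim=1)  # attention over token positions, per head
        # Per-head weighted sum of token embeddings: (batch, num_heads, hidden).
        pooled = torch.einsum("bsh,bsd->bhd", weights, token_embeddings)
        fused = torch.tanh(self.fusion(pooled.flatten(start_dim=1)))  # (batch, hidden)
        return self.classifier(fused)

class RobertaMAPFusion(nn.Module):
    """RoBERTa backbone with the pooling-and-fusion head on its output layer."""

    def __init__(self, num_heads: int = 4, num_classes: int = 4):
        super().__init__()
        self.backbone = RobertaModel.from_pretrained("roberta-base")
        self.head = MAPFusionHead(self.backbone.config.hidden_size, num_heads, num_classes)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden, attention_mask)

Consistent with the abstract's claims, a head of this form leaves the backbone's pre-trained weights untouched and adds only the scoring, fusion, and classifier layers; the exact head count, fusion design, and activation used in MAPFusion itself may differ.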

Published

2025-11-16

Section

Research Article

How to Cite

MAPFusion: Enhancing RoBERTa for News Classification with Multi-Head Attention Pooling and Feature Fusion. (2025). International Scientific Technical and Economic Research, 71–84. https://doi.org/10.71451/ISTAER2555
