A Deep Reinforcement Learning Signal Control Algorithm for Traffic Carbon Emission Optimization

Authors

Xu, H.

DOI:

https://doi.org/10.71451/ISTAER2610

Keywords:

Deep reinforcement learning; Traffic signal control; Carbon emission optimization; Multi-objective optimization; Constrained reinforcement learning

Abstract

Urban traffic congestion causes frequent vehicle stop-and-go events and low-speed operation, making it one of the primary drivers of carbon emission growth. To address the multi-objective conflicts, training instability, and inadequate carbon emission modeling of existing traffic signal control methods for carbon emission optimization, this paper proposes a deep reinforcement learning signal control algorithm for carbon emission optimization. The method constructs a carbon-emission-aware dynamic reward mechanism that achieves collaborative optimization of traffic efficiency and emission reduction objectives through adaptive weight adjustment. The Lagrange multiplier method is introduced to embed the carbon emission threshold as an explicit constraint in the policy learning process, ensuring that emission levels remain within an acceptable range. For multi-intersection scenarios, a distributed cooperative control framework based on parameter sharing and neighborhood information exchange is designed to enhance the model's ability to perceive the spatial propagation characteristics of traffic flow. Experimental validation is conducted on the SUMO simulation platform in three scenarios: a single intersection, a 4×4 grid network, and a real-world urban road network. The results show that, compared with the PPO algorithm, the proposed method reduces average carbon emissions by 11.3% to 12.8%, reduces average delay by 15.7%, increases average speed by 9.6%, and improves the comprehensive performance index by 12.2%. During training, policy fluctuation is reduced by about 50%, and the degradation rate of generalization performance is 34.2% lower than that of the baseline methods. This study provides an effective intelligent solution for low-carbon-oriented urban traffic signal control.
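The constraint mechanism described in the abstract can be illustrated with a minimal sketch: the carbon emission threshold enters the reward as a penalty term whose Lagrange multiplier is adapted by dual gradient ascent. All function names, weights, and the threshold value below are illustrative assumptions, not details taken from the paper.

```python
def shaped_reward(delay, emission, w_eff, w_co2, lam, threshold):
    """Weighted multi-objective reward plus a Lagrangian emission penalty."""
    base = -(w_eff * delay + w_co2 * emission)      # efficiency/emission trade-off
    penalty = lam * max(0.0, emission - threshold)  # active only above the threshold
    return base - penalty

def update_multiplier(lam, avg_emission, threshold, lr=0.01):
    """Dual ascent: raise lambda while the emission constraint is violated,
    and let it decay (floored at zero) once the constraint is satisfied."""
    return max(0.0, lam + lr * (avg_emission - threshold))

# Example: the multiplier grows while episodes exceed the threshold,
# then shrinks as the policy brings emissions back under it.
lam = 0.0
threshold = 50.0  # e.g. grams CO2 per vehicle per step (illustrative unit)
for episode_emission in [62.0, 58.0, 51.0, 49.0]:
    lam = update_multiplier(lam, episode_emission, threshold)
```

In a full training loop, `shaped_reward` would replace the environment's raw reward inside the policy-gradient update, while `update_multiplier` runs once per episode on the measured average emission; this primal-dual structure is the standard way a Lagrangian constraint is embedded in policy learning.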

References

[1] Li, B. W., Chen, Z. H., Zhu, X. H., Zhang, Z., Peng, Z. R., Zhao, H. M., & He, H. D. (2025). Assessment of eco-driving strategies on carbon emissions for hybrid vehicles through portable emissions measurement systems. Atmospheric Pollution Research, 16(3), 102365. DOI: https://doi.org/10.1016/j.apr.2024.102365

[2] Chavhan, S., Deepika, I. S., Gupta, D., & Rodrigues, J. J. (2025). Energy-Efficient-Enabled Edge-AI-IoT Integrated Traffic Incident Analysis and Avoidance of Secondary Incidents. IEEE Internet of Things Journal. DOI: https://doi.org/10.1109/JIOT.2025.3555408

[3] Li, X., Wang, G., Zhu, Y., & Liu, W. (2025). A System Dynamics-Based Simulation Study on Urban Traffic Congestion Mitigation and Emission Reduction Policies. Sustainability, 17(20), 9296. DOI: https://doi.org/10.3390/su17209296

[4] Li, D., Zhu, F., Wu, J., Wong, Y. D., & Chen, T. (2024). Managing mixed traffic at signalized intersections: An adaptive signal control and CAV coordination system based on deep reinforcement learning. Expert Systems with Applications, 238, 121959. DOI: https://doi.org/10.1016/j.eswa.2023.121959

[5] Benhamza, K., Seridi, H., Agguini, M., & Bentagine, A. (2024). A multi-agent reinforcement learning based approach for intelligent traffic signal control. Evolving Systems, 15(6), 2383-2397. DOI: https://doi.org/10.1007/s12530-024-09622-4

[6] Chen, X., Wang, X., Zhao, W., Wang, C., Cheng, S., & Luan, Z. (2025). Hierarchical deep reinforcement learning based multi-agent game control for energy consumption and traffic efficiency improving of autonomous vehicles. Energy, 323, 135669. DOI: https://doi.org/10.1016/j.energy.2025.135669

[7] Hu, J., Shan, Y., Yang, Y., Parisio, A., Li, Y., Amjady, N., ... & Rodríguez, J. (2023). Economic model predictive control for microgrid optimization: A review. IEEE Transactions on Smart Grid, 15(1), 472-484. DOI: https://doi.org/10.1109/TSG.2023.3266253

[8] Qadri, S. S. S. M., Gökçe, M. A., & Öner, E. (2020). State-of-art review of traffic signal control methods: challenges and opportunities. European Transport Research Review, 12(1), 55. DOI: https://doi.org/10.1186/s12544-020-00439-1

[9] Tedjopurnomo, D. A., Bao, Z., Zheng, B., Choudhury, F. M., & Qin, A. K. (2020). A survey on modern deep neural network for traffic prediction: Trends, methods and challenges. IEEE Transactions on Knowledge and Data Engineering, 34(4), 1544-1561. DOI: https://doi.org/10.1109/TKDE.2020.3001195

[10] Liu, Y., Lyu, C., Zhang, Y., Liu, Z., Yu, W., & Qu, X. (2021). DeepTSP: Deep traffic state prediction model based on large-scale empirical data. Communications in Transportation Research, 1, 100012. DOI: https://doi.org/10.1016/j.commtr.2021.100012

[11] Luo, R., Peng, Z., & Hu, J. (2023). On model identification based optimal control and it’s applications to multi-agent learning and control. Mathematics, 11(4), 906. DOI: https://doi.org/10.3390/math11040906

[12] Nguyen, T. T., Nguyen, N. D., & Nahavandi, S. (2020). Deep reinforcement learning for multiagent systems: A review of challenges, solutions, and applications. IEEE Transactions on Cybernetics, 50(9), 3826-3839. DOI: https://doi.org/10.1109/TCYB.2020.2977374

[13] Liu, H., Li, X., Zhang, L., & Cheng, R. (2026). Bridging phase and timing: A joint Q-value learning framework for synergistic traffic signal control at consecutive arterial road intersections. Physica A: Statistical Mechanics and its Applications, 131421. DOI: https://doi.org/10.1016/j.physa.2026

[14] Bernárdez, G., Suárez-Varela, J., López, A., Shi, X., Xiao, S., Cheng, X., ... & Cabellos-Aparicio, A. (2023). Magnneto: A graph neural network-based multi-agent system for traffic engineering. IEEE Transactions on Cognitive Communications and Networking, 9(2), 494-506. DOI: https://doi.org/10.1109/TCCN.2023.3235719

[15] Wang, X., Yue, X., Huang, J., & Li, S. (2025). Integrating traffic dynamics and emissions modeling: From classical approaches to data-driven futures. Atmosphere, 16(6), 695. DOI: https://doi.org/10.3390/atmos16060695

[16] Mera, Z., Varella, R., Baptista, P., Duarte, G., & Rosero, F. (2022). Including engine data for energy and pollutants assessment into the vehicle specific power methodology. Applied Energy, 311, 118690. DOI: https://doi.org/10.1016/j.apenergy.2022.118690

[17] He, K., Chen, C., Chen, S., Chen, B., Zhang, A., Chen, P., ... & Wu, Z. (2025). Reinforcement Learning for Multi-Objective Optimization: A Review. Archives of Computational Methods in Engineering, 1-30. DOI: https://doi.org/10.1007/s11831-025-10389-3

[18] Nguyen, T. T., Nguyen, N. D., Vamplew, P., Nahavandi, S., Dazeley, R., & Lim, C. P. (2020). A multi-objective deep reinforcement learning framework. Engineering Applications of Artificial Intelligence, 96, 103915. DOI: https://doi.org/10.1016/j.engappai.2020.103915

[19] Liu, X., Ye, K., van Vlijmen, H. W., Emmerich, M. T., IJzerman, A. P., & van Westen, G. J. (2021). DrugEx v2: de novo design of drug molecules by Pareto-based multi-objective reinforcement learning in polypharmacology. Journal of Cheminformatics, 13(1), 85. DOI: https://doi.org/10.1186/s13321-021-00561-9

[20] Pereira, V., Sousa, P., & Rocha, M. (2022). A comparison of multi-objective optimization algorithms for weight setting problems in traffic engineering. Natural Computing, 21(3), 507-522. DOI: https://doi.org/10.1007/s11047-020-09807-1

[21] Taha, K. (2020). Methods that optimize multi-objective problems: A survey and experimental evaluation. IEEE Access, 8, 80855-80878. DOI: https://doi.org/10.1109/ACCESS.2020.2989219

[22] Gu, S., Yang, L., Du, Y., Chen, G., Walter, F., Wang, J., & Knoll, A. (2024). A review of safe reinforcement learning: Methods, theories, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(12), 11216-11235. DOI: https://doi.org/10.1109/TPAMI.2024.3457538

[23] Ceusters, G., Camargo, L. R., Franke, R., Nowé, A., & Messagie, M. (2023). Safe reinforcement learning for multi-energy management systems with known constraint functions. Energy and AI, 12, 100227. DOI: https://doi.org/10.1016/j.egyai.2022.100227

[24] Motte, M., & Pham, H. (2022). Mean-field Markov decision processes with common noise and open-loop controls. The Annals of Applied Probability, 32(2), 1421-1458. DOI: https://doi.org/10.1214/21-AAP1713

[25] Yang, J., Wu, J., Fang, L., Fan, H., Zhang, B., Zhao, H., ... & You, X. (2025). MSRFormer: road network representation learning using multi-scale feature fusion of heterogeneous spatial interactions. Geo-spatial Information Science, 1-20. DOI: https://doi.org/10.1080/10095020.2025.2583710

[26] Ye, C., Liu, F., Ou, Y., & Xu, Z. (2022). Optimization of Vehicle Paths considering Carbon Emissions in a Time-Varying Road Network. Journal of Advanced Transportation, 2022(1), 9656262. DOI: https://doi.org/10.1155/2022/9656262

[27] Li, H., Qian, X., & Song, W. (2024). Prioritized experience replay based on dynamics priority. Scientific Reports, 14(1), 6014. DOI: https://doi.org/10.1038/s41598-024-56673-3

[28] Vadlamani, S. K., Xiao, T. P., & Yablonovitch, E. (2020). Physics successfully implements Lagrange multiplier optimization. Proceedings of the National Academy of Sciences, 117(43), 26639-26650. DOI: https://doi.org/10.1073/pnas.2015192117

[29] Saeed Chilmeran, H. T., Hamed, E. T., Ahmed, H. I., & Al-Bayati, A. Y. (2022). A method of two new augmented Lagrange multiplier versions for solving constrained problems. International Journal of Mathematics and Mathematical Sciences, 2022(1), 3527623. DOI: https://doi.org/10.1155/2022/3527623

[30] Chen, R., Tsay, Y. S., & Ni, S. (2022). An integrated framework for multi-objective optimization of building performance: Carbon emissions, thermal comfort, and global cost. Journal of Cleaner Production, 359, 131978. DOI: https://doi.org/10.1016/j.jclepro.2022.131978

[31] Zubaer, K. H., Alam, Q. M., Toha, T. R., Salim, S. I., & Al Islam, A. A. (2020). Towards simulating non-lane based heterogeneous road traffic of less developed countries using authoritative polygonal GIS map. Simulation Modelling Practice and Theory, 105, 102156. DOI: https://doi.org/10.1016/j.simpat.2020.102156

[32] Chen, D., Zhu, M., Yang, H., Wang, X., & Wang, Y. (2024). Data-driven traffic simulation: A comprehensive review. IEEE Transactions on Intelligent Vehicles, 9(4), 4730-4748. DOI: https://doi.org/10.1109/TIV.2024.3367919

Published

2026-03-29

Data Availability Statement

The data that support the findings of this study are available upon request from the corresponding author, H.X.

How to Cite

Xu, H. (2026). A Deep Reinforcement Learning Signal Control Algorithm for Traffic Carbon Emission Optimization. International Scientific Technical and Economic Research, 4(1), 200-221. https://doi.org/10.71451/ISTAER2610
