Urban traffic congestion causes frequent vehicle start-stop events and prolonged low-speed operation, making it one of the primary drivers of growth in carbon emissions. To address the multi-objective conflicts, training instability, and inadequate carbon emission modeling of existing traffic signal control methods for carbon emission optimization, this paper proposes a deep reinforcement learning signal control algorithm oriented toward carbon emission optimization. The method constructs a carbon-emission-aware dynamic reward mechanism that achieves collaborative optimization of traffic efficiency and emission reduction objectives through adaptive weight adjustment. The Lagrange multiplier method is introduced to embed a carbon emission threshold as an explicit constraint in the policy learning process, ensuring that emission levels remain within an acceptable range. For multi-intersection scenarios, a distributed collaborative control framework based on parameter sharing and neighborhood information exchange is designed to strengthen the model's ability to perceive the spatial propagation characteristics of traffic flow. Experiments are conducted on the SUMO simulation platform in three scenarios: a single intersection, a 4×4 grid network, and a real-world urban road network. The results show that, compared with the PPO algorithm, the proposed method reduces average carbon emissions by 11.3% to 12.8%, reduces average delay by 15.7%, increases average speed by 9.6%, and improves the comprehensive performance index by 12.2%. During training, policy fluctuation is reduced by about 50%, and the degradation rate of generalization performance is 34.2% lower than that of the comparison methods. This study provides an effective intelligent solution for low-carbon-oriented urban traffic signal control.
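The combination of a weighted efficiency/emission reward with a Lagrangian constraint on emissions can be sketched as follows. This is a minimal illustration of the general technique the abstract names (reward shaping plus dual ascent on a Lagrange multiplier), not the paper's actual implementation; all class and parameter names, weights, and the update rule's learning rate are illustrative assumptions.

```python
class CarbonAwareLagrangianReward:
    """Sketch: carbon-aware reward with an explicit emission constraint.

    Per-step reward: r = -(w_delay * delay + w_co2 * co2) - lam * max(0, co2 - threshold),
    where lam is a Lagrange multiplier updated by dual ascent on the
    constraint violation. All names and coefficients are illustrative.
    """

    def __init__(self, co2_threshold, lam_lr=0.01, w_delay=1.0, w_co2=1.0):
        self.co2_threshold = co2_threshold  # emission constraint level
        self.lam = 0.0                      # Lagrange multiplier, kept >= 0
        self.lam_lr = lam_lr                # dual-ascent step size
        self.w_delay = w_delay              # weight on traffic efficiency term
        self.w_co2 = w_co2                  # weight on emission term

    def reward(self, delay, co2):
        # Weighted multi-objective term plus the constraint penalty.
        violation = max(0.0, co2 - self.co2_threshold)
        return -(self.w_delay * delay + self.w_co2 * co2) - self.lam * violation

    def update_multiplier(self, avg_co2):
        # Dual ascent: lam grows while average emissions exceed the
        # threshold and decays toward zero once the constraint holds.
        self.lam = max(0.0, self.lam + self.lam_lr * (avg_co2 - self.co2_threshold))
        return self.lam
```

In such a scheme, the multiplier update is typically applied once per episode (or per batch) using the measured average emissions, so the penalty strength adapts automatically instead of being hand-tuned.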