Urban traffic congestion causes frequent vehicle start-stop events and prolonged low-speed operation, making it one of the primary drivers of growth in carbon emissions. To address the multi-objective conflicts, training instability, and inadequate carbon emission modeling of existing traffic signal control methods for carbon emission optimization, this paper proposes a deep reinforcement learning signal control algorithm oriented toward carbon emission optimization. The method constructs a carbon-emission-aware dynamic reward mechanism that achieves collaborative optimization of traffic efficiency and emission reduction objectives through adaptive weight adjustment. The Lagrange multiplier method is introduced to embed a carbon emission threshold as an explicit constraint in the policy learning process, ensuring that emission levels remain within an acceptable range. For multi-intersection scenarios, a distributed collaborative control framework based on parameter sharing and neighborhood information exchange is designed to strengthen the model's ability to perceive the spatial propagation characteristics of traffic flow. Experiments are conducted on the SUMO simulation platform in three scenarios: a single intersection, a 4×4 grid network, and a real-world urban road network. The results show that, compared with the PPO algorithm, the proposed method reduces average carbon emissions by 11.3% to 12.8%, reduces average delay by 15.7%, increases average speed by 9.6%, and improves the comprehensive performance index by 12.2%. During training, policy fluctuation is reduced by about 50%, and the degradation rate of generalization performance is 34.2% lower than that of the comparison methods. This study provides an effective intelligent solution for low-carbon-oriented urban traffic signal control.
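The combination of a weighted efficiency/emission reward with a Lagrangian constraint on emissions can be sketched as follows. This is a minimal illustration of the general technique the abstract names (reward shaping plus dual ascent on a Lagrange multiplier), not the paper's actual implementation; all class and parameter names, weights, and the update rule's learning rate are illustrative assumptions.

```python
class CarbonAwareLagrangianReward:
    """Sketch: carbon-aware reward with an explicit emission constraint.

    Per-step reward: r = -(w_delay * delay + w_co2 * co2) - lam * max(0, co2 - threshold),
    where lam is a Lagrange multiplier updated by dual ascent on the
    constraint violation. All names and coefficients are illustrative.
    """

    def __init__(self, co2_threshold, lam_lr=0.01, w_delay=1.0, w_co2=1.0):
        self.co2_threshold = co2_threshold  # emission constraint level
        self.lam = 0.0                      # Lagrange multiplier, kept >= 0
        self.lam_lr = lam_lr                # dual-ascent step size
        self.w_delay = w_delay              # weight on traffic efficiency term
        self.w_co2 = w_co2                  # weight on emission term

    def reward(self, delay, co2):
        # Weighted multi-objective term plus the constraint penalty.
        violation = max(0.0, co2 - self.co2_threshold)
        return -(self.w_delay * delay + self.w_co2 * co2) - self.lam * violation

    def update_multiplier(self, avg_co2):
        # Dual ascent: lam grows while average emissions exceed the
        # threshold and decays toward zero once the constraint holds.
        self.lam = max(0.0, self.lam + self.lam_lr * (avg_co2 - self.co2_threshold))
        return self.lam
```

In such a scheme, the multiplier update is typically applied once per episode (or per batch) using the measured average emissions, so the penalty strength adapts automatically instead of being hand-tuned.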