Adversarial learning enhanced multi-agent cooperative reinforcement learning for parallel batch processing machine scheduling in wafer fabrication

2025-11-24

Wenbin Xiang, Jie Zhang, Peng Zhang, Ming Wang, Hongsen Li,
Adversarial learning enhanced multi-agent cooperative reinforcement learning for parallel batch processing machine scheduling in wafer fabrication,
Computers & Industrial Engineering,
2025,
111660,
ISSN 0360-8352,
https://doi.org/10.1016/j.cie.2025.111660.
(https://www.sciencedirect.com/science/article/pii/S036083522500806X)
Abstract: Scheduling parallel batch processors in the batch-processing area of wafer fabrication systems is a critical challenge, primarily due to dynamic lot arrivals, re-entrant flows, and complex constraints such as finite capacity and recipe incompatibility. In this study, we propose a deep reinforcement learning (DRL) approach that integrates multi-agent collaboration (MAC) with adversarial experience augmentation. Two types of agents are designed, a batch-formation agent and a batch-assignment agent, and both are equipped with Long Short-Term Memory (LSTM) modules to capture dynamic shop-floor state information and improve adaptability. To address the scarcity of informative experience in early training, we incorporate a generative adversarial module (Corr-GAN) that synthesizes additional state–action–next-state–reward tuples. Corr-GAN uses a relationship-correction network to enforce consistency between generated state–action pairs and their predicted next state and reward, ensuring that synthetic experience remains physically plausible for wafer batch scheduling. This augmented experience is mixed with real interaction data to accelerate convergence and stabilize policy learning. Experimental results and enterprise-scale case studies show that the proposed method learns faster and converges more stably than standard RL baselines, and that it achieves substantial performance gains over both conventional RL and industrial heuristic approaches on larger and more complex instances, ultimately reducing mean wafer flow time.
Keywords: Wafer batch area scheduling; Deep reinforcement learning; Multi-agent collaboration (MAC); Adversarial learning
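The abstract states that Corr-GAN's synthetic state–action–next-state–reward tuples are mixed with real interaction data during training. A minimal sketch of one generic way to do that mixing follows. It is not the authors' implementation: the class name `MixedReplayBuffer`, the `synthetic_ratio` hyperparameter, and the flat tuple state encoding are hypothetical, and Corr-GAN itself appears only as whatever list of transitions is passed to `add_synthetic`.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    """One (state, action, next_state, reward) experience tuple."""
    state: Tuple[float, ...]
    action: int
    next_state: Tuple[float, ...]
    reward: float


class MixedReplayBuffer:
    """Hypothetical buffer holding real and synthetic transitions in separate pools.

    synthetic_ratio (an assumption, not from the paper) is the target fraction
    of each minibatch drawn from the synthetic pool.
    """

    def __init__(self, capacity: int, synthetic_ratio: float = 0.25, seed: int = 0):
        if not 0.0 <= synthetic_ratio <= 1.0:
            raise ValueError("synthetic_ratio must be in [0, 1]")
        # Bounded FIFO pools: the oldest entries are evicted once capacity is reached.
        self.real: Deque[Transition] = deque(maxlen=capacity)
        self.synthetic: Deque[Transition] = deque(maxlen=capacity)
        self.synthetic_ratio = synthetic_ratio
        self.rng = random.Random(seed)

    def add_real(self, t: Transition) -> None:
        """Store a transition observed from the scheduling environment."""
        self.real.append(t)

    def add_synthetic(self, batch: Sequence[Transition]) -> None:
        """Store generator output (e.g. tuples produced by a GAN-style module)."""
        self.synthetic.extend(batch)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a shuffled minibatch without replacement from both pools.

        The synthetic share is capped by what is available, and the real share
        by the real pool size, so the batch can be smaller than batch_size
        early in training.
        """
        n_syn = min(int(round(batch_size * self.synthetic_ratio)), len(self.synthetic))
        n_real = min(batch_size - n_syn, len(self.real))
        batch = self.rng.sample(list(self.real), n_real)
        batch += self.rng.sample(list(self.synthetic), n_syn)
        self.rng.shuffle(batch)
        return batch
```

Keeping the two pools separate lets the synthetic share be tuned, or lowered as real experience accumulates, independently of buffer contents. The abstract does not say whether the paper uses a fixed or scheduled mixing ratio.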