Research on Multi-Agent Collaborative Decision-Making Algorithm for Supply Chain Management

Changgeng Li; Zixi Liu

doi:10.71451/ISTAER2614

Authors

Changgeng Li, International Operations, Shinhan University, Gyeonggi-do, Republic of Korea. ORCID: https://orcid.org/0009-0006-1115-105X
Zixi Liu, International Operations, Shinhan University, Gyeonggi-do, Republic of Korea. ORCID: https://orcid.org/0009-0008-0026-2680


Keywords:

Supply chain management; Multi-agent collaboration; Credit allocation; Graph attention network; Multi-agent deep deterministic policy gradient

Abstract

Multi-node collaborative decision-making in supply chain management faces three key challenges: ambiguous credit allocation, low exploration efficiency, and insufficient robustness. To address them, this paper proposes HGA-MADDPG, a multi-agent collaborative decision-making algorithm with hybrid local-global credit allocation. The algorithm uses a hierarchical graph attention mechanism to represent the state of the supply chain network topology dynamically. Local and global credit networks quantify the contribution of individual actions to sub-chain objectives and to system-level indicators, respectively. An adaptive fusion weight based on marginal returns then balances local and global credit dynamically. In addition, an adversarial-disturbance and resilient-training architecture is constructed. It models three types of disturbance (demand mutation, node failure, and transportation delay) and combines them with adversarial agent injection, a dynamic environment replay buffer, and a two-stage training strategy. HGA-MADDPG is compared with eight baseline algorithms in a baseline four-echelon supply chain scenario and in a dynamic environment driven by real SCDL and WSN data. In these experiments, it achieves a total cost reduction rate of 26.2% and a service level improvement rate of 42.8%, while keeping the stockout rate at 3.2%. Under the extreme triple-disturbance scenario, its cost deviation rate (29.6%) and recovery time (58 hours) are significantly better than those of existing methods. In an ultra-large-scale supply chain of 120 nodes, it still maintains a cost reduction rate of 21.5%. Ablation experiments and a scalability analysis further verify the effectiveness of each core module.
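The abstract does not specify the hierarchical graph attention mechanism. A common building block for such mechanisms is the standard single-head graph attention (GAT) layer, sketched here in plain Python for reference. This is not the authors' architecture. The function names, the toy node names, and the weight shapes are illustrative assumptions. The hierarchy itself and the output nonlinearity that a full GAT layer applies are omitted.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]

def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix (list of rows) by a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x

def gat_layer(features: Dict[str, Vector], neighbors: Dict[str, List[str]],
              W: Matrix, a: Vector) -> Dict[str, Vector]:
    """One single-head graph attention layer over a supply chain graph (hypothetical helper).

    neighbors[i] lists the nodes node i attends to (for example, its upstream suppliers);
    a has length 2 * len(W): one half scores node i, the other half scores neighbor j.
    """
    projected = {node: matvec(W, h) for node, h in features.items()}
    updated: Dict[str, Vector] = {}
    for i, hi in projected.items():
        candidates = [i] + neighbors.get(i, [])  # self-loop, as in the original GAT layer
        # Unnormalized score e_ij = LeakyReLU(a^T [W h_i || W h_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, hi + projected[j])))
                  for j in candidates]
        # Softmax over the candidate set, shifted by the max for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of projected features (no output nonlinearity here).
        updated[i] = [sum(alpha * projected[j][k] for alpha, j in zip(alphas, candidates))
                      for k in range(len(hi))]
    return updated
```

With an all-zero attention vector `a`, every candidate receives equal weight, so a factory attending to one supplier ends up with the mean of the two projected vectors. One way to obtain a hierarchy would be to apply such a layer within each echelon and then again across echelon summaries, but that arrangement is also an assumption.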
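According to the abstract, an adaptive weight based on marginal returns balances local and global credit, but the formula is not given. The sketch below shows one plausible reading and is purely an assumption. Each objective's marginal return is the change in its windowed average return. The two gains are then turned into a weight on the local credit with a two-way softmax, which is equivalent to a logistic function of their difference. `marginal_return`, `fusion_weight`, `fused_credit`, `window`, and `temperature` are hypothetical names.

```python
import math
from typing import Sequence

def marginal_return(history: Sequence[float], window: int = 5) -> float:
    """Mean of the latest `window` returns minus the mean of the window before it."""
    if len(history) < 2 * window:
        return 0.0  # not enough data yet: treat the objective as flat
    recent = history[-window:]
    previous = history[-2 * window:-window]
    return sum(recent) / window - sum(previous) / window

def fusion_weight(local_gain: float, global_gain: float, temperature: float = 1.0) -> float:
    """Weight on the local credit: a two-way softmax over the two marginal gains."""
    z = (local_gain - global_gain) / temperature
    # Numerically stable logistic function of the gain difference.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fused_credit(local_credit: float, global_credit: float,
                 local_gain: float, global_gain: float, temperature: float = 1.0) -> float:
    """Convex combination of local and global credit (hypothetical fusion rule)."""
    w = fusion_weight(local_gain, global_gain, temperature)
    return w * local_credit + (1.0 - w) * global_credit
```

Under this convention, the credit signal whose objective is currently improving faster receives more weight. The opposite convention, which favors the lagging objective, is equally plausible. Lowering `temperature` makes the switch between the two signals sharper.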
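The three disturbance types named in the abstract can be injected into a simulator one period at a time. The sketch below is an assumption about how such injection might look and is not the authors' disturbance model. Each disturbance fires independently with a fixed probability:

- a demand mutation multiplies demand,
- a node failure sets shipping capacity to zero,
- a transportation delay lengthens lead time.

`NodeState`, `DisturbanceConfig`, `perturb`, and every default value are hypothetical.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class NodeState:
    demand: float    # units requested from the node this period
    capacity: float  # units the node can ship this period
    lead_time: int   # periods until an outbound shipment arrives

@dataclass(frozen=True)
class DisturbanceConfig:
    # Every probability and magnitude here is an illustrative assumption.
    p_demand_shock: float = 0.05     # chance of a demand mutation per period
    demand_multiplier: float = 2.0   # factor applied to demand when it fires
    p_node_failure: float = 0.02     # chance the node cannot ship this period
    p_transport_delay: float = 0.05  # chance of a transportation delay
    extra_delay: int = 2             # periods added to lead time when it fires

def perturb(state: NodeState, cfg: DisturbanceConfig, rng: random.Random) -> NodeState:
    """Apply each disturbance independently; all three may fire in the same period."""
    if rng.random() < cfg.p_demand_shock:
        state = replace(state, demand=state.demand * cfg.demand_multiplier)
    if rng.random() < cfg.p_node_failure:
        state = replace(state, capacity=0.0)
    if rng.random() < cfg.p_transport_delay:
        state = replace(state, lead_time=state.lead_time + cfg.extra_delay)
    return state
```

Setting all three probabilities to 1.0 makes the three disturbances coincide in one period. Within this toy model, that loosely resembles the triple-disturbance scenario. Passing a seeded `random.Random` keeps runs reproducible.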
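The abstract names a dynamic environment replay buffer and a two-stage training strategy but details neither. The sketch below makes two assumptions:

- the buffer keeps transitions from nominal and disturbed environments in separate pools;
- the two stages differ only in the share of disturbed transitions per minibatch (none in stage one, a fixed share in stage two).

`RegimeReplayBuffer`, `disturbed_fraction`, and their parameters are hypothetical.

```python
import random
from collections import deque
from typing import Any, Deque, List, Tuple

Transition = Tuple[Any, Any, float, Any]  # (state, action, reward, next_state)

class RegimeReplayBuffer:
    """Hypothetical replay buffer with separate pools for nominal and disturbed regimes."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # Each pool drops its oldest transitions once it reaches capacity.
        self.nominal: Deque[Transition] = deque(maxlen=capacity)
        self.disturbed: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition: Transition, disturbed: bool) -> None:
        (self.disturbed if disturbed else self.nominal).append(transition)

    def sample(self, batch_size: int, disturbed_fraction: float) -> List[Transition]:
        """Draw a minibatch with roughly the requested share of disturbed transitions."""
        n_dist = min(round(batch_size * disturbed_fraction), len(self.disturbed))
        n_nom = min(batch_size - n_dist, len(self.nominal))
        batch = (self.rng.sample(list(self.disturbed), n_dist)
                 + self.rng.sample(list(self.nominal), n_nom))
        self.rng.shuffle(batch)
        return batch

def disturbed_fraction(step: int, stage_one_steps: int, stage_two_fraction: float = 0.5) -> float:
    """Two-stage schedule: nominal-only training first, then a fixed disturbed share."""
    return 0.0 if step < stage_one_steps else stage_two_fraction
```

A training loop would call `buffer.sample(batch_size, disturbed_fraction(step, stage_one_steps))` at each update. Stage one then learns a baseline policy in calm conditions, and stage two exposes it to disturbed experience. Copying each deque to a list on every call keeps the sketch simple but is linear in buffer size.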
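The abstract reports figures for cost reduction, service level improvement, stockout, cost deviation, and recovery time, but it does not define these metrics. The helpers below use common definitions that are assumptions, not necessarily the paper's:

- reductions and improvements are relative to a baseline;
- stockout rate is the fraction of periods with unmet demand;
- cost deviation is the relative increase over nominal cost;
- recovery time is the time until per-step cost stays within a tolerance band around nominal.

All function names and the 5% default tolerance are hypothetical.

```python
from typing import Optional, Sequence

def cost_reduction_rate(baseline_cost: float, method_cost: float) -> float:
    """Relative cost saving of a method versus a baseline."""
    return (baseline_cost - method_cost) / baseline_cost

def service_level_improvement(baseline_level: float, method_level: float) -> float:
    """Relative gain in service level versus a baseline."""
    return (method_level - baseline_level) / baseline_level

def stockout_rate(demand: Sequence[float], fulfilled: Sequence[float]) -> float:
    """Fraction of periods in which the fulfilled quantity fell short of demand."""
    shortfalls = sum(1 for d, f in zip(demand, fulfilled) if f < d)
    return shortfalls / len(demand)

def cost_deviation_rate(nominal_cost: float, disturbed_cost: float) -> float:
    """Relative cost increase caused by a disturbance."""
    return (disturbed_cost - nominal_cost) / nominal_cost

def recovery_time(costs: Sequence[float], nominal_cost: float,
                  tol: float = 0.05, hours_per_step: float = 1.0) -> Optional[float]:
    """Time from the disturbance until per-step cost stays within tol of nominal.

    Returns None if the cost is still outside the band at the end of the horizon
    (or if no costs were recorded).
    """
    last_outside = -1
    for t, cost in enumerate(costs):
        if abs(cost - nominal_cost) > tol * nominal_cost:
            last_outside = t
    if last_outside == len(costs) - 1:
        return None
    return (last_outside + 1) * hours_per_step
```

For example, with a nominal cost of 100 and hourly costs of 150, 130, 104, 101, and 100 after a disturbance, the cost enters and stays in the 5% band from the third hour onward. `recovery_time` therefore returns 2.0 hours.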
