Research on Multi-Agent Collaborative Decision-Making Algorithm for Supply Chain Management

Changgeng Li¹, Zixi Liu¹
¹ International Operations, Shinhan University, Gyeonggi-do, Republic of Korea
International Scientific Technical and Economic Research 2026, Vol. 4, No. 2, pp. 21-50
DOI: 10.71451/ISTAER2614
Received: 19 January 2026; Revised: 25 February 2026; Accepted: 30 March 2026; Published: 4 April 2026
Abstract

Multi-node collaborative decision-making in supply chain management suffers from ambiguous credit allocation, low exploration efficiency, and insufficient robustness. To address these challenges, this paper proposes HGA-MADDPG, a multi-agent collaborative decision-making algorithm with hybrid local-global credit allocation. The algorithm uses a hierarchical graph attention mechanism to represent the state of the supply chain network topology dynamically. A local credit network quantifies the contribution of each individual action to sub-chain objectives, a global credit network quantifies its contribution to system-level indicators, and an adaptive fusion weight based on marginal returns balances the two dynamically. The paper also constructs an adversarial disturbance and resilient training architecture that models three types of disturbance (demand mutation, node failure, and transportation delay) and combines adversarial agent injection, a dynamic environment replay buffer, and a two-stage training strategy. Experiments compare HGA-MADDPG with eight baseline algorithms in a baseline scenario of a four-level supply chain and in a dynamic environment driven by real data based on SCDL and WSN. HGA-MADDPG achieves a total cost reduction rate of 26.2% and a service level improvement rate of 42.8%, and keeps the stockout rate at 3.2%. In the extreme scenario of triple perturbation, its cost deviation rate (29.6%) and recovery time (58 hours) are significantly better than those of existing methods. In an ultra-large-scale supply chain with 120 nodes, it still maintains a cost reduction rate of 21.5%. Ablation experiments and scalability analysis further verify the effectiveness of each core module.
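
To make the idea of an adaptive fusion weight concrete, the following is a minimal sketch and not the paper's formulation. The abstract only states that the weight is based on marginal returns. The sigmoid form, the `temperature` parameter, and the names `fusion_weight` and `fused_credit` are illustrative assumptions.

```python
import math


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fusion_weight(local_gain: float, global_gain: float,
                  temperature: float = 1.0) -> float:
    """Weight placed on the global credit, in (0, 1).

    ASSUMPTION: the weight grows when the marginal return of the global
    credit exceeds that of the local credit. The abstract does not give
    the actual functional form.
    """
    return _sigmoid((global_gain - local_gain) / temperature)


def fused_credit(local_credit: float, global_credit: float,
                 local_gain: float, global_gain: float,
                 temperature: float = 1.0) -> float:
    """Convex combination of local and global credit signals."""
    w = fusion_weight(local_gain, global_gain, temperature)
    return (1.0 - w) * local_credit + w * global_credit
```

With equal marginal returns, the sketch weights both credit signals equally. As the global marginal return dominates, the fused value approaches the global credit. A convex combination keeps the fused credit within the range spanned by the two inputs, which is one plausible reason to choose this form.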

Keywords
Supply chain management; Multi-agent collaboration; Credit allocation; Graph attention network; Multi-agent deep deterministic policy gradient