Identification of 37 kinds of herbs containing oligosaccharides by combining data fusion and machine learning

2025-11-24

Li-jie Zhang, Ya-ling An, Fei Huang, Wen-jie Zhao, Chun-qian Song, Yu-shi Huang, Zhen-wei Li, Xiao-kang Liu, Yang Yang, Qinhua Chen, De-an Guo,
Identification of 37 kinds of herbs containing oligosaccharides by combining data fusion and machine learning,
Journal of Food Composition and Analysis,
Volume 148, Part 1,
2025,
108190,
ISSN 0889-1575,
https://doi.org/10.1016/j.jfca.2025.108190.
(https://www.sciencedirect.com/science/article/pii/S0889157525010051)
Abstract: Oligosaccharide-rich herbs (ORHs) are predominantly edible plants and an essential component of modern health-conscious diets, but their similar appearances often lead to misuse or misidentification. This study analyzed 487 batches across 37 varieties of ORHs to address critical challenges in their identification and quality control. Two detection techniques, evaporative light scattering detector (ELSD) and diode array detector (DAD), were combined with low-level and mid-level data fusion strategies. These approaches yielded four distinct datasets, each of which was used to develop seven machine learning classification models, for a total of 28 models. Data fusion significantly enhanced model accuracy and performance; in particular, the Partial Least Squares-Discriminant Analysis (PLS-DA) model built on mid-level data fusion achieved training and testing set accuracies exceeding 98.0% and a validation set accuracy of 100%. This study provides a comprehensive comparison of identification schemes for 37 types of ORHs and offers valuable insights for quality control in the herbs market.
Keywords: Oligosaccharide-rich herbs; HPLC-DAD/ELSD; Data fusion; Machine learning; Discrimination
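
The two fusion levels named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the preprocessing or the feature-extraction step, so autoscaling and PCA scores are assumed here as common choices, and the function names, block sizes, and random data are hypothetical placeholders for ELSD and DAD chromatographic profiles.

```python
import numpy as np

def autoscale(block):
    """Mean-centre each column and scale it to unit variance."""
    mean = block.mean(axis=0)
    std = block.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by zero
    return (block - mean) / std

def low_level_fusion(blocks):
    """Low-level fusion: concatenate the scaled raw blocks side by side."""
    return np.hstack([autoscale(b) for b in blocks])

def pca_scores(block, n_components):
    """Scores on the leading principal components, via SVD of the scaled block."""
    u, s, _ = np.linalg.svd(autoscale(block), full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def mid_level_fusion(blocks, n_components):
    """Mid-level fusion: extract features per block first, then concatenate them."""
    return np.hstack([pca_scores(b, n_components) for b in blocks])

# Placeholder data (assumption): 12 samples, 50 ELSD points and 80 DAD points each.
rng = np.random.default_rng(0)
elsd = rng.normal(size=(12, 50))
dad = rng.normal(size=(12, 80))

low = low_level_fusion([elsd, dad])                  # 50 + 80 columns per sample
mid = mid_level_fusion([elsd, dad], n_components=3)  # 3 + 3 columns per sample
print(low.shape, mid.shape)
```

Either fused matrix could then be passed to a classifier such as PLS-DA. In a real workflow the scaling and PCA parameters would be estimated on the training set only and then applied to the test and validation sets, to avoid leaking information between them.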