Comprehensive Serum Glycopeptide Spectrum Analysis with Machine Learning for Non-Invasive Early Detection of Gastrointestinal Cancers

2025-11-25

Yuichi Hisamatsu, Kazuhiro Tanabe, Kensuke Kudo, Hirofumi Hasuda, Eiji Kusumoto, Hideo Uehara, Rintaro Yoshida, Mitsuhiko Ota, Yoshihisa Sakaguchi, Chihiro Hayashi, Mikio Mikami, Tetsuya Kusumoto,
Comprehensive Serum Glycopeptide Spectrum Analysis with Machine Learning for Non-Invasive Early Detection of Gastrointestinal Cancers,
Computational and Structural Biotechnology Journal,
2025,
ISSN 2001-0370,
https://doi.org/10.1016/j.csbj.2025.10.067.
(https://www.sciencedirect.com/science/article/pii/S2001037025004672)
Abstract:
Purpose
Gastrointestinal cancers, including colorectal cancer (CRC), gastric cancer (GC), and esophageal cancer (EC), are among the most common and lethal malignancies worldwide. Early detection is critical for improving patient outcomes, but current diagnostic methods, such as endoscopy, are burdensome, costly, and poorly suited to widespread screening. Here, we demonstrate the potential of non-invasive, blood-based diagnostics that integrate glycan biomarkers with machine learning.
Experimental Design
This study analyzed serum samples from 296 patients with CRC, 180 with GC, and 42 with EC, alongside 590 healthy controls. Nine conventional tumor markers were quantified, and 1,688 enriched glycopeptides (EGPs) were identified via liquid chromatography-mass spectrometry. Using Comprehensive Serum Glycopeptide Spectrum Analysis (CSGSA), EGPs were integrated with conventional markers into machine learning models, including neural networks, to develop and validate diagnostic frameworks.
Results
Two glycopeptides, α1-antitrypsin at Asn271 and α2-macroglobulin at Asn70, were identified as highly cancer-specific biomarkers. Integrating these glycopeptides, tumor markers, and EGPs significantly improved diagnostic performance. The neural network-based model achieved area under the curve (AUC) values of 0.966, 0.992, and 0.995 for CRC, GC, and EC, respectively, with positive predictive values (PPVs) of 54.5%, 35.3%, and 11.1%, respectively, exceeding non-invasive diagnostic benchmarks. The CSGSA approach also differentiated cancer types with high accuracy, even in early-stage disease.
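As general background on the two metrics reported above, the following is a minimal sketch of how an area under the ROC curve and a positive predictive value can be computed. It is not the authors' code. The function names `auc` and `ppv` are hypothetical, and all inputs in the usage lines are made-up illustrative numbers, not data from the study. The sketch illustrates one well-known property: a positive predictive value depends on disease prevalence as well as on sensitivity and specificity, so a high area under the curve can coexist with a modest positive predictive value when the condition is rare.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule.

    PPV = TP / (TP + FP), where TP and FP are expressed as fractions
    of the whole screened population.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical classifier scores (illustrative only).
    print(auc([0.9, 0.3], [0.5, 0.1]))
    # Hypothetical test characteristics at two assumed prevalences.
    print(ppv(0.90, 0.99, 0.001))
    print(ppv(0.90, 0.99, 0.05))
```

With identical sensitivity and specificity, the second `ppv` call uses a higher assumed prevalence and therefore returns a higher value than the first, which is why positive predictive values should be read alongside the prevalence of each cancer type in the evaluated population.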
Conclusion
CSGSA combines glycopeptide profiling with machine learning to achieve high accuracy in non-invasive gastrointestinal cancer diagnostics. This method offers a cost-effective and scalable alternative to invasive procedures and may be useful in general health screening, which warrants further investigation.
Keywords: Gastrointestinal cancer; comprehensive serum glycopeptide spectra analysis; neural network; glycopeptide; mass spectrometry; glycomics