Machine learning prediction of heat capacity of polymers as a function of temperature

2025-11-24

Kazuhiko Ishikiriyama,
Machine learning prediction of heat capacity of polymers as a function of temperature,
Polymer,
Volume 339,
2025,
129171,
ISSN 0032-3861,
https://doi.org/10.1016/j.polymer.2025.129171.
(https://www.sciencedirect.com/science/article/pii/S0032386125011577)
Abstract: Machine learning models were developed using the high-quality ATHAS (Advanced Thermal Analysis System) data bank to predict the constant-pressure heat capacity (CP) of polymers at 10 K intervals from 10 to 500 K. Molecular fingerprints (FPs) were used as features, specifically circular Morgan fingerprints with a bond diameter of 4 derived from the repeating structural units of polymers. For polymers contained in the ATHAS data bank (e.g., polypropylene and polyamide 6), the predicted CP values showed mean relative errors (MREs) within ±3 %. In contrast, for polymers absent from the data bank (including poly(p-dioxanone), poly(N-vinylpyrrolidone), and starch), a positive correlation was observed between MRE and the number of missing substructures (Nms), defined as hashed identifiers present in the target polymer but absent from the ATHAS-derived feature space. Using this correlation, CP predictions for polymers with Nms > 0 were adjusted, reducing the MREs to within ±3 %. To improve accuracy, additional models employing alternative FPs were built: polyBERT FP, generated from a pre-trained BERT-based chemical language model, and OMG FP and SMiPoly FP, derived from the virtual polymer libraries OMG and SMiPoly. For polymers with Nms > 0, all alternative FPs yielded lower MREs than uncorrected Morgan fingerprints. The lowest MREs were achieved using a hybrid FP constructed from OMG and 10 % of the SMiPoly dataset, demonstrating enhanced extrapolative performance. Due to computational limits, molecular dynamics struggles to capture the temperature dependence of CP, whereas trained machine learning models may rapidly predict it for many polymers, suggesting their potential as a practical alternative.
Keywords: Polymer informatics; Machine learning; Heat capacity; Circular fingerprint; Natural language processing
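
Illustrative sketch (not from the paper): the abstract defines Nms as the hashed identifiers present in a target polymer but absent from the ATHAS-derived feature space, which amounts to a set difference. The Python below shows that count on made-up integer identifiers. In practice the identifiers would come from a fingerprinting tool (a bond diameter of 4 corresponds to a Morgan radius of 2). The function names, the example identifiers, and above all the linear form of the correction with its slope and intercept are assumptions for illustration only; the abstract states that a positive MRE-Nms correlation was used to adjust predictions but does not give the functional form.

```python
from typing import Iterable, Set


def build_feature_space(training_polymers: Iterable[Iterable[int]]) -> Set[int]:
    """Collect every hashed substructure identifier seen in the training polymers."""
    space: Set[int] = set()
    for identifiers in training_polymers:
        space.update(identifiers)
    return space


def count_missing_substructures(target_ids: Iterable[int], known_ids: Set[int]) -> int:
    """Nms: distinct identifiers in the target that the training feature space lacks."""
    return len(set(target_ids) - known_ids)


def correct_cp(cp_pred: float, n_ms: int, slope: float, intercept: float) -> float:
    """Hypothetical correction, ASSUMING the signed relative error (in percent)
    grows linearly with Nms: mre = intercept + slope * n_ms, where
    mre = 100 * (cp_pred - cp_ref) / cp_ref. Not the paper's stated method."""
    estimated_mre = intercept + slope * n_ms
    return cp_pred / (1.0 + estimated_mre / 100.0)


# Made-up identifiers standing in for hashed fingerprint bits.
training = [[101, 202, 303], [202, 404], [303, 505, 606]]
feature_space = build_feature_space(training)

in_bank_polymer = [101, 202, 303]           # every identifier already known
out_of_bank_polymer = [202, 707, 808, 707]  # 707 and 808 are unseen

n_ms_in = count_missing_substructures(in_bank_polymer, feature_space)
n_ms_out = count_missing_substructures(out_of_bank_polymer, feature_space)

# Assumed slope of 5 % MRE per missing substructure, purely for illustration.
cp_corrected = correct_cp(110.0, n_ms_out, slope=5.0, intercept=0.0)
```

With these inputs the in-bank polymer has no missing substructures, and the out-of-bank polymer has two, since the repeated identifier is counted once.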