Deep learning on genomic sequences for rapid identification of drug-resistant tuberculosis
Sunil Bajeja, Kandula Jayapaul, Poonam Gaur, Chinnem Rama Mohan, Abhinav Pathak, Mukesh Madanan,
Deep learning on genomic sequences for rapid identification of drug-resistant tuberculosis,
Indian Journal of Tuberculosis,
Volume 72, Supplement 3,
2025,
Pages S168-S175,
ISSN 0019-5707,
https://doi.org/10.1016/j.ijtb.2025.11.020.
(https://www.sciencedirect.com/science/article/pii/S0019570725002665)
Abstract:
Background
Tuberculosis (TB), caused by the bacterium Mycobacterium tuberculosis (Mtb), remains a major global public health problem, particularly as drug-resistant forms such as multidrug-resistant (MDR) and extensively drug-resistant (XDR) TB become more common. Conventional methods for diagnosing drug resistance are slow and resource-intensive, which limits their use in low-resource settings. Advances in next-generation sequencing (NGS) provide large amounts of genomic data that can reveal resistance-conferring mutations, but interpreting these complex data requires robust computational methods.
Methods
This study develops and evaluates deep learning models that rapidly and accurately predict drug resistance phenotypes from raw Mtb genome sequences. We used large, publicly available whole-genome sequencing datasets annotated with drug resistance test results. Several model architectures were evaluated, including Transformer-based models, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs), along with different data encoding schemes. Data preparation included quality control, variant calling, and normalisation. To ensure that all resistance phenotypes were fairly represented, models were trained with stratified training-validation splits. Performance was measured using accuracy, precision, recall, F1-score, and AUC-ROC.
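The stratified training-validation split mentioned above can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so the function name `stratified_split`, the 20 % validation fraction, and the example labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) so that each label keeps roughly
    the same proportion in both subsets.

    Hypothetical helper: the 0.2 fraction and the fixed seed are
    illustrative defaults, not values taken from the study.
    """
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # At least one validation sample per class; a class with a
        # single isolate would therefore end up only in validation.
        n_val = max(1, round(len(indices) * val_fraction))
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)

# Made-up, imbalanced phenotype labels for 100 isolates.
labels = ["susceptible"] * 80 + ["MDR"] * 15 + ["XDR"] * 5
train_idx, val_idx = stratified_split(labels)
```

Splitting within each phenotype, rather than over the pooled isolates, keeps rare classes such as XDR from disappearing from the validation set by chance.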
Results
The Transformer model outperformed the CNN and RNN architectures, with a validation accuracy of 93.5 %, a precision of 91.8 %, a recall of 89.9 %, an F1-score of 90.8 %, and an AUC-ROC of 95.7 %. It also converged faster and reached lower training and validation loss, suggesting that it captures both local and global sequence relationships. The deep learning models also achieved better predictive performance than common machine learning baselines such as logistic regression, random forests, and gradient boosting.
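As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two. The sketch below uses only the figures quoted in the Results; the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported for the Transformer model.
reported_f1 = f1_score(0.918, 0.899)
print(f"{reported_f1:.3f}")  # prints 0.908
```

The result agrees with the reported F1-score of 90.8 %.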
Conclusion
Deep learning methods, especially Transformer-based models, show considerable potential for rapid genome-based prediction of drug-resistant TB. These models can learn features automatically from raw sequences, which reduces the need for manual feature engineering and enables accurate assessment at scale. Future work will focus on incorporating more diverse datasets, improving model interpretability, and integrating multi-omics data to make predictions more accurate and more useful in clinical settings.
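To make "learning from raw sequences" concrete, the sketch below shows one common way to turn a nucleotide string into numeric model input. The abstract does not state which encoding the authors used, so one-hot encoding, the function name `one_hot_encode`, and the handling of ambiguous bases are assumptions.

```python
NUCLEOTIDES = "ACGT"

def one_hot_encode(sequence):
    """Map each base to a 4-element indicator vector in A, C, G, T order.

    Ambiguous or unknown bases (for example N) become all zeros.
    This is an assumed convention, not one taken from the study.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded

encoded = one_hot_encode("ACGTN")
# encoded == [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
```

A representation like this involves no hand-crafted features; a CNN, RNN, or Transformer would be left to learn resistance-associated patterns from it directly.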
Keywords: Tuberculosis; Drug resistance prediction; Deep learning; Whole-genome sequencing; Transformer models