Advancing ovarian cancer outcomes with CTGAN-enhanced hybrid machine learning approach
Rahman Shafique, Ahmad Sami Al-Shamayleh, Sarath Kumar Posa, Abid Ishaq, Furqan Rustam, Gyu Sang Choi,
Advancing ovarian cancer outcomes with CTGAN-enhanced hybrid machine learning approach,
Knowledge-Based Systems,
Volume 328,
2025,
114206,
ISSN 0950-7051,
https://doi.org/10.1016/j.knosys.2025.114206.
(https://www.sciencedirect.com/science/article/pii/S095070512501247X)
Abstract: Ovarian cancer is a gynecologic malignancy with a high mortality rate, owing to its often asymptomatic course and late diagnosis. Early detection is vital to improving patient outcomes, and machine learning techniques have shown promise in assisting with diagnosis and prognosis. However, a lack of data can make it difficult to achieve reliable results. The objective of this study is to assess the ability of machine learning to detect ovarian cancer when only a limited amount of clinical data is available. To address the limited-data issue, we employ the Conditional Tabular Generative Adversarial Network (CTGAN), a technique that learns from the original data to generate synthetic records highly correlated with it, thereby increasing the size of the dataset. We then develop an ensemble model named Decision Logistic Forest (DLF) that combines three models (logistic regression, decision tree, and random forest) using majority voting and probability-based voting criteria. The proposed DLF achieved an accuracy of 99% with CTGAN data augmentation, demonstrating its potential in assisting with ovarian cancer diagnosis and prognosis. A statistical t-test demonstrates the significance of the proposed approach compared with other approaches.
Keywords: Machine learning; Ovarian cancer; Ensemble learning; Clinical data; Data augmentation
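
Illustrative note: the abstract names two ways of combining the three base models, majority voting and probability-based voting, without giving the exact rule used in DLF. The sketch below is not the authors' implementation; it only shows how the two criteria can differ for a single sample. The function names, the 0.5 threshold, and the example probabilities are hypothetical.

```python
from statistics import mean


def hard_vote(probs, threshold=0.5):
    """Majority voting: each model casts one vote (1 = positive, 0 = negative)."""
    votes = [1 if p >= threshold else 0 for p in probs]
    # Positive only if strictly more than half of the models vote positive.
    return 1 if sum(votes) * 2 > len(votes) else 0


def soft_vote(probs, threshold=0.5):
    """Probability-based voting: average the positive-class probabilities."""
    return 1 if mean(probs) >= threshold else 0


# Hypothetical positive-class probabilities for one sample, standing in for
# the outputs of logistic regression, a decision tree, and a random forest.
probs = [0.45, 0.40, 0.95]

print("majority vote:", hard_vote(probs))     # two of three models vote 0
print("probability vote:", soft_vote(probs))  # mean probability is 0.6
```

Here two models lean negative while one is strongly positive, so the majority vote returns 0 and the averaged probability returns 1. Cases like this are where the choice of voting criterion matters.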