Digitizing audiograms with deep learning: structured data extraction and pseudonymization for hearing big data
Sunghwa You, Chanbeom Kwak, Chul Young Yoon, Young Joon Seo,
Digitizing audiograms with deep learning: structured data extraction and pseudonymization for hearing big data,
Hearing Research,
Volume 464,
2025,
109337,
ISSN 0378-5955,
https://doi.org/10.1016/j.heares.2025.109337.
(https://www.sciencedirect.com/science/article/pii/S0378595525001558)
Abstract:
Purpose
The diagnosis of hearing loss relies on pure-tone audiometry (PTA); however, audiograms are often stored as unstructured images, limiting their integration into electronic medical records (EMRs) and common data models (CDMs). This study developed a deep learning-based system to digitize audiograms, converting them into structured numerical data for large-scale hearing big data collection.
Methods
A convolutional neural network (CNN) was trained to extract numerical frequency and threshold values from audiograms. The system consists of four modules: preprocessing, pattern classification, image analysis, and post-processing. Optical character recognition (OCR) was employed to extract patient data, which were then pseudonymized to prevent leakage of personal and sensitive information. The model was trained using 8847 audiometric symbols and tested using 2443 symbols.
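The abstract does not state how the OCR-extracted patient data were pseudonymized. As a minimal sketch only, assuming a keyed hash (HMAC-SHA256) is an acceptable scheme, the step could look like the following. The field names `patient_id`, `name`, and `pseudo_id` and both helper functions are hypothetical and are not taken from the paper.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym for patient_id (assumed scheme: HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    # Truncate to 16 hex characters for readability; this length is an arbitrary choice.
    return digest.hexdigest()[:16]


def pseudonymize_record(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers from an OCR-extracted record and attach a pseudonym.

    The keys "patient_id" and "name" are hypothetical examples of identifiers.
    """
    cleaned = {k: v for k, v in record.items() if k not in ("patient_id", "name")}
    cleaned["pseudo_id"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned
```

Because the hash is keyed, the same patient maps to the same pseudonym across audiograms, while re-identification requires access to the secret key.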
Results
The model achieved accuracies of 95.01 % and 98.18 % for the right and left ears, respectively. It processed audiograms 17.72 times faster than manual digitization, reducing processing time from 63.27 s to 3.57 s per audiogram. The structured data format allows seamless integration into big data and CDMs, ensuring compliance with pseudonymization and anonymization protocols.
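The reported speedup follows directly from the two per-audiogram times. The short check below reproduces it. The function names are illustrative, and only the 63.27 s and 3.57 s figures come from the abstract.

```python
MANUAL_SECONDS = 63.27     # manual digitization time per audiogram (from the abstract)
AUTOMATED_SECONDS = 3.57   # model processing time per audiogram (from the abstract)


def speedup(manual_s: float, automated_s: float) -> float:
    """Ratio of manual to automated processing time."""
    return manual_s / automated_s


def seconds_saved(manual_s: float, automated_s: float, n_audiograms: int = 1) -> float:
    """Total time saved when digitizing n_audiograms automatically."""
    return (manual_s - automated_s) * n_audiograms


# 63.27 / 3.57 rounds to 17.72, matching the reported figure.
print(round(speedup(MANUAL_SECONDS, AUTOMATED_SECONDS), 2))
```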
Discussion
The model improves data accessibility and scalability for both clinical and research applications. Unlike previous studies that primarily focused on classification or prediction, this framework ensures a structured numerical data output while adhering to data pseudonymization regulations.
Conclusion
This deep learning-based system enhanced the efficiency and accuracy of audiogram digitization, facilitating the construction of hearing big data, integration with CDMs, AI-driven diagnostics, and large-scale hearing data analysis.
Keywords: Digitization; Pseudonymization; Hearing big data; Deep learning algorithms; Audiogram