AI-driven extraction and intelligent retrieval of missionary archives in Malabar: advancing preservation and accessibility with machine learning

2025-11-24

Bincy Baburaj Kaluvilla, Subhash Abel Kalarikkal, G. Thamilvanan,
AI-driven extraction and intelligent retrieval of missionary archives in Malabar: advancing preservation and accessibility with machine learning,
Performance Measurement and Metrics,
Volume 26, Issue 4,
2025,
Pages 253-267,
ISSN 1467-8047,
https://doi.org/10.1108/PMM-02-2025-0008.
(https://www.sciencedirect.com/science/article/pii/S1467804725000033)
Abstract:
Purpose
This study shows how AI improves the transcription, indexing and searchability of historical documents by utilizing AI-driven Optical Character Recognition (OCR), Handwritten Text Recognition (HTR), Named Entity Recognition (NER), machine learning-based classification and transformer-based retrieval models.
Design/methodology/approach
This study uses a computational archival science approach to analyze missionary records in Malabar by combining machine learning-based text recognition, natural language processing (NLP), document classification and AI-powered retrieval models.
Findings
The findings show that AI and ML significantly improve the speed, performance and efficiency of archival digitization. OCR achieves up to 97.5% performance for modern printed texts, while HTR models exceed 92.5% for structured handwriting, demonstrating the efficacy of deep learning in text recognition. NER models extract missionary names (91.3% F1-score) and locations (90.0% F1-score), and classification models such as Random Forest achieve 89.3% performance when categorizing archival documents. Search engines based on bidirectional encoder representations from transformers (BERT) score 93.5% Precision@10 and 91.2% Recall@10, demonstrating their superior ability to retrieve relevant archival records. Precision@10 means that, on average, 93.5% of the top ten retrieved results are relevant, while Recall@10 means that 91.2% of all relevant results appear within the top ten retrieved results.
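The two retrieval metrics defined above can be illustrated with a minimal sketch for a single query. The function name, document identifiers and relevance judgments below are hypothetical and chosen only for illustration; they are not taken from the study, whose reported figures would be averages of such per-query values.

```python
def precision_recall_at_k(retrieved, relevant, k=10):
    """Return (precision@k, recall@k) for one query.

    retrieved: ranked list of document ids returned by the search engine.
    relevant:  set of document ids judged relevant for the query.
    """
    top_k = retrieved[:k]
    # Count how many of the top-k results are judged relevant.
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / k
    # Recall is undefined when nothing is relevant; report 0.0 in that case.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked results and relevance judgments (assumed data).
retrieved = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d2", "d3", "d5", "d8", "d13"}

precision, recall = precision_recall_at_k(retrieved, relevant, k=10)
print(f"Precision@10 = {precision:.3f}, Recall@10 = {recall:.3f}")
```

In this assumed example five of the ten retrieved records are relevant, and five of the six relevant records appear in the top ten; the sixth (`d13`) was not retrieved, which lowers recall but not precision.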
Originality/value
This study presents a novel integration of AI and machine learning for the systematic extraction, classification and retrieval of historical missionary records, bridging the gap between historical preservation and computational intelligence.
Keywords: Machine learning (ML); Artificial intelligence (AI); Digitization; Archives; Preservation; History; Malabar