Title: Transformer-Based Assamese Spelling Detection Using Fine-Tuned Transformer Models

Conference: ACIIDS 2026

Tags: Assamese, Spelling Correction, Spelling Detection, Transformers

Abstract: The increasing use of regional languages in digital communication has intensified the need for accurate, context-aware spelling detection tools that can handle Assamese's rich morphology and complex orthographic patterns. This paper presents a transformer-based framework for Assamese spelling error detection, fine-tuning three multilingual pre-trained models (IndicBERT v2, XLM-RoBERTa, and mBERT) on a curated dataset of 115,925 word-level samples. Each word is represented as a tuple (word, label, corrected_word), enabling both binary error detection and lexicon-assisted correction. The models were trained using standardized preprocessing, subword tokenization, and a weighted loss function to address class imbalance. Experimental results demonstrate that XLM-RoBERTa achieves the highest detection performance, with 94.84% accuracy and an F1-score of 0.9683, while IndicBERT v2 attains the best correction accuracy (85.96%), highlighting complementary strengths across the architectures. A hybrid pipeline that combines neural detection with a Levenshtein-based nearest-neighbor correction mechanism further enhances real-world applicability. The findings underscore the effectiveness of transformer models for Assamese spelling detection and lay the groundwork for future advances in contextual correction, real-world error handling, and deployment in educational, social media, and content moderation applications.
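The abstract describes fine-tuning with a weighted loss to offset the class imbalance between correct and misspelled words. The sketch below illustrates that setup under stated assumptions: it uses the Hugging Face `transformers` API with `xlm-roberta-base` as the backbone, and the class weights, example words, and labels are placeholders, not values from the paper.

```python
# Minimal sketch: binary spelling-error detection fine-tuned with a
# weighted cross-entropy loss. Assumptions (not from the paper):
# Hugging Face `transformers`/`torch`, xlm-roberta-base backbone,
# and illustrative class weights.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2  # 0 = correct, 1 = misspelled
)

# Weighted loss to counter class imbalance; weights here are placeholders.
class_weights = torch.tensor([0.6, 1.4])
loss_fn = nn.CrossEntropyLoss(weight=class_weights)

def detection_loss(words, labels):
    """One forward pass over a batch of single words (subword-tokenized)."""
    batch = tokenizer(words, padding=True, truncation=True, return_tensors="pt")
    logits = model(**batch).logits  # shape: (batch_size, 2)
    return loss_fn(logits, torch.tensor(labels))

# Illustrative (word, label) pairs; 1 marks a misspelled form.
loss = detection_loss(["ভাষা", "ভষা"], [0, 1])
loss.backward()  # gradients for one fine-tuning step
```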
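The hybrid pipeline pairs neural detection with a Levenshtein-based nearest-neighbor lookup over a lexicon. Below is a minimal illustration of that correction step; the `levenshtein` and `correct` helpers and the tiny lexicon are hypothetical stand-ins for the paper's curated lexicon and any tie-breaking rules.

```python
# Minimal sketch of the Levenshtein nearest-neighbor correction step.
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def correct(word: str, lexicon: list[str]) -> str:
    """Return the lexicon entry nearest to `word` by edit distance."""
    return min(lexicon, key=lambda entry: levenshtein(word, entry))

lexicon = ["ভাষা", "অসম", "বানান"]  # placeholder Assamese lexicon
print(correct("ভষা", lexicon))      # -> "ভাষা"
```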
