Near Real-Time Automatic Speaker Recognition for Voice-Based Interfaces

EasyChair Preprint 14093
14 pages • Date: July 23, 2024

Abstract

In recent years, the demand for efficient and secure voice-based interfaces has surged, driven by the proliferation of smart devices and the need for hands-free interaction. This paper presents a novel approach to near real-time automatic speaker recognition aimed at enhancing the security and usability of voice-based interfaces. Our system employs advanced machine learning algorithms and robust feature extraction techniques to achieve high accuracy in speaker identification and verification. We integrate a lightweight, yet powerful, deep neural network (DNN) architecture that processes voice input with minimal latency, making it suitable for real-time applications. The proposed method leverages a combination of mel-frequency cepstral coefficients (MFCCs), voice activity detection (VAD), and speaker embeddings to create a distinctive speaker profile. Experimental results demonstrate the system's efficacy in diverse acoustic environments and its resilience to common challenges such as background noise and voice mimicry. The implementation is evaluated on a publicly available dataset, showing promising results with an average identification accuracy of 98.2% and a verification equal error rate (EER) of 1.5%. This study underscores the potential of near real-time speaker recognition systems in enhancing user authentication and personalization in voice-activated applications, paving the way for more secure and intuitive human-computer interactions.

Keyphrases: Mel Frequency Cepstral Coefficients, Near Real-Time Processing, automatic speaker recognition, feature extraction, speaker identification, speaker verification
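The abstract reports verification quality as an equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. As a rough illustration of how such a figure is commonly obtained from speaker embeddings, the sketch below scores embedding pairs with cosine similarity and sweeps a decision threshold. It is not the paper's implementation: the function names, the cosine scoring choice, and the toy scores are all assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (hypothetical scoring choice)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def equal_error_rate(genuine, impostor):
    """Approximate the EER from same-speaker and different-speaker scores.

    A trial is accepted when its score is >= the threshold. Every observed
    score is tried as a threshold, and the one where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest is kept.
    Returns (eer, threshold).
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")
    best_gap = eer = best_threshold = None
    for threshold in sorted(set(genuine) | set(impostor)):
        far = sum(s >= threshold for s in impostor) / len(impostor)
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer, best_threshold = gap, (far + frr) / 2, threshold
    return eer, best_threshold


if __name__ == "__main__":
    # Toy scores, invented for illustration only (not data from the paper).
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.1, 0.2, 0.3, 0.6]
    eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

With real data, the genuine and impostor lists would hold the `cosine_similarity` scores of same-speaker and different-speaker trial pairs. A threshold sweep over observed scores only approximates the crossing point, so evaluation toolkits often interpolate between neighbouring thresholds instead.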