Real-Time Voice-to-Voice Translation for Cross-Lingual Communication: Cascade Pipeline and RNN Based Approach
Keywords:
Real-Time Voice Translation, Voice-to-Voice Translation, Speech-to-Text Translation, Text-to-Speech Translation, Multi-Languages Translation
Abstract
Language diversity presents communication challenges, particularly in face-to-face conversations. Real-time voice-to-voice translation for cross-lingual communication can bridge these gaps. Most of the population of Pakistan speaks Urdu and is not proficient in English, so language is a major barrier to accessing information and participating in global discourse. This study addresses that barrier by using machine learning for multilingual voice translation. The proposed system is designed to translate Pakistan’s native languages into English, supporting real-time communication. It uses a two-stage approach. First, the system is trained by combining a custom dataset with Wav2Vec 2.0, which is pre-trained on unlabeled speech, and achieves 98.76% accuracy. Second, a cascade pipeline is employed to translate text from the source language into the target language. Within the cascade pipeline, recognition accuracy varies by language, reflecting each language's linguistic prominence and the availability of training data. The pipeline takes the user's voice as input from a microphone and uses Automatic Speech Recognition (ASR) to convert speech into text [1]. A Text-to-Speech (TTS) module [2] then converts the translated text back into speech. The complete pipeline enables real-time communication and offers a user-friendly solution for overcoming the language barrier in a multilingual environment. This work helps narrow the gaps in multilingual communication.
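The cascade described in the abstract (speech recognition, then text translation, then speech synthesis) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `CascadePipeline`, `fake_asr`, `fake_translate`, `fake_tts`, and `TOY_LEXICON` are hypothetical, and the three stages are toy stand-ins for a real ASR model (such as one based on Wav2Vec 2.0), a translation model, and a TTS engine. The sketch only shows how the stages chain together.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    source_text: str       # output of the ASR stage
    translated_text: str   # output of the translation stage
    audio: bytes           # output of the TTS stage


@dataclass
class CascadePipeline:
    """Chains three independent stages; each stage can be swapped out."""
    recognize: Callable[[bytes], str]    # ASR: audio -> source-language text
    translate: Callable[[str], str]      # MT: source text -> target text
    synthesize: Callable[[str], bytes]   # TTS: target text -> audio

    def run(self, audio_in: bytes) -> CascadeResult:
        source_text = self.recognize(audio_in)
        translated_text = self.translate(source_text)
        audio_out = self.synthesize(translated_text)
        return CascadeResult(source_text, translated_text, audio_out)


# --- Toy stand-ins (assumptions for illustration only) ---

def fake_asr(audio: bytes) -> str:
    # Pretends the "audio" bytes already hold the spoken words.
    return audio.decode("utf-8")


# Hypothetical two-word lexicon of romanized Urdu -> English.
TOY_LEXICON = {"salaam": "hello", "shukriya": "thank you"}


def fake_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(w, w) for w in text.lower().split())


def fake_tts(text: str) -> bytes:
    # Pretends encoded text is synthesized audio.
    return text.encode("utf-8")


pipeline = CascadePipeline(fake_asr, fake_translate, fake_tts)
result = pipeline.run(b"salaam shukriya")
print(result.translated_text)
```

Because each stage is just a callable, an error in recognition propagates to translation and synthesis, which is the usual trade-off of a cascade design against its modularity.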
License
This is an open-access article published by the Research Center of Computing & Biomedical Informatics (RCBI), Lahore, Pakistan, under the CC BY 4.0 International License.