Real-Time Voice-to-Voice Translation for Cross-Lingual Communication: Cascade Pipeline and RNN Based Approach

Authors

  • Shanza Bibi Department of Computer Science, Government Sadiq College Women University, Bahawalpur, 63100, Punjab, Pakistan.
  • Hina Sattar Department of Computer Science, Government Sadiq College Women University, Bahawalpur, 63100, Punjab, Pakistan.
  • Laraib Fatima Department of Computer Science, Government Sadiq College Women University, Bahawalpur, 63100, Punjab, Pakistan.
  • Ayesha Iqbal Department of Computer Science, Government Sadiq College Women University, Bahawalpur, 63100, Punjab, Pakistan.
  • Umar Farooq Shafi Department of Computer Science, The Islamia University of Bahawalpur, Bahawalpur, 63100, Punjab, Pakistan.

Keywords:

Real-Time Voice Translation, Voice-to-Voice Translation, Speech-to-Text Translation, Text-to-Speech Translation, Multi-Languages Translation

Abstract

Language diversity creates communication challenges, particularly in face-to-face conversations, and real-time voice-to-voice translation for cross-lingual communication can bridge these gaps. Most of the population of Pakistan speaks Urdu and is not proficient in English, so language is a major barrier to accessing information and participating in global discourse. This study addresses that barrier by using machine learning for multilingual voice translation. The system is designed to translate Pakistan’s native languages into English in support of real-time communication, and it uses a two-stage approach. First, the system is trained by combining a custom dataset with Wav2Vec 2.0, a model pre-trained on unlabeled speech, and achieves a reported accuracy of 98.76%. Second, a cascade pipeline translates text from the source language into the target language. Within the cascade pipeline, each language shows a distinct recognition accuracy that corresponds to its linguistic prominence and the availability of training data. The pipeline takes the user's voice as input from a microphone, uses Automatic Speech Recognition (ASR) to convert the speech into text [1], translates that text, and uses a Text-to-Speech (TTS) module [2] to convert the translated text back into voice. The end-to-end pipeline enables real-time communication and offers a user-friendly solution for overcoming language barriers in multilingual environments. This work significantly narrows the gaps in multilingual communication.
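
The cascade described in the abstract (speech recognition, then text translation, then speech synthesis) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `CascadePipeline`, the stage signatures, and the stand-in functions `fake_asr`, `fake_translate`, and `fake_tts` are all hypothetical, chosen only to show how the three stages chain together. In a real system the stand-ins would be replaced by an ASR model such as Wav2Vec 2.0, a translation model, and a TTS engine.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the abstract does not specify the interfaces.
ASR = Callable[[bytes], str]        # audio -> source-language text
Translator = Callable[[str], str]   # source-language text -> English text
TTS = Callable[[str], bytes]        # English text -> audio


@dataclass
class CascadePipeline:
    """Chains ASR, translation, and TTS; each stage consumes the previous output."""
    recognize: ASR
    translate: Translator
    synthesize: TTS

    def run(self, audio: bytes) -> bytes:
        text = self.recognize(audio)
        if not text.strip():
            # Nothing was recognized, so there is nothing to translate or speak.
            return b""
        translated = self.translate(text)
        return self.synthesize(translated)


# Stand-in stages (assumptions for illustration only, not real models).
def fake_asr(audio: bytes) -> str:
    # Pretend the audio bytes already encode the spoken words.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Toy lookup table in place of a translation model.
    lookup = {"shukriya": "thank you"}
    return lookup.get(text.lower(), text)


def fake_tts(text: str) -> bytes:
    # Pretend the synthesized audio is just the encoded text.
    return text.encode("utf-8")


pipeline = CascadePipeline(fake_asr, fake_translate, fake_tts)
```

Calling `pipeline.run(audio)` passes the audio through all three stages in order. Keeping each stage behind a plain callable means a stage can be swapped (for example, a different ASR model per source language) without touching the others, which is the main practical appeal of a cascade design.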

Published

2025-06-01

How to Cite

Shanza Bibi, Hina Sattar, Laraib Fatima, Ayesha Iqbal, & Umar Farooq Shafi. (2025). Real-Time Voice-to-Voice Translation for Cross-Lingual Communication: Cascade Pipeline and RNN Based Approach. Journal of Computing & Biomedical Informatics, 9(01). Retrieved from https://jcbi.org/index.php/Main/article/view/1021