AI/ML

Build the Best Real Time Speech to Text & AI Translation System with Meta’s SeamlessM4T & FastAPI


Overview

SeamlessM4T sets a new standard for translation quality. On the FLEURS benchmark, it achieves 20% higher BLEU (Bilingual Evaluation Understudy) scores than previous state-of-the-art models in direct speech-to-text translation. Compared with strong cascaded systems, it improves into-English translation quality by 1.3 BLEU points in speech-to-text, with even larger gains reported for speech-to-speech translation.

SeamlessM4T also holds up well on noisy recordings and varied speakers. In speech-to-text evaluation, it is 38% more robust to background noise and 49% more robust to speaker variation than the current leading models.

SeamlessM4T Large also outperforms OpenAI's Whisper Large V2 on Word Error Rate (WER), where lower is better: it reduces WER by 45% across 77 supported languages.

 

Key Features

  • Speech-to-Text Transcription: Automatically detects speech in supported languages and generates transcripts.
  • Multilingual Support: Processes audio in English, Hindi, Spanish, and other supported languages.
  • Real-Time Translation: Translates transcriptions into user-selected target languages.
  • Web Integration: Provides a user-friendly HTML web interface for recording and processing audio.
  • High Accuracy: Uses the SeamlessM4T model for state-of-the-art translation quality.
  • Web-Based Accessibility: Runs in the browser, so users do not need to install extra software.

 

How It Works

1. The system decodes the base64 encoded audio file and processes it using Pydub to ensure compatibility with SeamlessM4T.

2. SeamlessM4T handles transcription and translation tasks.

3. The application also uses Google Translate (through Googletrans) to handle translations in natively spoken languages, which improves efficiency.
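As a rough illustration of step 1, the sketch below decodes a base64 payload and normalizes it with Pydub. The function names, the handling of an optional data-URL prefix, and the 16 kHz mono WAV target are assumptions made for illustration, not details taken from the application's source. Pydub also relies on ffmpeg being installed to read compressed browser formats.

```python
import base64
import io


def decode_base64_audio(base64_audio: str) -> bytes:
    """Return raw audio bytes from a base64 string.

    Assumption: a browser may send a data URL ("data:audio/webm;base64,..."),
    so everything up to the first comma is dropped when that prefix is present.
    """
    if base64_audio.startswith("data:") and "," in base64_audio:
        base64_audio = base64_audio.split(",", 1)[1]
    return base64.b64decode(base64_audio)


def to_wav_16k_mono(raw_audio: bytes, out_path: str = "input.wav") -> str:
    """Convert audio bytes to a 16 kHz mono WAV file (assumed model input)."""
    # Imported here so the decoding helper above works without Pydub/ffmpeg.
    from pydub import AudioSegment

    segment = AudioSegment.from_file(io.BytesIO(raw_audio))
    segment = segment.set_frame_rate(16000).set_channels(1)
    segment.export(out_path, format="wav")
    return out_path
```

The resulting WAV path can then be handed to the model for transcription and translation (step 2).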

 

Setup Instructions

1. Clone the Repository:

git clone https://github.com/facebookresearch/seamless_communication.git 

Official SeamlessM4T GitHub repository: https://github.com/facebookresearch/seamless_communication

2. Install Dependencies: 

Before you begin, ensure you have the following installed:

  • Python 3.8-3.10
  • FastAPI
  • Pydantic for data validation
  • Pydub for audio processing
  • Googletrans for additional translations
  • Sentencepiece
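A quick way to confirm the environment matches this list is a small check script. The sketch below is only an illustration: the helper names are hypothetical, and the import names in REQUIRED_MODULES are assumed to correspond to the packages listed above.

```python
import importlib.util
import sys

# Import names assumed to correspond to the packages listed above.
REQUIRED_MODULES = ["fastapi", "pydantic", "pydub", "googletrans", "sentencepiece"]


def python_version_supported(version=None) -> bool:
    """True when the (major, minor) version falls in the 3.8-3.10 range."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 8 <= minor <= 10


def missing_modules(names=REQUIRED_MODULES) -> list:
    """Return the modules from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_version_supported())
    print("Missing modules:", missing_modules() or "none")
```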

3. Prepare the SeamlessM4T Model

Set up the Translator object that initializes the SeamlessM4T model. The speech-to-text and translation tasks both depend on it.

Translator specification:

  • model_name = "seamlessM4T_v2_large"
  • vocoder_name = "vocoder_v2"
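A minimal sketch of this setup, based on the Translator class from the seamless_communication package. The helper names (pick_device_name, build_translator, speech_to_text), the float16/float32 choice, and the use of the "S2TT" task are illustrative assumptions and should be checked against the installed version of the repository.

```python
MODEL_NAME = "seamlessM4T_v2_large"
VOCODER_NAME = "vocoder_v2"


def pick_device_name(cuda_available: bool) -> str:
    """Choose a device string: GPU when available, otherwise CPU."""
    return "cuda:0" if cuda_available else "cpu"


def build_translator():
    """Create the Translator once at startup, since loading the model is slow."""
    # Heavy imports stay inside the function so this module can be imported
    # on machines without torch or seamless_communication installed.
    import torch
    from seamless_communication.inference import Translator

    device_name = pick_device_name(torch.cuda.is_available())
    # Assumption: half precision on GPU, full precision on CPU.
    dtype = torch.float16 if device_name.startswith("cuda") else torch.float32
    return Translator(MODEL_NAME, VOCODER_NAME, torch.device(device_name), dtype=dtype)


def speech_to_text(translator, wav_path: str, tgt_lang: str = "eng") -> str:
    """Run speech-to-text translation (task "S2TT") on an audio file."""
    text_output, _ = translator.predict(
        input=wav_path, task_str="S2TT", tgt_lang=tgt_lang
    )
    return str(text_output[0])
```

Building the translator once and reusing it across requests avoids reloading the model for every recording.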

 

4. Run the Application

Start the FastAPI server:

uvicorn main:app --reload

 

5. Access the Web Interface

Open your browser and navigate to “http://127.0.0.1:8000/trans/” to interact with the app.

 

User Guide

1. Record Audio

  • Select the desired target language from the dropdown menu.
  • Click the "Start Recording" button to begin recording.
  • Click "Stop Recording" to finish the session.

2. View Translations

  • Transcriptions and translations appear in real time in the interface.
  • Translations are displayed by language code (e.g., ENG, HIN).

 

Supported Languages by SeamlessM4T

The full list of supported languages is available in the official SeamlessM4T repository.

 

API Integration in FastAPI

1) Transcription API

  • URL: /trans/
  • Method: GET
  • Purpose: HTML page with audio recording functionality and language selection.

2) Save Audio API

  • URL: /save_audio/
  • Method: POST
  • Request Body: { "base64_audio": "<base64_encoded_audio>", "tgt_lang": "<target_language_code>" }
  • Response: { "message": "Sent data", "data": { "<lang_code>": "<translated_text>" } }
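To show how a client might call the Save Audio API, here is a small sketch that uses only the Python standard library. The helper names are hypothetical, the URL assumes the local server address used earlier in this guide, and the lowercase language code in the usage note is an assumption about what the server accepts.

```python
import base64
import json
import urllib.request

# Assumption: the server runs locally on the default uvicorn port.
SAVE_AUDIO_URL = "http://127.0.0.1:8000/save_audio/"


def build_save_audio_payload(audio_bytes: bytes, tgt_lang: str) -> dict:
    """Build the JSON body described above for POST /save_audio/."""
    return {
        "base64_audio": base64.b64encode(audio_bytes).decode("ascii"),
        "tgt_lang": tgt_lang,
    }


def post_audio(audio_bytes: bytes, tgt_lang: str, url: str = SAVE_AUDIO_URL) -> dict:
    """Send audio to the server and return the "data" mapping of translations."""
    body = json.dumps(build_save_audio_payload(audio_bytes, tgt_lang)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        reply = json.loads(response.read().decode("utf-8"))
    return reply["data"]
```

For example, with the server running, post_audio(open("sample.wav", "rb").read(), "hin") would return a dictionary keyed by language code, where sample.wav is a placeholder file name.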

Ready to elevate your business with cutting edge AI and ML solutions? Contact us today to harness the power of our expert technology services and drive innovation. 
