Build the Best Real Time Speech to Text & AI Translation System with Meta’s SeamlessM4T & FastAPI

Overview

The SeamlessM4T model sets a high standard for translation quality. On the FLEURS benchmark, it achieves BLEU (Bilingual Evaluation Understudy) scores 20% higher than previous state-of-the-art models. For into-English translation in speech-to-text, quality improves by 1.3 BLEU points, and the model shows even larger BLEU gains when compared with strong cascaded systems.

SeamlessM4T also performs well on noisy recordings and across different speakers. In speech-to-text evaluation, it is 38% more robust to background noise and 49% more robust to speaker variation than the current leading models.

SeamlessM4T Large also outperforms OpenAI's Whisper Large V2 on Word Error Rate (WER) by a wide margin: WER drops by 45% across 77 supported languages.

Key Features

Speech to Text: Automatically detects speech in supported languages and generates transcripts.
Multilingual Support: Processes audio in several languages, including English, Hindi, Spanish, and other supported languages.
Real-Time Translation: Translates transcriptions into user-selected target languages.
Web Integration: Provides a user-friendly HTML-based web interface for recording and processing audio.
High Accuracy: Uses the SeamlessM4T model for state-of-the-art translation quality.
Browser-Based Access: Runs in the browser, so users do not need to install extra software.

How It Works

1. The system decodes the base64 encoded audio file and processes it using Pydub to ensure compatibility with SeamlessM4T.

2. SeamlessM4T handles transcription and translation tasks.

3. The application additionally uses Google Translate (through Googletrans) to handle translations in natively spoken languages, which improves efficiency.
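Step 1 can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function names decode_base64_audio and to_model_wav are hypothetical, the handling of a "data:" URL prefix is an assumption about what a browser recorder might send, and the 16 kHz mono target is an assumption about the input format the model expects. Pydub is imported lazily because it is a third-party package that relies on ffmpeg for non-WAV formats.

```python
import base64
import io


def decode_base64_audio(base64_audio: str) -> bytes:
    """Return the raw audio bytes carried in a base64 string."""
    payload = base64_audio.strip()
    # Assumption: browsers may send a data URL such as "data:audio/wav;base64,...".
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload)


def to_model_wav(raw_audio: bytes, sample_rate: int = 16000) -> bytes:
    """Re-encode arbitrary audio bytes as mono WAV at the given sample rate.

    Assumption: 16 kHz mono WAV is a suitable input format for the model.
    """
    from pydub import AudioSegment  # third-party; needs ffmpeg for non-WAV input

    segment = AudioSegment.from_file(io.BytesIO(raw_audio))
    segment = segment.set_frame_rate(sample_rate).set_channels(1)
    buffer = io.BytesIO()
    segment.export(buffer, format="wav")
    return buffer.getvalue()
```

A caller would typically chain the two: to_model_wav(decode_base64_audio(payload)), then pass the resulting WAV bytes (or a temporary file written from them) to the model.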

Setup Instructions

1. Clone the Repository:

git clone https://github.com/facebookresearch/seamless_communication.git

Official SeamlessM4T GitHub repository: https://github.com/facebookresearch/seamless_communication

2. Install Dependencies:

Before you begin, ensure you have the following installed:
Python version 3.8-3.10
FastAPI framework
Pydantic for data validation
Pydub for audio processing
Googletrans for additional translations
Sentencepiece
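A quick way to confirm the prerequisites above is a small check script. The sketch below is only a convenience and assumes the import names fastapi, pydantic, pydub, googletrans, and sentencepiece for the listed packages; the helper names are hypothetical.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above.
REQUIRED_MODULES = ["fastapi", "pydantic", "pydub", "googletrans", "sentencepiece"]


def python_version_supported(version=None) -> bool:
    """True if the interpreter version falls in the 3.8-3.10 range."""
    major, minor = tuple(version or sys.version_info)[:2]
    return (3, 8) <= (major, minor) <= (3, 10)


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling missing_modules() with no arguments returns the packages that still need to be installed, and python_version_supported() reports whether the running interpreter is in the supported range.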

3. Prepare the SeamlessM4T Model

Set up the Translator, which initializes the SeamlessM4T model required for speech-to-text and translation tasks.

Translator specification:

- model_name = "seamlessM4T_v2_large"
- vocoder_name = "vocoder_v2"
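The specification above might be wired up along these lines. This is a hedged sketch: TranslatorSpec and load_translator are hypothetical names, and the Translator import path and argument order follow the seamless_communication repository as commonly documented, so verify them against the version you cloned. The device and dtype selection is an assumption, not something the setup above prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslatorSpec:
    """Names from the translator specification above."""
    model_name: str = "seamlessM4T_v2_large"
    vocoder_name: str = "vocoder_v2"


def load_translator(spec: TranslatorSpec = TranslatorSpec()):
    """Build a Translator for the given spec (downloads weights on first use)."""
    # Heavy third-party imports are kept local so the spec can be used without them.
    import torch
    from seamless_communication.inference import Translator

    # Assumption: use the GPU with half precision when available, else CPU.
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda:0" if use_cuda else "cpu")
    dtype = torch.float16 if use_cuda else torch.float32
    return Translator(spec.model_name, spec.vocoder_name, device, dtype=dtype)
```

Loading the model once at application startup, rather than per request, avoids repeating the expensive initialization.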

4. Run the Application

Start the FastAPI server:

uvicorn main:app --reload

5. Access the Web Interface

Open your browser and navigate to “http://127.0.0.1:8000/trans/” to interact with the app.

User Guide

1. Record Audio

Select the desired target language from the dropdown menu.
Click the "Start Recording" button to begin recording.
Click "Stop Recording" to finish the session.

2. View Translations

Transcriptions and translations appear in real time in the interface.
Translations are displayed by language code (e.g., ENG, HIN).

Supported Languages by SeamlessM4T

API Integration in FastAPI

1) Transcription API

URL: /trans/
Method: GET
Purpose: HTML page with audio recording functionality and language selection.
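A route like this could be sketched as below. The page markup, the LANGUAGES mapping, and the names build_page and create_app are all hypothetical; the real page would also carry the JavaScript that records audio and posts it to the server, which is omitted here.

```python
import html
from typing import Dict

# Hypothetical subset of target languages (code -> display name).
LANGUAGES: Dict[str, str] = {"eng": "English", "hin": "Hindi", "spa": "Spanish"}


def build_page(languages: Dict[str, str]) -> str:
    """Render a bare-bones page with a language dropdown and record buttons."""
    options = "\n".join(
        f'<option value="{html.escape(code)}">{html.escape(name)}</option>'
        for code, name in languages.items()
    )
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n"
        f'<select id="tgt_lang">\n{options}\n</select>\n'
        '<button id="start">Start Recording</button>\n'
        '<button id="stop">Stop Recording</button>\n'
        "</body>\n</html>"
    )


def create_app():
    """Create a FastAPI app that serves the page at /trans/."""
    from fastapi import FastAPI
    from fastapi.responses import HTMLResponse

    app = FastAPI()

    @app.get("/trans/", response_class=HTMLResponse)
    def trans_page() -> str:
        return build_page(LANGUAGES)

    return app
```

For the uvicorn main:app command to find the application, main.py would need a module-level line such as app = create_app().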

2) Save Audio API

URL: /save_audio/
Method: POST

Request Body: { "base64_audio": "<base64_encoded_audio>", "tgt_lang": "<target_language_code>" }

Response: { "message": "Sent data", "data": { "<lang_code>": "<translated_text>" } }
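The request and response shapes above could be served by an endpoint along these lines. This is a sketch under assumptions: handle_save_audio, create_app, and AudioRequest are hypothetical names, and the translate callable stands in for whatever function runs SeamlessM4T on the decoded audio. Only the field names, the URL, and the response layout come from the specification above.

```python
import base64
from typing import Callable, Dict


def handle_save_audio(
    base64_audio: str,
    tgt_lang: str,
    translate: Callable[[bytes, str], str],
) -> Dict[str, object]:
    """Decode the audio, translate it, and build the documented response."""
    audio_bytes = base64.b64decode(base64_audio)
    translated_text = translate(audio_bytes, tgt_lang)
    return {"message": "Sent data", "data": {tgt_lang: translated_text}}


def create_app(translate: Callable[[bytes, str], str]):
    """Create a FastAPI app exposing POST /save_audio/."""
    from fastapi import FastAPI
    from pydantic import BaseModel

    class AudioRequest(BaseModel):
        # Field names match the documented request body.
        base64_audio: str
        tgt_lang: str

    app = FastAPI()

    @app.post("/save_audio/")
    def save_audio(request: AudioRequest) -> Dict[str, object]:
        return handle_save_audio(request.base64_audio, request.tgt_lang, translate)

    return app
```

Keeping the core logic in a plain function makes it easy to exercise without a running server, for example by passing a stub translate callable that returns a fixed string.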

Experts in AI, ML, and automation at OneClick IT Consultancy

AI Force

AI Force at OneClick IT Consultancy pioneers artificial intelligence and machine learning solutions. We drive COE initiatives by developing intelligent automation, predictive analytics, and AI-driven applications that transform businesses.
