Sarvam AI

Speech to Text, perfected for India

Trust every transcript, even with background noise, multiple accents, or mid-sentence language switches.

Flexible output: same audio, multiple formats

Transcribe

With formatting and number normalization.

Output
मेरा फोन नंबर है 9840950950

Translate

From Indic languages to English.

Output
My phone number is 9840950950

Transliteration

Indian languages written in English letters.

Output
mera phone number hai 9840950950

Verbatim

Preserves fillers and spoken numbers.

Output
मेरा फोन नंबर है नौ आठ चार zero नौ पांच zero नौ पांच zero
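
The relationship between the Verbatim and Transcribe outputs above can be illustrated with a toy number-normalization pass. This is a minimal sketch, not Sarvam's implementation: the `DIGIT_WORDS` table and the `normalize_digits` helper are hypothetical, cover only the digit words in the sample, and assume digits are spoken one at a time.

```python
# Hypothetical lookup table: only the digit words that appear in the sample.
DIGIT_WORDS = {
    "zero": "0",
    "चार": "4",
    "पांच": "5",
    "आठ": "8",
    "नौ": "9",
}


def normalize_digits(verbatim: str) -> str:
    """Collapse each run of spoken digit words into one numeral string."""
    out = []   # tokens of the normalized transcript
    run = []   # digits collected from the current run of digit words
    for token in verbatim.split():
        digit = DIGIT_WORDS.get(token.lower())
        if digit is not None:
            run.append(digit)
            continue
        if run:  # a non-digit word ends the current run
            out.append("".join(run))
            run = []
        out.append(token)
    if run:      # flush a run that ends the sentence
        out.append("".join(run))
    return " ".join(out)


verbatim = "मेरा फोन नंबर है नौ आठ चार zero नौ पांच zero नौ पांच zero"
print(normalize_digits(verbatim))
```

A production normalizer would also need to handle compound numbers, dates, and currency, which this sketch deliberately ignores.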

Turn speech into text you can trust

Seamless code-mixing

HCG MCC Hospital ஒரு ground breaking achievement பண்ணிட்டாங்க.

Telephony-optimized

नमस्कार डब्ल्यू सी बैंक में संपर्क करने के लिए आपका धन्यवाद

Handle noisy audio

अच्छा, मौजूदा सरकार का कामकाज आपको कैसा लगता है?

Powering real-world voice solutions

Voice agents

Real-time transcription for live voice agents and customer interactions.

Customer support

Sales & lead qualification

Edtech tutors

Social & companion bots

Made for developers. Scales for enterprises

22 Indian languages with automatic detection

Comprehensive coverage across all major Indian languages with seamless code-mixing support.

Streaming & batch APIs

Real-time for voice agents, batch for analytics.

Speaker diarization

Differentiate speakers in conversations.

Domain prompting

Boost accuracy for specialized vocabulary.

Plug & play integrations

LiveKit / Pipecat: deploy a voice agent in under 10 minutes.

Median latency: <250 ms

Minutes transcribed: 100M+

Uptime: >99.5%

Your questions, answered

Sarvam Speech to Text supports 22 Indian languages including Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Odia, and English (Indian accent). The API supports automatic language detection and handles code-mixed audio seamlessly.
Saaras v3 is our latest speech-to-text model that auto-detects the spoken language and provides accurate transcription across all 22 supported Indian languages. It handles code-mixed audio seamlessly and is optimized for both real-time and batch processing.
We offer three API variants: a REST API for files under 30 seconds with synchronous processing, a Batch API for files up to 1 hour with speaker diarization and timestamps, and a Streaming API for real-time transcription over WebSocket.
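
The choice between the three variants can be sketched as a small routing function. This is only an illustration of the limits stated above (under 30 seconds, up to 1 hour, real-time); the `ApiVariant` enum and `choose_variant` helper are hypothetical names and are not part of any Sarvam SDK.

```python
from enum import Enum
from typing import Optional


class ApiVariant(Enum):
    REST = "rest"
    BATCH = "batch"
    STREAMING = "streaming"


REST_MAX_SECONDS = 30         # REST: files under 30 seconds
BATCH_MAX_SECONDS = 60 * 60   # Batch: files up to 1 hour


def choose_variant(
    duration_seconds: Optional[float],
    live: bool = False,
    need_diarization: bool = False,
) -> ApiVariant:
    """Pick a variant from the audio duration and the caller's needs."""
    if live:
        # Real-time audio has no fixed duration, so it always streams.
        return ApiVariant.STREAMING
    if duration_seconds is None:
        raise ValueError("duration_seconds is required for recorded files")
    if duration_seconds > BATCH_MAX_SECONDS:
        raise ValueError("file is longer than 1 hour; split it first")
    if need_diarization or duration_seconds >= REST_MAX_SECONDS:
        # Diarization and timestamps are described for the Batch API.
        return ApiVariant.BATCH
    return ApiVariant.REST


print(choose_variant(12.0))
print(choose_variant(12.0, need_diarization=True))
print(choose_variant(None, live=True))
```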
We support 10+ formats, including MP3, WAV, AAC, OGG, Opus, FLAC, M4A, AMR, WMA, and WebM. The Streaming API supports WAV and raw PCM at a 16 kHz sample rate.
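
A client-side pre-flight check for these formats might look like the sketch below. The helper names are hypothetical; treating `.pcm` as the extension for raw PCM, and assuming 16-bit mono samples when sizing streaming chunks, are assumptions that the text above does not confirm.

```python
from pathlib import Path

# Formats named above; "10+" suggests the real list may be longer.
FILE_FORMATS = {"mp3", "wav", "aac", "ogg", "opus", "flac", "m4a", "amr", "wma", "webm"}
# Assumption: raw PCM files carry a ".pcm" extension.
STREAMING_FORMATS = {"wav", "pcm"}
STREAMING_SAMPLE_RATE_HZ = 16_000


def is_supported(filename: str, streaming: bool = False) -> bool:
    """Check the file extension against the format list for the chosen API."""
    ext = Path(filename).suffix.lower().lstrip(".")
    allowed = STREAMING_FORMATS if streaming else FILE_FORMATS
    return ext in allowed


def pcm_chunk_bytes(
    chunk_ms: int,
    sample_rate_hz: int = STREAMING_SAMPLE_RATE_HZ,
    bytes_per_sample: int = 2,  # assumption: 16-bit samples
    channels: int = 1,          # assumption: mono audio
) -> int:
    """Size in bytes of one raw PCM chunk covering chunk_ms milliseconds."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


print(is_supported("call.mp3"))                  # file-based APIs
print(is_supported("call.mp3", streaming=True))  # streaming API
print(pcm_chunk_bytes(100))                      # bytes per 100 ms chunk
```

Under those assumptions, a 100 ms chunk at 16 kHz holds 1,600 samples, or 3,200 bytes.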
Our Batch API supports speaker diarization: it identifies and labels the different speakers in your audio. This is ideal for meeting transcriptions, interviews, and call center analytics where you need to know who said what.
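
To show how diarized output is typically consumed, the sketch below merges consecutive segments from the same speaker into conversational turns. The segment shape (a list of dicts with `speaker` and `text` keys) is a made-up stand-in for illustration and is not the Batch API's actual response schema.

```python
from itertools import groupby

# Hypothetical diarized segments; the real response format may differ.
segments = [
    {"speaker": "SPEAKER_00", "text": "Thank you for calling."},
    {"speaker": "SPEAKER_00", "text": "How can I help?"},
    {"speaker": "SPEAKER_01", "text": "I want to check my balance."},
    {"speaker": "SPEAKER_00", "text": "Sure, one moment."},
]


def merge_turns(segments):
    """Join consecutive segments that share a speaker label into one turn."""
    turns = []
    for speaker, group in groupby(segments, key=lambda seg: seg["speaker"]):
        text = " ".join(seg["text"] for seg in group)
        turns.append({"speaker": speaker, "text": text})
    return turns


for turn in merge_turns(segments):
    print(f'{turn["speaker"]}: {turn["text"]}')
```

Because `groupby` only groups adjacent items, a speaker who talks again later starts a new turn, which preserves the order of the conversation.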

Start transcribing in minutes