Speech to Text, perfected for India

Question 1

What languages are supported?

Answer

Sarvam Speech to Text supports 22 Indian languages including Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Odia, and English (Indian accent). The API supports automatic language detection and handles code-mixed audio seamlessly.

Question 2

What is Saaras v3?

Answer

Saaras v3 is our latest speech-to-text model that auto-detects the spoken language and provides accurate transcription across all 22 supported Indian languages. It handles code-mixed audio seamlessly and is optimized for both real-time and batch processing.

Question 3

What API types are available?

Answer

We offer three variants: REST API for files under 30 seconds with synchronous processing, Batch API for files up to 1 hour with speaker diarization and timestamps, and Streaming API for real-time transcription via WebSocket.
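As a rough illustration of how the limits above might drive a routing decision, here is a minimal Python sketch. The function name `choose_api` and the returned labels are hypothetical and not part of any Sarvam SDK. The only facts taken from the answer above are the 30-second and 1-hour limits, that diarization is a Batch API feature, and that real-time transcription uses the Streaming API.

```python
from typing import Optional

# Limits quoted in the answer above.
REST_MAX_SECONDS = 30          # REST API: files under 30 seconds
BATCH_MAX_SECONDS = 60 * 60    # Batch API: files up to 1 hour


def choose_api(duration_seconds: Optional[float],
               realtime: bool = False,
               need_diarization: bool = False) -> str:
    """Return "rest", "batch", or "streaming" (hypothetical labels)."""
    # Live audio has no fixed duration, so it always goes to streaming.
    if realtime:
        return "streaming"
    if duration_seconds is None or duration_seconds <= 0:
        raise ValueError("a positive duration is required for file transcription")
    if duration_seconds > BATCH_MAX_SECONDS:
        raise ValueError("file is longer than 1 hour; split it before submitting")
    # Diarization is described as a Batch API feature, so it forces batch
    # even for short clips.
    if need_diarization or duration_seconds >= REST_MAX_SECONDS:
        return "batch"
    return "rest"


if __name__ == "__main__":
    print(choose_api(12.0))                         # short clip
    print(choose_api(12.0, need_diarization=True))  # short clip, speaker labels
    print(choose_api(1800.0))                       # 30-minute recording
    print(choose_api(None, realtime=True))          # live microphone audio
```

The sketch only picks a variant; the actual request would be made with whichever client or HTTP call the official documentation describes.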

Question 4

What audio formats are supported?

Answer

We support 10+ formats including MP3, WAV, AAC, OGG, Opus, FLAC, M4A, AMR, WMA, and WebM. The Streaming API supports WAV and raw PCM audio at a 16 kHz sample rate.
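A client could run a quick local check before uploading. The sketch below uses only the Python standard library. The extension-to-format mapping and the helper names `is_listed_format` and `wav_is_stream_ready` are assumptions made for illustration; the service may identify formats differently, for example by content type rather than file extension.

```python
import wave
from pathlib import Path

# Extensions assumed to correspond to the formats named above.
LISTED_EXTENSIONS = {
    ".mp3", ".wav", ".aac", ".ogg", ".opus",
    ".flac", ".m4a", ".amr", ".wma", ".webm",
}
STREAMING_SAMPLE_RATE = 16_000  # 16 kHz, as stated for the Streaming API


def is_listed_format(filename: str) -> bool:
    """True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in LISTED_EXTENSIONS


def wav_is_stream_ready(wav_source) -> bool:
    """True if a WAV file (path or binary file object) is sampled at 16 kHz."""
    with wave.open(wav_source, "rb") as wav:
        return wav.getframerate() == STREAMING_SAMPLE_RATE


if __name__ == "__main__":
    print(is_listed_format("call_recording.MP3"))
    print(is_listed_format("notes.txt"))
```

Audio recorded at another rate, such as 8 kHz telephony or 44.1 kHz, would need resampling before streaming under this assumption.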

Question 5

Does it support speaker diarization?

Answer

Yes. Our Batch API supports speaker diarization: it identifies and labels the different speakers in your audio. This is ideal for meeting transcriptions, interviews, and call center analytics, where you need to know who said what.
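To make "who said what" concrete, the sketch below merges consecutive segments from the same speaker into turns. The `Segment` shape (speaker label, start and end times in seconds, text) is a hypothetical stand-in chosen for this example. It is not the Batch API's documented response format, and the sample segments are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    # Hypothetical shape of one diarized segment.
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_turns(segments: List[Segment]) -> List[Segment]:
    """Merge back-to-back segments from the same speaker into one turn."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(last.speaker, last.start,
                                max(last.end, seg.end),
                                f"{last.text} {seg.text}")
        else:
            turns.append(seg)
    return turns


if __name__ == "__main__":
    sample = [
        Segment("SPEAKER_00", 0.0, 2.1, "Hello, thanks for calling."),
        Segment("SPEAKER_00", 2.1, 4.0, "How can I help?"),
        Segment("SPEAKER_01", 4.2, 6.5, "I want to check my order."),
    ]
    for turn in merge_turns(sample):
        print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
```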

Flexible output: same audio, multiple formats

Transcribe

Translate

Transliteration

Verbatim

Turn speech into text you can trust

Seamless code-mixing

Telephony-optimized

Handle noisy audio

Powering real-world voice solutions

Voice agents

Made for developers. Scales for enterprises

22 Indian languages with automatic detection

Streaming & batch APIs

Speaker diarization

Domain prompting

Plug & play integrations