Beta. Realtime speech-to-speech translation is available for testing in the Python and TypeScript SDKs. Event shapes, configuration options, audio formats, and error semantics may change in backwards-incompatible ways before GA. Pin to the SDK version you test against.
Overview
Speak (or stream a file) in one language and receive the translation as live text and synthesized speech over a single WebSocket. The session exposes a typed event dispatcher, a built-in microphone helper, and a forward-compatible on_any subscription for events the server may add in future releases.
Key features:
- Translated text and audio: receive incremental translated text (response.text.delta / response.text.done) and translated speech audio (response.audio.delta) as you speak.
- Typed events: a single ServerEventType enum with per-event typed payloads.
- Microphone + file helpers: Microphone (via sounddevice) and FileAudioSource ship with the SDK.
- Easy extensibility: a new server event needs only one enum entry, one payload type, and one parser entry.
Prerequisites
Create an account
Sign up at CAMB.AI Studio if you haven't already.
Get your API key
Go to Settings → API Keys in Studio and copy your key. See Authentication for details.
Install the SDK
Supported languages
source_language and target_language accept the BCP-47 tags below (case-insensitive). Pick any supported language as the source and any supported language as the target. See the WebSocket API reference for the authoritative list.
Supported realtime languages (14)
| Code | Language |
|---|---|
| ar-ae | Arabic (United Arab Emirates) |
| ar-eg | Arabic (Egypt) |
| ar-sa | Arabic (Saudi Arabia) |
| de-de | German (Germany) |
| en-gb | English (United Kingdom) |
| en-us | English (United States) |
| es-es | Spanish (Spain) |
| fr-ca | French (Canada) |
| fr-fr | French (France) |
| hi-in | Hindi (India) |
| ja-jp | Japanese (Japan) |
| ko-kr | Korean (Korea) |
| pt-br | Portuguese (Brazil) |
| zh-cn | Chinese (Mandarin, Simplified) |
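Because tags are matched case-insensitively, it can be convenient to normalize and check them on the client before opening a session, so a typo fails immediately instead of as a server error. The sketch below assumes the 14 tags in the table; the server and the WebSocket API reference remain the authority, and the list may grow, so treat this as a convenience check only. normalize_language and validate_language_pair are hypothetical helpers, not SDK functions.

```python
# The 14 realtime language tags from the table above, lowercased.
# Assumption: this snapshot may lag behind the server's authoritative list.
SUPPORTED_REALTIME_LANGUAGES = {
    "ar-ae", "ar-eg", "ar-sa", "de-de", "en-gb", "en-us", "es-es",
    "fr-ca", "fr-fr", "hi-in", "ja-jp", "ko-kr", "pt-br", "zh-cn",
}


def normalize_language(tag: str) -> str:
    """Trim and lowercase a BCP-47 tag, then check it against the supported set."""
    normalized = tag.strip().lower()
    if normalized not in SUPPORTED_REALTIME_LANGUAGES:
        raise ValueError(f"unsupported realtime language: {tag!r}")
    return normalized


def validate_language_pair(source: str, target: str) -> tuple[str, str]:
    """Validate a source/target pair before opening a session."""
    return normalize_language(source), normalize_language(target)
```

For example, validate_language_pair("EN-us", "de-DE") returns ("en-us", "de-de"), while an unlisted tag raises ValueError before any network call is made.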
Get Started
Create an API Key
Generate a key at CAMB.AI Studio and export it as CAMB_API_KEY for the snippets below.
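If your own scripts read the key from the environment, failing fast with a readable message saves chasing a later authorization error. A minimal sketch; require_api_key is a hypothetical helper, and the SDK may read the variable on its own.

```python
import os


def require_api_key(env_var: str = "CAMB_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of the SDK.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; create a key in CAMB.AI Studio and "
            f"run: export {env_var}=<your key>"
        )
    return key
```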
Install
sounddevice ships with camb-sdk, so the Microphone and Speaker helpers work out of the box (on Linux you may need PortAudio, e.g. apt install libportaudio2). In Node, microphone capture uses node-record-lpcm16 and audio playback uses the SDK's SoX-backed speaker; both need the host sox binary (e.g. brew install sox).
Quickstart (microphone)
Speak into your mic; the translated speech plays back through your speakers and the translated text prints as it arrives.
Quickstart (file to file)
Useful on machines with no microphone (CI, servers). The input WAV must be 16-bit PCM, mono, 24 kHz; the translated audio is written to an output WAV.
Feed the session clear speech. Music, silence, or noisy/low-quality audio may not be recognized by the speech model, in which case no transcript or translation is produced for that audio.
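Since the input must be 16-bit PCM, mono, 24 kHz, checking the WAV header before streaming can save a debugging round-trip. The stdlib sketch below validates the file and yields it in fixed-duration chunks. The 20 ms chunk size is an illustrative assumption, not a documented requirement, and iter_wav_chunks is a hypothetical helper rather than the SDK's FileAudioSource.

```python
import wave
from typing import Iterator

# Input format required by the file quickstart.
REQUIRED_SAMPLE_RATE = 24_000  # Hz
REQUIRED_CHANNELS = 1          # mono
REQUIRED_SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)


def iter_wav_chunks(path: str, chunk_ms: int = 20) -> Iterator[bytes]:
    """Validate a WAV file, then yield its PCM16 frames in chunk_ms pieces.

    At 24 kHz mono 16-bit, 20 ms is 480 samples, i.e. 960 bytes per chunk.
    wave.open itself raises wave.Error for non-PCM encodings (e.g. float WAVs).
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != REQUIRED_CHANNELS:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH:
            raise ValueError(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != REQUIRED_SAMPLE_RATE:
            raise ValueError(f"expected {REQUIRED_SAMPLE_RATE} Hz, got {wav.getframerate()} Hz")
        frames_per_chunk = REQUIRED_SAMPLE_RATE * chunk_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
```

Because this is a generator, validation runs when iteration starts; wrap it in list(...) or a for loop to trigger the checks. A file recorded at another rate (for example 16 kHz) needs resampling before it is sent.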
Events and Payloads
Supported events
All events are exposed through the ServerEventType enum.
| Event | Wire type | Notes |
|---|---|---|
| SESSION_STARTING | session.starting | Pipeline is booting (non-iris cold boot). Not yet ready for audio. |
| SESSION_CREATED | session.created | Session is authorized and ready. wait_until_ready() resolves here. |
| SESSION_UPDATED | session.updated | Echo of the active session configuration. |
| TRANSCRIPT_COMPLETED | conversation.item.input_audio_transcription.completed | Final transcript of a user utterance (source language). |
| TEXT_DELTA | response.text.delta | Incremental translated text; additive within one response. |
| TEXT_DONE | response.text.done | Complete translated text for the current response. |
| AUDIO_DELTA | response.audio.delta (or binary frame) | Chunk of synthesized translated speech (event.data is raw PCM16 bytes). |
| AUDIO_DONE | response.audio.done | Current translated audio response is complete. |
| ERROR | error | Server error, or a handler exception surfaced by the SDK. |
| CLOSED | Closed | Synthetic: emitted by the SDK when the WebSocket closes. Carries code and reason. |
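AUDIO_DELTA payloads are raw PCM16 bytes with no header, so saving them to a playable file means wrapping them in a WAV container yourself. The stdlib sketch below assumes the translated audio is mono at 24 kHz (matching the input format); confirm the output format against the WebSocket reference for your SDK version. PcmWavWriter is hypothetical, and the bytes passed to write() stand in for event.data.

```python
import wave


class PcmWavWriter:
    """Append raw PCM16 chunks (e.g. AUDIO_DELTA event.data) to a WAV file.

    Assumes mono, 16-bit, 24 kHz audio; pass other values if your session differs.
    """

    def __init__(self, path: str, sample_rate: int = 24_000, channels: int = 1) -> None:
        self._wav = wave.open(path, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(2)  # 16-bit samples
        self._wav.setframerate(sample_rate)

    def write(self, pcm16: bytes) -> None:
        # writeframes appends the bytes and keeps the header's length fields current.
        self._wav.writeframes(pcm16)

    def close(self) -> None:
        self._wav.close()

    def __enter__(self) -> "PcmWavWriter":
        return self

    def __exit__(self, *exc: object) -> None:
        self.close()
```

Opening the writer once per session and calling write() from the AUDIO_DELTA handler keeps every response in one file; AUDIO_DONE is a natural point to flush or rotate files if you want one file per response.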
Translated text arrives via TEXT_DELTA / TEXT_DONE and translated speech via AUDIO_DELTA. A source-language transcript (TRANSCRIPT_COMPLETED) is not emitted for every session, so don't rely on it as your only signal. Events not covered by the enum, such as ones the server may add in future releases, are delivered to session.on_any(...) with the raw payload, so applications stay forward-compatible.
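Because TEXT_DELTA fragments are additive within one response, a client typically appends them to a buffer for live display and then takes TEXT_DONE as the final text for that response. A minimal sketch of that bookkeeping; TranslationTextBuffer is hypothetical, and its plain-string arguments stand in for the SDK's typed payloads, whose field names are not listed here.

```python
class TranslationTextBuffer:
    """Accumulate TEXT_DELTA fragments and finalize them on TEXT_DONE."""

    def __init__(self) -> None:
        self._partial: list[str] = []
        self.completed: list[str] = []

    def on_text_delta(self, delta: str) -> str:
        # Deltas are additive: append, then return the running text for display.
        self._partial.append(delta)
        return "".join(self._partial)

    def on_text_done(self, text: str) -> None:
        # TEXT_DONE carries the complete text, so store it and reset the buffer.
        self.completed.append(text)
        self._partial.clear()

    @property
    def partial(self) -> str:
        return "".join(self._partial)
```

Keeping the TEXT_DONE text rather than the joined deltas means any server-side correction in the final text wins.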
Subscribing to events
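The SDK's own subscription calls appear in its examples (see the Source links under More Information). To make the typed-dispatch design concrete (one enum entry, one payload type, and one parser entry per event, plus a catch-all for raw messages), here is a self-contained model of the pattern. Everything in it, including EventType, Dispatcher, and the payload field names delta, text, and message, is hypothetical and written for illustration; it is not the SDK's implementation or API.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class EventType(Enum):
    """Hypothetical stand-in for ServerEventType (a subset of the table)."""

    TEXT_DELTA = "response.text.delta"
    TEXT_DONE = "response.text.done"
    ERROR = "error"


@dataclass
class TextPayload:
    text: str


@dataclass
class ErrorPayload:
    message: str


# One parser entry per event: raw JSON object -> typed payload.
# The "delta", "text", and "message" field names are illustrative assumptions.
PARSERS: dict[EventType, Callable[[dict], Any]] = {
    EventType.TEXT_DELTA: lambda raw: TextPayload(raw.get("delta", "")),
    EventType.TEXT_DONE: lambda raw: TextPayload(raw.get("text", "")),
    EventType.ERROR: lambda raw: ErrorPayload(str(raw.get("message", ""))),
}


class Dispatcher:
    """Routes raw JSON messages to typed handlers and raw catch-all handlers."""

    def __init__(self) -> None:
        self._handlers: dict[EventType, list[Callable[[Any], None]]] = {}
        self._any: list[Callable[[str, dict], None]] = []

    def on(self, event_type: EventType, handler: Callable[[Any], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def on_any(self, handler: Callable[[str, dict], None]) -> None:
        self._any.append(handler)

    def dispatch(self, message: str) -> None:
        raw = json.loads(message)
        wire_type = raw.get("type", "")
        # Raw subscribers see every message, including types the enum lacks.
        for handler in self._any:
            handler(wire_type, raw)
        try:
            event_type = EventType(wire_type)
        except ValueError:
            return  # Unknown type: only on_any handlers receive it.
        payload = PARSERS[event_type](raw)
        for handler in self._handlers.get(event_type, []):
            try:
                handler(payload)
            except Exception as exc:
                if event_type is EventType.ERROR:
                    raise  # Avoid looping if an ERROR handler itself fails.
                # Surface the handler failure as an ERROR event.
                for err_handler in self._handlers.get(EventType.ERROR, []):
                    err_handler(ErrorPayload(f"handler failed: {exc}"))
```

In this model, supporting a new server event means adding one EventType member, one payload dataclass, and one PARSERS entry; until then, the event still reaches on_any handlers as a raw dict.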
Configuration
| Option | Default | Description |
|---|---|---|
| source_language | (required) | BCP-47 tag of the input speech, e.g. en-us. Must be a supported language. |
| target_language | (required) | BCP-47 tag of the translation, e.g. de-de. Must be a supported language. |
| output_modalities | ["text", "audio"] | Subset of text and audio. |
| voice_id / voiceId | built-in voice | ID of one of your cloned voices to synthesize the translation with. Get it from voice_cloning.list_voices(). |
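If you build the configuration from user input, mirroring the options in a small client-side dataclass lets you reject bad values before connecting. TranslationConfig is hypothetical, not the SDK's type; its defaults follow the table, and the voice_id type (int or str) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Union

ALLOWED_MODALITIES = {"text", "audio"}


@dataclass
class TranslationConfig:
    """Hypothetical client-side mirror of the documented session options."""

    source_language: str  # required, e.g. "en-us"
    target_language: str  # required, e.g. "de-de"
    output_modalities: list[str] = field(default_factory=lambda: ["text", "audio"])
    # None means the built-in voice; the ID type is an assumption.
    voice_id: Optional[Union[int, str]] = None

    def __post_init__(self) -> None:
        # BCP-47 tags are case-insensitive, so normalize to lowercase.
        self.source_language = self.source_language.strip().lower()
        self.target_language = self.target_language.strip().lower()
        if not self.source_language or not self.target_language:
            raise ValueError("source_language and target_language are required")
        unknown = set(self.output_modalities) - ALLOWED_MODALITIES
        if unknown:
            raise ValueError(f"unsupported output_modalities: {sorted(unknown)}")
```

For example, TranslationConfig("en-us", "ja-jp", ["text"]) requests text only, leaving audio synthesis out of the session.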
More Information
- Speech To Speech WebSocket reference: the underlying wire protocol and full event list.
- Python SDK · TypeScript SDK: the full SDK guides.
- Source: cambai-python-sdk (examples/realtime_translation_microphone.py, examples/realtime_translation_file.py) · cambai-typescript-sdk (examples/realtime-translation-microphone.js, examples/realtime-translation-file.js).