Beta. Realtime speech-to-speech translation is available for testing in the Python and TypeScript SDKs. Event shapes, configuration options, audio formats, and error semantics may change in backwards-incompatible ways before GA. Pin to the SDK version you test against.

Overview

Speak (or stream a file) in one language and receive the translation as live text and synthesized speech over a single WebSocket. The session exposes a typed event dispatcher, a built-in microphone helper, and a forward-compatible on_any subscription for events the server may add in future releases.

Key features:
  • Translated text and audio: receive incremental translated text (response.text.delta / response.text.done) and translated speech audio (response.audio.delta) as you speak.
  • Typed events: a single ServerEventType enum with per-event typed payloads.
  • Microphone + file helpers: Microphone (via sounddevice) and FileAudioSource ship with the SDK.
  • Easy extensibility: new server event = one enum entry + one payload type + one parser entry.
Audio is PCM16, mono, 24 kHz in both directions, which is the rate the realtime endpoint expects for input and emits for output.
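To make the audio format concrete, here is a small arithmetic sketch of how byte counts and durations relate for PCM16 mono at 24 kHz. The helper names are illustrative and are not part of the SDK.

```python
SAMPLE_RATE = 24_000      # samples per second, as stated above
BYTES_PER_SAMPLE = 2      # PCM16 means 16-bit (2-byte) signed samples
CHANNELS = 1              # mono


def pcm_bytes_for(duration_ms: int) -> int:
    """Number of raw PCM16 mono bytes that cover duration_ms of audio."""
    samples = SAMPLE_RATE * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def pcm_duration_seconds(num_bytes: int) -> float:
    """Playback length in seconds of a raw PCM16 mono buffer."""
    return num_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)


print(pcm_bytes_for(100))             # 4800 bytes in a 100 ms chunk
print(pcm_duration_seconds(48_000))   # 1.0 second
```

One second of audio in either direction is therefore 48,000 bytes, which is handy when sizing buffers or estimating bandwidth.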

Prerequisites

1. Create an account
   Sign up at CAMB.AI Studio if you haven’t already.

2. Get your API key
   Go to Settings → API Keys in Studio and copy your key. See Authentication for details.

3. Install the SDK
   pip install camb-sdk
   Skip this step if you’re using the direct API.

4. Set your API key to use in your code
   export CAMB_API_KEY="your_api_key_here"
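Once the variable is exported, a script can read it and fail early with a readable message when it is missing. The sketch below uses only the standard library; load_api_key is a hypothetical helper, not an SDK function.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the CAMB_API_KEY value, or raise a clear error if it is unset."""
    key = env.get("CAMB_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CAMB_API_KEY is not set; export it before running the examples"
        )
    return key


# Typical use (assumes the variable was exported as shown above):
#   api_key = load_api_key()
```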

Supported languages

source_language and target_language accept the BCP-47 tags below (case-insensitive). Pick any supported language as the source and any supported language as the target. See the WebSocket API reference for the authoritative list.
| Code  | Language                       |
|-------|--------------------------------|
| ar-ae | Arabic (United Arab Emirates)  |
| ar-eg | Arabic (Egypt)                 |
| ar-sa | Arabic (Saudi Arabia)          |
| de-de | German (Germany)               |
| en-gb | English (United Kingdom)       |
| en-us | English (United States)        |
| es-es | Spanish (Spain)                |
| fr-ca | French (Canada)                |
| fr-fr | French (France)                |
| hi-in | Hindi (India)                  |
| ja-jp | Japanese (Japan)               |
| ko-kr | Korean (Korea)                 |
| pt-br | Portuguese (Brazil)            |
| zh-cn | Chinese (Mandarin, Simplified) |
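Because tags are case-insensitive, an application may want to normalize and check them before opening a session. The following is a minimal sketch: the set is copied from the table above and may lag behind the authoritative list in the API reference, and normalize_language is a hypothetical helper rather than an SDK function.

```python
# Copied from the table above; the WebSocket API reference is authoritative.
SUPPORTED_LANGUAGES = {
    "ar-ae", "ar-eg", "ar-sa", "de-de", "en-gb", "en-us", "es-es",
    "fr-ca", "fr-fr", "hi-in", "ja-jp", "ko-kr", "pt-br", "zh-cn",
}


def normalize_language(tag: str) -> str:
    """Lowercase a BCP-47 tag and confirm it appears in the table above."""
    normalized = tag.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language tag: {tag!r}")
    return normalized


print(normalize_language("EN-US"))  # en-us
print(normalize_language("de-DE"))  # de-de
```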

Get Started

Create an API Key

Generate a key at CAMB.AI Studio and export it as CAMB_API_KEY for the snippets below.

Install

pip install camb-sdk
In Python, sounddevice ships with camb-sdk, so the Microphone and Speaker helpers work out of the box (on Linux you may need PortAudio, e.g. apt install libportaudio2). In Node, microphone capture uses node-record-lpcm16 and audio playback uses the SDK’s SoX-backed speaker; both need the host sox binary (e.g. brew install sox).

Quickstart (microphone)

Speak into your mic; the translated speech plays back through your speakers and the translated text prints as it arrives.
import asyncio
import os
import threading

import sounddevice as sd

from camb.client import CambAI
from camb.live_transcription import Microphone
from camb.realtime import ServerEventType

SAMPLE_RATE = 24000  # PCM16 mono, both directions


class Speaker:
    """Plays raw PCM16 mono bytes through the default output device."""

    def __init__(self, sample_rate: int = SAMPLE_RATE) -> None:
        self._buf = bytearray()
        self._lock = threading.Lock()
        self._stream = sd.RawOutputStream(
            samplerate=sample_rate, channels=1, dtype="int16", callback=self._cb
        )

    def _cb(self, outdata, frames, time_info, status) -> None:
        want = len(outdata)
        with self._lock:
            take = min(want, len(self._buf))
            outdata[:take] = bytes(self._buf[:take])
            del self._buf[:take]
        if take < want:
            outdata[take:] = b"\x00" * (want - take)  # underrun: pad with silence

    def start(self):
        self._stream.start()

    def feed(self, pcm: bytes):
        with self._lock:
            self._buf.extend(pcm)

    def close(self):
        self._stream.stop()
        self._stream.close()


async def main():
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.realtime.connect(
        source_language="en-us",
        target_language="de-de",
    )

    speaker = Speaker()

    @session.on(ServerEventType.TRANSCRIPT_COMPLETED)
    def _(event):
        print(f"\n[you]         {event.transcript}")

    @session.on(ServerEventType.TEXT_DONE)
    def _(event):
        print(f"[translation] {event.text}")

    @session.on(ServerEventType.AUDIO_DELTA)
    def _(event):
        speaker.feed(event.data)

    async with session:
        await session.wait_until_ready()
        speaker.start()
        mic = Microphone(sample_rate=SAMPLE_RATE, chunk_size=SAMPLE_RATE // 10)
        try:
            await session.stream_audio(mic)
        finally:
            speaker.close()


asyncio.run(main())

Quickstart (file → file)

Useful on machines with no microphone (CI, servers). The input WAV must be 16-bit PCM, mono, 24 kHz; the translated audio is written to an output WAV.
import asyncio
import os
import wave

from camb.client import CambAI
from camb.live_transcription import FileAudioSource
from camb.realtime import ServerEventType

SAMPLE_RATE = 24000


async def main(in_path: str, out_path: str):
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.realtime.connect(
        source_language="en-us",
        target_language="de-de",
    )

    out_audio = bytearray()
    audio_done = asyncio.Event()

    @session.on(ServerEventType.TEXT_DONE)
    def _(event):
        print(f"[translation] {event.text}")

    @session.on(ServerEventType.AUDIO_DELTA)
    def _(event):
        out_audio.extend(event.data)

    @session.on(ServerEventType.AUDIO_DONE)
    def _(_):
        audio_done.set()

    async with session:
        await session.wait_until_ready()
        await session.stream_audio(FileAudioSource(in_path, real_time=True))
        try:
            await asyncio.wait_for(audio_done.wait(), timeout=30)
        except asyncio.TimeoutError:
            # No AUDIO_DONE within 30 s; keep whatever audio has arrived so far.
            pass

    if out_audio:
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(SAMPLE_RATE)
            out.writeframes(bytes(out_audio))
        print(f"Wrote {len(out_audio) / (SAMPLE_RATE * 2):.1f}s to {out_path}")


asyncio.run(main("input_24k_mono.wav", "translated_output.wav"))
Re-encode any source file to the required format with:
ffmpeg -i input.wav -ar 24000 -ac 1 -sample_fmt s16 input_24k_mono.wav
Feed the session clear speech. Music, silence, or noisy/low-quality audio may not be recognized by the speech model, in which case no transcript or translation is produced for that audio.
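Since the input must be 16-bit PCM, mono, 24 kHz, it can help to verify a file before streaming it. The sketch below uses only the standard-library wave module; check_wav_format is a hypothetical helper, not part of the SDK.

```python
import wave


def check_wav_format(source, sample_rate: int = 24_000) -> list:
    """Return a list of format problems; an empty list means the file is OK.

    source may be a file path or a binary file object, as wave.open accepts both.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, found {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() != sample_rate:
            problems.append(f"expected {sample_rate} Hz, found {wav.getframerate()} Hz")
    return problems


# Typical use before streaming (the path is the one from the example above):
#   issues = check_wav_format("input_24k_mono.wav")
#   if issues:
#       raise SystemExit("; ".join(issues))
```

If the check reports problems, re-encode the file with the ffmpeg command shown above and run it again.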

Events and Payloads

Supported events

All events are exposed through the ServerEventType enum.
| Event | Wire type | Notes |
|-------|-----------|-------|
| SESSION_STARTING | session.starting | Pipeline is booting (non-iris cold boot). Not yet ready for audio. |
| SESSION_CREATED | session.created | Session is authorized and ready. wait_until_ready() resolves here. |
| SESSION_UPDATED | session.updated | Echo of the active session configuration. |
| TRANSCRIPT_COMPLETED | conversation.item.input_audio_transcription.completed | Final transcript of a user utterance (source language). |
| TEXT_DELTA | response.text.delta | Incremental translated text; additive within one response. |
| TEXT_DONE | response.text.done | Complete translated text for the current response. |
| AUDIO_DELTA | response.audio.delta (or binary frame) | Chunk of synthesized translated speech (event.data is raw PCM16 bytes). |
| AUDIO_DONE | response.audio.done | Current translated audio response is complete. |
| ERROR | error | Server error, or a handler exception surfaced by the SDK. |
| CLOSED | Closed | Synthetic: emitted by the SDK when the WebSocket closes. Carries code and reason. |
Translated text arrives via TEXT_DELTA / TEXT_DONE and translated speech via AUDIO_DELTA. A source-language transcript (TRANSCRIPT_COMPLETED) is not emitted for every session, so don’t rely on it as your only signal.
Catch-all subscription. A future server event the SDK doesn’t model yet is still delivered to any handler registered via session.on_any(...) with the raw payload, so applications stay forward-compatible.
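The catch-all behavior can be pictured with a toy dispatcher. This is an illustrative sketch only and not the SDK's implementation: MiniDispatcher, the "text" field on the message, and the response.example.future event name are all assumptions made up for the example.

```python
from collections import defaultdict


class MiniDispatcher:
    """Toy dispatcher: typed handlers per wire type plus catch-all handlers."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)  # wire type -> list of handlers
        self._any_handlers = []             # handlers that receive every event

    def on(self, wire_type: str):
        def register(fn):
            self._handlers[wire_type].append(fn)
            return fn
        return register

    def on_any(self, fn):
        self._any_handlers.append(fn)
        return fn

    def dispatch(self, message: dict) -> None:
        wire_type = message.get("type", "")
        # Known events reach their typed handlers...
        for fn in self._handlers.get(wire_type, []):
            fn(message)
        # ...and every event, known or not, reaches the catch-all handlers.
        for fn in self._any_handlers:
            fn(wire_type, message)


dispatcher = MiniDispatcher()
seen_done, seen_any = [], []


@dispatcher.on("response.text.done")
def handle_text_done(message):
    seen_done.append(message["text"])


@dispatcher.on_any
def handle_any(wire_type, message):
    seen_any.append(wire_type)


dispatcher.dispatch({"type": "response.text.done", "text": "Hallo Welt"})
dispatcher.dispatch({"type": "response.example.future", "detail": 1})  # made-up event
print(seen_done)  # ['Hallo Welt']
print(seen_any)   # ['response.text.done', 'response.example.future']
```

The unknown event has no typed handler, yet the catch-all still receives it with its raw payload, which is the property that keeps applications forward-compatible.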

Subscribing to events

@session.on(ServerEventType.TEXT_DELTA)
def on_text(event):
    print(event.delta, end="", flush=True)

@session.on(ServerEventType.ERROR)
def on_error(err):
    print("error:", err.message)

# Forward-compat: receive every event, including ones added later.
@session.on_any
def on_any(event_type, payload):
    print(event_type, payload)
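Because TEXT_DELTA chunks are additive within one response, an application that wants the running translation has to accumulate them and reset at TEXT_DONE. A minimal sketch, assuming the handlers pass event.delta and event.text as shown in the examples above; TranslationBuffer is a hypothetical class, not part of the SDK.

```python
from typing import Optional


class TranslationBuffer:
    """Accumulates text deltas for the current response and stores finished ones."""

    def __init__(self) -> None:
        self._parts = []      # deltas of the response in progress
        self.completed = []   # full text of each finished response

    def on_delta(self, delta: str) -> str:
        """Append one delta and return the text accumulated so far."""
        self._parts.append(delta)
        return "".join(self._parts)

    def on_done(self, final_text: Optional[str] = None) -> str:
        """Close the current response, preferring the server's final text."""
        text = final_text if final_text is not None else "".join(self._parts)
        self.completed.append(text)
        self._parts.clear()
        return text


buffer = TranslationBuffer()
print(buffer.on_delta("Guten "))   # Guten
print(buffer.on_delta("Morgen"))   # Guten Morgen
print(buffer.on_done())            # Guten Morgen

# Wiring it to a session would look roughly like:
#   @session.on(ServerEventType.TEXT_DELTA)
#   def _(event): buffer.on_delta(event.delta)
#   @session.on(ServerEventType.TEXT_DONE)
#   def _(event): buffer.on_done(event.text)
```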

Configuration

| Option | Default | Description |
|--------|---------|-------------|
| source_language | (required) | BCP-47 tag of the input speech, e.g. en-us. Must be a supported language. |
| target_language | (required) | BCP-47 tag of the translation, e.g. de-de. Must be a supported language. |
| output_modalities | ["text", "audio"] | Subset of text and audio. |
| voice_id / voiceId | built-in voice | ID of one of your cloned voices to synthesize the translation with. Get it from voice_cloning.list_voices(). |
session = await client.realtime.connect(
    source_language="en-us",
    target_language="es-es",
    output_modalities=["text", "audio"],
    voice_id=147320,  # optional: one of your cloned voices
)
Voice selection. Translated speech uses a built-in voice for the target language by default. Pass voice_id (voiceId in TypeScript) to synthesize it with one of your cloned voices instead. For the most natural-sounding results, choose a voice whose reference language matches target_language.
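The options above can be checked locally before connecting. The sketch below is one possible approach under stated assumptions: build_connect_options is a hypothetical helper, and rejecting an empty output_modalities list is a guess about server behavior, not something this page confirms.

```python
ALLOWED_MODALITIES = ("text", "audio")


def build_connect_options(source_language, target_language,
                          output_modalities=("text", "audio"), voice_id=None):
    """Validate the options locally and return keyword arguments for connect()."""
    modalities = list(dict.fromkeys(output_modalities))  # drop duplicates, keep order
    unknown = [m for m in modalities if m not in ALLOWED_MODALITIES]
    if unknown:
        raise ValueError(f"unknown output modalities: {unknown}")
    if not modalities:
        # Assumption: at least one modality is needed for a useful session.
        raise ValueError("output_modalities must not be empty")
    options = {
        "source_language": source_language.lower(),
        "target_language": target_language.lower(),
        "output_modalities": modalities,
    }
    if voice_id is not None:
        options["voice_id"] = voice_id  # omit the key to keep the built-in voice
    return options


options = build_connect_options("en-us", "es-es", ["text"])
print(options)
# Passing them on would look like:
#   session = await client.realtime.connect(**options)
```

Leaving voice_id out of the dictionary, rather than sending None, keeps the default built-in voice for the target language.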

More Information