Realtime Speech To Speech

Beta. Realtime speech-to-speech translation is available for testing in the Python and TypeScript SDKs. Event shapes, configuration options, audio formats, and error semantics may change in backwards-incompatible ways before GA. Pin to the SDK version you test against.

Overview

Speak (or stream a file) in one language and receive the translation as live text and synthesized speech over a single WebSocket. The session exposes a typed event dispatcher, a built-in microphone helper, and a forward-compatible on_any subscription for events the server may add in future releases. Key features:

Translated text and audio — receive incremental translated text (response.text.delta / response.text.done) and translated speech audio (response.audio.delta) as you speak.
Typed events — a single ServerEventType enum with per-event typed payloads.
Microphone + file helpers — Microphone (via sounddevice) and FileAudioSource ship with the SDK.
Easy extensibility — new server event = one enum entry + one payload type + one parser entry.

Audio is PCM16, mono, 24 kHz in both directions — the rate the realtime endpoint expects for input and emits for output.
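Because the format is fixed, buffer sizes and durations follow from simple arithmetic. The sketch below is illustrative only; the helper names are not part of the SDK.

```python
SAMPLE_RATE = 24_000   # samples per second
BYTES_PER_SAMPLE = 2   # PCM16 means 16 bits per sample
CHANNELS = 1           # mono

BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS


def chunk_bytes(duration_ms: int) -> int:
    """Size in bytes of a PCM16 mono 24 kHz chunk lasting duration_ms."""
    return BYTES_PER_SECOND * duration_ms // 1000


def duration_seconds(num_bytes: int) -> float:
    """Playback length of a raw PCM16 mono 24 kHz buffer."""
    return num_bytes / BYTES_PER_SECOND


print(chunk_bytes(100))          # 4800
print(duration_seconds(96_000))  # 2.0
```

A 100 ms chunk is therefore 2,400 samples or 4,800 bytes, which matches the chunk size used by the TypeScript file quickstart further down.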

Prerequisites

Create an account

Get your API key

Go to Settings → API Keys in Studio and copy your key. See Authentication for details.

Install the SDK

pip install camb-sdk

npm install @camb-ai/sdk

Skip this step if you’re using the direct API.

Set your API key to use in your code

export CAMB_API_KEY="your_api_key_here"

Supported languages

source_language and target_language accept the BCP-47 tags below (case-insensitive). Pick any supported language as the source and any supported language as the target. See the WebSocket API reference for the authoritative list.

Supported realtime languages (14)

Code	Language
`ar-ae`	Arabic (United Arab Emirates)
`ar-eg`	Arabic (Egypt)
`ar-sa`	Arabic (Saudi Arabia)
`de-de`	German (Germany)
`en-gb`	English (United Kingdom)
`en-us`	English (United States)
`es-es`	Spanish (Spain)
`fr-ca`	French (Canada)
`fr-fr`	French (France)
`hi-in`	Hindi (India)
`ja-jp`	Japanese (Japan)
`ko-kr`	Korean (South Korea)
`pt-br`	Portuguese (Brazil)
`zh-cn`	Chinese (Mandarin, Simplified)
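Since the tags are case-insensitive, it can help to normalize and check them on the client before opening a session. The following is a minimal sketch, assuming the table above is current; `normalize_language` is a hypothetical helper, not an SDK function, and the WebSocket API reference remains the authoritative list.

```python
# Snapshot of the table above; refresh it from the API reference as needed.
SUPPORTED_REALTIME_LANGUAGES = frozenset({
    "ar-ae", "ar-eg", "ar-sa", "de-de", "en-gb", "en-us", "es-es",
    "fr-ca", "fr-fr", "hi-in", "ja-jp", "ko-kr", "pt-br", "zh-cn",
})


def normalize_language(tag: str) -> str:
    """Lowercase a BCP-47 tag and verify it is in the supported set."""
    normalized = tag.strip().lower()
    if normalized not in SUPPORTED_REALTIME_LANGUAGES:
        raise ValueError(f"unsupported realtime language: {tag!r}")
    return normalized


print(normalize_language("en-US"))  # en-us
```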

Get Started

Create an API Key

Generate a key at CAMB.AI Studio and export it as CAMB_API_KEY for the snippets below.

Install

pip install camb-sdk

npm install @camb-ai/sdk node-record-lpcm16

In Python, sounddevice ships with camb-sdk, so the Microphone helper and the Speaker class defined in the quickstart below work out of the box (on Linux you may need PortAudio, e.g. apt install libportaudio2). In Node, microphone capture uses node-record-lpcm16 and audio playback uses the SDK’s SoX-backed speaker; both need the host sox binary (e.g. brew install sox).

Quickstart (microphone)

Speak into your mic; the translated speech plays back through your speakers and the translated text prints as it arrives.

import asyncio
import os
import threading

import sounddevice as sd

from camb.client import CambAI
from camb.live_transcription import Microphone
from camb.realtime import ServerEventType

SAMPLE_RATE = 24000  # PCM16 mono, both directions


class Speaker:
    """Plays raw PCM16 mono bytes through the default output device."""

    def __init__(self, sample_rate: int = SAMPLE_RATE) -> None:
        self._buf = bytearray()
        self._lock = threading.Lock()
        self._stream = sd.RawOutputStream(
            samplerate=sample_rate, channels=1, dtype="int16", callback=self._cb
        )

    def _cb(self, outdata, frames, time_info, status) -> None:
        want = len(outdata)
        with self._lock:
            take = min(want, len(self._buf))
            outdata[:take] = bytes(self._buf[:take])
            del self._buf[:take]
        if take < want:
            outdata[take:] = b"\x00" * (want - take)  # underrun → silence

    def start(self):
        self._stream.start()

    def feed(self, pcm: bytes):
        with self._lock:
            self._buf.extend(pcm)

    def close(self):
        self._stream.stop()
        self._stream.close()


async def main():
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.realtime.connect(
        source_language="en-us",
        target_language="de-de",
    )

    speaker = Speaker()

    @session.on(ServerEventType.TRANSCRIPT_COMPLETED)
    def _(event):
        print(f"\n[you]         {event.transcript}")

    @session.on(ServerEventType.TEXT_DONE)
    def _(event):
        print(f"[translation] {event.text}")

    @session.on(ServerEventType.AUDIO_DELTA)
    def _(event):
        speaker.feed(event.data)

    async with session:
        await session.wait_until_ready()
        speaker.start()
        mic = Microphone(sample_rate=SAMPLE_RATE, chunk_size=SAMPLE_RATE // 10)
        try:
            await session.stream_audio(mic)
        finally:
            speaker.close()


asyncio.run(main())

import {
  CambClient,
  Microphone,
  RealtimeServerEventType,
  createSoxPcmSpeakerChecked,
} from "@camb-ai/sdk";

const SAMPLE_RATE = 24000; // PCM16 mono, both directions

const client = new CambClient({ apiKey: process.env.CAMB_API_KEY });
const session = await client.realtime.connect({
  sourceLanguage: "en-us",
  targetLanguage: "de-de",
});

// Plays raw PCM16 mono audio through SoX's `play` (install sox to hear it).
const speaker = await createSoxPcmSpeakerChecked({ sampleRate: SAMPLE_RATE });

session.on(RealtimeServerEventType.TranscriptCompleted, (event) =>
  console.log(`\n[you]         ${event.transcript}`),
);
session.on(RealtimeServerEventType.TextDone, (event) =>
  console.log(`[translation] ${event.text}`),
);
session.on(RealtimeServerEventType.AudioDelta, (event) =>
  speaker.feed(Buffer.from(event.data)),
);

await session.waitUntilReady();

const mic = Microphone.fromNode({ sampleRate: SAMPLE_RATE });
await mic.start();
await session.stream(mic);
await speaker.close();

Quickstart (file → file)

Useful on machines with no microphone (CI, servers). The input WAV must be 16-bit PCM, mono, 24 kHz; the translated audio is written to an output WAV.

import asyncio
import os
import wave

from camb.client import CambAI
from camb.live_transcription import FileAudioSource
from camb.realtime import ServerEventType

SAMPLE_RATE = 24000


async def main(in_path: str, out_path: str):
    client = CambAI(api_key=os.environ["CAMB_API_KEY"])
    session = await client.realtime.connect(
        source_language="en-us",
        target_language="de-de",
    )

    out_audio = bytearray()
    audio_done = asyncio.Event()

    @session.on(ServerEventType.TEXT_DONE)
    def _(event):
        print(f"[translation] {event.text}")

    @session.on(ServerEventType.AUDIO_DELTA)
    def _(event):
        out_audio.extend(event.data)

    @session.on(ServerEventType.AUDIO_DONE)
    def _(_):
        audio_done.set()

    async with session:
        await session.wait_until_ready()
        await session.stream_audio(FileAudioSource(in_path, real_time=True))
        try:
            await asyncio.wait_for(audio_done.wait(), timeout=30)
        except asyncio.TimeoutError:
            pass  # no AUDIO_DONE within 30 s; write whatever audio arrived

    if out_audio:
        with wave.open(out_path, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(2)
            out.setframerate(SAMPLE_RATE)
            out.writeframes(bytes(out_audio))
        print(f"Wrote {len(out_audio) / (SAMPLE_RATE * 2):.1f}s to {out_path}")


asyncio.run(main("input_24k_mono.wav", "translated_output.wav"))

import fs from "node:fs";

import { CambClient, RealtimeServerEventType } from "@camb-ai/sdk";

const SAMPLE_RATE = 24000;
const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

// Minimal 16-bit PCM mono WAV read/write.
function readWav(path) {
  const buf = fs.readFileSync(path);
  let offset = 12;
  while (offset < buf.length - 8) {
    const id = buf.toString("ascii", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === "data") return buf.subarray(offset + 8, offset + 8 + size);
    offset += 8 + size + (size % 2); // RIFF chunks are padded to an even length
  }
  throw new Error("no data chunk in WAV");
}

function writeWav(path, pcm, sampleRate) {
  const h = Buffer.alloc(44);
  h.write("RIFF", 0); h.writeUInt32LE(36 + pcm.length, 4); h.write("WAVE", 8);
  h.write("fmt ", 12); h.writeUInt32LE(16, 16); h.writeUInt16LE(1, 20);
  h.writeUInt16LE(1, 22); h.writeUInt32LE(sampleRate, 24);
  h.writeUInt32LE(sampleRate * 2, 28); h.writeUInt16LE(2, 32); h.writeUInt16LE(16, 34);
  h.write("data", 36); h.writeUInt32LE(pcm.length, 40);
  fs.writeFileSync(path, Buffer.concat([h, pcm]));
}

const client = new CambClient({ apiKey: process.env.CAMB_API_KEY });
const session = await client.realtime.connect({
  sourceLanguage: "en-us",
  targetLanguage: "de-de",
});

const outChunks = [];
let resolveDone;
const audioDone = new Promise((r) => (resolveDone = r));

session.on(RealtimeServerEventType.TextDone, (event) =>
  console.log(`[translation] ${event.text}`),
);
session.on(RealtimeServerEventType.AudioDelta, (event) =>
  outChunks.push(Buffer.from(event.data)),
);
session.on(RealtimeServerEventType.AudioDone, () => resolveDone());

await session.waitUntilReady();

// Input WAV must be 16-bit PCM, mono, 24 kHz. Stream at real-time pace.
const pcm = readWav("input_24k_mono.wav");
const chunkSize = Math.floor(SAMPLE_RATE * 2 * 0.1); // 100 ms
for (let i = 0; i < pcm.length; i += chunkSize) {
  await session.sendAudio(pcm.subarray(i, i + chunkSize));
  await sleep(100);
}

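// Wait for AudioDone, but give up after 30 s.
let timer;
await Promise.race([
  audioDone,
  new Promise((r) => (timer = setTimeout(r, 30_000))),
]);
clearTimeout(timer); // otherwise the pending timer keeps Node alive for the full 30 s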
await session.close();

writeWav("translated_output.wav", Buffer.concat(outChunks), SAMPLE_RATE);

Re-encode any source file to the required format with:

ffmpeg -i input.wav -ar 24000 -ac 1 -sample_fmt s16 input_24k_mono.wav
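Before streaming, you may want to confirm that a file really is 16-bit PCM, mono, 24 kHz. The sketch below uses only the standard-library wave module; `check_wav_format` is a hypothetical helper name, not part of the SDK.

```python
import wave


def check_wav_format(path: str) -> list[str]:
    """Return a list of format problems; an empty list means the file is usable."""
    problems = []
    # wave.open raises wave.Error for files it cannot parse (e.g. compressed WAV).
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 24000:
            problems.append(f"expected 24000 Hz, got {wav.getframerate()} Hz")
    return problems
```

Calling `check_wav_format("input_24k_mono.wav")` on a correctly converted file should return an empty list; any other result tells you which ffmpeg option to revisit.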

Feed the session clear speech. Music, silence, or noisy/low-quality audio may not be recognized by the speech model, in which case no transcript or translation is produced for that audio.
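If a session produces no output, one quick diagnostic is to check whether the input is effectively silent. The sketch below computes the RMS level of a raw buffer, assuming little-endian PCM16 as stored in WAV files. The function name and the 0.01 threshold in the usage note are illustrative assumptions, not values the service documents.

```python
import math
import sys
from array import array


def rms_level(pcm: bytes) -> float:
    """RMS amplitude of little-endian PCM16 audio, scaled to the range 0.0 to 1.0."""
    samples = array("h")  # signed 16-bit integers
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native byte order; the data is little-endian
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return math.sqrt(mean_square) / 32768.0
```

For example, a result below roughly 0.01 over a whole recording suggests the microphone is muted or the gain is too low. A high level does not guarantee the audio is speech.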

Events and Payloads

Supported events

All events are exposed through the ServerEventType enum.

Event	Wire `type`	Notes
`SESSION_STARTING`	`session.starting`	Pipeline is booting (non-`iris` cold boot). Not yet ready for audio.
`SESSION_CREATED`	`session.created`	Session is authorized and ready. `wait_until_ready()` resolves here.
`SESSION_UPDATED`	`session.updated`	Echo of the active session configuration.
`TRANSCRIPT_COMPLETED`	`conversation.item.input_audio_transcription.completed`	Final transcript of a user utterance (source language).
`TEXT_DELTA`	`response.text.delta`	Incremental translated text; additive within one response.
`TEXT_DONE`	`response.text.done`	Complete translated text for the current response.
`AUDIO_DELTA`	`response.audio.delta` (or binary frame)	Chunk of synthesized translated speech (`event.data` is raw PCM16 bytes).
`AUDIO_DONE`	`response.audio.done`	Current translated audio response is complete.
`ERROR`	`error`	Server error, or a handler exception surfaced by the SDK.
`CLOSED`	`Closed`	Synthetic — emitted by the SDK when the WebSocket closes. Carries `code` and `reason`.

Translated text arrives via TEXT_DELTA / TEXT_DONE and translated speech via AUDIO_DELTA. A source-language transcript (TRANSCRIPT_COMPLETED) is not emitted for every session, so don’t rely on it as your only signal.
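Because TEXT_DELTA payloads are additive within one response, a client that shows live captions needs to concatenate them and reset when TEXT_DONE arrives. Here is a minimal sketch of that bookkeeping; `TranslationBuffer` is a hypothetical helper, not an SDK class.

```python
class TranslationBuffer:
    """Accumulates text deltas for the current response."""

    def __init__(self) -> None:
        self._parts: list[str] = []
        self.completed: list[str] = []  # one entry per finished response

    def on_delta(self, delta: str) -> str:
        """Append one delta and return the partial translation so far."""
        self._parts.append(delta)
        return "".join(self._parts)

    def on_done(self, text: str) -> str:
        """Record the final text and reset for the next response."""
        self.completed.append(text)
        self._parts.clear()
        return text
```

In a session you would call `on_delta(event.delta)` from a TEXT_DELTA handler and `on_done(event.text)` from a TEXT_DONE handler.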

Catch-all subscription. A future server event the SDK doesn’t model yet is still delivered to any handler registered via session.on_any(...) with the raw payload, so applications stay forward-compatible.

Subscribing to events

@session.on(ServerEventType.TEXT_DELTA)
def on_text(event):
    print(event.delta, end="", flush=True)

@session.on(ServerEventType.ERROR)
def on_error(err):
    print("error:", err.message)

# Forward-compat: receive every event, including ones added later.
@session.on_any
def on_any(event_type, payload):
    print(event_type, payload)

session.on(RealtimeServerEventType.TextDelta, (event) => {
  process.stdout.write(event.delta);
});

session.on(RealtimeServerEventType.Error, (err) => {
  console.error("error:", err.message);
});

// Forward-compat: receive every event, including ones added later.
session.onAny((eventType, payload) => {
  console.log(eventType, payload);
});
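To make the dispatch semantics concrete, the toy model below routes JSON messages by their wire `type` to typed handlers and also passes every message, known or not, to catch-all handlers. It is not the SDK's implementation: `ToyDispatcher` and the exact message shape (a JSON object with a `type` field) are assumptions made for illustration.

```python
import json
from collections import defaultdict
from typing import Any, Callable


class ToyDispatcher:
    """Illustrative event router; not the SDK's dispatcher."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], Any]]] = defaultdict(list)
        self._any_handlers: list[Callable[[str, dict], Any]] = []

    def on(self, event_type: str, handler: Callable[[dict], Any]) -> None:
        self._handlers[event_type].append(handler)

    def on_any(self, handler: Callable[[str, dict], Any]) -> None:
        self._any_handlers.append(handler)

    def dispatch(self, raw_message: str) -> None:
        payload = json.loads(raw_message)
        event_type = payload.get("type", "")
        # Typed handlers fire only for event types registered explicitly.
        for handler in self._handlers.get(event_type, []):
            handler(payload)
        # Catch-all handlers see everything, including unmodelled event types.
        for handler in self._any_handlers:
            handler(event_type, payload)
```

With this shape, a message whose `type` has no typed handler still reaches the catch-all handlers, which is the forward-compatibility property described above.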

Configuration

Option	Default	Description
`source_language`	— (required)	BCP-47 tag of the input speech, e.g. `en-us`. Must be a supported language.
`target_language`	— (required)	BCP-47 tag of the translation, e.g. `de-de`. Must be a supported language.
`output_modalities`	`["text", "audio"]`	Subset of `text` and `audio`.
`voice_id` / `voiceId`	built-in voice	ID of one of your cloned voices to synthesize the translation with. Get it from `voice_cloning.list_voices()`.

session = await client.realtime.connect(
    source_language="en-us",
    target_language="es-es",
    output_modalities=["text", "audio"],
    voice_id=147320,  # optional: one of your cloned voices
)

const session = await client.realtime.connect({
  sourceLanguage: "en-us",
  targetLanguage: "es-es",
  outputModalities: ["text", "audio"],
  voiceId: 147320, // optional: one of your cloned voices
});

Voice selection. Translated speech uses a built-in voice for the target language by default. Pass voice_id (voiceId in TypeScript) to synthesize it with one of your cloned voices instead. For the most natural-sounding results, choose a voice whose reference language matches target_language.
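If the modalities come from user input or a config file, a small client-side check can catch typos before connecting. This is a sketch under stated assumptions: `normalize_modalities` is a hypothetical helper, and rejecting an empty list is a choice made here, since the table above only says the value is a subset of `text` and `audio`.

```python
ALLOWED_MODALITIES = ("text", "audio")


def normalize_modalities(modalities: list[str]) -> list[str]:
    """Validate output modalities and return them de-duplicated in a stable order."""
    requested = set(modalities)
    unknown = requested - set(ALLOWED_MODALITIES)
    if unknown:
        raise ValueError(f"unknown output modalities: {sorted(unknown)}")
    if not requested:
        # Assumption: an empty selection is treated as a mistake.
        raise ValueError("at least one output modality is required")
    return [m for m in ALLOWED_MODALITIES if m in requested]


print(normalize_modalities(["audio", "text", "audio"]))  # ['text', 'audio']
```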

More Information

Speech To Speech WebSocket reference — the underlying wire protocol and full event list.
Python SDK · TypeScript SDK — the full SDK guides.
Source: cambai-python-sdk (examples/realtime_translation_microphone.py, examples/realtime_translation_file.py) · cambai-typescript-sdk (examples/realtime-translation-microphone.js, examples/realtime-translation-file.js).
