Scribe API

The Scribe API delivers scalable, high-performance transcription across a broad range of media formats and use cases.

It enables organizations to convert audio and video into accurate text, whether they are processing large archives in bulk or handling real-time interactions at scale.

Key features

Supports high-volume transcription from cloud storage
Accepts common audio and video input formats
Handles call recordings, podcasts, and long-form media
Adds timestamps to each segment
Applies punctuation and formatting for human-readable output
Separates speakers when multiple people are present
Filters profanity when enabled
Supports short command-and-control scenarios through fast mode

Processing modes

Fast mode

Fast mode provides synchronous, low-latency transcription for individual files.

Processes one audio file at a time
Returns the response as soon as transcription completes
Works best for short recordings

Note: Fast mode can handle one (mono) or two (stereo) audio channels. The API returns either a single combined transcript or separate transcripts for each channel.
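When the API returns separate per-channel transcripts, an app may want to interleave them into a single conversation view. The following is a minimal sketch, assuming each segment carries `start`, `end`, and `text` fields; these names are illustrative assumptions, and the actual response schema may differ.

```python
from typing import Dict, List


def merge_channels(channels: Dict[int, List[dict]]) -> List[dict]:
    """Interleave per-channel segments into one timeline ordered by start time.

    `channels` maps a channel index to its list of segments. The segment
    keys used here (`start`, `end`, `text`) are assumptions for illustration,
    not the documented Scribe response schema.
    """
    merged = []
    for channel, segments in channels.items():
        for seg in segments:
            # Tag each segment with the channel it came from.
            merged.append({"channel": channel, **seg})
    # Sort by start time; ties are broken by channel index.
    merged.sort(key=lambda s: (s["start"], s["channel"]))
    return merged


# Hypothetical stereo call: channel 0 is the agent, channel 1 is the caller.
stereo = {
    0: [
        {"start": 0.0, "end": 1.8, "text": "Thanks for calling."},
        {"start": 4.1, "end": 5.0, "text": "Sure, one moment."},
    ],
    1: [
        {"start": 2.0, "end": 3.9, "text": "Hi, I need help with my order."},
    ],
}

for seg in merge_channels(stereo):
    print(f"[{seg['start']:6.2f}s] ch{seg['channel']}: {seg['text']}")
```

Keeping the channel index on each merged segment lets the display layer label speakers without a second lookup.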

Example workflow

To convert an audio recording into searchable text on demand:

Use fast mode for near real-time transcription.
Your app uploads the audio file.
The backend generates a JWT with Build platform credentials.
The backend sends a transcription request.
The API returns a JSON transcript with timestamps.
Your app displays the transcript to the user.
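The backend steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the documented integration: the signing algorithm (HS256), the claim names (`iss`, `iat`, `exp`), the endpoint URL, the `audio/wav` content type, and the `segments` response shape are all hypothetical placeholders. Check the Fast mode reference for the real values.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.request
from typing import List, Optional

# Hypothetical endpoint; replace with the URL from the Fast mode reference.
API_URL = "https://api.example.com/scribe/fast"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_jwt(client_id: str, secret: str, ttl_seconds: int = 300,
             now: Optional[int] = None) -> str:
    """Build an HS256-signed JWT. Algorithm and claims are assumptions."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": client_id, "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{b64url(signature)}"


def build_fast_request(audio: bytes, token: str,
                       url: str = API_URL) -> urllib.request.Request:
    """Prepare (but do not send) the transcription request."""
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "audio/wav",  # assumed content type
        },
    )


def format_transcript(response_body: str) -> List[str]:
    """Turn an assumed {"segments": [...]} response into display lines."""
    segments = json.loads(response_body)["segments"]
    return [f"{s['start']:.1f}-{s['end']:.1f}s  {s['text']}" for s in segments]


token = make_jwt("my-client-id", "my-client-secret")
request = build_fast_request(b"placeholder audio bytes", token)
# Sending is omitted so the sketch runs offline. In a real backend:
#   with urllib.request.urlopen(request) as resp:
#       body = resp.read().decode("utf-8")
sample_body = '{"segments": [{"start": 0.0, "end": 2.4, "text": "Hello and welcome."}]}'
for line in format_transcript(sample_body):
    print(line)
```

Generating the token on the backend keeps the platform credentials out of the client app, which only ever handles the audio file and the returned transcript.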

For details, see Fast mode.

Batch mode

Batch mode provides asynchronous transcription for large or complex jobs.

Processes many files in a single request
Runs in the background: you submit jobs and retrieve results when processing completes

Batch mode is best for:

Long recordings
Large collections of files
Multi-speaker audio

The service writes a separate transcript for each audio file to the corresponding output location.

Example workflow

To transcribe call recordings stored in S3:

Submit a batch job and specify the input folder in your bucket.
The job runs asynchronously.
The service writes transcripts to the specified output location.
Use batch job status endpoints or webhooks to monitor progress.
Retrieve per-file results when processing completes.
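If webhooks are not an option, the monitoring step can be handled by polling. The sketch below shows one way to do that with exponential backoff. The payload field names (`input`, `output`) and the status values (`queued`, `running`, `completed`, `failed`) are assumptions for illustration, and the status lookup is passed in as a function so that the example does not depend on any particular endpoint.

```python
import time
from typing import Callable


def build_batch_job(input_uri: str, output_uri: str) -> dict:
    """Assemble a job payload. Field names are hypothetical."""
    return {"input": input_uri, "output": output_uri}


def wait_for_job(get_status: Callable[[], str],
                 interval: float = 5.0,
                 max_interval: float = 60.0,
                 timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reaches a terminal state, backing off between checks.

    `get_status` would wrap a call to the batch job status endpoint and
    return the current status string.
    """
    terminal = {"completed", "failed"}  # assumed terminal status values
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(interval)
        waited += interval
        # Double the delay after each check, up to max_interval.
        interval = min(interval * 2, max_interval)


# Simulated status sequence standing in for real status calls.
statuses = iter(["queued", "running", "running", "completed"])
job = build_batch_job("s3://my-bucket/calls/", "s3://my-bucket/transcripts/")
final = wait_for_job(lambda: next(statuses), interval=0.01, sleep=lambda s: None)
print(job["input"], "->", job["output"], ":", final)
```

Backing off keeps long-running jobs from generating a steady stream of status requests; for large archives, webhooks avoid polling altogether.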

For details, see Batch mode.