Introducing Zoom AI Services
Audio, documents, and images are some of the most information-dense inputs in modern applications. Extracting reliable, structured intelligence from that data is still hard. Getting from a media file to accurate text, searchable entities, summaries, or translations usually means stitching together models, infrastructure, storage, queuing, retries, monitoring, and cost controls.
Today we’re announcing Zoom AI Services: a suite of enterprise-grade intelligence APIs for speech, language, and vision processing, built on the same technology that powers AI features across Zoom and processes millions of media hours daily.
What are Zoom AI Services?
Zoom AI Services are designed for developers building products that need to process their own data and want production-ready quality, predictable operations, and a pricing model that scales with usage. Zoom AI Services will be available as APIs that enable your system to request AI processing and receive structured results — giving you the outputs you need without having to manage the underlying model infrastructure.
By using AI Services, your team can focus on delivering product value while Zoom handles the infrastructure. You pay only for what you use with simple, consumption-based pricing. The platform is built for enterprise-grade reliability and high-volume workloads, and your application continuously benefits from model improvements without requiring any re-platforming.
Built for Both Real-Time and Large-Scale Workloads
Many AI tasks are either interactive (you want an answer now) or operational (you want to process lots of data reliably). Zoom AI Services are designed to support both patterns:
- Fast Mode: request–response processing for small workloads where you need the result immediately. For example, translating a single review, transcribing a short audio clip for notes, or summarizing a single ticket.
- Batch Mode: submit larger jobs and let them run in the background with resilient processing. For example, transcribing thousands of recordings, translating an entire knowledge base, or summarizing an archive of transcripts.
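To make the split concrete, here is a minimal sketch of how an application might route work between the two modes. It makes no calls to any Zoom API. The `Workload` type, the `choose_mode` function, and both thresholds are hypothetical names and values invented for illustration, not documented limits.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; these are NOT Zoom-documented limits.
FAST_MAX_ITEMS = 1          # assume Fast Mode suits a single item per request
FAST_MAX_SECONDS = 300.0    # assume Fast Mode suits short clips (5 minutes or less)


@dataclass
class Workload:
    """A unit of work the application wants processed (hypothetical type)."""
    item_count: int               # number of files or documents
    total_media_seconds: float    # combined audio/video duration; 0 for text-only work
    needs_immediate_result: bool  # True when a user is waiting on the answer


def choose_mode(workload: Workload) -> str:
    """Return "fast" for small interactive work, otherwise "batch"."""
    is_small = (
        workload.item_count <= FAST_MAX_ITEMS
        and workload.total_media_seconds <= FAST_MAX_SECONDS
    )
    if workload.needs_immediate_result and is_small:
        return "fast"
    # Large or non-urgent work goes to background processing.
    return "batch"


if __name__ == "__main__":
    short_clip = Workload(item_count=1, total_media_seconds=45.0, needs_immediate_result=True)
    archive = Workload(item_count=5000, total_media_seconds=9_000_000.0, needs_immediate_result=False)
    print(choose_mode(short_clip))
    print(choose_mode(archive))
```

In this sketch a long recording is routed to batch even when the caller wants the result right away, on the assumption that a request–response call is a poor fit for hours of media.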
Zoom AI Services will cover multiple modalities so you can build end-to-end processing pipelines. Today, we’re launching the first service in this suite.
Transcription with the Zoom Scribe API
The Zoom Scribe API delivers scalable, high-performance transcription across a broad range of media formats and use cases. It enables organizations to convert audio and video into accurate text, whether they are processing large archives in bulk or handling real-time interactions at scale. The Zoom Scribe model is ranked #1 on the Open ASR Leaderboard, making it the most accurate speech recognition model on that benchmark.
Use the Zoom Scribe API to convert audio files into accurate, high-quality text and build workflows such as post-call summaries and ticket enrichment, compliance and audit logging, searchable archives for podcasts and webinars, and more.
Get started today
Sign up for a Zoom developer account on the Build platform. We have an example project on GitHub to get you started with the Scribe API. Read more about the Scribe API in our blog and documentation.