Video SDK adds support for Realtime Media Streams (RTMS)

We're excited to announce Realtime Media Streams (RTMS) is now available for the Video SDK. RTMS offers AI/ML workloads a simplified method of accessing session data server-side. Learn more about RTMS in our RTMS launch blog and our RTMS developer docs for Video SDK.

Video SDK developers can now access per-participant media streams over WebSockets. As part of the initial release, you can start building with participant audio data and text transcripts today.

Accessing Video SDK data with RTMS

Context is at the core of AI applications. With RTMS, you can instantly access per-participant audio streams over WebSockets and process them to power AI/ML workloads.

Access to server-side per-participant audio and transcript data enables you to focus on building value-added features rather than managing complex infrastructure. To get started, see our blog post showing how to access real-time audio and transcript data from your server.

Simplified access to session data

We see enterprises leveraging RTMS to build solutions that transform how they engage with customers:

  • Real-time coaching assistance: Instantly process session transcripts on your server to deliver real-time coaching for service providers. Help them identify the next-best action and prompt for resources based on the conversation.

  • Comprehensive customer engagement insights: Gain a holistic view of each customer interaction by leveraging transcript diarization and active speaker detection. Access precise audio and transcript data complete with timestamps and participant identification to create a detailed timeline of every video session.

  • Sentiment analysis: Analyze entire conversations in real time to understand customer sentiment and engagement levels, enabling deeper insights into customer experiences and satisfaction.

Let's Build

To get started with RTMS for Video SDK, you need to:

Enable RTMS for your Video SDK app

  1. Sign in to the Zoom App Marketplace with your Video SDK credentials
  2. Navigate to Develop > Build Video SDK
  3. Under Add feature, enable Event Subscriptions
  4. Configure your subscription:
    • Add a descriptive name for your subscription
    • Add the RTMS Started and RTMS Stopped events
  5. Set your Event notification endpoint URL - this is where Zoom will send webhook events when RTMS sessions start and stop
  6. Save your configuration
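
Once the subscription is saved, your endpoint has to tell the two events apart. The following is a minimal sketch of that routing step, not Zoom's reference implementation. It assumes the webhook body is JSON of the shape `{ event, payload }`, and the event name strings `session.rtms_started` and `session.rtms_stopped` are assumptions to confirm against the RTMS docs. `routeRtmsEvent` is a hypothetical helper name.

```javascript
// Assumed event names for the "RTMS Started" and "RTMS Stopped" subscriptions.
// Confirm the exact strings in the RTMS docs before relying on them.
const RTMS_STARTED = "session.rtms_started";
const RTMS_STOPPED = "session.rtms_stopped";

// Decide what to do with a parsed webhook body of the assumed shape
// { event: string, payload: object }. Returns a plain description of the
// action so the decision can be tested without a running server.
function routeRtmsEvent(body) {
    if (!body || typeof body.event !== "string") {
        return { action: "ignore", reason: "malformed body" };
    }
    switch (body.event) {
        case RTMS_STARTED:
            // The payload carries what is needed to join the stream.
            return { action: "join", payload: body.payload };
        case RTMS_STOPPED:
            return { action: "leave", payload: body.payload };
        default:
            return { action: "ignore", reason: `unhandled event ${body.event}` };
    }
}
```

Keeping the routing decision in a pure function like this makes it easy to unit test separately from whatever HTTP framework receives the webhook.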

Start RTMS for a session

Now that your app is configured, you can use the REST API to start the RTMS streams:

fetch(`https://api.zoom.us/v2/videosdk/sessions/${sessionId}/rtms_app/status`, {
    method: "PATCH",
    headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer YOUR_SECRET_TOKEN",
    },
    body: JSON.stringify({
        action: "start",
    }),
});
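
The request above can be wrapped in a small helper that also checks the response status. This is a sketch under stated assumptions: `buildRtmsStatusRequest` and `setRtmsStatus` are hypothetical names, URL-encoding the session ID is an added precaution (check the API reference for the exact encoding rules), and only the `"start"` action is confirmed by this post.

```javascript
const RTMS_API_BASE = "https://api.zoom.us/v2/videosdk/sessions";

// Build the URL and fetch options for an RTMS status change.
// Encoding the session ID is an assumption, added because IDs may contain
// characters such as "/" or "=" that are unsafe in a URL path.
function buildRtmsStatusRequest(sessionId, action, token) {
    return {
        url: `${RTMS_API_BASE}/${encodeURIComponent(sessionId)}/rtms_app/status`,
        options: {
            method: "PATCH",
            headers: {
                "Content-Type": "application/json",
                Authorization: `Bearer ${token}`,
            },
            body: JSON.stringify({ action }),
        },
    };
}

// Send the request and reject on any non-2xx response, so a failed start
// does not go unnoticed.
async function setRtmsStatus(sessionId, action, token) {
    const { url, options } = buildRtmsStatusRequest(sessionId, action, token);
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`RTMS ${action} failed: HTTP ${response.status}`);
    }
    return response;
}
```

A call such as `await setRtmsStatus(sessionId, "start", token)` then behaves like the original snippet, except that it throws if the API rejects the request.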

Alternatively, you can use the Video SDK RealTimeMediaStreamsClient object to start or stop the RTMS streams.

Accessing the RTMS data

Once you've started the RTMS session, you can use the RTMS SDK to access the session data on your server:

import rtms from "@zoom/rtms";
rtms.onWebhookEvent(({ payload }) => {
    const client = new rtms.Client();
    client.onTranscriptData((data, size, timestamp, metadata) =>
        console.log(`${metadata.userName}: ${data}`),
    );
    // Join on the same client instance the transcript callback was registered on.
    client.join(payload);
});

Once you've set up the ZM_RTMS_CLIENT and ZM_RTMS_SECRET environment variables with your Video SDK Key and Secret, this code starts receiving transcript data for the session. You can also access other session data, such as user audio data. Check out our blog post showcasing how to transcribe Video SDK sessions locally to learn how you can process audio streams from RTMS.
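
Instead of logging each transcript line, the callback could feed a small in-memory timeline, which is one way to approach the per-participant timeline use case described earlier. This is an illustrative sketch, not part of the RTMS SDK. `TranscriptTimeline` is a hypothetical class, and it assumes `timestamp` is numeric and that `metadata.userName` identifies the speaker, as in the snippet above.

```javascript
// Collects transcript callbacks into an ordered, per-speaker timeline.
class TranscriptTimeline {
    constructor() {
        this.entries = [];
    }

    // Takes the same argument order as the onTranscriptData callback.
    // `size` is accepted for signature compatibility but not used here.
    add(data, size, timestamp, metadata) {
        const entry = {
            speaker: (metadata && metadata.userName) || "unknown",
            text: String(data).trim(),
            timestamp,
        };
        this.entries.push(entry);
        return entry;
    }

    // Entries sorted by timestamp, oldest first (assumes numeric timestamps).
    sorted() {
        return [...this.entries].sort((a, b) => a.timestamp - b.timestamp);
    }

    // All text attributed to one participant, in time order.
    bySpeaker(name) {
        return this.sorted()
            .filter((entry) => entry.speaker === name)
            .map((entry) => entry.text);
    }
}
```

With a `timeline` instance in scope, the handler could be registered as `client.onTranscriptData((...args) => timeline.add(...args))`. A production version would likely persist entries rather than hold them in memory.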

We have a suite of sample apps to get you started quickly. If you're ready to start building, head over to the RTMS docs.