# Receive Zoom Video SDK session audio with RTMS

In this guide, you'll build a server that uses WebSockets to get Zoom Video SDK session audio from Realtime Media Streams (RTMS).

Find the full code on [GitHub](https://github.com/zoom/rtms-samples/tree/main/video-sdk).

The server will:

1. Listen for incoming webhook events `session.rtms_started` and `session.rtms_stopped`
2. Generate a signature for handshake requests
3. Connect to the WebSocket endpoint for the session
4. Receive session audio in real time

## Prerequisites

1. A Zoom Video SDK Universal Credit account
2. RTMS enabled on your account
3. A Zoom Video SDK app in the [Zoom Marketplace](https://marketplace.zoom.us/develop/create)
4. A subscription to the RTMS webhook events

## Get started

To build the server with Node.js, create a new project and install [express](https://expressjs.com/), [dotenv](https://www.npmjs.com/package/dotenv), and [ws](https://www.npmjs.com/package/ws) as dependencies.

```shell
npm init -y
npm install express dotenv ws
```

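Next, we'll create a basic server on `localhost:3000`. Create a new file named `index.js` and add the following code to it. The code uses ES module `import` syntax, so add `"type": "module"` to `package.json` if Node.js rejects the `import` statements.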

```javascript
import express from "express";
import dotenv from "dotenv";
import WebSocket from "ws";

// Load environment variables from .env
dotenv.config();

const app = express();

// Enable JSON body parsing
app.use(express.json());

// Basic root route for testing
app.get("/", (req, res) => {
    res.send("Zoom RTMS Server is up and running.");
});

// Listen on the port from .env, defaulting to 3000
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
    console.log(`Server is listening on http://localhost:${PORT}`);
});
```

To build the server with Python instead, create a new project and install [Flask](https://flask.palletsprojects.com/en/stable/), [python-dotenv](https://pypi.org/project/python-dotenv/), and [websockets](https://websockets.readthedocs.io/en/stable/) as dependencies.

Create a `requirements.txt` file and add the following.

**requirements.txt sample**

```plaintext
Flask
python-dotenv
websockets
```

Install the dependencies:

```shell
pip install -r requirements.txt
```

Next, we'll create a basic server on `localhost:3000`. Create a new file named `main.py` and add the following code to it.

```python
from flask import Flask, request, jsonify
import os
from dotenv import load_dotenv
import asyncio
import websockets
import json
import hmac
import hashlib
import threading

# Load environment variables from .env
load_dotenv()

app = Flask(__name__)

# Basic root route for testing
@app.route('/')
def home():
    return 'Zoom RTMS Server is up and running.'

# Listen on localhost, using PORT from .env (default 3000)
if __name__ == '__main__':
    PORT = int(os.getenv('PORT', 3000))
    print(f"Server is listening on http://localhost:{PORT}")
    app.run(host='localhost', port=PORT, debug=True)
```

### Set up environment variables

Create a `.env` file in your project root. Add the following environment variables:

**.env sample**

```ini
ZOOM_CLIENT_ID=your_client_id
ZOOM_CLIENT_SECRET=your_client_secret
PORT=3000
```

Get your `ZOOM_CLIENT_ID` and `ZOOM_CLIENT_SECRET` from a Zoom Video SDK app.
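A missing credential otherwise only shows up later as a rejected handshake, so it can help to check the environment at startup. The sketch below assumes the variable names from the `.env` sample; the `require_env` helper is hypothetical and not part of any Zoom SDK.

```python
import os


def require_env(names, env=None):
    """Return the requested variables, or raise if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Usage, after load_dotenv() has run:
# config = require_env(["ZOOM_CLIENT_ID", "ZOOM_CLIENT_SECRET"])
```

Passing `env` explicitly keeps the helper easy to test without touching the real process environment.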

## Build the webhook receiver

When an RTMS stream starts or stops, your app receives a webhook event with one of the following payloads.

When a stream starts: [`session.rtms_started`](/docs/api/rtms/events/)

**session.rtms_started sample**

```json
{
    "event": "session.rtms_started",
    "event_ts": 1626230691572,
    "payload": {
        "account_id": "xxxxxxxxxx",
        "session_id": "xxxxxxxxxx",
        "session_key": "xxxxxxxxxx",
        "rtms_stream_id": "xxxxxxxxxx",
        "server_urls": "wss://127.0.0.1:443"
    }
}
```

When a stream stops: [`session.rtms_stopped`](/docs/api/rtms/events/)

**session.rtms_stopped sample**

```json
{
    "event": "session.rtms_stopped",
    "event_ts": 1732313171881,
    "payload": {
        "session_id": "xxxxxxxxxxxxxx",
        "session_key": "xxxxxxxxxxxxxx",
        "rtms_stream_id": "xxxxxxxxxxc",
        "stop_reason": 6
    }
}
```

To connect to the stream, our app needs the `session_id`, `rtms_stream_id`, and `server_urls` from the payload.
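As a rough illustration of pulling those three fields out of the event body, the sketch below validates a `session.rtms_started` body before anything tries to connect. The `extract_rtms_details` helper and `REQUIRED_FIELDS` constant are hypothetical names introduced here; the field names come from the sample payload above.

```python
REQUIRED_FIELDS = ("session_id", "rtms_stream_id", "server_urls")


def extract_rtms_details(body):
    """Return (session_id, rtms_stream_id, server_urls) from a session.rtms_started body."""
    if body.get("event") != "session.rtms_started":
        raise ValueError(f"Unexpected event: {body.get('event')!r}")
    payload = body.get("payload") or {}
    missing = [field for field in REQUIRED_FIELDS if not payload.get(field)]
    if missing:
        raise ValueError(f"Payload is missing: {', '.join(missing)}")
    return tuple(payload[field] for field in REQUIRED_FIELDS)


# Placeholder values copied from the sample payload
sample = {
    "event": "session.rtms_started",
    "payload": {
        "session_id": "xxxxxxxxxx",
        "rtms_stream_id": "xxxxxxxxxx",
        "server_urls": "wss://127.0.0.1:443",
    },
}
print(extract_rtms_details(sample))
```

Failing early with a clear error is a design choice here; a production receiver might prefer to log the problem and still return `200` to the webhook sender.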

To handle these webhook events, we'll build a simple webhook receiver and create a `/webhook` route to receive the POST requests from our event subscriptions.

Add the following code to `index.js`.

```javascript
app.post("/webhook", (req, res) => {
    const { event, payload } = req.body;
    console.log("Webhook received:", event);
    console.log("Payload:", JSON.stringify(payload, null, 2));
    res.sendStatus(200);
});
```

Next, we'll handle the `session.rtms_started` and `session.rtms_stopped` events inside the `/webhook` handler.

When we receive the `session.rtms_started` event, we extract the session details and open a signaling WebSocket connection to start the RTMS handshake. When we receive `session.rtms_stopped`, we log that the stream has ended.

```javascript
// Handle RTMS start event
if (event === "session.rtms_started") {
    const { session_id, rtms_stream_id, server_urls } = payload;
    console.log(`Starting RTMS for Video session ${session_id}`);
    // Connect to signaling WebSocket to establish RTMS connection
    connectToSignalingWebSocket(session_id, rtms_stream_id, server_urls);
}

// Handle RTMS stop event
if (event === "session.rtms_stopped") {
    const { session_id } = payload;
    console.log(`Stopping RTMS for Video session ${session_id}`);
}
```

Put together, the code for the webhook receiver looks like this:

```javascript
app.post("/webhook", (req, res) => {
    const { event, payload } = req.body;

    // Handle RTMS start event
    if (event === "session.rtms_started") {
        const { session_id, rtms_stream_id, server_urls } = payload;
        console.log(`Starting RTMS for Video session ${session_id}`);
        // Connect to signaling WebSocket to establish RTMS connection
        connectToSignalingWebSocket(session_id, rtms_stream_id, server_urls);
    } else if (event === "session.rtms_stopped") {
        // Handle RTMS stop event
        const { session_id } = payload;
        console.log(`Stopping RTMS for Video session ${session_id}`);
    } else {
        console.log("Unknown event:", event);
    }
    res.sendStatus(200);
});
```

Add the following code to `main.py`.

```python
@app.route('/webhook', methods=['POST'])
def webhook():
    data = request.get_json()
    event = data.get('event')
    payload = data.get('payload', {})

    print(f'Webhook received: {event}')
    print(f'Payload: {json.dumps(payload, indent=2)}')

    return '', 200
```

Next, we'll handle the `session.rtms_started` and `session.rtms_stopped` events inside the `webhook()` function.

When we receive the `session.rtms_started` event, we extract the session details and open a signaling WebSocket connection to start the RTMS handshake. When we receive `session.rtms_stopped`, we log that the stream has ended.

```python
# Handle RTMS start event
if event == 'session.rtms_started':
    session_id = payload.get('session_id')
    rtms_stream_id = payload.get('rtms_stream_id')
    server_urls = payload.get('server_urls')
    print(f"Starting RTMS for Video session {session_id}")
    # Connect to signaling WebSocket to establish RTMS connection
    threading.Thread(target=lambda: asyncio.run(
        connect_to_signaling_websocket(session_id, rtms_stream_id, server_urls)
    )).start()

if event == 'session.rtms_stopped':
    session_id = payload.get('session_id')
    print(f"Stopping RTMS for Video session {session_id}")
```

Put together, the code for the webhook receiver looks like this:

```python
@app.route('/webhook', methods=['POST'])
def webhook():
    data = request.get_json()
    event = data.get('event')
    payload = data.get('payload', {})

    # Handle RTMS start event
    if event == 'session.rtms_started':
        session_id = payload.get('session_id')
        rtms_stream_id = payload.get('rtms_stream_id')
        server_urls = payload.get('server_urls')
        print(f'Starting RTMS for Video session {session_id}')
        # Connect to signaling WebSocket to establish RTMS connection
        threading.Thread(target=lambda: asyncio.run(
            connect_to_signaling_websocket(session_id, rtms_stream_id, server_urls)
        )).start()
    # Handle RTMS stop event
    elif event == 'session.rtms_stopped':
        session_id = payload.get('session_id')
        print(f'Stopping RTMS for Video session {session_id}')
    else:
        print(f'Unknown event: {event}')

    return '', 200
```
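The stop handler in this guide only logs the event. If you also want to release connections when `session.rtms_stopped` arrives, one possible approach is to remember a cleanup callable per session. The `SessionRegistry` class below is a hypothetical sketch, not part of the guide's sample code; what the `close` callable does depends on how the sockets were opened, so it is left abstract.

```python
import threading


class SessionRegistry:
    """Thread-safe map of session_id -> callable that closes that session's sockets."""

    def __init__(self):
        self._closers = {}
        self._lock = threading.Lock()

    def register(self, session_id, close):
        """Remember how to shut down the connections for session_id."""
        with self._lock:
            self._closers[session_id] = close

    def stop(self, session_id):
        """Run and forget the closer for session_id; return False if none was registered."""
        with self._lock:
            close = self._closers.pop(session_id, None)
        if close is None:
            return False
        close()
        return True
```

The webhook handler could call `register` when a stream starts and `stop` when it ends. The lock matters because the webhook route and the WebSocket code run on different threads.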

## Create the signature generator

Next, we will create a function to generate the signature for the signaling WebSocket connection using HMAC SHA256. This will be used to authenticate the handshake request to the signaling server.

Add the following code to `index.js`.

```javascript
import crypto from "crypto";

function generateSignature(session_id, rtmsStreamId) {
    const message = `${process.env.ZOOM_CLIENT_ID},${session_id},${rtmsStreamId}`;
    const signature = crypto
        .createHmac("sha256", process.env.ZOOM_CLIENT_SECRET)
        .update(message)
        .digest("hex");

    console.log(`Generated signature: ${signature}`);
    return signature;
}
```

Add the following code to `main.py`.

```python
def generate_signature(session_id, rtms_stream_id):
    message = f"{os.getenv('ZOOM_CLIENT_ID')},{session_id},{rtms_stream_id}"
    signature = hmac.new(
        os.getenv('ZOOM_CLIENT_SECRET').encode(),
        message.encode(),
        hashlib.sha256
    ).hexdigest()

    print(f'Generated signature: {signature}')
    return signature
```

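Both versions sign the comma-separated string `client_id,session_id,rtms_stream_id` with the client secret as the HMAC key and return the digest as a hex string. The same signature is reused for the media handshake later in this guide.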
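To make the signing step concrete, here is a small standalone sketch that takes the credentials as arguments instead of reading them from the environment. The `sign` function name and all `demo_*` values are placeholders for illustration only.

```python
import hashlib
import hmac


def sign(client_id, client_secret, session_id, rtms_stream_id):
    """HMAC-SHA256 over 'client_id,session_id,rtms_stream_id', hex-encoded."""
    message = f"{client_id},{session_id},{rtms_stream_id}"
    return hmac.new(client_secret.encode(), message.encode(), hashlib.sha256).hexdigest()


# Placeholder values; real ones come from .env and the webhook payload
signature = sign("demo_client_id", "demo_client_secret", "demo_session", "demo_stream")
print(len(signature))  # a SHA-256 digest is 64 hex characters

# Field order matters: swapping two fields produces a different signature
swapped = sign("demo_client_id", "demo_client_secret", "demo_stream", "demo_session")
print(signature != swapped)
```

Keeping the function free of environment lookups makes it straightforward to unit test.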

## Connect to the signaling server with WebSockets

Next, we'll use the signature inside a `connectToSignalingWebSocket()` function to establish the signaling connection. The [signaling handshake request](/docs/rtms/event-reference/#signaling-handshake-request) passes in the `session_id` and `rtms_stream_id` from the `session.rtms_started` event and includes fields like `msg_type`, `protocol_version` and `sequence` as required.

Add the following code to `index.js`.

```javascript
function connectToSignalingWebSocket(session_id, rtmsStreamId, serverUrls) {
    const signalingWs = new WebSocket(serverUrls);

    signalingWs.on("open", () => {
        console.log(`Signaling WebSocket opened for session ${session_id}`);

        const signature = generateSignature(session_id, rtmsStreamId);

        const handshakeMsg = {
            msg_type: 1, // SIGNALING_HAND_SHAKE_REQ
            meeting_uuid: session_id, // the signaling server is shared with Zoom Meetings, so pass the session ID here
            rtms_stream_id: rtmsStreamId,
            signature,
        };

        console.log("Sending handshake message:", handshakeMsg);
        signalingWs.send(JSON.stringify(handshakeMsg));
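    });

    // Parse each incoming signaling message; later steps add their handling here
    signalingWs.on("message", (data) => {
        const msg = JSON.parse(data.toString());
        console.log("Signaling message received:", msg);
    });
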
    signalingWs.on("error", (error) => {
        console.error("Signaling WebSocket error:", error);
    });

    signalingWs.on("close", (code, reason) => {
        console.log("Signaling WebSocket closed:", code, reason);
    });
}
```

Add the following code to `main.py`.

```python
async def connect_to_signaling_websocket(session_id, rtms_stream_id, server_urls):
    # Open the signaling WebSocket using the URL from the webhook payload
    async with websockets.connect(server_urls) as signaling_ws:
        print(f'Signaling WebSocket opened for session {session_id}')

        signature = generate_signature(session_id, rtms_stream_id)

        handshake_msg = {
            'msg_type': 1,  # SIGNALING_HAND_SHAKE_REQ
            'meeting_uuid': session_id,  # the signaling server is shared with Zoom Meetings, so pass the session ID here
            'rtms_stream_id': rtms_stream_id,
            'signature': signature
        }

        print(f'Sending handshake message: {handshake_msg}')
        await signaling_ws.send(json.dumps(handshake_msg))

        # Keep the connection open and parse each incoming signaling message
        async for message in signaling_ws:
            msg = json.loads(message)
            print(f'Signaling message received: {msg}')
```

This function sends the signature and required handshake fields to the signaling server to authorize the connection.

### Handling keep-alive requests

When the signaling WebSocket connection is active, the RTMS server periodically sends [keep-alive messages](/docs/rtms/event-reference/#keep-alive-request) to check whether the client is still connected. To keep the connection open, the client must respond promptly with a [keep-alive response message](/docs/rtms/event-reference/#keep-alive-response) that echoes the timestamp received in the request. Handle this in the signaling WebSocket's message handler, after parsing the incoming message into `msg`.

Add the following code to `index.js`.

```javascript
if (msg.msg_type === 12) {
    // KEEP_ALIVE_REQ
    console.log("Received KEEP_ALIVE_REQ, responding with KEEP_ALIVE_RESP");
    signalingWs.send(
        JSON.stringify({
            msg_type: 13, // KEEP_ALIVE_RESP
            timestamp: msg.timestamp,
        }),
    );
}
```

Add the following code to `main.py`.

```python
if msg.get('msg_type') == 12:  # KEEP_ALIVE_REQ
    print('Received KEEP_ALIVE_REQ, responding with KEEP_ALIVE_RESP')
    await signaling_ws.send(json.dumps({
        'msg_type': 13,  # KEEP_ALIVE_RESP
        'timestamp': msg.get('timestamp')
    }))
```
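Because the keep-alive exchange is a pure request-to-response mapping, it can be factored into a function that is easy to test without a live socket. This is a minimal sketch under the assumption that keep-alive requests arrive as JSON text; `keep_alive_reply` is a hypothetical helper name, and the message type numbers are the ones used in this guide.

```python
import json

KEEP_ALIVE_REQ = 12
KEEP_ALIVE_RESP = 13


def keep_alive_reply(raw_message):
    """Return the JSON reply for a keep-alive request, or None for any other message."""
    try:
        msg = json.loads(raw_message)
    except (TypeError, ValueError):
        return None  # not JSON, so not a keep-alive request
    if not isinstance(msg, dict) or msg.get("msg_type") != KEEP_ALIVE_REQ:
        return None
    # Echo the server's timestamp back unchanged
    return json.dumps({"msg_type": KEEP_ALIVE_RESP, "timestamp": msg.get("timestamp")})


print(keep_alive_reply('{"msg_type": 12, "timestamp": 1732313171881}'))
print(keep_alive_reply('{"msg_type": 2, "status_code": 0}'))
```

Inside a message loop, send the returned string back on the same socket whenever it is not `None`.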

## Connect to the media server with a WebSocket

When the signaling handshake is successful, the RTMS signaling server sends a [handshake response](/docs/rtms/event-reference#signaling-handshake-response) with media server URLs in `media_server.server_urls`:

**Signaling handshake response sample**

```json
{
    "msg_type": 2,
    "protocol_version": 1,
    "sequence": 0,
    "status_code": 0,
    "reason": "",
    "media_server": {
        "server_urls": {
            "audio": "wss://..."
            // "video": "wss://...",
            // "transcript": "wss://...",
            // "all": "wss://..."
        }
    }
}
```

Next, our app opens a WebSocket connection to one of the media URLs to receive media data. In this example, we connect to the `audio` URL and request the audio stream, which uses `media_type: 1`.
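Choosing the URL from the handshake response could look roughly like the sketch below. The `pick_media_url` helper and the `SIGNALING_HAND_SHAKE_RESP` constant name are hypothetical, the example URL is a placeholder, and falling back to the `all` URL when no dedicated one is listed is an assumption rather than documented behavior.

```python
SIGNALING_HAND_SHAKE_RESP = 2  # msg_type of the signaling handshake response sample


def pick_media_url(msg, kind="audio"):
    """Return the media server URL for `kind`, or None if the handshake did not succeed."""
    if msg.get("msg_type") != SIGNALING_HAND_SHAKE_RESP or msg.get("status_code") != 0:
        return None
    urls = (msg.get("media_server") or {}).get("server_urls") or {}
    # Assumption: use the combined "all" URL when no dedicated URL is present
    return urls.get(kind) or urls.get("all")


response = {
    "msg_type": 2,
    "status_code": 0,
    "media_server": {"server_urls": {"audio": "wss://media.example.invalid/audio"}},
}
print(pick_media_url(response))
```

Returning `None` on a failed handshake lets the caller decide whether to log the `reason` field or close the signaling socket.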

When the media WebSocket connection opens, we build and send a [handshake request to the media server](/docs/rtms/event-reference#media-handshake-request) with our session details and signature:

Add the following code to `index.js`.

```javascript
function connectToMediaWebSocket(
    mediaUrl,
    session_id,
    rtmsStreamId,
    signalingSocket,
) {
    // Open the media WebSocket connection using the URL from the handshake response
    const mediaWs = new WebSocket(mediaUrl);

    mediaWs.on("open", () => {
        // Build the media handshake for audio only
        const handshakeMsg = {
            msg_type: 3, // DATA_HAND_SHAKE_REQ
            protocol_version: 1,
            sequence: 0,
            meeting_uuid: session_id,
            rtms_stream_id: rtmsStreamId,
            signature: generateSignature(session_id, rtmsStreamId),
            media_type: 1, // Request only audio (AUDIO enum)
        };

        console.log("Sending audio handshake:", handshakeMsg);
        mediaWs.send(JSON.stringify(handshakeMsg));
    });

    // Listen for incoming audio data packets
    mediaWs.on("message", (data) => {
        console.log("Received audio data:", data);
    });
}
```

Add the following code to `main.py`.

```python
async def connect_to_media_websocket(media_url, session_id, rtms_stream_id, signaling_socket):
    async with websockets.connect(media_url) as media_ws:
        # Build the media handshake for audio only
        handshake_msg = {
            'msg_type': 3,  # DATA_HAND_SHAKE_REQ
            'protocol_version': 1,
            'sequence': 0,
            'meeting_uuid': session_id,
            'rtms_stream_id': rtms_stream_id,
            'signature': generate_signature(session_id, rtms_stream_id),
            'media_type': 1  # Request only audio (AUDIO enum)
        }

        print('Sending audio handshake:', handshake_msg)
        await media_ws.send(json.dumps(handshake_msg))

        # Listen for incoming audio data packets
        async for message in media_ws:
            print('Received audio data:', message)
```

After a successful handshake with the media server, the RTMS server responds with a [media handshake response](/docs/rtms/event-reference#media-handshake-response):

**Media handshake response sample**

```json
{
    "msg_type": 4,
    "protocol_version": 1,
    "status_code": 0,
    "reason": "",
    "sequence": 0,
    "payload_encrypted": true,
    "media_params": {
        "transcript": {
            "content_type": 1
        }
    }
}
```

### Send the client ready acknowledgement (ACK)

To tell the RTMS server that our app is ready to receive media, we send a client ready ACK message (`msg_type: 7`) over the signaling WebSocket, not the media WebSocket. Add this check to the media WebSocket's message handler, after parsing the incoming message into `msg`:

Add the following code to `index.js`.

```javascript
// If handshake response is OK, send CLIENT_READY_ACK on signaling socket
if (msg.msg_type === 4 && msg.status_code === 0) {
    console.log(
        "Media handshake successful, sending CLIENT_READY_ACK via signaling socket",
    );
    signalingSocket.send(
        JSON.stringify({
            msg_type: 7, // CLIENT_READY_ACK
            rtms_stream_id: rtmsStreamId,
        }),
    );
}
```

Add the following code to `main.py`.

```python
# If handshake response is OK, send CLIENT_READY_ACK on signaling socket
if msg.get('msg_type') == 4 and msg.get('status_code') == 0:
    print('Media handshake successful, sending CLIENT_READY_ACK via signaling socket')
    await signaling_socket.send(json.dumps({
        'msg_type': 7,  # CLIENT_READY_ACK
        'rtms_stream_id': rtms_stream_id
    }))
```

### Receive audio data

Once the `CLIENT_READY_ACK` is sent, the RTMS server will begin streaming the actual media data, in our case audio, through the media WebSocket.

Incoming media packets have different `msg_type` values depending on the media type you requested in your `DATA_HAND_SHAKE_REQ`.

For audio, each chunk arrives as a message with `msg_type: 14` (`MEDIA_DATA_AUDIO`).

Handle these packets in the media WebSocket's message handler:

Add the following code to `index.js`.

```javascript
// When receiving a MEDIA_DATA_AUDIO message
if (msg.msg_type === 14) {
    console.log("Received audio:", msg.content);
}
```

Add the following code to `main.py`.

```python
# When receiving a MEDIA_DATA_AUDIO message
if msg.get('msg_type') == 14:
    print('Received audio:', msg.get('content'))
```

### Send a keep-alive message to the media WebSocket

Similar to the signaling connection, we also need to keep the media connection alive. We will use the same logic.

Add the following code to `index.js`.

```javascript
if (msg.msg_type === 12) {
    // KEEP_ALIVE_REQ
    console.log("Received KEEP_ALIVE_REQ, responding with KEEP_ALIVE_RESP");
    mediaWs.send(
        JSON.stringify({
            msg_type: 13, // KEEP_ALIVE_RESP
            timestamp: msg.timestamp,
        }),
    );
}
```

Add the following code to `main.py`.

```python
if msg.get('msg_type') == 12:  # KEEP_ALIVE_REQ
    print('Received KEEP_ALIVE_REQ, responding with KEEP_ALIVE_RESP')
    await media_ws.send(json.dumps({
        'msg_type': 13,  # KEEP_ALIVE_RESP
        'timestamp': msg.get('timestamp')
    }))
```
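The three media-socket checks (handshake response, keep-alive, audio data) all belong in the same message handler. The sketch below shows one way they might fit together as a pure function that returns what to do instead of touching any socket. It assumes every media message is JSON text; `handle_media_message` and the `(target, payload)` convention are hypothetical, while the message type numbers are the ones used in this guide.

```python
import json

DATA_HAND_SHAKE_RESP = 4
CLIENT_READY_ACK = 7
KEEP_ALIVE_REQ = 12
KEEP_ALIVE_RESP = 13
MEDIA_DATA_AUDIO = 14


def handle_media_message(raw_message, rtms_stream_id):
    """Map one media-socket message to a list of (target, payload) actions.

    target is "signaling" or "media" for a reply to send on that socket,
    or "audio" for content to hand to the application.
    """
    msg = json.loads(raw_message)
    msg_type = msg.get("msg_type")

    if msg_type == DATA_HAND_SHAKE_RESP and msg.get("status_code") == 0:
        ack = {"msg_type": CLIENT_READY_ACK, "rtms_stream_id": rtms_stream_id}
        return [("signaling", json.dumps(ack))]
    if msg_type == KEEP_ALIVE_REQ:
        reply = {"msg_type": KEEP_ALIVE_RESP, "timestamp": msg.get("timestamp")}
        return [("media", json.dumps(reply))]
    if msg_type == MEDIA_DATA_AUDIO:
        return [("audio", msg.get("content"))]
    return []  # ignore message types this guide does not cover
```

In the media loop, iterate over the returned actions and send each payload on the matching socket, or pass the audio content to your own processing code.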