Failover and reconnection

Realtime Media Streams (RTMS) uses a signaling connection that manages session lifecycle, and media connections that carry audio, video, and transcript data. Depending on which connection is interrupted and why, the recovery steps differ.

| Scenario | Trigger | What to reconnect |
| --- | --- | --- |
| RTMS server failure | meeting.rtms_started arrives for an already-active stream | Signaling + media connections |
| Signaling connection dropped | meeting.rtms_interrupted webhook | Signaling + media connections |
| Media connection dropped | MEDIA_CONNECTION_INTERRUPTED signal event | Affected media connection(s) only |
| Stream terminated | meeting.rtms_stopped with stop_reason 10–19 or 24 | Full RTMS connection restart |

The code snippets included here are for meetings; for webinars, replace events with their webinar equivalents. For example, replace meeting.rtms_started with webinar.rtms_started. To see the full solution and test your reconnection logic with chaos mode, see RTMS Reconnection & Chaos Mode on GitHub.

Exponential backoff

Scenarios 2 and 3 use the getReconnectDelay function below. Reconnection attempts use exponential backoff, starting at 3 seconds and doubling after each attempt, up to a cap of 30 seconds. Reset reconnectAttempts to 0 after each successful handshake so the backoff counter doesn't carry over into future disconnections.

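// Backoff settings, matching the schedule described in this section.
const RECONNECT_BASE_DELAY_MS = 3000; // first delay: 3 seconds
const RECONNECT_BACKOFF_FACTOR = 2; // double the delay on each attempt
const RECONNECT_MAX_DELAY_MS = 30000; // cap: 30 seconds

/**
 * Calculate reconnection delay with exponential backoff.
 * Attempt 0: 3s, Attempt 1: 6s, Attempt 2: 12s, Attempt 3: 24s, then capped at 30s.
 */
function getReconnectDelay(attempts) {
    return Math.min(
        RECONNECT_BASE_DELAY_MS * Math.pow(RECONNECT_BACKOFF_FACTOR, attempts),
        RECONNECT_MAX_DELAY_MS,
    );
}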
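The counter reset mentioned above could look like the following minimal sketch. The onHandshakeSuccess name, the "CONNECTED" state label, and the shape of the conn object are assumptions for illustration. In your app, the reset belongs wherever you handle a successful handshake response.

```javascript
// Hypothetical helper: call this when a handshake response reports success.
// `conn` is assumed to be the per-stream state object used in this guide.
function onHandshakeSuccess(conn) {
    conn.state = "CONNECTED"; // assumed state label
    conn.reconnectAttempts = 0; // the next disconnection starts from the base delay
    return conn;
}

// Usage: a connection that needed three attempts starts over at zero.
const conn = {
    streamId: "example-stream",
    state: "RECONNECTING",
    reconnectAttempts: 3,
};
onHandshakeSuccess(conn);
console.log(conn.reconnectAttempts); // 0
```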

Scenario 1: RTMS server failure

RTMS server failures require a full reconnection between your app and the RTMS server.

Problem

A meeting.rtms_started event arrives with an rtms_stream_id that your app already tracks as an active stream, indicating that the RTMS server failed and restarted.

Solution

Complete the entire connection process to create a new connection to the server.

To detect that the stream is active

Use an if statement to check whether the stream is already active.

if (activeStreams.has(rtms_stream_id)) {
    // Scenario 1: RTMS server failed and restarted. Reconnect with new URLs.
    handleServerFailureReconnect(meeting_uuid, rtms_stream_id, server_urls);
}
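For context, the check above might sit inside a webhook handler like the following sketch. The onRtmsStarted and startNewStream names are hypothetical, the handlers are stand-ins that only record which path ran, and the server URL is a placeholder.

```javascript
// Assumed in-memory registry of streams, keyed by rtms_stream_id.
const activeStreams = new Map();

// Stand-ins for the real handlers; they only record which path ran.
const calls = [];
function handleServerFailureReconnect(meetingUuid, streamId, serverUrls) {
    calls.push(`reconnect:${streamId}`);
}
function startNewStream(meetingUuid, streamId, serverUrls) {
    // Hypothetical name for the normal first-time connection path.
    activeStreams.set(streamId, { meetingUuid, streamId, serverUrls });
    calls.push(`start:${streamId}`);
}

// Hypothetical handler for the meeting.rtms_started webhook payload.
function onRtmsStarted(payload) {
    const { meeting_uuid, rtms_stream_id, server_urls } = payload;
    if (activeStreams.has(rtms_stream_id)) {
        // Scenario 1: the stream is already tracked, so treat this as a failover.
        handleServerFailureReconnect(meeting_uuid, rtms_stream_id, server_urls);
    } else {
        startNewStream(meeting_uuid, rtms_stream_id, server_urls);
    }
}

// Usage: the first webhook starts the stream, the second takes the reconnect path.
const payload = {
    meeting_uuid: "uuid-1",
    rtms_stream_id: "stream-1",
    server_urls: "wss://example.invalid",
};
onRtmsStarted(payload);
onRtmsStarted(payload);
console.log(calls);
```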

To create the new connections

Use the handleServerFailureReconnect function to reestablish all the connections.

/**
 * SCENARIO 1: RTMS Server Failure
 *
 * What happened:
 *   The Zoom RTMS server went down. A new RTMS server has spun up and sent
 *   a fresh meeting.rtms_started webhook with (possibly new) server_urls.
 *
 * What to do:
 *   Tear down all existing connections and start fresh with the new URLs.
 *   The meeting_uuid and rtms_stream_id stay the same.
 *
 * Trigger:
 *   meeting.rtms_started webhook arrives for a streamId we already have.
 *
 * See: Failover and reconnection > Scenario 1: RTMS server failure
 */
function handleServerFailureReconnect(meetingUuid, streamId, serverUrls) {
    log(streamId, "RECONNECT", "========================================");
    log(streamId, "RECONNECT", "SCENARIO 1: RTMS SERVER FAILURE");
    log(
        streamId,
        "RECONNECT",
        "A new meeting.rtms_started arrived for an existing stream.",
    );
    log(
        streamId,
        "RECONNECT",
        "Tearing down old connections and reconnecting with new server URLs.",
    );
    log(streamId, "RECONNECT", "========================================");
    // Close any existing sockets
    const existing = activeStreams.get(streamId);
    if (existing) {
        safeCloseWs(existing.signalingWs);
        safeCloseWs(existing.mediaWs);
    }
    // Create a fresh connection state with the new server URLs
    const conn = createStreamConnection(meetingUuid, streamId, serverUrls);
    conn.state = "RECONNECTING";
    activeStreams.set(streamId, conn);
    // Connect immediately — the new server is ready for us
    connectToSignalingWebSocket(conn);
}
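The safeCloseWs helper is used throughout this guide but not shown. A minimal sketch of what it might do follows; this implementation is an assumption, not the sample's actual code. It relies only on the readyState values and close() method defined by the WebSocket API. Libraries differ in how they treat closing a socket that is still connecting, so check the behavior of the WebSocket library you use.

```javascript
// Numeric readyState values defined by the WebSocket API.
const WS_CONNECTING = 0;
const WS_OPEN = 1;

// Hypothetical implementation: close a socket without letting teardown throw.
// Returns true if close() was called successfully, false otherwise.
function safeCloseWs(ws) {
    if (!ws) return false; // nothing to close
    if (ws.readyState !== WS_CONNECTING && ws.readyState !== WS_OPEN) {
        return false; // already closing or closed
    }
    try {
        ws.close();
        return true;
    } catch (err) {
        // Closing is best-effort during teardown, so swallow the error.
        return false;
    }
}

// Usage with a fake socket object that mimics the parts of the API used above.
const fakeOpen = {
    readyState: WS_OPEN,
    closed: false,
    close() {
        this.closed = true;
    },
};
console.log(safeCloseWs(fakeOpen), safeCloseWs(null)); // true false
```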

Scenario 2: Signaling connection dropped

If the signaling connection is dropped, the signaling and media connections between your app and the RTMS server need to be reestablished.

Problem

A meeting.rtms_interrupted event arrives, indicating that the signaling connection was dropped.

Solution

Close the signaling and media connections, increment the reconnect attempt counter, and schedule connectToSignalingWebSocket() after an exponential backoff delay to create new connections to the server. The RTMS server allows approximately 60 seconds for signaling reconnection before ending the stream.

To detect a meeting.rtms_interrupted event

Use a case statement to detect a signaling connection issue.

// ---------------------------------------------------------------
// meeting.rtms_interrupted
//
// SCENARIO 2: Our signaling connection dropped. The server interrupted
// both signaling and media. We must reconnect both.
// ---------------------------------------------------------------
case 'meeting.rtms_interrupted': {
  const { meeting_uuid, rtms_stream_id, server_urls } = payload;
  log(rtms_stream_id, 'WEBHOOK', `meeting.rtms_interrupted — meeting: ${meeting_uuid}`);
  handleSignalingInterruptedReconnect(meeting_uuid, rtms_stream_id, server_urls);
  break;
}

To create the new connections

Use the handleSignalingInterruptedReconnect function to reestablish all the connections.

/**
 * SCENARIO 2: Signaling Connection Down (App Issue)
 *
 * What happened:
 *   Our app's signaling WebSocket dropped (network issue, chaos mode, etc.).
 *   Since signaling controls the session, the RTMS server interrupted BOTH
 *   the signaling and media connections.
 *
 * What to do:
 *   Re-establish both signaling and media connections.
 *   The server waits ~60 seconds for us to reconnect before ending the stream.
 *
 * Trigger:
 *   meeting.rtms_interrupted webhook
 *
 * See: Failover and reconnection > Scenario 2: Signaling connection dropped
 */
function handleSignalingInterruptedReconnect(
    meetingUuid,
    streamId,
    serverUrls,
) {
    log(streamId, "RECONNECT", "========================================");
    log(streamId, "RECONNECT", "SCENARIO 2: SIGNAL CONNECTION INTERRUPTED");
    log(streamId, "RECONNECT", "meeting.rtms_interrupted webhook received.");
    log(
        streamId,
        "RECONNECT",
        "Must re-establish BOTH signaling and media connections.",
    );
    log(
        streamId,
        "RECONNECT",
        "Server allows ~60 seconds for signaling reconnection.",
    );
    log(streamId, "RECONNECT", "========================================");
    let conn = activeStreams.get(streamId);
    if (!conn) {
        // Edge case: we lost track of this stream. Create a new connection state.
        log(
            streamId,
            "RECONNECT",
            "No existing state found. Creating fresh connection.",
        );
        conn = createStreamConnection(meetingUuid, streamId, serverUrls);
        activeStreams.set(streamId, conn);
    }
    // Close any lingering sockets
    safeCloseWs(conn.signalingWs);
    safeCloseWs(conn.mediaWs);
    // Update server URLs in case the webhook provides updated ones
    if (serverUrls) {
        conn.serverUrls = serverUrls;
    }
    conn.state = "RECONNECTING";
    conn.reconnectAttempts++;
    // Reset chaos mode suppression counters so we can observe the cycle again
    conn.signalingKeepAliveSuppressed = 0;
    conn.mediaKeepAliveSuppressed = 0;
    // The counter was just incremented, so subtract 1 to start the schedule at attempt 0 (3s).
    const delay = getReconnectDelay(conn.reconnectAttempts - 1);
    log(
        streamId,
        "RECONNECT",
        `Reconnecting in ${delay}ms (attempt #${conn.reconnectAttempts})...`,
    );
    setTimeout(() => {
        if (conn.state === "STOPPED") return;
        connectToSignalingWebSocket(conn);
    }, delay);
}
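Because the server allows only about 60 seconds for signaling reconnection, a retry scheduled to fire after that window is unlikely to succeed. One way to account for this is sketched below. The retryFitsWindow helper and the interruptedAt timestamp are hypothetical additions, and the 60-second value is the approximate figure stated in this guide.

```javascript
// Approximate signaling reconnection window described in this guide.
const SIGNALING_RECONNECT_WINDOW_MS = 60000;

// Hypothetical helper: true if a retry scheduled `delayMs` after `now` would
// still fire inside the window that opened at `interruptedAt`.
// All arguments are in milliseconds.
function retryFitsWindow(interruptedAt, now, delayMs, windowMs) {
    return now + delayMs - interruptedAt < windowMs;
}

// Usage: 10s into the window, a 6s delay fits; 40s in, a 24s delay does not.
console.log(retryFitsWindow(0, 10000, 6000, SIGNALING_RECONNECT_WINDOW_MS)); // true
console.log(retryFitsWindow(0, 40000, 24000, SIGNALING_RECONNECT_WINDOW_MS)); // false
```

If the check returns false, your app could skip the retry and wait for the stop webhook described in Scenario 4.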

Scenario 3: A media connection dropped

If a media connection is dropped, only the affected media connection between your app and the RTMS server needs to be reestablished. The signaling connection remains active.

Problem

The signaling connection delivers an EVENT_UPDATE message with event_type 7 (MEDIA_CONNECTION_INTERRUPTED), indicating which media connection was dropped.

Solution

Close the affected media connection and schedule connectToMediaWebSocket() after an exponential backoff delay to create a new media connection to the server. The RTMS server allows approximately 30 seconds for media reconnection before terminating the session.

To detect the dropped media connection

Use a case statement to detect a media connection issue.

// ---------------------------------------------------------------
// RECONNECTION SCENARIO 3: Media Connection Interrupted
//
// The signaling connection is still alive, but a media socket went down.
// The server notifies us through the signaling channel.
//
// Action: Reconnect ONLY the media WebSocket. Signaling stays up.
// See: Failover and reconnection > Scenario 3: A media connection dropped
// ---------------------------------------------------------------
case EVENT_TYPE.MEDIA_CONNECTION_INTERRUPTED:
  log(conn.streamId, 'RECONNECT', '========================================');
  log(conn.streamId, 'RECONNECT', 'SCENARIO 3: MEDIA_CONNECTION_INTERRUPTED');
  log(conn.streamId, 'RECONNECT', 'Signaling is still alive. Reconnecting ONLY the media socket.');
  log(conn.streamId, 'RECONNECT', 'Server allows ~30 seconds for media reconnection.');
  log(conn.streamId, 'RECONNECT', '========================================');
  handleMediaOnlyReconnect(conn);
  break;

To create the new connection

Use the handleMediaOnlyReconnect function to reestablish the affected media connection.

/**
 * SCENARIO 3: Media Connection Down Only
 *
 * What happened:
 *   Only the media WebSocket dropped. The signaling connection is still alive.
 *   The server notified us through the signaling channel via either:
 *     - EVENT_UPDATE (msg_type 6) with event_type MEDIA_CONNECTION_INTERRUPTED (7)
 *     - STREAM_STATE_UPDATE (msg_type 8) with state INTERRUPTED (2) and reason 14
 *
 * What to do:
 *   Reconnect ONLY the media WebSocket. Signaling stays up.
 *   The server waits ~30 seconds for media reconnection.
 *
 * See: Failover and reconnection > Scenario 3: A media connection dropped
 */
function handleMediaOnlyReconnect(conn) {
    log(
        conn.streamId,
        "RECONNECT",
        "Closing old media socket and scheduling reconnection...",
    );
    // Close the old media socket
    safeCloseWs(conn.mediaWs);
    conn.mediaWs = null;
    conn.state = "RECONNECTING";
    conn.reconnectAttempts++;
    // Reset media chaos counter so we can observe the cycle again
    conn.mediaKeepAliveSuppressed = 0;
    // The counter was just incremented, so subtract 1 to start the schedule at attempt 0 (3s).
    const delay = getReconnectDelay(conn.reconnectAttempts - 1);
    log(
        conn.streamId,
        "RECONNECT",
        `Reconnecting media in ${delay}ms (attempt #${conn.reconnectAttempts})...`,
    );
    setTimeout(() => {
        if (conn.state === "STOPPED") return;
        connectToMediaWebSocket(conn);
    }, delay);
}
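The comment on handleMediaOnlyReconnect names two signaling messages that can report a dropped media connection. A sketch of a check that covers both follows. The numeric values come from that comment, but the field nesting (msg.event.event_type, msg.state, msg.reason) and the isMediaInterrupted name are assumptions, so confirm the message shapes against the RTMS message reference before relying on them.

```javascript
// Values taken from the handler comment in this guide.
const MSG_TYPE = { EVENT_UPDATE: 6, STREAM_STATE_UPDATE: 8 };
const EVENT_TYPE = { MEDIA_CONNECTION_INTERRUPTED: 7 };
const STREAM_STATE = { INTERRUPTED: 2 };
const MEDIA_INTERRUPTED_REASON = 14; // hypothetical constant name

// Hypothetical check: does this parsed signaling message report a dropped
// media connection? The field layout is assumed for illustration.
function isMediaInterrupted(msg) {
    if (msg.msg_type === MSG_TYPE.EVENT_UPDATE) {
        return msg.event?.event_type === EVENT_TYPE.MEDIA_CONNECTION_INTERRUPTED;
    }
    if (msg.msg_type === MSG_TYPE.STREAM_STATE_UPDATE) {
        return (
            msg.state === STREAM_STATE.INTERRUPTED &&
            msg.reason === MEDIA_INTERRUPTED_REASON
        );
    }
    return false;
}

// Usage with hand-written sample messages.
console.log(isMediaInterrupted({ msg_type: 6, event: { event_type: 7 } })); // true
console.log(isMediaInterrupted({ msg_type: 8, state: 2, reason: 14 })); // true
console.log(isMediaInterrupted({ msg_type: 8, state: 1 })); // false
```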

Scenario 4: Stream terminated

If your app misses the reconnection window in any of the scenarios above, the RTMS server terminates the stream and sends a webhook to notify your app.

Problem

A meeting.rtms_stopped (or webinar.rtms_stopped) webhook arrives with a stop_reason in the range of 10–19 (inclusive) or 24, indicating the stream was terminated due to a missed reconnection window.

Solution

Restart the entire RTMS connection process from the beginning to resume receiving complete data.
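The stop_reason check can be expressed as a small predicate. This is a sketch, and the isReconnectTimeoutStop name is hypothetical. The ranges are the ones listed in this section.

```javascript
// Hypothetical helper: true when a stop_reason value falls in the ranges this
// guide associates with a missed reconnection window (10 through 19, or 24).
function isReconnectTimeoutStop(stopReason) {
    return (stopReason >= 10 && stopReason <= 19) || stopReason === 24;
}

// Usage: check a few boundary values.
const results = [9, 10, 19, 20, 24].map(isReconnectTimeoutStop);
console.log(results); // [ false, true, true, false, true ]
```

When the predicate returns true for a meeting.rtms_stopped payload, your app would clear its state for that stream and begin the RTMS connection process again.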

If reconnection fails

If reconnect attempts are exhausted (for example, after the backoff cap of 30 seconds has been hit repeatedly), treat the stream as unrecoverable. Close all open connections, stop tracking the stream, and surface the failure to your application. Do not retry indefinitely.
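A minimal sketch of such a give-up check follows. The MAX_RECONNECT_ATTEMPTS value, the giveUpIfExhausted name, and the onFailure callback are hypothetical, so pick a limit that suits your app. Setting the state to "STOPPED" matches the check that the scheduled reconnect timers in this guide perform before connecting.

```javascript
// Hypothetical limit; this guide does not prescribe a specific number.
const MAX_RECONNECT_ATTEMPTS = 5;

// Hypothetical helper: if attempts are exhausted, tear the stream down and
// report the failure. Returns true if the stream was abandoned.
function giveUpIfExhausted(conn, activeStreams, onFailure) {
    if (conn.reconnectAttempts < MAX_RECONNECT_ATTEMPTS) return false;
    conn.state = "STOPPED"; // pending reconnect timers see this and do nothing
    for (const ws of [conn.signalingWs, conn.mediaWs]) {
        try {
            if (ws) ws.close();
        } catch (err) {
            // Best-effort close during teardown.
        }
    }
    activeStreams.delete(conn.streamId); // stop tracking the stream
    onFailure(conn.streamId); // surface the failure to the application
    return true;
}

// Usage with a fake connection that has used up its attempts.
const streams = new Map();
const failed = [];
const exhausted = {
    streamId: "stream-1",
    state: "RECONNECTING",
    reconnectAttempts: 5,
    signalingWs: null,
    mediaWs: null,
};
streams.set(exhausted.streamId, exhausted);
console.log(giveUpIfExhausted(exhausted, streams, (id) => failed.push(id))); // true
```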