Failover and reconnection
Realtime Media Streams (RTMS) uses a signaling connection that manages session lifecycle, and media connections that carry audio, video, and transcript data. Depending on which connection is interrupted and why, the recovery steps differ.
| Scenario | Trigger | What to reconnect |
|---|---|---|
| RTMS server failure | meeting.rtms_started arrives for an already-active stream | Signaling + media connections |
| Signaling connection dropped | meeting.rtms_interrupted webhook | Signaling + media connections |
| Media connection dropped | MEDIA_CONNECTION_INTERRUPTED signal event | Affected media connection(s) only |
| Stream terminated | meeting.rtms_stopped with stop_reason 10–19 or 24 | Full RTMS connection restart |
The code snippets included here are for meetings; for webinars, replace events with their webinar equivalents. For example, replace meeting.rtms_started with webinar.rtms_started. To see the full solution and test your reconnection logic with chaos mode, see RTMS Reconnection & Chaos Mode on GitHub.
Exponential backoff
Scenarios 2 and 3 use the delay function below. Reconnection attempts use exponential backoff: the delay starts at 3 seconds and doubles after each attempt, up to a cap of 30 seconds. Reset reconnectAttempts to 0 after each successful handshake so the backoff counter doesn't carry over into future disconnections.
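// Backoff settings, using the values described above.
const RECONNECT_BASE_DELAY_MS = 3000; // first delay: 3 seconds
const RECONNECT_BACKOFF_FACTOR = 2; // double after each attempt
const RECONNECT_MAX_DELAY_MS = 30000; // cap: 30 seconds

/**
 * Calculate reconnection delay with exponential backoff.
 * Attempt 0: 3s, Attempt 1: 6s, Attempt 2: 12s, Attempt 3: 24s, then capped at 30s.
 */
function getReconnectDelay(attempts) {
  return Math.min(
    RECONNECT_BASE_DELAY_MS * Math.pow(RECONNECT_BACKOFF_FACTOR, attempts),
    RECONNECT_MAX_DELAY_MS,
  );
}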
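To illustrate the counter reset described above, here is a minimal sketch. It repeats the delay calculation under different names so that it runs on its own. The `onHandshakeSuccess` helper, the `"CONNECTED"` state name, and the bare `conn` object are assumptions made for this example and are not taken from the RTMS sample.

```javascript
// Values taken from the description above: 3 s base, doubling, 30 s cap.
const BASE_DELAY_MS = 3000;
const BACKOFF_FACTOR = 2;
const MAX_DELAY_MS = 30000;

function backoffDelay(attempts) {
  return Math.min(BASE_DELAY_MS * Math.pow(BACKOFF_FACTOR, attempts), MAX_DELAY_MS);
}

// Hypothetical helper: call it once the handshake response reports success.
function onHandshakeSuccess(conn) {
  conn.state = "CONNECTED"; // assumed state name
  conn.reconnectAttempts = 0; // the next disconnection starts again at 3 s
}

// Simulate five failed attempts in a row, then a successful handshake.
const conn = { state: "RECONNECTING", reconnectAttempts: 0 };
const delaysBeforeReset = [];
for (let i = 0; i < 5; i++) {
  delaysBeforeReset.push(backoffDelay(conn.reconnectAttempts));
  conn.reconnectAttempts++;
}
console.log(delaysBeforeReset); // [ 3000, 6000, 12000, 24000, 30000 ]

onHandshakeSuccess(conn);
console.log(backoffDelay(conn.reconnectAttempts)); // 3000
```

Without the reset, a stream that recovered after several attempts would wait the full 30 seconds on its next disconnection, which already matches the approximate media reconnection window.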
Scenario 1: RTMS server failure
RTMS server failures require a full reconnection between your app and the RTMS server.
Problem
A meeting.rtms_started event arrives with an rtms_stream_id that already belongs to an active stream, indicating an RTMS server failure.
Solution
Complete the entire connection process to create a new connection to the server.
To detect that the stream is active
Use an if statement to check whether the stream is already active.
if (activeStreams.has(rtms_stream_id)) {
  // Scenario 1: RTMS server failed and restarted. Reconnect with new URLs.
  handleServerFailureReconnect(meeting_uuid, rtms_stream_id, server_urls);
}
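For context, the check above might sit inside a meeting.rtms_started handler like the following sketch. It assumes that activeStreams is a Map keyed by rtms_stream_id. The `onRtmsStarted` and `startNewStream` names are hypothetical, and both handlers here are placeholders that only record the decision instead of opening sockets.

```javascript
// Streams currently being tracked, keyed by rtms_stream_id (assumed to be a Map).
const activeStreams = new Map();

// Placeholder (hypothetical): first time this stream is seen.
function startNewStream(meetingUuid, streamId, serverUrls) {
  activeStreams.set(streamId, { meetingUuid, streamId, serverUrls });
  return "started";
}

// Placeholder standing in for the real Scenario 1 handler.
function handleServerFailureReconnect(meetingUuid, streamId, serverUrls) {
  activeStreams.set(streamId, { meetingUuid, streamId, serverUrls });
  return "reconnected";
}

// Hypothetical entry point for the meeting.rtms_started webhook payload.
function onRtmsStarted(payload) {
  const { meeting_uuid, rtms_stream_id, server_urls } = payload;
  if (activeStreams.has(rtms_stream_id)) {
    // Already tracked: treat the repeated event as an RTMS server failure.
    return handleServerFailureReconnect(meeting_uuid, rtms_stream_id, server_urls);
  }
  return startNewStream(meeting_uuid, rtms_stream_id, server_urls);
}

const first = onRtmsStarted({
  meeting_uuid: "m1",
  rtms_stream_id: "s1",
  server_urls: "wss://example.invalid/a",
});
const second = onRtmsStarted({
  meeting_uuid: "m1",
  rtms_stream_id: "s1",
  server_urls: "wss://example.invalid/b",
});
console.log(first, second); // started reconnected
```

This detection only works if streams are removed from activeStreams when they stop; otherwise a legitimately new stream could be mistaken for a failover.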
To create the new connections
Use the handleServerFailureReconnect function to reestablish all the connections.
/**
* SCENARIO 1: RTMS Server Failure
*
* What happened:
* The Zoom RTMS server went down. A new RTMS server has spun up and sent
* a fresh meeting.rtms_started webhook with (possibly new) server_urls.
*
* What to do:
* Tear down all existing connections and start fresh with the new URLs.
* The meeting_uuid and rtms_stream_id stay the same.
*
* Trigger:
* meeting.rtms_started webhook arrives for a streamId we already have.
*
* See: Failover and reconnection > Scenario 1: RTMS server failure
*/
function handleServerFailureReconnect(meetingUuid, streamId, serverUrls) {
  log(streamId, "RECONNECT", "========================================");
  log(streamId, "RECONNECT", "SCENARIO 1: RTMS SERVER FAILURE");
  log(
    streamId,
    "RECONNECT",
    "A new meeting.rtms_started arrived for an existing stream.",
  );
  log(
    streamId,
    "RECONNECT",
    "Tearing down old connections and reconnecting with new server URLs.",
  );
  log(streamId, "RECONNECT", "========================================");

  // Close any existing sockets
  const existing = activeStreams.get(streamId);
  if (existing) {
    safeCloseWs(existing.signalingWs);
    safeCloseWs(existing.mediaWs);
  }

  // Create a fresh connection state with the new server URLs
  const conn = createStreamConnection(meetingUuid, streamId, serverUrls);
  conn.state = "RECONNECTING";
  activeStreams.set(streamId, conn);

  // Connect immediately; the new server is ready for us
  connectToSignalingWebSocket(conn);
}
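The handlers in this section call safeCloseWs without showing it. The sketch below is one plausible shape for such a helper, not the sample's actual implementation: it assumes the socket object exposes the standard WebSocket readyState values and a close() method, and the boolean return value is added here only to make the behavior easy to check.

```javascript
// Standard WebSocket readyState values for sockets that need no further action.
const WS_CLOSING = 2;
const WS_CLOSED = 3;

// Assumed helper: close a socket without throwing if it is missing,
// already closing, or already closed. Returns true only if close() was called.
function safeCloseWs(ws) {
  if (!ws) return false;
  if (ws.readyState === WS_CLOSING || ws.readyState === WS_CLOSED) return false;
  try {
    ws.close();
    return true;
  } catch (err) {
    // A failed close should not abort the reconnection flow.
    return false;
  }
}

// Usage with stand-in objects instead of real sockets.
let closeCalls = 0;
const openSocket = { readyState: 1, close() { closeCalls++; } };
const closedSocket = { readyState: 3, close() { closeCalls++; } };

console.log(safeCloseWs(openSocket)); // true
console.log(safeCloseWs(closedSocket)); // false
console.log(safeCloseWs(null)); // false
```

Guarding against null matters because the media socket may never have been opened when a signaling failure arrives.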
Scenario 2: Signaling connection dropped
If the signaling connection is dropped, the signaling and media connections between your app and the RTMS server need to be reestablished.
Problem
A meeting.rtms_interrupted event arrives, indicating that the signaling connection was dropped.
Solution
Close all connections, increment the reconnect attempt counter, schedule connectToSignalingWebSocket() after an exponential backoff delay, and create a new connection to the server. The RTMS server allows approximately 60 seconds for signaling reconnection before ending the stream.
To detect a meeting.rtms_interrupted event
Add a case to your webhook handler's switch statement to detect a signaling connection issue.
// ---------------------------------------------------------------
// meeting.rtms_interrupted
//
// SCENARIO 2: Our signaling connection dropped. The server interrupted
// both signaling and media. We must reconnect both.
// ---------------------------------------------------------------
case 'meeting.rtms_interrupted': {
  const { meeting_uuid, rtms_stream_id, server_urls } = payload;
  log(rtms_stream_id, 'WEBHOOK', `meeting.rtms_interrupted - meeting: ${meeting_uuid}`);
  handleSignalingInterruptedReconnect(meeting_uuid, rtms_stream_id, server_urls);
  break;
}
To create the new connections
Use the handleSignalingInterruptedReconnect function to reestablish all the connections.
/**
* SCENARIO 2: Signal Connection Down (App Issue)
*
* What happened:
* Our app's signaling WebSocket dropped (network issue, chaos mode, etc.).
* Since signaling controls the session, the RTMS server interrupted BOTH
* the signaling and media connections.
*
* What to do:
* Re-establish both signaling and media connections.
* The server waits ~60 seconds for us to reconnect before ending the stream.
*
* Trigger:
* meeting.rtms_interrupted webhook
*
* See: Failover and reconnection > Scenario 2: Signaling connection dropped
*/
function handleSignalingInterruptedReconnect(
  meetingUuid,
  streamId,
  serverUrls,
) {
  log(streamId, "RECONNECT", "========================================");
  log(streamId, "RECONNECT", "SCENARIO 2: SIGNAL CONNECTION INTERRUPTED");
  log(streamId, "RECONNECT", "meeting.rtms_interrupted webhook received.");
  log(
    streamId,
    "RECONNECT",
    "Must re-establish BOTH signaling and media connections.",
  );
  log(
    streamId,
    "RECONNECT",
    "Server allows ~60 seconds for signaling reconnection.",
  );
  log(streamId, "RECONNECT", "========================================");

  let conn = activeStreams.get(streamId);
  if (!conn) {
    // Edge case: we lost track of this stream. Create a new connection state.
    log(
      streamId,
      "RECONNECT",
      "No existing state found. Creating fresh connection.",
    );
    conn = createStreamConnection(meetingUuid, streamId, serverUrls);
    activeStreams.set(streamId, conn);
  }

  // Close any lingering sockets
  safeCloseWs(conn.signalingWs);
  safeCloseWs(conn.mediaWs);

  // Update server URLs in case the webhook provides updated ones
  if (serverUrls) {
    conn.serverUrls = serverUrls;
  }
  conn.state = "RECONNECTING";

  // Reset chaos mode suppression counters so we can observe the cycle again
  conn.signalingKeepAliveSuppressed = 0;
  conn.mediaKeepAliveSuppressed = 0;

  // Compute the delay from the attempts made so far (0 on the first retry,
  // which gives the 3-second starting delay), then count this attempt.
  const delay = getReconnectDelay(conn.reconnectAttempts);
  conn.reconnectAttempts++;
  log(
    streamId,
    "RECONNECT",
    `Reconnecting in ${delay}ms (attempt #${conn.reconnectAttempts})...`,
  );
  setTimeout(() => {
    // Skip the reconnect if the stream was stopped while we were waiting
    if (conn.state === "STOPPED") return;
    connectToSignalingWebSocket(conn);
  }, delay);
}
Scenario 3: A media connection dropped
If a media connection is dropped, only the affected media connection between your app and the RTMS server needs to be reestablished. The signaling connection remains active.
Problem
The signaling connection delivers an EVENT_UPDATE message with event_type: 7 (MEDIA_CONNECTION_INTERRUPTED), indicating which media connection was dropped.
Solution
Close the affected media connection, schedule connectToMediaWebSocket() after an exponential backoff delay, and create a new connection to the server. The RTMS server allows approximately 30 seconds for media reconnection before terminating the session.
To detect the dropped media connection
Add a case to the switch statement that handles signaling events to detect a media connection issue.
// ---------------------------------------------------------------
// RECONNECTION SCENARIO 3: Media Connection Interrupted
//
// The signaling connection is still alive, but a media socket went down.
// The server notifies us through the signaling channel.
//
// Action: Reconnect ONLY the media WebSocket. Signaling stays up.
// See: Failover and reconnection > Scenario 3: A media connection dropped
// ---------------------------------------------------------------
case EVENT_TYPE.MEDIA_CONNECTION_INTERRUPTED:
  log(conn.streamId, 'RECONNECT', '========================================');
  log(conn.streamId, 'RECONNECT', 'SCENARIO 3: MEDIA_CONNECTION_INTERRUPTED');
  log(conn.streamId, 'RECONNECT', 'Signaling is still alive. Reconnecting ONLY the media socket.');
  log(conn.streamId, 'RECONNECT', 'Server allows ~30 seconds for media reconnection.');
  log(conn.streamId, 'RECONNECT', '========================================');
  handleMediaOnlyReconnect(conn);
  break;
To create the new connection
Use the handleMediaOnlyReconnect function to reestablish the affected media connection.
/**
* SCENARIO 3: Media Connection Down Only
*
* What happened:
* Only the media WebSocket dropped. The signaling connection is still alive.
* The server notified us through the signaling channel via either:
* - EVENT_UPDATE (msg_type 6) with event_type MEDIA_CONNECTION_INTERRUPTED (7)
* - STREAM_STATE_UPDATE (msg_type 8) with state INTERRUPTED (2) and reason 14
*
* What to do:
* Reconnect ONLY the media WebSocket. Signaling stays up.
* The server waits ~30 seconds for media reconnection.
*
* See: Failover and reconnection > Scenario 3: A media connection dropped
*/
function handleMediaOnlyReconnect(conn) {
  log(
    conn.streamId,
    "RECONNECT",
    "Closing old media socket and scheduling reconnection...",
  );

  // Close the old media socket
  safeCloseWs(conn.mediaWs);
  conn.mediaWs = null;
  conn.state = "RECONNECTING";

  // Reset media chaos counter so we can observe the cycle again
  conn.mediaKeepAliveSuppressed = 0;

  // Compute the delay from the attempts made so far (0 on the first retry,
  // which gives the 3-second starting delay), then count this attempt.
  const delay = getReconnectDelay(conn.reconnectAttempts);
  conn.reconnectAttempts++;
  log(
    conn.streamId,
    "RECONNECT",
    `Reconnecting media in ${delay}ms (attempt #${conn.reconnectAttempts})...`,
  );
  setTimeout(() => {
    // Skip the reconnect if the stream was stopped while we were waiting
    if (conn.state === "STOPPED") return;
    connectToMediaWebSocket(conn);
  }, delay);
}
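The two notifications listed in the comment above can be folded into a single check, as in this sketch. The numeric values (msg_type 6 and 8, event_type 7, state 2, reason 14) come from that comment. The field names `state` and `reason`, the possible nesting of `event_type` under an `event` object, and the `isMediaInterrupted` name are assumptions, so verify them against the messages your app actually receives.

```javascript
const MSG_TYPE = { EVENT_UPDATE: 6, STREAM_STATE_UPDATE: 8 };
const MEDIA_CONNECTION_INTERRUPTED = 7; // event_type
const STATE_INTERRUPTED = 2; // stream state
const REASON_MEDIA_DOWN = 14; // reason paired with the INTERRUPTED state

// Hypothetical helper: true if a parsed signaling message reports a dropped
// media connection through either notification path.
function isMediaInterrupted(msg) {
  if (!msg || typeof msg !== "object") return false;

  if (msg.msg_type === MSG_TYPE.EVENT_UPDATE) {
    // Assumption: event_type may be nested under `event` or sit at the top level.
    const eventType = msg.event ? msg.event.event_type : msg.event_type;
    return eventType === MEDIA_CONNECTION_INTERRUPTED;
  }

  if (msg.msg_type === MSG_TYPE.STREAM_STATE_UPDATE) {
    // Assumption: the fields are named `state` and `reason`.
    return msg.state === STATE_INTERRUPTED && msg.reason === REASON_MEDIA_DOWN;
  }

  return false;
}

console.log(isMediaInterrupted({ msg_type: 6, event: { event_type: 7 } })); // true
console.log(isMediaInterrupted({ msg_type: 8, state: 2, reason: 14 })); // true
console.log(isMediaInterrupted({ msg_type: 8, state: 1, reason: 0 })); // false
```

If both notifications arrive for the same drop, check conn.state before calling handleMediaOnlyReconnect so the reconnection is scheduled only once.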
Scenario 4: Stream terminated
If your app doesn't complete one of the reconnections above before its window expires, the RTMS server terminates the stream and sends a webhook to notify your app.
Problem
A meeting.rtms_stopped (or webinar.rtms_stopped) webhook arrives with a stop_reason in the range of 10–19 (inclusive) or 24, indicating the stream was terminated due to a missed reconnection window.
Solution
Restart the entire RTMS connection process from the beginning to resume receiving complete data.
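A minimal sketch of the stop_reason check follows. The range (10 to 19 inclusive, or 24) comes from the description above. The function names, the assumption that stop_reason sits at the top level of the payload, and the returned strings are illustrative only.

```javascript
// True when the stream ended because a reconnection window was missed.
function isMissedReconnectStop(stopReason) {
  return (
    Number.isInteger(stopReason) &&
    ((stopReason >= 10 && stopReason <= 19) || stopReason === 24)
  );
}

// Hypothetical handler for meeting.rtms_stopped: drop local state, then report
// whether the caller should restart the RTMS connection process.
function onRtmsStopped(payload, activeStreams) {
  const { rtms_stream_id, stop_reason } = payload; // assumed payload layout
  activeStreams.delete(rtms_stream_id);
  return isMissedReconnectStop(stop_reason) ? "restart" : "done";
}

const streams = new Map([["s1", {}], ["s2", {}]]);
console.log(onRtmsStopped({ rtms_stream_id: "s1", stop_reason: 12 }, streams)); // restart
console.log(onRtmsStopped({ rtms_stream_id: "s2", stop_reason: 6 }, streams)); // done
```

Removing the stream from the map in both cases keeps a later meeting.rtms_started for the same stream from being misread as a Scenario 1 server failure.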
If reconnection fails
If reconnect attempts are exhausted (for example, after the backoff cap of 30 seconds has been hit repeatedly), treat the stream as unrecoverable. Close all open connections, stop tracking the stream, and surface the failure to your application. Do not retry indefinitely.
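One way to pick a concrete limit is to count how many backoff delays fit inside the server's reconnection window. The sketch below does that arithmetic using the 3-second base, doubling, and 30-second cap described earlier. It ignores the time each connection attempt itself takes, so treat the results as upper bounds. The function names are hypothetical.

```javascript
function backoffDelay(attempts) {
  return Math.min(3000 * Math.pow(2, attempts), 30000);
}

// Count how many scheduled attempts start before the window closes.
function attemptsWithinWindow(windowMs) {
  let elapsed = 0;
  let attempts = 0;
  while (elapsed + backoffDelay(attempts) <= windowMs) {
    elapsed += backoffDelay(attempts);
    attempts++;
  }
  return attempts;
}

// Hypothetical guard to call before scheduling another attempt.
function hasExhaustedAttempts(conn, windowMs) {
  return conn.reconnectAttempts >= attemptsWithinWindow(windowMs);
}

// Signaling window (~60 s): delays of 3 + 6 + 12 + 24 = 45 s fit, a fifth does not.
console.log(attemptsWithinWindow(60000)); // 4
// Media window (~30 s): delays of 3 + 6 + 12 = 21 s fit, a fourth does not.
console.log(attemptsWithinWindow(30000)); // 3
console.log(hasExhaustedAttempts({ reconnectAttempts: 3 }, 30000)); // true
```

Because the windows are approximate, a more conservative limit than these counts may be the safer choice.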