# Add closed captions to user videos in real-time with Zoom Video SDK
Live transcription in the Zoom Video SDK already gives you machine-generated subtitles. With media processors, you can burn those captions directly into the video and screen share streams. In this blog post, we'll render closed captions on top of your outgoing video and screen share.
## Prerequisites
- Node & NPM LTS
- A Zoom Video SDK account
- Transcription service enabled for Video SDK
We'll build on top of the [Zoom Video SDK quickstart guide](/blog/build-a-video-conferencing-app-with-the-zoom-video-sdk). If you're new to the SDK, we recommend checking out the quickstart guide first. You can clone that repo and follow the steps to get started:
```bash
git clone https://github.com/zoom/videosdk-web-helloworld
```
At a high level, the flow looks like this:
1. The client joins a Video SDK session and starts video, audio, and live transcriptions.
2. Zoom emits `caption-message` events for the active speaker.
3. For each caption, we render styled text onto a canvas using [`fabric`](https://fabricjs.com/), then convert it into a bitmap image.
4. Two media processors (`caption-video.js` and `caption-screen.js`) receive the image and draw it on top of the outgoing video and screen share frames.
For this example, we'll build processors that overlay stylized subtitle text on top of each video or screen-share frame.
## Step 1: Create caption processors for video and screen share
We define one processor for camera video and one for screen share. They look almost identical, but extend different base classes:
- `public/caption-video.js` extends `VideoProcessor`
- `public/caption-screen.js` extends `ShareProcessor`
Both processors share the same idea:
- Listen for messages on a `MessagePort` with a `cmd` of `"caption"` and an `ImageBitmap` payload.
- Store that bitmap in `this.captionImage`.
- On each frame, draw the input frame onto a canvas and then draw the caption bitmap on top.
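To make the message contract concrete, here is a minimal sketch of the `cmd: "caption"` check as a standalone type guard. The names `CaptionMessage`, `isCaptionMessage`, and `CaptionState` are hypothetical and not part of the Video SDK; the image is typed as `unknown` so the sketch runs outside a browser, whereas in the real processor it would be an `ImageBitmap`.

```ts
// Hypothetical message shape: only `cmd: "caption"` and `image` come from the article.
interface CaptionMessage<TImage = unknown> {
  cmd: "caption";
  image: TImage;
}

// Narrow an arbitrary `event.data` value to a caption message.
function isCaptionMessage(data: unknown): data is CaptionMessage {
  if (typeof data !== "object" || data === null) return false;
  const candidate = data as Record<string, unknown>;
  return candidate.cmd === "caption" && candidate.image != null;
}

// Stand-in for the processor's caption state, so the logic runs without the SDK.
class CaptionState<TImage = unknown> {
  captionImage: TImage | null = null;

  // Mirrors the `port.onmessage` handler: ignore anything that is not a caption.
  handle(data: unknown): boolean {
    if (!isCaptionMessage(data)) return false;
    this.captionImage = data.image as TImage;
    return true;
  }
}
```

Validating the payload up front means an unrelated or malformed message cannot overwrite the last good caption.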
We can define the video processor by extending the `VideoProcessor` base class. We'll define a `context` field to store the canvas context and a `captionImage` field to store the caption image:
```js
// public/caption-video.js
class CaptionProcessor extends VideoProcessor {
  context = null; // 2D context of the output canvas
  captionImage = null; // latest caption bitmap sent by the client

  constructor(port, options = {}) {
    super(port, options);
    // Store each caption bitmap posted by the client.
    port.onmessage = (event) => {
      const { cmd, image } = event.data;
      if (cmd === "caption") {
        this.captionImage = image;
      }
    };
  }

  onInit() {
    const canvas = this.getOutput();
    if (canvas) {
      this.context = canvas.getContext("2d");
    }
  }

  onUninit() {
    this.context = null;
    this.captionImage = null;
  }

  async processFrame(input, output) {
    if (!this.context) return;
    // Draw the incoming frame first, then the caption overlay on top.
    this.context.drawImage(input, 0, 0, output.width, output.height);
    if (this.captionImage) {
      this.context.imageSmoothingEnabled = true;
      this.context.drawImage(
        this.captionImage,
        0,
        0,
        output.width,
        output.height,
      );
    }
    return true;
  }
}

registerProcessor("caption", CaptionProcessor);
```
`public/caption-screen.js` follows the same pattern, but extends `ShareProcessor` instead so it can process the screen-share stream:
```js
// public/caption-screen.js
class CaptionProcessor extends ShareProcessor {
  // same body as the CaptionProcessor in caption-video.js
}

registerProcessor("caption", CaptionProcessor);
```
Now that we've defined the processor classes, we need to register and attach them in the Video SDK client.
## Step 2: Add the media processors to the Video SDK
To use the processors within the Video SDK, we first check if the browser has support for video and share processors using the `isSupportVideoProcessor` and `isSupportShareProcessor` methods on the `mediaStream`:
```ts
import ZoomVideo from "@zoom/videosdk";

const client = ZoomVideo.createClient();
const mediaStream = client.getMediaStream();
let videoprocessor;
let shareprocessor;

if (!mediaStream.isSupportVideoProcessor()) {
  alert("Your browser does not support video processor");
}
if (!mediaStream.isSupportShareProcessor()) {
  alert("Your browser does not support share processor");
}
```
We can then create processor instances by calling the `createProcessor` method on the `mediaStream`:
```ts
videoprocessor = await mediaStream.createProcessor({
  name: "caption",
  type: "video",
  url: window.location.origin + "/caption-video.js",
});
await mediaStream.addProcessor(videoprocessor);

shareprocessor = await mediaStream.createProcessor({
  name: "caption",
  type: "share",
  url: window.location.origin + "/caption-screen.js",
  options: { needFixedCaptureRate: true },
});
```
Note: By default, the screen share processor only processes a frame (i.e., applies our processor) when the video data from the shared screen changes. This optimization improves performance and avoids wasting resources. However, in our use case, we might be sharing a static webpage or image where we still want to update the captions on that static frame. To handle this, we set `options: { needFixedCaptureRate: true }` to call the `processFrame` function at a fixed rate several times per second.
Each `createProcessor` call takes a `name` for the processor and its `type`. The `url` points to the processor script, which must be served from the same origin or with the appropriate CORS headers.
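As a small illustration of the same-origin requirement, the sketch below resolves a script path against the page origin and checks whether a given script URL shares that origin. `processorUrl` and `isSameOrigin` are hypothetical helpers, not SDK functions; they rely only on the standard `URL` class.

```ts
// Resolve a processor script path against the page origin.
function processorUrl(origin: string, path: string): string {
  return new URL(path, origin).href;
}

// True when the script shares the page's origin, so no CORS headers are needed.
function isSameOrigin(scriptUrl: string, pageOrigin: string): boolean {
  return new URL(scriptUrl).origin === new URL(pageOrigin).origin;
}
```

In the browser you would pass `window.location.origin` as the origin. A script hosted on a different origin, such as a CDN, would fail the check and would need CORS headers instead.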
When the user starts screen sharing, we call `startShareScreen` with either a video element or a canvas, and then add the share processor to the share stream pipeline:
```ts
const startShare = async () => {
  const mediaStream = client.getMediaStream();
  if (mediaStream.isStartShareScreenWithVideoElement()) {
    await mediaStream.startShareScreen(myShareEle, {
      captureHeight: 720,
      captureWidth: 1280,
      displaySurface: "monitor",
    });
    myShareEle.style.display = "block";
  } else {
    console.log("can't use video element");
    await mediaStream.startShareScreen(myShareCanvas, {
      captureHeight: 720,
      captureWidth: 1280,
      displaySurface: "monitor",
    });
    myShareCanvas.style.display = "block";
  }
  await mediaStream.addProcessor(shareprocessor);
};
```
## Step 3: Apply captions
To apply captions, we need to send an image bitmap to the processors whenever a caption message is received. We'll use [`fabric.js`](https://fabricjs.com/) to render the caption text into a bitmap.
You can check out [utils.ts](https://github.com/zoom/videosdk-web-mediaprocessor-closedcaptions/blob/main/src/utils.ts) in the repo for an example that does text splitting and dynamic text sizing. Here's a simplified version:
```ts
import { Canvas, FabricText } from "fabric";

export async function getBitmap(message: string, w: number, h: number) {
  // Fall back to 1080p if the stream resolution is not known yet.
  let width = w;
  let height = h;
  if (w === 0) width = 1920;
  if (h === 0) height = 1080;
  const padding = height * 0.025;
  const canvas = new Canvas(document.createElement("canvas"), { width, height });
  const fontSize = Math.round(height * 0.05);
  const strokeWidth = Math.round(height * 0.01);
  const totalTextHeight = fontSize + padding;
  // Anchor the text near the bottom of the frame.
  const startY = height - padding - totalTextHeight;
  // ...continued in the next snippet
```
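To see what these proportions work out to, the sketch below pulls the same arithmetic into a pure function that runs without a DOM or `fabric`. `captionLayout` and `CaptionLayout` are hypothetical names used only for illustration; the formulas are copied from the snippet above.

```ts
interface CaptionLayout {
  width: number;
  height: number;
  padding: number;
  fontSize: number;
  strokeWidth: number;
  startY: number;
}

// Same arithmetic as the start of getBitmap, without the canvas.
function captionLayout(w: number, h: number): CaptionLayout {
  const width = w === 0 ? 1920 : w; // fall back to 1080p when unknown
  const height = h === 0 ? 1080 : h;
  const padding = height * 0.025; // 2.5% of the frame height
  const fontSize = Math.round(height * 0.05); // 5% of the frame height
  const strokeWidth = Math.round(height * 0.01); // 1% of the frame height
  const totalTextHeight = fontSize + padding;
  const startY = height - padding - totalTextHeight; // top edge of the text
  return { width, height, padding, fontSize, strokeWidth, startY };
}
```

For a 1080-pixel-high frame this gives a 54 px font, an 11 px stroke, and a caption whose top edge sits at roughly y = 972. Because every value is a fraction of the frame height, the caption scales with the stream resolution.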
We create a canvas with the width and height of the frame, and then add a text object to the canvas.
```ts
export async function getBitmap(message: string, w: number, h: number) {
  // ...setup from the previous snippet
  const textObj = new FabricText(message, {
    textAlign: "center",
    fontFamily: "sans-serif",
    fill: "yellow",
    stroke: "black",
    paintFirst: "stroke", // draw the outline under the fill
    fontSize,
    strokeWidth,
    width: width - padding * 2,
    left: width / 2,
    originX: "center",
    top: startY,
    originY: "top",
  });
  canvas.add(textObj);
  canvas.renderAll();
  // Return an ImageBitmap, which the processors can pass to drawImage().
  return createImageBitmap(canvas.toCanvasElement());
}
```
This creates a bitmap image of the caption text and returns it as an [`ImageBitmap`](https://developer.mozilla.org/en-US/docs/Web/API/ImageBitmap) object.
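The simplified `getBitmap` draws the whole message as a single line, so a long caption can overflow the frame. The linked `utils.ts` handles this with text splitting. The sketch below is not that implementation, just one plausible greedy word wrap, and `wrapCaption` and `maxChars` are hypothetical names:

```ts
// Greedy word wrap: pack words into lines of at most `maxChars` characters.
function wrapCaption(text: string, maxChars: number): string[] {
  const words = text.trim().split(/\s+/).filter((w) => w.length > 0);
  const lines: string[] = [];
  let current = "";
  for (const word of words) {
    const candidate = current === "" ? word : `${current} ${word}`;
    if (candidate.length <= maxChars || current === "") {
      // The word fits, or it is an over-long word that starts its own line.
      current = candidate;
    } else {
      lines.push(current);
      current = word;
    }
  }
  if (current !== "") lines.push(current);
  return lines;
}
```

Each returned line could then be drawn as its own text object, with `startY` moved up by one line height per extra line. Counting characters is only a rough proxy for rendered width; measuring the text would be more precise.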
We can listen for caption messages and pass the image bitmap data to both processors using the `postMessage` method:
```ts
client.on("caption-message", async (payload) => {
  if (payload.userId === client.getCurrentUserInfo().userId) {
    const mediaStream = client.getMediaStream();
    const { width, height } = mediaStream.getCapturedVideoResolution();
    const videoImageBitmap = await getBitmap(payload.text, width, height);
    videoprocessor.port.postMessage({
      cmd: "caption",
      image: videoImageBitmap,
    });

    const shareStreamSettings = mediaStream.getShareStreamSettings();
    if (!shareStreamSettings) return;
    if (!shareStreamSettings.width || !shareStreamSettings.height) return;
    const imageBitmap = await getBitmap(
      payload.text,
      shareStreamSettings.width,
      shareStreamSettings.height,
    );
    shareprocessor.port.postMessage({ cmd: "caption", image: imageBitmap });
  }
});
```
This renders captions on each outgoing frame of the user's video and screen share. We filter on `payload.userId` so that each client only burns captions into their own outgoing streams.
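The filtering and posting logic can also be pulled out of the event handler so it is testable without the SDK. This is a sketch under assumptions: `PortLike`, `CaptionPayload`, `shouldBurnCaption`, and `postCaption` are hypothetical names, and skipping missing ports is an addition that the handler above does not make.

```ts
// Minimal stand-in for a processor's MessagePort.
interface PortLike {
  postMessage(message: unknown): void;
}

// Only the payload fields the article's handler reads.
interface CaptionPayload {
  userId: number;
  text: string;
}

// A client should only burn in captions for its own speech.
function shouldBurnCaption(payload: CaptionPayload, currentUserId: number): boolean {
  return payload.userId === currentUserId;
}

// Post the caption image to every available port and return how many received it.
function postCaption(ports: Array<PortLike | undefined>, image: unknown): number {
  let sent = 0;
  for (const port of ports) {
    if (!port) continue; // e.g. the share processor has not been created yet
    port.postMessage({ cmd: "caption", image });
    sent += 1;
  }
  return sent;
}
```

Skipping missing ports avoids an error if a caption arrives before a processor has been created.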
Finally, to start transcription and get caption messages, we use the Live Transcription client:
```ts
const liveTranscriptionTranslation = client.getLiveTranscriptionClient();
await liveTranscriptionTranslation.startLiveTranscription();
await liveTranscriptionTranslation.setSpeakingLanguage(
  LiveTranscriptionLanguage.English,
);
```
That's all the code you need to get closed captions burned into your outgoing streams.
## Step 4: Build a minimal UI in `index.html`
The UI in `index.html` is intentionally simple: three buttons (`Join`, `Share Screen`, `Leave`), a `video-player-container` for remote video, and some canvases for screen-share rendering.
You can check out the full code in [`index.html`](https://github.com/zoom/videosdk-web-mediaprocessor-closedcaptions/blob/main/index.html).
## Conclusion
With just a few lines of code, you can turn live transcription into high-quality, burned-in captions in the Zoom Video SDK. Beyond closed captions, you can experiment with overlays for highlights, reactions, or other real-time context on top of the video and screen share streams.
To dive deeper, check out our [raw-data documentation](/docs/video-sdk/web/raw-data) and explore the [sample processor repo](https://github.com/zoom/videosdk-web-processor-sample/tree/main) for more inspiration.