Live transcription and translation
You can receive speech as JSON objects, in real time, using the Zoom live transcription and translation feature (LTT). This feature can also translate speech from one language in real time to text in another language. This can power use cases like auto closed captioning, sentiment analysis, and language translation for e-learning.
For example, one user can say "Hello world" in English, and other users can receive this speech as text in the language of their choice, such as Italian ("Ciao mondo"), Spanish ("Hola mundo"), or French ("Bonjour le monde").
Zoom recommends that you provide an in-product notice to your end-users when a participant enables live transcription.
View usage
To monitor transcription (caption) and translation activity for your Video SDK sessions, see Build Platform → View usage reports: Captions and translation for detailed usage data.
Initialize LTT
After joining a session, call zoom.liveTranscriptionHelper.canStartLiveTranscription() to see if the live transcription and translation feature is enabled.
const canStartLiveTranscription =
  await zoom.liveTranscriptionHelper.canStartLiveTranscription();
Start transcription and translation
To start live transcription and translation, call zoom.liveTranscriptionHelper.startLiveTranscription().
const result = await zoom.liveTranscriptionHelper.startLiveTranscription();
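The availability check and the start call can be combined so that transcription starts only when the feature reports as available. The following is a minimal sketch rather than a prescribed SDK flow: `TranscriptionStarter` and `startIfAvailable` are hypothetical names that mirror only the two helper methods shown above. It assumes that `canStartLiveTranscription()` resolves to a boolean, and it treats the value resolved by `startLiveTranscription()` as opaque.

```typescript
// Hypothetical structural type: mirrors only the two helper methods used above.
// Assumption: canStartLiveTranscription() resolves to a boolean.
interface TranscriptionStarter {
  canStartLiveTranscription(): Promise<boolean>;
  startLiveTranscription(): Promise<unknown>;
}

// Starts live transcription only when the feature reports as available.
// Resolves to true if a start was attempted, false if it was skipped.
async function startIfAvailable(helper: TranscriptionStarter): Promise<boolean> {
  const canStart = await helper.canStartLiveTranscription();
  if (!canStart) {
    return false;
  }
  await helper.startLiveTranscription();
  return true;
}
```

In an app, you would likely pass `zoom.liveTranscriptionHelper` as the argument, provided it satisfies this shape, and use the returned flag to decide whether to show caption controls.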
Receive transcription
To receive status changes of the live transcription service (for example, start or stop), add the following event listener.
const liveTranscriptionStatusChangeListener = zoom.addListener(
  EventType.onLiveTranscriptionStatus,
  ({status}: {status: LiveTranscriptionStatus}) => {
    console.log(`onLiveTranscriptionStatus: ${status}`);
  }
);
Get supported languages
To get the supported languages list, call zoom.liveTranscriptionHelper.getAvailableSpokenLanguages().
zoom.liveTranscriptionHelper.getAvailableSpokenLanguages();
Set speaking language
To specify the language you are speaking in, call zoom.liveTranscriptionHelper.setSpokenLanguage(languageId: number). This is optional. If you don't specify the language, the SDK will use the default language, English.
zoom.liveTranscriptionHelper.setSpokenLanguage(0);
Set translation language
To set the language you want speech text translated to, call zoom.liveTranscriptionHelper.setTranslationLanguage(languageId: number).
zoom.liveTranscriptionHelper.setTranslationLanguage(0);
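Because both setters take a numeric language ID, it can help to look the ID up from the supported languages list instead of hard-coding it. The sketch below assumes each list entry carries a `languageId` and a `languageName`. Only the ID is confirmed by the setter signatures above; the name field, the `LanguageEntry` type, and the `findLanguageId` function are assumptions for illustration.

```typescript
// Assumed shape of a language entry. Only languageId is confirmed by the
// setter signatures above; languageName is an assumption for illustration.
interface LanguageEntry {
  languageId: number;
  languageName: string;
}

// Returns the ID of the first entry whose name matches (case-insensitive),
// or undefined when the requested language is not in the list.
function findLanguageId(languages: LanguageEntry[], name: string): number | undefined {
  const wanted = name.trim().toLowerCase();
  const match = languages.find(
    (language) => language.languageName.toLowerCase() === wanted
  );
  return match?.languageId;
}
```

The returned ID could then be passed to `setSpokenLanguage()` or `setTranslationLanguage()`. When the result is `undefined`, it is probably safer to skip the call than to fall back to an arbitrary ID.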
Receive translation
To receive translated speech text, add the following event listener.
const liveTranscriptionMsgInfoReceivedListener = zoom.addListener(
  EventType.onLiveTranscriptionMsgInfoReceived,
  ({messageInfo}: {messageInfo: ZoomVideoSdkLiveTranscriptionMessageInfoType}) => {
    console.log(messageInfo);
    const message = new ZoomVideoSdkLiveTranscriptionMessageInfo(messageInfo);
    console.log(`onLiveTranscriptionMsgInfoReceived: ${message.messageContent}`);
  }
);
To enable receiving original and translated speech text, call zoom.liveTranscriptionHelper.enableReceiveSpokenLanguageContent(enable: boolean).
zoom.liveTranscriptionHelper.enableReceiveSpokenLanguageContent(true);
To receive the original speech text, add the following event listener.
const originalLanguageMsgInfoReceivedListener = zoom.addListener(
  EventType.onOriginalLanguageMsgReceived,
  ({messageInfo}: {messageInfo: ZoomVideoSdkLiveTranscriptionMessageInfoType}) => {
    console.log(messageInfo);
    const message = new ZoomVideoSdkLiveTranscriptionMessageInfo(messageInfo);
    console.log(`onOriginalLanguageMsgReceived: ${message.messageContent}`);
  }
);
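The two listeners above only log each message. To render captions, an app usually needs to keep the most recent lines and know whether each came from the original or the translated stream. The following sketch shows one possible way to do that with a small rolling buffer. `CaptionBuffer`, `CaptionLine`, and `CaptionKind` are hypothetical names and not part of the SDK; the only SDK field relied on is `messageContent`, passed in as plain text.

```typescript
// Which listener produced the caption line.
type CaptionKind = 'original' | 'translated';

interface CaptionLine {
  kind: CaptionKind;
  text: string;
}

// Hypothetical helper: keeps only the most recent `maxLines` captions.
class CaptionBuffer {
  private lines: CaptionLine[] = [];
  private readonly maxLines: number;

  constructor(maxLines: number) {
    this.maxLines = maxLines;
  }

  // Appends a caption, ignoring blank text, and drops the oldest overflow.
  add(kind: CaptionKind, text: string): void {
    const trimmed = text.trim();
    if (trimmed.length === 0) {
      return;
    }
    this.lines.push({ kind, text: trimmed });
    if (this.lines.length > this.maxLines) {
      this.lines.splice(0, this.lines.length - this.maxLines);
    }
  }

  // Returns a copy so callers cannot mutate the internal state.
  snapshot(): CaptionLine[] {
    return [...this.lines];
  }
}
```

Inside the listeners, you might call `captions.add('translated', message.messageContent)` or `captions.add('original', message.messageContent)`, then render `captions.snapshot()` with different styling per `kind`.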
Stop transcription and translation
To stop live transcription and translation, call zoom.liveTranscriptionHelper.stopLiveTranscription().
zoom.liveTranscriptionHelper.stopLiveTranscription();
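When transcription stops, the listeners registered earlier are usually no longer needed. The sketch below stops transcription and then removes the listeners. It assumes that the value returned by `zoom.addListener()` exposes a `remove()` method, as React Native event subscriptions typically do; confirm this against the SDK reference. `Removable`, `TranscriptionStopper`, and `stopAndCleanUp` are hypothetical names.

```typescript
// Assumption: a listener subscription exposes remove(), as React Native
// event subscriptions typically do.
interface Removable {
  remove(): void;
}

// Hypothetical structural type mirroring only the stop method shown above.
// The return type is left open because the source does not specify it.
interface TranscriptionStopper {
  stopLiveTranscription(): unknown;
}

// Stops transcription, then removes every listener, even if stopping fails.
async function stopAndCleanUp(
  helper: TranscriptionStopper,
  subscriptions: Removable[]
): Promise<void> {
  try {
    // await works whether the call returns a promise or a plain value.
    await helper.stopLiveTranscription();
  } finally {
    for (const subscription of subscriptions) {
      subscription.remove();
    }
  }
}
```

A call might look like `stopAndCleanUp(zoom.liveTranscriptionHelper, [liveTranscriptionStatusChangeListener, liveTranscriptionMsgInfoReceivedListener])`, under the assumptions above.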
LTT best practices
When implementing Live Transcription and Translation (LTT) for your integration, consider the following best practices:
- If the feature is enabled for the session, provide a button to allow people to start closed captioning and select the spoken and translated languages.
- Display only the supported languages you wish to offer, rather than presenting all available options for transcription and translation.
- If the session won't include LTT, programmatically disable it when the host starts the session. When someone joins the session, check whether the host has enabled the feature. If not, inform users.
- Use an event listener to detect when the host has enabled captions. You can use this to notify people that the feature is active, programmatically render a button for starting transcription or translation, or both.
- Set enableReceiveSpokenLanguageContent() to false if you don't want to receive the spoken language data.
- Use an event listener to detect when the host disables captions. You can use this to notify people who have enabled the feature that it has been disabled.
- Offer closed captioning customization options, such as font sizes and colors, to differentiate between transcription and translation texts, to enhance readability and follow accessibility standards.
- Inform people of these best practices when speaking:
  - Minimize background noise, avoiding activities like shuffling papers, typing loudly, or engaging in side conversations.
  - Speak clearly into the microphone.
  - Position the microphone near active speakers.
  - Opt for an external microphone over a built-in one to improve sound quality.
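The recommendation to display only the languages you wish to offer can be sketched as a filter over the list of supported languages. This is an illustration under assumptions, not an SDK feature: `LanguageOption` and `filterOfferedLanguages` are hypothetical names, and the `languageName` field is assumed. Only the numeric language ID is confirmed by the setter signatures earlier in this guide.

```typescript
// Assumed shape of a supported-language entry. Only the numeric ID is
// confirmed by the SDK setters; languageName is an assumption.
interface LanguageOption {
  languageId: number;
  languageName: string;
}

// Keeps only the languages your product has chosen to offer,
// preserving the order of the input list.
function filterOfferedLanguages(
  available: LanguageOption[],
  offeredIds: ReadonlySet<number>
): LanguageOption[] {
  return available.filter((language) => offeredIds.has(language.languageId));
}
```

The filtered list can then populate the spoken and translation language pickers, so that people never see an option your integration does not intend to support.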
More LTT features
You can also control user audio volume and mute local audio to further refine live transcription and translation use cases.
For the full set of live transcription and translation features, see the Video SDK reference.