Live translation

Zoom live transcription and translation (LTT) translates speech in one language to text in another language in real time. This can power use cases like automatic closed captioning, sentiment analysis, and language translation for e-learning.

For example, one user can say "Hello world" in English, and other users can receive that speech as text in the language of their choice, such as Italian ("Ciao mondo"), Spanish ("Hola mundo"), or French ("Bonjour le monde").

Initialize LTT

After joining a session, call ZoomVideoSDK.shareInstance()?.getLiveTranscriptionHelper() to get the live transcription helper.

let liveTranscriptionHelper = ZoomVideoSDK.shareInstance()?.getLiveTranscriptionHelper()
ZoomVideoSDKLiveTranscriptionHelper *liveTranscriptionHelper = [[ZoomVideoSDK shareInstance] getLiveTranscriptionHelper];

Supported languages

We continuously add to the languages supported for translation. The following code samples show how to check the list of supported languages.

List supported translation languages

// Iterate and print out the language ID and language name of supported translation languages
if let availableTranslationLanguages = liveTranscriptionHelper.getAvailableTranslationLanguages() {
    for language in availableTranslationLanguages {
        print("Translate Language ID: \(language.languageID)")
        print("Translate Language Name: \(language.languageName)")
    }
}
// Iterate and print out the language ID and language name of supported translation languages
NSArray<ZoomVideoSDKLiveTranscriptionLanguage *> *availableTranslationLanguages = [liveTranscriptionHelper getAvailableTranslationLanguages];
if (availableTranslationLanguages != nil) {
    for (ZoomVideoSDKLiveTranscriptionLanguage* language in availableTranslationLanguages) {
        NSLog(@"Translate Language ID: %ld", (long)[language languageID]);
        NSLog(@"Translate Language Name: %@", [language languageName]);
    }
}
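
Once you have the list, you will usually need to map a language name chosen in your UI to the ID that setTranslationLanguage() expects. The following is a minimal, self-contained sketch of that lookup. LanguageOption and languageID(named:in:) are hypothetical names, not SDK types; in an app you would build the options from the languageID and languageName of each ZoomVideoSDKLiveTranscriptionLanguage. The IDs and names in the sample data are made up for illustration.

```swift
import Foundation

// Hypothetical stand-in for the ID/name pair exposed by
// ZoomVideoSDKLiveTranscriptionLanguage (not an SDK type).
struct LanguageOption: Equatable {
    let id: Int
    let name: String
}

// Returns the ID of the first option whose name matches, ignoring case,
// or nil when the language is not in the list.
func languageID(named name: String, in options: [LanguageOption]) -> Int? {
    return options.first { $0.name.caseInsensitiveCompare(name) == .orderedSame }?.id
}

// Illustrative data only; real IDs come from getAvailableTranslationLanguages().
let sampleOptions = [
    LanguageOption(id: 0, name: "English"),
    LanguageOption(id: 1, name: "Italian"),
    LanguageOption(id: 2, name: "Spanish"),
]

if let id = languageID(named: "italian", in: sampleOptions) {
    print("Italian has ID \(id)")
} else {
    print("Italian is not available")
}
```

Returning an optional makes the "language not available" case explicit, so the caller can hide the option instead of passing an invalid ID.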

Set spoken language

Set the spoken language with setSpokenLanguage(language_id) before starting live translation. If you do not set it, the spoken language defaults to English (language ID 0).

liveTranscriptionHelper.setSpokenLanguage(0)
[liveTranscriptionHelper setSpokenLanguage: 0];

Start translation

Translation is part of the transcription service. To start translation, call setTranslationLanguage() either before you start transcription or while it is running. You can call setTranslationLanguage() again at any point to change the translation language.

First specify the language you are speaking in with setSpokenLanguage(). Then call setTranslationLanguage() to set the translated language.

// Set spoken language and translation language before starting translation services
liveTranscriptionHelper.setSpokenLanguage(spoken_language_id)
liveTranscriptionHelper.setTranslationLanguage(translation_language_id)
liveTranscriptionHelper.startLiveTranscription()
// Optional: change to a different translation language midway
liveTranscriptionHelper.setTranslationLanguage(different_language_id)
// Set spoken language and translation language before starting translation services
[liveTranscriptionHelper setSpokenLanguage: spoken_language_id];
[liveTranscriptionHelper setTranslationLanguage: translation_language_id];
[liveTranscriptionHelper startLiveTranscription];
// Optional: change to a different translation language midway
[liveTranscriptionHelper setTranslationLanguage: different_language_id];

Receive translation

To receive translated speech text, handle the callback in the onLiveTranscriptionMsgReceived event listener.

The following code sample prints messageContent, the text of the message, and messageType, the operation that the message represents.

func onLiveTranscriptionMsgReceived(_ messageInfo: ZoomVideoSDKLiveTranscriptionMessageInfo!) {
    print("MessageInfoContent is :\(messageInfo.messageContent)")
    print("MessageInfoType is :\(messageInfo.messageType)")
}
- (void)onLiveTranscriptionMsgReceived:(ZoomVideoSDKLiveTranscriptionMessageInfo *)messageInfo {
    NSLog(@"MessageInfoContent is : %@", [messageInfo messageContent]);
    NSLog(@"MessageInfoType is : %lu", (unsigned long)[messageInfo messageType]);
}

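For recognized speech, messageType is 1, 2, 3, or 4, which correspond to the following enum values. The full enum, listed under "Transcription operation type enums", defines additional values.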

Value | Enum | Description
1 | ZoomVideoSDKLiveTranscriptionOperationType_Add | Refers to adding words during recognition and translation, which occurs once, at the start of sentence detection.
2 | ZoomVideoSDKLiveTranscriptionOperationType_Update | Refers to correcting (updating) words during recognition and translation, which typically occurs during sentence detection.
3 | ZoomVideoSDKLiveTranscriptionOperationType_Delete | Refers to correcting (deleting) words during recognition and translation, which typically happens during sentence detection.
4 | ZoomVideoSDKLiveTranscriptionOperationType_Complete | Refers to completing the recognition and translation, which happens once, at the end of sentence detection.
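
One way to turn these four operations into on-screen captions is to keep a small buffer of in-progress sentences and move each sentence to a finished list when it completes. The sketch below is self-contained and does not call the SDK. CaptionOperation and CaptionBuffer are hypothetical names, and the sketch assumes that each message carries an identifier for its sentence and the full current text of that sentence, which you should confirm against the message info object you receive.

```swift
// Local mirror of the four operation types in the table above
// (hypothetical; the SDK delivers ZoomVideoSDKLiveTranscriptionOperationType).
enum CaptionOperation: Int {
    case add = 1, update, delete, complete
}

// Hypothetical caption store keyed by a per-sentence message ID.
// Assumes each message carries the full current text of its sentence.
struct CaptionBuffer {
    private(set) var inProgress: [String: String] = [:]
    private(set) var finished: [String] = []

    mutating func apply(_ operation: CaptionOperation, id: String, text: String) {
        switch operation {
        case .add, .update:
            // Add starts a sentence; update replaces its text with the revision.
            inProgress[id] = text
        case .delete:
            // The sentence was withdrawn, so drop it from the in-progress set.
            inProgress[id] = nil
        case .complete:
            // The sentence is final: move it out of the in-progress set.
            inProgress[id] = nil
            finished.append(text)
        }
    }
}

var buffer = CaptionBuffer()
buffer.apply(.add, id: "m1", text: "Hello")
buffer.apply(.update, id: "m1", text: "Hello world")
buffer.apply(.complete, id: "m1", text: "Hello world")
print(buffer.finished)
```

In an app, you would call apply from onLiveTranscriptionMsgReceived and render finished followed by the values of inProgress.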

Stop translation

To stop translation, call setTranslationLanguage() with a language_id of -1.

liveTranscriptionHelper.setTranslationLanguage(-1)
[liveTranscriptionHelper setTranslationLanguage: -1];
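
Because -1 is a sentinel rather than a real language ID, it can help to keep it out of UI code. The following sketch wraps the choice in a small type. TranslationTarget is a hypothetical name, not part of the SDK; only the -1 value comes from the text above.

```swift
// Hypothetical wrapper around the translation language ID (not an SDK type).
enum TranslationTarget: Equatable {
    case off
    case language(id: Int)

    // Value to pass to setTranslationLanguage(); -1 stops translation.
    var languageID: Int {
        switch self {
        case .off:
            return -1
        case .language(let id):
            return id
        }
    }
}

let target = TranslationTarget.off
print("Pass \(target.languageID) to setTranslationLanguage()")
```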

Transcription operation type enums

typedef NS_ENUM(NSUInteger, ZoomVideoSDKLiveTranscriptionOperationType) {
    ZoomVideoSDKLiveTranscriptionOperationType_None,
    ZoomVideoSDKLiveTranscriptionOperationType_Add,
    ZoomVideoSDKLiveTranscriptionOperationType_Update,
    ZoomVideoSDKLiveTranscriptionOperationType_Delete,
    ZoomVideoSDKLiveTranscriptionOperationType_Complete,
    ZoomVideoSDKLiveTranscriptionOperationType_NotSupported,
    ZoomVideoSDKLiveTranscriptionOperationType_NoTranslation,
};
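
Because this declaration assigns no explicit values, the cases are numbered from 0 in declaration order, which is why Add through Complete correspond to 1 through 4. The sketch below mirrors that numbering in plain Swift so you can check it without the SDK. OperationType and isSentenceOperation are hypothetical names used only for illustration.

```swift
// Hypothetical Swift mirror of the Objective-C enum above. Int raw values
// are assigned implicitly from 0 in declaration order, as with NS_ENUM.
enum OperationType: Int, CaseIterable {
    case noOperation, add, update, delete, complete, notSupported, noTranslation

    // True for the four values (1 through 4) listed in the messageType table.
    var isSentenceOperation: Bool {
        switch self {
        case .add, .update, .delete, .complete:
            return true
        case .noOperation, .notSupported, .noTranslation:
            return false
        }
    }
}

for operation in OperationType.allCases {
    print("\(operation.rawValue): \(operation)")
}
```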

LTT best practices

When implementing Live Transcription and Translation (LTT) for your integration, consider the following best practices:

  • If the feature is enabled for the session, provide a button to allow people to start closed captioning and select the spoken and translated languages.
  • Display only the supported languages you wish to offer, rather than presenting all available options for transcription and translation.
  • If the session won't include LTT, programmatically disable it when the host starts the session. When someone joins the session, check whether the host has enabled the feature. If not, inform users.
  • Use an event listener to detect when the host has enabled captions. You can use this to notify people that the feature is active, programmatically render a button for starting transcription or translation, or both.
  • Call enableReceiveSpokenLanguageContent() with false if you don't want to receive the spoken language data.
  • Use an event listener to detect when the host disables captions. You can use this to notify people who have enabled the feature that it has been disabled.
  • Offer closed captioning customization options, such as font sizes and colors, to differentiate between transcription and translation texts, to enhance readability and follow accessibility standards.
  • Inform people of these best practices when speaking:
    • Minimize background noise, avoiding activities like shuffling papers, typing loudly, or engaging in side conversations.
    • Speak clearly into the microphone.
    • Position the microphone near active speakers.
    • Opt for an external microphone over a built-in one to improve sound quality.
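
The recommendation to display only the languages you wish to offer can be sketched as a filter over the available list. This example is self-contained and does not call the SDK. OfferedLanguage and languagesToOffer(from:allowedNames:) are hypothetical names, and the sample IDs and names are made up; in an app the available list would come from getAvailableTranslationLanguages().

```swift
// Hypothetical stand-in for an SDK language entry (not an SDK type).
struct OfferedLanguage: Equatable {
    let id: Int
    let name: String
}

// Keeps only the languages whose names are on the app's allow-list,
// preserving the order of the available list.
func languagesToOffer(from available: [OfferedLanguage],
                      allowedNames: Set<String>) -> [OfferedLanguage] {
    return available.filter { allowedNames.contains($0.name) }
}

// Illustrative data only.
let available = [
    OfferedLanguage(id: 0, name: "English"),
    OfferedLanguage(id: 1, name: "Italian"),
    OfferedLanguage(id: 2, name: "Spanish"),
    OfferedLanguage(id: 3, name: "French"),
]

let menu = languagesToOffer(from: available, allowedNames: ["Italian", "French"])
print(menu.map { $0.name })
```

Filtering the SDK's list, rather than hard-coding a menu, means a language on your allow-list only appears when it is actually available.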

More live translation features

For the full set of live transcription and translation features, see ZoomVideoSDKLiveTranscriptionHelper in the Zoom SDK for iOS.