Live translation

Zoom live transcription and translation (LTT) converts speech in one language into text in another language in real time. This can power use cases such as automatic closed captioning, sentiment analysis, and language translation for e-learning.

For example, one user can say "Hello world" in English, and other users can receive this speech as text in the language of their choice, such as Italian ("Ciao mondo"), Spanish ("Hola mundo"), or French ("Bonjour le monde").

Initialize LTT

After joining a session, call video_sdk_obj->getLiveTranscriptionHelper() to get the live transcription helper, which also controls translation.

IZoomVideoSDKLiveTranscriptionHelper* m_ltthelper = video_sdk_obj->getLiveTranscriptionHelper();

Supported languages

We are continuously adding supported languages for translation. The following code sample shows how to check the list of supported translation languages; the supported spoken languages can be checked in a similar way.

List supported translation languages

// Iterate over the supported translation languages and print each language ID and name
IVideoSDKVector<ILiveTranscriptionLanguage*>* availableTranslateLanguages = m_ltthelper->getAvailableTranslationLanguages();
if (availableTranslateLanguages) {
    for (size_t i = 0; i < availableTranslateLanguages->GetCount(); ++i) {
        ILiveTranscriptionLanguage* language = availableTranslateLanguages->GetItem(i);
        if (language) {
            printf("Translate Language ID: %d\n", language->getLTTLanguageID());
            printf("Translate Language Name: %s\n", language->getLTTLanguageName());
            // Print other properties as needed
            printf("---------------------\n");
        }
    }
}

Set spoken language

Set the spoken language with setSpokenLanguage(language_id) before starting live translation. If you do not set it, the spoken language defaults to English (language ID 0).

m_ltthelper->setSpokenLanguage(0);
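
Hard-coded IDs such as 0 are easy to get wrong, so one option is to look up the ID by language name. The following is a minimal sketch rather than SDK code: LanguageEntry and findLanguageId are hypothetical names, and the entries are assumed to have been copied from getLTTLanguageID() and getLTTLanguageName() while iterating a language list as shown earlier.

```cpp
#include <string>
#include <vector>

// Hypothetical plain copy of one language entry (not an SDK type).
struct LanguageEntry {
    int id;
    std::string name;
};

// Writes the ID of the first entry whose name matches exactly into idOut
// and returns true. Returns false and leaves idOut untouched if no entry
// matches, so a failed lookup can never be mistaken for a real ID.
bool findLanguageId(const std::vector<LanguageEntry>& languages,
                    const std::string& name,
                    int& idOut) {
    for (const LanguageEntry& entry : languages) {
        if (entry.name == name) {
            idOut = entry.id;
            return true;
        }
    }
    return false;
}
```

The lookup reports failure through its return value instead of a sentinel ID, because negative IDs can carry their own meaning when passed to the helper.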

Start translation

Translation is part of the transcription service. To receive translated text, set a translation language with setTranslationLanguage() and start the transcription service. You can call setTranslationLanguage() before you start transcription, or while it is running to change the translation language.

First specify the language you are speaking with setSpokenLanguage(). Then call setTranslationLanguage() to set the language to translate into.

// Set the spoken language and the translation language before starting the transcription service
m_ltthelper->setSpokenLanguage(spoken_language_id);
m_ltthelper->setTranslationLanguage(translation_language_id);
ZoomVideoSDKErrors err = m_ltthelper->startLiveTranscription();
// Optional: change to a different translation language while transcription is running
m_ltthelper->setTranslationLanguage(different_language_id);
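
The sample above stores the result of startLiveTranscription() in err but does not check it. The following sketch shows one way to wrap the call sequence and stop at the first failure. It rests on assumptions that should be verified against the SDK headers: that all three calls return an error code and that 0 means success. FakeTranscriptionHelper and startTranslation are hypothetical names; the fake only records calls so the sketch can run without the SDK.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in that records calls. It is NOT the SDK helper.
struct FakeTranscriptionHelper {
    std::vector<std::string> calls;
    int startResult = 0;  // Assumption: 0 means success.

    int setSpokenLanguage(int id) {
        calls.push_back("spoken:" + std::to_string(id));
        return 0;
    }
    int setTranslationLanguage(int id) {
        calls.push_back("translation:" + std::to_string(id));
        return 0;
    }
    int startLiveTranscription() {
        calls.push_back("start");
        return startResult;
    }
};

// Sets the spoken language, then the translation language, then starts
// transcription. Returns true only if every call reports 0 (assumed success).
template <typename Helper>
bool startTranslation(Helper& helper, int spokenId, int translationId) {
    if (static_cast<int>(helper.setSpokenLanguage(spokenId)) != 0) {
        return false;
    }
    if (static_cast<int>(helper.setTranslationLanguage(translationId)) != 0) {
        return false;
    }
    return static_cast<int>(helper.startLiveTranscription()) == 0;
}
```

Because the function is a template, the same wrapper could be tried with the real helper (for example, startTranslation(*m_ltthelper, spokenId, translationId)) if the assumptions about return values hold.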

Receive translation

To receive translated speech text, implement the onLiveTranscriptionMsgInfoReceived callback in your event listener.

The following code sample prints the message text returned by getMessageContent() and the value returned by getMessageType(), an enum that describes which kind of update the message represents.

virtual void onLiveTranscriptionMsgInfoReceived(ILiveTranscriptionMessageInfo* messageInfo) {
    if (!messageInfo) {
        return;  // Nothing to print
    }
    printf("MessageInfoContent is : %s\n", messageInfo->getMessageContent());
    printf("MessageInfoType is : %d\n", messageInfo->getMessageType());
}

getMessageType() returns 1, 2, 3, or 4, which correspond to the following enum values.

Value | Enum | Description
----- | ---- | -----------
1 | ZoomVideoSDKLiveTranscription_OperationType_Add | Refers to adding words during recognition and translation, which occurs once, at the start of sentence detection.
2 | ZoomVideoSDKLiveTranscription_OperationType_Update | Refers to correcting (updating) words during recognition and translation, which typically occurs during sentence detection.
3 | ZoomVideoSDKLiveTranscription_OperationType_Delete | Refers to correcting (deleting) words during recognition and translation, which typically happens during sentence detection.
4 | ZoomVideoSDKLiveTranscription_OperationType_Complete | Refers to completing the recognition and translation, which happens once, at the end of sentence detection.
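
Since a sentence is added, corrected, and then completed, a caption display usually needs to keep in-progress text separate from finished text. The following is a minimal sketch of one way to do that, not SDK code. CaptionOp is a local mirror of the values 1 to 4 above, CaptionBuffer and messageId are hypothetical, and the sketch assumes that each message carries the full current text of its sentence rather than a delta.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Local mirror of the operation values 1-4; real code should use the SDK enum.
enum class CaptionOp { Add = 1, Update = 2, Delete = 3, Complete = 4 };

// Tracks in-progress sentences by key and appends each completed sentence
// to a finished transcript. messageId is an assumed per-sentence key.
class CaptionBuffer {
public:
    void apply(const std::string& messageId, CaptionOp op,
               const std::string& text) {
        if (op == CaptionOp::Complete) {
            // The sentence is final: move it out of the pending set.
            pending_.erase(messageId);
            finished_ += text;
            finished_ += '\n';
        } else {
            // Add, Update, and Delete all replace the in-progress text
            // (assumption: text is the full current sentence).
            pending_[messageId] = text;
        }
    }

    // Returns the in-progress text for a sentence, or "" if there is none.
    std::string pendingText(const std::string& messageId) const {
        auto it = pending_.find(messageId);
        return it == pending_.end() ? std::string() : it->second;
    }

    std::size_t pendingCount() const { return pending_.size(); }
    const std::string& finished() const { return finished_; }

private:
    std::map<std::string, std::string> pending_;
    std::string finished_;
};
```

Inside the callback, such a buffer could be fed with the value of getMessageType() cast to CaptionOp and the text from getMessageContent(), using whatever per-sentence key your integration has available.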

Stop translation

To stop translation, call setTranslationLanguage() with a language ID of -1.

m_ltthelper->setTranslationLanguage(-1);

LTT best practices

When implementing Live Transcription and Translation (LTT) for your integration, consider the following best practices:

  • If the feature is enabled for the session, provide a button to allow people to start closed captioning and select the spoken and translated languages.
  • Display only the supported languages you wish to offer, rather than presenting all available options for transcription and translation.
  • If the session won't include LTT, programmatically disable it when the host starts the session. When someone joins the session, check whether the host has enabled the feature. If not, inform users.
  • Use an event listener to detect when the host has enabled captions. You can use this to notify people that the feature is active, programmatically render a button for starting transcription or translation, or both.
  • Call enableReceiveSpokenLanguageContent(false) if you don't want to receive the spoken language content.
  • Use an event listener to detect when the host disables captions. You can use this to notify people who have enabled the feature that it has been disabled.
  • Offer closed captioning customization options, such as font sizes and colors, to differentiate between transcription and translation texts, to enhance readability and follow accessibility standards.
  • Inform people of these best practices when speaking:
    • Minimize background noise, avoiding activities like shuffling papers, typing loudly, or engaging in side conversations.
    • Speak clearly into the microphone.
    • Position the microphone near active speakers.
    • Opt for an external microphone over a built-in one to improve sound quality.
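
The recommendation to display only the languages you wish to offer can be sketched as a simple allow-list filter. This is an illustration under assumptions, not SDK code: LanguageOption and filterOffered are hypothetical names, and the available list is assumed to have been copied from the SDK's language list beforehand.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical plain copy of one language entry (not an SDK type).
struct LanguageOption {
    int id;
    std::string name;
};

// Returns the available languages whose IDs are in offeredIds,
// preserving the order of the available list.
std::vector<LanguageOption> filterOffered(
        const std::vector<LanguageOption>& available,
        const std::set<int>& offeredIds) {
    std::vector<LanguageOption> result;
    for (const LanguageOption& option : available) {
        if (offeredIds.count(option.id) > 0) {
            result.push_back(option);
        }
    }
    return result;
}
```

Filtering against the list reported at runtime, rather than showing a fixed menu, means an offered ID that is not currently available is left out of the menu.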

More live translation features

For the full set of live transcription and translation features, see IZoomVideoSDKLiveTranscriptionHelper in the Zoom Video SDK for Linux.