Developing a picture-in-picture experience with Zoom Video SDK

Document Picture-in-Picture is a browser API that lets developers open an always-on-top window and fill it with arbitrary HTML content. At the time of writing, this API is supported in Chromium-based browsers (Chrome, Edge). Other browsers, such as Safari, support only the older Picture-in-Picture API for <video> elements.

In this post, we'll use this API with the Zoom Video SDK to create a picture-in-picture experience for both the participant view and the self view.

Motivation

The standard Picture-in-Picture API only supports <video> elements. If you want a picture-in-picture experience built from anything else, such as a <canvas> or another element, that API can't provide it.

This is why Document Picture-in-Picture was developed. With it, a new, always-on-top window is opened on screen, giving the developer complete DOM access that can support a variety of use cases, including video conferencing, where users can multitask while still having access to view themselves or others in the meeting.

As Document Picture-in-Picture gives full access to the new window's DOM, you can even expand your application's functionality, giving users the ability to raise their hand, mute and unmute, or even chat with others, all without having them switch screens or tabs.

Feature detection

Before your application can use the Document Picture-in-Picture browser APIs, it will need to determine if they are currently available within the browser context.

To do this, check whether documentPictureInPicture is present on the global window object with a small helper function, isDocumentPictureInPictureSupported(). If the function returns true, your application can use the Document Picture-in-Picture API.

const isDocumentPictureInPictureSupported = () =>
    "documentPictureInPicture" in window;

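When the check fails, an application may still want to fall back to the <video>-only Picture-in-Picture API described in the Motivation section. The sketch below is one possible way to pick between the two. The function name getPictureInPictureStrategy and its return values are made up for illustration, and the win parameter exists only so the check can be exercised outside a browser.

```javascript
// Hypothetical helper: reports which picture-in-picture approach is available.
// `win` defaults to the global object (window in a browser).
const getPictureInPictureStrategy = (win = globalThis) => {
    if ("documentPictureInPicture" in win) {
        // Always-on-top window with full DOM access
        return "document";
    }
    if (win.document?.pictureInPictureEnabled) {
        // Older API: only a single <video> element can be shown
        return "video";
    }
    return "none";
};
```

How the application reacts to each value (for example, hiding the picture-in-picture button when the result is "none") is left to the application.
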
Starting and handling picture-in-picture

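Once your application has confirmed that it has access to documentPictureInPicture, it can open a picture-in-picture window. Browsers only allow requestWindow() to be called in response to a user gesture such as a button click, so run code like the following from an event handler: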

const startPictureInPicture = async () => {
    if (
        isDocumentPictureInPictureSupported() &&
        !window.documentPictureInPicture?.window
    ) {
        const pipWindow = await window.documentPictureInPicture.requestWindow({
            height: 512,
            width: 512,
        });
        const thisDocument = window.document;
        const pipDocument = pipWindow.document;
        // Move the video-player-container element to the PiP window
        const videoContainer = thisDocument.querySelector(
            "video-player-container",
        );
        pipDocument.body.appendChild(videoContainer);
        // Create a <p> element that tells the user the video has moved
        const movedElement = thisDocument.createElement("p");
        movedElement.classList.add("text-center");
        movedElement.id = "msg-moved-to-pip";
        movedElement.textContent =
            "Video has been moved to picture-in-picture window";
        thisDocument.body.appendChild(movedElement);
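        // When the PiP window closes, move the video back to the parent page
        pipWindow.addEventListener("pagehide", ({ target }) => {
            const parentDocument = window.document;
            // For "pagehide", the event target is the PiP window's document
            const pipDocument = target;
            // Re-capture our video-player-container element inside the PiP window
            const videoContainer = pipDocument.querySelector(
                "video-player-container",
            );
            // Swap the "moved" message for the video container
            parentDocument
                .querySelector("p#msg-moved-to-pip")
                .replaceWith(videoContainer);
        });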
    } else {
        console.error(
            "Document PiP is not supported, or a PiP window is already open",
        );
    }
};

In the snippet above, we first check isDocumentPictureInPictureSupported() (defined in the previous section) and confirm that no picture-in-picture window is already open.
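
The same documentPictureInPicture.window check can also drive a single toggle control. The following is a minimal sketch, not part of the original snippet: togglePictureInPicture is a hypothetical name, start is whatever function opens the window (for example, startPictureInPicture), and win is a parameter only so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: closes the PiP window if one is open, otherwise opens one.
const togglePictureInPicture = (start, win = globalThis) => {
    const openPipWindow = win.documentPictureInPicture?.window;
    if (openPipWindow) {
        // Closing the window fires "pagehide" on it, so a handler registered
        // there can move the video container back to the parent page.
        openPipWindow.close();
        return "closed";
    }
    start();
    return "opening";
};

// Possible usage in a browser, from a click handler (the button id is made up):
// document
//     .querySelector("#toggle-pip")
//     .addEventListener("click", () => togglePictureInPicture(startPictureInPicture));
```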

Once the picture-in-picture window opens, the video-player-container element is moved from the "parent" window into it, and a message in the parent window tells the user that the video player has moved.

Finally, when the pagehide event fires, meaning the picture-in-picture window has been closed, the process runs in reverse: the video-player-container element is moved back from the picture-in-picture window to the "parent" window, where it replaces the message.
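
Because the picture-in-picture window exposes a regular DOM, controls such as the mute and unmute button mentioned earlier can be added to it in the same way. The sketch below shows one way this might look. addMuteButton is a hypothetical helper, and it assumes that the stream object passed in provides isAudioMuted(), muteAudio() and unmuteAudio(); verify those names against the Zoom Video SDK reference before relying on them.

```javascript
// Hypothetical helper: adds a mute/unmute button to the PiP window's document.
// `pipDocument` is pipWindow.document; `stream` is assumed to expose
// isAudioMuted(), muteAudio() and unmuteAudio().
const addMuteButton = (pipDocument, stream) => {
    const button = pipDocument.createElement("button");
    const updateLabel = () => {
        button.textContent = stream.isAudioMuted() ? "Unmute" : "Mute";
    };
    button.addEventListener("click", async () => {
        if (stream.isAudioMuted()) {
            await stream.unmuteAudio();
        } else {
            await stream.muteAudio();
        }
        // Refresh the label once the audio state has changed
        updateLabel();
    });
    updateLabel();
    pipDocument.body.append(button);
    return button;
};
```

The button lives in the picture-in-picture document, but its click handler runs in the page that opened the window, so it can reach the same SDK objects as the rest of the application.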

Conclusion

As shown above, combining the browser-native Document Picture-in-Picture API with the Zoom Video SDK and a few HTML elements makes it straightforward to create a picture-in-picture experience for your users.

For further community discussion and insight from Zoom Developer Advocates and other developers, please check out the Zoom Developer Forum. For prioritized assistance and troubleshooting, take advantage of Premier Developer Support plans.