# Dynamic audio processing: how to add real-time audio effects to your Zoom Video SDK app

With the release of [Zoom Video SDK 2.1.5](/changelog/video-sdk/web/2.1.5#added), we've added support for media processors. This allows you to modify a user's audio, video, or screen share feed before it is sent to remote users. In this blog post, we'll show you how to use an audio processor to dynamically change the pitch of the user's voice in real time.

## Prerequisites

- Node & NPM LTS
- A Zoom Video SDK Account

We'll build on top of the [Zoom Video SDK quickstart guide](/blog/build-a-video-conferencing-app-with-the-zoom-video-sdk). If you're new to the SDK, we recommend checking out the quickstart guide first. You can clone that repo and follow the steps to get started:

```bash
git clone https://github.com/zoom/videosdk-web-helloworld
```

The completed code for this guide is available on [GitHub](https://github.com/zoom/videosdk-web-audioprocessor-quickstart).

## Media processors

The media processor design is inspired by the [AudioWorklet](https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet) API. The processor runs within the `AudioWorkletGlobalScope`, off the main thread, for better performance. To demonstrate custom audio processing logic, we'll create an audio processor that raises the pitch of the user's voice.

## How does pitch shifting work?

The pitch of an audio track is determined by the frequency of its sound waves: essentially, how many times the wave oscillates per second. Increasing the frequency raises the pitch, while lowering it makes the sound deeper.

One simple way to raise the pitch is to speed up playback. When you play audio faster, the sound waves oscillate more times per second, making the voice sound higher, like the classic "chipmunk effect."

Here's a simplified explanation of some of the audio jargon:

- **Frequency**: The number of times the sound wave oscillates per second.
- **Pitch**: How "high" or "low" a sound is, determined by its frequency.
- **Audio Sample**: A single value that represents the amplitude of the sound wave at a specific point in time.
- **Buffer**: A fixed-size array that stores audio samples.
- **Sample Rate**: The number of audio samples captured or played per second.

## Step 1: Create a pitch shift audio processor

To define an audio processor, we'll create a new file, `public/pitch-processor.js`. In it, we'll define a `PitchShiftProcessor` class that extends `AudioProcessor`.

1. The processor takes in audio samples and stores them in a circular buffer.
2. We read the stored samples back faster than they were written to raise the pitch.
3. We pass these values through a high-pass filter to remove unwanted low-frequency sounds, i.e. noise.
4. We then mix the filtered and unfiltered audio together based on the `dryWet` ratio.
5. We output the modified audio samples.

### **`constructor`**

The `constructor` initializes the processor and sets up the circular buffer for pitch shifting. We initialize the buffer positions, the pitch ratio, the mix ratio, and the filter state:

```js
class PitchShiftProcessor extends AudioProcessor {
  constructor(port, options) {
    super(port, options);
    this.bufferSize = 11025; // circular buffer length, in samples
    this.buffer = new Float32Array(this.bufferSize);
    this.writePos = 0; // next index to write to
    this.readPos = 0.0; // fractional read position
    this.pitchRatio = 1.5; // read speed; values above 1 raise the pitch
    this.dryWet = 0.7; // share of the filtered signal in the output
    this.hpf = { prevIn: 0, prevOut: 0, alpha: 0.86 }; // high-pass filter state
  }
  ...
}
```

### **`process`**

The `process` function is called for every audio buffer. This is the main entry point where we handle the audio processing pipeline. We take the input and output audio channels from the `inputs` and `outputs` arrays, and return early if the input channel is empty. We then read the input channel and write it to the circular buffer.

```js
class PitchShiftProcessor extends AudioProcessor {
  ...
  process(inputs, outputs) {
    const input = inputs[0];
    const output = outputs[0];
    // Nothing to process if there is no input channel
    if (input.length === 0 || !input[0]) return true;

    const inputChannel = input[0];
    const outputChannel = output[0];

    // Write the incoming samples into the circular buffer
    for (let i = 0; i < inputChannel.length; i++) {
      this.buffer[this.writePos] = inputChannel[i];
      this.writePos = (this.writePos + 1) % this.bufferSize;
    }
```

Next, we read from the circular buffer at a different rate to achieve pitch shifting. Because the read position is fractional, the variable `raw` is calculated using linear interpolation between the current and the next sample in the buffer. This gives us a smoother transition.

```js
  process(inputs, outputs) {
    ...
    for (let i = 0; i < outputChannel.length; i++) {
      let readPos = this.readPos % this.bufferSize;
      if (readPos < 0) readPos += this.bufferSize;
      const intPos = Math.floor(readPos);
      const frac = readPos - intPos;
      const nextPos = (intPos + 1) % this.bufferSize;
      // Linear interpolation between two neighboring samples
      const raw = this.buffer[intPos] * (1 - frac) + this.buffer[nextPos] * frac;
```

We use a high-pass filter to remove unwanted low-frequency sounds. We then blend the filtered and unfiltered audio together based on the `dryWet` ratio and write the result to the `outputChannel`.

```js
      const filtered = raw - this.hpf.prevIn + this.hpf.alpha * this.hpf.prevOut;
      this.hpf.prevIn = raw;
      this.hpf.prevOut = filtered;
      outputChannel[i] = filtered * this.dryWet + raw * (1 - this.dryWet);
```

We move the read position forward by the pitch ratio. If the read position passes the end of the buffer, it wraps around to the beginning.

```js
      this.readPos += this.pitchRatio;
      if (this.readPos >= this.bufferSize) {
        this.readPos -= this.bufferSize;
        this.writePos = 0;
      }
    }
    return true;
  }
}
```

The processor also has `onInit` and `onUninit` functions that are triggered when the processor initializes or shuts down. You can use these to allocate and release resources.

Now that we've defined the processor class, we need to register it with the SDK.
This is done by calling the `registerProcessor` function with the processor name and the processor class:

```js
class PitchShiftProcessor extends AudioProcessor {
  ...
}

registerProcessor('pitch-shift-audio-processor', PitchShiftProcessor);
```

## Step 2: Add the media processor to the Video SDK

To use the audio processor script within the Video SDK, we first check in `main.ts` whether the browser supports audio processors, using the `isSupportAudioProcessor` method on the `mediaStream`:

```ts
const startCall = async () => {
  ...
  const client = ZoomVideo.createClient();
  const mediaStream = client.getMediaStream();
  if (!mediaStream.isSupportAudioProcessor()) {
    alert("Your browser does not support audio processor");
  }
```

We can then create a processor instance by calling the `createProcessor` method on the `mediaStream`:

```ts
  const processor = await mediaStream.createProcessor({
    name: "pitch-shift-audio-processor",
    type: "audio",
    url: window.location.origin + "/pitch-processor.js",
  });
```

We pass in a `name` and a `type` for the processor. The `name` here is the same one we passed to `registerProcessor`. The `url` specifies the script location; it must originate from the same domain or be served with the appropriate CORS headers.

We can add the processor to the audio stream pipeline using the `addProcessor` method. You can perform this operation before or after starting the audio.

```ts
  await mediaStream.addProcessor(processor);
```

This raises the pitch of the user's voice in real time. The change is audible to all remote participants.

That's all the code you need to get basic pitch shifting working.

## Next steps

Audio processors are a powerful way to analyze and modify audio in real time. You could build processors for **voice effects** (reverb, echo, or distortion), **voice masking** (voice anonymization), and **audio enhancement** (noise reduction or audio quality improvement).
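As a starting point for the echo effect mentioned above, here is a minimal sketch of a feedback delay that reuses the circular-buffer idea from the pitch shifter. The `EchoEffect` class and its `delaySamples`, `feedback`, and `mix` parameters are hypothetical names chosen for illustration; they are not part of the Zoom Video SDK, and the class is deliberately kept independent of `AudioProcessor` so it can be tried on its own.

```js
// Hypothetical helper, not an SDK API: a simple feedback echo.
class EchoEffect {
  constructor(delaySamples, feedback = 0.4, mix = 0.5) {
    // Circular buffer holding the last `delaySamples` samples (input plus fed-back echo)
    this.buffer = new Float32Array(delaySamples);
    this.pos = 0;
    this.feedback = feedback; // how much of each echo is fed back into the buffer
    this.mix = mix; // share of the delayed signal in the output
  }

  // Reads samples from `input` and writes the processed samples to `output`.
  processBlock(input, output) {
    for (let i = 0; i < input.length; i++) {
      // The slot at `pos` was written exactly `delaySamples` samples ago
      const delayed = this.buffer[this.pos];
      this.buffer[this.pos] = input[i] + delayed * this.feedback;
      this.pos = (this.pos + 1) % this.buffer.length;
      output[i] = input[i] * (1 - this.mix) + delayed * this.mix;
    }
    return output;
  }
}
```

Inside a processor's `process(inputs, outputs)` function, you could call something like `this.echo.processBlock(inputs[0][0], outputs[0][0])`, where `this.echo` is a hypothetical field created in the constructor. The delay is expressed in samples, so the audible delay time depends on the sample rate of the audio stream.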
## Conclusion

With just a few lines of code, you can create powerful custom audio processors with Zoom Video SDK. Beyond pitch shifting, you can experiment with voice effects, real-time audio analysis, or even voice synthesis. To dive deeper, check out our [raw-data documentation](/docs/video-sdk/web/raw-data) and explore the [sample processor repo](https://github.com/zoom/videosdk-web-processor-sample/tree/main) for more inspiration.