Use web sync to manage a knowledge base
You can use Zoom Virtual Agent web sync with your existing knowledge base to keep your chatbot delivering accurate and relevant responses. Web sync is a web crawler: it searches your website and automatically indexes its content.
This topic explains how to use web sync to create a knowledge base, how to improve the accuracy of the sync, and how to create scripts that optimize the content.
Before you begin
For information about Zoom Virtual Agent prerequisites, see Getting started with Zoom Virtual Agent.
Create a knowledge base with web sync
To provide current and accurate information for a website's knowledge base, web sync enables you to use one of these options:
- Use a sitemap to sync the knowledge base
- Use link discovery to provide a source URL and rules that guide the crawler to sync relevant pages
- Upload specific URLs to sync the knowledge base
For each of these options, you can use content selectors or custom scripts to refine the content you want to extract for your knowledge base. These can target specific parts of each page, such as titles, URLs, or the article content itself.
Web sync is appropriate when a Zoom integration is unavailable or when only a limited number of web pages exist to create a knowledge base.
Sitemap
A sitemap lists the content on your website. To sync your knowledge base using a sitemap, you can enter a website URL and let the Zoom crawler find its sitemap, or you can enter the sitemap address directly. After the crawler finds the sitemap, you select which URLs to include in the sync.
Using a sitemap to sync your knowledge base is a good option for organizations that:
- Maintain a semantically and hierarchically sound sitemap
- Use a content management system without a Zoom integration
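For reference, a minimal sitemap in the standard sitemaps.org format looks like the following (the URLs are placeholders). The crawler reads the loc entries and presents them for selection:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/support/faq</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/support/billing</loc>
  </url>
</urlset>
```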
Link discovery
Link discovery initiates a crawl with the URL of a specific page. The crawl then follows links on the starting page that share the same directories and subdirectories. You must set the URL conditions that determine which pages to store as knowledge base articles.
Using link discovery to sync your relevant pages is useful for organizations when:
- All articles link from one page
- The content management system doesn't have a Zoom Integration
- A sitemap doesn't exist or is incomplete
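The same-directory rule described above can be sketched in code. This is an illustration of the idea only, not Zoom's actual implementation; the function checks whether a discovered link stays under the starting page's directory:

```javascript
// Decide whether a link found during link discovery stays within the
// starting page's directory (and is therefore a candidate for syncing).
function sharesDirectory(startUrl, linkUrl) {
  const base = new URL(startUrl);
  const link = new URL(linkUrl, base); // also resolves relative links
  if (link.hostname !== base.hostname) return false; // different site
  // Directory of the starting page, e.g. "/help/" for "/help/faq.html"
  const baseDir = base.pathname.slice(0, base.pathname.lastIndexOf("/") + 1);
  return link.pathname.startsWith(baseDir);
}

// Links under /help/ are followed; links elsewhere are not.
console.log(sharesDirectory("https://example.com/help/faq.html",
                            "https://example.com/help/billing/refunds.html")); // true
console.log(sharesDirectory("https://example.com/help/faq.html",
                            "https://example.com/blog/post.html")); // false
```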
Manual URL upload
The simplest way to sync your knowledge base is to manually upload the URLs you want to crawl and store as knowledge base articles. You can enter the URLs directly or add them to a .csv file. Note that if the URLs change, the sync breaks and you must update them or add new URLs.
Manual URL upload works best for organizations that offer a single-page FAQ.
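If you use a .csv file, the simplest form is one URL per line. The exact column layout Zoom accepts is not documented here, so treat this fragment as illustrative:

```
https://example.com/faq
https://example.com/support/billing
https://example.com/support/returns
```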
For more information about how to use web sync to create a knowledge base, see Creating a knowledge base through web sync.
Improve the accuracy of web sync
You can use content selectors or custom scripts to refine the content you want for your knowledge base.
Content selectors
A basic function of Zoom web sync is web crawling for pages that have content for the knowledge base articles. You can use content selectors to precisely target parts of the page you want to use.
General content selectors include:
- Page title trimmer: Removes leading or trailing words from the title.
- Content selector: A CSS selector that identifies the article content. For example, enter .article-content as the selector.
- Ignore selector: A CSS selector that identifies items to remove, such as authors and dates.
- Dismiss click selector: Only available when JavaScript support is enabled. A CSS selector that identifies items to click to dismiss pop-ups, overlays, cookie consent banners, and so on.
- Dismiss click delay: Only available when JavaScript support is enabled. The amount of time to wait after the dismiss click.
- Page load delay: Only available when JavaScript support is enabled. The amount of time to wait for the page to fully load and become interactive before its content is extracted.
Some web pages require JavaScript to load, so the crawler needs JavaScript support enabled to select their content. Enabling JavaScript support allows the crawler to interpret and execute JavaScript code when it accesses and processes web content.
Note that enabling JavaScript slows the sync, so we recommend enabling JavaScript support only when necessary.
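For example, given a page like the following, setting Content selector to .article-content and Ignore selector to .article-meta keeps the article body while dropping the byline. The class names here are illustrative, not required values:

```html
<body>
  <header>Site navigation</header>             <!-- outside the content selector: not extracted -->
  <div class="article-content">
    <p class="article-meta">By Alex, 2024</p>  <!-- matches the ignore selector: removed -->
    <h1>How to reset your password</h1>        <!-- kept -->
    <p>Step-by-step instructions.</p>          <!-- kept -->
  </div>
</body>
```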
JavaScript click simulation
You can use JavaScript click simulation for page navigation or vertical accordions.
Use JavaScript click simulation when your site uses JavaScript-powered navigation with tabs or categories. Not every site with tabs needs it: if selecting a tab triggers a full page load, the crawler can follow it as a normal link.
Navigation
If the page uses tabs, categories, menus, or other secondary navigation, you can use these options to reveal additional content on the same page.
- Navigation click selector: Only available when JavaScript support is enabled. A CSS selector that identifies items to click, such as tabs, categories, and menus.
- Navigation click delay: Only available when JavaScript support is enabled. The amount of time to wait after the navigation click.
Accordion
A JavaScript accordion is a container control with vertically collapsible panels (a vertical accordion). It contains stacked headers that expand or collapse one or more panels at a time in the available space.
If the page uses accordions, such as some FAQ formats, you can use these options to reveal additional content on the same page.
- Accordion content extraction: Only available when JavaScript support is enabled. Choose how the page is extracted: each accordion panel as its own article, or the entire page as a single article.
- Accordion click selector: Only available when JavaScript support is enabled. A CSS selector that identifies items to click to expand and collapse accordions.
- Accordion click delay: Only available when JavaScript support is enabled. The amount of time to wait after each accordion click.
- Accordion behavior: Specify whether one or multiple items can be expanded at the same time.
- Accordion title selector: Only available when JavaScript support is enabled. A CSS selector that identifies a custom title to replace the page title.
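For example, for an FAQ marked up like the following (the class names are illustrative), you could set Accordion click selector to .faq-question, Accordion title selector to .faq-question, and Accordion content extraction to one article per accordion:

```html
<div class="faq">
  <button class="faq-question">How do I reset my password?</button>
  <div class="faq-answer" hidden>Go to Settings and select Reset.</div>
  <button class="faq-question">How do I change my email?</button>
  <div class="faq-answer" hidden>Open your profile and edit the address.</div>
</div>
```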
Custom scripts
Use custom scripts to improve your knowledge base content.
You can enable custom scripts in the advanced knowledge base settings. You can use a custom JavaScript function to customize most article fields, such as title, content, URL, tags, and category.
To edit your custom script:
1. In AI Management, navigate to Knowledge Base.
2. Choose the knowledge base you want to update.
3. Select Settings > Advanced > Custom Script, then select Edit.
4. After you complete your edits, select Save.
This section describes the various ways you can create and implement custom scripts, as well as example custom scripts.
Custom JavaScript
You can use a custom JavaScript function to customize the content or other article details, such as title and URL. The main function executes on each article.
The argument is an article object with the following properties:
{
"content": string, // HTML contents
"url": string, // full URL
"title": string, // title
"language": string, // language code (e.g. en, en-US, es)
"tags": [ // array of tag objects:
{
"id": string, // tag ID
"name": string, // tag name
},
{
// ...
}
],
"category": { // array of category hierarchy objects (root is first):
"id": string, // category ID
"name": string, // category name
}.
"externalId": string // external ID or URL of the source
}
The return value is an object with the updated article properties. You can omit unchanged properties.
You can return the value null to prevent the article from being synced. You can also return an array of objects if the source should be split into multiple articles.
Examples
Debug with the console log
This script enables you to perform basic debugging tasks using the console log.
function main({ content, url, title }) {
console.log("hello");
}
Modify HTML contents
This script enables you to modify the HTML contents of an element.
function main({ content, url, title }) {
// load content into cheerio (https://cheerio.js.org)
const $ = cheerio.load(content);
$("#example-id1").html("<p>example new content</p>");
return {
// return modified contents from cheerio
content: $.html(),
};
}
Remove elements
This script enables you to remove specific elements.
function main({ content, url, title }) {
// load content into cheerio (https://cheerio.js.org)
const $ = cheerio.load(content);
$("#example-id2,.example-class").remove();
return {
// return modified contents from cheerio
content: $.html(),
};
}
Add new elements
This script enables you to add a new element.
function main({ content, url, title }) {
// load content into cheerio (https://cheerio.js.org)
const $ = cheerio.load(content);
$.root().append("<p>extra example content</p>");
return {
// return modified contents from cheerio
content: $.html(),
};
}
Customize titles
We provide a title trimmer feature. However, sometimes the default single match and remove pattern is not enough. You can use a custom script to create a more advanced title trimmer, as in the example below.
function main({ title }) {
return {
title: title.replace("Example Inc | ", "").replace(" | Support", ""),
};
}
When a website supports multiple languages, you might need a different title trimmer based on the article language, as seen in the following example:
function main({ title, language }) {
return {
title: title.replace(" | Support", "").replace(" | Soporte", ""),
};
// const trimmers = {
// 'en': ' | Support',
// 'es': ' | Soporte'
// };
// return {
// title: title.replace(trimmers[language], '')
// };
}
Replace the hostname in the URL
This script enables you to replace the hostname in the URL.
function main({ content, url, title }) {
return {
url: url.replace("old.example.com", "new.example.com"),
};
}
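Skip articles
You can also return null to prevent pages from being synced. The following sketch skips pages under a hypothetical /legal/ path while leaving other articles unchanged:

```javascript
function main({ url }) {
  // The "/legal/" path is a placeholder; adjust the condition for your site.
  if (url.includes("/legal/")) {
    return null; // returning null prevents this article from being synced
  }
  return {}; // no changes for other articles
}
```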
Other custom scripts
Custom scripts are available on any knowledge base type, but they are most useful for web knowledge bases. Note that scripts cannot access sensitive information.
Here is an example of a basic custom script.
As you implement this script, be aware of the following:
- You must define a function named main
- The input article object is passed as the only argument
- console.log output appears in the web preview or sync log
- The imported cheerio library (a jQuery subset) is available to use
- The function must return an object with updated fields
function main(article) {
console.log("input article", article);
const { content, url, title } = article;
const $ = cheerio.load(content);
// modify some HTML elements
$("h1").text("My Example Domain");
$("a").remove();
$("body").append("<p>extra content</p>");
return {
// return modified contents from cheerio
content: $.html(),
// content: 'Hello World!',
// customize other fields
title: title + "!",
url: url.replace("www", "foo"),
};
}
Extracting table content with a custom script
Zoom Virtual Agent does not maintain table elements after a sync, but you can use a custom script to extract, modify, and present table content. Tables vary: some have column headers, some have row headers, and you can choose whether to include the headers. Here are some examples of different table setups and the custom scripts that can help modify them.
Extract table cells individually
function main({ content }) {
const $ = cheerio.load(content);
$("table").each(function (i, tableEl) {
const $table = $(tableEl);
const divs = [];
$table.find("td").each(function (j, tdEl) {
const cellHtml = $(tdEl).html();
divs.push($("</div>").html(cellHtml));
});
$table.replaceWith(divs);
});
return {
content: $.html(),
};
}
Extract table rows with headers
function main({ content }) {
const $ = cheerio.load(content);
$("table").each(function (i, tableEl) {
const $table = $(tableEl);
const headers = $table
.find("thead th")
.map((j, thEl) => {
return $(thEl).text();
})
.toArray();
const divs = [];
$table.find("tbody tr").each(function (j, trEl) {
const prefixedCells = $(trEl)
.find("td")
.map((k, tdEl) => {
const cellText = $(tdEl).text();
return `${headers[k]}: ${cellText}`;
})
.toArray();
divs.push($("<div>").html(prefixedCells.join(", ")));
});
$table.replaceWith(divs);
});
return {
content: $.html(),
};
}
Setting categories from breadcrumbs with a custom script
We support the extraction of article categories from the URL path structure by default. However, URL paths are often short abbreviations and might not be the category names you want. Often the website has a hierarchical breadcrumb structure that represents the article's category structure. You can use a custom script to extract the breadcrumbs as the article category tree. Here is an example.
function main({ content }) {
const $ = cheerio.load(content);
const names = $("#breadcrumb li")
.toArray()
.map((el) => $(el).text());
const category = names.map((c) => ({ id: c, name: c }));
// category = [
// { id: 'Support', name: 'Support' },
// { id: 'Account', name: 'Account' },
// { id: 'Reset Password', name: 'Reset Password' }
// ]
return {
content: $("p").html(),
category,
};
}
Extracting tags with a custom script
Since there is no single standard structure for tagging articles, we don't extract any tags by default for web sync knowledge base types. Sites might display tags at the beginning or end of the article, define them in the article's breadcrumb structure (similar to categories), or hide them in the HTML head element. In any of these cases, you can use a custom script to extract the tags. Here is an example.
function main({ content }) {
const $ = cheerio.load(content);
const tagNames = $(".tags").text().replace("Tags: ", "").split(/,\s*/);
const tags = tagNames.map((t) => ({ id: t, name: t }));
// tags = [
// { id: 'account', name: 'account' },
// { id: 'how-to', name: 'how-to' },
// { id: 'security', name: 'security' }
// ]
return {
content: $(".content").html(),
tags,
};
}
Splitting a webpage into multiple articles with a custom script
Some FAQ websites list all question-and-answer items on a single page. By default, we treat the entire page as a single article, which works if that is your intention. However, you can use a custom script to split the page's question-and-answer items into multiple articles. This is not required, but it provides better analytics, the option to parse categories for each item, and more. Here is an example.
// https://shop.omzo.net/faq.html
// (requires JavaScript and a page load delay)
function main({ content, url, title }) {
const $ = cheerio.load(content);
const articles = [];
$("#search-results > div").each(function (i, el) {
const title = $(el).find("h3").text();
const content = $(el).find("p").eq(0).html();
const category = $(el).find("p").eq(1).text().replace("Category: ", "");
articles.push({
externalId: title,
title,
content,
category: [{ id: category, name: category }],
});
});
return articles;
}
Changing article URLs with a custom script
Some cases exist when you might want to modify the article URLs to differ from what is fetched during the sync.
Two common examples include:
You want the base URL to be different from what is crawled.
function main({ url }) {
return {
url: url.replace("zingy-nougat-8dbda7.netlify.app", "example.com"),
};
}
You want to turn off URLs (that is, you want the content to be available only through the bot).
function main() {
return {
url: null,
};
}