Introduction

This tool helps you crawl documentation websites incrementally, extract their content, and create a search index in Upstash Search.

Usage

It is available both as a CLI tool and a library.

CLI Usage

You can run the CLI directly using npx (no installation required):
npx @upstash/search-crawler
Or with command-line options:
npx @upstash/search-crawler \
  --upstash-url "UPSTASH_SEARCH_REST_URL" \
  --upstash-token "UPSTASH_SEARCH_REST_TOKEN" \
  --index-name "my-index" \
  --doc-url "https://example.com/docs"
You will be prompted for any missing options:
  • Your Upstash Search URL
  • Your Upstash Search token
  • (Optional) Custom index name
  • The documentation URL to crawl

What the Tool Does

  1. Discover all internal documentation links
  2. Crawl each page and extract content
  3. Track new or obsolete data
  4. Upsert the new records into your Upstash Search index

Library Usage

You can also use this as a library in your own code:
import {
  crawlAndIndex,
  type CrawlerOptions,
  type CrawlerResult,
} from "@upstash/search-crawler";

const options: CrawlerOptions = {
  upstashUrl: "UPSTASH_SEARCH_REST_URL",
  upstashToken: "UPSTASH_SEARCH_REST_TOKEN",
  indexName: "my-docs",
  docUrl: "https://example.com/docs",
  silent: true, // no console output
};

const result: CrawlerResult = await crawlAndIndex(options);

Obtaining Upstash Credentials

  1. Go to your Upstash Console.
  2. Select your Search index. (See How to Create Search Index)
  3. Under the Details section, copy your UPSTASH_SEARCH_REST_URL and UPSTASH_SEARCH_REST_TOKEN.
    • --upstash-url corresponds to UPSTASH_SEARCH_REST_URL
    • --upstash-token corresponds to UPSTASH_SEARCH_REST_TOKEN

Further Reading

Try combining this tool with Qstash Schedule to keep your database up to date with docs. You may deploy your crawler on a server and call it on a schedule regularly to fetch updates in your docs. Check out our example project for implementation details: A modern documentation library to search and track the docs. For further insights, see @upstash/search-crawler