
Building a RAG Chatbot for the Health Domain with Next.js

Elif Nur Deniz, Software Developer - Guest Author

Health Assistant Chatbot


In this tutorial, we'll walk through how we built a modern Health Assistant application using Next.js, Upstash, Vercel AI, LangChain, and OpenAI. The Health Assistant project shows how Retrieval-Augmented Generation (RAG) can ground a chatbot in health-domain content without retraining the underlying model. Our goal is to create an interactive platform that uses AI to provide insights and advice in response to users' health-related questions.

Tech Stack

Getting Started

What is a RAG Chatbot?

Retrieval-Augmented Generation (RAG) is a technique in which a language model improves its responses by including information retrieved from a knowledge base based on the user input. It has two main phases:

  1. Retrieval Phase: When a user asks a question, the chatbot first searches a database of previously stored knowledge to find relevant information. Vector databases make this possible by running similarity searches and returning the most closely related data.
  2. Generation Phase: The relevant information obtained during the retrieval phase is then fed into a generative AI model together with the question. As the response is generated, it is streamed to the user in real time.
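
To make the two phases concrete, here is a minimal in-memory sketch. It is not the project's code: the StoredChunk type, the function names, and the tiny hand-written vectors are all assumptions for illustration. In the real application, embeddings come from OpenAI and the similarity search is performed by Upstash Vector.

```typescript
// Toy in-memory RAG flow. All names and vectors here are illustrative assumptions.
interface StoredChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Retrieval phase: return the k chunks most similar to the query embedding.
function retrieveTopK(
  queryEmbedding: number[],
  store: StoredChunk[],
  k: number
): StoredChunk[] {
  return [...store]
    .sort(
      (x, y) =>
        cosineSimilarity(queryEmbedding, y.embedding) -
        cosineSimilarity(queryEmbedding, x.embedding)
    )
    .slice(0, k);
}

// Generation phase input: combine the retrieved context with the question.
function buildRagPrompt(question: string, chunks: StoredChunk[]): string {
  const context = chunks.map((c) => c.text).join("\n");
  return `Context:\n${context}\n\nQuestion: ${question}`;
}
```

The string returned by buildRagPrompt is what would be handed to the generative model in the second phase.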

Data Collection And Storage

Data collection for this project is handled by Scrapy, a powerful open-source web-crawling framework written in Python. We start by initializing a Scrapy project and customizing our spider based on the data source. The parse_page function in the spider collects the data from the selected sections, splits it into chunks, generates vector embeddings for the chunks, and uploads those embeddings to the Upstash Vector Database.
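
The crawler itself is written in Python, but the chunking idea is language-independent. The sketch below shows one common approach, fixed-size word windows with a small overlap, in TypeScript to match the rest of the application code. The function name, the chunk size, and the overlap are assumptions for illustration; the project's spider may split text differently.

```typescript
// Split text into chunks of at most `chunkSize` words, where consecutive
// chunks share `overlap` words. Both parameters are illustrative assumptions.
function chunkWords(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("chunkSize must be positive and overlap must be in [0, chunkSize)");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

Overlapping windows help keep a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.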

To run the crawler locally, follow these steps:

  • Clone the repository: git clone https://github.com/YOUR_GITHUB_ACCOUNT/Health-Assistant-Chat-Bot.git
  • Create a .env file in the health_scraper folder as in the example. It will look like this:

If you don't already have an Upstash Vector Database, create one here and set the vector dimensions to 1536, which matches the dimension of the OpenAI embeddings used in this project. Similarly, if you don't have an OpenAI key, create one here.

  • Then run the crawlers in order: first docker compose up collect links, then docker compose up fetch_content. Each command creates a container that runs your crawler.

To customize your crawler, you can change the code segment that handles data extraction, as in this example:

elements = response.xpath("//div[contains(@class, 'content-repository-content')]//p | //div[contains(@class, 'content-repository-content')]//li")

❗ Note: Running the crawler may take time. To see the progress, you can check the logs or monitor your vector database from your Upstash account.

Data Retrieval And Response Generation

Retrieval with Upstash Vector Database

The core of our data retrieval process utilizes Upstash's Vector Database, which allows for quick similarity searches of data vectors. When a user submits a query, our system retrieves the most relevant information by comparing the query's vector representation against our database of health-related vectors.
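
// app/vectorstore/UpstashVectorStore.js
// Methods of the UpstashVectorStore class (excerpt).

// Plain similarity search: return the k nearest matches for a query vector.
async similaritySearchVectorWithScore(query, k, filter) {
  const result = await this.index.query({
    vector: query,
    topK: k,
    includeVectors: false,
    includeMetadata: true,
  });

  const results = [];
  for (let i = 0; i < result.length; i++) {
    // LangChain expects [document, score] pairs from this method.
    results.push([
      new Document({
        pageContent: JSON.stringify(result[i]?.metadata) || "",
      }),
      result[i]?.score,
    ]);
  }
  return results;
}

// MMR search: fetch more candidates than needed, then pick a diverse subset.
async maxMarginalRelevanceSearch(query, options) {
  const queryEmbedding = await this.embeddings.embedQuery(query);
  const result = await this.index.query({
    vector: queryEmbedding,
    topK: options.fetchK ?? 20,
    includeVectors: true,
    includeMetadata: true,
  });
  const embeddingList = result.map((r) => r.vector);

  // Arguments follow LangChain's maximalMarginalRelevance helper signature.
  const mmrIndexes = maximalMarginalRelevance(
    queryEmbedding,
    embeddingList,
    options.lambda,
    options.k
  );
  const topMmrMatches = mmrIndexes.map((idx) => result[idx]);

  const results = [];
  for (let i = 0; i < topMmrMatches.length; i++) {
    results.push(
      new Document({
        pageContent: JSON.stringify(topMmrMatches[i]?.metadata) || "",
      })
    );
  }
  return results;
}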
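
The maxMarginalRelevanceSearch method relies on a helper to choose results that are relevant to the query but not redundant with each other. The sketch below is a simplified, self-contained version of that greedy selection so you can see what the idea looks like. It is not LangChain's implementation, and the names cosineSim and mmrSelect are assumptions for illustration.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSim(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy maximal marginal relevance. Returns indexes into `candidates`.
// lambda = 1 ranks purely by relevance; lower values favor diversity.
function mmrSelect(
  query: number[],
  candidates: number[][],
  lambda: number,
  k: number
): number[] {
  const selected: number[] = [];
  const used: boolean[] = candidates.map(() => false);
  while (selected.length < Math.min(k, candidates.length)) {
    let bestIdx = -1;
    let bestScore = -Infinity;
    for (let i = 0; i < candidates.length; i++) {
      if (used[i]) continue;
      const relevance = cosineSim(query, candidates[i]);
      // Highest similarity to anything already picked (0 if nothing is picked;
      // negative similarities are treated as 0).
      let redundancy = 0;
      for (const j of selected) {
        redundancy = Math.max(redundancy, cosineSim(candidates[i], candidates[j]));
      }
      const score = lambda * relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(bestIdx);
    used[bestIdx] = true;
  }
  return selected;
}
```

With a balanced lambda, a near-duplicate of an already selected passage is penalized, so the model receives more varied context than a plain top-k search would provide.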

The UpstashVectorStore class is used in api/route.tsx as a retriever.

const vectorstore = new UpstashVectorStore(new OpenAIEmbeddings());
const documents = await vectorstore.similaritySearch(currentMessageContent, 6);
const context = (documents.map((doc) => doc.pageContent)).join("\n");

Handling User And AI Messages

The frontend implementation for our Health Assistant chatbot uses the useChat hook from Vercel AI's SDK. The useChat hook is initialized with an api endpoint, which it uses to send and receive messages. initialMessages are pre-defined messages that appear when the chat interface loads. In this case, a welcoming message is set up to greet users. The onResponse callback function is triggered when a response is received from the backend after a user submits their query.

messages, input, handleInputChange, handleSubmit, and setInput are part of the state management utilities provided by useChat. These handle the input field changes, submit actions, and update the messages displayed in the chat interface.

// /app/page.tsx

Agent Template

You can shape the behaviour of your chatbot using an agent template. The template begins by defining the AI assistant's identity as HealthAssistant and emphasizes its purpose: to provide systematic, data-driven health information while staying within the provided context. Where applicable, responses include URLs to sources for further reading, which also makes the information more credible. previousMessages are added to the template so the model can follow the conversation so far.

// /app/api/route.tsx
// currentMessageContent and context come from the retrieval step shown earlier.
const AGENT_SYSTEM_TEMPLATE = `
      You are an artificial intelligence assistant named HealthAssistant, providing systematic and data-driven health information.
      Begin your answers with a greeting and end with a relevant health tip.
      Your responses should be precise and factual, with an emphasis on using the context provided and providing urls from the context all the time.
      Don't repeat yourself in responses, and if an answer is unavailable in the retrieved content, state that you don't know.
      Now, answer the message below:
      ${currentMessageContent}
      Based on the context below:
      ${context}
      And the previous messages:
      ${previousMessages.map((message: ChatMessage) => message.content).join("\n")}
`;
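
If you want to unit test or reuse the prompt, one option is to assemble it in a small pure function instead of an inline template literal. The sketch below is only one way to do it: the shape of ChatMessage, the PromptParts type, and the buildAgentPrompt name are assumptions for illustration, not part of the project.

```typescript
// Assumed message shape; the project's ChatMessage type may differ.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Hypothetical container for everything the prompt needs.
interface PromptParts {
  instructions: string; // identity and behaviour rules for the assistant
  question: string; // the user's current message
  context: string; // text retrieved from the vector store
  previousMessages: ChatMessage[];
}

// Join the instruction text, the question, the retrieved context, and the
// chat history into a single prompt string.
function buildAgentPrompt(parts: PromptParts): string {
  const history = parts.previousMessages.map((m) => m.content).join("\n");
  return [
    parts.instructions.trim(),
    "Now, answer the message below:",
    parts.question,
    "Based on the context below:",
    parts.context,
    "And the previous messages:",
    history,
  ].join("\n");
}
```

Keeping the assembly in one function makes it easy to check that the question, context, and history all end up in the prompt before any model call is made.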

Streaming Response

Streaming text functionality is crucial for maintaining a smooth and engaging user experience, especially when handling complex queries that require thoughtful and detailed responses. We implemented streaming responses using Vercel AI and OpenAI's capabilities. The streamText function from Vercel AI's SDK is used to initiate a streaming response. This function takes a model and a prompt, which is the AGENT_SYSTEM_TEMPLATE, as parameters. As the AI generates responses, they are immediately sent back to the user, simulating a natural and dynamic conversation flow.

import { NextRequest, NextResponse } from "next/server";
import { streamText } from "ai";
import { openai } from '@ai-sdk/openai';
// ...
export async function POST(req: NextRequest) {
    try {
        const model = openai('gpt-4o-mini');
        // ...
        // Start generating and stream tokens back as they are produced.
        const result = await streamText({
            model: model,
            prompt: AGENT_SYSTEM_TEMPLATE,
        });
        return result.toDataStreamResponse();
    } catch (e) {
        // Report the failure as JSON with a 500 status.
        const message = e instanceof Error ? e.message : String(e);
        return NextResponse.json({ error: message }, { status: 500 });
    }
}
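
To see what streaming means for the consumer, here is a small self-contained sketch that uses an async generator as a stand-in for the model. It does not use the Vercel AI SDK, and fakeTokenStream and renderStream are hypothetical names for illustration. The point is only that the UI can update after every chunk instead of waiting for the full answer.

```typescript
// Stand-in for a model that emits its answer piece by piece.
async function* fakeTokenStream(tokens: string[]): AsyncGenerator<string> {
  for (const token of tokens) {
    // Yield control to the event loop, as a real network stream would.
    await Promise.resolve();
    yield token;
  }
}

// Consume a stream, reporting the accumulated text after every chunk.
async function renderStream(
  stream: AsyncIterable<string>,
  onPartial: (partial: string) => void
): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
    onPartial(text);
  }
  return text;
}
```

In the actual application, the useChat hook plays the role of renderStream: it reads the streamed response and updates the messages state as chunks arrive.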

Rate Limiting

Managing the flow of requests is crucial to maintain performance and prevent abuse. This is where Upstash rate limiting comes into play, ensuring that our resources are not overwhelmed by too many requests from a single user or IP address in a short period. We initialize a Redis client using Redis.fromEnv(), which reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN from the .env file; these are separate from the UPSTASH_VECTOR_REST_URL and UPSTASH_VECTOR_REST_TOKEN used by the vector database. The limiter allows one request per 10-second sliding window for each IP address. When a user hits the rate limit, a custom error message is sent back, politely asking them to try again later.

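// /app/api/route.tsx
import { NextRequest, NextResponse } from "next/server";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN from the environment.
const redis = Redis.fromEnv();

// Allow 1 request per 10-second sliding window for each identifier.
const ratelimit = new Ratelimit({
    redis: redis,
    limiter: Ratelimit.slidingWindow(1, "10 s"),
});

export async function POST(req: NextRequest) {
    try {
        // Use the client IP as the rate-limit identifier.
        const ip = req.headers.get("x-forwarded-for") ?? "";
        const { success } = await ratelimit.limit(ip);
        if (!success) {
            const customString =
                "Oops! It seems you've reached the rate limit. Please try again later.";
            return NextResponse.json({ error: customString }, { status: 429 });
        }
        // ...
    } catch (e) {
        // ...
    }
}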
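
If you are curious what a sliding window limiter does conceptually, the in-memory sketch below may help. It is a simplified variant that stores a timestamp per request; it is not how @upstash/ratelimit is implemented, and it would not work across serverless instances, which is exactly why the project keeps the state in Redis. The class name and the injectable now parameter are assumptions for illustration.

```typescript
// Simplified in-memory sliding window limiter (illustration only).
class InMemorySlidingWindow {
  private hits: Map<string, number[]> = new Map();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only the requests that are still inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

With new InMemorySlidingWindow(1, 10_000), a second request from the same key within 10 seconds is rejected, mirroring the Ratelimit.slidingWindow(1, "10 s") configuration above.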

Running Health Assistant Locally

Install the required packages with the npm install command. After that, create a .env file in the root folder, as in the example.

It will look like the following:


Once the variables are set, you can run the application with the npm run dev command. And, done! You can start asking your chatbot questions at http://localhost:3000.
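
If the app fails to start or returns errors, a missing environment variable is a likely cause. The helper below is a hypothetical sketch for checking this early. The variable list is an assumption based on the services used in this tutorial (the OpenAI SDK reads OPENAI_API_KEY by default, and Redis.fromEnv() reads the two UPSTASH_REDIS_* variables); adjust it to match your own .env file.

```typescript
// Return the names from `required` that are missing or empty in `env`.
function findMissingEnvVars(
  env: Record<string, string | undefined>,
  required: string[]
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Assumed variable names; adjust to match your own .env file.
const REQUIRED_ENV_VARS: string[] = [
  "OPENAI_API_KEY",
  "UPSTASH_VECTOR_REST_URL",
  "UPSTASH_VECTOR_REST_TOKEN",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
];
```

In a Next.js app, you could call findMissingEnvVars(process.env, REQUIRED_ENV_VARS) when the server starts and log the returned names.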

Deploying To Vercel

Deploying a Next.js application with Vercel is quite easy. Upload your repository to GitHub, authorize Vercel to access it, and deploy your app from there.


In this tutorial, we explored the technologies behind our Health Assistant chatbot. With the integration of Next.js, Upstash, and Vercel AI with OpenAI's powerful models, we've built a robust and interactive platform that responds accurately to health-related inquiries and also enhances user engagement through real-time interactions and intelligent response generation.

We hope this guide helps you enhance your own projects using similar technologies. The Health Assistant chatbot brings critical benefits to the health domain by letting users instantly access vital health information at any time and helping them make informed health decisions. With the right adjustments to the crawler, the same access can be provided in any domain.