Build a RAG Chatbot with Next.js: Step-by-Step Developer Guide

DevWithAI Team

Jun 21, 2026 · 15 min read

Want to build a chatbot that actually knows your data? This guide walks through building a full RAG chatbot with Next.js, OpenAI, and a vector database from scratch.

Standard AI chatbots are great for general conversations, but they have one major weakness — they don't know anything about your specific data.

That's where Retrieval-Augmented Generation (RAG) comes in.

A RAG chatbot can answer questions about your own documents, product knowledge base, or internal company data by fetching the most relevant content before generating a response.

In this guide, you'll build a fully working RAG chatbot using Next.js, OpenAI, and a vector database — step by step.

What Is RAG?

RAG stands for Retrieval-Augmented Generation.

Instead of relying purely on what the LLM learned during training, a RAG system:

Converts your documents into vector embeddings
Stores them in a vector database
At query time, retrieves the most relevant chunks
Passes those chunks to the LLM as context
Returns a grounded, accurate response

This means the chatbot grounds its answers in your real data, which reduces (but does not eliminate) hallucinated guesses.
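To make the retrieval step concrete, here is a toy sketch of ranking chunks by cosine similarity. The three-dimensional vectors and the `cosineSimilarity` and `topK` helpers are made up for illustration. Real embeddings have far more dimensions, and a vector database such as Chroma performs this search for you, possibly with a different distance metric.

```javascript
// Toy illustration of the retrieval step: rank stored chunks by cosine similarity.
// The vectors below are invented; they are not real embeddings.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose vectors are most similar to the query vector.
function topK(queryVector, chunks, k) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

const chunks = [
  { text: "Returns are accepted within 30 days.", vector: [0.9, 0.1, 0.0] },
  { text: "Free shipping over $50.", vector: [0.1, 0.9, 0.0] },
  { text: "Support hours are 9am to 6pm.", vector: [0.0, 0.1, 0.9] },
];

// Pretend this vector is the embedding of "Can I send an item back?"
const best = topK([0.8, 0.2, 0.1], chunks, 1);
console.log(best[0].text); // Returns are accepted within 30 days.
```

The chunk whose vector points in nearly the same direction as the query wins, which is the intuition behind "most relevant" in a vector search.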

For a deeper conceptual explanation, read RAG Explained for Developers.

What You'll Build

By the end of this guide you'll have:

A Next.js app with a chat UI
An API route that performs vector search
Document ingestion that creates embeddings
Streaming AI responses powered by OpenAI

Prerequisites

Node.js 18+ installed
Basic React and Next.js knowledge
An OpenAI API key
Familiarity with REST APIs

Tech Stack

Layer	Tool
Framework	Next.js 14+ (App Router)
AI Model	OpenAI GPT-4o
Embeddings	OpenAI text-embedding-3-small
Vector Store	Chroma (local)
Styling	Tailwind CSS

Step 1: Create the Next.js Project

bash

npx create-next-app@latest rag-chatbot
cd rag-chatbot

When prompted, select:

TypeScript: No (this guide uses .js and .jsx files)
Tailwind CSS: Yes
src/ directory: Yes (the file paths in this guide start with src/)
App Router: Yes

Step 2: Install Dependencies

bash

npm install openai chromadb @xenova/transformers

openai — OpenAI SDK for chat completions and embeddings
chromadb — local vector database client
@xenova/transformers — optional local embedding fallback

Step 3: Set Up Environment Variables

Create a .env.local file at the root of your project:

bash

OPENAI_API_KEY=your_openai_api_key_here

Never commit this file to git.
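A missing key usually shows up later as a confusing authentication error, so it can help to check for it up front. The helper below is a minimal sketch; `requireEnv` is a hypothetical name, not part of Next.js or the OpenAI SDK.

```javascript
// Hypothetical helper: fail fast with a clear message when a variable is missing.
// The second parameter exists only so the function can be tested without touching process.env.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || !value.trim()) {
    throw new Error(`Missing environment variable: ${name}. Add it to .env.local.`);
  }
  return value;
}

// Usage in a script or route (commented out so this snippet runs without a key):
// const apiKey = requireEnv("OPENAI_API_KEY");
```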

Step 4: Create the Document Ingestion Script

This script reads your documents, generates embeddings, and stores them in Chroma.

Create scripts/ingest.js:

javascript

const { ChromaClient } = require("chromadb");
const OpenAI = require("openai");
const fs = require("fs");

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const chroma = new ChromaClient();

const documents = [
  {
    id: "doc1",
    text: "Our return policy allows returns within 30 days of purchase.",
  },
  {
    id: "doc2",
    text: "We offer free shipping on orders over $50 to the continental US.",
  },
  {
    id: "doc3",
    text: "Customer support is available Monday to Friday, 9am to 6pm EST.",
  },
];

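async function ingest() {
  // Reuse the collection if it already exists so the script can be re-run
  const collection = await chroma.getOrCreateCollection({
    name: "knowledge_base",
  });

  for (const doc of documents) {
    // Embed with the same model the chat route uses for queries
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: doc.text,
    });

    // Store the vector with the original text so it can be returned at query time
    await collection.add({
      ids: [doc.id],
      embeddings: [response.data[0].embedding],
      documents: [doc.text],
    });

    console.log(`Ingested: ${doc.id}`);
  }

  console.log("Ingestion complete.");
}

// Exit with a non-zero code if Chroma or the OpenAI API is unreachable
ingest().catch((err) => {
  console.error("Ingestion failed:", err);
  process.exit(1);
});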

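A plain Node script does not read .env.local automatically (only Next.js does), so pass the file explicitly. The Chroma server must also be running before you ingest (see Step 7). Run the script once to populate your vector store:

bash

node --env-file=.env.local scripts/ingest.js

The --env-file flag requires Node.js 20.6 or later. On older versions, export OPENAI_API_KEY in your shell before running node scripts/ingest.js.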

Replace the documents array with your own content — PDFs, markdown files, database records, or any text you want the chatbot to know about.
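With more than a handful of documents, embedding them one request at a time gets slow. The OpenAI embeddings endpoint accepts an array of inputs, so one option is to embed in batches. The outline below is a sketch under assumptions: `toBatches` and `embedAll` are hypothetical helpers, the batch size of 100 is arbitrary, and `embedBatch` stands in for whatever function actually calls the embeddings API.

```javascript
// Split an array into consecutive groups of at most `size` items.
function toBatches(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed every text, one batch per request, preserving input order.
// `embedBatch` is injected: it takes an array of strings and resolves to an array of vectors.
async function embedAll(texts, embedBatch, batchSize = 100) {
  const vectors = [];
  for (const batch of toBatches(texts, batchSize)) {
    const batchVectors = await embedBatch(batch);
    if (batchVectors.length !== batch.length) {
      throw new Error("Embedding count does not match input count");
    }
    vectors.push(...batchVectors);
  }
  return vectors;
}

// One possible embedBatch using the OpenAI client from the ingestion script (not run here):
// const embedBatch = async (batch) => {
//   const res = await openai.embeddings.create({ model: "text-embedding-3-small", input: batch });
//   return res.data.map((item) => item.embedding);
// };
```

Passing the embedding call in as a parameter keeps the batching logic testable without an API key.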

Step 5: Create the Chat API Route

Create src/app/api/chat/route.js:

javascript

import { NextResponse } from "next/server";
import OpenAI from "openai";
import { ChromaClient } from "chromadb";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const chroma = new ChromaClient();

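export async function POST(req) {
  const { message } = await req.json();

  // Reject empty or non-string input before calling any paid API
  if (typeof message !== "string" || !message.trim()) {
    return NextResponse.json({ error: "message is required" }, { status: 400 });
  }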

  // 1. Embed the user's question
  const embeddingResponse = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: message,
  });
  const queryEmbedding = embeddingResponse.data[0].embedding;

  // 2. Retrieve the most relevant documents
  const collection = await chroma.getCollection({ name: "knowledge_base" });
  const results = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 3,
  });

  const context = results.documents[0].join("\n\n");

  // 3. Build the prompt with retrieved context
  const systemPrompt = `You are a helpful assistant. Answer questions using only the context provided below. If the answer is not in the context, say you don't know.

Context:
${context}`;

  // 4. Stream the response
  const stream = await openai.chat.completions.create({
    model: "gpt-4o",
    stream: true,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: message },
    ],
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
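    async start(controller) {
      try {
        for await (const chunk of stream) {
          const text = chunk.choices[0]?.delta?.content || "";
          // Skip chunks that carry no text
          if (text) controller.enqueue(encoder.encode(text));
        }
        controller.close();
      } catch (err) {
        // Surface upstream failures to the client instead of leaving the stream open
        controller.error(err);
      }
    },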
  });

  return new Response(readable, {
    headers: { "Content-Type": "text/plain; charset=utf-8" },
  });
}
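The route passes whatever the vector search returns straight to the model, even when nothing is a close match. One possible refinement is to filter by distance first. This sketch assumes the query result includes a `distances` array parallel to `documents`, where lower means closer. The `buildContext` helper and the `maxDistance` threshold are hypothetical, and a sensible threshold depends on your embedding model and distance metric.

```javascript
// Hypothetical helper: turn a vector-search result into a context string,
// dropping empty documents and matches that are farther than maxDistance.
function buildContext(results, maxDistance = 1.0) {
  const documents = results.documents?.[0] ?? [];
  const distances = results.distances?.[0] ?? [];

  const relevant = documents.filter((doc, i) => {
    if (typeof doc !== "string" || !doc.trim()) return false;
    const distance = distances[i];
    // Keep the document when no distance is reported for it
    return typeof distance !== "number" || distance <= maxDistance;
  });

  return relevant.join("\n\n");
}

// Sample result shaped like the query response used in the route (values invented)
const sample = {
  documents: [["Returns within 30 days.", "Free shipping over $50.", "Support hours 9am to 6pm."]],
  distances: [[0.42, 0.95, 1.6]],
};

console.log(buildContext(sample, 1.0));
```

When the returned string is empty, the route could skip the model call and reply that nothing relevant was found.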

Step 6: Build the Chat UI

Replace the contents of src/app/page.js (create-next-app generates page.js for JavaScript projects, or page.tsx if you chose TypeScript):

jsx

"use client";

import { useState, useRef, useEffect } from "react";

export default function ChatPage() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState("");
  const [loading, setLoading] = useState(false);
  const bottomRef = useRef(null);

  useEffect(() => {
    bottomRef.current?.scrollIntoView({ behavior: "smooth" });
  }, [messages]);

  async function sendMessage(e) {
    e.preventDefault();
    if (!input.trim()) return;

    const userMessage = { role: "user", content: input };
    setMessages((prev) => [...prev, userMessage]);
    setInput("");
    setLoading(true);

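    try {
      const res = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ message: input }),
      });
      if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let assistantText = "";

      setMessages((prev) => [...prev, { role: "assistant", content: "" }]);

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // stream: true keeps multi-byte characters intact across chunk boundaries
        assistantText += decoder.decode(value, { stream: true });
        setMessages((prev) => {
          const updated = [...prev];
          updated[updated.length - 1] = {
            role: "assistant",
            content: assistantText,
          };
          return updated;
        });
      }
    } catch (err) {
      console.error(err);
      setMessages((prev) => [
        ...prev,
        { role: "assistant", content: "Something went wrong. Please try again." },
      ]);
    } finally {
      // Always re-enable the form, even if the request fails
      setLoading(false);
    }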
  }

  return (
    <main className="max-w-2xl mx-auto p-6 flex flex-col h-screen">
      <h1 className="text-2xl font-bold mb-4">RAG Chatbot</h1>

      <div className="flex-1 overflow-y-auto space-y-4 mb-4">
        {messages.map((msg, i) => (
          <div
            key={i}
            className={`p-3 rounded-lg ${
              msg.role === "user"
                ? "bg-blue-100 ml-8"
                : "bg-gray-100 mr-8"
            }`}
          >
            <p className="text-sm font-semibold mb-1 capitalize">{msg.role}</p>
            <p>{msg.content}</p>
          </div>
        ))}
        <div ref={bottomRef} />
      </div>

      <form onSubmit={sendMessage} className="flex gap-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Ask a question about your data..."
          className="flex-1 border rounded-lg px-4 py-2 focus:outline-none"
          disabled={loading}
        />
        <button
          type="submit"
          disabled={loading}
          className="bg-blue-600 text-white px-4 py-2 rounded-lg hover:bg-blue-700 disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </main>
  );
}
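One detail worth knowing when reading a streamed response: network chunks can split a multi-byte UTF-8 character in half, so it is safer to pass `{ stream: true }` to `TextDecoder.decode` while chunks are still arriving. The sketch below shows the effect. `decodeChunks` is a hypothetical helper, not part of the page component.

```javascript
// Decode a sequence of byte chunks into one string without corrupting
// characters that straddle a chunk boundary.
function decodeChunks(chunks) {
  const decoder = new TextDecoder();
  let text = "";
  for (const chunk of chunks) {
    // stream: true tells the decoder to hold back incomplete byte sequences
    text += decoder.decode(chunk, { stream: true });
  }
  // A final call with no arguments flushes anything still buffered
  text += decoder.decode();
  return text;
}

// "café" is 5 bytes in UTF-8 because "é" takes two; split the bytes between them
const bytes = new TextEncoder().encode("café");
const chunks = [bytes.slice(0, 4), bytes.slice(4)];

console.log(decodeChunks(chunks)); // café

// Decoding each chunk independently yields replacement characters instead of "é"
const naive = chunks.map((chunk) => new TextDecoder().decode(chunk)).join("");
console.log(naive === "café"); // false
```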

Step 7: Run the App

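Start the Chroma server in a separate terminal. The chroma CLI is installed with the Python package (pip install chromadb). The server must be running before you run the ingestion script from Step 4 as well as before you start Next.js: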

bash

chroma run --path ./chroma_db

Then start Next.js:

bash

npm run dev

Open http://localhost:3000 and ask your chatbot a question about the documents you ingested.

How the RAG Pipeline Works

text

User Question
     ↓
Embed the question (OpenAI)
     ↓
Search vector database (Chroma)
     ↓
Retrieve top 3 relevant chunks
     ↓
Build prompt with context
     ↓
Stream GPT-4o response
     ↓
Display to user
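The same flow can be written as one function with each stage passed in, which makes the pipeline easy to exercise without network calls. This is a sketch, not the route's actual code: `answerQuestion` and its `embed`, `search`, and `generate` parameters are hypothetical names, and the stubs below return canned values.

```javascript
// Hypothetical orchestration of the pipeline: embed, search, build prompt, generate.
async function answerQuestion(question, { embed, search, generate }, topK = 3) {
  const queryEmbedding = await embed(question);
  const chunks = await search(queryEmbedding, topK);
  const context = chunks.join("\n\n");
  const prompt = `Answer using only the context below.\n\nContext:\n${context}`;
  return generate(prompt, question);
}

// Stub stages standing in for OpenAI and Chroma
const stubs = {
  embed: async (text) => [text.length],
  search: async (_vector, k) =>
    ["Returns within 30 days.", "Free shipping over $50."].slice(0, k),
  generate: async (prompt, question) =>
    `Question: ${question} (prompt length: ${prompt.length})`,
};

answerQuestion("What is the return policy?", stubs).then((answer) => {
  console.log(answer);
});
```

In the real route, `embed` would call the embeddings API, `search` would query the collection, and `generate` would call the chat completions API.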

Ingesting Real Documents

To ingest PDF or markdown files instead of hardcoded text, update your ingestion script to use a document loader.

For PDFs:

bash

npm install pdf-parse

javascript

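const pdfParse = require("pdf-parse");
const fs = require("fs");

// Top-level await is not available in CommonJS, so wrap the call in an async function.
// Assumes the function-style API of pdf-parse 1.x.
async function loadPdfText(path) {
  const buffer = fs.readFileSync(path);
  const data = await pdfParse(buffer);
  return data.text;
}

loadPdfText("./docs/handbook.pdf").then((text) => console.log(text.length));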

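Then split the text into chunks (for example, about 500 characters each, with a small overlap so sentences are not cut off between chunks) and embed each chunk separately. Smaller chunks make retrieval more precise, but chunks that are too small lose the surrounding context the model needs to answer.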
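A character-based splitter is the simplest way to start. The sketch below is one possible implementation: `chunkText` is a hypothetical helper, and the default size of 500 and overlap of 50 are illustrative values rather than recommendations.

```javascript
// Split text into fixed-size character windows that overlap by `overlap` characters.
function chunkText(text, size = 500, overlap = 50) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= size) {
    throw new RangeError("overlap must be between 0 and size - 1");
  }

  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window has reached the end of the text
    if (start + size >= text.length) break;
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]

// Each chunk can then become one entry in the documents array, for example:
// const documents = chunkText(text).map((chunk, i) => ({ id: `handbook-${i}`, text: chunk }));
```

Splitting on paragraph or sentence boundaries usually gives cleaner chunks than fixed windows, at the cost of a little more code.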

RAG vs Fine-Tuning

Approach	Best For	Cost	Updates
RAG	Frequently changing data	Low	Instant
Fine-tuning	Style and behavior	High	Requires retraining

For most real-world applications, RAG is the right choice because your data changes often and fine-tuning doesn't handle retrieval at all.

Common Issues

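Chroma not connecting: make sure the Chroma server is running before you run the ingestion script or start Next.js.

Empty context returned: run your ingestion script first. If the context is still empty, check that the collection name in the API route exactly matches the one used by the ingestion script.

Slow responses: most of the latency comes from generating the answer, so reduce nResults to shorten the prompt, or switch to a faster chat model such as gpt-4o-mini.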

Frequently Asked Questions

Can I use a different vector database?

Yes. Pinecone, Weaviate, and Qdrant are popular hosted alternatives. The ingestion and query logic is similar — only the client library changes.

Can I use a different LLM?

Yes. Swap gpt-4o for any model served through an OpenAI-compatible chat completions API, including open-source models via Ollama. For a non-OpenAI server, also set the SDK's baseURL option to that server's endpoint.
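To show what "OpenAI-compatible" means in practice, the sketch below builds the same chat completions request for two different servers. `buildChatRequest` is a hypothetical helper. The Ollama address assumes a default local install serving its OpenAI-compatible API under /v1 on port 11434, and "llama3.1" is only an example model name.

```javascript
// Hypothetical helper: build a fetch() request for any OpenAI-compatible server.
function buildChatRequest({ baseURL, apiKey, model, messages }) {
  // Tolerate a trailing slash on the base URL
  const url = `${baseURL.replace(/\/+$/, "")}/chat/completions`;
  return {
    url,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

const messages = [{ role: "user", content: "What is your return policy?" }];

// OpenAI's hosted API
const openaiRequest = buildChatRequest({
  baseURL: "https://api.openai.com/v1",
  apiKey: "YOUR_OPENAI_KEY",
  model: "gpt-4o",
  messages,
});

// A local Ollama server (assumed defaults); the key is a placeholder
const ollamaRequest = buildChatRequest({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama",
  model: "llama3.1",
  messages,
});

console.log(openaiRequest.url);
console.log(ollamaRequest.url);
// To send one: const res = await fetch(ollamaRequest.url, ollamaRequest.init);
```

Only the base URL, key, and model name differ. With the OpenAI SDK, the equivalent change is passing `baseURL` when constructing the client. Note that embeddings from a different provider are not interchangeable with the ones already stored, so switching embedding models means re-running ingestion.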

How do I handle large documents?

Split them into chunks of 300–500 tokens before embedding. Each chunk becomes a separate document in the vector store.

Is this production ready?

The pattern is production ready, but the code in this guide is a starting point. Before deploying, replace local Chroma with a hosted vector database, and add authentication, rate limiting, and error handling to your API route.
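As a rough illustration of protecting the route, the sketch below checks a shared secret sent as a bearer token. It is deliberately minimal: `isAuthorized` and the `CHAT_API_TOKEN` variable name are hypothetical, and a real application would more likely rely on its existing session or auth library.

```javascript
// Hypothetical check: accept the request only if it carries the expected bearer token.
function isAuthorized(request, expectedToken) {
  // Refuse everything when no token is configured on the server
  if (!expectedToken) return false;
  const header = request.headers.get("authorization") || "";
  const [scheme, token] = header.split(" ");
  return scheme === "Bearer" && token === expectedToken;
}

// Request is a global in Node.js 18+ and in Next.js route handlers
const allowed = new Request("http://localhost:3000/api/chat", {
  method: "POST",
  headers: { Authorization: "Bearer s3cret" },
});
const anonymous = new Request("http://localhost:3000/api/chat", { method: "POST" });

console.log(isAuthorized(allowed, "s3cret")); // true
console.log(isAuthorized(anonymous, "s3cret")); // false

// Inside the route handler, this could run before any OpenAI call:
// if (!isAuthorized(req, process.env.CHAT_API_TOKEN)) {
//   return new Response("Unauthorized", { status: 401 });
// }
```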

Final Thoughts

RAG is one of the most practical AI patterns for real-world applications.

By combining a vector database with an LLM, you get a chatbot that answers from your own data without retraining, with fewer hallucinations, and without stuffing your entire knowledge base into every prompt.

Once you understand the pipeline, you can extend it with conversation history, multi-document sources, re-ranking, and more.
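As a starting point for conversation history, the sketch below assembles the messages array from earlier turns plus the new question. `buildMessages` and the `maxTurns` limit of 6 are hypothetical. Capping the history is one simple way to keep the prompt from growing without bound.

```javascript
// Hypothetical helper: system prompt, then the most recent turns, then the new question.
function buildMessages(systemPrompt, history, question, maxTurns = 6) {
  // Keep only user and assistant turns, and only the last `maxTurns` of them
  const turns = history.filter((m) => m.role === "user" || m.role === "assistant");
  const recent = maxTurns > 0 ? turns.slice(-maxTurns) : [];

  return [
    { role: "system", content: systemPrompt },
    // Copy only the fields the chat API expects
    ...recent.map(({ role, content }) => ({ role, content })),
    { role: "user", content: question },
  ];
}

const history = [
  { role: "user", content: "Do you ship to Canada?" },
  { role: "assistant", content: "I don't know based on the provided context." },
];

const messages = buildMessages(
  "Answer using only the context provided.",
  history,
  "What about free shipping in the US?"
);
console.log(messages.map((m) => m.role).join(", ")); // system, user, assistant, user
```

To use this, the client would send its messages state along with the new question, and the route would pass the result as `messages` in the chat completions call.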

Start with the basic setup in this guide, then build from there.
