How to Build Customer Support AI Agent In NodeJS with Memory

Your AI Agent Has Amnesia. ByteRover Fixes It — Without Vectors

Let's see how to built a customer support agent that remembers everything — using plain markdown files, BM25 search, zero embeddings, and pure vectorless approach

01 — The Problem

RAG promised AI memory. It lied.

We've seen building RAG systems the same way for years. Chunk your docs. Embed them into vectors. Store in a vector database. At query time, compute similarity and retrieve the top-k chunks. Feed to LLM. Hope for the best.

It works. Until it doesn't. And when it stops working, you have no idea why.

I've built RAG pipelines with ChromaDB, Pinecone, and Weaviate. I've implemented HyDE, reranking, query rewriting, MMR, and hybrid search. Every optimisation added complexity and every deployment added new failure modes I couldn't debug. The core problem was never the retrieval algorithm — it was that the knowledge store was a black box.

You cannot cat an embedding vector and understand what your agent knows. You cannot open Pinecone in VS Code and read what it stored. When retrieval fails silently, you are left guessing.

Three real pain points that drove me to look for something different:

① Similarity Search Degrades Silently As your knowledge base grows, cosine similarity loses precision. A query about "refund policy" starts returning chunks about "return window" and "shipping" because everything is semantically close. You don't get an error — you get subtly wrong answers.

② Your Agent Forgets Between Sessions Traditional RAG has no persistent agent memory. Every session starts cold. The agent doesn't remember what it learned last time, what conventions it figured out, or what the customer told it yesterday.

③ The Knowledge Store is a Black Box Vector databases store opaque numerical representations. When retrieval fails you cannot inspect what was stored, how it was chunked, or why a query matched the wrong document. Debugging means adding more logs, not reading the data.

Then I came across the PageIndex and ByteRover approach — a thin idea that turned my thinking upside down. Instead of embedding documents, what if an LLM simply read them, reasoned about them, and stored the knowledge in a human-readable hierarchy? No similarity search. No vectors. Just files.

ByteRover and other tools try to productionises exactly this idea. And after experimenting it for a months, I'm not going back.

02 — The Idea

No embeddings. No similarity search. Just files.

The insight is embarrassingly simple. When you want to find information in a technical manual, you don't compute cosine similarity between your query and every page. You open the table of contents, navigate to the right chapter, then the right section.

ByteRover makes your AI agent do the same thing — but for any knowledge base you give it.

Traditional RAG vs ByteRover — Side by Side

Step	Traditional RAG	ByteRover
Ingest	Chunk into 512-token fragments	LLM reads and reasons about full doc
Store	Compute embeddings → Vector DB	Organise into markdown hierarchy
Retrieve	Cosine similarity (black box)	BM25 + Cache + LLM reasoning
Debug	Impossible — opaque vectors	Open the `.md` file and read it
Version control	Not possible	`git diff` your knowledge
Infrastructure	Vector DB + embedding service	Zero — just local files

The critical difference: everything ByteRover stores is human-readable. You can open .brv/context-tree/ in your file explorer and read exactly what your agent knows. You can git diff what changed after a curation. You can delete a file if the knowledge is wrong. It's just files.

Key Insight: ByteRover is not a retrieval optimisation on top of vector search. It is a fundamentally different approach: LLM-curated, file-based knowledge that you can read, edit, version-control, and debug without any special tooling.

03 — How It Works

The architecture, explained simply

ByteRover has two main operations: Curation (writing to memory) and Query (reading from memory). Both run entirely on your local machine. No cloud required.

The Context Tree — Local File Structure

All knowledge lives in .brv/context-tree/ as a three-level hierarchy. Every node is a folder with a context.md describing its purpose. Every piece of knowledge is a plain markdown file.

.brv/context-tree/
├── orders/                              # Domain
│   ├── context.md                       # auto-generated overview
│   ├── _index.md                        # auto-generated summary
│   ├── cancellation/                    # Topic
│   │   ├── context.md
│   │   └── order_cancellation_policy.md
│   └── wrong-items/
│       └── wrong_item_resolution.md
├── payments/
│   ├── context.md
│   ├── payment-methods/
│   │   └── supported_payment_methods.md
│   └── refunds/
│       ├── refund_timelines.md
│       └── cod_refund_policy.md
├── shipping/
│   ├── context.md
│   └── delivery-timelines/
│       └── city_delivery_windows.md
└── returns/
    ├── context.md
    └── return-window/
        └── return_eligibility.md

This is the entire knowledge store. No database. No server. A folder you can open in Finder or Explorer.

Curation — Writing to Memory

When you call brv curate -d docs/, ByteRover's LLM runs in a sandbox with a ToolsSDK. It reads your documents, reasons about the content, and writes structured knowledge using five atomic operations:

Operation	What It Does
`ADD`	Creates a new knowledge file. Auto-generates `context.md` at each hierarchy level
`UPDATE`	Modifies an existing entry. Resets recency, boosts importance score
`UPSERT`	Creates if new, updates if exists — no pre-check needed
`MERGE`	Combines two entries intelligently, deletes source
`DELETE`	Removes a file or entire subtree

The LLM gets feedback on every operation. If a MERGE fails, it sees the error and retries with a corrected approach. This stateful feedback loop means the system doesn't silently drop errors — it surfaces them as actionable information back to the agent.

After curation, ByteRover also extracts structured Facts from your content — project conventions, architecture decisions, rules — and appends them directly to each markdown file:

## Facts
- **refund_upi**: UPI refunds processed within 2-3 business days [convention]
- **refund_card**: Credit/Debit card refunds take 5-7 business days [convention]
- **cod_refund**: COD refunds issued as ShopEase Wallet credits only [project]

Query — The 5-Tier Retrieval Pipeline

This is where ByteRover really shines. Every query goes through a tiered strategy that starts with the cheapest possible path and escalates only when needed. Most queries resolve in under 200ms — without touching an LLM at all.

Tier	Name	Speed	How It Works
0	Exact Cache	~0ms	MD5 fingerprint match — same query, cached result (60s TTL)
1	Fuzzy Cache	~50ms	≥60% token similarity to a recent query
2	BM25 Direct	~200ms	Full-text search — if top result scores high, return immediately. No LLM.
3	LLM Pre-fetch	<5s	BM25 results injected as context for a single LLM synthesis call
4	Agentic Loop	8–15s	Full multi-step reasoning: reads files, follows relations, iterates

Tiers 0–2 bypass the LLM entirely. The compound scoring formula that ranks results:

score = (0.6 × BM25) + (0.25 × importance) + (0.15 × recency) × tierBoost

Where importance accumulates every time a file is queried (+3) or updated (+5), and recency decays exponentially over time. Files that are frequently accessed naturally rank higher — the knowledge base gets smarter the more you use it.

Out-of-Domain Detection: When a query has no good match in the knowledge tree, ByteRover tells you explicitly: "this topic isn't covered — curate relevant knowledge first." No hallucinated answers. No confident wrong responses.

Session Learning — Memory That Grows

After every session, ByteRover automatically extracts durable knowledge from the conversation — patterns you used, decisions you made, preferences you expressed — and persists them as agent memories across five categories:

Category	What It Captures
Patterns	Reusable code or workflow patterns
Preferences	User style, naming, structure decisions
Entities	Key files, modules, APIs, dependencies
Decisions	Architectural choices (immutable log)
Skills	Tool invocation recipes that worked

Your agent gets smarter over time without any extra work.

Two layers of memory working together:

Layer	Tool	What it remembers
Knowledge memory	ByteRover	ShopEase policies, timelines, rules
Session memory	`conversationHistory` array	What this customer said today

04 — Building the Agent

A customer support agent with ByteRover memory

Let's build a real thing. A customer support agent for ShopEase — a fictional Indian e-commerce platform. The agent will answer questions about orders, payments, shipping, and refunds using ByteRover as its memory layer and Groq (Llama 3.3 70B) for responses.

No vector database. No embeddings. The entire retrieval happens through brv query — a subprocess call that returns the right markdown context in under a second.

Agent Flow:

Customer question
       │
       ▼
 brv query "..."          ← subprocess to ByteRover daemon
       │
       ▼
 ByteRover 5-tier retrieval
   Tier 0/1 → cache hit   ~0ms
   Tier 2   → BM25 match  ~200ms  (no LLM)
   Tier 3/4 → LLM reason  <15s
       │
       ▼
 Retrieved context (markdown)
       │
       ▼
 Groq / Llama 3.3 70B
   system prompt + context + conversation history
       │
       ▼
 Support response → Customer

Project Structure

support-agent/
├── docs/
│   ├── faq.md                # ShopEase FAQ
│   ├── shipping-policy.md    # Shipping rules
│   └── refund-policy.md      # Refund rules
├── src/
│   ├── curate.js             # Feeds docs → ByteRover memory
│   ├── query.js              # Queries ByteRover memory
│   └── agent.js              # Main chat loop
├── .env.example
└── package.json

Step 1 — Install ByteRover

# Install ByteRover CLI (starts a local daemon)
curl -fsSL https://www.byterover.dev/install.sh | sh

# Verify it's running
brv status

# Install project dependencies
npm install

Step 2 — The Sample Docs

Create these three files in docs/. ByteRover will reason about them during curation and organise them into the context tree.

docs/faq.md — covers orders, payments, shipping, returns, account management.

docs/shipping-policy.md — dispatch timelines, shipping partners, failed deliveries, COD rules.

docs/refund-policy.md — refund eligibility, timelines by payment method, COD refunds, escalations.

You can find the full code implementation and content of all three docs in the GitHub repo.

Step 3 — The Query Helper (`src/query.js`)

This is the core of the integration. brv query runs as a subprocess against the local ByteRover daemon. It returns plain text — the retrieved context from your markdown knowledge tree. Twenty lines. No SDK. No API key for ByteRover.

import { exec } from "child_process";
import { promisify } from "util";

const execAsync = promisify(exec);

/**
 * Queries ByteRover's local context tree.
 *
 * ByteRover's 5-tier strategy handles routing automatically:
 *   Tier 0/1 → cache hit     ~0ms
 *   Tier 2   → BM25 direct   ~200ms  (no LLM needed)
 *   Tier 3/4 → LLM reasoning <15s
 */
export async function queryMemory(question) {
  try {
    const sanitized = question.replace(/"/g, '\\"');
    const { stdout, stderr } = await execAsync(
      `brv query "${sanitized}"`,
      { timeout: 30000 } // 30s covers Tier 4 agentic loop
    );

    if (stderr && !stdout) {
      console.error("[ByteRover] Error:", stderr);
      return null;
    }

    return stdout.trim();
  } catch (err) {
    console.error("[ByteRover] Query failed:", err.message);
    return null;
  }
}

Step 4 — Curate Your Docs (`src/curate.js`)

Run this once before starting the agent. ByteRover reads all files in docs/, reasons about the content, and organises everything into the context tree. After this runs, you can literally cat .brv/context-tree/payments/refunds/refund_timelines.md and read what your agent knows.

import { exec } from "child_process";
import { promisify } from "util";
import path from "path";
import { fileURLToPath } from "url";

const execAsync = promisify(exec);
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const DOCS_DIR = path.resolve(__dirname, "../docs");

/**
 * Curates all support docs into ByteRover's local context tree.
 *
 * ByteRover's curation engine:
 *   1. Reads and reasons about each document
 *   2. Organises into Domain → Topic → Subtopic hierarchy
 *   3. Stores as plain markdown in .brv/context-tree/
 *   4. Generates _index.md summaries at every level
 *   5. Extracts structured facts (policies, timelines, rules)
 *
 * Run once. Re-run whenever docs are updated.
 */
async function curateDocs() {
  console.log("🔄 Curating support docs into ByteRover memory...");
  console.log(`   Source: ${DOCS_DIR}\n`);

  const { stdout } = await execAsync(
    // -d flag: curate entire directory at once
    `brv curate "ShopEase support docs: FAQs, shipping, refunds" -d ${DOCS_DIR}`,
    { timeout: 120000 } // 2 min — multiple LLM calls during curation
  );

  console.log(stdout);
  console.log("✅ Done. Run `npm run chat` to start the agent.");
  console.log("   Tip: run `brv vc status` to see what was stored.\n");
}

curateDocs().catch(console.error);

Step 5 — The Main Agent (`src/agent.js`)

import Groq from "groq-sdk";
import readline from "readline";
import { queryMemory } from "./query.js";
import "dotenv/config";

const groq = new Groq({ apiKey: process.env.GROQ_API_KEY });
const MODEL = "llama-3.3-70b-versatile"; // Open-source via Groq

// Session memory — what this customer said today
const conversationHistory = [];

const SYSTEM_PROMPT = `You are a helpful customer support agent for ShopEase — an Indian e-commerce platform.
Use ONLY the retrieved context to answer. Never invent policies or timelines.
If the context doesn't cover the question, say: "Please contact support at 1800-123-4567."
Keep responses under 150 words.`;

/**
 * For each user message:
 *   1. Query ByteRover for relevant context (no similarity search)
 *   2. Inject context into the message
 *   3. Call Groq with full conversation history
 *   4. Return and store the response
 */
async function chat(userMessage) {
  // Step 1 — Retrieve relevant context from ByteRover
  process.stdout.write("\n🔍 Querying ByteRover memory...");
  const context = await queryMemory(userMessage);
  process.stdout.write(" done\n\n");

  if (!context) {
    return "Knowledge base unavailable. Please call 1800-123-4567.";
  }

  // Step 2 — Inject context into the message
  const enrichedMessage = `RETRIEVED CONTEXT FROM BYTEEROVER:
---
${context}
---
CUSTOMER QUESTION: ${userMessage}`;

  conversationHistory.push({ role: "user", content: enrichedMessage });

  // Step 3 — Call Groq with system prompt + full history
  const response = await groq.chat.completions.create({
    model: MODEL,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      ...conversationHistory,
    ],
    temperature: 0.3, // Low — factual support, not creative
    max_tokens: 512,
  });

  const reply = response.choices[0].message.content;

  // Store in history without injected context (saves tokens on next turn)
  conversationHistory.push({ role: "assistant", content: reply });
  return reply;
}

async function main() {
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout,
  });

  console.log("ShopEase Support Agent — ByteRover + Groq Llama 3.3 70B");
  console.log('Type your question. Type "exit" to quit.\n');

  const ask = () => {
    rl.question("You: ", async (input) => {
      const msg = input.trim();
      if (!msg) return ask();
      if (msg === "exit") { rl.close(); return; }

      const reply = await chat(msg);
      console.log(`Agent: ${reply}\n`);
      ask();
    });
  };
  ask();
}

main();

Running It

# 1. Add your Groq API key
cp .env.example .env
# Edit .env → GROQ_API_KEY=your_key_here

# 2. Curate docs into ByteRover memory (run once)
npm run curate

# 3. Inspect what ByteRover stored
brv vc status
brv query "refund policy UPI"

# 4. Start the agent
npm run chat

The wow moment: After running npm run curate, open .brv/context-tree/ in your file explorer. You'll see your support docs organised into readable markdown files by domain and topic — exactly what your agent knows, readable by a human, editable with any text editor. No black box. No guessing.

05 — Honest Tradeoffs

When to use it. When not to.

ByteRover is not a drop-in replacement for all RAG use cases.

✅ Use ByteRover when:

Your agent needs persistent memory across sessions
You want to onboard a codebase your AI coding agent should understand
You're working with private or regulated data — local-first, no cloud dependency required
You want to debug why retrieval failed — just open the markdown file
Your knowledge base has clear structure (policies, docs, conventions, code patterns)

❌ Skip it when:

You need pure semantic similarity search over millions of unstructured documents
One-shot queries with no repeated patterns — curation overhead not worth it
Real-time data that changes faster than you can re-curate
Multi-session synthesis is your primary use case — ByteRover is still improving here

07 — Wrapping Up

The model is interchangeable. The memory architecture isn't.

I've been building RAG systems for two years. The assumption I carried throughout — that embeddings and vector similarity search are the only serious approach — turned out to be wrong.

ByteRover showed me a different model: curate knowledge once, query it intelligently forever. Files you can read. Structure you can understand. Memory that persists across sessions. A retrieval system that escalates through tiers rather than blindly computing similarity over everything.

The support agent we built has three files of actual code. The memory system is handled by ByteRover. The response generation is handled by Groq. The only thing you write is the glue — and the glue is twenty lines.

The bottom line: When your agent makes a mistake, you shouldn't have to guess why. You should be able to open a markdown file and read what it knew. An Agentic memory tool like Byterover makes that possible.

Prompt: Reconstruct customer-support-agent-101
Build a Node.js (ES modules) project named support-agent (folder name can be customer-support-agent-101) that implements an interactive CLI customer-support agent for a fictional Indian e-commerce brand ShopEase.

Stack
Runtime: Node, "type": "module" in package.json
Dependencies: groq-sdk (^0.9.x), dotenv (^16.x)
Scripts:
"curate": "node src/curate.js"
"chat": "node src/agent.js"
External tool (documented in README comments / console messages): ByteRover CLI (brv). User installs via ByteRover’s install script; agent assumes brv is on PATH and a local daemon/context tree works.
Environment
Add .env.example with: GROQ_API_KEY=your_groq_api_key_here
Load env with import "dotenv/config" where needed
.gitignore: node_modules/, .env, .DS_Store, .byterover-*.xml, .brv/_queue_status.json, .brv/config.json, .claude/settings.local.json
Knowledge base (docs/)
Create three markdown files with realistic ShopEase policy and FAQ content (Indian context: ₹, pincode, partners like BlueDart/Delhivery, etc.):

docs/shipping-policy.md — dispatch timelines, shipping partners, failed delivery attempts (3 attempts → return → refund window), damaged-in-transit procedure, India-only shipping, no PO boxes, heavy items (20kg+) timelines, COD limits (e.g. up to ₹10,000), COD + EMI note
docs/refund-policy.md — eligibility, refund timelines table (UPI, net banking, cards, wallet, EMI, COD→wallet only), partial refunds, tracking, disputes (refunds@shopease.in, 1800-123-4567), non-refundable categories
docs/faq.md — sections: Orders, Payments, Shipping & Delivery, Returns & Refunds, Account & Security (place/cancel orders, wrong item, payment failure, coupons, delivery timelines, free shipping threshold, tracking, return window, refund times, password reset, addresses, account deletion)
src/query.js
Export queryMemory(question) that runs brv query "<sanitized>" via child_process.exec + promisify, timeout 30s
Escape double quotes in the user question for the shell
On failure or empty useful stdout, return null; log stderr/helpful hints
If stdout indicates ByteRover cannot connect to API / LLM, hint running npm run curate first
src/curate.js
Resolve project root from import.meta.url; docs/ path = path.join(PROJECT_ROOT, "docs")
Poll .brv/_queue_status.json every ~3s (max ~180s): treat success as !processing && processed > 0; failure if failed > 0
Flow:
brv vc init (cwd = project root, timeout ~60s)
Require GROQ_API_KEY; else exit with message to copy .env.example → .env
brv providers connect groq -k <key> -m llama-3.1-8b-instant (timeout ~30s)
brv curate "<description>" -d <DOCS_DIR> --detach where description is like: ShopEase customer support knowledge base. FAQs, shipping policy, and refund policy. Organize by topic: orders, payments, shipping, returns, account.
Wait for curation via queue status file; print success/failure guidance (brv vc status, install URL)
Use clear console banners and user-facing error messages
src/agent.js
Groq client from env GROQ_API_KEY
Model: llama-3.3-70b-versatile
conversationHistory array in memory for the session
SYSTEM_PROMPT: ShopEase support agent; must use only retrieved ByteRover context; polite/concise/warm; if missing info use fallback line with 1800-123-4567 and refunds@shopease.in; don’t invent policies; ~150 word soft limit; acknowledge order IDs even if not lookup-capable
chat(userMessage):
Print “Querying ByteRover memory…” then await queryMemory(userMessage)
If no context, return friendly “knowledge base unavailable” message with support phone
Build enriched user message: header RETRIEVED CONTEXT FROM BYTEEROVER KNOWLEDGE BASE: + context + CUSTOMER QUESTION:
Push that string as user message to history
Call Groq chat.completions.create with system + full history, temperature: 0.3, max_tokens: 512
Append assistant reply to history (assistant content = answer only, not the injected context block)
main(): readline REPL; banner “ShopEase Support Agent / ByteRover Memory + Groq (Llama 3.3 70B)”; exit quits; loop questions
Quality bar
Match behavior above closely (timeouts, models, prompts, file layout)
No secrets in repo; document npm install, .env, npm run curate once, then npm run chat
Code style: clear comments explaining ByteRover tiers / curation purpose where the original had them

Resources:

If this was useful, now you AI agent newver forgets topics and conversation and its smarter than before. Do like the post and share it with someone building AI agents. And drop a comment — I'd love to know what you're building.

No RAG. No Vectors. No Amnesia. Here's How to Built a Smarter Support Agent with Persistent Memory

Your AI Agent Has Amnesia. ByteRover Fixes It — Without Vectors

01 — The Problem

RAG promised AI memory. It lied.

02 — The Idea

No embeddings. No similarity search. Just files.

Traditional RAG vs ByteRover — Side by Side

03 — How It Works

The architecture, explained simply

The Context Tree — Local File Structure

Curation — Writing to Memory

Query — The 5-Tier Retrieval Pipeline

Session Learning — Memory That Grows

04 — Building the Agent

A customer support agent with ByteRover memory

Project Structure

Step 1 — Install ByteRover

Step 2 — The Sample Docs

Step 3 — The Query Helper (`src/query.js`)

Step 4 — Curate Your Docs (`src/curate.js`)

Step 5 — The Main Agent (`src/agent.js`)

Running It

05 — Honest Tradeoffs

When to use it. When not to.

07 — Wrapping Up

The model is interchangeable. The memory architecture isn't.

Comments

Real AI Engineering

More from this blog

Embarking on the GenAI Journey: Day 01 – Demystifying the Magic

How Building A Full Stack Application Helped me gained Industry Level Experience.

Git in Action - Upload Project Code To Github Repository Using Git

Basic Commands In Git

Command Palette

Your AI Agent Has Amnesia. ByteRover Fixes It — Without Vectors

01 — The Problem

RAG promised AI memory. It lied.

02 — The Idea

No embeddings. No similarity search. Just files.

Traditional RAG vs ByteRover — Side by Side

03 — How It Works

The architecture, explained simply

The Context Tree — Local File Structure

Curation — Writing to Memory

Query — The 5-Tier Retrieval Pipeline

Session Learning — Memory That Grows

04 — Building the Agent

A customer support agent with ByteRover memory

Project Structure

Step 1 — Install ByteRover

Step 2 — The Sample Docs

Step 3 — The Query Helper (src/query.js)

Step 4 — Curate Your Docs (src/curate.js)

Step 5 — The Main Agent (src/agent.js)

Running It

05 — Honest Tradeoffs

When to use it. When not to.

07 — Wrapping Up

The model is interchangeable. The memory architecture isn't.

Share the following prompt with your LLM to code the demo-code we talked in these blog.

Comments

Real AI Engineering

More from this blog

Step 3 — The Query Helper (`src/query.js`)

Step 4 — Curate Your Docs (`src/curate.js`)

Step 5 — The Main Agent (`src/agent.js`)