PHASE 7 · Chapter 34

Retrieval-Augmented Generation

RAG

Build a ChatGPT-style system on your own data.

Introduction

An LLM knows a great deal, but your company's internal documents, today's news, your project's code? It has not seen them. The solution is RAG (Retrieval-Augmented Generation): combine your own data with an LLM to build a ChatGPT-style assistant. It is one of the most important patterns in modern AI apps.

Concept

RAG = Retrieval + Generation. The workflow: (1) split the document collection into chunks, embed them, and store them in a vector DB; (2) embed the user query and retrieve the most similar chunks; (3) pass the retrieved chunks plus the query to the LLM to generate the answer. The result combines the LLM's creativity with the accuracy of your own data and adds citations, so answers are up to date and hallucinate far less.

Simple explanation

Think of a consultant: they know a lot, but they give better answers once you hand them your company's specific data. RAG works the same way. The LLM is the consultant, and the vector DB is the reference book on their desk. When a question comes in, the relevant pages are opened first, and then the consultant composes the answer. The answer can also carry a citation that says which source it came from.

Real-world use

ChatGPT 'Custom GPT with file': RAG behind the scenes.
Perplexity AI: web search + LLM = real-time RAG.
Notion AI Q&A: RAG over workspace documents.
Customer support: RAG over a knowledge base.
Legal/Medical AI: RAG over domain documents.

Step-by-step breakdown

Step 1: Collect and chunk documents

Split PDF/text into chunks of 500-1000 characters.
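A minimal sketch of character-based chunking, to make Step 1 concrete. The function name `chunk_text` and the `overlap` parameter are assumptions added for illustration; the text above only specifies the 500-1000 character range. Overlap is a common way to keep text near a chunk boundary present in both neighbouring chunks.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters (an assumption, not part
    of the original recipe), so text near a boundary appears in both.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


# Usage: a 1200-character string with 500-character chunks and 50 overlap.
pieces = chunk_text("x" * 1200, chunk_size=500, overlap=50)
print(len(pieces), [len(p) for p in pieces])
```

Splitting on raw character counts can cut a sentence in half; splitting on paragraph or sentence boundaries first is a reasonable refinement once this basic version works.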

Step 2: Embed

Compute an embedding for each chunk (Gemini/OpenAI).

Step 3: Store in vector DB

pgvector, Chroma, Pinecone, Qdrant.

Step 4: Embed the user query

Build the query vector with the same model used for the chunks.

Step 5: Retrieve top-k

Rank chunks by cosine similarity to the query and keep the k most relevant.
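The retrieval step can be sketched in pure Python as follows. The helper names `cosine` and `top_k` and the toy 2-D vectors are assumptions for illustration; real embeddings have hundreds or thousands of dimensions, and a vector DB would normally do this ranking for you.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k most similar vectors, best first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D vectors standing in for real embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
hits = top_k([1.0, 0.1], docs, k=2)
print(hits)
```

The returned indices point back into the chunk list, so the matching text can be looked up and placed in the prompt.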

Step 6: Augment prompt

Retrieved context + question → LLM.
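One possible way to assemble the augmented prompt is shown below. The function name `build_messages` and the numbered `[n]` source format are assumptions chosen for this sketch; numbering the chunks simply gives the model something concrete to cite.

```python
def build_messages(question, chunks):
    """Build chat messages that hand the retrieved chunks to the model as numbered sources."""
    context = "\n".join(f"[{n}] {chunk}" for n, chunk in enumerate(chunks, start=1))
    system = (
        "Answer using ONLY the numbered context below. "
        "Cite the source numbers you used, like [1]. "
        "If the answer is not in the context, say 'I don't know'."
    )
    user = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


messages = build_messages(
    "How long is the return period?",
    [
        "Our company offers a 30-day return policy on all electronics.",
        "All products come with a 1-year manufacturer warranty.",
    ],
)
print(messages[1]["content"])
```

The resulting list has the shape expected by chat-completion style APIs, so it can be passed as the `messages` argument of the LLM call.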

Step 7: Generate + cite

The LLM answers from the context and mentions its sources.
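A small sketch of how citations might be resolved after generation, assuming the prompt asked the model to cite sources as bracketed numbers such as [1]. The function name `cited_sources` and the sample answer string are hypothetical; the answer is written by hand here, not produced by a model.

```python
import re


def cited_sources(answer, sources):
    """Map [n] markers in the answer back to the retrieved source texts.

    Markers that point outside the retrieved list are ignored, since they
    cannot be verified against anything.
    """
    numbers = []
    for match in re.findall(r"\[(\d+)\]", answer):
        n = int(match)
        if 1 <= n <= len(sources) and n not in numbers:
            numbers.append(n)
    return [(n, sources[n - 1]) for n in numbers]


sources = [
    "Our company offers a 30-day return policy on all electronics.",
    "All products come with a 1-year manufacturer warranty.",
]
# Hypothetical model output, written by hand for this example.
answer = "Electronics can be returned within 30 days [1]."
print(cited_sources(answer, sources))
```

Showing the resolved source text next to the answer lets the user check the claim against the original document.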

Python code

import os
import numpy as np
from openai import OpenAI

# OpenAI-compatible client pointed at the gateway; the API key comes from the environment.
client = OpenAI(
    base_url="https://ai.gateway.lovable.dev/v1",
    api_key=os.environ["LOVABLE_API_KEY"],
)

# Tiny knowledge base: each string is treated as one chunk.
documents = [
    "Our company offers a 30-day return policy on all electronics.",
    "Free shipping is available for orders above $50 within the US.",
    "Customer support is available Monday through Friday, 9 AM to 6 PM EST.",
    "We accept Visa, Mastercard, American Express, and PayPal.",
    "All products come with a 1-year manufacturer warranty.",
]

def embed(texts):
    # Embed a list of strings; returns one row per input text.
    res = client.embeddings.create(model="google/gemini-embedding-001", input=texts)
    return np.array([d.embedding for d in res.data])

def cosine(a, b):
    # Cosine similarity between two vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Embed all documents once, up front.
doc_vectors = embed(documents)

def rag_answer(question, k=2):
    # Embed the query with the same model as the documents.
    q_vec = embed([question])[0]
    sims = [cosine(q_vec, dv) for dv in doc_vectors]
    # Indices of the k most similar documents, best first.
    top_idx = np.argsort(sims)[-k:][::-1]
    context = "\n".join(f"- {documents[i]}" for i in top_idx)

    messages = [
        {"role": "system", "content": "Answer using ONLY the provided context. If not in context, say 'I don't know'."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    res = client.chat.completions.create(
        model="google/gemini-2.5-flash",
        messages=messages,
        temperature=0.2,
    )
    # Return the answer together with the chunks it was grounded on.
    return res.choices[0].message.content, [documents[i] for i in top_idx]

answer, sources = rag_answer("How long is the return period?")
print("Answer:", answer)
print("Sources:", sources)

Explanation

This is a minimal RAG: the documents are embedded and kept in memory; when a query arrives it is embedded too, the top 2 documents by cosine similarity are retrieved, and they are passed to the LLM as context. In production, use a vector DB (pgvector/Chroma) rather than an in-memory array.

Common mistakes

Wrong chunk size: too small loses context, too large adds irrelevant noise.
Embedding the query and the documents with different models.
Poorly chosen top-k (3-5 is the usual sweet spot).
Retrieved context that exceeds the LLM's context limit.
Not citing sources, which costs user trust.
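To guard against the context-limit mistake above, one simple approach is to cap the retrieved text before building the prompt. The sketch below is illustrative only: the name `fit_to_budget` and the 4000-character default are assumptions, and character count is only a rough proxy for tokens, which are what the model's limit is actually measured in.

```python
def fit_to_budget(chunks, max_chars=4000):
    """Keep the best-ranked chunks, in order, until the character budget is used up."""
    kept, used = [], 0
    for chunk in chunks:
        if used + len(chunk) > max_chars:
            break  # the next chunk would overflow the budget
        kept.append(chunk)
        used += len(chunk)
    return kept


# Three 1500-character chunks, already sorted best first.
ranked = ["a" * 1500, "b" * 1500, "c" * 1500]
kept = fit_to_budget(ranked, max_chars=4000)
print(len(kept))
```

Because the chunks arrive sorted by similarity, dropping from the tail removes the least relevant text first.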

Exercises

Build a RAG QA system over a Wikipedia article.
Compare accuracy at chunk sizes 200/500/1000.
Set up pgvector and implement a production-style RAG.
Add re-ranking (Cohere Rerank) to improve quality.

Mini project

PDF Chat Bot (RAG)

An app where the user uploads a PDF and asks questions about its content. Extract text with PyPDF2, chunk it, embed it (Gemini), store it in pgvector, and on each query retrieve the relevant chunks and have the LLM answer with citations.

Summary

RAG = Retrieval + Generation.
Custom knowledge + LLM = personalized AI assistant.
Workflow: Chunk → Embed → Store → Retrieve → Augment → Generate.
Fewer hallucinations, citations become possible, answers stay up to date.
It is the core of modern AI apps (ChatGPT plugin, Perplexity, Notion AI).
