PHASE 8 · Chapter 40

Monitoring & Logging

Tracking the health of a production model.

Introduction

You deploy a model to production, and a few days later users complain that the answers are wrong. Why? Data drift? A degraded model? A slow API? Without monitoring and logging, you are guessing blind. Observability sits at the core of every serious ML system.

Concept

ML observability rests on three pillars: (1) Logs: which request came in, what response went out, and any error trace. (2) Metrics: latency (p50/p95/p99), throughput (RPS), error rate, GPU utilization. (3) Traces: the flow of a request across distributed services. Tools: Prometheus + Grafana (metrics), ELK/Loki (logs), OpenTelemetry (traces), Sentry (error tracking), Evidently AI / WhyLabs (data drift, model drift).
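To make the latency percentiles concrete, here is a minimal sketch that computes p50, p95, and p99 with the nearest-rank method. The function name `percentile` and the sample latencies are assumptions made up for illustration; a metrics backend such as Prometheus normally estimates percentiles from histogram buckets rather than from raw samples.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical request latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 16, 12, 15, 14, 900]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)
```

With these samples the median stays at 14 ms while p95 and p99 surface the 900 ms outlier, which is why tail percentiles are usually tracked alongside the median.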

Simple Explanation

Think of an ICU patient in a hospital: heart rate, blood pressure, oxygen, and EKG are monitored continuously, and an alarm sounds on any anomaly. A model in production is like that patient. Rising latency, dropping accuracy, or a GPU memory leak turns into a disaster if it is not detected early. Logging = the patient's diary, metrics = vital signs, traces = a full body scan.

Real-World Usage

OpenAI: reportedly a full Prometheus + Grafana stack.
Sentry: error tracking and alerts.
Datadog APM: production performance monitoring.
Evidently AI: data and model drift detection.
Langfuse, Helicone: LLM-specific observability.

Step-by-Step Breakdown

Step 1 — Structured logging

JSON logs with contextual fields.
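As one possible shape for this step, the sketch below uses only the standard library to emit one JSON object per log line. The `JsonFormatter` class, the `ctx` key passed through `extra`, and the field names are conventions assumed here, not part of the chapter's API.

```python
import io
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Contextual fields arrive via extra={"ctx": {...}} (a convention assumed here).
        payload.update(getattr(record, "ctx", {}))
        return json.dumps(payload)

def build_logger(stream):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("nlp-api-sketch")
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger

buffer = io.StringIO()  # stands in for stdout or a log file
log = build_logger(buffer)
log.info("prediction served", extra={"ctx": {"request_id": "req-123", "latency_ms": 42.5}})

entry = json.loads(buffer.getvalue())
print(entry)
```

Because every line parses as JSON, a log store such as Loki or Elasticsearch can filter on `request_id` or `latency_ms` directly instead of relying on text search.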

Step 2 — Metrics emit

Counter, Histogram, Gauge, exposed in Prometheus format.
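The three metric types differ in what they are allowed to do. The following standard-library sketch mimics their semantics and renders a histogram in the Prometheus text exposition format. The `Sketch*` classes are illustrative stand-ins written for this example; they are not the `prometheus_client` implementation, which should be used in real services.

```python
class SketchCounter:
    """Monotonically increasing value, e.g. total requests."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

class SketchGauge:
    """Value that can go up or down, e.g. GPU memory in use."""
    def __init__(self):
        self.value = 0.0
    def set(self, value):
        self.value = value

class SketchHistogram:
    """Counts observations into cumulative buckets, plus a running sum and count."""
    def __init__(self, buckets):
        self.bounds = sorted(buckets) + [float("inf")]
        self.counts = [0] * len(self.bounds)
        self.total = 0.0
        self.n = 0
    def observe(self, value):
        self.total += value
        self.n += 1
        # Cumulative: every bucket whose upper bound covers the value is incremented.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.counts[i] += 1

def render_histogram(name, hist):
    """Render one histogram in the Prometheus text exposition format."""
    lines = [f"# TYPE {name} histogram"]
    for bound, count in zip(hist.bounds, hist.counts):
        le = "+Inf" if bound == float("inf") else repr(bound)
        lines.append(f'{name}_bucket{{le="{le}"}} {count}')
    lines.append(f"{name}_sum {hist.total}")
    lines.append(f"{name}_count {hist.n}")
    return "\n".join(lines)

requests_total = SketchCounter()
gpu_memory_bytes = SketchGauge()
latency = SketchHistogram(buckets=[0.1, 0.5, 1.0])

for seconds in (0.05, 0.3, 0.7, 2.0):  # hypothetical request durations
    requests_total.inc()
    latency.observe(seconds)
gpu_memory_bytes.set(6.2e9)  # hypothetical reading; a gauge may later be set lower

text = render_histogram("nlp_request_latency_seconds", latency)
print(text)
```

The cumulative `le` buckets are what allow a query layer to estimate percentiles such as p95 without storing every individual observation.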

Step 3 — Dashboard

Visualize key metrics in Grafana.

Step 4 — Alert rule

Latency > X or error rate > Y: notify Slack.
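A threshold rule of this kind can be sketched in a few lines. The threshold values, function names, and placeholder webhook URL below are assumptions for illustration; in a Prometheus + Grafana setup the rule would normally live in the alerting layer rather than in application code. The request is built but deliberately not sent.

```python
import json
import urllib.request

# Hypothetical thresholds; tune them against your own baseline.
LATENCY_P95_MS_MAX = 500.0
ERROR_RATE_MAX = 0.02

def evaluate_alerts(p95_latency_ms, error_rate):
    """Return human-readable messages for every breached threshold."""
    alerts = []
    if p95_latency_ms > LATENCY_P95_MS_MAX:
        alerts.append(f"p95 latency {p95_latency_ms:.0f} ms exceeds {LATENCY_P95_MS_MAX:.0f} ms")
    if error_rate > ERROR_RATE_MAX:
        alerts.append(f"error rate {error_rate:.1%} exceeds {ERROR_RATE_MAX:.1%}")
    return alerts

def build_slack_request(webhook_url, alerts):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": "\n".join(alerts)}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

alerts = evaluate_alerts(p95_latency_ms=730.0, error_rate=0.005)
if alerts:
    # Placeholder URL; a real one comes from your Slack workspace settings.
    request = build_slack_request("https://hooks.slack.com/services/PLACEHOLDER", alerts)
    # urllib.request.urlopen(request, timeout=5) would deliver it.
    print(request.data.decode("utf-8"))
```

Alerting on p95 rather than on single slow requests is one way to reduce the false-alarm flood that badly chosen thresholds cause.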

Step 5 — Tracing

Instrument with OpenTelemetry to follow the distributed request flow.
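Distributed tracing works by passing a shared trace id from service to service. The sketch below builds and propagates the W3C Trace Context `traceparent` header, which OpenTelemetry's default propagators use. It is a hand-written illustration of the header format only; the function names are hypothetical, and a real service would let the OpenTelemetry SDK create and inject these headers.

```python
import re
import secrets

# Format: version "00", 32-hex trace id, 16-hex span id, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

def new_traceparent():
    """Start a new trace: random 16-byte trace id, random 8-byte span id, sampled flag."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"

def child_traceparent(incoming):
    """Keep the caller's trace id and mint a new span id for this service's work."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        return new_traceparent()  # malformed header: start a fresh trace
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"

gateway_header = new_traceparent()                # created at the edge
model_header = child_traceparent(gateway_header)  # forwarded to the model service

print(gateway_header)
print(model_header)
```

Both headers share the same trace id, so a tracing backend can stitch the gateway span and the model-service span into one request timeline.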

Step 6 — Drift detection

Check the data distribution weekly.
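One simple way to compare this week's inputs against a reference window is the Population Stability Index (PSI). The sketch below is a hand-rolled illustration with made-up token-length data and bin edges; it is not the Evidently API, which ships its own drift tests.

```python
import math

def psi(reference, current, bin_edges, eps=1e-6):
    """Population Stability Index between two samples over shared bins."""
    def proportions(values):
        counts = [0] * (len(bin_edges) + 1)
        for v in values:
            # Bin index = number of edges at or below v.
            idx = sum(1 for edge in bin_edges if v >= edge)
            counts[idx] += 1
        total = len(values)
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / total, eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

# Hypothetical feature: input text length in tokens.
last_month = [12, 15, 20, 22, 25, 30, 31, 35, 40, 45]
this_week_same = list(last_month)
this_week_shifted = [60, 65, 70, 72, 75, 80, 85, 90, 95, 99]

edges = [20, 40, 60, 80]
stable = psi(last_month, this_week_same, edges)
drifted = psi(last_month, this_week_shifted, edges)
print(round(stable, 3), round(drifted, 3))
```

Identical distributions give a PSI of zero, and the score grows as the two diverge. A commonly cited rule of thumb treats values above roughly 0.2 to 0.25 as a shift worth investigating, though the right cutoff depends on the feature.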

Python Code

import json
import logging
import time

from fastapi import FastAPI, Request
from fastapi.responses import Response
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest
from pydantic import BaseModel
from transformers import pipeline

# Emit only the message so each log line is a single JSON object.
logging.basicConfig(
    level=logging.INFO,
    format='%(message)s',
)
logger = logging.getLogger("nlp-api")

# Prometheus metrics: request count by outcome, latency distribution per endpoint.
REQUEST_COUNT = Counter("nlp_requests_total", "Total predict requests", ["endpoint", "status"])
REQUEST_LATENCY = Histogram("nlp_request_latency_seconds", "Latency", ["endpoint"])

app = FastAPI()
classifier = pipeline("sentiment-analysis")  # loaded once at startup

class Req(BaseModel):
    text: str

@app.middleware("http")
async def log_requests(request: Request, call_next):
    # perf_counter is monotonic, so system clock adjustments do not skew the timing.
    start = time.perf_counter()
    response = await call_next(request)
    elapsed = time.perf_counter() - start
    log = {
        "method": request.method,
        "path": request.url.path,
        "status": response.status_code,
        "latency_ms": round(elapsed * 1000, 2),
    }
    logger.info(json.dumps(log))
    return response

# Plain def: FastAPI runs it in a threadpool, so the blocking model call
# does not stall the event loop.
@app.post("/predict")
def predict(req: Req):
    with REQUEST_LATENCY.labels(endpoint="/predict").time():
        try:
            result = classifier(req.text)[0]
            REQUEST_COUNT.labels(endpoint="/predict", status="success").inc()
            return {"label": result["label"], "score": result["score"]}
        except Exception as e:
            REQUEST_COUNT.labels(endpoint="/predict", status="error").inc()
            # Only the first 100 characters are logged; this can still contain PII.
            logger.error(json.dumps({"error": str(e), "input": req.text[:100]}))
            raise

@app.get("/metrics")
def metrics():
    # Endpoint scraped by Prometheus.
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)

Explanation

The structured JSON logging middleware records method, path, status, and latency for every request. A Prometheus Counter and Histogram provide the metrics. Prometheus scrapes the /metrics endpoint, and Grafana builds dashboards on top of the data Prometheus collects. On error, the first 100 characters of the input are logged (be careful with PII).

Common Mistakes

Logging PII (email, password, full message): a compliance violation.
Plain-text logs: hard to search and filter.
Badly chosen alert thresholds: a flood of false alarms.
No drift monitoring: accuracy degrades silently.
No log retention policy: the disk fills up.
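For the PII point in particular, one possible safeguard is a logging filter that masks sensitive values before they are written. The sketch below handles only email addresses with a simple regular expression. The `RedactEmails` class and the pattern are assumptions for illustration and would not catch passwords, names, or other identifiers.

```python
import io
import logging
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

class RedactEmails(logging.Filter):
    """Mask email addresses in the log message before it is written."""

    def filter(self, record):
        record.msg = EMAIL_RE.sub("[REDACTED_EMAIL]", record.getMessage())
        record.args = ()  # the message is already fully formatted
        return True       # keep the record

buffer = io.StringIO()  # stands in for the real log destination
handler = logging.StreamHandler(buffer)
handler.addFilter(RedactEmails())

logger = logging.getLogger("pii-sketch")
logger.handlers = [handler]
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("prediction failed for input: %s", "contact me at jane.doe@example.com please")
output = buffer.getvalue()
print(output)
```

Attaching the filter to the handler means every record routed through it is scrubbed, regardless of which part of the application produced it.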

Exercises

Run a Prometheus + Grafana + API stack with docker-compose.
Build a Grafana dashboard: RPS, p95 latency, error rate.
Integrate Sentry to track exceptions.
Build a data drift detection example with Evidently.

Mini Project

Full Observability Stack

FastAPI NLP app + Prometheus + Grafana + Loki, run together with docker-compose. The dashboard shows live latency, throughput, and error rate. When latency exceeds 500 ms, fire an alert to a Slack webhook.

Summary

Observability = Logs + Metrics + Traces.
Running production ML blind is dangerous.
Prometheus + Grafana = the industry-standard metrics stack.
Drift detection catches accuracy degradation early.
Phase 8 complete! You can now take an NLP system through the full production pipeline: build → deploy → scale → monitor. That is the work of a real NLP engineer.
