Open Source Contribution
Contributing to HuggingFace and NLTK.
Knowing how to code is not enough to become an NLP Engineer. You also need the ability to build an end-to-end pipeline, from data collection through production deployment. In this chapter we will design a real-world capstone project that will be the crown jewel of your portfolio.
A Capstone Project is a comprehensive project in which you apply the entire NLP stack: data scraping, preprocessing, model training/fine-tuning, evaluation, API deployment, frontend integration, and monitoring. This is exactly what recruiters want to see.
Think of a restaurant: if you only know how to cook, you are a chef, but if you can run the whole restaurant, you are the owner. NLP works the same way. Being able to train a model is good, but being able to ship the whole pipeline is what marks a senior engineer. A capstone project proves that senior-level capability.
# Capstone architecture skeleton: Bengali News Summarizer SaaS
# Tech stack: FastAPI + HuggingFace + React + Postgres + Docker
# backend/main.py
import json
import uuid

import newspaper
import redis
from fastapi import FastAPI, BackgroundTasks
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel, HttpUrl
from transformers import pipeline

app = FastAPI(title="Bangla News Summarizer API")
# Wildcard CORS is convenient for a demo; restrict allow_origins in production.
app.add_middleware(CORSMiddleware, allow_origins=["*"], allow_methods=["*"])
cache = redis.Redis(host="redis", port=6379, decode_responses=True)
# Loaded once at import time so every request reuses the same model.
summarizer = pipeline("summarization", model="csebuetnlp/mT5_multilingual_XLSum")
JOB_TTL_SECONDS = 3600  # every job record expires after one hour


class SummarizeRequest(BaseModel):
    url: HttpUrl
    max_length: int = 150


class SummarizeResponse(BaseModel):
    job_id: str
    status: str
    summary: str | None = None


def process_summary(job_id: str, url: str, max_len: int):
    """Scrape the article, summarize it, and store the outcome in Redis."""
    try:
        article = newspaper.Article(url, language="bn")
        article.download()
        article.parse()
        # truncation=True keeps long articles within the model's input limit.
        result = summarizer(
            article.text, max_length=max_len, min_length=40, truncation=True
        )[0]
        cache.set(job_id, json.dumps({
            "status": "done",
            "summary": result["summary_text"],
            "title": article.title,
        }), ex=JOB_TTL_SECONDS)
    except Exception as e:
        # Record the failure so the client stops polling; expire it like any other job.
        cache.set(
            job_id,
            json.dumps({"status": "error", "error": str(e)}),
            ex=JOB_TTL_SECONDS,
        )


@app.post("/summarize", response_model=SummarizeResponse)
async def summarize(req: SummarizeRequest, bg: BackgroundTasks):
    job_id = str(uuid.uuid4())
    cache.set(job_id, json.dumps({"status": "processing"}), ex=JOB_TTL_SECONDS)
    # The heavy work runs after the response is sent.
    bg.add_task(process_summary, job_id, str(req.url), req.max_length)
    return SummarizeResponse(job_id=job_id, status="processing")


@app.get("/result/{job_id}", response_model=SummarizeResponse)
async def get_result(job_id: str):
    raw = cache.get(job_id)
    if not raw:
        return SummarizeResponse(job_id=job_id, status="not_found")
    data = json.loads(raw)
    # Keys not declared on the model ("title", "error") are ignored by pydantic.
    return SummarizeResponse(job_id=job_id, **data)


@app.get("/health")
def health():
    return {"status": "ok"}

This capstone, the Bengali News Summarizer, is an async SaaS built around common production patterns. The user submits a URL, the article is scraped in the background (`newspaper3k`), the mT5 multilingual summarizer generates the summary, the result is cached in Redis, and the user polls for it with the `job_id`. CORS is enabled, so any frontend can call the API. A health endpoint, error handling, and a TTL on cached results are all in place.
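The polling step described above can be sketched from the client side. This is a minimal sketch under stated assumptions, not part of the original project: `poll_result`, `fetch`, `interval`, `timeout`, and `sleep` are hypothetical names, and `fetch` stands in for whatever HTTP call the client makes to `GET /result/{job_id}` and is expected to return the decoded JSON body.

```python
import time
from typing import Callable


def poll_result(
    fetch: Callable[[str], dict],
    job_id: str,
    interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(job_id) until the job leaves the "processing" state.

    fetch is a hypothetical callable wrapping GET /result/{job_id};
    it must return the decoded JSON body as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch(job_id)
        # "done", "error", and "not_found" are all terminal for the client.
        if data["status"] != "processing":
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still processing after {timeout}s")
        sleep(interval)
```

Passing `fetch` and `sleep` in as arguments keeps the loop independent of any HTTP library and makes it easy to test with a fake. With the `requests` package, for example, `fetch` could be a small lambda that calls `requests.get` on the result URL and returns `.json()`.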
A complete SaaS example: a Bengali Resume Parser. Upload a PDF → extract structured data (NER) → match it against a job description (semantic similarity) → score. The stack is a FastAPI backend + React frontend + Postgres + Stripe (paid tier), deployed on Cloud Run with a custom domain.
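The matching and scoring steps can be illustrated with plain cosine similarity. This is a minimal sketch, assuming the resume and the job description have already been turned into embedding vectors by some sentence-embedding model (not shown here). `cosine_similarity` and `match_score` are hypothetical helper names, and the 0 to 100 scale with negative similarities clamped to zero is an illustrative choice, not something the project specifies.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "no match".
        return 0.0
    return dot / (norm_a * norm_b)


def match_score(resume_vec: Sequence[float], job_vec: Sequence[float]) -> float:
    """Map cosine similarity to a 0-100 score (assumed scale), clamping negatives to 0."""
    similarity = max(0.0, cosine_similarity(resume_vec, job_vec))
    return round(similarity * 100, 1)
```

In the real pipeline the two vectors would come from the same embedding model, so that resumes and job descriptions live in one comparable vector space.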