AI Engineer & Builder
Building AI that cuts through the hype.
Model Context Protocol — Gmail as AI-native tooling
Built an MCP server that exposes Gmail as a set of AI-native tools. Any MCP-compatible agent — including Anthropic's Claude — can read, search, filter and send emails through standardised tool calls over IMAP/SMTP, without any browser automation. Ships with a FastAPI-backed web UI for direct inbox management and a full Docker setup for one-command deployment.
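A minimal sketch of the idea: an MCP tool advertises a JSON schema, and the server translates each tool call into a plain IMAP search, no browser in the loop. The schema, function names, and query builder below are illustrative assumptions, not the project's actual implementation.

```python
import imaplib

# Hypothetical tool schema an MCP server might advertise for email search.
SEARCH_TOOL_SCHEMA = {
    "name": "search_emails",
    "description": "Search the inbox by sender and subject keywords.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "sender": {"type": "string"},
            "subject": {"type": "string"},
        },
    },
}

def build_imap_query(sender=None, subject=None):
    """Translate tool-call arguments into an IMAP SEARCH criteria string."""
    parts = []
    if sender:
        parts.append(f'FROM "{sender}"')
    if subject:
        parts.append(f'SUBJECT "{subject}"')
    return " ".join(parts) or "ALL"

def search_emails(host, user, password, sender=None, subject=None):
    """Handle a search_emails tool call over IMAP (no browser automation)."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX")
        _, data = conn.search(None, build_imap_query(sender, subject))
        return data[0].split()
```

The agent never sees IMAP: it calls `search_emails` with structured arguments and gets message IDs back, which is what makes the inbox feel AI-native.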
Agent-to-Agent Protocol — Google ADK + Gemini 2.5 Pro
A multi-agent pipeline where four specialised agents coordinate over Google's A2A protocol to generate structured courses on any topic. The Orchestrator routes tasks to a Researcher (Google Search), a Judge (pass/fail quality gate), and a Content Builder — iterating up to three times until the research meets the bar, then streaming the final course directly to the browser via SSE.
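The control flow reduces to a simple gated loop: research, judge, and retry with feedback until the gate passes or three rounds are spent. The sketch below uses hypothetical stand-in functions (the real agents run on Google ADK with Gemini 2.5 Pro and a live search tool):

```python
# Hypothetical stand-ins for the Researcher / Judge / Content Builder agents.
def research(topic, feedback=None):
    notes = f"notes on {topic}"
    # A real Researcher would call Google Search and incorporate the feedback.
    return notes + (f" (revised: {feedback})" if feedback else "")

def judge(notes):
    # Pass/fail quality gate; a real Judge would score against an LLM rubric.
    return "revised" in notes, "needs more depth"

def build_course(notes):
    # A real Content Builder would structure the notes into course sections.
    return {"sections": [notes]}

def orchestrate(topic, max_rounds=3):
    """Route research through the quality gate, iterating up to max_rounds."""
    feedback = None
    for _ in range(max_rounds):
        notes = research(topic, feedback)
        passed, feedback = judge(notes)
        if passed:
            break
    return build_course(notes)
```

In the real pipeline the Orchestrator speaks A2A to each agent and the finished course streams to the browser over SSE; the loop-with-feedback structure is the same.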
Small Language Model Evaluation on Consumer Hardware
A rigorous benchmarking framework for small language models running locally via Ollama, targeting a 4 GB VRAM machine (RTX 3050 Laptop). Measures time-to-first-token, throughput (tokens/sec), and quality across six prompt categories using five distinct scoring strategies. Results surface in a live Streamlit dashboard with radar charts, heatmaps, and side-by-side model comparison. Qwen2.5:3B ranks #1 on quality; Gemma2:2B leads on speed.
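The two latency metrics are simple to define precisely: time-to-first-token is the gap from request start to the first streamed token, and throughput is total tokens over total wall time. A minimal sketch of that measurement (function name and return shape are assumptions, not the framework's API):

```python
import time

def measure_stream(token_iter):
    """Measure time-to-first-token and tokens/sec over a token stream."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in token_iter:
        if ttft is None:
            # First token arrived: record time-to-first-token.
            ttft = time.perf_counter() - start
        count += 1
    elapsed = time.perf_counter() - start
    tps = count / elapsed if elapsed > 0 else 0.0
    return {"ttft_s": ttft, "tokens": count, "tokens_per_s": tps}
```

Wrapping Ollama's streaming generator with a function like this yields both metrics per prompt, which the quality scorers can then join on.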
LLM Serving Benchmark — Qwen2.5-7B on RTX 5090
End-to-end benchmark of three leading LLM serving backends (vLLM, SGLang, and TRT-LLM) across practical, overload, and extreme concurrency levels. Measures RPS, TPS, TTFT, and inter-token latency alongside quality evaluation (MMLU, GSM8K, HumanEval) with a Streamlit dashboard and Parquet export. TRT-LLM delivers 41% more TPS than vLLM at saturation; vLLM and SGLang reclaim the lead when the queue never empties.
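The four headline metrics fall out of one aggregation over per-request timing records. A hedged sketch of that reduction, assuming each record carries start time, first-token time, end time, and token count (the record layout and function name are illustrative, not the benchmark's actual schema):

```python
import statistics

def summarize(results):
    """Aggregate per-request records into RPS, TPS, mean TTFT, and mean ITL.

    Each record is (start_s, first_token_s, end_s, n_tokens).
    """
    wall = max(r[2] for r in results) - min(r[0] for r in results)
    total_tokens = sum(r[3] for r in results)
    ttfts = [r[1] - r[0] for r in results]
    # Inter-token latency: decode time spread over the remaining tokens.
    itls = [(r[2] - r[1]) / max(r[3] - 1, 1) for r in results]
    return {
        "rps": len(results) / wall,
        "tps": total_tokens / wall,
        "ttft_mean_s": statistics.mean(ttfts),
        "itl_mean_s": statistics.mean(itls),
    }
```

Writing the raw records to Parquet and summarising per concurrency level is what lets saturation behaviour (where TRT-LLM pulls ahead) be separated from the always-full-queue regime (where vLLM and SGLang recover).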