AI Engineer & Builder

CHINMAY HEBBAL.

Building AI that cuts through the hype.

Projects
01 / 04 MCP

GMAIL MCP SERVER

Model Context Protocol · Gmail as AI-native tooling

Built an MCP server that exposes Gmail as a set of AI-native tools. Any MCP-compatible agent including Anthropic's Claude can read, search, filter and send emails through standardised tool calls over IMAP/SMTP, without any browser automation. Ships with a FastAPI-backed web UI for direct inbox management and a full Docker setup for one-command deployment.

  • 8 MCP tools, including: list, search by sender/subject, get unread, get details, send, and list folders
  • Two deployment modes: browser-based web UI (port 8000) or headless MCP server
  • App Password auth: no OAuth flow, no browser dependency
  • Fully Dockerised with compose file for both modes
8 MCP Tools
2 Deploy Modes
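The tool-call pattern above can be sketched as a registry of named handlers that an agent dispatches to by tool name. The tool names, message shape, and in-memory inbox below are illustrative stand-ins, not the server's actual API; the real server backs these calls with IMAP/SMTP.

```python
# Hypothetical sketch of exposing mailbox actions as MCP-style named tools.
# Names and data shapes are illustrative, not the project's real interface.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Message:
    sender: str
    subject: str
    unread: bool

# In-memory stand-in for an IMAP mailbox.
INBOX = [
    Message("alice@example.com", "Q3 report", unread=True),
    Message("bob@example.com", "Lunch?", unread=False),
]

TOOLS: dict[str, Callable[..., list[Message]]] = {}

def tool(name: str):
    """Register a handler under an MCP-style tool name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("search_by_sender")
def search_by_sender(sender: str) -> list[Message]:
    return [m for m in INBOX if m.sender == sender]

@tool("get_unread")
def get_unread() -> list[Message]:
    return [m for m in INBOX if m.unread]

# An MCP-compatible agent dispatches a standardised tool call by name:
result = TOOLS["get_unread"]()
```

The point of the registry is that any agent speaking the protocol can discover and invoke tools uniformly, which is what removes the need for browser automation.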
02 / 04 A2A

MULTI-AGENT COURSE GEN

Agent-to-Agent Protocol · Google ADK + Gemini 2.5 Pro

A multi-agent pipeline where four specialised agents coordinate over Google's A2A protocol to generate structured courses on any topic. The Orchestrator routes tasks to a Researcher (Google Search), a Judge (pass/fail quality gate), and a Content Builder — iterating up to three times until the research meets the bar, then streaming the final course directly to the browser via SSE.

  • 4 agents (Researcher, Judge, Content Builder, Orchestrator), each a separate microservice
  • Quality loop: Researcher retries up to 3× based on Judge feedback before handoff
  • Real-time SSE streaming from Orchestrator to frontend as course is written
  • All agents powered by Gemini 2.5 Pro with Google Search grounding
4 Agents
Quality Loop
5 Microservices
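The quality loop described above can be sketched as follows. In the real pipeline the agents are separate A2A microservices; here they are plain callables for clarity, and the toy researcher/judge pair is purely illustrative.

```python
# Illustrative sketch of the Researcher/Judge quality gate: the Researcher
# retries up to 3x, each time incorporating the Judge's feedback.
from typing import Callable

MAX_ATTEMPTS = 3  # retry ceiling before handoff to the Content Builder

def research_with_quality_gate(
    topic: str,
    researcher: Callable[[str, str], str],    # (topic, feedback) -> notes
    judge: Callable[[str], tuple[bool, str]], # notes -> (passed, feedback)
) -> str:
    feedback = ""
    notes = ""
    for _ in range(MAX_ATTEMPTS):
        notes = researcher(topic, feedback)
        passed, feedback = judge(notes)
        if passed:
            break
    # Hand off best-effort notes even if the gate never passed.
    return notes

# Toy agents: the judge passes once the notes mention "attention".
def toy_researcher(topic: str, feedback: str) -> str:
    return f"notes on {topic}" + (" covering attention" if feedback else "")

def toy_judge(notes: str) -> tuple[bool, str]:
    return ("attention" in notes, "please cover attention")

result = research_with_quality_gate("transformers", toy_researcher, toy_judge)
```

Bounding the loop at three attempts keeps cost predictable while still letting the Judge's pass/fail feedback steer the Researcher.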
03 / 04 SLM

SLM BENCHMARK SUITE

Small Language Model Evaluation on Consumer Hardware

A rigorous benchmarking framework for small language models running locally via Ollama, targeting a 4 GB VRAM machine (RTX 3050 Laptop). Measures time-to-first-token, throughput (tokens/sec), and quality across 6 prompt categories using 5 distinct scoring strategies. Results surface in a live Streamlit dashboard with radar charts, heatmaps, and side-by-side model comparison. Qwen2.5:3B ranks #1 on quality; Gemma2:2B leads on speed.

  • 3 models (Qwen2.5:3B, Gemma2:2B, Llama3.2), all running on 4 GB VRAM
  • 20 prompts × 6 categories: reasoning, coding, factual, maths, summarisation, instruction-following
  • 5 scorers: keyword overlap, exact match, code fence, structural (word/line count), ROUGE-1
  • Dashboard: radar chart, quality heatmap, TTFT box plot, history browser
3 Models Tested
20 Benchmark Prompts
4GB VRAM Target
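Of the five scoring strategies, ROUGE-1 is the easiest to illustrate: it is unigram overlap between a model's answer and a reference, combined into an F1 score. This is a simplified sketch, not the suite's exact implementation.

```python
# Minimal ROUGE-1 F1: clipped unigram overlap between candidate and reference.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each match clipped to reference count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("the cat sat on the mat", "the cat lay on the mat")
# 5 of 6 unigrams overlap in both directions, so F1 = 5/6
```

Keyword overlap and exact match work similarly on whole tokens; the structural scorer instead checks word/line counts, which is why scoring is category-aware.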
04 / 04 Inference

vLLM vs SGLang vs TRT-LLM

LLM Serving Benchmark · Qwen2.5-7B on RTX 5090

End-to-end benchmark of the three leading LLM serving backends across practical, overload, and extreme concurrency levels. Measures RPS, TPS, TTFT, and inter-token latency alongside quality evaluation (MMLU, GSM8K, HumanEval) with a Streamlit dashboard and Parquet export. TRT-LLM delivers 41% more TPS than vLLM at saturation; vLLM and SGLang reclaim the lead when the queue never empties.

  • 3 backends: vLLM (FlashInfer), SGLang, TRT-LLM (compiled TRT engine, CUDA 12.9)
  • 4 test suites: baseline sweep, fine-grained saturation, overload, and extreme overload up to c=5,000
  • Quality eval: MMLU, GSM8K, HumanEval, with a direct-API fallback when GuideLLM is absent
  • Streamlit dashboard: KPI cards, TPS/TTFT curves, tradeoff scatter, raw data export
41% More TPS vs vLLM
22 ms TTFT @ c=1
5000 Max Concurrency
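The core serving metrics (TTFT, TPS, inter-token latency) all derive from per-request token timestamps. A sketch of those derivations, with illustrative field names rather than the benchmark's actual schema:

```python
# Deriving TTFT, throughput, and inter-token latency from one request's
# token arrival times. Field names are illustrative.
from dataclasses import dataclass

@dataclass
class RequestTrace:
    start: float              # request sent (seconds)
    token_times: list[float]  # arrival time of each generated token

def ttft(trace: RequestTrace) -> float:
    """Time to first token: first arrival minus send time."""
    return trace.token_times[0] - trace.start

def tokens_per_second(trace: RequestTrace) -> float:
    """Throughput over the full generation window."""
    duration = trace.token_times[-1] - trace.start
    return len(trace.token_times) / duration

def mean_inter_token_latency(trace: RequestTrace) -> float:
    """Average gap between consecutive tokens."""
    gaps = [b - a for a, b in zip(trace.token_times, trace.token_times[1:])]
    return sum(gaps) / len(gaps)

trace = RequestTrace(start=0.0, token_times=[0.022, 0.032, 0.042, 0.052])
```

Under load, TTFT is dominated by queueing while inter-token latency reflects batch scheduling, which is why the two metrics can rank backends differently at saturation versus overload.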
Work Experience
Jun 2024 – Present
Vulcan Materials Company
AI Data Engineer
Apr 2021 – Aug 2023
Pocket FM
Data Engineer - 2
Oct 2018 – Feb 2021
Recosense Infosolutions
Software Engineer, Platform
Education
Aug 2023 – May 2025
The University of Texas at Dallas
Master of Science, Information Systems
Tools & Technologies
AI Protocols & Frameworks
MCP Protocol (Model Context Protocol)
A2A Protocol (Agent-to-Agent)
Google ADK (Agent Development Kit)
Gemini 2.5 Pro
Anthropic Claude API
Ollama (local LLM runtime)
Backend & APIs
Python 3.12+
FastAPI + uvicorn
Server-Sent Events (SSE)
REST API design
IMAP / SMTP
Async Python (asyncio)
Infrastructure & Tooling
Docker & Docker Compose
uv (Python package manager)
Streamlit dashboards
Git / GitHub
Linux / Bash scripting
Evaluation & Research
LLM Benchmarking methodology
ROUGE-1 scoring
TTFT & throughput measurement
Category-aware quality scoring
Small language model research
LLM Serving & Inference
vLLM (continuous batching, PagedAttention)
SGLang (RadixAttention, KV-cache sharing)
TensorRT-LLM (compiled CUDA kernels, graph capture)
CUDA 12.9 & Blackwell GPU (SM 12.0)
MMLU / GSM8K / HumanEval evaluation
GuideLLM & OpenAI-compatible APIs
Certifications
Hugging Face
The LLM Course
Fundamentals of AI Agents
Astronomer
Apache Airflow