AI Engineer & Builder

CHINMAY HEBBAL.

Building AI that cuts through the hype.

Projects
GMAIL MCP SERVER — Model Context Protocol
MULTI-AGENT COURSE GEN — Agent-to-Agent Protocol
SLM BENCHMARK SUITE — Small Language Models
vLLM vs SGLang vs TRT-LLM — LLM Serving Benchmark
01 / 04 MCP

GMAIL MCP SERVER

Model Context Protocol — Gmail as AI-native tooling

Built an MCP server that exposes Gmail as a set of AI-native tools. Any MCP-compatible agent — including Anthropic's Claude — can read, search, filter and send emails through standardised tool calls over IMAP/SMTP, without any browser automation. Ships with a FastAPI-backed web UI for direct inbox management and a full Docker setup for one-command deployment.

  • 8 MCP tools: list, search by sender/subject, get unread, get details, send, list folders
  • Two deployment modes: browser-based web UI (port 8000) or headless MCP server
  • App Password auth — no OAuth flow, no browser dependency
  • Fully Dockerised with compose file for both modes
8 MCP Tools
2 Deploy Modes
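As a sketch of how a "search by sender" tool call could translate into an IMAP query (the helper name and argument set here are hypothetical, not the server's actual API):

```python
def build_search_criteria(sender=None, subject=None, unread_only=False):
    """Translate MCP tool-call arguments into an IMAP SEARCH criteria list.

    Hypothetical helper: the real server's tool signatures may differ.
    """
    parts = []
    if sender:
        parts += ["FROM", f'"{sender}"']      # match the From: header
    if subject:
        parts += ["SUBJECT", f'"{subject}"']  # substring match on Subject:
    if unread_only:
        parts.append("UNSEEN")                # messages without the \Seen flag
    # IMAP requires at least one criterion; ALL matches every message
    return parts or ["ALL"]
```

The resulting criteria list would then be handed to `imaplib.IMAP4_SSL.search` to resolve matching message IDs.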
02 / 04 A2A

MULTI-AGENT COURSE GEN

Agent-to-Agent Protocol — Google ADK + Gemini 2.5 Pro

A multi-agent pipeline where four specialised agents coordinate over Google's A2A protocol to generate structured courses on any topic. The Orchestrator routes tasks to a Researcher (Google Search), a Judge (pass/fail quality gate), and a Content Builder — iterating up to three times until the research meets the bar, then streaming the final course directly to the browser via SSE.

  • 4 agents: Researcher, Judge, Content Builder, Orchestrator — each a separate microservice
  • Quality loop: Researcher retries up to 3× based on Judge feedback before handoff
  • Real-time SSE streaming from Orchestrator to frontend as course is written
  • All agents powered by Gemini 2.5 Pro with Google Search grounding
4 Agents
Quality Loop
5 Microservices
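The Researcher/Judge quality loop reduces to plain control flow once the A2A calls are abstracted away; a minimal sketch with the agents simplified to callables (names illustrative, not the actual ADK API):

```python
def run_quality_loop(research, judge, topic, max_attempts=3):
    """Retry research until the Judge passes it, up to max_attempts.

    `research` and `judge` stand in for A2A calls to the real agents.
    """
    feedback = None
    draft = None
    for _ in range(max_attempts):
        draft = research(topic, feedback)  # Researcher, optionally guided by feedback
        verdict = judge(draft)             # Judge returns pass/fail plus a critique
        if verdict["passed"]:
            return draft                   # quality bar met: hand off to Content Builder
        feedback = verdict["feedback"]     # feed the critique into the next attempt
    return draft                           # best effort after exhausting retries
```

The key design point is that the Judge's critique is threaded back into the next research attempt, so each retry is targeted rather than a blind re-roll.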
03 / 04 SLM

SLM BENCHMARK SUITE

Small Language Model Evaluation on Consumer Hardware

A rigorous benchmarking framework for small language models running locally via Ollama, targeting a 4 GB VRAM machine (RTX 3050 Laptop). Measures time-to-first-token, throughput (tokens/sec), and quality across 6 prompt categories using 5 distinct scoring strategies. Results surface in a live Streamlit dashboard with radar charts, heatmaps, and side-by-side model comparison. Qwen2.5:3B ranks #1 on quality; Gemma2:2B leads on speed.

  • 3 models: Qwen2.5:3B, Gemma2:2B, Llama3.2 — all run on 4 GB VRAM
  • 20 prompts × 6 categories: reasoning, coding, factual, maths, summarisation, instruction-following
  • 5 scorers: keyword overlap, exact match, code fence, structural (word/line count), ROUGE-1
  • Dashboard: radar chart, quality heatmap, TTFT box plot, history browser
3 Models Tested
20 Benchmark Prompts
4 GB VRAM Target
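Two of the five scoring strategies are simple enough to sketch directly: keyword overlap and the code-fence check used for coding prompts (signatures here are illustrative; the suite's own interfaces may differ):

```python
def keyword_overlap_score(response, expected_keywords):
    """Fraction of expected keywords present in the response (case-insensitive)."""
    if not expected_keywords:
        return 0.0
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)

def code_fence_score(response):
    """1.0 if the response contains a fenced code block, else 0.0."""
    return 1.0 if "```" in response else 0.0
```

Category-aware scoring then means routing each prompt category to the scorer that fits it: code-fence for coding prompts, keyword overlap for factual ones, and so on.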
04 / 04 Inference

vLLM vs SGLang vs TRT-LLM

LLM Serving Benchmark — Qwen2.5-7B on RTX 5090

End-to-end benchmark of the three leading LLM serving backends across practical, overload, and extreme concurrency levels. Measures RPS, TPS, TTFT, and inter-token latency alongside quality evaluation (MMLU, GSM8K, HumanEval) with a Streamlit dashboard and Parquet export. TRT-LLM delivers 41% more TPS than vLLM at saturation; vLLM and SGLang reclaim the lead when the queue never empties.

  • 3 backends: vLLM (FlashInfer), SGLang, TRT-LLM (compiled TRT engine, CUDA 12.9)
  • 4 test suites: baseline sweep, fine-grained saturation, overload, and extreme overload up to c=5,000
  • Quality eval: MMLU, GSM8K, HumanEval — direct-API fallback when GuideLLM is absent
  • Streamlit dashboard: KPI cards, TPS/TTFT curves, tradeoff scatter, raw data export
41% More TPS vs vLLM
22 ms TTFT @ c=1
5000 Max Concurrency
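The latency metrics reduce to simple arithmetic over per-token arrival timestamps; a minimal sketch (timestamps in seconds, names illustrative of what the harness computes):

```python
def latency_stats(request_start, token_times):
    """Compute TTFT, mean inter-token latency, and per-request TPS
    from the wall-clock times at which each token arrived."""
    ttft = token_times[0] - request_start                     # time-to-first-token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0              # mean inter-token latency
    span = token_times[-1] - token_times[0]
    tps = (len(token_times) - 1) / span if gaps else 0.0      # decode tokens/sec
    return {"ttft": ttft, "itl": itl, "tps": tps}
```

Aggregating these per-request numbers across concurrency levels is what produces the TPS/TTFT curves in the dashboard.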
Work Experience
Jun 2024 — Present
Vulcan Materials Company
AI Data Engineer
Apr 2021 — Aug 2023
Pocket FM
Data Engineer - 2
Oct 2018 — Feb 2021
Recosense Infosolutions
Software Engineer — Platform
Education
Aug 2023 — May 2025
The University of Texas at Dallas
Master of Science, Information Systems
Tools & Technologies
AI Protocols & Frameworks
MCP Protocol (Model Context Protocol)
A2A Protocol (Agent-to-Agent)
Google ADK (Agent Development Kit)
Gemini 2.5 Pro
Anthropic Claude API
Ollama — local LLM runtime
Backend & APIs
Python 3.12+
FastAPI + uvicorn
Server-Sent Events (SSE)
REST API design
IMAP / SMTP
Async Python (asyncio)
Infrastructure & Tooling
Docker & Docker Compose
uv — Python package manager
Streamlit dashboards
Git / GitHub
Linux / Bash scripting
Evaluation & Research
LLM Benchmarking methodology
ROUGE-1 scoring
TTFT & throughput measurement
Category-aware quality scoring
Small language model research
LLM Serving & Inference
vLLM — continuous batching, PagedAttention
SGLang — RadixAttention, KV-cache sharing
TensorRT-LLM — compiled CUDA kernels, graph capture
CUDA 12.9 & Blackwell GPU (SM 12.0)
MMLU / GSM8K / HumanEval evaluation
GuideLLM & OpenAI-compatible APIs
Certifications
Amazon Web Services (AWS)
AWS Certified AI Practitioner
AWS Certified Solutions Architect – Associate
Anthropic
Claude Code in Action
Advanced Model Context Protocol
Hugging Face
The LLM Course
Fundamentals of AI Agents
Databricks
Generative AI Fundamentals
Databricks Certified Data Engineer Associate
Astronomer
Apache Airflow
Microsoft
Microsoft Certified: Azure Fundamentals