Chinmay Hebbal — AI Engineer

01 / 03 MCP

GMAIL MCP SERVER

Model Context Protocol — Gmail as AI-native tooling

Built an MCP server that exposes Gmail as a set of AI-native tools. Any MCP-compatible agent, including Anthropic's Claude, can read, search, filter and send emails through standardised tool calls, which the server fulfils over IMAP/SMTP with no browser automation. It ships with a FastAPI-backed web UI for direct inbox management and a full Docker setup for one-command deployment.

8 MCP tools, including: list, search by sender/subject, get unread, get details, send, list folders
Two deployment modes: browser-based web UI (port 8000) or headless MCP server
App Password auth — no OAuth flow, no browser dependency
Fully Dockerised with compose file for both modes
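As a rough illustration of how a "search by sender/subject" or "get unread" tool could map onto IMAP, here is a minimal sketch using only Python's standard `imaplib`. The function names `build_search_criteria` and `search_inbox` are hypothetical and are not taken from the project; the host `imap.gmail.com` and the read-only `INBOX` selection are assumptions about a typical Gmail App Password setup.

```python
import imaplib


def _quote(value: str) -> str:
    """Escape backslashes and double quotes for an IMAP quoted string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_search_criteria(sender=None, subject=None, unread_only=False) -> str:
    """Build an IMAP SEARCH criteria string from optional filters.

    Hypothetical helper: with no filters it falls back to ALL.
    """
    parts = []
    if unread_only:
        parts.append("UNSEEN")
    if sender:
        parts.append(f'FROM "{_quote(sender)}"')
    if subject:
        parts.append(f'SUBJECT "{_quote(subject)}"')
    return " ".join(parts) if parts else "ALL"


def search_inbox(user: str, app_password: str, criteria: str,
                 host: str = "imap.gmail.com") -> list:
    """Return matching message IDs (as bytes) from INBOX.

    Assumes an App Password login; the mailbox is opened read-only so the
    search does not change any message flags.
    """
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, app_password)
        conn.select("INBOX", readonly=True)
        status, data = conn.search(None, criteria)
        return data[0].split() if status == "OK" else []
```

An MCP tool handler could then call something like `search_inbox(user, password, build_search_criteria(sender="alice@example.com", unread_only=True))` and return the IDs to the agent.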

8 MCP Tools

2 Deploy Modes

02 / 03 A2A

MULTI-AGENT COURSE GEN

Agent-to-Agent Protocol — Google ADK + Gemini 2.5 Pro

A multi-agent pipeline where four specialised agents coordinate over Google's A2A protocol to generate structured courses on any topic. The Orchestrator routes tasks to a Researcher (Google Search), a Judge (pass/fail quality gate), and a Content Builder — iterating up to three times until the research meets the bar, then streaming the final course directly to the browser via SSE.

4 agents: Researcher, Judge, Content Builder, Orchestrator — each a separate microservice
Quality loop: Researcher retries up to 3× based on Judge feedback before handoff
Real-time SSE streaming from Orchestrator to frontend as course is written
All agents powered by Gemini 2.5 Pro with Google Search grounding
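The Researcher/Judge quality loop can be sketched in a few lines. This is an illustrative outline only: the `Verdict` type, the `run_quality_loop` function and the plain callables standing in for the agents are assumptions, and the real agents communicate over A2A rather than by direct function calls. It also assumes that the last draft is handed off if all three attempts fail, which the description above does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    """Pass/fail result from the Judge, with feedback for the next attempt."""
    passed: bool
    feedback: str = ""


def run_quality_loop(
    topic: str,
    research: Callable[[str, Optional[str]], str],
    judge: Callable[[str], Verdict],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Retry research until the judge passes it or attempts run out.

    Returns the findings to hand off and the number of attempts used.
    Assumption: after the final failed attempt the last draft is still
    handed off rather than raising an error.
    """
    feedback: Optional[str] = None
    findings = ""
    for attempt in range(1, max_attempts + 1):
        findings = research(topic, feedback)
        verdict = judge(findings)
        if verdict.passed:
            return findings, attempt
        feedback = verdict.feedback  # fed into the next research attempt
    return findings, max_attempts
```

Passing the Judge's feedback back into the next research call is what makes each retry more than a repeat of the previous attempt.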

4 Agents

3× Quality Loop

5 Microservices

03 / 03 SLM

SLM BENCHMARK SUITE

Small Language Model Evaluation on Consumer Hardware

A rigorous benchmarking framework for small language models running locally via Ollama, targeting a 4 GB VRAM machine (RTX 3050 Laptop). Measures time-to-first-token, throughput (tokens/sec), and quality across 6 prompt categories using 5 distinct scoring strategies. Results surface in a live Streamlit dashboard with radar charts, heatmaps, and side-by-side model comparison. Qwen2.5:3B ranks #1 on quality; Gemma2:2B leads on speed.
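To make the two speed metrics concrete, the sketch below times any iterable of streamed chunks. It is not the suite's own code: `measure_stream` is a hypothetical name, each chunk is treated as one token (an approximation, since a runtime may report exact token counts separately), and throughput is computed over the whole request including the wait for the first token, which is one of several reasonable definitions.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[str, Optional[float]]:
    """Consume a stream and report TTFT and throughput.

    ttft_s       seconds from start until the first chunk arrives
    tokens       number of chunks seen (one chunk is assumed to be one token)
    tokens_per_s chunks divided by total wall-clock time
    """
    start = clock()
    first_at: Optional[float] = None
    count = 0
    for _ in chunks:
        now = clock()
        if first_at is None:
            first_at = now  # the first chunk marks time-to-first-token
        count += 1
    total = clock() - start
    return {
        "ttft_s": None if first_at is None else first_at - start,
        "tokens": count,
        "total_s": total,
        "tokens_per_s": count / total if total > 0 else 0.0,
    }
```

The `clock` parameter exists so the measurement can be checked with a fake clock; in a real run the default `time.perf_counter` would be used and `chunks` would be the model's streaming response.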

3 models: Qwen2.5:3B, Gemma2:2B, Llama3.2 — all run on 4 GB VRAM
20 prompts across 6 categories: reasoning, coding, factual, maths, summarisation, instruction-following
5 scorers: keyword overlap, exact match, code fence, structural (word/line count), ROUGE-1
Dashboard: radar chart, quality heatmap, TTFT box plot, history browser
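Of the five scorers, ROUGE-1 is the easiest to show in isolation: it measures unigram overlap between a model's answer and a reference text. The version below is a minimal sketch with assumed choices (lowercasing, `\w+` tokenisation, clipped counts) and a hypothetical function name `rouge1`; the suite's own scorer may tokenise or normalise differently.

```python
import re
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Compute ROUGE-1 precision, recall and F1 from unigram overlap."""
    cand = Counter(re.findall(r"\w+", candidate.lower()))
    ref = Counter(re.findall(r"\w+", reference.lower()))

    # Counter intersection keeps the minimum count per word, so a word
    # repeated in the candidate cannot be credited more than it appears
    # in the reference.
    overlap = sum((cand & ref).values())

    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, scoring "the cat sat" against "the cat sat on the mat" gives precision 1.0 and recall 0.5, because every candidate word appears in the reference but only half of the reference words are covered.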

3 Models Tested

20 Benchmark Prompts

4GB VRAM Target

Tools & Technologies

AI Protocols & Frameworks

MCP (Model Context Protocol)

A2A (Agent-to-Agent Protocol)

Google ADK (Agent Development Kit)

Gemini 2.5 Pro

Anthropic Claude API

Ollama — local LLM runtime

Backend & APIs

Python 3.12+

FastAPI + uvicorn

Server-Sent Events (SSE)

REST API design

IMAP / SMTP

Async Python (asyncio)

Infrastructure & Tooling

Docker & Docker Compose

uv — Python package manager

Streamlit dashboards

Git / GitHub

Linux / Bash scripting

Evaluation & Research

LLM Benchmarking methodology

ROUGE-1 scoring

TTFT & throughput measurement

Category-aware quality scoring

Small language model research
