The Agent-to-Agent (A2A) protocol is Google's open standard for agent interoperability — the same idea as MCP but for agent-to-agent communication rather than agent-to-tool. Each agent exposes an HTTP endpoint that speaks A2A's JSON envelope, so any orchestrator can discover, invoke, and stream results from any agent without knowing its internals.
The Problem: Single-Agent Course Generation Sucks
The obvious approach — one LLM call, one prompt, one output — produces courses that are either shallow (the model hallucinates confidently) or bloated (it pads the word count with filler). The root issue is that a single agent can't reliably critique its own work. What you need is a separate judge that evaluates research quality before any content is written.
The Four-Agent Architecture
┌─────────────────────────────────────────────────────┐
│                Frontend (port 8000)                 │
│             React-like UI · app/main.py             │
└─────────────────────┬───────────────────────────────┘
                      │ A2A
┌─────────────────────▼───────────────────────────────┐
│              Orchestrator (port 8004)               │
│        SequentialAgent + LoopAgent pipeline         │
└────┬──────────────────┬──────────────────┬──────────┘
     │ A2A              │ A2A              │ A2A
┌────▼──────┐    ┌──────▼──────┐    ┌──────▼──────────┐
│ Researcher│    │    Judge    │    │ Content Builder │
│ port 8001 │    │  port 8002  │    │    port 8003    │
│           │    │             │    │                 │
│ Searches  │    │ Evaluates   │    │ Transforms      │
│ the web   │    │ research    │    │ findings into   │
│ via Gemini│    │ quality     │    │ a structured    │
│           │    │ (pass/fail) │    │ course          │
└───────────┘    └─────────────┘    └─────────────────┘
The Quality Loop
The Orchestrator doesn't hand research directly to the Content Builder; it first routes it through the Judge. If the Judge fails the brief, the Orchestrator sends the failure feedback back to the Researcher with instructions to address it, retrying up to three times. Only a passing brief proceeds to content generation.
from collections.abc import AsyncIterator

async def generate_course(topic: str) -> AsyncIterator[str]:
    feedback = None
    for attempt in range(3):
        research = await researcher.research(topic, feedback=feedback)
        verdict = await judge.evaluate(research)
        if verdict.passed:
            break
        feedback = verdict.feedback  # loop: researcher retries with critique
    else:
        # All three attempts failed the Judge; don't build from a failing brief.
        raise RuntimeError(f"research for {topic!r} failed quality review")
    async for chunk in builder.stream_course(research):
        yield chunk  # SSE stream to browser
Each Agent Is a Microservice
In the A2A model every agent is an independent process — typically a FastAPI server — that advertises its capabilities via an agent card at /.well-known/agent.json. The Orchestrator discovers each agent at startup by fetching those cards.
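This project's exact card contents aren't shown here, but a card for the Researcher might look roughly like this (every field value below is an assumption, not the project's actual card):

{
  "name": "researcher",
  "description": "Produces a structured research brief for a given topic",
  "url": "http://researcher:8001",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "research",
      "name": "Topic research",
      "description": "Searches the web and returns a structured brief"
    }
  ]
}

The Researcher itself is then just a model, a tool, and an instruction: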
from google.adk import Agent, AgentServer
from google.adk.tools import google_search

researcher = Agent(
    name="researcher",
    model="gemini-2.5-pro",
    tools=[google_search],
    instruction="""
    You produce a structured research brief for a given topic.
    Use Google Search to find current, accurate information.
    Format output as: Overview, Key Concepts, Practical Examples, Further Reading.
    If you receive feedback from a previous attempt, address every point raised.
    """,
)

AgentServer(researcher).serve(port=8001)
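On the Orchestrator side, discovery amounts to one HTTP GET per agent at startup. A minimal sketch of what that could look like (the service hostnames and the discover_agents helper are my own illustration, not the project's code):

import httpx

# Assumed internal hostnames; they match the Docker service names used later.
AGENT_URLS = {
    "researcher": "http://researcher:8001",
    "judge": "http://judge:8002",
    "builder": "http://builder:8003",
}

async def discover_agents() -> dict[str, dict]:
    """Fetch each agent's card from /.well-known/agent.json at startup."""
    cards = {}
    async with httpx.AsyncClient() as client:
        for name, base_url in AGENT_URLS.items():
            resp = await client.get(f"{base_url}/.well-known/agent.json")
            resp.raise_for_status()
            cards[name] = resp.json()
    return cards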
Prompt Engineering the Judge
The Judge is the hardest agent to get right. A Judge that's too strict burns all three retries on every run; one that's too lenient lets shallow research slip through. The prompt uses a rubric with explicit pass criteria to keep it consistent:
You are a rigorous quality judge for AI-generated research briefs.
Evaluate the brief on these dimensions (score 0–10 each):
- Depth: Does it go beyond surface-level facts?
- Accuracy: Are claims specific and verifiable?
- Coverage: Does it address the core concepts a learner needs?
- Structure: Is it well-organised with clear sections?
PASS if all dimensions score ≥ 7.
FAIL otherwise.
Return JSON: { "passed": bool, "scores": {...}, "feedback": "..." }
The feedback field must cite specific gaps, not generic advice.
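Because the verdict gates the whole pipeline, the Orchestrator should validate the Judge's JSON rather than trusting it blindly. A small sketch using Pydantic (the Verdict model here is illustrative, not the project's actual schema):

from pydantic import BaseModel

class Verdict(BaseModel):
    passed: bool
    scores: dict[str, int]  # one 0-10 score per rubric dimension
    feedback: str

def parse_verdict(raw: str) -> Verdict:
    # Raises pydantic.ValidationError if the model drifted off-format,
    # which the Orchestrator can treat as a failed evaluation and retry.
    return Verdict.model_validate_json(raw)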
SSE Streaming
Rather than waiting for the full course to finish generating (which can take 30–60 seconds for a long topic), the Orchestrator streams chunks directly to the browser as they arrive from the Content Builder. This uses Server-Sent Events — a much lighter protocol than WebSockets for one-way streaming.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class TopicRequest(BaseModel):
    topic: str

@app.post("/generate")
async def generate(req: TopicRequest):
    async def event_stream():
        async for chunk in pipeline.generate_course(req.topic):
            yield f"data: {chunk}\n\n"  # SSE frame: "data: ..." plus a blank line
    return StreamingResponse(event_stream(), media_type="text/event-stream")
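One caveat: the browser's native EventSource API only issues GET requests, so a POST-based SSE endpoint like this one has to be consumed with fetch and a stream reader. The same consumption pattern in Python, as a sketch (the URL and payload shape are assumptions):

import httpx

async def consume_course_stream(topic: str) -> None:
    # Read the SSE stream line-by-line as the Orchestrator emits it.
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8004/generate", json={"topic": topic}
        ) as resp:
            async for line in resp.aiter_lines():
                if line.startswith("data: "):
                    print(line[len("data: "):])  # render each chunk on arrival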
Docker Compose: 5 Services
The system runs as 5 Docker services — one per agent plus the frontend. All inter-agent traffic stays on an internal Docker network; only the Orchestrator and frontend are exposed externally.
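A sketch of what the compose file could look like; the service names, build contexts, and environment variables are assumptions, not the project's actual file:

services:
  frontend:
    build: ./frontend
    ports:
      - "8000:8000"              # exposed to the host
    depends_on: [orchestrator]
  orchestrator:
    build: ./orchestrator
    ports:
      - "8004:8004"              # exposed to the host
    environment:
      RESEARCHER_URL: http://researcher:8001
      JUDGE_URL: http://judge:8002
      BUILDER_URL: http://builder:8003
  researcher:
    build: ./researcher          # no ports: reachable only on the internal network
  judge:
    build: ./judge
  builder:
    build: ./builder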
uv is used as the Python package manager for all services. It resolves and installs the full dependency tree in under 3 seconds per image — a huge improvement over pip for fast iteration on multi-container setups.
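A representative per-service Dockerfile under that setup, as a sketch (paths and entrypoint are assumptions):

FROM python:3.12-slim
# Copy the uv binary from the official image rather than pip-installing it.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev  # install the locked dependency tree
COPY . .
CMD ["uv", "run", "python", "main.py"]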
What I'd Improve
- Add a separate Fact-Checker agent that runs in parallel with the Researcher
- Add a streaming Judge — evaluate research sections incrementally rather than waiting for the full brief
- Persist completed courses to a lightweight SQLite store so repeat topics skip the pipeline
- Expose the Judge's rubric scores in the UI so users understand why retries happened