The Agent-to-Agent (A2A) protocol is Google's open standard for agent interoperability — the same idea as MCP but for agent-to-agent communication rather than agent-to-tool. Each agent exposes an HTTP endpoint that speaks A2A's JSON envelope, so any orchestrator can discover, invoke, and stream results from any agent without knowing its internals.
The Problem: Single-Agent Course Generation Sucks
The obvious approach — one LLM call, one prompt, one output — produces courses that are either shallow (the model hallucinates confidently) or bloated (it pads the word count with filler). The root issue is that a single agent can't reliably critique its own work. What you need is a separate judge that evaluates research quality before any content is written.
The Four-Agent Architecture
┌─────────────────────────────────────────────────────┐
│                Frontend (port 8000)                 │
│             React-like UI · app/main.py             │
└─────────────────────┬───────────────────────────────┘
                      │ A2A
┌─────────────────────▼───────────────────────────────┐
│              Orchestrator (port 8004)               │
│        SequentialAgent + LoopAgent pipeline         │
└────┬──────────────────┬──────────────────┬──────────┘
     │ A2A              │ A2A              │ A2A
┌────▼──────┐    ┌──────▼──────┐    ┌──────▼──────────┐
│ Researcher│    │    Judge    │    │ Content Builder │
│ port 8001 │    │  port 8002  │    │    port 8003    │
│           │    │             │    │                 │
│ Searches  │    │ Evaluates   │    │ Transforms      │
│ the web   │    │ research    │    │ findings into   │
│ via Gemini│    │ quality     │    │ a structured    │
│           │    │ (pass/fail) │    │ course          │
└───────────┘    └─────────────┘    └─────────────────┘
The Quality Loop
The Orchestrator doesn't hand research directly to the Content Builder; it first routes it through the Judge. If the Judge fails the brief, the Orchestrator sends the failure feedback back to the Researcher with instructions to address it, retrying up to three times. Only a passing brief proceeds to content generation.
from collections.abc import AsyncIterator

async def generate_course(topic: str) -> AsyncIterator[str]:
    feedback = None
    for attempt in range(3):
        research = await researcher.research(topic, feedback=feedback)
        verdict = await judge.evaluate(research)
        if verdict.passed:
            break
        feedback = verdict.feedback  # loop: researcher retries with critique
    else:
        # All three attempts failed the Judge; don't build from a failing brief.
        raise RuntimeError(f"research for {topic!r} failed quality review")
    async for chunk in builder.stream_course(research):
        yield chunk  # SSE stream to browser
Each Agent Is a Microservice
In the A2A model every agent is an independent process — typically a FastAPI server — that advertises its capabilities via an agent card at /.well-known/agent.json. The Orchestrator discovers each agent at startup by fetching those cards.
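This project's exact card contents aren't shown here, but a card for the Researcher might look roughly like this (every field value below is an assumption, not the project's actual card):

{
  "name": "researcher",
  "description": "Produces a structured research brief for a given topic",
  "url": "http://researcher:8001",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "research",
      "name": "Topic research",
      "description": "Searches the web and returns a structured brief"
    }
  ]
}

The Researcher itself is then just a model, a tool, and an instruction: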
from google.adk import Agent, AgentServer
from google.adk.tools import google_search

researcher = Agent(
    name="researcher",
    model="gemini-2.5-pro",
    tools=[google_search],
    instruction="""
    You produce a structured research brief for a given topic.
    Use Google Search to find current, accurate information.
    Format output as: Overview, Key Concepts, Practical Examples, Further Reading.
    If you receive feedback from a previous attempt, address every point raised.
    """,
)

AgentServer(researcher).serve(port=8001)
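On the Orchestrator side, discovery amounts to one HTTP GET per agent at startup. A minimal sketch of what that could look like (the service hostnames and the discover_agents helper are my own illustration, not the project's code):

import httpx

# Assumed internal hostnames; they match the Docker service names used later.
AGENT_URLS = {
    "researcher": "http://researcher:8001",
    "judge": "http://judge:8002",
    "builder": "http://builder:8003",
}

async def discover_agents() -> dict[str, dict]:
    """Fetch each agent's card from /.well-known/agent.json at startup."""
    cards = {}
    async with httpx.AsyncClient() as client:
        for name, base_url in AGENT_URLS.items():
            resp = await client.get(f"{base_url}/.well-known/agent.json")
            resp.raise_for_status()
            cards[name] = resp.json()
    return cards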
Prompt Engineering the Judge
The Judge is the hardest agent to get right. A Judge that's too strict burns all three retries on every run; one that's too lenient lets shallow research slip through. The prompt uses a rubric with explicit pass criteria to keep it consistent:
You are a rigorous quality judge for AI-generated research briefs.
Evaluate the brief on these dimensions (score 0–10 each):
- Depth: Does it go beyond surface-level facts?
- Accuracy: Are claims specific and verifiable?
- Coverage: Does it address the core concepts a learner needs?
- Structure: Is it well-organised with clear sections?
PASS if all dimensions score ≥ 7.
FAIL otherwise.
Return JSON: { "passed": bool, "scores": {...}, "feedback": "..." }
The feedback field must cite specific gaps, not generic advice.
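Because the verdict gates the whole pipeline, the Orchestrator should validate the Judge's JSON rather than trusting it blindly. A small sketch using Pydantic (the Verdict model here is illustrative, not the project's actual schema):

from pydantic import BaseModel

class Verdict(BaseModel):
    passed: bool
    scores: dict[str, int]  # one 0-10 score per rubric dimension
    feedback: str

def parse_verdict(raw: str) -> Verdict:
    # Raises pydantic.ValidationError if the model drifted off-format,
    # which the Orchestrator can treat as a failed evaluation and retry.
    return Verdict.model_validate_json(raw)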
SSE Streaming
Rather than waiting for the full course to finish generating (which can take 30–60 seconds for a long topic), the Orchestrator streams chunks directly to the browser as they arrive from the Content Builder. This uses Server-Sent Events — a much lighter protocol than WebSockets for one-way streaming.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class TopicRequest(BaseModel):
    topic: str

@app.post("/generate")
async def generate(req: TopicRequest):
    async def event_stream():
        async for chunk in pipeline.generate_course(req.topic):
            yield f"data: {chunk}\n\n"  # SSE frame: "data: ..." plus a blank line
    return StreamingResponse(event_stream(), media_type="text/event-stream")
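One caveat: the browser's native EventSource API only issues GET requests, so a POST-based SSE endpoint like this one has to be consumed with fetch and a stream reader. The same consumption pattern in Python, as a sketch (the URL and payload shape are assumptions):

import httpx

async def consume_course_stream(topic: str) -> None:
    # Read the SSE stream line-by-line as the Orchestrator emits it.
    async with httpx.AsyncClient(timeout=None) as client:
        async with client.stream(
            "POST", "http://localhost:8004/generate", json={"topic": topic}
        ) as resp:
            async for line in resp.aiter_lines():
                if line.startswith("data: "):
                    print(line[len("data: "):])  # render each chunk on arrival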
Docker Compose: 5 Services
The system runs as 5 Docker services — one per agent plus the frontend. All inter-agent traffic stays on an internal Docker network; only the Orchestrator and frontend are exposed externally.
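A sketch of what the compose file could look like; the service names, build contexts, and environment variables are assumptions, not the project's actual file:

services:
  frontend:
    build: ./frontend
    ports:
      - "8000:8000"              # exposed to the host
    depends_on: [orchestrator]
  orchestrator:
    build: ./orchestrator
    ports:
      - "8004:8004"              # exposed to the host
    environment:
      RESEARCHER_URL: http://researcher:8001
      JUDGE_URL: http://judge:8002
      BUILDER_URL: http://builder:8003
  researcher:
    build: ./researcher          # no ports: reachable only on the internal network
  judge:
    build: ./judge
  builder:
    build: ./builder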
uv is used as the Python package manager for all services. It resolves and installs the full dependency tree in under 3 seconds per image — a huge improvement over pip for fast iteration on multi-container setups.
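A representative per-service Dockerfile under that setup, as a sketch (paths and entrypoint are assumptions):

FROM python:3.12-slim
# Copy the uv binary from the official image rather than pip-installing it.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev  # install the locked dependency tree
COPY . .
CMD ["uv", "run", "python", "main.py"]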
What I'd Improve
- Add a separate Fact-Checker agent that runs in parallel with the Researcher
- Add a streaming Judge — evaluate research sections incrementally rather than waiting for the full brief
- Persist completed courses to a lightweight SQLite store so repeat topics skip the pipeline
- Expose the Judge's rubric scores in the UI so users understand why retries happened