Multi-Agent Communication: A2A, MCP, and Message Buses
The architectural question that separates “I have an agent” from “I have an agent system” is how agents talk to each other. By 2026 the answer had crystallized into two protocols stacked on top of a familiar pattern:
- MCP (Model Context Protocol) — how an agent talks to tools. Standardized in late 2025; the de-facto vocabulary for tool surfaces.
- A2A (Agent-to-Agent Protocol) — how an agent talks to other agents. Capability-based, peer-to-peer, transport-agnostic.
- Message buses — the durable infrastructure underneath everything multi-agent that needs to survive process restarts.
This post is the map of how they fit, with the patterns that hold up in production.
MCP vs A2A in one paragraph
MCP connects a single agent to external tools, databases, and APIs. A2A enables multiple agents to communicate and delegate tasks to each other. They are not competitors; most production deployments use both. The pithy framing from the 2026 protocol surveys: MCP is the universal adapter for tools; A2A is the standard for structured, secure communication between autonomous agents.
In the diagram, solid arrows are direct calls and dashed arrows are responses (A2A is bidirectional). The dashed box at the bottom is the optional layer underneath: for long-running multi-agent workflows, you do not let two agents talk over an ephemeral HTTP socket; you persist the message on a durable bus.
A2A: the protocol
A2A’s core idea is capability-based representation. Each agent publishes an Agent Card — a structured JSON document that describes:
- The agent’s identity (name, version, owner).
- The capabilities it offers (named methods with schemas).
- How to invoke them (transport URL, auth requirements).
- Policy metadata (cost hints, latency expectations).
An Agent Card is the equivalent of an OpenAPI spec for an agent. A consumer reads the card, decides what to call, and invokes the capability. The transport in 2026’s reference implementation is HTTP plus Server-Sent Events for streaming, with WebSockets when full-duplex communication is needed.
A minimal Agent Card:
{
  "agent_id": "writer-v3",
  "version": "3.2.1",
  "owner": "content-platform",
  "capabilities": [
    {
      "name": "draft_article",
      "description": "Write a 1200-word article from a research brief.",
      "input_schema": {
        "type": "object",
        "properties": {
          "brief": { "type": "string" },
          "style": { "type": "string", "enum": ["news", "explainer", "opinion"] }
        },
        "required": ["brief"]
      },
      "output_schema": { "type": "object", "properties": { "article": { "type": "string" } } }
    }
  ],
  "invocation": {
    "url": "https://agents.acme.internal/writer-v3",
    "auth": { "type": "oauth2", "scope": "agent:invoke" }
  }
}
A delegating agent reads the card, builds a request matching input_schema, calls the URL with the required auth, and reads back the response. Streaming intermediate updates flow over SSE.
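The read-validate-invoke flow can be sketched in a few lines of Python. This is a minimal illustration, not the A2A wire format: the request envelope returned by `build_request`, the helper names, and the hand-rolled validator (which covers only the JSON Schema subset used in the card above) are all assumptions made for the example.

```python
from typing import Any

# Trimmed copy of the Agent Card above; only the fields this sketch reads.
CARD: dict[str, Any] = {
    "agent_id": "writer-v3",
    "capabilities": [
        {
            "name": "draft_article",
            "input_schema": {
                "type": "object",
                "properties": {
                    "brief": {"type": "string"},
                    "style": {"type": "string", "enum": ["news", "explainer", "opinion"]},
                },
                "required": ["brief"],
            },
        }
    ],
    "invocation": {"url": "https://agents.acme.internal/writer-v3"},
}


def find_capability(card: dict[str, Any], name: str) -> dict[str, Any]:
    """Return the named capability from a card, or raise KeyError."""
    for capability in card["capabilities"]:
        if capability["name"] == name:
            return capability
    raise KeyError(f"{card['agent_id']} does not offer {name!r}")


def validate_input(schema: dict[str, Any], payload: dict[str, Any]) -> list[str]:
    """Check required keys, string types, and enums; return a list of problems."""
    problems = []
    for key in schema.get("required", []):
        if key not in payload:
            problems.append(f"missing required field: {key}")
    for key, value in payload.items():
        rules = schema.get("properties", {}).get(key)
        if rules is None:
            continue  # JSON Schema allows extra properties by default
        if rules.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} must be a string")
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{key} must be one of {rules['enum']}")
    return problems


def build_request(card: dict[str, Any], capability_name: str, payload: dict[str, Any]) -> dict[str, Any]:
    """Validate the payload against the card, then return a (hypothetical) request envelope."""
    capability = find_capability(card, capability_name)
    problems = validate_input(capability["input_schema"], payload)
    if problems:
        raise ValueError("; ".join(problems))
    return {"url": card["invocation"]["url"], "capability": capability_name, "input": payload}
```

Validating on the caller's side before the network hop means a malformed delegation fails locally with a readable message instead of as a remote error. In a real deployment you would likely swap the hand-rolled checks for a full JSON Schema validator.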
What A2A gives you that ad-hoc agent-calls-agent doesn’t:
- Discovery. An agent registry lists cards; agents find each other at runtime.
- Versioning. Cards include version; consumers pin or accept ranges.
- Auth surface. OAuth scopes are part of the card; you can’t accidentally call a capability you’re not authorized for.
- Streaming. Long-running delegations stream progress instead of blocking.
- Cross-framework portability. A LangGraph agent can call a CrewAI agent through A2A without either side knowing or caring.
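Discovery and versioning from the list above can be illustrated with a toy in-memory registry. The `AgentRegistry` and `CardSummary` names, and the rule "same major version, at or above the requested floor", are assumptions for this sketch; a real registry would be a service and might use a different range syntax.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CardSummary:
    """The handful of Agent Card fields a registry lookup needs (hypothetical shape)."""
    agent_id: str
    version: str
    capabilities: tuple[str, ...]
    url: str


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


class AgentRegistry:
    """In-memory stand-in for an agent registry."""

    def __init__(self) -> None:
        self._cards: list[CardSummary] = []

    def register(self, card: CardSummary) -> None:
        self._cards.append(card)

    def find(self, capability: str, min_version: str) -> CardSummary | None:
        """Return the newest card that offers `capability`, shares the major
        version of `min_version`, and is at least `min_version`."""
        floor = parse_version(min_version)
        candidates = [
            card
            for card in self._cards
            if capability in card.capabilities
            and parse_version(card.version)[0] == floor[0]
            and parse_version(card.version) >= floor
        ]
        return max(candidates, key=lambda card: parse_version(card.version), default=None)
```

A consumer that pins `min_version="3.2.0"` picks up a later 3.x release automatically but never a 4.x release, which is one reasonable reading of "pin or accept ranges".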
When to use A2A vs subgraphs
The decision table, in practice:
| Situation | Use |
|---|---|
| Same process, same framework | LangGraph subgraph / CrewAI sub-crew / ADK sub-agent |
| Same team, separate processes | A2A or internal RPC |
| Different teams, same org | A2A with auth, internal registry |
| External vendor agents | A2A with strict input validation |
The mistake is using A2A when a subgraph would do. Network calls, even on a local mesh, add latency and failure modes. If the “other agent” is in the same repo and run by the same team, make it a node in your graph and skip the protocol overhead.
The complementary mistake is using a subgraph when A2A is needed. If the other agent is owned by another team, has its own deployment lifecycle, or might be in a different framework, you want the protocol boundary. The transport is a feature, not a tax.
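For teams that want the rule written down next to the code, the table can be encoded as a small function. The `Deployment` fields and the returned strings are assumptions chosen to mirror the table rows; combinations the table does not list (for example, same process but a different framework) fall through to the team and org checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """How the other agent relates to yours (hypothetical fields)."""
    same_process: bool
    same_framework: bool
    same_team: bool
    same_org: bool


def choose_boundary(other: Deployment) -> str:
    """Map a deployment relationship to the integration style from the table."""
    if other.same_process and other.same_framework:
        return "subgraph"
    if other.same_team:
        return "A2A or internal RPC"
    if other.same_org:
        return "A2A with auth and an internal registry"
    return "A2A with strict input validation"
```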
The durable bus underneath
For long-running multi-agent systems — anything that crosses minutes or sessions — synchronous A2A calls are not enough. The pattern that holds up:
producer agent → durable message bus → consumer agent
                          ↓
              dead-letter queue + retry
The bus is Kafka, Redis Streams, NATS, AWS SQS/SNS, or GCP Pub/Sub — pick one. The bus owns three responsibilities the protocol doesn’t:
- Persistence. A message survives the producer crashing before the consumer reads it.
- Replay. When a consumer fails after acknowledging a message, you can re-deliver it from the bus’s retained log.
- Backpressure. Consumers that are slow don’t take down producers.
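To make the retry and dead-letter behavior concrete, here is a toy in-memory bus. Nothing in it is durable, and it does not model replay or backpressure; it only illustrates the delivery semantics you would get from a real bus. The `InMemoryBus` class and its method names are invented for this sketch.

```python
from collections import deque
from typing import Callable


class InMemoryBus:
    """Toy bus: at-least-once delivery, bounded retries, and a dead-letter queue."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.queue = deque()    # holds (message, attempts so far) pairs
        self.dead_letters = []  # messages that exhausted their retries

    def publish(self, message: dict) -> None:
        self.queue.append((message, 0))

    def deliver_all(self, handler: Callable[[dict], None]) -> None:
        """Drain the queue, retrying failed messages until they hit the attempt cap."""
        while self.queue:
            message, attempts = self.queue.popleft()
            try:
                handler(message)  # returning normally counts as the acknowledgement
            except Exception:
                attempts += 1
                if attempts >= self.max_attempts:
                    self.dead_letters.append(message)  # park it for inspection
                else:
                    self.queue.append((message, attempts))  # retry later
```

Because a message can be delivered more than once, consumer handlers should be idempotent, for example by de-duplicating on `trace_id`.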
A LangGraph agent that emits work for another agent looks like:
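# Assumptions: `State` is the graph's state type and `bus` is an async client
# for the chosen message bus, exposing `publish(topic, payload)`.
async def dispatch_to_writer(state: State):
    payload = {
        "trace_id": state["trace_id"],  # lets the caller match the result later
        "brief": state["brief"],
        "style": "explainer",
        "callback_topic": "agent.writer.results",  # where the worker replies
    }
    await bus.publish("agent.writer.requests", payload)
    # Record the in-flight request so the graph can resume on the callback.
    return {"pending_writer_request": payload}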
The consumer is a worker (its own LangGraph app, or another framework’s runtime) subscribed to agent.writer.requests. It picks up the message, calls draft_article, and publishes the result to agent.writer.results. The original agent listens on the callback topic and resumes when the result arrives — using LangGraph’s Command(resume=...) pattern from post 4, or the equivalent in other frameworks.
This shape eats latency but survives outages. Choose it for workflows where the user is not waiting in real time. Don’t use it for foreground chat where every 200ms matters.
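The request/callback round trip can be sketched without any framework, using `asyncio.Queue` objects as stand-ins for the two topics. Everything here is illustrative: `CallbackRouter`, `writer_worker`, and the toy `draft_article` are hypothetical names, and a real system would use the bus client and the framework's own resume mechanism instead.

```python
import asyncio


async def draft_article(brief: str, style: str) -> str:
    """Hypothetical stand-in for the writer capability."""
    return f"[{style}] article about {brief}"


async def writer_worker(requests: asyncio.Queue, results: asyncio.Queue) -> None:
    """Consume requests forever, publishing each result tagged with its trace_id."""
    while True:
        message = await requests.get()
        article = await draft_article(message["brief"], message["style"])
        await results.put({"trace_id": message["trace_id"], "article": article})
        requests.task_done()


class CallbackRouter:
    """Match results arriving on the callback topic to the callers awaiting them."""

    def __init__(self) -> None:
        self._pending = {}  # trace_id -> Future awaiting that result

    def expect(self, trace_id: str) -> asyncio.Future:
        future = asyncio.get_running_loop().create_future()
        self._pending[trace_id] = future
        return future

    def resolve(self, message: dict) -> bool:
        """Complete the matching future; return False for unknown or stale results."""
        future = self._pending.pop(message["trace_id"], None)
        if future is None or future.done():
            return False
        future.set_result(message)
        return True


async def demo() -> dict:
    requests, results = asyncio.Queue(), asyncio.Queue()
    router = CallbackRouter()
    worker = asyncio.create_task(writer_worker(requests, results))
    waiting = router.expect("trace-42")
    await requests.put({"trace_id": "trace-42", "brief": "agent protocols", "style": "explainer"})
    router.resolve(await results.get())  # the callback-topic listener, inlined
    worker.cancel()
    return await asyncio.wait_for(waiting, timeout=1.0)
```

Running `asyncio.run(demo())` returns the result dict for `trace-42`. Keying pending work by `trace_id` is what lets one listener serve many in-flight delegations.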
Orchestration patterns
Three multi-agent topologies dominate:
Supervisor / workers
One supervisor agent delegates to N worker agents. The supervisor decides which worker runs and merges results. CrewAI’s hierarchical process, LangGraph’s standard supervisor pattern, and ADK’s LLM-routed agents are all variants.
Use when: tasks vary in shape and a router is genuinely needed.
Pipeline
Agents are arranged in a fixed sequence. Each agent’s output is the next agent’s input. CrewAI’s sequential process, ADK’s SequentialAgent, and LangGraph’s straight edges are variants.
Use when: the steps are stable and the value is in specialization, not routing.
Mesh / market
Agents publish what they need; matching agents bid or auto-pick the work. Rare in production today (too much variance), common in research. Some early enterprise mesh deployments exist, mostly behind the scenes at large platforms.
Use when: you have many independent agents with overlapping capabilities and want load balancing. Otherwise, supervisor or pipeline.
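The difference between the first two topologies is easiest to see in code. The sketch below treats an agent as a plain text-in, text-out function and uses a keyword router as a stand-in for an LLM-based one; all names are hypothetical and no framework is involved.

```python
from typing import Callable

Agent = Callable[[str], str]  # assumption: an agent is text in, text out


def run_pipeline(stages: list[Agent], task: str) -> str:
    """Pipeline: each agent's output becomes the next agent's input."""
    result = task
    for stage in stages:
        result = stage(result)
    return result


def run_supervisor(
    route: Callable[[str], list[str]],
    workers: dict[str, Agent],
    task: str,
) -> str:
    """Supervisor: pick workers for the task, run them, and merge the outputs."""
    chosen = route(task)
    outputs = [workers[name](task) for name in chosen]
    return "\n".join(outputs)


# Toy agents standing in for LLM-backed ones.
def researcher(text: str) -> str:
    return f"notes({text})"


def writer(text: str) -> str:
    return f"draft({text})"


def keyword_router(task: str) -> list[str]:
    """Stand-in for an LLM router: choose workers by keyword, default to the writer."""
    chosen = []
    if "research" in task:
        chosen.append("researcher")
    if "write" in task:
        chosen.append("writer")
    return chosen or ["writer"]
```

In the pipeline the control flow is fixed at build time; in the supervisor it is decided per task, which is where the extra cost and variance come from.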
Failure modes specific to multi-agent
The new ones beyond single-agent debugging:
- Cascading failures. Agent A timed out, Agent B retried, Agent C is now flooded with stale requests. Add per-edge timeouts and circuit breakers.
- Identity confusion. Two agents using the same thread_id for unrelated work clobber each other’s state. Make thread_ids scoped (agent:writer:thread:42).
- Authorization drift. Agent A can read what Agent B can’t, but B is asking A to read it. The authorization boundary is the agent’s identity, not the user’s. Plumb actor context through A2A calls.
- Trace fragmentation. Each agent has its own trace tree; the cross-agent flow is invisible. Use OpenTelemetry context propagation across A2A calls so the parent trace stitches sub-traces together.
- Coordination overhead exceeds value. Two agents talking to each other can be slower than one agent doing both jobs. Measure before adopting multi-agent for its own sake.
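The per-edge timeout and circuit breaker suggested for cascading failures can be sketched as one small wrapper around each outbound call. This is a simplified illustration: the `CircuitBreaker` class, its thresholds, and the injectable `clock` parameter are assumptions, and production breakers usually add metrics and finer-grained half-open handling.

```python
import asyncio
import time
from typing import Awaitable, Callable


class CircuitOpenError(RuntimeError):
    """Raised when the breaker refuses a call without attempting it."""


class CircuitBreaker:
    """Fail fast on an edge after repeated failures, then retry after a cooldown."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # time the breaker opened, or None while closed

    async def call(self, make_call: Callable[[], Awaitable], timeout: float):
        """Run one call with a timeout, tracking failures for this edge."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("edge is open; failing fast")
            self._opened_at = None  # cooldown elapsed: allow one trial call
        try:
            result = await asyncio.wait_for(make_call(), timeout)
        except Exception:  # includes timeouts; cancellation is not caught here
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the breaker fully
        return result
```

Keeping one breaker per edge (A to B, B to C) means a slow Agent B trips only the calls that depend on it, so Agent C is not flooded with retries for work that is already stale.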
What to build in 2026
If you’re starting a multi-agent system today, the defaults that don’t burn you:
- MCP for tools, A2A for inter-agent. Don’t invent your own protocols. The standards exist and the ecosystem rewards using them.
- Subgraphs first. Reach for A2A only when you need a protocol boundary — separate team, separate process, separate lifecycle.
- Durable bus for anything async. SQS, NATS, Redis Streams. Don’t ship async multi-agent on plain HTTP.
- Trace propagation from day one. OpenTelemetry context across A2A calls is the only way to debug what happened.
- Per-agent step and budget caps. A worker agent that loops forever takes down the supervisor’s budget too.
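A per-agent cap can be as small as a counter checked around each step. The sketch below assumes a worker is driven by a `step` callable that returns `(output, tokens_used, done)`; that contract, and the `Budget` and `run_worker` names, are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


class BudgetExceeded(RuntimeError):
    """Raised when a worker hits its step or token cap."""


@dataclass
class Budget:
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def begin_step(self) -> None:
        """Refuse to start a step beyond the cap."""
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step cap of {self.max_steps} reached")
        self.steps += 1

    def add_tokens(self, used: int) -> None:
        """Record spend and stop once it passes the cap."""
        self.tokens += used
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token cap of {self.max_tokens} exceeded")


def run_worker(step: Callable[[], tuple[str, int, bool]], budget: Budget) -> list[str]:
    """Drive a worker until it reports done or its budget runs out."""
    outputs = []
    done = False
    while not done:
        budget.begin_step()
        output, tokens_used, done = step()
        budget.add_tokens(tokens_used)
        outputs.append(output)
    return outputs
```

Giving each worker its own `Budget` instance, rather than sharing the supervisor's, is what keeps one looping worker from draining the whole run.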
Next week: RAG for agents — hybrid retrieval, the patterns that work, and how this is different from “RAG for chatbots.”