Multi-Agent Communication: A2A, MCP, and Message Buses

The architectural question that separates “I have an agent” from “I have an agent system” is how agents talk to each other. By 2026 the answer has crystallized into two protocols stacked on top of a familiar pattern:

MCP (Model Context Protocol) — how an agent talks to tools. Standardized in late 2025; the de-facto vocabulary for tool surfaces.
A2A (Agent-to-Agent Protocol) — how an agent talks to other agents. Capability-based, peer-to-peer, transport-agnostic.
Message buses — the durable infrastructure underneath everything multi-agent that needs to survive process restarts.

This post is the map of how they fit, with the patterns that hold up in production.

MCP vs A2A in one paragraph

MCP connects a single agent to external tools, databases, and APIs. A2A enables multiple agents to communicate and delegate tasks to each other. They are not competitors; most production deployments use both. The pithy framing from the 2026 protocol surveys: MCP is the universal adapter for tools; A2A is the standard for structured, secure communication between autonomous agents.

In the diagram, solid arrows are direct calls and dashed arrows are responses (A2A is bidirectional). The dashed box at the bottom is the optional layer underneath: for long-running multi-agent workflows, you do not let two agents talk over an ephemeral HTTP connection; you persist the message on a durable bus.

A2A: the protocol

A2A’s core idea is capability-based representation. Each agent publishes an Agent Card — a structured JSON document that describes:

The agent’s identity (name, version, owner).
The capabilities it offers (named methods with schemas).
How to invoke them (transport URL, auth requirements).
Policy metadata (cost hints, latency expectations).

An Agent Card is the equivalent of an OpenAPI spec for an agent. A consumer reads the card, decides what to call, and invokes the capability. The transport in 2026’s reference implementation is HTTP plus Server-Sent Events (SSE) for streaming, with WebSockets for full-duplex communication when needed.

A minimal Agent Card:

{
  "agent_id": "writer-v3",
  "version": "3.2.1",
  "owner": "content-platform",
  "capabilities": [
    {
      "name": "draft_article",
      "description": "Write a 1200-word article from a research brief.",
      "input_schema": {
        "type": "object",
        "properties": {
          "brief": { "type": "string" },
          "style": { "type": "string", "enum": ["news", "explainer", "opinion"] }
        },
        "required": ["brief"]
      },
      "output_schema": { "type": "object", "properties": { "article": { "type": "string" } } }
    }
  ],
  "invocation": {
    "url": "https://agents.acme.internal/writer-v3",
    "auth": { "type": "oauth2", "scope": "agent:invoke" }
  }
}

A delegating agent reads the card, builds a request matching input_schema, calls the URL with the required auth, and reads back the response. Streaming intermediate updates flow over SSE.
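The read, validate, invoke flow above can be sketched in a few lines. This is a minimal illustration, not the A2A SDK: `find_capability`, `validate_input`, and `build_request` are hypothetical helpers, the request body shape is an assumption, the validator only checks `required`, string types, and `enum` rather than full JSON Schema, and the HTTP call itself is left out.

```python
import json

# Trimmed copy of the card above; only the fields this sketch reads.
CARD = json.loads("""
{
  "agent_id": "writer-v3",
  "capabilities": [{
    "name": "draft_article",
    "input_schema": {
      "type": "object",
      "properties": {
        "brief": {"type": "string"},
        "style": {"type": "string", "enum": ["news", "explainer", "opinion"]}
      },
      "required": ["brief"]
    }
  }],
  "invocation": {
    "url": "https://agents.acme.internal/writer-v3",
    "auth": {"type": "oauth2", "scope": "agent:invoke"}
  }
}
""")


def find_capability(card: dict, name: str) -> dict:
    """Return the capability entry with the given name, or raise KeyError."""
    for cap in card["capabilities"]:
        if cap["name"] == name:
            return cap
    raise KeyError(f"{card['agent_id']} does not offer {name!r}")


def validate_input(schema: dict, payload: dict) -> list:
    """Check required keys, string types, and enums. Not a full JSON Schema validator."""
    errors = [f"missing required field: {key}"
              for key in schema.get("required", []) if key not in payload]
    for key, value in payload.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            continue  # JSON Schema allows extra properties by default
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")
    return errors


def build_request(card: dict, capability: str, payload: dict) -> dict:
    """Describe the call a delegating agent would make (nothing is sent here)."""
    errors = validate_input(find_capability(card, capability)["input_schema"], payload)
    if errors:
        raise ValueError("; ".join(errors))
    return {
        "url": card["invocation"]["url"],
        "required_scope": card["invocation"]["auth"]["scope"],
        "body": {"capability": capability, "input": payload},  # assumed body shape
    }


request = build_request(CARD, "draft_article", {"brief": "A2A in production", "style": "explainer"})
```

Validating on the caller side is a courtesy, not a security control: the receiving agent still has to validate and authorize every request itself.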

What A2A gives you that ad-hoc agent-calls-agent doesn’t:

Discovery. An agent registry lists cards; agents find each other at runtime.
Versioning. Cards include version; consumers pin or accept ranges.
Auth surface. OAuth scopes are declared in the card, so a caller knows up front what authorization a capability requires. The receiving agent still enforces it on every call.
Streaming. Long-running delegations stream progress instead of blocking.
Cross-framework portability. A LangGraph agent can call a CrewAI agent through A2A without either side knowing or caring.
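Discovery and version pinning from the list above can be sketched with an in-memory registry. Everything here is a stand-in for illustration: `Card`, `Registry`, and `find` are hypothetical names, and a real registry would be a service that stores full Agent Cards rather than this trimmed dataclass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    agent_id: str
    version: str          # "MAJOR.MINOR.PATCH"
    capabilities: tuple


def parse_version(text: str) -> tuple:
    """Turn "3.10.0" into (3, 10, 0) so versions compare numerically."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


class Registry:
    """In-memory stand-in for an agent registry (hypothetical)."""

    def __init__(self) -> None:
        self._cards = []

    def register(self, card: Card) -> None:
        self._cards.append(card)

    def find(self, capability: str, major: int, min_version: str = "0.0.0") -> Card:
        """Newest card offering `capability` within one major version, at or above `min_version`."""
        floor = parse_version(min_version)
        matches = [
            card for card in self._cards
            if capability in card.capabilities
            and parse_version(card.version)[0] == major
            and parse_version(card.version) >= floor
        ]
        if not matches:
            raise LookupError(f"no agent offers {capability!r} in {major}.x >= {min_version}")
        return max(matches, key=lambda card: parse_version(card.version))


registry = Registry()
registry.register(Card("writer-v3", "3.2.1", ("draft_article",)))
registry.register(Card("writer-v3", "3.10.0", ("draft_article", "summarize")))
registry.register(Card("writer-v4", "4.0.0", ("draft_article",)))

chosen = registry.find("draft_article", major=3, min_version="3.2.0")
print(chosen.version)  # 3.10.0 (compared numerically, so 3.10.0 > 3.2.1)
```

Pinning to a major version and accepting a range within it is one reasonable policy; a consumer that needs reproducibility can pin the exact version instead.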

When to use A2A vs subgraphs

The decision table, in practice:

Situation	Use
Same process, same framework	LangGraph subgraph / CrewAI sub-crew / ADK sub-agent
Same team, separate processes	A2A or internal RPC
Different teams, same org	A2A with auth, internal registry
External vendor agents	A2A with strict input validation

The mistake is using A2A when a subgraph would do. Network calls, even on a local mesh, add latency and failure modes. If the “other agent” is in the same repo and run by the same team, make it a node in your graph and skip the protocol overhead.

The complementary mistake is using a subgraph when A2A is needed. If the other agent is owned by another team, has its own deployment lifecycle, or might be in a different framework, you want the protocol boundary. The transport is a feature, not a tax.
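The table and the two mistakes above reduce to a short decision function. This is only a sketch of the rule of thumb: `Callee` and `choose_mechanism` are hypothetical names, and the four booleans are a simplification of what is really a judgment call.

```python
from dataclasses import dataclass


@dataclass
class Callee:
    """Facts about the agent you want to call, relative to the caller."""
    same_process: bool
    same_framework: bool
    same_team: bool
    same_org: bool


def choose_mechanism(callee: Callee) -> str:
    # In-process and same framework: no protocol boundary needed.
    if callee.same_process and callee.same_framework:
        return "subgraph"
    # Separate process (or framework) but the same team owns both sides.
    if callee.same_team:
        return "A2A or internal RPC"
    # Another team inside the organization.
    if callee.same_org:
        return "A2A with auth and an internal registry"
    # Anything external.
    return "A2A with strict input validation"


print(choose_mechanism(Callee(True, True, True, True)))      # subgraph
print(choose_mechanism(Callee(False, False, False, False)))  # A2A with strict input validation
```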

The durable bus underneath

For long-running multi-agent systems — anything that crosses minutes or sessions — synchronous A2A calls are not enough. The pattern that holds up:

producer agent → durable message bus → consumer agent
                                     ↓
                          dead-letter queue + retry

The bus is Kafka, Redis Streams, NATS JetStream (core NATS alone does not persist messages), AWS SQS/SNS, or GCP Pub/Sub; pick one. The bus owns three responsibilities the protocol doesn’t:

Persistence. A message survives the producer crashing before the consumer reads it.
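Replay. When a consumer fails before acknowledging a message, the bus redelivers it. Log-based buses such as Kafka can also re-read from an earlier offset to recover from a consumer that acknowledged and then failed.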
Backpressure. Consumers that are slow don’t take down producers.
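A toy, in-memory version makes the mechanics concrete. It is a sketch under loud assumptions: `InMemoryBus` is a hypothetical class, it handles one topic, and because it lives in process memory it does not actually persist anything. What it does show is acknowledge-after-success, redelivery on failure, a dead-letter queue after repeated failures, and a bounded queue as a crude form of backpressure.

```python
from collections import deque


class InMemoryBus:
    """Toy single-topic bus: at-least-once delivery with a dead-letter queue."""

    def __init__(self, max_attempts: int = 3, max_pending: int = 100) -> None:
        self.pending = deque()
        self.dead_letters = []
        self.max_attempts = max_attempts
        self.max_pending = max_pending

    def publish(self, payload: dict) -> bool:
        """Backpressure: refuse new work when the queue is full instead of growing forever."""
        if len(self.pending) >= self.max_pending:
            return False
        self.pending.append({"payload": payload, "attempts": 0})
        return True

    def deliver(self, handler) -> None:
        """Hand the oldest message to `handler`; remove it only once the handler returns."""
        if not self.pending:
            return
        message = self.pending[0]
        message["attempts"] += 1
        try:
            handler(message["payload"])
        except Exception:
            if message["attempts"] >= self.max_attempts:
                self.dead_letters.append(self.pending.popleft())
            return  # otherwise the message stays queued and is redelivered
        self.pending.popleft()  # acknowledged


bus = InMemoryBus(max_attempts=2)
bus.publish({"brief": "durable buses"})
calls = []


def flaky(payload: dict) -> None:
    """Fails on the first delivery, succeeds on the second."""
    calls.append(payload)
    if len(calls) == 1:
        raise RuntimeError("consumer crashed mid-processing")


bus.deliver(flaky)  # fails: the message stays on the queue
bus.deliver(flaky)  # succeeds: the message is acknowledged and removed
```

Because delivery is at-least-once, the handler can see the same message twice, so consumers need to be idempotent.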

A LangGraph agent that emits work for another agent looks like:

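from typing import TypedDict

class State(TypedDict, total=False):
    trace_id: str
    brief: str
    pending_writer_request: dict

# `bus` is assumed to be an async client for your message bus exposing publish(topic, payload).
async def dispatch_to_writer(state: State):
    payload = {
        "trace_id": state["trace_id"],
        "brief": state["brief"],
        "style": "explainer",
        "callback_topic": "agent.writer.results",  # where the worker publishes the result
    }
    await bus.publish("agent.writer.requests", payload)
    return {"pending_writer_request": payload}  # recorded in graph state for the later resume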

The consumer is a worker (its own LangGraph app, or another framework’s runtime) subscribed to agent.writer.requests. It picks up the message, calls draft_article, and publishes the result to agent.writer.results. The original agent listens on the callback topic and resumes when the result arrives — using LangGraph’s Command(resume=...) pattern from post 4, or the equivalent in other frameworks.

This shape eats latency but survives outages. Choose it for workflows where the user is not waiting in real time. Don’t use it for foreground chat where every 200ms matters.
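The consumer side of that round trip can be sketched as follows. It assumes an in-process `LocalBus` built on `asyncio.Queue` in place of a real bus client, and a placeholder `draft_article`; both are hypothetical stand-ins, and the topic names are the ones used above.

```python
import asyncio
from collections import defaultdict


class LocalBus:
    """In-process stand-in for a real bus client; each topic is an asyncio queue."""

    def __init__(self) -> None:
        self.topics = defaultdict(asyncio.Queue)

    async def publish(self, topic: str, payload: dict) -> None:
        await self.topics[topic].put(payload)

    async def receive(self, topic: str) -> dict:
        return await self.topics[topic].get()


async def draft_article(brief: str, style: str) -> str:
    """Placeholder for the writer agent's real capability."""
    return f"[{style}] article about {brief}"


async def writer_worker(bus: LocalBus) -> None:
    """Consume one request and publish the result to the caller's callback topic."""
    request = await bus.receive("agent.writer.requests")
    article = await draft_article(request["brief"], request["style"])
    await bus.publish(
        request["callback_topic"],
        {"trace_id": request["trace_id"], "article": article},
    )


async def main() -> dict:
    bus = LocalBus()
    await bus.publish("agent.writer.requests", {
        "trace_id": "t-1",
        "brief": "durable buses",
        "style": "explainer",
        "callback_topic": "agent.writer.results",
    })
    await writer_worker(bus)
    # The original agent would resume its graph with this message.
    return await bus.receive("agent.writer.results")


result = asyncio.run(main())
print(result["article"])  # [explainer] article about durable buses
```

Echoing the `trace_id` back in the result is what lets the original agent match the reply to the request it parked.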

Orchestration patterns

Three multi-agent topologies dominate:

Supervisor / workers

One supervisor agent delegates to N worker agents. The supervisor decides which worker runs and merges the results. CrewAI’s hierarchical process, LangGraph’s standard supervisor pattern, and ADK’s LLM-routed agents are all variants.

Use when: tasks vary in shape and a router is genuinely needed.
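Stripped of any framework, the supervisor shape looks roughly like this. The workers and the keyword-based `route` function are hypothetical placeholders; in a real system the routing decision is usually an LLM call and the workers are agents.

```python
def research_worker(task: str) -> str:
    return f"notes: {task}"


def writer_worker(task: str) -> str:
    return f"draft: {task}"


WORKERS = {"research": research_worker, "write": writer_worker}


def route(task: str) -> str:
    """Stand-in for the supervisor's routing decision (a keyword rule, not an LLM)."""
    return "research" if task.startswith("find") else "write"


def supervise(tasks: list) -> list:
    """Pick a worker per task, run it, and collect the results in order."""
    results = []
    for task in tasks:
        worker = WORKERS[route(task)]
        results.append(worker(task))
    return results


print(supervise(["find sources on A2A", "draft the intro"]))
# ['notes: find sources on A2A', 'draft: draft the intro']
```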

Pipeline

Agents arranged in a fixed sequence. Each agent’s output is the next agent’s input. CrewAI’s sequential process, ADK’s SequentialAgent, and LangGraph’s straight edges are variants.

Use when: the steps are stable and the value is in specialization, not routing.
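The pipeline shape is function composition over a fixed list of stages. The three stage functions below are hypothetical placeholders for agents; the point is only that there is no routing decision anywhere.

```python
from functools import reduce


def researcher(brief: str) -> str:
    return f"notes({brief})"


def writer(notes: str) -> str:
    return f"draft({notes})"


def editor(draft: str) -> str:
    return f"final({draft})"


def run_pipeline(stages: list, initial: str) -> str:
    """Feed each stage's output into the next stage, in order."""
    return reduce(lambda value, stage: stage(value), stages, initial)


print(run_pipeline([researcher, writer, editor], "A2A"))  # final(draft(notes(A2A)))
```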

Mesh / market

Agents publish what they need; matching agents bid or auto-pick the work. Rare in production today (too much variance), common in research. Some early enterprise mesh deployments exist, mostly behind the scenes at large platforms.

Use when: you have many independent agents with overlapping capabilities and want load balancing. Otherwise, supervisor or pipeline.
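A minimal sketch of the market idea, assuming the simplest possible bidding rule: each capable agent bids its current queue depth and the lowest bid wins. `Bidder` and `award` are hypothetical names, and real mesh deployments use richer signals than queue depth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bidder:
    name: str
    capabilities: set
    queue_depth: int = 0

    def bid(self, capability: str) -> Optional[int]:
        """Bid the current queue depth (lower wins), or decline with None."""
        return self.queue_depth if capability in self.capabilities else None


def award(capability: str, bidders: list) -> Bidder:
    """Give the work to the capable agent with the lowest bid."""
    bids = [(bidder.bid(capability), bidder) for bidder in bidders]
    bids = [(cost, bidder) for cost, bidder in bids if cost is not None]
    if not bids:
        raise LookupError(f"no agent can handle {capability!r}")
    _, winner = min(bids, key=lambda pair: pair[0])
    winner.queue_depth += 1  # the winner is now busier, so later bids shift elsewhere
    return winner


agents = [
    Bidder("writer-a", {"draft_article"}, queue_depth=2),
    Bidder("writer-b", {"draft_article"}, queue_depth=0),
    Bidder("indexer", {"build_index"}),
]
winners = [award("draft_article", agents).name for _ in range(3)]
print(winners)  # ['writer-b', 'writer-b', 'writer-a']
```

The third award goes to writer-a because both writers are tied at depth 2 by then and `min` keeps the first of equal bids.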

Failure modes specific to multi-agent

The new ones beyond single-agent debugging:

Cascading failures. Agent A timed out, Agent B retried, Agent C is now flooded with stale requests. Add per-edge timeouts and circuit breakers.
Identity confusion. Two agents using the same thread_id for unrelated work clobber each other’s state. Make thread_ids scoped (agent:writer:thread:42).
Authorization drift. Agent A can read what Agent B can’t, but B is asking A to read it. If authorization is checked only against the calling agent’s identity and not the user the work is done for, B gets through A what it could never read directly. Plumb actor context through A2A calls.
Trace fragmentation. Each agent has its own trace tree; the cross-agent flow is invisible. Use OpenTelemetry context propagation across A2A calls so the parent trace stitches sub-traces together.
Coordination overhead exceeds value. Two agents talking to each other can be slower than one agent doing both jobs. Measure before adopting multi-agent for its own sake.
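Two of the mitigations above are small enough to sketch. The circuit breaker below is a generic version of the pattern, not any framework’s API: `CircuitBreaker` and its parameters are hypothetical, and the injectable `clock` exists only to make it testable. `scoped_thread_id` follows the `agent:writer:thread:42` format from the list.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive failures; allow a trial call after `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: skipping the downstream agent")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit again
        return result


def scoped_thread_id(agent: str, thread: int) -> str:
    """Namespace thread IDs per agent so unrelated work cannot collide."""
    return f"agent:{agent}:thread:{thread}"


print(scoped_thread_id("writer", 42))  # agent:writer:thread:42
```

Wrap each outgoing edge (A to B, B to C) in its own breaker, so one slow agent stops receiving retries instead of passing the flood downstream.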

What to build in 2026

If you’re starting a multi-agent system today, the defaults that don’t burn you:

MCP for tools, A2A for inter-agent. Don’t invent your own protocols. The standards exist and the ecosystem rewards using them.
Subgraphs first. Reach for A2A only when you need a protocol boundary — separate team, separate process, separate lifecycle.
Durable bus for anything async. SQS, NATS JetStream, Redis Streams. Don’t ship async multi-agent on plain HTTP.
Trace propagation from day one. OpenTelemetry context across A2A calls is the only way to debug what happened.
Per-agent step and budget caps. A worker agent that loops forever takes down the supervisor’s budget too.
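To make the trace-propagation default concrete, here is what travels between agents, hand-rolled with the standard library. The header layout follows the W3C Trace Context `traceparent` format (version, trace ID, parent span ID, flags); the helper names are hypothetical, and in practice you would let the OpenTelemetry SDK’s propagators inject and extract this header rather than building it yourself.

```python
import secrets


def new_traceparent() -> str:
    """Start a trace: version 00, random 16-byte trace ID, random 8-byte span ID, sampled flag."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(parent: str) -> str:
    """Keep the trace ID and flags; mint a new span ID for the outgoing call."""
    version, trace_id, _parent_span, flags = parent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"


def outgoing_headers(incoming_headers: dict) -> dict:
    """Headers to attach to an A2A call made while handling `incoming_headers`."""
    parent = incoming_headers.get("traceparent") or new_traceparent()
    return {"traceparent": child_traceparent(parent)}


root = new_traceparent()
hop1 = outgoing_headers({"traceparent": root})   # supervisor calls worker
hop2 = outgoing_headers(hop1)                    # worker calls another agent
```

Every hop shares the same trace ID and gets its own span ID, which is what lets the tracing backend stitch per-agent trace trees into one.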

Next week: RAG for agents — hybrid retrieval, the patterns that work, and how this is different from “RAG for chatbots.”