Bridging the Gap: Understanding the Agent2Agent (A2A) Protocol for AI Interoperability

Overview of the A2A architecture

Bridging the Gap: Understanding the Agent2Agent (A2A) Protocol for AI Interoperability

The landscape of AI is rapidly evolving, with sophisticated agents capable of complex reasoning and task execution becoming commonplace. However, a significant challenge remains: enabling these often distinct and opaque agentic systems to communicate and collaborate effectively. Enter the Agent2Agent (A2A) Protocol, an open standard designed specifically to facilitate interoperability between these "black box" agents.

If you're an engineer working with Large Language Models (LLMs) or building agentic applications, understanding A2A can unlock new possibilities for creating more powerful, collaborative AI systems. This post dives into the core concepts, features, and design principles of A2A, drawing directly from its official documentation.

Why A2A? The Core Problem

Modern agents often operate as self-contained units. They might have their own internal state ("memory"), reasoning processes ("thoughts"), and access to specific tools. Sharing these internal workings directly can be complex, insecure, or simply undesirable. A2A addresses this by providing a standardized way for agents to interact without needing to expose their internal mechanisms.

Key Principles Guiding A2A

A2A is built on several key principles:

Simple: Leverages existing, widely adopted standards like HTTP and JSON-RPC 2.0.
Enterprise Ready: Incorporates considerations for Authentication, Security, Privacy, Tracing, and Monitoring from the outset.
Async First: Designed to handle tasks that might take seconds, minutes, hours, or even days, supporting long-running operations and human-in-the-loop scenarios naturally.
Modality Agnostic: Supports various data types beyond text, including audio, video, files, structured data (forms), etc.
Opaque Execution: Crucially, agents interact based on exchanged context, status, instructions, and data, not by sharing internal thoughts, plans, or tools directly.

The A2A Ecosystem: Actors and Transport

A2A interactions involve three primary actors:

User: The end-user (human or service) initiating a task.
Client: The agent, service, or application acting on behalf of the user, requesting an action from another agent.
Remote Agent (Server): The opaque agent receiving the request and performing the task (the A2A server).

Communication happens over HTTP(S), using JSON-RPC 2.0 as the payload format. For real-time updates, A2A supports Server-Sent Events (SSE) if both client and server enable this capability.

Discovering Agents: The Agent Card

Before a client can interact with a remote agent, it needs to know what the agent can do and how to talk to it. This is achieved through the Agent Card.

What it is: A JSON document published by the remote agent describing its:
- Basic identity (name, description, provider, version).
- Endpoint URL.
- Capabilities: Does it support streaming (streaming: true)? Push notifications (pushNotifications: true)? State history (stateTransitionHistory: true)?
- Authentication Requirements: How must clients authenticate? (e.g., OAuth2, Bearer token - aligned with OpenAPI standards).
- Default Modalities: What input/output MIME types does it generally support (e.g., text/plain, application/json)?
- Skills: Specific capabilities the agent offers, each with an ID, name, description, tags, examples, and potentially skill-specific input/output modes.
Discovery Mechanisms:
- Open Discovery: Recommended practice is hosting the card at a well-known path: https://<DOMAIN>/.well-known/agent.json.
- Curated Discovery: Enterprises might maintain registries or catalogs of approved agents.
- Private Discovery: Agents might be exposed via private APIs or marketplaces.
Security: Agent cards, especially if containing sensitive info like credential hints, should be protected. Standard mechanisms like mTLS or requiring authentication to access the card are recommended, particularly in enterprise settings.

The Core of Communication: Tasks, Artifacts, Messages, Parts

A2A communication revolves around Tasks.

Task:
- Represents the unit of work being performed. Created by the Client, status managed by the Remote Agent.
- Has a unique id and an optional sessionId (client-generated) to group related tasks.
- Maintains a status (e.g., submitted, working, input-required, completed, canceled, failed).
- Can contain artifacts (results) and optionally a history of messages.
- TaskStatus includes the state, an optional explanatory message, and a timestamp.
Artifact:
- Represents the output or result generated by the agent for a task (e.g., a report, an image, structured data).
- Immutable, can be named, and composed of one or more parts.
- Streaming responses can append parts to existing artifacts.
Message:
- Contains any content other than final results/artifacts. Used for:
  - Client requests/instructions.
  - Agent status updates, requests for information, or intermediate thoughts (though sharing thoughts is not required).
  - Errors.
- Has a role (user or agent) and consists of one or more parts.
Part:
- A discrete piece of content within a Message or Artifact.
- Has a type (text, file, data) and associated content (e.g., text string, file details like name/MIME type/URI/bytes, or structured data object).
- Can include metadata, potentially specifying schemas for structured data.

Key Communication Patterns & Methods

A2A defines standard JSON-RPC methods for common interactions:

tasks/send: The fundamental method for a Client to send a Message to initiate or continue a Task. Returns the current Task state, potentially including immediate results (Artifacts) if the task completes quickly.
tasks/get: Allows a Client to poll for the latest status and Artifacts of a specific Task. Can optionally request recent message history.
tasks/cancel: Allows a Client to request cancellation of an ongoing Task.
Multi-turn Conversations: If an agent needs more input, it sets the Task status to input-required and includes a Message explaining what's needed (e.g., "Please provide the account number," potentially with a structured form definition in another Part). The Client then uses tasks/send again with the same Task id to provide the required information.
Streaming (tasks/sendSubscribe, tasks/resubscribe):
- For agents and clients supporting SSE, tasks/sendSubscribe initiates a task and opens a stream.
- The server sends TaskStatusUpdateEvent (for status changes) and TaskArtifactUpdateEvent (for streaming artifact parts) over SSE. Events indicate if they are final.
- tasks/resubscribe allows a disconnected client to reconnect to an existing task's stream (if supported and the task is still active).
Push Notifications (Disconnected Updates):
- Crucial for long-running tasks where maintaining a connection is impractical.
- The Client provides a PushNotificationConfig (URL, optional task-specific token, authentication details) either in the initial tasks/send or later using tasks/pushNotification/set.
- The Remote Agent sends the full Task object payload to the specified URL when a significant update occurs (e.g., task reaches a stopping state like completed or input-required).
- Security is paramount:
  - Agent Verification: Agents should verify the callback URL (e.g., via a GET challenge with a validationToken) to prevent DDOS attacks.
  - Receiver Verification: The notification receiver (which might be a dedicated service, not the client itself) MUST authenticate the incoming notification from the agent. Recommended methods include:
    - Asymmetric Keys (e.g., JWT/JWKS): Agent signs with private key, receiver verifies with public key (published via JWKS or pre-shared). Recommended for better security.
    - Symmetric Keys: Shared secret used for signing/verification (e.g., HMAC in JWT).
    - OAuth: Agent obtains a token and includes it; receiver verifies with the OAuth provider.
    - Bearer Token: Simple token provided by the receiver to the agent. Less secure as the token itself can be leaked.
  - Replay Prevention: Use timestamps (e.g., iat in JWT) included in the signed payload; reject old notifications.
  - Key Rotation: Use protocols like JWKS to manage key rotation without downtime.
- tasks/pushNotification/get: Allows retrieval of the currently set configuration.

Enterprise Readiness: Security, Auth, Tracing

A2A is designed to fit within enterprise ecosystems:

Transport Security: HTTPS/TLS is expected for production.
Authentication:
- Relies on standard HTTP authentication mechanisms (defined in the Agent Card, aligning with OpenAPI). Think OAuth 2.0, OIDC, API Keys, mTLS.
- Credentials (tokens) are passed in HTTP headers, not in A2A payloads. Identity negotiation happens out-of-band.
- Servers authenticate every request using standard HTTP responses (401/403) and headers (e.g., WWW-Authenticate).
Authorization: Servers should authorize requests based on client/user identity against:
- Skills: Granting access only to specific advertised capabilities.
- Tools/Data: Implementing finer-grained checks if the agent uses internal tools or accesses sensitive data.
Tracing & Observability: Leverage standard HTTP practices and tooling (e.g., OpenTelemetry headers) for distributed tracing and monitoring. A2A itself doesn't define tracing specifics.

A2A vs. MCP (Model Context Protocol)

It's important to understand A2A's relationship with protocols like MCP:

MCP: Primarily focuses on connecting models/agents to Tools and data sources (like function calling).
A2A: Focuses on Agent-to-Agent collaboration at a higher level, enabling conversation, task delegation, and status tracking between opaque agents.
Complementary: They solve different problems. The recommendation is often to use MCP for tool interactions and A2A for agent-level interactions. An agent might even be represented as an MCP resource, discoverable via its Agent Card.

Getting Started

The A2A protocol provides a robust framework for building interoperable agentic systems. By leveraging familiar standards and focusing on task completion between opaque entities, it offers a practical path towards more collaborative AI.

Explore the official A2A GitHub repository for the full specification, detailed examples (including sample Python code for agents and clients demonstrating concepts like push notifications with JWT/JWKS), and to provide feedback as the protocol evolves towards version 1.0.

By adopting standards like A2A, we can move towards a future where diverse AI agents can work together seamlessly, much like specialized teams collaborate in the human world.

Bridging the Gap: Understanding the Agent2Agent (A2A) Protocol for AI Interoperability

Bridging the Gap: Understanding the Agent2Agent (A2A) Protocol for AI Interoperability

You May Also Like

Making AI Agents Work Together: Understanding Google's A2A Protocol

Welcome to the MCPFly Blog