# A2A Protocol Interaction Specification
!!! note "Note"
    This document defines the extended implementation specification for the A2A protocol within the trpc-agent-go framework. Regular users do not need to read this document when using the A2A Client/Server — the framework automatically handles all protocol conversion details. You only need to refer to this specification when developing a non-trpc-agent-go A2A Client/Server that interoperates with this framework.
## Background
The A2A (Agent-to-Agent) protocol defines the basic data models (Message, Task, Part, etc.) and operation interfaces (SendMessage, StreamMessage, etc.) for inter-Agent communication. The A2A specification states its design goals at the very beginning:
> The Agent2Agent (A2A) Protocol is an open standard designed to facilitate communication and interoperability between independent, potentially opaque AI agent systems.
>
> Its primary goal is to enable agents to:
>
> - Discover each other's capabilities.
> - Negotiate interaction modalities (text, files, structured data).
> - Manage collaborative tasks.
> - Securely exchange information to achieve user goals without needing access to each other's internal state, memory, or tools.
As shown above, A2A envisions agents collaborating as "black boxes" — discovering capabilities, negotiating interaction modalities, and managing collaborative tasks, but without needing access to each other's Tools, Memory, or Internal State. Based on this design philosophy, trace data from an Agent's execution process — such as tool call chains (Function Call / Response), code execution steps, and model reasoning steps (Reasoning) — falls outside the scope of the A2A protocol, which does not define how such data should be transmitted.
However, in real-world multi-Agent orchestration scenarios, some users want to see the execution path of a remote Agent for debugging, auditing, or finer-grained coordination. To address this need, trpc-agent-go leverages the extension mechanisms reserved by the A2A protocol (DataPart, Message.metadata, etc.) to support the transmission of such data without violating the protocol specification.
This document defines the interaction specification of trpc-agent-go on top of the A2A protocol, serving as the standard reference for Client and Server implementations. This document will be updated as the A2A protocol evolves.
!!! info "Future Plans"
    To better align with the A2A specification's design philosophy, the cross-Agent transmission of execution process data such as tool calls will be designed as an independent extension, allowing users to decide whether to enable it through configuration. If you prefer to strictly follow A2A's black-box collaboration model without exposing internal execution details, simply disable it — only final results will be transmitted.
For the complete A2A protocol specification, see: https://a2a-protocol.org/latest/specification/
For the framework usage guide, see: A2A Integration Guide
## Version Identifier

Current interaction specification version: `0.1`
trpc-agent-go declares the interaction specification version through the standard A2A AgentCard.capabilities.extensions field, allowing the Client to detect the specification version supported by the remote Agent during discovery, enabling compatible handling during future upgrades.
This extension identifies the Agent interaction specification — the overall protocol-level conventions such as message encoding formats, streaming control, and metadata field definitions. The cross-Agent transmission of execution process data like tool call traces is an independent capability that will be declared and controlled through a separate extension in the future, and is not within the scope of this extension.
### Declaration in AgentCard

The AgentCard automatically generated by the A2A Server includes the following extension:

| Field | Description |
|---|---|
| `uri` | Extension identifier, fixed as `trpc-a2a-version` |
| `required` | Omitted (defaults to `false`), indicating this extension is declarative and does not require the Client to support it. Standard A2A Clients that do not recognize this extension can still perform basic interactions |
| `params.version` | Interaction specification version number, following semantic versioning (currently `0.1`) |
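For illustration, the declaration might appear in a generated AgentCard as follows (a sketch: the agent name and description are placeholders, and the surrounding structure follows the public A2A AgentCard schema):

```json
{
  "name": "example-agent",
  "capabilities": {
    "streaming": true,
    "extensions": [
      {
        "uri": "trpc-a2a-version",
        "description": "trpc-agent-go interaction specification version",
        "params": { "version": "0.1" }
      }
    ]
  }
}
```

Because `required` is omitted, a standard A2A Client can ignore this entry entirely and still interact with the Agent.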
### Version Declaration in Requests

When the A2A Client sends a request (`message/send` or `message/stream`), it includes the interaction specification version it supports in the Message metadata:

| Field | Description |
|---|---|
| `interaction_spec_version` | The interaction specification version supported by the Client; the Server can use this to determine the response encoding format |
The Server can determine the Client's capabilities based on this field:

- Field present: the Client is a trpc-agent-go client that supports the corresponding version of the interaction specification (e.g., tool call, reasoning content, and other extended encodings)
- Field absent: the Client is a standard A2A client or a client from another framework; the Server should respond using basic A2A protocol behavior
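For illustration, a minimal `message/send` request declaring the version might look like this (a sketch: the part field names follow the A2A JSON-RPC schema, and the text is a placeholder):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "parts": [{ "kind": "text", "text": "Hello" }],
      "metadata": { "interaction_spec_version": "0.1" }
    }
  }
}
```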
### Compatibility Strategy

- Major version change (e.g., `1.0` → `2.0`): indicates incompatible protocol changes; the Client should check the version and degrade or reject
- Minor version change (e.g., `1.0` → `1.1`): indicates backward-compatible extensions; the Client can safely ignore unknown fields
- When the Client does not detect this extension, it should fall back to basic A2A protocol behavior
- When the Server receives a request without `interaction_spec_version`, it should respond using basic A2A protocol behavior
## Overall Conversion Flow

```mermaid
sequenceDiagram
    participant C as A2AAgent (Client)
    participant S as A2A Server
    participant R as Agent Runner
    participant A as Agent
    Note over C,A: Non-streaming (message/send)
    C->>C: Invocation → protocol.Message (TextPart/FilePart)
    C->>S: HTTP POST JSON-RPC: message/send
    S->>S: protocol.Message → model.Message
    S->>R: runner.Run(userID, ctxID, message)
    R->>A: agent.Run(ctx, invocation)
    A-->>R: event.Event (ToolCall)
    A-->>R: event.Event (ToolResponse)
    A-->>R: event.Event (Content)
    R-->>S: <-chan event.Event
    S->>S: Convert each event → protocol.Message
    S->>S: Single message returns Message / multiple messages wrapped in Task
    S-->>C: JSON-RPC Response (Message or Task)
    C->>C: Message/Task → event.Event sequence
    Note over C,A: Streaming (message/stream)
    C->>C: Invocation → protocol.Message (TextPart/FilePart)
    C->>S: HTTP POST JSON-RPC: message/stream
    S->>S: protocol.Message → model.Message
    S-->>C: SSE: TaskStatusUpdateEvent (submitted)
    S->>R: runner.Run(userID, ctxID, message)
    R->>A: agent.Run(ctx, invocation)
    loop Agent produces events
        A-->>R: event.Event (Delta/ToolCall/...)
        R-->>S: event.Event
        S->>S: event → TaskArtifactUpdateEvent
        S-->>C: SSE: TaskArtifactUpdateEvent
        C->>C: ArtifactUpdate → event.Event
    end
    S-->>C: SSE: TaskArtifactUpdateEvent (lastChunk=true)
    S-->>C: SSE: TaskStatusUpdateEvent (completed)
```
SendMessage waits for the complete response before returning it all at once, while StreamMessage pushes incremental events in real time via SSE. The framework automatically handles format conversion on both ends.
## Event Types and A2A Mapping

| Agent Event | A2A Part Type | Part Metadata | Message Metadata |
|---|---|---|---|
| Text Reply | `TextPart` | — | `object_type: chat.completion` |
| Reasoning Content | `TextPart` | `thought: true` | `object_type: chat.completion` |
| Tool Call | `DataPart` (id/name/args) | `type: function_call` | `object_type: chat.completion` |
| Tool Response | `DataPart` (id/name/response) | `type: function_response` | `object_type: tool.response` |
| Executable Code | `DataPart` (code/language) | `type: executable_code` | `tag: code_execution_code` |
| Code Execution Result | `DataPart` (output/outcome) | `type: code_execution_result` | `tag: code_execution_result` |
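As an example of this mapping, a Tool Call event might be encoded as the following `DataPart` (a sketch: the `data` keys mirror the id/name/args fields from the table above, and all values are placeholders):

```json
{
  "kind": "data",
  "data": {
    "id": "call_001",
    "name": "get_weather",
    "args": { "city": "Beijing" }
  },
  "metadata": { "type": "function_call" }
}
```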
## Tool Call Transmission Flow

### Non-streaming (message/send)

The Server collects all Agent events and returns them at once. A single message is returned directly as `protocol.Message`; multiple messages are wrapped in `protocol.Task` (intermediate steps in `history`, final reply in `artifacts`).
```mermaid
sequenceDiagram
    participant C as A2AAgent (Client)
    participant S as A2A Server
    participant R as Agent Runner
    participant A as Agent + LLM
    C->>C: Invocation → protocol.Message
    C->>S: HTTP POST message/send (TextPart: "What's the weather in Beijing?")
    S->>S: protocol.Message → model.Message
    S->>R: runner.Run(userID, ctxID, message)
    R->>A: agent.Run(ctx, invocation)
    A->>A: Call LLM
    Note over A,R: LLM decides to call a tool
    A-->>R: event(ToolCall: get_weather, args: {city:"Beijing"})
    R->>R: Execute tool get_weather
    R-->>A: Tool result: {temp:"20°C"}
    A-->>R: event(ToolResponse: {temp:"20°C"})
    Note over A,R: LLM generates final reply
    A->>A: Call LLM again (with tool result)
    A-->>R: event(Content: "The current temperature in Beijing is 20°C")
    A-->>R: event(RunnerCompletion)
    R-->>S: channel closed
    Note over S,C: Server converts and wraps
    S->>S: ToolCall event → DataPart (function_call)
    S->>S: ToolResponse event → DataPart (function_response)
    S->>S: Content event → TextPart
    S->>S: 3 messages → wrapped as Task
    S-->>C: JSON-RPC Response: Task
    Note left of C: history: [function_call msg, function_response msg]<br/>artifacts: [TextPart "The current temperature in Beijing is 20°C"]
    C->>C: Parse history → ToolCall/ToolResponse event
    C->>C: Parse artifacts → Content event
    C->>C: Output event.Event sequence downstream
```
### Streaming (message/stream)

The Server converts each Agent event in real time to an SSE `TaskArtifactUpdateEvent` and pushes it to the Client. Tool calls and tool responses are sent as complete events, while text content is pushed incrementally.
```mermaid
sequenceDiagram
    participant C as A2AAgent (Client)
    participant S as A2A Server
    participant R as Agent Runner
    participant A as Agent + LLM
    C->>C: Invocation → protocol.Message
    C->>S: HTTP POST message/stream (TextPart: "What's the weather in Beijing?")
    S->>S: protocol.Message → model.Message
    S->>S: BuildTask → taskID
    S-->>C: SSE: TaskStatusUpdateEvent (submitted) [Client filters]
    S->>R: runner.Run(userID, ctxID, message)
    R->>A: agent.Run(ctx, invocation)
    Note over A,S: Tool call (complete event)
    A-->>R: event(ToolCall: get_weather)
    R-->>S: event
    S->>S: ToolCall → DataPart (function_call)
    S-->>C: SSE: ArtifactUpdate {DataPart, llm_response_id: "chatcmpl-1"}
    C->>C: DataPart → ToolCall event
    R->>R: Execute tool get_weather
    A-->>R: event(ToolResponse: {temp:"20°C"})
    R-->>S: event
    S->>S: ToolResponse → DataPart (function_response)
    S-->>C: SSE: ArtifactUpdate {DataPart, llm_response_id: "chatcmpl-1"}
    C->>C: DataPart → ToolResponse event
    Note over A,S: Final reply (incremental push)
    A-->>R: event(Delta: "The current")
    R-->>S: event
    S-->>C: SSE: ArtifactUpdate {TextPart: "The current", llm_response_id: "chatcmpl-2"}
    C->>C: TextPart → Delta event
    A-->>R: event(Delta: " temperature is 20°C")
    R-->>S: event
    S-->>C: SSE: ArtifactUpdate {TextPart: " temperature is 20°C", llm_response_id: "chatcmpl-2"}
    C->>C: Same llm_response_id → aggregate into same message
    Note over S,C: Stream ends
    A-->>R: event(RunnerCompletion)
    R-->>S: channel closed
    S-->>C: SSE: ArtifactUpdate (lastChunk=true) [Client filters]
    S-->>C: SSE: TaskStatusUpdateEvent (completed) [Client filters]
    S->>S: CleanTask(taskID)
```
### Client-Side Filtering Rules

- `TaskStatusUpdateEvent` (submitted/completed): Task lifecycle signals, no user content
- `TaskArtifactUpdateEvent` with `lastChunk=true`: Stream end signal or aggregated result
### Role of `llm_response_id`
The Server includes llm_response_id in the Metadata of each response (from the ID returned by the LLM API, such as OpenAI's chatcmpl-xxx). All events produced by the same LLM call share the same llm_response_id. When the Agent makes a second LLM call (e.g., the final reply after a tool call), the llm_response_id changes. The Client uses this to determine that multiple incremental events belong to the same message.
This mechanism is primarily used for message aggregation in AG-UI scenarios — AG-UI's translator uses rsp.ID to decide when to emit TextMessageStart/TextMessageEnd events, and a change in llm_response_id indicates a new message has started.
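For illustration, two consecutive streaming chunks carrying the same `llm_response_id` would be aggregated into one message by the Client (a sketch with placeholder IDs; the events are abbreviated to the relevant fields):

```json
{
  "artifact": {
    "parts": [{ "kind": "text", "text": "The current" }],
    "metadata": { "llm_response_id": "chatcmpl-2" }
  }
}
```

```json
{
  "artifact": {
    "parts": [{ "kind": "text", "text": " temperature is 20°C" }],
    "metadata": { "llm_response_id": "chatcmpl-2" }
  }
}
```

A subsequent event with a different `llm_response_id` would start a new message.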
## Reasoning Content Transmission

Model reasoning processes (e.g., DeepSeek R1) are marked via `TextPart.metadata.thought`:
| Direction | ReasoningContent | Content |
|---|---|---|
| Agent → A2A | `TextPart` + `metadata: {thought: true}` | `TextPart` (no thought marker) |
| A2A → Agent | `thought=true` → restored as `ReasoningContent` | No marker → restored as `Content` |
A single Message can contain both reasoning content and formal reply as two TextParts.
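For illustration, such a message might look like this (a sketch; the texts are placeholders and the part field names follow the A2A schema):

```json
{
  "kind": "message",
  "role": "agent",
  "parts": [
    {
      "kind": "text",
      "text": "The user is asking about the weather, so I should call the weather tool first...",
      "metadata": { "thought": true }
    },
    { "kind": "text", "text": "The current temperature in Beijing is 20°C." }
  ],
  "metadata": { "object_type": "chat.completion" }
}
```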
## Metadata Specification

### Request Direction (Client → Server)

| Carrier | Field | Description |
|---|---|---|
| HTTP Header | `X-User-ID` | User identifier (primary source) |
| HTTP Header | `traceparent` | W3C Trace Context (auto-injected by OpenTelemetry) |
| Message.Metadata | `invocation_id` | Client-side invocation ID for trace correlation |
| Message.Metadata | `user_id` | User identifier (supplementary) |
### Response Direction (Server → Client)

| Carrier | Field | Description |
|---|---|---|
| Message/Artifact Metadata | `object_type` | Event business type (`chat.completion`, `tool.response`, etc.) |
| Message/Artifact Metadata | `tag` | Event tag (distinguishes code execution vs. code execution result) |
| Message/Artifact Metadata | `llm_response_id` | LLM response ID used for Client-side message aggregation (e.g., OpenAI's `chatcmpl-xxx`) |
| Part Metadata | `type` | Data semantic type (`function_call`, `function_response`, `executable_code`, `code_execution_result`) |
| Part Metadata | `thought` | Whether this is reasoning/thinking content |
## Network Packet Examples

### Non-streaming: Request and Response with Tool Call
Request:
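An illustrative `message/send` request (a sketch: IDs and user text are placeholders, and the field names follow the A2A JSON-RPC schema and the metadata tables above):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "message/send",
  "params": {
    "message": {
      "role": "user",
      "parts": [{ "kind": "text", "text": "What's the weather in Beijing?" }],
      "metadata": {
        "interaction_spec_version": "0.1",
        "invocation_id": "inv-001",
        "user_id": "user-123"
      }
    }
  }
}
```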
Response (Task, with tool call intermediate process):
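An illustrative response (a sketch: IDs are placeholders, and the envelope follows the A2A Task schema; the tool call intermediate steps sit in `history`, the final reply in `artifacts`):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "kind": "task",
    "id": "task-001",
    "contextId": "ctx-001",
    "status": { "state": "completed" },
    "history": [
      {
        "kind": "message",
        "role": "agent",
        "parts": [
          {
            "kind": "data",
            "data": { "id": "call_001", "name": "get_weather", "args": { "city": "Beijing" } },
            "metadata": { "type": "function_call" }
          }
        ],
        "metadata": { "object_type": "chat.completion", "llm_response_id": "chatcmpl-1" }
      },
      {
        "kind": "message",
        "role": "agent",
        "parts": [
          {
            "kind": "data",
            "data": { "id": "call_001", "name": "get_weather", "response": { "temp": "20°C" } },
            "metadata": { "type": "function_response" }
          }
        ],
        "metadata": { "object_type": "tool.response" }
      }
    ],
    "artifacts": [
      {
        "artifactId": "artifact-001",
        "parts": [{ "kind": "text", "text": "The current temperature in Beijing is 20°C" }],
        "metadata": { "object_type": "chat.completion", "llm_response_id": "chatcmpl-2" }
      }
    ]
  }
}
```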
### Streaming: SSE Event Sequence with Tool Call
Request:
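An illustrative `message/stream` request (a sketch; identical in shape to the non-streaming request except for the method name):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "message/stream",
  "params": {
    "message": {
      "role": "user",
      "parts": [{ "kind": "text", "text": "What's the weather in Beijing?" }],
      "metadata": { "interaction_spec_version": "0.1" }
    }
  }
}
```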
SSE Response:
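An illustrative SSE event sequence (a sketch: each `data:` line is abbreviated to its `result` payload, and all IDs are placeholders):

```text
data: {"result":{"kind":"status-update","taskId":"task-001","status":{"state":"submitted"},"final":false}}

data: {"result":{"kind":"artifact-update","taskId":"task-001","artifact":{"parts":[{"kind":"data","data":{"id":"call_001","name":"get_weather","args":{"city":"Beijing"}},"metadata":{"type":"function_call"}}],"metadata":{"llm_response_id":"chatcmpl-1"}}}}

data: {"result":{"kind":"artifact-update","taskId":"task-001","artifact":{"parts":[{"kind":"data","data":{"id":"call_001","name":"get_weather","response":{"temp":"20°C"}},"metadata":{"type":"function_response"}}],"metadata":{"llm_response_id":"chatcmpl-1"}}}}

data: {"result":{"kind":"artifact-update","taskId":"task-001","artifact":{"parts":[{"kind":"text","text":"The current"}],"metadata":{"llm_response_id":"chatcmpl-2"}}}}

data: {"result":{"kind":"artifact-update","taskId":"task-001","artifact":{"parts":[{"kind":"text","text":" temperature is 20°C"}],"metadata":{"llm_response_id":"chatcmpl-2"}}}}

data: {"result":{"kind":"artifact-update","taskId":"task-001","lastChunk":true}}

data: {"result":{"kind":"status-update","taskId":"task-001","status":{"state":"completed"},"final":true}}
```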
### Non-streaming: Response with Reasoning Content
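An illustrative response for this case (a sketch: a single reply is returned directly as a Message, with the reasoning text and the final answer as two `TextPart`s; texts are placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "kind": "message",
    "role": "agent",
    "parts": [
      {
        "kind": "text",
        "text": "The user is asking a factual question; let me reason it through first...",
        "metadata": { "thought": true }
      },
      { "kind": "text", "text": "The answer is 42." }
    ],
    "metadata": { "object_type": "chat.completion" }
  }
}
```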
### Streaming: Parallel Tool Calls
Multiple tool calls are placed in the parts array of the same artifact-update:
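An illustrative `artifact-update` event carrying two parallel calls (a sketch; IDs are placeholders):

```json
{
  "kind": "artifact-update",
  "taskId": "task-001",
  "artifact": {
    "parts": [
      {
        "kind": "data",
        "data": { "id": "call_001", "name": "get_weather", "args": { "city": "Beijing" } },
        "metadata": { "type": "function_call" }
      },
      {
        "kind": "data",
        "data": { "id": "call_002", "name": "get_weather", "args": { "city": "Shanghai" } },
        "metadata": { "type": "function_call" }
      }
    ],
    "metadata": { "llm_response_id": "chatcmpl-1" }
  }
}
```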
## Distributed Tracing

### Trace Context Propagation
Distributed tracing is propagated through HTTP Headers, following the W3C Trace Context standard:
- Client side: automatically injects the `traceparent` header into HTTP requests via OpenTelemetry's `TextMapPropagator`
- Server side: automatically extracts trace context from the `traceparent` HTTP request header and injects it into `context.Context`, making it available throughout the entire call chain
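A `traceparent` header follows the W3C format `version-traceid-parentid-traceflags`, for example (the canonical example value from the W3C Trace Context specification):

```text
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```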
### Application-Layer Tracing Fields
In addition to HTTP-layer trace context, supplementary tracing information is passed through Message.Metadata:
- `invocation_id` (Request Metadata): identifies a single Agent invocation on the Client side; the Server can use it for log correlation
- `llm_response_id` (Response Metadata): identifies the original LLM response ID; the Client uses it for message aggregation