Runner Component User Guide
Overview
Runner provides the interface to run Agents, responsible for session management and event stream processing. The core responsibilities of Runner are: obtain or create sessions, generate an Invocation ID, call the Agent (via agent.RunWithPlugins), process the returned event stream, and append non-partial response events to the session.
Key Features
- Session Management: Obtain or create sessions via sessionService, using inmemory.NewSessionService() by default.
- Event Handling: Receive Agent event streams and append non-partial response events to the session.
- ID Generation: Automatically generate Invocation IDs and event IDs.
- Observability Integration: Integrates telemetry/trace to automatically record spans.
- Completion Event: Generates a runner.completion event after the Agent event stream ends.
- Plugins: Register once on a Runner to apply global hooks across agent, tool, and model lifecycles.
Architecture
Quick Start
Requirements
- Go 1.21 or later.
- Valid LLM API key (OpenAI-compatible interface).
- Redis (optional, for distributed session management).
Minimal Example
Run the Example
Interactive Features
After running the example, the following special commands are supported:
- /history - Ask AI to show conversation history.
- /new - Start a new session (reset conversation context).
- /exit - End the conversation.
When the AI uses tools, detailed invocation processes will be displayed:
Core API
Create Runner
Request-Scoped Agent Creation (Agent Factory)
By default, runner.NewRunner(...) takes a fully built agent.Agent and
reuses that same instance for every request.
If your agent needs request-specific configuration (for example, prompt, model, sandbox instance, tools), you can build a fresh agent for every run.
Option A: Create the default agent on demand
Option B: Register named factories and select them by name
Notes:
- The factory is called once per Runner.Run(...).
- agent.WithAgent(...) still overrides everything (useful for tests).
Resource Ownership Inside Agent Factories
AgentFactory is ideal for request-scoped Agent construction, but it
does not transfer ownership of resources created inside the factory.
- The Runner only asks the factory for an agent.Agent.
- Runner.Close() only closes resources created or owned by the Runner itself; it does not automatically close request-scoped tool.ToolSet instances, temporary MCP connections, sandbox sessions, or similar resources created inside the factory.
- The reason is structural: the agent.Agent interface does not expose a Close() method, so the Runner has no generic way to reclaim those resources.
Recommended patterns:
- If a ToolSet or external connection can be reused across requests, create it once outside the factory, reuse it inside the factory, and close it during application shutdown.
- If a resource must be created per request, the caller should clean it up explicitly after that run finishes. Common patterns are wrapping the Agent with cleanup logic, or running cleanup from an after-agent callback.
This boundary is especially important when using MCP ToolSets. See the
ToolSet lifecycle notes in the tool documentation for more details.
Resume the Next User Turn at a Specific Agent
In a multi-Agent conversation, a transferred SubAgent may ask the user for
missing information. The next request is a brand-new Runner.Run(...) call, so
Runner needs an explicit signal if that next user message should resume at
the same Agent instead of the normal entry Agent.
Enable the one-shot route consumer on Runner:
There are two ways to produce that route:
- LLMAgent: enable llmagent.WithAwaitUserReplyTool(true) and instruct the model to call await_user_reply immediately before it asks the user for missing information.
- Custom Agent implementations: call agent.MarkAwaitingUserReply(invocation) before you emit the final clarifying question event.
Low-Level Example for a Custom Agent
Behavior summary:
- The route is stored in session state as a stable agent path, not by mutating message roles.
- It is consumed once, right before the next user turn starts.
- agent.WithAgent(...) and agent.WithAgentByName(...) still take precedence.
- If the recorded Agent path no longer exists, Runner clears the stale route and falls back to the default entry Agent.
- Nested SubAgents are resumed by their full invocation path, so the common coordinator + WithSubAgents(...) setup works without manually registering every child Agent.
Advanced note:
- The built-in Runner records the stable root lookup key automatically, including AgentFactory cases where the runtime Info().Name differs from the registered factory name.
- If you build and run invocations manually outside Runner, and the stable root lookup key is different from inv.AgentName, call agent.SetAwaitUserReplyRootLookupName(inv, rootLookupName) before the Agent emits its final clarifying reply.
Plugins
Runner plugins are global, runner-scoped hooks. Register plugins once and they will apply automatically to all agents, tools, and model calls executed by that Runner.
Notes:
- Plugin names must be unique per Runner.
- Plugins run in the order they are registered.
- If a plugin implements plugin.Closer, Runner will call it in Close().
Ralph Loop Mode
Ralph Loop is an "outer loop" mode. Instead of trusting a Large Language Model (LLM) to decide when it is done, Runner will keep iterating until a verifiable completion condition is met.
Common completion conditions:
- A completion promise in the assistant output (for example, <promise>DONE</promise>).
- A verification command exits with code 0 (for example, go test ./...).
- Additional custom checks via runner.Verifier.
MaxIterations is always recommended as a safety valve.
When MaxIterations is reached without success, Runner emits an error event
with error type stop_agent_error.
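The outer-loop idea can be illustrated without the framework. The following is a minimal, self-contained sketch, not the Runner's implementation: step, verifier, runLoop, promiseVerifier, and errMaxIterations are hypothetical names, and the sketch assumes that every configured check must pass for the loop to end.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errMaxIterations is a hypothetical stand-in for the stop_agent_error event.
var errMaxIterations = errors.New("max iterations reached without success")

// step runs one agent iteration and returns the assistant output.
type step func(iteration int) string

// verifier reports whether the output proves the work is complete.
type verifier func(output string) bool

// runLoop iterates until all checks pass or maxIterations is exhausted.
// It returns the number of iterations that were executed.
func runLoop(run step, checks []verifier, maxIterations int) (int, error) {
	for i := 1; i <= maxIterations; i++ {
		out := run(i)
		done := true
		for _, check := range checks {
			if !check(out) {
				done = false
				break
			}
		}
		if done {
			return i, nil
		}
	}
	return maxIterations, errMaxIterations
}

// promiseVerifier checks for a completion promise in the assistant output.
func promiseVerifier(promise string) verifier {
	return func(output string) bool {
		return strings.Contains(output, "<promise>"+promise+"</promise>")
	}
}

func main() {
	// The fake agent only emits the promise on its third iteration.
	agent := func(i int) string {
		if i >= 3 {
			return "all tests pass <promise>DONE</promise>"
		}
		return "still working"
	}
	n, err := runLoop(agent, []verifier{promiseVerifier("DONE")}, 5)
	fmt.Println(n, err) // prints "3 <nil>"
}
```

The iteration cap is what keeps the loop from running forever when the agent never satisfies the checks, which is why the safety valve is recommended even when a verifier is configured.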
Run Conversation
Request ID (requestID) and Run Control
Each call to Runner.Run is a run. If you want to cancel a run or query
its status, you need a request identifier (requestID).
You can provide your own requestID (recommended) via agent.WithRequestID
(for example, a Universally Unique Identifier (UUID)). Runner injects it into
every emitted event.Event (event.RequestID).
Queue a New User Message into the Same Run
Sometimes you do not want to start a second run. You want to keep the current
requestID, queue a new role=user message, and insert it only after the
current assistant round is finished.
Use runner.EnqueueUserMessage(...):
Think of one assistant output as one round:
- If the assistant only replies with text, the round ends at that reply
- If the assistant emits tool_calls, the round ends only after that whole tool batch finishes
The queued user message can only be inserted between rounds. It is never inserted in the middle of a round.
The simplest valid shape is:
If one assistant message emits multiple tool calls, the framework still waits for the whole round:
It will not insert like this:
because that would break the tool_call -> tool_response structure of the
same assistant round.
So the behavior is:
- This does not start a second run
- The message is queued first, not written to session immediately
- It is appended only after the previous assistant round and its tool work are fully finished
- This keeps the tool_call -> tool_response structure intact
- If the run has already finished, enqueue returns an error
If you want the implementation-level mapping, this happens after one
runOneStep() finishes and before the next runOneStep() starts.
Runnable example: examples/steer/
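To make the round boundary concrete, here is a small self-contained sketch of the check "is the latest assistant round finished?". It is not the framework's code: the message type and roundFinished function are hypothetical, and treating a history without any assistant message as "not finished" is an assumption of this sketch.

```go
package main

import "fmt"

// message is a hypothetical, simplified chat message.
type message struct {
	Role        string   // "user", "assistant", or "tool"
	ToolCallIDs []string // set on assistant messages that emit tool_calls
	ToolCallID  string   // set on tool messages that answer one call
}

// roundFinished reports whether the latest assistant round is complete,
// meaning every tool call it emitted has a matching tool response.
func roundFinished(history []message) bool {
	last := -1
	for i, m := range history {
		if m.Role == "assistant" {
			last = i
		}
	}
	if last == -1 {
		return false // assumption: no assistant output means no finished round
	}
	pending := make(map[string]bool)
	for _, id := range history[last].ToolCallIDs {
		pending[id] = true
	}
	for _, m := range history[last+1:] {
		if m.Role == "tool" {
			delete(pending, m.ToolCallID)
		}
	}
	return len(pending) == 0
}

func main() {
	history := []message{
		{Role: "user"},
		{Role: "assistant", ToolCallIDs: []string{"call_a", "call_b"}},
		{Role: "tool", ToolCallID: "call_a"},
	}
	fmt.Println(roundFinished(history)) // false: call_b is still pending

	history = append(history, message{Role: "tool", ToolCallID: "call_b"})
	fmt.Println(roundFinished(history)) // true: the whole tool batch finished
}
```

A queued user message would only be appended at a point where such a check returns true, which is what keeps the tool_call -> tool_response pairs contiguous.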
Per-Request App Name Override (multi-tenant isolation)
By default, Runner uses the appName supplied at construction for session keys
and event filter keys. If a single Runner instance serves multiple projects or
tenants, you can override the app name on each Run call with
agent.WithAppName:
When WithAppName is not provided (or the value is empty), the runner
falls back to the constructor-supplied default app name. The override affects:
| Dimension | Default (no override) | With WithAppName("X") |
|---|---|---|
| session.Key.AppName | constructor appName | "X" |
| Default EventFilterKey | constructor appName | "X" |
Other runner-level registrations (observability appid, agent registry) remain
bound to the original constructor appName.
Note: appName must not be empty. If neither the constructor nor WithAppName provides a non-empty value, the session service returns session.ErrAppNameRequired.
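The fallback rule described above is small enough to sketch directly. This is an illustration only: effectiveAppName and errAppNameRequired are hypothetical names standing in for the Runner's internal resolution and for session.ErrAppNameRequired.

```go
package main

import (
	"errors"
	"fmt"
)

// errAppNameRequired is a hypothetical stand-in for session.ErrAppNameRequired.
var errAppNameRequired = errors.New("app name is required")

// effectiveAppName prefers the per-request override and falls back to the
// constructor-supplied default when the override is empty.
func effectiveAppName(override, constructorDefault string) (string, error) {
	if override != "" {
		return override, nil
	}
	if constructorDefault != "" {
		return constructorDefault, nil
	}
	return "", errAppNameRequired
}

func main() {
	fmt.Println(effectiveAppName("tenant-x", "my-app")) // tenant-x <nil>
	fmt.Println(effectiveAppName("", "my-app"))         // my-app <nil>
	fmt.Println(effectiveAppName("", ""))               //  app name is required
}
```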
Detached Cancellation (background execution)
In Go, context.Context (often named ctx) carries both cancellation and a
deadline. By default, Runner stops when ctx is cancelled.
If you want the run to continue after a parent cancellation, enable detached cancellation and use a timeout to bound the total runtime:
Runner enforces the earlier of:
- the parent context deadline (if any)
- MaxRunDuration (if set)
Resume Interrupted Runs (tools-first resume)
In long-running conversations, users may interrupt the agent while it is still
in a tool-calling phase (for example, the last message in the session is an
assistant message with tool_calls, but no tool result has been written yet).
When you later reuse the same sessionID, you can ask the Runner to resume
from that point instead of asking the model to repeat the tool calls:
When WithResume(true) is set:
- Runner inspects the latest persisted session event.
- If the last event is an assistant response that contains tool_calls and there is no later tool result, Runner will execute those pending tools first (using the same tool set and callbacks as a normal step) and persist the tool results into the session.
- After tools finish, the normal LLM cycle continues using the updated session history, so the model sees both the original tool calls and their results.
If the last event is a user or tool message (or a plain assistant reply
without tool_calls), WithResume(true) is a no-op and the flow behaves like
today's Run call.
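The resume decision boils down to inspecting the last persisted event. A minimal sketch of that check follows, with a hypothetical event type and pendingToolCalls helper; the real Runner works on its own session event types.

```go
package main

import "fmt"

// event is a hypothetical, simplified session event.
type event struct {
	Role        string   // "user", "assistant", or "tool"
	ToolCallIDs []string // tool calls emitted by an assistant event
}

// pendingToolCalls returns the tool calls that should run before the next
// model call, or nil when resuming would be a no-op.
func pendingToolCalls(events []event) []string {
	if len(events) == 0 {
		return nil
	}
	last := events[len(events)-1]
	if last.Role != "assistant" || len(last.ToolCallIDs) == 0 {
		return nil // user/tool message or plain assistant reply
	}
	return last.ToolCallIDs
}

func main() {
	interrupted := []event{
		{Role: "user"},
		{Role: "assistant", ToolCallIDs: []string{"call_1"}},
	}
	fmt.Println(pendingToolCalls(interrupted)) // [call_1]

	finished := append(interrupted, event{Role: "tool"})
	fmt.Println(pendingToolCalls(finished)) // []
}
```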
Tool Call Arguments Auto Repair
Some models may emit non-strict JSON arguments for tool_calls (for example, unquoted object keys or trailing commas), which can break tool execution or external parsing.
When agent.WithToolCallArgumentsJSONRepairEnabled(true) is passed to runner.Run, the framework makes a best-effort attempt to repair toolCall.Function.Arguments. For detailed usage, see Tool Call Arguments Auto Repair.
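To show what "best-effort repair" can mean for the two defects mentioned above, here is a deliberately simple sketch. It is not the framework's repair algorithm: repairArguments is a hypothetical helper based on two regular expressions, it is not string-aware (text inside string values that looks like a bare key may be touched), and it returns the input unchanged whenever the result is still not valid JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

var (
	// bareKey matches an unquoted object key after "{" or ",".
	bareKey = regexp.MustCompile(`([{,]\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:)`)
	// trailingComma matches a comma directly before "}" or "]".
	trailingComma = regexp.MustCompile(`,(\s*[}\]])`)
)

// repairArguments tries to turn almost-JSON into strict JSON.
// Valid input is returned untouched; unrepairable input is returned as-is.
func repairArguments(args string) string {
	if json.Valid([]byte(args)) {
		return args
	}
	fixed := bareKey.ReplaceAllString(args, `${1}"${2}"${3}`)
	fixed = trailingComma.ReplaceAllString(fixed, `${1}`)
	if json.Valid([]byte(fixed)) {
		return fixed
	}
	return args
}

func main() {
	fmt.Println(repairArguments(`{city: "Paris", days: 3,}`))
	// prints {"city": "Paris", "days": 3}
}
```

Validating before and after the rewrite keeps the helper conservative: it never hands back a string that it changed but could not make parse.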
Provide Conversation History (auto-seed + session reuse)
If your upstream service maintains the conversation and you want the agent to
see that context, you can pass a full history ([]model.Message) directly. The
runner will seed an empty session with that history automatically and then
merge in new session events.
Option A: Use the convenience helper runner.RunWithMessages
Example: examples/runwithmessages (uses RunWithMessages; runner auto-seeds and
continues reusing the session)
Option B: Pass via RunOption explicitly (same philosophy as ADK Python)
When []model.Message is provided, the runner persists that history into the
session on first use (if empty). The content processor does not read this
option; it only derives messages from session events (or falls back to the
single invocation.Message if the session has no events). RunWithMessages
still sets invocation.Message to the latest user turn so graph/flow agents
that inspect it continue to work.
User Message Rewriting
agent.WithUserMessageRewriter(...) rewrites the current-turn user message
before the run starts. The rewritten result is written into the session as the
effective input for the current turn and continues to participate in subsequent
turns. This is useful for adding business context, normalizing user wording, or
splitting one input into multiple messages that are easier for the model to
process.
The signature of UserMessageRewriter is:
OriginalMessage is the raw user input for the current turn. The other fields
provide stable identifiers for the current run.
The returned messages are processed in order. The last message becomes
invocation.Message, and any preceding messages are persisted as leading
messages for the same turn. This allows the interface to support both 1 -> 1
rewrites and 1 -> N expansions. If the same call also passes historical
messages via agent.WithMessages(...), the rewritten result is written
together with that history. The rewriter must not return an empty slice; if it
does, the runner returns an error immediately.
Example:
A complete example is available at examples/usermessagerewriter.
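The ordering rule for rewritten messages (last one becomes the current message, earlier ones become leading messages, empty result is an error) can be sketched as follows. The message type, splitRewritten, and errEmptyRewrite are hypothetical names used only for this illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// message is a hypothetical, simplified chat message.
type message struct {
	Role    string
	Content string
}

// errEmptyRewrite mirrors the rule that a rewriter must return something.
var errEmptyRewrite = errors.New("user message rewriter returned no messages")

// splitRewritten separates the rewriter output into leading messages and
// the message that becomes the current-turn input.
func splitRewritten(rewritten []message) (leading []message, current message, err error) {
	if len(rewritten) == 0 {
		return nil, message{}, errEmptyRewrite
	}
	n := len(rewritten) - 1
	return rewritten[:n], rewritten[n], nil
}

func main() {
	// A 1 -> 2 expansion: business context first, the actual request last.
	leading, current, err := splitRewritten([]message{
		{Role: "user", Content: "Tenant: acme"},
		{Role: "user", Content: "Summarize my open tickets"},
	})
	fmt.Println(len(leading), current.Content, err)
	// prints "1 Summarize my open tickets <nil>"
}
```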
Override Runtime Surfaces for a Specific Node by nodeID
If you need to change one specific node in a runner.Run(...) call instead of
changing the entire agent, pass agent.WithSurfacePatchForNode(nodeID, patch).
Prefer obtaining a stable nodeID from structure.Export(...) and then pass
it to WithSurfacePatchForNode(...). If you need to patch multiple nodes in
the same run, pass multiple WithSurfacePatchForNode(...) options. For full
details and more examples, see Agent: Override Runtime Surfaces by nodeID.
Override Code Executor Per Run
If you need to specify a different execution environment for a particular
request on an agent that resolves its executor from RunOptions.CodeExecutor,
such as LLMAgent, pass agent.WithCodeExecutor(exec) to runner.Run(...).
Notes:
- This option applies only to the current runner.Run(...) call and does not change the agent's default configuration.
- This option only applies to agents that read RunOptions.CodeExecutor. If you use a custom agent, make sure its implementation handles this run option.
- If the agent was created with llmagent.WithCodeExecutor(...), the executor passed here temporarily overrides that default for this run.
- Capabilities that resolve their executor from RunOptions.CodeExecutor (for example workspace_exec) use the executor passed here for this run.
- If you do not want Markdown fenced code blocks in model replies to auto-execute, set llmagent.WithEnableCodeExecutionResponseProcessor(false) when creating the agent. See Skill for more details.
Detecting End-of-Run and Reading Final Output (Graph-friendly)
When driving a GraphAgent workflow, the LLM's "final response" is not the end of
the workflow: nodes like output may still be pending. Instead of checking
Response.IsFinalResponse(), always stop on the Runner's terminal completion
event:
For convenience, Runner now propagates the graph's final snapshot into this last
event. You can extract the final textual output via graph.StateKeyLastResponse:
This keeps application code simple and consistent across Agent types while still preserving detailed graph events for advanced use.
Fatal Errors Before a Graph Completion Event
For the full framework-level recommendation, including the standard graph collector and A2A conventions, see Error Handling.
Sometimes a run stops early because of a fatal error before the graph emits its
final graph.execution event. A common example is:
- a node callback emits a custom state delta with fatal-error details
- the run then aborts before the graph can produce its normal final snapshot
In that case, Runner still emits the final runner.completion event. When the
terminal error is a real fatal error (not stop_agent_error), Runner now copies
the accumulated fallback business state onto that last event for you:
- StateDelta: the accumulated state delta from the error path
Two details matter here:
- Runner keeps the original fatal event as the only carrier of Response.Error, so downstream translators can still treat runner.completion as a normal finish signal.
- Graph metadata keys such as graph.MetadataKeyNode and graph.MetadataKeyTool are filtered out from the fallback delta to avoid re-translating node/tool lifecycle events in consumers such as AGUI.
This lets application code keep the same simple rule: read the last event first for business-level fatal details, instead of scanning the whole stream to find the callback/error event.
If the graph uses graph.NewExecutionErrorCollector(), any collected
execution_errors in that StateDelta may come from the default recoverable
contract as well, for example errors that implement Recoverable() bool or
errors wrapped by graph.MarkRecoverable(err).
Example:
Recommended mental model:
- Success path with graph completion: read final output from the completion event's StateDelta (for example, graph.StateKeyLastResponse)
- Fatal exit before graph completion: read your custom fatal keys from the same completion event; if you also need the structured Response.Error, it remains on the original fatal event
- stop_agent_error: still behaves like a controlled stop signal and is not duplicated onto the completion event
Option: Emit Final Graph LLM Responses
Graph-based agents (for example, GraphAgent) can call a Large Language Model (LLM) many times inside a single run. Each model call can produce a stream of events:
- Partial chunks: IsPartial=true, Done=false, incremental text in choice.Delta.Content
- Final message: IsPartial=false, Done=true, full text in choice.Message.Content
By default, graph LLM nodes only emit the partial chunks. This avoids treating intermediate node outputs as normal assistant replies (for example, persisting them into the Session by Runner or showing them to end users).
To opt into the newer behavior (emit the final Done=true assistant message
events from graph LLM nodes), enable this RunOption:
Behavior summary:
First, one key idea: this option controls whether each graph Large Language
Model (LLM) node emits an extra final Done=true assistant message event. It
does not mean the Runner completion event will always have (or not have)
Response.Choices.
Assume your graph is llm1 -> llm2 -> llm3, and llm3 produces the final
answer:
- Case 1: agent.WithGraphEmitFinalModelResponses(false) (default)
  - llm1/llm2/llm3: emit only partial chunks (Done=false), no final Done=true assistant message events.
  - Runner completion event: to keep the "read only the last event" pattern working, Runner echoes llm3's final output into completion Response.Choices (when the graph provides final choices). The final text is also always available via StateDelta[graph.StateKeyLastResponse].
- Case 2: agent.WithGraphEmitFinalModelResponses(true)
  - llm1/llm2/llm3: in addition to partial chunks, each node emits a final Done=true assistant message event (so intermediate nodes may now produce complete assistant messages, and Runner may persist those non-partial events into the Session).
  - Runner completion event: to avoid duplicating the final message, Runner deduplicates by response identifier (ID). When it can confirm the final message already appeared earlier, it omits the echo, so completion Response.Choices may be empty. The final text should still be read from StateDelta[graph.StateKeyLastResponse].
Recommendation: for GraphAgent workflows, always read the final output from the
Runner completion event's StateDelta (for example,
graph.StateKeyLastResponse). Treat Response.Choices on the completion event
as optional when this option is enabled.
Option: Keep Only Terminal Graph Message Events
When a graph contains multiple Large Language Model (LLM) nodes or sub-agent nodes, the caller-visible message stream may include intermediate drafts from earlier nodes. To preserve full backward compatibility, that behavior remains the default.
If you want the caller-visible stream to keep only terminal graph message events, enable:
Behavior summary:
- Default (false): unchanged. Intermediate graph node message events are still forwarded.
- Enabled (true): caller-visible message events are limited to terminal LLM nodes and terminal sub-agent nodes.
- Parallel terminal nodes are all preserved. The option does not collapse a fan-out graph into a single winner.
- Internal graph execution is unchanged. State handoff, history aggregation, tracing, and token accounting still use the full raw graph event stream.
This option is especially useful when your product experience should stream only the last user-facing graph step, while keeping intermediate graph messages internal.
For graph LLM nodes, pair this option with
agent.WithGraphEmitFinalModelResponses(true) when you also want terminal
Done=true assistant message events to be forwarded. See
examples/graph/terminal_messages_only.
Option: StreamMode
Runner can filter the event stream before it reaches your application code.
This provides a single, run-level switch to select which categories of events
are forwarded to your eventChan.
Use agent.WithStreamMode(...):
Supported modes (graph workflows):
- messages: model output events (for example, chat.completion.chunk)
- updates: graph.state.update / graph.channel.update / graph.execution
- checkpoints: graph.checkpoint.*
- tasks: task lifecycle events (graph.node.*, graph.pregel.*)
- debug: same as checkpoints + tasks
- custom: node-emitted events (graph.node.custom)
Notes:
- When agent.StreamModeMessages is selected, graph-based Large Language Model (LLM) nodes enable final model response events automatically for that run. To override it, call agent.WithGraphEmitFinalModelResponses(false) after agent.WithStreamMode(...).
- StreamMode only affects what Runner forwards to your eventChan. Runner still processes and persists events internally.
- For graph workflows, some event types (for example, graph.checkpoint.*) are emitted only when their corresponding mode is selected.
- Runner always emits a final runner.completion event.
Session Management
In-memory Session (Default)
Redis Session (Distributed)
Session Configuration
Agent Configuration
Runner's core responsibility is to manage the Agent execution flow. A created Agent needs to be executed via Runner.
Basic Agent Creation
Switch Agents Per Request
Runner can register multiple optional agents at construction time and pick one per Run:
- runner.NewRunner("my-app", agent): Set the default agent when creating the Runner.
- runner.WithAgent("agentName", agent): Pre-register an agent by name so later requests can switch via name.
- agent.WithAgentByName("agentName"): Choose a registered agent by name for a single request without changing the default.
- agent.WithAgent(agent): Provide an agent instance directly for a single request; highest priority and no pre-registration needed.
Agent selection priority: agent.WithAgent > agent.WithAgentByName > default agent set at construction.
The selected agent name is used as the event author and is recorded via appid.RegisterRunner for observability.
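The selection priority can be expressed as a short resolver. The sketch below uses hypothetical agent, runOptions, and registry types rather than the framework's own; in particular, returning an error for an unregistered name or a missing default is an assumption of this sketch, not documented Runner behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// agent is a hypothetical stand-in for an agent instance.
type agent struct{ Name string }

// runOptions models the two per-request selectors.
type runOptions struct {
	Agent       *agent // corresponds to agent.WithAgent
	AgentByName string // corresponds to agent.WithAgentByName
}

// registry models what a Runner knows at construction time.
type registry struct {
	defaultAgent *agent
	byName       map[string]*agent
}

// resolve applies the priority: explicit instance > name > default.
func (r registry) resolve(opts runOptions) (*agent, error) {
	if opts.Agent != nil {
		return opts.Agent, nil
	}
	if opts.AgentByName != "" {
		a, ok := r.byName[opts.AgentByName]
		if !ok {
			return nil, fmt.Errorf("agent %q is not registered", opts.AgentByName)
		}
		return a, nil
	}
	if r.defaultAgent == nil {
		return nil, errors.New("no default agent configured")
	}
	return r.defaultAgent, nil
}

func main() {
	reg := registry{
		defaultAgent: &agent{Name: "assistant"},
		byName:       map[string]*agent{"math": {Name: "math"}},
	}
	a, _ := reg.resolve(runOptions{})
	b, _ := reg.resolve(runOptions{AgentByName: "math"})
	c, _ := reg.resolve(runOptions{Agent: &agent{Name: "adhoc"}, AgentByName: "math"})
	fmt.Println(a.Name, b.Name, c.Name) // assistant math adhoc
}
```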
Generation Configuration
Runner passes generation configuration to the Agent:
Tool Integration
Tool configuration is done inside the Agent, while Runner is responsible for running the Agent with tools:
Tool invocation flow: Runner itself does not directly handle tool invocation. The flow is as follows:
- Pass tools: Runner passes context to the Agent via Invocation.
- Agent processing: Agent.Run handles the tool invocation logic.
- Event forwarding: Runner receives the event stream returned by the Agent and forwards it.
- Session recording: Append non-partial response events to the session.
Multi-Agent Support
Runner can execute complex multi-Agent structures (see multiagent.md for details):
Event Processing
Completion Semantics
Runner uses a few related but different completion signals:
- Done=true: the current event itself is complete. This can appear on final assistant messages, tool responses, graph events, and runner completion events.
- runner.completion / event.IsRunnerCompletion(): the entire Runner.Run() call has finished. This is the recommended condition for stopping consumption of eventChan.
Event Types
Complete Event Handling Example
Execution Context Management
Runner creates and manages the Invocation structure:
Best Practices
Error Handling
Stopping a Run Safely
When you call Runner.Run, the framework starts goroutines that keep producing
events until the run ends.
There are two different "stops" people often confuse:
- Stopping your reader loop (your code stops reading events)
- Stopping the run (the agent stops calling models/tools and exits)
If you only stop reading but the run is still active, the agent goroutine may block trying to write to the event channel. This can lead to goroutine leaks and "stuck" runs.
The safe pattern is always:
- Trigger cancellation (ctx cancel / requestID cancel / StopError)
- Keep draining the event channel until it is closed
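The two steps above can be sketched with the standard library only. In this illustration, produce is a hypothetical stand-in for a run whose writer blocks on an unbuffered channel and only checks for cancellation between events; it is not how Runner.Run is implemented, but it shows why draining matters.

```go
package main

import (
	"context"
	"fmt"
)

// produce emits events until ctx is cancelled, then closes the channel.
// The send blocks until a reader receives, so an abandoned reader would
// leave this goroutine stuck forever.
func produce(ctx context.Context) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for i := 0; ctx.Err() == nil; i++ {
			out <- i
		}
	}()
	return out
}

// consume handles up to limit events, then stops the run safely.
func consume(limit int) (handled, drained int) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	events := produce(ctx)
	for range events {
		handled++
		if handled >= limit {
			break // the application decides to stop early
		}
	}

	cancel() // step 1: trigger cancellation
	for range events {
		drained++ // step 2: keep draining until the channel is closed
	}
	return handled, drained
}

func main() {
	handled, _ := consume(3)
	fmt.Println("handled:", handled)
	fmt.Println("channel closed, producer goroutine has exited")
}
```

The number of drained events depends on timing (the writer may already be blocked on one more send when cancel is called), so application code should not rely on it; the only guarantee is that the loop ends once the channel is closed.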
Option A: Ctrl+C (terminal programs)
In a CLI or local demo, a common approach is to translate Ctrl+C into context cancellation:
Option B: Cancel the context (recommended default)
Wrap Runner.Run with context.WithCancel and call cancel() when you decide
to stop (for example, max turns, token budget, user clicked "Stop", etc.).
llmflow treats context.Canceled as a graceful exit and closes the agent
event channel, so the runner loop can finish cleanly without blocking writers.
If you need to return early (for example, your HTTP handler timed out) but still want to avoid blocking writers, you can drain in a separate goroutine:
Option C: Cancel by requestID (ManagedRunner)
In server scenarios, you often want to cancel a run from a different goroutine or even a different request. For that, use a request identifier (requestID).
- Generate a requestID and pass it into Run via agent.WithRequestID.
- Type-assert the runner to runner.ManagedRunner.
- Call Cancel(requestID).
Option D: Stop from inside the run (StopError)
Sometimes the best place to decide "stop now" is inside a tool, callback, or processor (for example, policy checks, budget limits, or user-defined rules).
Return agent.NewStopError("reason") (or wrap it with other errors). llmflow
converts it into a stop_agent_error event and stops the flow.
Still prefer context deadlines (WithTimeout, WithMaxRunDuration) for
hard cutoffs.
Common mistakes
- Breaking the event-loop reader without cancellation: the run may keep going and block on channel writes.
- Using context.Background() everywhere: you cannot stop a run if you have no way to cancel.
- Writing tools that ignore ctx: cancellation is cooperative; long-running tools should check ctx.Done() or pass ctx into network/DB requests.
See runnable demos:
- examples/cancelrun (cancel via Enter/Ctrl+C, drain events)
- examples/managedrunner (requestID cancel, detached cancel, max duration)
Resource Management
Closing Runner (Important)
You MUST call Close() when the Runner is no longer needed to prevent goroutine leaks (trpc-agent-go >= v0.5.0).
Runner Only Closes Resources It Created
When a Runner is created without providing a Session Service, it automatically creates a default inmemory Session Service. This service starts background goroutines internally (for asynchronous summary processing, TTL-based session cleanup, etc.). Runner only manages the lifecycle of this self-created inmemory Session Service. If you provide your own Session Service via WithSessionService(), you are responsible for managing its lifecycle; Runner won't close it.
If you don't call Close() on a Runner that owns an inmemory Session Service, the background goroutines will run forever, causing resource leaks.
Recommended Practice:
When You Provide Your Own Session Service:
Long-Running Services:
Important Notes:
- Close() is idempotent; calling it multiple times is safe
- Runner only closes the inmemory Session Service it creates by default
- If you provide your own Session Service via WithSessionService(), Runner won't close it (you manage it yourself)
- Not calling Close() when Runner owns an inmemory Session Service will cause goroutine leaks
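The ownership and idempotency rules above follow a common Go pattern: remember whether the resource was self-created, and guard Close with sync.Once. The sketch below illustrates that pattern with hypothetical sessionService, runner, and newRunner types; it is not the framework's source.

```go
package main

import (
	"fmt"
	"sync"
)

// sessionService is a hypothetical service that counts Close calls.
type sessionService struct{ closed int }

func (s *sessionService) Close() error {
	s.closed++
	return nil
}

// runner closes its session service only when it created that service.
type runner struct {
	svc  *sessionService
	owns bool
	once sync.Once
}

// newRunner uses the provided service, or creates and owns a default one.
func newRunner(provided *sessionService) *runner {
	if provided != nil {
		return &runner{svc: provided}
	}
	return &runner{svc: &sessionService{}, owns: true}
}

// Close is idempotent: the cleanup body runs at most once.
func (r *runner) Close() error {
	var err error
	r.once.Do(func() {
		if r.owns {
			err = r.svc.Close()
		}
	})
	return err
}

func main() {
	own := newRunner(nil)
	own.Close()
	own.Close()
	fmt.Println("owned service closed:", own.svc.closed, "time(s)") // 1

	shared := &sessionService{}
	newRunner(shared).Close()
	fmt.Println("provided service closed:", shared.closed, "time(s)") // 0
}
```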
Context Lifecycle Control
Health Check
Summary
The Runner component is a core part of the tRPC-Agent-Go framework, providing complete conversation management and Agent orchestration capabilities. By properly using session management, tool integration, and event handling, you can build powerful intelligent conversational applications.