Runner Component User Guide
Overview
Runner provides the interface to run Agents, responsible for session management and event stream processing. The core responsibilities of Runner are: obtain or create sessions, generate an Invocation ID, call the Agent (via agent.RunWithPlugins), process the returned event stream, and append non-partial response events to the session.
Key Features
- Session Management: Obtain/create sessions via sessionService, using inmemory.NewSessionService() by default.
- Event Handling: Receive Agent event streams and append non-partial response events to the session.
- ID Generation: Automatically generate Invocation IDs and event IDs.
- Observability Integration: Integrates telemetry/trace to automatically record spans.
- Completion Event: Generates a runner-completion event after the Agent event stream ends.
- Plugins: Register once on a Runner to apply global hooks across agent, tool, and model lifecycles.
Architecture
Quick Start
Requirements
- Go 1.21 or later.
- Valid LLM API key (OpenAI-compatible interface).
- Redis (optional, for distributed session management).
Minimal Example
Run the Example
Interactive Features
After running the example, the following special commands are supported:
- /history - Ask the AI to show conversation history.
- /new - Start a new session (reset conversation context).
- /exit - End the conversation.
When the AI uses tools, detailed invocation processes will be displayed:
Core API
Create Runner
Request-Scoped Agent Creation (Agent Factory)
By default, runner.NewRunner(...) takes a fully built agent.Agent and
reuses that same instance for every request.
If your agent needs request-specific configuration (for example, prompt, model, sandbox instance, tools), you can build a fresh agent for every run.
Option A: Create the default agent on demand
Option B: Register named factories and select them by name
Notes:
- The factory is called once per Runner.Run(...).
- agent.WithAgent(...) still overrides everything (useful for tests).
Plugins
Runner plugins are global, runner-scoped hooks. Register plugins once and they will apply automatically to all agents, tools, and model calls executed by that Runner.
Notes:
- Plugin names must be unique per Runner.
- Plugins run in the order they are registered.
- If a plugin implements plugin.Closer, Runner will call it in Close().
Ralph Loop Mode
Ralph Loop is an "outer loop" mode. Instead of trusting a Large Language Model (LLM) to decide when it is done, Runner will keep iterating until a verifiable completion condition is met.
Common completion conditions:
- A completion promise in the assistant output (for example, <promise>DONE</promise>).
- A verification command exits with code 0 (for example, go test ./...).
- Additional custom checks via runner.Verifier.
MaxIterations is always recommended as a safety valve.
When MaxIterations is reached without success, Runner emits an error event
with error type stop_agent_error.
Run Conversation
Request ID (requestID) and Run Control
Each call to Runner.Run is a run. If you want to cancel a run or query
its status, you need a request identifier (requestID).
You can provide your own requestID (recommended) via agent.WithRequestID
(for example, a Universally Unique Identifier (UUID)). Runner injects it into
every emitted event.Event (event.RequestID).
Detached Cancellation (background execution)
In Go, context.Context (often named ctx) carries both cancellation and a
deadline. By default, Runner stops when ctx is cancelled.
If you want the run to continue after a parent cancellation, enable detached cancellation and use a timeout to bound the total runtime:
Runner enforces the earlier of:
- the parent context deadline (if any)
- MaxRunDuration (if set)
Resume Interrupted Runs (tools-first resume)
In long-running conversations, users may interrupt the agent while it is still
in a tool-calling phase (for example, the last message in the session is an
assistant message with tool_calls, but no tool result has been written yet).
When you later reuse the same sessionID, you can ask the Runner to resume
from that point instead of asking the model to repeat the tool calls:
When WithResume(true) is set:
- Runner inspects the latest persisted session event.
- If the last event is an assistant response that contains tool_calls and there is no later tool result, Runner will execute those pending tools first (using the same tool set and callbacks as a normal step) and persist the tool results into the session.
- After tools finish, the normal LLM cycle continues using the updated session history, so the model sees both the original tool calls and their results.
If the last event is a user or tool message (or a plain assistant reply without tool_calls), WithResume(true) is a no-op and the flow behaves like a normal Run call.
Tool Call Arguments Auto Repair
Some models may emit non-strict JSON arguments for tool_calls (for example, unquoted object keys or trailing commas), which can break tool execution or external parsing.
When agent.WithToolCallArgumentsJSONRepairEnabled(true) is enabled in runner.Run, the framework will best-effort repair toolCall.Function.Arguments. For detailed usage, see Tool Call Arguments Auto Repair.
Provide Conversation History (auto-seed + session reuse)
If your upstream service maintains the conversation and you want the agent to
see that context, you can pass a full history ([]model.Message) directly. The
runner will seed an empty session with that history automatically and then
merge in new session events.
Option A: Use the convenience helper runner.RunWithMessages
Example: examples/runwithmessages (uses RunWithMessages; runner auto-seeds and
continues reusing the session)
Option B: Pass via RunOption explicitly (same philosophy as ADK Python)
When []model.Message is provided, the runner persists that history into the
session on first use (if empty). The content processor does not read this
option; it only derives messages from session events (or falls back to the
single invocation.Message if the session has no events). RunWithMessages
still sets invocation.Message to the latest user turn so graph/flow agents
that inspect it continue to work.
Detecting End-of-Run and Reading Final Output (Graph-friendly)
When driving a GraphAgent workflow, the LLM's "final response" is not the end of the workflow; nodes like output may still be pending. Instead of checking Response.IsFinalResponse(), always stop on the Runner's terminal completion event:
For convenience, Runner now propagates the graph's final snapshot into this last event. You can extract the final textual output via graph.StateKeyLastResponse:
This keeps application code simple and consistent across Agent types while still preserving detailed graph events for advanced use.
Fatal Errors Before a Graph Completion Event
Sometimes a run stops early because of a fatal error before the graph emits its
final graph.execution event. A common example is:
- a node callback emits a custom state delta with fatal-error details
- the run then aborts before the graph can produce its normal final snapshot
In that case, Runner still emits the final runner.completion event. When the
terminal error is a real fatal error (not stop_agent_error), Runner now copies
the accumulated fallback business state onto that last event for you:
- StateDelta: the accumulated state delta from the error path.
Two details matter here:
- Runner keeps the original fatal event as the only carrier of Response.Error, so downstream translators can still treat runner.completion as a normal finish signal.
- Graph metadata keys such as graph.MetadataKeyNode and graph.MetadataKeyTool are filtered out from the fallback delta to avoid re-translating node/tool lifecycle events in consumers such as AGUI.
This lets application code keep the same simple rule: read the last event first for business-level fatal details, instead of scanning the whole stream to find the callback/error event.
Example:
Recommended mental model:
- Success path with graph completion: read the final output from the completion event's StateDelta (for example, graph.StateKeyLastResponse).
- Fatal exit before graph completion: read your custom fatal keys from the same completion event; if you also need the structured Response.Error, it remains on the original fatal event.
- stop_agent_error: still behaves like a controlled stop signal and is not duplicated onto the completion event.
Option: Emit Final Graph LLM Responses
Graph-based agents (for example, GraphAgent) can call a Large Language Model (LLM) many times inside a single run. Each model call can produce a stream of events:
- Partial chunks: IsPartial=true, Done=false, incremental text in choice.Delta.Content
- Final message: IsPartial=false, Done=true, full text in choice.Message.Content
By default, graph LLM nodes only emit the partial chunks. This avoids treating intermediate node outputs as normal assistant replies (for example, persisting them into the Session by Runner or showing them to end users).
To opt into the newer behavior (emit the final Done=true assistant message
events from graph LLM nodes), enable this RunOption:
Behavior summary:
One key idea first: this option controls whether each graph Large Language Model (LLM) node emits an extra final Done=true assistant message event. It does not mean the Runner completion event will always (or never) have Response.Choices.
Assume your graph is llm1 -> llm2 -> llm3, and llm3 produces the final
answer:
- Case 1:
agent.WithGraphEmitFinalModelResponses(false)(default)llm1/llm2/llm3: emit only partial chunks (Done=false), no finalDone=trueassistant message events.- Runner completion event: to keep the โread only the last eventโ pattern
working, Runner echoes
llm3โs final output into completionResponse.Choices(when the graph provides final choices). The final text is also always available viaStateDelta[graph.StateKeyLastResponse].
- Case 2: agent.WithGraphEmitFinalModelResponses(true)
  - llm1/llm2/llm3: in addition to partial chunks, each node emits a final Done=true assistant message event (so intermediate nodes may now produce complete assistant messages, and Runner may persist those non-partial events into the Session).
  - Runner completion event: to avoid duplicating the final message, Runner deduplicates by response identifier (ID). When it can confirm the final message already appeared earlier, it omits the echo, so the completion Response.Choices may be empty. The final text should still be read from StateDelta[graph.StateKeyLastResponse].
Recommendation: for GraphAgent workflows, always read the final output from the Runner completion event's StateDelta (for example, graph.StateKeyLastResponse). Treat Response.Choices on the completion event as optional when this option is enabled.
Option: StreamMode
Runner can filter the event stream before it reaches your application code.
This provides a single, run-level switch to select which categories of events
are forwarded to your eventChan.
Use agent.WithStreamMode(...):
Supported modes (graph workflows):
- messages: model output events (for example, chat.completion.chunk)
- updates: graph.state.update / graph.channel.update / graph.execution
- checkpoints: graph.checkpoint.*
- tasks: task lifecycle events (graph.node.*, graph.pregel.*)
- debug: same as checkpoints + tasks
- custom: node-emitted events (graph.node.custom)
Notes:
- When agent.StreamModeMessages is selected, graph-based Large Language Model (LLM) nodes enable final model response events automatically for that run. To override it, call agent.WithGraphEmitFinalModelResponses(false) after agent.WithStreamMode(...).
- StreamMode only affects what Runner forwards to your eventChan. Runner still processes and persists events internally.
- For graph workflows, some event types (for example, graph.checkpoint.*) are emitted only when their corresponding mode is selected.
- Runner always emits a final runner.completion event.
Session Management
In-memory Session (Default)
Redis Session (Distributed)
Session Configuration
Agent Configuration
Runner's core responsibility is to manage the Agent execution flow. A created Agent needs to be executed via Runner.
Basic Agent Creation
Switch Agents Per Request
Runner can register multiple optional agents at construction time and pick one per Run:
- runner.NewRunner("my-app", agent): Set the default agent when creating the Runner.
- runner.WithAgent("agentName", agent): Pre-register an agent by name so later requests can switch via name.
- agent.WithAgentByName("agentName"): Choose a registered agent by name for a single request without changing the default.
- agent.WithAgent(agent): Provide an agent instance directly for a single request; highest priority and no pre-registration needed.
Agent selection priority: agent.WithAgent > agent.WithAgentByName > default agent set at construction.
The selected agent name is used as the event author and is recorded via appid.RegisterRunner for observability.
Generation Configuration
Runner passes generation configuration to the Agent:
Tool Integration
Tool configuration is done inside the Agent, while Runner is responsible for running the Agent with tools:
Tool invocation flow: Runner itself does not directly handle tool invocation. The flow is as follows:
- Pass tools: Runner passes context to the Agent via Invocation.
- Agent processing: Agent.Run handles the tool invocation logic.
- Event forwarding: Runner receives the event stream returned by the Agent and forwards it.
- Session recording: Append non-partial response events to the session.
Multi-Agent Support
Runner can execute complex multi-Agent structures (see multiagent.md for details):
Event Processing
Event Types
Complete Event Handling Example
Execution Context Management
Runner creates and manages the Invocation structure:
Best Practices
Error Handling
Stopping a Run Safely
When you call Runner.Run, the framework starts goroutines that keep producing
events until the run ends.
There are two different "stops" that people often confuse:
- Stopping your reader loop (your code stops reading events)
- Stopping the run (the agent stops calling models/tools and exits)
If you only stop reading but the run is still active, the agent goroutine may block trying to write to the event channel. This can lead to goroutine leaks and "stuck" runs.
The safe pattern is always:
- Trigger cancellation (ctx cancel / requestID cancel / StopError)
- Keep draining the event channel until it is closed
Option A: Ctrl+C (terminal programs)
In a CLI or local demo, a common approach is to translate Ctrl+C into context cancellation:
Option B: Cancel the context (recommended default)
Wrap Runner.Run with context.WithCancel and call cancel() when you decide
to stop (for example: max turns reached, token budget exhausted, or the user clicked "Stop").
llmflow treats context.Canceled as a graceful exit and closes the agent
event channel, so the runner loop can finish cleanly without blocking writers.
If you need to return early (for example, your HTTP handler timed out) but still want to avoid blocking writers, you can drain in a separate goroutine:
Option C: Cancel by requestID (ManagedRunner)
In server scenarios, you often want to cancel a run from a different goroutine or even a different request. For that, use a request identifier (requestID).
- Generate a requestID and pass it into Run via agent.WithRequestID.
- Type-assert the runner to runner.ManagedRunner.
- Call Cancel(requestID).
Option D: Stop from inside the run (StopError)
Sometimes the best place to decide "stop now" is inside a tool, callback, or processor (for example, policy checks, budget limits, or user-defined rules).
Return agent.NewStopError("reason") (or wrap it with other errors). llmflow
converts it into a stop_agent_error event and stops the flow.
Still prefer context deadlines (WithTimeout, WithMaxRunDuration) for
hard cutoffs.
Common mistakes
- Breaking the event-loop reader without cancellation: the run may keep going and block on channel writes.
- Using context.Background() everywhere: you cannot stop a run if you have no way to cancel.
- Writing tools that ignore ctx: cancellation is cooperative; long-running tools should check ctx.Done() or pass ctx into network/DB requests.
See runnable demos:
- examples/cancelrun (cancel via Enter/Ctrl+C, drain events)
- examples/managedrunner (requestID cancel, detached cancel, max duration)
Resource Management
Closing Runner (Important)
You MUST call Close() when the Runner is no longer needed to prevent goroutine leaks (trpc-agent-go >= v0.5.0).
Runner Only Closes Resources It Created
When a Runner is created without providing a Session Service, it automatically creates a default inmemory Session Service. This service starts background goroutines internally (for asynchronous summary processing, TTL-based session cleanup, etc.). Runner only manages the lifecycle of this self-created inmemory Session Service. If you provide your own Session Service via WithSessionService(), you are responsible for managing its lifecycle; Runner won't close it.
If you don't call Close() on a Runner that owns an inmemory Session Service, the background goroutines will run forever, causing resource leaks.
Recommended Practice:
When You Provide Your Own Session Service:
Long-Running Services:
Important Notes:
- Close() is idempotent; calling it multiple times is safe.
- Runner only closes the inmemory Session Service it creates by default.
- If you provide your own Session Service via WithSessionService(), Runner won't close it (you manage it yourself).
- Not calling Close() when Runner owns an inmemory Session Service will cause goroutine leaks.
Context Lifecycle Control
Health Check
Summary
The Runner component is a core part of the tRPC-Agent-Go framework, providing complete conversation management and Agent orchestration capabilities. By properly using session management, tool integration, and event handling, you can build powerful intelligent conversational applications.