Model Module

Overview

The Model module is the large language model abstraction layer of the tRPC-Agent-Go framework, providing a unified LLM interface design that currently supports OpenAI-compatible and Anthropic-compatible API calls. Through standardized interface design, developers can flexibly switch between different model providers, achieving seamless model integration and invocation. This module has been verified to be compatible with most OpenAI-like interfaces both inside and outside the company.

The Model module has the following core features:

Unified Interface Abstraction: Provides standardized Model interface, shielding differences between model providers
Streaming Response Support: Native support for streaming output, enabling real-time interactive experience
Multimodal Capabilities: Supports text, image, audio, and other multimodal content processing
Complete Error Handling: Provides dual-layer error handling mechanism, distinguishing between system errors and API errors
Extensible Configuration: Supports rich custom configuration options to meet different scenario requirements

Quick Start

Using Model in Agent

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
    "trpc.group/trpc-go/trpc-agent-go/runner"
    "trpc.group/trpc-go/trpc-agent-go/tool"
)

func main() {
    // 1. Create model instance.
    modelInstance := openai.New("deepseek-v4-flash",
        openai.WithExtraFields(map[string]interface{}{
            "tool_choice": "auto", // Automatically select tools.
        }),
    )

    // 2. Configure generation parameters.
    genConfig := model.GenerationConfig{
        MaxTokens:   intPtr(2000),
        Temperature: floatPtr(0.7),
        Stream:      true, // Enable streaming output.
    }

    // 3. Create Agent and integrate model.
    agent := llmagent.New(
        "chat-assistant",
        llmagent.WithModel(modelInstance),
        llmagent.WithDescription("A helpful assistant"),
        llmagent.WithInstruction("You are an intelligent assistant, use tools when needed."),
        llmagent.WithGenerationConfig(genConfig),
        llmagent.WithTools([]tool.Tool{calculatorTool, timeTool}),
    )

    // 4. Create Runner and run.
    r := runner.NewRunner("app-name", agent)
    eventChan, err := r.Run(ctx, userID, sessionID, model.NewUserMessage("Hello"))
    if err != nil {
        log.Fatal(err)
    }

    // 5. Handle response events.
    for event := range eventChan {
        // Handle streaming responses, tool calls, etc.
    }
}

Example code is located at examples/runner

Usage Methods and Platform Integration Guide

The Model module supports multiple usage methods and platform integration. The following are common usage scenarios based on Runner examples:

Quick Start

# Basic usage: Configure through environment variables, run directly.
cd examples/runner
export OPENAI_BASE_URL="https://api.deepseek.com/v1"
export OPENAI_API_KEY="your-api-key"
go run main.go -model deepseek-v4-flash

Platform Integration Configuration

All platform integration methods follow the same pattern, only requiring configuration of different environment variables or direct setting in code:

Environment Variable Method (Recommended):

export OPENAI_BASE_URL="Platform API address"
export OPENAI_API_KEY="API key"

Code Method:

model := openai.New("Model name",
    openai.WithBaseURL("Platform API address"),
    openai.WithAPIKey("API key"),
)

Supported Platforms and Their Configuration

The following are configuration examples for each platform, divided into environment variable configuration and code configuration methods:

Environment Variable Configuration

The runner example supports specifying model names through command line parameters (-model), which is actually passing the model name when calling openai.New().

# OpenAI platform.
export OPENAI_API_KEY="sk-..."
cd examples/runner
go run main.go -model gpt-4o-mini

# OpenAI API compatible.
export OPENAI_BASE_URL="https://api.deepseek.com/v1"
export OPENAI_API_KEY="your-api-key"
cd examples/runner
go run main.go -model deepseek-v4-flash

Code Configuration Method

Configuration method when directly using Model in your own code:

model := openai.New("deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com/v1"),
    openai.WithAPIKey("your-api-key"),
)

// Other platform configurations are similar, only need to modify model name, BaseURL and APIKey, no additional fields needed.

Core Interface Design

Model Interface

// Model is the interface that all language models must implement.
type Model interface {
    // Generate content, supports streaming response.
    GenerateContent(ctx context.Context, request *Request) (<-chan *Response, error)

    // Return basic model information.
    Info() Info
}

// Model information structure.
type Info struct {
    Name string // Model name.
}

// IterModel is an optional extension of Model that reduces channel overhead for streaming.
type IterModel interface {
    Model
    GenerateContentIter(ctx context.Context, request *Request) (Seq[*Response], error)
}

// Seq is a callback-based sequence type that yields one value on demand.
type Seq[T any] func(yield func(T) bool)

Channel-based streaming typically requires a dedicated goroutine and incurs channel synchronization on each chunk. In high-frequency streaming, this overhead can become a measurable cost. IterModel is an optional iterator-style API that streams responses synchronously in the caller goroutine to reduce this overhead.

When a model implements IterModel, the framework uses GenerateContentIter; otherwise it uses GenerateContent. Implementations must call yield sequentially and stop when it returns false.

Request Structure

// Request represents the request sent to the model.
type Request struct {
    // Message list, containing system instructions, user input and assistant replies.
    Messages []Message `json:"messages"`

    // Generation configuration (inlined into request).
    GenerationConfig `json:",inline"`

    // Tool list.
    Tools map[string]tool.Tool `json:"-"`
}

// GenerationConfig contains generation parameter configuration.
type GenerationConfig struct {
    // Whether to use streaming response.
    Stream bool `json:"stream"`

    // Temperature parameter (0.0-2.0).
    Temperature *float64 `json:"temperature,omitempty"`

    // Maximum generation token count.
    MaxTokens *int `json:"max_tokens,omitempty"`

    // Top-P sampling parameter.
    TopP *float64 `json:"top_p,omitempty"`

    // Stop generation markers.
    Stop []string `json:"stop,omitempty"`

    // Frequency penalty.
    FrequencyPenalty *float64 `json:"frequency_penalty,omitempty"`

    // Presence penalty.
    PresencePenalty *float64 `json:"presence_penalty,omitempty"`

    // Reasoning effort level. Accepted values depend on the provider:
    //   - OpenAI o-series: "low", "medium", "high".
    //   - Anthropic adaptive thinking: "low", "medium", "high", "max",
    //     and "xhigh" on models that support it.
    //   - DeepSeek v4 (deepseek-v4-pro / deepseek-v4-flash): "high", "max"
    //     (server maps "low"/"medium" -> "high" and "xhigh" -> "max" for
    //     backward compatibility).
    // OpenAI-compatible providers forward this field to the selected backend
    // model; use values supported by that backend.
    ReasoningEffort *string `json:"reasoning_effort,omitempty"`

    // Whether to enable thinking mode.
    // nil means no thinking-toggle field is emitted, so the provider default applies.
    ThinkingEnabled *bool `json:"thinking_enabled,omitempty"`

    // Maximum token count for thinking mode.
    ThinkingTokens *int `json:"thinking_tokens,omitempty"`
}

GenerationConfig itself is just a plain struct, and its zero value means Stream=false. For LLMAgent, if you omit llmagent.WithGenerationConfig(...), the framework forwards that zero value as-is, so the default behavior is non-streaming. Higher-level wrappers can still choose different semantics by setting their own explicit defaults.

Response Structure

// Response represents the response returned by the model.
type Response struct {
    // OpenAI compatible fields.
    ID                string   `json:"id,omitempty"`
    Object            string   `json:"object,omitempty"`
    Created           int64    `json:"created,omitempty"`
    Model             string   `json:"model,omitempty"`
    SystemFingerprint *string  `json:"system_fingerprint,omitempty"`
    Choices           []Choice `json:"choices,omitempty"`
    Usage             *Usage   `json:"usage,omitempty"`

    // Error information.
    Error *ResponseError `json:"error,omitempty"`

    // Internal fields.
    Timestamp time.Time `json:"-"`
    Done      bool      `json:"-"`
    IsPartial bool      `json:"-"`
}

// ResponseError represents API-level errors.
type ResponseError struct {
    Message string    `json:"message"`
    Type    ErrorType `json:"type"`
    Param   string    `json:"param,omitempty"`
    Code    string    `json:"code,omitempty"`
}

// Usage represents token usage information.
type Usage struct {
    PromptTokens            int                     `json:"prompt_tokens"`
    CompletionTokens        int                     `json:"completion_tokens"`
    TotalTokens             int                     `json:"total_tokens"`
    PromptTokensDetails     PromptTokensDetails     `json:"prompt_tokens_details"`
    CompletionTokensDetails CompletionTokensDetails `json:"completion_tokens_details"`
    TimingInfo              *TimingInfo             `json:"timing_info,omitempty"`
}

type PromptTokensDetails struct {
    CachedTokens        int `json:"cached_tokens"`
    CacheCreationTokens int `json:"cache_creation_tokens,omitempty"`
    CacheReadTokens     int `json:"cache_read_tokens,omitempty"`
}

type CompletionTokensDetails struct {
    ReasoningTokens int `json:"reasoning_tokens,omitempty"`
}

For OpenAI-compatible providers, completion_tokens_details.reasoning_tokens is mapped to Usage.CompletionTokensDetails.ReasoningTokens. The value may be 0 when the provider does not spend or report reasoning tokens; for reasoning models, set ReasoningEffort and/or ThinkingEnabled when you want to request reasoning behavior.

OpenAI Model

Model Name Parameter

When creating an OpenAI model instance using openai.New(name string, opts ...Option), the first parameter is the actual model name that gets sent to the OpenAI API, as the specific model identifier that tells the API which language model to use.

Since the framework supports different models compatible with the OpenAI API, you can obtain the base URL, API key, and model name from various model providers:

1. OpenAI Official

Base URL: https://api.openai.com/v1
Model Names: gpt-4o, gpt-4o-mini, etc.

2. DeepSeek

Base URL: https://api.deepseek.com
Model Names: deepseek-v4-flash, deepseek-v4-pro

deepseek-chat and deepseek-reasoner are deprecated compatibility aliases; prefer the explicit DeepSeek v4 model names for new code.

3. Tencent Hunyuan

Base URL: https://api.hunyuan.cloud.tencent.com/v1
Model Names: hunyuan-2.0-thinking-20251109, hunyuan-2.0-instruct-20251111, etc.

4. Requesty

Base URL: https://router.requesty.ai/v1
Auth: bearer key from REQUESTY_API_KEY
Model Names: provider/model, e.g. openai/gpt-4o-mini (browse models at https://app.requesty.ai/router/list)

See the runnable example in examples/model/requesty.

5. Other Providers

Qwen: Base URL https://dashscope.aliyuncs.com/compatible-mode/v1, Model Names: various qwen models

The OpenAI Model is used to interface with OpenAI and its compatible platforms. It supports streaming output, multimodal and advanced parameter configuration, and provides rich callback mechanisms, batch processing and retry capabilities. It also allows for flexible setting of custom HTTP headers.

Configuration Method

Environment Variable Method

export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com" # Optional configuration, default is this BASE URL

Code Method

import "trpc.group/trpc-go/trpc-agent-go/model/openai"

m := openai.New(
    "gpt-4o",
    openai.WithAPIKey("your-api-key"),
    openai.WithBaseURL("https://api.openai.com"), // Optional configuration, default is this BASE URL
)

Direct Model Usage

import (
    "context"
    "fmt"

    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

func main() {
    // Create model instance.
    llm := openai.New("deepseek-v4-flash")

    // Build request.
    temperature := 0.7
    maxTokens := 1000

    request := &model.Request{
        Messages: []model.Message{
            model.NewSystemMessage("You are a professional AI assistant."),
            model.NewUserMessage("Introduce Go language's concurrency features."),
        },
        GenerationConfig: model.GenerationConfig{
            Temperature: &temperature,
            MaxTokens:   &maxTokens,
            Stream:      false,
        },
    }

    // Call model.
    ctx := context.Background()
    responseChan, err := llm.GenerateContent(ctx, request)
    if err != nil {
        fmt.Printf("System error: %v\n", err)
        return
    }

    // Handle response.
    for response := range responseChan {
        if response.Error != nil {
            fmt.Printf("API error: %s\n", response.Error.Message)
            return
        }

        if len(response.Choices) > 0 {
            fmt.Printf("Reply: %s\n", response.Choices[0].Message.Content)
        }

        if response.Done {
            break
        }
    }
}

Structured Output

When calling model.GenerateContent directly, use model.NewRequest and model.WithStructuredOutputJSON to generate a JSON schema from a Go struct and pass it to model adapters that support provider-native structured output.

type StageParseResult struct {
    Stage  string `json:"stage"`
    Reason string `json:"reason"`
}

request := model.NewRequest(
    []model.Message{
        model.NewUserMessage("Classify the current user intent stage."),
    },
    model.WithStructuredOutputJSON(
        new(StageParseResult),
        true,
        "Return stage parse result as JSON.",
    ),
)

responseChan, err := llm.GenerateContent(ctx, request)
if err != nil {
    return err
}

var final string
for response := range responseChan {
    if response.Error != nil {
        return fmt.Errorf("model error: %s", response.Error.Message)
    }
    if len(response.Choices) > 0 {
        final = response.Choices[0].Message.Content
    }
}

var result StageParseResult
if err := json.Unmarshal([]byte(final), &result); err != nil {
    return err
}

WithStructuredOutputJSON only configures the model request. Direct model callers still receive normal model.Response values and should unmarshal the final JSON content themselves. If you already have a hand-written schema, set request.StructuredOutput directly.

Streaming Output

// Streaming request configuration.
request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a creative story teller."),
        model.NewUserMessage("Write a short story about a robot learning to paint."),
    },
    GenerationConfig: model.GenerationConfig{
        Stream: true,  // Enable streaming output.
    },
}

// Handle streaming response.
responseChan, err := llm.GenerateContent(ctx, request)
if err != nil {
    return err
}

for response := range responseChan {
    if response.Error != nil {
        fmt.Printf("Error: %s", response.Error.Message)
        return
    }

    if len(response.Choices) > 0 && response.Choices[0].Delta.Content != "" {
        fmt.Print(response.Choices[0].Delta.Content)
    }

    if response.Done {
        break
    }
}

Advanced Parameter Configuration

// Use advanced generation parameters.
temperature := 0.3
maxTokens := 2000
topP := 0.9
presencePenalty := 0.2
frequencyPenalty := 0.5
reasoningEffort := "high"

request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a professional technical documentation writer."),
        model.NewUserMessage("Explain the advantages and disadvantages of microservice architecture."),
    },
    GenerationConfig: model.GenerationConfig{
        Temperature:      &temperature,
        MaxTokens:        &maxTokens,
        TopP:             &topP,
        PresencePenalty:  &presencePenalty,
        FrequencyPenalty: &frequencyPenalty,
        ReasoningEffort:  &reasoningEffort,
        Stream:           true,
    },
}

Multimodal Content

// Read image file.
imageData, _ := os.ReadFile("image.jpg")

// Create multimodal message.
request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are an image analysis expert."),
        {
            Role: model.RoleUser,
            ContentParts: []model.ContentPart{
                {
                    Type: model.ContentTypeText,
                    Text: stringPtr("What's in this image?"),
                },
                {
                    Type: model.ContentTypeImage,
                    Image: &model.Image{
                        Data:   imageData,
                        Format: "jpeg",
                    },
                },
            },
        },
    },
}

Advanced Features

1. Callback Functions

// Set pre-request callback function.
model := openai.New("deepseek-v4-flash",
    openai.WithChatRequestCallback(func(ctx context.Context, req *openai.ChatCompletionNewParams) {
        // Called before request is sent.
        log.Printf("Sending request: model=%s, message count=%d", req.Model, len(req.Messages))
    }),

    // Set response callback function (non-streaming).
    openai.WithChatResponseCallback(func(ctx context.Context,
        req *openai.ChatCompletionNewParams,
        resp *openai.ChatCompletion) {
        // Called when complete response is received.
        log.Printf("Received response: ID=%s, tokens used=%d",
            resp.ID, resp.Usage.TotalTokens)
    }),

    // Set streaming response callback function.
    openai.WithChatChunkCallback(func(ctx context.Context,
        req *openai.ChatCompletionNewParams,
        chunk *openai.ChatCompletionChunk) {
        // Called when each streaming response chunk is received.
        log.Printf("Received streaming chunk: ID=%s", chunk.ID)
    }),

    // Set streaming completion callback function.
    openai.WithChatStreamCompleteCallback(func(ctx context.Context,
        req *openai.ChatCompletionNewParams,
        acc *openai.ChatCompletionAccumulator,
        streamErr error) {
        // Called when streaming is completely finished (success or error).
        if streamErr != nil {
            log.Printf("Streaming failed: %v", streamErr)
        } else {
            log.Printf("Streaming completed: reason=%s",
                acc.Choices[0].FinishReason)
        }
    }),
)

Dynamically Modifying Request Body via Callback

WithChatRequestCallback receives a *openai.ChatCompletionNewParams pointer, allowing you to dynamically add or modify fields in the HTTP request body before each request is sent. Use SetExtraFields on the params to inject custom JSON fields:

import (
    "context"

    openai "github.com/openai/openai-go"
    oaimodel "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

model := oaimodel.New("deepseek-v4-flash",
    oaimodel.WithChatRequestCallback(func(ctx context.Context, req *openai.ChatCompletionNewParams) {
        // Dynamically set extra fields based on runtime context.
        userID, _ := ctx.Value("user_id").(string)
        req.SetExtraFields(map[string]any{
            "user":            userID,
            "custom_metadata": map[string]string{"session_id": "abc"},
        })
    }),
)

Difference between request-body customization approaches:

Aspect	`WithExtraFields`	`agent.WithModelRequestExtraFields`	`SetExtraFields` in `WithChatRequestCallback`
Timing	Set once at model creation	Set on one `runner.Run(...)` call	Called before every model request
Dynamism	Static values only	Dynamic values known at run time	Dynamic values based on `ctx` or runtime state
Mechanism	Injected via `openaiopt.WithJSONSet` (RequestOption layer)	Copied into `model.Request.ExtraFields`, then injected by supported adapters	Set on the `ChatCompletionNewParams` struct (serialization layer)
Same key conflict	Overwritten by request-level extra fields	Takes precedence over model-level extra fields	Overwritten by RequestOption-layer extra fields if the same key exists

When multiple approaches use different keys, all fields appear in the final JSON body without conflict. When they set the same key, request-level extra fields passed with agent.WithModelRequestExtraFields take precedence in OpenAI-compatible adapters.

Request-scoped extra fields from Runner

Use agent.WithModelRequestExtraFields when a provider-specific top-level request body field should vary per runner.Run(...) call, for example prompt_cache_key on OpenAI-compatible endpoints:

import (
    "context"

    "trpc.group/trpc-go/trpc-agent-go/agent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/runner"
)

func runOnce(ctx context.Context, r runner.Runner, cacheKey string) error {
    events, err := r.Run(
        ctx,
        "user-001",
        "session-001",
        model.NewUserMessage("Hello"),
        agent.WithModelRequestExtraFields(map[string]any{
            "prompt_cache_key": cacheKey,
        }),
    )
    if err != nil {
        return err
    }
    for range events {
    }
    return nil
}

When a provider requires both a request body cache key and a routing header for the same conversation, combine this with request-scoped headers:

events, err := r.Run(
    ctx,
    "user-001",
    "session-001",
    model.NewUserMessage("Hello"),
    agent.WithModelRequestExtraFields(map[string]any{
        "prompt_cache_key": cacheKey,
    }),
    agent.WithModelRequestHeaders(map[string]string{
        "X-Session-ID": "session-001",
    }),
)

This option applies to every model call created during that run, including normal LLM agents, graph LLM nodes, and requests routed through failover or hedge models. The built-in OpenAI and HuggingFace/OpenAI-compatible adapters merge these fields into the top-level JSON request body. Other provider adapters ignore the field unless they add an explicit provider-specific mapping.

2. Model Switching

Model switching allows dynamically changing the LLM model used by an Agent at runtime. This section shows static switching for OpenAI and OpenAI-compatible model instances at the agent level and per-request level. To select a model separately for each LLM call within the same runner.Run(...), use ModelSelector.

Agent-level Switching

Agent-level switching changes the Agent's default model, affecting all subsequent requests.

Approach 1: Direct Model Instance

Set the model directly by passing a model instance to SetModel:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModel(openai.New("gpt-4o-mini")),
)

// Switch to another model.
agent.SetModel(openai.New("gpt-4o"))

Use Cases:

// Select model based on task complexity.
if isComplexTask {
    agent.SetModel(openai.New("gpt-4o"))  // Use powerful model.
} else {
    agent.SetModel(openai.New("gpt-4o-mini"))  // Use fast model.
}

Approach 2: Switch by Name

Pre-register multiple models with WithModels, then switch by name using SetModelByName:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create multiple model instances.
strong := openai.New("gpt-4o")
fast := openai.New("gpt-4o-mini")

// Register all models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": strong,
        "fast":  fast,
    }),
    llmagent.WithModel(fast), // Specify initial model.
    llmagent.WithInstruction("You are an intelligent assistant."),
)

// Switch models by name at runtime.
err := agent.SetModelByName("smart")
if err != nil {
    log.Fatal(err)
}

// Switch to another model.
err = agent.SetModelByName("fast")
if err != nil {
    log.Fatal(err)
}

Use Cases:

// Select model based on user tier.
modelName := "fast" // Default to fast model.
if user.IsPremium() {
    modelName = "smart" // Premium users get advanced model.
}
if err := agent.SetModelByName(modelName); err != nil {
    log.Printf("Failed to switch model: %v", err)
}

// Select model based on time of day (cost optimization).
hour := time.Now().Hour()
if hour >= 22 || hour < 8 {
    // Use fast model at night.
    agent.SetModelByName("fast")
} else {
    // Use smart model during the day.
    agent.SetModelByName("smart")
}

Per-request Switching

Per-request switching allows temporarily specifying a model for a single request without affecting the Agent's default model or other requests. This is useful for scenarios where different models are needed for specific tasks.

Approach 1: Using WithModel Option

Use agent.WithModel to specify a model instance for a single request:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Use a specific model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModel(openai.New("gpt-4o")),
)

Approach 2: Using WithModelName Option (Recommended)

Use agent.WithModelName to specify a pre-registered model name for a single request:

// Pre-register multiple models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart":  openai.New("gpt-4o"),
        "fast":   openai.New("gpt-4o-mini"),
        "vision": openai.New("gpt-4o"),
    }),
    llmagent.WithModel(openai.New("gpt-4o-mini")), // Default model.
)

runner := runner.NewRunner("app", agent)

// Temporarily use "smart" model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModelName("smart"),
)

// Next request still uses the default model "gpt-4o-mini".
eventChan2, err := runner.Run(ctx, userID, sessionID, message2)

Use Cases:

// Dynamically select model based on message complexity.
var opts []agent.RunOption
if isComplexQuery(message) {
    opts = append(opts, agent.WithModelName("smart")) // Use powerful model for complex queries.
}

eventChan, err := runner.Run(ctx, userID, sessionID, message, opts...)

// Use a specialized model for a specific task.
eventChan, err := runner.Run(ctx, userID, sessionID, visionMessage,
    agent.WithModelName("vision"),
)

Configuration Details

WithModels Option:

Accepts a map[string]model.Model where key is the model name and value is the model instance
If both WithModel and WithModels are set, WithModel specifies the initial model
If only WithModels is set, the first model in the map will be used as the initial model (note: map iteration order is not guaranteed, so it's recommended to explicitly specify the initial model)
Reserved name: __default__ is used internally by the framework and should not be used

SetModelByName Method:

Parameter: model name (string)
Returns: error if the model name is not found
The model must be pre-registered via WithModels

Per-request Options:

agent.RunOptions.Model: Directly specify a model instance
agent.RunOptions.ModelName: Specify a pre-registered model name
agent.RunOptions.Stream: Override whether responses are streamed (use agent.WithStream(...))
agent.RunOptions.Instruction: Override instruction for this request only (use agent.WithInstruction(...))
agent.RunOptions.GlobalInstruction: Override global instruction (system prompt) for this request only (use agent.WithGlobalInstruction(...))
Priority: Model > ModelName > Agent default model
If the model specified by ModelName is not found, it falls back to the Agent's default model

You can set streaming per request using agent.WithStream(true) or agent.WithStream(false).

Agent-level vs Per-request Comparison

Feature	Agent-level Switching	Per-request Switching
Scope	All subsequent requests	Current request only
Usage	`SetModel`/`SetModelByName`	`RunOptions.Model`/`ModelName`
State Change	Changes Agent default model	Does not change Agent state
Use Case	Global strategy adjustment	Specific task temporary needs
Concurrency	Affects all concurrent reqs	Does not affect other requests
Typical Examples	User tier, time-based policy	Complex queries, reasoning

Agent-level Approach Comparison

Feature	SetModel	SetModelByName
Usage	Pass model instance	Pass model name
Pre-registration	Not required	Required via WithModels
Error Handling	None	Returns error
Use Case	Simple switching	Complex scenarios, multi-model management
Code Maintenance	Need to hold model instances	Only need to remember names

Important Notes

Agent-level Switching:

Immediate Effect: After calling SetModel or SetModelByName, the next request immediately uses the new model
Session Persistence: Switching models does not clear session history
Independent Configuration: Each model retains its own configuration (temperature, max tokens, etc.)
Concurrency Safe: Both switching approaches are concurrency-safe

Per-request Switching:

Temporary Override: Only affects the current request, does not change the Agent's default model
Higher Priority: Per-request model settings take precedence over the Agent's default model
No Side Effects: Does not affect other concurrent requests or subsequent requests
Flexible Combination: Can be used in combination with agent-level switching

Model-specific Prompts (LLMAgent):

Use llmagent.WithModelInstructions / llmagent.WithModelGlobalInstructions to override prompts by model.Info().Name when the Agent switches models; it falls back to the Agent defaults when no mapping exists.
For a runnable example, see examples/model/promptmap.

Usage Example

For a complete interactive example, see examples/model/switch, which demonstrates both agent-level and per-request switching approaches.

3. Batch Processing (Batch API)

Batch API is an asynchronous batch processing technique for efficiently handling large volumes of requests. This feature is particularly suitable for scenarios requiring large-scale data processing, significantly reducing costs and improving processing efficiency.

Core Features

Asynchronous Processing: Batch requests are processed asynchronously without waiting for immediate responses
Cost Optimization: Typically more cost-effective than individual requests
Flexible Input: Supports both inline requests and file-based input
Complete Management: Provides full operations including create, retrieve, cancel, and list
Result Parsing: Automatically downloads and parses batch processing results

Quick Start

Creating a Batch Job:

import (
    openaisdk "github.com/openai/openai-go"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create model instance.
llm := openai.New("gpt-4o-mini")

// Prepare batch requests.
requests := []*openai.BatchRequestInput{
    {
        CustomID: "request-1",
        Method:   "POST",
        URL:      string(openaisdk.BatchNewParamsEndpointV1ChatCompletions),
        Body: openai.BatchRequest{
            Messages: []model.Message{
                model.NewSystemMessage("You are a helpful assistant."),
                model.NewUserMessage("Hello"),
            },
        },
    },
    {
        CustomID: "request-2",
        Method:   "POST",
        URL:      string(openaisdk.BatchNewParamsEndpointV1ChatCompletions),
        Body: openai.BatchRequest{
            Messages: []model.Message{
                model.NewSystemMessage("You are a helpful assistant."),
                model.NewUserMessage("Introduce Go language"),
            },
        },
    },
}

// Create batch job.
batch, err := llm.CreateBatch(ctx, requests,
    openai.WithBatchCreateCompletionWindow("24h"),
)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Batch job created: %s\n", batch.ID)

Batch Operations

Retrieving Batch Status:

// Get batch details.
batch, err := llm.RetrieveBatch(ctx, batchID)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Status: %s\n", batch.Status)
fmt.Printf("Total requests: %d\n", batch.RequestCounts.Total)
fmt.Printf("Completed: %d\n", batch.RequestCounts.Completed)
fmt.Printf("Failed: %d\n", batch.RequestCounts.Failed)

Downloading and Parsing Results:

// Download output file.
if batch.OutputFileID != "" {
    text, err := llm.DownloadFileContent(ctx, batch.OutputFileID)
    if err != nil {
        log.Fatal(err)
    }

    // Parse batch output.
    entries, err := llm.ParseBatchOutput(text)
    if err != nil {
        log.Fatal(err)
    }

    // Process each result.
    for _, entry := range entries {
        fmt.Printf("[%s] Status code: %d\n", entry.CustomID, entry.Response.StatusCode)
        if len(entry.Response.Body.Choices) > 0 {
            content := entry.Response.Body.Choices[0].Message.Content
            fmt.Printf("Content: %s\n", content)
        }
        if entry.Error != nil {
            fmt.Printf("Error: %s\n", entry.Error.Message)
        }
    }
}

Canceling a Batch Job:

// Cancel an in-progress batch.
batch, err := llm.CancelBatch(ctx, batchID)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Batch job canceled: %s\n", batch.ID)

Listing Batch Jobs:

// List batch jobs (with pagination support).
page, err := llm.ListBatches(ctx, "", 10)
if err != nil {
    log.Fatal(err)
}

for _, batch := range page.Data {
    fmt.Printf("ID: %s, Status: %s\n", batch.ID, batch.Status)
}

Configuration Options

Global Configuration:

// Configure batch default parameters when creating model.
llm := openai.New("gpt-4o-mini",
    openai.WithBatchCompletionWindow("24h"),
    openai.WithBatchMetadata(map[string]string{
        "project": "my-project",
        "env":     "production",
    }),
    openai.WithBatchBaseURL("https://custom-batch-api.com"),
)

Request-level Configuration:

// Override default configuration when creating batch.
batch, err := llm.CreateBatch(ctx, requests,
    openai.WithBatchCreateCompletionWindow("48h"),
    openai.WithBatchCreateMetadata(map[string]string{
        "priority": "high",
    }),
)

How It Works

Batch API execution flow:

1. Prepare batch requests (BatchRequestInput list)
2. Validate request format and CustomID uniqueness
3. Generate JSONL format input file
4. Upload input file to server
5. Create batch job
6. Process requests asynchronously
7. Download output file and parse results

Key design:

CustomID Uniqueness: Each request must have a unique CustomID for matching input/output
JSONL Format: Batch processing uses JSONL (JSON Lines) format for storing requests and responses
Asynchronous Processing: Batch jobs execute asynchronously in the background without blocking main flow
Completion Window: Configurable completion time window for batch processing (e.g., 24h)

Use Cases

Large-scale Data Processing: Processing thousands or tens of thousands of requests
Offline Analysis: Non-real-time data analysis and processing tasks
Cost Optimization: Batch processing is typically more economical than individual requests
Scheduled Tasks: Regularly executed batch processing jobs

Usage Example

For a complete interactive example, see examples/model/batch.

4. Retry Mechanism

The retry mechanism is an automatic error recovery technique that automatically retries failed requests. This feature is provided by the underlying OpenAI SDK, with the framework passing retry parameters to the SDK through configuration options.

Timeouts and deadlines

Request lifecycle is bounded by two independent limits:

The caller context deadline (for example, Runner max duration, or context.WithTimeout).
The OpenAI request timeout configured by openaiopt.WithRequestTimeout.

The effective budget is the earlier one:

effective_deadline = min(ctx_deadline, request_timeout)

Important notes:

github.com/openai/openai-go does not hardcode timeout by default. If you observe timeout in logs, it typically comes from an upstream deadline (gateway/caller context) or from your own WithRequestTimeout configuration.
If you expect long-running calls (streaming, large prompts, tools, or reasoning models), configure WithRequestTimeout to match your service deadline and service level objective (SLO).

Core Features

Automatic Retry: SDK automatically handles retryable errors
Smart Backoff: Follows API's Retry-After headers or uses exponential backoff
Configurable: Supports custom maximum retry count and timeout duration
Zero Maintenance: No custom retry logic needed, handled by mature SDK

Quick Start

Basic Configuration:

import (
    "time"
    openaiopt "github.com/openai/openai-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create model instance with retry configuration.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(3),
        openaiopt.WithRequestTimeout(30*time.Second),
    ),
)

Retryable Errors

The OpenAI SDK automatically retries the following errors:

408 Request Timeout: Request timeout
409 Conflict: Conflict error
429 Too Many Requests: Rate limiting
500+ Server Errors: Internal server errors (5xx)
Network Connection Errors: No response or connection failure

Note: The SDK default maximum retry count is 2. This means the initial request can be retried up to 2 times, for a default maximum of 3 HTTP request attempts per call. The n in openaiopt.WithMaxRetries(n) is the retry count, not the total request count. Set it to 0 to disable SDK automatic retries and send only the initial request.

Retry Strategies

Standard Retry:

// Standard configuration suitable for most scenarios.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(3),
        openaiopt.WithRequestTimeout(30*time.Second),
    ),
)

Rate Limiting Optimization:

// Optimized configuration for rate limiting scenarios.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(5),  // More retry attempts.
        openaiopt.WithRequestTimeout(60*time.Second),  // Longer timeout.
    ),
)

Fast Fail:

// For scenarios requiring quick failure.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(1),  // Minimal retries.
        openaiopt.WithRequestTimeout(10*time.Second),  // Short timeout.
    ),
)

Disable Retry:

// For scenarios that must disable SDK automatic retries.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(0),  // No retries; send one request only.
        openaiopt.WithRequestTimeout(10*time.Second),
    ),
)

How It Works

Retry mechanism execution flow:

1. Send request to LLM API
2. If request fails and error is retryable:
   a. Check if maximum retry count is reached
   b. Calculate wait time based on Retry-After header or exponential backoff
   c. Wait and resend request
3. If request succeeds or error is not retryable, return result

Key design:

SDK-level Implementation: Retry logic is completely handled by OpenAI SDK
Configuration Pass-through: Framework passes configuration via WithOpenAIOptions
Smart Backoff: Prioritizes using Retry-After header returned by API
Transparent Handling: Transparent to application layer, no additional code needed

Use Cases

Production Environment: Improve service reliability and fault tolerance
Rate Limiting: Automatically handle 429 errors
Network Instability: Handle temporary network failures
Server Errors: Handle temporary server-side issues

Important Notes

No Framework Retry: Framework itself does not implement retry logic
Client-level Retry: All retry is handled by OpenAI client
Configuration Pass-through: Use WithOpenAIOptions to configure retry behavior
Disable Retry: Set openaiopt.WithMaxRetries(0) to disable SDK automatic retries
Automatic Handling: Rate limiting (429) is automatically handled without additional code

Usage Example

For a complete interactive example, see examples/model/retry.

5. Custom HTTP Headers

In some enterprise or proxy scenarios, the model provider requires additional HTTP headers (for example, organization ID, tenant routing, or custom authentication). The Model module supports setting headers in several reliable ways.

Recommended order:

Global header via openai.WithHeaders (simplest for static headers)
Request-scoped header via agent.WithModelRequestHeaders (for runner.Run(...) calls whose headers vary by user, session, or tenant)
Global header via OpenAI RequestOption (flexible, middleware-friendly)
Custom http.RoundTripper (advanced, cross-cutting)

All methods affect streaming too because the same client is used for New and NewStreaming calls.

1. Using openai.WithHeaders for headers

import "trpc.group/trpc-go/trpc-agent-go/model/openai"

llm := openai.New("deepseek-v4-flash",
    openai.WithHeaders(map[string]string{
        "X-Custom-Header": "custom-value",
        "X-Request-ID":    "req-123",
    }),
)

2. Request-scoped headers from Runner

Use agent.WithModelRequestHeaders when a header should vary per runner.Run(...) call, for example X-Session-ID on providers that route a conversation to the same inference instance. The OpenAI adapter merges these headers into the HTTP request after model-level headers, so request-level values take precedence on the same key.

events, err := r.Run(
    ctx,
    "user-001",
    "session-001",
    model.NewUserMessage("Hello"),
    agent.WithModelRequestHeaders(map[string]string{
        "X-Session-ID": "session-001",
    }),
)

3. Global headers using OpenAI RequestOption

Use WithOpenAIOptions with openaiopt.WithHeader or openaiopt.WithMiddleware to inject headers for every request created by the underlying OpenAI client.

import (
    openaiopt "github.com/openai/openai-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

llm := openai.New("deepseek-v4-flash",
    // If your provider needs extra headers
    openai.WithOpenAIOptions(
        openaiopt.WithHeader("X-Custom-Header", "custom-value"),
        openaiopt.WithHeader("X-Request-ID", "req-123"),
        // You can also set User-Agent or vendor-specific headers
        openaiopt.WithHeader("User-Agent", "trpc-agent-go/1.0"),
    ),
)

For complex logic, middleware lets you modify headers conditionally (for example, by URL path or context values):

llm := openai.New("deepseek-v4-flash",
    openai.WithOpenAIOptions(
        openaiopt.WithMiddleware(
            func(r *http.Request, next openaiopt.MiddlewareNext) (*http.Response, error) {
                // Example: per-request header via context value
                if v := r.Context().Value("x-request-id"); v != nil {
                    if s, ok := v.(string); ok && s != "" {
                        r.Header.Set("X-Request-ID", s)
                    }
                }
                // Or only for chat completion endpoint
                if strings.Contains(r.URL.Path, "/chat/completions") {
                    r.Header.Set("X-Feature-Flag", "on")
                }
                return next(r)
            },
        ),
    ),
)

Notes for authentication variants:

OpenAI style: keep openai.WithAPIKey("sk-...") which sets Authorization: Bearer ... under the hood.
Azure/OpenAI‑compatible that use api-key: omit WithAPIKey and set openaiopt.WithHeader("api-key", "<key>") instead.

Logging raw HTTP request and response

You can use openaiopt.WithMiddleware to log the underlying HTTP request and response. Be careful about secrets (API keys, Authorization headers) and body consumption.

Key points:

Reading req.Body or resp.Body consumes the stream, so you must restore it.
Do not read resp.Body for streaming responses (for example, Content-Type: text/event-stream); skip body logging to avoid blocking or breaking the stream.

import (
    "bytes"
    "io"
    "net/http"
    "strings"

    openaiopt "github.com/openai/openai-go/option"
    "trpc.group/trpc-go/trpc-agent-go/log"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

const streamContentType = "text/event-stream"

llm := openai.New("deepseek-v4-flash",
    openai.WithOpenAIOptions(
        openaiopt.WithMiddleware(
            func(
                req *http.Request,
                next openaiopt.MiddlewareNext,
            ) (*http.Response, error) {
                // 1. Read req.Body.
                bodyBytes, err := io.ReadAll(req.Body)
                if err != nil {
                    return nil, err
                }
                // 2. Log req.Body.
                log.DebugfContext(
                    req.Context(),
                    "Middleware req: %+v",
                    string(bodyBytes),
                )

                // 3. Restore req.Body (critical step).
                req.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))

                resp, err := next(req)
                if err != nil || resp == nil {
                    return resp, err
                }

                // 4. Skip body logging for streaming responses.
                contentType := resp.Header.Get("Content-Type")
                if strings.Contains(contentType, streamContentType) {
                    return resp, nil
                }

                // 5. Read resp.Body.
                respBodyBytes, err := io.ReadAll(resp.Body)
                if err != nil {
                    return resp, err
                }
                // 6. Log resp.Body.
                log.DebugfContext(
                    req.Context(),
                    "Middleware rsp: %+v",
                    string(respBodyBytes),
                )

                // 7. Restore resp.Body (critical step).
                resp.Body = io.NopCloser(bytes.NewBuffer(respBodyBytes))
                return resp, nil
            },
        ),
    ),
)

4. Custom http.RoundTripper (advanced)

Inject headers across all requests at the HTTP layer by wrapping the transport. This is useful when you also need custom proxy, TLS, or metrics logic.

type headerRoundTripper struct{ base http.RoundTripper }

func (rt headerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
    // Add or override headers
    req.Header.Set("X-Custom-Header", "custom-value")
    req.Header.Set("X-Trace-ID", "trace-xyz")
    return rt.base.RoundTrip(req)
}

llm := openai.New("deepseek-v4-flash",
    openai.WithHTTPClientOptions(
        openai.WithHTTPClientTransport(headerRoundTripper{base: http.DefaultTransport}),
    ),
)

Per-request headers

Agent/Runner passes ctx through to the model call; middleware can read values from req.Context() to inject per-invocation headers.
Chat completion per-request base URL override is not exposed; create a second model with a different base URL or alter r.URL in middleware.

6. Token Counter

The Token Counter is used to estimate the token count of text content. The framework provides SimpleTokenCounter as the default implementation, supporting custom token counting logic to meet different scenario requirements.

SimpleTokenCounter

SimpleTokenCounter provides a very rough token estimation based on UTF-8 character count. Its features include:

Core Features:

Lightweight: Does not depend on external tokenizers, only uses character length estimation
Model-agnostic: Suitable for all OpenAI-compatible models
Multimodal Support: Supports message content, reasoning content (ReasoningContent), and tool calls

Estimation Principle:

Uses heuristic rules: approximately N UTF-8 characters per token, where N can be configured via WithApproxRunesPerToken. N is a divisor, not a multiplier:

1	`estimatedTokens = countedUTF8Runes / N`

Therefore, WithApproxRunesPerToken(1.5) means approximately 1.5 characters per token. Passing 2.0/3.0 means approximately 0.67 characters per token, which is about 1.5 tokens per character.

Usage:

import "trpc.group/trpc-go/trpc-agent-go/model"

// Use default configuration (English scenario)
counter := model.NewSimpleTokenCounter()

// Adjust for Chinese scenarios (Chinese has higher token density)
counter := model.NewSimpleTokenCounter(
    model.WithApproxRunesPerToken(1.6),  // Approx. 1.6 chars/token
)

// Adjust for other languages or specific models
counter := model.NewSimpleTokenCounter(
    model.WithApproxRunesPerToken(3.5),  // Approx. 3.5 chars/token
)

Parameter Description:

WithApproxRunesPerToken(v float64)

Purpose: Set the approximate number of characters per token
Type: float64
Default Value: 4.0 (approx. 4 characters per token, suitable for English scenarios)
Value Constraint: Values <= 0 will be ignored, keeping the default value
Formula: estimated tokens = counted UTF-8 characters / v; for example, v=1.5 means approximately 1.5 characters per token

Recommended Values for Common Languages:

Language/Scenario	Recommended Value	Description
English text	4.0	Default value, suitable for English words and common Latin characters
Chinese text	1.6-1.8	Chinese characters have higher token density than English, use smaller values
Japanese text	2.0-2.5	Japanese character token density is between Chinese and English
Mixed text	3.0	Trade-off value for mixed Chinese/English scenarios

Important Notes:

Empirical Values: The recommended values above are empirical estimates. Actual token count depends on the specific model's tokenizer implementation.
Uncertainty: If you're uncertain about language mix or model characteristics, keep the default value (4.0).
Model Differences: Different model vendors (OpenAI, DeepSeek, Hunyuan) have different tokenizer implementations, which may require adjusting this parameter.
Accuracy Consideration: SimpleTokenCounter only provides rough estimation. If you need accurate token counting, consider using the model provider's tokenizer API (e.g., OpenAI's tiktoken).

Custom Token Counter

If SimpleTokenCounter's rough estimation doesn't meet your requirements, you can implement a custom TokenCounter interface:

import (
    "context"
    "fmt"
    "trpc.group/trpc-go/trpc-agent-go/model"
)

type MyCustomCounter struct{}

func (c *MyCustomCounter) CountTokens(ctx context.Context, message model.Message) (int, error) {
    // Calculate tokens for message content, reasoning content, and tool calls
    total := 0

    // Process main content
    if len(message.Content) > 0 {
        total += len(message.Content)  // Example: use more accurate algorithm
    }

    // Process reasoning content
    if len(message.ReasoningContent) > 0 {
        total += len(message.ReasoningContent)
    }

    // Process multimodal content
    for _, part := range message.ContentParts {
        if part.Text != nil {
            total += len(*part.Text)
        }
    }

    // Process tool calls
    for _, toolCall := range message.ToolCalls {
        total += len(toolCall.Function.Name)
        total += len(string(toolCall.Function.Arguments))
    }

    return total, nil
}

func (c *MyCustomCounter) CountTokensRange(ctx context.Context, messages []model.Message, start, end int) (int, error) {
    if start < 0 || end > len(messages) || start >= end {
        return 0, fmt.Errorf("invalid range: start=%d, end=%d, len=%d", start, end, len(messages))
    }

    total := 0
    for i := start; i < end; i++ {
        tokens, err := c.CountTokens(ctx, messages[i])
        if err != nil {
            return 0, err
        }
        total += tokens
    }
    return total, nil
}

// Use custom counter when creating a model
llm := openai.New("deepseek-v4-flash",
    openai.WithTokenCounter(&MyCustomCounter{}),
)

Using Token Counter in Summarizer

SimpleTokenCounter can also be used with the session/summary module to control summary triggering:

import (
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/session/summary"
)

// 1. Create a counter optimized for Chinese
counter := model.NewSimpleTokenCounter(
    model.WithApproxRunesPerToken(1.6),  // Recommended value for Chinese scenarios
)

// 2. Set as global counter (affects all summary triggers)
summary.SetTokenCounter(counter)

// 3. Create summarizer
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithTokenThreshold(4000),  // Uses your custom counter for evaluation
)

7. Token Tailoring

Token Tailoring is an intelligent message management technique designed to automatically trim messages when they exceed the model's context window limits, ensuring requests can be successfully sent to the LLM API. This feature is particularly useful for long conversation scenarios, allowing you to keep the message list within the model's token limits while preserving key context.

Automatic Mode (Recommended):

import (
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Enable token tailoring with automatic configuration
model := openai.New("deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),
)

Advanced Mode:

// Custom token limit and strategy
model := openai.New("deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),               // Required: enable token tailoring
    openai.WithMaxInputTokens(10000),                    // Custom token limit
    openai.WithTokenCounter(customCounter),              // Optional: custom counter
    openai.WithTailoringStrategy(customStrategy),        // Optional: custom strategy
)

Token Calculation Formula:

The framework automatically calculates maxInputTokens based on the model's context window. For OpenAI-compatible models, the automatic budget also accounts for request-scoped output limits and tool definitions:

Context Window Registration

Token Tailoring and session summary WithContextThreshold both need a model context window. Built-in model names are resolved automatically. Token Tailoring uses a 128000-token fallback for unknown model names. For private deployments, tenant-provided models, endpoint IDs, or smaller unknown models, prefer model-instance configuration such as openai.WithContextWindow(32768) or the unified provider.WithContextWindow(32768). For one-off runs, use agent.WithModelContextWindow(32768). Use model.RegisterModelContextWindow("my-model", 32768) only when the name has a stable process-wide meaning. See the Session Summary documentation for a full example.

outputReserve = max(ReserveOutputTokens, request.MaxTokens, request.ThinkingTokens)
safetyMargin = contextWindow × SafetyMarginRatio
calculatedMax = contextWindow - outputReserve - ProtocolOverheadTokens - safetyMargin
ratioLimit = contextWindow × MaxInputTokensRatio
messageBudget = min(max(min(calculatedMax, ratioLimit), InputTokensFloor), calculatedMax)
maxInputTokens = max(messageBudget - estimatedToolsTokens, 0)

For example, "gpt-4o" (contextWindow = 128000):

safetyMargin = 128000 × 0.10 = 12800 tokens
outputReserve = max(2048, request.MaxTokens, request.ThinkingTokens)
calculatedMax = 128000 - outputReserve - 512 - 12800
ratioLimit = 128000 × 1.0 = 128000 tokens
maxInputTokens = messageBudget - estimatedToolsTokens

If WithMaxInputTokens is set explicitly, the framework keeps that value as the configured message budget and does not subtract the estimated Tools schema budget from it. The explicit value is still clamped to the context-safe hard budget after output reserves and safety margin are applied.

Default Budget Parameters:

The framework uses the following default values for token allocation (it is recommended to keep the defaults):

Protocol Overhead (ProtocolOverheadTokens): 512 tokens - reserved for request/response formatting
Output Reserve (ReserveOutputTokens): 2048 tokens - minimum reserve for output generation; OpenAI-compatible requests reserve the larger value among this setting, GenerationConfig.MaxTokens, and GenerationConfig.ThinkingTokens
Input Floor (InputTokensFloor): 1024 tokens - ensures proper model processing
Output Floor (OutputTokensFloor): deprecated and no longer used to auto-calculate output MaxTokens
Safety Margin Ratio (SafetyMarginRatio): 10% - buffer for token counting inaccuracies
Max Input Ratio (MaxInputTokensRatio): 100% - maximum input ratio of context window

Tailoring Strategy:

The framework provides a default tailoring strategy that preserves messages according to the following priorities:

System Messages: Highest priority, always preserved
Latest User Message: Ensures the current conversation turn is complete
Tool Call Related Messages: Maintains tool call context integrity
Historical Messages: Retains as much conversation history as possible based on remaining space

Custom Tailoring Strategy:

You can implement the TailoringStrategy interface to customize the trimming logic:

type CustomStrategy struct{}

func (s *CustomStrategy) Tailor(
    ctx context.Context,
    messages []model.Message,
    maxTokens int,
    counter tokencounter.Counter,
) ([]model.Message, error) {
    // Implement custom tailoring logic
    // e.g., keep only the most recent N conversation rounds
    return messages, nil
}

model := openai.New("deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),
    openai.WithTailoringStrategy(&CustomStrategy{}),
)

Advanced Configuration (Custom Budget Parameters):

If the default token allocation strategy does not meet your needs, you can customize the budget parameters using WithTokenTailoringConfig. Note: It is recommended to keep the default values unless you have specific requirements.

model := openai.New("deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),
    openai.WithTokenTailoringConfig(&model.TokenTailoringConfig{
        ProtocolOverheadTokens: 1024,   // Custom protocol overhead
        ReserveOutputTokens:    4096,   // Custom output reserve
        InputTokensFloor:       2048,   // Custom input floor
        OutputTokensFloor:      512,    // Custom output floor
        SafetyMarginRatio:      0.15,   // Custom safety margin (15%)
        MaxInputTokensRatio:    0.90,   // Custom max input ratio (90%)
    }),
)

For Anthropic models, you can use the same configuration:

model := anthropic.New("claude-sonnet-4-0",
    anthropic.WithEnableTokenTailoring(true),
    anthropic.WithTokenTailoringConfig(&model.TokenTailoringConfig{
        SafetyMarginRatio: 0.15,  // Increase safety margin to 15%
    }),
)

7. Variant Optimization: Adapting to Platform-Specific Behaviors

The Variant mechanism is an important optimization in the Model module, used to handle platform-specific behavioral differences across OpenAI-compatible providers. By specifying different Variants, the framework can automatically adapt to API differences between platforms, including file handling and thinking-toggle serialization.

7.1. Supported Variant Types

The framework currently supports the following Variants:

1. VariantOpenAI（default）

Standard OpenAI API-compatible behavior
File upload path：/openapi/v1/files
File purpose:user_data
File deletion Http method:：DELETE

2. VariantHunyuan（hunyuan）

Tencent Hunyuan platform-specific adaptation
File upload path:：/openapi/v1/files/uploads
File purpose：file-extract
File deletion Http Method：POST

3. VariantDeepSeek

DeepSeek platform adaptation
Default BaseURL：https://api.deepseek.com
API Key environment variable name：DEEPSEEK_API_KEY
DeepSeek-specific behavior is enabled when you explicitly set WithVariant(openai.VariantDeepSeek) or use the official DeepSeek API BaseURL
Other behaviors are consistent with standard OpenAI

4. VariantQwen（Qwen）

Qwen platform adaptation
Default BaseURL：https://dashscope.aliyuncs.com/compatible-mode/v1
API Key environment variable name：DASHSCOPE_API_KEY
Other behaviors are consistent with standard OpenAI

5. VariantGLM

GLM OpenAI-compatible API adaptation
Serializes the thinking toggle using GLM's thinking object format
Falls back to exposing reasoning_content as visible content when some GLM gateways return an empty content field without tool calls

6. VariantKimi

Kimi Open Platform adaptation
Default BaseURL: https://api.moonshot.ai/v1
API Key environment variable name: MOONSHOT_API_KEY
Automatically inferred for the official api.moonshot.ai and api.moonshot.cn hosts
Serializes the thinking toggle as {"thinking": {"type": "enabled"}}
Uses file-extract as the default file upload purpose

7. VariantMiniMax

MiniMax OpenAI-compatible API adaptation
Default BaseURL: https://api.minimax.io/v1
API Key environment variable name: MINIMAX_API_KEY
Automatically inferred for the official api.minimax.io and api.minimaxi.com hosts
Serializes the thinking toggle as {"thinking": {"type": "adaptive"}} when enabled and {"thinking": {"type": "disabled"}} when disabled
Keeps MiniMax's native <think>...</think> content unchanged so interleaved thinking can be replayed across tool calls
Uses MiniMax's /v1/files/upload and /v1/files/delete endpoints, with video_understanding as the default purpose

7.2. Usage

Usage Example：

import "trpc.group/trpc-go/trpc-agent-go/model/openai"

// Use the Hunyuan platform
model := openai.New("hunyuan-model",
    openai.WithBaseURL("https://your-hunyuan-api.com"),
    openai.WithAPIKey("your-api-key"),
    openai.WithVariant(openai.VariantHunyuan), // Specify the Hunyuan variant
)

// Use the DeepSeek platform
model := openai.New("deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com/v1"),
    openai.WithAPIKey("your-api-key"),
    openai.WithVariant(openai.VariantDeepSeek), // Specify the DeepSeek variant
)

// Use the Kimi Open Platform
model = openai.New("kimi-k2.6",
    openai.WithVariant(openai.VariantKimi), // Reads MOONSHOT_API_KEY automatically
)

// Use the MiniMax OpenAI-compatible API
model = openai.New("MiniMax-M3",
    openai.WithVariant(openai.VariantMiniMax), // Reads MINIMAX_API_KEY automatically
)

7.3. Behavioral Differences of Variants Examples

Message content handling differences：

import "trpc.group/trpc-go/trpc-agent-go/model"

// For the Hunyuan platform, the file ID is placed in extraFields instead of content parts
message := model.Message{
    Role: model.RoleUser,
    ContentParts: []model.ContentPart{
        {
            Type: model.ContentTypeFile,
            File: &model.File{
                FileID: "file_123",
            },
        },
    },
}

Environment variable auto-configuration

For certain Variants, the framework supports reading configuration from environment variables automatically:

# DeepSeek
export DEEPSEEK_API_KEY="your-api-key"
# No need to call WithAPIKey explicitly; the framework reads it automatically

# Kimi
export MOONSHOT_API_KEY="your-api-key"

# MiniMax
export MINIMAX_API_KEY="your-api-key"

import "trpc.group/trpc-go/trpc-agent-go/model"

// DeepSeek
model := openai.New("deepseek-v4-flash",
    openai.WithVariant(openai.VariantDeepSeek), // Automatically reads DEEPSEEK_API_KEY
)

7.4. Thinking Toggles and Variants

Variant and GenerationConfig.ThinkingEnabled have different responsibilities:

WithVariant(...) selects the provider protocol and determines which field and JSON shape represent the thinking toggle.
ThinkingEnabled explicitly enables or disables thinking.

Setting only a Variant does not emit a thinking toggle. When ThinkingEnabled == nil, the framework omits the toggle and lets the provider apply its default. Even if a provider currently enables thinking by default, callers that require deterministic behavior should set ThinkingEnabled explicitly.

The OpenAI-compatible variants serialize ThinkingEnabled=true as follows:

Variant	Request field for `ThinkingEnabled=true`
`VariantOpenAI`	`"thinking_enabled": true`
`VariantDeepSeek`	`"thinking": {"type": "enabled"}`
`VariantHunyuan`	`"thinking": {"type": "enabled"}`
`VariantGLM`	`"thinking": {"type": "enabled"}`
`VariantQwen`	`"enable_thinking": true`
`VariantKimi`	`"thinking": {"type": "enabled"}`
`VariantMiniMax`	`"thinking": {"type": "adaptive"}`

For example, to deterministically enable thinking through the official DeepSeek API:

thinking := true

llm := openai.New("deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com"),
    openai.WithAPIKey("your-api-key"),
    openai.WithVariant(openai.VariantDeepSeek),
)

request := &model.Request{
    Messages: []model.Message{
        model.NewUserMessage("Analyze this problem."),
    },
    GenerationConfig: model.GenerationConfig{
        Stream:          true,
        ThinkingEnabled: &thinking,
    },
}

ThinkingEnabled applies only to models that expose an explicit thinking toggle; use ReasoningEffort instead when a model exposes only a reasoning budget. For external services that implement one of the thinking-toggle formats above, explicitly setting the matching Variant and ThinkingEnabled is usually sufficient. If a gateway uses a different field or requires additional parameters, use openai.WithExtraFields(...) to add or override provider-specific fields.

For Kimi models that require thinking to remain enabled, leave ThinkingEnabled unset. To preserve reasoning across Kimi K2.6 conversation turns, pass the complete provider extension explicitly:

llm := openai.New(
    "kimi-k2.6",
    openai.WithVariant(openai.VariantKimi),
    openai.WithExtraFields(map[string]any{
        "thinking": map[string]string{
            "type": "enabled",
            "keep": "all",
        },
    }),
)

For MiniMax-M3, ThinkingEnabled=false sends thinking.type=disabled. MiniMax M2.x models accept that value but continue thinking. The adapter leaves reasoning_split unset intentionally: in MiniMax's native OpenAI format, reasoning remains inside the assistant content as <think>...</think>, and the framework preserves that content unchanged in tool-call history. Do not strip the tags from assistant history before returning tool results. See the MiniMax OpenAI SDK documentation for the provider's multi-turn requirements.

8. Streaming Tool Call Deltas: ShowToolCallDelta

By default, the OpenAI adapter suppresses raw tool_calls chunks in streaming responses. Tool calls are accumulated internally and only exposed once in the final aggregated response via Response.Choices[0].Message.ToolCalls. This keeps the stream clean for typical chat UIs that only render assistant text.

For advanced use cases (for example, when the model streams document content inside tool arguments and you need to display it incrementally), you can turn on raw tool call deltas with WithShowToolCallDelta:

llm := openai.New(
    "gpt-4.1",
    openai.WithShowToolCallDelta(true), // Forward tool_call deltas.
)

When WithShowToolCallDelta(true) is enabled:

Streaming chunks that contain tool_calls are no longer suppressed by the adapter.
Each chunk is converted into a partial model.Response where:
- Response.IsPartial == true
- Response.Choices[0].Delta.ToolCalls contains the provider’s raw tool_calls delta mapped to model.ToolCall:
  - Type comes from the provider type field (for example, "function").
  - Function.Name and Function.Arguments mirror the original tool name and JSON-encoded arguments string.
  - ID and Index preserve the tool call identity so callers can stitch fragments together.
The final aggregated response still exposes the merged tool calls in Response.Choices[0].Message.ToolCalls, so existing tool execution logic (for example, FunctionCallResponseProcessor) continues to work unchanged.

Typical integration pattern when this flag is enabled:

Read Response.Choices[0].Delta.ToolCalls[*].Function.Arguments on each partial response.
Group chunks by tool call ID and append the Arguments fragments in order.
Once the accumulated string forms valid JSON, unmarshal it into your business struct (for example, { "content": "..." }) and use it for progressive UI rendering.

If you do not need to inspect tool arguments during streaming, keep WithShowToolCallDelta disabled to avoid handling partial JSON fragments and to preserve the default clean text-streaming behavior.

Anthropic Model

Anthropic Model is used to interface with Claude models and compatible platforms, supporting streaming output, thought modes and tool calls, and providing a rich callback mechanism, while also allowing for flexible configuration of custom HTTP headers.

Configuration Method

Environment Variable Method

export ANTHROPIC_API_KEY="your-api-key"
export ANTHROPIC_BASE_URL="https://api.anthropic.com" # Optional configuration, default is this BASE URL

Code Method

import "trpc.group/trpc-go/trpc-agent-go/model/anthropic"

m := anthropic.New(
    "claude-sonnet-4-0",
    anthropic.WithAPIKey("your-api-key"),
    anthropic.WithBaseURL("https://api.anthropic.com"), // Optional configuration, default is this BASE URL
)

Using the Model Directly

import (
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

func main() {
    // Create model instance
    llm := anthropic.New("claude-sonnet-4-0")
    // Build request
    temperature := 0.7
    maxTokens := 1000
    request := &model.Request{
        Messages: []model.Message{
            model.NewSystemMessage("You are a professional AI assistant."),
            model.NewUserMessage("Introduce the concurrency features of Go language."),
        },
        GenerationConfig: model.GenerationConfig{
            Temperature: &temperature,
            MaxTokens:   &maxTokens,
            Stream:      false,
        },
    }
    // Call the model
    ctx := context.Background()
    responseChan, err := llm.GenerateContent(ctx, request)
    if err != nil {
        fmt.Printf("System error: %v\n", err)
        return
    }
    // Handle response
    for response := range responseChan {
        if response.Error != nil {
            fmt.Printf("API error: %s\n", response.Error.Message)
            return
        }
        if len(response.Choices) > 0 {
            fmt.Printf("Reply: %s\n", response.Choices[0].Message.Content)
        }
        if response.Done {
            break
        }
    }
}

Streaming Output

import (
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

func main() {
    // Create model instance
    llm := anthropic.New("claude-sonnet-4-0")
    // Streaming request configuration
    temperature := 0.7
    maxTokens := 1000
    request := &model.Request{
        Messages: []model.Message{
            model.NewSystemMessage("You are a creative story storyteller."),
            model.NewUserMessage("Write a short story about a robot learning to paint."),
        },
        GenerationConfig: model.GenerationConfig{
            Temperature: &temperature,
            MaxTokens:   &maxTokens,
            Stream:      true,
        },
    }
    // Call the model
    ctx := context.Background()
    // Handle streaming response
    responseChan, err := llm.GenerateContent(ctx, request)
    if err != nil {
        fmt.Printf("System error: %v\n", err)
        return
    }
    for response := range responseChan {
        if response.Error != nil {
            fmt.Printf("Error: %s", response.Error.Message)
            return
        }
        if len(response.Choices) > 0 && response.Choices[0].Delta.Content != "" {
            fmt.Print(response.Choices[0].Delta.Content)
        }
        if response.Done {
            break
        }
    }
}

Advanced Parameter Configuration

// Using advanced generation parameters
temperature := 0.3
maxTokens := 2000
topP := 0.9
thinking := true
thinkingTokens := 2048

request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a professional technical documentation writer."),
        model.NewUserMessage("Explain the pros and cons of microservices architecture."),
    },
    GenerationConfig: model.GenerationConfig{
        Temperature:     &temperature,
        MaxTokens:       &maxTokens,
        TopP:            &topP,
        ThinkingEnabled: &thinking,
        ThinkingTokens:  &thinkingTokens,
        Stream:          true,
    },
}

Multimodal Input

The Anthropic Model supports images and files in user messages:

Images: Provide an image URL or raw image data. Supported MIME types are image/jpeg, image/png, image/gif, and image/webp. Images are sent as Anthropic image blocks.
PDFs: Provide a URL that Claude can access or raw data, using application/pdf. PDFs are sent as Anthropic document blocks.
Text content: Use raw data only. text/plain is sent as an Anthropic document block; text-based MIME types such as text/csv or text/html are sent as Anthropic text blocks.
Other formats: Office documents, JSON, and other formats are not parsed automatically. Convert them to text or PDF first. Audio and FileID are not supported. Non-PDF file URLs are sent as URL text only.

import (
    "context"
    "os"

    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

func main() {
    llm := anthropic.New("claude-sonnet-4-6")
    imageData, _ := os.ReadFile("diagram.png")
    request := &model.Request{
        Messages: []model.Message{
            model.NewSystemMessage("You are a professional document and image analysis assistant."),
            {
                Role:    model.RoleUser,
                Content: "Summarize the key information from the image and PDF.",
                ContentParts: []model.ContentPart{
                    {
                        Type: model.ContentTypeImage,
                        Image: &model.Image{
                            Data:   imageData,
                            Format: "png",
                        },
                    },
                    {
                        Type: model.ContentTypeFile,
                        File: &model.File{
                            Name:     "report.pdf",
                            URL:      "https://example.com/report.pdf",
                            MimeType: "application/pdf",
                        },
                    },
                },
            },
        },
    }
    _, _ = llm.GenerateContent(context.Background(), request)
}

Advanced features

1. Callback Functions

import (
    anthropicsdk "github.com/anthropics/anthropic-sdk-go"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

model := anthropic.New(
    "claude-sonnet-4-0",
    anthropic.WithChatRequestCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams) {
        // Log the request before sending.
        log.Printf("sending request: model=%s, messages=%d.", req.Model, len(req.Messages))
    }),
    anthropic.WithChatResponseCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, resp *anthropicsdk.Message) {
        // Log details of the non-streaming response.
        log.Printf("received response: id=%s, input_tokens=%d, output_tokens=%d.", resp.ID, resp.Usage.InputTokens, resp.Usage.OutputTokens)
    }),
    anthropic.WithChatChunkCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, chunk *anthropicsdk.MessageStreamEventUnion) {
        // Log the type of the streaming event.
        log.Printf("stream event: %T.", chunk.AsAny())
    }),
    anthropic.WithChatStreamCompleteCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, acc *anthropicsdk.Message, streamErr error) {
        // Log stream completion or error.
        if streamErr != nil {
            log.Printf("stream failed: %v.", streamErr)
            return
        }
        log.Printf("stream completed: finish_reason=%s, input_tokens=%d, output_tokens=%d.", acc.StopReason, acc.Usage.InputTokens, acc.Usage.OutputTokens)
    }),
)

2. Model Switching

Model switching allows dynamically changing the LLM model used by an Agent at runtime. This section shows static switching for Anthropic model instances at the agent level and per-request level. To select a model separately for each LLM call within the same runner.Run(...), use ModelSelector.

Agent-level Switching

Agent-level switching changes the Agent's default model, affecting all subsequent requests.

Approach 1: Direct Model Instance

Set the model directly by passing a model instance to SetModel:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

// Create Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModel(anthropic.New("claude-3-5-haiku-20241022")),
)

// Switch to another model.
agent.SetModel(anthropic.New("claude-3-5-sonnet-20241022"))

Use Cases:

// Select model based on task complexity.
if isComplexTask {
    agent.SetModel(anthropic.New("claude-3-5-sonnet-20241022"))  // Use powerful model.
} else {
    agent.SetModel(anthropic.New("claude-3-5-haiku-20241022"))  // Use fast model.
}

Approach 2: Switch by Name

Pre-register multiple models with WithModels, then switch by name using SetModelByName:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

// Create multiple model instances.
sonnet := anthropic.New("claude-3-5-sonnet-20241022")
haiku := anthropic.New("claude-3-5-haiku-20241022")

// Register all models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": sonnet,
        "fast":  haiku,
    }),
    llmagent.WithModel(haiku), // Specify initial model.
    llmagent.WithInstruction("You are an intelligent assistant."),
)

// Switch models by name at runtime.
err := agent.SetModelByName("smart")
if err != nil {
    log.Fatal(err)
}

// Switch to another model.
err = agent.SetModelByName("fast")
if err != nil {
    log.Fatal(err)
}

Use Cases:

// Select model based on user tier.
modelName := "fast" // Default to fast model.
if user.IsPremium() {
    modelName = "smart" // Premium users get advanced model.
}
if err := agent.SetModelByName(modelName); err != nil {
    log.Printf("Failed to switch model: %v", err)
}

// Select model based on time of day (cost optimization).
hour := time.Now().Hour()
if hour >= 22 || hour < 8 {
    // Use fast model at night.
    agent.SetModelByName("fast")
} else {
    // Use smart model during the day.
    agent.SetModelByName("smart")
}

Per-request Switching

Per-request switching allows temporarily specifying a model for a single request without affecting the Agent's default model or other requests. This is useful for scenarios where different models are needed for specific tasks.

Approach 1: Using WithModel Option

Use agent.WithModel to specify a model instance for a single request:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

// Use a specific model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModel(anthropic.New("claude-3-5-sonnet-20241022")),
)

Approach 2: Using WithModelName Option (Recommended)

Use agent.WithModelName to specify a pre-registered model name for a single request:

// Pre-register multiple models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": anthropic.New("claude-3-5-sonnet-20241022"),
        "fast":  anthropic.New("claude-3-5-haiku-20241022"),
    }),
    llmagent.WithModel(anthropic.New("claude-3-5-haiku-20241022")), // Default model.
)

runner := runner.NewRunner("app", agent)

// Temporarily use "smart" model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModelName("smart"),
)

// Next request still uses the default model "claude-3-5-haiku-20241022".
eventChan2, err := runner.Run(ctx, userID, sessionID, message2)

Use Cases:

// Dynamically select model based on message complexity.
var opts []agent.RunOption
if isComplexQuery(message) {
    opts = append(opts, agent.WithModelName("smart")) // Use powerful model for complex queries.
}

eventChan, err := runner.Run(ctx, userID, sessionID, message, opts...)

// Use specialized model for specific tasks.
eventChan, err := runner.Run(ctx, userID, sessionID, visionMessage,
    agent.WithModelName("vision"),
)

Configuration Details

WithModels Option:

Accepts a map[string]model.Model where key is the model name and value is the model instance
If both WithModel and WithModels are set, WithModel specifies the initial model
If only WithModels is set, the first model in the map will be used as the initial model (note: map iteration order is not guaranteed, so it's recommended to explicitly specify the initial model)
Reserved name: __default__ is used internally by the framework and should not be used

SetModelByName Method:

Parameter: model name (string)
Returns: error if the model name is not found
The model must be pre-registered via WithModels

Per-request Options:

agent.RunOptions.Model: Directly specify a model instance
agent.RunOptions.ModelName: Specify a pre-registered model name
Priority: Model > ModelName > Agent default model
If the model specified by ModelName is not found, it falls back to the Agent's default model

Agent-level vs Per-request Comparison

Feature	Agent-level Switching	Per-request Switching
Scope	All subsequent requests	Current request only
Usage	`SetModel`/`SetModelByName`	`RunOptions.Model`/`ModelName`
State Change	Changes Agent default model	Does not change Agent state
Use Case	Global strategy adjustment	Specific task temporary needs
Concurrency	Affects all concurrent reqs	Does not affect other requests
Typical Examples	User tier, time-based policy	Complex queries, reasoning

Agent-level Approach Comparison

Feature	SetModel	SetModelByName
Usage	Pass model instance	Pass model name
Pre-registration	Not required	Required via WithModels
Error Handling	None	Returns error
Use Case	Simple switching	Complex scenarios, multi-model management
Code Maintenance	Need to hold model instances	Only need to remember names

Important Notes

Agent-level Switching:

Immediate Effect: After calling SetModel or SetModelByName, the next request immediately uses the new model
Session Persistence: Switching models does not clear session history
Independent Configuration: Each model retains its own configuration (temperature, max tokens, etc.)
Concurrency Safe: Both switching approaches are concurrency-safe

Per-request Switching:

Temporary Override: Only affects the current request, does not change the Agent's default model
Higher Priority: Per-request model settings take precedence over the Agent's default model
No Side Effects: Does not affect other concurrent requests or subsequent requests
Flexible Combination: Can be used in combination with agent-level switching

Model-specific Prompts (LLMAgent):

Use llmagent.WithModelInstructions / llmagent.WithModelGlobalInstructions to override prompts by model.Info().Name when the Agent switches models; it falls back to the Agent defaults when no mapping exists.
For a runnable example, see examples/model/promptmap.

Usage Example

For a complete interactive example, see examples/model/switch, which demonstrates both agent-level and per-request switching approaches.

3. Custom HTTP Headers

In environments like gateways, proprietary platforms, or proxy setups, model API requests often require additional HTTP headers (e.g., organization/tenant identifiers, grayscale routing, custom authentication, etc.). The Model module provides three reliable ways to add headers for "all model requests," including standard requests, streaming, file uploads, batch processing, etc.

Recommended order:

Global header via anthropic.WithHeaders (simplest for static headers)
Use Anthropic RequestOption to set global headers (flexible, middleware-friendly)
Use a custom http.RoundTripper injection (advanced, more cross-cutting capabilities)

All methods affect streaming requests, as they use the same underlying client.

1. Using anthropic.WithHeaders for headers

import "trpc.group/trpc-go/trpc-agent-go/model/anthropic"

llm := anthropic.New("claude-sonnet-4-0",
    anthropic.WithHeaders(map[string]string{
        "X-Custom-Header": "custom-value",
        "X-Request-ID":    "req-123",
    }),
)

2. Using Anthropic RequestOption to Set Global Headers

By using WithAnthropicClientOptions combined with anthropicopt.WithHeader or anthropicopt.WithMiddleware, you can inject headers into every request made by the underlying Anthropic client.

import (
    anthropicopt "github.com/anthropics/anthropic-sdk-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

llm := anthropic.New("claude-sonnet-4-0",
    // If your platform requires additional headers
    anthropic.WithAnthropicClientOptions(
        anthropicopt.WithHeader("X-Custom-Header", "custom-value"),
        anthropicopt.WithHeader("X-Request-ID", "req-123"),
        // You can also set User-Agent or vendor-specific headers
        anthropicopt.WithHeader("User-Agent", "trpc-agent-go/1.0"),
    ),
)

If you need to set headers conditionally (e.g., only for certain paths or depending on context values), you can use middleware:

import (
    anthropicopt "github.com/anthropics/anthropic-sdk-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

    llm := anthropic.New("claude-sonnet-4-0",
        anthropic.WithAnthropicClientOptions(
            anthropicopt.WithMiddleware(
            func(r *http.Request, next anthropicopt.MiddlewareNext) (*http.Response, error) {
                // Example: Set "per-request" headers based on context value
                if v := r.Context().Value("x-request-id"); v != nil {
                    if s, ok := v.(string); ok && s != "" {
                        r.Header.Set("X-Request-ID", s)
                    }
                }
                // Or only for the "message completion" endpoint
                if strings.Contains(r.URL.Path, "v1/messages") {
                    r.Header.Set("X-Feature-Flag", "on")
                }
                return next(r)
            },
        ),
        ),
    )
    ```

##### 3. Using Custom `http.RoundTripper`

For injecting headers at the HTTP transport layer, ideal for scenarios requiring proxying, TLS, custom monitoring, and other capabilities.

```go
import (
    anthropicopt "github.com/anthropics/anthropic-sdk-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

type headerRoundTripper struct{ base http.RoundTripper }

func (rt headerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
    // Add or override headers
    req.Header.Set("X-Custom-Header", "custom-value")
    req.Header.Set("X-Trace-ID", "trace-xyz")
    return rt.base.RoundTrip(req)
}

llm := anthropic.New("claude-sonnet-4-0",
    anthropic.WithHTTPClientOptions(
        anthropic.WithHTTPClientTransport(headerRoundTripper{base: http.DefaultTransport}),
    ),
)

Regarding "per-request" headers:

The Agent/Runner will propagate ctx to the model call; middleware can read the value from req.Context() to inject headers for "this call."
For message completion, the current API doesn't expose per-call BaseURL overrides; if switching is needed, create a model using a different BaseURL or modify the r.URL in middleware.

4. Token Tailoring

Anthropic models also support Token Tailoring functionality, designed to automatically trim messages when they exceed the model's context window limits, ensuring requests can be successfully sent to the LLM API.

Automatic Mode (Recommended):

import (
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

// Enable token tailoring with automatic configuration
model := anthropic.New("claude-3-5-sonnet",
    anthropic.WithEnableTokenTailoring(true),
)

Advanced Mode:

// Custom token limit and strategy
model := anthropic.New("claude-3-5-sonnet",
    anthropic.WithEnableTokenTailoring(true),               // Required: enable token tailoring
    anthropic.WithMaxInputTokens(10000),                    // Custom token limit
    anthropic.WithTokenCounter(customCounter),              // Optional: custom counter
    anthropic.WithTailoringStrategy(customStrategy),        // Optional: custom strategy
)

For detailed explanations of the token calculation formula, tailoring strategy, and custom strategy implementation, please refer to Token Tailoring under OpenAI Model.

5. Streaming Tool Call Deltas: ShowToolCallDelta

By default, the Anthropic adapter accumulates tool-call arguments internally and returns the complete arguments once the model finishes the tool call, through Response.Choices[0].Message.ToolCalls in the final response.

WithShowToolCallDelta exposes Anthropic input_json_delta chunks during streaming. Enable this option when tool arguments are long, or when the frontend needs to display argument generation progress:

llm := anthropic.New(
    "claude-sonnet-4-0",
    anthropic.WithShowToolCallDelta(true),
)

When WithShowToolCallDelta(true) is enabled:

Streaming responses may include Response.Choices[0].Delta.ToolCalls.
Delta.ToolCalls[*].Function.Arguments is the newly produced argument string fragment, and is usually not complete JSON.
Delta.ToolCalls[*].ID and Index can be used to join fragments for the same tool call.
The final response still returns the complete tool call in Response.Choices[0].Message.ToolCalls, so existing tool execution logic can continue to use the final response unchanged.

Typical handling pattern:

Read Response.Choices[0].Delta.ToolCalls[*].Function.Arguments on each partial response.
Group chunks by tool call ID or Index, and append the Arguments fragments in order.
Treat the accumulated value as the tool argument JSON. For progressive rendering, render the string as it grows and unmarshal it only after it becomes valid JSON.

Provider

With the emergence of multiple large model providers, some have defined their own API specifications. Currently, the framework has integrated the APIs of OpenAI and Anthropic, and exposes them as models. Users can access different provider models through openai.New and anthropic.New.

However, there are differences in instantiation and configuration between providers, which often requires developers to modify a significant amount of code when switching between providers, increasing the cost of switching.

To solve this problem, the Provider offers a unified model instantiation entry point. Developers only need to specify the provider and model name, and other configuration options are managed through the unified Option, simplifying the complexity of switching between providers.

The Provider supports the following Option:

Option	Description
`WithAPIKey` / `WithBaseURL`	Set the API Key and Base URL for the model
`WithHTTPClientName` / `WithHTTPClientTransport`	Configure HTTP client properties
`WithHeaders`	Append static HTTP headers across requests
`WithChannelBufferSize`	Adjust the response channel buffer size
`WithCallbacks`	Configure OpenAI / Anthropic request, response, and streaming callbacks
`WithExtraFields`	Configure custom fields in the request body
`WithEnableTokenTailoring` / `WithMaxInputTokens` `WithTokenCounter` / `WithTailoringStrategy`	Token trimming related parameters
`WithTokenTailoringConfig`	Custom token tailoring budget parameters for advanced configuration
`WithOpenAIOption` / `WithAnthropicOption`	Pass-through native options for the respective providers

Usage Example

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model/provider"
)

providerName := "openai"        // provider supports openai and anthropic.
modelName := "deepseek-v4-flash"

modelInstance, err := provider.Model(
    providerName,
    modelName,
    provider.WithAPIKey(c.apiKey),
    provider.WithBaseURL(c.baseURL),
    provider.WithChannelBufferSize(c.channelBufferSize),
    provider.WithEnableTokenTailoring(c.tokenTailoring),
    provider.WithMaxInputTokens(c.maxInputTokens),
)

agent := llmagent.New("chat-assistant", llmagent.WithModel(modelInstance))

Advanced Configuration with TokenTailoringConfig:

For advanced users who need to fine-tune token allocation strategy, you can use WithTokenTailoringConfig:

import (
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/provider"
)

// Custom token tailoring budget parameters for all providers
config := &model.TokenTailoringConfig{
    ProtocolOverheadTokens: 1024,
    ReserveOutputTokens:    4096,
    SafetyMarginRatio:      0.15,
}

modelInstance, err := provider.Model(
    "openai",
    "deepseek-v4-flash",
    provider.WithAPIKey(c.apiKey),
    provider.WithEnableTokenTailoring(true),
    provider.WithTokenTailoringConfig(config),
)

Full code can be found in examples/provider.

Registering a Custom Provider

The framework supports registering custom providers to integrate other large model providers or custom model implementations.

Using provider.Register, you can define a method to create custom model instances based on the options.

import "trpc.group/trpc-go/trpc-agent-go/model/provider"

provider.Register("custom-provider", func(opts *provider.Options) (model.Model, error) {
    return newCustomModel(opts.ModelName, WithAPIKey(opts.APIKey)), nil
})

customModel, err := provider.Model("custom-provider", "custom-model")

Model Failover

model/failover provides a priority-ordered fallback wrapper for models. It can compose multiple model.Model instances into a primary/backup chain and automatically switch to the next candidate when the current one is unavailable. This is useful for scenarios such as primary/backup domains, gateways, or model providers.

Quick Start:

import (
    "trpc.group/trpc-go/trpc-agent-go/model/failover"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

primary := openai.New(
    "gpt-4o-mini",
    openai.WithBaseURL("https://api.openai.com/v1"),
)
backup := openai.New(
    "deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com/v1"),
)

llm, err := failover.New(
    failover.WithCandidates(primary, backup),
)
if err != nil {
    return err
}

failover.New(...) returns a regular model.Model, so it can be passed directly to places that accept model.Model, such as llmagent.WithModel(...). For a complete example, see examples/model/failover.

Switching Rules:

Candidates are tried in the order provided to WithCandidates(...).
Switching is allowed only before the first non-error chunk is received.
If the current candidate returns an error directly before that point, or returns an error response with Response.Error, the wrapper continues with the next candidate.
Once any non-error chunk has been returned to the caller, later streaming failures are surfaced directly instead of replaying the request on the backup candidate.

Model Hedge

model/hedge provides a wrapper for hedging the time to first meaningful response. It starts the primary candidate immediately, then launches later candidates according to the configured hedge delay. If all currently active candidates have already failed, it launches the next candidate early. The first candidate that produces a meaningful response becomes the winner, and the remaining candidates are canceled.

Quick Start:

import (
    "trpc.group/trpc-go/trpc-agent-go/model/hedge"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

primary := openai.New(
    "gpt-4o-mini",
    openai.WithBaseURL("https://api.openai.com/v1"),
)
backup := openai.New(
    "deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com/v1"),
)

llm, err := hedge.New(
    hedge.WithName("chat-hedge"),
    hedge.WithCandidates(primary, backup),
)
if err != nil {
    return err
}

hedge.New(...) also returns a regular model.Model, so it can be passed directly to places that accept model.Model, such as llmagent.WithModel(...). This quick start uses the package default delay. Use WithDelay(...) or WithDelays(...) when you need explicit launch scheduling. If the hedge wrapper is used with context-threshold summary or token tailoring and the candidates have different or unknown context windows, set a stable wrapper window with hedge.WithContextWindow(...); otherwise the wrapper reports the shared candidate window only when all candidates expose the same positive value. For a complete example, see examples/model/hedge.

Scheduling And Commit Rules:

The first candidate always starts immediately when the request begins.
WithDelay(...) defines one fixed interval between successive hedge launches, while WithDelays(...) can specify explicit launch offsets for candidates 1..n.
If all currently active candidates have already failed and there are still candidates that have not started, the wrapper launches the next candidate immediately instead of waiting for the timer.
Only the first candidate that starts producing meaningful content becomes the winner. Empty or metadata-only responses do not win.
Once a winner has been committed, the other candidates are canceled and only the winner's stream is forwarded to the caller.

Configuration Examples:

The following snippets show only the scheduling differences.

llm, err := hedge.New(
    hedge.WithCandidates(primary, backupA, backupB),
    hedge.WithDelay(100*time.Millisecond),
)

This configuration means:

candidate[0] starts at 0ms.
candidate[1] starts at 100ms.
candidate[2] starts at 200ms.

llm, err := hedge.New(
    hedge.WithCandidates(primary, backupA, backupB),
    hedge.WithDelays(80*time.Millisecond, 250*time.Millisecond),
)

This configuration means:

candidate[0] starts at 0ms.
candidate[1] starts at 80ms.
candidate[2] starts at 250ms.

WithDelays(...) uses absolute launch offsets from request start, not incremental gaps relative to the previous candidate.

llm, err := hedge.New(
    hedge.WithCandidates(primary, backupA, backupB),
    hedge.WithDelays(0, 0),
)

This configuration means:

candidate[0] starts at 0ms.
candidate[1] starts at 0ms.
candidate[2] starts at 0ms.

This is the all-at-once case where every candidate launches immediately when the request begins. In fixed-interval form, the same setup can be written as WithDelay(0).

ModelSelector

ModelSelector dynamically selects a model for each framework-managed LLM call within the same runner.Run(...).

selectModel := func(ctx context.Context, inv *agent.Invocation) (model.Model, error) {
    if shouldUseAnotherModel(inv) {
        return anotherModel, nil
    }
    return nil, nil
}

events, err := r.Run(
    ctx,
    userID,
    sessionID,
    message,
    agent.WithModelSelector(selectModel),
)

If an LLMAgent has its own fixed model selection policy, configure it when creating the agent:

a := llmagent.New(
    "assistant",
    llmagent.WithModel(defaultModel),
    llmagent.WithModelSelector(selectModel),
)

Notes:

When both are configured, agent.WithModelSelector(...) takes precedence over llmagent.WithModelSelector(...).
If selector returns nil, nil, the model is not switched and the current inv.Model is kept; returning an error terminates the current LLM call.

For a complete example, see examples/model/selector.