Skip to content

Model Module

Overview

The Model module is the large language model abstraction layer of the tRPC-Agent-Go framework, providing a unified LLM interface design that currently supports OpenAI-compatible and Anthropic-compatible API calls. Through standardized interface design, developers can flexibly switch between different model providers, achieving seamless model integration and invocation. This module has been verified to be compatible with most OpenAI-like interfaces both inside and outside the company.

The Model module has the following core features:

  • Unified Interface Abstraction: Provides standardized Model interface, shielding differences between model providers
  • Streaming Response Support: Native support for streaming output, enabling real-time interactive experience
  • Multimodal Capabilities: Supports text, image, audio, and other multimodal content processing
  • Complete Error Handling: Provides dual-layer error handling mechanism, distinguishing between system errors and API errors
  • Extensible Configuration: Supports rich custom configuration options to meet different scenario requirements

Quick Start

Using Model in Agent

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
    "trpc.group/trpc-go/trpc-agent-go/runner"
    "trpc.group/trpc-go/trpc-agent-go/tool"
)

func main() {
    // 1. Create model instance.
    modelInstance := openai.New("deepseek-v4-flash",
        openai.WithExtraFields(map[string]interface{}{
            "tool_choice": "auto", // Automatically select tools.
        }),
    )

    // 2. Configure generation parameters.
    genConfig := model.GenerationConfig{
        MaxTokens:   intPtr(2000),
        Temperature: floatPtr(0.7),
        Stream:      true, // Enable streaming output.
    }

    // 3. Create Agent and integrate model.
    agent := llmagent.New(
        "chat-assistant",
        llmagent.WithModel(modelInstance),
        llmagent.WithDescription("A helpful assistant"),
        llmagent.WithInstruction("You are an intelligent assistant, use tools when needed."),
        llmagent.WithGenerationConfig(genConfig),
        llmagent.WithTools([]tool.Tool{calculatorTool, timeTool}),
    )

    // 4. Create Runner and run.
    r := runner.NewRunner("app-name", agent)
    eventChan, err := r.Run(ctx, userID, sessionID, model.NewUserMessage("Hello"))
    if err != nil {
        log.Fatal(err)
    }

    // 5. Handle response events.
    for event := range eventChan {
        // Handle streaming responses, tool calls, etc.
    }
}

Example code is located at examples/runner

Usage Methods and Platform Integration Guide

The Model module supports multiple usage methods and platform integration. The following are common usage scenarios based on Runner examples:

Quick Start

1
2
3
4
5
# Basic usage: Configure through environment variables, run directly.
cd examples/runner
export OPENAI_BASE_URL="https://api.deepseek.com/v1"
export OPENAI_API_KEY="your-api-key"
go run main.go -model deepseek-v4-flash

Platform Integration Configuration

All platform integration methods follow the same pattern, only requiring configuration of different environment variables or direct setting in code:

Environment Variable Method (Recommended):

export OPENAI_BASE_URL="Platform API address"
export OPENAI_API_KEY="API key"

Code Method:

1
2
3
4
model := openai.New("Model name",
    openai.WithBaseURL("Platform API address"),
    openai.WithAPIKey("API key"),
)

Supported Platforms and Their Configuration

The following are configuration examples for each platform, divided into environment variable configuration and code configuration methods:

Environment Variable Configuration

The runner example supports specifying model names through command line parameters (-model), which is actually passing the model name when calling openai.New().

# OpenAI platform.
export OPENAI_API_KEY="sk-..."
cd examples/runner
go run main.go -model gpt-4o-mini

# OpenAI API compatible.
export OPENAI_BASE_URL="https://api.deepseek.com/v1"
export OPENAI_API_KEY="your-api-key"
cd examples/runner
go run main.go -model deepseek-v4-flash

Code Configuration Method

Configuration method when directly using Model in your own code:

1
2
3
4
5
6
model := openai.New("deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com/v1"),
    openai.WithAPIKey("your-api-key"),
)

// Other platform configurations are similar, only need to modify model name, BaseURL and APIKey, no additional fields needed.

Core Interface Design

Model Interface

// Model is the interface that all language models must implement.
type Model interface {
    // Generate content, supports streaming response.
    GenerateContent(ctx context.Context, request *Request) (<-chan *Response, error)

    // Return basic model information.
    Info() Info
}

// Model information structure.
type Info struct {
    Name string // Model name.
}

// IterModel is an optional extension of Model that reduces channel overhead for streaming.
type IterModel interface {
    Model
    GenerateContentIter(ctx context.Context, request *Request) (Seq[*Response], error)
}

// Seq is a callback-based sequence type that yields one value on demand.
type Seq[T any] func(yield func(T) bool)

Channel-based streaming typically requires a dedicated goroutine and incurs channel synchronization on each chunk. In high-frequency streaming, this overhead can become a measurable cost. IterModel is an optional iterator-style API that streams responses synchronously in the caller goroutine to reduce this overhead.

When a model implements IterModel, the framework uses GenerateContentIter; otherwise it uses GenerateContent. Implementations must call yield sequentially and stop when it returns false.

Request Structure

// Request represents the request sent to the model.
type Request struct {
    // Message list, containing system instructions, user input and assistant replies.
    Messages []Message `json:"messages"`

    // Generation configuration (inlined into request).
    GenerationConfig `json:",inline"`

    // Tool list.
    Tools map[string]tool.Tool `json:"-"`
}

// GenerationConfig contains generation parameter configuration.
type GenerationConfig struct {
    // Whether to use streaming response.
    Stream bool `json:"stream"`

    // Temperature parameter (0.0-2.0).
    Temperature *float64 `json:"temperature,omitempty"`

    // Maximum generation token count.
    MaxTokens *int `json:"max_tokens,omitempty"`

    // Top-P sampling parameter.
    TopP *float64 `json:"top_p,omitempty"`

    // Stop generation markers.
    Stop []string `json:"stop,omitempty"`

    // Frequency penalty.
    FrequencyPenalty *float64 `json:"frequency_penalty,omitempty"`

    // Presence penalty.
    PresencePenalty *float64 `json:"presence_penalty,omitempty"`

    // Reasoning effort level. Accepted values depend on the provider:
    //   - OpenAI o-series: "low", "medium", "high".
    //   - Anthropic adaptive thinking: "low", "medium", "high", "max",
    //     and "xhigh" on models that support it.
    //   - DeepSeek v4 (deepseek-v4-pro / deepseek-v4-flash): "high", "max"
    //     (server maps "low"/"medium" -> "high" and "xhigh" -> "max" for
    //     backward compatibility).
    // OpenAI-compatible providers forward this field to the selected backend
    // model; use values supported by that backend.
    ReasoningEffort *string `json:"reasoning_effort,omitempty"`

    // Whether to enable thinking mode.
    // nil means no thinking-toggle field is emitted, so the provider default applies.
    ThinkingEnabled *bool `json:"thinking_enabled,omitempty"`

    // Maximum token count for thinking mode.
    ThinkingTokens *int `json:"thinking_tokens,omitempty"`
}

GenerationConfig itself is just a plain struct, and its zero value means Stream=false. For LLMAgent, if you omit llmagent.WithGenerationConfig(...), the framework forwards that zero value as-is, so the default behavior is non-streaming. Higher-level wrappers can still choose different semantics by setting their own explicit defaults.

Response Structure

// Response represents the response returned by the model.
type Response struct {
    // OpenAI compatible fields.
    ID                string   `json:"id,omitempty"`
    Object            string   `json:"object,omitempty"`
    Created           int64    `json:"created,omitempty"`
    Model             string   `json:"model,omitempty"`
    SystemFingerprint *string  `json:"system_fingerprint,omitempty"`
    Choices           []Choice `json:"choices,omitempty"`
    Usage             *Usage   `json:"usage,omitempty"`

    // Error information.
    Error *ResponseError `json:"error,omitempty"`

    // Internal fields.
    Timestamp time.Time `json:"-"`
    Done      bool      `json:"-"`
    IsPartial bool      `json:"-"`
}

// ResponseError represents API-level errors.
type ResponseError struct {
    Message string    `json:"message"`
    Type    ErrorType `json:"type"`
    Param   string    `json:"param,omitempty"`
    Code    string    `json:"code,omitempty"`
}

// Usage represents token usage information.
type Usage struct {
    PromptTokens            int                     `json:"prompt_tokens"`
    CompletionTokens        int                     `json:"completion_tokens"`
    TotalTokens             int                     `json:"total_tokens"`
    PromptTokensDetails     PromptTokensDetails     `json:"prompt_tokens_details"`
    CompletionTokensDetails CompletionTokensDetails `json:"completion_tokens_details"`
    TimingInfo              *TimingInfo             `json:"timing_info,omitempty"`
}

type PromptTokensDetails struct {
    CachedTokens        int `json:"cached_tokens"`
    CacheCreationTokens int `json:"cache_creation_tokens,omitempty"`
    CacheReadTokens     int `json:"cache_read_tokens,omitempty"`
}

type CompletionTokensDetails struct {
    ReasoningTokens int `json:"reasoning_tokens,omitempty"`
}

For OpenAI-compatible providers, completion_tokens_details.reasoning_tokens is mapped to Usage.CompletionTokensDetails.ReasoningTokens. The value may be 0 when the provider does not spend or report reasoning tokens; for reasoning models, set ReasoningEffort and/or ThinkingEnabled when you want to request reasoning behavior.

OpenAI Model

Model Name Parameter

When creating an OpenAI model instance using openai.New(name string, opts ...Option), the first parameter is the actual model name that gets sent to the OpenAI API, as the specific model identifier that tells the API which language model to use.

Since the framework supports different models compatible with the OpenAI API, you can obtain the base URL, API key, and model name from various model providers:

1. OpenAI Official

  • Base URL: https://api.openai.com/v1
  • Model Names: gpt-4o, gpt-4o-mini, etc.

2. DeepSeek

  • Base URL: https://api.deepseek.com
  • Model Names: deepseek-v4-flash, deepseek-v4-pro

deepseek-chat and deepseek-reasoner are deprecated compatibility aliases; prefer the explicit DeepSeek v4 model names for new code.

3. Tencent Hunyuan

  • Base URL: https://api.hunyuan.cloud.tencent.com/v1
  • Model Names: hunyuan-2.0-thinking-20251109, hunyuan-2.0-instruct-20251111, etc.

4. Other Providers

  • Qwen: Base URL https://dashscope.aliyuncs.com/compatible-mode/v1, Model Names: various qwen models

The OpenAI Model is used to interface with OpenAI and its compatible platforms. It supports streaming output, multimodal and advanced parameter configuration, and provides rich callback mechanisms, batch processing and retry capabilities. It also allows for flexible setting of custom HTTP headers.

Configuration Method

Environment Variable Method

export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com" # Optional configuration, default is this BASE URL

Code Method

1
2
3
4
5
6
7
import "trpc.group/trpc-go/trpc-agent-go/model/openai"

m := openai.New(
    "gpt-4o",
    openai.WithAPIKey("your-api-key"),
    openai.WithBaseURL("https://api.openai.com"), // Optional configuration, default is this BASE URL
)

Direct Model Usage

import (
    "context"
    "fmt"

    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

func main() {
    // Create model instance.
    llm := openai.New("deepseek-v4-flash")

    // Build request.
    temperature := 0.7
    maxTokens := 1000

    request := &model.Request{
        Messages: []model.Message{
            model.NewSystemMessage("You are a professional AI assistant."),
            model.NewUserMessage("Introduce Go language's concurrency features."),
        },
        GenerationConfig: model.GenerationConfig{
            Temperature: &temperature,
            MaxTokens:   &maxTokens,
            Stream:      false,
        },
    }

    // Call model.
    ctx := context.Background()
    responseChan, err := llm.GenerateContent(ctx, request)
    if err != nil {
        fmt.Printf("System error: %v\n", err)
        return
    }

    // Handle response.
    for response := range responseChan {
        if response.Error != nil {
            fmt.Printf("API error: %s\n", response.Error.Message)
            return
        }

        if len(response.Choices) > 0 {
            fmt.Printf("Reply: %s\n", response.Choices[0].Message.Content)
        }

        if response.Done {
            break
        }
    }
}

Structured Output

When calling model.GenerateContent directly, use model.NewRequest and model.WithStructuredOutputJSON to generate a JSON schema from a Go struct and pass it to model adapters that support provider-native structured output.

type StageParseResult struct {
    Stage  string `json:"stage"`
    Reason string `json:"reason"`
}

request := model.NewRequest(
    []model.Message{
        model.NewUserMessage("Classify the current user intent stage."),
    },
    model.WithStructuredOutputJSON(
        new(StageParseResult),
        true,
        "Return stage parse result as JSON.",
    ),
)

responseChan, err := llm.GenerateContent(ctx, request)
if err != nil {
    return err
}

var final string
for response := range responseChan {
    if response.Error != nil {
        return fmt.Errorf("model error: %s", response.Error.Message)
    }
    if len(response.Choices) > 0 {
        final = response.Choices[0].Message.Content
    }
}

var result StageParseResult
if err := json.Unmarshal([]byte(final), &result); err != nil {
    return err
}

WithStructuredOutputJSON only configures the model request. Direct model callers still receive normal model.Response values and should unmarshal the final JSON content themselves. If you already have a hand-written schema, set request.StructuredOutput directly.

Streaming Output

// Streaming request configuration.
request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a creative story teller."),
        model.NewUserMessage("Write a short story about a robot learning to paint."),
    },
    GenerationConfig: model.GenerationConfig{
        Stream: true,  // Enable streaming output.
    },
}

// Handle streaming response.
responseChan, err := llm.GenerateContent(ctx, request)
if err != nil {
    return err
}

for response := range responseChan {
    if response.Error != nil {
        fmt.Printf("Error: %s", response.Error.Message)
        return
    }

    if len(response.Choices) > 0 && response.Choices[0].Delta.Content != "" {
        fmt.Print(response.Choices[0].Delta.Content)
    }

    if response.Done {
        break
    }
}

Advanced Parameter Configuration

// Use advanced generation parameters.
temperature := 0.3
maxTokens := 2000
topP := 0.9
presencePenalty := 0.2
frequencyPenalty := 0.5
reasoningEffort := "high"

request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a professional technical documentation writer."),
        model.NewUserMessage("Explain the advantages and disadvantages of microservice architecture."),
    },
    GenerationConfig: model.GenerationConfig{
        Temperature:      &temperature,
        MaxTokens:        &maxTokens,
        TopP:             &topP,
        PresencePenalty:  &presencePenalty,
        FrequencyPenalty: &frequencyPenalty,
        ReasoningEffort:  &reasoningEffort,
        Stream:           true,
    },
}

Multimodal Content

// Read image file.
imageData, _ := os.ReadFile("image.jpg")

// Create multimodal message.
request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are an image analysis expert."),
        {
            Role: model.RoleUser,
            ContentParts: []model.ContentPart{
                {
                    Type: model.ContentTypeText,
                    Text: stringPtr("What's in this image?"),
                },
                {
                    Type: model.ContentTypeImage,
                    Image: &model.Image{
                        Data:   imageData,
                        Format: "jpeg",
                    },
                },
            },
        },
    },
}

Advanced Features

1. Callback Functions

// Set pre-request callback function.
model := openai.New("deepseek-v4-flash",
    openai.WithChatRequestCallback(func(ctx context.Context, req *openai.ChatCompletionNewParams) {
        // Called before request is sent.
        log.Printf("Sending request: model=%s, message count=%d", req.Model, len(req.Messages))
    }),

    // Set response callback function (non-streaming).
    openai.WithChatResponseCallback(func(ctx context.Context,
        req *openai.ChatCompletionNewParams,
        resp *openai.ChatCompletion) {
        // Called when complete response is received.
        log.Printf("Received response: ID=%s, tokens used=%d",
            resp.ID, resp.Usage.TotalTokens)
    }),

    // Set streaming response callback function.
    openai.WithChatChunkCallback(func(ctx context.Context,
        req *openai.ChatCompletionNewParams,
        chunk *openai.ChatCompletionChunk) {
        // Called when each streaming response chunk is received.
        log.Printf("Received streaming chunk: ID=%s", chunk.ID)
    }),

    // Set streaming completion callback function.
    openai.WithChatStreamCompleteCallback(func(ctx context.Context,
        req *openai.ChatCompletionNewParams,
        acc *openai.ChatCompletionAccumulator,
        streamErr error) {
        // Called when streaming is completely finished (success or error).
        if streamErr != nil {
            log.Printf("Streaming failed: %v", streamErr)
        } else {
            log.Printf("Streaming completed: reason=%s",
                acc.Choices[0].FinishReason)
        }
    }),
)
Dynamically Modifying Request Body via Callback

WithChatRequestCallback receives a *openai.ChatCompletionNewParams pointer, allowing you to dynamically add or modify fields in the HTTP request body before each request is sent. Use SetExtraFields on the params to inject custom JSON fields:

import (
    "context"

    openai "github.com/openai/openai-go"
    oaimodel "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

model := oaimodel.New("deepseek-v4-flash",
    oaimodel.WithChatRequestCallback(func(ctx context.Context, req *openai.ChatCompletionNewParams) {
        // Dynamically set extra fields based on runtime context.
        userID, _ := ctx.Value("user_id").(string)
        req.SetExtraFields(map[string]any{
            "user":            userID,
            "custom_metadata": map[string]string{"session_id": "abc"},
        })
    }),
)

Difference between request-body customization approaches:

Aspect WithExtraFields agent.WithModelRequestExtraFields SetExtraFields in WithChatRequestCallback
Timing Set once at model creation Set on one runner.Run(...) call Called before every model request
Dynamism Static values only Dynamic values known at run time Dynamic values based on ctx or runtime state
Mechanism Injected via openaiopt.WithJSONSet (RequestOption layer) Copied into model.Request.ExtraFields, then injected by supported adapters Set on the ChatCompletionNewParams struct (serialization layer)
Same key conflict Overwritten by request-level extra fields Takes precedence over model-level extra fields Overwritten by RequestOption-layer extra fields if the same key exists

When multiple approaches use different keys, all fields appear in the final JSON body without conflict. When they set the same key, request-level extra fields passed with agent.WithModelRequestExtraFields take precedence in OpenAI-compatible adapters.

Request-scoped extra fields from Runner

Use agent.WithModelRequestExtraFields when a provider-specific top-level request body field should vary per runner.Run(...) call, for example prompt_cache_key on OpenAI-compatible endpoints:

import (
    "context"

    "trpc.group/trpc-go/trpc-agent-go/agent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/runner"
)

func runOnce(ctx context.Context, r runner.Runner, cacheKey string) error {
    events, err := r.Run(
        ctx,
        "user-001",
        "session-001",
        model.NewUserMessage("Hello"),
        agent.WithModelRequestExtraFields(map[string]any{
            "prompt_cache_key": cacheKey,
        }),
    )
    if err != nil {
        return err
    }
    for range events {
    }
    return nil
}

When a provider requires both a request body cache key and a routing header for the same conversation, combine this with request-scoped headers:

events, err := r.Run(
    ctx,
    "user-001",
    "session-001",
    model.NewUserMessage("Hello"),
    agent.WithModelRequestExtraFields(map[string]any{
        "prompt_cache_key": cacheKey,
    }),
    agent.WithModelRequestHeaders(map[string]string{
        "X-Session-ID": "session-001",
    }),
)

This option applies to every model call created during that run, including normal LLM agents, graph LLM nodes, and requests routed through failover or hedge models. The built-in OpenAI and HuggingFace/OpenAI-compatible adapters merge these fields into the top-level JSON request body. Other provider adapters ignore the field unless they add an explicit provider-specific mapping.

2. Model Switching

Model switching allows dynamically changing the LLM model used by an Agent at runtime. This section shows static switching for OpenAI and OpenAI-compatible model instances at the agent level and per-request level. To select a model separately for each LLM call within the same runner.Run(...), use ModelSelector.

Agent-level Switching

Agent-level switching changes the Agent's default model, affecting all subsequent requests.

Approach 1: Direct Model Instance

Set the model directly by passing a model instance to SetModel:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModel(openai.New("gpt-4o-mini")),
)

// Switch to another model.
agent.SetModel(openai.New("gpt-4o"))

Use Cases:

1
2
3
4
5
6
// Select model based on task complexity.
if isComplexTask {
    agent.SetModel(openai.New("gpt-4o"))  // Use powerful model.
} else {
    agent.SetModel(openai.New("gpt-4o-mini"))  // Use fast model.
}
Approach 2: Switch by Name

Pre-register multiple models with WithModels, then switch by name using SetModelByName:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create multiple model instances.
strong := openai.New("gpt-4o")
fast := openai.New("gpt-4o-mini")

// Register all models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": strong,
        "fast":  fast,
    }),
    llmagent.WithModel(fast), // Specify initial model.
    llmagent.WithInstruction("You are an intelligent assistant."),
)

// Switch models by name at runtime.
err := agent.SetModelByName("smart")
if err != nil {
    log.Fatal(err)
}

// Switch to another model.
err = agent.SetModelByName("fast")
if err != nil {
    log.Fatal(err)
}

Use Cases:

// Select model based on user tier.
modelName := "fast" // Default to fast model.
if user.IsPremium() {
    modelName = "smart" // Premium users get advanced model.
}
if err := agent.SetModelByName(modelName); err != nil {
    log.Printf("Failed to switch model: %v", err)
}

// Select model based on time of day (cost optimization).
hour := time.Now().Hour()
if hour >= 22 || hour < 8 {
    // Use fast model at night.
    agent.SetModelByName("fast")
} else {
    // Use smart model during the day.
    agent.SetModelByName("smart")
}
Per-request Switching

Per-request switching allows temporarily specifying a model for a single request without affecting the Agent's default model or other requests. This is useful for scenarios where different models are needed for specific tasks.

Approach 1: Using WithModel Option

Use agent.WithModel to specify a model instance for a single request:

1
2
3
4
5
6
7
8
9
import (
    "trpc.group/trpc-go/trpc-agent-go/agent"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Use a specific model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModel(openai.New("gpt-4o")),
)

Use agent.WithModelName to specify a pre-registered model name for a single request:

// Pre-register multiple models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart":  openai.New("gpt-4o"),
        "fast":   openai.New("gpt-4o-mini"),
        "vision": openai.New("gpt-4o"),
    }),
    llmagent.WithModel(openai.New("gpt-4o-mini")), // Default model.
)

runner := runner.NewRunner("app", agent)

// Temporarily use "smart" model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModelName("smart"),
)

// Next request still uses the default model "gpt-4o-mini".
eventChan2, err := runner.Run(ctx, userID, sessionID, message2)

Use Cases:

// Dynamically select model based on message complexity.
var opts []agent.RunOption
if isComplexQuery(message) {
    opts = append(opts, agent.WithModelName("smart")) // Use powerful model for complex queries.
}

eventChan, err := runner.Run(ctx, userID, sessionID, message, opts...)

// Use a specialized model for a specific task.
eventChan, err := runner.Run(ctx, userID, sessionID, visionMessage,
    agent.WithModelName("vision"),
)
Configuration Details

WithModels Option:

  • Accepts a map[string]model.Model where key is the model name and value is the model instance
  • If both WithModel and WithModels are set, WithModel specifies the initial model
  • If only WithModels is set, the first model in the map will be used as the initial model (note: map iteration order is not guaranteed, so it's recommended to explicitly specify the initial model)
  • Reserved name: __default__ is used internally by the framework and should not be used

SetModelByName Method:

  • Parameter: model name (string)
  • Returns: error if the model name is not found
  • The model must be pre-registered via WithModels

Per-request Options:

  • agent.RunOptions.Model: Directly specify a model instance
  • agent.RunOptions.ModelName: Specify a pre-registered model name
  • agent.RunOptions.Stream: Override whether responses are streamed (use agent.WithStream(...))
  • agent.RunOptions.Instruction: Override instruction for this request only (use agent.WithInstruction(...))
  • agent.RunOptions.GlobalInstruction: Override global instruction (system prompt) for this request only (use agent.WithGlobalInstruction(...))
  • Priority: Model > ModelName > Agent default model
  • If the model specified by ModelName is not found, it falls back to the Agent's default model

You can set streaming per request using agent.WithStream(true) or agent.WithStream(false).

Agent-level vs Per-request Comparison
Feature Agent-level Switching Per-request Switching
Scope All subsequent requests Current request only
Usage SetModel/SetModelByName RunOptions.Model/ModelName
State Change Changes Agent default model Does not change Agent state
Use Case Global strategy adjustment Specific task temporary needs
Concurrency Affects all concurrent reqs Does not affect other requests
Typical Examples User tier, time-based policy Complex queries, reasoning
Agent-level Approach Comparison
Feature SetModel SetModelByName
Usage Pass model instance Pass model name
Pre-registration Not required Required via WithModels
Error Handling None Returns error
Use Case Simple switching Complex scenarios, multi-model management
Code Maintenance Need to hold model instances Only need to remember names
Important Notes

Agent-level Switching:

  • Immediate Effect: After calling SetModel or SetModelByName, the next request immediately uses the new model
  • Session Persistence: Switching models does not clear session history
  • Independent Configuration: Each model retains its own configuration (temperature, max tokens, etc.)
  • Concurrency Safe: Both switching approaches are concurrency-safe

Per-request Switching:

  • Temporary Override: Only affects the current request, does not change the Agent's default model
  • Higher Priority: Per-request model settings take precedence over the Agent's default model
  • No Side Effects: Does not affect other concurrent requests or subsequent requests
  • Flexible Combination: Can be used in combination with agent-level switching

Model-specific Prompts (LLMAgent):

  • Use llmagent.WithModelInstructions / llmagent.WithModelGlobalInstructions to override prompts by model.Info().Name when the Agent switches models; it falls back to the Agent defaults when no mapping exists.
  • For a runnable example, see examples/model/promptmap.
Usage Example

For a complete interactive example, see examples/model/switch, which demonstrates both agent-level and per-request switching approaches.

3. Batch Processing (Batch API)

Batch API is an asynchronous batch processing technique for efficiently handling large volumes of requests. This feature is particularly suitable for scenarios requiring large-scale data processing, significantly reducing costs and improving processing efficiency.

Core Features
  • Asynchronous Processing: Batch requests are processed asynchronously without waiting for immediate responses
  • Cost Optimization: Typically more cost-effective than individual requests
  • Flexible Input: Supports both inline requests and file-based input
  • Complete Management: Provides full operations including create, retrieve, cancel, and list
  • Result Parsing: Automatically downloads and parses batch processing results
Quick Start

Creating a Batch Job:

import (
    openaisdk "github.com/openai/openai-go"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create model instance.
llm := openai.New("gpt-4o-mini")

// Prepare batch requests.
requests := []*openai.BatchRequestInput{
    {
        CustomID: "request-1",
        Method:   "POST",
        URL:      string(openaisdk.BatchNewParamsEndpointV1ChatCompletions),
        Body: openai.BatchRequest{
            Messages: []model.Message{
                model.NewSystemMessage("You are a helpful assistant."),
                model.NewUserMessage("Hello"),
            },
        },
    },
    {
        CustomID: "request-2",
        Method:   "POST",
        URL:      string(openaisdk.BatchNewParamsEndpointV1ChatCompletions),
        Body: openai.BatchRequest{
            Messages: []model.Message{
                model.NewSystemMessage("You are a helpful assistant."),
                model.NewUserMessage("Introduce Go language"),
            },
        },
    },
}

// Create batch job.
batch, err := llm.CreateBatch(ctx, requests,
    openai.WithBatchCreateCompletionWindow("24h"),
)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Batch job created: %s\n", batch.ID)
Batch Operations

Retrieving Batch Status:

// Get batch details.
batch, err := llm.RetrieveBatch(ctx, batchID)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Status: %s\n", batch.Status)
fmt.Printf("Total requests: %d\n", batch.RequestCounts.Total)
fmt.Printf("Completed: %d\n", batch.RequestCounts.Completed)
fmt.Printf("Failed: %d\n", batch.RequestCounts.Failed)

Downloading and Parsing Results:

// Download output file.
if batch.OutputFileID != "" {
    text, err := llm.DownloadFileContent(ctx, batch.OutputFileID)
    if err != nil {
        log.Fatal(err)
    }

    // Parse batch output.
    entries, err := llm.ParseBatchOutput(text)
    if err != nil {
        log.Fatal(err)
    }

    // Process each result.
    for _, entry := range entries {
        fmt.Printf("[%s] Status code: %d\n", entry.CustomID, entry.Response.StatusCode)
        if len(entry.Response.Body.Choices) > 0 {
            content := entry.Response.Body.Choices[0].Message.Content
            fmt.Printf("Content: %s\n", content)
        }
        if entry.Error != nil {
            fmt.Printf("Error: %s\n", entry.Error.Message)
        }
    }
}

Canceling a Batch Job:

1
2
3
4
5
6
7
// Cancel an in-progress batch.
batch, err := llm.CancelBatch(ctx, batchID)
if err != nil {
    log.Fatal(err)
}

fmt.Printf("Batch job canceled: %s\n", batch.ID)

Listing Batch Jobs:

1
2
3
4
5
6
7
8
9
// List batch jobs (with pagination support).
page, err := llm.ListBatches(ctx, "", 10)
if err != nil {
    log.Fatal(err)
}

for _, batch := range page.Data {
    fmt.Printf("ID: %s, Status: %s\n", batch.ID, batch.Status)
}
Configuration Options

Global Configuration:

1
2
3
4
5
6
7
8
9
// Configure batch default parameters when creating model.
llm := openai.New("gpt-4o-mini",
    openai.WithBatchCompletionWindow("24h"),
    openai.WithBatchMetadata(map[string]string{
        "project": "my-project",
        "env":     "production",
    }),
    openai.WithBatchBaseURL("https://custom-batch-api.com"),
)

Request-level Configuration:

1
2
3
4
5
6
7
// Override default configuration when creating batch.
batch, err := llm.CreateBatch(ctx, requests,
    openai.WithBatchCreateCompletionWindow("48h"),
    openai.WithBatchCreateMetadata(map[string]string{
        "priority": "high",
    }),
)
How It Works

Batch API execution flow:

1
2
3
4
5
6
7
1. Prepare batch requests (BatchRequestInput list)
2. Validate request format and CustomID uniqueness
3. Generate JSONL format input file
4. Upload input file to server
5. Create batch job
6. Process requests asynchronously
7. Download output file and parse results

Key design:

  • CustomID Uniqueness: Each request must have a unique CustomID for matching input/output
  • JSONL Format: Batch processing uses JSONL (JSON Lines) format for storing requests and responses
  • Asynchronous Processing: Batch jobs execute asynchronously in the background without blocking main flow
  • Completion Window: Configurable completion time window for batch processing (e.g., 24h)
Use Cases
  • Large-scale Data Processing: Processing thousands or tens of thousands of requests
  • Offline Analysis: Non-real-time data analysis and processing tasks
  • Cost Optimization: Batch processing is typically more economical than individual requests
  • Scheduled Tasks: Regularly executed batch processing jobs
Usage Example

For a complete interactive example, see examples/model/batch.

4. Retry Mechanism

The retry mechanism is an automatic error recovery technique that automatically retries failed requests. This feature is provided by the underlying OpenAI SDK, with the framework passing retry parameters to the SDK through configuration options.

Timeouts and deadlines

Request lifecycle is bounded by two independent limits:

  • The caller context deadline (for example, Runner max duration, or context.WithTimeout).
  • The OpenAI request timeout configured by openaiopt.WithRequestTimeout.

The effective budget is the earlier one:

  • effective_deadline = min(ctx_deadline, request_timeout)

Important notes:

  • github.com/openai/openai-go does not hardcode timeout by default. If you observe timeout in logs, it typically comes from an upstream deadline (gateway/caller context) or from your own WithRequestTimeout configuration.
  • If you expect long-running calls (streaming, large prompts, tools, or reasoning models), configure WithRequestTimeout to match your service deadline and service level objective (SLO).
Core Features
  • Automatic Retry: SDK automatically handles retryable errors
  • Smart Backoff: Follows API's Retry-After headers or uses exponential backoff
  • Configurable: Supports custom maximum retry count and timeout duration
  • Zero Maintenance: No custom retry logic needed, handled by mature SDK
Quick Start

Basic Configuration:

import (
    "time"
    openaiopt "github.com/openai/openai-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create model instance with retry configuration.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(3),
        openaiopt.WithRequestTimeout(30*time.Second),
    ),
)
Retryable Errors

The OpenAI SDK automatically retries the following errors:

  • 408 Request Timeout: Request timeout
  • 409 Conflict: Conflict error
  • 429 Too Many Requests: Rate limiting
  • 500+ Server Errors: Internal server errors (5xx)
  • Network Connection Errors: No response or connection failure

Note: SDK default maximum retry count is 2.

Retry Strategies

Standard Retry:

1
2
3
4
5
6
7
// Standard configuration suitable for most scenarios.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(3),
        openaiopt.WithRequestTimeout(30*time.Second),
    ),
)

Rate Limiting Optimization:

1
2
3
4
5
6
7
// Optimized configuration for rate limiting scenarios.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(5),  // More retry attempts.
        openaiopt.WithRequestTimeout(60*time.Second),  // Longer timeout.
    ),
)

Fast Fail:

1
2
3
4
5
6
7
// For scenarios requiring quick failure.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(1),  // Minimal retries.
        openaiopt.WithRequestTimeout(10*time.Second),  // Short timeout.
    ),
)
How It Works

Retry mechanism execution flow:

1
2
3
4
5
6
1. Send request to LLM API
2. If request fails and error is retryable:
   a. Check if maximum retry count is reached
   b. Calculate wait time based on Retry-After header or exponential backoff
   c. Wait and resend request
3. If request succeeds or error is not retryable, return result

Key design:

  • SDK-level Implementation: Retry logic is completely handled by OpenAI SDK
  • Configuration Pass-through: Framework passes configuration via WithOpenAIOptions
  • Smart Backoff: Prioritizes using Retry-After header returned by API
  • Transparent Handling: Transparent to application layer, no additional code needed
Use Cases
  • Production Environment: Improve service reliability and fault tolerance
  • Rate Limiting: Automatically handle 429 errors
  • Network Instability: Handle temporary network failures
  • Server Errors: Handle temporary server-side issues
Important Notes
  • No Framework Retry: Framework itself does not implement retry logic
  • Client-level Retry: All retry is handled by OpenAI client
  • Configuration Pass-through: Use WithOpenAIOptions to configure retry behavior
  • Automatic Handling: Rate limiting (429) is automatically handled without additional code
Usage Example

For a complete interactive example, see examples/model/retry.

5. Custom HTTP Headers

In some enterprise or proxy scenarios, the model provider requires additional HTTP headers (for example, organization ID, tenant routing, or custom authentication). The Model module supports setting headers in several reliable ways.

Recommended order:

  • Global header via openai.WithHeaders (simplest for static headers)
  • Request-scoped header via agent.WithModelRequestHeaders (for runner.Run(...) calls whose headers vary by user, session, or tenant)
  • Global header via OpenAI RequestOption (flexible, middleware-friendly)
  • Custom http.RoundTripper (advanced, cross-cutting)

All methods affect streaming too because the same client is used for New and NewStreaming calls.

1. Using openai.WithHeaders for headers
1
2
3
4
5
6
7
8
import "trpc.group/trpc-go/trpc-agent-go/model/openai"

llm := openai.New("deepseek-v4-flash",
    openai.WithHeaders(map[string]string{
        "X-Custom-Header": "custom-value",
        "X-Request-ID":    "req-123",
    }),
)
2. Request-scoped headers from Runner

Use agent.WithModelRequestHeaders when a header should vary per runner.Run(...) call, for example X-Session-ID on providers that route a conversation to the same inference instance. The OpenAI adapter merges these headers into the HTTP request after model-level headers, so request-level values take precedence on the same key.

1
2
3
4
5
6
7
8
9
events, err := r.Run(
    ctx,
    "user-001",
    "session-001",
    model.NewUserMessage("Hello"),
    agent.WithModelRequestHeaders(map[string]string{
        "X-Session-ID": "session-001",
    }),
)
3. Global headers using OpenAI RequestOption

Use WithOpenAIOptions with openaiopt.WithHeader or openaiopt.WithMiddleware to inject headers for every request created by the underlying OpenAI client.

import (
    openaiopt "github.com/openai/openai-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

llm := openai.New("deepseek-v4-flash",
    // If your provider needs extra headers
    openai.WithOpenAIOptions(
        openaiopt.WithHeader("X-Custom-Header", "custom-value"),
        openaiopt.WithHeader("X-Request-ID", "req-123"),
        // You can also set User-Agent or vendor-specific headers
        openaiopt.WithHeader("User-Agent", "trpc-agent-go/1.0"),
    ),
)

For complex logic, middleware lets you modify headers conditionally (for example, by URL path or context values):

llm := openai.New("deepseek-v4-flash",
    openai.WithOpenAIOptions(
        openaiopt.WithMiddleware(
            func(r *http.Request, next openaiopt.MiddlewareNext) (*http.Response, error) {
                // Example: per-request header via context value
                if v := r.Context().Value("x-request-id"); v != nil {
                    if s, ok := v.(string); ok && s != "" {
                        r.Header.Set("X-Request-ID", s)
                    }
                }
                // Or only for chat completion endpoint
                if strings.Contains(r.URL.Path, "/chat/completions") {
                    r.Header.Set("X-Feature-Flag", "on")
                }
                return next(r)
            },
        ),
    ),
)

Notes for authentication variants:

  • OpenAI style: keep openai.WithAPIKey("sk-...") which sets Authorization: Bearer ... under the hood.
  • Azure/OpenAI‑compatible that use api-key: omit WithAPIKey and set openaiopt.WithHeader("api-key", "<key>") instead.
Logging raw HTTP request and response

You can use openaiopt.WithMiddleware to log the underlying HTTP request and response. Be careful about secrets (API keys, Authorization headers) and body consumption.

Key points:

  • Reading req.Body or resp.Body consumes the stream, so you must restore it.
  • Do not read resp.Body for streaming responses (for example, Content-Type: text/event-stream); skip body logging to avoid blocking or breaking the stream.
import (
    "bytes"
    "io"
    "net/http"
    "strings"

    openaiopt "github.com/openai/openai-go/option"
    "trpc.group/trpc-go/trpc-agent-go/log"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

const streamContentType = "text/event-stream"

llm := openai.New("deepseek-v4-flash",
    openai.WithOpenAIOptions(
        openaiopt.WithMiddleware(
            func(
                req *http.Request,
                next openaiopt.MiddlewareNext,
            ) (*http.Response, error) {
                // 1. Read req.Body.
                bodyBytes, err := io.ReadAll(req.Body)
                if err != nil {
                    return nil, err
                }
                // 2. Log req.Body.
                log.DebugfContext(
                    req.Context(),
                    "Middleware req: %+v",
                    string(bodyBytes),
                )

                // 3. Restore req.Body (critical step).
                req.Body = io.NopCloser(bytes.NewBuffer(bodyBytes))

                resp, err := next(req)
                if err != nil || resp == nil {
                    return resp, err
                }

                // 4. Skip body logging for streaming responses.
                contentType := resp.Header.Get("Content-Type")
                if strings.Contains(contentType, streamContentType) {
                    return resp, nil
                }

                // 5. Read resp.Body.
                respBodyBytes, err := io.ReadAll(resp.Body)
                if err != nil {
                    return resp, err
                }
                // 6. Log resp.Body.
                log.DebugfContext(
                    req.Context(),
                    "Middleware rsp: %+v",
                    string(respBodyBytes),
                )

                // 7. Restore resp.Body (critical step).
                resp.Body = io.NopCloser(bytes.NewBuffer(respBodyBytes))
                return resp, nil
            },
        ),
    ),
)
4. Custom http.RoundTripper (advanced)

Inject headers across all requests at the HTTP layer by wrapping the transport. This is useful when you also need custom proxy, TLS, or metrics logic.

type headerRoundTripper struct{ base http.RoundTripper }

func (rt headerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
    // Add or override headers
    req.Header.Set("X-Custom-Header", "custom-value")
    req.Header.Set("X-Trace-ID", "trace-xyz")
    return rt.base.RoundTrip(req)
}

llm := openai.New("deepseek-v4-flash",
    openai.WithHTTPClientOptions(
        openai.WithHTTPClientTransport(headerRoundTripper{base: http.DefaultTransport}),
    ),
)

Per-request headers

  • Agent/Runner passes ctx through to the model call; middleware can read values from req.Context() to inject per-invocation headers.
  • Chat completion per-request base URL override is not exposed; create a second model with a different base URL or alter r.URL in middleware.

6. Token Counter

The Token Counter is used to estimate the token count of text content. The framework provides SimpleTokenCounter as the default implementation, supporting custom token counting logic to meet different scenario requirements.

SimpleTokenCounter

SimpleTokenCounter provides a very rough token estimation based on UTF-8 character count. Its features include:

Core Features:

  • Lightweight: Does not depend on external tokenizers, only uses character length estimation
  • Model-agnostic: Suitable for all OpenAI-compatible models
  • Multimodal Support: Supports message content, reasoning content (ReasoningContent), and tool calls

Estimation Principle:

Uses heuristic rules: approximately N UTF-8 characters per token, where N can be configured via WithApproxRunesPerToken. N is a divisor, not a multiplier:

estimatedTokens = countedUTF8Runes / N

Therefore, WithApproxRunesPerToken(1.5) means approximately 1.5 characters per token. Passing 2.0/3.0 means approximately 0.67 characters per token, which is about 1.5 tokens per character.

Usage:

import "trpc.group/trpc-go/trpc-agent-go/model"

// Use default configuration (English scenario)
counter := model.NewSimpleTokenCounter()

// Adjust for Chinese scenarios (Chinese has higher token density)
counter := model.NewSimpleTokenCounter(
    model.WithApproxRunesPerToken(1.6),  // Approx. 1.6 chars/token
)

// Adjust for other languages or specific models
counter := model.NewSimpleTokenCounter(
    model.WithApproxRunesPerToken(3.5),  // Approx. 3.5 chars/token
)

Parameter Description:

WithApproxRunesPerToken(v float64)

  • Purpose: Set the approximate number of characters per token
  • Type: float64
  • Default Value: 4.0 (approx. 4 characters per token, suitable for English scenarios)
  • Value Constraint: Values <= 0 will be ignored, keeping the default value
  • Formula: estimated tokens = counted UTF-8 characters / v; for example, v=1.5 means approximately 1.5 characters per token

Recommended Values for Common Languages:

Language/Scenario Recommended Value Description
English text 4.0 Default value, suitable for English words and common Latin characters
Chinese text 1.6-1.8 Chinese characters have higher token density than English, use smaller values
Japanese text 2.0-2.5 Japanese character token density is between Chinese and English
Mixed text 3.0 Trade-off value for mixed Chinese/English scenarios

Important Notes:

  • Empirical Values: The recommended values above are empirical estimates. Actual token count depends on the specific model's tokenizer implementation.
  • Uncertainty: If you're uncertain about language mix or model characteristics, keep the default value (4.0).
  • Model Differences: Different model vendors (OpenAI, DeepSeek, Hunyuan) have different tokenizer implementations, which may require adjusting this parameter.
  • Accuracy Consideration: SimpleTokenCounter only provides rough estimation. If you need accurate token counting, consider using the model provider's tokenizer API (e.g., OpenAI's tiktoken).

Custom Token Counter

If SimpleTokenCounter's rough estimation doesn't meet your requirements, you can implement a custom TokenCounter interface:

import (
    "context"
    "fmt"
    "trpc.group/trpc-go/trpc-agent-go/model"
)

type MyCustomCounter struct{}

func (c *MyCustomCounter) CountTokens(ctx context.Context, message model.Message) (int, error) {
    // Calculate tokens for message content, reasoning content, and tool calls
    total := 0

    // Process main content
    if len(message.Content) > 0 {
        total += len(message.Content)  // Example: use more accurate algorithm
    }

    // Process reasoning content
    if len(message.ReasoningContent) > 0 {
        total += len(message.ReasoningContent)
    }

    // Process multimodal content
    for _, part := range message.ContentParts {
        if part.Text != nil {
            total += len(*part.Text)
        }
    }

    // Process tool calls
    for _, toolCall := range message.ToolCalls {
        total += len(toolCall.Function.Name)
        total += len(string(toolCall.Function.Arguments))
    }

    return total, nil
}

func (c *MyCustomCounter) CountTokensRange(ctx context.Context, messages []model.Message, start, end int) (int, error) {
    if start < 0 || end > len(messages) || start >= end {
        return 0, fmt.Errorf("invalid range: start=%d, end=%d, len=%d", start, end, len(messages))
    }

    total := 0
    for i := start; i < end; i++ {
        tokens, err := c.CountTokens(ctx, messages[i])
        if err != nil {
            return 0, err
        }
        total += tokens
    }
    return total, nil
}

// Use custom counter when creating a model
llm := openai.New("deepseek-v4-flash",
    openai.WithTokenCounter(&MyCustomCounter{}),
)

Using Token Counter in Summarizer

SimpleTokenCounter can also be used with the session/summary module to control summary triggering:

import (
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/session/summary"
)

// 1. Create a counter optimized for Chinese
counter := model.NewSimpleTokenCounter(
    model.WithApproxRunesPerToken(1.6),  // Recommended value for Chinese scenarios
)

// 2. Set as global counter (affects all summary triggers)
summary.SetTokenCounter(counter)

// 3. Create summarizer
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithTokenThreshold(4000),  // Uses your custom counter for evaluation
)

7. Token Tailoring

Token Tailoring is an intelligent message management technique designed to automatically trim messages when they exceed the model's context window limits, ensuring requests can be successfully sent to the LLM API. This feature is particularly useful for long conversation scenarios, allowing you to keep the message list within the model's token limits while preserving key context.

Automatic Mode (Recommended):

1
2
3
4
5
6
7
8
import (
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Enable token tailoring with automatic configuration
model := openai.New("deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),
)

Advanced Mode:

1
2
3
4
5
6
7
// Custom token limit and strategy
model := openai.New("deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),               // Required: enable token tailoring
    openai.WithMaxInputTokens(10000),                    // Custom token limit
    openai.WithTokenCounter(customCounter),              // Optional: custom counter
    openai.WithTailoringStrategy(customStrategy),        // Optional: custom strategy
)

Token Calculation Formula:

The framework automatically calculates maxInputTokens based on the model's context window. For OpenAI-compatible models, the automatic budget also accounts for request-scoped output limits and tool definitions:

Context Window Registration

Token Tailoring and session summary WithContextThreshold both need a model context window. Built-in model names are resolved automatically. Token Tailoring uses a 128000-token fallback for unknown model names. For private deployments, tenant-provided models, endpoint IDs, or smaller unknown models, prefer model-instance configuration such as openai.WithContextWindow(32768) or the unified provider.WithContextWindow(32768). For one-off runs, use agent.WithModelContextWindow(32768). Use model.RegisterModelContextWindow("my-model", 32768) only when the name has a stable process-wide meaning. See the Session Summary documentation for a full example.

1
2
3
4
5
6
outputReserve = max(ReserveOutputTokens, request.MaxTokens, request.ThinkingTokens)
safetyMargin = contextWindow × SafetyMarginRatio
calculatedMax = contextWindow - outputReserve - ProtocolOverheadTokens - safetyMargin
ratioLimit = contextWindow × MaxInputTokensRatio
messageBudget = min(max(min(calculatedMax, ratioLimit), InputTokensFloor), calculatedMax)
maxInputTokens = max(messageBudget - estimatedToolsTokens, 0)

For example, "gpt-4o" (contextWindow = 128000):

1
2
3
4
5
safetyMargin = 128000 × 0.10 = 12800 tokens
outputReserve = max(2048, request.MaxTokens, request.ThinkingTokens)
calculatedMax = 128000 - outputReserve - 512 - 12800
ratioLimit = 128000 × 1.0 = 128000 tokens
maxInputTokens = messageBudget - estimatedToolsTokens

If WithMaxInputTokens is set explicitly, the framework keeps that value as the configured message budget and does not subtract the estimated Tools schema budget from it. The explicit value is still clamped to the context-safe hard budget after output reserves and safety margin are applied.

Default Budget Parameters:

The framework uses the following default values for token allocation (it is recommended to keep the defaults):

  • Protocol Overhead (ProtocolOverheadTokens): 512 tokens - reserved for request/response formatting
  • Output Reserve (ReserveOutputTokens): 2048 tokens - minimum reserve for output generation; OpenAI-compatible requests reserve the larger value among this setting, GenerationConfig.MaxTokens, and GenerationConfig.ThinkingTokens
  • Input Floor (InputTokensFloor): 1024 tokens - ensures proper model processing
  • Output Floor (OutputTokensFloor): deprecated and no longer used to auto-calculate output MaxTokens
  • Safety Margin Ratio (SafetyMarginRatio): 10% - buffer for token counting inaccuracies
  • Max Input Ratio (MaxInputTokensRatio): 100% - maximum input ratio of context window

Tailoring Strategy:

The framework provides a default tailoring strategy that preserves messages according to the following priorities:

  1. System Messages: Highest priority, always preserved
  2. Latest User Message: Ensures the current conversation turn is complete
  3. Tool Call Related Messages: Maintains tool call context integrity
  4. Historical Messages: Retains as much conversation history as possible based on remaining space

Custom Tailoring Strategy:

You can implement the TailoringStrategy interface to customize the trimming logic:

type CustomStrategy struct{}

func (s *CustomStrategy) Tailor(
    ctx context.Context,
    messages []model.Message,
    maxTokens int,
    counter tokencounter.Counter,
) ([]model.Message, error) {
    // Implement custom tailoring logic
    // e.g., keep only the most recent N conversation rounds
    return messages, nil
}

model := openai.New("deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),
    openai.WithTailoringStrategy(&CustomStrategy{}),
)

Advanced Configuration (Custom Budget Parameters):

If the default token allocation strategy does not meet your needs, you can customize the budget parameters using WithTokenTailoringConfig. Note: It is recommended to keep the default values unless you have specific requirements.

model := openai.New("deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),
    openai.WithTokenTailoringConfig(&model.TokenTailoringConfig{
        ProtocolOverheadTokens: 1024,   // Custom protocol overhead
        ReserveOutputTokens:    4096,   // Custom output reserve
        InputTokensFloor:       2048,   // Custom input floor
        OutputTokensFloor:      512,    // Custom output floor
        SafetyMarginRatio:      0.15,   // Custom safety margin (15%)
        MaxInputTokensRatio:    0.90,   // Custom max input ratio (90%)
    }),
)

For Anthropic models, you can use the same configuration:

1
2
3
4
5
6
model := anthropic.New("claude-sonnet-4-0",
    anthropic.WithEnableTokenTailoring(true),
    anthropic.WithTokenTailoringConfig(&model.TokenTailoringConfig{
        SafetyMarginRatio: 0.15,  // Increase safety margin to 15%
    }),
)

7. Variant Optimization: Adapting to Platform-Specific Behaviors

The Variant mechanism is an important optimization in the Model module, used to handle platform-specific behavioral differences across OpenAI-compatible providers. By specifying different Variants, the framework can automatically adapt to API differences between platforms, especially for file upload, deletion, and processing logic.

7.1. Supported Variant Types

The framework currently supports the following Variants:

1. VariantOpenAI(default)

  • Standard OpenAI API-compatible behavior
  • File upload path:/openapi/v1/files
  • File purpose:user_data
  • File deletion Http method::DELETE

2. VariantHunyuan(hunyuan)

  • Tencent Hunyuan platform-specific adaptation
  • File upload path::/openapi/v1/files/uploads
  • File purpose:file-extract
  • File deletion Http Method:POST

3. VariantDeepSeek

  • DeepSeek platform adaptation
  • Default BaseURL:https://api.deepseek.com
  • API Key environment variable name:DEEPSEEK_API_KEY
  • DeepSeek-specific behavior is enabled when you explicitly set WithVariant(openai.VariantDeepSeek) or use the official DeepSeek API BaseURL
  • Other behaviors are consistent with standard OpenAI

4. VariantQwen(Qwen)

  • Qwen platform adaptation
  • Default BaseURL:https://dashscope.aliyuncs.com/compatible-mode/v1
  • API Key environment variable name:DASHSCOPE_API_KEY
  • Other behaviors are consistent with standard OpenAI
7.2. Usage

Usage Example

import "trpc.group/trpc-go/trpc-agent-go/model/openai"

// Use the Hunyuan platform
model := openai.New("hunyuan-model",
    openai.WithBaseURL("https://your-hunyuan-api.com"),
    openai.WithAPIKey("your-api-key"),
    openai.WithVariant(openai.VariantHunyuan), // Specify the Hunyuan variant
)

// Use the DeepSeek platform
model := openai.New("deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com/v1"),
    openai.WithAPIKey("your-api-key"),
    openai.WithVariant(openai.VariantDeepSeek), // Specify the DeepSeek variant
)
7.3. Behavioral Differences of Variants Examples

Message content handling differences

import "trpc.group/trpc-go/trpc-agent-go/model"

// For the Hunyuan platform, the file ID is placed in extraFields instead of content parts
message := model.Message{
    Role: model.RoleUser,
    ContentParts: []model.ContentPart{
        {
            Type: model.ContentTypeFile,
            File: &model.File{
                FileID: "file_123",
            },
        },
    },
}

Environment variable auto-configuration

For certain Variants, the framework supports reading configuration from environment variables automatically:

1
2
3
# DeepSeek
export DEEPSEEK_API_KEY="your-api-key"
# No need to call WithAPIKey explicitly; the framework reads it automatically
1
2
3
4
5
6
import "trpc.group/trpc-go/trpc-agent-go/model"

// DeepSeek
model := openai.New("deepseek-v4-flash",
    openai.WithVariant(openai.VariantDeepSeek), // Automatically reads DEEPSEEK_API_KEY
)

8. Streaming Tool Call Deltas: ShowToolCallDelta

By default, the OpenAI adapter suppresses raw tool_calls chunks in streaming responses. Tool calls are accumulated internally and only exposed once in the final aggregated response via Response.Choices[0].Message.ToolCalls. This keeps the stream clean for typical chat UIs that only render assistant text.

For advanced use cases (for example, when the model streams document content inside tool arguments and you need to display it incrementally), you can turn on raw tool call deltas with WithShowToolCallDelta:

1
2
3
4
llm := openai.New(
    "gpt-4.1",
    openai.WithShowToolCallDelta(true), // Forward tool_call deltas.
)

When WithShowToolCallDelta(true) is enabled:

  • Streaming chunks that contain tool_calls are no longer suppressed by the adapter.
  • Each chunk is converted into a partial model.Response where:
    • Response.IsPartial == true
    • Response.Choices[0].Delta.ToolCalls contains the provider’s raw tool_calls delta mapped to model.ToolCall:
      • Type comes from the provider type field (for example, "function").
      • Function.Name and Function.Arguments mirror the original tool name and JSON-encoded arguments string.
      • ID and Index preserve the tool call identity so callers can stitch fragments together.
  • The final aggregated response still exposes the merged tool calls in Response.Choices[0].Message.ToolCalls, so existing tool execution logic (for example, FunctionCallResponseProcessor) continues to work unchanged.

Typical integration pattern when this flag is enabled:

  1. Read Response.Choices[0].Delta.ToolCalls[*].Function.Arguments on each partial response.
  2. Group chunks by tool call ID and append the Arguments fragments in order.
  3. Once the accumulated string forms valid JSON, unmarshal it into your business struct (for example, { "content": "..." }) and use it for progressive UI rendering.

If you do not need to inspect tool arguments during streaming, keep WithShowToolCallDelta disabled to avoid handling partial JSON fragments and to preserve the default clean text-streaming behavior.

Anthropic Model

Anthropic Model is used to interface with Claude models and compatible platforms, supporting streaming output, thought modes and tool calls, and providing a rich callback mechanism, while also allowing for flexible configuration of custom HTTP headers.

Configuration Method

Environment Variable Method

export ANTHROPIC_API_KEY="your-api-key"
export ANTHROPIC_BASE_URL="https://api.anthropic.com" # Optional configuration, default is this BASE URL

Code Method

1
2
3
4
5
6
7
import "trpc.group/trpc-go/trpc-agent-go/model/anthropic"

m := anthropic.New(
    "claude-sonnet-4-0",
    anthropic.WithAPIKey("your-api-key"),
    anthropic.WithBaseURL("https://api.anthropic.com"), // Optional configuration, default is this BASE URL
)

Using the Model Directly

import (
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

func main() {
    // Create model instance
    llm := anthropic.New("claude-sonnet-4-0")
    // Build request
    temperature := 0.7
    maxTokens := 1000
    request := &model.Request{
        Messages: []model.Message{
            model.NewSystemMessage("You are a professional AI assistant."),
            model.NewUserMessage("Introduce the concurrency features of Go language."),
        },
        GenerationConfig: model.GenerationConfig{
            Temperature: &temperature,
            MaxTokens:   &maxTokens,
            Stream:      false,
        },
    }
    // Call the model
    ctx := context.Background()
    responseChan, err := llm.GenerateContent(ctx, request)
    if err != nil {
        fmt.Printf("System error: %v\n", err)
        return
    }
    // Handle response
    for response := range responseChan {
        if response.Error != nil {
            fmt.Printf("API error: %s\n", response.Error.Message)
            return
        }
        if len(response.Choices) > 0 {
            fmt.Printf("Reply: %s\n", response.Choices[0].Message.Content)
        }
        if response.Done {
            break
        }
    }
}

Streaming Output

import (
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

func main() {
    // Create model instance
    llm := anthropic.New("claude-sonnet-4-0")
    // Streaming request configuration
    temperature := 0.7
    maxTokens := 1000
    request := &model.Request{
        Messages: []model.Message{
            model.NewSystemMessage("You are a creative story storyteller."),
            model.NewUserMessage("Write a short story about a robot learning to paint."),
        },
        GenerationConfig: model.GenerationConfig{
            Temperature: &temperature,
            MaxTokens:   &maxTokens,
            Stream:      true,
        },
    }
    // Call the model
    ctx := context.Background()
    // Handle streaming response
    responseChan, err := llm.GenerateContent(ctx, request)
    if err != nil {
        fmt.Printf("System error: %v\n", err)
        return
    }
    for response := range responseChan {
        if response.Error != nil {
            fmt.Printf("Error: %s", response.Error.Message)
            return
        }
        if len(response.Choices) > 0 && response.Choices[0].Delta.Content != "" {
            fmt.Print(response.Choices[0].Delta.Content)
        }
        if response.Done {
            break
        }
    }
}

Advanced Parameter Configuration

// Using advanced generation parameters
temperature := 0.3
maxTokens := 2000
topP := 0.9
thinking := true
thinkingTokens := 2048

request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a professional technical documentation writer."),
        model.NewUserMessage("Explain the pros and cons of microservices architecture."),
    },
    GenerationConfig: model.GenerationConfig{
        Temperature:     &temperature,
        MaxTokens:       &maxTokens,
        TopP:            &topP,
        ThinkingEnabled: &thinking,
        ThinkingTokens:  &thinkingTokens,
        Stream:          true,
    },
}

Multimodal Input

The Anthropic Model supports images and files in user messages:

  • Images: Provide an image URL or raw image data. Supported MIME types are image/jpeg, image/png, image/gif, and image/webp. Images are sent as Anthropic image blocks.
  • PDFs: Provide a URL that Claude can access or raw data, using application/pdf. PDFs are sent as Anthropic document blocks.
  • Text content: Use raw data only. text/plain is sent as an Anthropic document block; text-based MIME types such as text/csv or text/html are sent as Anthropic text blocks.
  • Other formats: Office documents, JSON, and other formats are not parsed automatically. Convert them to text or PDF first. Audio and FileID are not supported. Non-PDF file URLs are sent as URL text only.
import (
    "context"
    "os"

    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

func main() {
    llm := anthropic.New("claude-sonnet-4-6")
    imageData, _ := os.ReadFile("diagram.png")
    request := &model.Request{
        Messages: []model.Message{
            model.NewSystemMessage("You are a professional document and image analysis assistant."),
            {
                Role:    model.RoleUser,
                Content: "Summarize the key information from the image and PDF.",
                ContentParts: []model.ContentPart{
                    {
                        Type: model.ContentTypeImage,
                        Image: &model.Image{
                            Data:   imageData,
                            Format: "png",
                        },
                    },
                    {
                        Type: model.ContentTypeFile,
                        File: &model.File{
                            Name:     "report.pdf",
                            URL:      "https://example.com/report.pdf",
                            MimeType: "application/pdf",
                        },
                    },
                },
            },
        },
    }
    _, _ = llm.GenerateContent(context.Background(), request)
}

Advanced features

1. Callback Functions

import (
    anthropicsdk "github.com/anthropics/anthropic-sdk-go"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

model := anthropic.New(
    "claude-sonnet-4-0",
    anthropic.WithChatRequestCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams) {
        // Log the request before sending.
        log.Printf("sending request: model=%s, messages=%d.", req.Model, len(req.Messages))
    }),
    anthropic.WithChatResponseCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, resp *anthropicsdk.Message) {
        // Log details of the non-streaming response.
        log.Printf("received response: id=%s, input_tokens=%d, output_tokens=%d.", resp.ID, resp.Usage.InputTokens, resp.Usage.OutputTokens)
    }),
    anthropic.WithChatChunkCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, chunk *anthropicsdk.MessageStreamEventUnion) {
        // Log the type of the streaming event.
        log.Printf("stream event: %T.", chunk.AsAny())
    }),
    anthropic.WithChatStreamCompleteCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, acc *anthropicsdk.Message, streamErr error) {
        // Log stream completion or error.
        if streamErr != nil {
            log.Printf("stream failed: %v.", streamErr)
            return
        }
        log.Printf("stream completed: finish_reason=%s, input_tokens=%d, output_tokens=%d.", acc.StopReason, acc.Usage.InputTokens, acc.Usage.OutputTokens)
    }),
)

2. Model Switching

Model switching allows dynamically changing the LLM model used by an Agent at runtime. This section shows static switching for Anthropic model instances at the agent level and per-request level. To select a model separately for each LLM call within the same runner.Run(...), use ModelSelector.

Agent-level Switching

Agent-level switching changes the Agent's default model, affecting all subsequent requests.

Approach 1: Direct Model Instance

Set the model directly by passing a model instance to SetModel:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

// Create Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModel(anthropic.New("claude-3-5-haiku-20241022")),
)

// Switch to another model.
agent.SetModel(anthropic.New("claude-3-5-sonnet-20241022"))

Use Cases:

1
2
3
4
5
6
// Select model based on task complexity.
if isComplexTask {
    agent.SetModel(anthropic.New("claude-3-5-sonnet-20241022"))  // Use powerful model.
} else {
    agent.SetModel(anthropic.New("claude-3-5-haiku-20241022"))  // Use fast model.
}
Approach 2: Switch by Name

Pre-register multiple models with WithModels, then switch by name using SetModelByName:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

// Create multiple model instances.
sonnet := anthropic.New("claude-3-5-sonnet-20241022")
haiku := anthropic.New("claude-3-5-haiku-20241022")

// Register all models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": sonnet,
        "fast":  haiku,
    }),
    llmagent.WithModel(haiku), // Specify initial model.
    llmagent.WithInstruction("You are an intelligent assistant."),
)

// Switch models by name at runtime.
err := agent.SetModelByName("smart")
if err != nil {
    log.Fatal(err)
}

// Switch to another model.
err = agent.SetModelByName("fast")
if err != nil {
    log.Fatal(err)
}

Use Cases:

// Select model based on user tier.
modelName := "fast" // Default to fast model.
if user.IsPremium() {
    modelName = "smart" // Premium users get advanced model.
}
if err := agent.SetModelByName(modelName); err != nil {
    log.Printf("Failed to switch model: %v", err)
}

// Select model based on time of day (cost optimization).
hour := time.Now().Hour()
if hour >= 22 || hour < 8 {
    // Use fast model at night.
    agent.SetModelByName("fast")
} else {
    // Use smart model during the day.
    agent.SetModelByName("smart")
}
Per-request Switching

Per-request switching allows temporarily specifying a model for a single request without affecting the Agent's default model or other requests. This is useful for scenarios where different models are needed for specific tasks.

Approach 1: Using WithModel Option

Use agent.WithModel to specify a model instance for a single request:

1
2
3
4
5
6
7
8
9
import (
    "trpc.group/trpc-go/trpc-agent-go/agent"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

// Use a specific model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModel(anthropic.New("claude-3-5-sonnet-20241022")),
)

Use agent.WithModelName to specify a pre-registered model name for a single request:

// Pre-register multiple models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": anthropic.New("claude-3-5-sonnet-20241022"),
        "fast":  anthropic.New("claude-3-5-haiku-20241022"),
    }),
    llmagent.WithModel(anthropic.New("claude-3-5-haiku-20241022")), // Default model.
)

runner := runner.NewRunner("app", agent)

// Temporarily use "smart" model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModelName("smart"),
)

// Next request still uses the default model "claude-3-5-haiku-20241022".
eventChan2, err := runner.Run(ctx, userID, sessionID, message2)

Use Cases:

// Dynamically select model based on message complexity.
var opts []agent.RunOption
if isComplexQuery(message) {
    opts = append(opts, agent.WithModelName("smart")) // Use powerful model for complex queries.
}

eventChan, err := runner.Run(ctx, userID, sessionID, message, opts...)

// Use specialized model for specific tasks.
eventChan, err := runner.Run(ctx, userID, sessionID, visionMessage,
    agent.WithModelName("vision"),
)
Configuration Details

WithModels Option:

  • Accepts a map[string]model.Model where key is the model name and value is the model instance
  • If both WithModel and WithModels are set, WithModel specifies the initial model
  • If only WithModels is set, the first model in the map will be used as the initial model (note: map iteration order is not guaranteed, so it's recommended to explicitly specify the initial model)
  • Reserved name: __default__ is used internally by the framework and should not be used

SetModelByName Method:

  • Parameter: model name (string)
  • Returns: error if the model name is not found
  • The model must be pre-registered via WithModels

Per-request Options:

  • agent.RunOptions.Model: Directly specify a model instance
  • agent.RunOptions.ModelName: Specify a pre-registered model name
  • Priority: Model > ModelName > Agent default model
  • If the model specified by ModelName is not found, it falls back to the Agent's default model
Agent-level vs Per-request Comparison
Feature Agent-level Switching Per-request Switching
Scope All subsequent requests Current request only
Usage SetModel/SetModelByName RunOptions.Model/ModelName
State Change Changes Agent default model Does not change Agent state
Use Case Global strategy adjustment Specific task temporary needs
Concurrency Affects all concurrent reqs Does not affect other requests
Typical Examples User tier, time-based policy Complex queries, reasoning
Agent-level Approach Comparison
Feature SetModel SetModelByName
Usage Pass model instance Pass model name
Pre-registration Not required Required via WithModels
Error Handling None Returns error
Use Case Simple switching Complex scenarios, multi-model management
Code Maintenance Need to hold model instances Only need to remember names
Important Notes

Agent-level Switching:

  • Immediate Effect: After calling SetModel or SetModelByName, the next request immediately uses the new model
  • Session Persistence: Switching models does not clear session history
  • Independent Configuration: Each model retains its own configuration (temperature, max tokens, etc.)
  • Concurrency Safe: Both switching approaches are concurrency-safe

Per-request Switching:

  • Temporary Override: Only affects the current request, does not change the Agent's default model
  • Higher Priority: Per-request model settings take precedence over the Agent's default model
  • No Side Effects: Does not affect other concurrent requests or subsequent requests
  • Flexible Combination: Can be used in combination with agent-level switching

Model-specific Prompts (LLMAgent):

  • Use llmagent.WithModelInstructions / llmagent.WithModelGlobalInstructions to override prompts by model.Info().Name when the Agent switches models; it falls back to the Agent defaults when no mapping exists.
  • For a runnable example, see examples/model/promptmap.
Usage Example

For a complete interactive example, see examples/model/switch, which demonstrates both agent-level and per-request switching approaches.

3. Custom HTTP Headers

In environments like gateways, proprietary platforms, or proxy setups, model API requests often require additional HTTP headers (e.g., organization/tenant identifiers, grayscale routing, custom authentication, etc.). The Model module provides three reliable ways to add headers for "all model requests," including standard requests, streaming, file uploads, batch processing, etc.

Recommended order:

  • Global header via anthropic.WithHeaders (simplest for static headers)
  • Use Anthropic RequestOption to set global headers (flexible, middleware-friendly)
  • Use a custom http.RoundTripper injection (advanced, more cross-cutting capabilities)

All methods affect streaming requests, as they use the same underlying client.

1. Using anthropic.WithHeaders for headers
1
2
3
4
5
6
7
8
import "trpc.group/trpc-go/trpc-agent-go/model/anthropic"

llm := anthropic.New("claude-sonnet-4-0",
    anthropic.WithHeaders(map[string]string{
        "X-Custom-Header": "custom-value",
        "X-Request-ID":    "req-123",
    }),
)
2. Using Anthropic RequestOption to Set Global Headers

By using WithAnthropicClientOptions combined with anthropicopt.WithHeader or anthropicopt.WithMiddleware, you can inject headers into every request made by the underlying Anthropic client.

import (
    anthropicopt "github.com/anthropics/anthropic-sdk-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

llm := anthropic.New("claude-sonnet-4-0",
    // If your platform requires additional headers
    anthropic.WithAnthropicClientOptions(
        anthropicopt.WithHeader("X-Custom-Header", "custom-value"),
        anthropicopt.WithHeader("X-Request-ID", "req-123"),
        // You can also set User-Agent or vendor-specific headers
        anthropicopt.WithHeader("User-Agent", "trpc-agent-go/1.0"),
    ),
)

If you need to set headers conditionally (e.g., only for certain paths or depending on context values), you can use middleware:

import (
    anthropicopt "github.com/anthropics/anthropic-sdk-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

    llm := anthropic.New("claude-sonnet-4-0",
        anthropic.WithAnthropicClientOptions(
            anthropicopt.WithMiddleware(
            func(r *http.Request, next anthropicopt.MiddlewareNext) (*http.Response, error) {
                // Example: Set "per-request" headers based on context value
                if v := r.Context().Value("x-request-id"); v != nil {
                    if s, ok := v.(string); ok && s != "" {
                        r.Header.Set("X-Request-ID", s)
                    }
                }
                // Or only for the "message completion" endpoint
                if strings.Contains(r.URL.Path, "v1/messages") {
                    r.Header.Set("X-Feature-Flag", "on")
                }
                return next(r)
            },
        ),
        ),
    )
    ```

##### 3. Using Custom `http.RoundTripper`

For injecting headers at the HTTP transport layer, ideal for scenarios requiring proxying, TLS, custom monitoring, and other capabilities.

```go
import (
    anthropicopt "github.com/anthropics/anthropic-sdk-go/option"
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

type headerRoundTripper struct{ base http.RoundTripper }

func (rt headerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
    // Add or override headers
    req.Header.Set("X-Custom-Header", "custom-value")
    req.Header.Set("X-Trace-ID", "trace-xyz")
    return rt.base.RoundTrip(req)
}

llm := anthropic.New("claude-sonnet-4-0",
    anthropic.WithHTTPClientOptions(
        anthropic.WithHTTPClientTransport(headerRoundTripper{base: http.DefaultTransport}),
    ),
)

Regarding "per-request" headers:

  • The Agent/Runner will propagate ctx to the model call; middleware can read the value from req.Context() to inject headers for "this call."
  • For message completion, the current API doesn't expose per-call BaseURL overrides; if switching is needed, create a model using a different BaseURL or modify the r.URL in middleware.

4. Token Tailoring

Anthropic models also support Token Tailoring functionality, designed to automatically trim messages when they exceed the model's context window limits, ensuring requests can be successfully sent to the LLM API.

Automatic Mode (Recommended):

1
2
3
4
5
6
7
8
import (
    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

// Enable token tailoring with automatic configuration
model := anthropic.New("claude-3-5-sonnet",
    anthropic.WithEnableTokenTailoring(true),
)

Advanced Mode:

1
2
3
4
5
6
7
// Custom token limit and strategy
model := anthropic.New("claude-3-5-sonnet",
    anthropic.WithEnableTokenTailoring(true),               // Required: enable token tailoring
    anthropic.WithMaxInputTokens(10000),                    // Custom token limit
    anthropic.WithTokenCounter(customCounter),              // Optional: custom counter
    anthropic.WithTailoringStrategy(customStrategy),        // Optional: custom strategy
)

For detailed explanations of the token calculation formula, tailoring strategy, and custom strategy implementation, please refer to Token Tailoring under OpenAI Model.

5. Streaming Tool Call Deltas: ShowToolCallDelta

By default, the Anthropic adapter accumulates tool-call arguments internally and returns the complete arguments once the model finishes the tool call, through Response.Choices[0].Message.ToolCalls in the final response.

WithShowToolCallDelta exposes Anthropic input_json_delta chunks during streaming. Enable this option when tool arguments are long, or when the frontend needs to display argument generation progress:

1
2
3
4
llm := anthropic.New(
    "claude-sonnet-4-0",
    anthropic.WithShowToolCallDelta(true),
)

When WithShowToolCallDelta(true) is enabled:

  • Streaming responses may include Response.Choices[0].Delta.ToolCalls.
  • Delta.ToolCalls[*].Function.Arguments is the newly produced argument string fragment, and is usually not complete JSON.
  • Delta.ToolCalls[*].ID and Index can be used to join fragments for the same tool call.
  • The final response still returns the complete tool call in Response.Choices[0].Message.ToolCalls, so existing tool execution logic can continue to use the final response unchanged.

Typical handling pattern:

  1. Read Response.Choices[0].Delta.ToolCalls[*].Function.Arguments on each partial response.
  2. Group chunks by tool call ID or Index, and append the Arguments fragments in order.
  3. Treat the accumulated value as the tool argument JSON. For progressive rendering, render the string as it grows and unmarshal it only after it becomes valid JSON.

Provider

With the emergence of multiple large model providers, some have defined their own API specifications. Currently, the framework has integrated the APIs of OpenAI and Anthropic, and exposes them as models. Users can access different provider models through openai.New and anthropic.New.

However, there are differences in instantiation and configuration between providers, which often requires developers to modify a significant amount of code when switching between providers, increasing the cost of switching.

To solve this problem, the Provider offers a unified model instantiation entry point. Developers only need to specify the provider and model name, and other configuration options are managed through the unified Option, simplifying the complexity of switching between providers.

The Provider supports the following Option:

Option Description
WithAPIKey / WithBaseURL Set the API Key and Base URL for the model
WithHTTPClientName / WithHTTPClientTransport Configure HTTP client properties
WithHeaders Append static HTTP headers across requests
WithChannelBufferSize Adjust the response channel buffer size
WithCallbacks Configure OpenAI / Anthropic request, response, and streaming callbacks
WithExtraFields Configure custom fields in the request body
WithEnableTokenTailoring / WithMaxInputTokens
WithTokenCounter / WithTailoringStrategy
Token trimming related parameters
WithTokenTailoringConfig Custom token tailoring budget parameters for advanced configuration
WithOpenAIOption / WithAnthropicOption Pass-through native options for the respective providers

Usage Example

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model/provider"
)

providerName := "openai"        // provider supports openai and anthropic.
modelName := "deepseek-v4-flash"

modelInstance, err := provider.Model(
    providerName,
    modelName,
    provider.WithAPIKey(c.apiKey),
    provider.WithBaseURL(c.baseURL),
    provider.WithChannelBufferSize(c.channelBufferSize),
    provider.WithEnableTokenTailoring(c.tokenTailoring),
    provider.WithMaxInputTokens(c.maxInputTokens),
)

agent := llmagent.New("chat-assistant", llmagent.WithModel(modelInstance))

Advanced Configuration with TokenTailoringConfig:

For advanced users who need to fine-tune token allocation strategy, you can use WithTokenTailoringConfig:

import (
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/provider"
)

// Custom token tailoring budget parameters for all providers
config := &model.TokenTailoringConfig{
    ProtocolOverheadTokens: 1024,
    ReserveOutputTokens:    4096,
    SafetyMarginRatio:      0.15,
}

modelInstance, err := provider.Model(
    "openai",
    "deepseek-v4-flash",
    provider.WithAPIKey(c.apiKey),
    provider.WithEnableTokenTailoring(true),
    provider.WithTokenTailoringConfig(config),
)

Full code can be found in examples/provider.

Registering a Custom Provider

The framework supports registering custom providers to integrate other large model providers or custom model implementations.

Using provider.Register, you can define a method to create custom model instances based on the options.

1
2
3
4
5
6
7
import "trpc.group/trpc-go/trpc-agent-go/model/provider"

provider.Register("custom-provider", func(opts *provider.Options) (model.Model, error) {
    return newCustomModel(opts.ModelName, WithAPIKey(opts.APIKey)), nil
})

customModel, err := provider.Model("custom-provider", "custom-model")

Model Failover

model/failover provides a priority-ordered fallback wrapper for models. It can compose multiple model.Model instances into a primary/backup chain and automatically switch to the next candidate when the current one is unavailable. This is useful for scenarios such as primary/backup domains, gateways, or model providers.

Quick Start:

import (
    "trpc.group/trpc-go/trpc-agent-go/model/failover"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

primary := openai.New(
    "gpt-4o-mini",
    openai.WithBaseURL("https://api.openai.com/v1"),
)
backup := openai.New(
    "deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com/v1"),
)

llm, err := failover.New(
    failover.WithCandidates(primary, backup),
)
if err != nil {
    return err
}

failover.New(...) returns a regular model.Model, so it can be passed directly to places that accept model.Model, such as llmagent.WithModel(...). For a complete example, see examples/model/failover.

Switching Rules:

  • Candidates are tried in the order provided to WithCandidates(...).
  • Switching is allowed only before the first non-error chunk is received.
  • If the current candidate returns an error directly before that point, or returns an error response with Response.Error, the wrapper continues with the next candidate.
  • Once any non-error chunk has been returned to the caller, later streaming failures are surfaced directly instead of replaying the request on the backup candidate.

Model Hedge

model/hedge provides a wrapper for hedging the time to first meaningful response. It starts the primary candidate immediately, then launches later candidates according to the configured hedge delay. If all currently active candidates have already failed, it launches the next candidate early. The first candidate that produces a meaningful response becomes the winner, and the remaining candidates are canceled.

Quick Start:

import (
    "trpc.group/trpc-go/trpc-agent-go/model/hedge"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

primary := openai.New(
    "gpt-4o-mini",
    openai.WithBaseURL("https://api.openai.com/v1"),
)
backup := openai.New(
    "deepseek-v4-flash",
    openai.WithBaseURL("https://api.deepseek.com/v1"),
)

llm, err := hedge.New(
    hedge.WithName("chat-hedge"),
    hedge.WithCandidates(primary, backup),
)
if err != nil {
    return err
}

hedge.New(...) also returns a regular model.Model, so it can be passed directly to places that accept model.Model, such as llmagent.WithModel(...). This quick start uses the package default delay. Use WithDelay(...) or WithDelays(...) when you need explicit launch scheduling. If the hedge wrapper is used with context-threshold summary or token tailoring and the candidates have different or unknown context windows, set a stable wrapper window with hedge.WithContextWindow(...); otherwise the wrapper reports the shared candidate window only when all candidates expose the same positive value. For a complete example, see examples/model/hedge.

Scheduling And Commit Rules:

  • The first candidate always starts immediately when the request begins.
  • WithDelay(...) defines one fixed interval between successive hedge launches, while WithDelays(...) can specify explicit launch offsets for candidates 1..n.
  • If all currently active candidates have already failed and there are still candidates that have not started, the wrapper launches the next candidate immediately instead of waiting for the timer.
  • Only the first candidate that starts producing meaningful content becomes the winner. Empty or metadata-only responses do not win.
  • Once a winner has been committed, the other candidates are canceled and only the winner's stream is forwarded to the caller.

Configuration Examples:

The following snippets show only the scheduling differences.

1
2
3
4
llm, err := hedge.New(
    hedge.WithCandidates(primary, backupA, backupB),
    hedge.WithDelay(100*time.Millisecond),
)

This configuration means:

  • candidate[0] starts at 0ms.
  • candidate[1] starts at 100ms.
  • candidate[2] starts at 200ms.
1
2
3
4
llm, err := hedge.New(
    hedge.WithCandidates(primary, backupA, backupB),
    hedge.WithDelays(80*time.Millisecond, 250*time.Millisecond),
)

This configuration means:

  • candidate[0] starts at 0ms.
  • candidate[1] starts at 80ms.
  • candidate[2] starts at 250ms.

WithDelays(...) uses absolute launch offsets from request start, not incremental gaps relative to the previous candidate.

1
2
3
4
llm, err := hedge.New(
    hedge.WithCandidates(primary, backupA, backupB),
    hedge.WithDelays(0, 0),
)

This configuration means:

  • candidate[0] starts at 0ms.
  • candidate[1] starts at 0ms.
  • candidate[2] starts at 0ms.

This is the all-at-once case where every candidate launches immediately when the request begins. In fixed-interval form, the same setup can be written as WithDelay(0).

ModelSelector

ModelSelector dynamically selects a model for each framework-managed LLM call within the same runner.Run(...).

selectModel := func(ctx context.Context, inv *agent.Invocation) (model.Model, error) {
    if shouldUseAnotherModel(inv) {
        return anotherModel, nil
    }
    return nil, nil
}

events, err := r.Run(
    ctx,
    userID,
    sessionID,
    message,
    agent.WithModelSelector(selectModel),
)

If an LLMAgent has its own fixed model selection policy, configure it when creating the agent:

1
2
3
4
5
a := llmagent.New(
    "assistant",
    llmagent.WithModel(defaultModel),
    llmagent.WithModelSelector(selectModel),
)

Notes:

  • When both are configured, agent.WithModelSelector(...) takes precedence over llmagent.WithModelSelector(...).
  • If selector returns nil, nil, the model is not switched and the current inv.Model is kept; returning an error terminates the current LLM call.

For a complete example, see examples/model/selector.