The Model module is the large language model abstraction layer of the tRPC-Agent-Go framework. It provides a unified LLM interface that currently supports OpenAI-compatible and Anthropic-compatible API calls. Through this standardized interface, developers can flexibly switch between model providers and achieve seamless model integration and invocation. The module has been verified against most OpenAI-like interfaces, both inside and outside the company.
The Model module has the following core features:
Unified Interface Abstraction: Provides standardized Model interface, shielding differences between model providers
Streaming Response Support: Native support for streaming output, enabling real-time interactive experience
Multimodal Capabilities: Supports text, image, audio, and other multimodal content processing
Complete Error Handling: Provides dual-layer error handling mechanism, distinguishing between system errors and API errors
Extensible Configuration: Supports rich custom configuration options to meet different scenario requirements
# Basic usage: configure through environment variables, then run directly.
cd examples/runner
export OPENAI_BASE_URL="https://api.deepseek.com/v1"
export OPENAI_API_KEY="your-api-key"
go run main.go -model deepseek-chat
Platform Integration Configuration
All platform integration methods follow the same pattern, only requiring configuration of different environment variables or direct setting in code:
model:=openai.New("Model name",openai.WithBaseURL("Platform API address"),openai.WithAPIKey("API key"),)
Supported Platforms and Their Configuration
The following are configuration examples for each platform, divided into environment variable configuration and code configuration methods:
Environment Variable Configuration
The runner example supports specifying the model name through the command-line parameter (-model), which is simply passed as the model name when calling openai.New().
model:=openai.New("deepseek-chat",openai.WithBaseURL("https://api.deepseek.com/v1"),openai.WithAPIKey("your-api-key"),)// Other platform configurations are similar, only need to modify model name, BaseURL and APIKey, no additional fields needed.
// Model is the interface that all language models must implement.
type Model interface {
    // Generate content, supports streaming response.
    GenerateContent(ctx context.Context, request *Request) (<-chan *Response, error)
    // Return basic model information.
    Info() Info
}

// Model information structure.
type Info struct {
    Name string // Model name.
}
// Request represents the request sent to the model.
type Request struct {
    // Message list, containing system instructions, user input, and assistant replies.
    Messages []Message `json:"messages"`
    // Generation configuration (inlined into the request).
    GenerationConfig `json:",inline"`
    // Tool list.
    Tools map[string]tool.Tool `json:"-"`
}

// GenerationConfig contains generation parameter configuration.
type GenerationConfig struct {
    // Whether to use streaming response.
    Stream bool `json:"stream"`
    // Temperature parameter (0.0-2.0).
    Temperature *float64 `json:"temperature,omitempty"`
    // Maximum generation token count.
    MaxTokens *int `json:"max_tokens,omitempty"`
    // Top-P sampling parameter.
    TopP *float64 `json:"top_p,omitempty"`
    // Stop generation markers.
    Stop []string `json:"stop,omitempty"`
    // Frequency penalty.
    FrequencyPenalty *float64 `json:"frequency_penalty,omitempty"`
    // Presence penalty.
    PresencePenalty *float64 `json:"presence_penalty,omitempty"`
    // Reasoning effort level ("low", "medium", "high").
    ReasoningEffort *string `json:"reasoning_effort,omitempty"`
    // Whether to enable thinking mode.
    ThinkingEnabled *bool `json:"-"`
    // Maximum token count for thinking mode.
    ThinkingTokens *int `json:"-"`
}
// Response represents the response returned by the model.
type Response struct {
    // OpenAI-compatible fields.
    ID                string   `json:"id,omitempty"`
    Object            string   `json:"object,omitempty"`
    Created           int64    `json:"created,omitempty"`
    Model             string   `json:"model,omitempty"`
    SystemFingerprint *string  `json:"system_fingerprint,omitempty"`
    Choices           []Choice `json:"choices,omitempty"`
    Usage             *Usage   `json:"usage,omitempty"`
    // Error information.
    Error *ResponseError `json:"error,omitempty"`
    // Internal fields.
    Timestamp time.Time `json:"-"`
    Done      bool      `json:"-"`
    IsPartial bool      `json:"-"`
}

// ResponseError represents API-level errors.
type ResponseError struct {
    Message string    `json:"message"`
    Type    ErrorType `json:"type"`
    Param   string    `json:"param,omitempty"`
    Code    string    `json:"code,omitempty"`
}
OpenAI Model
Model Name Parameter
When creating an OpenAI model instance with openai.New(name string, opts ...Option), the first parameter is the model name sent to the OpenAI API; it is the model identifier that tells the API which language model to use.
Since the framework supports different models compatible with the OpenAI API, you can obtain the base URL, API key, and model name from various model providers:
1. OpenAI Official
Base URL: https://api.openai.com/v1
Model Names: gpt-4o, gpt-4o-mini, etc.
2. DeepSeek
Base URL: https://api.deepseek.com
Model Names: deepseek-chat, deepseek-reasoner
3. Tencent Hunyuan
Base URL: https://api.hunyuan.cloud.tencent.com/v1
Model Names: hunyuan-2.0-thinking-20251109, hunyuan-2.0-instruct-20251111, etc.
4. Other Providers
Qwen: Base URL https://dashscope.aliyuncs.com/compatible-mode/v1, Model Names: various qwen models
The OpenAI Model is used to interface with OpenAI and its compatible platforms. It supports streaming output, multimodal input, and advanced parameter configuration, provides rich callback, batch-processing, and retry capabilities, and allows flexible configuration of custom HTTP headers.
import"trpc.group/trpc-go/trpc-agent-go/model/openai"m:=openai.New("gpt-4o",openai.WithAPIKey("your-api-key"),openai.WithBaseURL("https://api.openai.com"),// Optional configuration, default is this BASE URL)
import("context""fmt""trpc.group/trpc-go/trpc-agent-go/model""trpc.group/trpc-go/trpc-agent-go/model/openai")funcmain(){// Create model instance.llm:=openai.New("deepseek-chat")// Build request.temperature:=0.7maxTokens:=1000request:=&model.Request{Messages:[]model.Message{model.NewSystemMessage("You are a professional AI assistant."),model.NewUserMessage("Introduce Go language's concurrency features."),},GenerationConfig:model.GenerationConfig{Temperature:&temperature,MaxTokens:&maxTokens,Stream:false,},}// Call model.ctx:=context.Background()responseChan,err:=llm.GenerateContent(ctx,request)iferr!=nil{fmt.Printf("System error: %v\n",err)return}// Handle response.forresponse:=rangeresponseChan{ifresponse.Error!=nil{fmt.Printf("API error: %s\n",response.Error.Message)return}iflen(response.Choices)>0{fmt.Printf("Reply: %s\n",response.Choices[0].Message.Content)}ifresponse.Done{break}}}
// Streaming request configuration.
request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a creative story teller."),
        model.NewUserMessage("Write a short story about a robot learning to paint."),
    },
    GenerationConfig: model.GenerationConfig{
        Stream: true, // Enable streaming output.
    },
}

// Handle streaming response.
responseChan, err := llm.GenerateContent(ctx, request)
if err != nil {
    return err
}
for response := range responseChan {
    if response.Error != nil {
        fmt.Printf("Error: %s", response.Error.Message)
        return
    }
    if len(response.Choices) > 0 && response.Choices[0].Delta.Content != "" {
        fmt.Print(response.Choices[0].Delta.Content)
    }
    if response.Done {
        break
    }
}
// Use advanced generation parameters.
temperature := 0.3
maxTokens := 2000
topP := 0.9
presencePenalty := 0.2
frequencyPenalty := 0.5
reasoningEffort := "high"

request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a professional technical documentation writer."),
        model.NewUserMessage("Explain the advantages and disadvantages of microservice architecture."),
    },
    GenerationConfig: model.GenerationConfig{
        Temperature:      &temperature,
        MaxTokens:        &maxTokens,
        TopP:             &topP,
        PresencePenalty:  &presencePenalty,
        FrequencyPenalty: &frequencyPenalty,
        ReasoningEffort:  &reasoningEffort,
        Stream:           true,
    },
}
// Read image file.
imageData, _ := os.ReadFile("image.jpg")

// Create multimodal message.
// stringPtr is a small helper that returns a pointer to the given string.
request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are an image analysis expert."),
        {
            Role: model.RoleUser,
            ContentParts: []model.ContentPart{
                {
                    Type: model.ContentTypeText,
                    Text: stringPtr("What's in this image?"),
                },
                {
                    Type: model.ContentTypeImage,
                    Image: &model.Image{
                        Data:   imageData,
                        Format: "jpeg",
                    },
                },
            },
        },
    },
}
// Set pre-request callback function.
model := openai.New("deepseek-chat",
    openai.WithChatRequestCallback(func(ctx context.Context, req *openai.ChatCompletionNewParams) {
        // Called before the request is sent.
        log.Printf("Sending request: model=%s, message count=%d", req.Model, len(req.Messages))
    }),
    // Set response callback function (non-streaming).
    openai.WithChatResponseCallback(func(ctx context.Context, req *openai.ChatCompletionNewParams, resp *openai.ChatCompletion) {
        // Called when the complete response is received.
        log.Printf("Received response: ID=%s, tokens used=%d", resp.ID, resp.Usage.TotalTokens)
    }),
    // Set streaming response callback function.
    openai.WithChatChunkCallback(func(ctx context.Context, req *openai.ChatCompletionNewParams, chunk *openai.ChatCompletionChunk) {
        // Called when each streaming response chunk is received.
        log.Printf("Received streaming chunk: ID=%s", chunk.ID)
    }),
    // Set streaming completion callback function.
    openai.WithChatStreamCompleteCallback(func(ctx context.Context, req *openai.ChatCompletionNewParams, acc *openai.ChatCompletionAccumulator, streamErr error) {
        // Called when streaming completely finishes (success or error).
        if streamErr != nil {
            log.Printf("Streaming failed: %v", streamErr)
        } else {
            log.Printf("Streaming completed: reason=%s", acc.Choices[0].FinishReason)
        }
    }),
)
2. Model Switching
Model switching allows dynamically changing the LLM model used by an Agent at runtime. The framework provides two approaches: agent-level switching (affects all subsequent requests) and per-request switching (affects only a single request).
Agent-level Switching
Agent-level switching changes the Agent's default model, affecting all subsequent requests.
Approach 1: Direct Model Instance
Set the model directly by passing a model instance to SetModel:
import("trpc.group/trpc-go/trpc-agent-go/agent/llmagent""trpc.group/trpc-go/trpc-agent-go/model/openai")// Create Agent.agent:=llmagent.New("my-agent",llmagent.WithModel(openai.New("gpt-4o-mini")),)// Switch to another model.agent.SetModel(openai.New("gpt-4o"))
// Select model based on task complexity.
if isComplexTask {
    agent.SetModel(openai.New("gpt-4o")) // Use powerful model.
} else {
    agent.SetModel(openai.New("gpt-4o-mini")) // Use fast model.
}
Approach 2: Switch by Name
Pre-register multiple models with WithModels, then switch by name using SetModelByName:
import("trpc.group/trpc-go/trpc-agent-go/agent/llmagent""trpc.group/trpc-go/trpc-agent-go/model""trpc.group/trpc-go/trpc-agent-go/model/openai")// Create multiple model instances.gpt4:=openai.New("gpt-4o")gpt4mini:=openai.New("gpt-4o-mini")deepseek:=openai.New("deepseek-chat")// Register all models when creating the Agent.agent:=llmagent.New("my-agent",llmagent.WithModels(map[string]model.Model{"smart":gpt4,"fast":gpt4mini,"cheap":deepseek,}),llmagent.WithModel(gpt4mini),// Specify initial model.llmagent.WithInstruction("You are an intelligent assistant."),)// Switch models by name at runtime.err:=agent.SetModelByName("smart")iferr!=nil{log.Fatal(err)}// Switch to another model.err=agent.SetModelByName("cheap")iferr!=nil{log.Fatal(err)}
// Select model based on user tier.
modelName := "fast" // Default to fast model.
if user.IsPremium() {
    modelName = "smart" // Premium users get the advanced model.
}
if err := agent.SetModelByName(modelName); err != nil {
    log.Printf("Failed to switch model: %v", err)
}

// Select model based on time of day (cost optimization).
hour := time.Now().Hour()
if hour >= 22 || hour < 8 {
    // Use the cheap model at night.
    agent.SetModelByName("cheap")
} else {
    // Use the fast model during the day.
    agent.SetModelByName("fast")
}
Per-request Switching
Per-request switching allows temporarily specifying a model for a single request without affecting the Agent's default model or other requests. This is useful for scenarios where different models are needed for specific tasks.
Approach 1: Using WithModel Option
Use agent.WithModel to specify a model instance for a single request:
import("trpc.group/trpc-go/trpc-agent-go/agent""trpc.group/trpc-go/trpc-agent-go/model/openai")// Use a specific model for this request only.eventChan,err:=runner.Run(ctx,userID,sessionID,message,agent.WithModel(openai.New("gpt-4o")),)
Approach 2: Using WithModelName Option (Recommended)
Use agent.WithModelName to specify a pre-registered model name for a single request:
// Pre-register multiple models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": openai.New("gpt-4o"),
        "fast":  openai.New("gpt-4o-mini"),
        "cheap": openai.New("deepseek-chat"),
    }),
    llmagent.WithModel(openai.New("gpt-4o-mini")), // Default model.
)

runner := runner.NewRunner("app", agent)

// Temporarily use the "smart" model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModelName("smart"),
)

// The next request still uses the default model "gpt-4o-mini".
eventChan2, err := runner.Run(ctx, userID, sessionID, message2)
// Dynamically select model based on message complexity.
var opts []agent.RunOption
if isComplexQuery(message) {
    opts = append(opts, agent.WithModelName("smart")) // Use powerful model for complex queries.
}
eventChan, err := runner.Run(ctx, userID, sessionID, message, opts...)

// Use a specialized reasoning model for reasoning tasks (the name must be pre-registered via WithModels).
eventChan, err = runner.Run(ctx, userID, sessionID, reasoningMessage,
    agent.WithModelName("deepseek-reasoner"),
)
Configuration Details
WithModels Option:
Accepts a map[string]model.Model where key is the model name and value is the model instance
If both WithModel and WithModels are set, WithModel specifies the initial model
If only WithModels is set, the first model in the map will be used as the initial model (note: map iteration order is not guaranteed, so it's recommended to explicitly specify the initial model)
Reserved name: __default__ is used internally by the framework and should not be used
SetModelByName Method:
Parameter: model name (string)
Returns: error if the model name is not found
The model must be pre-registered via WithModels
Per-request Options:
agent.RunOptions.Model: Directly specify a model instance
agent.RunOptions.ModelName: Specify a pre-registered model name
agent.RunOptions.Stream: Override whether responses are streamed (use agent.WithStream(...))
agent.RunOptions.Instruction: Override instruction for this request only (use agent.WithInstruction(...))
agent.RunOptions.GlobalInstruction: Override global instruction (system prompt) for this request only (use agent.WithGlobalInstruction(...))
Priority: Model > ModelName > Agent default model
If the model specified by ModelName is not found, it falls back to the Agent's default model
You can set streaming per request using agent.WithStream(true) or agent.WithStream(false); a combined sketch follows below.
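A minimal sketch that combines these per-request options (the option names come from the list above; the exact argument of agent.WithInstruction is assumed to be a plain string, and the "smart" name is assumed to be pre-registered as in the earlier examples):

// Combine per-request overrides on a single Run call.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModelName("smart"),                    // Use the pre-registered "smart" model for this request.
    agent.WithStream(false),                         // Disable streaming for this request only.
    agent.WithInstruction("Answer in one sentence."), // Override the instruction for this request only.
)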
Agent-level vs Per-request Comparison
| Feature | Agent-level Switching | Per-request Switching |
| --- | --- | --- |
| Scope | All subsequent requests | Current request only |
| Usage | SetModel / SetModelByName | RunOptions.Model / ModelName |
| State Change | Changes Agent default model | Does not change Agent state |
| Use Case | Global strategy adjustment | Temporary needs of specific tasks |
| Concurrency | Affects all concurrent requests | Does not affect other requests |
| Typical Examples | User tier, time-based policy | Complex queries, reasoning |
Agent-level Approach Comparison
| Feature | SetModel | SetModelByName |
| --- | --- | --- |
| Usage | Pass model instance | Pass model name |
| Pre-registration | Not required | Required via WithModels |
| Error Handling | None | Returns error |
| Use Case | Simple switching | Complex scenarios, multi-model management |
| Code Maintenance | Need to hold model instances | Only need to remember names |
Important Notes
Agent-level Switching:
Immediate Effect: After calling SetModel or SetModelByName, the next request immediately uses the new model
Session Persistence: Switching models does not clear session history
Independent Configuration: Each model retains its own configuration (temperature, max tokens, etc.)
Concurrency Safe: Both switching approaches are concurrency-safe
Per-request Switching:
Temporary Override: Only affects the current request, does not change the Agent's default model
Higher Priority: Per-request model settings take precedence over the Agent's default model
No Side Effects: Does not affect other concurrent requests or subsequent requests
Flexible Combination: Can be used in combination with agent-level switching
Model-specific Prompts (LLMAgent):
Use llmagent.WithModelInstructions / llmagent.WithModelGlobalInstructions to override prompts by model.Info().Name when the Agent switches models; it falls back to the Agent defaults when no mapping exists.
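A hedged sketch of per-model prompt overrides; the map[string]string parameter type is an assumption, and the keys follow model.Info().Name as described above:

agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": openai.New("gpt-4o"),
        "fast":  openai.New("gpt-4o-mini"),
    }),
    llmagent.WithModel(openai.New("gpt-4o-mini")),
    llmagent.WithInstruction("You are an intelligent assistant."), // Fallback when no mapping exists.
    // Assumed shape: keys are model.Info().Name values of the registered models.
    llmagent.WithModelInstructions(map[string]string{
        "gpt-4o":      "You are a meticulous senior analyst.",
        "gpt-4o-mini": "You are a concise assistant.",
    }),
)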
For a complete interactive example, see examples/model/switch, which demonstrates both agent-level and per-request switching approaches.
3. Batch Processing (Batch API)
Batch API is an asynchronous batch processing technique for efficiently handling large volumes of requests. This feature is particularly suitable for scenarios requiring large-scale data processing, significantly reducing costs and improving processing efficiency.
Core Features
Asynchronous Processing: Batch requests are processed asynchronously without waiting for immediate responses
Cost Optimization: Typically more cost-effective than individual requests
Flexible Input: Supports both inline requests and file-based input
Complete Management: Provides full operations including create, retrieve, cancel, and list
Result Parsing: Automatically downloads and parses batch processing results
import (
    openaisdk "github.com/openai/openai-go"

    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

// Create model instance.
llm := openai.New("gpt-4o-mini")

// Prepare batch requests.
requests := []*openai.BatchRequestInput{
    {
        CustomID: "request-1",
        Method:   "POST",
        URL:      string(openaisdk.BatchNewParamsEndpointV1ChatCompletions),
        Body: openai.BatchRequest{
            Messages: []model.Message{
                model.NewSystemMessage("You are a helpful assistant."),
                model.NewUserMessage("Hello"),
            },
        },
    },
    {
        CustomID: "request-2",
        Method:   "POST",
        URL:      string(openaisdk.BatchNewParamsEndpointV1ChatCompletions),
        Body: openai.BatchRequest{
            Messages: []model.Message{
                model.NewSystemMessage("You are a helpful assistant."),
                model.NewUserMessage("Introduce Go language"),
            },
        },
    },
}

// Create batch job.
batch, err := llm.CreateBatch(ctx, requests,
    openai.WithBatchCreateCompletionWindow("24h"),
)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Batch job created: %s\n", batch.ID)
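Batch jobs complete asynchronously, so the job is typically polled until it reaches a terminal state before results are downloaded. The retrieval method name below (RetrieveBatch) and the status values are assumptions for illustration only; check the openai package for the exact API.

// Poll the batch job until it reaches a terminal status.
// NOTE: RetrieveBatch is a hypothetical method name used for illustration.
for {
    batch, err = llm.RetrieveBatch(ctx, batch.ID)
    if err != nil {
        log.Fatal(err)
    }
    if batch.Status == "completed" || batch.Status == "failed" || batch.Status == "expired" {
        break
    }
    time.Sleep(30 * time.Second) // Poll interval; requires the "time" import.
}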
// Download output file.
if batch.OutputFileID != "" {
    text, err := llm.DownloadFileContent(ctx, batch.OutputFileID)
    if err != nil {
        log.Fatal(err)
    }

    // Parse batch output.
    entries, err := llm.ParseBatchOutput(text)
    if err != nil {
        log.Fatal(err)
    }

    // Process each result.
    for _, entry := range entries {
        fmt.Printf("[%s] Status code: %d\n", entry.CustomID, entry.Response.StatusCode)
        if len(entry.Response.Body.Choices) > 0 {
            content := entry.Response.Body.Choices[0].Message.Content
            fmt.Printf("Content: %s\n", content)
        }
        if entry.Error != nil {
            fmt.Printf("Error: %s\n", entry.Error.Message)
        }
    }
}
// List batch jobs (with pagination support).
page, err := llm.ListBatches(ctx, "", 10)
if err != nil {
    log.Fatal(err)
}
for _, batch := range page.Data {
    fmt.Printf("ID: %s, Status: %s\n", batch.ID, batch.Status)
}
4. Retry Mechanism
The retry mechanism is an automatic error recovery technique that retries failed requests. It is provided by the underlying OpenAI SDK; the framework passes retry parameters to the SDK through configuration options.
Timeouts and deadlines
Request lifecycle is bounded by two independent limits:
The caller context deadline (for example, Runner max duration, or context.WithTimeout).
The OpenAI request timeout configured by openaiopt.WithRequestTimeout.
github.com/openai/openai-go does not hard-code a timeout by default. If you observe timeouts in logs, they typically come from an upstream deadline (gateway or caller context) or from your own WithRequestTimeout configuration.
If you expect long-running calls (streaming, large prompts, tools, or reasoning models), configure WithRequestTimeout to match your service deadline and service level objective (SLO).
import("time"openaiopt"github.com/openai/openai-go/option""trpc.group/trpc-go/trpc-agent-go/model/openai")// Create model instance with retry configuration.llm:=openai.New("gpt-4o-mini",openai.WithOpenAIOptions(openaiopt.WithMaxRetries(3),openaiopt.WithRequestTimeout(30*time.Second),),)
Retryable Errors
The OpenAI SDK automatically retries the following errors:
408 Request Timeout: Request timeout
409 Conflict: Conflict error
429 Too Many Requests: Rate limiting
500+ Server Errors: Internal server errors (5xx)
Network Connection Errors: No response or connection failure
// Standard configuration suitable for most scenarios.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(3),
        openaiopt.WithRequestTimeout(30*time.Second),
    ),
)
// For scenarios requiring quick failure.
llm := openai.New("gpt-4o-mini",
    openai.WithOpenAIOptions(
        openaiopt.WithMaxRetries(1),                  // Minimal retries.
        openaiopt.WithRequestTimeout(10*time.Second), // Short timeout.
    ),
)
1. Send request to LLM API
2. If request fails and error is retryable:
a. Check if maximum retry count is reached
b. Calculate wait time based on Retry-After header or exponential backoff
c. Wait and resend request
3. If request succeeds or error is not retryable, return result
Key design:
SDK-level Implementation: Retry logic is completely handled by OpenAI SDK
Configuration Pass-through: Framework passes configuration via WithOpenAIOptions
Smart Backoff: Prioritizes using Retry-After header returned by API
Transparent Handling: Transparent to application layer, no additional code needed
Use Cases
Production Environment: Improve service reliability and fault tolerance
5. Custom HTTP Headers
In some enterprise or proxy scenarios, the model provider requires additional HTTP headers (for example, an organization ID, tenant routing, or custom authentication). The Model module supports setting headers in three reliable ways that apply to all model requests, including non-streaming, streaming, file upload, and batch APIs.
Recommended order:
Global headers via openai.WithHeaders (simplest for static headers)
Global headers via OpenAI RequestOption (flexible, middleware-friendly): use WithOpenAIOptions with openaiopt.WithHeader or openaiopt.WithMiddleware to inject headers into every request created by the underlying OpenAI client
A custom http.RoundTripper injection (advanced, for transport-level cross-cutting capabilities)
import (
    openaiopt "github.com/openai/openai-go/option"

    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

llm := openai.New("deepseek-chat",
    // If your provider needs extra headers.
    openai.WithOpenAIOptions(
        openaiopt.WithHeader("X-Custom-Header", "custom-value"),
        openaiopt.WithHeader("X-Request-ID", "req-123"),
        // You can also set User-Agent or vendor-specific headers.
        openaiopt.WithHeader("User-Agent", "trpc-agent-go/1.0"),
    ),
)
For complex logic, middleware lets you modify headers conditionally
(for example, by URL path or context values):
llm := openai.New("deepseek-chat",
    openai.WithOpenAIOptions(
        openaiopt.WithMiddleware(func(r *http.Request, next openaiopt.MiddlewareNext) (*http.Response, error) {
            // Example: per-request header via a context value.
            if v := r.Context().Value("x-request-id"); v != nil {
                if s, ok := v.(string); ok && s != "" {
                    r.Header.Set("X-Request-ID", s)
                }
            }
            // Or only for the chat completion endpoint.
            if strings.Contains(r.URL.Path, "/chat/completions") {
                r.Header.Set("X-Feature-Flag", "on")
            }
            return next(r)
        }),
    ),
)
Notes for authentication variants:
OpenAI style: keep openai.WithAPIKey("sk-..."), which sets Authorization: Bearer ... under the hood.
Azure or other OpenAI-compatible endpoints that use api-key: omit WithAPIKey and set openaiopt.WithHeader("api-key", "<key>") instead, as sketched below.
Logging raw HTTP request and response
You can use openaiopt.WithMiddleware to log the underlying HTTP request and
response. Be careful about secrets (API keys, Authorization headers) and body
consumption.
Key points:
Reading req.Body or resp.Body consumes the stream, so you must restore it.
Do not read resp.Body for streaming responses (for example,
Content-Type: text/event-stream); skip body logging to avoid blocking
or breaking the stream.
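A minimal logging middleware sketch following these rules; the redaction policy and the event-stream check are illustrative choices rather than framework requirements, and the snippet assumes the bytes, io, log, net/http, and strings imports:

llm := openai.New("deepseek-chat",
    openai.WithOpenAIOptions(
        openaiopt.WithMiddleware(func(r *http.Request, next openaiopt.MiddlewareNext) (*http.Response, error) {
            // Log the request line only; never log the Authorization header or API keys.
            log.Printf("request: %s %s", r.Method, r.URL.Path)
            resp, err := next(r)
            if err != nil {
                log.Printf("request failed: %v", err)
                return resp, err
            }
            // Skip body logging for streaming responses to avoid blocking or breaking the stream.
            if strings.Contains(resp.Header.Get("Content-Type"), "text/event-stream") {
                return resp, nil
            }
            // Read the body for logging, then restore it so downstream decoding still works.
            body, readErr := io.ReadAll(resp.Body)
            if readErr == nil {
                resp.Body.Close()
                resp.Body = io.NopCloser(bytes.NewReader(body))
                log.Printf("response %d: %s", resp.StatusCode, string(body))
            }
            return resp, nil
        }),
    ),
)

The third approach injects headers through a custom http.RoundTripper at the transport layer: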
type headerRoundTripper struct {
    base http.RoundTripper
}

func (rt headerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
    // Add or override headers.
    req.Header.Set("X-Custom-Header", "custom-value")
    req.Header.Set("X-Trace-ID", "trace-xyz")
    return rt.base.RoundTrip(req)
}

llm := openai.New("deepseek-chat",
    openai.WithHTTPClientOptions(
        openai.WithHTTPClientTransport(headerRoundTripper{base: http.DefaultTransport}),
    ),
)
Per-request headers
Agent/Runner passes ctx through to the model call; middleware can
read values from req.Context() to inject per-invocation headers.
Chat completion per-request base URL override is not exposed; create a
second model with a different base URL or alter r.URL in middleware.
6. Token Tailoring
Token Tailoring is an intelligent message management technique designed to automatically trim messages when they exceed the model's context window limits, ensuring requests can be successfully sent to the LLM API. This feature is particularly useful for long conversation scenarios, allowing you to keep the message list within the model's token limits while preserving key context.
type CustomStrategy struct{}

func (s *CustomStrategy) Tailor(
    ctx context.Context,
    messages []model.Message,
    maxTokens int,
    counter tokencounter.Counter,
) ([]model.Message, error) {
    // Implement custom tailoring logic,
    // e.g., keep only the most recent N conversation rounds.
    return messages, nil
}

model := openai.New("deepseek-chat",
    openai.WithEnableTokenTailoring(true),
    openai.WithTailoringStrategy(&CustomStrategy{}),
)
If the default token allocation strategy does not meet your needs, you can customize the budget parameters using WithTokenTailoringConfig. Note: It is recommended to keep the default values unless you have specific requirements.
model := anthropic.New("claude-sonnet-4-0",
    anthropic.WithEnableTokenTailoring(true),
    anthropic.WithTokenTailoringConfig(&model.TokenTailoringConfig{
        SafetyMarginRatio: 0.15, // Increase safety margin to 15%.
    }),
)
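A parallel sketch for the OpenAI model; it assumes the openai package exposes the same WithTokenTailoringConfig option, with the budget fields matching the provider example at the end of this document:

llm := openai.New("deepseek-chat",
    openai.WithEnableTokenTailoring(true),
    openai.WithTokenTailoringConfig(&model.TokenTailoringConfig{
        ProtocolOverheadTokens: 1024, // Tokens reserved for protocol overhead.
        ReserveOutputTokens:    4096, // Tokens reserved for model output.
        SafetyMarginRatio:      0.15, // Safety margin ratio.
    }),
)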
7. Variant Optimization: Adapting to Platform-Specific Behaviors
The Variant mechanism is an important optimization in the Model module, used to handle platform-specific behavioral differences across OpenAI-compatible providers. By specifying different Variants, the framework can automatically adapt to API differences between platforms, especially for file upload, deletion, and processing logic.
7.1. Supported Variant Types
The framework currently supports the following Variants:
1. VariantOpenAI (default)
Standard OpenAI API-compatible behavior
File upload path: /openapi/v1/files
File purpose: user_data
File deletion HTTP method: DELETE
2. VariantHunyuan (hunyuan)
Tencent Hunyuan platform-specific adaptation
File upload path: /openapi/v1/files/uploads
File purpose: file-extract
File deletion HTTP method: POST
3. VariantDeepSeek
DeepSeek platform adaptation
Default BaseURL: https://api.deepseek.com
API Key environment variable name: DEEPSEEK_API_KEY
Other behaviors are consistent with standard OpenAI
import"trpc.group/trpc-go/trpc-agent-go/model/openai"// Use the Hunyuan platformmodel:=openai.New("hunyuan-model",openai.WithBaseURL("https://your-hunyuan-api.com"),openai.WithAPIKey("your-api-key"),openai.WithVariant(openai.VariantHunyuan),// Specify the Hunyuan variant)// Use the DeepSeek platformmodel:=openai.New("deepseek-chat",openai.WithBaseURL("https://api.deepseek.com/v1"),openai.WithAPIKey("your-api-key"),openai.WithVariant(openai.VariantDeepSeek),// Specify the DeepSeek variant)
import"trpc.group/trpc-go/trpc-agent-go/model"// For the Hunyuan platform, the file ID is placed in extraFields instead of content partsmessage:=model.Message{Role:model.RoleUser,ContentParts:[]model.ContentPart{{Type:model.ContentTypeFile,File:&model.File{FileID:"file_123",},},},}
Environment variable auto-configuration
For certain Variants, the framework supports reading configuration from environment variables automatically:
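For example, the DeepSeek variant documents a default BaseURL and the DEEPSEEK_API_KEY environment variable (see the variant list above). The sketch below assumes that, when WithBaseURL and WithAPIKey are omitted, the framework falls back to those values automatically:

// export DEEPSEEK_API_KEY="your-api-key"
model := openai.New("deepseek-chat",
    openai.WithVariant(openai.VariantDeepSeek),
    // BaseURL and API key omitted: the DeepSeek variant is assumed to fall back to
    // its default BaseURL (https://api.deepseek.com) and the DEEPSEEK_API_KEY environment variable.
)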
Streaming Tool Call Deltas
By default, the OpenAI adapter suppresses raw tool_calls chunks in streaming responses. Tool calls are accumulated internally and only exposed once in the final aggregated response via Response.Choices[0].Message.ToolCalls. This keeps the stream clean for typical chat UIs that only render assistant text.
For advanced use cases (for example, when the model streams document content
inside tool arguments and you need to display it incrementally), you can turn
on raw tool call deltas with WithShowToolCallDelta:
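A minimal sketch; only the option name WithShowToolCallDelta comes from the framework, and the boolean parameter is an assumption:

llm := openai.New("deepseek-chat",
    openai.WithShowToolCallDelta(true), // Expose raw tool_calls deltas on streaming responses.
)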
With this flag enabled:
Streaming chunks that contain tool_calls are no longer suppressed by the adapter.
Each chunk is converted into a partial model.Response where:
Response.IsPartial == true
Response.Choices[0].Delta.ToolCalls contains the provider's raw tool_calls delta mapped to model.ToolCall:
Type comes from the provider type field (for example, "function").
Function.Name and Function.Arguments mirror the original tool name and JSON-encoded arguments string.
ID and Index preserve the tool call identity so callers can stitch fragments together.
The final aggregated response still exposes the merged tool calls in Response.Choices[0].Message.ToolCalls, so existing tool execution logic (for example, FunctionCallResponseProcessor) continues to work unchanged.
Typical integration pattern when this flag is enabled:
Read Response.Choices[0].Delta.ToolCalls[*].Function.Arguments on each partial response.
Group chunks by tool call ID and append the Arguments fragments in order.
Once the accumulated string forms valid JSON, unmarshal it into your business struct (for example, { "content": "..." }) and use it for progressive UI rendering.
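A sketch of that accumulation loop; the field types, the renderProgress hook, and the JSON shape are illustrative assumptions:

// Accumulate tool call argument fragments, keyed by tool call ID.
builders := make(map[string]*strings.Builder)
for response := range responseChan {
    if !response.IsPartial || len(response.Choices) == 0 {
        continue
    }
    for _, tc := range response.Choices[0].Delta.ToolCalls {
        b, ok := builders[tc.ID]
        if !ok {
            b = &strings.Builder{}
            builders[tc.ID] = b
        }
        b.WriteString(string(tc.Function.Arguments))
        // Once the accumulated fragments form valid JSON, use them for progressive rendering.
        var doc struct {
            Content string `json:"content"` // Illustrative shape.
        }
        if json.Unmarshal([]byte(b.String()), &doc) == nil {
            renderProgress(doc.Content) // Hypothetical UI hook.
        }
    }
}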
If you do not need to inspect tool arguments during streaming, keep WithShowToolCallDelta disabled to avoid handling partial JSON fragments and to preserve the default clean text-streaming behavior.
Anthropic Model
The Anthropic Model is used to interface with Claude models and compatible platforms. It supports streaming output, thinking mode, and tool calls, provides a rich callback mechanism, and allows flexible configuration of custom HTTP headers.
import"trpc.group/trpc-go/trpc-agent-go/model/anthropic"m:=anthropic.New("claude-sonnet-4-0",anthropic.WithAPIKey("your-api-key"),anthropic.WithBaseURL("https://api.anthropic.com"),// Optional configuration, default is this BASE URL)
import("trpc.group/trpc-go/trpc-agent-go/model""trpc.group/trpc-go/trpc-agent-go/model/anthropic")funcmain(){// Create model instancellm:=anthropic.New("claude-sonnet-4-0")// Build requesttemperature:=0.7maxTokens:=1000request:=&model.Request{Messages:[]model.Message{model.NewSystemMessage("You are a professional AI assistant."),model.NewUserMessage("Introduce the concurrency features of Go language."),},GenerationConfig:model.GenerationConfig{Temperature:&temperature,MaxTokens:&maxTokens,Stream:false,},}// Call the modelctx:=context.Background()responseChan,err:=llm.GenerateContent(ctx,request)iferr!=nil{fmt.Printf("System error: %v\n",err)return}// Handle responseforresponse:=rangeresponseChan{ifresponse.Error!=nil{fmt.Printf("API error: %s\n",response.Error.Message)return}iflen(response.Choices)>0{fmt.Printf("Reply: %s\n",response.Choices[0].Message.Content)}ifresponse.Done{break}}}
import("trpc.group/trpc-go/trpc-agent-go/model""trpc.group/trpc-go/trpc-agent-go/model/anthropic")funcmain(){// Create model instancellm:=anthropic.New("claude-sonnet-4-0")// Streaming request configurationtemperature:=0.7maxTokens:=1000request:=&model.Request{Messages:[]model.Message{model.NewSystemMessage("You are a creative story storyteller."),model.NewUserMessage("Write a short story about a robot learning to paint."),},GenerationConfig:model.GenerationConfig{Temperature:&temperature,MaxTokens:&maxTokens,Stream:true,},}// Call the modelctx:=context.Background()// Handle streaming responseresponseChan,err:=llm.GenerateContent(ctx,request)iferr!=nil{fmt.Printf("System error: %v\n",err)return}forresponse:=rangeresponseChan{ifresponse.Error!=nil{fmt.Printf("Error: %s",response.Error.Message)return}iflen(response.Choices)>0&&response.Choices[0].Delta.Content!=""{fmt.Print(response.Choices[0].Delta.Content)}ifresponse.Done{break}}}
// Using advanced generation parameters.
temperature := 0.3
maxTokens := 2000
topP := 0.9
thinking := true
thinkingTokens := 2048

request := &model.Request{
    Messages: []model.Message{
        model.NewSystemMessage("You are a professional technical documentation writer."),
        model.NewUserMessage("Explain the pros and cons of microservices architecture."),
    },
    GenerationConfig: model.GenerationConfig{
        Temperature:     &temperature,
        MaxTokens:       &maxTokens,
        TopP:            &topP,
        ThinkingEnabled: &thinking,
        ThinkingTokens:  &thinkingTokens,
        Stream:          true,
    },
}
import (
    anthropicsdk "github.com/anthropics/anthropic-sdk-go"

    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

model := anthropic.New("claude-sonnet-4-0",
    anthropic.WithChatRequestCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams) {
        // Log the request before sending.
        log.Printf("sending request: model=%s, messages=%d.", req.Model, len(req.Messages))
    }),
    anthropic.WithChatResponseCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, resp *anthropicsdk.Message) {
        // Log details of the non-streaming response.
        log.Printf("received response: id=%s, input_tokens=%d, output_tokens=%d.",
            resp.ID, resp.Usage.InputTokens, resp.Usage.OutputTokens)
    }),
    anthropic.WithChatChunkCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, chunk *anthropicsdk.MessageStreamEventUnion) {
        // Log the type of the streaming event.
        log.Printf("stream event: %T.", chunk.AsAny())
    }),
    anthropic.WithChatStreamCompleteCallback(func(ctx context.Context, req *anthropicsdk.MessageNewParams, acc *anthropicsdk.Message, streamErr error) {
        // Log stream completion or error.
        if streamErr != nil {
            log.Printf("stream failed: %v.", streamErr)
            return
        }
        log.Printf("stream completed: finish_reason=%s, input_tokens=%d, output_tokens=%d.",
            acc.StopReason, acc.Usage.InputTokens, acc.Usage.OutputTokens)
    }),
)
2. Model Switching
Model switching allows dynamically changing the LLM model used by an Agent at runtime. The framework provides two approaches: agent-level switching (affects all subsequent requests) and per-request switching (affects only a single request).
Agent-level Switching
Agent-level switching changes the Agent's default model, affecting all subsequent requests.
Approach 1: Direct Model Instance
Set the model directly by passing a model instance to SetModel:
import("trpc.group/trpc-go/trpc-agent-go/agent/llmagent""trpc.group/trpc-go/trpc-agent-go/model/anthropic")// Create Agent.agent:=llmagent.New("my-agent",llmagent.WithModel(anthropic.New("claude-3-5-haiku-20241022")),)// Switch to another model.agent.SetModel(anthropic.New("claude-3-5-sonnet-20241022"))
// Select model based on task complexity.
if isComplexTask {
    agent.SetModel(anthropic.New("claude-3-5-sonnet-20241022")) // Use powerful model.
} else {
    agent.SetModel(anthropic.New("claude-3-5-haiku-20241022")) // Use fast model.
}
Approach 2: Switch by Name
Pre-register multiple models with WithModels, then switch by name using SetModelByName:
import("trpc.group/trpc-go/trpc-agent-go/agent/llmagent""trpc.group/trpc-go/trpc-agent-go/model""trpc.group/trpc-go/trpc-agent-go/model/anthropic")// Create multiple model instances.sonnet:=anthropic.New("claude-3-5-sonnet-20241022")haiku:=anthropic.New("claude-3-5-haiku-20241022")// Register all models when creating the Agent.agent:=llmagent.New("my-agent",llmagent.WithModels(map[string]model.Model{"smart":sonnet,"fast":haiku,}),llmagent.WithModel(haiku),// Specify initial model.llmagent.WithInstruction("You are an intelligent assistant."),)// Switch models by name at runtime.err:=agent.SetModelByName("smart")iferr!=nil{log.Fatal(err)}// Switch to another model.err=agent.SetModelByName("fast")iferr!=nil{log.Fatal(err)}
// Select model based on user tier.
modelName := "fast" // Default to fast model.
if user.IsPremium() {
    modelName = "smart" // Premium users get the advanced model.
}
if err := agent.SetModelByName(modelName); err != nil {
    log.Printf("Failed to switch model: %v", err)
}

// Select model based on time of day (cost optimization).
hour := time.Now().Hour()
if hour >= 22 || hour < 8 {
    // Use the fast model at night.
    agent.SetModelByName("fast")
} else {
    // Use the smart model during the day.
    agent.SetModelByName("smart")
}
Per-request Switching
Per-request switching allows temporarily specifying a model for a single request without affecting the Agent's default model or other requests. This is useful for scenarios where different models are needed for specific tasks.
Approach 1: Using WithModel Option
Use agent.WithModel to specify a model instance for a single request:
import("trpc.group/trpc-go/trpc-agent-go/agent""trpc.group/trpc-go/trpc-agent-go/model/anthropic")// Use a specific model for this request only.eventChan,err:=runner.Run(ctx,userID,sessionID,message,agent.WithModel(anthropic.New("claude-3-5-sonnet-20241022")),)
Approach 2: Using WithModelName Option (Recommended)
Use agent.WithModelName to specify a pre-registered model name for a single request:
// Pre-register multiple models when creating the Agent.
agent := llmagent.New("my-agent",
    llmagent.WithModels(map[string]model.Model{
        "smart": anthropic.New("claude-3-5-sonnet-20241022"),
        "fast":  anthropic.New("claude-3-5-haiku-20241022"),
    }),
    llmagent.WithModel(anthropic.New("claude-3-5-haiku-20241022")), // Default model.
)

runner := runner.NewRunner("app", agent)

// Temporarily use the "smart" model for this request only.
eventChan, err := runner.Run(ctx, userID, sessionID, message,
    agent.WithModelName("smart"),
)

// The next request still uses the default model "claude-3-5-haiku-20241022".
eventChan2, err := runner.Run(ctx, userID, sessionID, message2)
// Dynamically select model based on message complexity.
var opts []agent.RunOption
if isComplexQuery(message) {
    opts = append(opts, agent.WithModelName("smart")) // Use powerful model for complex queries.
}
eventChan, err := runner.Run(ctx, userID, sessionID, message, opts...)

// Use a specialized model for specific tasks (the name must be pre-registered via WithModels).
eventChan, err = runner.Run(ctx, userID, sessionID, visionMessage,
    agent.WithModelName("vision"),
)
Configuration Details
WithModels Option:
Accepts a map[string]model.Model where key is the model name and value is the model instance
If both WithModel and WithModels are set, WithModel specifies the initial model
If only WithModels is set, the first model in the map will be used as the initial model (note: map iteration order is not guaranteed, so it's recommended to explicitly specify the initial model)
Reserved name: __default__ is used internally by the framework and should not be used
SetModelByName Method:
Parameter: model name (string)
Returns: error if the model name is not found
The model must be pre-registered via WithModels
Per-request Options:
agent.RunOptions.Model: Directly specify a model instance
agent.RunOptions.ModelName: Specify a pre-registered model name
Priority: Model > ModelName > Agent default model
If the model specified by ModelName is not found, it falls back to the Agent's default model
Agent-level vs Per-request Comparison
| Feature | Agent-level Switching | Per-request Switching |
| --- | --- | --- |
| Scope | All subsequent requests | Current request only |
| Usage | SetModel / SetModelByName | RunOptions.Model / ModelName |
| State Change | Changes Agent default model | Does not change Agent state |
| Use Case | Global strategy adjustment | Temporary needs of specific tasks |
| Concurrency | Affects all concurrent requests | Does not affect other requests |
| Typical Examples | User tier, time-based policy | Complex queries, reasoning |
Agent-level Approach Comparison
| Feature | SetModel | SetModelByName |
| --- | --- | --- |
| Usage | Pass model instance | Pass model name |
| Pre-registration | Not required | Required via WithModels |
| Error Handling | None | Returns error |
| Use Case | Simple switching | Complex scenarios, multi-model management |
| Code Maintenance | Need to hold model instances | Only need to remember names |
Important Notes
Agent-level Switching:
Immediate Effect: After calling SetModel or SetModelByName, the next request immediately uses the new model
Session Persistence: Switching models does not clear session history
Independent Configuration: Each model retains its own configuration (temperature, max tokens, etc.)
Concurrency Safe: Both switching approaches are concurrency-safe
Per-request Switching:
Temporary Override: Only affects the current request, does not change the Agent's default model
Higher Priority: Per-request model settings take precedence over the Agent's default model
No Side Effects: Does not affect other concurrent requests or subsequent requests
Flexible Combination: Can be used in combination with agent-level switching
Model-specific Prompts (LLMAgent):
Use llmagent.WithModelInstructions / llmagent.WithModelGlobalInstructions to override prompts by model.Info().Name when the Agent switches models; it falls back to the Agent defaults when no mapping exists.
For a complete interactive example, see examples/model/switch, which demonstrates both agent-level and per-request switching approaches.
3. Custom HTTP Headers
In environments such as gateways, proprietary platforms, or proxy setups, model API requests often require additional HTTP headers (for example, organization/tenant identifiers, gray-release (canary) routing, or custom authentication). The Model module provides three reliable ways to add headers to all model requests, including standard requests, streaming, file uploads, and batch processing.
Recommended order:
Global header via anthropic.WithHeaders (simplest for static headers)
Use Anthropic RequestOption to set global headers (flexible, middleware-friendly)
Use a custom http.RoundTripper injection (advanced, more cross-cutting capabilities)
All methods affect streaming requests, as they use the same underlying client.
2. Using Anthropic RequestOption to Set Global Headers
By using WithAnthropicClientOptions combined with anthropicopt.WithHeader or anthropicopt.WithMiddleware, you can inject headers into every request made by the underlying Anthropic client.
import (
    anthropicopt "github.com/anthropics/anthropic-sdk-go/option"

    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

llm := anthropic.New("claude-sonnet-4-0",
    // If your platform requires additional headers.
    anthropic.WithAnthropicClientOptions(
        anthropicopt.WithHeader("X-Custom-Header", "custom-value"),
        anthropicopt.WithHeader("X-Request-ID", "req-123"),
        // You can also set User-Agent or vendor-specific headers.
        anthropicopt.WithHeader("User-Agent", "trpc-agent-go/1.0"),
    ),
)
If you need to set headers conditionally (e.g., only for certain paths or depending on context values), you can use middleware:
import (
    "net/http"
    "strings"

    anthropicopt "github.com/anthropics/anthropic-sdk-go/option"

    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

llm := anthropic.New("claude-sonnet-4-0",
    anthropic.WithAnthropicClientOptions(
        anthropicopt.WithMiddleware(func(r *http.Request, next anthropicopt.MiddlewareNext) (*http.Response, error) {
            // Example: set "per-request" headers based on a context value.
            if v := r.Context().Value("x-request-id"); v != nil {
                if s, ok := v.(string); ok && s != "" {
                    r.Header.Set("X-Request-ID", s)
                }
            }
            // Or only for the "message completion" endpoint.
            if strings.Contains(r.URL.Path, "v1/messages") {
                r.Header.Set("X-Feature-Flag", "on")
            }
            return next(r)
        }),
    ),
)

3. Using Custom http.RoundTripper
This injects headers at the HTTP transport layer and is ideal for scenarios that also require proxying, TLS settings, custom monitoring, and other transport-level capabilities.

import (
    "net/http"

    "trpc.group/trpc-go/trpc-agent-go/model/anthropic"
)

type headerRoundTripper struct {
    base http.RoundTripper
}

func (rt headerRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
    // Add or override headers.
    req.Header.Set("X-Custom-Header", "custom-value")
    req.Header.Set("X-Trace-ID", "trace-xyz")
    return rt.base.RoundTrip(req)
}

llm := anthropic.New("claude-sonnet-4-0",
    anthropic.WithHTTPClientOptions(
        anthropic.WithHTTPClientTransport(headerRoundTripper{base: http.DefaultTransport}),
    ),
)
Regarding "per-request" headers:
The Agent/Runner will propagate ctx to the model call; middleware can read the value from req.Context() to inject headers for "this call."
For message completion, the current API doesn't expose per-call BaseURL overrides; if switching is needed, create a model with a different BaseURL or modify r.URL in middleware.
4. Token Tailoring
Anthropic models also support Token Tailoring functionality, designed to automatically trim messages when they exceed the model's context window limits, ensuring requests can be successfully sent to the LLM API.
For detailed explanations of the token calculation formula, tailoring strategy, and custom strategy implementation, please refer to Token Tailoring under OpenAI Model.
Provider
With the emergence of multiple large model providers, some have defined their own API specifications. Currently, the framework has integrated the APIs of OpenAI and Anthropic, and exposes them as models. Users can access different provider models through openai.New and anthropic.New.
However, instantiation and configuration differ between providers, so switching often requires modifying a significant amount of code, which increases the cost of switching.
To solve this problem, the Provider offers a unified model instantiation entry point. Developers only need to specify the provider and model name, and other configuration options are managed through the unified Option, simplifying the complexity of switching between providers.
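A minimal sketch of the unified entry point; the provider name "openai" appears in the example at the end of this section, while "anthropic" is assumed here for illustration:

import (
    "log"

    "trpc.group/trpc-go/trpc-agent-go/model/provider"
)

// Same call shape for both providers; only the provider name, model name, and options change.
openaiModel, err := provider.Model("openai", "deepseek-chat",
    provider.WithAPIKey("your-api-key"),
    provider.WithBaseURL("https://api.deepseek.com/v1"),
)
if err != nil {
    log.Fatal(err)
}

claudeModel, err := provider.Model("anthropic", "claude-sonnet-4-0",
    provider.WithAPIKey("your-api-key"),
)
if err != nil {
    log.Fatal(err)
}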
The Provider supports the following Option:
| Option | Description |
| --- | --- |
| WithAPIKey / WithBaseURL | Set the API Key and Base URL for the model |
| WithHTTPClientName / WithHTTPClientTransport | Configure HTTP client properties |
| WithHeaders | Append static HTTP headers across requests |
| WithChannelBufferSize | Adjust the response channel buffer size |
| WithCallbacks | Configure OpenAI / Anthropic request, response, and streaming callbacks |
import("trpc.group/trpc-go/trpc-agent-go/model""trpc.group/trpc-go/trpc-agent-go/model/provider")// Custom token tailoring budget parameters for all providersconfig:=&model.TokenTailoringConfig{ProtocolOverheadTokens:1024,ReserveOutputTokens:4096,SafetyMarginRatio:0.15,}modelInstance,err:=provider.Model("openai","deepseek-chat",provider.WithAPIKey(c.apiKey),provider.WithEnableTokenTailoring(true),provider.WithTokenTailoringConfig(config),)