# Session Summary

## Overview

As conversations grow, maintaining a complete event history can consume significant memory and may exceed the LLM's context window. The session summary feature uses an LLM to automatically compress historical conversation into concise summaries, significantly reducing memory usage and token consumption while preserving important context.
## Key Features
- Auto-trigger: Automatically generates summaries based on event count, token count, or time thresholds
- Incremental processing: Only processes new events since the last summary, avoiding redundant computation
- LLM-driven: Uses any configured LLM model to generate high-quality, context-aware summaries
- Non-destructive: Original events are fully preserved; summaries are stored separately
- Async processing: Executes asynchronously in the background without blocking conversation flow
- Flexible configuration: Supports custom trigger conditions, prompts, and word limits
## Basic Configuration

### Step 1: Create Summarizer

Create a summarizer with an LLM model and configure trigger conditions:
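A minimal sketch is shown below. The option names come from this document; the import paths, model constructor, and `NewSummarizer` name are assumptions and should be checked against the framework's package documentation.

```go
// Sketch only: import paths and constructors are assumptions.
import (
	"trpc.group/trpc-go/trpc-agent-go/model/openai"
	"trpc.group/trpc-go/trpc-agent-go/session/summary"
)

m := openai.New("gpt-4o-mini")
summarizer := summary.NewSummarizer(m,
	summary.WithEventThreshold(20),   // summarize after 20 new events
	summary.WithMaxSummaryWords(200), // keep summaries concise
)
```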
### Step 2: Configure Session Service

Integrate the summarizer into a session service:
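A sketch using an in-memory session service (the package path and option names other than those referenced elsewhere in this document are assumptions):

```go
// Sketch only: the in-memory backend is one possible session service.
import "trpc.group/trpc-go/trpc-agent-go/session/inmemory"

sessionService := inmemory.NewSessionService(
	inmemory.WithSummarizer(summarizer),
	inmemory.WithAsyncSummaryNum(2),    // background summary workers
	inmemory.WithSummaryQueueSize(100), // pending-job queue capacity
)
```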
### Step 3: Configure Agent and Runner

Create an Agent and configure summary injection behavior:
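A sketch wiring the Agent and Runner together. `AddSessionSummary` and `MaxHistoryRuns` are the settings described in the Context Injection section below; the constructor and option spellings here are assumptions:

```go
// Sketch only: constructor names are assumptions.
agent := llmagent.New("assistant",
	llmagent.WithModel(m),
	llmagent.WithAddSessionSummary(true), // inject summary into context
	llmagent.WithMaxHistoryRuns(10),      // used only when summaries are off
)
r := runner.NewRunner("my-app", agent, runner.WithSessionService(sessionService))
```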
After completing the above configuration, the summary feature runs automatically.
## SessionSummarizer Interface
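The exact interface is defined by the framework; the shape below is hypothetical, inferred only from the operations this document describes (generating summaries and evaluating trigger conditions):

```go
// Hypothetical shape for illustration; consult the package source for
// the authoritative definition.
type SessionSummarizer interface {
	// Generate a (possibly incremental) summary for the session.
	Summarize(ctx context.Context, sess *session.Session) (string, error)
	// Report whether the configured trigger conditions are currently met.
	ShouldSummarize(sess *session.Session) bool
}
```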
## Summarizer Options
### Trigger Conditions

| Option | Description |
|---|---|
| `WithEventThreshold(eventCount int)` | Trigger when the event count since the last summary exceeds the threshold |
| `WithTokenThreshold(tokenCount int)` | Trigger when the token count since the last summary exceeds the threshold |
| `WithTimeThreshold(interval time.Duration)` | Trigger when the time since the last event exceeds the interval |
### Combined Conditions

| Option | Description |
|---|---|
| `WithChecksAll(checks ...Checker)` | All conditions must be met (AND logic); use `Check*` functions |
| `WithChecksAny(checks ...Checker)` | Any condition triggers (OR logic); use `Check*` functions |

Note: Use `Check*` functions (e.g., `CheckEventThreshold`) inside `WithChecksAll` and `WithChecksAny`, not `With*` functions.
### Summary Generation

| Option | Description |
|---|---|
| `WithMaxSummaryWords(maxWords int)` | Limit the summary word count; the limit is included in the prompt to guide the model |
| `WithPrompt(prompt string)` | Custom summary prompt; must contain the `{conversation_text}` placeholder |
| `WithSkipRecent(skipFunc SkipRecentFunc)` | Custom function to skip recent events |
### Hook Options

| Option | Description |
|---|---|
| `WithPreSummaryHook(h PreSummaryHook)` | Pre-summary hook; can modify the input text |
| `WithPostSummaryHook(h PostSummaryHook)` | Post-summary hook; can modify the output summary |
| `WithSummaryHookAbortOnError(abort bool)` | Whether to abort on hook error; defaults to false (errors are ignored) |
### Tool Call Formatting

By default, the summarizer includes tool calls and tool results in the conversation text sent to the LLM for summarization. The default formats are:

- Tool calls: `[Called tool: toolName with args: {"arg": "value"}]`
- Tool results: `[toolName returned: result content]`

| Option | Description |
|---|---|
| `WithToolCallFormatter(f ToolCallFormatter)` | Customize how tool calls are formatted in the summary input; return an empty string to exclude them |
| `WithToolResultFormatter(f ToolResultFormatter)` | Customize how tool results are formatted in the summary input; return an empty string to exclude them |
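A self-contained sketch of a custom tool-call formatter. The `ToolCall` struct and the formatter signature here are simplified stand-ins for the framework's types; the "return empty string to exclude" behavior matches the table above:

```go
package main

import "fmt"

// ToolCall is a simplified stand-in for the framework's tool-call type.
type ToolCall struct {
	Name string
	Args string
}

// compactToolCall renders a terse one-line form for summary input.
// Returning "" excludes the call from summarization entirely.
func compactToolCall(c ToolCall) string {
	if c.Name == "debug_echo" { // drop noisy tools from summaries
		return ""
	}
	return fmt.Sprintf("[%s(%s)]", c.Name, c.Args)
}

func main() {
	fmt.Println(compactToolCall(ToolCall{Name: "search", Args: `{"q":"weather"}`}))
	fmt.Println(compactToolCall(ToolCall{Name: "debug_echo", Args: "{}"}) == "") // excluded: true
}
```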
### Model Callbacks (Before/After Model)

The summarizer supports model callbacks around the underlying `model.GenerateContent` call, useful for modifying requests, short-circuiting with custom responses, or instrumentation.

| Option | Description |
|---|---|
| `WithModelCallbacks(callbacks *model.Callbacks)` | Register Before/After callbacks for the summarizer's underlying model calls |
## Checker Functions

`Checker` is a function type for determining whether to trigger summarization:
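A self-contained sketch of the `Checker` type and the AND/OR combinators described below. The event type is simplified here; in the framework, checkers inspect the delta events since the last summary:

```go
package main

import "fmt"

// Event is a simplified stand-in for the framework's session event type.
type Event struct{}

// Checker inspects the delta events since the last summary and reports
// whether summarization should be triggered.
type Checker func(events []Event) bool

// CheckEventThreshold triggers once the delta event count exceeds n.
func CheckEventThreshold(n int) Checker {
	return func(events []Event) bool { return len(events) > n }
}

// ChecksAll combines checkers with AND logic: all must return true.
func ChecksAll(checks []Checker) Checker {
	return func(events []Event) bool {
		for _, c := range checks {
			if !c(events) {
				return false
			}
		}
		return true
	}
}

// ChecksAny combines checkers with OR logic: any true triggers.
func ChecksAny(checks []Checker) Checker {
	return func(events []Event) bool {
		for _, c := range checks {
			if c(events) {
				return true
			}
		}
		return false
	}
}

func main() {
	events := make([]Event, 5)
	any := ChecksAny([]Checker{CheckEventThreshold(3), CheckEventThreshold(10)})
	all := ChecksAll([]Checker{CheckEventThreshold(3), CheckEventThreshold(10)})
	fmt.Println(any(events), all(events)) // true false
}
```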
### Built-in Checkers

| Checker | Description |
|---|---|
| `CheckEventThreshold(eventCount int)` | Returns true when the number of delta events since the last summary exceeds the threshold |
| `CheckTimeThreshold(interval time.Duration)` | Returns true when the time since the last event exceeds the interval |
| `CheckTokenThreshold(tokenCount int)` | Returns true when the estimated token count of delta events since the last summary exceeds the threshold (estimated via `TokenCounter` from the extracted conversation text, not `event.Response.Usage.TotalTokens`) |
| `ChecksAll(checks []Checker)` | Combines multiple Checkers; returns true only when all return true (AND) |
| `ChecksAny(checks []Checker)` | Combines multiple Checkers; returns true when any returns true (OR) |
## Custom Prompt

Required placeholders:

- `{conversation_text}`: Must be included; replaced with the conversation content
- `{max_summary_words}`: Must be included when `maxSummaryWords > 0`
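A self-contained sketch of a custom prompt and the placeholder substitution the summarizer performs before calling the model (the `renderPrompt` helper is illustrative, not a framework API):

```go
package main

import (
	"fmt"
	"strings"
)

// customPrompt keeps both documented placeholders.
const customPrompt = `Summarize the support conversation below in at most {max_summary_words} words.
Focus on the customer's issue and its resolution.

{conversation_text}`

// renderPrompt substitutes the placeholders the way the documentation
// describes: conversation text always, max words only when positive.
func renderPrompt(prompt, conversation string, maxWords int) string {
	out := strings.ReplaceAll(prompt, "{conversation_text}", conversation)
	if maxWords > 0 {
		out = strings.ReplaceAll(out, "{max_summary_words}", fmt.Sprint(maxWords))
	}
	return out
}

func main() {
	fmt.Println(renderPrompt(customPrompt, "User: my order is late.", 150))
}
```

The rendered prompt is what reaches the LLM; pass the template itself via `WithPrompt`.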
## Token Counter Configuration

By default, `CheckTokenThreshold` uses a built-in `SimpleTokenCounter` that estimates token count based on text length. To customize token counting behavior, use `summary.SetTokenCounter` to set a global token counter:
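A self-contained sketch of the default estimation behavior (roughly 4 characters per token, per the note below). This is an illustrative stand-in, not the framework's `SimpleTokenCounter` implementation:

```go
package main

import "fmt"

// simpleTokenEstimate mimics the documented default: about 4 characters
// per token, never reporting zero for non-empty text.
func simpleTokenEstimate(text string) int {
	n := len(text) / 4
	if n == 0 && len(text) > 0 {
		n = 1
	}
	return n
}

func main() {
	fmt.Println(simpleTokenEstimate("The quick brown fox jumps over the lazy dog.")) // 11
}
```

A custom counter registered via `summary.SetTokenCounter` could instead call a real tokenizer for the target model.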
Notes:

- Global effect: `SetTokenCounter` affects all `CheckTokenThreshold` evaluations in the current process; set it once during application initialization.
- Default counter: If not set, the default `SimpleTokenCounter` is used (approximately 4 characters per token).
## Skip Recent Events

Use `WithSkipRecent` to skip recent events during summarization:
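A self-contained sketch of the idea. The `SkipRecentFunc` signature here (returning how many trailing events to leave out) is an assumption for illustration; check the framework's type definition before use:

```go
package main

import "fmt"

// Event is a simplified stand-in for the framework's session event type.
type Event struct{ ID int }

// SkipRecentFunc reports how many trailing events to exclude from the
// summary input (assumed signature, for illustration).
type SkipRecentFunc func(events []Event) int

// skipLastN keeps the most recent n events out of summarization so that
// fresh context stays verbatim in the conversation window.
func skipLastN(n int) SkipRecentFunc {
	return func(events []Event) int {
		if n > len(events) {
			return len(events)
		}
		return n
	}
}

// applySkip shows the effect: only the older events are summarized.
func applySkip(events []Event, skip SkipRecentFunc) []Event {
	return events[:len(events)-skip(events)]
}

func main() {
	events := []Event{{1}, {2}, {3}, {4}, {5}}
	fmt.Println(len(applySkip(events, skipLastN(2)))) // 3
}
```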
## Summary Hooks

### PreSummaryHook

Called before summary generation; can modify the input text or events:

### PostSummaryHook

Called after summary generation; can modify the output summary:
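A self-contained sketch of both hooks. The hook signatures are assumptions for illustration (the real `PreSummaryHook`/`PostSummaryHook` types are registered via `WithPreSummaryHook` and `WithPostSummaryHook`):

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed hook signatures, for illustration only.
type PreSummaryHook func(input string) (string, error)
type PostSummaryHook func(summary string) (string, error)

// redactEmails strips a crude email pattern before text reaches the LLM.
func redactEmails(input string) (string, error) {
	words := strings.Fields(input)
	for i, w := range words {
		if strings.Contains(w, "@") {
			words[i] = "[redacted]"
		}
	}
	return strings.Join(words, " "), nil
}

// tagSummary appends a provenance marker to the generated summary.
func tagSummary(summary string) (string, error) {
	return summary + " [auto-summary]", nil
}

func main() {
	in, _ := redactEmails("Contact alice@example.com for details")
	out, _ := tagSummary("User asked about billing.")
	fmt.Println(in)
	fmt.Println(out)
}
```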
### Usage Example
## Summary Trigger Mechanism

### Automatic Trigger (Recommended)

The Runner automatically checks trigger conditions after each conversation completes, generating summaries asynchronously in the background when conditions are met.
Trigger timing:

- Event count exceeds the threshold (`WithEventThreshold`)
- Token count exceeds the threshold (`WithTokenThreshold`)
- Time since the last event exceeds the interval (`WithTimeThreshold`)
- Custom combined conditions are met (`WithChecksAny` / `WithChecksAll`)
### Manual Trigger

In some scenarios, you may need to trigger summarization manually:
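A sketch of the two calls described below. The receiver and exact signatures are assumptions; the method names, filter key constant, and `force` flag come from this document:

```go
// Sketch only: receivers and signatures are assumptions.
// Async (recommended): enqueue a background summary job that respects
// the configured trigger conditions (force=false).
err := sessionService.EnqueueSummaryJob(ctx, sess, session.SummaryFilterKeyAllContents, false)

// Sync: generate immediately and block until done; force=true ignores
// all trigger condition checks.
err = sessionService.CreateSessionSummary(ctx, sess, session.SummaryFilterKeyAllContents, true)
```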
API description:

- `EnqueueSummaryJob`: async summary (recommended)
  - Background processing, non-blocking
  - Automatically falls back to sync processing on failure
  - Suitable for production
- `CreateSessionSummary`: sync summary
  - Immediate processing; blocks the current operation
  - Returns the result directly
  - Suitable for debugging or when immediate results are needed

Parameter description:

- `filterKey`: `session.SummaryFilterKeyAllContents` generates a summary for the full session
- `force`:
  - `false`: Respects configured trigger conditions; only generates a summary when conditions are met
  - `true`: Forces summary generation, ignoring all trigger condition checks
Use cases:

| Scenario | Recommended API | force |
|---|---|---|
| Normal conversation flow | Auto-trigger (no call needed) | - |
| Background batch processing | `EnqueueSummaryJob` | `false` |
| User-initiated request | `EnqueueSummaryJob` | `true` |
| Debug/Test | `CreateSessionSummary` | `true` |
| Session end | `EnqueueSummaryJob` | `true` |
## Context Injection Mechanism

The framework provides two modes for managing the conversation context sent to the LLM.

### Mode 1: Enable Summary Injection (Recommended)

How it works:

- The session summary is merged into the existing system message if one exists, or prepended as a new system message if none exists
- This ensures compatibility with models that require a single system message at the beginning (e.g., the Qwen3.5 series)
- Includes all incremental events after the summary point (no truncation)
- Guarantees complete context: compressed history + full new conversation
- The `WithMaxHistoryRuns` parameter is ignored
Context structure:
Model compatibility:

Some LLM providers have strict requirements for system message placement and count:

- The Qwen3.5 series and similar models require the system message to be at the beginning and do not support multiple system messages
- The default merging behavior prevents errors like `System message must be at the beginning`
- Preloaded memory content is also merged into the system message using the same mechanism
### Mode 2: Without Summary

How it works:

- No summary message is added
- Only the most recent `MaxHistoryRuns` conversation turns are included
- `MaxHistoryRuns=0` means no limit; all history is included
Context structure:
### Mode Selection Guide

| Scenario | Recommended Config | Description |
|---|---|---|
| Long sessions (support, assistant) | `AddSessionSummary=true` | Maintains full context, optimizes tokens |
| Short sessions (single consultation) | `AddSessionSummary=false`, `MaxHistoryRuns=10` | Simple and direct, no summary overhead |
| Debug/Test | `AddSessionSummary=false`, `MaxHistoryRuns=5` | Quick validation, reduced noise |
| High concurrency | `AddSessionSummary=true`, increase worker count | Async processing, no impact on response speed |
## Summary Format Customization

By default, session summaries are formatted with context tags and a note about prioritizing current conversation information.
Default format:
You can use `WithSummaryFormatter` to customize the summary format:
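A self-contained sketch of a compact formatter. The formatter signature (summary text in, injected block out) is an assumption for illustration; the tag names below are hypothetical, not the framework's defaults:

```go
package main

import "fmt"

// compactFormatter wraps the summary in minimal context tags to reduce
// token overhead (hypothetical tags, for illustration).
func compactFormatter(summary string) string {
	return "<history_summary>\n" + summary + "\n</history_summary>"
}

func main() {
	fmt.Println(compactFormatter("User troubleshooting login failures; MFA reset resolved it."))
}
```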
Use cases:
- Simplified format: Use concise titles and minimal context hints to reduce token consumption
- Language localization: Translate context hints to the target language
- Role-specific format: Provide different formats for different Agent roles
- Model optimization: Adjust format based on specific model preferences
## Retrieving Summaries

Filter key support:

- When no option is provided, returns the full session summary (`SummaryFilterKeyAllContents`)
- When a specific filter key is provided but not found, falls back to the full session summary
- If neither exists, falls back to any available summary
## Summary by Event Type

In practice, you may want to generate independent summaries for different types of events.

### Setting FilterKey with AppendEventHook

### FilterKey Prefix Convention

Important: the `FilterKey` must include the `appName + "/"` prefix.

Reason: the Runner uses `appName + "/"` as the filter prefix when filtering events. If the `FilterKey` lacks this prefix, its events will be filtered out.

### Generating Summaries by Type
## How It Works

- Incremental processing: The summarizer tracks the last summary time for each session; subsequent runs only process events after the last summary
- Incremental summary: New events are combined with the previous summary to generate an updated summary containing both the old context and the new information
- Trigger condition evaluation: Before generating a summary, the configured trigger conditions are evaluated. If they are not met and `force=false`, summarization is skipped
- Async workers: Summary tasks are distributed to multiple worker goroutines using a hash-based distribution strategy, ensuring tasks for the same session are processed in order
- Fallback mechanism: If async enqueue fails (queue full, context cancelled, or workers not initialized), the system automatically falls back to synchronous processing
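The hash-based distribution above can be sketched in a few lines: hashing the session ID onto a fixed worker index sends every job for a given session to the same goroutine, which preserves per-session ordering. The hash function choice here (FNV-1a) is illustrative, not necessarily what the framework uses:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// workerFor maps a session ID to a worker index. The same session always
// lands on the same worker, so its jobs are processed in order.
func workerFor(sessionID string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(sessionID))
	return int(h.Sum32()) % workers
}

func main() {
	fmt.Println(workerFor("session-42", 4) == workerFor("session-42", 4)) // true
}
```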
## Best Practices

- Choose appropriate thresholds: Set event/token thresholds based on the LLM's context window and conversation patterns. For GPT-4 (8K context), consider `WithTokenThreshold(4000)` to leave room for responses
- Use async processing: Always use `EnqueueSummaryJob` instead of `CreateSessionSummary` in production to avoid blocking the conversation flow
- Monitor queue size: If you frequently see "queue is full" warnings, increase `WithSummaryQueueSize` or `WithAsyncSummaryNum`
- Customize prompts: Tailor summary prompts to your application's needs. For example, if building a customer support Agent, focus on key issues and solutions
- Balance word limits: Set `WithMaxSummaryWords` to balance context preservation against token usage. A typical range is 100-300 words
- Test trigger conditions: Experiment with different `WithChecksAny` and `WithChecksAll` combinations to find the optimal balance between summary frequency and cost
## Performance Considerations
- LLM cost: Each summary generation calls the LLM. Monitor trigger conditions to balance cost and context preservation
- Memory usage: Summaries are stored alongside events. Configure appropriate TTL to manage memory in long-running sessions
- Async workers: More workers increase throughput but consume more resources. Start with 2-4 workers and scale based on load
- Queue capacity: Adjust queue size based on expected concurrency and summary generation time
## Complete Example

Here is a complete example demonstrating how all the components work together:
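The sketch below ties the earlier steps together. Import paths, constructors, and the model name are assumptions; the option and checker names come from this document, and everything should be verified against the framework's package documentation:

```go
package main

// Sketch only: import paths and constructors are assumptions.
import (
	"trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
	"trpc.group/trpc-go/trpc-agent-go/model/openai"
	"trpc.group/trpc-go/trpc-agent-go/runner"
	"trpc.group/trpc-go/trpc-agent-go/session/inmemory"
	"trpc.group/trpc-go/trpc-agent-go/session/summary"
)

func main() {
	// 1. Model and summarizer with combined trigger conditions (OR logic).
	m := openai.New("gpt-4o-mini")
	sum := summary.NewSummarizer(m,
		summary.WithChecksAny(
			summary.CheckEventThreshold(20),
			summary.CheckTokenThreshold(4000),
		),
		summary.WithMaxSummaryWords(200),
	)

	// 2. Session service with async summary workers.
	svc := inmemory.NewSessionService(
		inmemory.WithSummarizer(sum),
		inmemory.WithAsyncSummaryNum(2),
		inmemory.WithSummaryQueueSize(100),
	)

	// 3. Agent with summary injection, wired into a Runner.
	agent := llmagent.New("assistant",
		llmagent.WithModel(m),
		llmagent.WithAddSessionSummary(true),
	)
	r := runner.NewRunner("my-app", agent, runner.WithSessionService(svc))
	_ = r // run conversations via r; summaries are generated automatically
}
```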