Skip to content

Session Summary

Overview

As conversations grow, maintaining complete event history can consume significant memory and may exceed the LLM's context window limit. The session summary feature uses LLM to automatically compress historical conversations into concise summaries, significantly reducing memory usage and token consumption while preserving important context.

Key Features

  • Auto-trigger: During summary checks, automatically generates summaries based on event count, token count, or time thresholds
  • Incremental processing: Only processes new events since the last summary, avoiding redundant computation
  • LLM-driven: Uses any configured LLM model to generate high-quality, context-aware summaries
  • Non-destructive: Original events are fully preserved; summaries are stored separately
  • Async processing: Executes asynchronously in the background without blocking conversation flow
  • Flexible configuration: Supports custom trigger conditions, prompts, and word limits

Basic Configuration

Step 1: Create Summarizer

Create a summarizer with an LLM model and configure trigger conditions:

import (
    "time"

    "trpc.group/trpc-go/trpc-agent-go/session/summary"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
)

summaryModel := openai.New("gpt-4", openai.WithAPIKey("your-api-key"))

summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithChecksAny(
        summary.CheckEventThreshold(20),
        summary.CheckTokenThreshold(4000),
        summary.CheckTimeThreshold(5*time.Minute), // Evaluated on summary check; compares the checked session's last event (normally the latest unsummarized event in delta flow)
    ),
    summary.WithMaxSummaryWords(200),
)

Step 2: Configure Session Service

Integrate the summarizer into a session service:

import (
    "context"
    "time"
    "trpc.group/trpc-go/trpc-agent-go/session/clickhouse"
    "trpc.group/trpc-go/trpc-agent-go/session/inmemory"
    "trpc.group/trpc-go/trpc-agent-go/session/mysql"
    "trpc.group/trpc-go/trpc-agent-go/session/postgres"
    "trpc.group/trpc-go/trpc-agent-go/session/redis"
    "trpc.group/trpc-go/trpc-agent-go/session/summary"
)

// Memory storage (dev/test)
sessionService := inmemory.NewSessionService(
    inmemory.WithSummarizer(summarizer),
    inmemory.WithAsyncSummaryNum(2),
    inmemory.WithSummaryQueueSize(100),
    inmemory.WithSummaryJobTimeout(60*time.Second),
)

// Redis storage (production)
sessionService, err := redis.NewService(
    redis.WithRedisClientURL("redis://localhost:6379"),
    redis.WithSummarizer(summarizer),
    redis.WithAsyncSummaryNum(4),
    redis.WithSummaryQueueSize(200),
)

// PostgreSQL storage
sessionService, err := postgres.NewService(
    postgres.WithHost("localhost"),
    postgres.WithPassword("your-password"),
    postgres.WithSummarizer(summarizer),
    postgres.WithAsyncSummaryNum(2),
    postgres.WithSummaryQueueSize(100),
)

// MySQL storage
sessionService, err := mysql.NewService(
    mysql.WithMySQLClientDSN("user:password@tcp(localhost:3306)/db?charset=utf8mb4&parseTime=True&loc=Local"),
    mysql.WithSummarizer(summarizer),
    mysql.WithAsyncSummaryNum(2),
    mysql.WithSummaryQueueSize(100),
)

// ClickHouse storage
sessionService, err := clickhouse.NewService(
    clickhouse.WithClickHouseDSN("clickhouse://default:password@localhost:9000/default"),
    clickhouse.WithSummarizer(summarizer),
    clickhouse.WithAsyncSummaryNum(2),
)

Step 3: Configure Agent and Runner

Create an Agent and configure summary injection behavior:

import (
    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/runner"
)

llmAgent := llmagent.New(
    "my-agent",
    llmagent.WithModel(summaryModel),
    llmagent.WithAddSessionSummary(true),
    llmagent.WithMaxHistoryRuns(10),
)

r := runner.NewRunner(
    "my-agent",
    llmAgent,
    runner.WithSessionService(sessionService),
)

eventChan, err := r.Run(ctx, userID, sessionID, userMessage)

After completing the above configuration, the summary feature runs automatically.

Cache-Safe Summary Forking

By default, the summarizer sends a standalone summary request: an optional summary system prompt plus a user prompt containing the extracted conversation text. This is simple and remains the default behavior.

For long sessions where prompt-cache reuse matters, you can opt in to cache-safe forking:

1
2
3
4
5
6
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithContextThreshold(),
    summary.WithMaxSummaryWords(200),
    summary.WithCacheSafeForking(true),
)

When enabled and the framework has the parent model request available, the summarizer clones that parent request and appends one compacting user message at the end. This preserves the parent request prefix, including system context, history, and tools, so providers with prompt caching can reuse more cached input. If no parent request is available, for example in asynchronous or manual summary calls, the summarizer falls back to the default standalone request.

The appended compacting prompt is separate from WithPrompt(...) because it does not embed {conversation_text}; the parent request already contains the conversation. Use WithCacheSafeForkPrompt(...) only when you need to customize that appended user message.

Cache-safe forking controls the request used to generate the summary. To make the next normal conversation request more cache friendly after a summary exists, prefer injecting the summary as a user message instead of merging it into the system prompt:

1
2
3
4
5
6
llmAgent := llmagent.New(
    "my-agent",
    llmagent.WithModel(summaryModel),
    llmagent.WithAddSessionSummary(true),
    llmagent.WithSessionSummaryInjectionMode(llmagent.SessionSummaryInjectionUser),
)

Summary + Progressive Disclosure

When summary injection and prompt-side context compaction keep the request small, some older details are no longer visible to the model. If you still want the agent to recover those details only when needed, enable progressive disclosure for session history.

import (
    "os"

    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/session/pgvector"
)

sessionService, err := pgvector.NewService(
    pgvector.WithDSN(os.Getenv("PGVECTOR_DSN")),
    pgvector.WithEmbedder(embedder),
    pgvector.WithSummarizer(summarizer),
)
if err != nil {
    panic(err)
}

llmAgent := llmagent.New(
    "my-agent",
    llmagent.WithModel(summaryModel),
    llmagent.WithAddSessionSummary(true),
    llmagent.WithEnableContextCompaction(true),
    llmagent.WithEnableOnDemandSession(true),
)

Requirements and behavior:

  • WithEnableOnDemandSession(true) enables on-demand session tools according to backend capability. session_search is exposed when the backend implements session.SearchableService; session_load is exposed when the backend implements session.WindowService. Backends may support either one or both.
  • session/pgvector supports both discovery and exact loading. Normal session backends that implement WindowService expose exact session_load recovery even when semantic session_search is unavailable.
  • current_hidden searches current-session history strictly before the summary boundary recorded in summary:last_included_ts.
  • current_session searches the current session regardless of summary cutoff. This is useful when request projection or context compaction omitted current-session details from the visible prompt.
  • other_sessions searches other sessions for the same <appName, userID>.
  • all_sessions combines current_hidden and other_sessions.

What can be recalled:

  • User and assistant messages.
  • Historical tool results, including tool outputs that were compacted out of the prompt.

What is intentionally excluded:

  • Raw tool-call requests are not indexed.
  • Partial events are not indexed.

Recommended usage pattern:

  1. Let the model answer from the visible prompt, summary, and recent history.
  2. If session_search is available and a missing detail is needed, call it first.
  3. Use session_load when you have an event_id and need the surrounding raw history or exact tool result, including on backends without semantic search.
  4. Treat loaded history as untrusted historical context, not active instructions.

Migration note: earlier builds only treated on-demand session support as available when both session_search and session_load were present. The tool surface is now capability-based, so search-only integrations can expose session_search and load-only integrations can expose session_load.

SessionSummarizer Interface

type SessionSummarizer interface {
    // ShouldSummarize checks if the session should be summarized.
    ShouldSummarize(sess *session.Session) bool

    // Summarize generates a summary without modifying the session events.
    Summarize(ctx context.Context, sess *session.Session) (string, error)

    // SetPrompt updates the summarizer's prompt dynamically.
    SetPrompt(prompt string)

    // SetModel updates the summarizer's model dynamically.
    SetModel(m model.Model)

    // Metadata returns metadata about the summarizer configuration.
    Metadata() map[string]any
}

Context-Aware Summary Checks

The released SessionSummarizer interface stays unchanged.

When summary gating depends on request context, use ContextChecker with the context-aware check options:

type asyncSummaryKey struct{}

eventThreshold := summary.CheckEventThreshold(20)

summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithChecksAnyContext(
        func(ctx context.Context, sess *session.Session) bool {
            if eventThreshold(sess) {
                return true
            }
            async, _ := ctx.Value(asyncSummaryKey{}).(bool)
            return async
        },
    ),
)

The framework does not reserve any context keys for summary triggering. If your application needs to distinguish different summary entry points, annotate the context before calling the session APIs and read the value inside your ContextChecker.

Dynamic Summarizer

Use NewDynamicSummarizer when the session service should be reused, but the summary model, prompt, or checks must vary per request. This is useful for multi-tenant systems and custom model routing. Keep the session service long-lived, especially for database-backed services such as MySQL, so the underlying connection pool can be reused.

type summaryCfgKey struct{}

type SummaryCfg struct {
    ModelName string
    Prompt    string
}

func WithSummaryCfg(ctx context.Context, cfg SummaryCfg) context.Context {
    return context.WithValue(ctx, summaryCfgKey{}, cfg)
}

func SummaryCfgFromContext(ctx context.Context) (SummaryCfg, bool) {
    cfg, ok := ctx.Value(summaryCfgKey{}).(SummaryCfg)
    return cfg, ok
}

summarizer := summary.NewDynamicSummarizer(func(
    ctx context.Context,
    sess *session.Session,
) (summary.SessionSummarizer, error) {
    cfg, ok := SummaryCfgFromContext(ctx)
    if !ok {
        return nil, nil // Skip automatic summary for this call.
    }
    return BuildSummarizer(cfg)
})

sessionService, err := mysql.NewService(
    mysql.WithMySQLClientDSN(dsn),
    mysql.WithSummarizer(summarizer),
)

Before running the request, attach the request-scoped configuration to ctx:

1
2
3
4
ctx = WithSummaryCfg(ctx, SummaryCfg{
    ModelName: req.SummaryModel,
    Prompt:    req.SummaryPrompt,
})

The resolver should be cheap and deterministic for the same ctx and session. During non-forced summary, it may be called once for the summary gate and once for actual summary generation. If constructing the summarizer is expensive, store the already-built summarizer in ctx and let the resolver only read it. Returning nil from the resolver skips automatic summary checks. Direct Summarize calls, or forced summary calls without a resolved summarizer, return an error. If the resolver returns an error while ShouldSummarizeWithContext is checking an automatic, non-forced summary, the gate treats it as false and skips summary generation; direct Summarize calls propagate resolver errors to the caller.

Summarizer Options

Trigger Conditions

Option Description
WithEventThreshold(eventCount int) Trigger when event count since last summary exceeds threshold
WithTokenThreshold(tokenCount int) Trigger when token count since last summary exceeds threshold
WithContextThreshold(opts ...ContextThresholdOption) Trigger when token count since last summary exceeds a ratio of the current model's context window
WithTimeThreshold(interval time.Duration) Evaluated during summary checks; wraps CheckTimeThreshold and triggers when the checked session's last event is older than the interval

Use WithTokenThreshold when you want a fixed application-defined token threshold, for example "summarize after 4000 new tokens" regardless of which model is serving the request. The threshold is captured in the summarizer configuration and does not change when your application switches models.

Use WithContextThreshold when the summary trigger should follow the active model's context window. This is the recommended option for agents that can switch models within a session. At summary-check time, the framework resolves the context window in this order:

  1. Per-run override from agent.WithModelContextWindow(tokens)
  2. Model instance configuration from providers such as openai.WithContextWindow(tokens) or provider.WithContextWindow(tokens)
  3. Process-wide model-name registry from model.RegisterModelContextWindow(name, tokens)

The threshold is then computed as contextWindow * ratio (default 50%). For private deployments, endpoint IDs, fine-tuned models, newly released models, or multi-tenant custom model configuration, prefer the instance or per-run option so different users do not overwrite a process-wide registry entry:

modelInstance := openai.New(
    "my-custom-model",
    openai.WithAPIKey(apiKey),
    openai.WithBaseURL(apiURI),
    openai.WithContextWindow(204800),
)

eventChan, err := r.Run(
    ctx,
    userID,
    sessionID,
    userMessage,
    agent.WithModel(modelInstance),
)

eventChan, err = r.Run(
    ctx,
    userID,
    sessionID,
    userMessage,
    agent.WithModelName("my-custom-model"),
    agent.WithModelContextWindow(204800),
)

Use global registration only when the model name has a stable process-wide meaning:

model.RegisterModelContextWindow("my-custom-model", 32768)

Combined Conditions

Option Description
WithChecksAll(checks ...Checker) All conditions must be met (AND logic), use Check* functions
WithChecksAny(checks ...Checker) Any condition triggers (OR logic), use Check* functions
WithChecksAllContext(checks ...ContextChecker) All request-scoped conditions must be met (AND logic)
WithChecksAnyContext(checks ...ContextChecker) Any request-scoped condition triggers (OR logic)

ContextChecker receives (ctx context.Context, sess *session.Session).

Note: Use Check* functions (for example CheckEventThreshold) inside WithChecksAll and WithChecksAny, not With* functions.

// AND logic: all conditions must be met
summary.WithChecksAll(
    summary.CheckEventThreshold(10),
    summary.CheckTokenThreshold(2000),
)

// OR logic: any condition triggers
summary.WithChecksAny(
    summary.CheckEventThreshold(50),
    summary.CheckTimeThreshold(10*time.Minute),
)

Summary Generation

Option Description
WithMaxSummaryWords(maxWords int) Limit summary word count; included in prompt to guide model
WithPrompt(prompt string) Custom summary prompt; must contain {conversation_text} placeholder
WithSystemPrompt(prompt string) Add a separate system message for summarization instructions; must not contain {conversation_text}
WithCacheSafeForking(enable bool) Opt in to cache-safe summary request forking when a parent request is available. Disabled by default
WithCacheSafeForkPrompt(prompt string) Customize the compacting user message appended in cache-safe fork mode. May include {max_summary_words}, but not {conversation_text}
WithSkipRecent(skipFunc SkipRecentFunc) Custom function to skip recent events

Hook Options

Option Description
WithPreSummaryHook(h PreSummaryHook) Pre-summary hook; can modify input text
WithPostSummaryHook(h PostSummaryHook) Post-summary hook; can modify output summary
WithSummaryHookAbortOnError(abort bool) Whether to abort on hook error; default false (ignore errors)

Tool Call Formatting

By default, the summarizer includes tool calls and tool results in the conversation text sent to the LLM for summarization. The default format is:

  • Tool calls: [Called tool: toolName with args: {"arg": "value"}]
  • Tool results: [toolName returned: result content]
Option Description
WithToolCallFormatter(f ToolCallFormatter) Customize how tool calls are formatted in summary input. Return empty string to exclude
WithToolResultFormatter(f ToolResultFormatter) Customize how tool results are formatted in summary input. Return empty string to exclude
// Truncate long tool arguments
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithToolCallFormatter(func(tc model.ToolCall) string {
        name := tc.Function.Name
        if name == "" {
            return ""
        }
        args := string(tc.Function.Arguments)
        const maxLen = 100
        if len(args) > maxLen {
            args = args[:maxLen] + "...(truncated)"
        }
        return fmt.Sprintf("[Tool: %s, Args: %s]", name, args)
    }),
    summary.WithEventThreshold(20),
)

// Exclude tool results from summary
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithToolResultFormatter(func(msg model.Message) string {
        return ""
    }),
    summary.WithEventThreshold(20),
)

// Include only tool name, exclude arguments
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithToolCallFormatter(func(tc model.ToolCall) string {
        if tc.Function.Name == "" {
            return ""
        }
        return fmt.Sprintf("[Used tool: %s]", tc.Function.Name)
    }),
    summary.WithEventThreshold(20),
)

Model Callbacks (Before/After Model)

The summarizer supports model callbacks around the underlying model.GenerateContent call, useful for modifying requests, short-circuiting with custom responses, or instrumentation.

Option Description
WithModelCallbacks(callbacks *model.Callbacks) Register Before/After callbacks for the summarizer's underlying model calls
callbacks := model.NewCallbacks().
    RegisterBeforeModel(func(ctx context.Context, args *model.BeforeModelArgs) (*model.BeforeModelResult, error) {
        // Modify args.Request, or return CustomResponse to skip the real model call
        return nil, nil
    }).
    RegisterAfterModel(func(ctx context.Context, args *model.AfterModelArgs) (*model.AfterModelResult, error) {
        // Override model output via CustomResponse
        return nil, nil
    })

summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithModelCallbacks(callbacks),
)

Checker Functions

Checker is a function type for determining whether to trigger summarization:

type Checker func(sess *session.Session) bool

Built-in Checkers

Checker Description
CheckEventThreshold(eventCount int) Returns true when the number of delta events since the last summary exceeds the threshold
CheckTimeThreshold(interval time.Duration) Returns true when the checked session's last event is older than the interval
CheckTokenThreshold(tokenCount int) Returns true when the estimated token count of delta events since the last summary exceeds the threshold (estimated via TokenCounter from extracted conversation text, not event.Response.Usage.TotalTokens)
ChecksAll(checks []Checker) Combines multiple Checkers; returns true only when all return true (AND)
ChecksAny(checks []Checker) Combines multiple Checkers; returns true when any returns true (OR)

Custom Prompt

customPrompt := `Analyze the following conversation and provide a concise summary,
focusing on key decisions, action items, and important context.
Keep it within {max_summary_words} words.

<conversation>
{conversation_text}
</conversation>

Summary:`

summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithPrompt(customPrompt),
    summary.WithMaxSummaryWords(100),
    summary.WithEventThreshold(15),
)

Required placeholders:

  • {conversation_text}: Must be included; replaced with conversation content
  • {max_summary_words}: Must be included in either WithPrompt(...) or WithSystemPrompt(...) when maxSummaryWords > 0

If you want to keep summarization instructions in a dedicated system message, combine WithSystemPrompt with a lighter user prompt that only carries the conversation payload:

systemPrompt := `Summarize the conversation faithfully.
Focus on key decisions and action items.
Keep it within {max_summary_words} words.`

userPrompt := `<conversation>
{conversation_text}
</conversation>

Summary:`

summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithSystemPrompt(systemPrompt),
    summary.WithPrompt(userPrompt),
    summary.WithMaxSummaryWords(100),
    summary.WithEventThreshold(15),
)

Notes:

  • WithPrompt still renders into the user message
  • WithSystemPrompt renders into a dedicated system message
  • WithSystemPrompt must not include {conversation_text}; keep conversation content in the user prompt

Token Counter Configuration

By default, CheckTokenThreshold uses a built-in SimpleTokenCounter that estimates token count based on text length. To customize token counting behavior, use summary.SetTokenCounter to set a global token counter:

For SimpleTokenCounter, WithApproxRunesPerToken(v) means roughly v UTF-8 characters per token. The formula is estimatedTokens = countedUTF8Runes / v. For example, v=1.5 means about 1.5 characters per token; do not treat it as a token multiplier.

import (
    "context"
    "fmt"
    "unicode/utf8"

    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/session/summary"
)

// Use the built-in simple token counter
summary.SetTokenCounter(model.NewSimpleTokenCounter())

// Or use a custom implementation
type MyCustomCounter struct{}

func (c *MyCustomCounter) CountTokens(ctx context.Context, message model.Message) (int, error) {
    _ = ctx
    return utf8.RuneCountInString(message.Content), nil
}

func (c *MyCustomCounter) CountTokensRange(ctx context.Context, messages []model.Message, start, end int) (int, error) {
    if start < 0 || end > len(messages) || start >= end {
        return 0, fmt.Errorf("invalid range: start=%d, end=%d, len=%d",
            start, end, len(messages))
    }

    total := 0
    for i := start; i < end; i++ {
        tokens, err := c.CountTokens(ctx, messages[i])
        if err != nil {
            return 0, err
        }
        total += tokens
    }
    return total, nil
}

summary.SetTokenCounter(&MyCustomCounter{})

Notes:

  • Global effect: SetTokenCounter affects all CheckTokenThreshold evaluations in the current process; set it once during application initialization
  • Default counter: If not set, the default SimpleTokenCounter is used (approximately 4 characters per token)
  • Parameter meaning: v in WithApproxRunesPerToken(v) is characters per token. Passing 2.0/3.0 means about 0.67 characters per token, which is about 1.5 tokens per character

Skip Recent Events

Use WithSkipRecent to skip recent events during summarization:

// Skip a fixed number of events
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithSkipRecent(func(_ []event.Event) int { return 2 }),
    summary.WithEventThreshold(10),
)

// Skip events from the last 5 minutes (time window)
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithSkipRecent(func(events []event.Event) int {
        cutoff := time.Now().Add(-5 * time.Minute)
        skip := 0
        for i := len(events) - 1; i >= 0; i-- {
            if events[i].Timestamp.After(cutoff) {
                skip++
            } else {
                break
            }
        }
        return skip
    }),
    summary.WithEventThreshold(10),
)

// Skip only trailing tool call messages
summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithSkipRecent(func(events []event.Event) int {
        skip := 0
        for i := len(events) - 1; i >= 0; i-- {
            if events[i].Response != nil && len(events[i].Response.Choices) > 0 &&
                events[i].Response.Choices[0].Message.Role == model.RoleTool {
                skip++
            } else {
                break
            }
        }
        return skip
    }),
    summary.WithEventThreshold(10),
)

Summary Hooks

PreSummaryHook

Called before summary generation; can modify input text or events:

1
2
3
4
5
6
7
8
type PreSummaryHookContext struct {
    Ctx     context.Context
    Session *session.Session
    Events  []event.Event
    Text    string
}

type PreSummaryHook func(in *PreSummaryHookContext) error

PostSummaryHook

Called after summary generation; can modify the output summary:

1
2
3
4
5
6
7
type PostSummaryHookContext struct {
    Ctx     context.Context
    Session *session.Session
    Summary string
}

type PostSummaryHook func(in *PostSummaryHookContext) error

Usage Example

summarizer := summary.NewSummarizer(
    summaryModel,
    summary.WithPreSummaryHook(func(ctx *summary.PreSummaryHookContext) error {
        // Modify ctx.Text or ctx.Events before summary generation
        return nil
    }),
    summary.WithPostSummaryHook(func(ctx *summary.PostSummaryHookContext) error {
        // Modify ctx.Summary after summary generation
        return nil
    }),
    summary.WithSummaryHookAbortOnError(true),
)

Summary Trigger Mechanism

The Runner automatically checks trigger conditions after each conversation completes, generating summaries asynchronously in the background when conditions are met.

Trigger timing:

  • Event count exceeds threshold (WithEventThreshold)
  • Token count exceeds threshold (WithTokenThreshold)
  • Token count exceeds the configured ratio of the active model's context window (WithContextThreshold)
  • On a summary check, the checked session's last event is older than the interval (WithTimeThreshold)
  • Custom combined conditions met (WithChecksAny / WithChecksAll)

WithTimeThreshold is not a standalone background timer. The condition is only evaluated when a summary check runs, typically after a conversation turn completes or when you call summary APIs manually. It checks the last event of the session being evaluated; in the Runner's normal delta-summary flow, that session contains only pending events, so this effectively means the latest unsummarized event. For example, 5*time.Minute means "on the next summary check, if the checked session's last event is already older than 5 minutes, summarize now."

Manual Trigger

In some scenarios, you may need to manually trigger summarization:

// Async summary (recommended) - background processing, non-blocking
err := sessionService.EnqueueSummaryJob(
    ctx,
    sess,
    session.SummaryFilterKeyAllContents,
    false,
)

// Sync summary - immediate processing, blocks current operation
err := sessionService.CreateSessionSummary(
    ctx,
    sess,
    session.SummaryFilterKeyAllContents,
    false,
)

// Async forced summary - ignores trigger conditions
err := sessionService.EnqueueSummaryJob(
    ctx,
    sess,
    session.SummaryFilterKeyAllContents,
    true,
)

// Sync forced summary - immediate forced generation
err := sessionService.CreateSessionSummary(
    ctx,
    sess,
    session.SummaryFilterKeyAllContents,
    true,
)

API description:

  • EnqueueSummaryJob: Async summary (recommended)
    • Background processing, non-blocking
    • Auto-fallback to sync on failure
    • Suitable for production
  • CreateSessionSummary: Sync summary
    • Immediate processing, blocks current operation
    • Returns result directly
    • Suitable for debugging or when immediate results are needed

Parameter description:

  • filterKey: session.SummaryFilterKeyAllContents generates a summary for the full session
  • force parameter:
    • false: Respects configured trigger conditions; only generates summary when conditions are met
    • true: Forces summary generation, completely ignoring all trigger condition checks

Use cases:

Scenario Recommended API force
Normal conversation flow Auto-trigger (no call needed) -
Background batch processing EnqueueSummaryJob false
User-initiated request EnqueueSummaryJob true
Debug/Test CreateSessionSummary true
Session end EnqueueSummaryJob true

Context Injection Mechanism

The framework provides two modes for managing conversation context sent to the LLM:

Before choosing a mode, distinguish the three context-reduction mechanisms:

Mechanism Layer What changes Typical use
Summary Session Service + prompt assembly Uses an LLM to create a persisted summary of historical events. With WithAddSessionSummary(true), the request injects that summary and appends only incremental events after the summary point Preserve semantic continuity in long sessions while avoiding repeated full-history prompts
Context compaction Agent prompt assembly Does not call an LLM and does not drop whole turns. It only rewrites tool result content during request projection, such as replacing old results with placeholders or truncating oversized results with head+tail preservation Keep the conversation structure and active tool chain while shrinking large tool outputs
Token tailoring Model provider Drops or keeps message rounds according to a token budget right before the provider call. The default strategy tries to preserve system messages and the latest turn, but preservation is still limited by the available budget Final fallback to keep the request within the model context window

The normal call path is: the agent assembles the prompt, injects the summary when configured, and optionally compacts tool result content. If summary injection is enabled and the compacted request still approaches the context window, the flow may synchronously refresh the summary once and rebuild the request before the LLM call. Finally, model-layer token tailoring trims the message list by budget. In short, context compaction and token tailoring can both reduce prompt size, but compaction shrinks tool-output payloads inside messages, while tailoring drops message rounds. Summary is different again: it creates a semantic replacement for historical context.

llmagent.WithAddSessionSummary(true)

How it works:

  • Session summary is merged into the existing system message if one exists, or prepended as a new system message if none exists
  • This ensures compatibility with models that require a single system message at the beginning (e.g., Qwen3.5 series)
  • Includes all incremental events after the summary point (no truncation)
  • Guarantees complete context: compressed history + full new conversation
  • WithMaxHistoryRuns parameter is ignored

Summary Injection Mode

By default, the summary is injected as a system message (merged into the existing system prompt). In this mode, the summary is protected by token tailoring's preserved head and will not be trimmed by the sliding window.

To allow the summary to participate in token-budget trimming for a true sliding-window experience, switch the injection mode to user:

1
2
3
4
5
6
agent := llmagent.New(
    "my-agent",
    llmagent.WithModel(modelInstance),
    llmagent.WithAddSessionSummary(true),
    llmagent.WithSessionSummaryInjectionMode(llmagent.SessionSummaryInjectionUser),
)

Injection mode comparison:

Mode Injection Position Token Tailoring Behavior Use Case
SessionSummaryInjectionSystem (default) Merged into system message Summary in preserved head, never trimmed Summary must always be present
SessionSummaryInjectionUser Merged into the first user history/current message when possible; otherwise inserted near history Summary participates in round trimming, can be evicted Sliding window for very long conversations

User mode message structure:

When the first history message is a user role, the summary is merged into it:

┌─────────────────────────────────────────┐
│ System Prompt                           │ ← Does not contain summary
├─────────────────────────────────────────┤
│ [Few-shot examples, if any]             │
├─────────────────────────────────────────┤
│ User: [summary context] + [original    │
│        first user message]              │ ← Summary merged into first user history
├─────────────────────────────────────────┤
│ Assistant: ...                          │
│ User: ...                               │
│ ...                                     │
│ User: current message                   │
└─────────────────────────────────────────┘

When the first history message is not a user role, the summary is a standalone user message:

┌─────────────────────────────────────────┐
│ System Prompt                           │ ← Does not contain summary
├─────────────────────────────────────────┤
│ [Few-shot examples, if any]             │
├─────────────────────────────────────────┤
│ User: Context from previous             │
│ interactions: <summary>...</summary>    │ ← Standalone summary user message
├─────────────────────────────────────────┤
│ Assistant/Tool history events           │
│ ...                                     │
│ User: current message                   │
└─────────────────────────────────────────┘

Notes:

  • In user mode, the processor first tries to merge the summary into the first user history/current message so it stays attached to the live user turn
  • If there is no user history/current message and the prompt prefix already ends with a user message (for example, injected context), the summary falls back to that trailing user message instead of adding another adjacent user block
  • User mode uses a more neutral default template ("Context from previous interactions") to avoid system-instruction tone in a user role message
  • Custom WithSummaryFormatter also applies to user mode
  • The summary generation pipeline is unaffected — injection mode only changes prompt assembly, not the summarizer itself

Tip: For very long conversations (hundreds of turns) where you want old summaries to naturally age out (replaced by newer summaries), use SessionSummaryInjectionUser mode.

Context Compaction Details

Context compaction is not another name for summary, and it is not token tailoring. It only targets tool result content, which is the part most likely to grow unexpectedly. It does not summarize ordinary user/assistant messages with an LLM, and it does not discard complete message rounds the way token tailoring may.

Naming note: "compaction" in WithEnableContextCompaction(true) means prompt-side tool result compaction/pruning. Semantic summaries are still controlled by WithAddSessionSummary(true) and the configured session summarizer.

When WithEnableContextCompaction(true) is enabled, the framework adds two compaction passes before the LLM call:

Pass 1 — Historical tool result placeholder (ContextCompactionToolResultMaxTokens, default 1024 tokens):

  • Tool results from older requests that exceed the threshold are replaced entirely with a short placeholder while keeping ToolID and ToolName
  • The current request and the latest ContextCompactionKeepRecentRequests completed requests are never affected
  • If ToolResultCompactionConfig.SkipRecentFunc returns a positive number, the request/invocation units that own those tail events are also treated as recent and skipped by Pass 1
  • This cleans up accumulated long tool outputs from earlier conversation turns

Pass 2 — Oversized tool result truncation (ContextCompactionOversizedToolResultMaxTokens, default 0 / disabled):

  • Applies to nearly all tool results including the current request. Tool results returned by session_load itself are skipped so recovered slices are not compacted again
  • Tool results exceeding this threshold are truncated using head+tail preservation: the beginning and end of the content are kept, with a [...N characters truncated...] marker in the middle
  • This is the safety net for single tool results large enough to overflow the context window on their own (e.g. web_fetch returning 800K+ chars of HTML)

The two passes have different roles: Pass 1 aggressively cleans old history (low threshold, full replacement); Pass 2 is a high-threshold guard that only kicks in for extreme cases but protects the current request too.

Pass 2 is disabled by default (0). It only fires when both (1) WithEnableContextCompaction(true) is set and (2) ContextCompactionOversizedToolResultMaxTokens > 0 (recommended opt-in value: 8192, exposed as the constant processor.DefaultContextCompactionOversizedToolResultMaxTokens). This guarantees that EnableContextCompaction=false always means "the framework will not modify any tool result".

Use WithToolResultCompactionConfig(...) when you need tool-name or recency policy:

  • ForceCleanToolNames: historical results from these tools are replaced with a policy placeholder whenever context compaction is enabled, after current/recent protection is applied. This is useful for noisy tools such as shell, grep, or log dump tools.
  • KeepToolNames: results from these tools are left untouched by context compaction. This is useful for recovery tools such as session_load and session_search when the model may need to read the exact payload.
  • SkipRecentFunc: customizes how many tail events are considered recent. It affects Pass 0 force-clean and Pass 1 historical classification; Pass 2 can still truncate oversized recent/current tool results.

If the same tool name appears in both ForceCleanToolNames and KeepToolNames, KeepToolNames wins.

When the compacted event has an event_id, placeholders and truncation markers include recovery hints such as event_id, tool_call_id, and tool_name. With WithEnableOnDemandSession(true) and a session backend that implements session.WindowService, the model can call session_load with content_offset / content_limit to reload a precise slice of the original tool result. session_load output size is controlled by its own window parameters and content_limit; reload very large results in slices instead of requesting the full payload at once.

Additionally:

  • If WithAddSessionSummary(true) is also enabled and the rebuilt request still approaches the model context window, the framework performs one synchronous CreateSessionSummary(...) retry before calling the model
  • Model-layer token tailoring remains the final fallback. It trims whole message rounds, so keep recovered slices small enough that they still fit in the final provider request
  • Context compaction uses SimpleTokenCounter by default. If your application uses a custom counter for CJK-heavy prompts or provider-specific tokenization, pass the same counter with WithContextCompactionTokenCounter(...) so Pass 1 decisions and Pass 2 truncation use the same estimate as token tailoring.
counter := model.NewSimpleTokenCounter(
    model.WithApproxRunesPerToken(1.6), // About 1.6 chars/token; this value is a divisor, not a multiplier.
)

modelInstance := openai.New(
    "deepseek-v4-flash",
    openai.WithEnableTokenTailoring(true),
    openai.WithTokenCounter(counter),
)

agent := llmagent.New(
    "my-agent",
    llmagent.WithModel(modelInstance),
    llmagent.WithAddSessionSummary(true),
    llmagent.WithEnableContextCompaction(true), // only shrinks tool results; does not generate a summary
    llmagent.WithContextCompactionThresholdRatio(0.7),
    llmagent.WithContextCompactionToolResultMaxTokens(1024),  // Pass 1: old tool results → placeholder
    llmagent.WithContextCompactionOversizedToolResultMaxTokens(8192),  // Pass 2: any huge result → head+tail
    llmagent.WithContextCompactionKeepRecentRequests(1),
    llmagent.WithContextCompactionTokenCounter(counter),
    llmagent.WithToolResultCompactionConfig(&llmagent.ToolResultCompactionConfig{
        ForceCleanToolNames: []string{"shell", "grep"},
        KeepToolNames:       []string{"session_load", "session_search"},
        SkipRecentFunc: func(events []event.Event) int {
            // For example, protect the request/invocation units that own the
            // last 3 events so an in-flight tool chain is not treated as old
            // history by Pass 1.
            return 3
        },
    }),
)

See examples/context_compaction for a full example. It calls a real model and prints the exact request sent to the model by default with -debug=true, which makes it easy to verify whether large historical tool result payloads were replaced with placeholders.

Context structure:

┌─────────────────────────────────────────┐
│ System Prompt                           │
│ (merged with Session Summary)           │ ← System prompt + compressed history
├─────────────────────────────────────────┤
│ Event 1 (after summary)                 │ ┐
│ Event 2                                 │ │
│ Event 3                                 │ │ New events after summary
│ ...                                     │ │ (fully retained)
│ Event N (current message)               │ ┘
└─────────────────────────────────────────┘

Model Compatibility:

Some LLM providers have strict requirements for system message placement and count:

  • Qwen3.5 series and similar models require the system message to be at the beginning and do not support multiple system messages
  • The default merging behavior prevents errors like System message must be at the beginning
  • Preloaded memory content is also merged into the system message using the same mechanism

Mode 2: Without Summary

llmagent.WithAddSessionSummary(false)
llmagent.WithMaxHistoryRuns(10)

How it works:

  • No summary message added
  • Only includes the most recent MaxHistoryRuns conversation turns
  • MaxHistoryRuns=0 means no limit, includes all history
  • If WithEnableContextCompaction(true) is enabled, oversized tool results in older retained requests can still be compacted during request projection (Pass 1). If you also explicitly set WithContextCompactionOversizedToolResultMaxTokens(8192) (or another positive value), extremely large tool results in any request (including the current one) will be head+tail truncated (Pass 2). Both passes require the EnableContextCompaction=true master switch.
  • The pre-LLM synchronous summary retry is disabled in this mode

Context structure:

1
2
3
4
5
6
7
8
┌─────────────────────────────────────────┐
│ System Prompt                           │
├─────────────────────────────────────────┤
│ Event N-k+1                             │ ┐
│ Event N-k+2                             │ │ Last k runs
│ ...                                     │ │ (MaxHistoryRuns=k)
│ Event N (current message)               │ ┘
└─────────────────────────────────────────┘

Mode Selection Guide

Scenario Recommended Config Description
Long sessions (support, assistant) AddSessionSummary=true Maintain full context, optimize tokens
Short sessions (single consultation) AddSessionSummary=false
MaxHistoryRuns=10
Simple and direct, no summary overhead
Debug/Test AddSessionSummary=false
MaxHistoryRuns=5
Quick validation, reduce noise
High concurrency AddSessionSummary=true
Increase worker count
Async processing, no impact on response speed

If your long sessions frequently contain large tool outputs such as search results, logs, or code scan output, enable EnableContextCompaction=true. Pair it with AddSessionSummary=true when you also want the pre-LLM synchronous summary retry.

Tip: If your agent uses tools like web_fetch that can return extremely large results in a single call, ContextCompactionOversizedToolResultMaxTokens is particularly valuable — it prevents a single tool result from consuming the entire context window, even when that result belongs to the current (protected) request. It is disabled by default; opt in by enabling WithEnableContextCompaction(true) and passing a positive threshold (recommended: 8192).

Summary Format Customization

By default, session summaries are formatted with context tags and a note about prioritizing current conversation information:

Default format:

1
2
3
4
5
6
7
Here is a brief summary of your previous interactions:

<summary_of_previous_interactions>
[Summary content]
</summary_of_previous_interactions>

Note: this information is from previous interactions and may be outdated. You should ALWAYS prefer information from this conversation over the past summary.

You can use WithSummaryFormatter to customize the summary format:

1
2
3
4
5
6
7
8
agent := llmagent.New(
    "my-agent",
    llmagent.WithModel(modelInstance),
    llmagent.WithAddSessionSummary(true),
    llmagent.WithSummaryFormatter(func(summary string) string {
        return fmt.Sprintf("## Previous Context\n\n%s", summary)
    }),
)

Use cases:

  • Simplified format: Use concise titles and minimal context hints to reduce token consumption
  • Language localization: Translate context hints to the target language
  • Role-specific format: Provide different formats for different Agent roles
  • Model optimization: Adjust format based on specific model preferences

Retrieving Summaries

// Get full session summary (default)
summaryText, found := sessionService.GetSessionSummaryText(ctx, sess)
if found {
    fmt.Printf("Summary: %s\n", summaryText)
}

// Get summary for a specific filter key
userSummary, found := sessionService.GetSessionSummaryText(
    ctx, sess, session.WithSummaryFilterKey("user-messages"),
)
if found {
    fmt.Printf("User message summary: %s\n", userSummary)
}

Filter Key support:

  • When no option is provided, returns the full session summary (SummaryFilterKeyAllContents)
  • When a specific filter key is provided but not found, falls back to the full session summary
  • If neither exists, falls back to any available summary

Summary by Event Type

In practice, you may want to generate independent summaries for different types of events.

Setting FilterKey with AppendEventHook

sessionService := inmemory.NewSessionService(
    inmemory.WithAppendEventHook(func(ctx *session.AppendEventContext, next func() error) error {
        prefix := "my-app/"
        switch ctx.Event.Author {
        case "user":
            ctx.Event.FilterKey = prefix + "user-messages"
        case "tool":
            ctx.Event.FilterKey = prefix + "tool-calls"
        default:
            ctx.Event.FilterKey = prefix + "misc"
        }
        return next()
    }),
)

FilterKey Prefix Convention

Important: FilterKey must include the appName + "/" prefix.

Reason: The Runner uses appName + "/" as the filter prefix when filtering events. If the FilterKey doesn't have this prefix, events will be filtered out.

1
2
3
4
5
// Correct: with appName prefix
evt.FilterKey = "my-app/user-messages"

// Wrong: no prefix, events will be filtered out
evt.FilterKey = "user-messages"

Generating Summaries by Type

1
2
3
4
5
6
7
8
9
// Generate summary for user messages
err := sessionService.CreateSessionSummary(ctx, sess, "my-app/user-messages", false)

// Generate summary for tool calls
err := sessionService.CreateSessionSummary(ctx, sess, "my-app/tool-calls", false)

// Get summary for a specific type
userSummary, found := sessionService.GetSessionSummaryText(
    ctx, sess, session.WithSummaryFilterKey("my-app/user-messages"))

Restricting Summary Targets

By default, when a non-empty branch FilterKey triggers summarization, the session service refreshes both that branch summary and the full-session summary (SummaryFilterKeyAllContents). If some branches do not need summaries, you can reduce LLM usage with an allowlist and optionally disable the full-session cascade:

1
2
3
4
5
6
7
8
sessionService := inmemory.NewSessionService(
    inmemory.WithSummarizer(summarizer),
    inmemory.WithSummaryFilterAllowlist(
        "my-app/user-messages",
        "my-app/tool-calls",
    ),
    inmemory.WithCascadeFullSessionSummary(false),
)

Behavior notes:

  • WithSummaryFilterAllowlist(...) only controls non-empty branch summary targets. It does not block session.SummaryFilterKeyAllContents.
  • WithCascadeFullSessionSummary(...) controls whether a non-empty branch trigger also refreshes the full-session summary.
  • To keep only full-session summaries from branch-triggered automatic summary, pass an explicit empty allowlist and leave cascade enabled:
1
2
3
4
5
sessionService, err := mysql.NewService(
    mysql.WithMySQLClientDSN(dsn),
    mysql.WithSummarizer(summarizer),
    mysql.WithSummaryFilterAllowlist(""),
)
  • mysql.WithSummaryFilterAllowlist("") and mysql.WithSummaryFilterAllowlist() both mean "no branch keys are allowed"; with the default cascade behavior, the full-session summary still refreshes.
  • If you also set mysql.WithCascadeFullSessionSummary(false), non-empty branch triggers have no summary target and no summary is generated.
  • Allowlist matching is hierarchical and segment-aware, not a raw string prefix check. Internally the framework appends the filter-key delimiter ("/") to both sides and then checks whether either key is an ancestor/descendant of the other.
  • Examples:
    • Allowing my-app/tool matches my-app/tool and my-app/tool/search.
    • Allowing my-app/tool/search also matches my-app/tool.
    • Allowing my-app/tool does not match my-app/toolbox.
    • Allowing my-app/tool does not match other-app/tool.
  • session.SummaryFilterKeyAllContents remains available for direct full-session summaries even when an allowlist is configured.
  • Leaving the allowlist unset preserves the legacy behavior and allows every branch FilterKey to trigger summaries.
  • Passing an explicit empty allowlist blocks branch summary targets; with cascade enabled, branch triggers still refresh the full-session summary.

How It Works

  1. Incremental processing: The summarizer tracks the last summary time for each session; subsequent runs only process events after the last summary
  2. Incremental summary: New events are combined with the previous summary to generate an updated summary containing both old context and new information
  3. Trigger condition evaluation: Before generating a summary, configured trigger conditions are evaluated. If conditions are not met and force=false, summarization is skipped
  4. Async workers: Summary tasks are distributed to multiple worker goroutines using a hash-based distribution strategy, ensuring tasks for the same session are processed in order
  5. Fallback mechanism: If async enqueue fails (queue full, context cancelled, or workers not initialized), the system automatically falls back to synchronous processing

Best Practices

  1. Choose appropriate thresholds: Use WithContextThreshold for agents whose model can change at runtime, and use WithTokenThreshold when you intentionally want a fixed token budget. For custom or tenant-provided models, prefer per-model WithContextWindow or per-run agent.WithModelContextWindow; use global registration only for stable process-wide model names
  2. Use async processing: Always use EnqueueSummaryJob instead of CreateSessionSummary in production to avoid blocking conversation flow
  3. Monitor queue size: If you frequently see "queue is full" warnings, increase WithSummaryQueueSize or WithAsyncSummaryNum
  4. Customize prompts: Tailor summary prompts to your application needs. For example, if building a customer support Agent, focus on key issues and solutions
  5. Balance word limits: Set WithMaxSummaryWords to balance context preservation and token usage. Typical range is 100-300 words
  6. Test trigger conditions: Experiment with different WithChecksAny and WithChecksAll combinations to find the optimal balance between summary frequency and cost

Performance Considerations

  • LLM cost: Each summary generation calls the LLM. Monitor trigger conditions to balance cost and context preservation
  • Memory usage: Summaries are stored alongside events. Configure appropriate TTL to manage memory in long-running sessions
  • Async workers: More workers increase throughput but consume more resources. Start with 2-4 workers and scale based on load
  • Queue capacity: Adjust queue size based on expected concurrency and summary generation time

Complete Example

Here is a complete example demonstrating how all components work together:

package main

import (
    "context"
    "time"

    "trpc.group/trpc-go/trpc-agent-go/agent/llmagent"
    "trpc.group/trpc-go/trpc-agent-go/model"
    "trpc.group/trpc-go/trpc-agent-go/model/openai"
    "trpc.group/trpc-go/trpc-agent-go/runner"
    "trpc.group/trpc-go/trpc-agent-go/session/inmemory"
    "trpc.group/trpc-go/trpc-agent-go/session/summary"
)

func main() {
    ctx := context.Background()

    // Create LLM model for chat and summary
    llm := openai.New("gpt-4", openai.WithAPIKey("your-api-key"))

    // Create summarizer with flexible trigger conditions
    summarizer := summary.NewSummarizer(
        llm,
        summary.WithMaxSummaryWords(200),
        summary.WithChecksAny(
            summary.CheckEventThreshold(20),
            summary.CheckTokenThreshold(4000),
            summary.CheckTimeThreshold(5*time.Minute), // Evaluated on summary check; compares the checked session's last event (normally the latest unsummarized event in delta flow)
        ),
    )

    // Create session service with summarizer
    sessionService := inmemory.NewSessionService(
        inmemory.WithSummarizer(summarizer),
        inmemory.WithAsyncSummaryNum(2),
        inmemory.WithSummaryQueueSize(100),
        inmemory.WithSummaryJobTimeout(60*time.Second),
    )

    // Create agent with summary injection enabled
    agent := llmagent.New(
        "my-agent",
        llmagent.WithModel(llm),
        llmagent.WithAddSessionSummary(true),
        llmagent.WithMaxHistoryRuns(10),
    )

    // Create runner
    r := runner.NewRunner("my-app", agent,
        runner.WithSessionService(sessionService))

    // Run conversation - summary will be managed automatically
    userMsg := model.NewUserMessage("Tell me about AI")
    eventChan, _ := r.Run(ctx, "user123", "session456", userMsg)

    // Consume events
    for event := range eventChan {
        _ = event
    }
}

References