PGVector (PostgreSQL + pgvector)
Example Code: examples/knowledge/vectorstores/postgres
PGVector is a vector store implementation based on PostgreSQL + pgvector extension, supporting hybrid retrieval (vector similarity + text relevance).
Basic Configuration
Configuration Options
Connection Configuration
PGVector supports two connection configuration methods (in priority order from high to low):
1. Direct Connection Configuration
2. Use Registered Instance
Reuse a PostgreSQL instance already registered in storage/postgres, suitable for scenarios where multiple components share the same database connection.
Priority Rules:
- WithPGVectorClientDSN() / WithHost()... > WithPostgresInstance()
- If multiple methods are specified simultaneously, the higher priority takes effect
| Option | Description | Default Value |
|---|---|---|
WithPGVectorClientDSN(dsn) |
PostgreSQL connection string | - |
WithHost(host) |
Database host address | "localhost" |
WithPort(port) |
Database port | 5432 |
WithUser(user) |
Database username | - |
WithPassword(password) |
Database password | - |
WithDatabase(database) |
Database name | "trpc_agent_go" |
WithTable(table) |
Table name | "documents" |
WithSSLMode(mode) |
SSL mode | "disable" |
WithPostgresInstance(name) |
Use registered PostgreSQL instance | - |
Vector Configuration
| Option | Description | Default Value |
|---|---|---|
WithIndexDimension(dim) |
Vector dimension (must match embedding model) | 1536 |
WithVectorIndexType(type) |
Vector index type (VectorIndexHNSW / VectorIndexIVFFlat) |
VectorIndexHNSW |
WithHNSWIndexParams(params) |
HNSW index parameters (M, EfConstruction) | M=16, EfConstruction=64 |
WithIVFFlatIndexParams(params) |
IVFFlat index parameters (Lists) | Lists=100 |
Hybrid Search Configuration
| Option | Description | Default Value |
|---|---|---|
WithEnableTSVector(enabled) |
Enable text search vector | true |
WithHybridSearchWeights(vector, text) |
Hybrid search weights (vector/text), only applies in Weighted mode | 0.7, 0.3 |
WithHybridFusionMode(mode) |
Hybrid search fusion mode (HybridFusionWeighted / HybridFusionRRF) |
HybridFusionWeighted |
WithRRFParams(params) |
RRF parameters (K and CandidateRatio), only applies in RRF mode | K=60, CandidateRatio=3 |
WithLanguageExtension(lang) |
Text tokenization language extension (e.g., zhparser/jieba) | "english" |
Search Configuration
| Option | Description | Default Value |
|---|---|---|
WithMaxResults(n) |
Default search result count | 10 |
WithDocBuilder(builder) |
Custom document builder | Default builder |
WithExtraOptions(opts...) |
Inject custom PostgreSQL ClientBuilder config, no need to care by default | - |
Field Mapping (Advanced)
| Option | Description | Default Value |
|---|---|---|
WithIDField(field) |
ID field name | "id" |
WithNameField(field) |
Name field name | "name" |
WithContentField(field) |
Content field name | "content" |
WithEmbeddingField(field) |
Embedding field name | "embedding" |
WithMetadataField(field) |
Metadata field name | "metadata" |
WithCreatedAtField(field) |
Created at field name | "created_at" |
WithUpdatedAtField(field) |
Updated at field name | "updated_at" |
Full-Text Search
PGVector supports full-text search (TSVector), which can be used for Keyword Search and Hybrid Search.
⚠️ Important: Keyword Search and Hybrid Search require enabling PostgreSQL full-text search functionality via
WithEnableTSVector(true).
Search Mode Support
| Search Mode | Requires TSVector | Description |
|---|---|---|
| Vector Search | ❌ | Uses vector index only |
| Keyword Search | ✅ Required | Depends on PostgreSQL tsvector full-text index |
| Hybrid Search | ✅ Required | Uses both vector index and tsvector index |
| Filter Search | ❌ | Metadata filtering only |
If WithEnableTSVector(true) is not enabled:
The system will automatically downgrade the search mode without error: - When attempting Keyword/Hybrid search → Automatically downgrades to Vector Search (if vector available) or Filter Search (if no vector) - INFO logs will be output indicating the downgrade reason
Note: The default call chain from SearchTool to VectorStore does not actively specify SearchModeKeyword, so Keyword Search is typically not triggered; instead, the default Vector or Hybrid search is used.
Search Modes and Score Normalization
💡 Note: This section covers score calculation details. Users typically don't need to worry about this. PGVector automatically handles score normalization for all search modes.
PGVector supports multiple search modes, all scores are normalized to the [0, 1] range, with higher scores indicating stronger relevance.
1. Vector Search
SQL Template (using subquery to avoid duplicate calculations):
Normalization Formula:
Mathematical Principle:
- PGVector <=> operator returns Cosine Distance: d ∈ [0, 2]
- d = 0: Vectors are identical
- d = 1: Vectors are orthogonal
- d = 2: Vectors are opposite
- Cosine Similarity: s = 1 - d ∈ [-1, 1]
- Normalized to [0, 1]: score = (s + 1) / 2 = (2 - d) / 2 = 1 - d/2
Examples:
- distance = 0.2 → score = 1 - 0.2/2 = 0.90 (highly similar)
- distance = 1.0 → score = 1 - 1.0/2 = 0.50 (orthogonal)
- distance = 1.8 → score = 1 - 1.8/2 = 0.10 (nearly opposite)
2. Keyword Search
SQL Template (using subquery to avoid duplicate calculations):
Normalization Formula:
c = 0.1 (sparseNormConstant)
Mathematical Principle:
- PostgreSQL ts_rank() returns raw text relevance score with unbounded range (typically [0, ∞))
- Uses hyperbolic normalization: f(x) = x / (x + c) to map to [0, 1)
- Parameter c controls sensitivity:
- Smaller c: More sensitive to small rank values, higher discrimination
- Larger c: More tolerant to large rank values, tends to saturate
Examples (c = 0.1):
- rank = 1.0 → score = 1.0 / 1.1 = 0.909
- rank = 0.5 → score = 0.5 / 0.6 = 0.833
- rank = 0.1 → score = 0.1 / 0.2 = 0.500
- rank = 0.01 → score = 0.01 / 0.11 = 0.091
Computation Flow Example:
Assume user query is "machine learning", and the database has a document with content "Machine learning enables intelligent systems..."
Core Functions:
- to_tsvector(): Text → searchable vector (tokenization, stemming)
- websearch_to_tsquery(): User query → search expression (supports "quotes", OR, -exclusion)
- @@: Match check (true/false)
- ts_rank(): Calculate relevance score
3. Hybrid Search
Hybrid Search supports two fusion modes, switchable via WithHybridFusionMode().
3a. Weighted Fusion (default)
SQL Template (using subquery to avoid duplicate calculations):
Normalization Formula:
w_v = 0.7, w_t = 0.3
Mathematical Principle:
- Calculate vector_score and text_score separately (as in the two modes above)
- Use linear weighted combination, weights configurable via WithHybridSearchWeights()
- Since both scores are in [0, 1] range and w_v + w_t = 1, the final hybrid_score ∈ [0, 1]
- Important: Documents without text match are not forcibly filtered out, because vector similarity has higher weight (0.7), even without text match, high-quality results may still be returned
Examples:
3b. Reciprocal Rank Fusion (RRF)
RRF is a rank-based fusion strategy that does not rely on score normalization. Instead, it computes the final score based on each document's rank in the sub-searches. It is suitable for scenarios where vector scores and text scores have very different scales and are hard to combine via direct weighting.
Execution Flow (parallel sub-queries + Go-level fusion):
RRF mode splits vector search and text search into two independent SQL queries executed in parallel, then performs RRF score fusion in Go code:
RRF Formula:
Parameters:
| Parameter | Description | Default |
|---|---|---|
K |
RRF constant, must be > 0. Smaller values give more weight to top-ranked results | 60 |
CandidateRatio |
Candidate multiplier, must be > 0. Each sub-search fetches limit × CandidateRatio candidates |
3 |
Effect of K:
- K=1: Rank 1 scores 1/2 = 0.500, rank 10 scores 1/11 ≈ 0.091, gap ~5.5x
- K=60 (default): Rank 1 scores 1/61 ≈ 0.016, rank 10 scores 1/70 ≈ 0.014, gap ~1.15x
K=60 is the widely used default in academic literature, making fusion more "democratic" by not over-favoring results that rank very high in a single ranking list.
Notes:
- In RRF mode, WithHybridSearchWeights() has no effect; RRF scores from both ranking lists are summed directly
- In RRF mode, MinScore has no effect. RRF scores are rank-based (max ~0.033 for K=60) and are incompatible with the [0,1] similarity score semantics
- The two sub-queries execute in parallel, latency = max(vector_latency, text_latency), more efficient than a single complex CTE
4. Filter Search
SQL Template:
Description:
- Pure metadata filtering, no vector or text similarity involved
- All results have score = 1.0 (as they all satisfy the filter conditions)
- Sorted by creation time in descending order