Config reference: embedding

  • Enterprise tuning surface: defaults and constraints are rendered directly from Pydantic.
  • Env keys when available: many fields have an env-style alias (from TriBridConfig.to_flat_dict()).
  • Tooltip-level guidance: if a matching glossary entry exists, you'll see deeper tuning notes.


Total parameters: 18

Group index
  • (root)

(root)

| JSON key | Env key(s) | Type | Default | Constraints | Summary |
|---|---|---|---|---|---|
| embedding.auto_set_dimensions | — | bool | true | — | When true, the UI auto-syncs embedding_dim from data/models.json when the model changes. |
| embedding.contextual_chunk_embeddings | — | Literal["off", "prepend_context", "late_chunking_local_only"] | "off" | allowed: "off", "prepend_context", "late_chunking_local_only" | Contextual chunk embedding mode. 'late_chunking_local_only' requires a local/HF provider backend. |
| embedding.embed_text_prefix | — | str | "" | — | Prefix added before chunk text prior to embedding (stable document context). |
| embedding.embed_text_suffix | — | str | "" | — | Suffix added after chunk text prior to embedding. |
| embedding.embedding_backend | — | Literal["deterministic", "provider"] | "deterministic" | allowed: "deterministic", "provider" | Embedding execution backend. 'deterministic' is offline/test-friendly; 'provider' calls real providers. |
| embedding.embedding_batch_size | EMBEDDING_BATCH_SIZE | int | 64 | ≥ 1, ≤ 256 | Batch size for embedding generation. |
| embedding.embedding_cache_enabled | EMBEDDING_CACHE_ENABLED | int | 1 | ≥ 0, ≤ 1 | Enable embedding cache. |
| embedding.embedding_dim | EMBEDDING_DIM | int | 3072 | ≥ 128, ≤ 4096 | Embedding dimensions. |
| embedding.embedding_max_tokens | EMBEDDING_MAX_TOKENS | int | 8000 | ≥ 512, ≤ 8192 | Max tokens per embedding chunk. |
| embedding.embedding_model | EMBEDDING_MODEL | str | "text-embedding-3-large" | — | OpenAI embedding model. |
| embedding.embedding_model_local | EMBEDDING_MODEL_LOCAL | str | "all-MiniLM-L6-v2" | — | Local SentenceTransformer model. |
| embedding.embedding_model_mlx | EMBEDDING_MODEL_MLX | str | "mlx-community/all-MiniLM-L6-v2-4bit" | — | MLX-optimized embedding model (used when embedding_type=mlx). |
| embedding.embedding_retry_max | EMBEDDING_RETRY_MAX | int | 3 | ≥ 1, ≤ 5 | Max retries for embedding API calls. |
| embedding.embedding_timeout | EMBEDDING_TIMEOUT | int | 30 | ≥ 5, ≤ 120 | Embedding API timeout (seconds). |
| embedding.embedding_type | EMBEDDING_TYPE | str | "openai" | — | Embedding provider (dynamic; validated against models.json at runtime). |
| embedding.input_truncation | — | Literal["error", "truncate_end", "truncate_middle"] | "truncate_end" | allowed: "error", "truncate_end", "truncate_middle" | What to do when text exceeds embedding/token limits. |
| embedding.late_chunking_max_doc_tokens | — | int | 8192 | ≥ 256, ≤ 65536 | Max tokens per document segment for local late chunking. |
| embedding.voyage_model | VOYAGE_MODEL | str | "voyage-code-3" | — | Voyage embedding model. |
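The env keys in the table map directly onto process environment variables. A minimal sketch of how such overrides could be applied, with the same bounds Pydantic enforces (the helper and its spec table are hypothetical, not the project's actual loader; the key names and constraints come from the table above):

```python
import os

# Hypothetical override loader: coerce env-var strings to the field's type
# and reject values outside the constraints listed in the table.
ENV_SPECS = {
    # env key: (json key, type, min, max)
    "EMBEDDING_BATCH_SIZE": ("embedding.embedding_batch_size", int, 1, 256),
    "EMBEDDING_DIM": ("embedding.embedding_dim", int, 128, 4096),
    "EMBEDDING_MAX_TOKENS": ("embedding.embedding_max_tokens", int, 512, 8192),
    "EMBEDDING_RETRY_MAX": ("embedding.embedding_retry_max", int, 1, 5),
    "EMBEDDING_TIMEOUT": ("embedding.embedding_timeout", int, 5, 120),
}

def apply_env_overrides(config: dict, environ=os.environ) -> dict:
    """Return a copy of `config` with any env-key overrides applied."""
    out = dict(config)
    for env_key, (json_key, typ, lo, hi) in ENV_SPECS.items():
        raw = environ.get(env_key)
        if raw is None:
            continue
        value = typ(raw)
        if not (lo <= value <= hi):
            raise ValueError(f"{env_key}={value} violates bounds {lo}..{hi}")
        out[json_key] = value
    return out

cfg = apply_env_overrides({"embedding.embedding_batch_size": 64},
                          environ={"EMBEDDING_BATCH_SIZE": "128"})
```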

Details (glossary)

embedding.embedding_batch_size (EMBEDDING_BATCH_SIZE) — Embedding Batch Size

Category: embedding

Number of text chunks to embed in a single API call or local batch during indexing. Higher values (50-200) speed up indexing by reducing API round trips but may hit rate limits or memory constraints. Lower values (10-30) are safer but slower. For OpenAI/Voyage APIs, batching significantly reduces total indexing time. For local models, larger batches improve GPU utilization but require more VRAM. If indexing fails with rate limit or OOM errors, reduce this value.

Recommended: 100-150 for API providers, 16-32 for local models (GPU), 4-8 for CPU-only.
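To make the trade-off concrete, batching logic along these lines is all this setting controls (illustrative sketch; `embed_batch` is a stand-in for a real provider call):

```python
# Split chunks into batches of embedding_batch_size before calling the
# embedding backend; each batch is one API round trip.
def batched(texts, batch_size=64):
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

def embed_all(texts, embed_batch, batch_size=64):
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors

# Fake backend for demonstration: "embeds" each text as its length.
vectors = embed_all([f"chunk {i}" for i in range(10)],
                    lambda batch: [len(t) for t in batch],
                    batch_size=4)
```

With `batch_size=4`, the ten chunks above go out in three round trips instead of ten.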

Badges: - Performance tuning - Watch rate limits

Links: - OpenAI Batch Embedding - Rate Limits - GPU Memory Management

embedding.embedding_cache_enabled (EMBEDDING_CACHE_ENABLED) — Embedding Cache

Category: embedding

Cache embedding API results to disk to avoid re-computing vectors for identical text. Reduces API costs and speeds up reindexing. Disable only for debugging or when embeddings change frequently.
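The idea can be sketched as a content-addressed disk cache, keyed by a hash of (model, text) so identical text never triggers a second API call (illustrative only; the project's actual cache layout is not specified here):

```python
import hashlib
import json
import tempfile
from pathlib import Path

class EmbeddingCache:
    """Minimal disk cache: one JSON file per (model, text) pair."""

    def __init__(self, cache_dir):
        self.dir = Path(cache_dir)
        self.dir.mkdir(parents=True, exist_ok=True)

    def _path(self, model, text):
        digest = hashlib.sha256(f"{model}\x00{text}".encode()).hexdigest()
        return self.dir / f"{digest}.json"

    def get_or_compute(self, model, text, compute):
        path = self._path(model, text)
        if path.exists():
            return json.loads(path.read_text())  # cache hit: no API call
        vector = compute(text)                   # cache miss: compute + store
        path.write_text(json.dumps(vector))
        return vector

cache = EmbeddingCache(tempfile.mkdtemp())
calls = []
def fake_embed(text):
    calls.append(text)
    return [float(len(text))]

v1 = cache.get_or_compute("text-embedding-3-large", "def foo(): pass", fake_embed)
v2 = cache.get_or_compute("text-embedding-3-large", "def foo(): pass", fake_embed)
```

Note the model name is part of the key: switching models invalidates the cache automatically rather than serving stale vectors.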

Links: - Caching Strategies - Embedding Best Practices

embedding.embedding_dim (EMBEDDING_DIM) — Embedding Dimension

Category: embedding

Vector dimensionality of the embedding model's output. Common local-model sizes: 384 (fast, lower quality), 768 (balanced), 1024 (best local quality); OpenAI's text-embedding-3-large produces 3072, matching this field's default. Larger dimensions capture more semantic nuance but increase Qdrant storage requirements and query latency. Must match your embedding model's output size. Changing this requires full reindexing: vectors of different dimensions are incompatible.
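A fail-fast guard along these lines (names are illustrative, not the project's API) catches the mismatch before incompatible vectors reach the store:

```python
# Hypothetical pre-upsert check: reject vectors whose length disagrees
# with the configured embedding_dim.
def check_dimensions(vector, embedding_dim=3072):
    if len(vector) != embedding_dim:
        raise ValueError(
            f"model produced {len(vector)}-dim vectors but "
            f"embedding_dim={embedding_dim}; update the config and reindex"
        )
    return vector

check_dimensions([0.0] * 3072)  # matches text-embedding-3-large's output size
```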

Badges: - Requires reindex - Affects storage

Links: - Vector Embeddings - Dimensionality Tradeoffs - Qdrant Vector Config

embedding.embedding_max_tokens (EMBEDDING_MAX_TOKENS) — Embedding Max Tokens

Category: embedding

Maximum token length for text chunks sent to embedding models during indexing. Text exceeding this length is truncated by the tokenizer. Most embedding models support 512-8192 tokens. Longer limits preserve more context per chunk but increase embedding cost and processing time. Shorter limits are faster and cheaper but may lose semantic context for large functions/classes. Balance based on your average code chunk size and model capabilities.

Recommended: 512 for most code (functions/methods), 1024 for documentation-heavy repos, 256 for ultra-fast indexing.
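What happens at the limit is governed by embedding.input_truncation (see the table above). A sketch of the three modes, using a whitespace split as a stand-in for a real tokenizer such as cl100k:

```python
# The policy logic behind embedding.input_truncation, applied to a token list.
def enforce_token_limit(tokens, max_tokens, mode="truncate_end"):
    if len(tokens) <= max_tokens:
        return tokens
    if mode == "error":
        raise ValueError(f"{len(tokens)} tokens exceeds limit {max_tokens}")
    if mode == "truncate_end":
        return tokens[:max_tokens]          # keep the beginning
    if mode == "truncate_middle":
        head = max_tokens // 2
        tail = max_tokens - head            # keep both ends, drop the middle
        return tokens[:head] + tokens[-tail:]
    raise ValueError(f"unknown mode: {mode}")

tokens = "a b c d e f g h".split()
kept = enforce_token_limit(tokens, 4, mode="truncate_middle")
```

'truncate_middle' is useful for code, where both the signature at the top and the return/closing logic at the bottom carry signal.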

Badges: - Affects cost - Context preservation

Links: - Tokenization Basics - OpenAI Token Limits - Voyage Limits

embedding.embedding_model (EMBEDDING_MODEL) — Embedding Model (OpenAI)

Category: embedding

OpenAI embedding model name when EMBEDDING_TYPE=openai. Current options: "text-embedding-3-small" (1536 dims, $0.02/1M tokens, fast), "text-embedding-3-large" (3072 dims, $0.13/1M tokens, highest quality), "text-embedding-ada-002" (legacy, 1536 dims, $0.10/1M tokens). The 3-series models also accept a dimensions parameter to shorten output vectors. Larger models improve semantic search quality but cost more and require more storage. Changing this requires full reindexing as embeddings are incompatible across models.

Recommended: text-embedding-3-small for most use cases, text-embedding-3-large for production systems demanding highest quality.
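Using the per-1M-token prices quoted above (verify against OpenAI's current pricing page before relying on them), a back-of-envelope cost estimate:

```python
# Prices as quoted in this glossary entry, in USD per 1M tokens.
PRICE_PER_1M_TOKENS = {
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
    "text-embedding-ada-002": 0.10,
}

def embedding_cost_usd(total_tokens, model="text-embedding-3-large"):
    """Estimated one-off cost of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_1M_TOKENS[model]

# e.g. indexing a 5M-token codebase on the large model:
cost = embedding_cost_usd(5_000_000, "text-embedding-3-large")
```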

Badges: - Requires reindex - Costs API calls

Links: - OpenAI Embeddings Guide - Embedding Models - Pricing Calculator

embedding.embedding_model_local (EMBEDDING_MODEL_LOCAL) — Local Embedding Model

Category: embedding

HuggingFace model name or local path when EMBEDDING_TYPE=local or mxbai. Popular options: "mixedbread-ai/mxbai-embed-large-v1" (1024 dims, excellent quality), "BAAI/bge-small-en-v1.5" (384 dims, fast), "sentence-transformers/all-MiniLM-L6-v2" (384 dims, lightweight). Local embeddings are free but slower than API-based options. Model is downloaded on first use and cached locally. Choose larger models (768-1024 dims) for quality or smaller (384 dims) for speed.

Recommended: mxbai-embed-large-v1 for best free quality, all-MiniLM-L6-v2 for resource-constrained environments.

Badges: - Free (no API) - Requires download

Links: - Sentence Transformers Models - HuggingFace Model Hub - MTEB Leaderboard

embedding.embedding_model_mlx (EMBEDDING_MODEL_MLX) — MLX Embedding Model

Category: embedding

MLX model identifier when EMBEDDING_TYPE=mlx. Runs locally on Apple Silicon via MLX/Metal for very fast embedding inference. Default: "mlx-community/all-MiniLM-L6-v2-4bit". The model is downloaded on first use and cached locally. Changing this requires a full reindex (embeddings are not comparable across models).

Badges: - Metal GPU - Free (no API) - Requires reindex

embedding.embedding_retry_max (EMBEDDING_RETRY_MAX) — Embedding Max Retries

Category: embedding

Retry attempts for failed embedding API calls during indexing. Higher values ensure indexing completes despite transient errors but slow down failure recovery. Typical: 2-3 retries.
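A generic retry-with-backoff sketch in the spirit of this setting (the project's actual retry logic may differ):

```python
import time

def with_retries(call, max_retries=3, base_delay=0.0):
    """Run `call`, retrying up to `max_retries` times with exponential backoff."""
    last_error = None
    for attempt in range(max_retries + 1):  # initial try + retries
        try:
            return call()
        except Exception as exc:
            last_error = exc
            time.sleep(base_delay * (2 ** attempt))
    raise last_error

# Simulated transient failure: fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = with_retries(flaky, max_retries=3)
```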

Links: - Error Handling - Retry Patterns

embedding.embedding_timeout (EMBEDDING_TIMEOUT) — Embedding Timeout

Category: embedding

Maximum seconds to wait for embedding API response. Similar to GEN_TIMEOUT but for embedding calls during indexing. Increase for large batches or slow networks. Typical: 30-60 seconds.
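A client-side bound like this can be sketched with a worker thread (illustrative only; real embedding clients usually pass the timeout down to the HTTP layer instead):

```python
import concurrent.futures
import time

def call_with_timeout(fn, timeout_s):
    """Run `fn` in a worker thread; raise TimeoutError if it takes too long."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)
        return future.result(timeout=timeout_s)

fast = call_with_timeout(lambda: "vectors", timeout_s=1.0)
try:
    call_with_timeout(lambda: time.sleep(0.5), timeout_s=0.05)  # too slow
    timed_out = False
except concurrent.futures.TimeoutError:
    timed_out = True
```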

Links: - API Timeouts - Embedding API

embedding.embedding_type (EMBEDDING_TYPE) — Embedding Provider

Category: embedding

Selects the embedding provider for dense vector search. Also determines the token counter used during code chunking, which affects chunk boundaries and splitting behavior.

  • openai — strong quality, paid (cl100k tokenizer)
  • voyage — strong retrieval, paid (voyage tokenizer)
  • mlx — Apple Silicon local embeddings via MLX/Metal (fast)
  • mxbai — OSS via SentenceTransformers
  • local — any HuggingFace SentenceTransformer model
  • gemini — Google Gemini embeddings

Note: Changing this setting affects both retrieval quality AND how code is split into chunks during indexing. A reindex is required after changing.

Badges: - Requires reindex - Affects chunking

Links: - OpenAI Embeddings - Voyage AI Embeddings - Google Gemini Embeddings - SentenceTransformers Docs

embedding.voyage_model (VOYAGE_MODEL) — Voyage Embedding Model

Category: embedding

Voyage AI embedding model when EMBEDDING_TYPE=voyage. Options: "voyage-code-3" (the default; code-optimized, 1024 dims by default with configurable output sizes), "voyage-code-2" (1536 dims, older code-optimized model), "voyage-3" (1024 dims, general-purpose, fast), "voyage-3-lite" (512 dims, budget option). Voyage models are specialized for code retrieval and often outperform OpenAI on technical queries. Code-specific models understand programming constructs, API patterns, and documentation better than general embeddings.

Recommended: voyage-code-3 for code-heavy repos, voyage-3 for mixed content (code + docs).

Badges: - Requires reindex - Code-optimized

Links: - Voyage Embeddings API - voyage-code-2 Details - Model Comparison