
Config reference: reranking

  • Enterprise tuning surface


    Defaults + constraints are rendered directly from Pydantic.

  • Env keys when available


    Many fields have an env-style alias (from TriBridConfig.to_flat_dict()).

  • Tooltip-level guidance


    If a matching glossary entry exists, you’ll see deeper tuning notes.


Total parameters: 12

Group index
  • (root)

(root)

| JSON key | Env key(s) | Type | Default | Constraints | Summary |
| --- | --- | --- | --- | --- | --- |
| reranking.rerank_input_snippet_chars | RERANK_INPUT_SNIPPET_CHARS | int | 700 | ≥ 200, ≤ 2000 | Snippet chars for reranking input |
| reranking.reranker_cloud_model | RERANKER_CLOUD_MODEL | str | "rerank-v3.5" | | Cloud reranker model name when mode=cloud (Cohere: rerank-v3.5) |
| reranking.reranker_cloud_provider | RERANKER_CLOUD_PROVIDER | str | "cohere" | | Cloud reranker provider when mode=cloud (cohere, voyage, jina) |
| reranking.reranker_cloud_top_n | RERANKER_CLOUD_TOP_N | int | 50 | ≥ 1, ≤ 200 | Number of candidates to rerank (cloud mode) |
| reranking.reranker_mode | RERANKER_MODE | str | "none" | pattern=^(cloud\|local\|learning\|none)$ | Reranker mode: 'cloud' (Cohere/Voyage/Jina API), 'learning' (MLX Qwen3 LoRA learning reranker), 'none' (disabled). Legacy values 'local'/'hf' normalize to 'learning'. |
| reranking.reranker_timeout | RERANKER_TIMEOUT | int | 10 | ≥ 5, ≤ 60 | Reranker API timeout (seconds) |
| reranking.tribrid_reranker_alpha | TRIBRID_RERANKER_ALPHA | float | 0.7 | ≥ 0.0, ≤ 1.0 | Blend weight for reranker scores |
| reranking.tribrid_reranker_batch | TRIBRID_RERANKER_BATCH | int | 16 | ≥ 1, ≤ 128 | Reranker batch size |
| reranking.tribrid_reranker_maxlen | TRIBRID_RERANKER_MAXLEN | int | 512 | ≥ 128, ≤ 2048 | Max token length for reranker |
| reranking.tribrid_reranker_reload_on_change | TRIBRID_RERANKER_RELOAD_ON_CHANGE | int | 0 | ≥ 0, ≤ 1 | Hot-reload on model change |
| reranking.tribrid_reranker_reload_period_sec | TRIBRID_RERANKER_RELOAD_PERIOD_SEC | int | 60 | ≥ 10, ≤ 600 | Reload check period (seconds) |
| reranking.tribrid_reranker_topn | TRIBRID_RERANKER_TOPN | int | 50 | ≥ 10, ≤ 200 | Number of candidates to rerank (learning mode) |
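
The defaults and bounds in the table above can be checked before a deployment picks up env overrides. The sketch below is a stdlib-only stand-in, not the project's Pydantic model: the `load_reranking_env` helper and the dict layout are hypothetical, and only the key names, defaults, and constraints are taken from the table.

```python
import os
import re

# (default, min, max) per integer key, copied from the table above.
# The helper and this layout are illustrative, not TriBridConfig's API.
INT_FIELDS = {
    "RERANK_INPUT_SNIPPET_CHARS": (700, 200, 2000),
    "RERANKER_CLOUD_TOP_N": (50, 1, 200),
    "RERANKER_TIMEOUT": (10, 5, 60),
    "TRIBRID_RERANKER_BATCH": (16, 1, 128),
    "TRIBRID_RERANKER_MAXLEN": (512, 128, 2048),
    "TRIBRID_RERANKER_RELOAD_ON_CHANGE": (0, 0, 1),
    "TRIBRID_RERANKER_RELOAD_PERIOD_SEC": (60, 10, 600),
    "TRIBRID_RERANKER_TOPN": (50, 10, 200),
}
STR_DEFAULTS = {
    "RERANKER_CLOUD_MODEL": "rerank-v3.5",
    "RERANKER_CLOUD_PROVIDER": "cohere",
}
MODE_PATTERN = re.compile(r"^(cloud|local|learning|none)$")


def load_reranking_env(env=None):
    """Read reranking settings from env-style keys and enforce the documented bounds."""
    env = os.environ if env is None else env
    settings = {}

    for key, (default, low, high) in INT_FIELDS.items():
        value = int(env.get(key, default))
        if not low <= value <= high:
            raise ValueError(f"{key}={value} is outside [{low}, {high}]")
        settings[key] = value

    alpha = float(env.get("TRIBRID_RERANKER_ALPHA", 0.7))
    if not 0.0 <= alpha <= 1.0:
        raise ValueError(f"TRIBRID_RERANKER_ALPHA={alpha} is outside [0.0, 1.0]")
    settings["TRIBRID_RERANKER_ALPHA"] = alpha

    mode = env.get("RERANKER_MODE", "none")
    if not MODE_PATTERN.match(mode):
        raise ValueError(f"RERANKER_MODE={mode!r} does not match the documented pattern")
    settings["RERANKER_MODE"] = mode

    for key, default in STR_DEFAULTS.items():
        settings[key] = env.get(key, default)
    return settings


# Pass a plain dict to try overrides without touching the real environment.
settings = load_reranking_env({"RERANKER_MODE": "cloud", "RERANKER_TIMEOUT": "30"})
```

Passing a dict instead of reading `os.environ` directly keeps the check easy to run in tests; out-of-range values raise `ValueError` rather than being clamped, which is a design choice of this sketch.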

Details (glossary)

reranking.rerank_input_snippet_chars (RERANK_INPUT_SNIPPET_CHARS) — Rerank Snippet Length

Category: reranking

Maximum characters from each candidate chunk sent to the reranker. Keeps payloads within provider limits and focuses scoring on the most relevant prefix. Typical range: 400-1200 chars. Use 400-600 when providers reject long inputs or latency is critical; 800-1200 when answers depend on longer doc/context blocks. If set too low, quality drops from missing context; too high increases latency and rerank cost per request.
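
As a rough illustration of the prefix trimming described above, the following sketch cuts each candidate chunk to its first N characters before it would be sent to a reranker. The function name `build_rerank_inputs` is hypothetical, and plain character slicing is an assumption about how the snippet is taken.

```python
def build_rerank_inputs(chunks, snippet_chars=700):
    """Trim each candidate chunk to its first `snippet_chars` characters."""
    # Bounds mirror the documented constraint (>= 200, <= 2000).
    if not 200 <= snippet_chars <= 2000:
        raise ValueError("snippet_chars must be within 200-2000")
    return [chunk[:snippet_chars] for chunk in chunks]


chunks = ["def handler(request): ..." * 100, "short chunk"]
snippets = build_rerank_inputs(chunks, snippet_chars=400)
# Long chunks are cut to 400 characters; short chunks pass through unchanged.
```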

Badges: - Affects latency/cost - Context guardrail

Links: - Voyage reranker token limits - Cohere rerank context length

reranking.reranker_cloud_model (RERANKER_CLOUD_MODEL) — Cloud Model

Category: reranking

Provider-scoped rerank model id. Examples: rerank-v3.5 (cohere), rerank-2 (voyage), or any custom id you add. The model list comes from models.json; add entries there to surface more options in this picker.

Badges: - Provider-scoped

reranking.reranker_cloud_provider (RERANKER_CLOUD_PROVIDER) — Cloud Rerank Provider

Category: reranking

When RERANKER_MODE=cloud, specifies which API provider to use for reranking. Options: cohere, voyage, jina. Each provider has different pricing and model options—see models.json for available models. Requires the corresponding API key (COHERE_API_KEY, VOYAGE_API_KEY, etc.).

Badges: - Requires API key

Links: - Cohere Rerank - Voyage Rerank - Jina Rerank

reranking.reranker_cloud_top_n (RERANKER_CLOUD_TOP_N) — Cloud Reranker Top-N

Category: reranking

Maximum number of candidates to send to cloud reranking APIs (Cohere, Voyage, Jina). Cloud rerankers have rate limits and per-request pricing, so this setting is separate from the learning reranker top-N. Lower values reduce API costs and stay within rate limits. Higher values improve recall but increase costs per query.

  • Typical range: 20-100 candidates
  • Cost-conscious: 20-30 for budget limits
  • Balanced default: 50 for most workloads
  • High recall: 80-100 for exploratory queries
  • Note: Cloud providers meter reranking usage, so monitor costs when raising this value

Badges: - Cloud API costs - Rate limits apply

Links: - Cohere Rerank API - Voyage Rerank

reranking.reranker_mode (RERANKER_MODE) — Reranker Mode

Category: reranking

Controls which reranking approach is used.

  • none: Disabled. BM25 + vector fusion only (no reranker scoring).
  • learning: Trainable learning reranker (MLX Qwen3 LoRA).
  • cloud: External API reranking (Cohere, Voyage, Jina).

Legacy values:
  • local / hf: normalized to "learning" for backward compatibility.

Recommended: Start with "learning" if you want the system to adapt over time; use "cloud" for managed quality if you have an API budget.
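
The legacy-value handling described above can be pictured as a small lookup applied before validation. This is a sketch only: `normalize_reranker_mode` is a hypothetical name, and where the real normalization runs relative to Pydantic validation is not stated here.

```python
# Legacy names map to the current mode, per the description above.
LEGACY_MODES = {"local": "learning", "hf": "learning"}
VALID_MODES = {"none", "learning", "cloud"}


def normalize_reranker_mode(raw):
    """Map legacy mode names to 'learning' and reject anything unknown."""
    mode = LEGACY_MODES.get(raw, raw)
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported reranker mode: {raw!r}")
    return mode


mode = normalize_reranker_mode("hf")  # resolves to "learning"
```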

Badges: - Controls reranking behavior

reranking.reranker_timeout (RERANKER_TIMEOUT) — Reranker Timeout

Category: reranking

Timeout (seconds) for cloud reranker HTTP calls. Larger timeouts reduce false failures on slow providers; smaller timeouts fail fast when endpoints are slow or unreachable. Applies only to cloud backends.

Badges: - Reliability

reranking.tribrid_reranker_alpha (TRIBRID_RERANKER_ALPHA) — Reranker Blend Alpha

Category: general

Weight of the learning reranker score during final fusion. Higher alpha prioritizes pairwise reranker scoring; lower alpha relies more on initial hybrid retrieval (BM25 + dense). Typical range 0.6–0.8. Increasing alpha can improve ordering for nuanced queries but may surface false positives if your reranker is undertrained.
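
One way to read the blend weight is as a linear interpolation between the reranker score and the initial retrieval score. The exact fusion formula is not given here, so the sketch below is an assumption: it presumes both scores are already normalized to a comparable 0-1 scale, and `blend_scores` is a hypothetical helper.

```python
def blend_scores(retrieval_score, reranker_score, alpha=0.7):
    """Assumed linear blend: alpha weights the reranker, (1 - alpha) the retrieval score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be within 0.0-1.0")
    return alpha * reranker_score + (1.0 - alpha) * retrieval_score


# Candidate A: strong in retrieval, weak under the reranker.
# Candidate B: weaker in retrieval, strong under the reranker.
a_high = blend_scores(0.9, 0.2, alpha=0.7)
b_high = blend_scores(0.4, 0.8, alpha=0.7)
a_low = blend_scores(0.9, 0.2, alpha=0.2)
b_low = blend_scores(0.4, 0.8, alpha=0.2)
# With alpha=0.7, B outranks A; with alpha=0.2, A outranks B.
```

Under this assumed formula, raising alpha lets the reranker overturn the initial ordering more often, which matches the trade-off described above when the reranker is undertrained.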

Badges: - Affects ranking

Links: - Reciprocal Rank Fusion (RRF) - Hybrid Retrieval Concepts

reranking.tribrid_reranker_batch (TRIBRID_RERANKER_BATCH) — Reranker Batch Size (Inference)

Category: general

Batch size used when scoring candidates during rerank. Higher values score more candidates per model call, which usually reduces total rerank time but increases peak memory. If you see OOM or throttling, lower this value.
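
A minimal sketch of how a batch size splits the candidate list, assuming a hypothetical `score_batch` callable that scores a list of (query, text) pairs in one call. It is meant to show the memory and call-count trade-off, not the reranker's actual inference loop.

```python
def score_in_batches(pairs, score_batch, batch_size=16):
    """Score (query, text) pairs in chunks of `batch_size`, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    scores = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        scores.extend(score_batch(batch))
    return scores


# Stand-in scorer: records each batch size and scores by text length.
seen_batch_sizes = []

def fake_score_batch(batch):
    seen_batch_sizes.append(len(batch))
    return [float(len(text)) for _query, text in batch]

pairs = [("query", "x" * i) for i in range(40)]
scores = score_in_batches(pairs, fake_score_batch, batch_size=16)
# 40 candidates at batch size 16 are scored in three calls: 16, 16, and 8 pairs.
```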

Badges: - Tune for memory

Links: - Batching Techniques - Latency vs Throughput

reranking.tribrid_reranker_maxlen (TRIBRID_RERANKER_MAXLEN) — Reranker Max Sequence Length (Inference)

Category: general

Maximum token length for each (query, text) pair during live reranking. Larger values increase memory/cost and may not improve quality beyond ~256–384 tokens for code. Use higher values for long comments/docs; lower for tight compute budgets.

Badges: - Performance sensitive

Links: - Transformers Tokenization - Sequence Length vs Memory

reranking.tribrid_reranker_reload_on_change (TRIBRID_RERANKER_RELOAD_ON_CHANGE) — Reranker Auto-Reload

Category: general

Automatically reload the learning reranker artifact when training.tribrid_reranker_model_path changes during runtime (1=yes, 0=no). When enabled, the system detects adapter directory changes and hot-reloads the new weights without a server restart. Useful during development and in Training Studio workflows when promoting or swapping artifacts.

Recommended: 1 for development/testing, 0 for production deployments.
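
The interaction between the reload flag and the reload check period can be sketched as a periodic directory fingerprint check. Everything below is an assumption about the mechanism: the `AdapterWatcher` class, the use of file names, sizes, and modification times as the change signal, and the injected clock are all hypothetical.

```python
import os
import time


class AdapterWatcher:
    """Report when an adapter directory changed, checking at most once per period."""

    def __init__(self, model_path, period_sec=60, clock=time.monotonic):
        self.model_path = model_path
        self.period_sec = period_sec
        self._clock = clock
        self._last_check = clock()
        self._fingerprint = self._scan()

    def _scan(self):
        # Fingerprint top-level entries only: (name, mtime in ns, size).
        entries = []
        with os.scandir(self.model_path) as it:
            for entry in it:
                stat = entry.stat()
                entries.append((entry.name, stat.st_mtime_ns, stat.st_size))
        return tuple(sorted(entries))

    def should_reload(self):
        now = self._clock()
        if now - self._last_check < self.period_sec:
            return False  # Too soon since the last check; skip the directory scan.
        self._last_check = now
        current = self._scan()
        if current != self._fingerprint:
            self._fingerprint = current
            return True
        return False
```

A serving loop would call `should_reload()` on each request or tick and swap in the new weights when it returns `True`. A longer period means fewer directory scans but a slower pickup of a promoted artifact.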

Badges: - Development feature - Disable in production

Links: - Hot Reload Patterns

reranking.tribrid_reranker_topn (TRIBRID_RERANKER_TOPN) — Reranker Top-N

Category: general

Maximum number of candidates to pass through the reranker stage during retrieval. After hybrid fusion (BM25 + dense), the top-N candidates are reranked using pairwise scoring before final selection. Higher values (50-100) can improve quality by considering more candidates but increase reranking latency and compute cost. Lower values (20-30) are faster but can drop items that scored poorly in initial retrieval yet would rank highly after reranking.

Sweet spot: 40-60 for most use cases. Use 60-80 for complex queries where initial ranking may be noisy. Use 20-40 for tight latency budgets or when initial hybrid retrieval is already high-quality.

Note: MLX Qwen3 reranking can be significantly slower per candidate than smaller rerankers. If you care about interactive latency, start with 20–30 and measure.

  • Typical range: 20-80 candidates
  • Balanced default: 40-50 for most workloads
  • High recall: 60-80 for exploratory queries
  • Low latency: 20-30 for speed-critical apps
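
The effect of the top-N cut can be seen in a small sketch: only the N best fused candidates reach the reranker, so a candidate below the cut cannot be rescued however well the reranker would score it. The `rerank_top_n` helper and the toy scores are hypothetical, and returning only the reranked head is an assumption about what happens to the rest.

```python
def rerank_top_n(fused, rerank_score, top_n=50):
    """Rerank the `top_n` best fused candidates; `fused` holds (doc_id, fused_score)."""
    head = sorted(fused, key=lambda item: item[1], reverse=True)[:top_n]
    rescored = [(doc_id, rerank_score(doc_id)) for doc_id, _score in head]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


fused = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.1)]
# Toy reranker scores: "c" is the best match but sits third after fusion.
toy_scores = {"a": 0.10, "b": 0.50, "c": 0.99, "d": 0.05}

narrow = [doc_id for doc_id, _ in rerank_top_n(fused, toy_scores.get, top_n=2)]
wide = [doc_id for doc_id, _ in rerank_top_n(fused, toy_scores.get, top_n=3)]
# narrow never sees "c"; wide lets the reranker promote it to first place.
```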

Badges: - Advanced RAG tuning - Affects latency

Links: - Reranking in RAG - Hybrid Search + Rerank