
Config reference: reranking

  • Enterprise tuning surface: defaults and constraints are rendered directly from Pydantic.

  • Env keys when available: many fields have an env-style alias (from TriBridConfig.to_flat_dict()).

  • Tooltip-level guidance: if a matching glossary entry exists, you’ll see deeper tuning notes.


Total parameters: 12

Group index
  • (root)

(root)

| JSON key | Env key(s) | Type | Default | Constraints | Summary |
| --- | --- | --- | --- | --- | --- |
| reranking.rerank_input_snippet_chars | RERANK_INPUT_SNIPPET_CHARS | int | 700 | ≥ 200, ≤ 2000 | Snippet chars for reranking input |
| reranking.reranker_cloud_model | RERANKER_CLOUD_MODEL | str | "rerank-v3.5" | | Cloud reranker model name when mode=cloud (Cohere: rerank-v3.5) |
| reranking.reranker_cloud_provider | RERANKER_CLOUD_PROVIDER | str | "cohere" | | Cloud reranker provider when mode=cloud (cohere, voyage, jina) |
| reranking.reranker_cloud_top_n | RERANKER_CLOUD_TOP_N | int | 50 | ≥ 1, ≤ 200 | Number of candidates to rerank (cloud mode) |
| reranking.reranker_mode | RERANKER_MODE | str | "none" | pattern=^(cloud\|local\|learning\|none)$ | Reranker mode: 'cloud' (Cohere/Voyage/Jina API), 'learning' (MLX Qwen3 LoRA learning reranker), 'none' (disabled). Legacy values 'local'/'hf' normalize to 'learning'. |
| reranking.reranker_timeout | RERANKER_TIMEOUT | int | 10 | ≥ 5, ≤ 60 | Reranker API timeout (seconds) |
| reranking.tribrid_reranker_alpha | TRIBRID_RERANKER_ALPHA | float | 0.7 | ≥ 0.0, ≤ 1.0 | Blend weight for reranker scores |
| reranking.tribrid_reranker_batch | TRIBRID_RERANKER_BATCH | int | 16 | ≥ 1, ≤ 128 | Reranker batch size |
| reranking.tribrid_reranker_maxlen | TRIBRID_RERANKER_MAXLEN | int | 512 | ≥ 128, ≤ 2048 | Max token length for reranker |
| reranking.tribrid_reranker_reload_on_change | TRIBRID_RERANKER_RELOAD_ON_CHANGE | int | 0 | ≥ 0, ≤ 1 | Hot-reload on model change |
| reranking.tribrid_reranker_reload_period_sec | TRIBRID_RERANKER_RELOAD_PERIOD_SEC | int | 60 | ≥ 10, ≤ 600 | Reload check period (seconds) |
| reranking.tribrid_reranker_topn | TRIBRID_RERANKER_TOPN | int | 50 | ≥ 10, ≤ 200 | Number of candidates to rerank (learning mode) |

Details (glossary)

reranking.rerank_input_snippet_chars (RERANK_INPUT_SNIPPET_CHARS) — Rerank Snippet Length

Category: reranking

RERANK_INPUT_SNIPPET_CHARS caps how many characters from each retrieved chunk are forwarded into reranker scoring. In implementation terms, this is a throughput and quality guardrail: smaller snippets reduce request size and latency, but risk truncating decisive evidence; larger snippets preserve context at the cost of higher tokenization load, longer inference, and potentially provider-side input-limit errors. The right value should be based on corpus structure and query style, then validated with offline ranking metrics plus p95 latency and cost tracking so you can find the smallest snippet size that preserves relevance quality.

Badges: - Affects latency/cost - Context guardrail

Links: - BAR-RAG: Boundary-Aware Adaptive Retrieval for Better Reranking (arXiv) - Cohere Rerank Overview - Voyage AI Reranker Docs - Hugging Face Padding and Truncation
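As a concrete sketch of this guardrail (the helper name and the word-boundary behavior are illustrative assumptions, not the project's actual implementation), snippet capping can look like:

```python
# Hypothetical helper mirroring RERANK_INPUT_SNIPPET_CHARS: cap each
# retrieved chunk at `max_chars` characters before reranker scoring.
def build_rerank_inputs(chunks: list[str], max_chars: int = 700) -> list[str]:
    snippets = []
    for text in chunks:
        if len(text) <= max_chars:
            snippets.append(text)
            continue
        # Prefer cutting at the last whitespace before the limit so a
        # word is not split mid-token; fall back to a hard character cut.
        cut = text.rfind(" ", 0, max_chars)
        snippets.append(text[:cut] if cut > 0 else text[:max_chars])
    return snippets
```

Note that the limit is in characters, not tokens, so providers with token-based input caps still need their own truncation downstream.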

reranking.reranker_cloud_model (RERANKER_CLOUD_MODEL) — Cloud Model

Category: reranking

Specifies the provider model ID used for cloud reranking, such as a Cohere, Voyage, or Jina reranker family variant. This parameter directly controls tradeoffs between multilingual support, context length handling, pricing, and latency. Model IDs are provider-scoped, so the same string is not portable across providers; keep explicit provider-model pairing in configuration and tests. When changing models, re-baseline ranking metrics and failure behavior because score distributions and calibration can shift materially even when APIs look identical.

Badges: - Provider-scoped

Links: - InsertRank: Bias Mitigation in Rerankers (arXiv 2025) - Cohere Models Documentation - Voyage AI Reranker Docs - Jina Reranker v2 Model Card
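Because model IDs are provider-scoped, it can help to validate the pairing at startup. The table below is illustrative only; model availability changes over time, so treat these IDs as assumptions to re-check against provider documentation:

```python
# Illustrative provider -> known-model table (an assumption to verify,
# not an authoritative catalogue).
KNOWN_RERANK_MODELS = {
    "cohere": {"rerank-v3.5", "rerank-english-v3.0"},
    "voyage": {"rerank-2", "rerank-2-lite"},
    "jina": {"jina-reranker-v2-base-multilingual"},
}

def validate_pairing(provider: str, model: str) -> None:
    """Fail fast at config-load time instead of at the first API call."""
    models = KNOWN_RERANK_MODELS.get(provider)
    if models is None:
        raise ValueError(f"unknown provider: {provider!r}")
    if model not in models:
        raise ValueError(f"{model!r} is not a known {provider} reranker model")
```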

reranking.reranker_cloud_provider (RERANKER_CLOUD_PROVIDER) — Cloud Rerank Provider

Category: reranking

Determines which external vendor handles reranking when cloud mode is enabled. Provider choice affects auth, rate limits, billing units, token limits, and model availability, so swapping providers is a behavior change, not just a credential change. Keep provider-specific defaults explicit (timeouts, top-N caps, retry policy) and validate with provider-specific regression queries. For production stability, monitor provider error classes separately so fallback rules can distinguish auth/config issues from transient throttling.

Badges: - Requires API key

Links: - HyperRAG: Hybrid Retrieval-Augmented Generation (arXiv 2025) - Cohere Reranking Guide - Voyage AI Reranker Docs - Jina Rerank Models via Elastic Open Inference API
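One way to keep provider error classes separable, sketched here with generic HTTP status buckets (the bucket names and retry semantics are assumptions, not the project's actual policy):

```python
# Bucket provider failures so fallback logic can tell misconfiguration
# (do not retry) from throttling (back off and retry).
def classify_http_status(status: int) -> str:
    if status in (401, 403):
        return "auth"          # bad key or wrong provider config
    if status == 429:
        return "throttled"     # rate limit: retry with backoff
    if 500 <= status < 600:
        return "transient"     # provider-side issue: safe to retry
    return "client_error"      # other 4xx: likely a request bug
```

Emitting these buckets as separate metrics makes it straightforward to alert on "auth" (a config incident) differently from "throttled" (a capacity issue).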

reranking.reranker_cloud_top_n (RERANKER_CLOUD_TOP_N) — Cloud Reranker Top-N

Category: reranking

Limits how many first-pass candidates are sent into the cloud reranker. This is the main quality-cost control for API reranking: higher Top-N usually improves final precision/recall at the expense of latency and request cost. Tune it jointly with first-stage retrieval depth; a small Top-N can hide relevant documents before reranking ever sees them, while an oversized Top-N can waste budget on obvious non-matches. Start from an empirically measured knee point (quality gain flattening vs latency growth) rather than a fixed default.

Badges: - Cloud API costs - Rate limits apply

Links: - RankFlow: Reranking Pipeline Optimization (arXiv 2025) - Cohere Rerank API (top_n parameter) - OpenSearch Rerank Processor - LangChain Contextual Compression Retriever
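Once you have a sweep of measured (top_n, quality) points, picking the knee can be automated; in this sketch (function name and the eps threshold are illustrative), the chosen value is the largest Top-N whose marginal quality gain still exceeds eps:

```python
# Hypothetical knee-point selection over an offline sweep of
# (top_n, quality-metric) measurements, e.g. nDCG@10.
def pick_top_n(sweep: list[tuple[int, float]], eps: float = 0.005) -> int:
    sweep = sorted(sweep)
    best = sweep[0][0]
    for (n_prev, q_prev), (n_next, q_next) in zip(sweep, sweep[1:]):
        if q_next - q_prev >= eps:
            best = n_next  # the extra candidates still pay for themselves
    return best
```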

reranking.reranker_mode (RERANKER_MODE) — Reranker Mode

Category: reranking

Global switch for reranking behavior, typically none, learning, or cloud. Use none for lowest latency baselines, learning for locally trainable behavior, and cloud for managed cross-encoder quality with external dependencies. Because this mode changes the scoring path after retrieval, it can change user-visible answers even when retrieval is identical. Lock this setting per environment and benchmark each mode against shared evaluation sets before promoting to production.

Badges: - Controls reranking behavior

Links: - MICE: Retrieval + Reranking Improvements (arXiv 2026) - Cohere Reranking Guide - Jina MLX Retrieval (Local Reranker Training/Serving) - OpenSearch Rerank Processor
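The legacy-value normalization described in the summary table ('local'/'hf' map to 'learning') can be sketched as follows; this is a minimal illustration, not the project's actual code:

```python
# Legacy mode values collapse into 'learning'; everything else must be
# one of the three live modes.
LEGACY_ALIASES = {"local": "learning", "hf": "learning"}

def normalize_mode(raw: str) -> str:
    mode = LEGACY_ALIASES.get(raw.lower(), raw.lower())
    if mode not in {"cloud", "learning", "none"}:
        raise ValueError(f"unsupported RERANKER_MODE: {raw!r}")
    return mode
```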

reranking.reranker_timeout (RERANKER_TIMEOUT) — Reranker Timeout

Category: reranking

Maximum wait time for cloud reranker requests before failing fast. This parameter protects end-to-end request latency and prevents queue pileups during provider slowdowns, but setting it too low can create false negatives under transient network variance. Tune the timeout together with retry policy and the user-facing SLA; a timeout alone is not enough without a fallback strategy (for example, fall back to first-stage ranking when the reranker times out). Track timeout rate by provider/model so you can distinguish systemic misconfiguration from temporary upstream degradation.

Badges: - Reliability

Links: - MICE: Retrieval Pipeline Robustness (arXiv 2026) - HTTPX Timeouts Guide - Cohere Rerank API Reference - Voyage AI Reranker Docs
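The fallback-on-timeout pattern can be as small as this sketch. Here `rerank_call` is a hypothetical stand-in for a provider client; real clients raise their own timeout exception types (e.g. httpx's), which you would catch instead of the builtin `TimeoutError`:

```python
# Fail fast on a slow reranker and degrade to the first-stage order
# rather than failing the whole request.
def rerank_with_fallback(query, candidates, rerank_call, timeout_s=10):
    try:
        return rerank_call(query, candidates, timeout=timeout_s)
    except TimeoutError:
        # Preserve the hybrid-retrieval ordering on timeout.
        return candidates
```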

reranking.tribrid_reranker_alpha (TRIBRID_RERANKER_ALPHA) — Reranker Blend Alpha

Category: general

Interpolation weight used when combining the reranker score with upstream hybrid retrieval score. In practical terms, this is the control for how much the final ranking trusts pairwise relevance modeling versus the broader BM25+dense candidate order. Raising alpha usually improves precision for well-formed queries, but if it is set too high the system can overfit to reranker biases and underweight lexical exact-match evidence. Tune it with fixed query sets and report both quality metrics (nDCG, MRR, grounded answer rate) and latency to avoid hidden regressions.

Badges: - Affects ranking

Links: - Rethinking the Reranker: Boundary-Aware Evidence Selection (arXiv 2026) - AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking (arXiv 2025) - Elasticsearch Reciprocal Rank Fusion (RRF) - SentenceTransformers Cross-Encoder Reranker Training
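A sketch of the interpolation, assuming both score lists are min-max normalized first so the weight is meaningful (the actual blend in the codebase may differ):

```python
# final = alpha * rerank_score + (1 - alpha) * retrieval_score,
# with both score lists min-max normalized to [0, 1] beforehand.
def blend(retrieval: list[float], rerank: list[float], alpha: float = 0.7) -> list[float]:
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    r, k = norm(retrieval), norm(rerank)
    return [alpha * kk + (1 - alpha) * rr for rr, kk in zip(r, k)]
```

Normalization matters here: raw reranker logits and BM25+dense fusion scores live on different scales, so blending them without rescaling makes alpha effectively meaningless.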

reranking.tribrid_reranker_batch (TRIBRID_RERANKER_BATCH) — Reranker Batch Size (Inference)

Category: general

Inference micro-batch size for reranker scoring over candidate documents. Larger batches can increase throughput and reduce per-item overhead on GPU, but memory pressure grows quickly with longer inputs and higher top-N. If this value is too aggressive you will see OOMs, allocator fragmentation, or latency spikes from retries and paging. Production tuning should sweep batch size jointly with max sequence length and candidate count, because these three parameters multiply into total token compute.

Badges: - Tune for memory

Links: - AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking (arXiv 2025) - PyTorch DataLoader Reference - SentenceTransformers Cross-Encoder Applications - Hugging Face Transformers Padding and Truncation
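Micro-batching itself is simple; the tuning difficulty is the memory product of batch size, sequence length, and candidate count. A minimal sketch, where `score_batch` is a hypothetical model call:

```python
# Score candidate pairs in fixed-size batches so peak memory stays
# bounded regardless of top-N.
def score_in_batches(pairs, score_batch, batch_size=16):
    scores = []
    for i in range(0, len(pairs), batch_size):
        scores.extend(score_batch(pairs[i : i + batch_size]))
    return scores
```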

reranking.tribrid_reranker_maxlen (TRIBRID_RERANKER_MAXLEN) — Reranker Max Sequence Length (Inference)

Category: general

Maximum token budget for each query-document pair at rerank time. This parameter directly controls truncation behavior: small values improve speed and memory, while large values preserve long-context evidence at higher cost. For code and technical retrieval, quality gains usually plateau after a certain length unless queries depend on long-range context. Evaluate max length using long-tail queries, because overly short truncation tends to hide failures that only appear on long files and verbose documentation.

Badges: - Performance sensitive

Links: - Query-focused and Memory-aware Reranker for Long Context Processing (arXiv 2026) - DeAR: Dual-Stage Document Reranking with Reasoning Agents (arXiv 2025) - Hugging Face Transformers Padding and Truncation - SentenceTransformers Cross-Encoder Applications
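To make the truncation tradeoff concrete, here is a token-budget sketch. Whitespace "tokens" stand in for a real tokenizer (an assumption for brevity); the query is kept whole and the document gets whatever budget remains, which mirrors maxlen-style truncation for query-document pairs:

```python
# Split a fixed token budget between query and document, truncating
# only the document side.
def fit_pair(query: str, doc: str, max_tokens: int = 512) -> tuple[str, str]:
    q_toks = query.split()
    d_budget = max(0, max_tokens - len(q_toks))
    return " ".join(q_toks), " ".join(doc.split()[:d_budget])
```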

reranking.tribrid_reranker_reload_on_change (TRIBRID_RERANKER_RELOAD_ON_CHANGE) — Reranker Auto-Reload

Category: general

Toggles hot-reload behavior when the reranker model path changes at runtime. In development this shortens iteration loops because newly trained adapters can be activated without restarting the service. In production, uncontrolled auto-reload can introduce jitter, temporary cache invalidation, and model consistency issues across replicas. If enabled, pair it with health checks and staged rollout logic so reload events do not degrade retrieval latency or answer stability.

Badges: - Development feature - Disable in production

Links: - AcuRank: Uncertainty-Aware Adaptive Computation for Listwise Reranking (arXiv 2025) - watchfiles Documentation (file change monitoring) - PEFT Checkpoint Format and Loading - Transformers from_pretrained() Model Loading

reranking.tribrid_reranker_topn (TRIBRID_RERANKER_TOPN) — Reranker Top-N

Category: general

Upper bound on how many retrieved candidates are passed into the reranker stage. Higher Top-N usually improves recall headroom because more borderline candidates are reconsidered, but reranker cost grows roughly linearly with this value. If set too low, relevant documents never reach reranking; if set too high, latency and GPU utilization can explode for little quality gain. Choose Top-N by plotting quality-latency curves and selecting the smallest value that keeps recall stable on hard queries.

Badges: - Advanced RAG tuning - Affects latency

Links: - Rethinking the Reranker: Boundary-Aware Evidence Selection (arXiv 2026) - DeAR: Dual-Stage Document Reranking with Reasoning Agents (arXiv 2025) - Qdrant Reranking and Hybrid Search - SentenceTransformers Cross-Encoder Reranker Training
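Since Top-N must be tuned jointly with first-stage retrieval depth, a small guardrail (illustrative names; real code would use structured logging rather than print) can flag the case where depth, not Top-N, is the binding constraint:

```python
# The reranker can only reconsider what first-stage retrieval actually
# returned, so warn when retrieval depth cannot fill top-N.
def take_rerank_candidates(ranked_ids: list, top_n: int = 50) -> list:
    if len(ranked_ids) < top_n:
        print(f"warning: only {len(ranked_ids)} candidates for top_n={top_n}; "
              "increase first-stage retrieval depth")
    return ranked_ids[:top_n]
```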