
Config reference: evaluation

  • Enterprise tuning surface: defaults and constraints are rendered directly from Pydantic.

  • Env keys when available: many fields have an env-style alias (from TriBridConfig.to_flat_dict()).

  • Tooltip-level guidance: if a matching glossary entry exists, you’ll see deeper tuning notes.
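Where a field has an env-style alias, it can be overridden from the process environment before launch. A minimal sketch (the exact precedence rules are defined by the config loader, not shown here):

```shell
# Hypothetical: override evaluation fields through their env aliases.
# Alias names are taken from the table below; any launch command is assumed.
export EVAL_MULTI_M=5
export BASELINE_PATH=data/evals/eval_baseline_v2.json
```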


Total parameters: 8

Group index
  • (root)

(root)

| JSON key | Env key(s) | Type | Default | Constraints | Summary |
|---|---|---|---|---|---|
| evaluation.baseline_path | BASELINE_PATH | str | "data/evals/eval_baseline.json" | | Baseline results path |
| evaluation.eval_dataset_path | EVAL_DATASET_PATH | str | "data/evaluation_dataset.json" | | Evaluation dataset path |
| evaluation.eval_multi_m | EVAL_MULTI_M | int | 10 | ≥ 1, ≤ 20 | Multi-query variants for evaluation |
| evaluation.ndcg_at_10_k | | int | 10 | ≥ 1, ≤ 200 | K used for the ndcg_at_10 metric |
| evaluation.precision_at_5_k | | int | 5 | ≥ 1, ≤ 200 | K used for the precision_at_5 metric |
| evaluation.recall_at_10_k | | int | 10 | ≥ 1, ≤ 200 | K used for the recall_at_10 metric |
| evaluation.recall_at_20_k | | int | 20 | ≥ 1, ≤ 200 | K used for the recall_at_20 metric |
| evaluation.recall_at_5_k | | int | 5 | ≥ 1, ≤ 200 | K used for the recall_at_5 metric |

Details (glossary)

evaluation.baseline_path (BASELINE_PATH) — Baseline Path

Category: general

BASELINE_PATH is where evaluation baselines are stored so retrieval and generation changes can be compared to a stable reference over time. A strong baseline captures both quality metrics and operational behavior, including ranking quality, grounding rate, latency, and abstention behavior. Store immutable run identifiers with dataset version and config hash so regressions can be traced to exact parameter changes. Without baseline discipline, tuning often produces short-term wins on narrow queries while silently degrading difficult slices that matter in production.
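The baseline discipline described above can be sketched as a pair of helpers: one that records a run with its identifiers, and one that flags metric regressions against the stored reference. The schema (run_id, dataset_version, config_hash, metrics) and function names are assumptions for illustration, not the actual baseline file format.

```python
import hashlib
import json

# Hypothetical baseline helpers; the field names are assumptions, not the
# real schema of the file at BASELINE_PATH.

def config_hash(config: dict) -> str:
    """Stable hash of a flat config dict, so regressions can be traced
    back to the exact parameter values that produced the baseline."""
    blob = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:12]

def make_baseline(run_id: str, dataset_version: str,
                  config: dict, metrics: dict) -> dict:
    """Bundle an immutable run identifier with dataset version,
    config hash, and the measured quality/operational metrics."""
    return {
        "run_id": run_id,
        "dataset_version": dataset_version,
        "config_hash": config_hash(config),
        # e.g. ndcg@10, recall@20, grounding rate, p95 latency
        "metrics": metrics,
    }

def regressions(baseline: dict, current: dict, tolerance: float = 0.01) -> dict:
    """Metrics that dropped by more than `tolerance` versus the baseline."""
    return {
        name: (ref, current[name])
        for name, ref in baseline["metrics"].items()
        if name in current and current[name] < ref - tolerance
    }
```

A gate like this can run after every tuning change, failing the run when `regressions()` is non-empty so narrow wins cannot silently degrade harder slices.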

Badges: Evaluation

Links:
  • GaRAGe: Grounded RAG Evaluation Benchmark (arXiv)
  • LangSmith Evaluation
  • MLflow Tracking
  • Weights and Biases Experiment Tracking