Config reference: evaluation

- Enterprise tuning surface: defaults and constraints are rendered directly from Pydantic.
- Env keys when available: many fields have an env-style alias (from `TriBridConfig.to_flat_dict()`).
- Tooltip-level guidance: if a matching glossary entry exists, you'll see deeper tuning notes.
Total parameters: 8
Group index
(root)
| JSON key | Env key(s) | Type | Default | Constraints | Summary |
|---|---|---|---|---|---|
| evaluation.baseline_path | BASELINE_PATH | str | "data/evals/eval_baseline.json" | — | Baseline results path |
| evaluation.eval_dataset_path | EVAL_DATASET_PATH | str | "data/evaluation_dataset.json" | — | Evaluation dataset path |
| evaluation.eval_multi_m | EVAL_MULTI_M | int | 10 | ≥ 1, ≤ 20 | Multi-query variants for evaluation |
| evaluation.ndcg_at_10_k | — | int | 10 | ≥ 1, ≤ 200 | K used for the ndcg_at_10 metric (default 10) |
| evaluation.precision_at_5_k | — | int | 5 | ≥ 1, ≤ 200 | K used for the precision_at_5 metric (default 5) |
| evaluation.recall_at_10_k | — | int | 10 | ≥ 1, ≤ 200 | K used for the recall_at_10 metric (default 10) |
| evaluation.recall_at_20_k | — | int | 20 | ≥ 1, ≤ 200 | K used for the recall_at_20 metric (default 20) |
| evaluation.recall_at_5_k | — | int | 5 | ≥ 1, ≤ 200 | K used for the recall_at_5 metric (default 5) |
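Since the defaults and constraints above are rendered from Pydantic, the group can be approximated as a model like the following. This is a hedged sketch only: the real `TriBridConfig` is not shown here, and the class name `EvaluationConfig` is an assumption, but the field names, defaults, and bounds match the table.

```python
from pydantic import BaseModel, Field


class EvaluationConfig(BaseModel):
    """Sketch of the `evaluation` group; names/defaults taken from the table above."""

    baseline_path: str = "data/evals/eval_baseline.json"
    eval_dataset_path: str = "data/evaluation_dataset.json"
    # Multi-query variants for evaluation, bounded 1..20.
    eval_multi_m: int = Field(10, ge=1, le=20)
    # Per-metric K values, each bounded 1..200.
    ndcg_at_10_k: int = Field(10, ge=1, le=200)
    precision_at_5_k: int = Field(5, ge=1, le=200)
    recall_at_5_k: int = Field(5, ge=1, le=200)
    recall_at_10_k: int = Field(10, ge=1, le=200)
    recall_at_20_k: int = Field(20, ge=1, le=200)


# Out-of-range values are rejected at construction time:
cfg = EvaluationConfig(eval_multi_m=5)
```

Constraint violations (e.g. `eval_multi_m=0`) raise a `ValidationError` instead of silently clamping, which is why the constraints column above can be trusted as a hard contract.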
Details (glossary)
evaluation.baseline_path (BASELINE_PATH) — Baseline Path
Category: general
BASELINE_PATH is where evaluation baselines are stored so retrieval and generation changes can be compared to a stable reference over time. A strong baseline captures both quality metrics and operational behavior, including ranking quality, grounding rate, latency, and abstention behavior. Store immutable run identifiers with dataset version and config hash so regressions can be traced to exact parameter changes. Without baseline discipline, tuning often produces short-term wins on narrow queries while silently degrading difficult slices that matter in production.
Badges:
- Evaluation

Links:
- GaRAGe: Grounded RAG Evaluation Benchmark (arXiv)
- LangSmith Evaluation
- MLflow Tracking
- Weights and Biases Experiment Tracking
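The baseline discipline described above, storing metrics together with a dataset version and a config hash, can be sketched with the standard library alone. The function name and record shape below are illustrative, not part of the actual baseline file format:

```python
import hashlib
import json


def baseline_record(config: dict, dataset_version: str, metrics: dict) -> dict:
    """Bundle metrics with a config hash and dataset version so a regression
    can be traced back to the exact parameter change that caused it.
    (Hypothetical helper; field names are illustrative.)"""
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    config_hash = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode("utf-8")
    ).hexdigest()[:12]
    return {
        "config_hash": config_hash,
        "dataset_version": dataset_version,
        "metrics": metrics,
    }


record = baseline_record(
    config={"eval_multi_m": 10, "recall_at_10_k": 10},
    dataset_version="v3",
    metrics={"ndcg_at_10": 0.61, "recall_at_10": 0.78},
)
```

Because the hash is computed over a canonical serialization, two runs with identical parameters always produce the same `config_hash`, and any parameter change produces a new one.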