
Config reference: tracing

  • Enterprise tuning surface: defaults and constraints are rendered directly from Pydantic.

  • Env keys when available: many fields have an env-style alias (from TriBridConfig.to_flat_dict()).

  • Tooltip-level guidance: if a matching glossary entry exists, you’ll see deeper tuning notes.


Total parameters: 17

Group index
  • (root)

(root)

| JSON key | Env key(s) | Type | Default | Constraints | Summary |
|---|---|---|---|---|---|
| tracing.alert_include_resolved | ALERT_INCLUDE_RESOLVED | int | 1 | ≥ 0, ≤ 1 | Include resolved alerts |
| tracing.alert_notify_severities | ALERT_NOTIFY_SEVERITIES | str | "critical,warning" | | Alert severities to notify |
| tracing.alert_webhook_timeout | ALERT_WEBHOOK_TIMEOUT | int | 5 | ≥ 1, ≤ 30 | Alert webhook timeout (seconds) |
| tracing.langchain_endpoint | LANGCHAIN_ENDPOINT | str | "https://api.smith.langchain.com" | | LangChain/LangSmith API endpoint |
| tracing.langchain_project | LANGCHAIN_PROJECT | str | "tribrid" | | LangChain project name |
| tracing.langchain_tracing_v2 | LANGCHAIN_TRACING_V2 | int | 0 | ≥ 0, ≤ 1 | Enable LangChain v2 tracing |
| tracing.langtrace_api_host | LANGTRACE_API_HOST | str | "" | | LangTrace API host |
| tracing.langtrace_project_id | LANGTRACE_PROJECT_ID | str | "" | | LangTrace project ID |
| tracing.log_level | LOG_LEVEL | str | "INFO" | pattern=^(DEBUG\|INFO\|WARNING\|ERROR)$ | Logging level |
| tracing.metrics_enabled | METRICS_ENABLED | int | 1 | ≥ 0, ≤ 1 | Enable metrics collection |
| tracing.prometheus_port | PROMETHEUS_PORT | int | 9090 | ≥ 1024, ≤ 65535 | Prometheus metrics port |
| tracing.trace_auto_ls | TRACE_AUTO_LS | int | 1 | ≥ 0, ≤ 1 | Auto-enable LangSmith tracing |
| tracing.trace_retention | TRACE_RETENTION | int | 50 | ≥ 10, ≤ 500 | Number of traces to retain |
| tracing.trace_sampling_rate | TRACE_SAMPLING_RATE | float | 1.0 | ≥ 0.0, ≤ 1.0 | Trace sampling rate (0.0-1.0) |
| tracing.tracing_enabled | TRACING_ENABLED | int | 1 | ≥ 0, ≤ 1 | Enable distributed tracing |
| tracing.tracing_mode | TRACING_MODE | str | "langsmith" | pattern=^(langsmith\|local\|none\|off)$ | Tracing backend mode |
| tracing.tribrid_log_path | TRIBRID_LOG_PATH | str | "data/logs/queries.jsonl" | | Query log file path |
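The env aliases above override the JSON defaults, with booleans encoded as 0/1 ints. A minimal sketch of that resolution order (env_override is a hypothetical helper for illustration, not the actual TriBridConfig API):

```python
import os

def env_override(name, default, cast=str):
    """Return the env value for `name` coerced with `cast`, else the schema default."""
    raw = os.environ.get(name)
    return default if raw is None else cast(raw)

# Override the sampling rate via its env alias; fall back to the default otherwise.
os.environ["TRACE_SAMPLING_RATE"] = "0.25"
sampling_rate = env_override("TRACE_SAMPLING_RATE", 1.0, float)        # -> 0.25
unset_default = env_override("TRIBRID_DOC_EXAMPLE_UNSET", 9090, int)   # -> 9090 (no override)
```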

Details (glossary)

tracing.alert_include_resolved (ALERT_INCLUDE_RESOLVED) — Alert Include Resolved

Category: general

ALERT_INCLUDE_RESOLVED controls whether the alert pipeline emits a second notification when an incident transitions from firing to resolved. In this stack, keeping it enabled (1, default) gives on-call responders explicit closure signals, which helps reconcile incident timelines and downstream ticket automation. Disabling it (0) reduces message volume but removes recovery-state visibility, so unresolved-looking alerts can persist in chat channels or incident tools even after the condition clears. Use 1 when you rely on auditability and MTTR measurement, and only disable it if notification fatigue is materially harming response quality.

Links: - Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures (arXiv) - Prometheus Alertmanager webhook_config (send_resolved) - PagerDuty Events API v2 Overview - OpenTelemetry Log Data Model: Severity Fields
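The firing-to-resolved behavior described above can be sketched as a small transition rule (names are illustrative, not the actual alert pipeline):

```python
def notifications_for(prev: str, curr: str, include_resolved: int = 1) -> list:
    """Decide which notifications fire when an incident changes state."""
    events = []
    if curr == "firing" and prev != "firing":
        events.append("firing")        # new incident: always notify
    if curr == "resolved" and prev == "firing" and include_resolved:
        events.append("resolved")      # closure signal for on-call responders
    return events
```

With include_resolved=0 the second branch never fires, which is exactly the reduced-volume / lost-recovery-visibility tradeoff described above.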

tracing.alert_notify_severities (ALERT_NOTIFY_SEVERITIES) — Alert Notify Severities

Category: general

ALERT_NOTIFY_SEVERITIES is the final severity allowlist applied before outbound notification fan-out, using a comma-separated vocabulary such as critical,warning. The configured values must match the exact severity labels emitted upstream, otherwise valid alerts can be silently filtered out at dispatch time. With the default critical,warning, the system typically captures high-urgency incidents while limiting low-signal noise; adding info expands coverage but increases paging and webhook traffic. Treat this setting as an operations policy control: tune it against real incident outcomes, not just raw alert counts.

Links: - Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures (arXiv) - Prometheus Alertmanager webhook_config (send_resolved) - PagerDuty Events API v2 Overview - OpenTelemetry Log Data Model: Severity Fields
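The exact-match filtering described above can be sketched as follows (function names are illustrative; the real dispatcher is not shown here):

```python
def parse_severity_allowlist(raw: str) -> set:
    """Split the comma-separated allowlist, trimming whitespace and dropping empties."""
    return {part.strip() for part in raw.split(",") if part.strip()}

def should_notify(alert_severity: str, allowlist: set) -> bool:
    """Dispatch only if the exact upstream severity label is in the allowlist."""
    return alert_severity in allowlist
```

Because matching is exact, a label like "Critical" with a capital C would be silently dropped against the default allowlist; normalize casing upstream if your emitters disagree.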

tracing.alert_webhook_timeout (ALERT_WEBHOOK_TIMEOUT) — Alert Webhook Timeout

Category: general

ALERT_WEBHOOK_TIMEOUT defines how long the system waits for an outbound alert webhook before treating delivery as failed. In RAG operations this prevents indexing, tracing, or incident pipelines from stalling when third-party endpoints degrade. Set it from real latency percentiles: high enough for normal network jitter, low enough to preserve queue health and fast failure detection during outages. This value works best with idempotent payloads, retry backoff, and dead-letter handling so timeouts become controlled recovery signals instead of duplicate alert storms.

Badges: - Reliability

Links: - LA-IMR: Latency-Aware Tail-Latency Control (arXiv) - GitHub Webhook Best Practices - Stripe Webhooks - MDN AbortSignal.timeout
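The timeout-plus-retry pattern above can be sketched generically; this is a minimal illustration assuming an idempotent `send` callable, not the actual delivery code:

```python
import time

def deliver_with_retries(send, payload, timeout_s=5, max_attempts=3, base_backoff=0.5):
    """Attempt webhook delivery; each timeout is a failed attempt, with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send(payload, timeout=timeout_s)
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise                                  # surface to dead-letter handling
            time.sleep(base_backoff * (2 ** attempt))  # 0.5s, 1s, ... between attempts
```

Pairing this with idempotent payloads is what keeps a timeout from turning into a duplicate alert storm when the endpoint actually received the first request.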

tracing.langchain_endpoint (LANGCHAIN_ENDPOINT) — LangChain Endpoint

Category: general

Specifies the base URL where LangSmith trace payloads are sent. This is critical in enterprise setups that route telemetry through regional gateways, private networks, or controlled egress proxies. Endpoint and key must match the same deployment; otherwise you can see authentication failures, timeouts, or fragmented projects across environments. Validate endpoint reachability with health checks before enabling full tracing volume in production. Keeping endpoint configuration explicit per environment reduces surprise during incident response and migration.

Badges: - Telemetry routing

Links: - AgentSight: A Monitoring and Risk Mitigation Framework for LLM-based Agents - LangSmith Environment Variables - Trace with LangChain - OpenTelemetry SDK Environment Variables

tracing.langchain_project (LANGCHAIN_PROJECT) — LangChain Project

Category: general

Defines the project namespace under which traces are grouped in LangSmith dashboards and analytics. For RAG systems, stable project naming is essential for comparing retrieval quality, latency, and failure patterns across environments like dev, staging, and prod. Frequent renaming fragments trend history and makes incident forensics harder because related runs are no longer co-located. Use a predictable naming convention that encodes environment and service boundary without excessive granularity. Good project hygiene turns traces into operational evidence rather than isolated run logs.

Badges: - Namespace hygiene

Links: - RAGVUE: RAG Validation and Unified Evaluation - LangSmith Observability - LangSmith Evaluation Concepts - LangSmith Documentation

tracing.langchain_tracing_v2 (LANGCHAIN_TRACING_V2) — LangChain Tracing

Category: general

Enables the modern LangSmith tracing path that captures structured run trees for model calls, tools, retrievers, and chain steps. In RAG pipelines this visibility is essential for understanding where latency and quality degrade, especially when retrieval and generation are orchestrated through multiple components. Turning tracing on without controls can add overhead, so production setups often apply sampling and metadata policies. Before exporting traces externally, ensure prompt and document payload redaction aligns with data governance requirements. Treat this switch as operational instrumentation, not just a debugging toggle.

Badges: - Tracing pipeline

Links: - AgentSight: A Monitoring and Risk Mitigation Framework for LLM-based Agents - Trace with LangChain - LangSmith Observability - OpenTelemetry Traces
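A minimal bootstrap sketch for the three LangSmith-related fields above, assuming the app exports them into the process environment before LangChain is imported (LangSmith's own docs use the string "true" for this flag, while this schema stores it as a 0/1 int):

```python
import os

# Hypothetical bootstrap: translate config fields into the env vars
# that LangChain/LangSmith client code reads at import time.
os.environ["LANGCHAIN_TRACING_V2"] = "true"                           # enable v2 run trees
os.environ["LANGCHAIN_PROJECT"] = "tribrid"                           # stable project namespace
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"  # must match the API key's deployment
```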

tracing.langtrace_api_host (LANGTRACE_API_HOST) — LangTrace API Host

Category: infrastructure

Base endpoint for Langtrace ingestion and control APIs. This setting determines where traces from retrieval, reranking, and generation are delivered, so it effectively controls data residency, network path, and tenant routing for observability. Use an explicit host per environment and verify protocol, TLS, and region alignment before enabling high-volume tracing, especially when moving between cloud and self-hosted collectors. Host/key/project mismatches are a common cause of silent trace loss, so validate this alongside credentials during rollout.

Badges: - Trace routing

Links: - AgentSight: AI Agent Observability (arXiv) - Langtrace Documentation - Langtrace OTEL Configuration - OpenTelemetry Traces

tracing.langtrace_project_id (LANGTRACE_PROJECT_ID) — LangTrace Project ID

Category: general

Logical project namespace used by Langtrace to partition telemetry from different applications or environments. In practice, it is the boundary that keeps retrieval experiments, reranker tuning runs, and production traffic from mixing in the same dashboard and skewing metrics. Assign stable project IDs per environment and per major product surface so trace analytics remain comparable over time and access controls stay clean. If this is mis-set, traces may appear to vanish when they are actually being written to a different project bucket.

Badges: - Project scoping

Links: - AgentSight: AI Agent Observability (arXiv) - Langtrace Documentation - Langtrace Integrations Overview - OpenTelemetry Traces

tracing.log_level (LOG_LEVEL) — Log Level

Category: general

Controls runtime verbosity for diagnostics, operational visibility, and incident response. DEBUG is best for short-lived debugging sessions where per-step details matter; INFO is the stable default for normal operation; WARNING and ERROR reduce noise when you only need actionable signals. Excessive debug logging can materially impact latency and storage cost, and can also increase risk of sensitive payload exposure if message templates are not scrubbed. Production-safe practice is to run at INFO/WARNING and temporarily raise verbosity during scoped investigations.

Links: - LLM-SrcLog: Source-Aware Log Analysis with LLMs (arXiv 2025) - Python Logging Levels Reference - OpenTelemetry Logs Data Model - RFC 5424 Syslog Severity and Structured Logging
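A minimal sketch of applying LOG_LEVEL with the same validation pattern the schema enforces (configure_logging is an illustrative helper, not the app's actual setup code):

```python
import logging
import re

_ALLOWED = re.compile(r"^(DEBUG|INFO|WARNING|ERROR)$")  # mirrors the Pydantic pattern

def configure_logging(level: str = "INFO") -> int:
    """Validate LOG_LEVEL against the documented pattern, then apply it globally."""
    if not _ALLOWED.match(level):
        raise ValueError(f"invalid LOG_LEVEL: {level!r}")
    numeric = getattr(logging, level)  # DEBUG=10, INFO=20, WARNING=30, ERROR=40
    logging.basicConfig(level=numeric)
    return numeric
```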

tracing.metrics_enabled (METRICS_ENABLED) — Metrics Enabled

Category: evaluation

Master toggle for emitting runtime metrics from the application. When enabled, the process publishes counters, gauges, and histograms used for dashboards, alerting, and SLO tracking; when disabled, you lose quantitative visibility into throughput, error rates, latency distributions, and retrieval quality trends. Enable this in any shared or production-like environment, then gate high-cardinality labels to control cost. The goal is not just observability but fast diagnosis: metrics should let you correlate parameter changes (retrieval thresholds, rewrites, model routing) with concrete performance and reliability shifts.

Links: - Agentic Observability: Automated Alert Triage (arXiv 2026) - Prometheus Instrumentation Best Practices - OpenTelemetry Metrics API Spec - Grafana Alerting Documentation
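One common way to implement a master metrics toggle is a facade that degrades to a no-op, so call sites never branch on the flag; a hypothetical sketch (not the actual metrics layer):

```python
class Metrics:
    """Counter facade: records when METRICS_ENABLED=1, silently drops when 0."""

    def __init__(self, enabled: bool):
        self.enabled = enabled
        self.counters = {}

    def inc(self, name: str, value: int = 1) -> None:
        if self.enabled:
            self.counters[name] = self.counters.get(name, 0) + value
```

Label gating for high-cardinality dimensions would live behind the same facade, keeping cost-control policy out of application code.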

tracing.prometheus_port (PROMETHEUS_PORT) — Prometheus Port

Category: infrastructure

Port used to expose the metrics endpoint that Prometheus scrapes (typically /metrics). If this value is wrong, observability breaks quietly: the application can be healthy while dashboards, alerts, and SLO calculations go blind. Configure it together with scrape jobs, network policy, and service discovery labels so monitoring remains consistent across environments. In production, validate this by checking target health in Prometheus and ensuring metric cardinality and scrape intervals match system load.

Links: - PromAssistant: Prompting for Time-Series Monitoring with PromQL (arXiv 2025) - Prometheus Configuration Reference - Prometheus Exposition Formats - Prometheus Querying Basics
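The documented port constraint can be enforced explicitly before binding; a minimal sketch (validate_prometheus_port is an illustrative helper):

```python
def validate_prometheus_port(port: int) -> int:
    """Enforce the documented constraint: 1024 <= port <= 65535 (non-privileged range)."""
    if not 1024 <= port <= 65535:
        raise ValueError(f"PROMETHEUS_PORT out of range: {port}")
    return port
```

With the prometheus_client library installed, `start_http_server(validate_prometheus_port(port))` is the typical way to expose /metrics; that call is assumed here, not shown.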

tracing.trace_auto_ls (TRACE_AUTO_LS) — Auto-open LangSmith

Category: general

TRACE_AUTO_LS controls whether the UI should automatically open a LangSmith run view after request completion. It does not change retrieval quality directly, but it changes debugging speed by reducing the friction between an anomalous response and its trace evidence. Keep it enabled in active tuning sessions where fast trace inspection matters, and disable it in high-throughput workflows where constant context switching is distracting. If this flag is enabled while external tracing is disabled, the expected behavior should degrade gracefully to local trace views rather than broken deep-links.

Links: - AgentTrace: Comprehensive Tracing for AI Agents (arXiv 2026) - LangSmith Observability Quickstart - LangSmith Environment Variables - LangSmith Trace with OpenTelemetry

tracing.trace_retention (TRACE_RETENTION) — Trace Retention

Category: general

TRACE_RETENTION defines how many trace records are kept before the oldest are pruned (a count between 10 and 500, not a time window). Retention is a tradeoff between forensic depth and operational cost: a larger count improves post-incident analysis and regression investigations, while a smaller count limits storage growth and reduces compliance surface area. Set this value based on your incident review cadence and model rollout cycle, then validate that pruning does not remove traces needed for reproducibility. In production, align retention with data-governance policy and downstream index lifecycle settings so trace deletion is predictable and auditable.

Links: - GraphTracer: Tracing Dynamic Dataflow in Agentic AI Systems (arXiv 2025) - Elasticsearch Index Lifecycle Management (ILM) - OpenSearch Index State Management (ISM) - LangSmith Data Purging and Compliance
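Count-based pruning with the documented 10–500 bounds can be sketched in a few lines (prune_traces is illustrative, assuming traces are ordered oldest-first):

```python
def prune_traces(traces: list, retention: int = 50) -> list:
    """Keep only the newest `retention` traces, clamping to the 10-500 bounds."""
    retention = max(10, min(500, retention))
    return traces[-retention:]
```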

tracing.trace_sampling_rate (TRACE_SAMPLING_RATE) — Trace Sampling Rate

Category: general

TRACE_SAMPLING_RATE sets the fraction of requests that emit full traces. Higher sampling improves visibility into rare routing failures and latency spikes, but increases telemetry volume, cost, and operator noise. Lower sampling is cheaper but can miss edge cases unless paired with rule-based overrides for errors, timeouts, or high-value tenants. A robust strategy is adaptive sampling: keep a low baseline for normal traffic and automatically raise sampling around deployments, incidents, or anomalous metrics.

Badges: - Cost control - Observability

Links: - AgentTrace: Comprehensive Tracing for AI Agents (arXiv 2026) - OpenTelemetry Trace SDK (samplers and processors) - OpenTelemetry Trace API - LangSmith Trace with OpenTelemetry
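The baseline (non-adaptive) sampling decision is a one-liner; a minimal sketch, with the rule-based overrides mentioned above left out:

```python
import random

def should_trace(rate: float, rng=None) -> bool:
    """Head-based sampling: emit a full trace for roughly `rate` of requests."""
    if rate >= 1.0:
        return True   # default: trace everything
    if rate <= 0.0:
        return False
    return (rng or random).random() < rate
```

Error or timeout paths would typically bypass this check entirely so that rare failures are always captured regardless of the baseline rate.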

tracing.tracing_enabled (TRACING_ENABLED) — Tracing Enabled

Category: general

TRACING_ENABLED is the master switch for request-level trace capture in the retrieval and generation pipeline. When enabled, each request can emit structured events that explain routing decisions, retrieval candidates, rerank outcomes, and timing breakdowns. This setting is foundational for debugging because it turns opaque failures into inspectable execution paths. In production, keep it enabled with controlled sampling so you retain diagnostic coverage without overwhelming observability storage.

Links: - AgentTrace: Comprehensive Tracing for AI Agents (arXiv 2026) - OpenTelemetry Trace API - OpenTelemetry Trace SDK - LangSmith Observability Concepts

tracing.tracing_mode (TRACING_MODE) — Tracing Mode

Category: general

TRACING_MODE selects the trace backend behavior (for example local-only, external export, or disabled pathways in mixed environments). This mode determines where spans are emitted, which metadata is attached, and how operators inspect runs during incident triage. Choose a mode that matches deployment stage: local views for rapid iteration, full OpenTelemetry export for shared production observability, and controlled fallback modes for constrained environments. Ensure mode changes are tested with synthetic requests so trace continuity does not break across upgrades.

Links: - AgentTrace: Comprehensive Tracing for AI Agents (arXiv 2026) - LangSmith Trace with OpenTelemetry - OpenTelemetry Trace SDK - LangSmith Observability Concepts
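The interaction between TRACING_MODE and the TRACING_ENABLED master switch can be sketched as a small resolver (resolve_tracing is hypothetical; the real dispatch is not shown):

```python
def resolve_tracing(mode: str, tracing_enabled: bool = True) -> str:
    """Map TRACING_MODE plus the master switch onto an effective backend."""
    if not tracing_enabled or mode in ("none", "off"):
        return "disabled"          # the kill switch and 'none'/'off' always win
    if mode in ("langsmith", "local"):
        return mode
    raise ValueError(f"unsupported TRACING_MODE: {mode!r}")
```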

tracing.tribrid_log_path (TRIBRID_LOG_PATH) — Query Log Path

Category: general

TRIBRID_LOG_PATH specifies where query logs and local trace artifacts are written on disk (by default a JSONL file at data/logs/queries.jsonl). A stable, writable path is required for reproducibility workflows such as replaying failure cases, auditing retrieval decisions, and comparing behavior across model/version changes. In multi-process deployments, this path should be paired with rotation and retention policy to prevent unbounded growth and partial-write corruption. Treat log-path configuration as part of operational hardening: explicit permissions, predictable lifecycle, and compatibility with your observability export strategy.

Links: - GraphTracer: Tracing Dynamic Dataflow in Agentic AI Systems (arXiv 2025) - OpenTelemetry Trace SDK - Elasticsearch Index Lifecycle Management (ILM) - LangSmith Data Purging and Compliance
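Appending to a JSONL query log is straightforward; a minimal sketch (append_query_log is an illustrative helper, without the rotation policy discussed above):

```python
import json
import os

def append_query_log(path: str, record: dict) -> None:
    """Append one JSON object per line (JSONL); create parent dirs up front."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Append-with-newline keeps each record on its own line, so a crash mid-write corrupts at most the final line rather than the whole file.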