Monitoring & Observability¶
AGRO ships with a full monitoring stack, but you don't have to run all of it. This page explains what each piece does, how it connects to the AGRO services, and how configuration actually flows through the code.
The monitoring stack is built around:
- Prometheus (metrics scraping)
- Alertmanager (alerts)
- Loki (logs)
- Grafana (dashboards)
- AGRO's own HTTP APIs for traces, index stats, and system status
Under the hood, the web UI talks to a small set of service-layer modules in
server/services/ that expose monitoring data in a way that's safe for the
browser and stable across config changes.
High-level architecture¶
At runtime, there are three main data flows that matter for monitoring:
- Configuration – what repo is active, where data lives, which ports to use
- Indexing & retrieval status – what the indexer is doing, how many documents are indexed, current errors
- Traces & analytics – per-query traces, evaluation runs, and cost/usage
```mermaid
flowchart LR
    subgraph User
        UI[Web UI]
    end
    subgraph API[AGRO API]
        RAG[server/services/rag.py]
        IDX[server/services/indexing.py]
        KW[server/services/keywords.py]
        TR[server/services/traces.py]
        CFG[server/services/config_registry.py]
    end
    subgraph Storage
        FS[("Filesystem: data/, out/, logs/")]
        QD[(Qdrant)]
        MON[("Prometheus / Loki / Grafana")]
    end
    UI -->|HTTP /fetch| RAG
    UI --> IDX
    UI --> KW
    UI --> TR
    RAG -->|reads| QD
    IDX -->|writes| QD
    IDX -->|writes status & stats| FS
    KW -->|reads/writes keywords| FS
    TR -->|reads traces| FS
    CFG --> RAG
    CFG --> IDX
    CFG --> KW
    MON -->|scrape / tail| FS
    MON -->|scrape| API
```
The important bit: all monitoring-related behavior is driven by the same configuration registry that powers the rest of AGRO. You don't have to thread environment variables through every service manually.
Configuration: where monitoring reads its settings¶
Monitoring-related services don't read os.environ directly. They go through
server/services/config_registry.py, which implements a central
configuration registry with clear precedence rules:
1. `.env` file (secrets and infrastructure overrides)
2. `agro_config.json` (tunable RAG and monitoring parameters)
3. Pydantic defaults (fallback values)
Every monitoring-facing service module grabs a module-level registry instance at import time.
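As a rough, self-contained sketch of those precedence rules — the function names and return shape here are illustrative, not `config_registry.py`'s real API:

```python
import json
import os

# Toy sketch of the precedence rules above; get_setting() and its
# return shape are assumptions, not config_registry.py's real API.
_DEFAULTS = {"FINAL_K": 10, "KEYWORD_MIN_FREQ": 2}  # Pydantic-style fallbacks


def _load_json_config(path: str = "agro_config.json") -> dict:
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def get_setting(key: str):
    """Resolve a value and report which layer it came from."""
    if key in os.environ:                 # 1. .env / environment wins
        return os.environ[key], ".env"
    file_cfg = _load_json_config()
    if key in file_cfg:                   # 2. then agro_config.json
        return file_cfg[key], "agro_config.json"
    return _DEFAULTS.get(key), "default"  # 3. then Pydantic defaults
```

Returning the source alongside the value is what lets the UI display provenance for every setting.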
This is why the docs and UI can show you exactly where a value came from
(.env vs agro_config.json vs default) and why you can safely reload
configuration without restarting everything.
System status & indexing status¶
The Dashboard → System Status and Dashboard → Indexing panels in the web UI are backed by a couple of small service modules:
- `server/services/indexing.py` – starts and monitors the indexer process
- `server/index_stats.py` – reads index statistics from disk / Qdrant
Indexing service¶
server/services/indexing.py is responsible for kicking off indexing runs and
exposing their status to the UI.
Key points:
- It uses the same Python interpreter as the running server
- It passes `REPO`, `REPO_ROOT`, and `PYTHONPATH` through the environment so the indexer resolves paths correctly
- It stores human-readable status messages and metadata in module-level variables that the HTTP API can read
The HTTP API exposes this via a simple endpoint (see api/endpoints.md), and
the web UI polls it to render live status.
Keyword extraction & discriminative keywords¶
AGRO maintains a set of discriminative keywords per repo to help BM25 and hybrid search. These are surfaced in the Dashboard → Glossary / Keywords areas and used in evaluation.
The service layer for this lives in server/services/keywords.py.
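For intuition, "discriminative" here means terms that are frequent in this repo but rare elsewhere. A toy scorer might look like the following; AGRO's actual scoring in `keywords.py` is likely more involved:

```python
import math
from collections import Counter

# Toy sketch of discriminative keyword scoring: terms frequent in the
# repo but rare in a background corpus score highest (a log-ratio with
# +1 smoothing). Not AGRO's real algorithm.
def discriminative_keywords(repo_terms, background_terms,
                            min_freq=2, top_n=10):
    repo = Counter(repo_terms)
    background = Counter(background_terms)
    scores = {}
    for term, freq in repo.items():
        if freq < min_freq:          # drop noise below the frequency floor
            continue
        # +1 smoothing so terms unseen in the background don't divide by zero
        scores[term] = freq * math.log((freq + 1) / (background[term] + 1))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Terms like `qdrant` that dominate one repo rise to the top, while common words like `the` score near or below zero and sink.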
How keyword config is loaded¶
keywords.py uses the config registry once at import time to populate a set
of module-level constants, then exposes a reload_config() helper so you can
pick up changes without restarting the server.
Why this is useful:
- The UI can show you the effective values and where they came from
- You can tweak keyword behavior (`*_BOOST`, `*_MIN_FREQ`, etc.) and reload without restarting AGRO
- The keyword service can be used both by the HTTP API and the CLI without duplicating config parsing logic
Traces & evaluation¶
AGRO writes detailed traces for RAG queries and evaluation runs under
out/<repo>/traces/. These are JSON files that the UI can render in the
Analytics → Tracing and Evaluation → Trace Viewer tabs.
The service-layer entry point for listing and reading traces is
server/services/traces.py.
A couple of things to note:
- The repo is taken from the argument or the `REPO` env var, with a default of `agro`. This keeps the HTTP API simple while still working in multi-repo setups.
- `out_dir()` centralizes where "ephemeral" outputs go. If you move that directory, traces, eval snapshots, and other monitoring artifacts move with it.
- The service layer is intentionally defensive: exceptions are logged and turned into empty-ish responses so the UI doesn't explode when a trace file is missing or corrupted.
Editor / DevTools integration¶
AGRO ships with an embedded editor / DevTools UI that can be used to inspect and tweak configuration, run quick experiments, and inspect logs.
The backend for this is server/services/editor.py.
Why this matters for monitoring:
- The editor can show live status (via `_status_path()`), including indexing progress and last error
- The same config registry is used to decide whether the editor is enabled, which port it binds to, and whether it should be reachable from outside localhost
RAG query telemetry¶
RAG queries themselves are instrumented in server/services/rag.py. This is
where query events are logged, metrics are emitted, and (optionally) traces
are written.
A few things to call out:
- `stage` from `server.metrics` is used to time and label different phases of the RAG pipeline (retrieval, reranking, synthesis). Those metrics are what Prometheus scrapes.
- `log_query_event` writes a structured event that can be used for analytics, evaluation, or training the learning reranker.
- `FINAL_K` and `LANGGRAPH_FINAL_K` are read from the config registry, so you can change how many results are returned without touching code.
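To illustrate the phase-timing idea, here is a toy `stage()` context manager that accumulates per-phase durations into a dict; the real `server.metrics.stage` presumably records to Prometheus instead, and the `answer()` pipeline below is purely illustrative:

```python
import time
from contextlib import contextmanager

# Toy stand-in for server.metrics.stage: accumulates wall-clock time
# per named phase. The real version presumably feeds Prometheus.
STAGE_SECONDS: dict[str, float] = {}


@contextmanager
def stage(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        STAGE_SECONDS[name] = STAGE_SECONDS.get(name, 0.0) + (
            time.perf_counter() - start)


def answer(query: str) -> str:
    # Hypothetical pipeline showing how phases get labeled and timed.
    with stage("retrieval"):
        docs = [f"doc for {query}"]   # stand-in for a Qdrant search
    with stage("rerank"):
        docs.sort()
    with stage("synthesize"):
        return docs[0]
```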
Config store & secrets in the UI¶
The Admin → General / Integrations / Secrets tabs in the web UI talk to
server/services/config_store.py. This module is responsible for:
- Reading and writing `agro_config.json`
- Masking secrets when sending config to the browser
- Atomically writing config files even on Docker Desktop / macOS volume mounts (which are notorious for `Device or resource busy` errors)
This is one of those "not glamorous but important" pieces: without the atomic write + fallback, saving config from the UI on macOS Docker can fail randomly when file watchers are active.
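The atomic-write-with-fallback idea can be sketched as follows; the details of `config_store.py`'s real implementation may differ:

```python
import json
import os
import tempfile

# Sketch of atomic write + fallback; config_store.py's real logic
# may differ in details.
def save_config(cfg: dict, path: str = "agro_config.json") -> None:
    data = json.dumps(cfg, indent=2)
    dir_ = os.path.dirname(path) or "."
    try:
        # Write to a temp file in the same directory, then rename:
        # os.replace() is atomic on POSIX, so a reader never sees a
        # half-written config file.
        fd, tmp = tempfile.mkstemp(dir=dir_, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)
    except OSError:
        # Some bind mounts refuse the rename ("Device or resource
        # busy"); fall back to a plain in-place write.
        with open(path, "w") as f:
            f.write(data)
```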
How this shows up in the web UI¶
The React components under web/src/components/ are wired to these service
modules via the HTTP API. A few relevant ones for monitoring:
- `Dashboard/SystemStatus.tsx`, `SystemStatusPanel.tsx`, `SystemStatusSubtab.tsx`
- `Dashboard/MonitoringLogsPanel.tsx`, `Dashboard/MonitoringSubtab.tsx`
- `Analytics/Tracing.tsx`, `Analytics/Usage.tsx`, `Analytics/Performance.tsx`
- `Evaluation/TraceViewer.tsx`, `Evaluation/HistoryViewer.tsx`
They don't talk to Prometheus or Loki directly. Instead, they:
- Call AGRO's HTTP endpoints (documented in `api/endpoints.md`)
- Render whatever the service layer returns
- Let you drill into traces, index stats, and evaluation runs without needing to know where files live on disk
Running with and without the full monitoring stack¶
You can run AGRO in a few different modes:
- Just AGRO, no external monitoring
    - Only AGRO's own HTTP APIs and file-based traces are used
    - Useful for local experiments or CI jobs
- AGRO + Prometheus + Grafana
    - Prometheus scrapes AGRO's `/metrics` endpoint and any sidecars
    - Grafana dashboards read from Prometheus
    - Loki can tail logs from `docker-compose` or your own logging setup
- AGRO + external observability (your own stack)
    - You can ignore the bundled compose file and point your own Prometheus / Grafana / Loki at AGRO's endpoints and log files
The important part is that AGRO itself doesn't care which of these you choose. The service layer always exposes the same:
- Indexing status
- Keyword stats
- Traces
- RAG query telemetry
The monitoring stack just decides how much of that you want to aggregate and visualize.
Debugging monitoring issues¶
A few practical tips if something looks off in the UI:
**Check the config registry first**

- Hit the Admin → General tab and inspect the effective values
- Make sure `REPO`, `OUT_DIR`, and any monitoring-related keys are what you expect

**Look at the raw traces**

- Go to `out/<repo>/traces/` and open the latest JSON file
- If the UI trace viewer is blank but files exist, it's probably a frontend bug, not a backend one

**Check index status from the API**

- Use `curl` or the CLI to hit the indexing status endpoint
- If `_INDEX_STATUS` never updates, the indexer subprocess may be failing early – check logs under `out/<repo>/logs/` or your Docker logs
If you run into something that isn't covered here, remember that AGRO is
indexed on itself. Go to the chat tab and ask it about
server/services/config_registry.py, server/services/indexing.py, or any
other module – the RAG engine will happily walk you through the code.