# Retrieval Pipeline
AGRO’s retrieval stack is intentionally layered and heavily configurable, but you don’t have to use every knob from day one. This page walks through what actually happens when you hit “Ask” in the UI or call the /rag endpoint, and how the different services and config surfaces tie together.
At a high level:
```mermaid
flowchart LR
    Q[User query] --> RQ[Request handler<br/>server/services/rag.py]
    RQ --> CFG[Config registry<br/>.env + agro_config.json]
    RQ --> HYB[Hybrid search<br/>retrieval/hybrid_search.py]
    HYB -->|BM25| SP[BM25 index]
    HYB -->|Dense| VE["Vector index (Qdrant)"]
    HYB --> KW[Discriminative keywords<br/>server/services/keywords.py]
    HYB --> RR[Cross-encoder reranker]
    RR --> RES[Ranked chunks]
    RES --> LG["LangGraph app (optional)<br/>server/langgraph_app.py"]
    LG --> ANS[Final answer]
```
The rest of this page is “how it actually works in code,” not just a diagram.
## 1. Entry points: HTTP and CLI

Most retrieval flows end up in `server/services/rag.py`.

- HTTP: the `/rag`, `/search`, and `/chat` routes call into `do_search` / `search_routed_multi`.
- CLI: `cli/chat_cli.py` and `cli/commands/chat.py` talk to the same HTTP API.
Key points:
- `do_search` is the single place where `FINAL_K` / `LANGGRAPH_FINAL_K` are resolved from config.
- The actual retrieval work is delegated to `retrieval.hybrid_search.search_routed_multi`.
- LangGraph is optional: if `build_graph()` fails, AGRO falls back to “plain” hybrid search.
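The exact wiring lives in `server/services/rag.py`; below is a minimal sketch of that fallback shape only. The `do_search` signature and the graph input/output shapes here are assumptions, not the real code.

```python
# Minimal sketch of the fallback described above; not the actual code in
# server/services/rag.py. The do_search signature and the graph input/output
# shapes are assumptions.
from retrieval.hybrid_search import search_routed_multi


def do_search(query: str, repo: str, final_k: int) -> dict:
    # Plain hybrid retrieval always runs and always produces results.
    chunks = search_routed_multi(query, repo, final_k)

    # LangGraph is optional: if the graph can't be built (or isn't present),
    # keep the plain results so the /rag contract ({"results": ...}) holds.
    try:
        from server.langgraph_app import build_graph
        answer = build_graph().invoke({"query": query, "contexts": chunks})
        return {"results": chunks, "answer": answer}
    except Exception:
        return {"results": chunks}
```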
## 2. Configuration: where retrieval knobs live

Retrieval behavior is driven by the central configuration registry, not scattered `os.getenv` calls.

Retrieval-related keys you’ll see referenced from the services layer:

- `FINAL_K` / `LANGGRAPH_FINAL_K` – how many chunks to return after reranking.
- `KEYWORDS_MAX_PER_REPO`, `KEYWORDS_MIN_FREQ`, `KEYWORDS_BOOST`, `KEYWORDS_AUTO_GENERATE`, `KEYWORDS_REFRESH_HOURS` – discriminative keyword extraction.
- `EDITOR_*`, `INDEX_*` – control the embedded editor and indexer behavior.
The registry is used everywhere via `get_config_registry()` and type-safe helpers:

```python
_config_registry = get_config_registry()
_KEYWORDS_MAX_PER_REPO = _config_registry.get_int('KEYWORDS_MAX_PER_REPO', 50)
_KEYWORDS_BOOST = _config_registry.get_float('KEYWORDS_BOOST', 1.3)
```
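Since `do_search` is the single place where `FINAL_K` / `LANGGRAPH_FINAL_K` get resolved, here is one way that resolution could look with the same helpers; the precedence and the default of 10 are assumptions, not the actual logic in `server/services/rag.py`.

```python
# Sketch only: the real precedence and defaults live in server/services/rag.py.
registry = get_config_registry()  # same helper as in the snippet above


def resolve_final_k(use_langgraph: bool) -> int:
    # Prefer LANGGRAPH_FINAL_K on the LangGraph path, otherwise FINAL_K.
    # The fallback default of 10 is a placeholder.
    if use_langgraph:
        return registry.get_int('LANGGRAPH_FINAL_K', registry.get_int('FINAL_K', 10))
    return registry.get_int('FINAL_K', 10)
```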
You don’t have to know every key up front
The web UI surfaces these settings with inline tooltips and links to docs / papers. You can start with defaults and only touch the knobs when you have a concrete retrieval failure to fix.
For a deeper dive into the registry itself, see Configuration.
## 3. Indexing: how code becomes chunks

Retrieval only works if the index is sane. Indexing is orchestrated by `server/services/indexing.py` and the standalone indexer (see README-INDEXER.md).

Important details:

- Indexing runs in a background thread and spawns a separate Python process with the same interpreter and `PYTHONPATH`.
- `REPO` and `REPO_ROOT` are passed through the environment so the indexer can resolve paths consistently.
- `ENRICH_CODE_CHUNKS` toggles additional semantic enrichment of chunks (e.g. adding symbol context) without changing the core BM25 / dense indexing logic.
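The real launch code is in `server/services/indexing.py`; the sketch below only illustrates the background-thread-plus-subprocess pattern described above, and the `indexer` module name is a placeholder.

```python
# Illustrative sketch of the pattern above, not the code in
# server/services/indexing.py; "indexer" is a placeholder entry point.
import os
import subprocess
import sys
import threading


def start_indexing(repo: str, repo_root: str) -> threading.Thread:
    def _run():
        env = dict(os.environ)        # inherits PYTHONPATH from the server
        env["REPO"] = repo            # passed through so the indexer
        env["REPO_ROOT"] = repo_root  # resolves paths consistently
        # Same interpreter as the server process.
        subprocess.run([sys.executable, "-m", "indexer"], env=env, check=False)

    t = threading.Thread(target=_run, daemon=True)
    t.start()
    return t
```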
The actual chunking, tokenization, and Qdrant collection layout live under common/ and retrieval/ and are covered in more detail in the main RAG design doc.
## 4. Discriminative keywords layer
On top of BM25 and dense vectors, AGRO maintains a small per-repo set of “discriminative keywords” that help with:
- Disambiguating overloaded terms in large monorepos.
- Boosting domain-specific tokens that BM25 alone tends to underweight.
This is handled by `server/services/keywords.py`.

Why this is useful:

- For small codebases, plain BM25 is often enough. You can set `KEYWORDS_AUTO_GENERATE=0` and ignore this entire layer.
- For larger repos, this gives you a cheap, interpretable way to bias retrieval toward “interesting” tokens without retraining the dense model.

The keywords are stored in `data/discriminative_keywords.json` and can be inspected or edited directly if you want full control.
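How exactly the boost is applied belongs to `retrieval/hybrid_search.py`; the sketch below shows the basic idea, assuming a simple `{repo: [keyword, ...]}` JSON layout and a multiplicative `KEYWORDS_BOOST` (both assumptions; the real schema and scoring may differ).

```python
# Sketch of a multiplicative keyword boost. The JSON schema assumed here and
# the scoring details are illustrative, not AGRO's actual implementation.
import json


def load_keywords(path: str, repo: str) -> set[str]:
    with open(path) as f:
        data = json.load(f)
    return set(data.get(repo, []))  # assumes a {repo: [keyword, ...]} layout


def apply_keyword_boost(candidates: list[dict], keywords: set[str],
                        boost: float = 1.3) -> list[dict]:
    # Boost any candidate chunk whose text mentions a discriminative keyword,
    # then re-sort by score (the default mirrors KEYWORDS_BOOST = 1.3).
    for cand in candidates:
        text = cand["text"].lower()
        if any(kw.lower() in text for kw in keywords):
            cand["score"] *= boost
    return sorted(candidates, key=lambda c: c["score"], reverse=True)
```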
## 5. Hybrid search and reranking

The core retrieval logic lives in `retrieval/hybrid_search.py` (not shown here in full). Conceptually it does:
- Run BM25 over the code chunk index.
- Run dense vector search over the same chunks in Qdrant.
- Optionally incorporate discriminative keyword scores.
- Merge and normalize scores.
- Run a cross-encoder reranker over the top N candidates.
The reranker is itself configurable and can be trained on your own feedback (see Self-learning cross-encoder reranker).
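As a rough sketch of the merge-and-rerank shape (not the actual `retrieval/hybrid_search.py` code; the min-max normalization, the 50/50 weighting, the candidate cutoff, and the stock `sentence-transformers` cross-encoder are all stand-ins for whatever your config selects):

```python
# Sketch: normalize sparse and dense scores, merge, then cross-encode the top
# candidates. Weights, cutoffs, and the model are illustrative choices only.
from sentence_transformers import CrossEncoder


def _minmax(scores: dict[str, float]) -> dict[str, float]:
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {k: (v - lo) / span for k, v in scores.items()}


def merge_and_rerank(query: str, bm25: dict[str, float], dense: dict[str, float],
                     texts: dict[str, str], final_k: int = 10) -> list[str]:
    bm25_n, dense_n = _minmax(bm25), _minmax(dense)
    merged = {cid: 0.5 * bm25_n.get(cid, 0.0) + 0.5 * dense_n.get(cid, 0.0)
              for cid in set(bm25) | set(dense)}
    top_n = sorted(merged, key=merged.get, reverse=True)[:50]

    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    scores = reranker.predict([(query, texts[cid]) for cid in top_n])
    reranked = sorted(zip(top_n, scores), key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in reranked[:final_k]]
```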
Small repos: you can keep it simple
If you’re indexing a small library or a single service, you can disable dense search and reranking entirely and just use BM25. In practice that often gives better latency and more predictable behavior until you have enough data to justify the extra complexity.
## 6. LangGraph orchestration (optional)

If `server/langgraph_app.py` is present and `build_graph()` succeeds, AGRO will:
- Wrap the hybrid search results in a LangGraph workflow.
- Let you plug in more complex reasoning / tool-calling / multi-step flows.
- Still fall back to “plain” retrieval if the graph fails to build.
This is intentionally defensive:
- The retrieval stack should keep working even if your experimental graph code is broken.
- The HTTP API contract (`/rag` returns `{"results": ...}`) doesn’t change.
If you don’t care about LangGraph, you can ignore this entirely and just treat `do_search` as “BM25 + dense + reranker.”
## 7. Editor and live context
AGRO ships with an embedded editor / devtools panel that can:
- Show you the current retrieval config for the active repo.
- Let you tweak settings like `FINAL_K`, reranker weights, and keyword behavior.
- Persist those changes back to `agro_config.json` via the config store.
The editor service is wired through `server/services/editor.py`.
This is mostly plumbing, but it’s worth knowing that:
- Editor behavior is just more config in the same registry.
- The UI reads from `/out/editor/settings.json` and `/out/editor/status.json`, which are generated here.
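If you want to see exactly what the UI sees, you can fetch those generated files yourself; whether they are served over HTTP at these paths, and on which port, is an assumption here.

```python
# Sketch: read the same JSON the editor panel uses. The base URL and the
# assumption that these paths are served over HTTP are illustrative.
import requests

BASE = "http://localhost:8000"  # adjust to wherever your AGRO server runs

settings = requests.get(f"{BASE}/out/editor/settings.json", timeout=5).json()
status = requests.get(f"{BASE}/out/editor/status.json", timeout=5).json()
print(settings)
print(status)
```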
## 8. Tracing and evaluation hooks
Retrieval is instrumented for tracing and evaluation so you can debug bad answers.
- `server/services/traces.py` exposes:
    - `list_traces(repo)` – list recent trace files under `out/<repo>/traces`.
    - `latest_trace(repo)` – return the most recent trace path.
- The evaluation pipeline (documented in features/evaluation.md) can snapshot config and compare different retrieval settings.
The web UI’s Analytics → Tracing and Evaluation tabs are thin wrappers around these endpoints.
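You can also poke at the same data from a Python shell; this assumes the two helpers are importable as named above and return file paths, which is an assumption about their exact return type.

```python
# Sketch: call the trace helpers directly. Assumes list_traces/latest_trace
# are importable as named on this page and yield paths under out/<repo>/traces.
from server.services.traces import latest_trace, list_traces

repo = "my-repo"  # placeholder repo name

for path in list_traces(repo):
    print(path)

print("most recent:", latest_trace(repo))
```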
## 9. Putting it together: typical retrieval flow

Here’s the full flow for a single `/rag` call, with the pieces from above:

```mermaid
sequenceDiagram
    participant User
    participant WebUI as Web UI / CLI
    participant API as FastAPI / rag.py
    participant Hybrid as hybrid_search
    participant Qdrant
    participant BM25
    participant Rerank as Cross-encoder

    User->>WebUI: Ask question
    WebUI->>API: POST /rag { q, repo, top_k? }
    API->>API: do_search()
    API->>CFG: resolve FINAL_K, REPO
    API->>Hybrid: search_routed_multi(query, repo, final_k)
    Hybrid->>BM25: sparse search
    Hybrid->>Qdrant: dense vector search
    Hybrid->>Hybrid: merge + keyword boosts
    Hybrid->>Rerank: rerank top N candidates
    Rerank-->>Hybrid: scored chunks
    Hybrid-->>API: final_k chunks
    API->>LangGraph: (optional) run graph
    API-->>WebUI: answer + contexts
    WebUI-->>User: rendered answer + citations
```
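From a client’s point of view, all of the above is one HTTP call. A minimal example, assuming a local server on port 8000 (the host/port is an assumption; the request and response fields follow the diagram above):

```python
# Sketch: a single /rag call. The { q, repo, top_k? } request body and the
# {"results": ...} response key come from the flow above; host/port is assumed.
import requests

resp = requests.post(
    "http://localhost:8000/rag",
    json={"q": "Where is FINAL_K resolved?", "repo": "my-repo", "top_k": 10},
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()
for chunk in payload["results"]:
    print(chunk)
```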
## 10. When to tune what
You don’t need to memorize every config key. A practical tuning order:
1. Indexing sanity
    - Make sure the right files are included / excluded.
    - Check chunk sizes and language detection in the indexer logs.
2. BM25 only
    - Disable dense search and reranking.
    - Fix obvious misses by adjusting tokenization / stopwords.
3. Enable dense + reranker
    - Turn on dense search in `agro_config.json`.
    - Start with a small reranker `FINAL_K` (e.g. 10–20) to keep latency reasonable.
4. Discriminative keywords
    - Enable `KEYWORDS_AUTO_GENERATE` for large repos.
    - Inspect `data/discriminative_keywords.json` and adjust thresholds.
5. LangGraph / advanced orchestration
    - Only once retrieval itself is solid.
If you’re not sure what a parameter does, hover it in the UI or ask AGRO itself in the Chat tab – the codebase is indexed into its own RAG engine, and the tooltips link back to the relevant docs and papers.