## Architecture

- **Tri-Path Retrieval**: Vector, Sparse, and Graph retrievers run concurrently for maximum recall.
- **Fusion Layer**: Weighted fusion or RRF unifies heterogeneous scores into one ranking.
- **Optional Reranker**: A cross-encoder can refine the fused list by understanding local context.
- **Pydantic-Orchestrated**: All engine parameters are Pydantic fields with constraints and defaults.
- **FastAPI Surface**: Clean endpoints for indexing, retrieval, graph queries, and system health.
- **Observability**: Readiness probes, Prometheus metrics, and a PostgreSQL exporter.
### Concurrency

TriBridRAG parallelizes retrievers with async I/O. Size DB connection pools to match concurrency and avoid I/O starvation.
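One way to keep async fan-out from outrunning the pool is to gate DB-bound work behind a semaphore sized to the pool. This is an illustrative sketch, not the engine's actual code; `POOL_SIZE`, `with_pool_slot`, and `fan_out` are hypothetical names.

```python
import asyncio

POOL_SIZE = 10  # assumption: should match your DB connection pool setting

sem = asyncio.Semaphore(POOL_SIZE)

async def with_pool_slot(coro):
    """Gate a DB-bound coroutine so at most POOL_SIZE run at once."""
    async with sem:  # waits when all pool slots are busy
        return await coro

async def fan_out(queries):
    # stand-in for work that would hold one pooled connection each
    async def fake_query(q):
        await asyncio.sleep(0)
        return q.upper()
    return await asyncio.gather(*(with_pool_slot(fake_query(q)) for q in queries))
```

With this gate in place, even a burst of concurrent requests degrades to queueing rather than exhausting connections.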
### Failure Isolation

Each retriever is wrapped so failures degrade only that leg. Fusion runs on the subset that succeeded; fused results keep provenance in `ChunkMatch.source`.
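The per-leg isolation described above can be sketched with `asyncio.gather(..., return_exceptions=True)`: a failed leg yields an exception object that is filtered out, while the surviving legs still reach fusion. The retriever names here are illustrative, not the real retriever classes.

```python
import asyncio

async def safe_gather(*coros):
    """Run retriever legs concurrently; drop legs that raised, keep the rest."""
    results = await asyncio.gather(*coros, return_exceptions=True)
    return [r for r in results if not isinstance(r, BaseException)]

async def demo():
    async def vector_leg():
        return [("chunk-1", 0.9)]
    async def graph_leg():
        raise ConnectionError("neo4j down")  # this leg degrades alone
    # only the vector leg's results survive; fusion would run on this subset
    return await safe_gather(vector_leg(), graph_leg())
```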
### Graph Availability

If Neo4j is temporarily unavailable, retrieval continues with vector + sparse only. Test this fallback behavior in your deployment.
## System Diagram

```mermaid
flowchart LR
    subgraph API
        FAPI["FastAPI"]
    end
    FAPI --> V["VectorRetriever"]
    FAPI --> S["SparseRetriever"]
    FAPI --> G["GraphRetriever"]
    V --> FU["Fusion"]
    S --> FU
    G --> FU
    FU --> RR["Reranker (optional)"]
    RR --> RES["Results"]
    FU --> RES
    V <--> PG[("PostgreSQL<br/>(pgvector + FTS)")]
    S <--> PG
    G <--> NEO[("Neo4j<br/>Graph")]
```

## Layer Responsibilities
| Layer | Module | Responsibilities | Representative Config |
|---|---|---|---|
| Vector | `server/retrieval/vector.py` | Dense search via pgvector | `vector_search.enabled`, `vector_search.top_k`, `embedding.*` |
| Sparse | `server/retrieval/sparse.py` | FTS/BM25 over chunks | `sparse_search.enabled`, `sparse_search.top_k`, `indexing.bm25_*` |
| Graph | `server/retrieval/graph.py` | Entity traversal, context expansion | `graph_search.enabled`, `graph_search.max_hops`, `graph_storage.*` |
| Fusion | `server/retrieval/fusion.py` | Merge lists and scores | `fusion.method`, `fusion.rrf_k`, `fusion.*_weight` |
| Reranker | `server/retrieval/rerank.py` | Cross-encoder scoring | `reranking.reranker_mode`, `reranking.*` |
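Since the engine is Pydantic-orchestrated, the config keys in the table above presumably map to nested Pydantic models. Here is a hedged sketch of what that shape might look like; the field names follow the table, but the defaults and constraints are illustrative assumptions, not the real `TriBridConfig`.

```python
from typing import Literal
from pydantic import BaseModel, Field

class VectorSearchConfig(BaseModel):
    enabled: bool = True
    top_k: int = Field(20, ge=1, le=1000)  # illustrative bounds

class FusionConfig(BaseModel):
    method: Literal["weighted", "rrf"] = "rrf"  # assumed default
    rrf_k: int = Field(60, ge=1)
    vector_weight: float = Field(1.0, ge=0.0)
    sparse_weight: float = Field(1.0, ge=0.0)
    graph_weight: float = Field(1.0, ge=0.0)

class TriBridConfig(BaseModel):
    vector_search: VectorSearchConfig = VectorSearchConfig()
    fusion: FusionConfig = FusionConfig()
```

The benefit of this layout is that out-of-range values (e.g. `rrf_k=0`) fail at config load time rather than deep inside a retriever.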
## Hot Path (Annotated)

```python
from server.retrieval.fusion import TriBridFusion
from server.retrieval.rerank import Reranker

async def search(query: str, corpus_id: str, cfg):  # (1)
    fusion = TriBridFusion(cfg)
    fused = await fusion.search(corpus_id, query)  # (2)
    if cfg.reranking.reranker_mode != "none":
        rr = Reranker(cfg)
        fused = await rr.rerank(query, fused)  # (3)
    return fused  # (4)
```
```bash
BASE=http://localhost:8000

# (2) Fusion (vector + sparse + graph)
curl -sS -X POST "$BASE/search" \
  -H 'Content-Type: application/json' \
  -d '{
        "corpus_id": "tribrid",
        "query": "connection pool size",
        "top_k": 10
      }' | jq '.matches[0]'
```
```typescript
import type { SearchRequest, SearchResponse } from "../web/src/types/generated";

export async function triSearch(req: SearchRequest): Promise<SearchResponse> {
  const resp = await fetch("/search", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!resp.ok) {
    throw new Error(`search failed: ${resp.status}`);
  }
  return await resp.json(); // (4)
}
```
1. Query and corpus identifier; use `corpus_id` (alias of legacy `repo_id`)
2. Fusion runs vector/sparse/graph concurrently and merges results
3. Optional reranking with a cross-encoder, based on config
4. Returns a unified `SearchResponse` with provenance and latency
## Fusion Choices

| Method | Formula | Strengths | Notes |
|---|---|---|---|
| `weighted` | `w_v*s_v + w_s*s_s + w_g*s_g` | Interpretable weight tuning | Normalize scores if distributions differ |
| `rrf` | `sum_i 1/(k + rank_i)` | Robust across heterogeneous scales | Tune `k` via `fusion.rrf_k` |
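Both formulas from the table can be shown in a few lines of plain Python. This is a minimal sketch of the two methods, not the actual `fusion.py` API; the function names are illustrative, and the weighted variant includes the min-max normalization the Notes column recommends.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranked in rankings:  # each list is best-first
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def weighted_fuse(score_maps, weights):
    """Weighted sum w_v*s_v + w_s*s_s + w_g*s_g after min-max normalization."""
    fused = {}
    for scores, w in zip(score_maps, weights):
        if not scores:
            continue  # a failed leg contributes nothing
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        for doc_id, s in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * (s - lo) / span
    return sorted(fused, key=fused.get, reverse=True)
```

Note how RRF only consumes ranks, which is why it is robust when the three legs produce scores on incompatible scales.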
```mermaid
flowchart TB
    Q["Query"] --> V["Vector Top-K"]
    Q --> S["Sparse Top-K"]
    Q --> G["Graph Top-K"]
    V --> FU["Fusion"]
    S --> FU
    G --> FU
    FU --> OUT["Top-N Results"]
```

## Implementation Notes
- All configurable fields (weights, top_k, thresholds) live in `TriBridConfig`. Frontend sliders and toggles must map 1:1 to these fields via `generated.ts`.
- DB clients: `server/db/postgres.py` (pgvector + FTS) and `server/db/neo4j.py` (graph). Keep pools separate to avoid head-of-line blocking.
## Do Not Hand-Write Types

All API types must be imported from `web/src/types/generated.ts`. Regenerate with `uv run scripts/generate_types.py` whenever Pydantic models change.