How accurate is SourceVault's retrieval on a known codebase?

On express v5.1.0 across 30 authored ground-truth questions, 100% of answers were grounded in clickable file-and-line citations and 100% contained the facts the ground truth required, at a median 20.4 seconds per cited answer, with zero source code transmitted anywhere.

What can SourceVault answer that cloud code assistants cannot?

Questions that require git history, such as why or when code changed, because SourceVault indexes commit history locally and cloud indexers never see it. It also covers uncommitted work and local branches, and exposes a retrieval-quality report measured on your own codebase.

Does any source code leave my infrastructure?

No. Retrieval, embeddings, and answer generation all run locally. The benchmark was produced on one workstation with zero egress; the privacy model is architectural and verifiable, not contractual.


Ground-truth retrieval benchmark

The production pipeline, measured on a public repository anyone can inspect. Local models only — every number below was produced on one workstation, with zero egress.

Method

Corpus: express v5.1.0 (commit cd7d439), indexed exactly like a customer repository.

Questions: 30 authored questions with known answers, each declaring the file a correct answer must cite and the facts the answer text must contain. They're the questions a developer actually asks:

"How does the trust proxy setting affect how the client IP address is determined?"
"Where is the etag setting compiled into the function that generates ETags?"
"What handles a request when no route matches it?"

Run: every question goes fresh through the full production pipeline — retrieval, context assembly, local model — no cache, no cherry-picking. Reports are machine-generated by the same eval harness that ships in the product.
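As a rough illustration of the method above, a ground-truth entry and its scoring could look like the following Python sketch. The class names, field names, placeholder file paths, the sample fact, and the case-insensitive substring check are all assumptions made for illustration; the harness's actual schema and matching rules are not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthQuestion:
    """Hypothetical ground-truth entry: the question, the file a correct
    answer must cite, and the facts the answer text must contain."""
    text: str
    expected_file: str
    required_facts: tuple[str, ...]


@dataclass(frozen=True)
class CitedAnswer:
    """Hypothetical pipeline output: answer text plus the files it cites."""
    text: str
    cited_files: tuple[str, ...]


def score(question: GroundTruthQuestion, answer: CitedAnswer) -> dict[str, bool]:
    """Score one answer on the three measures reported on this page."""
    body = answer.text.lower()
    return {
        # At least one citation is attached to the answer.
        "grounded": len(answer.cited_files) > 0,
        # Assumed matching rule: every required fact appears as a substring.
        "facts_present": all(f.lower() in body for f in question.required_facts),
        # Stricter metric: the exact ground-truth file is among the citations.
        "exact_file_hit": question.expected_file in answer.cited_files,
    }


# Placeholder values, not taken from the real question set.
question = GroundTruthQuestion(
    text="What handles a request when no route matches it?",
    expected_file="src/defined_here.js",
    required_facts=("404",),
)
answer = CitedAnswer(
    text="An unmatched request falls through to a final handler that sends a 404.",
    cited_files=("src/used_here.js",),
)
result = score(question, answer)
```

In this made-up case the answer is grounded and contains the required fact but cites a related file, so it counts toward the first two measures and misses the exact file-hit.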

Results — production pipeline, 2026-06-12

qwen3-coder:30b + nomic-embed-text, one workstation, fully local.

What was measured	Result
Answers grounded in cited sources (clickable file-and-line citations)	100% (30/30)
Answers containing the facts the ground truth demanded	100% (30/30)
Median time to a cited answer	20.4 s
Source code transmitted anywhere	0 bytes

That is the citation guarantee, measured: answers about your code, with file-and-line proof — and an answer engine that declines rather than guesses when retrieval comes up empty.

One stricter metric, for completeness: in 83% of questions (25/30) the answer's citations included the exact file our ground truth named; a retrieval fix raised this from 60% (see the changelog). In the remaining five questions the answer was correct but cited related code, for example the place a function is used rather than the line it's defined on. We publish it because honest benchmarks publish their strictest number, not just their best one.

What actually moved the numbers

Honest benchmarking means crediting the right change. The jump in exact file-hit came from a retrieval fix — folding exact-name matches into the hybrid ranking instead of letting a literal symbol short-circuit it — not from the cross-encoder reranker. Here is the isolated effect of each, on the same 30-question set:

Configuration	Exact file-hit
Hybrid retrieval, no reranker	56.7% (17/30)
+ cross-encoder reranker	60.0% (18/30)
+ literal-plan recall fix	83.3% (25/30)
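The retrieval fix in the last row can be pictured with a small Python sketch. It assumes reciprocal rank fusion as the hybrid ranking, which this page does not confirm; the function names, the constant k = 60, and the file names are likewise hypothetical. The point is only the contrast: in the short-circuit variant an exact-name match replaces the other retrievers, while in the folded variant it is one more ranked list feeding the same fusion.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Reciprocal rank fusion (assumed here): each list contributes
    1 / (k + rank) to every document it contains."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by name for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


def retrieve_short_circuit(exact, lexical, vector):
    """Before the fix (as sketched): a literal symbol hit bypasses fusion."""
    if exact:
        return list(exact)
    return rrf_fuse([lexical, vector])


def retrieve_folded(exact, lexical, vector):
    """After the fix (as sketched): exact-name hits are one more input list."""
    return rrf_fuse([exact, lexical, vector])


# Hypothetical candidates: the exact-name match lands on a usage site,
# while the lexical and vector retrievers both prefer the definition.
exact = ["usage.js"]
lexical = ["defs.js", "usage.js"]
vector = ["defs.js", "other.js"]

before = retrieve_short_circuit(exact, lexical, vector)  # ["usage.js"]
after = retrieve_folded(exact, lexical, vector)  # ["defs.js", "usage.js", "other.js"]
```

In the short-circuit case the defining file never reaches the answer model; once the exact match is folded in, the definition is back in the candidate set and outranks the usage site.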

The reranker's own contribution here is a single question (56.7% → 60.0%) — within noise on a set this size, and not a result we'll dress up as a proven file-hit or precision win. We keep the cross-encoder on as a defensible default: it re-reads the top candidates against your actual question at sub-second cost, and on larger, noisier corpora a reranker typically earns more than it does on 30 clean questions. Its isolated benefit after the recall fix is still to be re-measured on a bigger question set — and we'll publish that number when a real run produces it, not before.
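To make "within noise" concrete, here is a minimal Python sketch that puts a 95% Wilson score interval around a hit rate. The choice of interval and the function name are assumptions for illustration, not something the eval harness is stated to report, and the interval treats the questions as independent trials, which is a simplification. For 17 of 30 the interval runs from roughly 39% to 73%, so a one-question move to 60.0% sits well inside it.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion hits/n (z = 1.96 for ~95%)."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Exact file-hit without the reranker: 17 of 30 questions.
low, high = wilson_interval(17, 30)

# With the reranker: 18 of 30. Is that inside the interval above?
reranker_rate = 18 / 30
inside = low < reranker_rate < high
```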

Large-repo benchmark (in progress)

Thirty questions on Express is a clean, inspectable start, but a small one. A larger set is staged: 28 verified questions against kubernetes v1.36.2, a repository big enough to stress retrieval where it actually strains. The questions are written and checked; the run has not happened yet. We won't print a large-repo accuracy number here until it comes from a real local run. When it does, this section gets the figures and the method, exactly as above.

Against cloud assistants

Cursor, Copilot, and Cody can't be driven headlessly, so a scored head-to-head has to be produced by hand; when we publish one it will include both sides' full transcripts, on this same question set. What can be compared today is structural — properties that don't depend on who runs the benchmark:

	SourceVault	Cloud codebase chat
Source code leaves your infrastructure	Never — architecturally	Chunks/embeddings upload
Git history answers ("why was this changed?")	Indexed and cited	Never sees your history
Uncommitted work and local branches	Indexed on your machine	Only what syncs
Retrieval quality measured on your codebase	Eval report per install	Not exposed
Privacy model	Verifiable (zero egress)	Contractual (policy)

The second row is the one no cloud vendor can ever match by shipping a feature: answering "why was this changed?" requires your commit history, and their indexers never see it.

Verify it on your own code

The benchmark harness ships inside every SourceVault install — the same machinery produced this page. Run it against your repositories during the free 7-day trial and get the same retrieval-quality report on your own code. If the answers don't cite your code with file-and-line proof, don't buy it.

support@sourcevault.ai