Route to the Cheapest Oracle That Can Decide
When you build on top of a language model, routing every decision to the model is the path of least resistance. One interface, one prompt, one place to add logic. It works beautifully in a demo.
Then it meets production scale, and you find you've been using the most expensive, slowest, least deterministic tool you own to answer questions that were already solved — for free — by something far simpler.
The fix isn't a better model. It's a routing decision made before any work runs: send each decision to the cheapest oracle that can actually decide it — where an "oracle" is just anything that can answer the question, from a regex or a lookup to a small model to the big one. Let me make that concrete with a system I built, then pull out the part that generalizes.
Why "LLM everything" is the default — and why it's wrong
At prototype scale, defaulting every decision to a language model is the obvious first move. Then production shows you three costs the demo never did:
Money. You pay per token for decisions a free deterministic function could make.
Latency. A model call takes hundreds of milliseconds to seconds; a structural check takes milliseconds or less. Multiply by every decision, every time.
Non-determinism. The same input can resolve one way now and another way later, because the oracle is probabilistic. For anything that gates a workflow, that's corrosive — people stop trusting a check that flip-flops.
The "LLM everything" design treats the expensive oracle as the default. It should be the fallback.
Most decisions have a cheaper oracle that can decide them
Here's the observation that reframes the problem. A large share of the decisions you're tempted to route to an LLM don't require its judgment at all. Their truth condition is something a cheaper oracle can evaluate — a structural check, a lookup, a rule, a smaller model.
The minority that genuinely need the expensive oracle are the ones whose answer depends on meaning or context — the things language models are actually for. The mistake is letting that hard minority set the architecture for the easy majority.
So the design question is never "should we use an LLM here?" It's sharper: for which specific decisions is the LLM the only oracle that can decide? That's a per-decision question, and — crucially — it's answerable upfront, before anything runs.
A worked example: a code-standards enforcement system
The clearest case I lived through was a rule-enforcement system — the kind that scans code against a set of standards and flags violations. The first version routed every rule to an LLM. It demoed well and fell apart the moment it ran on every commit.
Most of the rules I was paying a model to evaluate had a structural truth condition:
"Does this module import a forbidden package?" — an import-graph lookup.
"Does every public function have a return type?" — an AST traversal.
"Does this file match the required naming pattern?" — a regex.
None of those need a model to understand intent; they need a parser to inspect structure. The rules that did need a model were the semantic ones — "is this error message actually helpful?" — where the truth depends on meaning.
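To make the three structural checks concrete, here is a minimal sketch using only Python's standard library. It assumes the code under review is Python. The forbidden-package set and the naming pattern are hypothetical placeholders, and the import check inspects a single file as a stand-in for a full import-graph lookup.

```python
import ast
import re

# Hypothetical policy values, chosen only for illustration.
FORBIDDEN_PACKAGES = {"pickle"}
MODULE_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*\.py")


def forbidden_imports(source: str) -> list[str]:
    """Return the forbidden packages this module imports (absolute imports only)."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        # Compare on the top-level package, so "pickle.foo" still counts.
        found += [n for n in names if n.split(".")[0] in FORBIDDEN_PACKAGES]
    return found


def public_functions_missing_return_type(source: str) -> list[str]:
    """Return names of public functions that lack a return annotation."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_") and node.returns is None:
                missing.append(node.name)
    return missing


def filename_matches(filename: str) -> bool:
    """Check the file name against the required naming pattern."""
    return MODULE_NAME_PATTERN.fullmatch(filename) is not None
```

Each function is pure: the same source always gives the same answer, which is what makes these checks reproducible and unit-testable.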
So I put a classification step in front of the dispatcher. Before a single check runs, each rule is sorted once into one of two tiers:
| Tier | Oracle | Properties |
|---|---|---|
| Deterministic | AST / regex / import-graph | milliseconds · no per-token cost · reproducible · unit-testable |
| Semantic | LLM | slower · paid · needs intent or context |
The classification is cheap and one-time — a small, fast model is plenty to sort "structural" from "needs-judgment." You pay it once, not on every commit forever. And because that step runs once and gets a human sign-off, its own non-determinism never reaches production: the flaky part is bounded to design time, not runtime. The genuinely ambiguous rules — mostly structural but with a semantic edge — default up to the LLM tier; you'd rather over-power a check than let a deterministic shortcut quietly miss the hard case.
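A minimal sketch of that routing layer follows. It is not the original system's code: `Rule`, `assign_tier`, `dispatch`, and the `ask_llm` callable are hypothetical names, and the tier proposal is assumed to come from the one-time, human-reviewed classification step.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Tier(Enum):
    DETERMINISTIC = "deterministic"
    SEMANTIC = "semantic"


@dataclass(frozen=True)
class Rule:
    rule_id: str
    tier: Tier
    # Only deterministic rules carry a local check function.
    check: Optional[Callable[[str], bool]] = None


def assign_tier(proposed: Optional[Tier], has_check: bool) -> Tier:
    """Design-time grading: anything ambiguous or unbacked defaults up to the LLM tier."""
    if proposed is Tier.DETERMINISTIC and has_check:
        return Tier.DETERMINISTIC
    return Tier.SEMANTIC


def dispatch(rule: Rule, source: str, ask_llm: Callable[[str, str], bool]) -> bool:
    """Runtime routing: use the local check when one exists, else the model."""
    if rule.tier is Tier.DETERMINISTIC and rule.check is not None:
        return rule.check(source)
    return ask_llm(rule.rule_id, source)
```

The tier is stored on the rule rather than recomputed per commit, so the runtime path contains no classification call at all.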
When I ran that split across one rule stream, the majority came back deterministic and only a handful were genuinely LLM-only. Routing accordingly cut the model spend on that stream by roughly 80% — the saving comes from pulling the high-frequency checks (the ones that run on every commit) off the model, not from dropping any coverage. The deterministic checks now run in milliseconds and never flip-flop, so the gate became something people trust.
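The reason frequency matters more than rule count can be shown with a few lines of arithmetic. The sketch below uses invented numbers, not figures from the system described here, and assumes a flat price per model call so that the price cancels out of the ratio.

```python
def model_calls_per_day(rules: list[dict]) -> int:
    """Count daily model calls: only semantic-tier rules reach the model."""
    return sum(r["runs_per_day"] for r in rules if r["tier"] == "semantic")


def saving_fraction(before: list[dict], after: list[dict]) -> float:
    """Share of model calls removed by re-tiering (price per call cancels out)."""
    calls_before = model_calls_per_day(before)
    if calls_before == 0:
        return 0.0
    return 1 - model_calls_per_day(after) / calls_before


# Invented numbers, for illustration only.
before = [
    {"rule": "naming-pattern", "tier": "semantic", "runs_per_day": 400},
    {"rule": "return-types", "tier": "semantic", "runs_per_day": 350},
    {"rule": "helpful-errors", "tier": "semantic", "runs_per_day": 250},
]
after = [
    {**before[0], "tier": "deterministic"},
    {**before[1], "tier": "deterministic"},
    before[2],  # still needs the model
]

print(f"model calls removed: {saving_fraction(before, after):.0%}")
```

In this toy stream every rule still runs. Only the oracle changes, so the saving tracks how often the re-tiered rules fire.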
The part that generalizes
Strip the code-standards specifics and the pattern is domain-agnostic. Anywhere you're tempted to default to the model, the same routing layer applies:
Classification — a cheap keyword/rule pass first; the LLM only for the ambiguous remainder.
Extraction — a regex or parser for the structured fields; the LLM only for the free text that resists one.
Validation — schema and type checks before you ask a model whether something "reads right."
Escalation — a small, cheap model first; promote to the expensive one only when confidence is low.
In every case the move is the same: grade the decision once, route it to the cheapest oracle that can decide it, and reserve the expensive one for the decisions that genuinely earn it. A one-tier system that sends everything through the same oracle has structurally given up those savings before it starts.
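As one illustration, the classification and escalation patterns can be combined into a single cascade. This is a sketch under assumptions: a "model" is any callable returning a label and a confidence in [0, 1], and the function names and the 0.8 threshold are hypothetical values to tune.

```python
from typing import Callable, Optional, Tuple

Label = str
# Assumption: a model is any callable returning (label, confidence in [0, 1]).
Model = Callable[[str], Tuple[Label, float]]


def keyword_pass(text: str, keywords: dict[str, Label]) -> Optional[Label]:
    """Cheapest oracle: decide only when exactly one label's keywords appear."""
    lowered = text.lower()
    hits = {label for word, label in keywords.items() if word in lowered}
    return hits.pop() if len(hits) == 1 else None


def classify(
    text: str,
    keywords: dict[str, Label],
    small_model: Model,
    large_model: Model,
    threshold: float = 0.8,  # hypothetical cut-off
) -> Tuple[Label, str]:
    """Return (label, oracle_used), trying the cheapest oracle first."""
    label = keyword_pass(text, keywords)
    if label is not None:
        return label, "keyword"
    label, confidence = small_model(text)
    if confidence >= threshold:
        return label, "small"
    # Only the ambiguous remainder reaches the expensive model.
    label, _ = large_model(text)
    return label, "large"
```

Returning the oracle name alongside the label makes it easy to log how much traffic each tier actually absorbs.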
What it costs you
This isn't free. Two tiers means a classification step to maintain and two code paths to keep honest, where a one-tier "LLM everything" system has just one. That overhead is real, and it's why the one-tier version is the right first move: at prototype scale, when you're still discovering what the rules even are, a single model call per decision is the fastest way to learn.
The split earns its keep only when volume shows up — every commit, every item, every time — and the per-decision cost of the model starts compounding. Add the routing layer when the scale is real, not before. Reaching for it on day one is its own kind of over-engineering.
How to apply it
If you're building on an LLM, run one audit:
1. List the actual decisions your system makes: not categories, the real decisions.
2. For each, ask: what's the minimum oracle that can decide this? A rule? A regex? A parser? A lookup? A small model? Or genuine semantic judgment?
3. Route accordingly. Cheap-tier decisions get a real function with a test. Only the judgment calls keep the model.
4. Re-run the audit as you add features. New decisions default to the cheap tier and only graduate to the model when you can name the judgment they require.
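The audit itself can live in code as a small table. The sketch below is one hypothetical way to do it. The cost ordering of the oracles is an assumption made for illustration, and `Decision` and `audit` are invented names.

```python
from dataclasses import dataclass
from enum import IntEnum


class Oracle(IntEnum):
    # Ordered roughly from cheapest to most expensive; the order is an assumption.
    RULE = 1
    REGEX = 2
    PARSER = 3
    LOOKUP = 4
    SMALL_MODEL = 5
    LLM = 6


@dataclass
class Decision:
    name: str
    capable: set[Oracle]  # every oracle that could decide this
    judgment: str = ""    # must be named when the LLM is the only option


def audit(decisions: list[Decision]) -> dict[str, Oracle]:
    """Route each decision to the cheapest capable oracle."""
    routes = {}
    for d in decisions:
        if not d.capable:
            raise ValueError(f"{d.name}: no capable oracle listed")
        oracle = min(d.capable)
        # A decision only graduates to the LLM when its judgment is named.
        if oracle is Oracle.LLM and not d.judgment:
            raise ValueError(f"{d.name}: name the judgment before routing to the LLM")
        routes[d.name] = oracle
    return routes
```

Re-running `audit` whenever a decision is added keeps the "name the judgment" rule enforced by code rather than by memory.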
The instinct to make everything smart is the expensive one. The discipline is to make each decision exactly as smart as it needs to be — and no smarter.
I'm a Technical Architect building AI-enabled systems. I write about what I actually build — the routing and cost decisions under the hood, not the hype.
If you've split an LLM pipeline into cheap and expensive tiers, I'd like to know where you drew the line. Find me on X @deepakgoyal_ai or LinkedIn.
