Lesson 03 · Agent Operations · ~8 min · spend → control
The money lesson
A bulk extractor ran overnight on a hosted model and quietly billed about $50 a day. Nobody approved it. The fix was not a cheaper vendor. It was routing, empty fallbacks, and a governor with no model in it, and it took the same work to about $0 to $2 a day.
In ~8 minutes you'll be able to
- Route pipeline work down a four-tier price ladder, and say why the cheap tiers are safe.
- Explain why an empty fallback list is a cost control, not a reliability bug.
- Describe the credit governor: a small tool with no model that decides whether work dispatches at all.
The whole lesson in one block:
Agents overspend silently, never loudly. So the controls are structural. Route bulk work to local and free models, orchestration and verification to flat-rate plans, the small synthesis slice to metered tokens, and reach for a frontier model only as a deliberate exception. Keep fallback lists empty, so a dead model means a loud stall, not a quiet bill. Put the dispatch decision in a small operator tool with no model in it, one that refuses to spend when it cannot read the balance. The receipt: moving bulk extraction to a local model took recurring spend from about $50 a day to about $0 to $2 a day.
01 The quiet bill
The incident that wrote this lesson was small and invisible. A bulk extractor, the workhorse that turns raw pages into structured claims, was pointed at a hosted model. It ran every night. It worked fine. And it quietly billed about $50 a day that nobody had approved.[1] No crash. No alert. No red light on any dashboard. The invoice was the first symptom, and invoices arrive late.
Compare that with the other way the same job can fail: the model server dies and the pipeline stops. That failure is loud. You see it at breakfast, restart the server, and lose a few hours of throughput that nothing downstream was waiting on. It costs almost nothing.
If you remember nothing else
The dangerous failure is the silent one.
A loud stall is cheap to notice. A silent spend is not.[2] Every control in this lesson, the tier ladder, the empty fallback, the governor, is a way of converting silent failures into loud ones.
This is Rule 5 of the fleet's six: fail closed, never silently spend.
02 The tier ladder
Start with where the money goes. Most agent pipelines spend like every step is hard. Almost none of the steps are. On this fleet the work routes down a four-tier ladder, and each tier exists for a reason.[1]
Local and free
Bulk authoring · discovery · automation
The high-volume work: extraction, drafting, scraping triage. A local model does it for the price of electricity. This is where almost all the tokens are, which is exactly why it must not be where the money is.
Flat-rate
Orchestration · verification
Coordination and the independent verifier run on flat-rate plans. Predictable jobs, predictable bill. A subscription cannot surprise you in the morning, which makes it the right home for work that runs every day.
Metered
The small synthesis slice
Per-token pricing is reserved for the one step that genuinely needs sharper reasoning: synthesis. On this fleet that step touches a thin slice of daily volume, so metered pricing stays a rounding error instead of becoming the bill.
Frontier
Fallback · genuinely hard reasoning
The most expensive model fires only where it changes the outcome, and only on purpose. It is a scalpel, not a default. If a frontier model is doing your bulk work, you are paying frontier prices for work a free model clears.
The receipt for the whole ladder is one move: bulk extraction went from a hosted model to a local one, and recurring spend went from about $50 a day to about $0 to $2 a day.[1] Same work, same quality bar. The quality bar was never the author's job. It was the verifier's.[3]
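The ladder is easier to reason about as a routing table plus one line of arithmetic. The sketch below is illustrative only: the stage names, the `ASSUMED_RATE_PER_1K` prices, and the item volumes are made-up assumptions, not the fleet's config and not any vendor's price list. The flat-rate tier is modeled at zero marginal cost, so the subscription fee itself is left out.

```python
# Assumed price per 1,000 items, in dollars. Illustrative numbers only.
ASSUMED_RATE_PER_1K = {
    "local": 0,       # electricity only, treated as free here
    "flat_rate": 0,   # zero marginal cost; the subscription fee is excluded
    "metered": 2,
    "frontier": 50,
}

# Stage -> tier, following the ladder in this section (hypothetical names).
ROUTED = {
    "extraction": "local",
    "verification": "flat_rate",
    "synthesis": "metered",
}


def daily_cost(items_per_stage, routes, rates=ASSUMED_RATE_PER_1K):
    """Sum the per-stage cost in dollars for one day's volume."""
    return sum(
        count * rates[routes[stage]] / 1000
        for stage, count in items_per_stage.items()
    )


# Assumed volumes: bulk extraction dominates; the other stages see ~5% of it.
volumes = {"extraction": 10_000, "verification": 500, "synthesis": 500}

routed_bill = daily_cost(volumes, ROUTED)

# The same day with only bulk extraction moved onto the metered tier.
unrouted = dict(ROUTED, extraction="metered")
unrouted_bill = daily_cost(volumes, unrouted)
```

Under these assumed numbers the only routing change is where extraction runs, and that one entry accounts for the whole difference between the two bills.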
03 The empty fallback
Here is the counterintuitive part. The bulk extractor on this fleet has no paid fallback. If its local model is down, it stalls.[4] On purpose.
✕ Automatic failover to a paid provider
The local model dies at 2 a.m. Config quietly reroutes the night's work to a hosted model. Everything keeps working. That is the problem: "working" now bills a paid provider all night and produces zero alerts.
This is the $50-a-day incident from section 1, automated and made permanent.
✓ An empty fallback list
The local model dies at 2 a.m. The extractor stalls. The watchdog pages the operator. You restart the server at breakfast and lose a few hours of bulk work nothing downstream was waiting on.
A loud stall is cheap to notice. A silent spend is not.
The fleet's do-not list states it as a flat rule: never give a bulk authoring agent a paid fallback. Let it stall; do not let it silently spend.[5] Notice this is Lesson 1's prompt-versus-grant distinction wearing a dollar sign. "Be careful about costs" is a request to a model. An empty fallback list is a fact about the config. The model cannot be talked into taking a failover path that does not exist.
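As a rough sketch of what "a fact about the config" means, the dispatcher below walks the primary model and then whatever fallbacks are listed. With an empty list there is no paid path to take, so an outage surfaces as an exception. `AgentConfig`, `dispatch`, `ModelDown`, `Stalled`, and the model names are hypothetical, invented for this illustration.

```python
from dataclasses import dataclass


class ModelDown(Exception):
    """Raised by the model client when a model cannot be reached."""


class Stalled(Exception):
    """Raised when no configured model can take the work. Loud on purpose."""


@dataclass(frozen=True)
class AgentConfig:
    primary: str
    fallbacks: tuple = ()  # empty by default: no failover path exists


def dispatch(config, prompt, call_model):
    """Try the primary, then each listed fallback; stall if all are down."""
    for model in (config.primary, *config.fallbacks):
        try:
            return call_model(model, prompt)
        except ModelDown:
            continue
    raise Stalled(f"no reachable model for primary {config.primary!r}")


# Hypothetical bulk-extractor config: a local model and nothing else.
bulk_extractor = AgentConfig(primary="local-extractor")
```

A watchdog can catch `Stalled` and page the operator. Nothing in the prompt can add an entry to `fallbacks`; only a config change can.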
04 The governor has no model
Budgets need an owner, and the owner cannot be one of the workers. On this fleet a small operator tool owns whether and how often work dispatches at all. The workers just work.[6] The governor has no model in it. It is plain code, which means it cannot be prompted, persuaded, or injected into generosity.
The division of labor is strict:
- Per-item concurrency is a database lease. Two workers cannot claim the same item.
- Per-host pacing is the worker's job. It knows how hard it can hit a host.
- Global cadence, budget, and the circuit breaker belong to the governor, and only the governor.
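The first item in that list, a per-item lease held in the database, can be sketched with one conditional `UPDATE`. This is a minimal illustration using SQLite from the Python standard library. The `items` table, its columns, and the 300-second lease length are assumptions, not the fleet's schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical work-item table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, owner TEXT, lease_until REAL)"
    )
    return db


def try_claim(db, item_id, worker, now, ttl=300.0):
    """Claim an item only if it is unowned or its lease has expired.

    The WHERE clause makes the check and the write a single statement,
    so two workers cannot both see the item as free and both take it.
    """
    cur = db.execute(
        "UPDATE items SET owner = ?, lease_until = ? "
        "WHERE id = ? AND (owner IS NULL OR lease_until < ?)",
        (worker, now + ttl, item_id, now),
    )
    db.commit()
    return cur.rowcount == 1  # exactly one row changed means the claim won
```

A worker that gets `False` back simply moves on to another item. An expired lease lets a second worker pick up work whose first owner died mid-task.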
The live example is a credit-aware backfill. It holds a reserve floor of roughly 20% of credits that it will not spend down. It reconciles its own ledger against the live balance instead of trusting its own bookkeeping. And if it cannot read the balance, it refuses to dispatch.[6] Not "dispatches with a warning." Refuses. An unreadable balance could mean anything, and the safe default for "anything" is closed.
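A minimal sketch of that decision, in plain code with no model call anywhere. The names (`decide`, `Decision`, `read_balance`, `credit_allotment`) are hypothetical, and treating the reserve floor as 20% of a fixed credit allotment is an assumption; the source only says "roughly 20% of credits".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    dispatch: bool
    reason: str


def decide(read_balance, ledger_balance, credit_allotment, job_cost,
           reserve_fraction=0.20):
    """Decide whether one job may dispatch. Any doubt resolves to 'no'."""
    try:
        # The live balance is the source of truth, not our own bookkeeping.
        live = float(read_balance())
    except Exception as exc:
        # Fail closed: an unreadable balance could mean anything.
        return Decision(False, f"balance unreadable ({exc!r}); refusing")

    drift = live - ledger_balance  # reported so the ledger can be corrected
    floor = reserve_fraction * credit_allotment  # assumed meaning of the floor

    if live - job_cost < floor:
        return Decision(
            False, f"would breach reserve floor {floor:.0f} (drift {drift:+.0f})"
        )
    return Decision(True, f"ok, {live - job_cost:.0f} left (drift {drift:+.0f})")
```

The `except` branch is the important one. There is no code path that dispatches after a failed balance read, and the ledger value is never used to approve spending, only to report drift.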
05 The real budget is fetch credits
One more place the money hides. On a fleet where local models have made token cost a near-solved problem, the recurring budget that actually matters is paid web-fetch credits.[7] Every page a discovery agent pulls through a paid fetcher costs real credits, and discovery agents pull a lot of pages.
The pattern is free-fetch-first: try a plain free HTTPS GET and extraction before spending a paid fetch credit, and fall back to the paid fetcher only when the free path fails.[7] Most pages do not need the expensive tool. The expensive tool is for the pages that fight back.
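Free-fetch-first reduces to a few lines once the fetchers are passed in as functions. In this sketch `free_fetch`, `paid_fetch`, and `extract` are hypothetical callables supplied by the caller. In practice the free one might wrap a plain HTTPS GET, and the paid one is whatever credit-metered fetcher the fleet uses. It covers the attended case only.

```python
def fetch_text(url, free_fetch, paid_fetch, extract):
    """Return (text, path) where path records which fetcher produced it."""
    try:
        text = extract(free_fetch(url))
    except Exception:
        # A blocked request, timeout, or parse failure all count as
        # "the free path failed".
        text = ""

    if text:
        return text, "free"

    # Only now spend a credit. Errors from the paid path are not caught,
    # so a failure here stays loud.
    return extract(paid_fetch(url)), "paid"
```

Returning the path alongside the text makes it easy to count how many pages actually needed the paid fetcher, which is the number the credit budget depends on.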
The headless exception
Free-fetch-first assumes an operator can approve the call. A headless worker that cannot surface an approval prompt should not use terminal HTTP at all: with nobody there to approve it, the request hangs the worker. Route that worker's web access through a managed fetch tool instead.[7]
You can still recover the credit thrift: map a site first, then scrape only the pages worth keeping, instead of paying to fetch everything and sorting it afterward.
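Map-then-scrape can be sketched the same way. Here `map_site` and `scrape_page` stand in for whatever operations the managed fetch tool exposes, and `worth_keeping` is a filter the operator writes. All three are hypothetical, as is the assumption that listing a site's URLs costs less than scraping every page.

```python
def map_then_scrape(root_url, map_site, scrape_page, worth_keeping):
    """List a site's URLs first, then scrape only the ones that pass the filter."""
    candidates = map_site(root_url)
    keep = [url for url in candidates if worth_keeping(url)]
    # One scrape call per kept page; skipped pages cost no scrape at all.
    return {url: scrape_page(url) for url in keep}
```

The filter runs before any page is scraped, so the scrape count is bounded by what the filter lets through, not by the size of the site.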
06 Try it: route the money yourself
The simulator below is the whole lesson as a control panel: a volume slider, a price tier per stage, the fallback checkbox from section 3, and a 2 a.m. outage button. Run the outage twice, once with the paid fallback checked and once without, and decide which failure you would rather own.
Interactive · your feedback loop
Cost-Routing Simulator
Pick a tier for each pipeline stage and watch the daily bill move. Unit costs and the per-stage volume shares are illustrative: they are not any vendor's price list, and not a measured fleet ratio. In this demo, verification and synthesis each run on a small ~5% slice of items.
Tip: preset "Routed", uncheck the fallback, then run the outage. That configuration is the fleet's. The loud version costs $0.
07 Check your understanding
3 quick checks
Q1. Which failure is the dangerous one?
Q2. What does an empty fallback list buy you?
Q3. The credit governor cannot read the balance. What does it do?
Sources
[1] AgentOps fleet manual, Proven Patterns P2: local/free for bulk authoring, discovery, and automation; flat-rate for orchestration and verification; metered only for the small synthesis step; frontier only as fallback or for genuinely hard reasoning. Moving bulk extraction to a local model took recurring spend from ~$50/day to ~$0–2/day.
[2] AgentOps fleet manual, The Six Rules, Rule 5: fail closed, never silently spend. Fallbacks are empty so an agent stalls loudly instead of quietly billing a paid provider; a loud stall is cheap to notice, a silent spend is not.
[3] AgentOps fleet manual, Create ≠ Verify ("The payoff"): because the verification wall is independent and structurally enforced, the author does not have to be expensive or trustworthy; bulk authoring runs on a free local model with no loss of safety.
[4] AgentOps fleet manual, Proven Patterns P3: the bulk extractor stalls if its local model is down rather than failing over to a paid one; gates default to "needs human" on any error.
[5] AgentOps fleet manual, The Do-Not List (Models & cost): don't give a bulk authoring agent a paid fallback; let it stall, don't let it silently spend.
[6] AgentOps fleet manual, Proven Patterns P10: a small operator tool with no model owns whether and how often work dispatches; per-item concurrency is a database lease, per-host pacing is the worker's job, and global cadence, budget, and circuit-breaker are the governor's. The credit-aware backfill holds a ~20% reserve floor, reconciles against the live balance, and fails closed when it cannot read it.
[7] AgentOps fleet manual, Proven Patterns P4: free-fetch-first. Try a plain free HTTPS GET and extraction before spending a paid fetch credit; on a fleet where token cost is ~solved, the real recurring budget is paid web-fetch credits. Caveat: a headless worker that cannot surface an approval prompt should route web access through a managed fetch tool, mapping a site first and scraping only the pages worth keeping.
— end of lesson 3 —