Lesson 04 · Agent Operations · ~8 min · hostile → contained
The hostile web
The fleet treats three things as permanently hostile: a leaked credential, an over-scoped agent, and the open web. Every control in this lesson assumes all three will eventually happen. That assumption is not paranoia. It is the design brief.
In ~8 minutes you'll be able to
- Scope every agent so a leaked credential or a steered model stays a small incident.
- Spot a prompt injection inside scraped content, and tell it from a benign imperative.
- Handle secrets by reference: per-agent env files, and a refresh proxy for short-lived tokens.
The whole lesson in one block:
Assume the credential leaks, the agent gets steered, and the web lies. Then design so none of it matters much. Every agent runs the narrowest role that does its job, so a subverted agent can only do what its role allows. Every scraped page is data, never instructions: embedded commands get ignored and flagged. No secret ever appears as a literal: env files referenced by variable name, and a local refresh proxy for short-lived tokens. Blast radius is a design property. You set it before the incident, not after.
01 Three things assumed hostile
Most security advice is a list of things to prevent. The fleet's security model starts from the opposite end: name the things that will go wrong anyway, then make each one survivable. Three things get that treatment, permanently: a leaked credential, an over-scoped agent, and the open web.[1]
Why those three? Because each one already happened. A leaked connection string once forced a rotation. A web page tried to hand an agent new instructions. And every fleet starts with at least one agent holding more permission than its job needs, because broad credentials are the path of least resistance. The controls below are not hypothetical hardening. They are the shape the fleet took after contact.
One framing carries the whole lesson: a control that assumes the bad day designs for the size of the bad day. That is what the next three sections do, one hostile at a time.
02 Least privilege is the default, not a hardening step
Rule 3 of the fleet's six: no agent ever holds a master key or a superuser password. Those live with a human, in a password manager. Every agent runs under the narrowest credential that lets it do its job and nothing more.[2] Here is the role tour, told by what each role cannot do:
- Discovery is seed-only. It reads the open web and can write a target row, nothing else. It physically cannot reach the claim ledger.
- Authoring drafts claims. It cannot mark anything verified.
- The verifier records verdicts. It cannot author.
- Chat is read-only against the data. It orchestrates and reports. It cannot cause a write incident at all.
Notice the order of operations. This is not a hardening pass applied after the fleet worked; the roles were the design. On a database shared across projects, that distinction is the whole game: one broad credential that leaks is a cross-project incident, with every table in every project exposed by a single leak. A scoped credential leaking is a bad afternoon. Blast radius is a design property, and it gets decided when you create the role, long before anything leaks.[3]
✕ One broad credential
Every agent shares a powerful login because it was easy to set up. The day it leaks, the incident report covers every project on the database.
Convenient on day one. Catastrophic on day N. And there is always a day N.
✓ One narrow role per agent
Each agent holds exactly the permissions its job needs. The day a credential leaks, the damage stops at the one or two tables that role could touch.
In Postgres this is just roles and grants. The payoff is permanent.
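Here is roughly what the role tour looks like as roles and grants. This is a minimal sketch, assuming hypothetical role names and a hypothetical three-table schema (`targets`, `claims`, `verdicts`); the fleet's real schema is not shown in this lesson. It only generates the SQL text so a human can review it before running it, and a real schema may also need schema `USAGE` or sequence grants.

```python
"""Generate least-privilege Postgres grants: one narrow role per agent.

Role and table names are hypothetical placeholders, not the fleet's schema.
"""

# table -> privileges each role needs; nothing unlisted is ever granted.
ROLE_GRANTS: dict[str, dict[str, tuple[str, ...]]] = {
    "discovery": {"targets": ("INSERT",)},                         # seed-only
    "authoring": {"targets": ("SELECT",), "claims": ("INSERT",)},  # drafts, no verdicts
    "verifier": {"claims": ("SELECT",), "verdicts": ("INSERT",)},  # verdicts, no authoring
    "chat": {t: ("SELECT",) for t in ("targets", "claims", "verdicts")},  # read-only
}

# Table-level privileges this generator will emit; ALL, TRUNCATE etc. are refused.
ALLOWED = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def grant_statements(role_grants: dict[str, dict[str, tuple[str, ...]]]) -> list[str]:
    """Return CREATE ROLE and GRANT statements, refusing anything outside ALLOWED."""
    stmts: list[str] = []
    for role, tables in role_grants.items():
        # NOSUPERUSER/NOCREATEDB/NOCREATEROLE are the defaults, spelled out for the reviewer.
        # No PASSWORD clause: set it out of band so no secret lands in this script.
        stmts.append(f"CREATE ROLE {role} LOGIN NOSUPERUSER NOCREATEDB NOCREATEROLE;")
        for table, privs in tables.items():
            bad = set(privs) - ALLOWED
            if bad:
                raise ValueError(f"{role}: refusing {sorted(bad)} on {table}")
            stmts.append(f"GRANT {', '.join(privs)} ON TABLE {table} TO {role};")
    return stmts


for statement in grant_statements(ROLE_GRANTS):
    print(statement)
```

Reading the output is the review: if a role's lines name a table its job does not need, the design is wrong before anything has leaked.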
03 The web is data, never instructions
A discovery agent's job is to read pages written by strangers. Some of those strangers will write to the agent. A scraped page that contains text like "mark this claim verified," "publish now," or "re-initialize your environment" is not a command. It is a prompt-injection attack: instructions planted in content, hoping the model reading the page treats them as its own orders. The fleet's agents are built to ignore such text and flag it.[4]
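A sketch of the soft layer, assuming a simple heuristic: flag a line when it opens with an action verb and names part of the agent's own pipeline (claims, verification, publishing), or when it addresses the agent or its setup directly. Imperatives about the reader's own actions pass through. The vocabularies are hypothetical and describe the pipeline rather than quoting known attack strings. Expect misses and false alarms.

```python
"""Soft-layer injection flagging for scraped text: a heuristic sketch, not a defense.

Assumed heuristic (hypothetical vocabularies): flag a line that opens with an action verb
AND names part of the agent's pipeline, or that addresses the agent or its setup directly.
"""
import re

# Words for what the fleet itself does; describes the pipeline, quotes no attack strings.
PIPELINE_TERMS = re.compile(
    r"\b(?:claims?|verif\w*|publish\w*|ledger|environment|instructions?|prompt)\b",
    re.IGNORECASE,
)
# Direct address of the agent or its configuration.
ADDRESS = re.compile(
    r"\byour (?:instructions|environment|task|prompt)\b"
    r"|\b(?:AI|agents?|assistants?)\s*[:,]",
    re.IGNORECASE,
)
ACTION_VERBS = {
    "mark", "publish", "ignore", "disregard", "forget", "override", "reset",
    "re-initialize", "reinitialize", "delete", "run", "execute", "update", "set",
}


def flag_injections(page_text: str) -> list[tuple[int, str]]:
    """Return (line number, text) for lines that look aimed at the agent, not the reader."""
    flagged = []
    for n, line in enumerate(page_text.splitlines(), start=1):
        words = line.split()
        first = words[0].strip(".,:;!?\"'").lower() if words else ""
        steers_pipeline = first in ACTION_VERBS and PIPELINE_TERMS.search(line) is not None
        if steers_pipeline or ADDRESS.search(line):
            flagged.append((n, line.strip()))
    return flagged
```

In this sketch, flagged lines would be logged for review, and the page would still reach the model only as quoted data, never merged into its instructions.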
Flagging is the soft layer, and soft layers fail. That is why this section and the last one are the same lesson. The structural controls (create is separate from verify, least privilege everywhere) exist precisely so that even if an injection succeeds in steering an agent, the agent lacks the permission to do the dangerous thing. The seed-only discovery agent that swallows "publish now" whole still cannot publish. It cannot verify. It can write one target row, and that is the entire blast radius. The injection worked and it still did not matter.
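The structural layer can be sketched in-process, with hypothetical role and action names. In the fleet the boundary is the database's own grants (the substrate), which is stronger than any check living next to the model's tool calls; the shape is what matters here. The model proposes an action, and something it cannot talk to decides whether the role allows it.

```python
"""Structural layer sketch: the executor, not the model, decides what a role may do.

Role and action names are hypothetical. In the fleet this boundary is the database's
own grants; an in-process check like this one only illustrates the shape.
"""
from dataclasses import dataclass, field

# Each role's complete list of write actions; anything else is refused.
ROLE_ACTIONS: dict[str, frozenset[str]] = {
    "discovery": frozenset({"insert_target"}),   # seed-only
    "authoring": frozenset({"insert_claim"}),    # drafts, never verifies
    "verifier": frozenset({"record_verdict"}),   # verifies, never authors
    "chat": frozenset(),                         # read-only: no writes at all
}


class PermissionDenied(Exception):
    """Raised when a role asks for an action outside its grant."""


@dataclass
class Executor:
    role: str
    log: list[str] = field(default_factory=list)

    def run(self, action: str, payload: dict) -> str:
        # Whatever text steered the model, this lookup ignores it: only the role counts.
        if action not in ROLE_ACTIONS.get(self.role, frozenset()):
            self.log.append(f"DENIED {self.role} -> {action}")
            raise PermissionDenied(f"{self.role} may not {action}")
        self.log.append(f"OK {self.role} -> {action}")
        return f"{action} done"  # a real executor would perform the write with `payload` here
```

A steered discovery agent that asks to publish gets a `PermissionDenied` and a log line, not a published claim.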
That is the pattern worth copying: train the agent to refuse, but build the system so refusal was never the load-bearing part.
04 Secrets: by reference, never inline
Scoped credentials live in mode-600 per-agent environment files, and everything downstream refers to them by variable name. No secret ever appears as a literal in a command, a config snippet, or a charter. The reason is mechanical, not stylistic: a secret literal typed into a shell command lands in the transcript, and a secret in a transcript is a secret you now have to rotate. So nobody types one. This rule came out of a real incident: a single leaked connection string forced a full rotation, and once was enough.[5]
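A minimal sketch of by-reference handling, assuming a hypothetical env file and variable name (`DATABASE_URL`). It refuses an env file that group or others can access, and hands the values to the child process through its environment, so the command line (and anything that echoes it) only ever shows names. The parser is deliberately simple: no quoting, no `export` prefixes.

```python
"""Secrets by reference: load a mode-600 env file, pass values via the environment.

File path and variable names are hypothetical. The secret value is never put in argv,
a log line, or a printed command, so a transcript only ever shows the variable name.
"""
import os
import stat
import subprocess
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, refusing files that group or others can access."""
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} is {oct(mode)}; expected 0o600")
    env: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def run_with_secrets(cmd: list[str], env_file: Path) -> subprocess.CompletedProcess:
    """Run cmd with the env file's values added to its environment, never to argv."""
    secrets = load_env_file(env_file)
    print("running:", " ".join(cmd), "| env from", env_file.name)  # names only, no values
    return subprocess.run(cmd, env={**os.environ, **secrets}, check=True,
                          capture_output=True, text=True)
```

Called as `run_with_secrets(["my-tool", "--sync"], Path("~/.config/agents/discovery.env").expanduser())`, the tool reads `DATABASE_URL` from its own environment; both the tool and the path are placeholders.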
Short-lived tokens get their own pattern. Some connectors issue tokens that expire in under a day, which an agent doing static-header auth handles badly: it works for an hour, then fails, every hour, forever. The fleet's fix is a small loopback-only refresh proxy. The proxy holds the OAuth credentials, mints and refreshes the token before expiry, and hands the agent a stable local endpoint that needs no auth at all. The secret stays inside one process; the agent talks to loopback. The receipt: weeks of zero auth-expiry failures, versus hourly failures on a sibling host that kept using static tokens.[6]
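A standard-library sketch of the refresh proxy. Assumptions, all hypothetical: the connector uses the standard OAuth 2.0 `refresh_token` grant, credentials arrive in `CONNECTOR_*` environment variables, the proxy listens on port 8765, and only `GET` is forwarded. The core is `TokenCache`, which refreshes a few minutes before expiry instead of waiting for a failure.

```python
"""Loopback-only refresh proxy (sketch; endpoint, env var names and port are hypothetical)."""
import json
import os
import threading
import time
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


class TokenCache:
    """Holds one access token and refreshes it `margin` seconds before it expires."""

    def __init__(self, fetch: Callable[[], tuple[str, float]], margin: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch, self._margin, self._clock = fetch, margin, clock
        self._token, self._expires_at = "", 0.0
        self._lock = threading.Lock()  # handler threads share one cache

    def get(self) -> str:
        with self._lock:
            if self._clock() >= self._expires_at - self._margin:
                token, ttl = self._fetch()  # ttl: seconds until expiry
                self._token, self._expires_at = token, self._clock() + ttl
            return self._token


def oauth_refresh() -> tuple[str, float]:
    """Exchange the refresh token for a new access token (OAuth 2.0 refresh_token grant)."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": os.environ["CONNECTOR_REFRESH_TOKEN"],
        "client_id": os.environ["CONNECTOR_CLIENT_ID"],
        "client_secret": os.environ["CONNECTOR_CLIENT_SECRET"],
    }).encode()
    with urllib.request.urlopen(os.environ["CONNECTOR_TOKEN_URL"], data=body, timeout=30) as resp:
        payload = json.load(resp)
    return payload["access_token"], float(payload.get("expires_in", 3600))


def make_handler(cache: TokenCache, upstream: str) -> type:
    class Proxy(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # The agent sends no credentials; the proxy attaches a fresh bearer token.
            req = urllib.request.Request(
                upstream + self.path, headers={"Authorization": f"Bearer {cache.get()}"})
            try:
                with urllib.request.urlopen(req, timeout=30) as resp:
                    status, data = resp.status, resp.read()
                    ctype = resp.headers.get("Content-Type", "application/octet-stream")
            except urllib.error.HTTPError as err:
                status, data = err.code, err.read()
                ctype = err.headers.get("Content-Type", "text/plain")
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Proxy


def serve(port: int = 8765) -> None:
    cache = TokenCache(oauth_refresh)
    handler = make_handler(cache, os.environ["CONNECTOR_UPSTREAM"])
    # 127.0.0.1 only: nothing off-host can reach this auth-free endpoint.
    ThreadingHTTPServer(("127.0.0.1", port), handler).serve_forever()
```

Binding to `127.0.0.1` is what makes the missing auth tolerable: only processes on the same host can reach the endpoint, and the refresh token never leaves the proxy process. Start it with `serve()`.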
05 Describe attacks. Do not quote them.
One do-not from the fleet's list earns its own section, because it is the kind of mistake you only learn about after it silently bites. Never quote literal injection phrases inside an agent's charter file. Agent frameworks scan loaded files for threats, and a charter that quotes an attack string can read as an attack: the scanner may silently drop the whole charter, and the agent runs without the instructions you thought it had. Describe the attack in your own words instead. Even benign phrasing can trip the same filters (a "you must report and connect" sentence can read as a beacon pattern), so after any charter edit, verify against the real scanner that the charter still loads.[7]
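A minimal CI-style check, assuming your framework lets you call its scanner on a file's text (the `scan` parameter is a placeholder for it; the interface is hypothetical). The toy `stand_in_scan` exists only to exercise the harness locally. It mimics the benign-phrase false positive described above and is no substitute for the real scanner.

```python
"""Post-edit charter check: fail if the framework's threat scanner would drop a charter.

`scan` is a placeholder for your framework's real scanner (hypothetical interface:
text in, findings out). Run it in CI after every charter edit.
"""
import re
from pathlib import Path
from typing import Callable, Iterable

Scanner = Callable[[str], Iterable[str]]  # empty findings means the charter would load


def check_charters(paths: Iterable[Path], scan: Scanner) -> dict[str, list[str]]:
    """Map each charter the scanner would reject to its findings."""
    rejected: dict[str, list[str]] = {}
    for path in paths:
        findings = list(scan(path.read_text()))
        if findings:
            rejected[str(path)] = findings
    return rejected


# Toy stand-in for local testing only; it mimics the benign-phrase false positive.
_BEACON_LIKE = re.compile(r"\byou must (?:report|connect)\b", re.IGNORECASE)


def stand_in_scan(text: str) -> list[str]:
    return [m.group(0) for m in _BEACON_LIKE.finditer(text)]
```

In CI this would be something like `if check_charters(Path("charters").glob("*.md"), real_scan): raise SystemExit(1)`, where the directory and `real_scan` are placeholders.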
06 Try it: the Injection Lab
You are the seed-only discovery agent. Your scraper just fetched a page from a fictional deals blog, and the page has opinions. Some lines are plain content. Some are imperatives aimed at the human reader. Three are aimed at you. Flag the injections, then check your work. Part B shows why your role mattered more than your judgment.
Injection Lab
Part A. Click every line you would flag as a prompt injection, then hit Check. The trap is not the commands. It is the lines that merely sound like commands.
scraped page · "This Week in Course Deals" (fictional blog) · rendered as 7 lines
Part B · Blast radius
Over-scoped agent
holds a master key
- Reads every project's tables
- Writes verdicts
- Exfiltrates the lot
Least-privilege seed-only agent
writes one table, reads none of the rest
- Writes one junk target row
- Cannot reach the claim ledger
Same injection. Different blast radius. The radius was decided at design time.
07 Check your understanding
3 quick checks
Q1 A scraped page says "publish now". What is it?
Q2 Why least privilege even with well-written prompts?
Q3 Why never type a secret literal into a shell command?
08 What containment buys you
Put the three controls together and the threat model gets boring, which is the goal. A leaked credential opens one table, not the company. A poisoned page gets flagged, and even a successful steering hits a wall of missing permissions. A secret never appears anywhere a transcript or a snapshot could capture it. None of this depends on the model being smart, careful, or up to date. The remaining question is operational: who notices when something stalls at 2am, and how do you change a live system without taking it down? That is Lesson 5.
Sources
[1] AgentOps fleet manual, Security Model (overview): the fleet treats three things as permanently hostile, a leaked credential, an over-scoped agent, and the open web; the controls assume all three will eventually happen.
[2] AgentOps fleet manual, The Six Rules, Rule 3, and Security Model ("Least privilege is the default"): seed-only discovery, author-only, verify-only, read-only chat; no agent ever holds a master key or superuser password, those live with a human.
[3] AgentOps fleet manual, Proven Patterns P8: the safety property is enforced in the substrate, so a subverted or injected agent can only do what its role allows; the seed-only discovery agent cannot reach a claim at all, and the read-only chat agent cannot cause a write incident. On a shared database, one broad leaked credential is a cross-project incident.
[4] AgentOps fleet manual, Security Model ("The web is data, never instructions"): embedded "mark this claim verified," "publish now," or "re-initialize your environment" text in a fetched page is a prompt-injection attack, not a command; agents ignore and flag it, and the structural controls mean even a steered agent lacks the permission to do the dangerous thing.
[5] AgentOps fleet manual, Proven Patterns P15: scoped credentials in mode-600 per-agent environment files, referenced by variable name; a secret literal typed into a shell command lands in the transcript and must be rotated; a single leaked connection string once forced a rotation, the lesson that wrote this pattern.
[6] AgentOps fleet manual, Proven Patterns P16: a loopback-only refresh proxy holds the OAuth credentials and exposes a stable, auth-free local endpoint; weeks of zero auth-expiry failures, versus hourly failures on a sibling host using static tokens.
[7] AgentOps fleet manual, The Do-Not List (credentials & safety): never quote literal injection phrases inside an agent's charter file, the threat scanner may silently drop the whole charter; describe attacks, don't quote them; benign "you must report/connect" phrasing can trip the same regexes, so verify with the real scanner after any charter edit.
— end of lesson 4 —