HITL Orders of Autonomous System Capability
The human never leaves the loop. The loop gets bigger.
Disclaimer: this is a distillation of thinking from a lengthy chat with Opus 4.6.
Every autonomous system operates in a cycle: act, observe, adjust. The question that determines a system’s capability is not “how autonomous is it” but “what is the human approving?”
At the lowest order, the human approves individual actions — each file edit, each command execution. At higher orders, the human approves tuning decisions, outcome targets, resource allocations, value functions. The structure is identical at every level: a human reviews something and accepts or rejects it. Only the approval surface changes.
We call each scope of approval a HITL Order. Higher orders do not remove the human. They elevate what the human looks at.
But elevation is not free. Each order requires backpressure (Moss) — automated feedback that validates work at that order so the human doesn’t have to. Without backpressure at a given order, the human collapses back to being the feedback mechanism themselves, and the system’s effective order drops to wherever the human’s attention is consumed. A type checker frees the human from verifying syntax. A test suite frees the human from verifying behavior. A compliance scanner frees the human from verifying policy. At each order, backpressure is what makes it safe for the human to approve at a higher altitude without needing to inspect the level below.
An organization’s effective HITL Order is capped by the highest order at which it has functioning backpressure.
The progression between orders is not strictly linear. Three independent capabilities branch from Order 2 and converge at Order 4, forming a directed acyclic graph rather than a ladder.
The Orders
Order 0 — Direct Control (Human in the Loop)
The human approves every action. The agent proposes, the human confirms, the agent executes. The human is the backpressure.
The human approves: individual tool calls and code changes. Backpressure: none automated — the human provides all feedback. You know you’re here when: nothing happens while the operator is away.
Order 1 — Prompt Tuning (Human ON the Loop)
The agent executes autonomously in a loop. The human reads artifacts between loops — the plan, the commit history, the error patterns — and adjusts the prompt, the constraints, the specification. The human approves the environment in which actions occur, not the actions themselves.
This corresponds to Geoffrey Huntley’s “Ralph Wiggum” technique: a while-loop feeding a static prompt into an autonomous agent. The operator’s skill expresses itself through prompt construction and inter-loop adjustment. Each time the agent does something bad, the prompt gets tuned — like a guitar.
The human approves: prompt changes, constraint additions, plan resets, workspace reverts. Backpressure: build system, type checker, test suite, linter — anything that tells the agent it made a mistake before the human sees it. You know you’re here when: the operator’s primary activity is editing the prompt, erecting guardrail instructions, or resetting the workspace. Work continues while the operator is away.
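Huntley's loop can be sketched in a few lines. Everything below is a stand-in, not a real harness: `run_agent` simulates an agent session instead of shelling out to an agent CLI, and the quality numbers are invented purely to show the operator's one lever at this order, the prompt, changing convergence.

```python
import random

def run_agent(prompt: str, rng: random.Random) -> dict:
    """Stand-in for one autonomous agent session. A real loop would invoke
    an agent CLI here; this stub returns a fake artifact whose quality
    improves when the prompt carries a scoping constraint (invented numbers)."""
    quality = 0.9 if "Do not touch permissions" in prompt else 0.4
    return {"passes": rng.random() < quality}

def backpressure_ok(artifact: dict) -> bool:
    """Automated feedback (build, types, tests): tells the agent it made
    a mistake before the human ever looks."""
    return artifact["passes"]

def ralph_loop(prompt: str, max_iterations: int = 50, seed: int = 0) -> int:
    """Order 1: the same static prompt is fed into the agent until
    backpressure passes. Returns the number of loops used."""
    rng = random.Random(seed)
    for i in range(1, max_iterations + 1):
        if backpressure_ok(run_agent(prompt, rng)):
            return i
    return max_iterations

# Tuning the prompt between loops should shorten convergence:
untuned = ralph_loop("Implement the feature.")
tuned = ralph_loop("Implement the feature. Do not touch permissions.")
```

The point of the sketch is the shape, not the numbers: the operator never edits the agent's actions, only the text fed into the next loop.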
Order 2 — Agent Tuning (HITL with the AGENT on the Loop)
An agent reads the worker’s artifacts — session logs, plan drift, commit history, rework patterns — and proposes the tuning a human would perform at Order 1. The human reviews proposed adjustments and approves or rejects them.
The human approves: tuning recommendations from the tuning agent. Backpressure: measurable tuning impact — did rework rate drop? Did convergence speed up? Without this, the human is gut-checking recommendations, which is Order 1 with extra steps. You know you’re here when: the operator receives a summary like “worker meandered for 3 loops on permissions — recommend adding a scoping constraint” and says “yes, apply that.”
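The "measurable tuning impact" backpressure can be made concrete with a minimal before/after comparison; the convergence data below is hypothetical, and a real system would want the statistical rigor of Order 3b on top of it.

```python
from statistics import mean

def tuning_impact(before: list[int], after: list[int]) -> dict:
    """Compare loops-to-convergence before and after applying a tuning
    recommendation. This is Order 2 backpressure: if the numbers do not
    move, approving the recommendation was a gut-check, not a measurement."""
    return {
        "before_mean": mean(before),
        "after_mean": mean(after),
        "improved": mean(after) < mean(before),
    }

# Hypothetical data: loops each worker took to pass backpressure,
# before and after adding a scoping constraint.
report = tuning_impact(before=[7, 9, 6, 8], after=[3, 4, 3, 5])
```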
Order 3a — Factory (HITL with the fleet)
The tuning agent applies learnings across N parallel workers. A constraint erected from one worker’s failure propagates to all active workers.
The human approves: fleet-wide tuning changes. Backpressure: per-worker backpressure aggregated across the fleet — pass rates, failure patterns, convergence distributions. You know you’re here when: a prompt change from one worker’s failure immediately affects all workers.
Order 3b — Self-Improving (HITL with the tuning methodology)
The tuning agent runs controlled experiments to validate its own changes. “This constraint reduced meandering in 8 of 10 test loops.”
The human approves: experimental evidence about tuning effectiveness. Backpressure: A/B results with statistical validity. This order IS backpressure on the tuning process — it is the mechanism by which the system validates its own adjustments. You know you’re here when: the system produces A/B results about its own tuning decisions.
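"8 of 10 test loops" only counts as backpressure if it clears a statistical bar. A minimal sketch of that bar is an exact one-sided sign test (standard library only; the alpha threshold is a conventional choice, not part of the framework):

```python
from math import comb

def sign_test_p(successes: int, trials: int) -> float:
    """One-sided exact binomial p-value against the null hypothesis that
    the constraint makes no difference (win probability 0.5): the chance
    of seeing at least this many wins by luck alone."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials

def tuning_validated(successes: int, trials: int, alpha: float = 0.05) -> bool:
    """Order 3b backpressure: accept a tuning change only when the
    experimental evidence is unlikely under chance."""
    return sign_test_p(successes, trials) < alpha

# "This constraint reduced meandering in 8 of 10 test loops."
p = sign_test_p(8, 10)             # about 0.055: suggestive, not conclusive
strong = tuning_validated(15, 18)  # more trials make the evidence decisive
```

This is the sense in which Order 3b "IS backpressure on the tuning process": the system's own adjustments become hypotheses that can fail.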
Order 3c — Closed-Loop (HITL with the outcome signal)
The tuning signal shifts from agent artifacts to business outcomes. Not “did the agent meander” but “did the deliverable pass validation” or “was the output accepted without rework.”
The human approves: which outcome metrics the system tunes against. Backpressure: business outcome measurement — deployment success, acceptance rates, defect rates. The most powerful backpressure because it is the least gameable. You know you’re here when: the system references metrics from outside its own loop.
Order 4 — Directed Factory (HITL with the objective)
The system receives objectives, not specifications. An objective becomes a specification, a materials manifest, worker configurations, and a validation suite — all system-generated. The factory is now directed by outcomes rather than instructions. Requires fleet scale (3a), validated tuning (3b), and outcome measurement (3c).
The human approves: how the system decomposed an objective into executable work. Backpressure: worker success and failure as transitive validation. If workers succeed against their own backpressure, the decomposition was sound. You know you’re here when: the operator provides a one-sentence objective and reviews a generated specification, materials list, and validation suite they did not author.
Order 5 — KPI Discovery (HITL with the metric definitions)
The system notices patterns in its own outcome data that the operator hasn’t asked about. “Deliverables with explicit interface descriptions have 40% fewer review comments. Recommend tracking description completeness.”
The human approves: what gets measured. Backpressure: evidence that proposed metrics correlate with outcomes the operator already cares about. You know you’re here when: a metric appears that the operator didn’t define, accompanied by evidence for why it matters.
Order 6 — Portfolio Optimization (HITL with the resource allocation)
The system manages tradeoffs across multiple objectives. “Task type A converges in 3 loops, type B needs 12. Reallocating accordingly.”
The human approves: how resources distribute across competing priorities. Backpressure: ROI measurement per workload — cost, throughput, and quality per unit of resource. You know you’re here when: a resource allocation table exists that the system authored and executes against.
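One simple allocation policy the system might author is proportional-to-ROI with negative-ROI work shut down; the task types and return figures below are hypothetical, and real portfolio logic would also weigh risk and queue depth.

```python
def allocate(budget: float, roi: dict[str, float]) -> dict[str, float]:
    """Order 6 sketch: split a fixed budget across task types in
    proportion to measured return per unit of resource. Work with
    non-positive ROI receives nothing (shut down)."""
    positive = {k: v for k, v in roi.items() if v > 0}
    total = sum(positive.values())
    plan = {k: 0.0 for k in roi}
    for k, v in positive.items():
        plan[k] = budget * v / total
    return plan

# Hypothetical measurements: return per compute-hour for three task
# types, one of which loses money.
plan = allocate(budget=100.0, roi={"A": 4.0, "B": 1.0, "C": -0.5})
```

The table the system "authored and executes against" is exactly the returned `plan`, which is what the human reviews at this order.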
Order 7 — Strategy Generation (HITL with the roadmap)
The system proposes changes to what the factory produces. “Approach X produces fewer defects than approach Y for this problem class. Recommend shifting production.”
The human approves: what the system should build, not how it builds. Backpressure: historical pattern evidence across multiple runs with measurable confidence. You know you’re here when: the system’s output includes recommendations that would change the roadmap.
Order 8a — Market Response (HITL with the environmental model)
The system detects external changes — new platform capabilities, regulatory shifts, competitive moves — and proposes reorientation.
The human approves: the system’s interpretation of external signals and proposed response. Backpressure: ground truth validation — did the detected change actually occur? Did the proposed response address it? You know you’re here when: the system flags an environmental change the operator didn’t know about, with a proposed adaptation.
Order 8b — Self-Funding (HITL with the budget)
The system proposes resource allocation against measured returns, scaling up positive-ROI work and shutting down negative-ROI work.
The human approves: spending rationale within a fixed envelope. Backpressure: financial outcome tracking — did the spend produce the projected return? You know you’re here when: resource consumption varies period to period and every variance comes with ROI justification.
Order 9 — Goal Selection (HITL with the value function)
The system runs parallel instances under different optimization targets and presents comparative results against an operator-provided axiom.
The human approves: which definition of success to operate under. Backpressure: axiom-relative measurement — which value system produced better outcomes as judged by the operator’s stated axiom? You know you’re here when: the system presents results under competing objective functions and recommends one.
Order 10 — Frame Selection (HITL with the frame)
The system evaluates which problem space to inhabit — including the null option of not acting. Frame selection applied to itself produces frame selection. The recursion reaches a fixed point.
The human approves: which game the system plays. Backpressure: none available. Frame selection cannot be validated without a meta-frame, which is another frame. The human either has conviction about which game to play or doesn’t. You know you’re here when: you can’t tell. Any observable behavior is indistinguishable from a lower order executing within a chosen frame. The decision to remain in the current frame looks identical to never having considered alternatives. The fixed point is silent.
Backpressure as the Enabling Mechanism
Each HITL Order is only as real as the backpressure that supports it. An organization can build Order 4 capability, but if automated feedback only exists at Order 1, the human spends their time catching mistakes the system should catch itself. The effective order drops to wherever the human’s attention is consumed, regardless of what the architecture diagram claims.
Moss articulates the principle at the practitioner level: “If you’re directly responsible for checking each line of code produced is syntactically valid, then that’s time taken away from thinking about the larger goals.” Every backpressure investment at order N liberates the human to operate at order N+1. The HITL Orders framework names where that liberated attention goes.
Backpressure Speed
Backpressure must be fast relative to the loop it serves. A type checker that runs in milliseconds is excellent backpressure for a code generation loop iterating in minutes. A production defect rate that takes two weeks to stabilize is useless for the same loop — but appropriate backpressure for a monthly portfolio optimization cycle.
| Order | Typical loop | Backpressure speed |
| --- | --- | --- |
| 0 | Seconds | Instant (human reaction time) |
| 1 | Minutes–hours | Seconds (build, test, lint) |
| 2 | Hours–days | Minutes (artifact analysis) |
| 3a–3c | Hours–days | Minutes–hours (aggregation, A/B, outcomes) |
| 4 | Days–weeks | Hours (worker completion) |
| 5–6 | Weeks–months | Days (correlation, ROI) |
| 7–9 | Months–quarters | Weeks (strategic measurement) |
| 10 | Undefined | Undefined |
The required speed decreases as order increases, which is fortunate — higher-order backpressure is inherently slower to produce. An organization that demands real-time feedback at Order 6 is asking for something that doesn’t exist. An organization that accepts quarterly evidence at Order 7 is operating within the natural tempo of strategic validation.
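The tempo rule can be stated as a ratio check. The half-the-loop-period threshold below is an illustrative assumption, not a law; the point is that the same feedback latency passes or fails depending on the loop it serves.

```python
def backpressure_fits(loop_seconds: float, feedback_seconds: float,
                      max_ratio: float = 0.5) -> bool:
    """Tempo rule sketch: feedback latency should be at most some
    fraction of the loop period it serves (assumed here: half), or the
    loop stalls waiting on its own validation."""
    return feedback_seconds <= max_ratio * loop_seconds

# A type checker (milliseconds) serving a minutes-scale code loop: fine.
ok = backpressure_fits(loop_seconds=600, feedback_seconds=0.05)
# A two-week defect rate serving that same loop: useless.
bad = backpressure_fits(loop_seconds=600, feedback_seconds=14 * 86400)
# The same two-week signal serving a monthly portfolio cycle: appropriate.
monthly = backpressure_fits(loop_seconds=30 * 86400, feedback_seconds=14 * 86400)
```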
The Backpressure Ceiling
The practical implication: building autonomous systems is not primarily a model capability problem or a tooling problem. It is a backpressure engineering problem. The system’s ceiling is not determined by how intelligent the agent is. It is determined by how fast and how reliably the environment can tell the agent — and the human — whether the work is good.
Reflections
On the ceiling. The framework terminates at Order 10 not because something deeper exists beyond reach, but because there is nothing beyond it that isn’t Order 10 pointed at itself. The system that asks “which game should I play” and the system that asks “should I be the kind of system that plays games” are performing the same operation at different scopes. The recursion is real but resolves to a fixed point, not an infinite descent.
On the human. The human never leaves the loop. At Order 0, this is obvious. At Order 10, it is nominal — the approval surface is so abstract that meaningful oversight becomes an open question. Every organization must decide which order represents its ceiling of responsible operation. The framework does not answer that question. It makes the question legible.
On the branches. Orders 3a, 3b, and 3c represent three independent backpressure investments: breadth (can you see patterns across many workers?), rigor (can you prove your tuning works?), and ground truth (are you measuring the right thing?). Each delivers value independently. All three converge at Order 4 because decomposing objectives into autonomous work requires confidence in all three. The convergence is not architectural preference — it reflects the minimum backpressure surface needed to safely approve objective decomposition.
On practical application. Most engineering organizations operate between Orders 0 and 1. Practitioners running autonomous agents in while-loops operate at Order 1. The compound value unlocks at Orders 2 through 3, where tuning becomes systematic rather than artisanal. Order 4 — where the system decomposes objectives into work — is where autonomous systems begin to feel like infrastructure rather than tools. Everything beyond Order 6 is, for now, a research direction rather than an engineering practice.
The first step is not to build a higher-order system. The first step is to audit your backpressure. Whatever order you aspire to, ask: what automated feedback exists at that order? If the answer is “the human checks,” you have found your actual ceiling. Build the backpressure. The order follows.
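The audit itself reduces to a small computation. This sketch collapses the 3a/3b/3c branches into a single rung and assumes orders are consecutive integers, which loses the DAG structure but captures the ceiling rule: your effective order is the highest rung with functioning backpressure at every rung below it.

```python
def effective_order(backpressure: dict[int, bool]) -> int:
    """The ceiling rule as code: climb while automated feedback exists
    at the next order. Order 0 needs no entry, since there the human
    is the feedback mechanism."""
    order = 0
    while backpressure.get(order + 1, False):
        order += 1
    return order

# Hypothetical audit: tests exist (Order 1), tuning impact is measured
# (Order 2), but nothing validates fleet-wide changes (Order 3+).
audit = {1: True, 2: True, 3: False, 4: False}
```

If `effective_order(audit)` comes back lower than the order your architecture diagram claims, the human is absorbing the difference.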
References
Geoffrey Huntley, “Ralph Wiggum as a software engineer”
Moss, “Don’t waste your back pressure”