<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[blog.bios.dev]]></title><description><![CDATA[blog.bios.dev]]></description><link>https://blog.bios.dev</link><image><url>https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png</url><title>blog.bios.dev</title><link>https://blog.bios.dev</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 01:21:58 GMT</lastBuildDate><atom:link href="https://blog.bios.dev/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Joe Still]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[bios@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[bios@substack.com]]></itunes:email><itunes:name><![CDATA[Joe Still]]></itunes:name></itunes:owner><itunes:author><![CDATA[Joe Still]]></itunes:author><googleplay:owner><![CDATA[bios@substack.com]]></googleplay:owner><googleplay:email><![CDATA[bios@substack.com]]></googleplay:email><googleplay:author><![CDATA[Joe Still]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[HOTL Orders of Autonomous System Capability]]></title><description><![CDATA[The human never leaves the loop. The loop gets bigger]]></description><link>https://blog.bios.dev/p/tuning-orders-of-autonomous-system</link><guid isPermaLink="false">https://blog.bios.dev/p/tuning-orders-of-autonomous-system</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Thu, 16 Apr 2026 16:54:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rZbd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: this is a distillation by Opus 4.6 of thinking from a very lengthy chat w/ Opus 4.6</em></p><h3><strong>Order 0 is HITL. Every order above it is a new OODA loop. The human is always on one.</strong></h3><p>Every autonomous system operates in a cycle: act, observe, adjust. The question that determines a system&#8217;s capability is not &#8220;how autonomous is it&#8221; but <strong>which OODA loop is the human on?</strong></p><p>The answer names the order.</p><p>At Order 0 the human is inside the loop &#8212; a required step. Nothing executes without them. At every order above 0 the loop runs freely. The human is not inside it. They are <strong>on</strong> it: watching, able to intervene, not required to at every iteration. The loop runs while they sleep.</p><p><strong>Order 0 is the only true HITL order.</strong> Every order above it is a <strong>HOTL order</strong>, distinguished by which loop the human is currently on. The pattern at each transition is identical: the human hands the current loop&#8217;s OODA function to an agent and steps up to watch that agent from above. They exit by delegating. They re-enter one level up. They are always on a loop. 
Never on the same loop twice.</p><div><hr></div><h3><strong>The Mechanism</strong></h3><p>Each transition works by the same mechanism: <strong>subject becomes object.</strong></p><p>What you were embedded in &#8212; what you could only act from, not examine &#8212; becomes something you can hold and modify. Ralph executes within his prompt. He cannot observe it. The prompt is his subject. The Order 1 human watches Ralph run and edits PROMPT.md between loops. The prompt is now an object: a thing being acted on, not merely acted from. At Order 2 an agent takes over the work of watching Ralph. The human is now on a loop whose object is the Ralph run itself &#8212; the session logs, the drift patterns, the failure signatures. What was the human&#8217;s operational medium has become their observational target.</p><p>This is also why higher orders remain tractable. Prior complexity does not stack &#8212; it <strong>compresses</strong>. The entire internal workings of a loop fold into a single node at the next order. The Order 2 human does not manage Ralph&#8217;s internals. They see one thing: the Ralph run. Its internals are below their resolution. Each order stands on a floor that has already been folded up.</p><p><strong>Backpressure is what makes a loop foldable.</strong> A loop without automated feedback cannot run unattended &#8212; it needs a human inside to evaluate its own output, which means it cannot fold into a HOTL node. A type checker frees the human from verifying syntax. A test suite frees the human from verifying behavior. Each piece of backpressure removes one reason the human had to stay inside the loop. When all those reasons are gone, the loop can fold and the human can step up.</p><p>Before building any new capability, ask: what automated feedback will tell this loop whether its output is good? Design that first. The capability follows.</p><p><strong>An organization&#8217;s effective HOTL Order is capped by the highest order at which it has functioning backpressure.</strong></p><p>The progression between orders is not always linear. 
Three independent capabilities branch from Order 2 and converge at Order 4, forming a directed acyclic graph rather than a ladder:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rZbd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rZbd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 424w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 848w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rZbd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png" width="780" height="1153" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1153,&quot;width&quot;:780,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104668,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bios.dev/i/194427608?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rZbd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 424w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 848w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3><strong>The Orders</strong></h3><h4><strong>Order 0 &#8212; Direct Control </strong><em><strong>(HITL)</strong></em></h4><p>The human is a required step inside the loop. Nothing executes without them. They provide all feedback &#8212; they are the backpressure.</p><p><strong>When the human intervenes, they touch:</strong> every tool call and code change. <strong>Backpressure:</strong> none automated. <strong>You know you&#8217;re here when:</strong> nothing happens while the operator is away.</p><div><hr></div><h4><strong>Order 1 &#8212; The Ralph Loop </strong><em><strong>(HOTL: human on the work loop)</strong></em></h4><p>The agent executes autonomously in a continuous loop. The human is on that loop &#8212; watching output between iterations, ready to intervene. When they do, it takes the form of prompt tuning: adding constraints, resetting the workspace, adjusting the specification. Each time the agent does something bad, the prompt gets tuned &#8212; like a guitar.</p><p>This is Geoffrey Huntley&#8217;s Ralph Wiggum technique: <code>while :; do cat PROMPT.md | claude-code; done</code>. The prompt is now object. The human is shaping the environment the loop acts within, not acting inside it.</p><p>The human&#8217;s OODA at this order: observe Ralph&#8217;s output, orient to failure patterns, decide on prompt adjustments, act on PROMPT.md.</p><p><strong>When the human intervenes, they tune:</strong> prompt constraints, specifications, workspace state. <strong>Backpressure:</strong> build system, type checker, test suite, linter &#8212; anything that catches failures before the human sees them. <strong>You know you&#8217;re here when:</strong> the operator&#8217;s primary activity is editing the prompt. Work continues while the operator is away.</p><div><hr></div><h4><strong>Order 2 &#8212; Agent Tuning </strong><em><strong>(HOTL: human on the Ralph-watching loop)</strong></em></h4><p>An agent takes over the Order 1 OODA cycle. It watches the Ralph run &#8212; session logs, plan drift, rework patterns &#8212; and proposes the tuning a human would perform at Order 1. The human is no longer on the Ralph loop. 
They are on the loop that watches the agent watching Ralph.</p><p>The Ralph run has compressed into a node. What the human sees is a synthesis of it &#8212; failure signatures, proposed interventions. The loop&#8217;s internals are below their resolution.</p><p>The human&#8217;s OODA at this order: observe the tuning agent&#8217;s recommendations, orient to whether they are sound, decide to apply or reject, act.</p><p><strong>When the human intervenes, they act on:</strong> tuning recommendations from the tuning agent. <strong>Backpressure:</strong> measurable tuning impact &#8212; did rework rate drop? Did convergence improve? Without this, the human is gut-checking recommendations, which is Order 1 with extra steps. <strong>You know you&#8217;re here when:</strong> the operator receives &#8220;worker meandered for 3 loops on permissions &#8212; recommend adding a scoping constraint&#8221; and says &#8220;yes, apply that.&#8221;</p><div><hr></div><h4><strong>Order 3a &#8212; Factory </strong><em><strong>(HOTL: human on the fleet-tuning loop)</strong></em></h4><p>The tuning agent applies learnings across N parallel workers. A constraint erected from one worker&#8217;s failure propagates to all active workers. Fleet breadth has become object &#8212; one worker&#8217;s failure pattern was just what happened to that run. Across N workers it becomes a visible, addressable pattern.</p><p><strong>When the human intervenes, they act on:</strong> fleet-wide tuning changes. <strong>Backpressure:</strong> per-worker backpressure aggregated across the fleet &#8212; pass rates, failure distributions, convergence patterns. <strong>You know you&#8217;re here when:</strong> a prompt change from one worker&#8217;s failure immediately affects all workers.</p><div><hr></div><h4><strong>Order 3b &#8212; Self-Improving </strong><em><strong>(HOTL: human on the tuning-validation loop)</strong></em></h4><p>The tuning agent runs controlled experiments to validate its own changes. &#8220;This constraint reduced meandering in 8 of 10 test loops.&#8221; The validity of tuning decisions has become object &#8212; not just &#8220;here is a recommendation&#8221; but &#8220;here is evidence that recommendations work.&#8221;</p><p><strong>When the human intervenes, they act on:</strong> experimental evidence about tuning effectiveness. <strong>Backpressure:</strong> A/B results with statistical validity. This order IS backpressure on the tuning process itself. <strong>You know you&#8217;re here when:</strong> the system produces A/B results about its own tuning decisions.</p><div><hr></div><h4><strong>Order 3c &#8212; Closed-Loop </strong><em><strong>(HOTL: human on the outcome loop)</strong></em></h4><p>The tuning signal shifts from agent artifacts to business outcomes. Not &#8220;did the agent meander&#8221; but &#8220;did the deliverable pass validation&#8221; or &#8220;was the output accepted without rework.&#8221; The target of the whole stack has become object &#8212; the system is now measuring whether it is pointed at the right thing.</p><p><strong>When the human intervenes, they act on:</strong> which outcome metrics the system tunes against. <strong>Backpressure:</strong> business outcome measurement &#8212; deployment success, acceptance rates, defect rates. The most powerful backpressure because it is the least gameable. 
<strong>You know you&#8217;re here when:</strong> the system references metrics from outside its own loop.</p><div><hr></div><p><em>Orders 3a, 3b, and 3c are three independent subject-to-object transitions, not steps in a sequence. Each makes a different aspect of Order 2 practice examinable: breadth (can patterns be seen across many workers?), rigor (can tuning decisions be validated?), and ground truth (is the system measuring the right thing?). Each delivers value independently. All three must be compressed nodes before Order 4 is safe &#8212; decomposing objectives into autonomous work requires confidence in all three simultaneously. Any one missing and the decomposition is being approved on faith, not evidence.</em></p><div><hr></div><h4><strong>Order 4 &#8212; Directed Factory </strong><em><strong>(HOTL: human on the objective-decomposition loop)</strong></em></h4><p>The system receives objectives, not specifications. An objective becomes a specification, a materials manifest, worker configurations, and a validation suite &#8212; all system-generated. The factory is directed by outcomes rather than instructions. Requires 3a, 3b, and 3c as compressed nodes.</p><p>The human&#8217;s OODA at this order: observe the system&#8217;s decomposition of an objective, orient to whether it is sound, decide to approve or redirect.</p><p><strong>When the human intervenes, they act on:</strong> how the system decomposed an objective into executable work. <strong>Backpressure:</strong> worker success and failure as transitive validation. If workers succeed against their own backpressure, the decomposition was sound. <strong>You know you&#8217;re here when:</strong> the operator provides a one-sentence objective and reviews a generated specification, materials list, and validation suite they did not author.</p><div><hr></div><h4><strong>Order 5 &#8212; KPI Discovery </strong><em><strong>(HOTL: human on the metric-proposal loop)</strong></em></h4><p>The system notices patterns in its own outcome data the operator hasn&#8217;t asked about. &#8220;Deliverables with explicit interface descriptions have 40% fewer review comments. Recommend tracking description completeness.&#8221;</p><p><strong>When the human intervenes, they act on:</strong> what gets measured. <strong>Backpressure:</strong> evidence that proposed metrics correlate with outcomes the operator already cares about. <strong>You know you&#8217;re here when:</strong> a metric appears that the operator didn&#8217;t define, with evidence for why it matters.</p><div><hr></div><h4><strong>Order 6 &#8212; Portfolio Optimization </strong><em><strong>(HOTL: human on the resource-allocation loop)</strong></em></h4><p>The system manages tradeoffs across multiple objectives. &#8220;Task type A converges in 3 loops, type B in 12. Reallocating accordingly.&#8221;</p><p><strong>When the human intervenes, they act on:</strong> how resources distribute across competing priorities. <strong>Backpressure:</strong> ROI measurement per workload &#8212; cost, throughput, and quality per unit of resource. <strong>You know you&#8217;re here when:</strong> a resource allocation table exists that the system authored and executes against.</p><div><hr></div><h4><strong>Order 7 &#8212; Strategy Generation </strong><em><strong>(HOTL: human on the roadmap loop)</strong></em></h4><p>The system proposes changes to what the factory produces. &#8220;Approach X produces fewer defects than Y for this problem class. 
Recommend shifting production.&#8221;</p><p><strong>When the human intervenes, they act on:</strong> what the system should build, not how it builds. <strong>Backpressure:</strong> historical pattern evidence across multiple runs with measurable confidence. <strong>You know you&#8217;re here when:</strong> the system&#8217;s output includes recommendations that would change the roadmap.</p><div><hr></div><h4><strong>Order 8a &#8212; Market Response </strong><em><strong>(HOTL: human on the environmental-model loop)</strong></em></h4><p>The system detects external changes &#8212; new platform capabilities, regulatory shifts, competitive moves &#8212; and proposes reorientation.</p><p><strong>When the human intervenes, they act on:</strong> the system&#8217;s interpretation of external signals and proposed response. <strong>Backpressure:</strong> ground truth validation &#8212; did the detected change actually occur? Did the proposed response address it? <strong>You know you&#8217;re here when:</strong> the system flags an environmental change the operator didn&#8217;t know about, with a proposed adaptation.</p><div><hr></div><h4><strong>Order 8b &#8212; Self-Funding </strong><em><strong>(HOTL: human on the budget loop)</strong></em></h4><p>The system proposes resource allocation against measured returns, scaling up positive-ROI work and shutting down negative-ROI work.</p><p><strong>When the human intervenes, they act on:</strong> spending rationale within a fixed envelope. <strong>Backpressure:</strong> financial outcome tracking &#8212; did the spend produce the projected return? <strong>You know you&#8217;re here when:</strong> resource consumption varies period to period and every variance comes with ROI justification.</p><div><hr></div><h4><strong>Order 9 &#8212; Goal Selection </strong><em><strong>(HOTL: human on the value-function loop)</strong></em></h4><p>The system runs parallel instances under different optimization targets and presents comparative results against an operator-provided axiom.</p><p><strong>When the human intervenes, they act on:</strong> which definition of success to operate under. <strong>Backpressure:</strong> axiom-relative measurement &#8212; which value system produced better outcomes as judged by the operator&#8217;s stated axiom? <strong>You know you&#8217;re here when:</strong> the system presents results under competing objective functions and recommends one.</p><div><hr></div><h4><strong>Order 10 &#8212; Frame Selection </strong><em><strong>(HOTL: human on the frame loop)</strong></em></h4><p>The system evaluates which problem space to inhabit &#8212; including the null option of not acting. Frame selection applied to itself produces frame selection. The recursion reaches a fixed point.</p><p><strong>When the human intervenes, they act on:</strong> which game the system plays. <strong>Backpressure:</strong> none available. Frame selection cannot be validated without a meta-frame, which is another frame. The human either has conviction about which game to play or doesn&#8217;t. <strong>You know you&#8217;re here when:</strong> you can&#8217;t tell. Any observable behavior is indistinguishable from a lower order executing within a chosen frame. The decision to remain in the current frame looks identical to never having considered alternatives. The fixed point is silent.</p><div><hr></div><h3><strong>Backpressure as the Enabling Mechanism</strong></h3><p>Backpressure is not monitoring. 
It is what converts a loop that requires a human inside it into a loop that can fold into a HOTL node. Without it the human cannot step up. The architecture diagram may claim Order 4. The human&#8217;s attention is consumed at Order 1.</p><p><strong>Backpressure Speed</strong></p><p>Backpressure must be fast relative to the loop it serves. A type checker running in milliseconds is excellent backpressure for a code generation loop iterating in minutes. A production defect rate taking two weeks to stabilize is useless for the same loop &#8212; but appropriate for a monthly portfolio cycle. Mismatched tempo is the most common failure mode and the hardest to see: the loop iterates confidently while waiting for a signal that arrives too late to correct anything.</p><table><thead><tr><th>Order</th><th>Typical loop</th><th>Backpressure speed</th></tr></thead><tbody><tr><td>0</td><td>Seconds</td><td>Instant (human reaction time)</td></tr><tr><td>1</td><td>Minutes&#8211;hours</td><td>Seconds (build, test, lint)</td></tr><tr><td>2</td><td>Hours&#8211;days</td><td>Minutes (artifact analysis)</td></tr><tr><td>3a&#8211;3c</td><td>Hours&#8211;days</td><td>Minutes&#8211;hours (aggregation, A/B, outcomes)</td></tr><tr><td>4</td><td>Days&#8211;weeks</td><td>Hours (worker completion)</td></tr><tr><td>5&#8211;6</td><td>Weeks&#8211;months</td><td>Days (correlation, ROI)</td></tr><tr><td>7&#8211;9</td><td>Months&#8211;quarters</td><td>Weeks (strategic measurement)</td></tr><tr><td>10</td><td>Undefined</td><td>Undefined</td></tr></tbody></table><p><strong>Reversibility</strong></p><p>Backpressure works by letting the loop fail, detecting it, and correcting. This holds only when failure is survivable. Reversible actions &#8212; two-way doors &#8212; can be fully governed by automated backpressure. Fail, detect, correct. Irreversible ones can&#8217;t: the signal arrives after a door that no longer opens. The practical implication is routing, not human gatekeeping. Irreversible actions need pre-execution intervention. Reversible actions flow through the standard backpressure loop. The distinction is a design constraint, not a reason to keep humans inside the loop.</p><p><strong>The Backpressure Ceiling</strong></p><p>Building autonomous systems is not primarily a model problem or a tooling problem. It is a <strong>backpressure engineering problem.</strong> The ceiling is not how intelligent the agent is. It is how fast and reliably the loop can be told whether the work is good.</p><p>An organization does not need top-level signal to have a functioning system. A single workflow with solid backpressure and a tuning agent compounds within its own ceiling regardless of whether it connects to anything above. The risk of missing higher-order signal is not that local loops fail to work. It is that they work efficiently toward something subtly misaligned with what you ultimately care about. Build local loops first. Wire them to the highest available real signal. Extend upward as the system matures.</p><div><hr></div><h3><strong>Reflections</strong></h3><p><strong>On the naming.</strong> The original version of this framework was called HITL Orders. That name embeds the wrong model. HITL &#8212; Human In The Loop &#8212; means the loop requires the human as a step. That is only true at Order 0. Every order above it is a fully autonomous loop the human is on. Intervention is not a structural gate. It is an act from the HOTL stance. The loop does not wait for it. These are HOTL Orders.</p><p><strong>On the structure.</strong> Each order is an OODA loop whose object is the previous order&#8217;s loop. The previous loop has compressed &#8212; its internal complexity is below the current order&#8217;s resolution. The Order 2 human is not managing a Ralph loop. They are watching a synthesis of one. 
This is why higher orders are tractable: the human is never holding accumulated complexity. They are holding a node that contains it.</p><p><strong>On the human.</strong> At each transition the human exits a loop by handing its OODA function to an agent, then steps up to watch that agent. They are always on a loop. Never on the same loop twice. At Order 0 this is obvious. At Order 10 it is nominal &#8212; the approval surface is so abstract that meaningful oversight becomes an open question. Every organization must decide which order represents its ceiling of responsible operation. The framework does not answer that question. It makes the question legible.</p><p><strong>On the branches.</strong> Orders 3a, 3b, and 3c represent three independent subject-to-object transitions on three different aspects of Order 2 practice: what is visible (fleet breadth), what is validated (tuning rigor), and what is true (outcome ground truth). Each can be pursued independently and delivers value on its own. The convergence at Order 4 is not architectural preference &#8212; it reflects the minimum: all three must already be folded before objective decomposition can be safely approved.</p><p><strong>On practical application.</strong> Most engineering organizations operate between Orders 0 and 1. The compound value begins at Orders 2 and 3, where tuning becomes systematic rather than artisanal. Order 4 &#8212; where the system decomposes objectives into work &#8212; is where autonomous systems begin to feel like infrastructure rather than tools. Everything beyond Order 6 is, for now, a research direction rather than an engineering practice.</p><p>The first step is not to build a higher-order system. The first step is to audit your backpressure. Whatever order you aspire to: what automated feedback exists at that order? If the answer is &#8220;the human checks,&#8221; you have found your actual ceiling. Build the backpressure. The loop folds. The order follows.</p><div><hr></div><p><em>References</em></p><ol><li><p>Geoffrey Huntley, &#8220;Ralph Wiggum as a software engineer&#8221;</p></li><li><p>Moss, &#8220;Don&#8217;t waste your back pressure&#8221;</p></li><li><p>Robert Kegan, <em>The Evolving Self</em></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Re: The AI Layoff Trap]]></title><description><![CDATA[Re: The AI Layoff Trap]]></description><link>https://blog.bios.dev/p/re-the-ai-layoff-trap</link><guid isPermaLink="false">https://blog.bios.dev/p/re-the-ai-layoff-trap</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Mon, 13 Apr 2026 13:55:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1></h1><p>Re: <a href="https://arxiv.org/html/2603.20617v1#S6">The AI Layoff Trap</a></p><p>I had a long chat w/ Opus and had it distill my thinking into the below&#8230;</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><strong>What the Paper Claims</strong></h2><p>Competitive firms over-automate because each bears only 1/N of the demand destruction it causes. Structural externality. Only a Pigouvian automation tax corrects it. UBI, capital taxes, bargaining, profit-sharing all fail.</p><h2><strong>Why the Premise is Wrong</strong></h2><h3><strong>Withdrawing a transaction is not harm</strong></h3><p>The paper calls reduced consumer spending a negative externality. Choosing not to employ someone is not aggression. No one holds a claim on another&#8217;s purchasing decisions. The &#8220;externality&#8221; framing smuggles in entitlement to existing spending patterns. The entire Pigouvian apparatus is built on a fiction.</p><h3><strong>The model is nominal and blind to deflation</strong></h3><p>The demand equation tracks dollar flows. AI drives marginal production costs toward zero. Real purchasing power rises even as nominal wages collapse. The paper literally cannot represent a world where goods approach free. This is the central dynamic AI unleashes and the model cannot see it.</p><h3><strong>Static-economy fallacy</strong></h3><p>The paper extrapolates current trends on a fixed pie. Every historical analog&#8212;Malthus, Ehrlich, Club of Rome&#8212;made the same error. Markets are discovery processes. Haber-Bosch solved the nitrogen crisis. AI-era equivalents emerge through price signals if unobstructed. The paper belongs in that lineage: rigorous math, wrong universe.</p><h2><strong>Why the Solution Makes It Worse</strong></h2><h3><strong>Tax incidence obstructs the cure</strong></h3><p>Rothbardian tax incidence: taxes don&#8217;t pass through to consumers, they price out marginal suppliers, contract supply, raise prices. A Pigouvian automation tax slows beneficial automation alongside any harmful automation. It directly obstructs the deflationary process that resolves the problem it claims to solve.</p><h2><strong>The Solopreneur Counter (Partial, Not Primary)</strong></h2><p>If firms don&#8217;t need workers, workers don&#8217;t need firms. AI enables micro-enterprise at near-zero capital cost. But this path is limited by agency&#8212;the willingness to self-direct&#8212;not cognition. And if AI is powerful enough to substitute for agency itself, megacorps deploy that same capability at scale. The solopreneur explosion requires a narrow capability sweet spot. Real but not the main rebuttal.</p><h2><strong>The Actual Risk is Political, Not Economic</strong></h2><h3><strong>The economics self-resolve</strong></h3><p>Tech deflation drives prices asymptotically toward zero. Supply and demand manage consumption. The demand externality is temporary and nominal. Given time, markets produce abundance.</p><h3><strong>The politics may not allow it</strong></h3><p>Democratic polities respond to visible pain with visible action. If displacement is fast enough, the electorate demands intervention far worse than a Pigouvian tax&#8212;nationalization, price controls, trade barriers. 
Being correct about the economics and being dead are not mutually exclusive.</p><h3><strong>But incorrect politics does not change correct economics</strong></h3><p>If the political system destroys the mechanism that would have produced abundance, the political system was wrong about reality. Truth does not bend to votes. The physics of the bridge does not change because the mob burns it.</p><h2><strong>The Actual Answer: Network State</strong></h2><p>The paper asks: how do we manage automation within existing political structures? Wrong question. The right question is: how do we build structures where tech deflation operates freely?</p><p>The network state (Srinivasan) is the exit strategy. Don&#8217;t argue within a political economy that will panic and intervene destructively. Build parallel structures that demonstrate abundance. Exit over voice. Prove rather than persuade.</p><p>This maps to a scale-dependent governance stack:</p><ul><li><p><strong>Federal/minarchist:</strong> defense and property rights only</p></li><li><p><strong>State/socialist:</strong> shared infrastructure where network effects apply</p></li><li><p><strong>Local/communist:</strong> high-trust small communities sharing freely</p></li></ul><p>The paper assumes a single political economy with one rule set. The network state breaks that assumption entirely.</p><h2><strong>Bottom Line</strong></h2><p>The paper diagnoses a nominal, temporary, self-resolving dynamic as a permanent structural failure, then proposes a tax that obstructs the resolution. The real risk is not economic but political: the electorate destroying the engine of abundance before it delivers. The answer is not better policy within existing systems. It is building parallel systems where correct economics can operate without permission.</p>]]></content:encoded></item><item><title><![CDATA[Casting Spells on Transcripts]]></title><description><![CDATA[Disclaimer: This article was generated copy-paste by GPT 5.2 but as a distillation of a VERY lengthy debate with it about thinking through what I wanted.]]></description><link>https://blog.bios.dev/p/casting-spells-on-transcripts</link><guid isPermaLink="false">https://blog.bios.dev/p/casting-spells-on-transcripts</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Mon, 12 Jan 2026 01:34:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: This article was generated copy-paste by GPT 5.2 but as a distillation of a VERY lengthy debate with it about thinking through what I wanted. TBH I think AI outputs are fine as long as they are a transformation of significant human input. 
Maybe worth an article on its own&#8230; anyway onto the article&#8230;</em></p><p>Some days it&#8217;s &#8220;I need a better read-it-later.&#8221; Other days it&#8217;s &#8220;I need a bookmarking system.&#8221; Or &#8220;I should get serious about RSS again.&#8221; Or &#8220;maybe I just need a better notes app.&#8221; And if you squint hard enough, you can always convince yourself the answer is one more tool.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>But the thing I actually want isn&#8217;t another place to put links.</p><p>What I want is a way to take a link &#8212; especially a long YouTube video &#8212; and turn it into <strong>useful, copyable text</strong> that I can operate on with all the self-hosted AI stuff I already run. I&#8217;m not trying to &#8220;save the internet.&#8221; I&#8217;m trying to <strong>compile it</strong>.</p><p>And I&#8217;m demanding about it in a way that makes most &#8220;product&#8221; answers feel wrong.</p><p>Because the moment the solution drifts into &#8220;store everything forever,&#8221; I can already see the future. I&#8217;ll be running a media archive. I&#8217;ll be dealing with retention policy and backups and disk growth and broken downloads and metadata drift. I&#8217;ll have invented my own part-time job and called it productivity.</p><p>I don&#8217;t want a second brain. I want a wand.</p><h2>The shape of the itch</h2><p>The mobile part matters more than people admit.</p><p>Most of my &#8220;this might be important&#8221; moments happen on my phone: I&#8217;m in YouTube, or a browser, or a random app that opens a link, and I want to send that thing <em>somewhere</em> without breaking my flow.</p><p>&#8220;Somewhere&#8221; is usually a queue. But I don&#8217;t want the queue to be the end state. I want the queue to be the front door to transformation.</p><p>The outcome I care about is embarrassingly simple:</p><ul><li><p>full transcript, copyable</p></li><li><p>a decent TL;DR I can paste</p></li><li><p>maybe a few derived views (tasks, claims, glossary) depending on mood</p></li><li><p>optionally: embed it so later I can query it like a corpus</p></li></ul><p>If I can do <em>that</em> from my phone, then my homelab GPU box stops being a science project and starts being a daily tool.</p><h2>Two traps I&#8217;m trying not to walk into</h2><p>The first trap is archive gravity.</p><p>It starts innocently: &#8220;I&#8217;ll just download the audio for Whisper.&#8221; Then you realize the tool can also download the video. And maybe you keep it. And maybe you keep the mp3 too. And you add thumbnails. And metadata. And now you&#8217;re building a library, not a pipeline.</p><p>I don&#8217;t want to preserve the original content. 
I want to preserve the <strong>work product</strong>: the transcript and whatever I derived from it.</p><p>The second trap is bespoke pipeline rot.</p><p>The more you build a fully custom ingestion system, the more you&#8217;re signing up to maintain it. If you&#8217;ve done enough infra you know the exact shape of the decay:</p><ul><li><p>job orchestration</p></li><li><p>retries and idempotence</p></li><li><p>concurrency and backpressure</p></li><li><p>secret management</p></li><li><p>monitoring, failure modes, &#8220;why did this run twice&#8221;</p></li><li><p>schema changes, tool upgrades, scraped-site breakage</p></li></ul><p>The first time you wire up <code>yt-dlp &#8594; mp3 &#8594; whisper &#8594; summary &#8594; embeddings</code>, you feel like you&#8217;re printing money.</p><p>The tenth time you&#8217;re debugging it from your phone while standing in line somewhere, it becomes obvious you built a thing that <em>requires</em> attention. And attention is the one resource this whole project is supposed to save.</p><p>So I keep looking for a design that stays small, stays legible, and doesn&#8217;t turn into an obsession.</p><h2>The idea that finally clicked: spells</h2><p>This is the part where it stopped feeling like &#8220;workflow automation&#8221; and started feeling like something I&#8217;d actually use.</p><p>I don&#8217;t want a system that tries to guess what I want and runs a bunch of stuff automatically.</p><p>I want <strong>buttons</strong>.</p><p>I want to be able to say: <em>do this to that</em>.</p><ul><li><p>transcribe</p></li><li><p>summarize</p></li><li><p>extract tasks</p></li><li><p>ingest into RAG (maybe into different &#8220;collections&#8221;)</p></li><li><p>generate a few views I can paste elsewhere</p></li></ul><p>Calling those &#8220;spells&#8221; is corny in the best way, because it forces the right constraints. A spell is explicit. It&#8217;s intentional. It&#8217;s something you ask for, not something that happens to you.</p><p>And once you accept that, the follow-on idea becomes obvious:</p><p>Spells should be composable.</p><p>Not in a &#8220;draw a spaghetti graph in a GUI&#8221; way. In a dependency sense.</p><p>If I press &#8220;RAG ingest,&#8221; the system shouldn&#8217;t shrug and fail because there&#8217;s no transcript yet. It should know that ingesting requires chunking, and chunking requires a transcript (or at least readable text), and it should build what it needs.</p><p>That&#8217;s not &#8220;AI automation.&#8221; That&#8217;s dependency resolution. That&#8217;s a build system.</p><p>Which leads to the actual primitive I&#8217;ve been missing in all these tools:</p><h2>Artifacts, not fields</h2><p>Most bookmarking / read-later tools give you one place to put &#8220;notes,&#8221; and maybe one summary field if they&#8217;re feeling fancy.</p><p>That&#8217;s not enough. 
It&#8217;s the wrong shape.</p><p>I want a bookmark to accumulate <strong>named outputs</strong>:</p><ul><li><p><code>transcript.txt</code> &#8212; the thing I actually want to copy</p></li><li><p><code>transcript.vtt</code> &#8212; timestamps, because timestamps are power</p></li><li><p><code>tldr.md</code> &#8212; a summary that&#8217;s clearly a derived artifact, not &#8220;the truth&#8221;</p></li><li><p><code>tasks.md</code> &#8212; if I&#8217;m in that mood</p></li><li><p><code>index.md</code> &#8212; a tiny table-of-contents so the bookmark becomes a shelf, not a blob</p></li></ul><p>If the system can attach multiple assets per bookmark, suddenly everything else falls into place:</p><ul><li><p>the bookmark is the anchor (URL + metadata)</p></li><li><p>assets are the outputs (the only thing I care about keeping)</p></li><li><p>the pipeline produces assets</p></li><li><p>downstream spells consume assets</p></li></ul><p>And because assets are explicit, the &#8220;skip if already exists&#8221; behavior becomes clean. The pipeline doesn&#8217;t need a mystical global state. It can look at what&#8217;s attached, decide what&#8217;s missing, and proceed.</p><p>This is the moment I realized I wasn&#8217;t searching for a &#8220;better summary field.&#8221; I was searching for a place to put build outputs.</p><h2>Thin inbox, thick outputs</h2><p>Once I had the artifact idea, my earlier discomfort with &#8220;self-hosted everything&#8221; started to sharpen into a requirement.</p><p>The inbox has to stay thin.</p><p>I don&#8217;t want my bookmarking tool to become my archive. I don&#8217;t want it to silently store raw MP4s and MP3s forever. I don&#8217;t want &#8220;convenient defaults&#8221; that turn into a storage tax.</p><p>But I <em>do</em> want my derived outputs to persist. That&#8217;s the point. Those are small, stable, and actually useful later.</p><p>That&#8217;s why object storage suddenly becomes part of the story. If the system can attach assets and store them in something MinIO/S3-shaped, then the inbox stays light and the outputs scale without me thinking about disk layout.</p><p>The raw media can be ephemeral. Download, process, delete. The transcript is the durable artifact.</p><p>That feels right.</p><h2>The mobile UX problem: I refuse to type tags</h2><p>There&#8217;s a version of this where &#8220;spells&#8221; are invoked by typing a tag like <code>!transcribe</code>.</p><p>That works. It&#8217;s also not how I want to live.</p><p>If the whole thing is meant to reduce friction, then typing little incantations on my phone is a self-own. It turns &#8220;quick share&#8221; into &#8220;mini admin task.&#8221;</p><p>So the UX I actually want is stupid-simple:</p><p>Share &#8594; a tiny &#8220;Spells&#8221; target opens &#8594; big buttons:</p><ul><li><p>Transcript + TL;DR</p></li><li><p>Transcript only</p></li><li><p>RAG &#8594; Home</p></li><li><p>Tasks</p></li></ul><p>Tap one. Done. Queue it. Go back to my life.</p><p>This is where a share-target PWA (or some minimal intermediate share receiver) becomes interesting. 
Not because I want yet another UI, but because I want the UI to be <em>exactly</em> the size of my intent.</p><p>A tiny front door that lets me pick a spell bundle without typing anything.</p><p>Everything else can happen on the server.</p><h2>Where orchestration fits (and where it doesn&#8217;t)</h2><p>At this point the temptation is to start building &#8220;the perfect runner.&#8221;</p><p>I&#8217;m trying to avoid that.</p><p>The runner&#8217;s job is boring:</p><ul><li><p>accept an intent (&#8220;cast this bundle on that URL&#8221;)</p></li><li><p>resolve dependencies based on existing artifacts</p></li><li><p>run the necessary steps</p></li><li><p>write assets back</p></li></ul><p>That&#8217;s it.</p><p>I don&#8217;t need a cathedral. I need something restart-safe, idempotent, and not cute.</p><p>This is where Windmill entered the conversation for me: not because I want a workflow GUI, but because it already has the primitives that bespoke pipelines tend to re-implement poorly: jobs, retries, and a notion of a flow. If I can treat each spell as a reusable primitive and use flows as bundles, that&#8217;s attractive. It&#8217;s not magical; it&#8217;s just offloading plumbing.</p><p>But I&#8217;m intentionally keeping the design portable in my head. If I decide Windmill isn&#8217;t the right fit later, the mental model still stands: artifacts, dependencies, manual-first spells.</p><h2>What I&#8217;ve actually decided (so far)</h2><p>I haven&#8217;t implemented the stack. This is me admitting I&#8217;ve been thinking the issue to death and finally got to something that feels like a stable shape.</p><p>The shape looks like:</p><ul><li><p>an inbox that can hold links and attach assets</p></li><li><p>thin by default (don&#8217;t hoard raw media)</p></li><li><p>assets stored in object storage</p></li><li><p>spells are manual-first and invoked from the phone</p></li><li><p>bundles are just &#8220;targets&#8221; that pull prerequisites</p></li><li><p>artifacts are the source of truth; &#8220;summary fields&#8221; are just projections if I want them</p></li></ul><p>That&#8217;s the whole thing.</p><p>It&#8217;s not a tutorial, because I don&#8217;t have the scar tissue yet. It&#8217;s a distillation of what I&#8217;ve hashed out &#8212; including with ChatGPT &#8212; because the framing shift (from &#8220;bookmarks&#8221; to &#8220;build outputs&#8221;) is what I think is worth sharing.</p><p>I can tell when I&#8217;m onto something because it reduces the design, not expands it.</p><p>The moment I stopped trying to build a better place to <em>save</em> links and started trying to build a better way to <em>produce artifacts from them</em>, the rest of the decisions became almost boring.</p><p>And boring is exactly what I want from infrastructure that&#8217;s supposed to be used casually, from a phone, without turning me into a caretaker.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Aranet4 Time Series Export (Part 1): Scan + Pair]]></title><description><![CDATA[Pair and read an Aranet4 on Linux]]></description><link>https://blog.bios.dev/p/aranet4-time-series-export-part-1</link><guid isPermaLink="false">https://blog.bios.dev/p/aranet4-time-series-export-part-1</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Thu, 01 Jan 2026 02:35:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Pair and read an Aranet4 on Linux</h3><h4>Unblock Bluetooth (rfkill)</h4><pre><code><code>rfkill list
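# if the first listing shows "Soft blocked: yes", clear it (a hard block is a physical switch rfkill cannot clear):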
sudo rfkill unblock bluetooth
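# list again to confirm the soft block is gone: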
rfkill list</code></code></pre><p>Proceed only when Bluetooth shows <code>Soft blocked: no</code> and <code>Hard blocked: no</code>.</p><div><hr></div><h4>1) Create a dedicated <code>uv</code> project for Aranet4 tools</h4><p>Use a clean folder so the dependency and lockfile stay isolated.</p><pre><code><code>mkdir -p ~/dev/aranet4
cd ~/dev/aranet4
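# pinning the interpreter keeps the lockfile reproducible; 3.11 is an assumption that worked here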
uv init --python 3.11
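# the aranet4 package provides both the Python client library and the aranetctl CLI used below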
uv add aranet4</code></code></pre><div><hr></div><h4>2) Scan with <code>aranetctl</code> (first attempt)</h4><p>Run the scan from your <code>uv</code> environment:</p><pre><code><code>uv run aranetctl --scan</code></code></pre><p>If the scan finds your device, copy the MAC and read it:</p><pre><code><code>uv run aranetctl AA:BB:CC:DD:EE:FF</code></code></pre><div><hr></div><h4>3) If <code>aranetctl --scan</code> fails, pair/trust with <code>bluetoothctl</code>, then retry <code>aranetctl</code></h4><ol><li><p>Start <code>bluetoothctl</code>:</p></li></ol><pre><code><code>bluetoothctl</code></code></pre><ol start="2"><li><p>Enable the adapter and pairing agent:</p></li></ol><pre><code><code>power on</code></code></pre><ol start="3"><li><p>Scan to discover the Aranet4 MAC:</p></li></ol><pre><code><code>scan on</code></code></pre><ol start="4"><li><p>Pair with the device (use the MAC you see in scan output):</p></li></ol><pre><code><code>pair AA:BB:CC:DD:EE:FF</code></code></pre><ol start="5"><li><p>When prompted, read the pairing code shown on the Aranet4 screen and enter it in <code>bluetoothctl</code>.</p></li><li><p>Trust the device:</p></li></ol><pre><code><code>trust AA:BB:CC:DD:EE:FF</code></code></pre><ol start="7"><li><p>Exit:</p></li></ol><pre><code><code>scan off
quit</code></code></pre><ol start="8"><li><p>Retry from the <code>~/dev/aranet4</code> folder:</p></li></ol><pre><code><code>cd ~/dev/aranet4
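# with the device paired and trusted, the scan should now find it: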
uv run aranetctl --scan
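# substitute the MAC address reported by the scan output: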
uv run aranetctl AA:BB:CC:DD:EE:FF</code></code></pre><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Backing up your Kindle books in 2024]]></title><description><![CDATA[Backing up your Kindle books is still quite possible as of February 2024.]]></description><link>https://blog.bios.dev/p/backing-up-your-kindle-books-in-2024</link><guid isPermaLink="false">https://blog.bios.dev/p/backing-up-your-kindle-books-in-2024</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Sun, 25 Feb 2024 18:19:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Backing up your Kindle books is still quite possible as of February 2024. I spent like 3h digging through all the guides online and am documenting my final result. I do not condone piracy in any way and my use case for backing up to EPUB is 1) to keep my purchased content available in case Kindle is ever down and 2) I want to index the book contents for a RAG system so I can go back and ask questions about the book contents for personal recall.</p><h2>Converting your Kindle book to EPUB</h2><ol><li><p>Download <a href="https://www.amazon.com/gp/help/customer/display.html?nodeId=GZSM7D8A85WKPYYD">Kindle for  PC</a> (I used v2.3)</p><ol><li><p>Download the purchased books you want to back up</p></li><li><p>Find your content folder</p><ol><li><p>Default: C:\Users\YOU\Documents\My Kindle Content</p></li><li><p>Find in Kindle app:  Tools  &gt; Options&#8230; &gt; Content</p></li></ol></li></ol></li><li><p>Download latest <a href="https://calibre-ebook.com/download_windows">Calibre</a> (I used Windows v7.5.1)</p></li><li><p>Download the <a href="https://github.com/noDRM/DeDRM_tools/releases">noDRM</a> plugin (I used v10.0.9)</p></li><li><p>Download the latest <a href="https://plugins.calibre-ebook.com/291290.zip">KFX-Input</a> plugin from the <a href="https://plugins.calibre-ebook.com/">Calibre Plugin List</a> (I used v2.8.1)</p></li><li><p>Open Calibre</p><ol><li><p>Open Preferences &gt; Plugins. 
Click Load plugin from file.</p><ol><li><p>noDRM release is a .zip, unzip it to see <strong>DeDRM_plugin.zip</strong> inside</p></li><li><p>The KFX-Input download is the right .zip itself</p></li></ol></li></ol></li><li><p>Restart Calibre</p></li><li><p>Now  you should be able to visit your <strong>My Kindle Content</strong> folder found in step 1.b and drag the .azw file into Calibre to import it.</p></li><li><p>Finally, export to .epub without DRM:</p><ol><li><p>Right click the book in Calibre, then Convert books &gt; Convert individually</p></li><li><p>In the convert window, change Output format to EPUB (or whatever)</p></li><li><p>Click OK</p></li></ol></li><li><p>Selecting the  book in Calibre will now show two hyperlinks: KFX and EPUB</p></li><li><p>Right click  EPUB &gt; Copy &gt; Path to File</p></li></ol><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Boot Ubuntu UEFI from Grub console]]></title><description><![CDATA[I have Kubuntu installed on a USB SSD and tried to boot it today, but was dumped into a Grub console.]]></description><link>https://blog.bios.dev/p/boot-ubuntu-uefi-from-grub-console</link><guid isPermaLink="false">https://blog.bios.dev/p/boot-ubuntu-uefi-from-grub-console</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Fri, 12 Jan 2024 14:52:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have Kubuntu installed on a USB SSD and tried to boot it today, but was dumped into a Grub console. It looked like this:</p><pre><code><code>grub&gt;</code></code></pre><p>To get back into my installation, I ran the following:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><pre><code><code>grub&gt; ls</code></code></pre><p>This showed me that I have a few devices. 
I noticed <code>(hd0,gpt1)</code> and <code>(hd0,gpt2)</code>, which are probably my /boot and / partitions. I was able to check with <code>ls</code>:</p><pre><code><code>grub&gt; ls (hd0,gpt1)/</code></code></pre><p>Yep, that&#8217;s the boot partition, and <code>ls</code> on the other showed me all the stuff I&#8217;d expect in <code>/</code>.</p><p>So next I found the <code>grub.cfg</code> on the <code>/boot</code> partition; I just used <code>ls</code> to look around until I found it at:</p><pre><code><code>grub&gt; ls (hd0,gpt1)/efi/ubuntu/grub.cfg</code></code></pre><p>Now that I had that file path, booting was simple (thank you ChatGPT!):</p><pre><code><code>grub&gt; set prefix=(hd0,gpt1)/efi/ubuntu
grub&gt; insmod normal
grub&gt; configfile (hd0,gpt1)/efi/ubuntu/grub.cfg</code></code></pre><p>This got me booted back into Kubuntu! I ran <code>update-grub</code> once I was back up, but haven&#8217;t tested if that fixed it permanently.</p>
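<p>For a more permanent fix, the usual follow-up (a sketch of what I would try next, not something I have verified on this machine) is to reinstall GRUB to the EFI partition from the booted system and regenerate its config:</p><pre><code># reinstall GRUB to the EFI system partition (assuming it is mounted at /boot/efi)
sudo grub-install --efi-directory=/boot/efi

# regenerate /boot/grub/grub.cfg
sudo update-grub</code></pre>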
]]></content:encoded></item><item><title><![CDATA[Using Self Hosted LLM from your Smartphone with acai.so and Let's Encrypt]]></title><description><![CDATA[This guide picks up where the last one left off.]]></description><link>https://blog.bios.dev/p/using-self-hosted-llm-from-your-smartphone</link><guid isPermaLink="false">https://blog.bios.dev/p/using-self-hosted-llm-from-your-smartphone</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Tue, 17 Oct 2023 21:49:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This guide picks up where the last one left off. It assumes you have a locally hosted OpenAI endpoint accessible on the network.</p><p>Previous guide: <a href="https://biosdev.substack.com/p/self-hosting-gpu-accelerated-mistralorca">Self Hosting OpenAI Chat Endpoint with GPU-accelerated MistralOrca 7B 8K (GGUF) and Llama CPP Python Server</a></p><p>Let&#8217;s say your endpoint is accessible on the network at <strong>http://192.168.1.123:8180/v1</strong></p><p>To make practical use of this endpoint from my smartphone, I am going to use <a href="https://acai.so">acai.so</a>. acai.so is an AI-first in-browser chat experience that keeps all your chat and note data locally in your browser. Total privacy! However, by default you really need to use OpenAI for GPT completions&#8230; Not so private! Let&#8217;s secure that self-hosted endpoint with a free SSL certificate and connect from acai.so.</p><h2>Step 1: Prepare Nginx and Certbot</h2><p>(TODO: link to a github with all this code in it)</p><p>The last guide offered examples for both Linux and Windows. In my implementation, I have a dedicated 24/7 Linux machine I use for all kinds of nonsense like Plex and PiHole DNS, etc. I will assume you have an Ubuntu machine running on your local network with Docker installed. This can be the same machine hosting your endpoint; in my case, my Windows gaming computer hosts the endpoint and my Ubuntu machine hosts the HTTPS proxy.</p><p>On with the code&#8230;</p><p>Create a <code>docker-compose.yaml</code>:</p><pre><code>version: '3.8'

services:
  nginx:
    image: nginx:latest
    volumes:
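      # nginx-conf.d holds the vhost that cert-gen.sh generates below;
      # letsencrypt is shared with the certbot container so nginx can read the issued certs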
      - ./nginx-conf.d:/etc/nginx/conf.d
      - ./letsencrypt:/etc/letsencrypt
    ports:
      # 44301 avoids conflict with my k3s HTTPS, use "443:443"
      # if you prefer but "44301:443" will work for you too
      - "44301:443"</code></pre><p>This creates Nginx container we will keep up to proxy the SSL traffic to your endpoint.</p><p>Next, create a <code>docker-compose-certbot.yaml</code>:</p><pre><code>version: '3.8'

services:
  certbotdnsmanual:
    image: certbot/certbot
    volumes:
      - ./letsencrypt:/etc/letsencrypt
    environment:
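      # pass CERT_DOMAIN from the host shell through to the certbot command below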
      - CERT_DOMAIN
    command: -d ${CERT_DOMAIN} --manual --preferred-challenges dns certonly</code></pre><p>We will use this to automate SSL certificate creation.</p><p>And finally, I made a little convenience script <code>cert-gen.sh</code>:</p><pre><code>set -e

test -n "$CERT_DOMAIN" || { echo "ERROR: CERT_DOMAIN not set in env"; exit 1; }

mkdir -vp nginx-conf.d

# Generate the certs
docker compose \
  -f docker-compose-certbot.yaml \
  run --rm -it \
  certbotdnsmanual

# Generate the VHOST
VHOST_CONF="vhost.conf" # TODO: multi domain??
cat &lt;&lt; EOF &gt; "nginx-conf.d/${VHOST_CONF}"
server {
    listen 443 ssl;
    server_name _;  # Replace with your desired hostname if multiple

    # "ssl on;" is deprecated and rejected by recent nginx; "listen 443 ssl" above covers it.
    # fullchain.pem includes the intermediate certificates so clients can verify the chain
    ssl_certificate /etc/letsencrypt/live/${CERT_DOMAIN}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${CERT_DOMAIN}/privkey.pem;

    location / {
        # Replace with the LAN IP and port of the machine hosting your endpoint
        proxy_pass http://192.168.1.123:8180;  
    }
}
EOF</code></pre><p>You will need to update the <code>proxy_pass</code> target in this script to point at the correct endpoint IP and port.</p><h2>Step 2: Generate SSL and Virtual Host</h2><p>At this point we are ready to generate an SSL certificate and virtual host. For this example to work, you will need to own a domain and control its DNS. I chose to use <strong>llama.bios.dev</strong>. In my examples, be sure to use your preferred subdomain.</p><p>With the above files in place, you should be able to run this command:</p><pre><code>CERT_DOMAIN=llama.bios.dev bash cert-gen.sh</code></pre><p>It will ask you some questions about owning the domain and opting into email contact; I leave you to answer those as you will. Finally it will prompt you to set a TXT record for the provided subdomain.</p><p>For example:</p><pre><code>...

Please deploy a DNS TXT record under the name:

_acme-challenge.llama.bios.dev.

with the following value:

q2gpz8c9Gq-XrJ2lcST29Id9nTq9JoCEIZfsbl1t4

...</code></pre><p>So you will want to create a TXT record for <strong>_acme-challenge.llama</strong> with value <strong>q2gpz8c9Gq-XrJ2lcST29Id9nTq9JoCEIZfsbl1t4</strong> in your DNS settings. Deeper instructions for accomplishing that step are outside the scope of this guide.</p>
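<p>Before continuing, it is worth confirming the record has actually propagated. Something like this (<code>dig</code> comes from the dnsutils package on Ubuntu) should print the challenge value back:</p><pre><code># query the TXT record certbot asked for
dig +short TXT _acme-challenge.llama.bios.dev</code></pre>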
<p>Once the record is in place, hit ENTER.</p><p>This only gives you a certificate for 90 days, but you can repeat the process when it is up for renewal. I will update this guide or write a fresh one later to explain a permanent auto-renewal solution. This script will also generate the necessary Nginx virtual host to be picked up by our Nginx container definition.</p><p>You must also set an A record in your DNS to point your subdomain to the private network address where your Nginx container will be running (in my case, the Ubuntu box rather than the gaming PC).</p><h2>Step 3: Start it up and plug into acai.so</h2><p>If the above worked, you now have a valid SSL certificate for your subdomain and the subdomain should be pointing to the machine that will run the Nginx container. Let&#8217;s start it!</p><pre><code>docker compose up -d</code></pre><p>That&#8217;s it: Nginx should be up, and if you visit <strong>https://llama.bios.dev:44301/docs</strong> it should route to the hosting machine and proxy the endpoint. Let&#8217;s plug it into acai.so!</p><p>Visit <a href="https://acai.so">https://acai.so</a> and under <strong>Settings</strong> &gt; <strong>Access Configuration</strong>, replace the <strong>OpenAI API Base URL</strong> with your new HTTPS endpoint (eg, <strong>https://llama.bios.dev:44301/v1</strong>) and put in anything for the <strong>OpenAI API Key</strong> (eg, <strong>lmaounlimitedtokens</strong>); it doesn&#8217;t matter what you choose for the key, it just can&#8217;t be blank.</p><p>Hit <strong>Submit</strong> to save the change and you should be able to chat with your local model from any device on the network!</p><h2>Conclusion</h2><p>This is a very raw methodology to deliver local model access via acai.so which could be improved in a few ways in the future:</p><ul><li><p>Link a GitHub repo with all that code I told you to paste</p></li><li><p>Organize the prerequisites sooner (eg &#8220;you need a domain&#8221; and &#8220;you need to know DNS&#8221;)</p></li><li><p>Update the Certbot process to keep the container running for fully automatic SSL renewal (right now you will need to re-run it every 90 days).</p></li></ul><p>The next guide I want to write in this series is about how to make that SSL endpoint accessible to your smartphone <em><strong>ANYWHERE IN THE WORLD</strong></em>.</p><p>Did you find this helpful? Did you get stuck? Did I miss something? <a href="https://twitter.com/PatternPodJoe">Reach out on X</a>.</p>]]></content:encoded></item><item><title><![CDATA[Self Hosting OpenAI Chat Endpoint with GPU-accelerated MistralOrca 7B 8K (GGUF) and Llama CPP Python Server]]></title><description><![CDATA[Serving models as an emulated OpenAI Endpoint enables a few important benefits:]]></description><link>https://blog.bios.dev/p/self-hosting-gpu-accelerated-mistralorca</link><guid isPermaLink="false">https://blog.bios.dev/p/self-hosting-gpu-accelerated-mistralorca</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Mon, 16 Oct 2023 16:48:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Serving models as an emulated OpenAI Endpoint enables a few important benefits:</p><ul><li><p>Application start-up is decoupled from model initialization (faster code iteration)</p></li><li><p>Served model can be swapped without changing your code (faster model swap)</p></li></ul><p>Any GGUF formatted model should work, but I am using the new <a href="https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca">MistralOrca 7B</a>.</p><p>I actually did this on Windows (so I could still play Rocket League), but will provide details for both Linux and Windows. I will assume in both cases you already installed Nvidia drivers. Reach out if you get stuck on Nvidia stuff.</p><p>You will probably want to create and enter a working directory for this; I created a folder called <code>llama-server</code>.</p><h2>Step 1: Download the GGUF model file</h2><p>In general, you can google &#8220;TheBloke &lt;MODEL NAME&gt; GGUF&#8221; and get the GGUF files for popular models. Here is a direct link to <a href="https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF">TheBloke&#8217;s MistralOrca 7B GGUF</a>.</p><p>Navigate to the <a href="https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/tree/main">Files and Versions</a> tab, then download the appropriate <a href="https://towardsdatascience.com/introduction-to-weight-quantization-2494701b9c0c">quantization</a> for your available GPU VRAM. I have an RTX 3090 with 24GB of VRAM so I can fit any of these.
I chose the <a href="https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/blob/main/mistral-7b-openorca.Q6_K.gguf">Q6_K quantization</a> to keep the best quality.</p><p>If you have less VRAM, try out different quantizations to see what works. Your CPU will pick up whatever can&#8217;t fit into the GPU VRAM, but it will run slower, so it is up to you to make the quality/speed tradeoff if you have a smaller card.</p><p>Save the GGUF file in your working directory.</p><h2>Step 2: Set up a Virtual Environment (Optional)</h2><p>You don&#8217;t necessarily need to do all these venv steps, but if you are doing any other python stuff it will prevent a lot of conflict nonsense between applications.</p><p>Install Python 3 and the Python 3 <code>venv</code> module. Then create a new virtual environment in your working directory:</p><pre><code>python3 -m venv llamaserver</code></pre><p>Then source that virtual environment:</p><ul><li><p><strong>Windows Powershell</strong>: <code>.\llamaserver\Scripts\Activate.ps1</code></p></li><li><p><strong>Linux BASH</strong>: <code>source llamaserver/bin/activate</code></p></li></ul><h2>Step 3: Install Llama CPP Python with Server</h2><p>The <code>llama-cpp-python</code> project is all you need to get up and running with your GGUF model. Install by following <a href="https://github.com/abetlen/llama-cpp-python#installation-with-hardware-acceleration">the hardware acceleration steps in the README</a>. <strong>Be sure to include the [server] package</strong>! You should absolutely read the full README anyway, but I have included the necessary commands for Nvidia users.</p><p><strong>Windows Powershell</strong>:</p><pre><code>$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
pip install 'llama-cpp-python[server]' --force-reinstall --upgrade --no-cache-dir</code></pre><p><strong>Linux BASH</strong>:</p><pre><code>CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install 'llama-cpp-python[server]' --force-reinstall --upgrade --no-cache-dir</code></pre><p>You probably don&#8217;t need all that <code>--force-reinstall --upgrade --no-cache-dir</code> stuff at the end of the command, but I included it in case you need to run it again or have a conflicting install; it shouldn&#8217;t cause trouble on a fresh install anyway. (The quotes around <code>llama-cpp-python[server]</code> keep your shell from trying to expand the brackets.)</p><h2>Step 4: Run the OpenAI Endpoint server!</h2><p>Create a run script in your working directory. For this model, the HuggingFace page says it uses the <a href="https://github.com/openai/openai-python/blob/main/chatml.md">chatml</a> format for chat, and the context is 8K. I think it actually only has around 35 layers, but specifying too many doesn&#8217;t hurt; play with that setting.</p><p><strong>Windows Powershell (run.ps1)</strong>:</p><pre><code>deactivate
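# re-activate this project's venv so llama_cpp.server resolves from it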
.\llamaserver\Scripts\Activate.ps1
python -m llama_cpp.server `
  --model .\mistral-7b-openorca.Q6_K.gguf `
  --n_ctx 8192 `
  --chat_format chatml `
  --n_gpu_layers 43 `
  --model_alias gpt-3.5-turbo `
  --host 0.0.0.0 --port 8180</code></pre><p><strong>Linux BASH (run.sh)</strong>:</p><pre><code># deactivate any active venv first (silence the error if none is active)
deactivate 2&gt;/dev/null || true
# "activate" must be sourced; executing it as a script has no effect on this shell
source llamaserver/bin/activate
python -m llama_cpp.server \
  --model ./mistral-7b-openorca.Q6_K.gguf \
  --n_ctx 8192 \
  --chat_format chatml \
  --n_gpu_layers 43 \
  --model_alias gpt-3.5-turbo \
  --host 0.0.0.0 --port 8180</code></pre><p>Then run the script. You should see your GPU listed along with the GPU layer offload (if not, you might have missed the GPU build specification in Step 3). Here is part of my output so you know how it should start (<strong>notice the GPU is mentioned</strong>):</p><pre><code>ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from .\mistral-7b-openorca.Q6_K.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q6_K     [  4096, 32002,     1,     1 ]
...</code></pre><p>Once it is up, you can visit <a href="http://127.0.0.1:8180/docs">http://127.0.0.1:8180/docs</a> to explore the interactive API docs.</p>
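<p>You can also hit the OpenAI-compatible API directly to sanity-check it. Since the run script aliased the model to <code>gpt-3.5-turbo</code>, a request like this (just a quick smoke test; adjust the prompt as you like) should return a chat completion:</p><pre><code>curl http://127.0.0.1:8180/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Say hello!"}]}'</code></pre>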
<p>Or use Langchain&#8217;s <code>OpenAI</code> wrapper against your new endpoint, overriding the default OpenAI base URL with your hosted one. Here is a minimal <code>example.py</code>:</p><pre><code>from langchain.llms import OpenAI

llm = OpenAI(
    temperature=0,
    # the server ignores the key, but the client requires a non-empty value
    openai_api_key="use llama-cpp-python.server lol!",
    max_tokens=512
)

# quick smoke test; the endpoint URL comes from the OPENAI_API_BASE env var set below
print(llm("Q: What is the capital of France? A:"))</code></pre><pre><code>export OPENAI_API_BASE=http://127.0.0.1:8180/v1
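# the OpenAI client reads OPENAI_API_BASE from the environment, so example.py needs no URL in code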
python3 example.py</code></pre><h2>Conclusion</h2><p>I have looked through several projects and determined this to be the most straightforward for at least my hobby use case. Single command installation. Works well on Windows. TheBloke delivers consistent GGUF models almost as soon as the official ones are released.</p><p>Being able to architect my LangChain projects or even Autogen agents to use a simple OpenAI endpoint has dramatically simplified integration and made upgrading to the latest model as easy as pulling the latest GGUF and updating my run script to use the new model. No code changes. No library modifications for the server. I am up and running with a new GGUF model only a couple of minutes after the download.</p><p>Did you find this helpful? Did you get stuck? Did I miss something? <a href="https://twitter.com/PatternPodJoe">Reach out on X</a>.</p>]]></content:encoded></item></channel></rss>