<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[blog.bios.dev]]></title><description><![CDATA[blog.bios.dev]]></description><link>https://blog.bios.dev</link><image><url>https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png</url><title>blog.bios.dev</title><link>https://blog.bios.dev</link></image><generator>Substack</generator><lastBuildDate>Mon, 20 Apr 2026 01:21:58 GMT</lastBuildDate><atom:link href="https://blog.bios.dev/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Joe Still]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[bios@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[bios@substack.com]]></itunes:email><itunes:name><![CDATA[Joe Still]]></itunes:name></itunes:owner><itunes:author><![CDATA[Joe Still]]></itunes:author><googleplay:owner><![CDATA[bios@substack.com]]></googleplay:owner><googleplay:email><![CDATA[bios@substack.com]]></googleplay:email><googleplay:author><![CDATA[Joe Still]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[HOTL Orders of Autonomous System Capability]]></title><description><![CDATA[The human never leaves the loop. The loop gets bigger]]></description><link>https://blog.bios.dev/p/tuning-orders-of-autonomous-system</link><guid isPermaLink="false">https://blog.bios.dev/p/tuning-orders-of-autonomous-system</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Thu, 16 Apr 2026 16:54:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rZbd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: this is a distillation by Opus 4.6 of thinking from a very lengthy chat w/ Opus 4.6</em></p><h3><strong>Order 0 is HITL. Every order above it is a new OODA loop. The human is always on one.</strong></h3><p>Every autonomous system operates in a cycle: act, observe, adjust. The question that determines a system&#8217;s capability is not &#8220;how autonomous is it&#8221; but <strong>which OODA loop is the human on?</strong></p><p>The answer names the order.</p><p>At Order 0 the human is inside the loop &#8212; a required step. Nothing executes without them. At every order above 0 the loop runs freely. The human is not inside it. They are <strong>on</strong> it: watching, able to intervene, not required to at every iteration. The loop runs while they sleep.</p><p><strong>Order 0 is the only true HITL order.</strong> Every order above it is a <strong>HOTL order</strong>, distinguished by which loop the human is currently on. The pattern at each transition is identical: the human hands the current loop&#8217;s OODA function to an agent and steps up to watch that agent from above. They exit by delegating. They re-enter one level up. They are always on a loop. 
Never on the same loop twice.</p><div><hr></div><h3><strong>The Mechanism</strong></h3><p>Each transition works by the same mechanism: <strong>subject becomes object.</strong></p><p>What you were embedded in &#8212; what you could only act from, not examine &#8212; becomes something you can hold and modify. Ralph executes within his prompt. He cannot observe it. The prompt is his subject. The Order 1 human watches Ralph run and edits PROMPT.md between loops. The prompt is now an object: a thing being acted on, not merely acted from. At Order 2 an agent takes over the work of watching Ralph. The human is now on a loop whose object is the Ralph run itself &#8212; the session logs, the drift patterns, the failure signatures. What was the human&#8217;s operational medium has become their observational target.</p><p>This is also why higher orders remain tractable. Prior complexity does not stack &#8212; it <strong>compresses</strong>. The entire internal workings of a loop fold into a single node at the next order. The Order 2 human does not manage Ralph&#8217;s internals. They see one thing: the Ralph run. Its internals are below their resolution. Each order stands on a floor that has already been folded up.</p><p><strong>Backpressure is what makes a loop foldable.</strong> A loop without automated feedback cannot run unattended &#8212; it needs a human inside to evaluate its own output, which means it cannot fold into a HOTL node. A type checker frees the human from verifying syntax. A test suite frees the human from verifying behavior. Each piece of backpressure removes one reason the human had to stay inside the loop. When all those reasons are gone, the loop can fold and the human can step up.</p><p>Before building any new capability, ask: what automated feedback will tell this loop whether its output is good? Design that first. The capability follows.</p><p><strong>An organization&#8217;s effective HOTL Order is capped by the highest order at which it has functioning backpressure.</strong></p><p>The progression between orders is not always linear. 
Three independent capabilities branch from Order 2 and converge at Order 4, forming a directed acyclic graph rather than a ladder:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rZbd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rZbd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 424w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 848w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rZbd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png" width="780" height="1153" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1153,&quot;width&quot;:780,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:104668,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.bios.dev/i/194427608?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rZbd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 424w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 848w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!rZbd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F70366caf-1d8f-411a-ab66-d79d887cf8a6_780x1153.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h3><strong>The Orders</strong></h3><h4><strong>Order 0 &#8212; Direct Control </strong><em><strong>(HITL)</strong></em></h4><p>The human is a required step inside the loop. Nothing executes without them. They provide all feedback &#8212; they are the backpressure.</p><p><strong>When the human intervenes, they touch:</strong> every tool call and code change. <strong>Backpressure:</strong> none automated. <strong>You know you&#8217;re here when:</strong> nothing happens while the operator is away.</p><div><hr></div><h4><strong>Order 1 &#8212; The Ralph Loop </strong><em><strong>(HOTL: human on the work loop)</strong></em></h4><p>The agent executes autonomously in a continuous loop. The human is on that loop &#8212; watching output between iterations, ready to intervene. When they do, it takes the form of prompt tuning: adding constraints, resetting the workspace, adjusting the specification. Each time the agent does something bad, the prompt gets tuned &#8212; like a guitar.</p><p>This is Geoffrey Huntley&#8217;s Ralph Wiggum technique: <code>while :; do cat PROMPT.md | claude-code; done</code>. The prompt is now object. The human is shaping the environment the loop acts within, not acting inside it.</p><p>The human&#8217;s OODA at this order: observe Ralph&#8217;s output, orient to failure patterns, decide on prompt adjustments, act on PROMPT.md.</p><p><strong>When the human intervenes, they tune:</strong> prompt constraints, specifications, workspace state. <strong>Backpressure:</strong> build system, type checker, test suite, linter &#8212; anything that catches failures before the human sees them. <strong>You know you&#8217;re here when:</strong> the operator&#8217;s primary activity is editing the prompt. Work continues while the operator is away.</p><div><hr></div><h4><strong>Order 2 &#8212; Agent Tuning </strong><em><strong>(HOTL: human on the Ralph-watching loop)</strong></em></h4><p>An agent takes over the Order 1 OODA cycle. It watches the Ralph run &#8212; session logs, plan drift, rework patterns &#8212; and proposes the tuning a human would perform at Order 1. The human is no longer on the Ralph loop. 
They are on the loop that watches the agent watching Ralph.</p><p>The Ralph run has compressed into a node. What the human sees is a synthesis of it &#8212; failure signatures, proposed interventions. The loop&#8217;s internals are below their resolution.</p><p>The human&#8217;s OODA at this order: observe the tuning agent&#8217;s recommendations, orient to whether they are sound, decide to apply or reject, act.</p><p><strong>When the human intervenes, they act on:</strong> tuning recommendations from the tuning agent. <strong>Backpressure:</strong> measurable tuning impact &#8212; did rework rate drop? Did convergence improve? Without this, the human is gut-checking recommendations, which is Order 1 with extra steps. <strong>You know you&#8217;re here when:</strong> the operator receives &#8220;worker meandered for 3 loops on permissions &#8212; recommend adding a scoping constraint&#8221; and says &#8220;yes, apply that.&#8221;</p><div><hr></div><h4><strong>Order 3a &#8212; Factory </strong><em><strong>(HOTL: human on the fleet-tuning loop)</strong></em></h4><p>The tuning agent applies learnings across N parallel workers. A constraint erected from one worker&#8217;s failure propagates to all active workers. Fleet breadth has become object &#8212; one worker&#8217;s failure pattern was just what happened to that run. Across N workers it becomes a visible, addressable pattern.</p><p><strong>When the human intervenes, they act on:</strong> fleet-wide tuning changes. <strong>Backpressure:</strong> per-worker backpressure aggregated across the fleet &#8212; pass rates, failure distributions, convergence patterns. <strong>You know you&#8217;re here when:</strong> a prompt change from one worker&#8217;s failure immediately affects all workers.</p><div><hr></div><h4><strong>Order 3b &#8212; Self-Improving </strong><em><strong>(HOTL: human on the tuning-validation loop)</strong></em></h4><p>The tuning agent runs controlled experiments to validate its own changes. &#8220;This constraint reduced meandering in 8 of 10 test loops.&#8221; The validity of tuning decisions has become object &#8212; not just &#8220;here is a recommendation&#8221; but &#8220;here is evidence that recommendations work.&#8221;</p><p><strong>When the human intervenes, they act on:</strong> experimental evidence about tuning effectiveness. <strong>Backpressure:</strong> A/B results with statistical validity. This order IS backpressure on the tuning process itself. <strong>You know you&#8217;re here when:</strong> the system produces A/B results about its own tuning decisions.</p><div><hr></div><h4><strong>Order 3c &#8212; Closed-Loop </strong><em><strong>(HOTL: human on the outcome loop)</strong></em></h4><p>The tuning signal shifts from agent artifacts to business outcomes. Not &#8220;did the agent meander&#8221; but &#8220;did the deliverable pass validation&#8221; or &#8220;was the output accepted without rework.&#8221; The target of the whole stack has become object &#8212; the system is now measuring whether it is pointed at the right thing.</p><p><strong>When the human intervenes, they act on:</strong> which outcome metrics the system tunes against. <strong>Backpressure:</strong> business outcome measurement &#8212; deployment success, acceptance rates, defect rates. The most powerful backpressure because it is the least gameable. 
<strong>You know you&#8217;re here when:</strong> the system references metrics from outside its own loop.</p><div><hr></div><p><em>Orders 3a, 3b, and 3c are three independent subject-to-object transitions, not steps in a sequence. Each makes a different aspect of Order 2 practice examinable: breadth (can patterns be seen across many workers?), rigor (can tuning decisions be validated?), and ground truth (is the system measuring the right thing?). Each delivers value independently. All three must be compressed nodes before Order 4 is safe &#8212; decomposing objectives into autonomous work requires confidence in all three simultaneously. Any one missing and the decomposition is being approved on faith, not evidence.</em></p><div><hr></div><h4><strong>Order 4 &#8212; Directed Factory </strong><em><strong>(HOTL: human on the objective-decomposition loop)</strong></em></h4><p>The system receives objectives, not specifications. An objective becomes a specification, a materials manifest, worker configurations, and a validation suite &#8212; all system-generated. The factory is directed by outcomes rather than instructions. Requires 3a, 3b, and 3c as compressed nodes.</p><p>The human&#8217;s OODA at this order: observe the system&#8217;s decomposition of an objective, orient to whether it is sound, decide to approve or redirect.</p><p><strong>When the human intervenes, they act on:</strong> how the system decomposed an objective into executable work. <strong>Backpressure:</strong> worker success and failure as transitive validation. If workers succeed against their own backpressure, the decomposition was sound. <strong>You know you&#8217;re here when:</strong> the operator provides a one-sentence objective and reviews a generated specification, materials list, and validation suite they did not author.</p><div><hr></div><h4><strong>Order 5 &#8212; KPI Discovery </strong><em><strong>(HOTL: human on the metric-proposal loop)</strong></em></h4><p>The system notices patterns in its own outcome data the operator hasn&#8217;t asked about. &#8220;Deliverables with explicit interface descriptions have 40% fewer review comments. Recommend tracking description completeness.&#8221;</p><p><strong>When the human intervenes, they act on:</strong> what gets measured. <strong>Backpressure:</strong> evidence that proposed metrics correlate with outcomes the operator already cares about. <strong>You know you&#8217;re here when:</strong> a metric appears that the operator didn&#8217;t define, with evidence for why it matters.</p><div><hr></div><h4><strong>Order 6 &#8212; Portfolio Optimization </strong><em><strong>(HOTL: human on the resource-allocation loop)</strong></em></h4><p>The system manages tradeoffs across multiple objectives. &#8220;Task type A converges in 3 loops, type B in 12. Reallocating accordingly.&#8221;</p><p><strong>When the human intervenes, they act on:</strong> how resources distribute across competing priorities. <strong>Backpressure:</strong> ROI measurement per workload &#8212; cost, throughput, and quality per unit of resource. <strong>You know you&#8217;re here when:</strong> a resource allocation table exists that the system authored and executes against.</p><div><hr></div><h4><strong>Order 7 &#8212; Strategy Generation </strong><em><strong>(HOTL: human on the roadmap loop)</strong></em></h4><p>The system proposes changes to what the factory produces. &#8220;Approach X produces fewer defects than Y for this problem class. 
Recommend shifting production.&#8221;</p><p><strong>When the human intervenes, they act on:</strong> what the system should build, not how it builds. <strong>Backpressure:</strong> historical pattern evidence across multiple runs with measurable confidence. <strong>You know you&#8217;re here when:</strong> the system&#8217;s output includes recommendations that would change the roadmap.</p><div><hr></div><h4><strong>Order 8a &#8212; Market Response </strong><em><strong>(HOTL: human on the environmental-model loop)</strong></em></h4><p>The system detects external changes &#8212; new platform capabilities, regulatory shifts, competitive moves &#8212; and proposes reorientation.</p><p><strong>When the human intervenes, they act on:</strong> the system&#8217;s interpretation of external signals and proposed response. <strong>Backpressure:</strong> ground truth validation &#8212; did the detected change actually occur? Did the proposed response address it? <strong>You know you&#8217;re here when:</strong> the system flags an environmental change the operator didn&#8217;t know about, with a proposed adaptation.</p><div><hr></div><h4><strong>Order 8b &#8212; Self-Funding </strong><em><strong>(HOTL: human on the budget loop)</strong></em></h4><p>The system proposes resource allocation against measured returns, scaling up positive-ROI work and shutting down negative-ROI work.</p><p><strong>When the human intervenes, they act on:</strong> spending rationale within a fixed envelope. <strong>Backpressure:</strong> financial outcome tracking &#8212; did the spend produce the projected return? <strong>You know you&#8217;re here when:</strong> resource consumption varies period to period and every variance comes with ROI justification.</p><div><hr></div><h4><strong>Order 9 &#8212; Goal Selection </strong><em><strong>(HOTL: human on the value-function loop)</strong></em></h4><p>The system runs parallel instances under different optimization targets and presents comparative results against an operator-provided axiom.</p><p><strong>When the human intervenes, they act on:</strong> which definition of success to operate under. <strong>Backpressure:</strong> axiom-relative measurement &#8212; which value system produced better outcomes as judged by the operator&#8217;s stated axiom? <strong>You know you&#8217;re here when:</strong> the system presents results under competing objective functions and recommends one.</p><div><hr></div><h4><strong>Order 10 &#8212; Frame Selection </strong><em><strong>(HOTL: human on the frame loop)</strong></em></h4><p>The system evaluates which problem space to inhabit &#8212; including the null option of not acting. Frame selection applied to itself produces frame selection. The recursion reaches a fixed point.</p><p><strong>When the human intervenes, they act on:</strong> which game the system plays. <strong>Backpressure:</strong> none available. Frame selection cannot be validated without a meta-frame, which is another frame. The human either has conviction about which game to play or doesn&#8217;t. <strong>You know you&#8217;re here when:</strong> you can&#8217;t tell. Any observable behavior is indistinguishable from a lower order executing within a chosen frame. The decision to remain in the current frame looks identical to never having considered alternatives. The fixed point is silent.</p><div><hr></div><h3><strong>Backpressure as the Enabling Mechanism</strong></h3><p>Backpressure is not monitoring. 
It is what converts a loop that requires a human inside it into a loop that can fold into a HOTL node. Without it the human cannot step up. The architecture diagram may claim Order 4. The human&#8217;s attention is consumed at Order 1.</p><p><strong>Backpressure Speed</strong></p><p>Backpressure must be fast relative to the loop it serves. A type checker running in milliseconds is excellent backpressure for a code generation loop iterating in minutes. A production defect rate taking two weeks to stabilize is useless for the same loop &#8212; but appropriate for a monthly portfolio cycle. Mismatched tempo is the most common failure mode and the hardest to see: the loop iterates confidently while waiting for a signal that arrives too late to correct anything.</p><table><thead><tr><th>Order</th><th>Typical loop</th><th>Backpressure speed</th></tr></thead><tbody><tr><td>0</td><td>Seconds</td><td>Instant (human reaction time)</td></tr><tr><td>1</td><td>Minutes&#8211;hours</td><td>Seconds (build, test, lint)</td></tr><tr><td>2</td><td>Hours&#8211;days</td><td>Minutes (artifact analysis)</td></tr><tr><td>3a&#8211;3c</td><td>Hours&#8211;days</td><td>Minutes&#8211;hours (aggregation, A/B, outcomes)</td></tr><tr><td>4</td><td>Days&#8211;weeks</td><td>Hours (worker completion)</td></tr><tr><td>5&#8211;6</td><td>Weeks&#8211;months</td><td>Days (correlation, ROI)</td></tr><tr><td>7&#8211;9</td><td>Months&#8211;quarters</td><td>Weeks (strategic measurement)</td></tr><tr><td>10</td><td>Undefined</td><td>Undefined</td></tr></tbody></table><p><strong>Reversibility</strong></p><p>Backpressure works by letting the loop fail, detecting it, and correcting. This holds only when failure is survivable. Reversible actions &#8212; two-way doors &#8212; can be fully governed by automated backpressure. Fail, detect, correct. Irreversible ones can&#8217;t: the signal arrives after a door that no longer opens. The practical implication is routing, not human gatekeeping. Irreversible actions need pre-execution intervention. Reversible actions flow through the standard backpressure loop. The distinction is a design constraint, not a reason to keep humans inside the loop.</p><p><strong>The Backpressure Ceiling</strong></p><p>Building autonomous systems is not primarily a model problem or a tooling problem. It is a <strong>backpressure engineering problem.</strong> The ceiling is not how intelligent the agent is. It is how fast and reliably the loop can be told whether the work is good.</p><p>An organization does not need top-level signal to have a functioning system. A single workflow with solid backpressure and a tuning agent compounds within its own ceiling regardless of whether it connects to anything above. The risk of missing higher-order signal is not that local loops fail to work. It is that they work efficiently toward something subtly misaligned with what you ultimately care about. Build local loops first. Wire them to the highest available real signal. Extend upward as the system matures.</p><div><hr></div><h3><strong>Reflections</strong></h3><p><strong>On the naming.</strong> The original version of this framework was called HITL Orders. That name embeds the wrong model. HITL &#8212; Human In The Loop &#8212; means the loop requires the human as a step. That is only true at Order 0. Every order above it is a fully autonomous loop the human is on. Intervention is not a structural gate. It is an act from the HOTL stance. The loop does not wait for it. These are HOTL Orders.</p><p><strong>On the structure.</strong> Each order is an OODA loop whose object is the previous order&#8217;s loop. The previous loop has compressed &#8212; its internal complexity is below the current order&#8217;s resolution. The Order 2 human is not managing a Ralph loop. They are watching a synthesis of one. 
This is why higher orders are tractable: the human is never holding accumulated complexity. They are holding a node that contains it.</p><p><strong>On the human.</strong> At each transition the human exits a loop by handing its OODA function to an agent, then steps up to watch that agent. They are always on a loop. Never on the same loop twice. At Order 0 this is obvious. At Order 10 it is nominal &#8212; the approval surface is so abstract that meaningful oversight becomes an open question. Every organization must decide which order represents its ceiling of responsible operation. The framework does not answer that question. It makes the question legible.</p><p><strong>On the branches.</strong> Orders 3a, 3b, and 3c represent three independent subject-to-object transitions on three different aspects of Order 2 practice: what is visible (fleet breadth), what is validated (tuning rigor), and what is true (outcome ground truth). Each can be pursued independently and delivers value on its own. The convergence at Order 4 is not architectural preference &#8212; it reflects the minimum: all three must already be folded before objective decomposition can be safely approved.</p><p><strong>On practical application.</strong> Most engineering organizations operate between Orders 0 and 1. The compound value begins at Orders 2 and 3, where tuning becomes systematic rather than artisanal. Order 4 &#8212; where the system decomposes objectives into work &#8212; is where autonomous systems begin to feel like infrastructure rather than tools. Everything beyond Order 6 is, for now, a research direction rather than an engineering practice.</p><p>The first step is not to build a higher-order system. The first step is to audit your backpressure. Whatever order you aspire to: what automated feedback exists at that order? If the answer is &#8220;the human checks,&#8221; you have found your actual ceiling. Build the backpressure. The loop folds. The order follows.</p><div><hr></div><p><em>References</em></p><ol><li><p>Geoffrey Huntley, &#8220;Ralph Wiggum as a software engineer&#8221;</p></li><li><p>Moss, &#8220;Don&#8217;t waste your back pressure&#8221;</p></li><li><p>Robert Kegan, <em>The Evolving Self</em></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Re: The AI Layoff Trap]]></title><description><![CDATA[Re: The AI Layoff Trap]]></description><link>https://blog.bios.dev/p/re-the-ai-layoff-trap</link><guid isPermaLink="false">https://blog.bios.dev/p/re-the-ai-layoff-trap</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Mon, 13 Apr 2026 13:55:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1></h1><p>Re: <a href="https://arxiv.org/html/2603.20617v1#S6">The AI Layoff Trap</a></p><p>I had a long chat w/ Opus and had it distill my thinking into the below&#8230;</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2><strong>What the Paper Claims</strong></h2><p>Competitive firms over-automate because each bears only 1/N of the demand destruction it causes. Structural externality. Only a Pigouvian automation tax corrects it. UBI, capital taxes, bargaining, profit-sharing all fail.</p><h2><strong>Why the Premise is Wrong</strong></h2><h3><strong>Withdrawing a transaction is not harm</strong></h3><p>The paper calls reduced consumer spending a negative externality. Choosing not to employ someone is not aggression. No one holds a claim on another&#8217;s purchasing decisions. The &#8220;externality&#8221; framing smuggles in entitlement to existing spending patterns. The entire Pigouvian apparatus is built on a fiction.</p><h3><strong>The model is nominal and blind to deflation</strong></h3><p>The demand equation tracks dollar flows. AI drives marginal production costs toward zero. Real purchasing power rises even as nominal wages collapse. The paper literally cannot represent a world where goods approach free. This is the central dynamic AI unleashes and the model cannot see it.</p><h3><strong>Static-economy fallacy</strong></h3><p>The paper extrapolates current trends on a fixed pie. Every historical analog&#8212;Malthus, Ehrlich, Club of Rome&#8212;made the same error. Markets are discovery processes. Haber-Bosch solved the nitrogen crisis. AI-era equivalents emerge through price signals if unobstructed. The paper belongs in that lineage: rigorous math, wrong universe.</p><h2><strong>Why the Solution Makes It Worse</strong></h2><h3><strong>Tax incidence obstructs the cure</strong></h3><p>Rothbardian tax incidence: taxes don&#8217;t pass through to consumers, they price out marginal suppliers, contract supply, raise prices. A Pigouvian automation tax slows beneficial automation alongside any harmful automation. It directly obstructs the deflationary process that resolves the problem it claims to solve.</p><h2><strong>The Solopreneur Counter (Partial, Not Primary)</strong></h2><p>If firms don&#8217;t need workers, workers don&#8217;t need firms. AI enables micro-enterprise at near-zero capital cost. But this path is limited by agency&#8212;the willingness to self-direct&#8212;not cognition. And if AI is powerful enough to substitute for agency itself, megacorps deploy that same capability at scale. The solopreneur explosion requires a narrow capability sweet spot. Real but not the main rebuttal.</p><h2><strong>The Actual Risk is Political, Not Economic</strong></h2><h3><strong>The economics self-resolve</strong></h3><p>Tech deflation drives prices asymptotically toward zero. Supply and demand manage consumption. The demand externality is temporary and nominal. Given time, markets produce abundance.</p><h3><strong>The politics may not allow it</strong></h3><p>Democratic polities respond to visible pain with visible action. If displacement is fast enough, the electorate demands intervention far worse than a Pigouvian tax&#8212;nationalization, price controls, trade barriers. 
Being correct about the economics and being dead are not mutually exclusive.</p><h3><strong>But incorrect politics does not change correct economics</strong></h3><p>If the political system destroys the mechanism that would have produced abundance, the political system was wrong about reality. Truth does not bend to votes. The physics of the bridge does not change because the mob burns it.</p><h2><strong>The Actual Answer: Network State</strong></h2><p>The paper asks: how do we manage automation within existing political structures? Wrong question. The right question is: how do we build structures where tech deflation operates freely?</p><p>The network state (Srinivasan) is the exit strategy. Don&#8217;t argue within a political economy that will panic and intervene destructively. Build parallel structures that demonstrate abundance. Exit over voice. Prove rather than persuade.</p><p>This maps to a scale-dependent governance stack:</p><ul><li><p><strong>Federal/minarchist:</strong> defense and property rights only</p></li><li><p><strong>State/socialist:</strong> shared infrastructure where network effects apply</p></li><li><p><strong>Local/communist:</strong> high-trust small communities sharing freely</p></li></ul><p>The paper assumes a single political economy with one rule set. The network state breaks that assumption entirely.</p><h2><strong>Bottom Line</strong></h2><p>The paper diagnoses a nominal, temporary, self-resolving dynamic as a permanent structural failure, then proposes a tax that obstructs the resolution. The real risk is not economic but political: the electorate destroying the engine of abundance before it delivers. The answer is not better policy within existing systems. It is building parallel systems where correct economics can operate without permission.</p>]]></content:encoded></item><item><title><![CDATA[Casting Spells on Transcripts]]></title><description><![CDATA[Disclaimer: This article was generated copy-paste by GPT 5.2 but as a distillation of a VERY lengthy debate with it about thinking through what I wanted.]]></description><link>https://blog.bios.dev/p/casting-spells-on-transcripts</link><guid isPermaLink="false">https://blog.bios.dev/p/casting-spells-on-transcripts</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Mon, 12 Jan 2026 01:34:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Disclaimer: This article was generated copy-paste by GPT 5.2 but as a distillation of a VERY lengthy debate with it about thinking through what I wanted. TBH I think AI outputs are fine as long as they are a transformation of significant human input. 
Maybe worth an article on its own&#8230; anyway onto the article&#8230;</em></p><p>Some days it&#8217;s &#8220;I need a better read-it-later.&#8221; Other days it&#8217;s &#8220;I need a bookmarking system.&#8221; Or &#8220;I should get serious about RSS again.&#8221; Or &#8220;maybe I just need a better notes app.&#8221; And if you squint hard enough, you can always convince yourself the answer is one more tool.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p>But the thing I actually want isn&#8217;t another place to put links.</p><p>What I want is a way to take a link &#8212; especially a long YouTube video &#8212; and turn it into <strong>useful, copyable text</strong> that I can operate on with all the self-hosted AI stuff I already run. I&#8217;m not trying to &#8220;save the internet.&#8221; I&#8217;m trying to <strong>compile it</strong>.</p><p>And I&#8217;m demanding about it in a way that makes most &#8220;product&#8221; answers feel wrong.</p><p>Because the moment the solution drifts into &#8220;store everything forever,&#8221; I can already see the future. I&#8217;ll be running a media archive. I&#8217;ll be dealing with retention policy and backups and disk growth and broken downloads and metadata drift. I&#8217;ll have invented my own part-time job and called it productivity.</p><p>I don&#8217;t want a second brain. I want a wand.</p><h2>The shape of the itch</h2><p>The mobile part matters more than people admit.</p><p>Most of my &#8220;this might be important&#8221; moments happen on my phone: I&#8217;m in YouTube, or a browser, or a random app that opens a link, and I want to send that thing <em>somewhere</em> without breaking my flow.</p><p>&#8220;Somewhere&#8221; is usually a queue. But I don&#8217;t want the queue to be the end state. I want the queue to be the front door to transformation.</p><p>The outcome I care about is embarrassingly simple:</p><ul><li><p>full transcript, copyable</p></li><li><p>a decent TL;DR I can paste</p></li><li><p>maybe a few derived views (tasks, claims, glossary) depending on mood</p></li><li><p>optionally: embed it so later I can query it like a corpus</p></li></ul><p>If I can do <em>that</em> from my phone, then my homelab GPU box stops being a science project and starts being a daily tool.</p><h2>Two traps I&#8217;m trying not to walk into</h2><p>The first trap is archive gravity.</p><p>It starts innocently: &#8220;I&#8217;ll just download the audio for Whisper.&#8221; Then you realize the tool can also download the video. And maybe you keep it. And maybe you keep the mp3 too. And you add thumbnails. And metadata. And now you&#8217;re building a library, not a pipeline.</p><p>I don&#8217;t want to preserve the original content. 
I want to preserve the <strong>work product</strong>: the transcript and whatever I derived from it.</p><p>The second trap is bespoke pipeline rot.</p><p>The more you build a fully custom ingestion system, the more you&#8217;re signing up to maintain it. If you&#8217;ve done enough infra you know the exact shape of the decay:</p><ul><li><p>job orchestration</p></li><li><p>retries and idempotence</p></li><li><p>concurrency and backpressure</p></li><li><p>secret management</p></li><li><p>monitoring, failure modes, &#8220;why did this run twice&#8221;</p></li><li><p>schema changes, tool upgrades, scraped-site breakage</p></li></ul><p>The first time you wire up <code>yt-dlp &#8594; mp3 &#8594; whisper &#8594; summary &#8594; embeddings</code>, you feel like you&#8217;re printing money.</p><p>The tenth time you&#8217;re debugging it from your phone while standing in line somewhere, it becomes obvious you built a thing that <em>requires</em> attention. And attention is the one resource this whole project is supposed to save.</p><p>So I keep looking for a design that stays small, stays legible, and doesn&#8217;t turn into an obsession.</p><h2>The idea that finally clicked: spells</h2><p>This is the part where it stopped feeling like &#8220;workflow automation&#8221; and started feeling like something I&#8217;d actually use.</p><p>I don&#8217;t want a system that tries to guess what I want and runs a bunch of stuff automatically.</p><p>I want <strong>buttons</strong>.</p><p>I want to be able to say: <em>do this to that</em>.</p><ul><li><p>transcribe</p></li><li><p>summarize</p></li><li><p>extract tasks</p></li><li><p>ingest into RAG (maybe into different &#8220;collections&#8221;)</p></li><li><p>generate a few views I can paste elsewhere</p></li></ul><p>Calling those &#8220;spells&#8221; is corny in the best way, because it forces the right constraints. A spell is explicit. It&#8217;s intentional. It&#8217;s something you ask for, not something that happens to you.</p><p>And once you accept that, the follow-on idea becomes obvious:</p><p>Spells should be composable.</p><p>Not in a &#8220;draw a spaghetti graph in a GUI&#8221; way. In a dependency sense.</p><p>If I press &#8220;RAG ingest,&#8221; the system shouldn&#8217;t shrug and fail because there&#8217;s no transcript yet. It should know that ingesting requires chunking, and chunking requires a transcript (or at least readable text), and it should build what it needs.</p><p>That&#8217;s not &#8220;AI automation.&#8221; That&#8217;s dependency resolution. That&#8217;s a build system.</p><p>Which leads to the actual primitive I&#8217;ve been missing in all these tools:</p><h2>Artifacts, not fields</h2><p>Most bookmarking / read-later tools give you one place to put &#8220;notes,&#8221; and maybe one summary field if they&#8217;re feeling fancy.</p><p>That&#8217;s not enough. 
It&#8217;s the wrong shape.</p><p>I want a bookmark to accumulate <strong>named outputs</strong>:</p><ul><li><p><code>transcript.txt</code> &#8212; the thing I actually want to copy</p></li><li><p><code>transcript.vtt</code> &#8212; timestamps, because timestamps are power</p></li><li><p><code>tldr.md</code> &#8212; a summary that&#8217;s clearly a derived artifact, not &#8220;the truth&#8221;</p></li><li><p><code>tasks.md</code> &#8212; if I&#8217;m in that mood</p></li><li><p><code>index.md</code> &#8212; a tiny table-of-contents so the bookmark becomes a shelf, not a blob</p></li></ul><p>If the system can attach multiple assets per bookmark, suddenly everything else falls into place:</p><ul><li><p>the bookmark is the anchor (URL + metadata)</p></li><li><p>assets are the outputs (the only thing I care about keeping)</p></li><li><p>the pipeline produces assets</p></li><li><p>downstream spells consume assets</p></li></ul><p>And because assets are explicit, the &#8220;skip if already exists&#8221; behavior becomes clean. The pipeline doesn&#8217;t need a mystical global state. It can look at what&#8217;s attached, decide what&#8217;s missing, and proceed.</p><p>This is the moment I realized I wasn&#8217;t searching for a &#8220;better summary field.&#8221; I was searching for a place to put build outputs.</p><h2>Thin inbox, thick outputs</h2><p>Once I had the artifact idea, my earlier discomfort with &#8220;self-hosted everything&#8221; started to sharpen into a requirement.</p><p>The inbox has to stay thin.</p><p>I don&#8217;t want my bookmarking tool to become my archive. I don&#8217;t want it to silently store raw MP4s and MP3s forever. I don&#8217;t want &#8220;convenient defaults&#8221; that turn into a storage tax.</p><p>But I <em>do</em> want my derived outputs to persist. That&#8217;s the point. Those are small, stable, and actually useful later.</p><p>That&#8217;s why object storage suddenly becomes part of the story. If the system can attach assets and store them in something MinIO/S3-shaped, then the inbox stays light and the outputs scale without me thinking about disk layout.</p><p>The raw media can be ephemeral. Download, process, delete. The transcript is the durable artifact.</p><p>That feels right.</p><h2>The mobile UX problem: I refuse to type tags</h2><p>There&#8217;s a version of this where &#8220;spells&#8221; are invoked by typing a tag like <code>!transcribe</code>.</p><p>That works. It&#8217;s also not how I want to live.</p><p>If the whole thing is meant to reduce friction, then typing little incantations on my phone is a self-own. It turns &#8220;quick share&#8221; into &#8220;mini admin task.&#8221;</p><p>So the UX I actually want is stupid-simple:</p><p>Share &#8594; a tiny &#8220;Spells&#8221; target opens &#8594; big buttons:</p><ul><li><p>Transcript + TL;DR</p></li><li><p>Transcript only</p></li><li><p>RAG &#8594; Home</p></li><li><p>Tasks</p></li></ul><p>Tap one. Done. Queue it. Go back to my life.</p><p>This is where a share-target PWA (or some minimal intermediate share receiver) becomes interesting. 
Not because I want yet another UI, but because I want the UI to be <em>exactly</em> the size of my intent.</p><p>A tiny front door that lets me pick a spell bundle without typing anything.</p><p>Everything else can happen on the server.</p><h2>Where orchestration fits (and where it doesn&#8217;t)</h2><p>At this point the temptation is to start building &#8220;the perfect runner.&#8221;</p><p>I&#8217;m trying to avoid that.</p><p>The runner&#8217;s job is boring:</p><ul><li><p>accept an intent (&#8220;cast this bundle on that URL&#8221;)</p></li><li><p>resolve dependencies based on existing artifacts</p></li><li><p>run the necessary steps</p></li><li><p>write assets back</p></li></ul><p>That&#8217;s it.</p><p>I don&#8217;t need a cathedral. I need something restart-safe, idempotent, and not cute.</p><p>This is where Windmill entered the conversation for me: not because I want a workflow GUI, but because it already has the primitives that bespoke pipelines tend to re-implement poorly: jobs, retries, and a notion of a flow. If I can treat each spell as a reusable primitive and use flows as bundles, that&#8217;s attractive. It&#8217;s not magical; it&#8217;s just offloading plumbing.</p><p>But I&#8217;m intentionally keeping the design portable in my head. If I decide Windmill isn&#8217;t the right fit later, the mental model still stands: artifacts, dependencies, manual-first spells.</p><h2>What I&#8217;ve actually decided (so far)</h2><p>I haven&#8217;t implemented the stack. This is me admitting I&#8217;ve been thinking the issue to death and finally got to something that feels like a stable shape.</p><p>The shape looks like:</p><ul><li><p>an inbox that can hold links and attach assets</p></li><li><p>thin by default (don&#8217;t hoard raw media)</p></li><li><p>assets stored in object storage</p></li><li><p>spells are manual-first and invoked from the phone</p></li><li><p>bundles are just &#8220;targets&#8221; that pull prerequisites</p></li><li><p>artifacts are the source of truth; &#8220;summary fields&#8221; are just projections if I want them</p></li></ul><p>That&#8217;s the whole thing.</p><p>It&#8217;s not a tutorial, because I don&#8217;t have the scar tissue yet. It&#8217;s a distillation of what I&#8217;ve hashed out &#8212; including with ChatGPT &#8212; because the framing shift (from &#8220;bookmarks&#8221; to &#8220;build outputs&#8221;) is what I think is worth sharing.</p><p>I can tell when I&#8217;m onto something because it reduces the design, not expands it.</p><p>The moment I stopped trying to build a better place to <em>save</em> links and started trying to build a better way to <em>produce artifacts from them</em>, the rest of the decisions became almost boring.</p><p>And boring is exactly what I want from infrastructure that&#8217;s supposed to be used casually, from a phone, without turning me into a caretaker.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Aranet4 Time Series Export (Part 1): Scan + Pair]]></title><description><![CDATA[Pair and read an Aranet4 on Linux]]></description><link>https://blog.bios.dev/p/aranet4-time-series-export-part-1</link><guid isPermaLink="false">https://blog.bios.dev/p/aranet4-time-series-export-part-1</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Thu, 01 Jan 2026 02:35:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Pair and read an Aranet4 on Linux</h3><h4>Unblock Bluetooth (rfkill)</h4><pre><code><code>rfkill list
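# if the first listing shows "Soft blocked: yes", clear it (a hard block is a physical switch rfkill cannot clear):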
sudo rfkill unblock bluetooth
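# list again to confirm the soft block is gone: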
rfkill list</code></code></pre><p>Proceed only when Bluetooth shows <code>Soft blocked: no</code> and <code>Hard blocked: no</code>.</p><div><hr></div><h4>1) Create a dedicated <code>uv</code> project for Aranet4 tools</h4><p>Use a clean folder so the dependency and lockfile stay isolated.</p><pre><code><code>mkdir -p ~/dev/aranet4
cd ~/dev/aranet4
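# pinning the interpreter keeps the lockfile reproducible; 3.11 is an assumption that worked here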
uv init --python 3.11
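# the aranet4 package provides both the Python client library and the aranetctl CLI used below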
uv add aranet4</code></code></pre><div><hr></div><h4>2) Scan with <code>aranetctl</code> (first attempt)</h4><p>Run the scan from your <code>uv</code> environment:</p><pre><code><code>uv run aranetctl --scan</code></code></pre><p>If the scan finds your device, copy the MAC and read it:</p><pre><code><code>uv run aranetctl AA:BB:CC:DD:EE:FF</code></code></pre><div><hr></div><h4>3) If <code>aranetctl --scan</code> fails, pair/trust with <code>bluetoothctl</code>, then retry <code>aranetctl</code></h4><ol><li><p>Start <code>bluetoothctl</code>:</p></li></ol><pre><code><code>bluetoothctl</code></code></pre><ol start="2"><li><p>Enable the adapter and pairing agent:</p></li></ol><pre><code><code>power on</code></code></pre><ol start="3"><li><p>Scan to discover the Aranet4 MAC:</p></li></ol><pre><code><code>scan on</code></code></pre><ol start="4"><li><p>Pair with the device (use the MAC you see in scan output):</p></li></ol><pre><code><code>pair AA:BB:CC:DD:EE:FF</code></code></pre><ol start="5"><li><p>When prompted, read the pairing code shown on the Aranet4 screen and enter it in <code>bluetoothctl</code>.</p></li><li><p>Trust the device:</p></li></ol><pre><code><code>trust AA:BB:CC:DD:EE:FF</code></code></pre><ol start="7"><li><p>Exit:</p></li></ol><pre><code><code>scan off
quit</code></code></pre><ol start="8"><li><p>Retry from the <code>~/dev/aranet4</code> folder:</p></li></ol><pre><code><code>cd ~/dev/aranet4
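# with the device paired and trusted, the scan should now find it: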
uv run aranetctl --scan
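# substitute the MAC address reported by the scan output: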
uv run aranetctl AA:BB:CC:DD:EE:FF</code></code></pre><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Backing up your Kindle books in 2024]]></title><description><![CDATA[Backing up your Kindle books is still quite possible as of February 2024.]]></description><link>https://blog.bios.dev/p/backing-up-your-kindle-books-in-2024</link><guid isPermaLink="false">https://blog.bios.dev/p/backing-up-your-kindle-books-in-2024</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Sun, 25 Feb 2024 18:19:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Backing up your Kindle books is still quite possible as of February 2024. I spent like 3h digging through all the guides online and am documenting my final result. I do not condone piracy in any way and my use case for backing up to EPUB is 1) to keep my purchased content available in case Kindle is ever down and 2) I want to index the book contents for a RAG system so I can go back and ask questions about the book contents for personal recall.</p><h2>Converting your Kindle book to EPUB</h2><ol><li><p>Download <a href="https://www.amazon.com/gp/help/customer/display.html?nodeId=GZSM7D8A85WKPYYD">Kindle for  PC</a> (I used v2.3)</p><ol><li><p>Download the purchased books you want to back up</p></li><li><p>Find your content folder</p><ol><li><p>Default: C:\Users\YOU\Documents\My Kindle Content</p></li><li><p>Find in Kindle app:  Tools  &gt; Options&#8230; &gt; Content</p></li></ol></li></ol></li><li><p>Download latest <a href="https://calibre-ebook.com/download_windows">Calibre</a> (I used Windows v7.5.1)</p></li><li><p>Download the <a href="https://github.com/noDRM/DeDRM_tools/releases">noDRM</a> plugin (I used v10.0.9)</p></li><li><p>Download the latest <a href="https://plugins.calibre-ebook.com/291290.zip">KFX-Input</a> plugin from the <a href="https://plugins.calibre-ebook.com/">Calibre Plugin List</a> (I used v2.8.1)</p></li><li><p>Open Calibre</p><ol><li><p>Open Preferences &gt; Plugins. 
Click Load plugin from file.</p><ol><li><p>noDRM release is a .zip, unzip it to see <strong>DeDRM_plugin.zip</strong> inside</p></li><li><p>The KFX-Input download is the right .zip itself</p></li></ol></li></ol></li><li><p>Restart Calibre</p></li><li><p>Now  you should be able to visit your <strong>My Kindle Content</strong> folder found in step 1.b and drag the .azw file into Calibre to import it.</p></li><li><p>Finally, export to .epub without DRM:</p><ol><li><p>Right click the book in Calibre, then Convert books &gt; Convert individually</p></li><li><p>In the convert window, change Output format to EPUB (or whatever)</p></li><li><p>Click OK</p></li></ol></li><li><p>Selecting the  book in Calibre will now show two hyperlinks: KFX and EPUB</p></li><li><p>Right click  EPUB &gt; Copy &gt; Path to File</p></li></ol><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Boot Ubuntu UEFI from Grub console]]></title><description><![CDATA[I have Kubuntu installed on a USB SSD and tried to boot it today, but was dumped into a Grub console.]]></description><link>https://blog.bios.dev/p/boot-ubuntu-uefi-from-grub-console</link><guid isPermaLink="false">https://blog.bios.dev/p/boot-ubuntu-uefi-from-grub-console</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Fri, 12 Jan 2024 14:52:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have Kubuntu installed on a USB SSD and tried to boot it today, but was dumped into a Grub console. It looked like this:</p><pre><code><code>grub&gt;</code></code></pre><p>To get back into my installation, I ran the following:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.bios.dev/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading blog.bios.dev! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><pre><code><code>grub&gt; ls</code></code></pre><p>This showed me that I have a few devices. 
I noticed <code>(hd0,gpt1)</code> and <code>(hd0,gpt2)</code>, which are probably my /boot and / partitions. I was able to check with <code>ls</code>:</p><pre><code><code>grub&gt; ls (hd0,gpt1)/</code></code></pre><p>Yep, that&#8217;s the boot partition, and <code>ls</code> on the other showed me all the stuff I&#8217;d expect in <code>/</code>.</p><p>So next I found the <code>grub.cfg</code> on the <code>/boot</code> partition; I just used <code>ls</code> to look around until I found it at:</p><pre><code><code>grub&gt; ls (hd0,gpt1)/efi/ubuntu/grub.cfg</code></code></pre><p>Now that I had that file path, booting was simple (thank you ChatGPT!):</p><pre><code><code>grub&gt; set prefix=(hd0,gpt1)/efi/ubuntu
grub&gt; insmod normal
grub&gt; configfile (hd0,gpt1)/efi/ubuntu/grub.cfg</code></code></pre><p>This got me booted back into Kubuntu! I ran <code>update-grub</code> once I was back up, but haven&#8217;t tested if that fixed it permanently.</p>
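<p>For a more permanent fix, the usual follow-up (a sketch of what I would try next, not something I have verified on this machine) is to reinstall GRUB to the EFI partition from the booted system and regenerate its config:</p><pre><code># reinstall GRUB to the EFI system partition (assuming it is mounted at /boot/efi)
sudo grub-install --efi-directory=/boot/efi

# regenerate /boot/grub/grub.cfg
sudo update-grub</code></pre>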
]]></content:encoded></item><item><title><![CDATA[Using Self Hosted LLM from your Smartphone with acai.so and Let's Encrypt]]></title><description><![CDATA[This guide picks up where the last one left off.]]></description><link>https://blog.bios.dev/p/using-self-hosted-llm-from-your-smartphone</link><guid isPermaLink="false">https://blog.bios.dev/p/using-self-hosted-llm-from-your-smartphone</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Tue, 17 Oct 2023 21:49:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This guide picks up where the last one left off. It assumes you have a locally hosted OpenAI endpoint accessible on the network.</p><p>Previous guide: <a href="https://biosdev.substack.com/p/self-hosting-gpu-accelerated-mistralorca">Self Hosting OpenAI Chat Endpoint with GPU-accelerated MistralOrca 7B 8K (GGUF) and Llama CPP Python Server</a></p><p>Let&#8217;s say your endpoint is accessible on the network at <strong>http://192.168.1.123:8180/v1</strong></p><p>To make practical use of this endpoint from my smartphone, I am going to use <a href="https://acai.so">acai.so</a>. acai.so is an AI-first in-browser chat experience that keeps all your chat and note data locally in your browser. Total privacy! However, by default you really need to use OpenAI for GPT completions&#8230; Not so private! Let&#8217;s secure that self-hosted endpoint with a free SSL certificate and connect from acai.so.</p><h2>Step 1: Prepare Nginx and Certbot</h2><p>(TODO: link to a github with all this code in it)</p><p>The last guide offered examples for both Linux and Windows. In my implementation, I have a dedicated 24/7 Linux machine I use for all kinds of nonsense like Plex and PiHole DNS, etc. I will assume you have an Ubuntu machine running on your local network with Docker installed. This can be the same machine hosting your endpoint; in my case, my Windows gaming computer hosts the endpoint and my Ubuntu machine hosts the HTTPS proxy.</p><p>On with the code&#8230;</p><p>Create a <code>docker-compose.yaml</code>:</p><pre><code>version: '3.8'

services:
  nginx:
    image: nginx:latest
    volumes:
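      # nginx-conf.d holds the vhost that cert-gen.sh generates below;
      # letsencrypt is shared with the certbot container so nginx can read the issued certs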
      - ./nginx-conf.d:/etc/nginx/conf.d
      - ./letsencrypt:/etc/letsencrypt
    ports:
      # 44301 avoids conflict with my k3s HTTPS, use "443:443"
      # if you prefer but "44301:443" will work for you too
      - "44301:443"</code></pre><p>This creates Nginx container we will keep up to proxy the SSL traffic to your endpoint.</p><p>Next, create a <code>docker-compose-certbot.yaml</code>:</p><pre><code>version: '3.8'

services:
  certbotdnsmanual:
    image: certbot/certbot
    volumes:
      - ./letsencrypt:/etc/letsencrypt
    environment:
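      # pass CERT_DOMAIN from the host shell through to the certbot command below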
      - CERT_DOMAIN
    command: -d ${CERT_DOMAIN} --manual --preferred-challenges dns certonly</code></pre><p>We will use this to automate SSL certificate creation.</p><p>And finally, I made a little convenience script <code>cert-gen.sh</code>:</p><pre><code>set -e

test -n "$CERT_DOMAIN" || { echo "ERROR: CERT_DOMAIN not set in env"; exit 1; }

mkdir -vp nginx-conf.d

# Generate the certs
docker compose \
  -f docker-compose-certbot.yaml \
  run --rm -it \
  certbotdnsmanual

# Generate the VHOST
VHOST_CONF="vhost.conf" # TODO: multi domain??
cat &lt;&lt; EOF &gt; "nginx-conf.d/${VHOST_CONF}"
server {
    listen 443 ssl;
    server_name _;  # Replace with your desired hostname if multiple

    # "ssl on;" is deprecated and rejected by recent nginx; "listen 443 ssl" above covers it.
    # fullchain.pem includes the intermediate certificates so clients can verify the chain
    ssl_certificate /etc/letsencrypt/live/${CERT_DOMAIN}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${CERT_DOMAIN}/privkey.pem;

    location / {
        # Replace with the LAN IP and port of the machine hosting your endpoint
        proxy_pass http://192.168.1.123:8180;  
    }
}
EOF</code></pre><p>You will need to update the <code>proxy_pass</code> target in this script to point at the correct endpoint IP and port.</p><h2>Step 2: Generate SSL and Virtual Host</h2><p>At this point we are ready to generate an SSL certificate and virtual host. For this example to work, you will need to own a domain and control its DNS. I chose to use <strong>llama.bios.dev</strong>. In my examples, be sure to use your preferred subdomain.</p><p>With the above files in place, you should be able to run this command:</p><pre><code>CERT_DOMAIN=llama.bios.dev bash cert-gen.sh</code></pre><p>It will ask you some questions about owning the domain and opting into email contact; I leave you to answer those as you will. Finally it will prompt you to set a TXT record for the provided subdomain.</p><p>For example:</p><pre><code>...

Please deploy a DNS TXT record under the name:

_acme-challenge.llama.bios.dev.

with the following value:

q2gpz8c9Gq-XrJ2lcST29Id9nTq9JoCEIZfsbl1t4

...</code></pre><p>So you will want to create a TXT record for <strong>_acme-challenge.llama</strong> with value <strong>q2gpz8c9Gq-XrJ2lcST29Id9nTq9JoCEIZfsbl1t4</strong> in your DNS settings. Deeper instructions for accomplishing that step are outside the scope of this guide.</p>
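<p>Before continuing, it is worth confirming the record has actually propagated. Something like this (<code>dig</code> comes from the dnsutils package on Ubuntu) should print the challenge value back:</p><pre><code># query the TXT record certbot asked for
dig +short TXT _acme-challenge.llama.bios.dev</code></pre>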
<p>Once the record is in place, hit ENTER.</p><p>This only gives you a certificate for 90 days, but you can repeat the process when it is up for renewal. I will update this guide or write a fresh one later to explain a permanent auto-renewal solution. This script will also generate the necessary Nginx virtual host to be picked up by our Nginx container definition.</p><p>You must also set an A record in your DNS to point your subdomain to the private network address where your Nginx container will be running (in my case, the Ubuntu box rather than the gaming PC).</p><h2>Step 3: Start it up and plug into acai.so</h2><p>If the above worked, you now have a valid SSL certificate for your subdomain and the subdomain should be pointing to the machine that will run the Nginx container. Let&#8217;s start it!</p><pre><code>docker compose up -d</code></pre><p>That&#8217;s it: Nginx should be up, and if you visit <strong>https://llama.bios.dev:44301/docs</strong> it should route to the hosting machine and proxy the endpoint. Let&#8217;s plug it into acai.so!</p><p>Visit <a href="https://acai.so">https://acai.so</a> and under <strong>Settings</strong> &gt; <strong>Access Configuration</strong>, replace the <strong>OpenAI API Base URL</strong> with your new HTTPS endpoint (eg, <strong>https://llama.bios.dev:44301/v1</strong>) and put in anything for the <strong>OpenAI API Key</strong> (eg, <strong>lmaounlimitedtokens</strong>); it doesn&#8217;t matter what you choose for the key, it just can&#8217;t be blank.</p><p>Hit <strong>Submit</strong> to save the change and you should be able to chat with your local model from any device on the network!</p><h2>Conclusion</h2><p>This is a very raw methodology to deliver local model access via acai.so which could be improved in a few ways in the future:</p><ul><li><p>Link a GitHub repo with all that code I told you to paste</p></li><li><p>Organize the prerequisites sooner (eg &#8220;you need a domain&#8221; and &#8220;you need to know DNS&#8221;)</p></li><li><p>Update the Certbot process to keep the container running for fully automatic SSL renewal (right now you will need to re-run it every 90 days).</p></li></ul><p>The next guide I want to write in this series is about how to make that SSL endpoint accessible to your smartphone <em><strong>ANYWHERE IN THE WORLD</strong></em>.</p><p>Did you find this helpful? Did you get stuck? Did I miss something? <a href="https://twitter.com/PatternPodJoe">Reach out on X</a>.</p>]]></content:encoded></item><item><title><![CDATA[Self Hosting OpenAI Chat Endpoint with GPU-accelerated MistralOrca 7B 8K (GGUF) and Llama CPP Python Server]]></title><description><![CDATA[Serving models as an emulated OpenAI Endpoint enables a few important benefits:]]></description><link>https://blog.bios.dev/p/self-hosting-gpu-accelerated-mistralorca</link><guid isPermaLink="false">https://blog.bios.dev/p/self-hosting-gpu-accelerated-mistralorca</guid><dc:creator><![CDATA[Joe Still]]></dc:creator><pubDate>Mon, 16 Oct 2023 16:48:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7Bzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F547c78eb-e427-4174-82a9-719adf7d85ed_400x400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Serving models as an emulated OpenAI Endpoint enables a few important benefits:</p><ul><li><p>Application start-up is decoupled from model initialization (faster code iteration)</p></li><li><p>Served model can be swapped without changing your code (faster model swap)</p></li></ul><p>Any GGUF formatted model should work, but I am using the new <a href="https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca">MistralOrca 7B</a>.</p><p>I actually did this on Windows (so I could still play Rocket League), but will provide details for both Linux and Windows. I will assume in both cases you already installed Nvidia drivers. Reach out if you get stuck on Nvidia stuff.</p><p>You will probably want to create and enter a working directory for this; I created a folder called <code>llama-server</code>.</p><h2>Step 1: Download the GGUF model file</h2><p>In general, you can google &#8220;TheBloke &lt;MODEL NAME&gt; GGUF&#8221; and get the GGUF files for popular models. Here is a direct link to <a href="https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF">TheBloke&#8217;s MistralOrca 7B GGUF</a>.</p><p>Navigate to the <a href="https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/tree/main">Files and Versions</a> tab, then download the appropriate <a href="https://towardsdatascience.com/introduction-to-weight-quantization-2494701b9c0c">quantization</a> for your available GPU VRAM. I have an RTX 3090 with 24GB of VRAM so I can fit any of these.
I chose the <a href="https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/blob/main/mistral-7b-openorca.Q6_K.gguf">Q6_K quantization</a> to keep the best quality.</p><p>If you have less VRAM, try out different quantizations to see what works. Your CPU will pick up whatever can&#8217;t fit into the GPU VRAM, but it will run slower, so it is up to you to make the quality/speed tradeoff if you have a smaller card.</p><p>Save the GGUF file in your working directory.</p><h2>Step 2: Set up a Virtual Environment (Optional)</h2><p>You don&#8217;t necessarily need to do all these venv steps, but if you are doing any other python stuff it will prevent a lot of conflict nonsense between applications.</p><p>Install Python 3 and the Python 3 <code>venv</code> module. Then create a new virtual environment in your working directory:</p><pre><code>python3 -m venv llamaserver</code></pre><p>Then source that virtual environment:</p><ul><li><p><strong>Windows Powershell</strong>: <code>.\llamaserver\Scripts\Activate.ps1</code></p></li><li><p><strong>Linux BASH</strong>: <code>source llamaserver/bin/activate</code></p></li></ul><h2>Step 3: Install Llama CPP Python with Server</h2><p>The <code>llama-cpp-python</code> project is all you need to get up and running with your GGUF model. Install by following <a href="https://github.com/abetlen/llama-cpp-python#installation-with-hardware-acceleration">the hardware acceleration steps in the README</a>. <strong>Be sure to include the [server] package</strong>! You should absolutely read the full README anyway, but I have included the necessary commands for Nvidia users.</p><p><strong>Windows Powershell</strong>:</p><pre><code>$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
pip install 'llama-cpp-python[server]' --force-reinstall --upgrade --no-cache-dir</code></pre><p><strong>Linux BASH</strong>:</p><pre><code>CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install 'llama-cpp-python[server]' --force-reinstall --upgrade --no-cache-dir</code></pre><p>You probably don&#8217;t need all that <code>--force-reinstall --upgrade --no-cache-dir</code> stuff at the end of the command, but I included it in case you need to run it again or have a conflicting install; it shouldn&#8217;t cause trouble on a fresh install anyway. (The quotes around <code>llama-cpp-python[server]</code> keep your shell from trying to expand the brackets.)</p><h2>Step 4: Run the OpenAI Endpoint server!</h2><p>Create a run script in your working directory. For this model, the HuggingFace page says it uses the <a href="https://github.com/openai/openai-python/blob/main/chatml.md">chatml</a> format for chat, and the context is 8K. I think it actually only has around 35 layers, but specifying too many doesn&#8217;t hurt; play with that setting.</p><p><strong>Windows Powershell (run.ps1)</strong>:</p><pre><code>deactivate
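# re-activate this project's venv so llama_cpp.server resolves from it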
.\llamaserver\Scripts\Activate.ps1
python -m llama_cpp.server `
  --model .\mistral-7b-openorca.Q6_K.gguf `
  --n_ctx 8192 `
  --chat_format chatml `
  --n_gpu_layers 43 `
  --model_alias gpt-3.5-turbo `
  --host 0.0.0.0 --port 8180</code></pre><p><strong>Linux BASH (run.sh)</strong>:</p><pre><code># deactivate any active venv first (silence the error if none is active)
deactivate 2&gt;/dev/null || true
# "activate" must be sourced; executing it as a script has no effect on this shell
source llamaserver/bin/activate
python -m llama_cpp.server \
  --model ./mistral-7b-openorca.Q6_K.gguf \
  --n_ctx 8192 \
  --chat_format chatml \
  --n_gpu_layers 43 \
  --model_alias gpt-3.5-turbo \
  --host 0.0.0.0 --port 8180</code></pre><p>Then run the script. You should see your GPU listed along with the GPU layer offload (if not, you might have missed the GPU build specification in Step 3). Here is part of my output so you know how it should start (<strong>notice the GPU is mentioned</strong>):</p><pre><code>ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from .\mistral-7b-openorca.Q6_K.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor    0:                token_embd.weight q6_K     [  4096, 32002,     1,     1 ]
...</code></pre><p>Once it is up, you can visit <a href="http://127.0.0.1:8180/docs">http://127.0.0.1:8180/docs</a> to explore the interactive API docs.</p>
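<p>You can also hit the OpenAI-compatible API directly to sanity-check it. Since the run script aliased the model to <code>gpt-3.5-turbo</code>, a request like this (just a quick smoke test; adjust the prompt as you like) should return a chat completion:</p><pre><code>curl http://127.0.0.1:8180/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Say hello!"}]}'</code></pre>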
<p>Or use Langchain&#8217;s <code>OpenAI</code> wrapper against your new endpoint, overriding the default OpenAI base URL with your hosted one. Here is a minimal <code>example.py</code>:</p><pre><code>from langchain.llms import OpenAI

llm = OpenAI(
    temperature=0,
    # the server ignores the key, but the client requires a non-empty value
    openai_api_key="use llama-cpp-python.server lol!",
    max_tokens=512
)

# quick smoke test; the endpoint URL comes from the OPENAI_API_BASE env var set below
print(llm("Q: What is the capital of France? A:"))</code></pre><pre><code>export OPENAI_API_BASE=http://127.0.0.1:8180/v1
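# the OpenAI client reads OPENAI_API_BASE from the environment, so example.py needs no URL in code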
python3 example.py</code></pre><h2>Conclusion</h2><p>I have looked through several projects and determined this to be the most straightforward for at least my hobby use case. Single command installation. Works well on Windows. TheBloke delivers consistent GGUF models almost as soon as the official ones are released.</p><p>Being able to architect my LangChain projects or even Autogen agents to use a simple OpenAI endpoint has dramatically simplified integration and made upgrading to the latest model as easy as pulling the latest GGUF and updating my run script to use the new model. No code changes. No library modifications for the server. I am up and running with a new GGUF model only a couple of minutes after the download.</p><p>Did you find this helpful? Did you get stuck? Did I miss something? <a href="https://twitter.com/PatternPodJoe">Reach out on X</a>.</p>]]></content:encoded></item></channel></rss>