Decoupling the Harness Pluggables

Langfuse versioned our RAGbot prompts. Why not modern harness pluggables?

Jul 02, 2026

Disclaimer: This was distilled by GPT-5.5 from a very long chat

In 2025, the canonical RAGbot taught a durable lesson: keep behavior-bearing prompts outside the deployed app.

The app should ship as a stable executor. The prompt should live as a versioned artifact. A runtime label such as production should resolve to a concrete prompt version. Every trace should record which prompt version the system used.

That pattern made prompt iteration, experimentation, and rollback tractable.

It also revealed a broader principle: when behavior changes outside application code, the behavior-bearing artifact needs its own versioning, promotion, rollback, and trace identity.

1. The 2025 RAGbot Pattern

A basic RAGbot has stable code and changing prompts.

The code retrieves context, builds a message, calls a model, and returns an answer. The prompt carries much of the behavior.

Embedding the prompt in the deployed artifact couples behavioral iteration to code deployment. It also muddies experiments: a trace should say clearly which prompt produced a behavior, and it cannot when the prompt is baked into the build.

Langfuse-style prompt management solved this by decoupling prompt behavior from the deployed app.

The app owns execution.
The prompt manager owns prompt versions.
The runtime resolves a label.
The trace records the resolved version.

2025:
  stable app code
  + prompt label resolved at runtime
  + trace records prompt version
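
A minimal sketch of that pattern, assuming a hypothetical in-memory registry. PromptRegistry, answer, and the "rag-answer" prompt name are illustrative stand-ins, not the Langfuse API. The point is only the shape: the app resolves a label at call time and writes the resolved version into the trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: int
    template: str


class PromptRegistry:
    """Hypothetical in-memory stand-in for a prompt manager."""

    def __init__(self):
        self._versions = {}  # (name, version) -> PromptVersion
        self._labels = {}    # (name, label) -> version number

    def publish(self, name, template):
        # Each publish creates the next version number for that prompt name.
        latest = max((v for (n, v) in self._versions if n == name), default=0)
        prompt = PromptVersion(name, latest + 1, template)
        self._versions[(name, prompt.version)] = prompt
        return prompt

    def set_label(self, name, label, version):
        if (name, version) not in self._versions:
            raise KeyError(f"unknown prompt version: {name} v{version}")
        self._labels[(name, label)] = version

    def resolve(self, name, label):
        return self._versions[(name, self._labels[(name, label)])]


def answer(registry, question, context, call_model):
    """Stable executor: the code never changes when the prompt does."""
    prompt = registry.resolve("rag-answer", "production")
    text = prompt.template.format(context=context, question=question)
    output = call_model(text)
    trace = {
        "prompt_name": prompt.name,
        "prompt_version": prompt.version,  # the resolved version, not the label
        "input": question,
        "output": output,
    }
    return output, trace
```

Moving the production label to another version changes what answer does on the next call, with no redeploy, and the trace still says exactly which version ran.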

2. The General Principle

Prompt versioning solved a problem of experimental identity: knowing exactly which behavior-bearing artifact produced a given result.

A behavior-changing component should be externally managed when teams need to test, compare, promote, or roll back that behavior.

Such components should be:

versioned
label-resolved
trace-correlated
promotable
rollbackable

The 2025 RAGbot made this obvious for prompts.

Modern harnesses make the same pattern apply to more of the system.

3. The Modern Harness

Modern AI systems compose many behavior-bearing parts:

prompts
skills
tools
MCP servers
model choices
context policies
retrieval policies
guards
evaluators
loop strategies
budgets
exit criteria

These parts change behavior individually and in combination.

A skill can alter tool use. A context policy can alter the model’s apparent competence. An evaluator can alter what counts as success. A budget can alter search behavior. A guard can alter which paths are legal.

The behavior-bearing unit has moved above the prompt into the harness composition.

4. Profile Decoupling

The prompt decoupling pattern should move up one level.

The runtime should resolve a profile label such as production to a profile, and that profile should point to a concrete harness composition.

The deployed app stays stable.

The profile carries the behavior.

2026:
  stable app code
  + profile label resolved at runtime
  + trace records profile and component versions

A profile is the experimental identity of a composed behavior.

5. The Harness Profile

A harness profile defines one executable behavior unit.

It answers:

Given this task shape, what harness should run?

A profile may include:

initial conditions
available affordances
prompt versions
skills
tools
MCP servers
context policy
model policy
guards
budget
loop strategy
exit criteria
evaluation gate

The runtime executes the profile.

The trace records the profile.

The evaluator judges the result.

The promotion system advances or rejects the profile.
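
One way to picture a profile is as a frozen record of component references. The sketch below is an assumption about shape, not a defined schema: the field names mirror the list above, and the defaults, the (name, version) tuple convention, and the fingerprint helper are all hypothetical.

```python
import hashlib
import json
from dataclasses import FrozenInstanceError, asdict, dataclass, replace


@dataclass(frozen=True)
class HarnessProfile:
    """Hypothetical profile record; every field is an illustrative assumption."""
    name: str
    version: int
    prompts: tuple = ()        # e.g. (("system", 12),) as (name, version) pairs
    skills: tuple = ()
    tools: tuple = ()
    mcp_servers: tuple = ()
    context_policy: str = "default"
    model_policy: str = "default"
    guards: tuple = ()
    max_steps: int = 10        # a simple stand-in for "budget"
    loop_strategy: str = "single-pass"
    exit_criteria: tuple = ()
    evaluation_gate: str = "none"

    def fingerprint(self) -> str:
        # Stable short hash of the full composition, usable as a trace attribute.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


base = HarnessProfile(name="triage", version=1, tools=(("search", 3),))
# A change to any component yields a new profile version, never an in-place edit.
candidate = replace(base, version=2, tools=(("search", 4),))
```

Because the record is frozen, changing a tool version means creating a new profile, which is what lets a trace point at one unambiguous composition.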

6. Git and Runtime Labels

Git remains the right place to author definitions, review changes, preserve history, and enforce schema checks.

Runtime behavior also needs labels, promotion, rollback, A/B testing, staging, shadowing, tenant overrides, experiment cohorts, and trace correlation.

Git:
  author definitions
  review changes
  preserve source history

Profile registry:
  resolve runtime behavior
  move labels
  promote candidates
  roll back production
  correlate traces

Versions should be immutable.

Labels should move.

profile_v17 = immutable
production = movable pointer
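
A minimal sketch of the immutable-version, movable-label split, assuming a hypothetical in-memory ProfileRegistry. A real registry would persist this and record who moved what; the class and method names here are illustrative.

```python
class ProfileRegistry:
    """Hypothetical registry: versions are write-once, labels are pointers."""

    def __init__(self):
        self._versions = {}  # version id -> definition, never overwritten
        self._labels = {}    # label -> history of version ids, newest last

    def register(self, version_id, definition):
        if version_id in self._versions:
            raise ValueError(f"{version_id} already exists and is immutable")
        self._versions[version_id] = dict(definition)

    def promote(self, label, version_id):
        if version_id not in self._versions:
            raise KeyError(f"unknown profile version: {version_id}")
        self._labels.setdefault(label, []).append(version_id)

    def rollback(self, label):
        history = self._labels.get(label, [])
        if len(history) < 2:
            raise ValueError(f"no earlier version to roll back to for {label!r}")
        history.pop()          # drop the current pointer target
        return history[-1]     # the label now points at the previous version

    def resolve(self, label):
        return self._labels[label][-1]


registry = ProfileRegistry()
registry.register("profile_v16", {"model_policy": "default"})
registry.register("profile_v17", {"model_policy": "default", "max_steps": 20})
registry.promote("production", "profile_v16")
registry.promote("production", "profile_v17")
```

Calling registry.rollback("production") here moves the label back to profile_v16 while profile_v17 stays in the registry untouched, so it can still be inspected or promoted again later.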

7. Runs and Traces

A run executes a resolved profile.

A trace records what happened.

A profile records what behavior was selected.

Profile = intent
Run = execution instance
Trace = evidence
Artifact = output
Gate = judgment
Lineage = provenance

Every meaningful run should be explainable as:

(profile, environment, state, inputs) -> run -> trace + artifacts + gate result

Every meaningful behavior change should be explainable as:

old_profile -> new_profile -> observed behavior delta

The trace should answer:

What happened?

The profile should answer:

What configuration caused it?
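
The run signature from this section can be sketched as a single function. This is a sketch under stated assumptions: execute_run, RunResult, and the body and gate callables are hypothetical names, and the trace is just a list of event dictionaries.

```python
import uuid
from dataclasses import dataclass


@dataclass
class RunResult:
    run_id: str
    profile_id: str
    trace: list        # evidence: ordered events
    artifacts: dict    # output
    gate_passed: bool  # judgment


def execute_run(profile_id, environment, state, inputs, body, gate):
    """(profile, environment, state, inputs) -> run -> trace + artifacts + gate result."""
    run_id = uuid.uuid4().hex
    # The first trace event pins the behavioral identity of the run.
    trace = [{"event": "run_started", "profile_id": profile_id,
              "environment": environment}]
    # The body does the work and appends its own low-level events to the trace.
    artifacts = body(state, inputs, trace)
    passed = bool(gate(artifacts))
    trace.append({"event": "gate_evaluated", "passed": passed})
    return RunResult(run_id, profile_id, trace, artifacts, passed)
```

Two runs with the same inputs and different profile_id values then give the behavior delta a place to live: the difference between their traces, artifacts, and gate results.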

8. Profile Experiments

Once the runtime can resolve profiles, teams can compare profiles under controlled conditions.

Profile experiments give behavioral changes clean attribution.

When the outcome changes under otherwise equal conditions, the system can attribute the change to the profile that differed.
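
Controlled comparison needs stable assignment: the same subject should land on the same profile every time. A minimal sketch, assuming hash-based bucketing with integer weights; assign_profile and the experiment name are hypothetical, and the post does not prescribe this method.

```python
import hashlib


def assign_profile(subject_id, experiment, arms):
    """Deterministically pick a profile for a subject.

    arms is a list of (profile_id, weight) pairs with non-negative integer
    weights. The same subject and experiment always map to the same arm.
    """
    total = sum(weight for _, weight in arms)
    if total <= 0:
        raise ValueError("at least one arm needs a positive weight")
    digest = hashlib.sha256(f"{experiment}:{subject_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for profile_id, weight in arms:
        cumulative += weight
        if bucket < cumulative:
            return profile_id
    raise AssertionError("unreachable: bucket is always below the total weight")


# Hypothetical split: most traffic on the current profile, a slice on the candidate.
arms = [("profile_v17", 90), ("profile_v18", 10)]
chosen = assign_profile("tenant-42", "profile-v18-rollout", arms)
```

Recording chosen in the trace alongside the experiment name is what lets outcomes be grouped by profile afterward.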

9. From One Profile to a DAG

A single harness profile defines one executable unit.

Larger systems compose several units.

One profile may collect scope. Another may stage data. Another may generate candidates. Another may evaluate candidates. Another may write an artifact.

These units compose naturally into a finite DAG.

The composition has its own profile.

A composed DAG profile points to a set of harness profiles and their edges.

component versions
  -> harness profile
  -> composed DAG profile
  -> run
  -> trace
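
A composed DAG profile can be sketched as nodes that name harness profiles plus a list of edges. The dictionary layout, the node names, and the version ids below are assumptions for illustration. The acyclicity check uses Python's standard graphlib module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical composed DAG profile: node name -> pinned harness profile version.
EXAMPLE_DAG = {
    "id": "report_dag_v3",
    "nodes": {
        "collect_scope": "scope_profile_v4",
        "stage_data": "staging_profile_v2",
        "generate_candidates": "generator_profile_v9",
        "evaluate_candidates": "evaluator_profile_v5",
        "write_artifact": "writer_profile_v1",
    },
    "edges": [
        ("collect_scope", "stage_data"),
        ("stage_data", "generate_candidates"),
        ("generate_candidates", "evaluate_candidates"),
        ("evaluate_candidates", "write_artifact"),
    ],
}


def execution_order(dag_profile):
    """Return node names in a valid execution order.

    Raises ValueError for edges that name unknown nodes and
    graphlib.CycleError if the edges do not form a DAG.
    """
    nodes = dag_profile["nodes"]
    predecessors = {name: set() for name in nodes}
    for upstream, downstream in dag_profile["edges"]:
        if upstream not in nodes or downstream not in nodes:
            raise ValueError(f"edge references unknown node: {upstream} -> {downstream}")
        predecessors[downstream].add(upstream)
    return list(TopologicalSorter(predecessors).static_order())
```

Rejecting a cyclic definition at registration time keeps the finiteness guarantee a property of the profile itself rather than something the runtime has to hope for.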

10. Trace Shape for a DAG Run

A DAG run should produce one bounded trace.

Each node records the harness profile it used.

Low-level spans record model calls, tool calls, artifact writes, evaluator outputs, and other execution details.

Existing trace systems can store the spans.

The profile system supplies behavioral identity.

11. Finitude by DAG Composition

Long-running systems often look cyclic.

They plan, act, observe, evaluate, and plan again.

The trace can stay finite.

Keep each composed execution bounded. When a node would return to an earlier node, end the current DAG run and let the exit trigger a new DAG run.

This gives a clean invariant:

A DAG run is finite. A process may be cyclic.

Loops become transitions between DAG runs.
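
A minimal sketch of that invariant, assuming each DAG run returns its new state plus an exit signal. run_process, the "continue" signal, the trace id format, and the max_runs cap are all hypothetical choices made for the example.

```python
def run_process(run_dag_once, state, max_runs=10):
    """Drive a possibly cyclic process as a sequence of finite DAG runs.

    run_dag_once(state) must return (new_state, exit_signal). The loop lives
    here, between runs, so no single run ever revisits one of its own nodes.
    """
    runs = []
    for index in range(max_runs):
        state, exit_signal = run_dag_once(state)
        # Each iteration is its own bounded run with its own trace id.
        runs.append({"run_index": index, "trace_id": f"trace-{index}",
                     "exit": exit_signal})
        if exit_signal != "continue":
            break
    return state, runs


def example_dag_run(state):
    # Stand-in for one finite plan/act/observe/evaluate pass.
    attempts = state["attempts"] + 1
    return {"attempts": attempts}, ("done" if attempts >= 3 else "continue")


final_state, runs = run_process(example_dag_run, {"attempts": 0})
```

In this example the process takes three runs to finish, and each of the three traces is finite on its own. The cap on max_runs is one way to bound the process as well as the runs.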

12. Side Effects Start New DAGs

A terminal node may produce an artifact, promote a profile, schedule another run, or launch another composed DAG.

That handoff is a side effect.

It creates a new bounded execution.

inside a DAG:
  finite, traceable, bounded

between DAGs:
  process transitions, side effects, scheduling, continuation

A system can reuse composed DAG profiles the same way it reuses harness profiles.

harness profile:
  one executable unit

composed DAG profile:
  finite arrangement of executable units

process:
  sequence of DAG runs, possibly cyclic

13. Trace vs Lineage

Trace nesting should follow execution.

If one run creates or modifies another profile, the later run should have its own trace. The relationship belongs in lineage metadata.

Trace says what ran.

Lineage says where it came from.
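
A sketch of keeping the two apart, assuming each run carries its own trace id and an optional pointer to the run it was derived from. RunRecord, derived_from_run, and ancestry are hypothetical names, and the sketch assumes lineage pointers never form a cycle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    trace_id: str                           # trace: what ran, one per run
    profile_id: str
    derived_from_run: Optional[str] = None  # lineage: where it came from


def ancestry(run_id, records):
    """Walk lineage pointers back to the origin, nearest ancestor first."""
    chain = []
    current = records[run_id].derived_from_run
    while current is not None:
        chain.append(current)
        current = records[current].derived_from_run
    return chain


records = {
    "run-1": RunRecord("run-1", "trace-a", "tuner_profile_v2"),
    # run-1 produced profile_v18; the run that uses it gets a separate trace.
    "run-2": RunRecord("run-2", "trace-b", "profile_v18", derived_from_run="run-1"),
    "run-3": RunRecord("run-3", "trace-c", "profile_v18", derived_from_run="run-2"),
}
```

Here ancestry("run-3", records) returns the two earlier runs, while the three traces stay flat and separate.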

14. Promotion Lifecycle

Profiles need lifecycle labels.

A promotion gate decides whether a profile can advance.

A rollback moves the label back.

The deployed app remains unchanged.
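
A promotion gate can be as small as a threshold check in front of the label move. The sketch below assumes evaluator scores between 0 and 1 and a mean-score threshold. Both are illustrative choices, and promote_if_passing is a hypothetical name.

```python
def promote_if_passing(labels, label, candidate, scores, threshold):
    """Move label to candidate only when the gate passes.

    labels maps label -> profile version id. Returns (moved, previous) so the
    caller can log the decision and knows what a rollback would restore.
    """
    previous = labels.get(label)
    mean_score = sum(scores) / len(scores) if scores else 0.0
    if mean_score >= threshold:
        labels[label] = candidate
        return True, previous
    return False, previous


labels = {"production": "profile_v17", "staging": "profile_v18"}
moved, previous = promote_if_passing(
    labels, "production", "profile_v18", scores=[0.9, 0.95, 0.85], threshold=0.8
)
```

A rejected candidate leaves the labels dictionary untouched, and an accepted one changes only a pointer. The executor code is not involved in either case.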

15. Core Ontology

The system reduces to a small ontology.

The profile says what should run.

The run executes it.

The trace records what happened.

The artifact captures what got produced.

The gate judges whether it was acceptable.

The lineage records where it came from.

The promotion system decides which label moves.

Final Thesis

Prompt versioning decoupled prompt behavior from application deploys.

Harness profile versioning decouples composed behavior from application deploys.

As modern systems compose prompts, skills, tools, context policies, evaluators, guards, budgets, and loop strategies, the behavior-bearing unit moves above code.

That unit is the profile.

A harness profile defines one executable behavior unit.

A composed DAG profile arranges those units into a finite execution.

A cyclic process emerges by chaining finite DAG runs through explicit exits and side effects.

Compose behavior into profiles.
Resolve profiles at runtime.
Trace the resolved versions.
Gate promotion.
Roll back by label.
Keep each DAG run finite.
Model loops between runs.

blog.bios.dev
