Map a prompt's genome.
Paste a prompt. The mapper decomposes it into 14 functional loci, scores each, surfaces missing or weak genes, predicts likely failure modes, and proposes a mutation plan. Static heuristic — no live LLM, deterministic given the input.
v0.1 scope per the handoff: static analyzer only. No live LLM evaluation, no evolutionary search, no governance loop. See /theory for the framework and the scope-boundary paragraph.
Input
Genome score
0.00
14-loci map
| Locus | Status | Evidence | Risk if missing | Suggested mutation |
|---|---|---|---|---|
Mutation plan
Concrete edits the operator can apply. Re-analyze the mutated prompt and verify the score increases.
Tests to run
Compare two prompts.
Paste an ancestor and a descendant prompt. The diff computes a content-addressed mutation ledger over the 14-locus vector, classifies each change into one of six mutation kinds, projects a 7-axis phenotype shift, and flags regressions and new risks. Cosmetic rewording yields ΔG ≈ 0 — diffs live at the genotype layer, not the text layer.
ΔG (genomic distance)
0.00
ΔZ (predicted phenotype shift)
Each axis is the heuristic projection from locus-status changes onto the 7-axis phenotype model. Range [-2, +2]; zero means no predicted change.
Mutation ledger — genomic diff
Content-addressed (ledger_id = sha256(locus | kind | after_fragment)). v0.3 Evolution Loop will reference these ids in lineage edges.
| Locus | Kind | Before | After | Rationale | Ledger id |
|---|---|---|---|---|---|
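A minimal sketch of how a content-addressed ledger id of this shape could be computed. The source gives only the formula `sha256(locus | kind | after_fragment)`; the bare `|` separator, the UTF-8 encoding, the hex output, and the example locus and fragment values are assumptions made for illustration.

```python
import hashlib


def ledger_id(locus: str, kind: str, after_fragment: str) -> str:
    """Hash the three fields into a stable, content-addressed id."""
    # Assumption: fields are joined with a bare "|" and encoded as UTF-8.
    payload = "|".join((locus, kind, after_fragment))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical locus name and fragment, used only to show the behaviour.
first = ledger_id("output_format", "addition", "Respond in JSON.")
again = ledger_id("output_format", "addition", "Respond in JSON.")
other = ledger_id("output_format", "substitution", "Respond in JSON.")

assert first == again  # same content, same id, on every run
assert first != other  # changing the mutation kind changes the id
```

Because the id depends only on the three fields, two runs that produce the same mutation yield the same id, which is what lets lineage edges reference entries by id.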
Regressions & new risks
Next single highest-leverage mutation
Simulate evolution.
Run the same five core engine moves the biological simulator runs (truncation, balanced, drift, OCS-like, introgression) over the prompt-organism substrate. Population of 14-locus PromptOrganisms, deterministic placeholder judge (sum of weighted locus statuses — not a real LLM grader), per-strategy mean-fitness trajectory and Pareto front.
v0.7.33 ships the substrate abstraction as a working contract: the engine kernel does not import promptbio; promptbio implements the engine.Individual / FitnessFunc / MutationOp / RecombinationOp interfaces. A real LLM-judge integration is queued (post-v0.8). Use the v0.1 Mapper above to inspect the ancestor's locus profile first.
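The placeholder judge is described above as a sum of weighted locus statuses. The sketch below shows one way such a judge could look. The status names, their numeric values, the default weight, and the locus names are all hypothetical; the source does not specify them.

```python
from typing import Mapping

# Hypothetical status scale: the source only says statuses are weighted and summed.
STATUS_VALUE = {"missing": 0.0, "weak": 0.5, "present": 1.0}


def placeholder_judge(statuses: Mapping[str, str],
                      weights: Mapping[str, float]) -> float:
    """Deterministic fitness: sum over loci of weight * status value."""
    # Assumption: a locus without an explicit weight counts with weight 1.0.
    return sum(weights.get(locus, 1.0) * STATUS_VALUE[status]
               for locus, status in statuses.items())


# Two illustrative loci; a real organism would carry all 14.
organism = {"role": "present", "constraints": "weak"}
weights = {"role": 2.0, "constraints": 1.0}
fitness = placeholder_judge(organism, weights)  # 2.0 * 1.0 + 1.0 * 0.5
```

A judge of this kind is reproducible by construction, which suits trajectory comparisons, but it only measures which loci are present and cannot tell whether the prompt performs well.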
Best risk-adjusted strategy
Per-strategy outcome
Final gain = mean Δfitness over replicates; risk = stdev of replicate end-state fitness. Substrate label is promptbio for every row (biology runs would tag biology).
| Code | Name | Final gain | Final risk | Pareto? | Summary |
|---|---|---|---|---|---|
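The gain and risk definitions above, together with the Pareto column, can be sketched as follows. This is an illustration under assumptions: sample standard deviation is used for risk (population standard deviation is an equally plausible reading), Pareto dominance is taken as "at least as much gain and no more risk, strictly better on one", and the strategy codes are hypothetical.

```python
from statistics import mean, stdev


def summarize(start_fitness, end_fitness):
    """Return (gain, risk) for one strategy across its replicates."""
    gain = mean(e - s for s, e in zip(start_fitness, end_fitness))
    risk = stdev(end_fitness)  # assumption: sample stdev of end-state fitness
    return gain, risk


def pareto_front(outcomes):
    """Codes of strategies not dominated on (higher gain, lower risk)."""
    front = []
    for code, (gain, risk) in outcomes.items():
        dominated = any(
            g >= gain and r <= risk and (g > gain or r < risk)
            for other, (g, r) in outcomes.items()
            if other != code
        )
        if not dominated:
            front.append(code)
    return front


# Hypothetical strategy codes and outcomes, for illustration only.
outcomes = {"A": (2.0, 1.0), "B": (1.0, 2.0), "C": (3.0, 3.0)}
front = pareto_front(outcomes)  # "B" is dominated by "A"
```

Keeping risk as a separate axis rather than folding it into a single score is what allows a high-gain, high-variance strategy to sit on the front alongside a steadier one.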
Trajectories (mean fitness per generation)
Honesty layer
What could be wrong:
Evolve a prompt family.
From a single ancestor prompt, grow a population of variants, score them across 3 canonical niches (Core breadth, Epistemic depth, Safety-first), select per-niche specialists + global top-K, iterate N generations, and emit a lineage tree whose edges carry content-addressed v0.2 mutation ledger entries.
v0.3 Evolution Loop: v0.7.34 ships the deterministic core: a placeholder judge, a 4-kind mutation taxonomy (addition / deletion / substitution / amplification), and a synchronous endpoint. Real LLM-judge integration, the 15-operator v5.7 catalogue, and the async /start + /status pattern are queued behind v0.4 Ecology.
Run summary
Niche specialists (final generation)
Each niche is a different per-locus weight profile. A specialist in one niche may not be globally best — that's exactly the point.
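A small sketch of per-niche selection, assuming each niche is a per-locus weight profile as stated above. The locus names, weights, numeric status values, and the choice of "mean fitness across niches" as the global score are assumptions for illustration, not the tool's actual rule.

```python
def fitness(statuses, weights):
    """Weighted sum of numeric locus statuses under one niche profile."""
    return sum(weights.get(locus, 0.0) * value
               for locus, value in statuses.items())


def select(population, niches, top_k):
    """Pick one specialist per niche plus the global top-K variants."""
    specialists = {
        niche: max(population, key=lambda v: fitness(population[v], profile))
        for niche, profile in niches.items()
    }

    def global_score(variant):
        # Assumption: global rank uses the mean fitness across all niches.
        scores = [fitness(population[variant], p) for p in niches.values()]
        return sum(scores) / len(scores)

    top = sorted(population, key=global_score, reverse=True)[:top_k]
    # De-duplicate while preserving order: specialists first, then top-K.
    survivors = list(dict.fromkeys(list(specialists.values()) + top))
    return specialists, survivors


# Hypothetical two-locus variants and two of the three niches.
population = {
    "v1": {"safety": 1.0, "evidence": 0.0},
    "v2": {"safety": 0.0, "evidence": 1.0},
    "v3": {"safety": 0.6, "evidence": 0.6},
}
niches = {
    "Safety-first": {"safety": 1.0, "evidence": 0.1},
    "Epistemic depth": {"safety": 0.1, "evidence": 1.0},
}
specialists, survivors = select(population, niches, top_k=1)
```

In this toy population `v1` and `v2` each win a niche while the generalist `v3` has the best global score, so all three survive: a specialist need not be globally best.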
Generation changelog
| Gen | Mean fitness | Best | Best variant | Pareto front |
|---|---|---|---|---|
Lineage edges (sample)
Each row is one parent → child mutation. Ledger id is content-addressed (sha256(locus | kind | after_fragment)) and stable across runs.
| Parent | Child | Locus | Kind | Ledger id |
|---|---|---|---|---|
Honesty layer
Triage the raw context. Get a belief state.
Y = Generate(P, B_t, C_t). Every output must come from a managed belief state, not from raw context. This module classifies incoming statements into the 12-type claim ontology, ranks them on the 10-tier source hierarchy, tags 5-axis confidence, and renders the four-pane inspector (Known facts · Working assumptions · Unknowns · Contradictions).
v0.7.35 ships the full plan endpoint (16 sections), the runtime gate (10 checks + 9 anti-pattern detectors), and the belief-update protocol with deprecate + propagate. The Decision Theory layer (v2.8) consumes the belief state.
Epistemic diagnosis & score
EpistemicScore: —
Belief state — 4-pane inspector
Source hierarchy: current_user_correction > current_user_fact > verified_tool_result > authoritative_document > confirmed_memory > older_memory > unverified_user_claim > retrieved_snippet > model_inference > assumption.
Known facts
Working assumptions
Unknowns
Contradictions
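The source hierarchy listed for the inspector can be read as a strict ranking used to settle contradictions. The sketch below encodes the ten tiers in the order given and picks the claim from the highest-ranked source. The claim dictionary shape and the `resolve` helper are hypothetical.

```python
# Tiers in the order given by the source hierarchy, strongest first.
SOURCE_HIERARCHY = [
    "current_user_correction",
    "current_user_fact",
    "verified_tool_result",
    "authoritative_document",
    "confirmed_memory",
    "older_memory",
    "unverified_user_claim",
    "retrieved_snippet",
    "model_inference",
    "assumption",
]
RANK = {tier: position for position, tier in enumerate(SOURCE_HIERARCHY)}


def resolve(claims):
    """Return the conflicting claim whose source tier ranks highest."""
    # Hypothetical claim shape: {"content": str, "source_tier": str}.
    return min(claims, key=lambda claim: RANK[claim["source_tier"]])


conflict = [
    {"content": "deadline is Friday", "source_tier": "older_memory"},
    {"content": "deadline is Monday", "source_tier": "current_user_correction"},
]
winner = resolve(conflict)  # the current user correction outranks older memory
```

A fuller implementation would presumably also weigh the 5-axis confidence tags; this sketch ranks on source tier alone.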
Classified claims
| ID | Type | Source tier | Conf (src/intp/inf/act/fresh) | State | Content |
|---|---|---|---|---|---|
Runtime gate result
10 checks
Anti-pattern hits
Epistemic status block
Reference blocks (drop-in)
EPISTEMIC PROTOCOL (9 clauses)
PSL epistemology: block
Prove the new prompt is actually better.
Run the candidate through 10 test types × 9 environments × 8 specialised judges. The score matrix tells you where it wins; the reaction norm tells you where it breaks; the ablation tells you which modules earn their place; the regression report tells you what improved and what got worse; the decision engine emits one of accept / accept_as_specialist / reject / split_into_profiles / mutate_again.
v0.7.36 ships a deterministic genome-map-driven judging stack so the canonical P₀ → P₁.₁ → P₁.₂ ladder is reproducible. Real LLM-as-judge integration lands with v1.4 Runtime. F_net = F_quality − λ·Cost.
Deployment decision
verdict
F_quality: F_net: (F_net = F_quality − λ·Cost)
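As a worked illustration of the formula F_net = F_quality − λ·Cost, the sketch below compares two candidates. The quality, cost, and λ values are made up, and the unit of cost is left abstract because the source does not define it.

```python
def f_net(f_quality: float, cost: float, lam: float) -> float:
    """Net fitness: quality minus lambda-weighted cost."""
    return f_quality - lam * cost


# Illustrative numbers only: a short prompt and a longer, costlier variant.
lam = 0.1
short_prompt = f_net(f_quality=0.80, cost=1.0, lam=lam)  # 0.80 - 0.10
long_prompt = f_net(f_quality=0.85, cost=3.0, lam=lam)   # 0.85 - 0.30

# The longer prompt scores higher on quality but lower on net fitness.
assert long_prompt < short_prompt
```

The choice of λ sets how much quality a candidate must gain to justify extra cost; with λ = 0 the comparison reduces to F_quality alone.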
Score matrix (mean per environment)
Geometric mean of the 8 judges across the seeded test bank, per environment. The Critical? column flags missed injection, drift, or constraint_leakage tests.
| Env | Mean score | Pass rate | Critical? |
|---|---|---|---|
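A sketch of the per-environment aggregation, under stated assumptions: each test yields one score per judge in [0, 1], the judges are combined by geometric mean per test, and tests are then averaged arithmetically within an environment. The source names only the geometric mean; the second step and the zero-handling rule are assumptions.

```python
import math


def geometric_mean(scores):
    """Geometric mean of non-negative scores; any zero collapses it to 0."""
    if any(score <= 0 for score in scores):
        return 0.0  # assumption: a zero from any judge zeroes the test
    return math.exp(sum(math.log(score) for score in scores) / len(scores))


def environment_score(tests):
    """Mean over tests of the geometric mean across that test's judges."""
    per_test = [geometric_mean(judges) for judges in tests]
    return sum(per_test) / len(per_test)


# Two illustrative tests, each scored by 8 judges.
tests = [[0.5] * 8, [1.0] * 8]
score = environment_score(tests)
```

Unlike an arithmetic mean, the geometric mean is pulled down sharply by a single low judge, so a candidate cannot offset a failing trait with high scores elsewhere.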
Reaction norm (per-trait stability across environments)
Stable: variance < 0.04. Floating: variance ≥ 0.04. Niche fit:
| Trait (judge) | Mean | Variance | Worst env | Worst score |
|---|---|---|---|---|
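The stable / floating rule above can be sketched as a variance check over one trait's per-environment scores. The 0.04 threshold comes from the text; the use of population variance and the environment names are assumptions.

```python
from statistics import mean, pvariance

STABILITY_THRESHOLD = 0.04  # from the text: stable if variance < 0.04


def reaction_norm(scores_by_env):
    """Summarize one trait's scores across environments."""
    values = list(scores_by_env.values())
    variance = pvariance(values)  # assumption: population variance
    worst_env = min(scores_by_env, key=scores_by_env.get)
    return {
        "mean": mean(values),
        "variance": variance,
        "label": "stable" if variance < STABILITY_THRESHOLD else "floating",
        "worst_env": worst_env,
        "worst_score": scores_by_env[worst_env],
    }


# Hypothetical environment names and scores for a single judge.
row = reaction_norm({"clean": 0.9, "noisy": 0.8, "adversarial": 0.2})
```

The returned keys mirror the table columns, with the worst environment showing where a floating trait breaks.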
Regression report (P_old → P_new)
Verdict:
Improved
Degraded
New risks
Ablation plan
Knock out each module; if removing it costs > 0.05 in mean score, keep it.
| Module | Expected loss | Actual loss | Keep? |
|---|---|---|---|
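The keep rule above (keep a module if removing it costs more than 0.05 in mean score) can be sketched as below. The module names and scores are hypothetical, and the sketch covers only the "Actual loss" and "Keep?" columns.

```python
KEEP_THRESHOLD = 0.05  # from the text: keep if removal costs > 0.05


def ablation_report(baseline_score, ablated_scores):
    """For each knocked-out module, report the loss and the keep decision."""
    rows = []
    for module, score_without in ablated_scores.items():
        loss = baseline_score - score_without
        rows.append({
            "module": module,
            "actual_loss": loss,
            "keep": loss > KEEP_THRESHOLD,
        })
    return rows


# Hypothetical modules: mean score with everything in place is 0.80.
report = ablation_report(0.80, {"role": 0.60, "examples": 0.79})
```

Here removing `role` costs about 0.20 and the module is kept, while removing `examples` costs about 0.01, which marks it as a candidate for deletion.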
Next mutations
Design the whole organism. One spec, sixteen organs.
v3.0 is the architecture spine: 16 organs (Genome / Constitutional / Context / Epistemic / Metabolic / Immune / Decision / Planning / Runtime / Memory / Tool / Observability / Evaluation / Governance / Reproductive / Autopoietic), 7 information flows, 6 control loops, 10 architectural principles. Pick the inputs; get a prompt_organism spec, the anatomy diagram, the §14 8-step minimum-viable template, the §23 16-step production path, the top risks, and a validation report against all 10 principles.
Agent = PromptOrganism + ActionLoop + Tools + Permissions. The validator classifies your spec as workflow / organism / agent / agent_incomplete / model_reference_only.
Validation verdict
verdict size
Anatomy diagram
4-column anatomy view
Identity
Genome & Runtime
Belief & Decision
Observability & Evolution
Top risks
| Failure mode | Repair |
|---|---|
§14 Minimum Viable Organism (8 steps)
§23 Full Production Path (16 steps)
Right pattern before right prompt.
v3.1 is zoology over the v3.0 anatomy: 12 canonical pattern cards (Explainer Cell · Research Synthesizer · Strategy Advisor · Strategy Critic · Code Repair · Document Review · Data Analysis · Eval Judge · Memory-Aware Companion · Agentic Tool-Using · High-Assurance Advisor · Autopoietic Ecosystem), 34-row selection matrix, 8 composition rules, 10 anti-patterns. Pick inputs → get pattern + size + agency + required organs + minimal prompt skeleton + required tests.
Recommended pattern
Required organs:
Anti-pattern risks:
Next action: