How We Evaluate Biological Computing Primitives
Every Mutuus primitive goes through a rigorous gauntlet before we claim it deserves a place alongside conventional approaches. Here's the process -- for both organics and metabolics -- and why we publish our failures too.
We don't just name things after biology and call it a day. Every primitive in the Mutuus project passes a structured evaluation before we make any claims for it. Organics get 7 phases. Metabolics get 5. Both have binary gates.
Bio-inspired computing has a credibility problem. Too many projects stop at the metaphor: "it's like a neural network," "it's inspired by ant colonies." That's a naming exercise, not engineering. For biological primitives to earn a place alongside proven conventional approaches, they need the same rigor: formal interfaces, complexity bounds, reproducible benchmarks.
That's what Mutuus provides. The biology suggests the design. The mathematics validates it.
Evaluating Organics: The 7-Phase Gauntlet
Seven phases, each with a gate question. Fail a gate, and the organic either gets redesigned or abandoned. No exceptions.
The process starts with a nature analog and ends with publication. Each phase builds on the last:
- Nature Analog Identification -- Find the biological phenomenon and map its properties to computation. The mapping must be genuine, not decorative.
- Formal Specification -- Write a complete spec: data model, operations, complexity, invariants. If you can't spec it, you can't build it.
- Complexity Analysis -- Analyze every operation theoretically. Compare against the inorganic counterpart. The organic must win somewhere.
- Implementation -- Build it in Rust with full test coverage. The code must match the spec.
- Benchmarking -- Run it against the inorganic across multiple workloads and sizes. The numbers must back up the theory.
- WASM + TypeScript Integration -- Make it available in the browser. The overhead must be acceptable.
- Publication -- Ship the documentation, benchmarks, and analysis.
Every phase ends with a binary gate question. Phase 1 asks: "Does this nature analog provide real computational insight, or is it just a naming exercise?" Phase 3 asks: "Does the organic theoretically outperform the inorganic in at least one operation class?" Phase 5 asks: "Does the organic meet or exceed the inorganic on the workloads where it claims superiority?"
These gates exist because it's easy to convince yourself a design is working when you're deep in the implementation. The gates force an honest answer at each stage.
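The phase-and-gate structure above can be sketched in code. This is a hypothetical illustration, not the Mutuus API: the names `Phase`, `Evidence`, `GateOutcome`, and `run_gauntlet` are invented here, and only two of the seven gates are shown.

```rust
// Illustrative sketch: each phase pairs its work with a binary gate
// question, and failing any gate stops the run immediately.

#[derive(Debug, PartialEq)]
pub enum GateOutcome {
    Passed,
    Failed { phase: &'static str },
}

// Evidence accumulated during evaluation; fields are invented examples.
#[derive(Default)]
pub struct Evidence {
    pub analog_is_computational: bool,
    pub beats_inorganic_somewhere: bool,
}

pub struct Phase {
    pub name: &'static str,
    pub gate_question: &'static str,
    // The gate is binary: the evidence either answers "yes" or it doesn't.
    pub gate: fn(&Evidence) -> bool,
}

pub fn run_gauntlet(phases: &[Phase], evidence: &Evidence) -> GateOutcome {
    for phase in phases {
        if !(phase.gate)(evidence) {
            // Fail a gate: the organic gets redesigned or abandoned.
            return GateOutcome::Failed { phase: phase.name };
        }
    }
    GateOutcome::Passed
}

fn main() {
    let phases = [
        Phase {
            name: "Nature Analog Identification",
            gate_question: "Real computational insight, or a naming exercise?",
            gate: |e| e.analog_is_computational,
        },
        Phase {
            name: "Complexity Analysis",
            gate_question: "Does the organic win in at least one operation class?",
            gate: |e| e.beats_inorganic_somewhere,
        },
    ];
    let evidence = Evidence {
        analog_is_computational: true,
        beats_inorganic_somewhere: false,
    };
    // This candidate clears gate 1 but fails gate 3's question.
    println!("{:?}", run_gauntlet(&phases, &evidence));
}
```

The point of the sketch is the return type: a gate yields pass or a named failure, never a "mostly passed."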
Evaluating Metabolics: The 5-Phase Process
Metabolics aren't data structures -- they're operational strategies. Different primitive, different evaluation.
Metabolics are operational strategies drawn from biology: how systems manage energy, resources, attention, and lifecycle. A metabolic like Unihemispheric Sleep doesn't compete with a data structure -- it offers an alternative to static autoscaling policies. The evaluation reflects this:
- Dormancy/Energy Survey -- Catalog biological strategies for the specific resource constraint. We require convergent analogs from at least two different phyla -- if evolution arrived at this strategy independently multiple times, it's probably solving a real problem.
- Resource Model Extraction -- For each analog, extract the trigger condition, depth spectrum (binary or graduated?), wake latency, maintenance cost, and recovery debt.
- Hardware/Infrastructure Filter -- Can this be implemented with standard OS primitives, container orchestration, or application-level state? We eliminate anything requiring kernel modifications or custom hardware.
- Policy Composition -- Define the default behavior, tuning surface (fewer knobs is better), interaction with organics, and failure modes. The critical question: when this strategy fails, does the system degrade to "always awake" (safe) or "stuck asleep" (dangerous)?
- System Validation -- Map to real systems that already implement ad-hoc versions of this strategy. What does the metabolic simplify? What are the measurable resource savings?
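The failure-mode question from Policy Composition can be made concrete in code. This is a minimal sketch under invented names (`SleepPolicy`, `WakeState`, `decide`); it is not a real Mutuus metabolic, only an illustration of degrading to "always awake" rather than "stuck asleep."

```rust
// Illustrative sketch of a dormancy policy with a safe failure mode.

#[derive(Debug, PartialEq, Clone, Copy)]
pub enum WakeState {
    Awake,
    // Graduated depth spectrum rather than a binary asleep/awake.
    Dormant { depth: u8 },
}

pub struct SleepPolicy {
    // Trigger condition extracted from the biological analog.
    pub trigger_idle_secs: u64,
}

impl SleepPolicy {
    /// Decide the next state. Any error in the policy's own inputs
    /// (e.g. metrics unavailable) falls back to Awake, so the system
    /// degrades to "always awake", never "stuck asleep".
    pub fn decide(&self, idle_secs: Result<u64, &'static str>) -> WakeState {
        match idle_secs {
            Ok(t) if t >= self.trigger_idle_secs => WakeState::Dormant { depth: 1 },
            Ok(_) => WakeState::Awake,
            Err(_) => WakeState::Awake, // the safe failure mode
        }
    }
}

fn main() {
    let policy = SleepPolicy { trigger_idle_secs: 300 };
    assert_eq!(policy.decide(Ok(400)), WakeState::Dormant { depth: 1 });
    // Broken input: the dangerous choice would be staying Dormant.
    assert_eq!(policy.decide(Err("metrics unavailable")), WakeState::Awake);
    println!("policy degrades to always-awake on error");
}
```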
The key difference: organics are benchmarked against a specific inorganic counterpart (Nacre Array vs. Vec, Diatom Bitmap vs. roaring). Metabolics are validated against ad-hoc implementations of the same strategy. Kubernetes HPA is a crude circadian rhythm. Kafka's log.retention.ms is crude hibernation. The metabolic evaluation asks: does formalizing the biological strategy produce a meaningfully better version of what systems already do informally?
What We Measure
Classical analysis tells you O(n). We track dimensions that capture properties conventional analysis never measures.
The first dimension, O(e), is standard algorithmic complexity -- time and space for every operation. But Mutuus primitives do things conventional approaches don't, so we measure those too.
O(a) captures adaptiveness: how well does the primitive respond when workloads shift? A structure that's fast on Monday's access patterns but slow on Tuesday's has poor adaptiveness.
O(r) captures resilience: what happens at the margins? When you push past design parameters, does it fail abruptly or degrade smoothly?
O(τ) captures thermal cost: adaptation isn't free. Self-organizing behavior consumes CPU cycles and memory. We measure that overhead and decide whether the trade-off is justified.
The τ dimension is particularly important because it's where bio-inspired designs often hide their costs. A cache that "self-tunes" sounds elegant, but if the tuning overhead exceeds the benefit, you're better off with a static LRU. By making thermal cost an explicit dimension, we force ourselves to account for the full cost of nature-inspired behavior.
For metabolics, we add two more dimensions from the Mutuus complexity framework: O(h) for entropy maintenance (the cost to prevent degradation -- homeostatic energy) and O(ε) for emergence time (how long the strategy takes to self-organize and reach steady state).
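The O(τ) accounting can be reduced to a simple ledger: adaptive behavior is justified only when the time it saves exceeds the time it spends adapting. The sketch below uses invented names (`ThermalLedger`, `net_nanos`, `justified`) to show the bookkeeping, not any actual Mutuus measurement code.

```rust
// Illustrative sketch: explicit accounting for thermal cost (O(tau)).

pub struct ThermalLedger {
    // Time spent self-organizing -- the thermal cost itself.
    pub adaptation_nanos: u128,
    // Time saved versus the static baseline (e.g. a plain LRU).
    pub savings_nanos: u128,
}

impl ThermalLedger {
    /// Net benefit of the adaptive behavior. Negative means the static
    /// structure would have been the better choice.
    pub fn net_nanos(&self) -> i128 {
        self.savings_nanos as i128 - self.adaptation_nanos as i128
    }

    pub fn justified(&self) -> bool {
        self.net_nanos() > 0
    }
}

fn main() {
    // A "self-tuning" cache whose tuning overhead exceeds its benefit:
    // the trade-off is not justified.
    let ledger = ThermalLedger { adaptation_nanos: 5_000, savings_nanos: 3_000 };
    assert!(!ledger.justified());
    println!("net benefit = {} ns", ledger.net_nanos());
}
```

Making the subtraction explicit is the whole point: a self-tuning structure that never reports its tuning cost can always look elegant.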
Honest Benchmarks
We publish everything -- wins, losses, and the regressions we found along the way. If a benchmark shows the primitive is slower, that result ships too.
Our benchmark provenance operates at two levels. Every benchmark run lives in the git history alongside Criterion HTML reports -- anyone can clone the repo and reproduce our results. Separately, we capture key milestones in analysis documents: the breakthroughs, the dead ends, and the "wait, why is this 102x slower than expected?" moments that drive real understanding.
All benchmark code is open source. The methodology is transparent. The commit history tells the complete story, including the parts where we got it wrong before we got it right.
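The head-to-head shape of the methodology can be shown with a std-only timing harness. The real suite uses Criterion; this sketch substitutes `std::time::Instant` and a stand-in workload (a preallocated `Vec` standing in for an organic, which it is not) purely to show that both numbers get reported, win or lose.

```rust
// Illustrative std-only harness: time two implementations on the same
// workload and report both results regardless of which one wins.

use std::time::Instant;

fn time_nanos(mut f: impl FnMut()) -> u128 {
    let start = Instant::now();
    f();
    start.elapsed().as_nanos()
}

fn main() {
    let n = 100_000u64;

    // "Inorganic" baseline: a plain Vec push loop.
    let inorganic = time_nanos(|| {
        let mut v = Vec::new();
        for i in 0..n {
            v.push(i);
        }
        assert_eq!(v.len(), n as usize);
    });

    // Stand-in for the organic side (NOT a real Mutuus primitive):
    // the same loop with preallocation, just to give the harness two arms.
    let organic_stand_in = time_nanos(|| {
        let mut v = Vec::with_capacity(n as usize);
        for i in 0..n {
            v.push(i);
        }
        assert_eq!(v.len(), n as usize);
    });

    // Both numbers ship, even when the comparison is unflattering.
    println!("inorganic: {inorganic} ns, organic stand-in: {organic_stand_in} ns");
}
```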
What's Next
The first organics -- Nacre Array and Diatom Bitmap -- are moving through the pipeline now. The first standalone metabolic -- Unihemispheric Sleep -- is being formalized for production use.
Each primitive follows its own publication arc. Some lead with benchmarks, some lead with the nature story, some lead with applied case studies. The variety is intentional -- these are different primitives solving different problems, and they deserve different introductions.
For the full framework, including all evaluation phases, the complexity dimensions, and our benchmark methodology, visit the Approach page. If you want to see the results, head to the Library or browse the Research page.