Nacre Array Benchmarks: The Numbers
We promised the full benchmark numbers. Here they are: every operation, every scale, wins and losses. Vec is still faster at some things. Nacre Array is faster at others. The crossover points tell the real story.
In Why Nature Builds Better Arrays, we described the limitations of flat sequential storage. In Nacre Array: What If Your Vec Had a Spine?, we introduced the design and showed where segmented architecture should win. This post is the receipt. Every number, every scale, every operation where Vec still wins.
Setup
All numbers in this post come from the corrected benchmark harness run on 2026-03-14 using Criterion.rs on Windows 11 with Rust 1.92. The corrections matter: random access now measures lookup cost instead of lookup plus RNG overhead, and split now compares against Vec::split_off instead of a clone-both-halves baseline. The sizes tested are 1K, 10K, and 100K elements, with push also tested at 1M.
The baseline is Vec<T> from Rust's standard library. This is not a strawman. Vec is one of the most optimized data structures in any language runtime, backed by decades of allocator tuning, LLVM vectorization, and hardware prefetch alignment. Beating it at anything is non-trivial.
Nacre Array uses the current tuned default configuration: 4096-element segments, LZ4 compression, and segment locality cache enabled unless a cache-off variant is shown. For push, we show two Vec baselines: default growth and preallocated capacity. That distinction still matters.
Where Nacre Wins
Three operations where segmented architecture dominates: insert at scale, split at fracture planes, and metadata-only scanning. The margins are large enough to justify the overhead for workloads that hit these paths.
Insert at Middle
Vec shifts n - k elements on every mid-array insert. Nacre Array shifts within one segment, then updates cumulative suffix metadata across later segments. With the current 4096-element default, Nacre still loses at 1K and 10K, but the structural crossover at scale is decisive. At 100K elements, Nacre is 3.7x faster. Note: elements are 12 bytes with the SlimElementHeader (4-byte header + 8-byte u64), compared to 48 bytes under the previous ElementHeader. Smaller elements also make Vec shifts cheaper, reducing the factor from 8.3x to 3.7x; the structural advantage is unchanged.
| Size | NacreArray | Vec | Ratio |
|---|---|---|---|
| 1K | 786 ns | 733 ns | Near parity |
| 10K | 4,604 ns | 3,573 ns | Vec 1.3x faster |
| 100K | 19,723 ns | 72,047 ns | Nacre 3.7x faster |
The crossover now happens later but more decisively. At small and medium sizes, the segmented structure is still overhead. By 100K, the O(n) suffix shift in Vec dominates and Nacre's local movement plus suffix maintenance wins convincingly.
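The mechanics above can be sketched in a few lines. This is a hypothetical simplification, not the Nacre Array source: segments are plain `Vec<Vec<T>>` and the cumulative metadata is a prefix-sum table, with names (`SegArray`, `cum`) invented for illustration. It shows why a mid-array insert is a local shift plus a metadata walk rather than an O(n) suffix shift.

```rust
// Hypothetical sketch (not the Nacre source): segmented insert shifts within
// one segment, then bumps cumulative suffix metadata for later segments.
struct SegArray<T> {
    segs: Vec<Vec<T>>, // segments; fixed-capacity handling omitted
    cum: Vec<usize>,   // cum[i] = total element count of segs[0..=i]
}

impl<T> SegArray<T> {
    fn from_segments(segs: Vec<Vec<T>>) -> Self {
        let mut cum = Vec::with_capacity(segs.len());
        let mut total = 0;
        for s in &segs {
            total += s.len();
            cum.push(total);
        }
        SegArray { segs, cum }
    }

    /// Insert before `index` (index < total length in this sketch).
    fn insert(&mut self, index: usize, value: T) {
        // O(log S): binary search the prefix sums for the owning segment.
        let seg = self.cum.partition_point(|&c| c <= index);
        let base = if seg == 0 { 0 } else { self.cum[seg - 1] };
        // O(s): shift only within this segment, not the whole suffix.
        self.segs[seg].insert(index - base, value);
        // O(S): every later prefix sum grows by exactly one element.
        for c in &mut self.cum[seg..] {
            *c += 1;
        }
    }

    fn to_vec(self) -> Vec<T> {
        self.segs.into_iter().flatten().collect()
    }
}

fn main() {
    let mut a = SegArray::from_segments(vec![vec![0, 1, 2, 3], vec![4, 5, 6, 7]]);
    a.insert(3, 99); // shifts one element in segment 0, bumps two counters
    assert_eq!(a.to_vec(), vec![0, 1, 2, 99, 3, 4, 5, 6, 7]);
}
```

The O(s + S) row in the complexity table falls directly out of the two loops here: a local shift bounded by segment capacity, plus one counter increment per later segment.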
Split at Midpoint
This is the clearest structural win on the tuned branch. The old comparison used clone+truncate for Vec; the proper contiguous-array baseline is Vec::split_off. Against that baseline, Nacre now measures 12.1 microseconds (12,105 ns) versus 90.9 microseconds (90,944 ns). That is a real 7.5x win, not a benchmark artifact. The factor decreased from 23.6x because smaller elements (12 bytes vs 48 bytes) also make Vec::split_off cheaper; the memcpy cost that Nacre avoids is proportional to element size.
| Size | NacreArray (fracture) | Vec (split_off) | Ratio |
|---|---|---|---|
| 100K | 12,105 ns | 90,944 ns | Nacre 7.5x faster |
Segment Scan
This is the real metadata win. scan_segments evaluates about 25 segment summaries in 241 ns. A comparable Vec predicate scan over 100K elements takes about 324,578 ns. That is roughly a 1,350x difference, and it is not a micro-optimization of the same operation. Segment metadata lets you ask which regions are interesting before you touch payloads.
| Operation | Time (100K) | Elements Touched |
|---|---|---|
| scan_segments | 241 ns | 0 payloads |
| scan_headers_collect | 305,952 ns | 100K element-associated headers |
| Vec iter_filter_collect | 324,578 ns | 100K elements |
scan_headers lands near parity with the equivalent Vec collect because the current implementation still traverses live elements to inspect their headers. The real structural win is scan_segments: summary-level filtering at segment granularity.
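A minimal sketch of the summary-level idea, under stated assumptions: each segment keeps a (min, max) summary, and a range query rules whole segments in or out without reading any payload. The names (`Segment`, `scan_segments`) and the min/max summary are illustrative choices, not the Nacre metadata format.

```rust
// Hypothetical sketch: per-segment (min, max) summaries let a range query
// skip whole segments without touching element payloads.
struct Segment {
    data: Vec<u64>,
    min: u64,
    max: u64,
}

impl Segment {
    fn new(data: Vec<u64>) -> Self {
        let min = data.iter().copied().min().unwrap_or(u64::MAX);
        let max = data.iter().copied().max().unwrap_or(0);
        Segment { data, min, max }
    }
}

/// Return indices of segments whose summary overlaps [lo, hi]:
/// O(segments) comparisons, zero payload reads.
fn scan_segments(segs: &[Segment], lo: u64, hi: u64) -> Vec<usize> {
    segs.iter()
        .enumerate()
        .filter(|(_, s)| s.max >= lo && s.min <= hi)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let segs = vec![
        Segment::new((0..10).collect()),
        Segment::new((100..110).collect()),
        Segment::new((200..210).collect()),
    ];
    // Only segment 1 can possibly contain values in [100, 150].
    assert_eq!(scan_segments(&segs, 100, 150), vec![1]);
}
```

The 241 ns figure in the table is this kind of loop over roughly 25 summaries; the 324,578 ns figure is the cost of visiting all 100K payloads instead.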
Where Vec Wins
Radical honesty time. Vec is faster at random access, iteration, and small-scale pushes. These are real costs, not edge cases.
Random Access
Vec's get is a single pointer offset. Nacre's get resolves the containing segment, then dereferences into that segment. Once the harness stopped timing RNG work, the gap turned out to be much larger than the old draft claimed. For pure random access, disable the cache: it only adds overhead.
| Size | Nacre (cache) | Nacre (no-cache) | Vec | cache/Vec | no-cache/Vec |
|---|---|---|---|---|---|
| 1K | 2.14 ns | 2.72 ns | 0.76 ns | 2.8x | 3.6x |
| 10K | 3.52 ns | 3.82 ns | 0.73 ns | 4.8x | 5.2x |
| 100K | 6.73 ns | 5.96 ns | 0.75 ns | 9.0x | 7.9x |
The asymptotic story is still correct: segmented lookup is structurally slower than contiguous pointer arithmetic. What changed is the measured constant factor. With the benchmark now isolating lookup cost cleanly, random access is not a mild tax. It is a major tradeoff.
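The structural cost is visible in a sketch of the lookup path. This is a hypothetical simplification (invented `SegArray` and `cum` names, no locality cache): a binary search over cumulative lengths, then a second indirection into the owning segment, versus `Vec::get`'s single pointer offset.

```rust
// Hypothetical sketch of a segmented get(): O(log S) binary search over
// cumulative lengths, then one more indirection into the owning segment.
// Vec::get is a single pointer offset, hence the constant-factor gap.
struct SegArray<T> {
    segs: Vec<Vec<T>>,
    cum: Vec<usize>, // cum[i] = total element count of segs[0..=i]
}

impl<T> SegArray<T> {
    fn from_segments(segs: Vec<Vec<T>>) -> Self {
        let mut cum = Vec::new();
        let mut total = 0;
        for s in &segs {
            total += s.len();
            cum.push(total);
        }
        SegArray { segs, cum }
    }

    fn get(&self, index: usize) -> Option<&T> {
        // First pointer chase: binary search the prefix-sum table.
        let seg = self.cum.partition_point(|&c| c <= index);
        let base = if seg == 0 { 0 } else { self.cum[seg - 1] };
        // Second pointer chase: into the segment's own buffer.
        self.segs.get(seg)?.get(index - base)
    }
}

fn main() {
    let a = SegArray::from_segments(vec![vec![10, 11, 12], vec![13, 14, 15]]);
    assert_eq!(a.get(4), Some(&14));
    assert_eq!(a.get(6), None);
}
```

The segment locality cache in the measured build short-circuits the binary search when consecutive lookups hit the same segment, which is why it helps localized patterns and only adds overhead under pure random access.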
Iteration
Vec wins iteration decisively at every scale. Contiguous memory with sequential hardware prefetch is extremely hard to beat. Per-segment pointer indirection breaks the prefetch stream. This is a fundamental cost of segmentation, not an optimization we missed.
| Size | NacreArray | Vec | Ratio |
|---|---|---|---|
| 1K | 804 ns | 103 ns | Vec 7.8x faster |
| 10K | 10,671 ns | 1,295 ns | Vec 8.2x faster |
| 100K | 99,054 ns | 15,684 ns | Vec 6.3x faster |
With smaller elements, Vec iteration benefits more from cache-line density (more elements per cache line), widening the gap from the previous 2.9x to 6.3x at 100K. Contiguous memory and hardware prefetch dominate segment-by-segment traversal.
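The access-pattern difference reduces to a nested loop versus a single linear stream. A minimal sketch, assuming a plain `Vec<Vec<u64>>` stand-in for the segmented layout (the real Nacre iterator is more involved):

```rust
// Sketch: segmented iteration is a nested loop whose inner buffer changes at
// every segment boundary, breaking the single sequential stream that hardware
// prefetchers reward. (Hypothetical layout, not the Nacre iterator.)
fn sum_segmented(segs: &[Vec<u64>]) -> u64 {
    // Each segment boundary starts a fresh, unpredicted memory region.
    segs.iter().flat_map(|s| s.iter()).sum()
}

fn sum_contiguous(data: &[u64]) -> u64 {
    // One linear stream over one allocation.
    data.iter().sum()
}

fn main() {
    let flat: Vec<u64> = (0..8192).collect();
    let segs: Vec<Vec<u64>> = flat.chunks(4096).map(|c| c.to_vec()).collect();
    // Same result, very different memory access pattern.
    assert_eq!(sum_segmented(&segs), sum_contiguous(&flat));
}
```

Both functions are O(n); the 6.3x gap in the table is entirely the constant factor of cache-line density and prefetch behavior.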
Push Throughput
Push needs two baselines. Against default-growth Vec, Nacre briefly wins at 100K because Vec hits a reallocation cliff. Against preallocated Vec, Nacre loses at every tested size. That means push is not a structural reason to choose Nacre Array.
| Size | NacreArray | Vec (default) | Vec (prealloc) |
|---|---|---|---|
| 1K | 4,568 ns | 1,323 ns | 569 ns |
| 10K | 40,025 ns | 12,202 ns | 5,711 ns |
| 100K | 436,050 ns | 505,050 ns | 322,840 ns |
| 1M | 6,843,200 ns | 5,000,500 ns | 2,740,900 ns |
At 100K, Nacre records 436,050 ns versus 505,050 ns for default-growth Vec and 322,840 ns for preallocated Vec. Elements are 12 bytes with the SlimElementHeader, compared to 48 bytes under the previous ElementHeader; both Nacre and Vec benefit from smaller elements.
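The push path can be sketched to show where the fixed cost lives. This is a hedged illustration (invented `SegArray` name, no headers or compression): append to the last segment, allocate a fresh fixed-capacity segment when it fills. There is no O(n) reallocation cliff, but every push pays a capacity check, and every 4096th push pays a malloc that a preallocated Vec never does.

```rust
// Hypothetical sketch of segmented push: amortized O(1) with no reallocation
// cliff, but a per-element capacity check plus one allocation per full
// segment, which a preallocated Vec avoids entirely.
const SEG_CAP: usize = 4096; // the tuned default segment size from the post

struct SegArray<T> {
    segs: Vec<Vec<T>>,
}

impl<T> SegArray<T> {
    fn new() -> Self {
        SegArray { segs: vec![Vec::with_capacity(SEG_CAP)] }
    }

    fn push(&mut self, value: T) {
        if self.segs.last().map_or(true, |s| s.len() == SEG_CAP) {
            // Fresh segment: existing elements never move, unlike a Vec grow.
            self.segs.push(Vec::with_capacity(SEG_CAP));
        }
        self.segs.last_mut().unwrap().push(value);
    }

    fn len(&self) -> usize {
        self.segs.iter().map(|s| s.len()).sum()
    }
}

fn main() {
    let mut a = SegArray::new();
    for i in 0..10_000u32 {
        a.push(i);
    }
    assert_eq!(a.len(), 10_000);
    assert_eq!(a.segs.len(), 3); // 4096 + 4096 + 1808
}
```

The brief 100K win over default-growth Vec in the table is the absence of the grow-and-copy cliff, not a cheaper per-push path, which is why it disappears against the preallocated baseline.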
The Crossover Points
Segmentation starts paying for itself at different scales depending on the operation. Understanding these crossover points is the key to knowing when Nacre Array is the right choice.
For insert, the crossover is between 10K and 100K elements. At 10K, Vec still wins. By 100K, suffix shifting dominates and Nacre pulls ahead. At 100K, the medians are 19,723 ns for Nacre and 72,047 ns for Vec.
For push, the crossover against default-growth Vec is narrow and unreliable. Against preallocated Vec, there is no crossover. Do not choose Nacre Array for push throughput.
For split, the corrected baseline still shows a clear Nacre win because the operation is fundamentally different: segment-level fracture versus contiguous suffix relocation.
For random access and iteration, there is no crossover in the other direction. Vec wins at every tested size. The question is whether the structural wins elsewhere justify paying that tax.
The Optimization Story
The corrected harness forced a reset in how we talk about get(). The durable implementation ideas survived. The old headline ratios did not.
What still matters from the optimization work:
- The linear segment scan is gone. Lookup uses a cumulative index and binary search.
- The segment cache is real, but it is workload-sensitive. It helps sequential, localized, and Zipfian access patterns in the current benchmarks, and it hurts pure random access.
- Insert and remove now update cumulative suffix metadata incrementally. Split now carries only the moved right-half cumulative metadata and renormalizes that side, so the mutation-side upside is more faithful to the segmented design than it was earlier in the branch.
What no longer survives publication:
- The old 15.8x -> 2.55x random-access story. Those ratios were collected under the pre-fix harness and are not the numbers we should publish.
The corrected numbers are the ones above: at 100K, pure random access is 5.96 ns cache-off versus 0.75 ns for Vec. That is the honest current state.
Full Complexity Table
| Operation | Nacre | Vec | Winner |
|---|---|---|---|
| push | O(1) amortized | O(1) amortized | Mixed |
| get | O(log S) / O(1) cached | O(1) | Vec |
| insert | O(s + S) current impl | O(n) | Mixed, Nacre at scale |
| remove | O(s + S) current impl | O(n) | Mixed, Nacre at scale |
| split | O(S) current impl | O(n) | Nacre at scale |
| iter | O(n) | O(n) | Vec (cache locality) |
| scan_headers | O(n) | O(n) | Mixed |
| scan_segments | O(segments) | N/A | Nacre (novel) |
| tick | O(segments) | N/A | Nacre (novel) |
The s in O(s) is the segment capacity (current default 4096), not the total element count. The S in O(S) is the number of segments. The current implementation's mutation path is the composition of both: local per-segment work plus cumulative suffix maintenance or right-half metadata renormalization.
What the Numbers Mean
The Nacre Array is not a faster Vec. It is a different set of tradeoffs. You pay a tax on reads and iteration. You gain structural operations (split, insert, scan) that scale independently of collection size.
Think of it as a metabolic investment. The per-element header overhead, the segment metadata, the binary search on every get. These are ongoing costs. The returns come when you split a time-series partition without copying, when you scan segment summaries instead of iterating millions of elements, when you insert into a hot region without shifting the entire collection.
The workloads that benefit most: event logs, time-series stores, stream processors, and any system that partitions, scans, or restructures data at runtime. The workloads that should stay with Vec: tight inner loops over contiguous data, random-access-heavy indices, and small collections where the metadata cost never amortizes.
The biology was right about the architecture, but the implementation still matters. Layered structures with differentiated regions do handle certain stresses better than uniform ones. The corrected benchmarks show exactly where that principle pays off today and where the current code still needs work.
What's Next
Next up: Diatom Bitmap benchmarks, covering density-aware container selection and the cooperative thermoregulation model. We're also developing steady-state benchmarks that measure performance after tick cycles, capturing the long-term behavior of the thermal state machine rather than just initial throughput.
Related Posts
Nacre Array: What If Your Vec Had a Spine?
Mother-of-pearl is 3,000 times tougher than the crystals it's made of. The secret is layered organization with flexible mortar between rigid segments. We built an array that works the same way.
Why Nature Builds Better Arrays
Every system that stores sequential data eventually hits the same wall: cold data costs the same as hot, inserts require shifting everything, and there's no way to split without copying. What if the structure itself knew the difference?
Why Your Data Structure Doesn't Have a Metabolism
Data structures are benchmarked at birth, compared as static objects, and optimized for a single moment in time. But biological systems invest in costly metabolic machinery that pays off across a lifecycle. What if the overhead isn't the problem, but the investment?