Nacre Array: A Segmented Dynamic Array with Thermal State Management and Structural Fracture Planes
Abstract. The Nacre Array is a segmented dynamic array with three properties absent from conventional contiguous arrays: (1) per-element metadata headers that make elements self-describing without external bookkeeping, (2) a four-state thermal machine (Hot/Warm/Cold/Compressed) that allows cold segments to compress in place via LZ4 or Zstd while hot segments remain uncompressed and mutable, and (3) fracture planes at segment boundaries that enable structural splits without copying element payloads. These properties target a recurring constraint class: sequence workloads where structural mutation, selective compression, and low-cost partitioning matter more than uniform random-access throughput. Inspired by nacre (mother-of-pearl), where layered aragonite tablets bonded by organic biopolymer achieve roughly 3,000x the fracture resistance of the raw mineral [9, 10], the structure organizes elements into fixed-capacity segments with per-segment summaries and a tick-driven cooldown finite-state machine. The implementation evaluated here performs mutations locally within a segment, updates cumulative suffix metadata incrementally after insert/remove, splits cumulative metadata alongside segments instead of rebuilding both halves from scratch, and exposes an explicit reheat path for compressed segments through owned reads and mutable accessors. Against Rust's Vec<T> on the Criterion harness used in this paper (12-byte elements: 4-byte SlimElementHeader + 8-byte u64), Nacre splits 7.5x faster than Vec::split_off at 100K elements and inserts 3.7x faster at the midpoint, while pure random access is 7.9x slower at 100K with the segment cache disabled. These structural win factors are element-size-sensitive: larger elements amplify the advantage because Nacre avoids per-element copying in split and bounds shift cost in insert, while Vec copies or shifts the full suffix. 
A brief 100K push win over default-growth Vec persists, but push throughput is not treated here as a structural Nacre advantage because preallocated Vec remains faster. Mixed hot/cold memory-reduction claims are deferred until a reproducible benchmark harness for that workload is checked into the repository. The paper positions Nacre as a specialized segmented-sequence primitive for workloads where these structural operations dominate, not as a blanket replacement for contiguous arrays.
1. Introduction
The dynamic array is the most ubiquitous data structure in computing: Rust's Vec<T>, C++'s std::vector, Python's list, and Java's ArrayList are all variations on the same design. Every language provides a contiguous, growable, cache-friendly sequence as the default container for ordered data. Event logs, time-series buffers, message queues, and database rows all reach for the dynamic array first.
The design is well-understood: a heap-allocated contiguous block with geometric growth (typically factor 2). When the block fills, allocate a new block at double the capacity, copy everything, free the old block. Elements are accessed by pointer arithmetic in O(1). Iteration benefits from hardware prefetch over contiguous memory. For many workloads, this is close to optimal.
But the uniformity that makes dynamic arrays simple also makes them rigid. Four specific limitations compound in production systems:
All-or-nothing memory. A Vec is either fully in memory or not. A 1GB event log where only the last 10MB is actively queried still consumes 1GB. There is no mechanism to selectively compress the cold 990MB while keeping the hot 10MB fast. Kafka partition logs solve this externally with fixed segment files (default 1GB), deleting or archiving entire segments [7], but this is coarse-grained and requires infrastructure outside the data structure.
O(n) mid-array mutation. Inserting or removing an element at position i shifts every element after i. For a 10M-element event log, inserting at position 5M moves 5M x size_of::<T>() bytes. This makes mid-array mutation impractical at scale, forcing systems to treat arrays as append-only and build separate structures for ordered insertion.
No structural split points. Splitting a Vec requires cloning half the data. There is no concept of a natural division point. Kafka partitions solve this by stopping writes to old segments and creating new ones [7], but this requires segment management infrastructure, not a property of the array itself.
Reallocation cliff. When capacity is exceeded, the entire contents are copied to a larger allocation. For large arrays (100M+ elements), this creates unpredictable latency spikes. The amortized O(1) guarantee hides a worst-case O(n) copy that arrives without warning.
These are not theoretical concerns. Apache Kafka maintains separate .index files per segment because the log itself has no per-element metadata [7]. Apache Arrow column chunks are contiguous arrays with no per-chunk thermal awareness. Time-series databases partition by time interval because the underlying array offers no selective compression. In each case, infrastructure has been built around the array to compensate for properties the array lacks.
This paper therefore examines an alternative assumption: a sequence can remain array-like at the interface while differentiating its internal regions by metadata, thermal state, and structural boundaries. The goal is not to match Vec on pure random access, where contiguous storage retains the natural advantage, but to determine whether segmentation enables capabilities that contiguous dynamic arrays do not provide directly. The claim is correspondingly narrow: this is a study of a segmented-sequence wedge, not a claim that Nacre replaces Vec broadly or eliminates the surrounding infrastructure used by systems such as Kafka or Arrow.
2. Background and Related Work
2.1 Segmented Sequences
The idea of breaking a sequence into pieces is not new. Several established data structures partition sequences for different purposes:
Gap buffers (Emacs, 1985) maintain a single contiguous block with a moveable gap at the cursor position. Insertions at the gap are O(1); moving the gap is O(n). Designed for text editing where edits cluster around one location, not for general-purpose arrays.
Ropes (Boehm et al. 1995) represent strings as balanced binary trees of short substrings [1]. Concatenation is O(1) (create a new internal node). Random access is O(log n) through tree traversal. Designed for immutable string manipulation where concatenation dominates.
Unrolled linked lists combine array segments with linked list pointers [11]. Each node holds a small array (typically 64-256 elements). Insert is O(n) worst case (shift within node + possible split). Random access is O(n) (linear traversal). The structure trades memory for cache locality within nodes but does not address compression or thermal awareness.
Counted B-trees use B-tree structure to index a sequence by position [12]. Insert and remove are O(log n). Random access is O(log n). Well-suited for ordered sets with frequent mid-sequence mutations, but the tree overhead (internal nodes, rebalancing) exceeds what a flat segmented design requires.
2.2 Storage-Level Segmentation
At the storage layer, segmentation is standard practice:
LSM-trees (O'Neil et al. 1996) write sorted runs to immutable SSTables and merge them in the background [2]. New data goes to an in-memory buffer (hot); old data lives in on-disk levels (cold). The hot/cold distinction is structural but operates at the file level, not within a single data structure.
Kafka partition logs segment into fixed-size files (default 1GB) [7]. Each segment has an associated .index file for offset lookup. Retention policies delete or compact entire segments. The segmentation is external to the data structure; Kafka's log is a directory of files, not a segmented array.
Apache Arrow organizes columnar data into record batches. Each batch is a contiguous array. There is no per-batch thermal awareness; all batches consume equal memory regardless of access pattern.
2.3 Compressed Data Structures
Compression within data structures has been explored in several contexts:
Compressed bitmaps (Roaring, WAH, EWAH, CONCISE) apply different encoding strategies per container based on density [5]. This is analogous to thermal management (dense regions use bitmap encoding, sparse regions use array encoding), but the decision is based on cardinality at write time, not on access pattern over time.
Succinct data structures (Jacobson 1989) represent data in space close to the information-theoretic minimum while supporting efficient queries [3]. These are statically compressed; the structure does not adapt to workload.
Adaptive compression in database systems (Abadi et al. 2006) selects compression schemes per column based on data characteristics [4]. Subsequent work confirmed that column-oriented storage fundamentally outperforms row-stores for analytical workloads [8], but compression decisions remain per-column, not per-region within a single column.
2.4 Positioning Nacre Array
The Nacre Array occupies a specific niche: a dynamic array where segments carry per-element metadata, transition through thermal states based on access patterns, and provide structural split points. It is not a rope (no tree structure), not an unrolled linked list (has cumulative index for O(log S) access, not O(n) traversal), and not a storage engine (operates in-memory with optional compression). The closest relative is the gap buffer; both introduce structure within a contiguous-seeming interface, but the Nacre Array generalizes from one cursor position to multiple independently-managed segments with thermal states.
3. Nature Analog
3.1 Nacre Biology
Nacre, commonly known as mother-of-pearl, is the iridescent inner lining produced by mollusks (oysters, abalone, nautilus). It is composed of approximately 95% aragonite (calcium carbonate) crystals arranged in microscopic layered tablets, separated by approximately 5% organic biopolymer matrix (chitin and proteins).
The tablets are flat and roughly hexagonal, typically 5-15 micrometers in diameter and 0.5 micrometers thick [9]. They stack in layers (the brick-and-mortar microstructure), where aragonite tablets are the bricks and the organic biopolymer is the mortar. This layered architecture gives nacre unusually high toughness relative to pure aragonite: despite being made of a brittle mineral, nacre is approximately 3,000 times more fracture-resistant than pure aragonite [10].
The mechanism is crack deflection [10]. When a crack propagates through pure aragonite, it travels in a straight line through the crystal. In nacre, when a crack reaches the organic mortar layer between tablets, the mortar absorbs energy and redirects the crack along the layer interface. Propagation is impeded and spread across multiple interfaces rather than continuing as a single catastrophic path [9, 10].
Three additional biological properties map to computational concepts:
Growth by accretion. Nacre grows by depositing new aragonite tablets at the mantle edge. Existing layers are never disturbed. Older layers may mineralize further (becoming denser) while newer layers remain soft. Growth is append-only and non-destructive.
Variable density. In mature nacre, inner layers (deposited earlier) are denser and more crystallized than outer layers (deposited recently). The structure has a density gradient: old material is compact and structural; new material is soft and metabolically active.
Self-describing layers. Each tablet has distinct thickness and composition reflecting the conditions at the time of deposition. The layer itself records information about when and how it was formed.
3.2 Computational Mapping
The mapping from nacre biology to array design is structural, not metaphorical:
| Biological Feature | Computational Feature |
|---|---|
| Aragonite tablets (bricks) | Segments: fixed-capacity element storage |
| Organic biopolymer (mortar) | SlimElementHeader + SegmentSummary metadata |
| Crack deflection at mortar layers | Fracture planes: structural split boundaries |
| Growth by accretion at mantle edge | Append-only tail segment; new segment on overflow |
| Variable density (old = dense) | Thermal FSM: cold segments compress, hot segments stay mutable |
| Layer-specific composition | Per-segment state: each segment tracks its own access pattern |
The nacre analog provides four distinct computational insights that are not naming conventions applied to standard techniques:
- Segmented storage with per-segment metadata enables summary-based filtering at O(S) instead of O(n)
- Fracture-guided splitting enables payload-copy-free split at structural boundaries instead of O(n) data copying
- Append-without-disturb growth eliminates the reallocation cliff
- Thermal gradient with selective compression enables hot/cold data management within a single data structure
3.3 Convergent Analogs
The segmented-with-selective-compression pattern appears across multiple biological systems, providing convergence evidence:
Bone remodeling (Wolff's Law). Bone strengthens along stress lines: heavily loaded regions become denser, unloaded regions resorb [6]. This is thermal-state management in biology: active regions receive investment, inactive regions conserve resources.
Tree growth rings. Sapwood (recent rings) is metabolically active; heartwood (old rings) is structurally sound but inactive. The tree does not maintain metabolic activity in old wood; it transitions old material to a compressed, structural-only state.
Coral reef architecture. Active growth occurs at the outer edge. The interior is structurally important but no longer living. The thermal gradient from hot (growing edge) to cold (fossil interior) is a physical analog of segment thermal states.
Three independent biological systems (mollusks, vertebrate bone, plants) converge on the same pattern: layered structures where active regions maintain full functionality and inactive regions transition to a compressed state. This is strong convergence across phyla.
4. Architecture
4.1 NacreArray<T>
The top-level container manages a sequence of segments with cumulative indexing for efficient random access. Its core responsibilities are: maintaining a prefix-sum index over segment lengths for O(log S) element resolution, driving a monotonic tick clock for thermal transitions, and optionally caching the last-resolved segment to accelerate sequential access patterns.
NacreArray conceptual components:
────────────────────────────────
segments ordered sequence of Segment instances
cumulative index prefix sums enabling O(log S) element lookup
tick clock monotonic counter driving thermal transitions
cooldown config thresholds for Hot → Warm → Cold → Compressed
locality cache optional 1-entry cache of last-resolved segment
The cumulative_lens vector maintains prefix sums of segment lengths. Given global index i, the target segment is found via binary search (partition_point) in O(log S), where S is the segment count. The reason is straightforward: cumulative_lens is monotone increasing, so partition_point(|x| x <= i) returns the index of the first prefix sum strictly greater than i, which is exactly the index of the segment containing that global element. At 100K elements with the current default segment capacity of 4096, S is approximately 25, requiring roughly 5 binary-search comparisons.
The segment locality cache is a 1-entry cache remembering the last resolved segment. For sequential or localized access patterns, consecutive lookups hit the same segment. A cache hit costs 2 comparisons (is i >= cached_start && i < cached_end?) versus roughly 5 comparisons for the binary search fallback. The cache uses Cell<usize> for interior mutability from &self, allowing get() to remain an immutable operation.
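The two-tier resolution path can be sketched in a few lines. This is a minimal illustration, not the crate's actual code: the names `Index` and `locate` are hypothetical, and the sketch assumes all queried indices are in bounds.

```rust
use std::cell::Cell;

/// Illustrative stand-in for the container's index-resolution state.
struct Index {
    cumulative_lens: Vec<usize>, // prefix sums of segment lengths
    cache: Cell<Option<usize>>,  // 1-entry cache of last-resolved segment
}

impl Index {
    /// Resolve a global index to (segment index, local offset).
    /// Cell gives interior mutability, so this works from &self.
    fn locate(&self, i: usize) -> (usize, usize) {
        // Fast path: two comparisons against the cached segment's range.
        if let Some(seg) = self.cache.get() {
            let start = if seg == 0 { 0 } else { self.cumulative_lens[seg - 1] };
            if i >= start && i < self.cumulative_lens[seg] {
                return (seg, i - start);
            }
        }
        // Slow path: cumulative_lens is monotone increasing, so
        // partition_point returns the index of the first prefix sum
        // strictly greater than i -- the containing segment.
        let seg = self.cumulative_lens.partition_point(|&x| x <= i);
        let start = if seg == 0 { 0 } else { self.cumulative_lens[seg - 1] };
        self.cache.set(Some(seg));
        (seg, i - start)
    }
}

fn main() {
    // Three segments of lengths 4, 4, 2 -> prefix sums [4, 8, 10].
    let idx = Index { cumulative_lens: vec![4, 8, 10], cache: Cell::new(None) };
    assert_eq!(idx.locate(0), (0, 0));
    assert_eq!(idx.locate(5), (1, 1)); // slow path, populates the cache
    assert_eq!(idx.locate(7), (1, 3)); // fast path, cache hit
    assert_eq!(idx.locate(9), (2, 1));
}
```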
4.2 Segment<T>
Each segment is a self-contained storage unit with two possible representations: live (uncompressed, directly accessible) or compressed (byte buffer with codec tag). There is no partial compression within a segment; a segment is fully accessible or fully compressed. This binary representation simplifies the thermal FSM and eliminates the need for per-element compression tracking.
Each segment carries a summary (element count, total accesses, and a base tick for relative timestamps), its current thermal state, a fixed capacity, and a counter tracking how many ticks it has spent in its current state.
The SegmentSummary tracks aggregate metadata directly: element count and total access count are incremented on push and decremented on remove, without reading per-element fields. This enables scan_segments, filtering entire segments by summary predicate without touching element data.
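Summary-based filtering can be sketched as follows. The field set mirrors the description above, but the exact struct layout and the free-function form of `scan_segments` are illustrative assumptions, not the crate's API.

```rust
/// Per-segment aggregate metadata, maintained incrementally on push/remove.
struct SegmentSummary {
    count: usize,        // element count
    total_accesses: u64, // aggregate access counter
}

/// Return indices of segments whose summary satisfies the predicate.
/// Touches one summary per segment: O(S), never O(n) element reads.
fn scan_segments(
    summaries: &[SegmentSummary],
    pred: impl Fn(&SegmentSummary) -> bool,
) -> Vec<usize> {
    summaries
        .iter()
        .enumerate()
        .filter(|(_, s)| pred(s))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let summaries = [
        SegmentSummary { count: 4096, total_accesses: 0 },    // untouched
        SegmentSummary { count: 4096, total_accesses: 9000 }, // hot
        SegmentSummary { count: 120, total_accesses: 3 },     // tail
    ];
    // Segments that have been accessed at all:
    assert_eq!(scan_segments(&summaries, |s| s.total_accesses > 0), vec![1, 2]);
    // Full segments only:
    assert_eq!(scan_segments(&summaries, |s| s.count == 4096), vec![0, 1]);
}
```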
4.3 SlimElementHeader
Every element in a Nacre Array carries a SlimElementHeader (4 bytes) containing a single relative_tick: u32 offset from the segment's base tick. This is the "mortar" in the brick-and-mortar metaphor, reduced to the minimum viable self-description.
An audit of all read sites in the Nacre Array codebase found that every per-element field access was consumed only at the segment-aggregate level (computing min/max timestamps and access count sums for SegmentSummary). No production code path compared individual element headers against each other or returned per-element metadata to callers. This made it safe to hoist all aggregate tracking to the segment, leaving elements with only a relative tick offset.
The previous ElementHeader (40 bytes) tracked creation tick, last-accessed tick, access count, a compression flag, and a generation counter. Of these: access counts and timestamps are now segment-level aggregates; the compression flag was redundant with SegmentState; and the generation counter was unused by nacre. The full ElementHeader remains available in mutuus-core for organics that need per-element tracking (e.g., diatom container descriptors).
The scan_headers(predicate) API remains, operating on the 4-byte SlimElementHeader per element. For segment-level filtering, scan_segments is preferred as it examines only the SegmentSummary per segment.
The per-element overhead is 4 bytes. For a 10K-element array of u64 values, that is 40 KB of metadata versus 80 KB of data (0.5:1 ratio), compared to 400 KB of metadata under the previous 40-byte header (5:1 ratio).
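The overhead arithmetic can be checked directly against the single-field header described above (the derive list is illustrative):

```rust
/// The slim header reduced to its one described field.
#[allow(dead_code)]
struct SlimElementHeader {
    relative_tick: u32, // offset from the segment's base tick
}

fn main() {
    // The per-element overhead is exactly one u32.
    assert_eq!(std::mem::size_of::<SlimElementHeader>(), 4);
    // 10K u64 payloads: 40 KB of metadata vs 80 KB of data (0.5:1),
    // versus 400 KB (5:1) under the previous 40-byte header.
    let n = 10_000usize;
    assert_eq!(n * std::mem::size_of::<SlimElementHeader>(), 40_000);
    assert_eq!(n * std::mem::size_of::<u64>(), 80_000);
    assert_eq!(n * 40, 400_000);
}
```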
4.4 Thermal State Machine
The cooldown FSM governs per-segment resource allocation through four states:
Hot. Recently accessed. Uncompressed and mutable. The default state for new segments and any segment touched by get, get_mut, set, insert, or remove.
Warm. Not recently accessed. Still uncompressed, but a candidate for cooling. Transitions from Hot after hot_to_warm ticks without access (default: 10).
Cold. Stale. Eligible for compression. Transitions from Warm after warm_to_cold ticks without access (default: 20).
Compressed. Data compressed with LZ4 or Zstd. Mutable access and the owned read path trigger lazy decompression and transition to Hot; borrowed Rust get(&self) remains a live-only view because immutable borrows cannot perform the reheat mutation. Transitions from Cold after cold_to_compressed ticks (default: 5), triggered by tick().
Transitions upward (toward Hot) happen immediately on access. Transitions downward (toward Compressed) happen only during tick(), which advances the clock and evaluates each segment's ticks_in_state counter against the cooldown thresholds. This asymmetry ensures the hot path never pays compression cost; compression is deferred to background maintenance.
The CooldownConfig controls transition timing:
| Transition | Default Ticks | Rationale |
|---|---|---|
| Hot to Warm | 10 | Short grace period for bursty access |
| Warm to Cold | 20 | Longer cooldown before compression eligibility |
| Cold to Compressed | 5 | Compress quickly once Cold is confirmed |
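The downward (cooling) half of the FSM can be sketched as a pure transition function driven by ticks_in_state and the thresholds above. This is a minimal illustration; the struct layout and function names are assumptions, and the upward on-access reset to Hot is deliberately omitted.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum SegmentState { Hot, Warm, Cold, Compressed }

struct CooldownConfig { hot_to_warm: u32, warm_to_cold: u32, cold_to_compressed: u32 }

impl CooldownConfig {
    /// The default thresholds from the table above.
    fn default_config() -> Self {
        Self { hot_to_warm: 10, warm_to_cold: 20, cold_to_compressed: 5 }
    }
}

/// One tick for one segment: bump the in-state counter and cool one step
/// if the threshold for the current state is reached.
fn tick_segment(state: SegmentState, ticks_in_state: &mut u32, cfg: &CooldownConfig) -> SegmentState {
    *ticks_in_state += 1;
    let (threshold, next) = match state {
        SegmentState::Hot => (cfg.hot_to_warm, SegmentState::Warm),
        SegmentState::Warm => (cfg.warm_to_cold, SegmentState::Cold),
        SegmentState::Cold => (cfg.cold_to_compressed, SegmentState::Compressed),
        SegmentState::Compressed => return SegmentState::Compressed, // terminal until reheated
    };
    if *ticks_in_state >= threshold {
        *ticks_in_state = 0;
        next
    } else {
        state
    }
}

fn main() {
    let cfg = CooldownConfig::default_config();
    let (mut state, mut t) = (SegmentState::Hot, 0u32);
    // An untouched segment compresses after 10 + 20 + 5 = 35 ticks.
    for _ in 0..35 {
        state = tick_segment(state, &mut t, &cfg);
    }
    assert_eq!(state, SegmentState::Compressed);
}
```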
4.5 Segment Locality Cache
The locality cache exploits a common access pattern: consecutive accesses often land in the same segment. With the current default segment capacity of 4096 elements, a sequential scan hits the same segment for 4096 consecutive get() calls before crossing a boundary.
Index resolution uses a two-tier strategy: a fast path that checks whether the requested index falls within the last-resolved segment, and a slow path that performs binary search over the cumulative index when the cache misses.
    input: global index i
    output: (segment index, local index within segment)
    if cache is enabled and i falls within cached segment range then
        return (cached segment, i - cached start)
    segment := binarySearch(cumulative index, i)
    localIndex := i - cumulativeStart(segment)
    updateCache(segment)
    return (segment, localIndex)
The cache is configurable. For pure random access workloads where the cache always misses, it adds measurable overhead and should usually be disabled. For sequential access, it can eliminate most binary searches within a segment.
4.6 Fracture Planes
Fracture planes are the global indices at segment boundaries: the values stored in cumulative_lens. They are the computational equivalent of nacre's mortar layers: structural division points where splitting is cheap.
split_at(index) at a fracture plane: Divide the segments vector and cumulative metadata at the boundary, then renormalize the moved right-half cumulative entries. No element data is copied.
split_at(index) off a fracture plane: Split the containing segment internally, then divide the segments vector and derive a fresh right-half cumulative vector from the split point onward. The local element movement is bounded by s, and the metadata work scales with the moved suffix rather than a full rebuild of both halves.
nearest_fracture(index): Binary search over fracture planes to find the closest structural split point. Tie-breaking prefers the lower index.
fracture_planes(): Returns all segment boundary indices. O(S).
The low-copy property follows from the representation. At a fracture plane, the operation divides the segment vector and its cumulative metadata at an existing boundary; it does not move element payloads between segments. The remaining work is limited to container-level bookkeeping for the two resulting arrays and renormalization of the moved right-half metadata. For sequence workloads that already cut logs or buffers at structural boundaries (Kafka is one example [7]), this remains attractive because repartitioning at a boundary avoids copying 5M element payloads even though the current implementation still renormalizes metadata for the moved suffix.
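The nearest_fracture lookup described above can be sketched as a binary search over the sorted boundary indices, with the stated tie-break toward the lower plane. The function signature here is illustrative, not the crate's exact API.

```rust
/// Find the fracture plane closest to `index`; on an exact tie between two
/// planes, prefer the lower one. `planes` are the sorted cumulative_lens
/// values and are assumed non-empty.
fn nearest_fracture(planes: &[usize], index: usize) -> usize {
    // Position of the first plane >= index.
    let pos = planes.partition_point(|&p| p < index);
    if pos == 0 {
        return planes[0]; // index is at or below the first plane
    }
    if pos == planes.len() {
        return planes[pos - 1]; // index is beyond the last plane
    }
    let (lower, upper) = (planes[pos - 1], planes[pos]);
    if index - lower <= upper - index { lower } else { upper }
}

fn main() {
    // Three segments of length 4: planes at 4, 8, 12.
    let planes = [4, 8, 12];
    assert_eq!(nearest_fracture(&planes, 5), 4);
    assert_eq!(nearest_fracture(&planes, 6), 4); // tie between 4 and 8 -> lower
    assert_eq!(nearest_fracture(&planes, 7), 8);
    assert_eq!(nearest_fracture(&planes, 1), 4);
}
```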
5. Formal Specification
5.1 Core Traits
The Nacre Array and its underlying segments rely on four traits from mutuus-core:
| Trait | Implemented By | Methods | Purpose |
|---|---|---|---|
| Fracturable | NacreArray<T> | fracture_planes(), nearest_fracture() | Expose structural split boundaries |
| Compressible | Segment<T> | compress(), decompress(), is_compressed() | Selective compression at segment level |
| Maintainable | NacreArray<T> | tick() -> MaintenanceReport | Thermal FSM advancement |
| Adaptive | NacreArray<T> | strengthen(), decay() | Required by the shared core trait set |
5.2 Operation Complexity
Let n = total elements, s = segment capacity (current default 4096), S = number of segments = ceil(n/s).
| Operation | NacreArray | Vec | Winner | Notes |
|---|---|---|---|---|
| push(v) | O(1) amortized | O(1) amortized | Mixed | Nacre briefly beats default-growth Vec at 100K, but loses to preallocated Vec |
| get(i) | O(log S) / O(1) cached | O(1) | Vec | Cache helps some repeated-locality workloads, not pure random access |
| set(i, v) | O(log S) | O(1) | Vec | Same segment-resolution overhead as get |
| insert(i, v) | O(s + S) | O(n) | Mixed | Shift is bounded by segment capacity s, then cumulative suffix metadata is updated across trailing segments |
| remove(i) | O(s + S) | O(n) | Mixed | Local removal plus trailing cumulative metadata maintenance |
| split_at(i) | O(S) | O(n) | Nacre at scale | No element payloads are copied at a fracture plane; the moved right-half metadata is renormalized |
| iter() | O(n) | O(n) | Vec* | *Vec has better cache locality |
| scan_headers(pred) | O(n) | O(n) | Mixed | Current implementation is near parity with iter().filter() on Vec |
| scan_segments(pred) | O(S) | N/A | Nacre | Eliminate segments by summary |
| tick() | O(S) | N/A | Nacre | Thermal transitions + compression scheduling |
| len() | O(1) | O(1) | Tie | Cached in total_len |
| is_empty() | O(1) | O(1) | Tie | |
The bounds in Table 2 describe the implementation evaluated in this paper. get(i) and set(i, v) first resolve a segment by binary search over cumulative_lens, which gives O(log S), and then perform O(1) work inside the resolved segment. insert(i, v) and remove(i) perform a local shift inside one segment, bounded by s, followed by suffix metadata maintenance across trailing segments, which gives O(s + S). split_at(i) at a fracture plane divides existing vectors and renormalizes only the moved right-half cumulative entries, so the implemented operation is O(S) in metadata volume rather than O(n) in payload volume. scan_segments(pred) is O(S) because it evaluates one summary per segment and does not inspect individual elements.
5.3 Novel Complexity Classes
The Nacre Array introduces complexity classes that have no Vec equivalent:
- O(s): "per-segment" complexity, where s = segment capacity (currently 4096 by default). This is the cost of operating locally within a single segment.
- O(S): "segment-count" complexity, where S = ceil(n/s). This is the cost of scanning segment metadata or maintaining cumulative metadata across a moved suffix in the current implementation.
- O(log S): "segment-search" complexity. Binary search over cumulative lengths to locate the target segment. At n=10M with s=512, S is approximately 19,531 and log2(S) is approximately 14 comparisons.
5.4 Invariants
I1. Length consistency. total_len == sum(segment.len() for segment in segments).
I2. Cumulative index consistency. cumulative_lens[i] == sum(segments[0..=i].len()) for all i.
I3. Header validity. Every element has a valid SlimElementHeader with relative_tick <= u32::MAX.
I4. Compressed segment state. Compressed segments have state == Compressed and summary.count > 0.
I5. Fracture plane alignment. Fracture planes exist only at segment boundaries (indices cumulative_lens[i] for each i).
I6. Segment capacity immutability. Segment capacity is fixed at creation and never changes.
I7. Non-empty structure. At least one segment always exists, even when the array is empty.
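Invariants I1 and I2 are cheap to check in a debug build. The following sketch assumes a minimal view of the structure (plain slices of lengths and prefix sums); the function name is illustrative.

```rust
/// Check I1 (length consistency) and I2 (cumulative index consistency)
/// against a flattened view of the segment lengths.
fn check_len_invariants(segment_lens: &[usize], cumulative_lens: &[usize], total_len: usize) -> bool {
    // I1: total_len equals the sum of segment lengths.
    if total_len != segment_lens.iter().sum::<usize>() {
        return false;
    }
    if cumulative_lens.len() != segment_lens.len() {
        return false;
    }
    // I2: each cumulative entry is the prefix sum through that segment.
    let mut prefix = 0;
    for (i, &len) in segment_lens.iter().enumerate() {
        prefix += len;
        if cumulative_lens[i] != prefix {
            return false;
        }
    }
    true
}

fn main() {
    assert!(check_len_invariants(&[4, 4, 2], &[4, 8, 10], 10));
    assert!(!check_len_invariants(&[4, 4, 2], &[4, 9, 10], 10)); // corrupt prefix sum
    assert!(!check_len_invariants(&[4, 4, 2], &[4, 8, 10], 11)); // wrong total
}
```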
5.5 Maintenance Model
The tick() method is the structure's self-tuning heartbeat. Called periodically by the application (not internally scheduled), it performs:
- Advance the monotonic tick clock
- For each segment: increment ticks_in_state; evaluate cooldown thresholds
- Transition segments that exceed their cooldown timer
- Compress segments transitioning to Compressed state (LZ4 default, Zstd optional)
- Return a MaintenanceReport with segments compressed, bytes saved, and transition counts
The tick cycle is O(S). With the current default segment capacity, 100K elements correspond to roughly 25 segments, so maintenance walks a few dozen segment records rather than the full payload. Compression cost (P3 tier) is amortized across multiple ticks since only newly-Cold segments are compressed.
6. Implementation
6.1 Implementation Overview
The Nacre Array is implemented as a Rust crate (mutuus-nacre) with 38 unit tests and 1 doctest; nine Criterion benchmark groups exercise it against Vec (Section 7). The implementation supports four optional feature flags:
Feature Flags
─────────────
lz4 (default) LZ4 compression for cold segments (fast compress/decompress)
zstd Zstd compression (higher ratio, slower decompression)
serde Serialization support
wasm WebAssembly target compatibility
LZ4 is the default compression codec, offering fast compression and decompression suitable for the hot path where a Cold segment may be accessed and decompressed. Zstd provides higher compression ratios for workloads where cold data is rarely accessed and memory savings matter more than decompression latency.
6.2 OrganicElement Trait
Elements stored in a Nacre Array must implement the SlimOrganicElement trait from mutuus-core, which requires read and write access to the element's slim header and a Clone bound for compression round-trips:
pub trait SlimOrganicElement: Clone + Sized {
fn slim_header(&self) -> &SlimElementHeader;
fn slim_header_mut(&mut self) -> &mut SlimElementHeader;
}
The Nacre Array reads slim_header() for element-level predicate scanning. Aggregate metadata (access counts, timestamps) is tracked at the segment level, not per-element.
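A minimal implementation of the trait might look as follows. `Event` and its fields are hypothetical, and the trait and header are restated here (reduced to their described shapes) so the sketch is self-contained; the real definitions live in mutuus-core.

```rust
#[derive(Clone)]
struct SlimElementHeader {
    relative_tick: u32, // offset from the segment's base tick
}

trait SlimOrganicElement: Clone + Sized {
    fn slim_header(&self) -> &SlimElementHeader;
    fn slim_header_mut(&mut self) -> &mut SlimElementHeader;
}

/// A hypothetical element type: a u64 payload plus its slim header.
#[derive(Clone)]
struct Event {
    header: SlimElementHeader,
    payload: u64,
}

impl SlimOrganicElement for Event {
    fn slim_header(&self) -> &SlimElementHeader { &self.header }
    fn slim_header_mut(&mut self) -> &mut SlimElementHeader { &mut self.header }
}

fn main() {
    let mut e = Event { header: SlimElementHeader { relative_tick: 0 }, payload: 42 };
    // The container would set the relative tick on insertion.
    e.slim_header_mut().relative_tick = 7;
    assert_eq!(e.slim_header().relative_tick, 7);
    assert_eq!(e.payload, 42);
}
```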
6.3 Compression Model
Compression operates at the segment level. When a segment transitions to the Compressed state, its element data is compressed in place while its summary (maintained by the segment directly) remains accessible. This means scan_segments continues to work on compressed segments without decompression.
Access to a compressed segment through the owned read path (read), mutable access (get_mut, set, remove), or the WASM/TypeScript get() wrapper triggers lazy decompression: the segment decompresses, deserializes its elements, and transitions back to the Hot state. Borrowed Rust get(&self) remains live-only.
    if segment is Cold and cooldown is exceeded then
        compress(segment, codec)
        segment transitions to Compressed
        summary remains accessible without decompression
    if access targets a compressed segment through a reheat path then
        decompress(segment)
        segment transitions to Hot
        cooldown counter resets
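The live/compressed representation switch can be sketched with a trivial byte-serialization stand-in for the real LZ4/Zstd codecs (those live behind feature flags; this sketch, with its hypothetical `Repr` enum, only illustrates the state transition and the round-trip guarantee).

```rust
/// Stand-in for a segment's two representations: directly accessible
/// elements, or an opaque byte buffer.
enum Repr {
    Live(Vec<u64>),
    Compressed(Vec<u8>),
}

/// Cooling path: serialize elements to bytes (a real codec would also
/// compress them). Already-compressed segments pass through unchanged.
fn compress(repr: Repr) -> Repr {
    match repr {
        Repr::Live(vals) => {
            let bytes = vals.iter().flat_map(|v| v.to_le_bytes()).collect();
            Repr::Compressed(bytes)
        }
        other => other,
    }
}

/// Reheat path: lazy decompression back to the live representation,
/// triggered by owned/mutable access in the real structure.
fn reheat(repr: Repr) -> Repr {
    match repr {
        Repr::Compressed(bytes) => {
            let vals = bytes
                .chunks_exact(8)
                .map(|c| u64::from_le_bytes(c.try_into().unwrap()))
                .collect();
            Repr::Live(vals)
        }
        other => other,
    }
}

fn main() {
    let cold = compress(Repr::Live(vec![1, 2, 3]));
    assert!(matches!(cold, Repr::Compressed(_)));
    match reheat(cold) {
        Repr::Live(vals) => assert_eq!(vals, vec![1, 2, 3]), // round-trip intact
        _ => panic!("expected live representation"),
    }
}
```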
6.4 Fracture Plane Splitting
The split operation has two paths depending on whether the requested index aligns with a segment boundary.
    input: global index i
    output: new NacreArray containing elements [i..]
    if i aligns with a fracture plane then
        divide segment sequence and cumulative metadata at boundary
        renormalize moved right-half cumulative entries
    else
        resolve containing segment for i
        split that segment internally at local offset -> O(s)
        divide segment sequence after split point
        derive right-half cumulative entries from the split base
At a fracture plane, split divides only the segment sequence; no element data is copied. Both halves are immediately usable as independent NacreArray instances with their own cumulative indices and locality caches.
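The fracture-plane path can be sketched directly. Segments are stand-in Vec<u64> payloads and the function name is illustrative; the point is that only vectors-of-segments and prefix sums are divided and renormalized, while element payloads stay where they are.

```rust
/// Split at a fracture plane: divide the segment vector and the cumulative
/// metadata, then renormalize the moved right-half prefix sums to the new
/// array's origin. No element payloads are copied or moved.
fn split_at_plane(
    segments: &mut Vec<Vec<u64>>,
    cumulative_lens: &mut Vec<usize>,
    plane: usize,
) -> (Vec<Vec<u64>>, Vec<usize>) {
    // The plane must be one of the stored prefix sums (a segment boundary).
    let seg_idx = cumulative_lens.partition_point(|&x| x < plane);
    assert_eq!(cumulative_lens[seg_idx], plane, "not a fracture plane");
    let right_segments = segments.split_off(seg_idx + 1);
    let right_cumulative: Vec<usize> = cumulative_lens
        .split_off(seg_idx + 1)
        .into_iter()
        .map(|x| x - plane) // renormalize: O(S) metadata work only
        .collect();
    (right_segments, right_cumulative)
}

fn main() {
    let mut segments = vec![vec![1, 2], vec![3, 4], vec![5, 6]];
    let mut cumulative = vec![2, 4, 6];
    let (right, right_cum) = split_at_plane(&mut segments, &mut cumulative, 4);
    assert_eq!(segments, vec![vec![1, 2], vec![3, 4]]); // left half, untouched payloads
    assert_eq!(cumulative, vec![2, 4]);
    assert_eq!(right, vec![vec![5, 6]]);
    assert_eq!(right_cum, vec![2]); // renormalized to the new origin
}
```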
6.5 Access-Path Measurements
The benchmark harness for get() excludes random-number generation and modulus work from the timed loop so the measured path reflects lookup cost rather than benchmark overhead. The medians below therefore isolate the lookup path itself:
| Size | Nacre (cache on) | Nacre (cache off) | Vec | Observation |
|---|---|---|---|---|
| 1K | 2.85 ns | 3.55 ns | 0.99 ns | Small arrays are still slower than Vec, but the absolute gap is modest |
| 10K | 5.02 ns | 7.01 ns | 1.03 ns | The cache still hurts pure random access |
| 100K | 10.98 ns | 8.33 ns | 1.03 ns | Cache-off is the correct configuration for pure random access |
The measured implementation uses a cumulative index with an optional locality cache. Under this harness, the segment cache helps sequential and localized workloads, hurts pure random access, and is not enough to make the current Zipfian workload competitive with Vec.
6.6 Test Coverage
38 unit tests plus 1 doctest, organized by operation category:
| Category | Count | Coverage |
|---|---|---|
| Basic CRUD (push/get/set) | 6 | Empty, single, multi-segment |
| Insert/Remove | 6 | Middle, boundary, overflow |
| Fracture planes | 5 | At boundary, off-boundary, nearest |
| Compression | 5 | Round-trip, stats, explicit decompression path |
| Thermal FSM | 4 | Hot to Warm to Cold to Compressed transitions |
| Iteration | 4 | iter, iter_mut, empty, single |
| Edge cases | 4 | Empty array, single element, exactly-full segment |
| Summary scanning | 2 | scan_headers, scan_segments |
| Segment cache | 2 | With/without cache behavior |
All tests pass under cargo test -p mutuus-nacre. Clippy reports zero warnings.
7. Evaluation
7.1 Benchmark Suite
Nine Criterion benchmark groups in benches/vs_vec.rs:
| Group | What It Measures | Sizes |
|---|---|---|
| push | Build array from empty | 1K, 10K, 100K, 1M |
| get_random | Random access latency (cache-on, cache-off, Vec) | 1K, 10K, 100K |
| insert_middle | Insert at n/2 (steady-state) | 1K, 10K, 100K |
| split | Split at midpoint (fracture vs Vec::split_off) | 100K |
| iter | Full sequential iteration throughput | 1K, 10K, 100K |
| metadata_scan | Collect-heavy and count-only metadata traversals | 100K |
| sequential_scan | Sequential get() over the full range | 1K, 10K, 100K |
| zipfian | Zipfian-distributed random access (alpha=1.0) | 10K, 100K |
| localized | Random access within +/-64 windows | 10K, 100K |
7.2 Push Throughput
| Size | Nacre | Vec (default growth) | Vec (preallocated) | Interpretation |
|---|---|---|---|---|
| 1K | 4,568 ns | 1,323 ns | 569 ns | Vec wins decisively |
| 10K | 40,025 ns | 12,202 ns | 5,711 ns | Vec still wins decisively |
| 100K | 436,050 ns | 505,050 ns | 322,840 ns | Nacre beats default-growth Vec, but loses to preallocated Vec |
| 1M | 6,843,200 ns | 5,000,500 ns | 2,740,900 ns | Vec wins clearly at scale |
At 100K, the medians are 436,050 ns for Nacre, 505,050 ns for default-growth Vec, and 322,840 ns for preallocated Vec. Note: elements are 12 bytes with the SlimElementHeader (4-byte header + 8-byte u64), compared to 48 bytes under the previous ElementHeader. Both Nacre and Vec benefit from smaller elements; the absolute times dropped roughly 2-3x across all configurations.
Push is not a structural Nacre win. The only measured Nacre advantage in this benchmark is a brief crossover against default-growth Vec at 100K, which disappears once Vec is preallocated with the capacity that many production code paths know in advance. The correct interpretation is that Nacre avoids reallocation cliffs, not that it outperforms contiguous arrays on append throughput in general.
7.3 Random Access
| Size | Nacre (cache) | Nacre (no-cache) | Vec | cache/Vec | no-cache/Vec |
|---|---|---|---|---|---|
| 1K | 2.14 ns | 2.72 ns | 0.76 ns | 2.8x | 3.6x |
| 10K | 3.52 ns | 3.82 ns | 0.73 ns | 4.8x | 5.2x |
| 100K | 6.73 ns | 5.96 ns | 0.75 ns | 9.0x | 7.9x |
For pure random access, cache-off is the correct configuration; the cache adds overhead without enough hits to repay it. With unrelated work removed from the timed loop, the benchmark isolates lookup cost directly. On that measurement, Vec still dominates random access.
7.4 Insert at Middle
| Size | Nacre | Vec | Result |
|---|---|---|---|
| 1K | 786 ns | 733 ns | Near parity |
| 10K | 4,604 ns | 3,573 ns | Vec 1.3x faster |
| 100K | 19,723 ns | 72,047 ns | Nacre 3.7x faster |
Insert remains a mutation win at scale. Nacre's local shift cost is bounded by segment capacity, while Vec shifts the entire suffix. The measured implementation updates cumulative suffix metadata incrementally after insertion. The improvement factor has decreased from 8.3x (with 48-byte elements under the old header) to 3.7x because smaller elements (12 bytes) also make Vec shifts cheaper. The structural advantage is unchanged; the magnitude is element-size-sensitive.
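The segment-bounded insert can be sketched under the same simplified layout used elsewhere in this paper's examples (payload-only segments, illustrative names; segment-overflow splitting is omitted). The two properties that produce the 100K win are visible directly: the shift is local to one segment, and the cumulative index is bumped in place rather than rebuilt.

```rust
struct Nacre {
    segments: Vec<Vec<u64>>, // payload-only stand-in for segments
    cum_counts: Vec<usize>,  // cum_counts[k] = elements in segments 0..=k
}

impl Nacre {
    /// Insert `v` at global position `i`. Shift work is bounded by the
    /// containing segment's length, and only the suffix of the
    /// cumulative index is incremented. Overflow splitting is omitted.
    fn insert(&mut self, i: usize, v: u64) {
        // First segment whose cumulative count reaches i (O(log S) search).
        let seg = self.cum_counts.partition_point(|&c| c < i);
        let lo = if seg == 0 { 0 } else { self.cum_counts[seg - 1] };
        // Local shift: at most segment-capacity elements move,
        // versus Vec shifting the entire suffix of the array.
        self.segments[seg].insert(i - lo, v);
        for c in &mut self.cum_counts[seg..] {
            *c += 1; // incremental suffix maintenance, no rebuild
        }
    }
}
```

With s = 4096, the worst-case local shift is 4096 elements regardless of n, while Vec's midpoint insert shifts n/2 elements; that asymmetry is the entire source of the measured factor.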
7.5 Split at Midpoint
| Size | Nacre (fracture) | Vec (split_off) | Result |
|---|---|---|---|
| 100K | 12,105 ns | 90,944 ns | Nacre 7.5x faster |
Split remains the strongest measured structural win, but only against a same-semantics baseline. The comparison here uses Vec::split_off, which is the appropriate contiguous-array comparison. At a fracture plane, Nacre does not copy element payloads, and the implementation carries only the moved right-half cumulative metadata instead of rebuilding both halves from scratch. The factor has decreased from 23.6x to 7.5x because smaller elements (12 bytes vs 48 bytes) also make Vec::split_off cheaper; the memcpy cost that Nacre avoids is proportional to element size.
7.6 Iteration
| Size | Nacre vs Vec | Analysis |
|---|---|---|
| 1K | 804 ns vs 103 ns | Vec 7.8x faster |
| 10K | 10,671 ns vs 1,295 ns | Vec 8.2x faster |
| 100K | 99,054 ns vs 15,684 ns | Vec 6.3x faster |
Iteration still strongly favors Vec. With smaller elements, Vec iteration benefits more from cache-line density (more elements per cache line), widening the gap from the previous 2.9x to 6.3x at 100K. Contiguous memory and hardware prefetch dominate segment-by-segment traversal.
7.7 Workload-Specific Access
| Workload (100K) | Nacre (cache on) | Nacre (no-cache) | Vec | Interpretation |
|---|---|---|---|---|
| Sequential scan | 130,960 ns | 537,320 ns | 25,890 ns | Cache helps materially, but Nacre is still 5.1x slower than Vec |
| Zipfian | 69,851 ns | 58,735 ns | 3,527 ns | Both modes remain far slower than Vec; cache is less harmful than in the pure-random case but still not enough to close the gap |
| Localized | 14,273 ns | 53,876 ns | 3,724 ns | Cache helps substantially, but Nacre is still 3.8x slower than Vec |
The locality cache is workload-sensitive rather than universally beneficial. It is strongly helpful for sequential and localized access, harmful for the current pure-random benchmark, and still not enough to make Zipfian access competitive with Vec. It is best treated as a product-tuning knob rather than a claim that cache always helps.
7.8 Memory Under Mixed Workload
Selective compression remains a core design goal, but this paper does not publish a numeric mixed hot/cold memory result. The repository does not yet contain a reproducible benchmark harness or checked-in dataset supporting a mixed hot/cold memory claim. Those numbers are deferred pending a dedicated memory benchmark.
7.9 Win/Loss Summary
| Category | Winner | Magnitude | Notes |
|---|---|---|---|
| Push throughput | Mixed | No structural Nacre win | Nacre beats default-growth Vec at 100K, but preallocated Vec still wins everywhere |
| Random access | Vec | 7.9x at 100K (cache-off) | Pure random access strongly favors Vec |
| Sequential access | Vec | 5.1x at 100K (cache-on) | Cache helps, but Vec still dominates |
| Insert/remove (middle) | Nacre at scale | 3.7x at 100K | Segment-bounded local shift beats suffix shift; factor is element-size-sensitive |
| Split at fracture | Nacre | 7.5x at 100K | Measured against Vec::split_off; factor is element-size-sensitive |
| Iteration | Vec | 6.3x at 100K | Contiguous memory wins; gap widened with smaller elements |
| Scan headers | Mixed | Near parity at 100K | Scans 4-byte SlimElementHeader per element |
| Scan segments | Nacre | 173 ns at 100K | Summary-level filtering is the structural scan win |
| Memory (mixed workload) | Pending | Not published | Reproducible harness still needed |
8. Discussion
8.1 When to Use Nacre Array
The Nacre Array is designed for workloads with three characteristics:
- Append-heavy with occasional mid-sequence mutation. Event logs, time-series buffers, and message queues where most writes are appends, but occasional ordered insertions occur.
- Hot/cold access patterns. Datasets where recent data is accessed frequently and older data is rarely touched. The thermal FSM compresses cold regions automatically.
- Structural splitting requirements. Systems that need to partition sequences at runtime: log segment management, event store sharding, partition rebalancing.
Representative workload classes include segmented logs, append-heavy event stores, and hot/cold time-series buffers. Kafka partition logs, Arrow-style batch pipelines, and event-sourced databases illustrate related pressures, but this paper does not claim Nacre is a drop-in replacement for those systems. The justified conclusion is narrower: these domains contain sequence paths where fracture-plane splitting, segment summaries, and selective compression are worth evaluating.
8.2 When NOT to Use Nacre Array
Small collections (< 1000 elements). The per-segment fixed overhead (summary, thermal state, capacity, base tick, cumulative index entry) is unchanged by the slim header. At 500 elements with s=512, there is still only 1 segment, and segmentation provides no structural benefit. The slim header reduces the per-element tax but does not change the segment-level crossover point. Use Vec.
Random-access-dominated workloads. If the primary operation is get(random_index) with no insert, split, or compression needs, the 7.9x overhead at 100K (cache off) is pure cost. Use Vec.
Small element types. The SlimElementHeader adds 4 bytes per element. For a u64 payload (8 bytes), the per-element overhead is 50%; for a u32 payload (4 bytes), it is 100%. These ratios are reduced from the previous 40-byte header (where a u64 incurred 5:1 overhead), but a 100% overhead on 4-byte scalars is still substantial. Evaluate whether the structural benefits (segmentation, compression, splitting) justify the cost for your element size.
Tight iteration loops. If the dominant operation is full sequential iteration (summing, mapping, reducing), Vec is about 6.3x faster at 100K. With smaller elements, more fit per cache line, widening Vec's prefetcher advantage.
Memory-constrained with uniform access. If all data is accessed uniformly (no hot/cold differentiation), the segment metadata, cumulative index, and cache state add memory overhead with no compression payoff. The per-segment overhead is approximately 50-80 bytes per segment plus the cumulative index vector.
8.3 Tradeoff Analysis
The motivating tradeoff is O(log S) random access versus segment-bounded mutation and element-copy-free splitting. In the current implementation, that ideal is only partially realized. Insert and remove now maintain cumulative suffix metadata incrementally, and split carries only the moved right-half cumulative metadata instead of rebuilding both halves from scratch. The asymptotic lookup gap remains structural; split is improved but still not the idealized O(1) fracture-plane operation.
The per-element header overhead is 4 bytes (SlimElementHeader), reduced from 40 bytes. Vec elements are pure data; Nacre Array elements carry a slim header. The per-element tax is now small relative to most payloads, but the per-segment fixed costs (summary, thermal state, capacity, base tick, cumulative index entry) are unchanged. For workloads that do not exercise segmentation, compression, or splitting, this fixed cost is pure waste.
Memory amplification at small scale is reduced but not eliminated. A Nacre Array with 10 u64 elements has 1 segment with roughly 80 bytes of segment metadata plus 40 bytes of element headers plus 80 bytes of data, versus Vec<u64> at approximately 104 bytes total (24 overhead + 80 data). The per-element crossover point has improved, but the segment-level crossover point depends on having enough segments to justify the structure. With the default capacity of 4096, arrays under roughly 4K elements fit in a single segment and gain only the compression benefit, not the mutation or splitting benefits.
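The small-array arithmetic above can be checked mechanically. The sketch below uses the paper's stated figures (roughly 80 bytes of per-segment metadata, a 4-byte SlimElementHeader, a 24-byte Vec header on 64-bit targets) as model constants, not measured values, and ignores slack capacity on both sides.

```rust
// Back-of-envelope model of the small-array memory comparison.
// Constants are the paper's stated figures, not measured values.
const SEG_METADATA: usize = 80; // summary, thermal state, capacity, base tick
const SLIM_HEADER: usize = 4;   // SlimElementHeader per element
const VEC_HEADER: usize = 24;   // ptr + len + cap on a 64-bit target

/// Bytes for a single-segment Nacre Array of n elements.
fn nacre_bytes_single_segment(n: usize, elem_size: usize) -> usize {
    SEG_METADATA + n * (SLIM_HEADER + elem_size)
}

/// Bytes for a Vec with capacity == len.
fn vec_bytes(n: usize, elem_size: usize) -> usize {
    VEC_HEADER + n * elem_size
}
```

For 10 u64 elements this reproduces the text's 200 vs 104 bytes: the fixed 80-byte segment cost, not the 4-byte headers, dominates the small-array gap.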
8.4 Configuration Surface
The Nacre Array exposes five configuration parameters:
| Parameter | Default | Purpose |
|---|---|---|
| segment_capacity | 4096 | Maximum elements per segment |
| hot_to_warm | 10 ticks | Cooldown before Hot to Warm transition |
| warm_to_cold | 20 ticks | Cooldown before Warm to Cold transition |
| cold_to_compressed | 5 ticks | Cooldown before Cold to Compressed |
| use_segment_cache | true | Enable/disable segment locality cache |
Three of these are cooldown timers that control the thermal FSM's sensitivity. Shorter timers mean more aggressive compression (save memory, risk decompression on access). Longer timers mean more conservative compression (use more memory, avoid decompression latency). The defaults are tuned for event log workloads where cold data is genuinely cold.
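The cooldown machine can be sketched as follows. The state names and default timers come from the table above; the transition code itself is illustrative (the real implementation tracks ticks per segment and schedules compression separately).

```rust
// Minimal sketch of the tick-driven cooldown FSM. State names and
// timer defaults match the configuration table; the code is a model,
// not the crate's implementation.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Thermal { Hot, Warm, Cold, Compressed }

const HOT_TO_WARM: u32 = 10;
const WARM_TO_COLD: u32 = 20;
const COLD_TO_COMPRESSED: u32 = 5;

struct SegmentState {
    thermal: Thermal,
    idle_ticks: u32, // ticks since the last access or transition
}

impl SegmentState {
    /// Advance one tick; each transition resets the idle counter.
    fn tick(&mut self) {
        self.idle_ticks += 1;
        self.thermal = match self.thermal {
            Thermal::Hot if self.idle_ticks >= HOT_TO_WARM => { self.idle_ticks = 0; Thermal::Warm }
            Thermal::Warm if self.idle_ticks >= WARM_TO_COLD => { self.idle_ticks = 0; Thermal::Cold }
            Thermal::Cold if self.idle_ticks >= COLD_TO_COMPRESSED => { self.idle_ticks = 0; Thermal::Compressed }
            t => t,
        };
    }

    /// Any access reheats the segment and resets the cooldown.
    fn touch(&mut self) {
        self.thermal = Thermal::Hot;
        self.idle_ticks = 0;
    }
}
```

Under the defaults, an untouched Hot segment reaches Compressed after 35 ticks; shortening any timer moves the whole pipeline toward more aggressive compression.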
The segment_capacity parameter controls the fundamental tradeoff between random-access speed and local mutation cost. Larger segments mean fewer segments (faster O(log S) lookup) but more work inside a segment when inserts or removes land there. Smaller segments mean more segments (slower lookup) but smaller local shifts. The tuned default of 4096 favors the measured workload mix, which cares more about lookup, scan, and split behavior than about minimizing per-segment local shifts.
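That tradeoff can be made concrete with a rough cost model: lookup does a binary search over S = ceil(n/s) segments, while a mid-segment insert shifts about s/2 elements on average. The model is illustrative only; it ignores constant factors, cache effects, and the locality cache.

```rust
/// Approximate binary-search step count over S = ceil(n / s) segments,
/// i.e. the number of significant bits of S (~log2(S) + 1).
fn lookup_steps(n: usize, s: usize) -> u32 {
    let segments = (n + s - 1) / s;
    usize::BITS - segments.leading_zeros()
}

/// Average elements shifted by an insert landing mid-segment.
fn avg_insert_shift(s: usize) -> usize {
    s / 2
}
```

At n = 100K, the default s = 4096 gives 25 segments (about 5 search steps) with up-to-2048-element local shifts; dropping to s = 64 gives 1563 segments (about 11 steps) with 32-element shifts. The default sits deliberately on the lookup-friendly side of that curve.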
8.5 Performance Profiles
| Profile | Current Measured Envelope | Operations |
|---|---|---|
| Point lookup | 2.9-11.0 ns | get, set, get_mut |
| Structural mutation | 22-306 us at 100K | insert, remove, split_at, scan_headers |
| Segment summary scanning | 6-241 ns at 100K | scan_segments, segment-count predicates |
| Maintenance / compression | Not benchmarked here | tick, compression scheduling, decompression |
The important observation is not that every operation is faster than Vec, but that different operations scale with different internal units. Lookup scales with log(S), segment summary scans scale with S, and structural mutation cost is dominated by per-segment work plus the current cumulative metadata maintenance.
8.6 WASM Integration
The Nacre Array is available through WASM with a TypeScript wrapper in @mutuus/core:
```ts
const arr = await new NacreArray({ segmentCapacity: 4096 }).init();
arr.push(42.0);
arr.get(0);
const report = arr.tick();
const right = arr.splitAt(arr.nearestFracture(500));
```
WASM type erasure restricts elements to f64 values. Complex types require serialization. The core operations (get, set, push, insert, remove, split) are direct WASM calls with no serialization overhead. Maintenance operations (tick(), compressionStats()) return serialized results.
9. Conclusion
Segmented storage enables structural operations that contiguous dynamic arrays do not provide directly. Mid-array insert is 3.7x faster at 100K elements because the shift is bounded by segment capacity rather than array length. Split at fracture planes is 7.5x faster at 100K because no element payloads are copied. Segment-summary scans run in hundreds of nanoseconds because they operate on summaries rather than payloads. These improvement factors are element-size-sensitive: larger elements amplify the advantage because Vec's copy cost grows linearly with element size while Nacre's structural operations avoid per-element copying.
The costs are real and documented. Pure random access is 7.9x slower than Vec at 100K elements with cache disabled. Iteration is about 6.3x slower (smaller elements increase Vec's cache-line density advantage). Push only beats default-growth Vec at one tested size and loses to preallocated Vec throughout. These are the current costs of segmented storage, and they mean Nacre is not a general-purpose Vec replacement.
The Nacre Array studies a specific design point within this tradeoff space: segments with per-element headers, per-segment summaries, and a thermal state machine. In this representation, hot segments remain mutable, cold segments compress to save memory, and fracture planes permit structural splits without payload copying. The nacre analog is structural, not decorative: layered tablets with crack-deflecting mortar achieve 3,000x the fracture resistance of the raw mineral through the same principle that makes segment-boundary splits cheap.
For sequence workloads such as event logs, stream-processing buffers, and time-series partitions, the current results support a specific conclusion: segmented storage can exchange some contiguous-read performance for materially better split and mid-sequence mutation behavior, while also enabling summary-level scans and selective compression within one array-like interface. That is enough to justify Nacre as a specialized segmented-sequence primitive for workloads where those structural operations are the bottleneck.
Future Work
LateralLine adaptation. Named after the lateral line organ in fish. A planned snap-on that senses access patterns across segments and proactively decompresses segments predicted to be accessed next, reducing decompression latency for workloads with predictable sequential access.
Enhancer adaptation. A planned snap-on that dynamically adjusts segment capacity for frequently-accessed segments, allowing hot segments to grow beyond the default capacity to reduce segment boundary overhead.
Adaptive segment sizing. Current segment capacity is fixed at construction. Future work could allow segment capacity to vary based on data characteristics: larger segments for uniform regions, smaller segments for regions with frequent mutation.
SIMD-assisted binary search. The cumulative index binary search could use SIMD 4-wide comparison over the prefix sum array, reducing the constant factor of O(log S) resolution.
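A scalar sketch of the search step such a kernel would vectorize: a 4-ary search compares the target against three pivots per iteration, exactly the comparisons one 4-wide SIMD compare would evaluate in a single instruction, roughly halving the step count versus binary search. This is an illustration of the idea, not the planned implementation.

```rust
/// Resolve the containing segment for element i over a sorted prefix-sum
/// array: returns the count of cumulative entries <= i. Each loop
/// iteration does the three pivot tests a 4-wide SIMD compare would
/// evaluate together, then narrows to one of four quarters.
fn resolve_segment(cum: &[usize], i: usize) -> usize {
    let (mut lo, mut hi) = (0, cum.len());
    while hi - lo > 4 {
        let q = (hi - lo) / 4;
        let pivots = [lo + q, lo + 2 * q, lo + 3 * q];
        // Satisfied pivots form a prefix because cum is sorted, so the
        // hit count alone selects the surviving quarter.
        let step = pivots.iter().filter(|&&p| cum[p] <= i).count();
        match step {
            0 => hi = pivots[0],
            1 => { lo = pivots[0]; hi = pivots[1]; }
            2 => { lo = pivots[1]; hi = pivots[2]; }
            _ => lo = pivots[2],
        }
    }
    // Finish the tail window with a short linear count.
    lo + cum[lo..hi].iter().filter(|&&c| c <= i).count()
}
```

The result matches partition_point on the same predicate; the win, if any, would come from trading dependent branches for one wide compare per level.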
References
[1] Boehm, H.-J., Atkinson, R., & Plass, M. (1995). Ropes: an alternative to strings. Software: Practice and Experience, 25(12), 1315-1330.
[2] O'Neil, P., Cheng, E., Gawlick, D., & O'Neil, E. (1996). The log-structured merge-tree (LSM-tree). Acta Informatica, 33(4), 351-385.
[3] Jacobson, G. (1989). Space-efficient static trees and graphs. Proceedings of the 30th Annual Symposium on Foundations of Computer Science, 549-554.
[4] Abadi, D. J., Madden, S. R., & Ferreira, M. C. (2006). Integrating compression and execution in column-oriented database systems. Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, 671-682.
[5] Chambi, S., Lemire, D., Kaser, O., & Godin, R. (2016). Better bitmap performance with Roaring bitmaps. Software: Practice and Experience, 46(5), 709-719.
[6] Wolff, J. (1892). Das Gesetz der Transformation der Knochen. Berlin: Hirschwald.
[7] Kreps, J., Narkhede, N., & Rao, J. (2011). Kafka: a distributed messaging system for log processing. Proceedings of the NetDB Workshop, 1-7.
[8] Abadi, D. J., Madden, S. R., & Hachem, N. (2008). Column-stores vs. row-stores: How different are they really? Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 967-980.
[9] Jackson, A. P., Vincent, J. F. V., & Turner, R. M. (1988). The mechanical design of nacre. Proceedings of the Royal Society of London. Series B, Biological Sciences, 234(1277), 415-440.
[10] Barthelat, F., Tang, H., Zavattieri, P. D., Li, C.-M., & Espinosa, H. D. (2007). On the mechanics of mother-of-pearl: A key feature in the material hierarchical structure. Journal of the Mechanics and Physics of Solids, 55(2), 306-337.
[11] Shao, Z., Reppy, J. H., & Appel, A. W. (1994). Unrolling lists. Proceedings of the 1994 ACM Conference on LISP and Functional Programming, 185-195.
[12] Bayer, R., & McCreight, E. (1972). Organization and maintenance of large ordered indexes. Acta Informatica, 1(3), 173-189.
Citation
BibTeX
@article{phillips2026nacre,
title={Nacre Array: A Segmented Dynamic Array with Thermal State Management and Structural Fracture Planes},
author={B. D. Phillips},
year={2026},
journal={Mutuus Research},
doi={10.5281/zenodo.19071709},
url={https://mutuus.bytequilt.com/research/nacre-array-formal-spec}
}

APA
B. D. Phillips (2026). Nacre Array: A Segmented Dynamic Array with Thermal State Management and Structural Fracture Planes. Mutuus Research. https://doi.org/10.5281/zenodo.19071709