Hybrid Zeckendorf Arithmetic
Can a formally verified number system beat GMP?
GMP is the gold standard for arbitrary-precision arithmetic: decades of engineering, hand-optimized assembly, deployed everywhere from cryptography to computer algebra. Beating it at anything, ever, is not supposed to happen—and certainly not with a formally verified implementation.
Vladimir Veselov's Hybrid Zeckendorf paper proposes an alternative: instead of flat bit strings, decompose integers into a doubly-exponential weight hierarchy where each level carries a small Fibonacci-encoded coefficient. A million-bit number uses only ~11 levels. For sparse numbers (few nonzero Fibonacci indices), addition is O(K) where K is the total index count—versus GMP's O(N/64) scan of every machine word.
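The Fibonacci-encoded coefficients rest on Zeckendorf's theorem: every positive integer is a unique sum of non-consecutive Fibonacci numbers. A minimal greedy sketch of that decomposition (illustrative only, not the repository's actual encoder):

```rust
// Greedy Zeckendorf decomposition: repeatedly subtract the largest
// Fibonacci number that fits. The result is a sum of non-consecutive
// Fibonacci numbers, unique by Zeckendorf's theorem.
fn zeckendorf(mut n: u64) -> Vec<u64> {
    let mut fibs = vec![1u64, 2]; // F(2), F(3), ... (skip the duplicate 1)
    while *fibs.last().unwrap() <= n {
        let k = fibs.len();
        fibs.push(fibs[k - 1] + fibs[k - 2]);
    }
    let mut parts = Vec::new();
    for &f in fibs.iter().rev() {
        if f <= n {
            parts.push(f);
            n -= f;
        }
    }
    parts // e.g. zeckendorf(100) == [89, 8, 3]
}
```

The K in the O(K) addition bound counts exactly these Fibonacci indices, summed across all levels of the hierarchy.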
We formalized the complete theory in Lean 4 (68 theorems, zero sorry) and built a rigorous Rust benchmark suite with anti-hack constraints. The result: a 55x speedup over GMP on sparse addition at 1M bits. To our knowledge, this is the first empirically demonstrated case of a formally verified number representation outperforming GMP at any operation, at any scale, under fair benchmarking conditions.
Why this is Ruliology
Veselov's paper proposed the representation. Our Lean formalization proved its mathematical properties. The ruliology—the discovery that emerged through the verification process—changed how we understand the system.
Formalizing the normalization algorithm found a bug in the paper. Algorithm 3 specifies descending-level carry (reverse=True), but the carry semantics require ascending order. Our Lean proofs use independent carryAt calls with fuel; our Rust implementation uses bottom-up iteration. Both are correct despite the pseudocode error. This is exactly the kind of finding that only emerges when you force every claim through a type checker.
The closure operator result was unexpected. NucleusBridge proves that normalization is idempotent (normalizing a normal form returns itself), which sounds obvious but requires careful proof. The negative result—carryAt_not_pointwise_monotone—was a surprise: a carry at one level can decrease the coefficient at another level. This subtlety is invisible in informal reasoning.
The deepest ruliology finding: formal verification and computational advantage are not in tension. The structural properties that make the Lean proofs work (evaluation preservation, canonical uniqueness, carry soundness) are the same properties that make the computation fast. The proof is the optimization.
Lean 4 Formalization (HeytingLean/Bridge/Veselov/HybridZeckendorf/)
WeightSystem.lean ───── w(k+2) = w(k+1)², closed form, strict mono
HybridNumber.lean ──── core types, evaluation maps
FibIdentities.lean ─── Fibonacci identities, Zeckendorf helpers
Normalization.lean ─── normalize_sound, normalize_canonical, carry
Addition.lean ──────── addLazy_eval, add_correct
Multiplication.lean ── multiplyBinary_correct
DensityBounds.lean ─── active_levels_bound ≤ log₁₀₀₀(n) + 2
Uniqueness.lean ────── canonical_eval_injective
NucleusBridge.lean ─── normalize_is_closure_operator (idempotent)
DragStability.lean ─── transport stability
TransportCoherence ─── round-trip coherence
│
CAB Pipeline (Lean → C → Rust)
│
Rust Benchmark Suite (bench/hybrid_zeckendorf/, 2,415 lines)
weight.rs ─────── weight hierarchy (// Mirrors: WeightSystem)
zeckendorf.rs ─── Zeckendorf encoding (// Mirrors: FibIdentities)
flat_hybrid.rs ── HybridNumber types (// Mirrors: HybridNumber)
normalization.rs normalize + carry (// Mirrors: Normalization)
arithmetic.rs ─── add, multiply (// Mirrors: Addition/Multiplication)
sparse.rs ─────── sparse number construction + E4/E5/E7
density.rs ────── density measurement (// Mirrors: DensityBounds)
│
Experiments (7 total)
E1: modular exponentiation    │ E5: lazy accumulation
E2: polynomial multiplication │ E6: sparse modular exponentiation
E3: random density            │ E7: crossover determination
E4: sparse addition (CORE RESULT)

Sparse Addition: HZ vs GMP (E4)
100,000 bits (30 samples)
| ρ | K | HZ (ns) | GMP (ns) | Speedup |
|---|---|---|---|---|
| 10⁻⁵ | 1 | 48 | 784 | 15.2x |
| 10⁻⁴ | 14 | 112 | 840 | 7.0x |
| 10⁻³ | 144 | 272 | 912 | 3.4x |
| 10⁻² | 1,440 | 688 | 920 | 1.2x |
1,000,000 bits (30 samples)
| ρ | K | HZ (ns) | GMP (ns) | Speedup |
|---|---|---|---|---|
| 10⁻⁵ | 14 | 112 | 6,310 | 55.4x |
| 10⁻&sup4; | 144 | 272 | 6,192 | 22.8x |
| 10⁻³ | 1,440 | 784 | 6,088 | 8.2x |
| 10⁻² | 14,404 | 3,312 | 6,272 | 1.9x |
SCALING
Advantage grows with input size: 2.4–3.6x per 10x increase in N. GMP is flat at ~6,200 ns (scans all words). HZ scales only with K.
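The O(K) behavior follows from the lazy addition rule: concatenate per-level payloads and defer normalization. A schematic sketch — the LazyHybrid type and its field names here are illustrative stand-ins, not the bench's flat_hybrid.rs types:

```rust
// Illustrative lazy hybrid number: levels[l] holds the Fibonacci indices
// active at level l. Work is proportional to the total index count K,
// never to the bit-length of the operands.
#[derive(Clone, Debug, Default)]
struct LazyHybrid {
    levels: Vec<Vec<u32>>,
}

// Lazy addition = per-level concatenation, mirroring addLazy_eval:
// the (possibly non-canonical) result evaluates to the sum.
fn add_lazy(a: &LazyHybrid, b: &LazyHybrid) -> LazyHybrid {
    let n = a.levels.len().max(b.levels.len());
    let mut levels = vec![Vec::new(); n];
    for l in 0..n {
        if let Some(xs) = a.levels.get(l) { levels[l].extend_from_slice(xs); }
        if let Some(xs) = b.levels.get(l) { levels[l].extend_from_slice(xs); }
    }
    LazyHybrid { levels }
}
```

GMP, by contrast, must touch every one of the ~N/64 machine words regardless of how few bits are set, which is why its timing stays flat as density drops.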
CROSSOVER (E7)
ρ* ≈ 5.6×10⁻³ at 100k bits. ρ* ≈ 3.2×10⁻² at 1M bits. Advantage regime expands 6x with input size.
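As a sanity check (not the bench's E7 procedure), the 1M-bit crossover can be estimated from the E4 table by fitting t_HZ ≈ a + b·K through the extreme rows and solving against GMP's flat scan time; all constants below are read off the table above:

```rust
// Back-of-envelope crossover estimate from the 1M-bit E4 rows.
fn main() {
    let (k0, t0) = (14.0_f64, 112.0);   // ρ = 10⁻⁵ row
    let (k1, t1) = (14_404.0, 3_312.0); // ρ = 10⁻² row
    let b = (t1 - t0) / (k1 - k0);      // ns per Fibonacci index
    let a = t0 - b * k0;                // fixed overhead
    let t_gmp = 6_200.0;                // GMP's flat word-scan time at 1M bits
    let k_star = (t_gmp - a) / b;       // index count where HZ = GMP
    let rho_star = k_star / 1.0e6;
    println!("ρ* ≈ {rho_star:.1e}");    // same order as the measured 3.2×10⁻²
}
```

The linear fit lands within a small factor of the measured ρ*, consistent with HZ cost being overhead plus a per-index term.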
LAZY ACCUM (E5)
1000 sparse additions: 50–593x faster than eager. R5–R8 normalizer campaign resolved: split-update kernel passes S9 scaling gate at ρ=10⁻⁵ (32x→20x speedup, ratio 0.61). Normalization cost reduced 64x per pass vs original Vec kernel.
Normalization Campaign: R5–R8
Lazy addition is O(K), but the deferred normalization cost matters for materialized arithmetic (add + normalize). The R5–R8 campaign diagnosed and fixed the sparse normalizer’s scaling bottleneck through four rounds of hostile-audited implementation and measurement.
ρ = 10⁻⁵ (very sparse, 1M bits)
| Count | Speedup | norm (ns) | S9 |
|---|---|---|---|
| 10 | 32.2x | 1,232 | — |
| 100 | 29.3x | 13,488 | — |
| 500 | 23.9x | 92,272 | — |
| 1000 | 19.6x | 244,833 | PASS (0.61) |
Normalizer Kernel Evolution
| Kernel | norm (ns) | fuel | ns/pass |
|---|---|---|---|
| R5 Vec::insert | 244,762k | 5 | 48,952k |
| R7 BTreeMap | 45,847k | 9 | 5,094k |
| R8 split-update | 6,831k | 9 | 759k |
At ρ=10⁻⁴, count=1000. R8 is 64x cheaper per pass than Vec, 6.7x cheaper than BTreeMap.
RESOLVED
R8 split-update sorted Vec: binary-search existing entries for O(1) update, buffer new entries for single O(K+D) merge after each pass. Eliminates Vec::insert's O(K) per-element shifting while preserving cache-friendly sorted storage. S9 scaling gate passes at ρ=10⁻&sup5;.
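A sketch of that pattern as described — the types, names, and the duplicate-free assumption on new levels are ours, not normalization.rs:

```rust
// Split-update: in-place updates for existing levels via binary search,
// buffered appends for new levels, one merge per pass. Avoids Vec::insert's
// O(K) element shifting on every new entry.
// Assumes each new level appears at most once per update batch.
fn apply_updates(entries: &mut Vec<(u32, u64)>, updates: &[(u32, u64)]) {
    let mut fresh: Vec<(u32, u64)> = Vec::new();
    for &(level, delta) in updates {
        match entries.binary_search_by_key(&level, |e| e.0) {
            Ok(i) => entries[i].1 += delta,       // existing: O(log K) find, O(1) update
            Err(_) => fresh.push((level, delta)), // new: defer to the merge
        }
    }
    if !fresh.is_empty() {
        // One merge per pass; a two-pointer merge of the two sorted runs
        // would make this strictly O(K + D) — re-sorting keeps the sketch short.
        entries.extend(fresh);
        entries.sort_unstable_by_key(|e| e.0);
    }
}
```

The sorted Vec keeps storage contiguous and cache-friendly, which is where the 6.7x advantage over the pointer-chasing BTreeMap kernel comes from.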
OPEN QUESTION
Deferred merge prevents within-pass cascading, adding a 1.8x fuel tax at denser regimes (ρ=10⁻⁴). S9 fails there (ratio 0.24). Restoring partial cascading or a split-phase normalization algorithm could close this gap.
What is proved vs. what is measured
FORMALLY VERIFIED (Lean 4)
- • w(i+1) = 1000^(2^i) (weight closed form)
- • Normalization preserves evaluation
- • Canonical forms are unique (injective)
- • Lazy addition preserves value
- • Multiplication correct
- • Active levels ≤ log₁₀₀₀(n) + 2
- • Normalization is idempotent closure operator
- • Carry is NOT pointwise monotone (negative result)
EMPIRICALLY MEASURED (Rust bench)
- • Sparse addition 15–55x faster than GMP
- • Dense arithmetic 5–48x slower than GMP
- • Crossover density ρ* ≈ 5.6×10⁻³ – 3.2×10⁻²
- • Lazy accumulation 50–593x faster than eager
- • R5–R8 normalizer campaign: split-update kernel (64x cheaper per pass)
- • Fuel profiling: Vec vs BTreeMap vs split-update convergence dynamics
- • Materialized arithmetic: 32x at count=10, 20x at count=1000 (ρ=10⁻⁵)
- • End-to-end pipeline with deferred normalization
Weight System & Core Types
Super-exponential weight hierarchy w(0)=1, w(1)=1000, w(k+2)=w(k+1)². Closed-form: w(i+1) = 1000^(2^i). Strict monotonicity. Core HybridNumber and LazyHybridNumber types with evaluation maps. Fibonacci double identity and Zeckendorf encoding helpers.
Key Results
- • weight_closed (w(i+1) = 1000^(2^i))
- • weight_strict_mono (i < j ⟹ w(i) < w(j))
- • eval_fromNat (evaluation map correctness)
- • fib_double_identity, lazyEvalFib_append
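The recurrence and closed form can be checked directly for the first few levels — only levels up to w(4) = 1000⁸ fit in u128, so this is a sketch, not WeightSystem.lean or weight.rs:

```rust
// w(0)=1, w(1)=1000, w(k+2)=w(k+1)² — doubly exponential growth.
fn main() {
    let mut w: Vec<u128> = vec![1, 1000];
    for k in 0..3 {
        let prev = w[k + 1];
        w.push(prev * prev);
    }
    // Closed form w(i+1) = 1000^(2^i), for the levels that fit in u128.
    for i in 0..4u32 {
        assert_eq!(w[(i + 1) as usize], 1000u128.pow(2u32.pow(i)));
    }
    println!("{w:?}"); // strictly monotone: each level squares the last
}
```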
Normalization & Carry
The central algorithmic content: inter-level carry that transforms any lazy representation into canonical form. normalize_sound: normalization preserves evaluation. normalize_canonical: the result is unique. carryAt_preserves_eval: each carry step preserves value. Errata: paper's Algorithm 3 specifies descending carry, but the semantics require ascending order.
Key Results
- • normalize_sound (evaluation preservation)
- • normalize_canonical (unique canonical form)
- • carryAt_preserves_eval (carry soundness)
- • interCarry_idempotent_of_canonical
- • Errata: carry must be ascending, not descending
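The errata's ordering argument is visible even with plain positional digits as a stand-in for the Fibonacci-coefficient carry (the real carryAt rule differs; only the ascending-vs-descending point carries over):

```rust
// Bottom-up carry: overflow produced at level l lands at l+1 *before*
// l+1 is visited, so one ascending pass normalizes. A descending pass
// visits l+1 first and misses the carry it later receives — the bug in
// the paper's Algorithm 3 pseudocode.
fn carry_ascending(digits: &mut [u64], base: u64) {
    for l in 0..digits.len() - 1 {
        let c = digits[l] / base;
        digits[l] %= base;
        digits[l + 1] += c; // may itself overflow; handled when l+1 is visited
    }
}
```

With digits [99, 9, 0] in base 10, one ascending pass yields [9, 8, 1] (value 189 preserved); a single descending pass would leave digits[1] = 18 unnormalized.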
Arithmetic Operations
Lazy addition: concatenate payload vectors at each level without normalizing. addLazy_eval proves this preserves value. add_correct proves the normalized pipeline (lazy + normalize) is correct. Binary multiplication via shift-and-add. multiplyBinary_correct proves the multiplication algorithm correct.
Key Results
- • addLazy_eval (lazy addition preserves value)
- • add_correct (normalized addition preserves value)
- • multiplyBinary_correct
- • add_comm_repr (addition commutes at repr level)
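The shape of the verified multiplication, shown over machine integers — the actual multiplyBinary operates on hybrid representations, with doubling realized as a lazy self-addition:

```rust
// Double-and-add: scan the bits of b, doubling an accumulator of a.
// multiplyBinary_correct verifies this scheme over hybrid numbers;
// here u64 stands in for the hybrid type.
fn multiply_binary(a: u64, mut b: u64) -> u64 {
    let mut acc = 0u64;
    let mut shifted = a;
    while b > 0 {
        if b & 1 == 1 {
            acc += shifted; // lazy addition in the hybrid setting
        }
        shifted <<= 1; // doubling = shifted + shifted, also a lazy add
        b >>= 1;
    }
    acc
}
```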
Uniqueness, Density & Nucleus
canonical_eval_injective: canonical forms are unique — equal evaluation implies identical representation. active_levels_bound: support cardinality ≤ log₁₀₀₀(n) + 2. NucleusBridge: normalization is an idempotent closure operator (sheaf-glue extension). Subtle negative result: carryAt_not_pointwise_monotone — carry is NOT monotone per-level.
Key Results
- • canonical_eval_injective (unique representation)
- • active_levels_bound (≤ log₁₀₀₀(n) + 2 levels)
- • density_upper_bound
- • normalize_is_closure_operator (idempotent)
- • carryAt_not_pointwise_monotone (negative result)
Benchmark: Dense Regime (E1–E3)
Establishing the baseline: HZ loses in the dense regime, as expected. E1 (modular exponentiation): 5–48x slower than GMP. E2 (polynomial multiplication): millions of times slower. E3 (random integer density): ρ ≈ 0.27 for random numbers — dense. The proved level-count bound holds with significant slack. These results confirm the theory: HZ advantage requires sparse inputs.
Key Results
- • E1: modular exponentiation 5–48x slower than GMP
- • E2: polynomial multiplication millions of times slower
- • E3: random density ρ ≈ 0.27 (dense regime)
- • Level-count bound holds with slack
- • Confirms: HZ advantage requires sparse inputs
Benchmark: Sparse Regime (E4–E7)
The core positive result. E4 (sparse addition): 15.2x at 100k bits/ρ=10⁻⁵, 55.4x at 1M bits/ρ=10⁻⁵. GMP timing is flat (~6,200 ns at 1M) because it scans every word; HZ timing scales only with K. Scaling factor 2.4–3.6x per 10x input size. E5 (lazy accumulation): 50–593x faster than eager for 1000 sparse additions. E7 (crossover): ρ* ≈ 5.6×10⁻³ at 100k, ρ* ≈ 3.2×10⁻² at 1M — advantage regime expands 6x with input size.
Key Results
- • E4: 15.2x speedup at 100k/ρ=10⁻⁵, 55.4x at 1M/ρ=10⁻⁵
- • GMP flat at ~6,200 ns (scans all words); HZ scales with K only
- • Scaling factor 2.4–3.6x per 10x increase in N
- • E5: lazy accumulation 50–593x faster than eager
- • E7: crossover regime expands 6x from 100k to 1M bits
