Assembly Theory Program
The Central Question
Can molecular complexity be rigorously measured by the minimum number of construction steps required to build a molecule from basic building blocks? Assembly Theory proposes exactly this: the assembly index — the shortest path through an assembly space — as a universal complexity measure. Objects with high assembly index are overwhelmingly unlikely to arise without selection, making this a candidate biosignature detectable by mass spectrometry alone.
Original Researchers
School of Chemistry, University of Glasgow. Nature 2023, 2024.
Scientific Context
Assembly Theory addresses a fundamental question in astrobiology and origin-of-life research: how can we distinguish molecules produced by living systems from those arising through random chemistry? The key insight is that complex molecules require many construction steps, and the probability of assembling them decreases exponentially with complexity — unless a selection process is at work.
What formalization reveals: The assembly index equals the DAG join count — the number of distinct intermediate products in the most efficient construction. This is proved as assemblyIndex_eq_dagJoinCount in Lean 4. The tight bounds log₂(size) ≤ AI ≤ size - 1 are proved as dagJoinCount_bounds. Quotient monotonicity ensures the measure is well-behaved under symmetry.
What we built: A complete 10-module Lean 4 formalization plus verified C and safe Rust transpilations. The formalization extends the core theory to hypergraphs, molecular bond graphs, copy number selection, and string permutation spaces — covering the full breadth of the Assembly Theory program.
Layer 1: The Mathematical Ground Truth
An assembly space is a triple (Ω, U, J) where Ω is a set of objects, U ⊆ Ω is the set of primitives (basic building blocks), and J: Ω × Ω → Ω is a binary join operation. Every object is either a primitive or constructed by joining two sub-objects:
- Obj.Base(a): a primitive atom from the universe U
- Obj.Join(l, r): the join of two sub-objects
- size(o): number of primitive leaves in the syntax tree
- joinCount(o): number of internal join nodes (without reuse)
The assembly index is defined as the minimum number of distinct join operations needed to construct the object — tracking which intermediates have already been built:
- assemblyIndex(o) = dagJoinCount(o) — count distinct joins in the DAG
- Lower bound: log₂(size(o)) ≤ assemblyIndex(o)
- Upper bound: assemblyIndex(o) ≤ size(o) - 1
- Primitive iff zero: assemblyIndex(o) = 0 ↔ o ∈ U
AssemblySpace.lean — Obj inductive type, size, joinCount, structural lemmas
AssemblyIndex.lean — dagJoinCount via Finset tracking, assemblyIndex_eq_dagJoinCount
AssemblyBounds.lean — dagJoinCount_bounds: log₂ lower bound, size-1 upper bound
AssemblyQuotient.lean — quotient_assemblyIndex_le: monotonicity under equivalence
Layer 2: Extensions & Applications
The formalization extends Assembly Theory to several applied domains:
MOLECULAR SPACE
Molecules as bond graphs with atom types and bond multiplicities. The assembly index of a molecule measures the minimum number of distinct bond-forming operations needed to synthesize it — directly measurable by tandem mass spectrometry.
HYPERGRAPH SPACE
Extension from binary join to k-ary hyperedge join. Models chemical reactions with multiple reactants. Preserves all core bounds and monotonicity results.
COPY NUMBER SELECTION
Copy number constraints model biological replication: how many copies of each intermediate are available at each step. Introduces resource-bounded assembly complexity.
STRING PERMUTATION SPACE
Assembly over strings (polymers, DNA sequences). Permutation equivalence classes capture rotational/reflective symmetry. Assembly index on strings models sequence complexity.
Project Phases
Complete formalization of Assembly Theory: 10 modules covering assembly spaces, paths, index computation, tight bounds, quotient monotonicity, molecular semantics, hypergraph extension, copy number selection, and string permutation spaces. All theorems proved without sorry/admit.
- ▸assemblyIndex_eq_dagJoinCount: AI = DAG join count
- ▸dagJoinCount_bounds: log₂(size) ≤ AI ≤ size - 1
- ▸quotient_assemblyIndex_le: monotonicity under equivalence
- ▸size_eq_joinCount_add_one: structural identity
Faithful translation to verified C (gcc -Wall -Wextra -Werror: 0 warnings per unit) and safe Rust (cargo test: 16/16 pass). Syntax tree model, DAG join count with hash-table reuse tracking, and bounds verification functions.
- ▸gcc -std=c11 -Wall -Wextra -Werror: PASS (per unit)
- ▸cargo build: 0 warnings
- ▸cargo test: 16/16 pass
- ▸Hash-based reuse tracking matches Lean HashSet semantics
All artifacts content-addressed and pinned to IPFS. Lean proofs, verified C, safe Rust archives with CIDv1 identifiers. Immutable, globally retrievable via any IPFS gateway.
- ▸Lean archive: 10 source files (23KB)
- ▸C archive: 3 translation units (2.6KB)
- ▸Rust archive: 5 source files + Cargo.toml (2.6KB)
- ▸SHA-256 content hashes for all archives
Resources
MENTAT Contribution Certificates
MENTAT Contribution Record
IDEA
Conceptual Contribution
CONTRIBUTION LEVEL: IDEA
Ontological EngineerAssembly Theory: Quantifying Molecular Complexity via Shortest Paths
Contributor
Stuart Walker, Lee Cronin
University of Glasgow
Core conceptual insight: the complexity of a molecule (or any combinatorial object) can be quantified by the minimum number of joining operations needed to construct it from a set of basic building blocks, counting reuse. This assembly index (AI) captures selection and evolution — objects with high AI are statistically unlikely without selection processes, making AI a candidate biosignature for astrobiology.
MENTAT Contribution Record
THEORY
Mathematical Foundation
CONTRIBUTION LEVEL: THEORY
Ontological EngineerAssembly Spaces as Syntax Tree Models with DAG Reuse Semantics
Contributor
Stuart Walker, Lee Cronin
University of Glasgow
Full mathematical framework: (1) Objects as binary syntax trees with primitive leaves, (2) Assembly paths as sequences of join operations, (3) DAG join count as the reuse-aware complexity measure equivalent to assembly index, (4) Quotient spaces preserving monotonicity under symmetry, (5) Extension to hypergraphs, molecular spaces with bond graphs, and string permutation spaces.
Builds Upon
MENTAT Contribution Record
PROOF
Formally Verified
CONTRIBUTION LEVEL: PROOF
Lean 4 Formalization — 10 Modules, Complete Assembly Theory
Contributor
Apoth3osis Labs
R&D Division
Machine-checked Lean 4 formalization of the full Assembly Theory framework. 10 modules covering assembly spaces, paths, index computation, tight bounds, quotient monotonicity, molecular semantics, hypergraph extension, copy number selection, and string permutation spaces. All theorems proved without sorry/admit. Key results: assemblyIndex_eq_dagJoinCount, dagJoinCount_bounds (log₂ lower, size-1 upper), quotient_assemblyIndex_le.
Builds Upon
MENTAT Contribution Record
KERNEL
Computationally Verified
CONTRIBUTION LEVEL: KERNEL
Assembly Theory Verified Kernel — C & Rust Transpilation
Contributor
Apoth3osis Labs
R&D Division
Verified C and safe Rust transpilations of the core Assembly Theory modules. C: gcc -std=c11 -Wall -Wextra -Werror, 0 warnings on each translation unit. Rust: cargo build 0 warnings, cargo test 16/16 pass. Assembly space (Obj type), assembly index (DAG join count with hash tracking), and bounds verification (both upper and lower bounds) faithfully translated.
Builds Upon
Governed by MENTAT-CA-001 v1.0 · March 2026
