<_RESEARCH/MOLECULAR_COMPLEXITY

Assembly Theory Program

VERIFIED3 PHASES10 MODULES0 SORRY4 MENTAT CERTSLean 4 + C + Rust

❓

The Central Question

Can molecular complexity be rigorously measured by the minimum number of construction steps required to build a molecule from basic building blocks? Assembly Theory proposes exactly this: the assembly index — the shortest path through an assembly space — as a universal complexity measure. Objects with high assembly index are overwhelmingly unlikely to arise without selection, making this a candidate biosignature detectable by mass spectrometry alone.

Original Researchers

Stuart Walker

Researcher

Assembly Theory mathematical foundations. Syntax tree models, DAG reuse semantics, quotient monotonicity.

Lee Cronin

Regius Professor of Chemistry

University of Glasgow. Pioneer of Assembly Theory as a framework for detecting selection and evolution in molecular systems via mass spectrometry.

School of Chemistry, University of Glasgow. Nature 2023, 2024.

Scientific Context

Assembly Theory addresses a fundamental question in astrobiology and origin-of-life research: how can we distinguish molecules produced by living systems from those arising through random chemistry? The key insight is that complex molecules require many construction steps, and the probability of assembling them decreases exponentially with complexity — unless a selection process is at work.

What formalization reveals: The assembly index equals the DAG join count — the number of distinct intermediate products in the most efficient construction. This is proved as assemblyIndex_eq_dagJoinCount in Lean 4. The tight bounds log₂(size) ≤ AI ≤ size - 1 are proved as dagJoinCount_bounds. Quotient monotonicity ensures the measure is well-behaved under symmetry.

What we built: A complete 10-module Lean 4 formalization plus verified C and safe Rust transpilations. The formalization extends the core theory to hypergraphs, molecular bond graphs, copy number selection, and string permutation spaces — covering the full breadth of the Assembly Theory program.

Layer 1: The Mathematical Ground Truth

An assembly space is a triple (Ω, U, J) where Ω is a set of objects, U ⊆ Ω is the set of primitives (basic building blocks), and J: Ω × Ω → Ω is a binary join operation. Every object is either a primitive or constructed by joining two sub-objects:

Obj.Base(a): a primitive atom from the universe U
Obj.Join(l, r): the join of two sub-objects
size(o): number of primitive leaves in the syntax tree
joinCount(o): number of internal join nodes (without reuse)

The assembly index is defined as the minimum number of distinct join operations needed to construct the object — tracking which intermediates have already been built:

assemblyIndex(o) = dagJoinCount(o) — count distinct joins in the DAG
Lower bound: log₂(size(o)) ≤ assemblyIndex(o)
Upper bound: assemblyIndex(o) ≤ size(o) - 1
Primitive iff zero: assemblyIndex(o) = 0 ↔ o ∈ U

AssemblySpace.lean — Obj inductive type, size, joinCount, structural lemmas

AssemblyIndex.lean — dagJoinCount via Finset tracking, assemblyIndex_eq_dagJoinCount

AssemblyBounds.lean — dagJoinCount_bounds: log₂ lower bound, size-1 upper bound

AssemblyQuotient.lean — quotient_assemblyIndex_le: monotonicity under equivalence

Layer 2: Extensions & Applications

The formalization extends Assembly Theory to several applied domains:

MOLECULAR SPACE

Molecules as bond graphs with atom types and bond multiplicities. The assembly index of a molecule measures the minimum number of distinct bond-forming operations needed to synthesize it — directly measurable by tandem mass spectrometry.

HYPERGRAPH SPACE

Extension from binary join to k-ary hyperedge join. Models chemical reactions with multiple reactants. Preserves all core bounds and monotonicity results.

COPY NUMBER SELECTION

Copy number constraints model biological replication: how many copies of each intermediate are available at each step. Introduces resource-bounded assembly complexity.

STRING PERMUTATION SPACE

Assembly over strings (polymers, DNA sequences). Permutation equivalence classes capture rotational/reflective symmetry. Assembly index on strings models sequence complexity.

Project Phases

PHASE 1VERIFIEDLean 4 Formal Proofs

Complete formalization of Assembly Theory: 10 modules covering assembly spaces, paths, index computation, tight bounds, quotient monotonicity, molecular semantics, hypergraph extension, copy number selection, and string permutation spaces. All theorems proved without sorry/admit.

▸assemblyIndex_eq_dagJoinCount: AI = DAG join count
▸dagJoinCount_bounds: log₂(size) ≤ AI ≤ size - 1
▸quotient_assemblyIndex_le: monotonicity under equivalence
▸size_eq_joinCount_add_one: structural identity

PHASE 2VERIFIEDC & Rust Transpilation

Faithful translation to verified C (gcc -Wall -Wextra -Werror: 0 warnings per unit) and safe Rust (cargo test: 16/16 pass). Syntax tree model, DAG join count with hash-table reuse tracking, and bounds verification functions.

▸gcc -std=c11 -Wall -Wextra -Werror: PASS (per unit)
▸cargo build: 0 warnings
▸cargo test: 16/16 pass
▸Hash-based reuse tracking matches Lean HashSet semantics

PHASE 3PENDINGIPFS Permanent Storage

All artifacts content-addressed and pinned to IPFS. Lean proofs, verified C, safe Rust archives with CIDv1 identifiers. Immutable, globally retrievable via any IPFS gateway.

▸Lean archive: 10 source files (23KB)
▸C archive: 3 translation units (2.6KB)
▸Rust archive: 5 source files + Cargo.toml (2.6KB)
▸SHA-256 content hashes for all archives

Resources

PPCPaper → Proof → Code Page GHGitHub Repository DLLean Proofs Archive (23KB)DLVerified C Archive (2.6KB)DLSafe Rust Archive (2.6KB)

MENTAT Contribution Certificates

← All Research Projects Paper → Proof → Code →