Human / AI Collaborative Theorem Proving
Context engineering for tool-mediated LLM assistance in formal mathematics. Transform probabilistic AI exploration into verified, auditable proof artifacts through structured interaction with Lean 4.
The LLM becomes a guided search process; Lean becomes the acceptance boundary; and audits turn correctness into a reviewable ledger.
Extending LeanDojo for Lean 4
Building on LeanDojo (Yang et al., NeurIPS 2023), our lean_ai_dojo toolkit provides LLMs with programmatic access to Lean 4 through a unified API combining automated tactic execution, retrieval-augmented premise selection, and proof state management.
Tool-Mediated Proof Development
The model does not "write a proof" as free-form text. It proposes tactic actions through structured tools, and every action is executed inside a verified Lean session.
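The loop above can be sketched in miniature. This is a hypothetical illustration, not the actual lean_ai_dojo API: the names TacticAction, TacticResult, and run_tactic, and the mock acceptance check, are all assumptions standing in for the real round-trip through a Lean session.

```python
from dataclasses import dataclass

@dataclass
class TacticAction:
    goal_id: int   # which open goal the tactic targets
    tactic: str    # tactic text proposed by the model, e.g. "simp [mul_comm]"

@dataclass
class TacticResult:
    success: bool
    remaining_goals: list  # goals still open after execution

def run_tactic(action: TacticAction, goals: list) -> TacticResult:
    """Execute a proposed tactic inside a (mocked) verified Lean session.

    In the real system this round-trips through Lean; the session, not the
    model, decides acceptance. Here acceptance is mocked for illustration.
    """
    accepted = action.tactic in {"rfl", "simp", "ring"}  # mock checker
    if accepted:
        remaining = [g for i, g in enumerate(goals) if i != action.goal_id]
    else:
        remaining = goals  # a rejected tactic leaves the proof state unchanged
    return TacticResult(accepted, remaining)

goals = ["R (R a) = R a"]
res = run_tactic(TacticAction(goal_id=0, tactic="rfl"), goals)
print(res.success, res.remaining_goals)
```

The key design point: the model only ever emits structured actions, and the proof state advances only when the verified session accepts one.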
Tiered Parallel Tactic Execution
Automated discharge of routine goals through tiered solver cascades with early termination. Within each tier, tactics run in parallel; the first success terminates remaining attempts.
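A minimal sketch of the cascade, assuming a stubbed try_tactic in place of real Lean execution; the tier contents shown are illustrative, not the system's actual tactic lists. Within a tier, attempts run concurrently and the first success cancels the rest; only if a whole tier fails does the next (more expensive) tier start.

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

TIERS = [
    ["rfl", "trivial"],         # tier 1: cheap closers
    ["simp", "ring", "omega"],  # tier 2: decision procedures
    ["aesop", "exact?"],        # tier 3: search-based
]

def try_tactic(tactic: str, goal: str) -> bool:
    # stub: pretend only simp closes this goal
    return tactic == "simp"

def tiered_solve(goal: str):
    for tier in TIERS:
        with ThreadPoolExecutor(max_workers=len(tier)) as pool:
            futures = {pool.submit(try_tactic, t, goal): t for t in tier}
            pending = set(futures)
            while pending:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for f in done:
                    if f.result():
                        for p in pending:
                            p.cancel()  # early termination: first success wins
                        return futures[f]
    return None  # no tier discharged the goal

print(tiered_solve("R (R a) = R a"))  # → simp
```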
Beyond Lexical Similarity
The core challenge in proof retrieval: embeddings must capture semantic structure, not just textual similarity. A proof about commutativity should be close to related algebraic lemmas, even with different variable names and surface syntax. We evaluate embeddings by their ability to retrieve actual dependencies—not just visually similar proofs.
Lean Expr-derived AST contexts. Full type information preserved. Path contexts from value tree with type fallback.
Canonicalized operator-level view. Variables normalized to VAR, module prefixes stripped. Captures computational "shape" invariant to naming.
C function AST from Lean-emitted code. Validates that semantic structure survives compilation to executable form.
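The canonicalized operator-level view described above can be sketched as follows. The tokenizer and the prefix-stripping rule here are assumptions for illustration, not the actual implementation; the point is that alpha-variant statements collapse to one "shape".

```python
import re

def strip_module_prefix(name: str) -> str:
    # "HeytingLean.LoF.Reentry" -> "Reentry"
    return name.rsplit(".", 1)[-1]

def lambda_view(term: str, bound_vars: set) -> str:
    # crude tokenizer: dotted identifiers or single non-space characters
    tokens = re.findall(r"[A-Za-z_][\w.]*|\S", term)
    out = []
    for tok in tokens:
        if tok in bound_vars:
            out.append("VAR")                    # normalize bound variables
        elif re.match(r"[A-Za-z_]", tok):
            out.append(strip_module_prefix(tok)) # strip module prefixes
        else:
            out.append(tok)
    return " ".join(out)

# Two alpha-variant statements canonicalize to the same operator-level shape.
a = lambda_view("Nucleus.R (Nucleus.R x) = Nucleus.R x", {"x"})
b = lambda_view("Nucleus.R (Nucleus.R y) = Nucleus.R y", {"y"})
print(a == b, a)  # → True R ( R VAR ) = R VAR
```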
Embeddings are evaluated by dependency retrieval: can the embedding of theorem T retrieve the lemmas that T actually uses in its proof? This tests genuine semantic understanding, not surface-level pattern matching. We measure MRR (Mean Reciprocal Rank) and hit@k across the full HeytingLean corpus.
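The two metrics are standard and easy to state precisely. The toy data below is illustrative, not drawn from the HeytingLean corpus: each query pairs a ranked retrieval list with the set of lemmas the proof actually uses.

```python
def mrr(ranked: list, relevant: set) -> float:
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i  # reciprocal rank of the first true dependency
    return 0.0

def hit_at_k(ranked: list, relevant: set, k: int) -> float:
    return 1.0 if set(ranked[:k]) & relevant else 0.0

# query -> (ranked retrievals, true proof dependencies); toy data
queries = {
    "Reentry.idempotent": (["mul_comm", "Eq.mpr", "rfl"], {"Eq.mpr", "rfl"}),
    "Reentry.mono":       (["le_trans", "mul_comm"],      {"le_refl"}),
}
n = len(queries)
mean_mrr = sum(mrr(r, rel) for r, rel in queries.values()) / n
mean_h1  = sum(hit_at_k(r, rel, 1) for r, rel in queries.values()) / n
print(round(mean_mrr, 2), mean_h1)  # → 0.25 0.0
```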
Two declarations are R-equivalent if they have the same lambda_view canonical form. This implements a computable quotient: orbit members share representations in the "R-quotiented" embedding space.
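The quotient is computable because the canonical form is: group declarations by their lambda_view form and the orbits are the fibers of that map. The canonical() lookup below is a stand-in for the real lambda_view computation, and the declarations are toy examples.

```python
from collections import defaultdict

def canonical(decl: str) -> str:
    # assumption: precomputed canonical forms, keyed per declaration
    forms = {
        "Reentry.idempotent": "R ( R VAR ) = R VAR",
        "Nucleus.idem":       "R ( R VAR ) = R VAR",   # alpha-variant
        "Reentry.mono":       "VAR <= VAR -> R VAR <= R VAR",
    }
    return forms[decl]

def quotient(decls):
    orbits = defaultdict(list)
    for d in decls:
        orbits[canonical(d)].append(d)  # same canonical form => same orbit
    return dict(orbits)

orbits = quotient(["Reentry.idempotent", "Nucleus.idem", "Reentry.mono"])
for form, members in orbits.items():
    print(len(members), form)
```

In the embedding space, every member of an orbit shares the representation computed from the canonical form, which is what "R-quotiented" means operationally.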
Vector similarity alone misses exact matches. We fuse dense (FAISS) with sparse (BM25) retrieval using Reciprocal Rank Fusion, with OOD warnings when lexical coverage is low.
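Reciprocal Rank Fusion itself is a few lines; the rankings and the coverage threshold below are toy assumptions (k=60 is the conventional RRF constant). A document high in either the dense or the sparse list scores well, and exact lexical matches surfaced by BM25 are no longer lost.

```python
def rrf(rankings: list, k: int = 60) -> dict:
    scores = {}
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return dict(sorted(scores.items(), key=lambda kv: -kv[1]))

dense  = ["lemma_A", "lemma_B", "lemma_C"]  # vector-similarity order
sparse = ["lemma_B", "lemma_D"]             # BM25 order; exact-match hits

fused = rrf([dense, sparse])
print(next(iter(fused)))  # lemma_B leads: it ranks high in both lists

# OOD heuristic: warn when few query tokens appear in the sparse index at all.
def lexical_coverage(query_tokens, index_vocab):
    hits = sum(1 for t in query_tokens if t in index_vocab)
    return hits / max(len(query_tokens), 1)

cov = lexical_coverage(["nucleus", "frobnicate"], {"nucleus", "lemma"})
if cov < 0.6:  # threshold is an illustrative assumption
    print("OOD warning: lexical coverage", cov)
```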
Verification Boundaries
Because model output is probabilistic, guardrails are non-negotiable. If something is proven through this system, it is proven in a way that can be audited, replayed, and independently verified.
Proof Extraction for ML
Extract proof structures into machine-readable formats for training, similarity search, and visualization. The proof2vec pipeline supports AST, telescope, and Lambda Core representations.
{
"name": "Reentry.idempotent",
"module": "HeytingLean.LoF.Nucleus",
"type": { "node": "forallE", ... },
"value": { "node": "lam", ... },
"lambda_core_type_tree": { ... },
"lambda_core_value_tree": { ... },
"consts_type": ["Reentry", "PrimaryAlgebra"],
"consts_value": ["rfl", "Eq.mpr"],
"type_pretty": "... (R : Reentry α) ...",
"node_count_type": 42,
"node_count_value": 156
}

{
"decl": "Reentry.idempotent",
"binders": [
{ "binder_info": "implicit",
"name": "α",
"type_pretty": "Type u" },
{ "binder_info": "instImplicit",
"name": "inst",
"type_pretty": "PrimaryAlgebra α" },
...
],
"target_pretty": "R (R a) = R a",
"is_prop": true
}Proof Latent Space
Diagrams as cognitive leverage. The Heyting Viz dashboard displays Lean declarations embedded in 2D/3D latent space via UMAP, enabling humans to identify structural patterns—trust bottlenecks, reuse opportunities, isolated clusters—invisible in raw source code.
