Apoth3osis
<_HOME/ROSETTA/>COLLABORATIVE_PROVING

Human / AI Collaborative Theorem Proving

Context engineering for tool-mediated LLM assistance in formal mathematics. Transform probabilistic AI exploration into verified, auditable proof artifacts through structured interaction with Lean 4.

CORE PRINCIPLE

The LLM becomes a guided search process; Lean becomes the acceptance boundary; and audits turn correctness into a reviewable ledger.

>LEAN_AI_DOJO

Extending LeanDojo for Lean 4

Building on LeanDojo (Yang et al., NeurIPS 2023), our lean_ai_dojo toolkit provides LLMs with programmatic access to Lean 4 through a unified API combining automated tactic execution, retrieval-augmented premise selection, and proof state management.

LLM AGENT
Claude, GPT, etc.
Tool Calls
UNIFIED DOJO API
ProofSession
Management
BudgetMgmt
time/calls
AuditTrail
verification logs
LeanAdapter
run_tactic
build_proof
check_axioms
SolverCascade
Tiered
Parallel
Early exit
RetrievalPipeline
FAISS (dense)
BM25 (sparse)
RRF fusion
LEAN 4 + LAKE ENVIRONMENT
Mathlib 4
~120k thms
HeytingLean
custom
Project libs
user code
LEAN 4 NATIVE
Targets Lean 4 with Lake integration
SYNCHRONOUS API
Tool use for real-time workflows
RETRIEVAL-AUGMENTED
FAISS + BM25 + RRF fusion
GUARDRAILS
Strict verification boundaries
>UNIFIED_DOJO_API

Tool-Mediated Proof Development

The model does not "write a proof" as free-form text. It proposes tactic actions through structured tools, and every action is executed inside a verified Lean session.

PROOF CONSTRUCTION
begin_proof()ProofSession
apply_tactic()TacticResult
undo_tactic()bool
try_auto_solve()AutoSolveResult
finalize_proof()FinalProof
STATE INSPECTION
get_current_goal()str
get_all_goals()List[str]
get_hypotheses()List[Hyp]
is_proof_complete()bool
get_proof_script()str
RETRIEVAL
get_relevant_premises()List[Premise]
get_suggested_tactics()List[str]
get_similar_proofs()List[Proof]
~120k mathlib + ~500 HeytingLean theorems indexed
VERIFICATION
build_proof()BuildResult
get_axioms()List[str]
Kernel verification + axiom audit
>SOLVER_CASCADE

Tiered Parallel Tactic Execution

Automated discharge of routine goals through tiered solver cascades with early termination. Within each tier, tactics run in parallel; the first success terminates remaining attempts.

1
TRIVIAL
rfl, trivial, decide
2
AUTOMATION
simp, ring, omega, linarith, norm_num
3
LIBRARY
exact <premise>, apply <premise>
4
HEAVY (guarded)
aesop, decide, native_decide
>SEMANTIC_EMBEDDINGS

Beyond Lexical Similarity

The core challenge in proof retrieval: embeddings must capture semantic structure, not just textual similarity. A proof about commutativity should be close to related algebraic lemmas, even with different variable names and surface syntax. We evaluate embeddings by their ability to retrieve actual dependencies—not just visually similar proofs.

MULTI-STAGE EXTRACTION PIPELINE
lean

Lean Expr-derived AST contexts. Full type information preserved. Path contexts from value tree with type fallback.

lambda_view

Canonicalized operator-level view. Variables normalized to VAR, module prefixes stripped. Captures computational "shape" invariant to naming.

c

C function AST from Lean-emitted code. Validates that semantic structure survives compilation to executable form.

EVALUATION PRINCIPLE

Embeddings are evaluated by dependency retrieval: can the embedding of theorem T retrieve the lemmas that T actually uses in its proof? This tests genuine semantic understanding, not surface-level pattern matching. We measure MRR (Mean Reciprocal Rank) and hit@k across the full HeytingLean corpus.

NUCLEUS QUOTIENT (R-SIGNATURE)

Two declarations are R-equivalent if they have the same lambda_view canonical form. This implements a computable quotient: orbit members share representations in the "R-quotiented" embedding space.

R_signature(decl) = hash(lambda_view(decl))
HYBRID RETRIEVAL

Vector similarity alone misses exact matches. We fuse dense (FAISS) with sparse (BM25) retrieval using Reciprocal Rank Fusion, with OOD warnings when lexical coverage is low.

score = RRF(dense_rank, sparse_rank)
>GUARDRAILS

Verification Boundaries

Because model output is probabilistic, guardrails are non-negotiable. If something is proven through this system, it is proven in a way that can be audited, replayed, and independently verified.

BANNED PATTERNS
sorry
admit
native_decide (unguarded)
unsafe
sorryAx, lcProof, lcUnreachable
ACCEPTABLE AXIOMS
propext (propositional ext)
Quot.sound (quotient soundness)
Classical.choice (classical logic)
Nonstandard axioms trigger warnings
BUDGET CONTROLS
Total time: 300s per proof
LLM calls: 20 max per session
Per-tactic timeout: 30-120s
>PROOF2VEC

Proof Extraction for ML

Extract proof structures into machine-readable formats for training, similarity search, and visualization. The proof2vec pipeline supports AST, telescope, and Lambda Core representations.

// proof2vec_extract
{
  "name": "Reentry.idempotent",
  "module": "HeytingLean.LoF.Nucleus",
  "type": { "node": "forallE", ... },
  "value": { "node": "lam", ... },
  "lambda_core_type_tree": { ... },
  "lambda_core_value_tree": { ... },
  "consts_type": ["Reentry", "PrimaryAlgebra"],
  "consts_value": ["rfl", "Eq.mpr"],
  "type_pretty": "... (R : Reentry α) ...",
  "node_count_type": 42,
  "node_count_value": 156
}
// proof2vec_telescope
{
  "decl": "Reentry.idempotent",
  "binders": [
    { "binder_info": "implicit",
      "name": "α",
      "type_pretty": "Type u" },
    { "binder_info": "instImplicit",
      "name": "inst",
      "type_pretty": "PrimaryAlgebra α" },
    ...
  ],
  "target_pretty": "R (R a) = R a",
  "is_prop": true
}
>VISUALIZATION

Proof Latent Space

Diagrams as cognitive leverage. The Heyting Viz dashboard displays Lean declarations embedded in 2D/3D latent space via UMAP, enabling humans to identify structural patterns—trust bottlenecks, reuse opportunities, isolated clusters—invisible in raw source code.