Lean

The Lean Programming Language and Theorem Prover

Richard Kelley

(Based on slides by Leonardo de Moura)

Lean

Testing vs. Formal Verification

  • Testing: "This program works on these inputs."

  • Formal verification: "This program works on all inputs."

  • Testing can only show the presence of bugs, not their absence.

  • Formal verification can give a mathematical guarantee of correctness.

  • Why have we only recently been able to do formal verification for real-world software?

    • Previously, the cost of writing and checking proofs was too high.

    • Recent advances in proof assistants and AI have made it more feasible.

Lean

Last Month, Something Unexpected Happened

A general-purpose AI, with no special training for theorem proving, converted zlib (a C compression library) to Lean, passed the test suite, and then proved that the code is correct.

Not tested. Proved. For every possible input. lean-zip

"This was not expected to be possible yet."

How?

Lean

What is Lean?

  • A programming language and a proof assistant

  • Same language for code and proofs. No translation layer.

  • Small trusted kernel. Proofs can be exported and independently checked.

  • Lean is implemented in Lean. Very extensible.

  • The math community made Lean their own: >50,000 lines of extensions.

def reverse : List α List α | [] => [] | x :: xs => reverse xs ++ [x] theorem reverse_append (xs ys : List α) : reverse (xs ++ ys) = reverse ys ++ reverse xs := by induction xs with | nil => simp [reverse] | cons x xs ih => simp [reverse, ih] #eval reverse [1, 2, 3]
[3, 2, 1]
Lean

Success Stories

Several in mathematics, but I want to focus on software verification.

Lean

Cedar (AWS)

"We've found Lean to be a great tool for verified software development." — Emina Torlak

Cedar QED comparison

Lean

SymCrypt (Microsoft)

SymCrypt verification with Aeneas

Lean

Veil: Verified Distributed Protocols

  • Built on Lean by Ilya Sergey's group.

  • Combines model checking with full formal proof.

  • Found an inconsistency in a prior formal verification of Rabia consensus that went undetected across two separate tools.

Veil code

Lean

Some Simple Lean

Lean is a full programming language, so you can write code like this:

def x : Nat := 5 def y : Nat := 4 def sum : Nat := x + y #eval sum
9
Lean

More Simple Lean

  • We can also work with Strings, Lists, and more complex data structures.

#eval String.append "it is " (if 1 > 2 then "yes" else "no")
"it is no"
#eval [1, 2, 3].map (fun x => x * x)
[1, 4, 9]
Lean

We can define functions

def factorial : Nat Nat | 0 => 1 | n + 1 => (n + 1) * factorial n #eval factorial 5
120
  • This is recursive function definition.

  • This is also an example of pattern matching: the function behaves differently on 0 and on n + 1.

  • This is equivalent to the more familiar definition using if, but often more concise and easier to read.

Lean

Function definitions can name parameters

def addThree (n : Nat) : Nat := n + 3 #eval addThree 7
10
  • (n : Nat) says that addThree takes a natural number parameter named n.

  • addThree 7 passes 7 as the value of that parameter.

  • Lean function calls use spaces, so we write addThree 7, not addThree(7).

Lean

We can also define structures for more complex data

structure Person where name : String age : Nat def richard : Person := { name := "Richard", age := 42 } #eval richard.age > 30
true
Lean

BUT we can also write proofs!

theorem addition_commutes (a b : Nat) : a + b = b + a := by induction a with | zero => simp | succ a ih => grind #check addition_commutes
addition_commutes (a b : Nat) : a + b = b + a

How does this work? Why does it look like addition_commutes is a function with a weird type?

Lean

Dependent Types

  • In most programming languages, values and types live in separate worlds.

  • In Lean, types can depend on values.

  • That means a type can say more than "this is a list"; it can say "this is a list of length n."

  • It's not obvious at first, but giving types the ability to depend on values opens the door to writing proofs directly into our code.

Lean

Propositions as Types

  • In Lean, propositions are just types.

  • A proof of a proposition is just a value of that type.

  • This is called the "propositions as types" principle, and it's a powerful way to connect programming and proving.

    • Also known as the Curry-Howard correspondence: for every proposition, there is a corresponding type, and for every proof of that proposition, there is a corresponding value of that type.

Lean

Propositions as Types

def odd (n : Nat) : Prop := k, n = 2 * k + 1
  • The type of odd 3 is Prop, a statement that can be true or false.

  • For any n, odd n is the assertion that the number n can be written as 2 * k + 1 for some k - precisely the definition of an odd number.

  • We can't evaluate it: odd returns a Prop, not a Bool.

  • But we can prove it by providing a witness.

example : odd 3 := 1, by decide
Lean

Theorem Proving Is a Game

"You have written my favorite computer game" — Kevin Buzzard

The "game board": you see goals and hypotheses, then apply "moves" (tactics).

theorem square_of_odd_is_odd : odd n odd (n * n) := by intro k₁, e₁ simp [e₁, odd] exists 2 * k₁ * k₁ + 2 * k₁ grind

Each tactic transforms the goal until nothing remains. The kernel checks the final proof term.

Lean

The Kernel Catches Mistakes

example : odd 3 := 2, by decide

You don't need to trust me, or my automation. You only need to trust the small kernel.

An incorrect proof is rejected immediately.

Lean has multiple independent kernels.

You can build your own and submit it to arena.lean-lang.org.

Lean

How do we write Lean?

  • It's a programming language, so you can use any text editor.

  • Most people use VS Code, because of an extension that gives real-time feedback.

Lean

VS Code Extension

  • Real-time feedback: errors, goals, and hints as you type.

  • Goal state panel: see what you need to prove at each tactic step.

  • Hover information: types, docstrings, source locations.

  • Go to definition. Find references. Code actions.

  • 200,000+ installs.

Lean IDE with Claude Code

The IDE is where you collaborate with AI now.

Lean

Examples

Lean

Types and Pattern Matching

inductive Tree (α : Type u) where | leaf | node (left : Tree α) (value : α) (right : Tree α) deriving Inhabited, Repr def Tree.map (f : α β) : Tree α Tree β | .leaf => .leaf | .node left value right => .node (map f left) (f value) (map f right) #eval Tree.map (2*·) (.node .leaf 3 (.node .leaf 5 .leaf))
Tree.node (Tree.leaf) 6 (Tree.node (Tree.leaf) 10 (Tree.leaf))
Lean

Theorems

Three proofs of the same theorem.

theorem Tree.map_id₁ (tree : Tree α) : tree.map id = tree := by induction tree with | leaf => rfl | node l v r ih_l ih_r => simp [map, *] theorem Tree.map_id₂ (tree : Tree α) : tree.map id = tree := by fun_induction Tree.map <;> simp [*] theorem Tree.map_id₃ (tree : Tree α) : tree.map id = tree := by try?
Lean

Theorems

Three proofs of the same theorem.

theorem Tree.map_compose₁ (tree : Tree α) (f : α β) (g : β γ) : (tree.map f).map g = tree.map (g f) := by induction tree with | leaf => rfl | node l v r ih_l ih_r => simp [map, *] theorem Tree.map_compose₂ (tree : Tree α) (f : α β) (g : β γ) : (tree.map f).map g = tree.map (g f) := by fun_induction Tree.map f tree <;> simp [map, *] theorem Tree.map_compose₃ (tree : Tree α) (f : α β) (g : β γ) : (tree.map f).map g = tree.map (g f) := by try?
Lean

Structures

structure Point (α : Type u) where x : α y : α inductive Color where | red | green | blue structure ColorPoint (α : Type u) extends Point α where c : Color def p : ColorPoint Nat := { x := 10, y := 20, c := .red } example : p.x = 10 := rfl example : p.y = 20 := rfl example : p.c = .red := rfl #eval p.x + p.y
30
Lean

Option and do Notation

def find? (p : α Bool) (xs : List α) : Option α := do for x in xs do if not (p x) then continue return x failure #eval find? (· > 5) [1, 3, 5, 7, 10]
some 7
Lean

Example: A Rate Limiter

  • Tiny example, but it has the same shape as real verified systems.

  • Imagine an API service that wants to avoid being overwhelmed by bursts of traffic.

  • The service keeps a small bucket of tokens. Each granted request spends one token.

  • Over time, the bucket refills, but it never grows past a fixed capacity.

  • If a request arrives when no token is available, the limiter rejects it.

  • State: how many tokens are available.

  • Parameters: capacity and refill rate.

  • Input: either .idle or .request.

  • Step function: refill, maybe consume one token, return the new state and whether the request was granted.

Lean

Rate Limiter State

-- State of the rate limiter structure RLState where -- How many tokens are left in the bucket. tokens : Nat deriving Repr
  • structure defines a record type.

  • An RLState value has one field: tokens.

  • Nat means a natural number, so tokens can never be negative.

  • deriving Repr asks Lean to generate a printable representation for debugging and examples.

Lean

Rate Limiter Parameters

-- Parameters structure RLParams where capacity : Nat refill : Nat deriving Repr
  • Parameters are values that configure the state machine.

  • capacity is the maximum number of tokens the limiter can hold.

  • refill is how many tokens are added at each step.

  • Both are Nat, so the model rules out negative capacities and negative refill rates.

Lean

Rate Limiter Inputs

inductive RLInput | idle | request deriving Repr
  • inductive defines a type by listing its possible cases.

  • RLInput has exactly two possible values: .idle and .request.

  • .idle means time passes and the bucket refills, but no request arrives.

  • .request means time passes, the bucket refills, and then the limiter tries to grant one request.

Lean

Rate Limiter Step Function

-- One step of the system def step (p : RLParams) (s : RLState) : RLInput RLState × Bool | .idle => let tokens₁ := min p.capacity (s.tokens + p.refill) ({ tokens := tokens₁ }, false) | .request => let tokens₁ := min p.capacity (s.tokens + p.refill) if _h : 0 < tokens₁ then ({ tokens := tokens₁ - 1 }, true) else ({ tokens := tokens₁ }, false)
  • step is a pure state transition function.

  • It takes parameters, the current state, and one input.

  • It returns the next state plus a Bool: was the request granted?

  • Both branches refill first, capped by capacity.

  • A request succeeds exactly when at least one token is available after refill.

Lean

Testing the Rate Limiter

  • Before proving a theorem, we can still write ordinary concrete tests.

  • In Lean, a test can be an example: a small proposition that Lean must check.

  • rfl proves the equality by reducing both sides to the same value.

  • If the implementation changes and this expected result is no longer true, this slide stops compiling.

-- A few concrete tests, written as examples that Lean checks. section Tests def testParams : RLParams := { capacity := 3, refill := 1 } example : step testParams { tokens := 0 } .idle = ({ tokens := 1 }, false) := by rfl end Tests
Lean

Testing Requests

example : step testParams { tokens := 0 } .request = ({ tokens := 0 }, true) := by rfl example : step testParams { tokens := 3 } .request = ({ tokens := 2 }, true) := by rfl example : step { capacity := 0, refill := 0 } { tokens := 0 } .request = ({ tokens := 0 }, false) := by rfl
  • Empty bucket, normal refill: 0 refills to 1, the request consumes it, and the result is true.

  • Full bucket: 3 + 1 is capped at capacity 3, then one token is consumed.

  • Zero capacity: refill cannot create any token, so the request is rejected with false.

Lean

Invariants

  • An invariant is a fact that should remain true as the system runs.

  • For this rate limiter: the number of tokens should never exceed capacity.

  • In Lean, invariants are written directly in code, in the same file as the implementation.

  • Prop means this is a logical proposition, not a runtime Boolean.

-- Safety invariant: tokens never exceed capacity def Invariant (p : RLParams) (s : RLState) : Prop := s.tokens p.capacity
Lean

Specifications

  • A specification says what the implementation is supposed to mean.

  • Lean lets us express the spec as executable-adjacent code, using the same data types.

  • Later, we can prove that step matches this specification for every input state.

Lean

Invariants and Specifications for the Rate Limiter

  • What facts should remain true no matter how many steps the limiter takes?

  • Can the token count ever be negative? Can it ever exceed capacity?

  • When should a .request be granted?

  • What should happen when the bucket is empty after refill?

  • Which claims are safety invariants, and which claims describe functional correctness?

Lean

canGrant

  • canGrant captures the condition under which a request should succeed.

-- Spec: when should a request be granted? def canGrant (p : RLParams) (s : RLState) : Prop := 0 < min p.capacity (s.tokens + p.refill)
Lean

Invariant Preservation

theorem step_preserves_inv
    (p : RLParams) (s : RLState) (i : RLInput) :
    Invariant p s → Invariant p (step p s i).1
  • For any parameters p, current state s, and input i:

  • If the invariant is true before one step,

  • then the invariant is true after running step.

  • (step p s i).1 means the next state, ignoring the Boolean output.

Lean

Invariant Preservation

-- Invariant is preserved by one step theorem step_preserves_inv (p : RLParams) (s : RLState) (i : RLInput) : Invariant p s Invariant p (step p s i).1 := by intro h unfold step Invariant grind
  • intro h assumes the invariant holds in the starting state.

  • unfold expands the definitions of step and Invariant.

  • grind finishes the arithmetic and case analysis automatically.

Lean

Request Correctness Statement

theorem request_granted_iff
    (p : RLParams) (s : RLState) :
    (step p s .request).2 = true ↔
      0 < min p.capacity (s.tokens + p.refill)
  • This theorem connects the implementation to the specification.

  • The left side is what the implementation returns for a request.

  • The right side is the mathematical condition for granting the request.

  • means "if and only if": both directions must be proved.

Lean

Request Correctness Proof

-- Functional correctness: implementation matches spec theorem request_granted_iff (p : RLParams) (s : RLState) : (step p s .request).2 = true 0 < min p.capacity (s.tokens + p.refill) := by unfold step grind
  • unfold step exposes the refill and request-handling logic.

  • grind splits the relevant cases and proves both directions of the .

Lean

Implementation vs. Specification

  • The implementation computes a concrete answer: (step p s .request).2.

  • The specification states the condition we mean: 0 < min p.capacity (s.tokens + p.refill).

  • To connect them, we need to rule out two kinds of mismatch.

  • If the implementation returns true, the spec condition should hold.

  • If the spec condition holds, the implementation should return true.

Lean

Soundness

  • Soundness is a one-way guarantee: if the implementation returns true, then the specification condition holds.

  • It rules out false positives: returning true when the spec says the request should not be granted.

  • It does not, by itself, say the implementation returns true every time the spec condition holds.

  • The other direction is completeness: if the spec condition holds, then the implementation returns true.

Lean

Request Soundness

-- soundness theorem request_sound (p : RLParams) (s : RLState) : (step p s .request).2 = true 0 < min p.capacity (s.tokens + p.refill) := by unfold step grind
  • If step grants the request, then a token was available after refill.

  • The theorem proves the limiter never grants a request without satisfying the spec condition.

  • This is the left-to-right direction of request_granted_iff.

Lean

Request Completeness

-- completeness theorem request_complete (p : RLParams) (s : RLState) : 0 < min p.capacity (s.tokens + p.refill) (step p s .request).2 = true := by unfold step grind
  • If the spec condition holds, then step grants the request.

  • The theorem proves the limiter does not reject a request when a token is available after refill.

  • This is the right-to-left direction of request_granted_iff.

Lean

Putting Both Directions Together

  • Soundness rules out granting a request when the spec condition is false.

  • Completeness rules out rejecting a request when the spec condition is true.

  • Together, they say the implementation's Boolean result exactly matches the specification.

  • In Lean, the combined statement is the theorem:

theorem request_granted_iff
    (p : RLParams) (s : RLState) :
    (step p s .request).2 = true ↔
      0 < min p.capacity (s.tokens + p.refill)
Lean

Proving the Combined Statement

example (p : RLParams) (s : RLState) : (step p s .request).2 = true 0 < min p.capacity (s.tokens + p.refill) := by constructor · exact request_sound p s · exact request_complete p s
  • constructor splits an goal into its two directions.

  • The first direction is exactly request_sound p s.

  • The second direction is exactly request_complete p s.

Lean

From One Step to Many Steps

  • So far, every theorem has been about one call to step.

  • That proves the behavior of a single transition: one state, one input, one output.

  • Real systems run for many transitions.

  • To model a run of the limiter, we feed it a list of inputs and thread the state through each step.

  • The result should include the final state and the sequence of grant/reject decisions.

Lean

Running the Rate Limiter

-- Run the state machine across a list of inputs. -- Returns the final state plus the list of outputs. def run (p : RLParams) : RLState List RLInput RLState × List Bool | s, [] => (s, []) | s, i :: is => let (s', out) := step p s i let (s'', outs) := run p s' is (s'', out :: outs)
  • The empty input list leaves the state unchanged and produces no outputs.

  • For i :: is, we run one step, then recursively run the remaining inputs.

  • out :: outs keeps the Boolean decisions in the same order as the inputs.

Lean

Final State

-- Sometimes we only care about the final state. def finalState (p : RLParams) (s : RLState) (inputs : List RLInput) : RLState := (run p s inputs).1
  • run returns a pair: the final state and the list of Boolean outputs.

  • (run p s inputs).1 selects the first component of that pair.

  • This helper lets later theorems talk only about the state after a whole run.

  • The output history is still available from run when we need it.

Lean

Invariant Across a Run

theorem run_preserves_inv
    (p : RLParams) (s : RLState) (inputs : List RLInput) :
    Invariant p s → Invariant p (finalState p s inputs)
  • This lifts the one-step invariant theorem to any list of inputs.

  • If the initial state satisfies the invariant,

  • then the final state after running all inputs also satisfies the invariant.

  • The proof follows the structure of the input list.

Lean

Proving the Run Invariant

-- The invariant is preserved across any run. theorem run_preserves_inv (p : RLParams) (s : RLState) (inputs : List RLInput) : Invariant p s Invariant p (finalState p s inputs) := by intro hs induction inputs generalizing s with | nil => simp [finalState, run] exact hs | cons i is ih => simp [finalState, run] apply ih exact step_preserves_inv p s i hs
  • intro hs assumes the invariant holds in the starting state.

  • The empty-list case reduces to the same state, so hs proves it.

  • The nonempty case uses step_preserves_inv for the first input, then the induction hypothesis for the remaining inputs.

Lean

Example: Liskov Substitution

  • Liskov substitution is about contracts.

  • If code expects a rectangle-like thing, it may assume width and height can be changed independently.

  • In Lean, we can write that expectation directly as a contract.

-- A rectangle-like API promises width and height can be set independently. structure RectAPI where Shape : Type width : Shape Nat height : Shape Nat resize : Shape Nat Nat Shape resize_area : s w h, width (resize s w h) * height (resize s w h) = w * h
Lean

Contract vs. Representation

  • RectAPI is the contract.

  • It says what code using a rectangle-like thing is allowed to expect.

  • There is some shape type, some way to read width and height, and some way to resize.

  • The proof field resize_area says resizing to w and h really gives area w * h.

Lean

Contract vs. Representation

  • Rect is the concrete representation.

  • It says what one particular rectangle value stores.

  • Here, a rectangle stores two independent numbers: width and height.

  • rectAPI : RectAPI connects the two:

  • it packages Rect together with operations and a proof that those operations satisfy the contract.

Lean

A Rectangle Satisfies the Contract

structure Rect where width : Nat height : Nat def rectAPI : RectAPI where Shape := Rect width r := r.width height r := r.height resize _ w h := { width := w, height := h } resize_area := by intro s w h rfl
  • After resize r w h, the area is exactly w * h.

  • That is the behavior code using RectAPI is allowed to rely on.

Lean

A Square Does Not

structure Square where side : Nat def squareResize (_ : Square) (w _h : Nat) : Square := { side := w } theorem square_not_rect : ¬ ( s w h, (squareResize s w h).side * (squareResize s w h).side = w * h) := by intro contract have bad := contract { side := 1 } 2 3 simp [squareResize] at bad
  • The square has only one side length, so it cannot independently remember w and h.

  • The theorem asks whether squareResize can satisfy the rectangle contract for every w and h.

  • Lean proves the mismatch using the concrete case w = 2, h = 3: the square area is 4, but the rectangle contract requires 6.

Lean

Liskov Takeaway

  • Mathematically, a square is a rectangle.

  • But a mutable square is not a safe substitute for a mutable rectangle.

  • Subtypes should preserve the behavior their clients are allowed to rely on.

Lean

Concluding Thoughts

Lean

The Trend: AI Writes the Formal Math

  • The workflow is moving from AI as a helper to AI as a proof-writing partner.

  • AI systems increasingly generate candidate theorems, supporting lemmas, and Lean proofs.

  • Lean gives the feedback loop: models can propose, test, repair, and retry against the kernel.

  • Humans still set the goals and judge meaning; the machine can handle more of the formalization and proof search.

Lean

Success Stories: AI

To date, every AI system that solved IMO problems with formal proofs used Lean.

  • AlphaProof (Google DeepMind) — silver medal, IMO 2024

  • Aristotle (Harmonic) — gold medal, IMO 2025

  • SEED Prover (ByteDance) — silver medal, IMO 2025

Three independent teams, all used Lean.

Lean

AI Scoreboard

  • Axiom: 12/12 on Putnam 2025.

  • DeepSeek Prover-V2: 88.9% on MiniF2F-test. Open-source, 671B parameters.

  • Harmonic: Built the Aristotle AI — gold medal, IMO 2025

None of these systems existed in early 2024.

Startups using Lean

Lean

The Kernel as AI Safety Infrastructure

"It's really important with these formal proof assistants that there are no backdoors or exploits you can use to somehow get your certified proof without actually proving it, because reinforcement learning is just so good at finding these backdoors." — Terence Tao

The kernel's soundness is not just a theoretical property. It is AI safety infrastructure.

RL agents will find every shortcut. The kernel is the one thing they can't fake.

Who Watches the Provers? | Comparator | Validating a Lean Proof

Lean

Another Example: Radix - 10 AI Agents, One Weekend

10 AI agents built a verified embedded DSL with proved optimizations in a single weekend.

  • Zero sorry — 52 theorems, all complete

  • 5 verified compiler optimizations

  • Determinism, type safety, memory safety — all proved

  • Interpreter correctness — sound and complete

  • 27 modules, ~7,400 lines of Lean

  • Full agent autonomy - no human wrote any of the code or proofs.

github.com/leodemoura/RadixExperiment

Lean

Where to Go Next

Lean

Other Resources

Lean

Questions?