Proposal: GPT for Proof Search

Name of Project: GPT for Proof Search

Proposal in one sentence: Leverage GPT to enable search using natural language within proof assistants such as ACL2.

Description of the project and what problem it is solving: Proof assistants such as Coq, ACL2s, LEAN, etc. are software similar to integrated developer environments, but useful for writing proofs rather than programs. Such software are used within the specialist formal verification community for mathematical research, compiler development, and the analysis of software. For example, my colleagues and I recently used the ACL2s proof assistant to model and reason about a protocol used by FileCoin and Ethereum, ultimately discovering a vulnerability in the process. The vulnerability was acknowledged by Protocol Labs and the Ethereum Foundation, and awarded a CVE number. This short anecdote illustrates how, when used correctly, proof assistants are powerful tools for reasoning about not only mathematical phenomena but also complex real-world systems such as network protocols.

Unfortunately, proof assistants are very difficult to use, and most practitioners have a graduate degree in computer science or mathematics. One of the most basic challenges in a proof assistant is knowing how to get started. Because these tools are so sophisticated, and typically used by such expert practitioners, there is a dearth of simple tutorials or “hello world” style example-code for novice users to reference. The problem is compounded by poor documentation - like most academic software, proof assistants tend to be poorly documented, if documented at all. Often the best way to learn a proof assistant is to consult with the scientist who created it, or to read the actual source code.

GPT could solve this problem. I propose to train a GPT instance on popular formal methods textbooks and open-source repositories in ACL2, a popular proof assistant I’m personally experienced with. The instance should learn to answer simple natural language queries, look up code examples, and explain concepts. A tool like this would be enormously useful for both students hoping to learn formal methods and also professional software engineers who want to incorporate formal verification into their work-flow or dev-ops pipeline, but lack the time to achieve a graduate-level comprehension of the requisite topics.

Grant Deliverables:

  • Trained GPT instance tailored to the ACL2 proof assistant.
  • Open-source repository of data used to train the instance.
  • Open-source set of benchmark queries the assistant can successfully answer, to be used by future developers training future instances on comparable proof assistants.


Squad Lead: Max von Hippel (myself), Northeastern PhD student in theoretical computer science. Relevant prior work includes RFCNLP project.


I want to see this!
Do you think your project, if successful, could be extended to computational complexity proofs? e.g. finetuning on the text of reduction proofs from NP-complete problems, and given a formal definition of a decision problem, the assistant determines whether it is NP-hard with proof.
Not sure if ACL2 would be the best proof assistant for this (though there is [2205.11700] Modeling Asymptotic Complexity Using ACL2 which uses it for software complexity proofs).

The one question that I have is Why a GPT model?

I do believe that you can distill from a large amount of text, some of the rules behind proofs, but the it seems that the computational primitives that they use, are far from the computational primitives that doing formal proofs use. Why are you using a GPT model instead of say a genetic algorithm mutating an ensemble of decision trees?

Good question! First of all, to be clear, I’m not proposing that we use GPT to write proofs. Proof synthesis is very difficult and GPT is pretty mediocre at it in my experience. All I am suggesting is we use GPT to search existing proofs, i.e., to make it so I can write English-language queries like “show me all arithmetic proofs relating the p-adic representation of a number with the sum of its factors”, and have the system return a list of all the proofs in the known proof library that match the query. (I just made that example up, I have no idea if any interesting theorems exist relating the p-adic representation of a number with the sum of its factors, but hopefully this helps illustrate the idea.) So, why GPT? Because GPT is actually pretty good at this already. I’ve used Algovera successfully to search the Ivy verifier’s database of theorems, and I’ve used GPT3 to search (whatever corpus it was trained on) for Coq proofs before. In contrast, other approaches I’ve used for similar tasks, such as a custom BERT model for extracting FSMs from prose, have been pretty cumbersome and brittle.

Where GPT fails in my experience is in its propensity toward lying. I think for that reason it would be interesting to see if we could train the model to vastly prefer retrieving memorized code over creating new code. IE, penalize the creation of new code blocks, and reward the retrieval (with correct reference link) of existing ones. I am not sure if this is possible but it seems like a reasonable approach to me.

1 Like

Yes, I think this could be extended to arbitrary kinds of proofs, however, I do not think my proposal will actually prove anything for you, only look up existing proofs in some corpus based off your natural-language query.

Why use ACL2? Probably Coq would be better, or LEAN, and I’d be happy to amend my proposal and re-submit w/ one of those in mind. I chose ACL2 because I use it a lot and therefore I’m more qualified to determine if the results the model produces are correct. But if someone with more experience in a more popular language like Coq or LEAN wanted to work with me on this, I’m totally open to that!

Anyone who has used Coq has struggled to find things with the Search command. All I am proposing is that we replace this command with a natural-language chatbot :slight_smile: .

1 Like

Yes, I think Coq would be a better fit for problem complexity/undecidability proofs, with the existence of GitHub - uds-psl/coq-library-complexity
In any case, better work with the tools you are most comfortable with IMO. Showing that it is possible, with ACL2 or any other framework, would be a great starting point for all sorts of fun LLM applications. Will keep an eye on your progress.

1 Like