Contract Sentinel: AI-powered smart contract auditing

Name of Project: Contract Sentinel

Proposal in one sentence:

Build an application that uses AI to audit smart contracts and help ensure their security.

Description of the project and what problem it is solving:

Smart contracts are prone to errors and vulnerabilities, which can have serious consequences if they are not detected and corrected. To keep your contracts safe, it is essential to identify and fix any vulnerabilities before deploying them to the blockchain. The app will be trained to recognize the security vulnerabilities and issues commonly found in smart contracts. Once the model is trained, you can use the app to analyze new or existing smart contracts by providing the contract code and any related information. The model will use this data to identify potential security vulnerabilities and recommend ways to address or mitigate them. For the proof of concept, the app could leverage ChatGPT.

Grant Deliverables:

App using ChatGPT

Squad

Squad Lead: Felipe

Twitter handle: @AFelipePedreros
Discord handle: felipe2894#8319

Nice! I heard that while ChatGPT can produce amazing text, what it says isn’t always factual. Do you have some ideas on how to avoid this issue?

Hi! You can make small adjustments to ChatGPT to improve its performance in the PoC. Later, to obtain more accurate and consistent performance, it would be necessary to train GPT-3 specifically on auditing smart contracts.

First questions that come to mind (no pressure to answer):

A) How will the dataset be obtained or built?

B) If ChatGPT is used specifically, will this project upstream/provide feedback to the ChatGPT (RLHF) system itself?

There are some GitHub repositories that collect smart contracts tagged with vulnerabilities, like not-so-smart-contracts and smartbugs-curated. ConsenSys also has some good examples of attacks. But I think for our next deliverables, it would be better to build our own dataset. It would be a great task to spread the love and involve some security experts to help us find more vulnerable smart contracts and/or create new ones.

I’m not sure what you’re asking in your second question. Are you asking whether ChatGPT is getting feedback while we’re using it in our project?

Cool concept @Felipe! Thanks for posting. Couple of misc thoughts/questions

In addition to these repos with explicit vulnerabilities, you could use static analysis tools like slither to programmatically generate a ton of “labeled” data. List of some tools here
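
A minimal sketch of the labeling idea above, in Python. It assumes Slither was run with its JSON output option and that each finding appears under `results.detectors` with `check` and `impact` fields. The sample report, the `labels_from_report` helper, and the file name `Example.sol` are hypothetical and only illustrate the shape of the data.

```python
import json

# Hypothetical excerpt of a Slither JSON report (shape assumed, values invented).
SAMPLE_REPORT = """
{
  "success": true,
  "results": {
    "detectors": [
      {"check": "reentrancy-eth", "impact": "High", "confidence": "Medium"},
      {"check": "tx-origin", "impact": "Medium", "confidence": "Medium"},
      {"check": "reentrancy-eth", "impact": "High", "confidence": "Medium"}
    ]
  }
}
"""

# Assumed ordering of impact levels; unknown levels are treated as lowest.
IMPACT_ORDER = {"Informational": 0, "Low": 1, "Medium": 2, "High": 3}


def labels_from_report(report_text, min_impact="Medium"):
    """Return the sorted, de-duplicated detector names at or above min_impact."""
    report = json.loads(report_text)
    findings = report.get("results", {}).get("detectors", [])
    labels = {
        finding["check"]
        for finding in findings
        if IMPACT_ORDER.get(finding.get("impact"), 0) >= IMPACT_ORDER[min_impact]
    }
    return sorted(labels)


# One row of a "labeled" dataset: contract name plus the detectors that fired.
dataset_row = {"contract": "Example.sol", "labels": labels_from_report(SAMPLE_REPORT)}
print(dataset_row)
```

Labels produced this way inherit the tool's false positives and false negatives, so they are probably best treated as noisy labels rather than ground truth.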

Natural language is one way to approach this, but you could also use compiled bytecode as the input. This could lend itself nicely to training a CNN or recurrent model.
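
As a rough illustration of the bytecode idea, here is a Python sketch that turns hex-encoded bytecode into a fixed-length sequence of integers, the kind of input a CNN or recurrent model could consume. The function name `bytecode_to_sequence`, the sequence length, and the padding scheme are assumptions made for the example, not part of the proposal.

```python
def bytecode_to_sequence(hex_bytecode, length=16, pad_value=0):
    """Convert hex-encoded bytecode into a fixed-length list of integers in 0..255."""
    cleaned = hex_bytecode.strip()
    if cleaned.startswith(("0x", "0X")):
        cleaned = cleaned[2:]
    values = list(bytes.fromhex(cleaned))
    # Truncate long inputs, then right-pad short ones to the target length.
    values = values[:length]
    return values + [pad_value] * (length - len(values))


# 60 80 60 40 52 decodes to PUSH1 0x80, PUSH1 0x40, MSTORE.
print(bytecode_to_sequence("0x6080604052", length=8))
# prints [96, 128, 96, 64, 82, 0, 0, 0]
```

Note that padding with 0 collides with the STOP opcode (0x00), so a real pipeline might reserve a separate padding token such as 256.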

and +1 on being cautious with “creative” outputs

@devin, that is a great idea. I hadn’t thought about using bytecode as the input. Thank you for the suggestion and for sharing the tools in the link. You are both correct that being factual in the outputs is key for this product.

This is a very interesting idea. When I was at ETH Barcelona, they mentioned AI and smart contract auditing. It could be a real shift and help avoid smart contract hacks, even if only by some percentage.

A GPT model is likely not the right tool for this job. These are generative models that work stochastically: they do not actually keep track of the state of variables or chain data, but simply return the most likely next tokens based on statistical heuristics.

In fact, most of the models that have generated code have relied on the generative model for only part of it: they sample from the generative model multiple times and pass the outputs into traditional code analysis tools.
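
The sample-then-check pattern described above can be sketched in Python as follows. Everything here is a stand-in: `fake_sampler` returns canned strings instead of calling a real generative model, and `passes_static_check` uses Python's own parser as a placeholder for a traditional analysis tool.

```python
import ast

# Hypothetical canned outputs standing in for samples from a generative model.
CANNED = [
    "def add(a, b): return a + b",
    "def add(a, b) return a + b",   # syntax error: missing colon
    "def add(a, b): return a - b",  # parses, but the logic is wrong
]


def fake_sampler(i):
    """Stand-in for drawing the i-th sample from a generative model."""
    return CANNED[i % len(CANNED)]


def passes_static_check(source):
    """Stand-in for a traditional analysis tool: accept only code that parses."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def sample_and_filter(sampler, n_samples, check):
    """Draw n_samples candidates and keep only those that pass the check."""
    candidates = [sampler(i) for i in range(n_samples)]
    return [c for c in candidates if check(c)]


kept = sample_and_filter(fake_sampler, 3, passes_static_check)
print(len(kept))  # prints 2
```

The third candidate survives even though it subtracts instead of adding, which shows why a syntax-level filter alone is not enough. A stronger check, such as running tests or a dedicated analyzer, would be needed to reject it.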

Compare the symbolic methods of SAT solvers, decision trees, and theorem provers, which are, for example, how Wolfram Alpha is able to operate on math concepts, with how something like GPT-3 consistently fails math problems.

@endomorphosis this is great feedback. Do you have some resources I could take a look at?

Competition-Level Code Generation with AlphaCode

Barking up the Wrong Tree: Correctness & Debugging

Thank you @endomorphosis. I will take the time to review it.