Proposal: Code generation apps

curranjanssens · October 10, 2022, 9:24pm

Name of Project: Code generation application suite

Proposal in one sentence: A collection of code generation apps for quickly creating code from natural language prompts.

Description of the project and what problem it is solving:
An example of one of these code generators can be found here: Words_to_sql - a Hugging Face Space by Curranj. It’s currently the best SQL generator I can find on the internet, and there are a lot of improvements I could make to it. Many people use it to complete their SQL work or homework, and it’d be great to make similar versions for other programming languages.

Grant Deliverables:
Highly configured SQL generation app building off of previous version
Regular expression generator
Python function generator
Code bug explainer

Squad Lead:
LinkedIn: https://www.linkedin.com/in/curranjanssens/
Discord: cjanssens#2932

arthur88 · October 11, 2022, 2:02pm

I really like this idea, a few questions:

How are you going to improve the user experience? Right now there is a lot of competition from Adept, Codex, etc. Since the interface is still the same across them, the difference is going to be the user experience.
What model or architecture will you use to train these apps, or will you use an existing one?
How are you going to fine-tune the model to have an advantage over the others?

curranjanssens · October 11, 2022, 6:21pm

Thanks for the feedback, good questions!

User Experience - I think Adept will create some incredible products, and so will Codex, but neither is immediately accessible and ready to solve problems. Specializing in simple problems allows for a simple UI like the one in the link, which can later be expanded with additional functionality such as a database plug-in.
Model/Architecture - I’ve tried SOTA open-source models and keep looking out for improvements, but the best-quality model I’ve found is OpenAI’s Codex, which is easily accessible through an API as long as you have access.
Fine-tuning / creating an advantage - A well-thought-out prompt gives by far the best ROI, and subsequent gains can be made by fine-tuning the model on mistakes as they come up. Focusing on a specific topic like SQL lets you better understand the optimal language for a task.
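
To illustrate the point about a well-thought-out prompt, here is a minimal sketch of a few-shot prompt builder for SQL generation. The function name `build_sql_prompt`, the comment-style layout, and the trailing `SELECT` primer are all assumptions made for illustration and are not taken from the Words_to_sql app. No model API is called; the function only assembles the text that would be sent.

```python
from typing import List, Tuple


def build_sql_prompt(question: str, examples: List[Tuple[str, str]]) -> str:
    """Assemble a few-shot prompt: an instruction, worked examples, then the new request.

    Hypothetical helper: the layout below is one plausible prompt format,
    not the one used by any particular app.
    """
    lines = ["-- Translate each request into a single SQL query."]
    for request, sql in examples:
        lines.append(f"-- Request: {request}")
        lines.append(sql)
    lines.append(f"-- Request: {question}")
    # End with the start of a query so the model continues with SQL.
    lines.append("SELECT")
    return "\n".join(lines)


prompt = build_sql_prompt(
    "count the users who signed up in 2022",
    [("list all users", "SELECT * FROM users;")],
)
print(prompt)
```

Keeping the examples as a plain list makes it easy to append a corrected example whenever the generator gets a request wrong, before resorting to fine-tuning.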

silentspring30 · October 12, 2022, 8:47am

I like that this project improves on an existing codebase, and you will learn a lot.

My questions would be: Do you have rough time estimates for these deliverables? How many hours are you planning to work on this per day? It might be better to do less.

How do you define "Highly configured SQL generation app building off of previous version"? Perhaps list some smaller features you will implement.

Are you going to create wireframes/prototype and how will you test the UX/UI?

Grant Deliverables:
Highly configured SQL generation app building off of previous version
Regular expression generator
Python function generator
Code bug explainer

Best of luck

curranjanssens · October 12, 2022, 5:12pm

Thanks for replying!

Questions -

Deliverables/Hours?
I plan on dedicating a couple of days a week to work on specific goals, probably starting out by building several of the generators listed (to get accustomed to the APIs for a variety of tasks) and then focusing on improving the best one(s) into a more polished product.
How do you define a Highly configured SQL generation app?
I’ve been thinking of potential ways to improve it. One would be a way to import a spreadsheet/CSV/DB file that could provide additional context to the model behind the scenes. Currently it has to guess variable names, and providing a list of them would give it a lot more information about the data and reduce accidental misspellings. Additionally, a way to visualize changes or a dataset would be helpful.
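
As a rough sketch of the CSV import idea, the snippet below reads only the header row of a CSV file and turns it into a one-line schema hint that could be prepended to a prompt. The helper names `read_column_names` and `schema_comment`, the table name `orders`, and the sample data are hypothetical and exist only for this example.

```python
import csv
import io
from typing import List, TextIO


def read_column_names(csv_file: TextIO) -> List[str]:
    """Return the cleaned column names from the first row of a CSV file."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty list if the file has no rows
    return [name.strip() for name in header if name.strip()]


def schema_comment(table_name: str, columns: List[str]) -> str:
    """Format the columns as a SQL comment to give the model exact names."""
    return f"-- Table {table_name}, columns: {', '.join(columns)}"


# In-memory sample standing in for an uploaded file (made-up data).
sample = io.StringIO("order_id, customer_name ,total\n1,Ada,9.50\n")
columns = read_column_names(sample)
print(schema_comment("orders", columns))
# prints: -- Table orders, columns: order_id, customer_name, total
```

Only the header is read, so the data itself never needs to be sent to the model, which keeps the prompt short.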

Are you going to create wireframes/prototype and how will you test the UX/UI?
I’m planning on making a bunch of smaller apps and then creating wireframes/prototypes before adding additional features, especially if I add more complicated features like data visualization, but most testing will likely be done iteratively

smejak · October 17, 2022, 12:00pm

This is really cool. Are you planning on fine-tuning the models locally? Or are you using some cloud services? Also, are you gathering the data for fine-tuning yourself?

curranjanssens · October 17, 2022, 10:13pm

If I were using an open-source model I would probably use a GPU cloud platform for training, but OpenAI hosts all of that themselves for their models, so I’ll just be using that. For data, I would see what capabilities are missing from the model and create the data myself.
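
One way hand-written corrections could be collected is sketched below, assuming the fine-tuning data is stored as JSON Lines with one prompt/completion pair per line. That file format, the key names, and the helper `corrections_to_jsonl` are assumptions for illustration, and the sample pair is made up.

```python
import json
from typing import Iterable, Tuple


def corrections_to_jsonl(corrections: Iterable[Tuple[str, str]]) -> str:
    """Serialize (request, corrected SQL) pairs as JSON Lines text.

    Assumed format: one JSON object per line with "prompt" and "completion" keys.
    """
    records = ({"prompt": request, "completion": sql} for request, sql in corrections)
    return "\n".join(json.dumps(record) for record in records)


# Made-up example of a request the model got wrong, paired with the fixed query.
mistakes = [
    ("total sales per customer",
     "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;"),
]
print(corrections_to_jsonl(mistakes))
```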

endomorphosis · October 23, 2022, 3:44pm

Have you considered doing this for Solidity contracts?