Prompt Extend: Extending Stable Diffusion prompts with suitable style cues for better image generation

Name of Project: Prompt Extend

Proposal in one sentence: Text generation model to extend Stable Diffusion prompts with suitable style cues.

Description of the project and what problem it is solving:

To generate beautiful images, currently available diffusion models usually require complex prompts with additional style cues.

So I’ve made a text generation model that helps with prompt engineering by expanding on the main idea of a prompt and generating suitable style cues to append to it.

Example:

You can play with it on the HuggingFace Space. Here’s the GitHub repo for the project, and I’ve also uploaded the model to the HuggingFace Hub. The project is fully open-source.
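To make the idea concrete, here is a hypothetical sketch of what prompt extension does (the cue table and `extend_prompt` function below are invented for illustration; the actual project uses a trained text generation model, not a lookup table):

```python
# Illustrative sketch of what Prompt Extend does: take a plain prompt
# and append suitable style cues. The real project generates the cues
# with a trained language model; this toy version draws from a fixed,
# made-up cue table instead.
STYLE_CUES = {  # hypothetical lookup, for illustration only
    "portrait": ["highly detailed", "sharp focus", "studio lighting"],
    "landscape": ["matte painting", "artstation", "cinematic lighting"],
}

def extend_prompt(prompt: str, theme: str = "portrait") -> str:
    """Append style cues for the given theme to a base prompt."""
    cues = STYLE_CUES.get(theme, [])
    return ", ".join([prompt] + cues)

print(extend_prompt("a cat sitting on a windowsill"))
# "a cat sitting on a windowsill, highly detailed, sharp focus, studio lighting"
```

The real model produces cues conditioned on the whole prompt text rather than a fixed category, which is what makes the suggestions context-aware.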

Grant Deliverables:

  • Scale up the model architecture and try out different techniques for improvement.
  • Experiment with fine-tuning currently available pre-trained text models on the prompts dataset and compare their generated outputs with the current from-scratch approach (maybe try a mix of both?).
  • Add this as a custom pipeline to the diffusers library so that it can be used directly for image generation.

Round 7 deliverables completed:

Increased the training dataset from 80k prompts to ~2 million prompts and made improvements to the tokenizer and the model, leading to much better, more context-aware style-cue suggestions.

Squad

Partho Das. So far, it is a solo project.

  • Twitter handle: daspartho_

  • Discord handle: daspartho#3367

  • ETH mainnet wallet address for potential funds: 0xb70003E35ec3368c1B1BA82aa64C3687A730e107

Grants for the project will help me to develop this further.

4 Likes

Interesting. This might be a basic q but how do you know that the prompt generated is better for stable diffusion? Do you have a dataset that takes in a simple prompt and the resulting good prompt?

In general, adding related style cues to the prompt leads to better image generation; it is more of a prompt-engineering trick. If you look around on Lexica, you’ll find almost all prompts have style cues added to them. It doesn’t guarantee improvement, but it generally makes the generated image better.

1 Like

I recently downloaded the DiffusionDB 2M images onto S3 with the intent to use them for retrieval-augmented diffusion. Do you need access to them, perhaps to retrain your model based on scoring for semantic similarity and aesthetics?

2 Likes

Hey!
I was thinking of trying something like this; it would be interesting to try. Can we move to Discord DMs?
My Discord: daspartho#3367

I highly relate to the idea you are working on; I faced the same problem of not being able to create a good prompt.

Just a quick suggestion: you could add categories and themes that would set the environment or theme the image is based on. This might help in tailoring the prompt more to the user’s preference.

1 Like

Right now, the closest thing you can do with the current setup is to add the theme you want as a style cue (e.g. tokyo street, cyberpunk) and then pass this through the model. It will lead to style-cue generations related to the theme.
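As a hypothetical illustration of this seeding trick (the continuation table and `seed_and_extend` function are invented for illustration; the real model is a trained generator that continues whatever text it is given):

```python
# Toy sketch of theme seeding: appending a theme cue to the prompt
# steers which style cues the generator continues with. The table of
# continuations below is made up for illustration only.
THEME_CONTINUATIONS = {
    "cyberpunk": ["neon lights", "dystopian", "blade runner style"],
    "tokyo street": ["rain-slicked pavement", "lanterns", "night photography"],
}

def seed_and_extend(prompt: str, theme: str) -> str:
    """Append the theme as a style cue, then 'continue' with related cues."""
    seeded = f"{prompt}, {theme}"
    return ", ".join([seeded] + THEME_CONTINUATIONS.get(theme, []))

print(seed_and_extend("a lone samurai", "cyberpunk"))
# "a lone samurai, cyberpunk, neon lights, dystopian, blade runner style"
```

Because the trained model conditions on the seeded text, the same mechanism works for any theme, not just ones from a fixed list.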

I’ll look into adding this to the UI directly so that folks can choose from a list of themes. Thank you for your suggestion!

1 Like

Nice! I was implementing RDM for diffusers to test in my project too. Is it as good as Stable Diffusion for you?

And it kind of reminds me of diffusers/examples/community at main · huggingface/diffusers · GitHub. Looking forward to your project!

1 Like

First of all, I’d like to say that every time I generate an image, I use Prompt Extend. It’s super useful to me. It would be cool if there were a taxonomy of generated style cues, or general themes I could choose from, and then it would generate style cues under that theme. It would reduce the number of times I have to use it, since it’s just random.

1 Like

First of all, I’d like to say that every time I generate an image, I use Prompt Extend. It’s super useful to me.

I’m glad you found it helpful.

It would be cool if there were a taxonomy of generated style cues, or general themes I could choose from, and then it would generate style cues under that theme. It would reduce the number of times I have to use it, since it’s just random.

Yes! As mentioned in the above threads, I’ll look into adding this to the UI to make it more user-friendly.