Guiding Diffusion models with Human-in-the-Loop

Project Statement
One statement: Training and guiding diffusion models to generate images according to user guidance, provided as text, an image, or both.

Generative image models are a huge leap, but user-controlled generation is still not fine-grained when a human is in the feedback loop. One approach I want to build is guiding the model through embeddings of human feedback. For instance, a text prompt can be encoded by a CLIP-like model, and how well the generation matches the prompt can then be measured geometrically via embedding distances.

The user enters “monkey” → the model generates a monkey
The user prompts “on a hill-top” → CLIP steers the generation toward hill-like visuals, using the embedding distance as the loss metric → the user gives more feedback → the loop repeats
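The embedding-distance idea above can be sketched as a small guidance loss. This is a minimal illustration with toy vectors standing in for real CLIP embeddings; the function name and the 4-d placeholder embeddings are assumptions for the sketch, not the project's actual implementation.

```python
import numpy as np

def clip_guidance_loss(image_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """Cosine distance between an image embedding and a text embedding.

    Lower is better: 0 means the generation points in exactly the
    prompt's direction, 2 means it points the opposite way.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return float(1.0 - image_emb @ text_emb)

# Toy 4-d "embeddings" standing in for real CLIP outputs.
prompt_emb = np.array([1.0, 0.0, 0.0, 0.0])
good_gen = np.array([0.9, 0.1, 0.0, 0.0])  # close to the prompt direction
bad_gen = np.array([0.0, 0.0, 1.0, 0.0])   # orthogonal to it

print(clip_guidance_loss(good_gen, prompt_emb))  # small: close match
print(clip_guidance_loss(bad_gen, prompt_emb))   # 1.0: orthogonal
```

In a real pipeline the two vectors would come from CLIP's image and text encoders, and the gradient of this loss with respect to the image would be what steers the denoising.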

Implementing a CLIP encoder for the guidance loss.

Amit Singh (metamyth)
Applied AI, Paperplane Communications
Discord: metamyth #8558
USDC Wallet Address: 0x0159af752e0220ed3eef439bef36f982cc0a6fbf


Interesting! Do you mean doing CLIP guidance but with user feedback added somehow? How do you plan to add the user feedback for guidance? This made me think of ControlNet / Prompt-to-Prompt.

Hey, thanks! I’m thinking of the user adding feedback on the go: generate a sample → take user input → apply that as a transform on the image.

Would love to know more about the methods you referred to.

Do you mean guiding it with a technique like CLIP-Guided-Diffusion (a Hugging Face Space by EleutherAI)?

I’m curious if it could be made into a continuous process, where the human can change the guiding prompt or image at any point as the diffusion is happening

Yes, that would need human input during the model's denoising steps: CLIP guidance, but interactive.