Guiding Diffusion models with Human-in-the-Loop

Project Statement
One statement: Training and guiding diffusion models to generate images according to user guidance, via text, image, or both.

Descriptive
Generative image models are a huge leap, but user-controlled generation is still not fine-grained when a human is kept in the feedback loop. One of the approaches I want to build is guiding the model through embeddings of human feedback. For instance, a text prompt passed to the model can be encoded by a CLIP-like model, and how closely the generation matches the prompt can then be measured geometrically via embedding distances.

The user enters “monkey” → the model generates a monkey
The user adds “on a hill-top” → generation is steered toward hill-like visuals, using the CLIP embedding distance as the loss metric → user gives more feedback … → repeat
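
As a concrete example of the scoring step, here is a minimal sketch that measures how well a generated image matches the current prompt via CLIP embedding cosine similarity. The checkpoint name and the `generate_image` call in the commented loop are assumptions, not part of any existing pipeline.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

# Hypothetical choice of checkpoint; any CLIP-like encoder would do.
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def prompt_fidelity(image, prompt: str) -> float:
    """Cosine similarity between the CLIP embeddings of a PIL image and a prompt."""
    inputs = clip_processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    outputs = clip_model(**inputs)
    img = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    txt = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1).item()

# Hypothetical feedback loop: generate, score, ask the user for a refinement, repeat.
# `generate_image` stands in for whatever diffusion pipeline ends up being used.
# prompt = "monkey"
# while True:
#     image = generate_image(prompt)
#     print("CLIP score:", prompt_fidelity(image, prompt))
#     prompt = prompt + ", " + input("add feedback (e.g. 'on a hill-top'): ")
```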

Deliverables
Implementing a CLIP encoder for guidance loss.
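
A minimal sketch of what that guidance loss could look like, assuming a Hugging Face transformers CLIP model and generated images passed in as a differentiable tensor in [-1, 1] (the normalization constants are CLIP's standard preprocessing statistics):

```python
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPTokenizer

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# CLIP's image preprocessing statistics.
CLIP_MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
CLIP_STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

def encode_prompt(prompt: str) -> torch.Tensor:
    """Normalized CLIP text embedding for the current user prompt."""
    tokens = tokenizer([prompt], padding=True, return_tensors="pt")
    with torch.no_grad():
        emb = clip_model.get_text_features(**tokens)
    return F.normalize(emb, dim=-1)

def clip_guidance_loss(images: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Cosine-distance loss between CLIP image embeddings and a target text embedding.

    `images` is a differentiable (B, 3, H, W) tensor in [-1, 1], e.g. the predicted
    denoised image at the current diffusion step."""
    pixels = F.interpolate(images, size=224, mode="bicubic", align_corners=False)
    pixels = (pixels + 1) / 2  # [-1, 1] -> [0, 1]
    pixels = (pixels - CLIP_MEAN.to(pixels)) / CLIP_STD.to(pixels)
    img_emb = F.normalize(clip_model.get_image_features(pixel_values=pixels), dim=-1)
    return (1.0 - (img_emb * text_emb).sum(dim=-1)).mean()
```

The cosine-distance form keeps the loss differentiable with respect to the image, so its gradient can be used to nudge the sample toward the prompt at each denoising step.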

Squad
Amit Singh (metamyth)
Applied AI, Paperplane Communications
Twitter: https://twitter.com/not_amyth?t=BSpbbssQAIkrRAgg1ZTUIQ&s=09
Discord: metamyth #8558
USDC Wallet Address: 0x0159af752e0220ed3eef439bef36f982cc0a6fbf

Interesting! Do you mean doing CLIP guidance but with user feedback added somehow? How do you plan to add the user feedback for guidance? This kind of made me think of ControlNet / Prompt-to-Prompt.

Hey, thanks! I’m thinking of the user adding feedback on the go: generate a sample → take user input → apply that as a transform on the image.

Would love to know more about the methods you referred to.

Do you mean to guide it using a technique like this: CLIP-Guided-Diffusion, a Hugging Face Space by EleutherAI?

I’m curious if it could be made into a continuous process, where the human can change the guiding prompt or image at any point as the diffusion is happening

Yes, that would need human input during the model’s denoising steps, with CLIP guidance applied using whatever prompt the user has given at that point.
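
To make that concrete, here is a rough sketch of a denoising loop where the conditioning can be swapped between steps. It assumes diffusers-style `unet` and `scheduler` objects, plus placeholder `encode_prompt` / `get_user_prompt` callables; none of these come from an existing implementation.

```python
import torch

@torch.no_grad()
def interactive_sample(unet, scheduler, encode_prompt, get_user_prompt,
                       steps=50, shape=(1, 4, 64, 64), device="cuda"):
    """Denoising loop that re-reads the guiding prompt at every step.

    `unet` and `scheduler` are assumed to follow the diffusers API
    (UNet2DConditionModel / DDIM-style scheduler). `get_user_prompt` returns
    the latest prompt from the human, or None if it has not changed."""
    scheduler.set_timesteps(steps, device=device)
    latents = torch.randn(shape, device=device)
    cond = encode_prompt(get_user_prompt())            # initial conditioning
    for t in scheduler.timesteps:
        new_prompt = get_user_prompt()                 # poll the human in the loop
        if new_prompt is not None:
            cond = encode_prompt(new_prompt)           # swap conditioning mid-trajectory
        noise_pred = unet(latents, t, encoder_hidden_states=cond).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return latents
```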