One statement: Training and guiding diffusion models to generate images according to user guidance, via text, image, or both.
Generative image models are a huge leap, but user-controlled generation is still not fine-grained when a human is in the feedback loop. One approach I want to build on is guiding the model through embeddings of human feedback. For instance, a text prompt passed to the model can be encoded by a CLIP-like model, and the accuracy of the generation relative to the prompt can then be measured geometrically via embedding distances.
The user enters “monkey” → the model generates a monkey
The user adds “on a hill-top” → CLIP scores the generation against the prompt, with the embedding distance as the loss → the user gives further feedback → the loop repeats
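A minimal sketch of that guidance loss, assuming PyTorch and the Hugging Face `transformers` CLIP implementation; the checkpoint name and the helper name `clip_guidance_loss` are illustrative choices, not fixed decisions:

```python
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

# Illustrative checkpoint; any CLIP-like encoder with image/text towers works.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_guidance_loss(image: torch.Tensor, prompt: str) -> torch.Tensor:
    """1 - cosine similarity between CLIP embeddings of image and prompt.

    `image` is a (B, 3, 224, 224) tensor already resized/normalized for CLIP.
    Gradients flow through the image branch so a sampler can steer on it.
    """
    text_inputs = processor(text=[prompt], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = F.normalize(model.get_text_features(**text_inputs), dim=-1)
    image_emb = F.normalize(model.get_image_features(pixel_values=image), dim=-1)
    # Geometric distance on the unit sphere: small when generation matches prompt.
    return 1.0 - (image_emb * text_emb).sum(dim=-1).mean()
```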
First milestone: implementing the CLIP encoder and guidance loss above, then wiring them into the sampling loop sketched below.
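A hedged sketch of that wiring, under the assumption of a classifier-guidance-style sampler; `denoise_step` is a hypothetical stand-in for the diffusion sampler's per-step denoiser, and the guidance scale is an illustrative value:

```python
# Hypothetical wiring of the feedback loop: at each denoising step, the
# gradient of the CLIP loss nudges the current sample toward the prompt.
GUIDANCE_SCALE = 100.0  # illustrative; tuned per model and sampler

def guided_step(x_t: torch.Tensor, t: int, prompt: str) -> torch.Tensor:
    x_t = x_t.detach().requires_grad_(True)
    x0_pred = denoise_step(x_t, t)  # placeholder: sampler's clean-image estimate
    loss = clip_guidance_loss(x0_pred, prompt)
    grad = torch.autograd.grad(loss, x_t)[0]
    # Move against the loss gradient so the next step better matches the prompt.
    return (x_t - GUIDANCE_SCALE * grad).detach()
```

A user-feedback round then just swaps in a new prompt (or a CLIP image embedding of a reference image, for image-guided generation) and continues the loop.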
Amit Singh (metamyth)
Applied AI, Paperplane Communications
Discord: metamyth#8558
USDC Wallet Address: 0x0159af752e0220ed3eef439bef36f982cc0a6fbf