Welcome to
WARP. This is our little organization for multimodal generative models, focusing on the visual domain. We have been working with generative image models a lot and
will soon work on video models as well. Our main team consists of:
A special thanks to the Huggingface Team for helping to bring our research to Diffusers! (Special thanks to Kashif, Patrick and Sayak!)
Feel free to join our Discord channel!
Models:
Paella
- A simple & straightforward text-conditional image generation model that works on quantized latents.
- More details can be found in the paper, the blog post and the YouTube video.
- Only accessible through GitHub.
Würstchen
- An efficient text-to-image model to train and use for inference. Achieves competetive performance to state-of-the-art methods, while needing only a fraction of the compute.
- More details can be found in the paper.
- Versions: