Welcome to WARP, our small organization for multimodal generative models with a focus on the visual domain. So far we have worked mainly on generative image models, and we plan to extend this work to video models soon. Our main team consists of:

Special thanks to the Hugging Face team, in particular Kashif, Patrick and Sayak, for helping to bring our research to Diffusers!

Feel free to join our Discord channel!

Models:

Paella

A simple & straightforward text-conditional image generation model that works on quantized latents.
More details can be found in the paper, the blog post and the YouTube video.
Only accessible through GitHub.

Würstchen

A text-to-image model that is efficient both to train and to run at inference time. It achieves performance competitive with state-of-the-art methods while needing only a fraction of the compute.
More details can be found in the paper.
Versions:

v1: Only accessible through GitHub.
v2: Accessible through GitHub and Diffusers.
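
For readers who want to try v2 through Diffusers, the following is a minimal sketch rather than an official recipe. It assumes that `torch` and a recent `diffusers` release are installed, that the checkpoint is published under the model ID `warp-ai/wuerstchen`, and that a CUDA GPU is available. The function name `generate_images`, the default resolution and the guidance value are illustrative choices, not settings taken from this page.

```python
def generate_images(
    prompt,
    model_id="warp-ai/wuerstchen",  # assumed Hub ID of the v2 checkpoint
    width=1024,
    height=1024,
    num_images=1,
    device="cuda",
):
    """Generate images for a text prompt with Würstchen v2 via Diffusers."""
    # Heavy dependencies are imported lazily so that this module can be
    # imported even on machines without torch or diffusers installed.
    import torch
    from diffusers import AutoPipelineForText2Image

    # Half precision keeps GPU memory use low; fall back to float32 on CPU.
    dtype = torch.float16 if device == "cuda" else torch.float32

    # AutoPipelineForText2Image selects the matching text-to-image pipeline
    # class for the checkpoint and downloads the weights on first use.
    pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    result = pipe(
        prompt,
        width=width,
        height=height,
        prior_guidance_scale=4.0,  # illustrative value, tune as needed
        num_images_per_prompt=num_images,
    )
    # The pipeline returns PIL images by default.
    return result.images
```

Calling `generate_images("An astronaut riding a horse, watercolor")[0].save("sample.png")` would then write the first generated image to disk. The first call downloads the model weights, so it can take a while.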