Even_Adder@lemmy.dbzer0.com to Stable Diffusion@lemmy.dbzer0.com · English · 3 months ago

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers

Abstract

We introduce Sana, a text-to-image framework that can efficiently generate images at resolutions up to 4096×4096. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at remarkably fast speed, and it is deployable on a laptop GPU. Core designs include: (1) Deep compression autoencoder: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens. (2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality. (3) Decoder-only text encoder: we replace T5 with a modern, small decoder-only LLM as the text encoder and design complex human instructions with in-context learning to enhance image-text alignment. (4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, together with efficient caption labeling and selection to accelerate convergence. As a result, Sana-0.6B is very competitive with modern giant diffusion models (e.g., Flux-12B), while being 20 times smaller and more than 100 times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024×1024 image. Sana enables content creation at low cost. Code and model will be publicly released.
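Design points (1) and (2) cut cost in complementary ways. The 32× autoencoder reduces the number of tokens, and linear attention makes each token cheaper to attend over.

The pure-Python sketch below illustrates both ideas under stated assumptions. It is not code from the Sana repository, and all function names are hypothetical.

- The patch sizes are assumptions chosen for illustration: 2×2 patches for an 8× AE baseline and 1×1 patches for the 32× AE.
- The ReLU feature map is assumed, following EfficientViT-style linear attention.

The sketch also checks that reordering the computation to O(N) gives the same result as the pairwise O(N²) form.

```python
import random


def latent_tokens(image_size: int, ae_downsample: int, patch_size: int) -> int:
    """Tokens the DiT sees for a square image: (H / (f * p))^2.

    Assumes image_size is divisible by ae_downsample * patch_size.
    """
    side = image_size // (ae_downsample * patch_size)
    return side * side


def relu(x):
    return [max(0.0, v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relu_linear_attention(q, k, v, eps=1e-6):
    """ReLU-kernel linear attention in O(N * d_k * d_v), with no N x N matrix.

    out_i = phi(q_i)^T (sum_j phi(k_j) v_j^T) / (phi(q_i) . sum_j phi(k_j)),
    with phi = ReLU (an assumed kernel). q, k: N rows of length d_k; v: N rows of length d_v.
    """
    d_k, d_v = len(k[0]), len(v[0])
    phi_k = [relu(kj) for kj in k]
    # Summaries shared by every query, computed once over all N tokens.
    kv = [[sum(pk[a] * vj[c] for pk, vj in zip(phi_k, v)) for c in range(d_v)]
          for a in range(d_k)]
    k_sum = [sum(pk[a] for pk in phi_k) for a in range(d_k)]
    out = []
    for qi in q:
        pq = relu(qi)
        denom = dot(pq, k_sum) + eps
        out.append([sum(pq[a] * kv[a][c] for a in range(d_k)) / denom
                    for c in range(d_v)])
    return out


def relu_linear_attention_naive(q, k, v, eps=1e-6):
    """The same quantity computed pairwise (O(N^2)), used only as a check."""
    out = []
    for qi in q:
        w = [dot(relu(qi), relu(kj)) for kj in k]  # N weights per query
        z = sum(w) + eps
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) / z
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    # Token counts: 8x AE + 2x2 patches vs. 32x AE + 1x1 patches (assumed sizes).
    print(latent_tokens(1024, 8, 2))   # 4096
    print(latent_tokens(1024, 32, 1))  # 1024
    print(latent_tokens(4096, 32, 1))  # 16384

    random.seed(0)
    n, d = 8, 4
    q, k, v = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
               for _ in range(3))
    fast = relu_linear_attention(q, k, v)
    slow = relu_linear_attention_naive(q, k, v)
    # Largest absolute difference between the two forms (floating-point noise only).
    print(max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

Under these assumptions, a 1024×1024 image drops from 4096 to 1024 tokens. For vanilla softmax attention, that alone means 16× fewer pairwise scores (4096² / 1024²).

Linear attention removes the N×N score matrix entirely. It precomputes the key-value summary once and then reuses it for every query, which is what keeps its cost growing only linearly at 4096×4096.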

Paper: https://arxiv.org/abs/2410.10629

Code: https://github.com/NVlabs/Sana

Models: https://huggingface.co/collections/Efficient-Large-Model/sana-673efba2a57ed99843f11f9e

Demo: https://nv-sana.mit.edu/

Project Page: https://hanlab.mit.edu/projects/sana

m_f · 3 months ago
It’ll be interesting to see where we go when it’s practical to generate video in real time.