Real-time generation & editing on iPhone 17 Pro — no cloud, fully on-device.
Text-to-Image generation results
Text-guided image editing results
In this paper, we propose DreamLite, a compact unified on-device diffusion model (0.39B) that supports both T2I generation and text-guided image editing within a single network. DreamLite is built on a pruned mobile U-Net backbone and unifies conditioning through in-context spatial concatenation in the latent space. To stabilize the training of this compact model, we introduce a Task-Progressive Joint (TPJ) pretraining strategy that sequentially targets the T2I, editing, and joint tasks. After SFT and RL, DreamLite outperforms existing on-device models and remains competitive with several server-side models on both generation and editing benchmarks. By employing step distillation, we further achieve 4-step inference, enabling DreamLite to generate or edit a 1024 × 1024 image in less than 5 s on an iPhone 17 Pro.
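The in-context conditioning described above can be pictured as a simple latent-space operation. The sketch below is an illustration under our own assumptions (shapes, the zero "null" latent for pure T2I, and the function name are hypothetical), not the paper's actual implementation:

```python
import numpy as np

def in_context_concat(source_latent, noisy_target_latent):
    """Sketch of in-context spatial concatenation: for editing, the clean
    source-image latent is concatenated with the noisy target latent along
    the width axis, so one network processes both as a single spatial
    canvas. For pure T2I, the source slot can hold a null latent
    (modeled here as zeros, which is an assumption)."""
    # latents are (C, H, W); join along the last (width) axis
    return np.concatenate([source_latent, noisy_target_latent], axis=-1)

C, H, W = 4, 128, 128                # e.g. 1024x1024 image, 8x VAE downsampling
source = np.zeros((C, H, W))         # null latent standing in for "no source image"
target = np.random.randn(C, H, W)    # noisy target latent
canvas = in_context_concat(source, target)
print(canvas.shape)                  # (4, 128, 256): double-width joint canvas
```

One appeal of this scheme is that generation and editing differ only in what occupies the source slot, so a single compact backbone serves both tasks without task-specific branches.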
Our contributions are summarized as follows:
Overview of the proposed framework and its key components.
| Method | Params | GenEval ↑ | DPG ↑ | ImgEdit ↑ | GEdit-EN-Q ↑ |
|---|---|---|---|---|---|
| FLUX.1-Dev / Kontext | 12B | 0.67 | 84.0 | 3.76 | 6.79 |
| BAGEL | 7B | 0.82 | 85.1 | 3.42 | 7.20 |
| OmniGen2 | 4B | 0.80 | 83.6 | 3.44 | 6.79 |
| LongCat-Image / Edit | 6B | 0.87 | 86.6 | 4.49 | 7.55 |
| DeepGen1.0 | 2B | 0.83 | 84.6 | 4.03 | 7.54 |
| SANA-1.6B | 1.6B | 0.67 | 84.8 | - | - |
| MEISSONIC | 1B | 0.54 | 65.3 | - | - |
| VIBE | 1.6B | - | - | 3.85 | 7.28 |
| SANA-0.6B | 0.6B | 0.64 | 83.6 | - | - |
| SnapGen++ (small) | 0.4B | 0.66 | 85.2 | - | - |
| EditMGT | 0.96B | - | - | 2.89 | 6.33 |
| DreamLite (Ours) | 0.39B | 0.72 | 85.8 | 4.11 | 6.88 |
| Experiments | Mechanism | Training Stage | GenEval ↑ | ImgEdit ↑ |
|---|---|---|---|---|
| Text-to-image Pretraining | 0.70 | - | ||
| Condition Mechanism | Pix2Pix | T2I → Edit | 0.56 | 3.67 |
| Pix2Pix | T2I → Edit → Unified | 0.61 | 3.65 | |
| Training Strategy | In-context | T2I → T2I | 0.65 | - |
| In-context | T2I → Edit | 0.64 | 3.88 | |
| In-context | T2I → Unified | 0.65 | 3.14 | |
| In-context | T2I → Edit → Unified | 0.71 | 3.94 | |
| Reinforcement Learning | In-context | TPJ Pretrain → RLHF | 0.72 | 4.11 |
| Step Distillation | In-context | TPJ Pretrain → RLHF → DMD | 0.70 | 3.8 |
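The step-distillation row corresponds to querying the distilled student at only four timesteps. The loop below is a generic few-step sampling sketch, not DreamLite's actual DMD sampler; the `model` callable, timestep schedule, and re-noising rule are all placeholder assumptions:

```python
import numpy as np

def four_step_sample(model, shape, timesteps=(999, 749, 499, 249), seed=0):
    """Generic few-step sampler sketch: after step distillation, the student
    `model(x, t)` (a placeholder here) returns a one-shot denoised estimate,
    and only len(timesteps) forward passes are needed instead of dozens."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)              # start from pure noise
    for i, t in enumerate(timesteps):
        x0 = model(x, t)                        # student's denoised estimate
        if i < len(timesteps) - 1:
            # re-noise the estimate down to the next noise level (toy rule)
            alpha = timesteps[i + 1] / 1000.0
            x = (1 - alpha) * x0 + alpha * rng.standard_normal(shape)
        else:
            x = x0                              # final step returns the estimate
    return x

# toy stand-in "model" that just shrinks its input toward zero
out = four_step_sample(lambda x, t: 0.5 * x, (4, 128, 128))
print(out.shape)  # (4, 128, 128)
```

Cutting the pass count from a typical 20–50 steps to 4 is what makes sub-5 s on-device latency plausible, at the modest quality cost visible in the last row of the table.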
Our release plan and how to reach us.
If you have any questions about this work, feel free to reach out.
@article{feng2026dreamlite,
title={DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing},
author={Kailai Feng and Yuxiang Wei and Bo Chen and Yang Pan and Hu Ye and Songwei Liu and Chenqian Yan and Yuan Gao},
journal={arXiv preprint arXiv:2603.28713},
year={2026}
}