A Lightweight On-Device Unified Model for Image Generation and Editing

We're recruiting passionate Research Interns in Efficient & On-device GenAI! Join Us →

On-Device Demo

Real-time generation & editing on iPhone 17 Pro — no cloud, fully on-device.

  • Image Editing
  • Style Transfer
  • Background Change

About DreamLite

In this paper, we propose DreamLite, a compact unified on-device diffusion model (0.39B parameters) that supports both text-to-image (T2I) generation and text-guided image editing within a single network. DreamLite is built on a pruned mobile U-Net backbone and unifies conditioning through in-context spatial concatenation in the latent space. To stabilize the training of this compact model, we introduce a Task-Progressive Joint pretraining strategy that sequentially targets T2I, editing, and joint tasks. After SFT and RL, DreamLite outperforms existing on-device models and remains competitive with several server-side models on both generation and editing tasks. With step distillation, we further achieve 4-step inference, enabling DreamLite to generate or edit a 1024 × 1024 image in under 5 s on an iPhone 17 Pro.
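The in-context conditioning idea can be illustrated with a minimal sketch: the condition image's latent (for editing) is spatially concatenated with the noisy latent before entering the U-Net, and a placeholder latent stands in when no source image exists (pure T2I). The function name, shapes, and the zero placeholder are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def in_context_latents(noisy_latent, cond_latent=None):
    """Build a unified U-Net input by spatial concatenation in latent space.

    noisy_latent: (B, C, H, W) noisy latent being denoised.
    cond_latent:  (B, C, H, W) VAE latent of the source image for editing,
                  or None for pure text-to-image generation.
    """
    if cond_latent is None:
        # T2I case: no source image, so a zero latent fills the condition
        # slot (a stand-in; the real model may use a learned null latent).
        cond_latent = np.zeros_like(noisy_latent)
    # Concatenate along the width axis so generation and editing share one
    # input signature; attention then operates across the joined canvas.
    return np.concatenate([cond_latent, noisy_latent], axis=-1)

# Usage: a 32x32 latent pair becomes one 32x64 in-context canvas.
x = np.random.randn(1, 4, 32, 32)
c = np.random.randn(1, 4, 32, 32)
print(in_context_latents(x, c).shape)  # (1, 4, 32, 64)
```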

Our contributions are summarized as follows:

  • We propose, to the best of our knowledge, the first unified on-device model that supports both text-to-image generation and text-based image editing, eliminating the need to deploy two separate models.
  • We introduce an in-context conditioning mechanism for the U-Net that unifies generation and editing, and propose a task-progressive joint pretraining scheme (i.e., T2I → Edit → Unified Joint Training) that stabilizes training.
  • DreamLite achieves competitive performance on standard benchmarks and consistently outperforms prior mobile models. Deployed on a mobile device, DreamLite generates or edits a 1024 × 1024 image in under 5 s.
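The sub-5-second latency comes from the distilled 4-step sampler. A toy sketch of such a few-step loop, under a simple linear noise schedule: `denoise_fn(x, t)` is a hypothetical stand-in for the distilled U-Net's clean-image prediction, and the re-noising rule here is illustrative, not the paper's DMD recipe.

```python
import numpy as np

def few_step_sample(denoise_fn, shape, steps=4, seed=0):
    """Toy few-step sampler: start from Gaussian noise, predict the clean
    image, and re-noise it to the next (lower) level on a linear schedule
    from t=1 down to t=0. With steps=4, denoise_fn runs only four times.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x0_pred = denoise_fn(x, t_cur)          # model's clean-image guess
        # Blend the prediction with fresh noise at the next noise level;
        # at t_next == 0 the output is exactly the final prediction.
        x = (1 - t_next) * x0_pred + t_next * rng.standard_normal(shape)
    return x
```

Because each step is one U-Net forward pass, cutting a long schedule down to four steps dominates the on-device speedup.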

Model Architecture

Overview of the proposed framework and its key components.

Figure 1. Overall architecture of DreamLite.

Visual Results

Figure 2. Generation and Editing Results on Mobile Device.

Main Results

Table 1. Comparison with existing methods on GenEval, DPG, ImgEdit and GEdit-EN Benchmarks.
| Method | Params | GenEval ↑ | DPG ↑ | ImgEdit ↑ | GEdit-EN-Q ↑ |
| --- | --- | --- | --- | --- | --- |
| FLUX.1-Dev / Kontext | 12B | 0.67 | 84.0 | 3.76 | 6.79 |
| BAGEL | 7B | 0.82 | 85.1 | 3.42 | 7.20 |
| OmniGen2 | 4B | 0.80 | 83.6 | 3.44 | 6.79 |
| LongCat-Image / Edit | 6B | 0.87 | 86.6 | 4.49 | 7.55 |
| DeepGen | 1.02B | 0.83 | 84.6 | 4.03 | 7.54 |
| SANA-1.6B | 1.6B | 0.67 | 84.8 | - | - |
| MEISSONIC | 1B | 0.54 | 65.3 | - | - |
| VIBE | 1.6B | - | - | 3.85 | 7.28 |
| SANA-0.6B | 0.6B | 0.64 | 83.6 | - | - |
| SnapGen++ (small) | 0.4B | 0.66 | 85.2 | - | - |
| EditMGT | 0.96B | - | - | 2.89 | 6.33 |
| DreamLite (Ours) | 0.39B | 0.72 | 85.8 | 4.11 | 6.88 |
Table 2. Ablation study on GenEval and ImgEdit benchmarks. "TPJ" denotes "Task-progressive Joint".
| Experiments | Mechanism | Training Stage | GenEval ↑ | ImgEdit ↑ |
| --- | --- | --- | --- | --- |
| - | - | Text-to-image Pretraining | 0.70 | - |
| Condition Mechanism | Pix2Pix | T2I → Edit | 0.56 | 3.67 |
| Condition Mechanism | Pix2Pix | T2I → Edit → Unified | 0.61 | 3.65 |
| Training Strategy | In-context | T2I → T2I | 0.65 | - |
| Training Strategy | In-context | T2I → Edit | 0.64 | 3.88 |
| Training Strategy | In-context | T2I → Unified | 0.65 | 3.14 |
| Training Strategy | In-context | T2I → Edit → Unified | 0.71 | 3.94 |
| Reinforcement Learning | In-context | TPJ Pretrain → RLHF | 0.72 | 4.11 |
| Step Distillation | In-context | TPJ Pretrain → RLHF → DMD | 0.70 | 3.8 |
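The training-strategy ablation compares orderings of the task-progressive stages. A minimal sketch of how such a staged data mixture could be expressed: stage names follow Table 2, but the mixing probabilities are illustrative assumptions, not the paper's actual recipe.

```python
import random

# Hypothetical stage schedule: each stage fixes the probability of drawing
# an editing sample (vs. a T2I sample) for a training batch. The 0.5 joint
# ratio is an illustrative placeholder.
STAGES = [
    ("T2I", 0.0),      # stage 1: text-to-image only
    ("Edit", 1.0),     # stage 2: editing only
    ("Unified", 0.5),  # stage 3: joint training on both tasks
]

def sample_task(stage_name, rng=random):
    """Draw the task for one training example under the given stage."""
    edit_prob = dict(STAGES)[stage_name]
    return "edit" if rng.random() < edit_prob else "t2i"
```

Under this framing, the ablation's finding is that passing through a dedicated Edit stage before mixing (T2I → Edit → Unified) trains more stably than jumping straight to the mixture.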

Roadmap & Contact

Our release plan and how to reach us.

Release TODO
  • [Done] Paper on arXiv
  • [Coming] Inference code release
  • [Coming] Model weights on HuggingFace
  • [Coming] Online Demo
  • [Coming] Android & iOS App
Contact

If you have any questions about this work, feel free to reach out.

BibTeX

@article{feng2026dreamlite,
  title={DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing},
  author={Kailai Feng and Yuxiang Wei and Bo Chen and Yang Pan and Hu Ye and Songwei Liu and Chenqian Yan and Yuan Gao},
  journal={arXiv preprint arXiv:2603.28713},
  year={2026}
}