MaskUnet

Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability

¹PCA Lab, VCIP, College of Computer Science, Nankai University, ²Shenzhen Futian, NKIARI
CVPR 2025
✉ Corresponding authors

Abstract

In their early stages, diffusion models focus on constructing basic image structures, while refined details, including local features and textures, are generated in later stages. Thus the same network layers are forced to learn both structural and textural information simultaneously, which differs significantly from traditional deep learning architectures (e.g., ResNet or GANs) that capture or generate image semantic information at different layers. This difference inspires us to explore time-wise diffusion models. We first investigate the key contributions of the U-Net parameters to the denoising process and find that properly zeroing out certain parameters (including large ones) aids denoising, substantially improving generation quality on the fly. Capitalizing on this discovery, we propose a simple yet effective method, termed “MaskUNet”, that enhances generation quality with a negligible number of parameters. Our method fully leverages timestep- and sample-dependent effective U-Net parameters. To optimize MaskUNet, we offer two fine-tuning strategies, a training-based approach and a training-free approach, each with tailored networks and optimization functions. In zero-shot inference on the COCO dataset, MaskUNet achieves the best FID score, and it further demonstrates its effectiveness in downstream task evaluations.
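The core idea of zeroing out selected U-Net parameters with a mask that depends on the timestep and the sample can be illustrated with a small toy example. This is only a sketch under stated assumptions, not the authors' implementation: `toy_mask` is a hypothetical, hand-written rule standing in for whatever learned or optimized mask MaskUNet actually uses, and the 2x2 weight matrix is made-up data. The only point it shows is that the stored weights stay fixed while the effective weights change per timestep and per sample through element-wise masking.

```python
def toy_mask(shape, timestep, sample_id):
    """Hypothetical binary mask that varies with timestep and sample.

    This arbitrary parity rule is an assumption made for illustration only;
    it is NOT the mask generator described in the paper.
    """
    rows, cols = shape
    return [
        [0.0 if (i + j + timestep + sample_id) % 3 == 0 else 1.0
         for j in range(cols)]
        for i in range(rows)
    ]


def apply_mask(weights, mask):
    """Return effective weights W * M (element-wise); inputs are not modified."""
    return [
        [w * m for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]


# Made-up "pretrained" weights of a single layer.
weights = [[0.5, -1.2], [2.0, 0.3]]

# The same stored weights yield different effective weights at each timestep.
for t in (0, 1):
    effective = apply_mask(weights, toy_mask((2, 2), t, sample_id=0))
    print(f"timestep={t}: {effective}")
```

Because the mask is applied on the fly, the original weights remain untouched, so a different mask can be used at every denoising step without retraining the underlying layer.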

BibTeX

@inproceedings{wang2025not,
  title={Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability},
  author={Wang, Lei and Li, Senmao and Yang, Fei and Wang, Jianye and Zhang, Ziheng and Liu, Yuhan and Wang, Yaxing and Yang, Jian},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={12880--12890},
  year={2025}
}

Method

Results

Qualitative results compared to other methods.

Qualitative results for Textual Inversion with and without the mask.

Qualitative results for ReVersion with and without the mask.

Qualitative results for Text2Video-Zero with and without the mask.

Mask Analysis

Poster
