Diffusion models (DMs) produce very detailed and high-quality images. Their power results from extensive training on large amounts of data, usually scraped from the internet without proper attribution or consent from content creators. Unfortunately, this practice raises privacy and intellectual property concerns, as DMs can memorize and later reproduce their potentially sensitive or copyrighted training images at inference time. Prior efforts mitigate this issue by either changing the input to the diffusion process, thereby preventing the DM from generating memorized samples during inference, or by removing the memorized data from training altogether. While those are viable solutions when the DM is developed and deployed in a secure and constantly monitored environment, they hold the risk of adversaries circumventing the safeguards, and they are not effective when the DM itself is publicly released. To solve the problem, we introduce NeMo, the first method to localize memorization of individual data samples down to the level of neurons in DMs' cross-attention layers. Through our experiments, we make the intriguing finding that, in many cases, single neurons are responsible for memorizing particular training samples. By deactivating these memorization neurons, we can prevent the replication of training data at inference time, increase the diversity of the generated outputs, and mitigate the leakage of private and copyrighted data. In this way, NeMo contributes to a more responsible deployment of DMs.
For memorized prompts, we observe that the same (original training) image is consistently generated regardless of the initial random seed. In a first stage, NeMo identifies candidate neurons that are potentially responsible for the memorization based on their out-of-distribution activations. In a refinement step, NeMo then detects the actual memorization neurons within this candidate set by leveraging the similarity of the noise predictions during the first denoising step.
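As a rough illustration of the candidate selection, the following sketch flags neurons whose activations on a suspected memorized prompt deviate strongly from activation statistics collected on non-memorized holdout prompts. The function name, the z-score criterion, and the threshold are our own assumptions for illustration, not the exact statistics used by NeMo.

import torch

def find_candidate_neurons(prompt_activations, holdout_mean, holdout_std, z_threshold=5.0):
    # prompt_activations: (num_neurons,) mean absolute activation of each
    #     cross-attention value-layer neuron for the suspected memorized prompt
    # holdout_mean, holdout_std: (num_neurons,) activation statistics collected
    #     over a set of non-memorized holdout prompts
    # Neurons whose activation lies far outside the holdout distribution are
    # returned as out-of-distribution candidates.
    z_scores = (prompt_activations - holdout_mean) / (holdout_std + 1e-8)
    return torch.nonzero(z_scores > z_threshold).flatten().tolist()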
NeMo computes the memorization strength by analyzing the consistency of the denoising trajectory. Starting from two randomly initialized noise samples, we perform a single denoising step and compute the difference between the predicted and the initial noise for each sample. The Structural Similarity Index (SSIM) between these differences quantifies memorization, with higher similarity indicating stronger memorization.
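The sketch below illustrates this score with Stable Diffusion v1.4 loaded through diffusers; the model checkpoint, the placeholder prompt, and the use of skimage's SSIM on the latent noise differences are assumptions made for illustration.

import numpy as np
import torch
from diffusers import StableDiffusionPipeline
from skimage.metrics import structural_similarity

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")
prompt = "an example prompt suspected of being memorized"  # placeholder

# Encode the prompt once.
tokens = pipe.tokenizer(prompt, padding="max_length",
                        max_length=pipe.tokenizer.model_max_length,
                        truncation=True, return_tensors="pt").input_ids.to("cuda")
text_emb = pipe.text_encoder(tokens)[0]

# Perform a single denoising step from two different initial noise samples.
pipe.scheduler.set_timesteps(50)
t = pipe.scheduler.timesteps[0]
diffs = []
for seed in (0, 1):
    gen = torch.Generator("cuda").manual_seed(seed)
    z0 = torch.randn((1, 4, 64, 64), generator=gen, device="cuda")
    with torch.no_grad():
        eps = pipe.unet(z0 * pipe.scheduler.init_noise_sigma, t,
                        encoder_hidden_states=text_emb).sample
    # Difference between the predicted noise and the initial noise for this seed.
    diffs.append((eps - z0).squeeze(0).float().cpu().numpy())

# Memorization strength: SSIM between the two noise differences;
# values close to 1 indicate a highly consistent (memorized) trajectory.
data_range = float(max(np.ptp(d) for d in diffs))
score = structural_similarity(diffs[0], diffs[1], channel_axis=0, data_range=data_range)
print(f"memorization strength (SSIM): {score:.3f}")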
We visualize the normalized noise differences between the predicted noise (after the first denoising step) and the initial Gaussian noise, using four random seeds.
The top row shows the final generated images. For memorized prompts (top), noise differences reveal low diversity and hint at the final image's structure, while for non-memorized prompts (bottom),
the noise differences lack clear structure and vary significantly across initial noise samples.
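A figure of this kind can be reproduced by min-max normalizing each per-seed noise difference and plotting it. The snippet below is a small matplotlib sketch that assumes diffs holds one noise difference per seed, as in the sketch above (with the loop extended to four seeds).

import matplotlib.pyplot as plt

# diffs: list of (channels, height, width) noise differences, one per seed.
fig, axes = plt.subplots(1, len(diffs), figsize=(3 * len(diffs), 3))
for ax, d in zip(axes, diffs):
    m = d.mean(axis=0)                              # average over latent channels
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)  # min-max normalize for display
    ax.imshow(m, cmap="gray")
    ax.axis("off")
fig.savefig("noise_differences.png", bbox_inches="tight")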
The top rows display images generated with verbatim-memorized and template-memorized prompts, which closely replicate training data. In contrast, the bottom row shows that deactivating the identified memorization neurons increases output diversity and mitigates memorization. Remarkably, only a small number of neurons are responsible for triggering memorization, as indicated by the counts in the boxes.
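Deactivation can be implemented by zeroing the outputs of the identified neurons at inference time, for example with PyTorch forward hooks on the corresponding cross-attention value layers. The layer name and neuron index below are placeholders chosen for illustration; in practice they are the memorization neurons found by NeMo's detection stage.

import torch
from diffusers import StableDiffusionPipeline

# Placeholder mapping from a cross-attention value layer to memorization neurons.
MEMORIZATION_NEURONS = {
    "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v": [17],
}

def make_hook(indices):
    def hook(module, inputs, output):
        output[..., indices] = 0.0  # deactivate the selected neurons (output channels)
        return output
    return hook

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4").to("cuda")

handles = []
for name, module in pipe.unet.named_modules():
    if name in MEMORIZATION_NEURONS:
        handles.append(module.register_forward_hook(make_hook(MEMORIZATION_NEURONS[name])))

# Generation now runs with the memorization neurons switched off.
image = pipe("an example prompt suspected of being memorized").images[0]
image.save("mitigated.png")

for handle in handles:
    handle.remove()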
@inproceedings{hintersdorf24nemo,
author = {Hintersdorf, Dominik and Struppek, Lukas and Kersting, Kristian and Dziedzic, Adam and Boenisch, Franziska},
title = {Finding NeMo: Localizing Neurons Responsible For Memorization in Diffusion Models},
booktitle = {Conference on Neural Information Processing Systems (NeurIPS)},
year = {2024},
}