In recent years, large diffusion models such as DALL-E 2 and Stable Diffusion have gained recognition for generating high-quality, photorealistic images and performing a wide range of image synthesis and editing tasks.
But concerns are arising about the potential misuse of user-friendly generative AI models, which can enable the creation of inappropriate or harmful digital content. For example, malicious actors might exploit publicly shared photos of individuals by utilizing an off-the-shelf diffusion model to edit them with harmful intent.
To tackle the mounting challenges surrounding unauthorized image manipulation, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have introduced “PhotoGuard,” an AI tool designed to combat advanced gen AI models like DALL-E and Midjourney.
In the research paper “Raising the Cost of Malicious AI-Powered Image Editing,” the researchers explain that PhotoGuard works by adding imperceptible “perturbations” (tiny disturbances in pixel values) to an image — changes invisible to the human eye but disruptive to the computer models that try to edit it.
“Our tool aims to ‘fortify’ images before uploading to the internet, ensuring resistance against AI-powered manipulation attempts,” Hadi Salman, MIT CSAIL doctoral student and paper lead author, told VentureBeat. “In our proof-of-concept paper, we focus on manipulation using the most popular class of AI models currently employed for image alteration. This resilience is established by incorporating subtly crafted, imperceptible perturbations to the pixels of the image to be protected. These perturbations are crafted to disrupt the functioning of the AI model driving the attempted manipulation.”
According to MIT CSAIL researchers, the AI employs two distinct “attack” methods to create perturbations: encoder and diffusion.
The “encoder” attack targets the image’s latent representation within the AI model, causing the model to perceive the image as random noise and rendering manipulation nearly impossible. The “diffusion” attack, meanwhile, is a more sophisticated approach: it selects a target image and optimizes the perturbations so that whatever the model generates closely resembles that target.
Salman explained that the key mechanism employed in PhotoGuard is “adversarial perturbations.”
“Such perturbations are imperceptible modifications of the pixels of the image that have proven to be exceptionally effective in manipulating the behavior of machine learning models,” he said. “PhotoGuard uses these perturbations to manipulate the AI model processing the protected image into producing unrealistic or nonsensical edits.”
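The paper’s actual attacks run against large diffusion models, but the core idea of an adversarial perturbation can be illustrated with a toy sketch. Here, a hypothetical linear scoring model stands in for a deep network (nothing below is PhotoGuard’s real code): a single gradient-sign step, scaled to an imperceptible per-pixel budget, shifts the model’s output dramatically while barely changing any pixel.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a model: a linear score s(x) = w . x (hypothetical,
# not the PhotoGuard model). Pixels are floats in [0, 1].
n_pixels = 64 * 64
w = rng.normal(size=n_pixels)
x = rng.uniform(0.2, 0.8, size=n_pixels)  # the "image"

eps = 2 / 255  # imperceptible per-pixel budget (L-infinity norm)

# For a linear score, the gradient w.r.t. x is just w, so nudging each
# pixel by eps * sign(w) maximally increases the score within the budget.
delta = eps * np.sign(w)
x_adv = np.clip(x + delta, 0.0, 1.0)

print("max pixel change:", np.abs(x_adv - x).max())  # at most eps
print("score shift:", w @ x_adv - w @ x)             # far larger than eps
```

The asymmetry shown here — a change of at most 2/255 per pixel producing a large swing in the model’s output — is what makes such perturbations invisible to people yet highly effective against machine learning models.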
A team of MIT CSAIL graduate students and co-authors, including Alaa Khaddaj, Guillaume Leclerc and Andrew Ilyas, contributed to the research paper alongside Salman.
The work was also presented at the International Conference on Machine Learning in July and was partially supported by National Science Foundation grants, Open Philanthropy and the Defense Advanced Research Projects Agency.
Using AI as a defense against AI-based image manipulation
Salman said that although AI-powered generative models such as DALL-E and Midjourney have gained prominence due to their capability to create hyper-realistic images from simple text descriptions, the growing risks of misuse have also become evident.
These models enable users to generate highly detailed and realistic images, opening up possibilities for innocent and malicious applications.
Salman warned that fraudulent image manipulation can influence market trends and public sentiment in addition to posing risks to personal images. Inappropriately altered pictures can be exploited for blackmail, leading to substantial financial implications on a larger scale.
Although watermarking has shown promise as a solution, Salman emphasized that a preemptive measure to proactively prevent misuse remains critical.
“At a high level, one can think of this approach as an ‘immunization’ that lowers the risk of these images being maliciously manipulated using AI — one that can be considered a complementary strategy to detection or watermarking techniques,” Salman explained. “Importantly, the latter techniques are designed to identify falsified images once they have already been created. However, PhotoGuard aims to prevent such alteration to begin with.”
Changes imperceptible to humans
PhotoGuard alters selected pixels in an image to disrupt the AI’s ability to comprehend the image, he explained.
AI models perceive images as complex mathematical data points representing each pixel’s color and position. By introducing imperceptible changes to this mathematical representation, PhotoGuard ensures the image remains visually unaltered to human observers while protecting it from unauthorized manipulation by AI models.
The “encoder” attack method introduces these artifacts by targeting the algorithmic model’s latent representation of the target image — the complex mathematical description of every pixel’s position and color in the image. As a result, the AI is essentially prevented from understanding the content.
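The encoder attack can be sketched as an optimization problem: push the encoder’s output for the protected image toward a meaningless target latent, while keeping every pixel change inside an imperceptible budget. The sketch below uses a hypothetical linear “encoder” `E(x) = W x` in place of the real deep network, so the gradient can be written analytically; `W`, `z_junk`, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear "encoder" E(x) = W x standing in for the diffusion
# model's image encoder (the real encoder is a deep network).
d_pix, d_lat = 256, 32
W = rng.normal(size=(d_lat, d_pix)) / np.sqrt(d_pix)
x = rng.uniform(size=d_pix)        # image to immunize
z_junk = rng.normal(size=d_lat)    # meaningless target latent ("random noise")

eps, lr, steps = 0.03, 0.5, 200    # perturbation budget, step size, iterations
delta = np.zeros(d_pix)

for _ in range(steps):
    # loss = ||E(x + delta) - z_junk||^2; the gradient is analytic here
    resid = W @ (x + delta) - z_junk
    grad = 2 * W.T @ resid
    delta = np.clip(delta - lr * grad, -eps, eps)  # projected gradient step

z_before, z_after = W @ x, W @ (x + delta)
print("latent pulled toward junk:",
      np.linalg.norm(z_after - z_junk) < np.linalg.norm(z_before - z_junk))
```

After optimization, the model’s internal representation of the image has drifted toward the junk latent, which is the sense in which the model is made to “perceive the image as random.”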
On the other hand, the more advanced and computationally intensive “diffusion” attack disguises an image as something else in the eyes of the AI. It selects a target image and optimizes the perturbations in the protected image to resemble that target. Consequently, any edits the AI attempts to apply to these “immunized” images are effectively applied to the decoy “target” instead, generating unrealistic-looking results.
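A rough sketch of the diffusion attack’s structure, under strong simplifying assumptions: the editing pipeline is modeled here as a chain of linear “denoising steps” so the chain rule can be applied by hand (a real diffusion model runs many nonlinear network steps, which is exactly why this attack is so computationally intensive). The perturbation is optimized so that the entire pipeline’s output collapses onto a decoy target; all names and constants below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 128

# Hypothetical editing pipeline: K chained linear "denoising steps"
# f_k(v) = A_k v + b_k (stand-ins for a diffusion model's steps).
K = 5
As = [np.eye(d) + 0.1 * rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(K)]
bs = [0.01 * rng.normal(size=d) for _ in range(K)]

def edit(v):
    for A, b in zip(As, bs):
        v = A @ v + b
    return v

x = rng.uniform(size=d)          # image to immunize
x_target = rng.uniform(size=d)   # decoy the edit should collapse onto

eps, lr, steps = 0.05, 0.3, 300
delta = np.zeros(d)
for _ in range(steps):
    # forward through every step, then chain-rule back through them all
    resid = edit(x + delta) - x_target
    grad = resid
    for A in reversed(As):
        grad = A.T @ grad        # backpropagate through one step
    delta = np.clip(delta - lr * 2 * grad, -eps, eps)

before = np.linalg.norm(edit(x) - x_target)
after = np.linalg.norm(edit(x + delta) - x_target)
print("edit pulled toward decoy:", after < before)
```

The expensive part is visible in the inner loop: every optimization step requires a full forward and backward pass through the whole editing chain, which is what makes the end-to-end diffusion attack far costlier than the encoder attack.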
“It aims to deceive the entire editing process, ensuring that the final edit diverges significantly from the intended outcome,” said Salman. “By exploiting the diffusion model’s behavior, this attack leads to edits that may be markedly different and potentially nonsensical compared to the user’s intended changes.”
Simplifying diffusion attack with fewer steps
The MIT CSAIL research team discovered that running the diffusion attack with fewer steps makes it more practical, even though it remains computationally intensive. Furthermore, the team said it is exploring additional robust perturbations to keep protected images resistant to common image manipulations.
Although the researchers acknowledge PhotoGuard’s promise, they also cautioned that it is not a foolproof solution. Malicious individuals could attempt to circumvent the protections by adding noise to the image, or by cropping or rotating it.
As a research proof-of-concept demo, the AI model is not currently ready for deployment, and the research team advises against using it to immunize photos at this stage.
“Making PhotoGuard a fully effective and robust tool would require developing versions of our AI model tailored to specific gen AI models that are present now and would emerge in the future,” said Salman. “That, of course, would require the cooperation of developers of these models, and securing such a broad cooperation might require some policy action.”