
Having some fun with Stable Diffusion Inpainting in Python on New Year’s Day


It is New Year’s Day 2023 :sweat_smile:. Happy New Year!!! :fireworks: I am currently driving coast-to-coast with my family on a road trip through the United States, but for New Year’s Eve and New Year’s Day we stayed in one place. Taking advantage of the driving-free days, my 4-year-old son and I had some great fun with the open-source Stable Diffusion models, in particular with text-guided image inpainting techniques.

Basically, inpainting allows you to replace or transform image areas of your choice with AI-generated content based on a text prompt. You can see some of my results in the collage above. The top-left panel shows the original (real) image, a photo I took of my son during breakfast at a restaurant this morning. He found it absolutely hilarious how drastically we could modify it with the computer; the text prompts we used were largely based on his suggestions.

A few code snippets

I had already played around a few times with image generation with Stable Diffusion in Python, and with textual inversion to represent a specific artistic style. I was (and still am) positively surprised by how easy and pleasant the developers have made it to use Stable Diffusion via the Hugging Face diffusers library in Python. But I hadn't looked at inpainting techniques until today. I learned a lot from great tutorials about Stable Diffusion such as the FastAI notebook “Stable Diffusion Deep Dive”, but I haven't specifically seen examples of inpainting so far (though, admittedly, I haven't looked :stuck_out_tongue:). So, I'm providing some relevant code snippets here.
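
For reference, plain text-to-image generation with diffusers takes only a few lines. Here is a minimal sketch (the model ID and prompt below are just illustrative placeholders, not the ones I used):

import torch
from diffusers import StableDiffusionPipeline

# minimal text-to-image sketch; model ID and prompt are placeholders
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a watercolor painting of a lighthouse at sunrise").images[0]
image.save("txt2img_example.png")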

There are two obvious ways in which inpainting could be applied to the image I started with (top left in the collage above): either replace/transform the boy, or replace/transform the drawing he is holding.

In either case, one first has to define an image mask:

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

mask = np.zeros(init_image.size).T    # black mask with the same (height, width) as the image
mask[270:, :] = 255                   # white pixels mark the area to be inpainted
mask[550:, 400:] = 0                  # set this region back to black, i.e. keep it from the original
mask = Image.fromarray(np.uint8(mask)).convert('RGB')
plt.imshow(mask)

Generating the selected image areas based on a text prompt “from scratch”

import torch
from diffusers import DiffusionPipeline

torch.manual_seed(2023)

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    revision="fp16",
    torch_dtype=torch.float16,
).to("cuda")
pipe.enable_attention_slicing()  # save some GPU memory in exchange for a small speed decrease

# pad to a square and resize to 512x512 before inpainting
inp_img = square_padding(init_image)  # my own helper function; init_image is loaded with PIL.Image
mask = square_padding(mask)
inp_img = inp_img.resize((512, 512))
mask = mask.resize((512, 512))

prompt = "something..."
negative_prompt = "something..."

result = pipe(prompt, image=inp_img, mask_image=mask, negative_prompt=negative_prompt,
              num_inference_steps=50, guidance_scale=11).images
result[0]  # this is the generated image
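
For completeness, a minimal sketch of what a square_padding helper could look like (this is just an illustrative version that centers the image on a black square canvas, not necessarily the exact function used above):

from PIL import Image

def square_padding(img):
    # illustrative sketch: pad the shorter side with black so the image becomes square
    side = max(img.size)
    padded = Image.new("RGB", (side, side))
    padded.paste(img, ((side - img.width) // 2, (side - img.height) // 2))
    return padded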

Generating selected image areas in an image-to-image fashion

Alternatively, the selected image areas can be generated in an image-to-image fashion. For this, I adapted an example from the huggingface/diffusers repository, along the lines of:

import torch
from diffusers import DiffusionPipeline

torch.manual_seed(2023)

inp_img = my_input_image               # loaded with PIL.Image
mask = my_image_mask                   # also a PIL.Image
inner_image = inp_img.convert("RGBA")  # the image whose content guides generation inside the masked area

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    custom_pipeline="img2img_inpainting",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()  # save some GPU memory in exchange for a small speed decrease

prompt = "something..."
negative_prompt = "something..."

result = pipe(prompt=prompt, image=inp_img, inner_image=inner_image,
              mask_image=mask, negative_prompt=negative_prompt,
              num_inference_steps=50, guidance_scale=10).images
result[0]  # this is the generated image
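
As a small follow-up (not part of the adapted example), the result can be saved and placed next to the input image for a quick visual comparison, for instance:

from PIL import Image

out = result[0]
out.save("inpainted.png")

# simple side-by-side comparison of the input and the generated image
comparison = Image.new("RGB", (inp_img.width + out.width, max(inp_img.height, out.height)))
comparison.paste(inp_img, (0, 0))
comparison.paste(out, (inp_img.width, 0))
comparison.save("comparison.png")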

Remarks
