In a previous article, we saw how to use the Text In-painting pipeline to edit images by locating and replacing objects using their text description. In this article, we will use the in-painting pipeline to create panoramic views. The process is as follows (a rough code sketch follows the list):

  • First, we generate a few images from our prompt
  • Second, we concatenate slices of adjacent images
  • Then, we pass these slices to the in-painting pipeline to smooth the transition between them
  • Finally, we stitch everything together into the output image
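
Before diving into the details, here is a rough sketch of the flow we will implement. This is pseudocode only; the helper functions and the in-painting call are built step by step in the rest of the article.

# Rough sketch only: the helpers and pipelines are defined later in the article
prev = images[0]
for curr in images[1:] + [images[0]]:               # wrap around to close the panorama
    combined = create_input_image(prev, curr)       # right half of prev + left half of curr
    middle = inpaint(prompt, combined, mask)        # repaint the seam in the middle band
    prev = create_output_image(prev, middle, curr)  # grow the strip
# finally, crop the duplicated halves of the first image from both ends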

Setup

First, we need to install the dependencies.

%%capture
%%bash

pip install --upgrade accelerate diffusers transformers

Second, accept the terms of the Stable Diffusion model license to be able to download it from Hugging Face.

!huggingface-cli login
    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    To login, `huggingface_hub` now requires a token generated from https://huggingface.co/settings/tokens .
    
Token: 
Add token as git credential? (Y/n) n
Token is valid.
Your token has been saved to /root/.huggingface/token
Login successful

Import needed modules

import random
import numpy as np
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionInpaintPipeline

Set seeds for reproducibility

random.seed(123)
np.random.seed(123)
torch.manual_seed(123)  # the diffusion pipelines draw samples using torch's RNG

Base images

You can use your own images, or why not let Stable Diffusion generate them? Let's download the weights of CompVis/stable-diffusion-v1-4.

generate = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", 
    use_auth_token=True,
    torch_dtype=torch.float16, 
    revision="fp16",
).to('cuda')

To avoid out-of-memory issues, we use half precision and attention slicing, both of which reduce the amount of memory needed by the pipeline.

generate.enable_attention_slicing()
# Parameters
num_images = 6
base_width = 768
base_height = 512
num_steps = 50

Pick a prompt of your choice

prompt = 'Rocky Mountain High; landscape, natural beauty, deep shadows'

It is usually hard to get the prompt right on the first try, so we first generate a single image to check that it matches the kind of landscape we want to work with.

img = generate(
    prompt=prompt,
    num_inference_steps=num_steps,
    width=base_width,
    height=base_height,
).images[0]
img

For simplicity, we will generate all the images from our single prompt. For better results, though, you may want to use as many prompts as output images and check each one of them so that the resulting panorama is coherent; a sketch of this is shown after the next cell.

prompts = [prompt] * num_images
imgs = generate(
    prompt=prompts,
    num_inference_steps=num_steps,
    width=base_width,
    height=base_height,
).images
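
For example, if you go the multi-prompt route, a minimal sketch could look like the following. The per-view suffixes below are made up for illustration and should be tuned to your own scene, and this cell has to run before the generation pipeline is deleted in the next step.

# Hypothetical per-view prompts describing adjacent parts of the same scene
view_prompts = [
    f"{prompt}, snow-capped peaks",
    f"{prompt}, pine forest in the foreground",
    f"{prompt}, alpine lake reflecting the mountains",
    f"{prompt}, rocky ridge at golden hour",
    f"{prompt}, wildflower meadow",
    f"{prompt}, winding trail towards the summit",
]
imgs = generate(
    prompt=view_prompts,
    num_inference_steps=num_steps,
    width=base_width,
    height=base_height,
).images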

To save GPU memory, we remove the Stable Diffusion generation pipeline from the GPU, as we won't be using it later.

del generate
torch.cuda.empty_cache()

Let's plot a sample of the images we generated in the previous step

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid
image_grid(random.sample(imgs, 3), 1, 3)

Because we used the same prompt to generate our images, they are unlikely to look like parts of the same view. For instance, look at the differences in the color of the sky across these images.

You may want to use individual prompts and inspect the resulting images one by one before proceeding to the next step.

Panoramic view

The image input to the in-painting pipeline is created by concatenating adjacent images. The following helper function takes the right half of the left image and concatenates it with the left half of the right image.

def create_input_image(left, right):
    w, h = left.size
    new = Image.new('RGB', (base_width, base_height))
    # shift the left image so that only its right half is visible
    new.paste(left, (-w + base_width//2, 0))
    # paste the right image so that only its left half fits in the frame
    new.paste(right, (base_width//2, 0))
    return new

Let's plot the output of the previous function to better understand what it does

_, ax = plt.subplots(1, 3, figsize=(15, 4))
[a.axis('off') for a in ax.flatten()]
images = [imgs[0], imgs[1], create_input_image(imgs[0], imgs[1])]
[ax[i].imshow(images[i]) for i in range(3)];
titles = ["left", "right", "combination"]
[ax[i].text(0, -15, titles[i]) for i in range(3)];

The in-painting pipeline needs a mask telling it which area to paint over with content matching the prompt. The following helper function creates a mask in which the edges are preserved; the model only repaints the band in the middle, smoothing the transition from the left image to the right image.

def create_mask(width, height):
    msk = Image.new('L', (width, height))
    drw = ImageDraw.Draw(msk)
    # white (255) marks the middle band to repaint; the black edges are preserved
    drw.rectangle((width//4, 0, 3*width//4, height), fill=255)
    return msk
msk = create_mask(base_width, base_height)

This is what the mask looks like

msk.resize((128, 128))
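
The width of the repainted band is a knob you can experiment with. As a purely illustrative variant (not used in the rest of the article), repainting only the middle third keeps more of the original images and lets the model change less:

# Hypothetical variant: repaint a narrower band (the middle third instead of the middle half)
def create_narrow_mask(width, height):
    msk = Image.new('L', (width, height))
    drw = ImageDraw.Draw(msk)
    drw.rectangle((width//3, 0, 2*width//3, height), fill=255)
    return msk
create_narrow_mask(base_width, base_height).resize((128, 128))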

The following helper function concatenates the different parts to create the resulting image.

Note: The middle image will be generated by the in-painting pipeline

def create_output_image(left, middle, right):
    w, h = left.size
    new = Image.new('RGB', (w+base_width, base_height))
    new.paste(left, (0, 0))
    # the in-painted middle overwrites the right half of left and the left half of right
    new.paste(middle, (w - base_width//2, 0))
    # append the untouched right half of the right image
    img = right.crop((base_width//2, 0, base_width, base_height))
    new.paste(img, (w + base_width//2, 0))
    return new

Let's visualize the result of the previous function to get a sense of what it does.

left = Image.new(mode="RGB", size=(base_width,base_height), color="red")
middle = Image.new(mode="RGB", size=(base_width,base_height), color="green")
right = Image.new(mode="RGB", size=(base_width,base_height), color="blue")
create_output_image(left, middle, right).resize((128, 128))

Now we can create the in-painting pipeline by downloading the weights from runwayml/stable-diffusion-inpainting.

inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16, 
    revision="fp16",
).to("cuda")

Note: As before, to avoid OOM we use half precision and attention slicing

inpaint.enable_attention_slicing()

Finally, we put everything together to generate the panoramic view by:

  • Combining each pair of adjacent images, and
  • Leveraging the in-painting pipeline to smooth the seam between them.

prev = imgs[0]

# Loop over the images, wrapping around to the first one so the panorama closes on itself
for curr in tqdm(imgs[1:] + [imgs[0]]):
    # right half of prev + left half of curr
    new = create_input_image(prev, curr)

    # repaint the middle band to blend the two halves
    merged = inpaint(
        prompt=prompt,
        image=new,
        mask_image=msk,
        height=base_height,
        width=base_width,
        num_inference_steps=num_steps,
    ).images[0]

    # append the blended band and the untouched right half of curr to the growing strip
    prev = create_output_image(prev, merged, curr)

# Drop the duplicated halves of the first image from both ends so the strip wraps around seamlessly
w, h = prev.size
output = prev.crop((base_width//2, 0, w-base_width//2, h))
del inpaint
torch.cuda.empty_cache()

Save the resulting image and upload it to this panorama web viewer for better visualization.

panorama = Image.new('RGB', (w, 3*h))
panorama.paste(output, (0, h))
panorama.size
(5376, 1536)
panorama.save('panorama.jpeg')
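
If you want a quick look before uploading, you can also preview the stitched strip locally with matplotlib (purely optional):

# Optional: quick local preview of the stitched strip
plt.figure(figsize=(24, 4))
plt.axis('off')
plt.imshow(output)
plt.show()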

That's all folks

In this article, we saw how Stable Diffusion can be used to create panoramic views by generating a few images and then using in-painting to merge them smoothly.

I hope you enjoyed this article. Feel free to leave a comment or reach out on Twitter @bachiirc.