ドメインに依存しない画像生成モデル mobiusをDockerで試してみた

hugging faceにあるCorcelio/mobiusを試してみたいと思います。

Corcelio/mobius · Hugging Face

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

Mobiusとは

上記のURLから読み取れるmobiusの特徴をnotebookLMに要約してもいました。

ドメインにとらわれないデバイアス: Mobiusは、他の拡散モデルに共通する固有のバイアスを実質的に含まない画像を生成します。
優れた汎用性: Mobiusは、幅広いスタイルやドメインに適応できる比類のない能力を備えています。
効率的なファインチューニング: Mobiusベースモデルは、特定のタスクやドメインに合わせて調整された特殊なモデルを作成するための優れた基盤となります。
構成的デコンストラクションフレームワーク: Mobiusは、構成的デコンストラクションフレームワークを採用しています。

要は、「今までのモデルと違って、高品質の幅広いドメインの画像が作れます」ということだと思います。

LoRAを使わなくてもいいよーってことですね。

実際、デモ画像も相当きれいな画像が多く、写真、抽象画、アニメ、CGなどの適用事例が載っています。

Corcelというチーム(企業)によって開発されているみたいですね。

Corcel

Corcel API

実行環境

実行環境は、以前animateDiff-lightningを動かしたDocker環境と同じです。

Dockefileを再掲載しときます。

FROM nvidia/cuda:12.1.0-runtime-ubuntu20.04

ARG USERNAME=vscode
ARG USER_UID=1000
ARG USER_GID=$USER_UID

ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
ENV LC_ALL ja_JP.UTF-8
ENV TZ JST-9
ENV TERM xterm

RUN apt-get update \
    && groupadd --gid $USER_GID $USERNAME \
    && useradd -s /bin/bash --uid $USER_UID --gid $USER_GID -m $USERNAME \
    && apt-get install -y sudo \
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME \
    && apt-get -y install locales \
    && localedef -f UTF-8 -i ja_JP ja_JP.UTF-8

RUN apt-get -y update \
    && apt-get install -y software-properties-common \
    && add-apt-repository ppa:deadsnakes/ppa

RUN apt install -y bash \
    build-essential \
    git \
    git-lfs \
    curl \
    ca-certificates \
    libsndfile1-dev \
    libgl1 \
    python3.10 \
    python3-pip \
    python3.10-venv && \
    rm -rf /var/lib/apt/lists

# make sure to use venv
RUN python3.10 -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# pre-install the heavy dependencies (these can later be overridden by the deps from setup.py)
RUN python3.10 -m pip install --no-cache-dir --upgrade pip uv==0.1.11 && \
    python3.10 -m uv pip install --no-cache-dir \
    torch \
    torchvision \
    torchaudio \
    invisible_watermark && \
    python3.10 -m pip install --no-cache-dir \
    accelerate \
    datasets \
    hf-doc-builder \
    huggingface-hub \
    Jinja2 \
    librosa \
    numpy \
    scipy \
    tensorboard \
    transformers \
    pytorch-lightning

RUN pip install --upgrade diffusers[torch]
RUN pip install xformers

ENV HF_HOME /work/.cache/huggingface
ENV TORCH_HOME /work/.cache/torchvision

キャッシュディレクトリを設定しておかないと、リビルドするたびにモデルが消えるので、ちゃんと設定しましょう。

実行してみる

サンプルコードをコピペして、動かしてみます。

import torch
from diffusers import (
    StableDiffusionXLPipeline, 
    KDPM2AncestralDiscreteScheduler,
    AutoencoderKL
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "Corcelio/mobius", 
    vae=vae,
    torch_dtype=torch.float16
)
pipe.scheduler = KDPM2AncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to('cuda')

# Define prompts and generate image
prompt = "anime boy, protagonist, best quality"
negative_prompt = ""

pipe.enable_xformers_memory_efficient_attention()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

image = pipe(
    prompt, 
    negative_prompt=negative_prompt, 
    width=1024,
    height=1024,
    guidance_scale=7,
    num_inference_steps=50,
    clip_skip=3
).images[0]


image.save("generated_image.png")

私の実行環境のGPUはショボいので、軽量化のために次のコードを追加しています。

pipe.enable_xformers_memory_efficient_attention()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

実行すると、モデルのダウンロードが始まります。

diffusion_pytorch_model.safetensors: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10.3G/10.3G [06:17<00:00, 11.4MB/s]
diffusion_pytorch_model.safetensors:  58%|██████████████████████████████████████████████████████████████████████████▉                                                      | 5.97G/10.3G [20:14<14:36, 4.91MB/s]
Fetching 16 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [20:16<00:00, 76.04s/it]
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [01:19<00:00, 11.37s/it]
 34%|██████████████████████████████████████████████████████████▏                                                                                                                | 17/50 [01:24<02:48,  5.12s/it]

数十分かかりました。

実行が終わると、generatted_image.pngが生成されています。

デモ画像ほど綺麗では無いような？気もしますが、キレイな画像です。

また、流石に50stepsなので生成速度は遅いです。

LCM LoRAを使ってみる

流石に１枚生成するのに数分かかると、実験しづらいので、LCMを使ってみます。

LCM LoRAのダウンロードに、PETFが必要と言われたので、インストールします。Dockerfileに追記してもらってもいいです。

RUN pip install peft

LCM LoRAを使う形にサンプルコードを変えます。

import torch
from diffusers import (
    StableDiffusionXLPipeline, 
    KDPM2AncestralDiscreteScheduler,
    AutoencoderKL,
    LCMScheduler
)

# Load VAE component
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", 
    torch_dtype=torch.float16
)

# Configure the pipeline
pipe = StableDiffusionXLPipeline.from_pretrained(
    "Corcelio/mobius", 
    vae=vae,
    torch_dtype=torch.float16
)
# set scheduler
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# load LCM-LoRA
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

pipe.to('cuda')

# Define prompts and generate image
prompt = "anime boy, protagonist, best quality"
negative_prompt = ""

pipe.enable_xformers_memory_efficient_attention()
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()

image = pipe(
    prompt, 
    negative_prompt=negative_prompt, 
    width=1024,
    height=1024,
    guidance_scale=1.0,
    num_inference_steps=8,
    clip_skip=3
).images[0]


image.save("generated_image.png")

stepsは8にしました。

生成された画像がこちら