I have recently been studying Stable Diffusion and LangChain, and noticed that Stable Diffusion, like ChatGLM, can be called through an API, which raised a question: could Stable Diffusion be integrated into LangChain as well? Most of the material online covers using ChatGPT to help generate Stable Diffusion prompts, so this article follows that idea and uses LLM + LangChain + Stable Diffusion to generate an image automatically from a single sentence.

Steps

Expanding the prompt

Generating prompts with OpenAI

Generate stable diffusion prompts following the approach described in “AI协同打工,ChatGPT生成提示词+AI作图”.

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chains import LLMChain
_template = """
以下提示用于指导Al绘画模型创立图画。它们包含人物外观、布景、色彩和光影作用,以及图画的主题和风格等各种细节。这些提示的格局通常包含带权重的数字括号,用于指定某些细节的重要性或着重。例如,"(masterpiece:1.4)"表明著作的质量非常重要。以下是一些示例:
1. (8k, RAW photo, best quality, masterpiece:1.2),(realistic, photo-realistic:1.37), ultra-detailed, 1girl, cute, solo, beautiful detailed sky, detailed cafe, night, sitting, dating, (nose blush), (smile:1.1),(closed mouth), medium breasts, beautiful detailed eyes, (collared shirt:1.1), bowtie, pleated skirt, (short hair:1.2), floating hair, ((masterpiece)), ((best quality)),
2. (masterpiece, finely detailed beautiful eyes: 1.2), ultra-detailed, illustration, 1 girl, blue hair black hair, japanese clothes, cherry blossoms, tori, street full of cherry blossoms, detailed background, realistic, volumetric light, sunbeam, light rays, sky, cloud,
3. highres, highest quallity, illustration, cinematic light, ultra detailed, detailed face, (detailed eyes, best quality, hyper detailed, masterpiece, (detailed face), blue hairlwhite hair, purple eyes, highest details, luminous eyes, medium breats, black halo, white clothes, backlighting, (midriff:1.4), light rays, (high contrast), (colorful)
"""
llm = OpenAI(temperature=0)
prompt = PromptTemplate(
    input_variables=["desc"],
    template=_template,
)
chain = LLMChain(prompt=prompt,llm=llm)
res = chain.run("湖人总冠军")  # Chinese input: "the Lakers win the championship"
print(res)
  • The generated prompt:

(masterpiece:1.4), ultra-detailed, 1man, strong, solo, detailed basketball court, detailed stadium, night, standing, celebrating, (fist pump), (smile:1.1), (closed mouth), muscular body, beautiful detailed eyes, (jersey:1.1), shorts, (short hair:1.2), floating hair, (trophy:1.3), (confetti:1.2), (fireworks:1.2), (crowd cheering:1.2), (high contrast), (colorful)

Feeding this prompt directly into the stable diffusion webui gives the following result:

[Image: result generated by stable diffusion + LangChain + LLM]

Formatted output

To make sure the output can be parsed conveniently, we can add some extra guidance. The final prompt template looks like this:

_template = """
以下提示用于指导Al绘画模型创立图画。它们包含人物外观、布景、色彩和光影作用,以及图画的主题和风格等各种细节。这些提示的格局通常包含带权重的数字括号,用于指定某些细节的重要性或着重。例如,"(masterpiece:1.4)"表明著作的质量非常重要。以下是一些示例:
1. (8k, RAW photo, best quality, masterpiece:1.2),(realistic, photo-realistic:1.37), ultra-detailed, 1girl, cute, solo, beautiful detailed sky, detailed cafe, night, sitting, dating, (nose blush), (smile:1.1),(closed mouth), medium breasts, beautiful detailed eyes, (collared shirt:1.1), bowtie, pleated skirt, (short hair:1.2), floating hair, ((masterpiece)), ((best quality)),
2. (masterpiece, finely detailed beautiful eyes: 1.2), ultra-detailed, illustration, 1 girl, blue hair black hair, japanese clothes, cherry blossoms, tori, street full of cherry blossoms, detailed background, realistic, volumetric light, sunbeam, light rays, sky, cloud,
3. highres, highest quallity, illustration, cinematic light, ultra detailed, detailed face, (detailed eyes, best quality, hyper detailed, masterpiece, (detailed face), blue hairlwhite hair, purple eyes, highest details, luminous eyes, medium breats, black halo, white clothes, backlighting, (midriff:1.4), light rays, (high contrast), (colorful)
仿照之前的提示,写一段描绘如下要素的提示:
{desc}
你应该仅以 JSON 格局呼应,如下所述:
回来格局如下:
{{
  "question":"$YOUR_QUESTION_HERE",
  "answer": "$YOUR_ANSWER_HERE"
}}
保证呼应能够被 Python json.loads 解析。
"""

The final output is as follows:

{
  "question":"湖人总冠军",
  "answer": "(masterpiece:1.4), ultra-detailed, 1man, strong, solo, detailed basketball court, detailed stadium, night, standing, celebrating, (fist pump), (smile:1.1), (closed mouth), muscular body, beautiful detailed eyes, (jersey:1.1), shorts, (short hair:1.2), floating hair, (trophy:1.3), (confetti:1.2), (fireworks:1.2), (crowd cheering:1.2), (high contrast), (colorful)"
}

This makes the response easy to parse:

# parse the JSON response returned by the chain
import json
result = json.loads(res)
print("result:",result)
result["answer"]

Generating prompts with ChatGLM

# ChatGLM here is a custom LangChain wrapper (see the note and sketch below);
# prompt_history is the prior conversation used to seed the model
llm = ChatGLM(temperature=0.1, history=prompt_history)
prompt = PromptTemplate(
    input_variables=["desc"],
    template=_template,
)
chain = LLMChain(prompt=prompt,llm=llm)

The ChatGLM class above is the wrapper from the earlier post “ChatGLM 集成进LangChain工具” on integrating ChatGLM into LangChain.
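
For reference, a minimal sketch of such a wrapper, assuming the stock api.py from the ChatGLM-6B repository is serving on localhost:8000 (the endpoint, request fields, and class shape below are assumptions based on that demo, not the exact code from the referenced post):

from typing import List, Optional
import requests
from langchain.llms.base import LLM

class ChatGLM(LLM):
    """Minimal LangChain wrapper around a locally served ChatGLM HTTP API."""
    endpoint: str = "http://localhost:8000"  # default address of ChatGLM-6B's api.py
    temperature: float = 0.1
    history: List = []

    @property
    def _llm_type(self) -> str:
        return "chatglm"

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        # api.py accepts {"prompt", "history", ...} and returns {"response", "history", ...}
        resp = requests.post(
            self.endpoint,
            json={"prompt": prompt, "history": self.history, "temperature": self.temperature},
        )
        resp.raise_for_status()
        return resp.json()["response"]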

The results from ChatGLM were not very good, so they are not shown here. The main problems: 1. it did not follow the instruction to respond in JSON; 2. many of the generated descriptions were in Chinese.

Auto-completing SD prompts with MagicPrompt

from transformers import AutoModelForCausalLM, AutoTokenizer,pipeline
text_refine_tokenizer = AutoTokenizer.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
text_refine_model = AutoModelForCausalLM.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
text_refine_gpt2_pipe = pipeline("text-generation", model=text_refine_model, tokenizer=text_refine_tokenizer, device="cpu")
text = "湖人总冠军"
refined_text = text_refine_gpt2_pipe(text)[0]["generated_text"]
print(refined_text)

The output is as follows:

湖人总冠军 港子 Imoko Ikeda, Minaba hideo, Yoshitaka Amano, Ruan Jia, Kentaro Miura, Artgerm, post processed, concept

With pure English input, the output is as follows:

lakers championship winner trending on artstation, painted by greg rutkowski

Clearly MagicPrompt does not handle Chinese input well; to use it, the input first needs to be translated into English.
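
One way to do that, as a sketch (the Helsinki-NLP/opus-mt-zh-en model here is my own choice for illustration; any zh-to-en translator would work):

from transformers import pipeline

# Chinese -> English translation step placed in front of MagicPrompt
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")

text = "湖人总冠军"
text_en = translator(text)[0]["translation_text"]
refined_text = text_refine_gpt2_pipe(text_en)[0]["generated_text"]
print(refined_text)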

Calling the stable diffusion API to generate images

Reference: Mikubill/sd-webui-controlnet. The main code is as follows:

import cv2
import requests
import base64
import re
ENDPOINT = "http://localhost:7860"
def do_webui_request(url, **kwargs):
    """POST to a webui API endpoint, merging kwargs over these defaults."""
    reqbody = {
        "prompt": "best quality, extremely detailed",
        "negative_prompt": "longbody, lowres, bad anatomy, bad hands, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality",
        "seed": -1,
        "subseed": -1,
        "subseed_strength": 0,
        "batch_size": 1,
        "n_iter": 1,
        "steps": 15,
        "cfg_scale": 7,
        "width": 512,
        "height": 768,
        "restore_faces": True,
        "eta": 0,
        "sampler_index": "Euler a",
        "controlnet_input_images": [],
        "controlnet_module": 'canny',
        "controlnet_model": 'control_canny-fp16 [e3fe7712]',
        "controlnet_guidance": 1.0,
    }
    reqbody.update(kwargs)
    print("reqbody:",reqbody)
    r = requests.post(url, json=reqbody)
    return r.json()
  • Calling the API
import io
from PIL import Image
prompt = "a cute cat"
resp = do_webui_request(
    url=ENDPOINT + "/sdapi/v1/txt2img",
    prompt=prompt,
)
image = Image.open(io.BytesIO(base64.b64decode(resp["images"][0])))
display(image)

To use the API, the stable diffusion webui must be started with the API enabled, i.e. launched with the --api flag (for example, ./webui.sh --api on Linux/macOS, or adding --api to COMMANDLINE_ARGS in webui-user.bat on Windows).

Putting it together: stable diffusion + LangChain + LLM automatic image generation

stable diffusion+LangChain+OpenAI

  • The wrapper implementation
import io, base64
import json
import os
import uuid
from PIL import Image
class RefinePrompt:
    llm = OpenAI(temperature=0)
    prompt = PromptTemplate(
        input_variables=["desc"],
        template=_template,
    )
    chain = LLMChain(prompt=prompt,llm=llm)
    def run(self,text):
        res = self.chain.run(text)
        # parse the JSON response from the LLM
        result = json.loads(res)
        return result["answer"]
class T2I:
    def __init__(self):
        self.text_refine = RefinePrompt()
    def inference(self, text):
        image_filename = os.path.join('output/image', str(uuid.uuid4())[0:8] + ".png")
        refined_text = self.text_refine.run(text)
        print(f'{text} refined to {refined_text}')
        resp = do_webui_request(
            url=ENDPOINT + "/sdapi/v1/txt2img",
            prompt=refined_text,
        )
        image = Image.open(io.BytesIO(base64.b64decode(resp["images"][0])))
        image.save(image_filename)
        print(f"Processed T2I.run, text: {text}, image_filename: {image_filename}")
        return image_filename,image
  • Using the wrapper class and displaying the image (in a Python notebook)
t2i = T2I()
image_filename,image = t2i.inference("湖人总冠军")
print("filename:",image_filename)
display(image)

[Image: result generated by stable diffusion + LangChain + LLM]

stable diffusion+MagicPrompt

  • The wrapper implementation
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import io, base64
import os
import uuid
from PIL import Image
class T2I:
    def __init__(self, device):
        print("Initializing T2I to %s" % device)
        self.device = device
        self.text_refine_tokenizer = AutoTokenizer.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
        self.text_refine_model = AutoModelForCausalLM.from_pretrained("Gustavosta/MagicPrompt-Stable-Diffusion")
        self.text_refine_gpt2_pipe = pipeline("text-generation", model=self.text_refine_model, tokenizer=self.text_refine_tokenizer, device=self.device)
    def inference(self, text,image_path=None):
        image_filename = os.path.join('output/image', str(uuid.uuid4())[0:8] + ".png")
        refined_text = self.text_refine_gpt2_pipe(text)[0]["generated_text"]
        print(f'{text} refined to {refined_text}')
        resp = do_webui_request(
            url=ENDPOINT + "/sdapi/v1/txt2img",
            prompt=refined_text,
            # readImage is a helper from the sd-webui-controlnet example (see the
            # sketch after this class); skip the ControlNet input when no image is given
            controlnet_input_images=[readImage(image_path)] if image_path else [],
        )
        image = Image.open(io.BytesIO(base64.b64decode(resp["images"][0])))
        image.save(image_filename)
        print(f"Processed T2I.run, text: {text}, image_filename: {image_filename}")
        return image_filename,image
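
The readImage helper referenced above is not defined in this post; in the sd-webui-controlnet example it is a small function that reads an image from disk and base64-encodes it for the API, roughly like this sketch:

import base64
import cv2

def readImage(path):
    """Read an image file and return it base64-encoded, as the webui API expects."""
    img = cv2.imread(path)
    retval, buffer = cv2.imencode(".jpg", img)
    return base64.b64encode(buffer).decode("utf-8")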
  • Using the wrapper class and displaying the image (in a Python notebook)
t2i = T2I("cpu")
image_filename,image = t2i.inference("lakers championship")
print("filename:",image_filename)
display(image)

[Image: result generated by stable diffusion + MagicPrompt]

Summary

This post used stable diffusion + LangChain + an LLM to implement one-sentence automatic image generation. Although the results are not yet fully satisfying, the approach is clearly feasible. To improve it further, you could try: 1. supplying each model with more examples tailored to it, to guide and refine what it generates; 2. combining ControlNet to better control the final image, as sketched below.
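
As a rough sketch of the second idea, the ControlNet fields already present in do_webui_request could be driven with a reference image (the /controlnet/txt2img route below comes from older versions of the sd-webui-controlnet example and may differ in your installation; pose_reference.png is a hypothetical input):

# Sketch: guide composition with a reference image via ControlNet (canny edges)
resp = do_webui_request(
    url=ENDPOINT + "/controlnet/txt2img",  # route may vary by extension version
    prompt="(masterpiece:1.4), ultra-detailed, 1man, basketball court, night",
    controlnet_input_images=[readImage("pose_reference.png")],  # hypothetical file
)
image = Image.open(io.BytesIO(base64.b64decode(resp["images"][0])))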

P.S. While studying the Mikubill/sd-webui-controlnet code, I found an example in it that imitates "Visual ChatGPT", which is quite interesting. I will dig into its implementation next, so stay tuned.

References

Mikubill/sd-webui-controlnet

The ControlNet extension for the stable diffusion webui; it contains examples of calling the API to generate images.

AI协同打工,ChatGPT生成提示词+AI作图