Developing LLM Applications with LangChain (2): Models, Prompts, and Output Parsers

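Note:

This article is a set of study notes based on Andrew Ng's course "LangChain for LLM Application Development".

The complete course content and the sample code/Jupyter notebooks are available at: LangChain-for-LLM-Application-Development.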

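Course outline

This lesson mainly covers two parts:

Calling the OpenAI API directly, to get a feel for how the API is used and what problems come up;
Using LangChain's API to address those problems, in three main areas:
1. Prompts: the input that tells the language model what content to return, i.e. what we usually think of as the question;
2. Models: the language models that do most of the work;
3. Output parsers: parsing the returned data, i.e. turning the structured information (JSON) in the model's reply into data that downstream code can use;

The main content of the LangChain part is shown in the figure below:

Building an LLM application directly on the OpenAI API involves a lot of glue code, and LangChain offers a good way to deal with some of these problems.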

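Getting an OpenAI API Key

Before you start, make sure you already have an OpenAI API Key. It is the credential for accessing the OpenAI API, and usage is billed against it.

After setting the API Key in a system environment variable, you can install the Python SDK provided by OpenAI. Read the key from the environment with os.environ['OPENAI_API_KEY'], then set it on the SDK.

Note: this is the recommended practice. It lets you use the same key across multiple projects without committing it to version control. You can also hard-code the key in your code, but then it is easy to leak it through version control.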

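# Install the required dependencies
%pip install python-dotenv
# The code below calls openai.ChatCompletion, which was removed in openai 1.0, so pin an older release
%pip install "openai<1"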

import os
import openai
# Set the API Key
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
openai.api_key = os.environ['OPENAI_API_KEY']
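If the environment variable is missing, os.environ['OPENAI_API_KEY'] fails with a bare KeyError. As a minimal sketch of a friendlier check, assuming nothing beyond the standard library, the hypothetical helper load_api_key below (not part of the OpenAI SDK or the course code) reads the key from a mapping and raises a readable error when it is absent:

```python
import os


def load_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise a readable error."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to a .env file")
    return key


# Usage with a stand-in mapping, so the example runs without a real key.
print(load_api_key({"OPENAI_API_KEY": "sk-example"}))
```

In a notebook you would call load_api_key() with no arguments after load_dotenv() has run, and assign the result to openai.api_key.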

OpenAI : Chat API

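Let's look at how the OpenAI chat API is used. We first define a chat function, get_completion(), and then use it to start a simple conversation.

# Define a function that wraps some basic parameters of the OpenAI SDK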
def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0, 
    )
    return response.choices[0].message["content"]

# Ask a question: What is 1+1?
get_completion("What is 1+1?")

'As an AI language model, I can tell you that the answer to 1+1 is 2.'
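get_completion() sends a single message with the "user" role. The chat API takes a list of such role/content dictionaries, and a "system" message can be placed first to set the assistant's behaviour. The sketch below only builds that list and makes no network call; build_messages is a hypothetical helper, not part of the SDK:

```python
def build_messages(prompt, system=None):
    """Build the messages list expected by a chat-style API."""
    messages = []
    if system:
        # An optional system message goes first and sets overall behaviour.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return messages


print(build_messages("What is 1+1?", system="You are a concise assistant."))
```

The returned list could be passed as the messages argument in get_completion() in place of the hard-coded single-element list.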

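Complex Prompt

The 1+1 question above is very simple; in real projects the use cases are usually more complex. Let's take handling customer feedback emails as an example and see how the OpenAI SDK is used.

The scenario: you work at an international company and often receive feedback emails from users in different countries, and users who have run into a problem may be quite worked up. So our Prompt needs to do two things:

Translate emails in different languages into Chinese so they are easy to read;
Adjust the tone of the email to be calmer and friendlier (which makes the work easier to carry out);

The Prompt we write mainly contains the user's email. For extensibility we also pull the tone out into its own variable, so the content can be rewritten in different styles for different scenarios.

Here are the code details: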

# The customer's original email
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse,\
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""
# Target tone and style for the rewritten email
style = """Simplified Chinese, in a calm/friendly/formal/respectful tone"""
prompt = f"""Translate the text \
that is delimited by triple backticks 
into a style that is {style}.
text: ```{customer_email}```
"""
print(prompt)

The printed output is:

Translate the text that is delimited by triple backticks
into a style that is Simplified Chinese, in a calm/friendly/formal/respectful tone.
text: ```
Arrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse,the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!
```

Call the API:

response = get_completion(prompt)
print(response)

嗯，我很生气，我的搅拌机盖飞了出去，把我的厨房墙面都弄得满是果汁！更糟糕的是，保修不包括清理厨房的费用。伙计，我现在需求你的帮助！

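LangChain : Chat API

With the raw OpenAI SDK, the task above required assembling the different pieces of information and instructions into one complete Prompt by hand, then passing it to the model. If there is a lot of information to pack in, this string concatenation becomes tedious and turns into glue code. In addition, the Prompt can only be built once all of the information is ready, so you cannot define the Prompt and its fields ahead of time.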
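To make the glue-code point concrete, here is a minimal sketch that uses only Python's built-in str.format, not any library: the template is defined once, before any values exist, and filled in later for each email. The names TEMPLATE and render are hypothetical and exist only for this illustration.

```python
# Three backticks, built at runtime so the delimiter is easy to see.
DELIM = "`" * 3

# The template is defined up front; {style} and {text} are filled in later.
TEMPLATE = (
    "Translate the text that is delimited by triple backticks "
    "into a style that is {style}. text: " + DELIM + "{text}" + DELIM
)


def render(style, text):
    """Fill the shared template with one style/text pair."""
    return TEMPLATE.format(style=style, text=text)


# The same template serves two different requests.
first = render("Simplified Chinese, in a calm tone", "Arrr, me blender lid flew off!")
second = render("a polite tone that speaks in English Pirate", "The warranty does not cover cleaning.")
print(first)
print(second)
```

An f-string, by contrast, is evaluated at the point where it is written, which is why the earlier version needed customer_email and style to exist before the Prompt could be defined.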

Let's see how to do the same task with LangChain.

# Install the langchain dependency
# langchain releases move quickly; the sample code here was developed against version 0.0.188, and other versions may break some of it.
%pip install --upgrade langchain==0.0.188

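Model

The model here is a large language model (LLM), which performs inference on the input Prompt and returns a result. LangChain wraps multiple LLM implementations, and OpenAI is one of them. Let's see how to access OpenAI through LangChain. The wrapper class is ChatOpenAI, which must be imported before use.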

# Import ChatOpenAI
from langchain.chat_models import ChatOpenAI
# Many parameters can be set (for example, whether to use gpt-3.5 or gpt-4); temperature is used as the example here
chat = ChatOpenAI(temperature=0.0)
print(chat)

verbose=False callbacks=None callback_manager=None client=<class 'openai.api_resources.chat_completion.ChatCompletion'> model_name='gpt-3.5-turbo' temperature=0.0 model_kwargs={} openai_api_key=None openai_api_base=None openai_organization=None openai_proxy=None request_timeout=None max_retries=6 streaming=False n=1 max_tokens=None

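Prompt template

Translating the email

With the raw OpenAI API we had to assemble all the information into one final Prompt ourselves. LangChain wraps this step as well, which makes it more convenient to use.

First we define a Prompt string, template_string, with two placeholders to fill in, style and text, matching the information used above. From this string we build a prompt template object, which lets us do things more flexibly. The code is roughly as follows: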

from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
# Define the Prompt
template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""
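# Note: the LangChain version used in the video appears to differ from my local version (0.0.188), which caused an API error,
# so HumanMessagePromptTemplate replaces part of the ChatPromptTemplate usage.
# The original call was:
# prompt_template = ChatPromptTemplate.from_messages(template_string)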
human_prompt_template = HumanMessagePromptTemplate.from_template(template_string)
prompt_template = ChatPromptTemplate.from_messages([human_prompt_template])

# Print the full contents of the constructed Prompt
prompt = prompt_template.messages[0].prompt
print(prompt)
# Print the variables in the Prompt that still need to be filled in
print(prompt.input_variables)

input_variables=['style', 'text'] output_parser=None partial_variables={} template='Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```\n' template_format='f-string' validate_template=True
['style', 'text']
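As a rough illustration of where a list like ['style', 'text'] can come from, the sketch below uses Python's standard string.Formatter to find the placeholders in a template. This is not LangChain's implementation; input_variables here is a hypothetical helper written for this note.

```python
from string import Formatter


def input_variables(template):
    """Return the distinct placeholder names in a str.format-style template, in order."""
    names = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    for _, field, _, _ in Formatter().parse(template):
        if field and field not in names:
            names.append(field)
    return names


template = "Translate the text into a style that is {style}. text: {text}"
print(input_variables(template))  # ['style', 'text']
```

Knowing the variable names up front is what allows a template to report a missing value before any request is sent to the model.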

Next we supply the missing style and text values and pass them in through ChatPromptTemplate's format_messages function. The code is roughly as follows:

# style
customer_style = "Simplified Chinese, in a calm/friendly/formal/respectful tone"
# text
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

# Pass in the style and text values
customer_messages = prompt_template.format_messages(
                    style=customer_style,
                    text=customer_email)
# Inspect the output
print(customer_messages[0])

content="Translate the text that is delimited by triple backticks into a style that is Simplified Chinese, in a calm/friendly/formal/respectful tone. text: ```\nArrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse, the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!\n```\n" additional_kwargs={} example=False

The output shows that the missing information has been filled in, and customer_messages is now a complete Prompt. Next we pass this Prompt to ChatOpenAI and look at what it returns:

# Call the LLM to translate to the style of the customer message
customer_response = chat(customer_messages)
print(customer_response.content)

嗨，我很生气，我的搅拌机盖子飞了出去，把我的厨房墙面都弄得满是果汁！更糟糕的是，保修不包括清理厨房的费用。伙计，我现在需求你的帮助！

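Replying to the email

Above we used LangChain to translate and polish the customer's email. Now let's use the same approach to reply to the user.

We write the reply in Chinese, then need it translated into the target language (English here) with the colloquial expressions removed. Below are the reply text and the corresponding Prompt.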

# Reply text
service_reply = """额，保修不包括你的厨房的清洁费用，
由于你运用搅拌机的时分忘记盖上盖子了，这是你操作的失误，这是你的问题。真倒运！再会！
"""
# Requested style
service_style_pirate = """\
a polite tone \
that speaks in English Pirate\
"""
# Assemble the prompt
service_messages = prompt_template.format_messages(
    style=service_style_pirate,
    text=service_reply)
# Print and inspect the prompt
print(service_messages[0].content)

Translate the text that is delimited by triple backticks into a style that is a polite tone that speaks in English Pirate. text: ```额，保修不包括你的厨房的清洁费用，
由于你运用搅拌机的时分忘记盖上盖子了，这是你操作的失误，这是你的问题。真倒运！再会！
```

# Generate the reply email text
service_response = chat(service_messages)
print(service_response.content)

Arrr, me hearty! Beggin' yer pardon, but the warranty don't be coverin' the cost o' cleanin' yer galley, as ye forgot to put the lid on yer blender, which be a mistake on yer part. 'Tis yer own problem, matey. Aye, tough luck! Farewell!

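Advantages of prompt templates

As you can see, prompt_template is a wrapper around our own Prompt that makes it more convenient to use and lets this logic be reused in complex applications.

Beyond that, prompt templates in LangChain provide some other features:

A number of built-in Prompts (for text summarization, QA, SQL, API calls, and so on), so we don't have to tune the best Prompt ourselves;
Support for output parsing (Output Parsers), returning the LLM's result in a structured form (such as JSON);
Built-in chain-of-thought prompting, which makes the LLM's answers more reasonable;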

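Output Parsers

Now let's look at a use case for output parsing (Output Parsers). Suppose we have some product reviews and want to process them so the data is easier to analyze. For example, given a user's product review as input, we want to output JSON data with the following fields:

gift: whether the product was bought as a gift, a bool;
delivery_days: how many days delivery took, or -1 if not mentioned;
price_value: any sentences about price or value, returned as a list;

We want the LLM to return data in the following shape (written here as a Python dict literal; in JSON itself the boolean would be false):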

{
  "gift": False,
  "delivery_days": 5,
  "price_value": "pretty affordable!"
}

{'gift': False, 'delivery_days': 5, 'price_value': 'pretty affordable!'}
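The difference between the Python literal above and real JSON text is easy to check with the standard json module. The short sketch below, which uses only the standard library and the example values shown above, serializes the dict and reads it back:

```python
import json

# The example record, written as a Python dict (note the capitalised False).
record = {"gift": False, "delivery_days": 5, "price_value": "pretty affordable!"}

# Serialising to JSON text turns Python's False into JSON's false.
as_json = json.dumps(record)
print(as_json)

# Parsing the JSON text gives back an equal Python dict.
print(json.loads(as_json) == record)
```

This matters later: the model returns JSON text (with false), and it has to be parsed before it behaves like the dict shown above.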

Below is a MacBook Pro review picked at random from JD.com, together with the corresponding Prompt.

# The customer's product review
customer_review = """
MacBook Pro特别棒！特别喜爱！m2处理器功用超强，便是价钱有点小贵！电池续航逆天！不发热！还带有黑科技触控栏！
现在Mac 软件还算蛮多的，常用的工作软件都能有！用来日常工作彻底没问题！
我想要点点评一下他的音频接口！这代MacBook Pro 带有先进的高阻抗耳机支撑功用！相同的耳机，
插MacBook Pro上，作用要好于iPhone！还有它的录音功用！插上一根简略的转接头后，在合作电容麦，
还有库乐队软件，录音作用逆天！真的特别棒！我有比较老版别的Mac，但这代MacBook Pro的录音作用，
真的比曾经的Mac作用要好好多倍！特别逆天！适合音乐人！（个人感觉，不插电源，录音作用好像会更好！）
"""
# Write the Prompt
review_template = """\
For the following text, extract the following information:
gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.
delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.
price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.
Format the output as JSON with the following keys:
gift
delivery_days
price_value
text: {text}
"""

Here is the request logic:

from langchain.prompts import ChatPromptTemplate
# Create the ChatPromptTemplate
prompt_template = ChatPromptTemplate.from_template(review_template)
messages = prompt_template.format_messages(text=customer_review)
print(messages)
# Create the LLM
chat = ChatOpenAI(temperature=0.0)
# Send the request
response = chat(messages)
print(response.content)

[HumanMessage(content='For the following text, extract the following information:\n\ngift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.\n\ndelivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.\n\nprice_value: Extract any sentences about the value or price,and output them as a comma separated Python list.\n\nFormat the output as JSON with the following keys:\ngift\ndelivery_days\nprice_value\n\ntext: \nMacBook Pro特别棒！特别喜爱！m2处理器功用超强，便是价钱有点小贵！电池续航逆天！不发热！还带有黑科技触控栏！\n现在Mac 软件还算蛮多的，常用的工作软件都能有！用来日常工作彻底没问题！\n我想要点点评一下他的音频接口！这代MacBook Pro 带有先进的高阻抗耳机支撑功用！相同的耳机，\n插MacBook Pro上，作用要好于iPhone！还有它的录音功用！插上一根简略的转接头后，在合作电容麦，\n还有库乐队软件，录音作用逆天！真的特别棒！我有比较老版别的Mac，但这代MacBook Pro的录音作用，\n真的比曾经的Mac作用要好好多倍！特别逆天！适合音乐人！（个人感觉，不插电源，录音作用好像会更好！）\n\n', additional_kwargs={}, example=False)]
{
    "gift": false,
    "delivery_days": -1,
    "price_value": ["便是价钱有点小贵！"]
}

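The response is basically what we expected, but we cannot operate on this JSON directly yet: at this point it is still just a string. For example, the code below raises an error.

# The code below raises an error, because content is a string, not a dict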
response.content.get('gift')

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[58], line 5
      1 # You will get an error by running this line of code 
      2 # because'gift' is not a dictionary
      3 # 'gift' is a bool
      4 response.content
----> 5 response.content.get('gift')
AttributeError: 'str' object has no attribute 'get'
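For a reply that is nothing but JSON, like the one printed above, the standard json module is already enough to get a dict. The sketch below assumes that shape and shows how fragile it is: as soon as the model wraps the JSON in extra words, parsing fails. parse_reply is a hypothetical helper, and content is a hand-written stand-in for response.content.

```python
import json

# Stand-in for response.content: the model's reply as a plain string.
content = '{"gift": false, "delivery_days": -1, "price_value": ["a bit pricey"]}'


def parse_reply(text):
    """Return the reply as a dict, or None if it is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


data = parse_reply(content)
print(data.get("gift"))

# Extra prose around the JSON makes the whole string unparseable.
print(parse_reply("Sure! Here is the JSON: {...}"))
```

That fragility is one reason to also control the output format from the prompt side rather than only parsing after the fact.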

Parse the LLM output string into a Python dictionary

To fix the error above we need LangChain's output parsing. First import ResponseSchema and StructuredOutputParser:

from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser
# Describe each field you expect back in detail, so the LLM knows what you want
gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased as a gift for someone else? Answer True if yes,False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days did it take for the product to arrive? If this information is not found, output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any sentences about the value or price, and output them as a comma separated Python list.")
# Collect the schemas in a list
response_schemas = [gift_schema, 
                    delivery_days_schema,
                    price_value_schema]
# Create the OutputParser
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
# Generate the format instructions, which become part of the Prompt
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":
```json
{
	"gift": string  // Was the item purchased as a gift for someone else? Answer True if yes,False if not or unknown.
	"delivery_days": string  // How many days did it take for the product to arrive? If this information is not found, output -1.
	"price_value": string  // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
```

Next we combine format_instructions with the Prompt to produce the final Prompt.

review_template_2 = """\
For the following text, extract the following information:
gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.
delivery_days: How many days did it take for the product\
to arrive? If this information is not found, output -1.
price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.
text: {text}
{format_instructions}
"""
human_prompt_template = HumanMessagePromptTemplate.from_template(review_template_2)
prompt = ChatPromptTemplate.from_messages([human_prompt_template])
messages = prompt.format_messages(text=customer_review, format_instructions=format_instructions)
print(messages[0].content)

For the following text, extract the following information:
gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
delivery_days: How many days did it take for the productto arrive? If this information is not found, output -1.
price_value: Extract any sentences about the value or price,and output them as a comma separated Python list.
text: 
MacBook Pro特别棒！特别喜爱！m2处理器功用超强，便是价钱有点小贵！电池续航逆天！不发热！还带有黑科技触控栏！
现在Mac 软件还算蛮多的，常用的工作软件都能有！用来日常工作彻底没问题！
我想要点点评一下他的音频接口！这代MacBook Pro 带有先进的高阻抗耳机支撑功用！相同的耳机，
插MacBook Pro上，作用要好于iPhone！还有它的录音功用！插上一根简略的转接头后，在合作电容麦，
还有库乐队软件，录音作用逆天！真的特别棒！我有比较老版别的Mac，但这代MacBook Pro的录音作用，
真的比曾经的Mac作用要好好多倍！特别逆天！适合音乐人！（个人感觉，不插电源，录音作用好像会更好！）
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "\`\`\`json" and "\`\`\`":
```json
{
	"gift": string  // Was the item purchased as a gift for someone else? Answer True if yes,False if not or unknown.
	"delivery_days": string  // How many days did it take for the product to arrive? If this information is not found, output -1.
	"price_value": string  // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
```

We send the request to the LLM and inspect what it returns:

# Call the LLM
response = chat(messages)
# Parse the result into a dict
output_dict = output_parser.parse(response.content)
print(output_dict.get('price_value'))

['便是价钱有点小贵！']
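The format instructions ask the model to wrap its answer in a markdown json code block, so a parser for that format has to pull the block out of the reply and load it. The sketch below shows one way that could be done with the standard library. It is an illustration of the idea, not LangChain's StructuredOutputParser; parse_fenced_json and the sample reply are hypothetical.

```python
import json
import re

# Three backticks, built at runtime to keep this example easy to embed.
FENCE = "`" * 3
# Match a json code block and capture everything between the fences.
_BLOCK = re.compile(FENCE + r"json\s*(.*?)" + FENCE, re.DOTALL)


def parse_fenced_json(reply, expected_keys):
    """Extract the json code block from a reply and check that the keys are present."""
    match = _BLOCK.search(reply)
    if match is None:
        raise ValueError("no json code block found in the reply")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


# A hand-written reply in the requested format (not real model output).
reply = (
    FENCE + "json\n"
    '{"gift": false, "delivery_days": -1, "price_value": ["a bit pricey"]}\n'
    + FENCE
)
result = parse_fenced_json(reply, ["gift", "delivery_days", "price_value"])
print(result.get("price_value"))
```

Checking for the expected keys turns a malformed reply into an immediate, readable error instead of a failure further down the pipeline.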

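Summary

This lesson showed how simple it is to hold a conversation using the OpenAI SDK, and that doing so involves some glue code. That is not much of a problem in small projects, but in large projects it is an engineering issue that cannot be ignored. To address it we introduced the LangChain library, which wraps the whole conversation flow and is easy to extend:

For Prompt input, besides being flexible to use, it also provides well-tuned Prompts for a variety of scenarios;
For the Model, it provides default OpenAI request parameters and makes it easier to switch to another LLM later;
For processing LLM output, Output Parsers turn the result into a dict, which is convenient for further processing;

In the next lesson we will cover LangChain's memory capability.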
