From fumbling through Agents in LangChain to the free-wheeling agents of AutoGen, this week of studying and writing has been a real pleasure. If this is your first encounter with AutoGen, please start with the official AutoGen documentation, or read the earlier posts in this AutoGen series:

Mastering the Best AI Agents Framework - AutoGen, Step by Step (Part 1)

Mastering the Best AI Agents Framework - AutoGen, Step by Step (Part 2)

Mastering the Best AI Agents Framework - AutoGen, Step by Step (Part 3): A New Way to Write Press Releases

Mastering the Best AI Agents Framework - AutoGen, Step by Step (Part 4): A Multi-Agent Group Chat Example

Mastering the Best AI Agents Framework - AutoGen, Step by Step (Part 5): Hand in Hand with LangChain – (juejin.cn)

Preface

  In the previous article, we combined LangChain and AutoGen, an older and a newer AI framework, to build an AI writer. LangChain provided the expert knowledge base, and AutoGen used that knowledge base to write articles, send emails, and so on.

  That project interacted with the user purely through text. In this article we will improve the product experience: using AutoGen + LangChain + ChromaDB + PlayHT together, we will build an AI agent you can interact with by voice.

Requirements

  First, we add a voice feature to the project from the previous article. The prompt requirement is as follows:

user_proxy.initiate_chat(
    assistant,
    message="""
I'm writing a blog to introduce the version 3 of Uniswap protocol. Find the answers to the 3 questions below and write an introduction based on them and speak it out loudly.

1. What is Uniswap?
2. What are the main changes in Uniswap version 3?
3. How to use Uniswap?

Start the work now.
""",
)

  The prompt now adds "and speak it out loudly". Any idea how to implement that? Let's first take a look at PlayHT.

PlayHT

  To turn the text our agents generate into speech, let me introduce PlayHT, which is very pleasant to work with.

  After signing up and logging in, we can obtain an API key. The free quota is enough for testing and learning; if you have commercial needs, check out the paid plans.

  We will install its Python library, pyht, and use our User ID and Secret Key.
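The setup can be sketched as follows. The credential values here are placeholders, not real keys; substitute the User ID and Secret Key from your own PlayHT dashboard.

```shell
# Install the PlayHT client plus the audio dependencies the demo below uses.
pip install pyht simpleaudio numpy

# Placeholder values -- copy the real ones from the PlayHT dashboard.
export PLAY_HT_USER_ID="your-user-id"
export PLAY_HT_API_KEY="your-secret-key"
```

The demo code reads both values from the environment, so exporting them once keeps credentials out of the source file.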

Mastering the Best AI Agents Framework - AutoGen, Step by Step (Part 6): A Voice AI Agent

The PlayHT demo below comes from pyht/demo/main.py at master · playht/pyht (github.com):

from typing import Generator, Iterable
import time
import threading
import os
import re
import numpy as np
import simpleaudio as sa
from pyht.client import Client, TTSOptions
from pyht.protos import api_pb2


def play_audio(data: Generator[bytes, None, None] | Iterable[bytes]):
    buff_size = 10485760
    ptr = 0
    start_time = time.time()
    buffer = np.empty(buff_size, np.float16)
    audio = None
    for i, chunk in enumerate(data):
        if i == 0:
            start_time = time.time()
            continue  # Drop the first response, we don't want a header.
        elif i == 1:
            print("First audio byte received in:", time.time() - start_time)
        for sample in np.frombuffer(chunk, np.float16):
            buffer[ptr] = sample
            ptr += 1
        if i == 5:
            # Give a 4 sample worth of breathing room before starting
            # playback
            audio = sa.play_buffer(buffer, 1, 2, 24000)
    approx_run_time = ptr / 24_000
    time.sleep(max(approx_run_time - time.time() + start_time, 0))
    if audio is not None:
        audio.stop()
def convert_text_to_audio(text: str):
    text_partitions = re.split(r'[,.]', text)
    # Setup the client
    client = Client(os.environ['PLAY_HT_USER_ID'], os.environ['PLAY_HT_API_KEY'])
    # Set the speech options
    voice = "s3://voice-cloning-zero-shot/d9ff78ba-d016-47f6-b0ef-dd630f59414e/female-cs/manifest.json"
    options = TTSOptions(voice=voice, format=api_pb2.FORMAT_WAV, quality="faster")
    # Get the streams
    in_stream, out_stream = client.get_stream_pair(options)
    # Start a player thread.
    audio_thread = threading.Thread(None, play_audio, args=(out_stream,))
    audio_thread.start()
    # Send some text, play some audio.
    for t in text_partitions:
        in_stream(t)
    in_stream.done()
    # cleanup
    audio_thread.join()
    out_stream.close()
    # Cleanup.
    client.close()
    return 0

  The function we will mainly use is convert_text_to_audio. The code comes straight from PlayHT's official demo, so we take it as-is without going into much detail.
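One detail worth noticing in the demo is its only text preprocessing step: it splits the input on commas and periods before streaming the pieces to PlayHT, so audio playback can start before the whole text is synthesized. A minimal illustration of that split (the sample sentence is my own):

```python
import re

text = "What is Uniswap? It is a decentralized exchange, built on Ethereum."

# Same partitioning convert_text_to_audio performs before streaming.
# Note a trailing period yields an empty final element.
partitions = re.split(r'[,.]', text)
print(partitions)
# ['What is Uniswap? It is a decentralized exchange', ' built on Ethereum', '']
```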

The AutoGen Approach

  How do we integrate PlayHT into the existing AutoGen code? This brings to mind the agents-with-function-calls pattern we have used several times recently. When the proxy hands the task to the assistant, the assistant works out from the task's wording that convert_text_to_audio can fulfill "speak it out loudly", and tells the proxy to execute that function. To make this work, we add one more function to the existing functions configuration. The code is as follows:

llm_config = {
    "request_timeout": 600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0,
    "functions": [
        {
            "name": "answer_uniswap_question",
            "description": "Answer any Uniswap related questions",
            "parameters": {
                "type": "object",
                "properties": {
                    "question": {
                        "type": "string",
                        "description": "The question to ask in relation to Uniswap protocol",
                    }
                },
                "required": ["question"],
            },
        },
        {
            "name": "convert_text_to_audio",
            "description": "Convert text to audio and speak it out loud",
            "parameters": {
                "type": "object",
                "properties": {
                    "text": {
                        "type": "string",
                        "description": "The text to be converted and spoken out loud",
                    }
                },
                "required": ["text"],
            },
        },
    ],
}

  With the code above, we have added one more function declaration, convert_text_to_audio, to llm_config. Next, we register the function itself in the proxy agent's instantiation code.

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config={"work_dir": "."},
    llm_config=llm_config,
    system_message="""Reply TERMINATE if the task has been solved at full satisfaction.
Otherwise, reply CONTINUE, or the reason why the task is not solved yet.""",
    function_map={
        "answer_uniswap_question": answer_uniswap_question,
        "convert_text_to_audio": convert_text_to_audio
    }
)

  Compared with the previous article, user_proxy now registers convert_text_to_audio in its function_map.
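To see why the function_map entry matters, here is a rough sketch of the dispatch flow the proxy performs: the assistant returns an OpenAI-style function call whose arguments arrive as a JSON string, and the proxy looks the name up in function_map and invokes it. The two stub functions below are stand-ins of my own, so the flow can run without PlayHT or LangChain installed; they are not the real implementations.

```python
import json

# Hypothetical stand-ins for the real functions, for illustration only.
def answer_uniswap_question(question):
    return f"(answer to: {question})"

def convert_text_to_audio(text):
    return f"(spoke: {text})"

function_map = {
    "answer_uniswap_question": answer_uniswap_question,
    "convert_text_to_audio": convert_text_to_audio,
}

# Shape of the assistant's reply: a function name plus JSON-encoded arguments.
function_call = {
    "name": "convert_text_to_audio",
    "arguments": json.dumps({"text": "Uniswap v3 introduces concentrated liquidity."}),
}

# The proxy-side dispatch: look up the name, parse the arguments, call it.
func = function_map[function_call["name"]]
result = func(**json.loads(function_call["arguments"]))
print(result)  # (spoke: Uniswap v3 introduces concentrated liquidity.)
```

If a declared function is missing from function_map, the proxy has nothing to execute, which is why both the declaration in llm_config and the registration here are needed.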

  Now when we run the user_proxy.initiate_chat code above again, the agent produces the same output as before, and then also speaks it out loud.


Summary

  This article stands on its own for two reasons: the previous one was already quite long, and this extra helping lets us savor the LangChain-AutoGen collaboration once more. The main work was adding a voice feature to the AI assistant. Today's takeaways:

  • Hosted AI services such as digital humans and speech generation are well worth using; a future article will round up the most useful ones
  • Agents with function calls are our go-to tool for extending or steering AutoGen's problem-solving ability, and they are also the interface through which AutoGen's simple chat-collaboration architecture talks to other functions and frameworks

  Thanks for reading. If you are also learning AutoGen/LangChain, please share your thoughts in the comments. A year of focused AI study ahead; keep at it!
