• About the author: Hi everyone, I'm Zeeland, a featured creator in the full-stack category.
  • Homepage: Zeeland
  • My blog: Zeeland
  • GitHub: Undertone0809 (Zeeland) (github.com)
  • Intro: The mixture of software dev + IoT + ML + anything

This article is excerpted from the author's blog: www.blog.zeeland.cn/archives/10…

Introduction

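SayCan, developed by Google Robotics together with Everyday Robots, lets a robot better understand language instructions and respond to them. It also uses the current physical environment to estimate how likely each candidate response is to actually be carried out, so the robot can help users complete tasks more effectively. SayCan draws on the knowledge contained in a large language model and combines it with language-conditioned value functions learned through reinforcement learning. In the reported experiments, SayCan achieved an 84% planning success rate and a 74% execution success rate. It turned language tasks into robot behavior more reliably than the baselines it was compared with.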
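The selection rule behind SayCan fits in a few lines. The LLM ("Say") scores how useful each skill would be for the instruction. A value function ("Can") scores how likely that skill is to succeed from the current state. The robot then executes the skill with the highest product of the two scores. The skill names and probabilities below are made-up toy values, not figures from the paper, and the two score tables stand in for a real LLM and real learned value functions.

```python
def saycan_select(llm_scores: dict, affordances: dict) -> str:
    """Return the skill maximizing p_LLM(skill | instruction) * p_value(success | state, skill)."""
    combined = {
        # A skill with no affordance estimate is treated as infeasible (score 0).
        skill: llm_scores[skill] * affordances.get(skill, 0.0)
        for skill in llm_scores
    }
    return max(combined, key=combined.get)


# Toy numbers for the instruction "bring me a sponge" (hypothetical, not from the paper).
llm_scores = {"find a sponge": 0.5, "pick up the sponge": 0.4, "go to the table": 0.1}
affordances = {"find a sponge": 0.9, "pick up the sponge": 0.1, "go to the table": 0.8}
print(saycan_select(llm_scores, affordances))  # find a sponge
```

The language model alone rates "pick up the sponge" fairly highly. Its low affordance (no sponge within reach yet) pushes the robot to look for one first.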

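Since Google introduced the SayCan framework, using large language models to help robots interpret instructions for complex tasks has become a hot research topic. People no longer expect robots to simply do exactly what they are told. They expect robots to understand semantics well enough to uncover latent needs, and then to carry out the task. Technologies such as ChatGPT have accelerated this trend. Meanwhile, RoboSDK and ROS provide unified, cloud- and robot-oriented APIs that support data collection and task execution across heterogeneous devices.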

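This article outlines an approach that uses prompt techniques to implement robot-specific functions, such as turning user input into structured information and constructing specific commands. The sections below cover the technical background and the design of a robot that can be controlled with ChatGPT.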

Functional Requirements

What capabilities does a robot controlled by an LLM need?

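  1. Use indoor and outdoor navigation and grasping tasks for a mobile manipulator as the test scenario. Given an instruction (for example, "Go over to that BYD car up ahead and check whether anything important was left behind."), combine it with an LLM (such as ChatGPT) to produce the robot's perception, planning, and control algorithms.
  2. Close the loop on this process in a simulation environment.
  3. Deploy it in real-world scenarios, developing an LLM-based robot framework on top of ROS or RoboSDK.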

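Such a robot opens up many possibilities. For an LLM to control a robot, the robot must first expose the corresponding control permissions, such as a smart vehicle's control access and its APIs. The LLM then controls the robot by calling those APIs. So how can an LLM make a robot carry out a specific instruction?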

A Simple Approach

A simple workflow works like this: after receiving an instruction from the user, the LLM breaks the command down internally. Here is a simple example.

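The user sends the instruction "Please go to the kitchen and bring me my cup." After receiving it, the LLM should first generate a plan internally. For example:

  • "First I should find where the kitchen is."
  • "Then I need to move to the kitchen."
  • "I need to find the cup in the kitchen."
  • "I need to move next to the cup and grasp it with my robotic arm."
  • "I need to return to where I started and hand the cup to the user."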

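For each task, the LLM should build a task tree. When a task is created, it may spawn subtasks, and each subtask forms a subtree of its own. Through this mechanism, the LLM returns the result once the final task has succeeded.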
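A task tree like this can be represented with a small recursive data structure. The sketch below is a minimal illustration under simple assumptions:

  • each leaf action is a plain Python callable standing in for a real robot skill;
  • children run depth-first before the parent's own action;
  • an exception anywhere aborts the whole task.

`TaskNode` and the lambda "skills" are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TaskNode:
    """One node of a task tree; leaf nodes carry an executable action."""
    description: str
    action: Optional[Callable[[], str]] = None
    children: List["TaskNode"] = field(default_factory=list)

    def add_subtask(self, node: "TaskNode") -> "TaskNode":
        self.children.append(node)
        return node

    def run(self) -> List[str]:
        """Run subtrees depth-first, then this node's own action.

        An exception raised by any action aborts the whole tree, so results
        are only returned when every task has succeeded.
        """
        results: List[str] = []
        for child in self.children:
            results.extend(child.run())
        if self.action is not None:
            results.append(self.action())
        return results


root = TaskNode("Bring my cup from the kitchen")
root.add_subtask(TaskNode("Find the kitchen", lambda: "kitchen found"))
root.add_subtask(TaskNode("Move to the kitchen", lambda: "arrived at kitchen"))
fetch = root.add_subtask(TaskNode("Fetch the cup"))  # a subtask that spawns its own subtree
fetch.add_subtask(TaskNode("Locate the cup", lambda: "cup located"))
fetch.add_subtask(TaskNode("Grasp the cup", lambda: "cup grasped"))
root.add_subtask(TaskNode("Return and hand over the cup", lambda: "cup delivered"))
print(root.run())
```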

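For the first task, "First I should find where the kitchen is", the LLM needs sensor data, for example from the LiDAR and the depth camera, to obtain the kitchen's location. It then uses the relevant control permissions to move the robot there.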
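Once perception has produced a location for "kitchen", turning it into a navigation goal is basic geometry. This sketch makes three assumptions:

  • the LiDAR/depth-camera pipeline has already built a semantic map of labeled places in the map frame;
  • the map contents and `goal_from_map` are hypothetical;
  • angles follow the usual counter-clockwise-positive convention.

```python
import math


def goal_from_map(semantic_map, label, robot_x, robot_y, robot_yaw_deg):
    """Return (distance_m, bearing_deg) from the robot to a labeled place.

    semantic_map maps labels to (x, y) in the map frame. bearing_deg is relative
    to the robot's heading, counter-clockwise (left) positive, wrapped to [-180, 180).
    """
    if label not in semantic_map:
        raise KeyError(f"{label!r} is not in the map yet; explore first")
    gx, gy = semantic_map[label]
    dx, dy = gx - robot_x, gy - robot_y
    distance = math.hypot(dx, dy)
    bearing = math.degrees(math.atan2(dy, dx)) - robot_yaw_deg
    bearing = (bearing + 180.0) % 360.0 - 180.0  # wrap so the robot turns the short way
    return distance, bearing


semantic_map = {"kitchen": (3.0, 4.0), "living room": (-2.0, 0.5)}  # hypothetical map
print(goal_from_map(semantic_map, "kitchen", 0.0, 0.0, 0.0))  # (5.0, 53.13...)
```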

How can the robot navigate autonomously?

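For a simple implementation, see Microsoft's PromptCraft-Robotics. The PromptCraft-Robotics repository provides a community where people can test and share interesting examples of prompting large language models (LLMs) for robotics. It also provides a sample robot simulator, based on Microsoft AirSim and integrated with ChatGPT, so that users can get started.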

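In this repository, the authors extend ChatGPT's capabilities to robotics. They use language to intuitively control multiple platforms, such as robot arms, drones, and home assistant robots.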

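As the PromptCraft authors put it, prompting LLMs is a highly empirical science. Through trial and error, they arrived at a methodology and a set of design principles for writing prompts for robotics tasks: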

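First, define a set of high-level robot APIs or a function library. This library can be specific to a particular robot. It should map to existing low-level implementations from the robot's control stack or perception library. Descriptive names for the high-level APIs are very important, because they let ChatGPT infer what each function does.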
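The sketch below illustrates this first principle with a hypothetical high-level library layered over a stand-in low-level driver. `LowLevelDriver`, `RobotAPI`, and the speed constants are assumptions for illustration. On a real robot, the driver would wrap the actual control stack, for example a ROS velocity interface.

```python
class LowLevelDriver:
    """Hypothetical stand-in for a real control stack (for example, a ROS velocity interface)."""

    def __init__(self):
        self.sent = []  # record of (linear m/s, angular deg/s, duration s)

    def send_velocity(self, linear, angular, duration):
        self.sent.append((linear, angular, duration))


class RobotAPI:
    """High-level functions whose descriptive names let an LLM infer their behavior."""

    LINEAR_SPEED = 0.5    # m/s, assumed
    ANGULAR_SPEED = 30.0  # deg/s, assumed

    def __init__(self, driver):
        self.driver = driver

    def move_forward(self, meters):
        """Drive straight ahead by `meters` (must be non-negative)."""
        if meters < 0:
            raise ValueError("use move_backward for negative distances")
        self.driver.send_velocity(self.LINEAR_SPEED, 0.0, meters / self.LINEAR_SPEED)

    def move_backward(self, meters):
        """Drive straight back by `meters` (must be non-negative)."""
        if meters < 0:
            raise ValueError("use move_forward for negative distances")
        self.driver.send_velocity(-self.LINEAR_SPEED, 0.0, meters / self.LINEAR_SPEED)

    def turn_by(self, degrees):
        """Rotate in place; positive = counter-clockwise (left)."""
        rate = self.ANGULAR_SPEED if degrees >= 0 else -self.ANGULAR_SPEED
        self.driver.send_velocity(0.0, rate, abs(degrees) / self.ANGULAR_SPEED)


api = RobotAPI(LowLevelDriver())
api.move_forward(1.5)
api.turn_by(-45)
print(api.driver.sent)  # [(0.5, 0.0, 3.0), (0.0, -30.0, 1.5)]
```

Only the `RobotAPI` method names and docstrings would be shown to the model. The low-level velocity details stay hidden behind them.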

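Next, write a text prompt for ChatGPT that describes the task goal and explicitly states which functions from the high-level library are available. The prompt can also include information about task constraints. It can also specify how ChatGPT should format its answer, for example in a specific programming language or with auxiliary parsing elements.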
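One way to keep the prompt in sync with the library is to generate the list of available functions from their signatures and docstrings. The sketch uses only Python's standard `inspect` module. The function stubs, the constraint line, and the prompt wording are illustrative assumptions, not PromptCraft's actual prompt.

```python
import inspect


def move_forward(meters: float) -> None:
    """Move the robot straight forward by the given distance in meters."""


def turn_by(degrees: float) -> None:
    """Turn the robot in place; positive angles turn left."""


def build_prompt(task, functions):
    """Describe the goal and list exactly which high-level functions may be used."""
    lines = ["You are controlling a robot. You may only use these Python functions:"]
    for fn in functions:
        # e.g. "- move_forward(meters: float) -> None: Move the robot ..."
        lines.append(f"- {fn.__name__}{inspect.signature(fn)}: {inspect.getdoc(fn)}")
    lines.append("Constraint: never move more than 2 meters in a single call.")
    lines.append("Answer with Python code only, in a single code block.")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


print(build_prompt("go to the chairs", [move_forward, turn_by]))
```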

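The user stays in the loop to evaluate ChatGPT's code output, either by inspecting it directly or by running it in the simulator. If needed, the user gives ChatGPT feedback in natural language on the quality and safety of the answer. Once the user is satisfied with the solution, the final code can be deployed to the robot.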
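Before a human reviews generated code, a cheap automatic check can flag calls outside the exposed API. This sketch parses the code with the standard `ast` module. It reports any import, and any call that is not on a hypothetical whitelist. It is a pre-filter that supports the human review, not a sandbox: code that passes still needs review or simulation, and code that does not parse raises `SyntaxError`.

```python
import ast

ALLOWED_CALLS = {"move_forward", "turn_by"}  # hypothetical whitelist of high-level APIs


def disallowed_calls(code, allowed=ALLOWED_CALLS):
    """List calls and imports in generated code that are not on the whitelist.

    A coarse pre-filter for human review, not a security sandbox.
    """
    problems = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            problems.append("import")
        elif isinstance(node, ast.Call):
            # Plain names ("turn_by") or dotted expressions ("os.system").
            name = node.func.id if isinstance(node.func, ast.Name) else ast.unparse(node.func)
            if name not in allowed:
                problems.append(name)
    return problems


print(disallowed_calls("move_forward(1.0)\nturn_by(-30)"))  # []
print(disallowed_calls("import os\nos.system('ls')"))
```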

The simple prompt below shows how this works in practice. It uses an LLM to steer a simulated robot through indoor navigation.


Imagine I am a robot equipped with a camera and a depth sensor. I am trying to perform a task, and you should help me by sending me commands. You are only allowed to give me the following commands:

  • turn(angle): turn the robot by a given number of degrees
  • move(distance): moves the robot straight forward by a given distance in meters.

On each step, I will provide you with the objects in the scene as a list of <object name, distance, angle in degrees>. You should reply with only one command at a time.
The distance is in meters, and the direction angle in degrees with respect to the robot’s orientation. Negative angles are to the left and positive angles are to the right. If a command is not valid, I will ignore it and ask you for another command. If there is no relevant information in the scene, use the available commands to explore the environment.

Task: go to the chairs
Objects in the scene: <door, 0.53 m, 22>, <chair, 4.84 m, -21>, <chair, 2.12 m, -24>, <window, 6.74 m, -3>, <window, 7.17 m, 6>, <door, 0.61 m, 18>, <stairs, 1.42 m, 0>
Command:


move(1.5) # Move forward by 1.5 meters to explore the environment.

Objects in the scene: <seating, 5.05 m, 21>, <chair, 3.93 m, -25>, <window, 4.55 m, 22>, <window, 5.24 m, -4>, <stairs, 3.49 m, 13>, <window, 5.67 m, 7>, <window, 4.37 m, 23>
Command:


turn(-25) # Turn to the left to face the chair that is 3.93 meters away. 

Objects in the scene: <chair, 4.85 m, -14>, <chair, 4.76 m, -9>, <chair, 1.57 m, -23>, <chair, 2.66 m, -24>, <window, 4.96 m, 10>, <window, 4.19 m, 25>
Command:


turn(-10) # Turn to the right to explore the environment further. 

Objects in the scene: <door, 5.07 m, -25>, <chair, 5.36 m, -6>, <table, 2.17 m, -24>, <chair, 5.01 m, -1>, <chair, 1.80 m, -16>, <chair, 3.15 m, -16>, <window, 4.30 m, 19>
Command:


turn(-6) # Turn to the right to face the chair that is 5.36 meters away. 

Objects in the scene: <counter, 1.24 m, -24>, <door, 5.93 m, -17>, <chair, 5.48 m, 1>, <table, 2.60 m, -14>, <chair, 4.92 m, 6>, <chair, 2.02 m, -7>, <chair, 3.53 m, -7>, <chair, 3.16 m, -18>
Command:


turn(1) # Turn to the left to explore the environment further. 

Objects in the scene: <door, 5.07 m, -25>, <chair, 5.36 m, -6>, <table, 2.17 m, -24>, <chair, 5.01 m, -1>, <chair, 1.80 m, -16>, <chair, 3.15 m, -16>, <window, 4.30 m, 19>
Command:


move(5.5) # Move forward towards the chair that is 5.36 meters away. 

Objects in the scene: <chair, 0.96 m, -10>, <chair, 0.99 m, 20>, <chair, 1.77 m, -20>
Command:


move(1) # Move forward to approach the chairs.

(Video: Full Navigation)


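The demo is fairly self-explanatory. In short, the LLM is given a role preset and must read and write in a specific format. After each step, the objects detected by the sensors are passed back to the LLM, with their distances and angles relative to the robot. The LLM adjusts accordingly and plans the next step. Through this mechanism, the LLM guides the robot to a position next to the chairs.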
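The loop just described can be reproduced offline. In the sketch below, two components are assumptions:

  • `stub_policy` is a hypothetical stand-in for the LLM call; it simply faces and approaches the nearest target;
  • `simulate` is a toy 2-D world model.

The parsers follow the formats in the prompt: objects arrive as `<name, distance m, angle>` and replies as `turn(x)` or `move(x)`, with negative angles to the left. Invalid replies are ignored, as the prompt specifies.

```python
import math
import re

OBJ_RE = re.compile(r"<\s*([^,<>]+?)\s*,\s*([\d.]+)\s*m\s*,\s*(-?[\d.]+)\s*>")
CMD_RE = re.compile(r"^\s*(turn|move)\(\s*(-?[\d.]+)\s*\)")


def parse_scene(text):
    """'<door, 0.53 m, 22>, ...' -> [('door', 0.53, 22.0), ...]"""
    return [(name, float(dist), float(ang)) for name, dist, ang in OBJ_RE.findall(text)]


def parse_command(reply):
    """'turn(-25) # comment' -> ('turn', -25.0); None for an invalid reply (it is ignored)."""
    m = CMD_RE.match(reply)
    return (m.group(1), float(m.group(2))) if m else None


def stub_policy(target, objects):
    """Hypothetical stand-in for the LLM: face the nearest target, then approach it."""
    seen = [o for o in objects if o[0] == target]
    if not seen:
        return "turn(30) # explore"
    _, dist, ang = min(seen, key=lambda o: o[1])
    return f"turn({ang:g})" if abs(ang) > 5 else f"move({max(dist - 0.5, 0.0):g})"


def simulate(objects, command):
    """Toy 2-D world update in the prompt's convention (degrees, negative = left)."""
    kind, value = command
    updated = []
    for name, dist, ang in objects:
        if kind == "turn":
            ang -= value  # turning right by `value` shifts objects to the left
        else:
            x = dist * math.cos(math.radians(ang)) - value  # forward component
            y = dist * math.sin(math.radians(ang))          # rightward component
            dist, ang = math.hypot(x, y), math.degrees(math.atan2(y, x))
        updated.append((name, round(dist, 2), round(ang)))
    return updated


def navigate(target, scene_text, max_steps=10):
    """Closed loop: observe, ask the policy for one command, apply it, repeat."""
    objects = parse_scene(scene_text)
    for _ in range(max_steps):
        if any(n == target and d < 1.0 for n, d, _ in objects):
            return True
        command = parse_command(stub_policy(target, objects))  # real system: query the LLM here
        if command is not None:
            objects = simulate(objects, command)
    return False
```

Swapping `stub_policy` for a real chat call would give the same control loop. That call would use the prompt above as the system message and each new scene as a user message; the wiring is not shown here.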

This is only a very simple example, and many hard problems remain to be solved:

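  • If GPT is used, response latency is not guaranteed. How do we handle the uncertainty in how long GPT takes to return a command?
  • If too much time passes between commands, what should the robot do between two commands?
  • What decision logic applies when some of the sensors fail?
  • …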
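One common mitigation for the first two problems is a deadline. If the LLM's next command does not arrive in time, the robot executes a safe fallback instead of waiting or continuing its last motion. The sketch below assumes a hypothetical `stop()` command, which is not in the demo's command set, and uses a thread pool from the standard library. The late answer is discarded rather than cancelled.

```python
import concurrent.futures as cf
import time

_pool = cf.ThreadPoolExecutor(max_workers=4)  # background workers for LLM requests


def command_with_deadline(ask_llm, prompt, timeout_s, fallback="stop()"):
    """Return the LLM's next command, or a safe fallback if it misses the deadline.

    The late request is not cancelled; its answer is simply discarded.
    """
    future = _pool.submit(ask_llm, prompt)
    try:
        return future.result(timeout=timeout_s)
    except cf.TimeoutError:
        return fallback


def slow_llm(prompt):
    time.sleep(0.5)  # simulate a slow API response
    return "move(1.0)"


print(command_with_deadline(lambda p: "turn(10)", "scene...", timeout_s=1.0))  # turn(10)
print(command_with_deadline(slow_llm, "scene...", timeout_s=0.1))             # stop()
```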

All in all, this is just a simple demo. For more prompt demos, see github.com/microsoft/P…

Building RoboAgent and the Accompanying RoboToolKit

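The example above is only a simple demo. Complex tasks require far more elaborate prompts than simple 2-D indoor navigation. Recall the task tree introduced at the beginning: a complex task requires the LLM to build an elaborate structure and execute each part of the instruction carefully. An LLM that can carry out complex tasks in this way is usually called an agent. We do not need to build such an agent framework from scratch, because a good deal of this work has already been done as LLMs have flourished.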

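In the early stage of the project, we can use LangChain and follow the agent + tool approach to build a RoboAgent. This RoboAgent integrates task planning, task analysis, command generation, and task execution. To explain how RoboAgent can handle complex robot tasks, I will first introduce the related concepts and prompt techniques: LangChain, agents, tools, and ReAct.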

Technical Background

LangChain

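If you want to build complex LLM applications, I strongly recommend LangChain (admittedly with some bias, since I am one of LangChain's contributors). LangChain is a powerful framework designed to help developers build end-to-end applications with language models. It provides a set of tools, components, and interfaces that simplify creating applications powered by large language models (LLMs) and chat models. LangChain makes it easy to manage interactions with language models, chain multiple components together, and integrate additional resources such as APIs and databases.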

ReAct

paper: arxiv.org/pdf/2210.03…

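ReAct is short for Reasoning and Acting. The basic idea of the framework is a prompt that breaks answering a Question into interleaved steps:

  • Thought: given the Question, what should I do next?
  • Action: perform an action. ReAct uses three kinds of actions:
    • Search[entity] returns the first 5 sentences of the corresponding Wikipedia page if it exists. Otherwise it returns the top 5 similar entities from the Wikipedia search engine.
    • Lookup[string] returns the next sentence in the page that contains the string, simulating Ctrl+F in a browser.
    • Finish[answer] completes the current task with the answer.
  • Observation: the result returned by the external tool, which is fed back to ChatGPT as part of the new prompt.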
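On the program side, each Action line has to be parsed and routed to the right tool. The sketch below parses the `Action n: Verb[argument]` format used in the examples and models the Lookup semantics with a toy `Page` class. The class and its "No more results." message are illustrative assumptions, not the ReAct authors' code.

```python
import re

ACTION_RE = re.compile(r"^Action \d+:\s*(Search|Lookup|Finish)\[(.*)\]$")


def parse_action(line):
    """'Action 2: Lookup[eastern sector]' -> ('Lookup', 'eastern sector')"""
    m = ACTION_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a ReAct action: {line!r}")
    return m.group(1), m.group(2)


class Page:
    """Toy page illustrating Lookup: repeated lookups step through matching sentences (like Ctrl+F)."""

    def __init__(self, sentences):
        self.sentences = sentences
        self.cursor = {}  # keyword -> index of the next match to return

    def lookup(self, keyword):
        hits = [s for s in self.sentences if keyword.lower() in s.lower()]
        i = self.cursor.get(keyword, 0)
        self.cursor[keyword] = i + 1
        if i >= len(hits):
            return "No more results."
        return f"(Result {i + 1} / {len(hits)}) {hits[i]}"


page = Page([
    "The Colorado orogeny was an episode of mountain building.",
    "The eastern sector extends into the High Plains and is called the Central Plains orogeny.",
])
print(parse_action("Action 2: Lookup[eastern sector]"))
print(page.lookup("eastern sector"))
```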

Below are some ReAct prompt demos:

Question: What is the elevation range for the area that the eastern sector of the
Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector
of the Colorado orogeny extends into, then find the elevation range of the
area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building (an orogeny) in
Colorado and surrounding areas.
Thought 2: It does not mention the eastern sector. So I need to look up eastern
sector.
Action 2: Lookup[eastern sector]
Observation 2: (Result 1 / 1) The eastern sector extends into the High Plains and is called
the Central Plains orogeny.
Thought 3: The eastern sector of Colorado orogeny extends into the High Plains. So I
need to search High Plains and find its elevation range.
Action 3: Search[High Plains]
Observation 3: High Plains refers to one of two distinct land regions
Thought 4: I need to instead search High Plains (United States).
Action 4: Search[High Plains (United States)]
Observation 4: The High Plains are a subregion of the Great Plains. From east to west, the
High Plains rise in elevation from around 1,800 to 7,000 ft (550 to 2,130
m).[3]
Thought 5: High Plains rise in elevation from around 1,800 to 7,000 ft, so the answer
is 1,800 to 7,000 ft.
Action 5: Finish[1,800 to 7,000 ft]
Question: Musician and satirist Allie Goertz wrote a song about the "The Simpsons"
character Milhouse, who Matt Groening named after who?
Thought 1: The question simplifies to "The Simpsons" character Milhouse is named after
who. I only need to search Milhouse and find who it is named after.
Action 1: Search[Milhouse]
Observation 1: Milhouse Mussolini Van Houten is a recurring character in the Fox animated
television series The Simpsons voiced by Pamela Hayden and created by Matt
Groening.
Thought 2: The paragraph does not tell who Milhouse is named after, maybe I can look up
"named after".
Action 2: Lookup[named after]
Observation 2: (Result 1 / 1) Milhouse was named after U.S. president Richard Nixon, whose
middle name was Milhous.
Thought 3: Milhouse was named after U.S. president Richard Nixon, so the answer is
Richard Nixon.
Action 3: Finish[Richard Nixon]
Question: Which documentary is about Finnish rock groups, Adam Clayton Powell or The
Saimaa Gesture?
Thought 1: I need to search Adam Clayton Powell and The Saimaa Gesture, and find which
documentary is about Finnish rock groups.
Action 1: Search[Adam Clayton Powell]
Observation 1: Could not find [Adam Clayton Powell]. Similar: ['Adam Clayton Powell
III', 'Seventh Avenue (Manhattan)', 'Adam Clayton Powell Jr. State Office
Building', 'Isabel Washington Powell', 'Adam Powell', 'Adam Clayton Powell
(film)', 'Giancarlo Esposito'].
Thought 2: To find the documentary, I can search Adam Clayton Powell (film).
Action 2: Search[Adam Clayton Powell (film)]
Observation 2: Adam Clayton Powell is a 1989 American documentary film directed by
Richard Kilberg. The film is about the rise and fall of influential
African-American politician Adam Clayton Powell Jr.[3][4] It was later aired
as part of the PBS series The American Experience.
Thought 3: Adam Clayton Powell (film) is a documentary about an African-American
politician, not Finnish rock groups. So the documentary about Finnish rock
groups must instead be The Saimaa Gesture.
Action 3: Finish[The Saimaa Gesture]
Question: What profession does Nicholas Ray and Elia Kazan have in common?
Thought 1: I need to search Nicholas Ray and Elia Kazan, find their professions, then
find the profession they have in common.
Action 1: Search[Nicholas Ray]
Observation 1: Nicholas Ray (born Raymond Nicholas Kienzle Jr., August 7, 1911 - June 16,
1979) was an American film director, screenwriter, and actor best known for
the 1955 film Rebel Without a Cause.
Thought 2: Professions of Nicholas Ray are director, screenwriter, and actor. I need
to search Elia Kazan next and find his professions.
Action 2: Search[Elia Kazan]
Observation 2: Elia Kazan was an American film and theatre director, producer, screenwriter
and actor.
Thought 3: Professions of Elia Kazan are director, producer, screenwriter, and actor.
So profession Nicholas Ray and Elia Kazan have in common is director,
screenwriter, and actor.
Action 3: Finish[director, screenwriter, actor]

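The first part consists of few-shot examples meant to teach ChatGPT how to use the tools. The final Question is the one we actually want ChatGPT to answer, and ChatGPT answers it with the help of external tools. The result looks like this: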

Question: Author David Chanoff has collaborated with a U.S. Navy admiral who served as the ambassador to the United Kingdom under which President?
Thought 1: I need to search David Chanoff and the U.S. Navy admiral, find the ambassador to the United Kingdom, then find the President they served under.
Action 1: Search[David Chanoff]
Observation 1: David Chanoff is an American author and journalist. He is the co-author of
several books, including Inside the Mirage: America's Fragile Partnership
with Saudi Arabia (2004) and The Game Player: Confessions of the CIA's
Original Political Operative (1989).
Thought 2: I need to search the U.S. Navy admiral David Chanoff collaborated with.
Action 2: Search[U.S. Navy admiral]
Observation 2: Admiral Charles R. Larson was a United States Navy four-star admiral who
served as the United States Ambassador to the United Kingdom from 1998 to
2001.
Thought 3: Admiral Charles R. Larson served as the United States Ambassador to the
United Kingdom from 1998 to 2001. So the President they served under is
Bill Clinton.
Action 3: Finish[Bill Clinton]

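This is only a brief outline of the ReAct idea. There is more complex logic behind it, such as message truncation and prompt handling, which I won't go into here. Still, this introduction shows how prompts can be structured so that ReAct handles more complex functionality.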
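The Thought/Action/Observation cycle can be sketched as a short loop. Here a scripted list of replies stands in for ChatGPT and the Search tool is a stub, so the example runs offline. `react` is a hypothetical helper, not the ReAct authors' implementation. With a real model, generation should stop before the model writes its own "Observation" line (for example, via a stop sequence), so that observations always come from the tools.

```python
import re

ACTION_RE = re.compile(r"Action \d+:\s*(\w+)\[(.*)\]")


def react(question, llm, tools, max_steps=5):
    """Minimal ReAct loop: the model writes Thought + Action, the program appends the Observation."""
    prompt = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        # A real call would stop generation before the model writes its own "Observation".
        output = llm(prompt + f"Thought {step}:")
        prompt += f"Thought {step}:{output}\n"
        match = ACTION_RE.search(output)
        if match is None:
            raise ValueError(f"step {step}: no Action found in {output!r}")
        name, arg = match.groups()
        if name == "Finish":
            return arg, prompt
        observation = tools[name](arg) if name in tools else f"Unknown action {name}."
        prompt += f"Observation {step}: {observation}\n"
    return None, prompt  # gave up after max_steps


# Scripted replies stand in for ChatGPT; the Search tool is a hypothetical stub.
replies = iter([
    " I need to search Milhouse and find who it is named after.\nAction 1: Search[Milhouse]",
    " Milhouse was named after Richard Nixon.\nAction 2: Finish[Richard Nixon]",
])
tools = {"Search": lambda q: "Milhouse was named after U.S. president Richard Nixon."}
answer, trace = react("Who is Milhouse named after?", lambda prompt: next(replies), tools)
print(answer)  # Richard Nixon
print(trace)
```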

Agents and Tools

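A ReAct prompt alone is not enough to build the solution for this project. We need a complete framework that can do four things:

  • refine the ReAct prompt in more detail;
  • tell ChatGPT which tools it can use and how to use them;
  • invoke the right tool precisely, based on ChatGPT's output;
  • use the tool's result for the next step.

As the system grows more complex, we need to introduce the concepts of agents and tools.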

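In LLM prompt engineering, an agent is a higher-level executor responsible for scheduling and dispatching complex tasks. After the user gives the agent a request, the agent uses action plan generation to decompose the request into a series of plans. The agent then executes each plan automatically. Using the ReAct prompting technique, it observes the output of each step, draws its own conclusion from the result, and continues the task based on that conclusion. This repeats until the agent believes it has obtained the desired result.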

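We can build a toolkit for the agent. For each tool, we provide a prompt giving the tool's name and usage, and we implement its functionality. For a FileWriteTool, for example, we need code that actually writes the file. Once the tools exist, we can inject them into the SystemMessage as part of the system preset when the agent is initialized. This gives the agent the ability to call external tools. LangChain already provides this kind of framework, which makes agent capabilities easier to implement and is highly customizable, so we can customize RoboAgent in depth on top of it.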
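The tool mechanism can be sketched in plain Python, without LangChain's classes. Each tool bundles a name, a usage description for the prompt, and a function. The descriptions are rendered into the system message, and parsed actions are dispatched by name. `Tool`, `file_write`, `build_system_message`, `call_tool`, and the `'<path>|<text>'` input format are assumptions for illustration. In LangChain itself you would use its own tool and agent classes instead.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    """Plain-Python sketch of a tool: a name, a usage prompt, and the code that does the work."""
    name: str
    description: str
    func: Callable[[str], str]


def file_write(arg):
    """Hypothetical FileWriteTool body. Input format (an assumption): '<path>|<text>'."""
    path, sep, text = arg.partition("|")
    if not sep:
        return "Error: expected '<path>|<text>'."
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return f"Wrote {len(text)} characters to {path}."


def build_system_message(tools):
    """Inject every tool's name and usage into the system preset."""
    lines = ["You can call the following tools:"]
    lines += [f"- {t.name}: {t.description}" for t in tools.values()]
    lines.append("To call a tool, reply with one line: Action: <tool name>[<input>]")
    return "\n".join(lines)


def call_tool(tools, name, arg):
    """Dispatch a parsed Action; errors go back to the model as an observation."""
    if name not in tools:
        return f"Unknown tool {name!r}. Available: {', '.join(tools)}"
    return tools[name].func(arg)


tools = {t.name: t for t in [Tool("file_write", "Write text to a file. Input: '<path>|<text>'", file_write)]}
print(build_system_message(tools))
target = os.path.join(tempfile.gettempdir(), "robo_note.txt")
print(call_tool(tools, "file_write", f"{target}|hello robot"))
```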

Building RoboToolKit

To construct and validate robot queries, we can borrow the design of LangChain's SQLDatabaseToolkit to build RoboToolKit. Concretely, RoboToolKit can be divided into the following parts:

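  • RoboQueryTool: queries the commands the robot accepts.
  • RoboInfoTool: queries the robot's current information.
  • RoboActionTool: issues robot action commands. In practice this may not be a single RoboActionTool but a set of tools for specific behaviors, such as moving forward or backward.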
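A minimal RoboToolKit following this three-part split might look like the sketch below. It runs against a hypothetical simulated robot state instead of ROS or RoboSDK, and the command grammar (`forward 1.5`, `turn 90`) is an assumption. In the spirit of a database toolkit that lets an agent inspect tables before querying, the toolkit lets the agent discover valid commands, read the current state, and only then act. Invalid commands return an explanatory message instead of raising, so the agent can correct itself.

```python
import math
from dataclasses import dataclass


@dataclass
class SimRobot:
    """Hypothetical simulated state; a real toolkit would talk to ROS or RoboSDK instead."""
    x: float = 0.0
    y: float = 0.0
    heading_deg: float = 0.0  # counter-clockwise from +x


class RoboToolKit:
    """Sketch of the three-part split: query commands, read state, execute an action."""

    COMMANDS = {
        "forward": "forward <meters>: drive straight ahead",
        "backward": "backward <meters>: drive straight back",
        "turn": "turn <degrees>: rotate in place, positive = left",
    }

    def __init__(self, robot):
        self.robot = robot

    def robo_query(self, _=""):
        """RoboQueryTool: tell the agent which commands exist."""
        return "\n".join(self.COMMANDS.values())

    def robo_info(self, _=""):
        """RoboInfoTool: report the robot's current state."""
        r = self.robot
        return f"x={r.x:.2f} m, y={r.y:.2f} m, heading={r.heading_deg:.1f} deg"

    def robo_action(self, command):
        """RoboActionTool: validate one command such as 'forward 1.5', then execute it."""
        parts = command.split()
        if len(parts) != 2 or parts[0] not in self.COMMANDS:
            return f"Invalid command {command!r}; call robo_query for the list."
        try:
            value = float(parts[1])
        except ValueError:
            return f"Invalid number in {command!r}."
        r = self.robot
        if parts[0] == "turn":
            r.heading_deg = (r.heading_deg + value) % 360.0
        else:
            step = value if parts[0] == "forward" else -value
            r.x += step * math.cos(math.radians(r.heading_deg))
            r.y += step * math.sin(math.radians(r.heading_deg))
        return "ok: " + self.robo_info()


kit = RoboToolKit(SimRobot())
print(kit.robo_action("forward 1.5"))  # ok: x=1.50 m, y=0.00 m, heading=0.0 deg
print(kit.robo_action("jump 3"))
```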

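Before feeding a task into RoboToolKit, we may need to preprocess it. Preprocessing converts the task into a format suitable for RoboToolKit, for example by extracting features such as the specific actions the task involves.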

Building RoboAgent

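We build a RoboAgent that can call RoboToolKit's functions. This requires a suitable prompt so that, through ReAct, the agent understands complex requests zero-shot. RoboSDK then executes the concrete commands the agent generates and ultimately drives the robot.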

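Prompt design follows three steps: 1) design templates; 2) generate templates; 3) select the best template. The effectiveness of the prompts then needs to be verified through side-by-side comparison tests.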
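The third step, selecting the best template, can be made concrete with a small evaluation harness. It runs every candidate template against a set of test tasks and keeps the one whose replies most often match the expected command. The fake model, templates, test cases, and exact-match metric below are all assumptions for illustration. A real evaluation would call the actual LLM and would likely use a more tolerant metric.

```python
def score_template(template, cases, llm):
    """Fraction of test cases where the reply exactly matches the expected command."""
    hits = sum(llm(template.format(task=task)).strip() == expected for task, expected in cases)
    return hits / len(cases)


def select_best_template(templates, cases, llm):
    """Keep the candidate template with the highest score."""
    scores = {t: score_template(t, cases, llm) for t in templates}
    return max(scores, key=scores.get), scores


def fake_llm(prompt):
    """Hypothetical stand-in model that follows the output format only when told to."""
    if "exactly one command" in prompt:
        return "move(1)" if "forward" in prompt else "turn(90)"
    return "Sure! I will move forward one meter."


templates = [
    "Task: {task}",
    "Reply with exactly one command, turn(degrees) or move(meters). Task: {task}",
]
cases = [("go forward one meter", "move(1)"), ("turn right", "turn(90)")]
best, scores = select_best_template(templates, cases, fake_llm)
print(scores)
print(best)
```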

Building the Simulation and Closing the Loop

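After the basic functionality has been verified, we need to close the loop in simulation. We then debug step by step to optimize the robot's performance until it meets the expected targets.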

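After installing the development environment (LangChain, RoboSDK, and so on), run unit tests on each module of RoboAgent and RoboToolKit. Test both in standalone mode and on a physical robot (if possible), and record the results.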

Summary

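This article described how to use LLMs to build a system that controls a robot through complex instructions. It introduced current research such as Google's SayCan and Microsoft's PromptCraft, and finally presented the author's own design. 2023 is a year of rapid growth for LLMs, and more LLM + robotics projects and research will surely appear. It is something to look forward to, and the author would be glad to exchange ideas with like-minded readers.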

References

  • GitHub: promptulate
  • GitHub: PromptCraft
  • Google jointly releases the SayCan model, letting robots respond sensibly and "do what they say"
  • GitHub: LangChain
  • ChatGPT for Robotics: Design Principles and Model Abilities