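ChatGPT is remarkably general-purpose. It seems versed in astronomy and geography, in past and present, in things Chinese and foreign, and appears to know something about almost everything. Once a question touches specialist knowledge in a vertical domain, however, ChatGPT tends to become vague and evasive. In this article we "feed" study material from a specific domain to ChatGPT so that it can "learn" it and then answer specialist questions.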

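The domain corpus problem

A domain corpus can be thought of as a knowledge graph restricted to a specific scope; in other words, it gives GPT a retrieval dimension to consult before it answers. For example, everyone has read Lu Xun's famous essay "From Baicao Garden to Sanwei Study" (《从百草园到三味书屋》), which includes the legend of the "beautiful snake" (美女蛇). Suppose we ask ChatGPT about the "beautiful snake" directly, without restricting the scope: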

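[Screenshot: ChatGPT's answer to the "beautiful snake" question. Image caption: "Read ten thousand volumes and commune with the ancients: break through ChatGPT's 4096-token limit and build your own AI assistant for vertical-domain material"]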

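As the reply shows, ChatGPT gets the "beautiful snake" legend wrong: it takes it to be the story of Bai Suzhen, Xu Xian, and Fahai from "Legend of the White Snake" (《白蛇传》).

In "From Baicao Garden to Sanwei Study", however, the "beautiful snake" is a monster with a human head and a snake's body that calls out a person's name; if the person answers, it comes at night to eat their flesh.

So to discuss the "beautiful snake", we have to give ChatGPT a specific context before it can tell which topic we actually mean, and that means feeding it "From Baicao Garden to Sanwei Study" as corpus material. Of course, an essay this well known is surely already stored in ChatGPT's corpus. But suppose a paper or some other material from a particular field is not in that corpus, and its length also exceeds the 4096-token limit on ChatGPT's input. Then things get awkward, which is why it is so useful to give ChatGPT the ability to "learn" new material.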
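The usual way around an input limit is to cut a long document into pieces that each fit the budget. The following is a minimal sketch of that idea, assuming a deliberately crude estimate of one token per character in place of a real tokenizer. The names split_into_chunks, max_tokens, and overlap are hypothetical and do not come from any library.

```python
def split_into_chunks(text: str, max_tokens: int = 4096, overlap: int = 0) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens "tokens".

    Assumption: one character counts as one token. A real tokenizer
    would give different counts; this only illustrates the idea.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_tokens])
        # Stop once the current chunk already reaches the end of the text
        if start + max_tokens >= len(text):
            break
    return chunks
```

A small overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of some duplicated text.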

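Building a corpus index with llama_index

LlamaIndex (GPT Index) is a GPT project for retrieval over a specific corpus. It connects external corpus data to GPT through an index file. First, install the project: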

pip3 install llama-index

Note that the project depends on the langchain module. To avoid problems, it is best to upgrade langchain as well:

pip3 install --upgrade langchain

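LlamaIndex converts our raw corpus data into a vector-based index, which makes retrieval very efficient. It uses this index to find the most relevant passages based on the similarity between the query and the data. It then inserts the retrieved content into the prompt that it sends to GPT, so that GPT has the context it needs to answer the question.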
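The retrieve-then-prompt idea described above can be sketched in a few lines. This is an illustration only, not LlamaIndex's actual implementation: the vectors are toy two-dimensional values rather than real embeddings, the prompt wording is made up, and the function names cosine_similarity, most_similar_chunk, and build_prompt are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_chunk(query_vector, chunks):
    """Return the text of the (text, vector) pair closest to the query."""
    best = max(chunks, key=lambda chunk: cosine_similarity(query_vector, chunk[1]))
    return best[0]


def build_prompt(context, question):
    """Place the retrieved context in front of the user's question."""
    return (
        "Context information is below.\n"
        f"{context}\n"
        f"Using the context, answer the question: {question}\n"
    )


# Toy data: in practice the vectors would come from an embedding model
chunks = [
    ("The beautiful snake calls a person's name at night.", [1.0, 0.0]),
    ("The garden had crickets and mulberries.", [0.0, 1.0]),
]
context = most_similar_chunk([0.9, 0.1], chunks)
prompt = build_prompt(context, "What is the beautiful snake?")
```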

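The workflow in detail

Convert the local answer dataset into vectors and store them in the vector data file (index.json).

When the user enters a question, convert the question into a vector and look up the top-K closest answers in the vector store. Up to this point this is the most common question-answering lookup scheme: without GPT, the matching answers are returned directly and the whole process ends there.

With GPT, the overall structure of the answer can be refined. In a pure search scenario this refinement adds little. In a chat scenario specific to a vertical domain, however, grounding the reply in relevant domain content makes the answers more precise.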
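The first two steps of this workflow can be sketched with a tiny in-memory store that persists to a JSON file and answers top-K queries. It is a simplified stand-in under stated assumptions, not the format or code that llama_index uses: TinyVectorStore and its methods are hypothetical names, and the vectors are toy values.

```python
import heapq
import json
import math


def _cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical minimal vector store: id -> {"text", "vector"}."""

    def __init__(self):
        self.items = {}

    def add(self, item_id, text, vector):
        self.items[item_id] = {"text": text, "vector": list(vector)}

    def save(self, path):
        # Step 1: persist the vectors to a local JSON file
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.items, f, ensure_ascii=False)

    @classmethod
    def load(cls, path):
        store = cls()
        with open(path, encoding="utf-8") as f:
            store.items = json.load(f)
        return store

    def top_k(self, query_vector, k=2):
        # Step 2: rank stored items by similarity to the query vector
        ranked = heapq.nlargest(
            k,
            self.items.values(),
            key=lambda item: _cosine(query_vector, item["vector"]),
        )
        return [item["text"] for item in ranked]
```

Without GPT, the list returned by top_k would be the final answer; with GPT, those texts become the context of the prompt.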

First, save the essay "From Baicao Garden to Sanwei Study" into the project's data directory, then write the code:

import os
from llama_index import SimpleDirectoryReader, GPTSimpleVectorIndex, LLMPredictor, ServiceContext
from langchain import OpenAI
# Replace 'apikey' with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = 'apikey'
class LLma:
    # Build the local index
    def create_index(self, dir_path="./data"):
        # Read the documents in the data folder
        documents = SimpleDirectoryReader(dir_path).load_data()
        # Embed the documents and build the vector index
        index = GPTSimpleVectorIndex.from_documents(documents)
        print(documents)
        # Save the index to disk
        index.save_to_disk('./index.json')

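Here the corpus articles in the data directory are read with SimpleDirectoryReader, converted into a vector index by GPTSimpleVectorIndex.from_documents, and saved to the index.json file on the local disk.

Run the index-building method: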

if __name__ == '__main__':
    llma = LLma()
    # Build the index
    llma.create_index()
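Building the index sends the documents to the embedding API, so it can be worth skipping the build when index.json is already on disk. A minimal sketch of such a guard follows; ensure_index is a hypothetical helper, not part of llama_index, and it takes the build step as a plain callable so that it stays independent of the class above.

```python
import os


def ensure_index(index_path, build):
    """Call build() only when index_path does not exist yet.

    Returns True if the index was built, False if an existing file was reused.
    """
    if os.path.exists(index_path):
        return False
    build()
    return True
```

With the class above this might be called as ensure_index("./index.json", llma.create_index). Note that the guard does not detect changes to the data directory; delete index.json to force a rebuild.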

The contents of the index:

{"index_struct": {"__type__": "simple_dict", "__data__": {"index_id": "86c83b5a-a975-43ab-8505-cbc8f0ae68e2", "summary": null, "nodes_dict": {"da552579-e0f4-4ee0-be68-a3c392e39dc2": "a2521cfa-13c5-49b2-9cfd-7206fe493666", "c1f7df04-5e6c-4327-a0cc-4a3489d50d19": "68b609e3-2ec5-4de2-ac43-eb28105364ca"}, "doc_id_dict": {"87411099-60d8-4272-a7d1-6e8676fc42a0": ["da552579-e0f4-4ee0-be68-a3c392e39dc2", "c1f7df04-5e6c-4327-a0cc-4a3489d50d19"]}, "embeddings_dict": {"da552579-e0f4-4ee0-be68-a3c392e39dc2": [0.004821529611945152, -0.005787167698144913, 0.00886388961225748, -0.0005273548304103315, -0.0007779211737215519, 0.022242968901991844, -0.0035828494001179934, -0.023534925654530525, -0.03012790158390999, -0.014744291082024574, 0.004718306474387646, -0.0010788505896925926, -0.006236688699573278, 0.0033247910905629396, 0.01692862994968891, 0.02300216071307659, 0.01628931239247322, 0.008157975040376186, 0.028822625055909157, -0.0011337919859215617, 0.006499741692095995, 0.02746407315135002, -0.016302630305290222, -0.002881929511204362, -0.011933951638638973, 0.016502417623996735, 0.03031436912715435, -0.016489099711179733, 0.003935806918889284, -0.0009106963989324868, 0.0039058385882526636, 0.004168891813606024, -0.018566885963082314, -0.00980954896658659, -0.026451818645000458, -0.027490710839629173, -0.008237889967858791, -0.005337646696716547, 0.010009336285293102, 0.0037393493112176657, 0.013931823894381523, 0.0008798958733677864, -0.004105625674128532, 0.011208058334887028, -0.01188733521848917, 0.008311145007610321, -0.020058630034327507, -0.006176752503961325, -0.01582314260303974, 0.026438498869538307, 0.005500806029886007, 0.005507465451955795, -0.028103390708565712, -0.01434471644461155, -0.010175825096666813, 0.011747484095394611, 0.01688867248594761, 0.026558371260762215, -0.010755208320915699, -0.011754143983125687, 0.014970717020332813, 0.017115099355578423, -0.02107088454067707, -0.015317014418542385, -0.020990969613194466, 0.028103390708565712, 
0.007438741158694029, -0.040436916053295135, -0.0037460089661180973, -0.0131593132391572, 0.032285600900650024, 0.030554113909602165, 0.005933678243309259, -0.015090588480234146, 0.041422534734010696, -0.01100827194750309, -0.03617479279637337, 0.010488824918866158, -0.010701931081712246, 0.009243485517799854, -0.005277710501104593, -0.01450454629957676, -0.02233620174229145, 0.02344169095158577, 0.01539692934602499, 0.019046373665332794, -0.019206203520298004, 0.04653708636760712, -0.015490163117647171, -0.023175308480858803, 0.012573271058499813, 0.026398541405797005, 0.013179291971027851, 9.521106403553858e-05, 0.018100714311003685, 0.020857777446508408, -0.007119081914424896, 0.013279185630381107, -0.0021826745942234993, -0.0350826233625412, -0.0061501143500208855, -0.009136931970715523, -0.009862825274467468, 0.002357488265261054, -0.023561563342809677, 0.008231230080127716, 0.02282901108264923, 0.00845099613070488, 0.03207249566912651, -0.01539692934602499, -0.007352166809141636, 0.03236551582813263, 0.008677421137690544, -0.04581785202026367, -0.0017514672363176942, -0.026385221630334854, 0.027863647788763046, 0.008018123917281628, -0.016955269500613213, -0.0055873803794384, -0.004668359644711018, 0.01126133557409048, -0.004535167943686247, -0.026385221630334854, -0.0008724038489162922, 0.020697947591543198, -0.011780781671404839, 0.01530369557440281, -0.02426747791469097, -0.013505610637366772, 0.010715250857174397, 0.028662795200943947, 0.017354842275381088, -0.006056880112737417, -0.04019717499613762, 0.0062999543733894825, -0.017195014283061028, -0.003965775016695261, 0.0009897787822410464, -0.02342837303876877, 0.005976965185254812, 0.016822077333927155, -0.012699802406132221, -0.021030927076935768, 0.0061334650963544846, 0.031992580741643906, -9.115288412431255e-05, 0.007465379778295755, -0.011481101624667645, -0.0066828797571361065, 0.02358820289373398, -0.0002526475000195205, 0.03460313379764557, -0.005790497176349163, 0.01036229357123375, 
0.013139334507286549, -0.0052244337275624275, 0.011660909280180931, -0.019033055752515793, 0.011667569167912006, 0.024027733132243156, 0.02807675302028656, 0.021803436800837517, -0.011048229411244392, 0.002240945817902684, 0.024760287255048752, 0.004934742581099272, -0.004338710568845272, -0.0006101832259446383, -0.015023993328213692, -0.011694207787513733, 0.013119355775415897, -0.021590329706668854, 0.028263220563530922, 0.003759328043088317, 0.007625209167599678, 0.012107101269066334, 0.015849780291318893, -0.019659055396914482, -0.012972844764590263, -0.04509861767292023, -0.022908926010131836, 0.01881994865834713, 0.01108152698725462, -0.019073013216257095, -0.020458202809095383, -0.012686483561992645, 0.0038159345276653767, -0.0018863235600292683, -0.0006988387904129922, 0.00893048569560051, 0.02617211639881134, -0.019539183005690575, -0.014384673908352852, -0.5932878851890564, -0.011580994352698326, -0.01566331274807453, 0.0027354189660400152, 0.008977102115750313, 0.01149442046880722, 0.006489752326160669, 0.01800748147070408, -0.004032370634377003, -0.04208848997950554, -0.023028798401355743, 0.015050631016492844, 0.005440869834274054, 0.0030251103453338146, -0.01690199226140976, -0.007392124272882938, 0.009263464249670506, -0.011094845831394196, -0.002357488265261054, -0.01977892778813839, -0.008197932504117489, 0.015503481961786747, 0.004951391369104385, 0.007432081736624241, -0.013239228166639805, 0.020951012149453163, 0.012413441203534603, -0.018766671419143677, -0.004964710678905249, 0.034683048725128174, -0.031140156090259552, 0.016196077689528465, 0.013798631727695465, 0.00372935994528234, 0.05791163444519043, -0.005960316397249699, -0.011427824385464191, 0.01466437615454197, 0.027970200404524803, 0.0183271411806345, 0.011274654418230057, -0.00394579628482461, 0.012460058555006981, -0.007025847677141428, -0.0052310931496322155, -0.013918504118919373, 0.025785861536860466, -0.007798358332365751, 0.0028436370193958282, 0.02583913691341877, 
0.0004337045829743147, 0.0019362703897058964, -0.00045826175482943654, -0.010415569879114628, 0.023015478625893593, -0.0012678159400820732, 0.018580203875899315, -0.04086313024163246, -0.0015425232704728842, 0.0047849020920693874, 0.016795439645648003, 0.026744838804006577, -0.004138923715800047, -0.010189143940806389, -0.005763859022408724, 0.02454718016088009, -0.04099632054567337, -0.00870405975729227, 0.008690740913152695, -0.0016657252563163638, 0.0038692110683768988, 0.011647590436041355, -0.006389858666807413, 0.0014010072918608785, 0.002547285985201597, 0.022775733843445778, 0.02202986180782318, 0.007245613727718592, -0.007991485297679901, 0.01403837651014328, 0.01580982282757759, -0.003476296318694949, -0.023201946169137955, -0.01768782176077366, 0.010355633683502674, -0.003989083226770163, -0.011387866921722889, 0.0207379050552845, -0.0004757431452162564, -0.009436612948775291, -0.01507726963609457, 0.012506674975156784, -0.004158902447670698, 0.006250008009374142, 0.002142717130482197, 0.008644123561680317, -0.022229649126529694, -0.012100441381335258, 0.009316740557551384, -0.033883899450302124, 0.007305549923330545, -0.04517853260040283, -0.006925954483449459, 0.01164093054831028, -0.0032665198668837547, 0.024853520095348358, -0.014131610281765461, -0.010302357375621796, 0.018233906477689743, -0.0215237345546484, 0.0005473335040733218, 0.0028070092666894197, 0.020697947591543198, -0.0022043180651962757, 0.005677284672856331, -0.019366033375263214, 0.021150799468159676, 0.021643606945872307, 0.03929147124290466, 0.00853757094591856, 0.013132674619555473, 0.02967504970729351, 0.006872677709907293, -0.0004037365142721683, 0.038359131664037704, 0.009429953061044216, -0.02007194794714451, -0.005027976352721453, 0.014211525209248066, 0.00625666743144393, 0.0001508809218648821, -0.014904120936989784, 0.036680918186903, -0.017141737043857574, 0.03657436743378639, -0.017101779580116272, 0.01579650305211544, -0.004821529611945152, 0.02362816035747528, 
-0.009389995597302914, -0.007771719712764025, 0.016182757914066315, 0.02091105468571186, -0.004601764027029276, -0.009449931792914867, -0.017621226608753204, -0.010542101226747036, -0.014597781002521515, -0.036601003259420395, 0.0007879105396568775, 0.0012811350170522928, -0.007978166453540325, -0.013219249434769154, -0.010974973440170288, -0.01932607591152191, 0.041076235473155975, -0.013618824072182178, -0.0075852517038583755, -0.025439562276005745, 0.01736816205084324, 0.01325254701077938, 0.0026987912133336067, -0.01690199226140976, -0.004954721312969923, 0.004082317464053631, -0.027157733216881752, -0.010661973617970943, -0.005624007899314165, -0.03300483524799347, -0.03396381437778473, -0.010435548610985279, -0.014011737890541553, 0.0018430363852530718, 0.00505794445052743, -0.005011327564716339, -0.015383610501885414, -0.02697126381099224, -0.000325278437230736, -0.019419310614466667, -0.003979093860834837, 0.014211525209248066, 0.0023175308015197515, -0.003479626029729843, -0.0031166793778538704, 0.007725102826952934, -0.02057807520031929, 0.011181420646607876, 0.02759726345539093, 0.01992543786764145, 0.021124159917235374, 0.01269314344972372, 0.02280237339437008, -0.013705397956073284, 0.01612948253750801, -0.012706462293863297, -0.00011789522250182927, 0.030074624344706535, 0.013439015485346317, -0.008457656018435955, 0.03785300254821777, 0.003639455884695053, 0.02488015964627266, 0.0007954025641083717, -0.021590329706668854, 0.011967250145971775, -0.023881223052740097, 0.014850844629108906, -0.016848715022206306, 0.012420101091265678, -0.005164497531950474, -0.001505063148215413, -0.005430880468338728, -0.012346845120191574, -0.02011190541088581, 0.0077783796004951, 0.017168374732136726, -0.012619887478649616, 0.00326818460598588, 0.011953930370509624, -0.002958514727652073, -0.013998419046401978, -0.0045451573096215725, 0.010182484984397888, -0.0003080051683355123, -0.01852692849934101, 0.003238216508179903, 0.0018846587045118213, 
0.0013976775808259845, -0.012333526276051998, 0.0011654250556603074, -0.017994161695241928, -0.01493075955659151, 0.006386529188603163, 0.013299164362251759, -0.005633997265249491, 0.01579650305211544, 0.032605260610580444, -0.04030372574925423, 0.0295152198523283, -0.014397993683815002, 0.0036094877868890762, 0.02614547684788704, 0.005800486542284489, -0.02091105468571186, 0.007598571013659239, 0.023827945813536644, 0.007831656374037266, -0.006163433194160461, -0.0167554821819067, 0.024187562987208366, -0.012180356308817863, 0.014864163473248482, -0.018633481115102768, -0.012426760047674179, 0.013339121825993061, -0.046350616961717606, -0.0049214232712984085, 0.022922245785593987, 0.02677147649228573, 0.013359100557863712, 0.003148312447592616, -0.007312209345400333, -0.001261988771148026, 0.007538634818047285, 0.034070368856191635, 0.006126805674284697, -0.0043953172862529755, -0.020618032664060593, -0.02153705433011055, 0.006383199244737625, 0.002515653148293495, -0.020671309903264046, 0.007438741158694029, -0.01085510104894638, 0.017954204231500626, -0.008650783449411392, -0.015103908255696297, -0.009736292995512486, -0.008624144829809666, -0.007771719712764025, -0.030527476221323013, -0.013079398311674595, 0.007378804963082075, 0.007365486118942499, -0.00893714465200901, -0.0028602860402315855, -0.0199387576431036, -0.005534104071557522, -0.00011040320532629266, 0.030900411307811737, -0.014531184919178486, -0.0036061578430235386, 0.007025847677141428, 0.005171157419681549, -0.010568739846348763, 0.00614345446228981, 0.008351102471351624, -0.006686209701001644, -0.015130545943975449, 0.005763859022408724, -0.011607632972300053, -0.015876417979598045, -0.016555694863200188, 0.008231230080127716, 0.024360712617635727, 0.016022928059101105, -0.02007194794714451, 0.0009631405118852854, -0.0019928766414523125, -0.0016507413238286972, -0.03409700468182564, -0.01660897210240364, -0.014171567745506763, -0.011594314128160477, -0.00028844267944805324, 
0.01628931239247322, -0.012713122181594372, -0.02262922376394272, 0.041742194443941116, 0.00034400849835947156, -0.006789433304220438, -0.025253094732761383, -0.0335376001894474, 0.01283299457281828, 0.08353766053915024, -0.001618275884538889, -0.012972844764590263, 0.003762657754123211, 0.034523218870162964, 0.006632933393120766, -0.025745904073119164, -0.03718704730272293, 0.022456074133515358, 0.0007313041714951396, 0.007198996841907501, -0.006799422670155764, 0.014397993683815002, -0.004981359466910362, 0.004049019422382116, -0.01165425032377243, -0.006802752148360014, -0.014810887165367603, 0.0077783796004951, 0.0004247557953931391, -0.0016623955452814698, 0.010468846186995506, -0.00013964288518764079, 0.032631900161504745, -0.017821013927459717, 0.0022592595778405666, 0.02378798834979534, 0.015849780291318893, 0.03494942933320999, -0.017314886674284935, -0.014850844629108906, -0.017181694507598877, 0.006523050367832184, 0.004262125585228205, -0.013945142738521099, -0.006729497108608484, -0.00812467746436596, 0.01820726878941059, -0.004045689478516579, -0.011993887834250927, 0.02677147649228573, 0.044113002717494965, 0.026358583942055702, -0.04102296009659767, 0.016835397109389305, -0.005727231502532959, -0.0067494758404791355, 0.0032765092328190804, -0.017900928854942322, -0.007172358222305775, 0.035002708435058594, -0.01347897294908762, 0.011028250679373741, -0.020325012505054474, 0.009683016687631607, -0.006852698978036642, 0.005334316752851009, -0.00701918825507164, -0.0030833815690129995, 0.009456591680645943, 0.004468572326004505, -0.028529604896903038, 0.0035695303231477737, -0.010555421002209187, -0.018074076622724533, -0.02075122483074665, 0.014637738466262817, -0.0065896459855139256, -0.006609624717384577, -0.003095035906881094, 0.001224528648890555, 0.0011879010125994682, -0.028156667947769165, -0.03092704899609089, 0.016995226964354515, 0.014051695354282856, 0.0302877314388752, 0.006100167520344257, -0.01451786607503891, 0.008490953594446182, 
-0.00028240744723007083, -0.02300216071307659, 0.01491743978112936, -0.018580203875899315, -0.015277056954801083, 0.025146542116999626, 0.0167288426309824, -0.003922487609088421, -0.0006946765352040529, 0.001029736245982349, 0.020657990127801895, 0.006423156708478928, 0.014557823538780212, 0.005943667609244585, 0.019885480403900146, -0.019099650904536247, -0.012912909500300884, 0.01931275799870491, 0.00634657172486186, -0.016302630305290222, -0.0021543714683502913, -0.014477908611297607, -0.007578592281788588, -0.013019462116062641, 0.02409433014690876, -0.01786097139120102, 0.0023724723141640425, 0.0010230767074972391, -0.011414505541324615, 0.005157838109880686, 0.014544503763318062, -0.018447013571858406, 0.009842846542596817, -0.0019346055341884494, 0.002054477808997035, 0.021776799112558365, -0.005254401825368404, 0.035668663680553436, 0.0034463282208889723, -0.017674501985311508, 0.02422752045094967, -0.018220586702227592, 0.014904120936989784, 0.021350586786866188, -0.008222210220992565, -0.017960945144295692, 0.020941250026226044, -0.0187258031219244]}, "text_id_to_doc_id": {"da552579-e0f4-4ee0-be68-a3c392e39dc2": "87411099-60d8-4272-a7d1-6e8676fc42a0", "c1f7df04-5e6c-4327-a0cc-4a3489d50d19": "87411099-60d8-4272-a7d1-6e8676fc42a0"}}}}
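A dump like this is hard to read by eye, so a short script can summarize it instead. The sketch below assumes the file has exactly the layout shown above (index_struct, then __data__, then nodes_dict, doc_id_dict, and embeddings_dict); other llama_index versions may store the index differently. The names summarize_index and summarize_index_file are hypothetical.

```python
import json


def summarize_index(index: dict) -> dict:
    """Summarize an index dict that follows the layout shown above."""
    data = index["index_struct"]["__data__"]
    embeddings = data["embeddings_dict"]
    # Collect the distinct vector lengths; normally there is only one
    dims = sorted({len(vector) for vector in embeddings.values()})
    return {
        "index_id": data["index_id"],
        "num_nodes": len(data["nodes_dict"]),
        "num_documents": len(data["doc_id_dict"]),
        "num_embeddings": len(embeddings),
        "embedding_dims": dims,
    }


def summarize_index_file(path="./index.json") -> dict:
    """Load the saved index file and summarize it."""
    with open(path, encoding="utf-8") as f:
        return summarize_index(json.load(f))
```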

Semantic queries against the vector index with llama_index

llama_index can match against the vector index locally and then build the prompt:

class LLma:
    def query_index(self, prompt, index_path="./index.json"):
        # Load the index from disk
        local_index = GPTSimpleVectorIndex.load_from_disk(index_path)
        # Query the index
        res = local_index.query(prompt)
        print(res)

GPTSimpleVectorIndex.load_from_disk loads the vector index. Run the method:

if __name__ == '__main__':
    llma = LLma()
    # Build the index
    #llma.create_index()
    # Query the index ("Tell the story of the beautiful snake")
    llma.query_index("讲一下美女蛇的故事")

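The program returns (translated from the Chinese output):

The story of the beautiful snake goes back to a scholar staying in an old temple. One evening, while he was enjoying the cool air in the courtyard, he suddenly heard someone calling his name. Looking around, he saw the face of a beautiful woman above the top of the wall; she smiled at him and vanished. He was delighted, but an old monk who came by for an evening chat saw through it and said that there was an evil aura about his face and that he must have met the "beautiful snake".

As you can see, this time the story is the "beautiful snake" story we were after.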

Customizing the llama_index model

By default llama_index generates answers with text-davinci-002; we can also configure a model that suits our own needs:

class LLma:
    def __init__(self) -> None:
        # Custom LLM settings: model, temperature, and maximum answer length
        self.llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=1800))
        self.service_context = ServiceContext.from_defaults(llm_predictor=self.llm_predictor)
    # Query the local index
    def query_index(self, prompt, index_path="./index.json"):
        # Load the index, passing service_context so the custom model is also used at query time
        local_index = GPTSimpleVectorIndex.load_from_disk(index_path, service_context=self.service_context)
        # Query the index
        res = local_index.query(prompt)
        print(res)
    # Build the local index
    def create_index(self, dir_path="./data"):
        # Read the documents in the data folder
        documents = SimpleDirectoryReader(dir_path).load_data()
        index = GPTSimpleVectorIndex.from_documents(documents, service_context=self.service_context)
        print(documents)
        # Save the index to disk
        index.save_to_disk('./index.json')
if __name__ == '__main__':
    llma = LLma()

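Here the initializer sets up the self.llm_predictor attribute and wraps it in self.service_context. When building the local vector index, pass it in through the service_context parameter: index = GPTSimpleVectorIndex.from_documents(documents,service_context=self.service_context).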
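One thing to keep in mind when choosing max_tokens: for OpenAI completion models the prompt and the generated answer share the same context window, so a larger answer allowance leaves less room for retrieved context. A minimal sketch of that arithmetic follows, assuming the 4096-token figure quoted earlier in this article (check the actual limit of the model you use). prompt_budget is a hypothetical helper.

```python
# Assumption: the 4096-token limit quoted in this article
CONTEXT_WINDOW = 4096


def prompt_budget(max_completion_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt (retrieved context plus question)."""
    if not 0 < max_completion_tokens < context_window:
        raise ValueError("max_completion_tokens must be between 1 and context_window - 1")
    return context_window - max_completion_tokens


# With max_tokens=1800 as in the class above
remaining = prompt_budget(1800)
```

Under that assumption, max_tokens=1800 leaves 2296 tokens for the retrieved passages and the question combined.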

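Conclusion

With that, we can "customize" ChatGPT's answers with a vertical-domain corpus. Finally, here is the project repository: github.com/zcxey2911/llama_index_examples_python3.10. I hope you find it useful.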