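In my previous article, I described in detail how to use LangChain and OpenAI for vector search and RAG. That example had no user interface. In today's article, I will show how to use OpenAI to vectorize data (rather than the method of uploading a model with eland provided by the Elastic Stack; this approach is completely free) and write it to Elasticsearch. We then use a web UI to search the vectors. The code can be downloaded from the following address: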
git clone https://github.com/liu-xiao-guo/elasticsearch-labs
We will use one of the examples in it:
$ pwd
/Users/liuxg/python/elasticsearch-labs/example-apps/openai-embeddings
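Installation
Install Elasticsearch and Kibana
If you have not yet installed your own Elasticsearch and Kibana, please refer to the following articles to install them:
When installing, choose Elastic Stack 8.x. In the exercise below, I use Elastic Stack 8.11.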
Running the application
Before running it, enter the following commands in your terminal:
export ELASTICSEARCH_URL=https://localhost:9200
export ELASTIC_USERNAME=elastic
export ELASTIC_PASSWORD=o6G_pvRL=8P*7on+o6XH
export OPENAI_API_KEY=YourOpenAIKey
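In my setup, I use an Elasticsearch cluster with a self-signed certificate. Replace the values above with your own Elasticsearch superuser name and password. You also need to request a developer key on the OpenAI website, at platform.openai.com/api-keys.
In addition, we need to copy the Elasticsearch certificate into the current directory: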
$ pwd
/Users/liuxg/python/elasticsearch-labs/example-apps/openai-embeddings
$ cp ~/elastic/elasticsearch-8.11.0/config/certs/http_ca.crt .
$ ls
LICENSE                 http_ca.crt             package.json            utils.js
README.md               images                  sample_data             views
generate_embeddings.js  package-lock.json       search_app.js
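As shown above, generate_embeddings.js is the file that uses OpenAI to generate the embeddings. For how to connect to Elasticsearch with a certificate, see my earlier article "Elasticsearch: Using the latest Node.js client 8.x to create an index and search". The code in this example that connects to Elasticsearch is in utils.js, listed above.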
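For orientation, connecting with the exported variables and the copied certificate could look roughly like the sketch below. This is not the content of utils.js: the helper name buildClientOptions is hypothetical, and the sketch assumes the option names node, auth and tls.ca accepted by the 8.x @elastic/elasticsearch client.

```javascript
// Sketch only: builds the options object for the Elasticsearch client from the
// environment variables exported earlier. buildClientOptions is a hypothetical
// helper, not a function from utils.js.
function buildClientOptions(env, caPem) {
  const required = ["ELASTICSEARCH_URL", "ELASTIC_USERNAME", "ELASTIC_PASSWORD"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    node: env.ELASTICSEARCH_URL,
    auth: { username: env.ELASTIC_USERNAME, password: env.ELASTIC_PASSWORD },
    // Trust the cluster's self-signed CA rather than disabling verification.
    tls: { ca: caPem },
  };
}

// Typical use (needs the @elastic/elasticsearch package and http_ca.crt):
//   const fs = require("fs");
//   const { Client } = require("@elastic/elasticsearch");
//   const client = new Client(
//     buildClientOptions(process.env, fs.readFileSync("./http_ca.crt"))
//   );
```

Failing early on a missing variable makes a forgotten export easier to spot than a later connection error.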
Before running the code, install the required packages with the following command:
npm install
$ vi package.json
$ npm install

removed 10 packages, and audited 110 packages in 1s

10 packages are looking for funding
  run `npm fund` for details

found 0 vulnerabilities
We can check the current Node.js version:
$ node --version
v19.0.1
We can also check the version of the openai package:
$ npm list | grep openai
openai-integration-example-javascript@1.0.0 /Users/liuxg/python/elasticsearch-labs/example-apps/openai-embeddings
└── openai@4.20.1
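Note that different versions of the openai package expose different API interfaces, so code written for one version may not run against another.
We can check the version of the Elasticsearch client in the same way:
$ npm list | grep elasticsearch
openai-integration-example-javascript@1.0.0 /Users/liuxg/python/elasticsearch-labs/example-apps/openai-embeddings
├── @elastic/elasticsearch@8.8.0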
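To make the point about openai versions concrete, here is a minimal sketch of an embeddings call in the style of the 4.x package installed here. The helper name embedTexts is hypothetical, and the client is passed in as a parameter so the function can be tried with a stub. The 3.x line of the package used a different client object and method names, so this would need adapting there.

```javascript
// Sketch assuming the openai 4.x client shape:
//   client.embeddings.create({ model, input })
// embedTexts is a hypothetical helper, not part of the example app.
async function embedTexts(client, texts, model = "text-embedding-ada-002") {
  const response = await client.embeddings.create({ model, input: texts });
  // The response carries one entry per input under `data`,
  // each with an `embedding` array of numbers.
  return response.data.map((item) => item.embedding);
}

// Typical use with the real package (reads OPENAI_API_KEY from the environment):
//   const { OpenAI } = require("openai");
//   embedTexts(new OpenAI(), ["How do I get a replacement Medicare card?"])
//     .then((vectors) => console.log(vectors[0].length));
```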
Generating the vectors
Let's look at the definitions in package.json:
package.json
{
  "name": "openai-integration-example-javascript",
  "version": "1.0.0",
  "description": "OpenAI integration example",
  "main": "search_app.js",
  "scripts": {
    "app": "node search_app.js",
    "generate": "node generate_embeddings.js"
  },
  "author": "Elastic",
  "license": "MIT",
  "dependencies": {
    "@elastic/elasticsearch": "^8.8.0",
    "express": "^4.18.2",
    "hbs": "^4.2.0",
    "openai": "^4.20.1"
  }
}
We generate the embeddings with the following command:
npm run generate
$ npm run generate

> openai-integration-example-javascript@1.0.0 generate
> node generate_embeddings.js

Connecting to Elasticsearch: https://localhost:9200
connection success true
Creating index openai-integration...
Reading from file sample_data/medicare.json
Processing 12 documents...
Processing batch of 10 documents...
docsBatch size: 10
Calling OpenAI API for 10 embeddings with model text-embedding-ada-002
Indexing 10 documents to index openai-integration...
Processing batch of 2 documents...
docsBatch size: 2
Calling OpenAI API for 2 embeddings with model text-embedding-ada-002
Indexing 2 documents to index openai-integration...
Processing complete
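When running the command above, make sure the environment variables listed earlier are set in the same terminal. The output shows that 12 documents have been ingested into Elasticsearch, in one batch of 10 and one batch of 2, and that they were vectorized through the OpenAI API. We can inspect them in Kibana with the following command: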
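The batching visible in that log (12 documents processed as 10 + 2) can be sketched as follows. This is an illustration, not the code in generate_embeddings.js: the function names, the injected embed and bulk callbacks, and the vector field name embedding are all assumptions.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Build the flat [action, document, action, document, ...] list used by the
// bulk API. The vector field name "embedding" is an assumption.
function toBulkOperations(indexName, docs, vectors, vectorField = "embedding") {
  return docs.flatMap((doc, i) => [
    { index: { _index: indexName } },
    { ...doc, [vectorField]: vectors[i] },
  ]);
}

// embed(texts) should resolve to one vector per text, and bulk(operations)
// should send the operations to Elasticsearch; both are injected here so the
// flow can be exercised without network access.
async function generateEmbeddings(docs, { embed, bulk, indexName, batchSize = 10 }) {
  for (const batch of chunk(docs, batchSize)) {
    const vectors = await embed(batch.map((doc) => doc.content));
    await bulk(toBulkOperations(indexName, batch, vectors));
  }
}
```

With the 8.x client, the bulk callback could be something like `(operations) => client.bulk({ operations })`.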
GET openai-integration/_search
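For the stored vectors to be searchable, the index needs a dense_vector field. The following is a minimal mapping sketch, not the mapping used by generate_embeddings.js: the field name embedding and the choice of cosine similarity are assumptions, while 1536 is the output size of text-embedding-ada-002.

```javascript
// Output size of the text-embedding-ada-002 model.
const EMBEDDING_DIMS = 1536;

// Hypothetical helper returning index mappings for the sample documents.
// The vector field name and the similarity setting are assumptions.
function buildIndexMappings(vectorField = "embedding") {
  return {
    properties: {
      url: { type: "keyword" },
      title: { type: "text" },
      content: { type: "text" },
      [vectorField]: {
        type: "dense_vector",
        dims: EMBEDDING_DIMS,
        index: true,
        similarity: "cosine",
      },
    },
  };
}

// Typical use with the 8.x client:
//   client.indices.create({ index: "openai-integration", mappings: buildIndexMappings() });
```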
Starting the web application
We can start the web application with the following command:
npm run app
$ npm run app

> openai-integration-example-javascript@1.0.0 app
> node search_app.js

Connecting to Elasticsearch: https://localhost:9200
Express app listening on port 3000
connection success true
As shown above, our web application is running at localhost:3000. We can open it in a browser:
Semantic search in the web application
Our data has the following structure:
{
  "url": "https://faq.ssa.gov/en-us/Topic/article/KA-01735",
  "title": "How do I get a replacement Medicare card?",
  "content": "If your Medicare card was lost, stolen, or destroyed, you can request a replacement online at Medicare.gov. You can print an official copy of your card from your online Medicare account or call 1-800-MEDICARE (1-800-633-4227 TTY 1-877-486-2048) to order a replacement card to be sent in the mail."
},
{
  "url": "https://faq.ssa.gov/en-us/Topic/article/KA-02713",
  "title": "How do I terminate my Medicare Part B (medical insurance)?",
  "content": "You can voluntarily terminate your Medicare Part B (Medical Insurance). However, you may need to have a personal interview with Social Security to review the risks of dropping coverage and to assist you with your request. To find out more about how to terminate Medicare Part B or to schedule a personal interview, contact us at 1-800-772-1213 (TTY: 1-800-325-0778) or visit your nearest Social Security office."
},
In our implementation, the content text field is the one that gets vectorized, which means we can run semantic search against this field.
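A semantic search over that field amounts to embedding the user's question and running a kNN search against the stored vectors. Below is a minimal sketch of the search request body, not the code in search_app.js: the helper name, the field name embedding, and the default values for k and num_candidates are assumptions.

```javascript
// Hypothetical helper that builds a kNN search request for Elasticsearch 8.x.
// queryVector is the embedding of the user's question, produced with the same
// model that was used at indexing time.
function buildKnnSearch(queryVector, { field = "embedding", k = 5, numCandidates = 50 } = {}) {
  if (numCandidates < k) {
    throw new RangeError("numCandidates must be at least k");
  }
  return {
    knn: {
      field,
      query_vector: queryVector,
      k,
      num_candidates: numCandidates,
    },
    // Return only the human-readable fields, not the stored vector.
    _source: ["url", "title", "content"],
  };
}

// Typical use with the 8.x client:
//   client.search({ index: "openai-integration", ...buildKnnSearch(vector) });
```

Raising num_candidates generally trades speed for recall, so the value here is only a starting point.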
Let's try the following search:
how much does Medicare cost?
We can also run the following searches:
how can I terminate my Medicare?
How can I tell whether I am eligible for Medicare?