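"Did you mean" is a very important feature in search engines, because it helps users by showing suggested terms so that they can search more precisely. For example, when we search on Baidu, it usually shows some commonly recommended search options for us to choose from: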

Elasticsearch: creating a simple "Did you mean?" suggestion search

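To create "Did you mean", we will use the phrase suggester, because it lets us suggest corrections for whole phrases rather than just single terms. I covered this topic in my earlier article "Elasticsearch: How to implement phrase suggestions – phrase suggester".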

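First, we will use a shingle filter, because it produces the tokens (word n-grams) that the phrase suggester uses to match and return corrections. For a description of the shingle filter, see the earlier article "Elasticsearch: Ngrams, edge ngrams, and shingles".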
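To get a feel for what a shingle filter emits, here is a minimal Python sketch. It is not Elasticsearch's implementation; the function name `shingles` and its parameters are chosen here for illustration, and it assumes whitespace tokenization plus the 2 to 3 word shingle sizes used in the mapping later in this article.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=True):
    """Return word n-grams in the spirit of a shingle token filter.

    Illustrative only: this mimics the idea (single words plus
    2- and 3-word combinations), not Elasticsearch internals.
    """
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            end = start + size
            if end <= len(tokens):
                out.append(" ".join(tokens[start:end]))
    return out


tokens = "revenge of the fallen".lower().split()
for shingle in shingles(tokens):
    print(shingle)
```

Indexing these word combinations is what gives the phrase suggester statistics about which words tend to follow each other.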

Preparing the data

Let's first define the mapping:


PUT movies
{
  "settings": {
    "analysis": {
      "analyzer": {
        "en_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stop"
          ]
        },
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        }
      },
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "en_analyzer",
        "fields": {
          "suggest": {
            "type": "text",
            "analyzer": "shingle_analyzer"
          }
        }
      },
      "actors": {
        "type": "text",
        "analyzer": "en_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "en_analyzer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "director": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "genre": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "metascore": {
        "type": "long"
      },
      "rating": {
        "type": "float"
      },
      "revenue": {
        "type": "float"
      },
      "runtime": {
        "type": "long"
      },
      "votes": {
        "type": "long"
      },
      "year": {
        "type": "long"
      },
      "title_suggest": {
        "type": "completion",
        "analyzer": "simple",
        "preserve_separators": true,
        "preserve_position_increments": true,
        "max_input_length": 50
      }
    }
  }
}

Next, we use the _bulk API to write some documents into this index. We use the linked data set, as follows:


POST movies/_bulk
{"index": {}}
{"title": "Guardians of the Galaxy", "genre": "Action,Adventure,Sci-Fi", "director": "James Gunn", "actors": "Chris Pratt, Vin Diesel, Bradley Cooper, Zoe Saldana", "description": "A group of intergalactic criminals are forced to work together to stop a fanatical warrior from taking control of the universe.", "year": 2014, "runtime": 121, "rating": 8.1, "votes": 757074, "revenue": 333.13, "metascore": 76}
{"index": {}}
{"title": "Prometheus", "genre": "Adventure,Mystery,Sci-Fi", "director": "Ridley Scott", "actors": "Noomi Rapace, Logan Marshall-Green, Michael Fassbender, Charlize Theron", "description": "Following clues to the origin of mankind, a team finds a structure on a distant moon, but they soon realize they are not alone.", "year": 2012, "runtime": 124, "rating": 7, "votes": 485820, "revenue": 126.46, "metascore": 65}
....

For brevity, I have omitted the remaining documents above. You need to copy the entire movies.txt file and write all of it into Elasticsearch. It contains 1000 documents in total.
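If the documents are available as Python dictionaries rather than as a ready-made file, the request body for _bulk can be assembled with a small helper. This is a sketch under assumptions: the function name `to_bulk_body` is hypothetical, and it only builds the newline-delimited text (one action line followed by one document line, as in the request above) without sending anything to Elasticsearch.

```python
import json


def to_bulk_body(docs):
    """Build a newline-delimited _bulk body: an action line, then the document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))           # action line, no explicit _id
        lines.append(json.dumps(doc, ensure_ascii=False))  # document source
    # Every line, including the last one, is terminated by a newline.
    return "".join(line + "\n" for line in lines)


sample_docs = [
    {"title": "Guardians of the Galaxy", "year": 2014},
    {"title": "Prometheus", "year": 2012},
]
print(to_bulk_body(sample_docs), end="")
```

The resulting text can be pasted after `POST movies/_bulk` or sent with any HTTP client.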

Searching the data

Now let's run a basic query to check the suggest results:


GET movies/_search?filter_path=suggest
{
  "suggest": {
    "text": "transformers revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5
      }
    }
  }
}

The command above returns:


{
  "suggest": {
    "did_you_mean": [
      {
        "text": "transformers revenge of the falen",
        "offset": 0,
        "length": 33,
        "options": [
          {
            "text": "transformers revenge of the fallen",
            "score": 0.004467494
          },
          {
            "text": "transformers revenge of the fall",
            "score": 0.00020402104
          },
          {
            "text": "transformers revenge of the face",
            "score": 0.00006419608
          }
        ]
      }
    ]
  }
}

Note that with just a few lines you already get some promising results.
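In an application, the "Did you mean" text is usually just the highest-scoring option. A minimal Python sketch of extracting it from a response shaped like the one above follows; the helper name `best_suggestion` is hypothetical, and the sample dictionary is a trimmed copy of that response.

```python
def best_suggestion(response, name="did_you_mean"):
    """Return the text of the highest-scoring option, or None if there is none."""
    entries = response.get("suggest", {}).get(name, [])
    options = [opt for entry in entries for opt in entry.get("options", [])]
    if not options:
        return None
    return max(options, key=lambda opt: opt["score"])["text"]


# Trimmed copy of the response shown above.
sample_response = {
    "suggest": {
        "did_you_mean": [
            {
                "text": "transformers revenge of the falen",
                "options": [
                    {"text": "transformers revenge of the fallen", "score": 0.004467494},
                    {"text": "transformers revenge of the fall", "score": 0.00020402104},
                    {"text": "transformers revenge of the face", "score": 0.00006419608},
                ],
            }
        ]
    }
}

print(best_suggestion(sample_response))
```

Returning None when no option exists lets the caller skip the "Did you mean" line entirely.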

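Now let's extend the query with more phrase suggester options. We set max_errors to 2, so that at most two terms of the phrase may be treated as misspellings when forming a correction; note that the input text now contains two terms that need correcting, "transformer" and "falen". We also add highlight to mark the suggested terms.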


GET movies/_search?filter_path=suggest
{
  "suggest": {
    "text": "transformer revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5,
        "confidence": 1,
        "max_errors": 2,
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}

The command above returns:


{
  "suggest": {
    "did_you_mean": [
      {
        "text": "transformer revenge of the falen",
        "offset": 0,
        "length": 32,
        "options": [
          {
            "text": "transformers revenge of the fallen",
            "highlighted": "<strong>transformers</strong> revenge of the <strong>fallen</strong>",
            "score": 0.004382903
          },
          {
            "text": "transformers revenge of the fall",
            "highlighted": "<strong>transformers</strong> revenge of the <strong>fall</strong>",
            "score": 0.00020015794
          },
          {
            "text": "transformers revenge of the face",
            "highlighted": "<strong>transformers</strong> revenge of the <strong>face</strong>",
            "score": 0.00006298054
          },
          {
            "text": "transformers revenge of the falen",
            "highlighted": "<strong>transformers</strong> revenge of the falen",
            "score": 0.00006159308
          },
          {
            "text": "transformer revenge of the fallen",
            "highlighted": "transformer revenge of the <strong>fallen</strong>",
            "score": 0.000048000533
          }
        ]
      }
    ]
  }
}
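The highlighted terms in the response are exactly the terms the suggester changed. As a rough way to see how many corrections a suggestion needs, and hence why max_errors = 2 is required for the top option here, one can compare the input and the suggestion term by term. This is a sketch with a hypothetical helper name, `changed_terms`, and it assumes both strings split into the same number of whitespace-separated terms.

```python
def changed_terms(original, suggestion):
    """Return (old, new) pairs for each position where the terms differ.

    Assumes both strings contain the same number of whitespace-separated terms.
    """
    pairs = zip(original.split(), suggestion.split())
    return [(old, new) for old, new in pairs if old != new]


query = "transformer revenge of the falen"
top_option = "transformers revenge of the fallen"

corrections = changed_terms(query, top_option)
print(corrections)
print(len(corrections))
```

Options that change only one term, such as "transformer revenge of the fallen", would still be reachable with a limit of one error, which is why they appear alongside the two-term correction.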

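Shall we improve this a bit more? We add "collate", which lets us run a query for each suggestion and so refine the suggested results. I used a match query with the "and" operator so that all terms have to match in the same phrase. If I still want to see the suggestions that do not satisfy the query, I set prune = true.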


GET movies/_search?filter_path=suggest
{
  "suggest": {
    "text": "transformer revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5,
        "confidence": 1,
        "max_errors": 2,
        "collate": {
          "query": {
            "source": {
              "match": {
                "{{field_name}}": {
                  "query": "{{suggestion}}",
                  "operator": "and"
                }
              }
            }
          },
          "params": {"field_name": "title"},
          "prune": true
        },
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}

The result is now:

[Screenshot: response of the query with prune set to true]

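Note that the response has changed: there is a new field, "collate_match", which indicates whether each suggestion matches the collate query (it appears because prune = true).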
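With prune = true, it is up to the client to decide what to do with the options that did not match. A minimal Python sketch of that filtering step follows. The helper name `matching_options` is hypothetical, and the `collate_match` values in the sample are assumed for illustration only; they are not copied from the actual response.

```python
def matching_options(options):
    """Keep the texts of options whose collate query matched.

    Options without a collate_match field are kept, on the assumption
    that the field is only present when prune is true.
    """
    return [opt["text"] for opt in options if opt.get("collate_match", True)]


# Hypothetical options: the collate_match flags are assumed, not real output.
sample_options = [
    {"text": "transformers revenge of the fallen", "collate_match": True},
    {"text": "transformers revenge of the fall", "collate_match": False},
    {"text": "transformers revenge of the face", "collate_match": False},
]

print(matching_options(sample_options))
```

Keeping the non-matching options around can still be useful, for example to log which corrections were considered but would have returned no documents.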

Let's set prune to false:


GET movies/_search?filter_path=suggest
{
  "suggest": {
    "text": "transformer revenge of the falen",
    "did_you_mean": {
      "phrase": {
        "field": "title.suggest",
        "size": 5,
        "confidence": 1,
        "max_errors": 2,
        "collate": {
          "query": {
            "source": {
              "match": {
                "{{field_name}}": {
                  "query": "{{suggestion}}",
                  "operator": "and"
                }
              }
            }
          },
          "params": {"field_name": "title"},
          "prune": false
        },
        "highlight": {
          "pre_tag": "<strong>",
          "post_tag": "</strong>"
        }
      }
    }
  }
}

This time the result is:

[Screenshot: response of the query with prune set to false]

We can see that only one result remains, and it is the most relevant suggestion.
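If the request is issued from application code rather than from the console, the final body can be assembled as a plain dictionary. The sketch below mirrors the last request in this article; the function name `build_did_you_mean` and its parameters are hypothetical, and sending the body to the cluster is left to whichever client you use.

```python
import json


def build_did_you_mean(text, field="title.suggest", collate_field="title",
                       size=5, max_errors=2, prune=False):
    """Build the suggest request body used in this article as a Python dict."""
    return {
        "suggest": {
            "text": text,
            "did_you_mean": {
                "phrase": {
                    "field": field,
                    "size": size,
                    "confidence": 1,
                    "max_errors": max_errors,
                    "collate": {
                        # Mustache template: {{suggestion}} is replaced by each candidate.
                        "query": {
                            "source": {
                                "match": {
                                    "{{field_name}}": {
                                        "query": "{{suggestion}}",
                                        "operator": "and",
                                    }
                                }
                            }
                        },
                        "params": {"field_name": collate_field},
                        "prune": prune,
                    },
                    "highlight": {"pre_tag": "<strong>", "post_tag": "</strong>"},
                }
            },
        }
    }


body = build_did_you_mean("transformer revenge of the falen")
print(json.dumps(body, indent=2))
```

Exposing prune and max_errors as parameters makes it easy to switch between the variants tried above.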