Business Development & Functional Technology Department - Experience Assurance R&D Group

康睿 姚再毅 李振 刘斌 王北永

Note: everything below is based on Elasticsearch 8.1.

I. Index definition

Official docs: www.elastic.co/guide/en/el…

Index concepts at a glance: ElasticSearch vs. MySQL

ElasticSearch | MySQL
Index | Table
Type (deprecated) | Table (deprecated)
Document | Row
Field | Column
Mapping | Schema
Everything is indexed | Index
Query DSL | SQL
GET http://… | select * from
POST http://… | update table set …
Aggregations | group by / sum
cardinality | distinct (deduplication)
reindex | data migration

Definition

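An index is a collection of documents that share the same document structure (mapping), identified by a unique index name. A cluster can contain many indices, and different indices usually hold data for different business types.

Notes:
  • Index names cannot contain uppercase letters.
  • Index names can be at most 255 bytes long (bytes, not characters, so multi-byte characters use up the limit faster).
  • Field names may contain uppercase letters, but using lowercase throughout is recommended.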
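
The naming rules above can be checked on the client before calling the create-index API. The sketch below is a hypothetical helper (`index_name_problems` is not part of any Elasticsearch client, nor of Elasticsearch's own validation). Besides the two rules stated above, it encodes the other restrictions listed in the 8.x create-index documentation: no `\`, `/`, `*`, `?`, `"`, `<`, `>`, `|`, space, `,`, `#` or `:`; no leading `-`, `_` or `+`; and not `.` or `..`. Treat it as a pre-check only, because the server remains the authority.

```python
# Hypothetical pre-check for index names, based on the naming restrictions in the
# Elasticsearch create-index documentation. Elasticsearch validates names itself;
# this only catches obvious mistakes before a request is sent.
INVALID_CHARS = set('\\/*?"<>| ,#:')
INVALID_PREFIXES = ("-", "_", "+")
MAX_NAME_BYTES = 255


def index_name_problems(name: str) -> list:
    """Return the list of rules `name` violates; an empty list means it looks valid."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    bad_chars = sorted(set(name) & INVALID_CHARS)
    if bad_chars:
        problems.append("contains invalid characters: " + " ".join(bad_chars))
    if name.startswith(INVALID_PREFIXES):
        problems.append("cannot start with -, _ or +")
    if name in (".", ".."):
        problems.append("cannot be . or ..")
    # The limit is in bytes, so multi-byte UTF-8 characters count more than once.
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        problems.append("longer than 255 bytes")
    return problems


print(index_name_problems("test01"))      # []
print(index_name_problems("Test01"))      # ['must be lowercase']
print(index_name_problems("索引" * 43))   # 86 characters but 258 bytes: ['longer than 255 bytes']
```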

Creating an index

[Figure: ElasticSearch Must-Know: Basics]

index-settings parameters explained

Official docs: www.elastic.co/guide/en/el…

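Note: static settings cannot be changed once the index has been created; dynamic settings can.

Question 1: why can't the number of primary shards be changed after the index is created?

A document is routed to a particular shard in an index using the following formula: shard_num = hash(_routing) % num_primary_shards. The default value used for _routing is the document's _id.

When data is written, ES uses this formula to decide which shard stores each document, and later reads use the same formula to find it again. If the number of primary shards changed, the formula would point to a different shard and the data could no longer be found. Put simply: hash the ID, then take the remainder after dividing by the number of primary shards; change the divisor and the result changes.

Question 2: if the business really does need more primary shards, what can be done? Use reindex to migrate the data into another index: www.elastic.co/guide/en/el…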
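
To see why a changed shard count breaks lookups, the sketch below applies the simplified formula quoted above to 1,000 made-up document ids, once with 3 primary shards and once with 4. Assumption: `zlib.crc32` stands in for the Murmur3 hash Elasticsearch actually uses, and the routing-shard factor that newer versions add to the formula is ignored; only the modulo effect matters here.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Simplified routing: shard_num = hash(_routing) % num_primary_shards.

    zlib.crc32 is a deterministic stand-in for Elasticsearch's Murmur3 hash.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


doc_ids = [f"doc-{i}" for i in range(1000)]   # _routing defaults to _id

# Shard each document was written to while the index had 3 primary shards...
written_to = {doc_id: shard_for(doc_id, 3) for doc_id in doc_ids}
# ...and the shard a lookup would target if the index suddenly had 4.
looked_up_on = {doc_id: shard_for(doc_id, 4) for doc_id in doc_ids}

misrouted = [d for d in doc_ids if written_to[d] != looked_up_on[d]]
print(f"{len(misrouted)} of {len(doc_ids)} lookups would hit the wrong shard")
```

With a well-spread hash, a value keeps its shard only when h % 3 == h % 4, which holds for 3 of every 12 residues, so roughly three quarters of the documents become unreachable.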

Basic index operations

II. Mapping parameter: dynamic

Official docs: www.elastic.co/guide/en/el…

Core function

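When a document contains a field that is not yet mapped, ES detects its type and adds the field to the mapping automatically. In other words, even if you never defined a field in the mapping, ES detects the field type for you dynamically.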

A first look at dynamic

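// Delete the test01 index so that we start from a clean state
DELETE test01
// Write a document directly, without defining a mapping first
POST test01/_doc/1
{
  "name":"kangrui10"
}
// Then inspect the mapping of test01 to see which type the name field was given.
// name is mapped as text, with a keyword sub-field. That is usually not what we want:
// in business queries name is rarely searched as analyzed text; it is normally matched exactly (and name = xxx).
// With this mapping, an exact match has to be written against name.keyword = xxx, which is more awkward.
GET test01/_mapping
// Response: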
{
  "test01" : {
    "mappings" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

Possible values of dynamic

Value | Description (official docs) | Explanation
true | New fields are added to the mapping (default). | The default when dynamic is not set at mapping creation: fields without an explicit type are added to the mapping, with the type detected by ES.
false | New fields are ignored. These fields will not be indexed or searchable, but will still appear in the _source field of returned hits. These fields will not be added to the mapping, and new fields must be added explicitly. | Fields missing from the mapping can still be written, but they are not added to the mapping and not indexed, so they cannot be queried (they remain visible in _source).
strict | If new fields are detected, an exception is thrown and the document is rejected. New fields must be explicitly added to the mapping. | Writing a document with a field that is not in the mapping fails with an error. Recommended for production because it is stricter; any new field must be added to the mapping manually.
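
The difference between the three values can be modelled in a few lines. This is a toy sketch, not Elasticsearch's implementation: `index_document` and `StrictDynamicMappingError` are hypothetical names, and type inference is reduced to the basic JSON defaults (string to text with a `keyword` sub-field, integer to long, floating point to float, boolean to boolean), skipping date detection.

```python
class StrictDynamicMappingError(Exception):
    """Raised when dynamic is "strict" and a document contains an unmapped field."""


def infer_type(value):
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    raise TypeError(f"type not modelled in this sketch: {type(value).__name__}")


def index_document(properties: dict, doc: dict, dynamic: str = "true"):
    """Return (updated_properties, searchable_fields) for one incoming document.

    The whole document is kept in _source in every mode unless it is rejected.
    """
    updated = dict(properties)
    for field, value in doc.items():
        if field in updated:
            continue
        if dynamic == "true":
            updated[field] = infer_type(value)      # new field added to the mapping
        elif dynamic == "strict":
            raise StrictDynamicMappingError(f"unmapped field [{field}] rejected")
        # dynamic == "false": mapping unchanged, the field stays only in _source
    searchable = sorted(f for f in doc if f in updated)
    return updated, searchable


mapping = {"name": {"type": "keyword"}}
print(index_document(mapping, {"name": "kr", "age": 3}, "true")[1])    # ['age', 'name']
print(index_document(mapping, {"name": "kr", "age": 3}, "false")[1])   # ['name']
```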

Drawbacks of dynamic mapping

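  • Inferred types are reasonably accurate, but not necessarily what you want
      • For example, a text field only gets the default standard analyzer, while Chinese text usually needs the ik analyzer

  • Extra storage
      • A string is mapped to both text and keyword, so it takes more space

  • Mapping explosion
      • A mistaken request (for example, writing with PUT where a GET was intended) can create many unwanted fields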

III. Mapping parameter: doc_values

Official docs: www.elastic.co/guide/en/el…

Core function

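Doc values are an additional, ordered forward index (a document => field value mapping) that Lucene builds alongside the inverted index. They are essentially serialized columnar storage, which suits aggregations, sorting, and script access to field values very well. This layout also compresses well, especially for numeric types, which saves disk space and speeds up access. Almost all field types support doc values, except text and annotated_text.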

What is a forward index?

A forward index works like a database table: data is associated with a document id, and you look up a document id to get its data.
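
A tiny illustration of the two directions, using made-up values for a hypothetical keyword field `tag`: the inverted index answers "which documents contain this term?", while the doc-values column answers "what is this document's value?", which is exactly what sorting and terms aggregations need. This is a conceptual sketch; Lucene's on-disk formats are far more compact.

```python
# doc id -> value of a hypothetical keyword field "tag"
docs = {0: "b", 1: "a", 2: "b", 3: "c"}

# Inverted index: term -> ids of the documents containing it (used for matching).
inverted = {}
for doc_id, term in sorted(docs.items()):
    inverted.setdefault(term, []).append(doc_id)

# Doc values / forward index: one column, position = doc id, entry = field value.
column = [docs[doc_id] for doc_id in sorted(docs)]


def matching(term):
    """Query side: look up the term in the inverted index."""
    return inverted.get(term, [])


def sort_by_tag(doc_ids):
    """Sorting needs each document's value, so it reads the column."""
    return sorted(doc_ids, key=lambda d: column[d])


def terms_agg(doc_ids):
    """A terms aggregation also reads each document's value from the column."""
    counts = {}
    for d in doc_ids:
        counts[column[d]] = counts.get(column[d], 0) + 1
    return counts


print(matching("b"))              # [0, 2]
print(sort_by_tag([0, 1, 2, 3]))  # [1, 0, 2, 3]
print(terms_agg([0, 1, 2, 3]))    # {'b': 2, 'a': 1, 'c': 1}
```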

Possible values of doc_values

  • true: the default; doc values are enabled
  • false: must be set explicitly; sorting, aggregations, and script access to the field no longer work, but disk space is saved

Practice exercise

// Create an index test03 whose fields meet the following conditions:
//     1. speaker: keyword
//     2. line_id: keyword and not aggregatable
//     3. speech_number: integer
PUT test03
{
  "mappings": {
    "properties": {
      "speaker": {
        "type": "keyword"
      },
      "line_id":{
        "type": "keyword",
        "doc_values": false
      },
      "speech_number":{
        "type": "integer"
      }
    }
  }
}

IV. Analyzers

Installing the ik Chinese analyzer

github.com/medcl/elast…

What is an inverted index?
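
A small sketch of the idea: each document is analyzed into terms, and the index maps every term to the documents (and positions) where it occurs. Assumptions: lowercasing plus extracting runs of word characters stands in for a real analyzer, and the sample documents are made up.

```python
import re
from collections import defaultdict


def analyze(text):
    # Stand-in analyzer: lowercase, then keep runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """term -> {doc_id: [positions]}; positions are what phrase queries rely on."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def docs_with_all_terms(index, query):
    """Ids of documents containing every query term (like match with operator=and)."""
    postings = [set(index.get(term, {})) for term in analyze(query)]
    return sorted(set.intersection(*postings)) if postings else []


docs = {
    1: "Elasticsearch is a search engine",
    2: "MySQL is a database",
    3: "Search with Elasticsearch",
}
index = build_inverted_index(docs)
print(index["elasticsearch"])                               # {1: [0], 3: [2]}
print(docs_with_all_terms(index, "search Elasticsearch"))   # [1, 3]
```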

How data gets indexed

Types of analyzers

Official docs: www.elastic.co/guide/en/el…

V. Custom analyzers

The three stages of a custom analyzer

1. Character filters: filter characters

Official docs: www.elastic.co/guide/en/el… Zero or more can be configured.

HTML Strip Character Filter: strips HTML elements and decodes HTML entities (for example, &amp; becomes &).

Mapping Character Filter: replaces specified characters (or strings) with configured replacements.

Pattern Replace Character Filter: replaces characters that match a regular expression.

2. Tokenizer: split the text into tokens

Official docs: www.elastic.co/guide/en/el… Exactly one can be configured; it splits the text into tokens.

3. Token filters: filter the tokens after tokenization

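Official docs: www.elastic.co/guide/en/el… Zero or more can be configured. They post-process the tokens, for example lowercasing them, removing stop words, or adding synonyms.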
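
The three stages compose in a fixed order: character filters rewrite the raw string, a single tokenizer splits it, and token filters rewrite the token list. The sketch below mimics that pipeline with plain functions. The names echo Elasticsearch's `mapping`, `whitespace`, `lowercase` and `stop` components, but the code is a simplified stand-in, not their implementation.

```python
def mapping_char_filter(mappings):
    """Character filter: replace substrings before tokenization."""
    def apply(text):
        for source, target in mappings.items():
            text = text.replace(source, target)
        return text
    return apply


def whitespace_tokenizer(text):
    """Tokenizer: split on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def stop_filter(stopwords):
    """Token filter: drop stop words."""
    def apply(tokens):
        return [t for t in tokens if t not in stopwords]
    return apply


def make_analyzer(char_filters, tokenizer, token_filters):
    def analyze(text):
        for char_filter in char_filters:     # zero or more
            text = char_filter(text)
        tokens = tokenizer(text)             # exactly one
        for token_filter in token_filters:   # zero or more, applied in order
            tokens = token_filter(tokens)
        return tokens
    return analyze


analyzer = make_analyzer(
    char_filters=[mapping_char_filter({"&": " and "})],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_filter, stop_filter({"the"})],
)
print(analyzer("The Dog & the Cat"))   # ['dog', 'and', 'cat']
```

Order matters: if the stop filter ran before lowercasing, "The" would survive, because the stop list is lowercase.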

Practice exercise

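A document has content like "dag & cat". Index it so that a match_phrase query for either "dag & cat" or "dag and cat" finds it.

Analysis:
  1. What match_phrase does: it analyzes the query text, and the resulting terms must all be present among the terms of the searched field, in the same order, and (by default) adjacent to each other.
  2. To make "&" and "and" return the same results, a custom analyzer is needed, because this is a custom requirement.
  3. How to define a custom analyzer: www.elastic.co/guide/en/el…
  4. Solution 1 relies on the Mapping Character Filter.
  5. Solution 2 relies on: www.elastic.co/guide/en/el…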

Solution 1

# Create the index
PUT /test01
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "char_filter": [
            "my_mappings_char_filter"
          ],
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "my_mappings_char_filter": {
          "type": "mapping",
          "mappings": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content":{
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
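// Notes
// Character filters stage: a char_filter rewrites the text (& becomes and)
// Tokenizer stage: the default standard tokenizer
// Token filters stage: none configured
// The content field uses the custom analyzer my_analyzer
# Load test data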
PUT test01/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}
# Run the test: querying "doc & cat" or "doc and cat" returns both documents
POST test01/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "doc & cat"
          }
        }
      ]
    }
  }
}
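
Why Solution 1 works can be checked offline. The sketch approximates the standard tokenizer with a regex that keeps only word characters (so "&" disappears, as it does in the real tokenizer), models the `& => and` char filter with a string replace, and treats phrase matching as "all query terms, in order, adjacent" (slop 0). It is an approximation for illustration, not Lucene's phrase matching.

```python
import re


def standard_like(text):
    # Rough stand-in for the standard tokenizer: word characters only, so "&" is dropped.
    return re.findall(r"\w+", text.lower())


def with_char_filter(text):
    # Mirrors the "& => and" mapping char filter, applied before tokenization.
    return standard_like(text.replace("&", "and"))


def phrase_matches(field_terms, query_terms):
    """True if query_terms occur in field_terms in order and contiguously (slop 0)."""
    n = len(query_terms)
    return any(field_terms[i:i + n] == query_terms
               for i in range(len(field_terms) - n + 1))


docs = ["doc & cat", "doc and cat"]
queries = ["doc & cat", "doc and cat"]

for analyzer in (standard_like, with_char_filter):
    hits = {q: [d for d in docs if phrase_matches(analyzer(d), analyzer(q))] for q in queries}
    print(analyzer.__name__, hits)
```

Without the char filter, each query finds only the document written the same way; with it, both documents reduce to the terms doc, and, cat, so both queries return two hits.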

Solution 2

# Approach: make & and "and" synonyms, using a token filter
# Create the index
PUT /test02
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_synonym_analyzer": {
          "tokenizer": "whitespace",
          "filter": [
            "my_synonym"
          ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "lenient": true,
          "synonyms": [
            "& => and"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_synonym_analyzer"
      }
    }
  }
}
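// Notes
// Character filters stage: none configured
// Tokenizer stage: the whitespace tokenizer. Why not the default standard tokenizer? It drops "&" during tokenization, which leaves nothing for the token filter to work on
// Token filters stage: a synonym filter that maps & to and
// The content field uses the custom analyzer my_synonym_analyzer
# Load test data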
PUT test02/_bulk
{"index":{"_id":1}}
{"content":"doc & cat"}
{"index":{"_id":2}}
{"content":"doc and cat"}
# Run the test
POST test02/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "content": "doc & cat"
          }
        }
      ]
    }
  }
}

VI. multi-fields

Official docs: www.elastic.co/guide/en/el…

// One field, several types: for example, analyze the same field with two different analyzers
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "analyzer":"standard",
        "fields": {
          "fieldText": { 
            "type":  "text",
            "analyzer":"ik_smart"
          }
        }
      }
    }
  }
}

VII. runtime_field: runtime fields

Official docs: www.elastic.co/guide/en/el…

Background

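Suppose the business needs to sort by the difference between two numeric fields, that is, by a field that does not exist. What can you do? You could backfill the data and add a field holding the difference, but what if backfilling to add a new field is not allowed?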

Solution

Use cases

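  1. Add fields to existing documents without reindexing
  2. Work with data without knowing its structure in advance
  3. Override, at query time, the value returned from an indexed field
  4. Define fields for a specific use without changing the underlying schema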

Characteristics

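  1. Invisible to Lucene: runtime fields are not indexed and have no doc_values
  2. No relevance scoring, since there is no inverted index
  3. Breaks the traditional "define first, use later" approach
  4. Helps prevent mapping explosion
  5. Makes the API more flexible
  6. Note: they make searches slower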

Ways to use them

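  • In the search request: define the field at search time (even if the mapping has no such field, you can still query it)
  • In the mapping: define it in a dynamic or explicit mapping (that is, add a runtime field to the mapping)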

Practice exercise 1

# Given the following index and data (if test03 still exists from the doc_values exercise, delete it first)
PUT test03
{
  "mappings": {
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
POST test03/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
# Requirement: if emotion > 5, return emotion_falg = '1'
# Requirement: if emotion < 5, return emotion_falg = '-1'
# Requirement: if emotion = 5, return emotion_falg = '0'

Solution 1

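Define the runtime field in the search request: www.elastic.co/guide/en/el… The field does not actually exist in the stored documents, so the request needs "fields": ["*"] to return it.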

GET test03/_search
{
  "fields": [
    "*"
  ], 
  "runtime_mappings": {
    "emotion_falg": {
      "type": "keyword",
      "script": {
        "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
      }
    }
  }
}
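
The Painless script emits one of three values per document, computed at query time from the stored `emotion` value. The Python below only restates that logic and the "computed on read, never stored" idea. It does not run Painless, and `emotion_flag` and `search_with_runtime_field` are hypothetical helper names (the field name `emotion_falg` is kept as in the example).

```python
def emotion_flag(emotion: int) -> str:
    """Same branches as the Painless script: >5 -> '1', <5 -> '-1', ==5 -> '0'."""
    if emotion > 5:
        return "1"
    if emotion < 5:
        return "-1"
    return "0"


# The four documents from the bulk request; their _source is never modified.
source_docs = {1: {"emotion": 2}, 2: {"emotion": 5}, 3: {"emotion": 10}, 4: {"emotion": 3}}


def search_with_runtime_field(docs):
    """Like "fields": ["*"] with runtime_mappings: values are computed per hit."""
    return {doc_id: {"emotion": doc["emotion"], "emotion_falg": emotion_flag(doc["emotion"])}
            for doc_id, doc in docs.items()}


for doc_id, fields in search_with_runtime_field(source_docs).items():
    print(doc_id, fields)
```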

Solution 2

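Define the runtime field in the index mapping when creating the index: www.elastic.co/guide/en/el… With this approach the runtime field is part of the mapping, so you can search on it like a regular field.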

# Create the index and define the runtime field
PUT test03_01
{
  "mappings": {
    "runtime": {
      "emotion_falg": {
        "type": "keyword",
        "script": {
          "source": """
          if(doc['emotion'].value>5)emit('1');
          if(doc['emotion'].value<5)emit('-1');
          if(doc['emotion'].value==5)emit('0');
          """
        }
      }
    },
    "properties": {
      "emotion": {
        "type": "integer"
      }
    }
  }
}
# Load test data
POST test03_01/_bulk
{"index":{"_id":1}}
{"emotion":2}
{"index":{"_id":2}}
{"emotion":5}
{"index":{"_id":3}}
{"emotion":10}
{"index":{"_id":4}}
{"emotion":3}
# Query test
GET test03_01/_search
{
  "fields": [
    "*"
  ]
}

Practice exercise 2

# Given the following index and data
PUT task04
{
  "mappings": {
    "properties": {
      "A":{
        "type": "long"
      },
      "B":{
        "type": "long"
      }
    }
  }
}
PUT task04/_bulk
{"index":{"_id":1}}
{"A":100,"B":2}
{"index":{"_id":2}}
{"A":120,"B":2}
{"index":{"_id":3}}
{"A":120,"B":25}
{"index":{"_id":4}}
{"A":21,"B":25}
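# Requirement: in the task04 index, create a runtime field named A_B whose value is A - B; then create a range aggregation with three buckets (less than 0, 0 to 100, 100 and above) and return the document counts
// Features used:
// 1. Runtime field defined in the search request: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/runtime-search-request.html
// 2. Range aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-range-aggregation.html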

Solution

# Check the result
GET task04/_search
{
  "fields": [
    "*"
  ], 
  "size": 0, 
  "runtime_mappings": {
    "A_B": {
      "type": "long",
      "script": {
        "source": """
          emit(doc['A'].value - doc['B'].value);
          """
      }
    }
  },
  "aggs": {
    "price_ranges_A_B": {
      "range": {
        "field": "A_B",
        "ranges": [
          { "to": 0 },
          { "from": 0, "to": 100 },
          { "from": 100 }
        ]
      }
    }
  }
}
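
The expected bucket counts can be worked out by hand: A_B evaluates to 98, 118, 95 and -4 for the four documents. The sketch below reproduces that count, based on the range aggregation's documented semantics that `from` is inclusive and `to` is exclusive.

```python
docs = [{"A": 100, "B": 2}, {"A": 120, "B": 2}, {"A": 120, "B": 25}, {"A": 21, "B": 25}]
ranges = [{"to": 0}, {"from": 0, "to": 100}, {"from": 100}]


def range_buckets(values, ranges):
    """Count values per range; a missing bound is unbounded, and "from" <= v < "to"."""
    buckets = []
    for r in ranges:
        low = r.get("from", float("-inf"))
        high = r.get("to", float("inf"))
        buckets.append({**r, "doc_count": sum(1 for v in values if low <= v < high)})
    return buckets


a_b = [d["A"] - d["B"] for d in docs]   # the runtime field A_B: [98, 118, 95, -4]
for bucket in range_buckets(a_b, ranges):
    print(bucket)
# {'to': 0, 'doc_count': 1}
# {'from': 0, 'to': 100, 'doc_count': 2}
# {'from': 100, 'doc_count': 1}
```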

VIII. Search: highlighting

A first look at highlight syntax

Official docs: www.elastic.co/guide/en/el…

IX. Search: sorting

A first look at sort syntax

Official docs: www.elastic.co/guide/en/el…

// Note: text fields cannot be used for sorting or aggregations by default; if you really need that, enable fielddata
GET /kibana_sample_data_ecommerce/_search
{
  "query": {
    "match": {
      "customer_last_name": "wood"
    }
  },
  "highlight": {
    "number_of_fragments": 3,
    "fragment_size": 150,
    "fields": {
      "customer_last_name": {
        "pre_tags": [
          "<em>"
        ],
        "post_tags": [
          "</em>"
        ]
      }
    }
  },
  "sort": [
    {
      "currency": {
        "order": "desc"
      }
    },
    {
      "_score": {
        "order": "asc"
      }
    }
  ]
}

X. Search: pagination

A first look at pagination syntax

Official docs: www.elastic.co/guide/en/el…

# Note: from starts at 0, not 1
GET kibana_sample_data_ecommerce/_search
{
  "from": 5,
  "size": 20,
  "query": {
    "match": {
      "customer_last_name": "wood"
    }
  }
}
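
Because `from` is a 0-based offset, a 1-based page number converts as from = (page - 1) * size. The helper below is a hypothetical sketch of that conversion. It also checks the default `index.max_result_window` of 10,000, since a from + size beyond that limit is rejected unless the setting is raised (deeper paging is usually done with search_after).

```python
MAX_RESULT_WINDOW = 10_000  # default value of the index.max_result_window setting


def page_to_from_size(page: int, page_size: int) -> dict:
    """Turn a 1-based page number into the from/size pair of a search request."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    body = {"from": (page - 1) * page_size, "size": page_size}
    if body["from"] + body["size"] > MAX_RESULT_WINDOW:
        raise ValueError("from + size exceeds index.max_result_window")
    return body


print(page_to_from_size(1, 20))   # {'from': 0, 'size': 20}
print(page_to_from_size(3, 20))   # {'from': 40, 'size': 20}
```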

Practice exercise 1

# Task
# In the spoken lines of the play, highlight the word Hamlet (in the text_entry field), starting the highlight with "#aaa#" and ending it with "#bbb#".
# Return all of the speech_number field lines in reverse order; '20' speech lines per page, starting from line '40'.
# Highlight: on the text_entry field, highlight the keyword Hamlet
# Pagination: from 40, size 20
# Sort: speech_number in descending order
POST test09/_search
{
  "from": 40,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "text_entry": "Hamlet"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "text_entry": {
        "pre_tags": [
          "#aaa#"
        ],
        "post_tags": [
          "#bbb#"
        ]
      }
    }
  },
  "sort": [
    {
      "speech_number.keyword": {
        "order": "desc"
      }
    }
  ]
}

XI. Search: async search

Official docs: www.elastic.co/guide/en/el…

Released in

7.7.0

Use case

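Lets you run a search asynchronously, monitor its progress, and retrieve partial results as they become available, instead of waiting for the final response until the whole query completes.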

Common commands

  • Submit an async search
      • POST /sales*/_async_search?size=0

  • Get async search results
      • GET /_async_search/<id>

  • Check async search status
      • GET /_async_search/status/<id>

  • Delete (cancel) an async search
      • DELETE /_async_search/<id>

Async search response fields

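Field | Meaning
id | Unique identifier returned for the async search
is_partial | While the query is running, is_partial is true; once it is no longer running, it indicates whether the search failed or completed successfully on all shards
is_running | Whether the search is still running
total | Number of shards the search will run on
successful | Number of shards that have completed the search successfully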
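
Those flags suggest a simple client loop: poll until `is_running` is false, then read `is_partial` to decide whether the result is complete. In the sketch, `fetch` is a hypothetical callable standing in for whatever client call issues `GET /_async_search/<id>` and returns the body as a dict; no real client API is assumed, and the simulated responses are made up.

```python
import time


def wait_for_async_search(fetch, poll_interval=1.0, max_polls=60):
    """Poll an async search until it stops running and return the final body.

    fetch: hypothetical zero-argument callable returning the GET /_async_search/<id>
    response body as a dict.
    """
    for _ in range(max_polls):
        body = fetch()
        if not body["is_running"]:
            if body["is_partial"]:
                # Finished, but not every shard succeeded: the results are incomplete.
                print(f"search {body['id']} finished with partial results")
            return body
        time.sleep(poll_interval)
    raise TimeoutError("async search is still running after max_polls attempts")


# Simulated responses: first still running, then finished on all shards.
responses = iter([
    {"id": "demo", "is_running": True, "is_partial": True},
    {"id": "demo", "is_running": False, "is_partial": False},
])
final = wait_for_async_search(lambda: next(responses), poll_interval=0)
print(final["is_partial"])   # False
```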

XII. Aliases: index aliases

Official docs: www.elastic.co/guide/en/el…

What aliases are for

In ES, an index alias works like a shortcut or symbolic link that can point to one or more indices. Aliases give a lot of flexibility; with them you can:

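  1. Switch seamlessly from one index to another in a running ES cluster (no downtime)

  2. Group multiple indices; for example, with monthly indices, build an alias that covers the last 3 months

  3. Query a subset of an index's data, similar to a database view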
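
Point 1 is usually done by moving an alias in a single `POST _aliases` request that contains both a `remove` and an `add` action; Elasticsearch applies the actions of one request atomically, so clients querying the alias never see it missing. The sketch below only builds that request body, and the alias and index names are hypothetical.

```python
import json


def alias_swap_body(alias: str, old_index: str, new_index: str) -> dict:
    """Request body for POST _aliases that moves `alias` from old_index to new_index.

    Both actions travel in one request, so they are applied together.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names: an index rebuilt as products_v2 behind the alias "products".
print(json.dumps(alias_swap_body("products", "products_v1", "products_v2"), indent=2))
```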

Without aliases, how do you search across multiple indices?

Option 1: POST index_01,index_02,index_03/_search
Option 2: POST index*/_search

Three ways to create an alias

  1. Specify the alias when creating the index
# Make test05_aliases an alias of test05
PUT test05
{
  "mappings": {
    "properties": {
      "name":{
        "type": "keyword"
      }
    }
  },
  "aliases": {
    "test05_aliases": {}
  }
}
  2. Specify the alias through an index template
PUT _index_template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "template": {
    "settings": {
      "number_of_shards": 1
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    },
    "aliases": {
      "mydata": { }
    }
  },
  "priority": 500,
  "composed_of": ["component_template1", "runtime_component_template"], 
  "version": 3,
  "_meta": {
    "description": "my custom"
  }
}
  3. Create an alias for an existing index
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

Removing an alias

POST _aliases
{
  "actions": [
    {
      "remove": {
        "index": "logs-nginx.access-prod",
        "alias": "logs"
      }
    }
  ]
}

Practice exercise 1

# Define an index alias for 'accounts-row' called 'accounts-male': Apply a filter to only show the male account owners
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "accounts-row",
        "alias": "accounts-male",
        "filter": {
          "bool": {
            "filter": [
              {
                "term": {
                  "gender.keyword": "male"
                }
              }
            ]
          }
        }
      }
    }
  ]
}

XIII. Search templates

Official docs: www.elastic.co/guide/en/el…

Features

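A search template accepts parameters that are filled in at run time. Search templates are stored on the server side, so they can be changed without modifying client code.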

A first look at search templates

# Create a search template
PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "{{query_key}}": "{{query_value}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    }
  }
}
# Search with the template
GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_key": "your field",
    "query_value": "your field value",
    "from": 0,
    "size": 10
  }
}

Search template operations

Create a search template

PUT _scripts/my-search-template
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {
        "match": {
          "message": "{{query_string}}"
        }
      },
      "from": "{{from}}",
      "size": "{{size}}"
    },
    "params": {
      "query_string": "My query string"
    }
  }
}

Validate (render) a search template

POST _render/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 20,
    "size": 10
  }
}
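
Rendering is text substitution of `{{name}}` placeholders into the stored template. The function below is a deliberately minimal stand-in for Mustache: it handles only simple variables (no sections, no escaping, no `{{#toJson}}`), and it raises KeyError for a missing parameter, whereas Mustache would render an empty string. Because the placeholders sit inside JSON strings, `from` and `size` come out as strings such as "20", which is also how the `_render/template` output shows them.

```python
import json
import re


def render(template: dict, params: dict) -> dict:
    """Replace {{name}} placeholders in the template's JSON text with params[name].

    Values containing quotes or backslashes would break the JSON here; real
    escaping is not modelled.
    """
    text = json.dumps(template)
    rendered = re.sub(r"\{\{(\w+)\}\}", lambda m: str(params[m.group(1)]), text)
    return json.loads(rendered)


template = {
    "query": {"match": {"message": "{{query_string}}"}},
    "from": "{{from}}",
    "size": "{{size}}",
}
print(render(template, {"query_string": "hello world", "from": 20, "size": 10}))
# {'query': {'match': {'message': 'hello world'}}, 'from': '20', 'size': '10'}
```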

Run a search template

GET my-index/_search/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 0,
    "size": 10
  }
}

List all search templates

GET _cluster/state/metadata?pretty&filter_path=metadata.stored_scripts

Delete a search template

DELETE _scripts/my-search-template

XIV. Search DSL: simple queries

Official docs: www.elastic.co/guide/en/el…

Choosing a query type

Query categories

Custom scoring

How to customize scoring

1. indices_boost: adjusting relevance at the index level

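// A batch of data carries different tags; all documents share one structure, and each tag is stored in its own index (A, B, C).
// To display the results strictly grouped by tag, which query works best?
// Requirement: show A first, then B, then C
# Test data
PUT /index_a_123/_doc/1
{
  "title":"this is index_a..."
}
PUT /index_b_123/_doc/1
{
  "title":"this is index_b..."
}
PUT /index_c_123/_doc/1
{
  "title":"this is index_c..."
}
# A plain query with no boosting: the three hits come back with identical scores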
POST index_*_123/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "this"
          }
        }
      ]
    }
  }
}
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-search.html
indices_boost
# i.e., raise the weight at the index level
POST index_*_123/_search
{
  "indices_boost": [
    {
      "index_a_123": 10
    },
    {
      "index_b_123": 5
    },
    {
      "index_c_123": 1
    }
  ], 
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "this"
          }
        }
      ]
    }
  }
}
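
With indices_boost, each hit's score is multiplied by the boost of the index it came from. The sketch assumes the three hits start from the same base score (1.0, a made-up value), as in the unboosted query above, and shows the resulting order. It also shows the limit of the approach: the boost only scales scores, so a much better match in a lower-priority index can still outrank a weak match in index_a_123. Strict grouping holds only while base scores stay within the boost ratios.

```python
indices_boost = {"index_a_123": 10, "index_b_123": 5, "index_c_123": 1}


def boosted_order(hits, boosts):
    """hits: (index, base_score) pairs. Returns index names by boosted score, best first."""
    scored = [(index, score * boosts.get(index, 1.0)) for index, score in hits]
    return [index for index, _ in sorted(scored, key=lambda h: h[1], reverse=True)]


equal_scores = [("index_c_123", 1.0), ("index_a_123", 1.0), ("index_b_123", 1.0)]
print(boosted_order(equal_scores, indices_boost))
# ['index_a_123', 'index_b_123', 'index_c_123']

# Made-up uneven scores: the index_c hit scores 20x higher and ends up first.
uneven_scores = [("index_c_123", 20.0), ("index_a_123", 1.0), ("index_b_123", 1.0)]
print(boosted_order(uneven_scores, indices_boost))
# ['index_c_123', 'index_a_123', 'index_b_123']
```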

2. boosting: adjusting document relevance

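An index index_a has several fields. Implement the following query:
1) On the title field, match 'ssas' or 'sasa'.
2) On the tags field (an array field), boost the score if tags contains 'pingpang'.
Requirement: write the DSL.
# Test data
PUT index_a/_bulk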
{"index":{"_id":1}}
{"title":"ssas","tags":"basketball"}
{"index":{"_id":2}}
{"title":"sasa","tags":"pingpang; football"}
# Solution 1
POST index_a/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": "ssas"
                }
              },
              {
                "match": {
                  "title": "sasa"
                }
              }
            ]
          }
        }
      ],
      "should": [
        {
          "match": {
            "tags": {
              "query": "pingpang",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}
# Solution 2
// https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
POST index_a/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "match": {
                "tags": {
                  "query": "pingpang"
                }
              }
            },
            "boost": 1
          }
        }
      ],
      "must": [
        {
          "bool": {
            "should": [
              {
                "match": {
                  "title": "ssas"
                }
              },
              {
                "match": {
                  "title": "sasa"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

3. negative_boost: lowering relevance

If you are unhappy with some results but don't want to exclude them with must_not, consider the negative_boost of a boosting query, which lowers their scores instead.
negative_boost
(Required, float) Floating point number between 0 and 1.0 used to decrease the relevance scores of documents matching the negative query.
Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-boosting-query.html
POST index_a/_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "tags": "football"
        }
      },
      "negative": {
        "term": {
          "tags": "pingpang"
        }
      },
      "negative_boost": 0.5
    }
  }
}
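
Per the boosting query documentation, a document that matches both `positive` and `negative` gets its positive-query score multiplied by `negative_boost`; documents matching only `positive` are unchanged, and nothing is filtered out. A one-function sketch of that rule, with made-up input scores:

```python
def boosting_score(positive_score: float, matches_negative: bool, negative_boost: float) -> float:
    """Final score of a document that matched the positive query of a boosting query."""
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    return positive_score * negative_boost if matches_negative else positive_score


print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))  # 2.0
print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))   # 1.0
```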

4. function_score: custom scoring

How do you boost relevance based on both sales and view count?
Problem: for products, we want a relevance boost computed from both sales and view count,
e.g., oldScore * (sales + views).
Product | Sales | Views
A | 10 | 10
B | 20 | 20
C | 30 | 30
# Sample data
PUT goods_index/_bulk
{"index":{"_id":1}}
{"name":"A","sales_count":10,"view_count":10}
{"index":{"_id":2}}
{"name":"B","sales_count":20,"view_count":20}
{"index":{"_id":3}}
{"name":"C","sales_count":30,"view_count":30}
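Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/query-dsl-function-score-query.html
Key feature: script_score. The script already multiplies by _score, so boost_mode is set to replace; with the default boost_mode (multiply), the query score would be applied a second time.
POST goods_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "source": "_score * (doc['sales_count'].value+doc['view_count'].value)"
        }
      },
      "boost_mode": "replace"
    }
  }
}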
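
For the sample data the result can be computed by hand: `match_all` gives every document a score of 1.0, so oldScore * (sales + views) yields 20, 40 and 60 for A, B and C, and C ranks first. The lines below just do that arithmetic.

```python
goods = {"A": (10, 10), "B": (20, 20), "C": (30, 30)}   # name -> (sales_count, view_count)
MATCH_ALL_SCORE = 1.0                                   # match_all scores every document 1.0

scores = {name: MATCH_ALL_SCORE * (sales + views) for name, (sales, views) in goods.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)    # {'A': 20.0, 'B': 40.0, 'C': 60.0}
print(ranking)   # ['C', 'B', 'A']
```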

XV. Search DSL: compound bool queries

Official docs: www.elastic.co/guide/en/el…

Basic syntax

Practice exercise

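Write a query that requires a keyword to appear in at least two of a document's four fields.
Features used: bool query, should / minimum_should_match
    1. the bool query
    2. the minimum_should_match detail
Note: if the bool query has at least one should clause and no must or filter clauses, minimum_should_match defaults to 1; otherwise it defaults to 0.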
POST test_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "field1": "kr"
          }
        },
        {
          "match": {
            "field2": "kr"
          }
        },
        {
          "match": {
            "field3": "kr"
          }
        },
        {
          "match": {
            "field4": "kr"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}
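
The effect of `minimum_should_match: 2` is a counting rule: a document qualifies only if at least two of the four `should` clauses match. The sketch approximates "the field matches kr" as "kr is one of the field's lowercase whitespace-separated tokens" and uses placeholder field names field1 to field4 with made-up documents; real match queries analyze both sides and also score the hits.

```python
def matched_clauses(doc: dict, fields, keyword: str) -> int:
    """Number of should clauses (one per field) whose field contains the keyword."""
    keyword = keyword.lower()
    return sum(1 for f in fields if keyword in str(doc.get(f, "")).lower().split())


def passes(doc: dict, fields, keyword: str, minimum_should_match: int) -> bool:
    return matched_clauses(doc, fields, keyword) >= minimum_should_match


FIELDS = ["field1", "field2", "field3", "field4"]
docs = {
    1: {"field1": "kr here", "field2": "KR", "field3": "other"},  # 2 fields match
    2: {"field1": "kr", "field4": "nothing"},                     # 1 field matches
}
print([doc_id for doc_id, doc in docs.items() if passes(doc, FIELDS, "kr", 2)])   # [1]
```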

XVI. Search: aggregations

Official docs: www.elastic.co/guide/en/el…

Aggregation categories

Bucket aggregations

terms

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-terms-aggregation.html
# Count documents per author (user); size 1 returns only the top bucket
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      }
    }
  }
}

date_histogram

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-bucket-datehistogram-aggregation.html
# Count documents per month on up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_up_time": {
      "date_histogram": {
        "field": "up_time",
        "calendar_interval": "month"
      }
    }
  }
}

Metrics aggregations

Max

Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-max-aggregation.html
# Get the maximum (latest) up_time
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "agg_max_up_time": {
      "max": {
        "field": "up_time"
      }
    }
  }
}

Top_hits

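Official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-metrics-top-hits-aggregation.html
# Aggregate on user and keep only the top bucket; inside it, return the top 3 matching documents sorted by a given field (see, descending)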
POST bilili_elasticsearch/_search
{
  "size": 0,
  "aggs": {
    "terms_agg_user": {
      "terms": {
        "field": "user",
        "size": 1
      },
      "aggs": {
        "top_user_hits": {
          "top_hits": {
            "_source": {
              "includes": [
                "video_time",
                "title",
                "see",
                "user",
                "up_time"
              ]
            }, 
            "sort": [
              {
                "see":{
                  "order": "desc"
                }
              }
            ], 
            "size": 3
          }
        }
      }
    }
  }
}
// Response:
{
  "took" : 91,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "terms_agg_user" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 975,
      "buckets" : [
        {
          "key" : "Elastic查找",
          "doc_count" : 25,
          "top_user_hits" : {
            "hits" : {
              "total" : {
                "value" : 25,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "5ccCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "03:45",
                    "see" : "92",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会2021: 用加 Gatling 进行Elasticsearch的负载测验,寓教于乐。",
                    "user" : "Elastic查找"
                  },
                  "sort" : [
                    "92"
                  ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "8scCVoQBUyqsIDX6wIgn",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "10:18",
                    "see" : "79",
                    "up_time" : "2020-10-20",
                    "title" : "为Elasticsearch发动htpps拜访",
                    "user" : "Elastic查找"
                  },
                  "sort" : [
                    "79"
                  ]
                },
                {
                  "_index" : "bilili_elasticsearch",
                  "_id" : "7scCVoQBUyqsIDX6wIcm",
                  "_score" : null,
                  "_source" : {
                    "video_time" : "04:41",
                    "see" : "71",
                    "up_time" : "2021-03-19",
                    "title" : "Elastic 社区大会2021: Elasticsearch作为一个地理空间的数据库",
                    "user" : "Elastic查找"
                  },
                  "sort" : [
                    "71"
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}

Pipeline aggregations

Pipeline aggregations take the output of other aggregations as their input. Official docs: www.elastic.co/guide/en/el…

bucket_selector

Official docs: www.elastic.co/guide/en/el…

# Bucket order_date by month, sum total_unique_products per month, and keep only the months whose sum exceeds 1000
POST kibana_sample_data_ecommerce/_search
{
  "size": 0,
  "aggs": {
    "date_his_aggs": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "month"
      },
      "aggs": {
        "sum_aggs": {
          "sum": {
            "field": "total_unique_products"
          }
        },
        "sales_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "totalSales": "sum_aggs"
            },
            "script": "params.totalSales > 1000"
          }
        }
      }
    }
  }
}
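
bucket_selector runs after its parent aggregation has produced the monthly buckets: each variable in `buckets_path` is read from the bucket, the script is evaluated, and buckets for which it is false are dropped. A sketch with made-up bucket values, using a Python lambda in place of the Painless script:

```python
def bucket_selector(buckets, buckets_path, script):
    """Keep the buckets for which script(params) is true.

    buckets_path maps script variable names to metric names inside each bucket.
    """
    kept = []
    for bucket in buckets:
        params = {var: bucket[metric] for var, metric in buckets_path.items()}
        if script(params):
            kept.append(bucket)
    return kept


# Made-up monthly buckets. For brevity each metric is a plain number here,
# whereas a real response nests it as {"value": ...}.
monthly = [
    {"key_as_string": "month-1", "sum_aggs": 900.0},
    {"key_as_string": "month-2", "sum_aggs": 1500.0},
    {"key_as_string": "month-3", "sum_aggs": 1000.0},
]
kept = bucket_selector(monthly, {"totalSales": "sum_aggs"}, lambda p: p["totalSales"] > 1000)
print([b["key_as_string"] for b in kept])   # ['month-2']
```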

Practice exercise

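The earthquakes index contains earthquake data for the past 30 months. With a single query, retrieve:
  • the average mag for each of the past 30 months
  • the month with the highest average mag in the past 30 months, and that average
  • no documents (the search must not return any hits)
max_bucket official docs: https://www.elastic.co/guide/en/elasticsearch/reference/8.1/search-aggregations-pipeline-max-bucket-aggregation.html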
POST earthquakes/_search
{
  "size": 0, 
  "query": {
    "range": {
      "time": {
        "gte": "now-30M/d",
        "lte": "now"
      }
    }
  },
  "aggs": {
    "agg_time_his": {
      "date_histogram": {
        "field": "time",
        "calendar_interval": "month"
      },
      "aggs": {
        "avg_aggs": {
          "avg": {
            "field": "mag"
          }
        }
      }
    },
    "max_mag_sales": {
      "max_bucket": {
        "buckets_path": "agg_time_his>avg_aggs" 
      }
    }
  }
}
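
This sketch shows what the two aggregations compute together, using made-up (month, mag) pairs: first a per-month average (the date_histogram plus avg), then the highest of those averages. The result follows the shape of a max_bucket response, which reports a `value` and a `keys` list that can hold several months if they tie.

```python
from statistics import mean


def monthly_avg_with_max(events):
    """events: iterable of (month, mag). Returns (avg per month, max_bucket-like dict)."""
    by_month = {}
    for month, mag in events:
        by_month.setdefault(month, []).append(mag)
    averages = {month: mean(mags) for month, mags in sorted(by_month.items())}
    best = max(averages.values())
    return averages, {"keys": [m for m, avg in averages.items() if avg == best], "value": best}


events = [("2021-01", 2.0), ("2021-01", 4.0), ("2021-02", 5.0), ("2021-03", 1.5)]
averages, max_bucket = monthly_avg_with_max(events)
print(averages)     # {'2021-01': 3.0, '2021-02': 5.0, '2021-03': 1.5}
print(max_bucket)   # {'keys': ['2021-02'], 'value': 5.0}
```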