
2021-08-08 Elasticsearch 1-3 Miscellaneous Notes



GET _search
{
  "query": {
    "match_all": {}
  }
}

// View index metadata
GET securities-futures-2021.31

// View the index's total document count
GET securities-futures-2021.31/_count

// Look at the first 10 documents to get a sense of their format
POST securities-futures-2021.31/_search
{

}

// List indices
GET /_cat/indices/securities-*?v&s=index

// List indices whose health is green
GET /_cat/indices?v&health=green

// Sort by document count
GET /_cat/indices?v&s=docs.count:desc

// Show specific columns
GET /_cat/indices/securities*?pri&v&h=health,index,pri,rep,docs.count,mt

// Memory used by each index
GET /_cat/indices?v&h=i,tm&s=tm:desc

// Check the cluster health
GET _cluster/health
GET _cat/nodes
GET _cat/shards

View an index's mapping definition
GET indices/_mapping

CRUD operations

// With PUT you can specify the document ID; with POST the ID can be omitted and is generated automatically.

PUT securities-futures-2021.31/_create/1
{
  
}
PUT securities-futures-2021.31/_doc/1
{
  
}

Unlike create, the index operation indexes the document as new when it does not exist; otherwise the existing document is deleted, the new one is indexed, and the version is incremented.

// _create requires an explicit ID and fails if that ID already exists
PUT securities-futures-2021.31/_create/1
{

}

GET securities-futures-2021.31/_doc/1

// Update a document with the POST method
POST securities-futures-2021.31/_update/1
{
 "doc":{
   
 } 
}

Using POST to create a document with an auto-generated ID

// create a document; use POST to have the _id generated automatically
POST securities-futures-2021.31/_doc
{
  "tags": ["Mike","is a person"],
  "@metadata":"2021-08-07T14:12:12",
  "message": "test kibana message"
}

The result:

{
  "_index" : "securities-futures-2021.31",
  "_type" : "_doc",
  "_id" : "rRZBIHsBY4N9DGxpewtO",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 39501,
  "_primary_term" : 1
}

Fetch the documents to take a look:

GET securities-futures-2021.31/_doc/rRZBIHsBY4N9DGxpewtO

GET securities-futures-2021.31/_doc/k7FDIHsBzlSCb0pCVpJs

PUT a document with an explicit ID

// Create a document with PUT, giving it ID 1; if the ID already exists, this returns an error:

PUT securities-futures-2021.31/_doc/1?op_type=create
{
  "tags": ["Jason","is a person"],
  "@metadata":"2021-08-07T14:12:12",
  "message": "test kibana message"
  
}

// The above uses op_type=create, so an existing ID causes an error. Without it, the default is an index operation: each call deletes the old document, writes the new one, and increments the version, even when the content is identical.

PUT securities-futures-2021.31/_doc/1
{
    "tags": ["Jason","is a person"]
}

GET securities-futures-2021.31/_doc/1

Update operation: add two fields to the document

POST securities-futures-2021.31/_update/1
{
  "doc":{
    "@metadata": "2021-08-07T14:13:13",
    "message": "test kibana message2"
  }
}

The Bulk API

# Execute several operations, even across different indices, in a single REST call

POST _bulk
{"index":{"_index":"test","_id":1}}
{"field1":"value1"}
{"delete":{"_index":"test","_id":"2"}}
{"create":{"_index":"test2","_id":"3"}}
{"field1":"value3"}
{"update":{"_id":"1","_index":"test"}}
{"doc":{"field2":"value2"}}

GET test/_doc/1
GET test2/_doc/3

mget: batch reads

GET _mget
{
  "docs":[
    {
      "_index": "user",
      "_id":1
    },
    {
      "_index": "comment",
      "_id":1
    },
    {
      "_index": "test",
      "_id": 1
    }
    ]
}

msearch: multi search

Batch queries:

GET securities-futures-2021.08.05
POST securities-futures-2021.31/_msearch
{}
{"query":{"match_all":{}},"size":1}
{"index":"securities-futures-2021.08.05"}
{"query":{"match_all":{}},"size":2}

POST /_msearch
{"index":"securities-futures-2021.31"}
{"query":{"match_all":{}},"size":1}
{"index":"securities-futures-2021.08.05"}
{"query":{"match_all":{}},"size":2}

Section 3.15 — Inverted index

An inverted index is keyed by word and records which documents each word appears in.
By analogy with a book: the table of contents is a forward index, and the index at the back of the book is an implementation of an inverted index.
A forward index maps a document ID to the words in that document;
an inverted index maps a word to the IDs of the documents containing it.
This is why documents must be analyzed into terms when they are written.

An inverted index entry contains (see the _termvectors sketch below):
1. Document ID
2. Term frequency (TF): how many times the term occurs
3. Position
4. Offset
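
A quick way to see these items in practice is the _termvectors API, which returns term frequency, positions, and offsets for one document's fields. A sketch (it assumes the "test" index and document 1 created in the _bulk example later in these notes):

// Inspect the inverted-index entries for doc 1's field1
GET test/_termvectors/1?fields=field1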

Every field of an Elasticsearch JSON document has its own inverted index. Specific fields can be excluded from indexing to save space, but such fields then cannot be searched.

Section 3.16 — Analysis and analyzers

Analysis is the process of splitting documents into terms, also called tokenization. It is performed by an analyzer; Elasticsearch ships with several built-in analyzers, and custom ones can be defined as needed.
Besides transforming terms at write time, the same analyzer must be applied to the query string when matching a query.

An Analyzer consists of 3 parts:
1. Character Filters:
preprocess the raw text, e.g. strip HTML tags.
2. Tokenizer:
split the input string by some rule, e.g. on whitespace.
3. Token Filters:
post-process the resulting terms, e.g. lowercase them, drop stop words, add synonyms.

Built-in analyzers:

1. standard analyzer: the default; splits on word boundaries and lowercases.
2. simple analyzer: splits on non-letter characters (symbols are dropped) and lowercases.
3. stop analyzer: lowercases and removes stop words (the, a, is, ...); the Tokenizer is lowercase and the Token Filter is stop.
4. whitespace analyzer: splits on whitespace; does not lowercase.
5. keyword analyzer: no tokenization; the input becomes the output. Choose this when a field should not be analyzed.
6. pattern analyzer: splits by regular expression, \W+ by default (non-word characters); the Tokenizer is pattern and the Token Filters are lowercase + stop.
7. language analyzers: analyzers for 30+ common languages, handling things like reducing -ing forms to the base word and plurals to singular.
8. custom analyzer: an analyzer you define yourself.

The _analyze API

Test by specifying an analyzer directly:

GET /_analyze
{
	"analyzer":"standard",
	"text": "Mastering Elasticsearch,elasticsearch in Action"
}
GET /_analyze
{
	"analyzer":"simple",
	"text": "Mastering Elasticsearch,elasticsearch in Action"
}
GET /_analyze
{
	"analyzer":"stop",
	"text": "Mastering Elasticsearch,elasticsearch in Action"
}
GET /_analyze
{
	"analyzer":"whitespace",
	"text": "Mastering Elasticsearch,elasticsearch in Action"
}

GET /_analyze
{
	"analyzer":"keyword",
	"text": "Mastering Elasticsearch,elasticsearch in Action"
}
GET /_analyze
{
	"analyzer":"pattern",
	"text": "Mastering Elasticsearch,elasticsearch in Action"
}

GET /_analyze
{
	"analyzer":"english",
	"text": "Mastering Elasticsearch,elasticsearch in Action"
}

Test against a specific field:

POST books/_analyze
{
	"field": "title",
	"text":"Mastering Elasticsearch"
}

# Test a custom combination of tokenizer and token filters
POST /_analyze
{
	"tokenizer": "standard",
	"filter": ["lowercase"],
	"text": "Mastering Elasticsearch"
}

Chinese analysis

Chinese must be segmented according to context, which requires installing a plugin:
elasticsearch-plugin install analysis-icu
It provides Unicode support and better handling of Asian languages.

GET /_analyze
{
	"analyzer":"icu_analyzer",
	"text": "Mastering Elasticsearch,elasticsearch in Action"
}
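
The difference only shows with Chinese input. A sketch, assuming the analysis-icu plugin is installed (the sentence is a common segmentation test case):

GET /_analyze
{
	"analyzer": "icu_analyzer",
	"text": "他说的确实在理"
}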

There are stronger Chinese analyzers as well: IK, and THULAC (from Tsinghua University's natural language processing lab).

Section 3.14 — Search API overview

The APIs fall into two groups:
1. URI Search: query with GET parameters in the URI.
2. Request Body Search: query with a JSON body.
2.1 /_search
2.2 /index1/_search
2.3 /index1,index2/_search
2.4 /index*/_search

curl -XGET "http://localhost:9200/securities-futures-2021.31/_search?q=message:testmessage"
In the URL above, _search marks the request as a query,
and q carries the query condition.

curl -XGET "http://localhost:9200/securities-futures-2021.31/_search" -H 'Content-Type: application/json' -d'
{
	"query":{
		"match_all":{}
	}
}'

Section 3.15 — URI Search

GET /securities-futures-2021.31/_search?q=2012&df=title&sort=year:desc&from=0&size=10&timeout=1s
{
	"profile": true
}

Field-specific queries vs. generic queries

q=title:2012 (field-specific)    q=2012 (generic)

Field-specific query: the following searches the host.name field:

GET securities-futures-2021.31/_search?q=host.name:ELK-NODE-1&from=0&size=1
{
  "profile": true
}
The result:
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 0.0029382796,
    "hits" : [
      {
        "_index" : "securities-futures-2021.31",
        "_type" : "_doc",
        "_id" : "9aymFXsB-X2346SZyKCM",
        "_score" : 0.0029382796,
        "_source" : {
          "log" : {
            "file" : {
              "path" : "/var/log/messages"
            },
            "offset" : 2809904605
          },
          "ecs" : {
            "version" : "1.8.0"
          },
          "agent" : {
            "hostname" : "ELK-NODE-1",
            "version" : "7.13.4",
            "ephemeral_id" : "dcfa9937-6316-4684-a2dd-e9437cbe5c81",
            "id" : "3fa3c407-7491-4888-b338-d52ed73ddab3",
            "type" : "filebeat",
            "name" : "ELK-NODE-1"
          },
          "message" : "Aug  5 17:30:49 elk-node-1 logstash: [2021-08-05T17:30:49,416][INFO ][logstash.outputs.file    ][main][f88fdfac4ab33fa2abbfe198d5bf459f178520a5935bb820ace171000ecb342f] Closing file /tmp/futures/ELK-NODE-1-192.168.9.45-2021.08.05",
          "@timestamp" : "2021-08-05T09:30:55.909Z",
          "input" : {
            "type" : "log"
          },
          "tags" : [
            "标签",
            "可以随便写",
            "可以写中文",
            "kafka-tags",
            "from-filebeat-tag"
          ],
          "host" : {
            "name" : "ELK-NODE-1"
          },
          "fields" : {
            "applicationname" : "frontend",
            "address" : "dlfjdjfla kjfdlja flj asdljf ljd",
            "hostname" : "ELK-NODE-1",
            "systemname" : "futures",
            "ipaddress" : "192.168.9.45"
          },
          "@version" : "1"
        }
      }
    ]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[kicYTiloQ_ed57aZCXfjnA][securities-futures-2021.31][0]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "TermQuery",
                "description" : "host.name:ELK-NODE-1",
                "time_in_nanos" : 3119640,
                "breakdown" : {
                  "set_min_competitive_score_count" : 10,
                  "match_count" : 0,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 10988,
                  "next_doc" : 1423795,
                  "match" : 0,
                  "next_doc_count" : 10080,
                  "score_count" : 10080,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 88104,
                  "advance_count" : 8,
                  "score" : 1082164,
                  "build_scorer_count" : 16,
                  "create_weight" : 233821,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 280768
                }
              }
            ],
            "rewrite_time" : 3678,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 2026048
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

Generic query, searching across all fields:

GET securities-futures-2021.31/_search?q=ELK-NODE-1
{
  "profile": true
}
GET securities-futures-2021.31/_search?q=ELK-NODE-1&df=host.name&from=0&size=1
{
  "profile": true
}

Term vs. Phrase

Beautiful Mind is equivalent to Beautiful OR Mind: a document containing either term matches.
"Beautiful Mind" in quotes means both terms must appear, in the same order.

Grouping and quotes

title:(Beautiful AND Mind)
title="Beautiful Mind"

GET /indices/_search?q=title:Beautiful Mind
{
	"profile": true
}

As written above, only Beautiful applies to the title field; Mind falls through to a generic query on the default field.

GET /indices/_search?q=title:(Beautiful Mind)
{
	"profile": true
}

As above, the parentheses group both terms against the title field (Beautiful OR Mind by default). To require both terms together and in order, use the quoted phrase form: q=title:"Beautiful Mind".

Boolean operators

GET /indices/_search?q=title:(Beautiful AND Mind)
{
	"profile":true
}
GET /indices/_search?q=title:(Beautiful NOT Mind)
{
	"profile":true
}
GET /indices/_search?q=title:(Beautiful %2BMind)
{
	"profile":true
}

%2B is the URL-encoded + sign, which means must.

Range queries

GET /indices/_search?q=year:>=2018
{
	"profile": "true"
}

Comparison operators

Wildcard queries (inefficient and memory-hungry; not recommended, especially with a leading wildcard)

Regular-expression queries
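
A few URI Search sketches for the three headings above (year and title are placeholder fields, as elsewhere in these notes):

// Range with brackets: [] is inclusive, {} is exclusive
GET /indices/_search?q=year:[2010 TO 2018]
// Comparison operators
GET /indices/_search?q=year:(>2010 && <=2018)
// Wildcard (slow and memory-hungry, worst with a leading wildcard)
GET /indices/_search?q=title:mi*
// Regular expression, wrapped in slashes
GET /indices/_search?q=title:/[bB]eautiful/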

Fuzzy matching and proximity queries

GET /indices/_search?q=title:beautifl~1
{
	"profile":"true"
}

Fuzzy matching: beautifl above is deliberately misspelled (the u is missing), and ~1 finds terms within edit distance 1 of it.
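
The proximity counterpart is a quoted phrase followed by ~N, which lets the terms sit up to N positions apart. A sketch with an illustrative title value:

GET /indices/_search?q=title:"Lord Rings"~2
{
	"profile": "true"
}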

Request Body queries

URI Search syntax is limited; some advanced queries can only be expressed with the Request Body method.

A simple example


POST /indices,404_idx/_search?ignore_unavailable=true
{
	"profile": "true",
	"query":{
			"match_all":{}
	}
}

POST /indices,404_idx/_search?ignore_unavailable=true
{
	"profile": "true",
	"sort": [{"order_date":"desc"}],
	"_source": ["order_date","category.keyword"],
	"from": 10,
	"size": 5,
	"query":{
			"match_all":{}
	}
}

sort names the field to sort by;
_source names the fields to return.

Sorting is best done on numeric and date fields.
If "_source" is not specified, all fields are returned; set it to restrict the response to specific fields.

Script fields

"painless" concatenation

GET /index/_search
{
	"script_fields":{
		"new_field":{
			"script":{
				"lang":"painless",
				"source":"doc['order_date'].value+'_hello'"
			}
		}
	},
	"from":10,
	"size":5,
	"query":{
		"match_all":{}
	}
}
GET securities-futures-2021.31/_search
{
  "script_fields": {
    "test_field": {
      "script": {
        "lang": "painless",
        "source": "doc['agent.hostname'].value+doc['agent.id'].value"
      }
    }
  },
  "from":10,
  "size":5,
  "query":{
    "match_all":{}
  }
}

painless is Elasticsearch's scripting language for computing new field values.
Example use case: orders carry different exchange rates, and prices must be sorted with the exchange rate applied (see the script-sort sketch below).
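
A sketch of that use case (the orders index, price field, and rate value are illustrative, not from these notes): sorting by a value computed in painless.

GET orders/_search
{
  "query": { "match_all": {} },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": "doc['price'].value * params.exchange_rate",
        "params": { "exchange_rate": 6.5 }
      },
      "order": "desc"
    }
  }
}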

match queries

GET /securities-futures-2021.31/_search
{
  "query":{
    "match": {
      "fields.hostname": "ELK-NODE-1"
    }
  },
  "size":10,
  "from":5
}

Use match to match against the value of the field you are querying.

GET /securities-futures-2021.31/_search
{
  "query":{
    "match": {
      "fields.hostname": "ELK-NODE-1 ELK-NODE-2"
    }
  },
  "size":1,
  "from":5
}

With two terms as above, match defaults to an OR, so documents containing either term match. To require both terms, specify the operator:

GET /securities-futures-2021.31/_search
{
  "query":{
    "match": {
      "fields.hostname": {
      	"query":"ELK-NODE-1 ELK-NODE-2"
      	"operator":"AND"
      }
    }
  },
  "size":1,
  "from":5
}

Specifying the AND operator makes both terms required.

match_phrase

GET /securities-futures-2021.31/_search
{
  "query":{
    "match_phrase": {
      "agent.hostname": {
        "query": "INFO monitoring",
        "slop": 1
      }
    }
  }
}

A slop of 1 allows one extra token between the terms of the phrase.

Query String and Simple Query String

Query String

GET securities-futures-2021.31/_search
{
  "query":{
    "query_string": {
      "default_field": "message",
      "query": "Non-zero AND metrics"
    }
  }
}

Simple Query String (unlike query_string it treats AND/OR/NOT as plain terms and relies on default_operator instead):

GET securities-futures-2021.31/_search
{
  "query":{
    "simple_query_string": {
      "query": "Non-zero AND metrics",
      "fields": ["message"],
      "default_operator": "AND"
    }
  }
}

Mapping and Dynamic Mapping

Mapping:

Roughly the equivalent of defining a table's schema in a database.

Field types

Dynamic Mapping

Can the type of a mapped field be changed?

Section 3.19 — Explicit mapping settings and common parameters

Creating a mapping

PUT indices
{
	"mappings": {
	# define your mapping here.
	}
}

Some advice on custom mappings

After choosing a field's type, if you are sure the field does not need to be indexed, set "index": false; it then cannot be searched (see the sketch below).
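
A minimal sketch (the user_profiles index and mobile field are illustrative): with this mapping, indexing works normally, but any query against mobile returns an error.

PUT user_profiles
{
	"mappings": {
		"properties": {
			"mobile": {
				"type": "text",
				"index": false
			}
		}
	}
}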

index_options:

index_options controls how much detail is stored in the inverted index, from docs (document IDs only) through freqs and positions up to offsets:

"bio":{
	"type":"text",
	"index_options":"offsets"
}

null_value

To make null values searchable, the field must be of type keyword; only keyword fields support null_value.

"mobile": {
	"type": "keyword",
	"null_value": "NULL"
}
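
Usage sketch, assuming a users index whose mapping contains the mobile definition above: a document written with a null mobile can then be found by searching for the placeholder value.

PUT users/_doc/1
{
	"mobile": null
}

GET users/_search?q=mobile:NULL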

copy_to

An example:

PUT users
{
	"mappings": {
		"properties": {
			"firstname": {
				"type": "text",
				"copy_to": "fullname"
			},
			"lastname": {
				"type": "text",
				"copy_to": "fullname"
			}
		}
	}
}

As above, the values of firstname and lastname are both copied into fullname. fullname does not appear in _source, but it is indexed and searchable (see the sketch below).
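
Usage sketch (the name values are illustrative):

PUT users/_doc/1
{
	"firstname": "Ruan",
	"lastname": "Yiming"
}

GET users/_search?q=fullname:(Ruan Yiming)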

Array types

Elasticsearch has no dedicated array type: any field can hold multiple values of the same type.

PUT users/_doc/1
{
	"name":"onebird",
	"interests":"reading"
}
PUT users/_doc/2
{
	"name":"twobirds",
	"interests": ["reading","music"]
}

Section 3.20 — Multi-fields, and configuring a custom analyzer in the mapping

PUT products
{
	"mapping":{
		"properties":{
			"company":{
				"type":"text",
				"fields":{
					"keyword":{
						"type":"keyword",
						"ignore_above": 256
					}
				}
			},
			"comment":{
				"type":"text",
				"fields":{
					"english_comment":{
						"type":"text",
						"analyzer":"english",
						"search_analyzer":"english"
					}
				}
			}
		}
	}
}

Exact Values vs. Full Text

Custom analyzers

When Elasticsearch's built-in analyzers cannot meet a requirement, you can define a custom analyzer by combining the components yourself.
Some built-in Character Filters:

POST _analyze
{
	"tokenizer":"standard",
	"char_filter":[
		{
		"type":"mapping",
		"mappings":[":) => happy",":( => sad"]
		}
	],
	"text":["i am felling :)" , "Feeling :( today"]
}
The result:

{
  "tokens" : [
    {
      "token" : "i",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "am",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "felling",
      "start_offset" : 5,
      "end_offset" : 12,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "happy",
      "start_offset" : 13,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "Feeling",
      "start_offset" : 16,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 104
    },
    {
      "token" : "sad",
      "start_offset" : 24,
      "end_offset" : 26,
      "type" : "<ALPHANUM>",
      "position" : 105
    },
    {
      "token" : "today",
      "start_offset" : 27,
      "end_offset" : 32,
      "type" : "<ALPHANUM>",
      "position" : 106
    }
  ]
}

Regular expressions

GET _analyze
{
  "tokenizer": "standard",
  "char_filter": [
    {
    "type":"pattern_replace",
    "pattern":"http://(.*)",
    "replacement": "$1"
  }
  ],
  "text":"http://www.baidu.com"
}

The result:

{
  "tokens" : [
    {
      "token" : "www.baidu.com",
      "start_offset" : 0,
      "end_offset" : 20,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}

As above, the filter strips the http:// prefix and leaves just the bare address.

tokenizer

POST _analyze
{
  "tokenizer": "path_hierarchy",
  "text":"/data/elasticsearch/log/server/err.log"
}

The result:

{
  "tokens" : [
    {
      "token" : "/data",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/data/elasticsearch",
      "start_offset" : 0,
      "end_offset" : 19,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/data/elasticsearch/log",
      "start_offset" : 0,
      "end_offset" : 23,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/data/elasticsearch/log/server",
      "start_offset" : 0,
      "end_offset" : 30,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "/data/elasticsearch/log/server/err.log",
      "start_offset" : 0,
      "end_offset" : 38,
      "type" : "word",
      "position" : 0
    }
  ]
}

Each directory level is emitted as its own token.

Defining an analyzer on an index

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["emotions"],
          "tokenizer": "punctuation",
          "filter": [
            "lowercase",
            "english_stop"
          ]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "[ .,!?]"
        }
      },
      "char_filter": {
        "emotions": {
          "type": "mapping",
          "mappings": [":) => happy", ":( => sad"]
        }
      },
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        }
      }
    }
  }
}
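
A test of the analyzer just defined (the text is illustrative): the char_filter turns :) into happy, the pattern tokenizer splits on spaces and punctuation, and english_stop drops words like "a" and "and".

POST my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "I'm a :) person, and you?"
}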

Section 3.21 — Index Template and Dynamic Template

Section 3.22 — Introduction to Elasticsearch aggregations

Types of aggregations:

Bucket and Metric

A Bucket is the analogue of SQL's GROUP BY: a set of documents satisfying some condition.
A Metric is a statistical computation over documents (a combined sketch follows below).
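
A combined sketch (the index and field names come from Kibana's flights sample data, not from these notes): a terms bucket per destination country, with an average-price metric inside each bucket.

GET kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "flight_dest": {
      "terms": { "field": "DestCountry" },
      "aggs": {
        "avg_price": { "avg": { "field": "AvgTicketPrice" } }
      }
    }
  }
}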

Metrics include min, max, avg, sum, cardinality, and so on.

Source: https://blog.csdn.net/Huangfei10086/article/details/119507312