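ELK: How to Use the Logstash filter Plugins

Author: Internet

Kibana ships with a built-in grok debugging tool (Grok Debugger)

When processing logs, the approach is: first work out what format the log is in, then decide which filter plugin, or combination of plugins, its rules call for.

Official documentation

https://www.elastic.co/guide/en/logstash/7.12/filter-plugins.html

Online grok pattern tester

https://grokdebug.herokuapp.com/

Location of the grok pattern file in Logstash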

[root@es-web1 ~]# ll /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns

-rw-r--r-- 1 root root 5514 Apr 21 03:50 /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.3.1/patterns/ecs-v1/grok-patterns

Example: a non-JSON log

192.168.7.10 - - [24/May/2021:15:50:47 +0800] "GET /shijiange HTTP/1.1" 404 571 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36"

Extract the fields with a grok pattern

%{IP:clientip} - - \[(?<requesttime>[^ ]+ \+\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"
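
A minimal sketch of how this pattern might be wired into a Logstash pipeline, assuming the raw access-log line arrives in the `message` field. The pattern is wrapped in single quotes so that the double quotes inside it need no escaping, and the dot in the HTTP version is escaped so that it matches only a literal dot.

```
filter {
  grok {
    # Assumption: the raw nginx access-log line is stored in the "message" field.
    match => {
      "message" => '%{IP:clientip} - - \[(?<requesttime>[^ ]+ \+\d+)\] "(?<requesttype>\w+) (?<requesturl>[^ ]+) HTTP/\d\.\d" (?<status>\d+) (?<size>\d+) "[^"]+" "(?<ua>[^"]+)"'
    }
  }
}
```

Events that do not match are tagged `_grokparsefailure` by default, which is convenient to filter on while tuning the pattern.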

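Result: the pattern captures the fields clientip, requesttime, requesttype, requesturl, status, size and ua from the sample line.

Common grok patterns, with descriptions and examples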

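Most Linux users have used regular expressions to search for files, or for content inside files. Grok likewise uses regular expressions to pick out the relevant pieces of data in a log line.
  There are two ways to use them:

  Write the regular expression directly
  Use a grok expression that maps to a regular expression
  In my view, rewriting a regex from scratch every time is painful, so why not define an expression once and reuse it?
  Note: grok expressions work much like macro definitions in C.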
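
To make the two approaches concrete, the sketch below parses a hypothetical line such as `55.3.244.1 GET /index.html 15824 0.043`. The sample line and the field names are illustrative and do not come from the original article. Option A is a hand-written regex with named captures; Option B produces the same fields with built-in grok expressions.

```
filter {
  grok {
    # Option A: hand-written regex with named captures (left commented out).
    # match => { "message" => "(?<client>\d{1,3}(?:\.\d{1,3}){3}) (?<method>[A-Z]+) (?<request>\S+) (?<bytes>\d+) (?<duration>[\d.]+)" }

    # Option B: the same fields via built-in grok expressions.
    match => { "message" => "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" }
  }
}
```

Enable only one of the two `match` lines; each `%{NAME:field}` reference expands to the regex defined for NAME and stores the match in `field`.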
  To study grok's default expressions, first locate the file that defines them:
# On Windows: [your Logstash install path]\vendor\bundle\jruby\x.x\gems\logstash-patterns-core-x.x.x\patterns\grok-patterns
  The commonly used expressions are described below.

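Common patterns

  USERNAME or USER
  User name: a string of digits, upper- and lower-case letters, and the special characters . _ -
  Examples: 1234, Bob, Alex.Wong

  EMAILLOCALPART
  Local part of an email address: the first character is a letter; the remaining characters may be digits, letters, and the special characters _ . + - = :. Note that purely numeric accounts, such as QQ mail addresses, do not match; the regex has to be modified for those.
  Examples: stone, Gary_Lu, abc-123

  EMAILADDRESS
  Email address
  Examples: stone@abc.com, Gary_Lu@gmail.com, abc-123@163.com

  HTTPDUSER
  Apache server user; either an EMAILADDRESS or a USERNAME

  INT
  Integer: zero, positive, or negative
  Examples: 0, -123, 43987

  BASE10NUM or NUMBER
  Decimal number, integer or fractional
  Examples: 0, 18, 5.23

  BASE16NUM
  Hexadecimal integer
  Examples: 0x0045fa2d, -0x3F8709

  BASE16FLOAT
  Hexadecimal number, integer or fractional

  WORD
  A word made of letters and digits (\w also matches the underscore)
  Examples: String, 3529345, ILoveYou

  NOTSPACE
  A string that contains no whitespace

  SPACE
  A run of whitespace characters

  QUOTEDSTRING or QS
  A quoted string
  Examples: "This is an apple", 'What is your name?'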

  UUID
  A standard UUID
  Example: 550E8400-E29B-11D4-A716-446655440000

  MAC
  MAC address, in Cisco notation, Windows notation, or the common colon-separated notation

  IP
  IP address, IPv4 or IPv6
  Examples: 127.0.0.1, FE80:0000:0000:0000:AAAA:0000:00C2:0002

  HOSTNAME
  Host name

  IPORHOST
  IP address or host name

  HOSTPORT
  Host name (or IP address) plus port
  Examples: 127.0.0.1:3306, api.stozen.net:8000

  PATH
  A path in Unix or Windows format
  Examples: /usr/local/nginx/sbin/nginx, c:\windows\system32\clr.exe

  URIPROTO
  URI scheme
  Examples: http, ftp

  URIHOST
  URI host, optionally with a port
  Examples: www.stozen.net, 10.0.0.1:22

  URIPATH
  URI path
  Examples: //www.stozen.net/abc/, /api.php

  URIPARAM
  GET parameters in a URI
  Example: ?a=1&b=2&c=3

  URIPATHPARAM
  URI path plus GET parameters
  Example: //www.stozen.net/abc/api.php?a=1&b=2&c=3

  URI
  A complete URI
  Example: http://www.stozen.net/abc/api.php?a=1&b=2&c=3

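Date and time patterns

  MONTH
  Month name
  Examples: Jan, January

  MONTHNUM
  Month number
  Examples: 03, 9, 12

  MONTHDAY
  Day of the month
  Examples: 03, 9, 31

  DAY
  Day-of-week name
  Examples: Mon, Monday

  YEAR
  Year number

  HOUR
  Hour number

  MINUTE
  Minute number

  SECOND
  Second number

  TIME
  Time of day
  Example: 00:01:23

  DATE_US
  US date format (month first)
  Examples: 10-15-1982, 10/15/1982

  DATE_EU
  European date format (day first)
  Examples: 15-10-1982, 15/10/1982, 15.10.1982

  ISO8601_TIMEZONE
  ISO 8601 time zone offset
  Examples: +10:23, -1023

  TIMESTAMP_ISO8601
  ISO 8601 timestamp
  Example: 2016-07-03T00:34:06+08:00

  DATE
  Date, either a US date %{DATE_US} or a European date %{DATE_EU}

  DATESTAMP
  Full date plus time
  Example: 07-03-2016 00:34:06

  HTTPDATE
  Default HTTP log date format
  Example: 03/Jul/2016:00:36:53 +0800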
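
A timestamp captured with a pattern such as HTTPDATE is still a plain string. A common follow-up, sketched here under the assumption that grok stores it in a field named `timestamp` (a name chosen for illustration), is to pass it to the date filter so that it becomes the event's `@timestamp`.

```
filter {
  grok {
    # Hypothetical pattern: capture only the bracketed timestamp of an access-log line.
    match => { "message" => "\[%{HTTPDATE:timestamp}\]" }
  }
  date {
    # Format string for values such as 03/Jul/2016:00:36:53 +0800
    match  => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
```

If the string cannot be parsed with the given format, the date filter tags the event with `_dateparsefailure` and leaves `@timestamp` unchanged.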

Log patterns

  LOGLEVEL
  Log level
  Examples: Alert, alert, ALERT, Error

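Creating your own grok expressions
  As a business grows, more and more log formats show up, and grok's default expressions cannot cover all of them (for example national ID numbers or mobile phone numbers), so we need to add expressions of our own.

  Expression     Regular expression                        Description
  DATE_CHS       %{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}   Date format commonly used in China (year first)
  ZIPCODE_CHS    [1-9]\d{5}                                Chinese postal code
  GAME_ACCOUNT   [a-zA-Z][a-zA-Z0-9_]{4,15}                Game account: a leading letter followed by 4 to 15 letters, digits, or underscores

  There are many more; adapt them to your own business needs.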
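
One way to use such custom expressions without editing the bundled pattern file is the grok filter's `pattern_definitions` option. The sketch below is an assumption about how the three expressions above could be applied; the sample input format (`2021/08/28 100080 player_01`) and the target field names are hypothetical.

```
filter {
  grok {
    # Custom expressions, visible only to this grok filter.
    pattern_definitions => {
      "DATE_CHS"     => "%{YEAR}[./-]%{MONTHNUM}[./-]%{MONTHDAY}"
      "ZIPCODE_CHS"  => "[1-9]\d{5}"
      "GAME_ACCOUNT" => "[a-zA-Z][a-zA-Z0-9_]{4,15}"
    }
    # Hypothetical line layout: "<date> <zip code> <account>"
    match => { "message" => "%{DATE_CHS:reg_date} %{ZIPCODE_CHS:zipcode} %{GAME_ACCOUNT:account}" }
  }
}
```

For expressions shared by several pipelines, the `patterns_dir` option can point at a directory of pattern files written in the same `NAME regex` format as the built-in file.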

Built-in grok pattern definitions (from the official pattern file)

USERNAME [a-zA-Z0-9_-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b

POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
#QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"])*"|(?:'(?:\\.|[^\\'])*')|(?:`(?:\\.|[^\\`])*`)))
QUOTEDSTRING (?:(?<!\\)(?:"(?:\\.|[^\\"]+)*"|(?:'(?:\\.|[^\\']+)*')|(?:`(?:\\.|[^\\`]+)*`)))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}

# Networking
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
IP (?<![0-9])(?:(?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2})[.](?:25[0-5]|2[0-4][0-9]|[0-1]?[0-9]{1,2}))(?![0-9])
HOSTNAME \b(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(\.?|\b)
HOST %{HOSTNAME}
IPORHOST (?:%{HOSTNAME}|%{IP})
HOSTPORT (?:%{IPORHOST=~/\./}:%{POSINT})

# paths
PATH (?:%{UNIXPATH}|%{WINPATH})
UNIXPATH (?:/(?:[\w_%!$@:.,-]+|\\.)*)+
LINUXTTY (?:/dev/pts/%{NONNEGINT})
BSDTTY (?:/dev/tty[pq][a-z0-9])
TTY (?:%{BSDTTY}|%{LINUXTTY})
WINPATH (?:[A-Za-z]+:|\\)(?:\\[^\\?*]*)+
URIPROTO [A-Za-z]+(\+[A-Za-z+]+)?
URIHOST %{IPORHOST}(?::%{POSINT:port})?
# uripath comes loosely from RFC1738, but mostly from what Firefox
# doesn't turn into %XX
URIPATH (?:/[A-Za-z0-9$.+!*'(){},~:;=#%_-]*)+
#URIPARAM \?(?:[A-Za-z0-9]+(?:=(?:[^&]*))?(?:&(?:[A-Za-z0-9]+(?:=(?:[^&]*))?)?)*)?
URIPARAM \?[A-Za-z0-9$.+!*'|(){},~#%&/=:;_-]*
URIPATHPARAM %{URIPATH}(?:%{URIPARAM})?
URI %{URIPROTO}://(?:%{USER}(?::[^@]*)?@)?(?:%{URIHOST})?(?:%{URIPATHPARAM})?

# Months: January, Feb, 3, 03, 12, December
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])

# Days: Monday, Tue, Thu, etc...
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)

# Years?
YEAR [0-9]+
# Time: HH:MM:SS
#TIME \d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?
# I'm still on the fence about using grok to perform the time match,
# since it's probably slower.
# TIME %{POSINT<24}:%{POSINT<60}(?::%{POSINT<60}(?:\.%{POSINT})?)?
HOUR (?:2[0123]|[01][0-9])
MINUTE (?:[0-5][0-9])
# '60' is a leap second in most time standards and thus is valid.
SECOND (?:(?:[0-5][0-9]|60)(?:[.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
# datestamp is YYYY/MM/DD-HH:MM:SS.UUUU (or something like it)
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}

# Syslog Dates: Month Day HH:MM:SS
SYSLOGTIMESTAMP %{MONTH} +%{MONTHDAY} %{TIME}
PROG (?:[\w._/%-]+)
SYSLOGPROG %{PROG:program}(?:\[%{POSINT:pid}\])?
SYSLOGHOST %{IPORHOST}
SYSLOGFACILITY <%{POSINT:facility}.%{POSINT:priority}>
HTTPDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{INT:ZONE}

# Shortcuts
QS %{QUOTEDSTRING}

# Log formats
SYSLOGBASE %{SYSLOGTIMESTAMP:timestamp} (?:%{SYSLOGFACILITY} )?%{SYSLOGHOST:logsource} %{SYSLOGPROG}:
COMBINEDAPACHELOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{URIPATHPARAM:request}(?: HTTP/%{NUMBER:httpversion})?|-)" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" %{QS:agent}

# Log Levels
LOGLEVEL ([D|d]ebug|DEBUG|[N|n]otice|NOTICE|[I|i]nfo|INFO|[W|w]arn?(?:ing)?|WARN?(?:ING)?|[E|e]rr?(?:or)?|ERR?(?:OR)?|[C|c]rit?(?:ical)?|CRIT?(?:ICAL)?|[F|f]atal|FATAL)

Example: a JSON log

{"@timestamp":"2021-08-28T21:17:31+08:00","host":"172.31.2.107","clientip":"172.31.0.1","size":0,"responsetime":0.000,"upstreamtime":"-","upstreamhost":"-","http_host":"172.31.2.107","url":"/web/index.html","domain":"172.31.2.107","xff":"-","referer":"-","status":"304"}

Process it with the json filter

input {
  redis {
    data_type => "list"
    key => "qq-m44-nginx-log"
    host => "172.31.2.106"
    port => "6379"
    db => "3"
    password => "123456"
    codec => json
  }
}

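# Filter
filter {
  json {
    source => "message"
    remove_field => ["message","@version","path","beat","input","log","offset","prospector","source","tags"]
  }
  # Only takes effect on events that carry a "timestamp" field in nginx's default
  # time format; the JSON sample above already supplies @timestamp directly.
  date {
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}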

output {
  if [fields][app] == "nginx-errorlog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-errorlog-%{+YYYY.MM.dd}"
    }
  }

  if [fields][app] == "nginx-accesslog" {
    elasticsearch {
      hosts => ["172.31.2.101:9200"]
      index => "qq-123test-filebeat-nginx-accesslog-%{+YYYY.MM.dd}"
    }
  }
}
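
The config above sends events only to Elasticsearch, so terminal output presumably came from a temporary stdout output that the article does not show. A minimal sketch, assuming it is added alongside the elasticsearch outputs while debugging:

```
output {
  # Assumption: enabled only while debugging; not part of the original config.
  stdout {
    codec => rubydebug
  }
}
```

The rubydebug codec prints each event as a pretty-printed hash, which makes it easy to check which fields the filters produced.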

Access nginx; the terminal output looks like this:

{
           "agent" => {
                "name" => "es-web1.example.local",
                "type" => "filebeat",
        "ephemeral_id" => "2a8806fd-48de-46e0-bdde-502aa74b4c83",
             "version" => "7.12.1",
            "hostname" => "es-web1.example.local",
                  "id" => "51f9df27-4170-4844-ba12-c719de1f4410"
    },
          "domain" => "172.31.2.107",
          "status" => "304",
    "upstreamtime" => "-",
            "size" => 0,
             "xff" => "-",
             "ecs" => {
        "version" => "1.8.0"
    },
      "@timestamp" => 2021-08-29T05:31:29.000Z,
        "clientip" => "172.31.0.1",
         "referer" => "-",
    "responsetime" => 0.0,
    "upstreamhost" => "-",
       "http_host" => "172.31.2.107",
             "url" => "/web/index.html",
            "host" => "172.31.2.107",
          "fields" => {
        "group" => "n125",
          "app" => "nginx-accesslog"
    }
}

Source: https://www.cnblogs.com/xuanlv-0413/p/15374789.html