It's up and running and we've had a look around, so let's do something with it

What

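Logstash is the real-time data collection engine commonly used with Elasticsearch. It can collect data from many different sources, process it, and send it on to a variety of outputs, and it is an important part of the Elastic Stack.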

Dataset

Download Logstash

Use the same version as your Elasticsearch installation.
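If you're not sure which Elasticsearch version you're running, a quick check like the one below may help. It's only a sketch: it assumes Elasticsearch is listening on http://localhost:9200 (the address used in the config later in this post) and that Logstash is 7.2.0 (the version in the directory name used here). The helper names `es_version` and `versions_match` are made up for illustration.

```python
import json
import urllib.request


def es_version(body):
    """Pull the version string out of the JSON returned by Elasticsearch's root endpoint."""
    return json.loads(body)["version"]["number"]


def versions_match(es, logstash):
    """True when both version strings are identical, e.g. "7.2.0" and "7.2.0"."""
    return es.strip() == logstash.strip()


if __name__ == "__main__":
    logstash = "7.2.0"  # assumed: the Logstash version used in this post
    try:
        # assumed: Elasticsearch is reachable at this address
        with urllib.request.urlopen("http://localhost:9200", timeout=5) as resp:
            es = es_version(resp.read().decode("utf-8"))
        verdict = "matches" if versions_match(es, logstash) else "differs from"
        print(f"Elasticsearch {es} {verdict} Logstash {logstash}")
    except OSError as err:
        print(f"Elasticsearch not reachable: {err}")
```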

Configuration

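  • Create the configuration file (logstash.conf, as used in the run command) in the /logstash-7.2.0/bin directory

  • Change the input/file/path setting at the top of the file to the path of your dataset file

Configuration file contents: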

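input {
  file {
    # Change this to the location of your dataset file
    path => "/Users/userName/dataSource.csv"
    # Read the file from the beginning instead of only tailing new lines
    start_position => "beginning"
    # Do not persist the read position, so the file is re-read on every run
    sincedb_path => "/dev/null"
  }
}
filter {
  # Parse each line into the three named columns
  csv {
    separator => ","
    columns => ["id","content","genre"]
  }

  # Turn the pipe-separated genre string into an array
  mutate {
    split => { "genre" => "|" }
    remove_field => ["path", "host","@timestamp","message"]
  }

  # Split content on "(" and copy the two parts into title and year
  mutate {
    split => ["content", "("]
    add_field => { "title" => "%{[content][0]}"}
    add_field => { "year" => "%{[content][1]}"}
  }

  # mutate {
  #   gsub => [
  #     "year", "\\)", ""
  #   ]
  # }

  # Convert year to an integer, trim whitespace from title, drop leftover fields
  mutate {
    convert => {
      "year" => "integer"
    }
    strip => ["title"]
    remove_field => ["path", "host","@timestamp","message","content"]
  }
}
output {
  # Index each event into the movies index, using the CSV id as the document ID
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "movies"
    document_id => "%{id}"
  }
  # Also print each event to the console
  stdout {}
}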
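
To make the filter stage easier to picture, here is a rough Python sketch of what the csv and mutate filters in the config do to a single row. It is only an emulation, not how Logstash works internally. The sample row and the function name `transform_row` are made up for illustration, and the closing ")" is stripped explicitly here before the integer conversion.

```python
import csv
import io


def transform_row(line):
    """Emulate the csv + mutate filters on one CSV line (illustrative only)."""
    # csv filter: split the line into the three named columns
    row = next(csv.reader(io.StringIO(line)))
    event = dict(zip(["id", "content", "genre"], row))

    # mutate split: "genre" becomes a list, split on "|"
    event["genre"] = event["genre"].split("|")

    # mutate split + add_field: split "content" on "(" into title and year,
    # then drop "content" as the final remove_field does
    parts = event.pop("content").split("(")
    title, year = parts[0], parts[1]

    # mutate strip / convert: trim the title, turn the year into an integer
    event["title"] = title.strip()
    event["year"] = int(year.rstrip(")").strip())
    return event


# Hypothetical sample row, not taken from the actual dataset
print(transform_row("1,Toy Story (1995),Adventure|Animation|Children"))
```

Note that the second piece after splitting on "(" is always taken as the year, so a title that itself contains a parenthesis would not give a usable year, in the config as well as in this sketch.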

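Run

  • From the Logstash directory, run: sudo bin/logstash -f bin/logstash.conf
  • If the console keeps printing the parsed events in a JSON-like format, the pipeline is running successfully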
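
Besides watching the console, you can also ask Elasticsearch how many documents ended up in the index. The sketch below uses the _count endpoint. It assumes the host and index name from the config (http://localhost:9200 and movies), and the helper names are made up for illustration.

```python
import json
import urllib.request


def count_url(host="http://localhost:9200", index="movies"):
    """Build the URL of Elasticsearch's _count endpoint for an index."""
    return f"{host.rstrip('/')}/{index}/_count"


def parse_count(body):
    """Extract the document count from a _count response body."""
    return json.loads(body)["count"]


def fetch_count(host="http://localhost:9200", index="movies"):
    """Ask a running Elasticsearch how many documents the index holds."""
    with urllib.request.urlopen(count_url(host, index), timeout=5) as resp:
        return parse_count(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        print(f"documents in index: {fetch_count()}")
    except OSError as err:
        print(f"Elasticsearch not reachable: {err}")
```

Because document_id is set to the CSV id, re-running the import overwrites existing documents rather than duplicating them, so the count should stay the same across runs.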

References

Logstash Reference documentation