
Sampler是一个用于shell命令执行,可视化和告警的工具。其配置使用的是一个简单的YAML文件。 1、为什么我需要它?你可以直接从终端对任意动态进程进行采样 – 观察数据库中的更改,监控MQ动态消息(in-flight messages),触发部署脚本并在完成后获取通知。 如果有一种方法可以使用shell命令获取指标(metric),那么可以使用Sampler立即对其进行可视化。 2、安装macOSbrew cask install sampler或' N1 r% ]2 T4 i% o sudo curl -Lo /usr/local/bin/sampler https://github.com/sqshq/sampler ... -1.0.3-darwin-amd64# @6 c3 e7 l9 Zsudo chmod +x /usr/local/bin/sampler+ x- z8 Y! b7 P0 m: p& ]2 T/ U Linux - V2 M, u1 g4 Y8 y n ) h' c" Q5 U5 [; f sudo wget https://github.com/sqshq/sampler ... r-1.0.3-linux-amd64 -O /usr/local/bin/sampler sudo chmod +x /usr/local/bin/sampler Windows(实验) 建议在高级控制台模拟器下使用,如Cmder 3、使用 市面早已有许多监控系统 Sampler绝不是监控系统的替代品,而是易于设置的开发工具。 如果spinning up和使用Grafana配置Prometheus是完全多余的任务,那么Sampler可能是正确的解决方案。没有服务器,没有数据库,不需要部署 – 你指定了shell命令,它就可以工作了。 我监控的每台服务器上都需要安装吗?不,你可以在本地运行Sampler,但仍然可以从多台远程计算机上收集遥测数据。任何可视化都可能具有init命令,你可以在其中ssh到远程服务器。请参阅SSH example 4、组件以下是每种组件类型的配置示例列表,其中包含与macOS兼容的采样脚本。 Runchart![]() - title: Search engine response time0 u% b5 d7 u5 ~* {% l: ]4 D rate-ms: 500 # sampling rate, default = 1000 scale: 2 # number of digits after sample decimal point, default = 1 legend:+ Z+ |$ j. Z& ]: a* k8 Z( V: @ l: [ enabled: true # enables item labels, default = true8 A7 E3 F/ n* p6 T: N details: false # enables item statistics: cur/min/max/dlt values, default = true w( z d, D; d8 k* r items: - label: GOOGLE sample: curl -o /dev/null -s -w '%{time_total}' https://www.google.com$ u4 t x5 n7 ^1 n) ?: E+ c color: 178 # 8-bit color number, default one is chosen from a pre-defined palette - label: YAHOO sample: curl -o /dev/null -s -w '%{time_total}' https://search.yahoo.com$ E$ U" R8 S2 h; l- \6 s7 }, _ - label: BING sample: curl -o /dev/null -s -w '%{time_total}' https://www.bing.com- t |6 g0 w( J Sparkline ![]() - title: CPU usage rate-ms: 200 scale: 0 sample: ps -A -o %cpu | awk '{s+=$1} END {print s}'2 d" H. o( a4 m4 [% ^& Z% N5 | - title: Free memory pages rate-ms: 200 scale: 0" p( w/ F3 Y5 e" I0 |* n4 I sample: memory_pressure | grep 'Pages free' | awk '{print $3}' 8 m! X3 p4 p+ [ Barchart ![]() - title: Local network activity* \3 L7 X! S- h5 y7 _7 d' {1 w rate-ms: 500 # sampling rate, default = 1000 [: X: A4 M8 |/ s; u9 o/ D scale: 0 # number of digits after sample decimal point, default = 1 items: - label: UDP bytes in sample: nettop -J bytes_in -l 1 -m udp | awk '{sum += $4} END {print sum}' - label: UDP bytes out sample: nettop -J bytes_out -l 1 -m udp | awk '{sum += $4} END {print sum}' - label: TCP bytes in: ], @5 M% D# L: B: E sample: nettop -J bytes_in -l 1 -m tcp | awk '{sum += $4} END {print sum}'6 @9 F4 O- B. s - label: TCP bytes out5 v3 b# e) V# Q, o, \7 P sample: nettop -J bytes_out -l 1 -m tcp | awk '{sum += $4} END {print sum}' 6 M9 Q V3 k5 q4 l% G8 O& k8 M+ O Gauge ![]() - title: Minute progress7 B$ ~( Y. m K" r* M3 b) S( Q rate-ms: 500 # sampling rate, default = 1000 scale: 2 # number of digits after sample decimal point, default = 1 percent-only: false # toggle display of the current value, default = false color: 178 # 8-bit color number, default one is chosen from a pre-defined palette1 m, @7 c3 u* _( G" x5 y cur: sample: date +%S # sample script for current value max: sample: echo 60 # sample script for max value min: sample: echo 0 # sample script for min value, H( [) u4 a. x( i$ f, l - title: Year progress cur:3 T7 y& p. r+ n) ?$ W& z% o3 O! [0 Q$ { sample: date +%j max:5 ?: v5 @- r, ]0 I* ] sample: echo 3659 {, t* |+ U; J; Z! v0 D min: sample: echo 0Textbox ![]() - title: Local weather# T8 k1 U5 d2 i) U- Z0 p rate-ms: 10000 # sampling rate, default = 1000 sample: curl wttr.in?0ATQF% L/ U$ K" E" M border: false # border around the item, default = true; ?, o' s; V( O color: 178 # 8-bit color number, default is white' k7 {& d) h E& ~2 p q2 M - title: Docker containers stats rate-ms: 500- |0 l7 D& X z" s sample: docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.PIDs}}": w: ~; M- c2 E* r1 s" ^, w& U Asciibox ![]() - title: UTC time, i' u: j5 X' O t5 S. @# ^* G& i rate-ms: 500 # sampling rate, default = 1000 font: 3d # font type, default = 2d border: false # border around the item, default = true 1 d) m; P0 ?) f) E! V2 s5 \( B color: 43 # 8-bit color number, default is white sample: env TZ=UTC date +%r5 o* k Z( \( x. i4 z8 J 5、额外功能9 l1 w+ m6 p, K! |/ w Triggers 触发器允许执行条件操作,如视觉/声音告警或任意shell命令。以下示例说明了此概念。 Clock gauge,从开始的每分钟显示时间进度和当前时间 gauges:4 \# y2 T- C. Z( H5 o8 x( U- title: MINUTE PROGRESS2 ~5 s; D1 W; r( v& A position: [[0, 18], [80, 0]] cur:3 N6 l. ^0 G6 C sample: date +%S max: sample: echo 60" B! Y* S4 z- v8 J' p min:8 e; @* G ^6 N6 \" ^0 a& B, N sample: echo 0 triggers:8 x% b/ o, _3 n3 n - title: CLOCK BELL EVERY MINUTE condition: '[ $label == "cur" ] && [ $cur -eq 0 ] && echo 1 || echo 0' # expects "1" as TRUE indicator3 N0 R7 n/ o; @! i actions: terminal-bell: true # standard terminal bell, default = false sound: true # NASA quindar tone, default = false visual: false # notification with current value on top of the component area, default = false7 u: R* g: w4 u3 n8 A) E script: say -v samantha `date +%I:%M%p` # an arbitrary script, which can use $cur, $prev and $label variables : n3 s* J- ~8 l7 g" L 搜索引擎延迟图表,在延迟超过阈值时向用户发出告警 runcharts:- title: SEARCH ENGINE RESPONSE TIME (sec) rate-ms: 200/ _9 f, V. D5 X items: d# K9 l0 B2 x5 F; H6 ^3 m - label: GOOGLE sample: curl -o /dev/null -s -w '%{time_total}' https://www.google.com - label: YAHOO sample: curl -o /dev/null -s -w '%{time_total}' https://search.yahoo.com % s: |& z- |& g* T; h triggers: - title: Latency threshold exceeded, K5 }5 }* b! p4 `! G condition: echo "$prev < 0.3 && $cur > 0.3" |bc -l # expects "1" as TRUE indicator actions:7 L$ a0 i4 l1 O9 S; z terminal-bell: true # standard terminal bell, default = false sound: true # NASA quindar tone, default = false visual: true # visual notification on top of the component area, default = false# Q* Q+ V4 i7 W6 c script: 'say alert: ${label} latency exceeded ${cur} second' # an arbitrary script, which can use $cur, $prev and $label variables7 k/ s( M& E; K0 ] u* t& x# ^- u: G* R 交互式 shell 支持 除了sample命令之外,还可以指定init命令(在采样前仅执行一次)和transform命令(后处理采样命令输出)。这包括交互式shell用例,例如仅建立与数据库的连接一次,然后在交互式shell会话中执行轮询。 Basic modetextboxes: - title: MongoDB polling rate-ms: 500 init: mongo --quiet --host=localhost test # executes only once to start the interactive session sample: Date.now(); # executes with a required rate, in scope of the interactive session transform: echo result = $sample # executes in scope of local session, $sample variable is available for transformation) S+ r& h5 y, {! _1 u+ C PTY mode 在某些情况下,交互式shell将无法工作,因为它的stdin不是终端。这种情况下我们可以使用PTY模式: textboxes:& q: O! ^( g. O- title: Neo4j polling pty: true # enables pseudo-terminal mode, default = false init: cypher-shell -u neo4j -p pwd --format plain sample: RETURN rand(); transform: echo "$sample" | tail -n 1 - title: Top on a remote server pty: true # enables pseudo-terminal mode, default = false init: ssh -i ~/user.pem ec2-user@1.2.3.4" i$ q) g& B/ X! ^$ Y% T sample: top2 K5 b& ^3 N! L$ i: h : d7 G6 V5 A$ a5 V init 命令逐步执行 在开始采样之前,还可以逐个执行多个init命令。 textboxes:- title: Java application uptime multistep-init:4 L% ]3 v- i) t0 V - java -jar jmxterm-1.0.0-uber.jar - open host:port # or local PID - bean java.lang:type=Runtime sample: get Uptime8 b2 g+ ?, Q& H u- _ 变量 如果配置文件包含重复的模式,则可以将它们提取到变量部分。此外,还可以在启动时使用-v/–variable标志指定变量,并且任意的系统环境变量也可以在脚本中使用。 variables:mongoconnection: mongo --quiet --host=localhost test barcharts:& Z" z1 u* c: {+ L) p; i7 C6 | - title: MongoDB documents by status items: - label: IN_PROGRESS init: $mongoconnection sample: db.getCollection('events').find({status:'IN_PROGRESS'}).count() - label: SUCCESS/ \, U- `1 Q: G% F init: $mongoconnection sample: db.getCollection('events').find({status:'SUCCESS'}).count()( b6 N5 Y0 j3 Y* V; T) E - label: FAIL+ L* ?7 R! L: L9 { init: $mongoconnection' L. J+ t' Q" [1 U0 f2 i9 v sample: db.getCollection('events').find({status:'FAIL'}).count(): J/ j. L% R' A9 D " y# c5 k7 p8 J( { 6 r# N9 M5 U! l8 ^ a; _' c 颜色主题 ![]() sparklines: - title: CPU usage sample: ps -A -o %cpu | awk '{s+=$1} END {print s}'! K5 C! u5 n: t' s$ N 6、真实场景数据库 以下是不同的数据库连接示例。建议使用交互式shell(init脚本)仅建立一次连接,然后在采样期间重用即可。 MySQL1 F# q6 Q$ C6 y7 ]2 U# prerequisite: installed mysql shell variables: { X. L3 H; o# h0 ^; U mysql_connection: mysql -u root -s --database mysql --skip-column-names1 b. [% _3 }1 X/ Y& h sparklines: - title: MySQL (random number example)0 x/ L* j3 G' Z7 N; x- A; C; K. T pty: true' j | q% I' M- e, @* U init: $mysql_connection sample: select rand();& R. k" N5 J# T/ b# Z/ _+ O/ u7 T PostgreSQL # prerequisite: installed psql shell variables:! R# a8 b/ d& Z8 J# f- C- w PGPASSWORD: pwd* C3 I" e) ]# H7 c7 } postgres_connection: psql -h localhost -U postgres --no-align --tuples-only sparklines:' L! D* l" k7 e/ K3 W9 R - title: PostgreSQL (random number example) init: $postgres_connection sample: select random(); MongoDB # prerequisite: installed mongo shell variables: mongo_connection: mongo --quiet --host=localhost test sparklines: - title: MongoDB (random number example) init: $mongo_connection) n8 ?. I; v, e/ H8 n$ f, Q% [& w sample: Math.random(); Neo4j# E' r6 Y- h0 y6 _1 A( D # prerequisite: installed cypher shell' D' W# t2 `) M variables:5 I; i: ?0 T% C0 m0 O& v/ p" l neo4j_connection: cypher-shell -u neo4j -p pwd --format plain9 [+ |3 S- ^+ P* G sparklines: - title: Neo4j (random number example) pty: true( y0 w: _* [' d* q: k3 c8 a3 Q init: $neo4j_connection sample: RETURN rand();# T, t! n% D4 F transform: echo "$sample" | tail -n 1 : S9 Z0 I. ~5 z# v8 Y$ Y9 s Kafka 检查kafka lag值,计算每个队列lag值的和,高于阈值报警,多consumergroup,多topic。 variables:0 u& x6 w' k% j0 _# R8 m5 v7 _& Vkafka_connection: $KAFKA_HOME/bin/kafka-consumer-groups --bootstrap-server localhost:9092 runcharts: - title: Kafka lag per consumer group* p) N6 z' Q: ]3 m4 }6 V rate-ms: 5000+ G D: {1 u0 q- p" {5 ]7 n6 H5 b scale: 0 items: - label: A->B& v7 R; ~/ }+ C sample: $kafka_connection --group group_a --describe | awk 'NR>1 {sum += $5} END {print sum}'& {4 w$ b% }" L - label: B->C sample: $kafka_connection --group group_b --describe | awk 'NR>1 {sum += $5} END {print sum}'9 b R5 n# E& R: U, r - label: C->D sample: $kafka_connection --group group_c --describe | awk 'NR>1 {sum += $5} END {print sum}'1 u+ K! ^, n9 b* Z Docker Docker容器统计信息(CPU,MEM,O/I) textboxes:- title: Docker containers stats sample: docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}\t{{.PIDs}}" 8 |& E9 z' C. ~4 j' Y SSH 远程服务器上的TOP命令 variables:9 y/ p! ~5 \3 ~sshconnection: ssh -i ~/my-key-pair.pem ec2-user@1.2.3.4$ ]3 Q3 F2 I, y; y: o textboxes: - title: SSH7 X$ v w( z$ s! l5 I1 J6 B pty: true init: $sshconnection sample: top8 v3 J: ^" l+ y: w _# L* s7 K3 m9 a5 e _) s2 S # C+ _$ P) S& g. X" h2 D6 [1 l JMX Java应用程序的正常运行示例 # prerequisite: download [jmxterm jar file](https://docs.cyclopsgroup.org/jmxterm)textboxes: - title: Java application uptime multistep-init:0 }! `9 u: j- { U* ] - java -jar jmxterm-1.0.0-uber.jar/ _3 g8 d3 u% M2 h - open host:port # or local PID5 J7 |5 K& v% Q) @) r1 g$ o - bean java.lang:type=Runtime/ m( i' A# d1 o9 W0 ?! U: X sample: get Uptime transform: echo $sample | tr -dc '0-9' | awk '{printf "%.1f min", $1/1000/60}'8 q1 ~7 ? ?# n |