1. Installation
Host environment (CentOS 7):

```
# cat /proc/version
Linux version 3.10.0-229.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Fri Mar 6 11:36:42 UTC 2015
# yum install python-setuptools
# easy_install supervisor
# echo_supervisord_conf                                      # check that the install succeeded
# mkdir -m 755 -p /etc/supervisor/
# echo_supervisord_conf > /etc/supervisor/supervisord.conf   # create the main config file
# cd /etc/supervisor/
# mkdir -m 755 conf.d
```
2. Usage
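In /root, write a simple C program, test.c, that never exits:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sleep() */

int main(int argc, char *argv[])
{
    /* Loop forever so supervisor has a long-running process to manage. */
    while (1) {
        sleep(10000);
    }
    return 0;
}
```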
```
# gcc -o test test.c
```
For more configuration options, see: http://supervisord.org/configuration.html
/etc/supervisor/conf.d/test.ini:

```ini
[program:test]
command=/root/test
;directory=                 ; directory to cwd to before exec (def no cwd)
autostart=true              ; start at supervisord start (default true)
autorestart=unexpected      ; whether/when to restart (default unexpected)
startsecs=1                 ; number of secs prog must stay running (def 1)
redirect_stderr=true        ; redirect proc stderr to stdout (default false)
stdout_logfile=/tmp/supervisor/test_stdout.log
```
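A typo in a program section (for example, writing autostart twice where autorestart was meant) only surfaces when supervisord starts, so a quick pre-flight check can help. The sketch below is a hypothetical helper, not part of supervisor: it parses an .ini fragment with Python's configparser in strict mode (which rejects duplicate keys) and checks that autorestart holds one of the values supervisor accepts. It assumes supervisor's convention of "; comment" after a value.

```python
import configparser

# Values supervisor accepts for autorestart (boolean spellings plus "unexpected").
AUTORESTART_VALUES = {"true", "yes", "on", "1", "false", "no", "off", "0", "unexpected"}


def check_program_ini(text):
    """Return a list of problems found in a supervisor-style .ini fragment.

    Hypothetical pre-flight check; supervisord does its own parsing.
    """
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # allow "key=value ; comment"
        strict=True,                     # duplicate keys raise an error
        interpolation=None,              # keep %(program_name)s untouched
    )
    try:
        parser.read_string(text)
    except configparser.Error as exc:
        return [str(exc)]

    problems = []
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        options = parser[section]
        if "command" not in options:
            problems.append("[%s] has no command" % section)
        value = options.get("autorestart", "unexpected").lower()
        if value not in AUTORESTART_VALUES:
            problems.append("[%s] invalid autorestart=%s" % (section, value))
    return problems


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        with open(path) as f:
            for problem in check_program_ini(f.read()):
                print("%s: %s" % (path, problem))
```

Usage: python check_ini.py /etc/supervisor/conf.d/*.ini prints nothing when every file passes.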
In /etc/supervisor/supervisord.conf, include the .ini files in the conf.d directory:

```ini
;[group:thegroupname]
;programs=progname1,progname2  ; each refers to 'x' in [program:x] definitions
;priority=999                  ; the relative start priority (default 999)

; The [include] section can just contain the "files" setting.  This
; setting can list multiple files (separated by whitespace or
; newlines).  It can also contain wildcards.  The filenames are
; interpreted as relative to this file.  Included files *cannot*
; include files themselves.

[include]
files = conf.d/*.ini
```
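Because relative patterns are interpreted relative to supervisord.conf itself, it can be useful to check which files an [include] line actually picks up. The function below is our own rough sketch of that lookup (not supervisor's code): it expands each whitespace-separated pattern against the config file's directory and returns the matches in sorted order.

```python
import glob
import os


def resolve_include(conf_path, patterns):
    """Expand an [include] files= value relative to the main config file.

    Hypothetical helper based on the documented rule that relative
    filenames are interpreted relative to supervisord.conf.
    """
    base_dir = os.path.dirname(os.path.abspath(conf_path))
    matches = []
    for pattern in patterns.split():  # several patterns may be whitespace-separated
        full = pattern if os.path.isabs(pattern) else os.path.join(base_dir, pattern)
        matches.extend(sorted(glob.glob(full)))
    return matches


if __name__ == "__main__":
    for path in resolve_include("/etc/supervisor/supervisord.conf", "conf.d/*.ini"):
        print(path)
```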
Start supervisord:

```
# supervisord -c /etc/supervisor/supervisord.conf
```
Check the status:

```
# supervisorctl status
test                             RUNNING   pid 16237, uptime 0:00:12
```
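Reload the configuration after making changes (reload restarts supervisord and, with it, every managed program; supervisorctl update applies only added, removed or changed program sections):

```
# supervisorctl reload
```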
The complete set of options for a monitored program is listed in the [program:x] section of the sample configuration:

```ini
;[program:theprogramname]
;command=/bin/cat              ; the program (relative uses PATH, can take args)
;process_name=%(program_name)s ; process_name expr (default %(program_name)s)
;numprocs=1                    ; number of processes copies to start (def 1)
;directory=/tmp                ; directory to cwd to before exec (def no cwd)
;umask=022                     ; umask for process (default None)
;priority=999                  ; the relative start priority (default 999)
;autostart=true                ; start at supervisord start (default: true)
;startsecs=1                   ; # of secs prog must stay up to be running (def. 1)
;startretries=3                ; max # of serial start failures when starting (default 3)
;autorestart=unexpected        ; when to restart if exited after running (def: unexpected)
;exitcodes=0,2                 ; 'expected' exit codes used with autorestart (default 0,2)
;stopsignal=QUIT               ; signal used to kill process (default TERM)
;stopwaitsecs=10               ; max num secs to wait b4 SIGKILL (default 10)
;stopasgroup=false             ; send stop signal to the UNIX process group (default false)
;killasgroup=false             ; SIGKILL the UNIX process group (def false)
;user=chrism                   ; setuid to this UNIX account to run the program
;redirect_stderr=true          ; redirect proc stderr to stdout (default false)
;stdout_logfile=/a/path        ; stdout log path, NONE for none; default AUTO
;stdout_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
;stdout_logfile_backups=10     ; # of stdout logfile backups (default 10)
;stdout_capture_maxbytes=1MB   ; number of bytes in 'capturemode' (default 0)
;stdout_events_enabled=false   ; emit events on stdout writes (default false)
;stderr_logfile=/a/path        ; stderr log path, NONE for none; default AUTO
;stderr_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
;stderr_logfile_backups=10     ; # of stderr logfile backups (default 10)
;stderr_capture_maxbytes=1MB   ; number of bytes in 'capturemode' (default 0)
;stderr_events_enabled=false   ; emit events on stderr writes (default false)
;environment=A="1",B="2"       ; process environment additions (def no adds)
;serverurl=AUTO                ; override serverurl computation (childutils)
```
3. Testing
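```
# kill -9 `pgrep -x test`        # kill the process
# supervisorctl status
test                             RUNNING   pid 15709, uptime 0:00:08
```

The process is RUNNING again under a new pid with a fresh uptime, which shows that supervisor restarted it automatically.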
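Why did kill -9 lead to a restart? With autorestart=unexpected, supervisor restarts a process that exits from RUNNING only when its exit status is not one of the expected exitcodes; a process killed by a signal has no normal exit code, so its exit counts as unexpected. The function below is a simplified model of that rule for illustration only: it ignores the startsecs/BACKOFF handling of processes that die too quickly and is not supervisor's implementation. The default exitcodes=(0, 2) follows the sample configuration above.

```python
def should_restart(autorestart, exit_status, exitcodes=(0, 2)):
    """Simplified model of supervisor's autorestart decision for a RUNNING process.

    autorestart: "true", "false" or "unexpected" (as in the [program:x] section)
    exit_status: the exit code, or None if the process was killed by a signal
    exitcodes:   the "expected" exit codes from the exitcodes= option
    """
    if autorestart == "true":
        return True
    if autorestart == "false":
        return False
    if autorestart == "unexpected":
        # A signal death has no exit code, so it can never be "expected".
        return exit_status is None or exit_status not in exitcodes
    raise ValueError("unknown autorestart value: %r" % autorestart)
```

In this model, should_restart("unexpected", None) corresponds to the kill -9 test and returns True.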
4. Enabling the web monitoring page
Edit /etc/supervisor/supervisord.conf:

```ini
[inet_http_server]         ; inet (TCP) server disabled by default
port=192.168.1.232:9001    ; (ip_address:port specifier, *:port for all iface)
username=user              ; (default is no username (open server))
password=123               ; (default is no password (open server))
```
```
# supervisorctl reload
```

Then open http://192.168.1.232:9001 in a browser and log in with the credentials above.
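Besides the web page, [inet_http_server] also exposes supervisor's XML-RPC interface at /RPC2. A minimal sketch using Python's standard xmlrpc.client, assuming the address and credentials configured above (192.168.1.232:9001, user/123): supervisor.getAllProcessInfo() returns one dict per process, and format_status() is our own helper that prints a supervisorctl-like summary.

```python
import xmlrpc.client


def format_status(infos):
    """Render getAllProcessInfo()-style dicts as 'name STATE description' lines."""
    lines = []
    for info in infos:
        name = info["name"]
        if info["group"] != info["name"]:
            name = "%s:%s" % (info["group"], info["name"])
        lines.append("%-20s %-10s %s" % (name, info["statename"], info["description"]))
    return "\n".join(lines)


def fetch_status(url):
    """Query a supervisord XML-RPC endpoint; credentials go in the URL (basic auth)."""
    server = xmlrpc.client.ServerProxy(url)
    return server.supervisor.getAllProcessInfo()


if __name__ == "__main__":
    print(format_status(fetch_status("http://user:123@192.168.1.232:9001/RPC2")))
```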
5. Monitoring common services
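- Nginx: set daemon off; in nginx.conf so that nginx stays in the foreground; supervisor can only manage programs that do not daemonize themselves.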
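- MySQL: supervisor starts and signals mysqld_safe, while the actual server process is mysqld, so signals have to be relayed with pidproxy (see http://supervisord.org/subprocess.html#pidproxy-program). The general form is: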
```ini
[program:mysql]
command=/path/to/pidproxy /path/to/pidfile /path/to/mysqld_safe
```
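For example, with MySQL installed under /usr/local/mysql:

```ini
[program:mysql]
command=/usr/bin/pidproxy /opt/mysql/data/mysqld.pid /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf --basedir=/usr/local/mysql --datadir=/opt/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=root --log-error=/opt/mysql/data/mysqld.err --pid-file=/opt/mysql/data/mysqld.pid --socket=/tmp/mysql.sock --port=3306
redirect_stderr=true
```

Supervisor does not run command through a shell, so shell syntax such as 2>&1 > /dev/null or a trailing & must not appear there; a command that puts itself in the background would also escape supervisor's control.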
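The idea behind pidproxy: supervisor's stop signal goes to the process it started, but the process that actually has to receive it is mysqld, whose pid is written to the pid file. pidproxy stays in the foreground for supervisor and relays signals to the pid read from that file. The sketch below illustrates only that relay step; it is a simplified illustration, not pidproxy's source, and read_pid/forward_signal are hypothetical names.

```python
import os
import signal


def read_pid(pidfile):
    """Read the pid written by the daemon (e.g. /opt/mysql/data/mysqld.pid)."""
    with open(pidfile) as f:
        return int(f.read().strip())


def forward_signal(pidfile, sig=signal.SIGTERM):
    """Send sig to the process named in pidfile and return that pid."""
    pid = read_pid(pidfile)
    os.kill(pid, sig)
    return pid
```

Sending signal 0 (forward_signal(path, 0)) only checks that the process exists, which is a safe way to try the helper.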
6. Alerting on abnormal process events
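If all you need is to restart crashed processes automatically and manage them from the command line, the steps above are enough. Supervisor 3.0 introduced events; with this feature we can monitor process state in real time and raise alerts.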
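First, a look at how supervisor events work. Supervisor manages applications as child processes, and an event listener likewise runs as a child process whose stdin, stdout and stderr are redirected by supervisor. Events are handled as follows:
- The listener process writes READY\n to its stdout.
- When an event occurs, supervisor picks a listener that is in the READY state and writes the event to its stdin: a header line followed by a payload of len bytes.
- After processing the event, the listener writes the result to its stdout: RESULT 2\nOK on success or RESULT 4\nFAIL on failure.
- The listener then writes READY\n again and the cycle repeats.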
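The steps above can be sketched in a few lines of Python. handle_one_event() is a hypothetical helper that performs one cycle against file-like objects: it writes READY, reads the header line and then len characters of payload, calls a handler, and answers with the result frame, where the number after RESULT is the length of the word that follows. Reading the payload with read(len) assumes an ASCII payload, since len counts bytes.

```python
import io


def handle_one_event(stdin, stdout, handler):
    """Run one READY -> event -> RESULT cycle of the listener protocol.

    handler(headers, payload) returns True for OK and False for FAIL.
    """
    # 1. Tell supervisor we are ready for an event.
    stdout.write("READY\n")
    stdout.flush()

    # 2. Read the header line, then exactly `len` characters of payload.
    header_line = stdin.readline()
    headers = dict(tok.split(":", 1) for tok in header_line.split())
    payload = stdin.read(int(headers["len"]))

    # 3. Report the result; the number is the length of "OK" or "FAIL".
    result = "OK" if handler(headers, payload) else "FAIL"
    stdout.write("RESULT %d\n%s" % (len(result), result))
    stdout.flush()
    return headers, payload


if __name__ == "__main__":
    # Simulate supervisor's side with in-memory streams.
    payload = "processname:test groupname:test from_state:RUNNING expected:0 pid:15709"
    header = ("ver:3.0 server:supervisor serial:1 pool:event_listener "
              "poolserial:1 eventname:PROCESS_STATE_EXITED len:%d\n" % len(payload))
    out = io.StringIO()
    handle_one_event(io.StringIO(header + payload), out, lambda h, p: True)
    print(repr(out.getvalue()))  # 'READY\nRESULT 2\nOK'
```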
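Supervisor provides many event types, including process state events, supervisor state events, timer (TICK) events, XML-RPC call events and process log events. Here we focus on process state events. Supervisor defines the process states STOPPED, STARTING, RUNNING, BACKOFF, STOPPING, EXITED, FATAL and UNKNOWN, and each state transition emits the corresponding PROCESS_STATE_<STATE> event (for example PROCESS_STATE_EXITED).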
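These states, with the numeric codes that supervisor's XML-RPC API reports for them, can be written down as a small Python enum. The values mirror supervisor.states.ProcessStates; parse_state_event() is our own helper showing that a PROCESS_STATE_* payload is a single line of key:value tokens such as processname, groupname and from_state (EXITED events additionally carry expected and pid).

```python
from enum import IntEnum


class ProcessState(IntEnum):
    """Process states; values mirror supervisor.states.ProcessStates."""
    STOPPED = 0
    STARTING = 10
    RUNNING = 20
    BACKOFF = 30
    STOPPING = 40
    EXITED = 100
    FATAL = 200
    UNKNOWN = 1000


EVENT_PREFIX = "PROCESS_STATE_"


def event_name(state):
    """Name of the event emitted when a process enters `state`."""
    return EVENT_PREFIX + state.name


def parse_state_event(eventname, payload):
    """Split a PROCESS_STATE_* event into (new_state, payload fields)."""
    if not eventname.startswith(EVENT_PREFIX):
        raise ValueError("not a process state event: %s" % eventname)
    new_state = ProcessState[eventname[len(EVENT_PREFIX):]]
    fields = dict(tok.split(":", 1) for tok in payload.split())
    fields["from_state"] = ProcessState[fields["from_state"]]
    return new_state, fields
```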
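An event listener can be written in any language, as long as it handles event messages according to supervisor's protocol. Python is recommended, because supervisor ships a childutils module (supervisor.childutils) that makes handlers easier to write.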
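A sketch of a listener built on supervisor.childutils: childutils.listener.wait() writes READY and returns the parsed header dict plus the raw payload, childutils.get_headers() splits the key:value payload, and childutils.listener.ok() sends RESULT 2\nOK. The alert policy in needs_alert() (alert on FATAL, or on an EXITED event with expected:0) is an assumption for illustration, and the "alert" is only a line on stderr; replace it with mail, SMS or whatever notification you use. The import sits inside main() so the policy function can be tried without supervisor installed.

```python
#!/usr/bin/env python
import sys


def needs_alert(eventname, payload_headers):
    """Hypothetical alert policy: FATAL always, EXITED only when unexpected."""
    if eventname == "PROCESS_STATE_FATAL":
        return True
    if eventname == "PROCESS_STATE_EXITED":
        # expected:0 means the exit status was not one of the exitcodes
        return payload_headers.get("expected") == "0"
    return False


def main():
    # Imported lazily so needs_alert() can be exercised without supervisor installed.
    from supervisor import childutils

    while True:
        # Writes READY, then reads the header line and len bytes of payload.
        headers, payload = childutils.listener.wait(sys.stdin, sys.stdout)
        # PROCESS_STATE_* payloads are "key:value" tokens on a single line.
        payload_headers = childutils.get_headers(payload)
        if needs_alert(headers["eventname"], payload_headers):
            # Placeholder notification; stdout is reserved for the protocol.
            sys.stderr.write("ALERT %s: %s\n" % (
                headers["eventname"], payload_headers.get("processname")))
            sys.stderr.flush()
        # Acknowledge the event with RESULT 2\nOK.
        childutils.listener.ok(sys.stdout)


if __name__ == "__main__":
    main()
```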
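There are two centralized process management tools on GitHub built on top of supervisor; they let you manage the processes of several machines from a single page: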
https://github.com/mlazarov/supervisord-monitor
https://github.com/TAKEALOT/nodervisor
See the documentation: http://supervisord.org/events.html
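An event listener process subscribes to specific event types and receives them as a subprocess of supervisord, so you can implement your own way of handling them: alert notifications or any other action.

It is defined in an .ini file just like a program, here conf.d/event_listener.ini: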
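```ini
[eventlistener:event_listener]
command=/etc/supervisor/event.py
events=PROCESS_STATE
buffer_size=100
autostart=true
startsecs=1
autorestart=unexpected
```

events=PROCESS_STATE subscribes to all PROCESS_STATE_* events. The listener script must be executable (chmod +x /etc/supervisor/event.py).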
/etc/supervisor/event.py, which appends every event it receives to /tmp/event.log:

```python
#!/usr/bin/env python
import sys


def write_stdout(s):
    # only eventlistener protocol messages may be sent to stdout
    sys.stdout.write(s)
    sys.stdout.flush()


def write_stderr(s):
    sys.stderr.write(s)
    sys.stderr.flush()


def main():
    log_file = open('/tmp/event.log', 'a')
    while True:
        # transition from ACKNOWLEDGED to READY
        write_stdout('READY\n')

        # read the header line, e.g. "ver:3.0 server:supervisor serial:21 ..."
        line = sys.stdin.readline()
        log_file.write(line)

        # read exactly 'len' bytes of event payload
        headers = dict(x.split(':', 1) for x in line.split())
        data = sys.stdin.read(int(headers['len']))
        log_file.write(data + '\n')
        log_file.flush()

        # alerting logic goes here, e.g. act when
        # headers['eventname'] == 'PROCESS_STATE_EXITED'

        # transition from READY to ACKNOWLEDGED
        write_stdout('RESULT 2\nOK')


if __name__ == '__main__':
    main()
```
The header fields of the Event Notification Protocol are:

Key | Description | Example
---|---|---
ver | The event system protocol version | 3.0
server | The identifier of the supervisord sending the event (see the identifier value in the [supervisord] section of the config file). |
serial | An integer assigned to each event. No two events generated during the lifetime of a supervisord process will have the same serial number. The value is useful for functional testing and detecting event ordering anomalies. | 30
pool | The name of the event listener pool which generated this event. | myeventpool
poolserial | An integer assigned to each event by the eventlistener pool which it is being sent from. No two events generated by the same eventlistener pool during the lifetime of a supervisord process will have the same poolserial number. This value can be used to detect event ordering anomalies. | 30
eventname | The specific event type name (see Event Types) | TICK_5
len | An integer indicating the number of bytes in the event payload, aka the PAYLOAD_LENGTH | 22
Example header line:
ver:3.0 server:supervisor serial:21 pool:listener poolserial:10 eventname:PROCESS_COMMUNICATION_STDOUT len:54
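Turning the header line into typed values is straightforward. parse_header() below is a hypothetical helper (not part of supervisor) that checks the fields from the table are present and converts serial, poolserial and len to integers; is_in_order() shows how serial can be used to spot ordering anomalies, as the table suggests.

```python
REQUIRED_KEYS = ("ver", "server", "serial", "pool", "poolserial", "eventname", "len")
INT_KEYS = ("serial", "poolserial", "len")


def parse_header(line):
    """Parse an event header line into a dict, converting numeric fields to int."""
    headers = dict(tok.split(":", 1) for tok in line.split())
    missing = [key for key in REQUIRED_KEYS if key not in headers]
    if missing:
        raise ValueError("missing header fields: " + ", ".join(missing))
    for key in INT_KEYS:
        headers[key] = int(headers[key])
    return headers


def is_in_order(previous, current):
    """True if `current` was generated after `previous`, judging by serial numbers."""
    return current["serial"] > previous["serial"]
```

Applied to the sample line above, parse_header() yields eventname PROCESS_COMMUNICATION_STDOUT and len 54.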