Supervisor Process Supervision Guide

Official site: http://supervisord.org/

1. Installation

Host environment (CentOS 7), kernel version:

#cat /proc/version
Linux version 3.10.0-229.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Fri Mar 6 11:36:42 UTC 2015
 
#yum install python-setuptools
#easy_install supervisor
 
#echo_supervisord_conf   // verify the installation (prints a sample config)
#mkdir -m 755 -p /etc/supervisor/
#echo_supervisord_conf > /etc/supervisor/supervisord.conf    // create the main configuration file
#cd /etc/supervisor/
#mkdir -m 755 conf.d

2. Usage

In the /root directory, write a simple C program (test.c):

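#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* declares sleep() */

int main(int argc, char *argv[])
{
    /* Stay in the foreground forever so that supervisor has a process to watch. */
    while (1)
    {
        sleep(10000);
    }
    return 0;
}
#gcc -o test test.c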

For more configuration options, see: http://supervisord.org/configuration.html

#more /etc/supervisor/conf.d/test.ini
[program:test]
command=/root/test
;directory= ; directory to cwd to before exec (def no cwd)
autostart=true ; start at supervisord start (default true)
autorestart=unexpected ; whether/when to restart (default unexpected)
startsecs=1 ; number of secs prog must stay running (def 1.)
redirect_stderr=true ; redirect proc stderr to stdout (default false)
; the log directory (/tmp/supervisor) must exist before supervisord starts
stdout_logfile=/tmp/supervisor/test_stdout.log
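A program file like this can be checked before supervisor loads it. The following is a minimal sketch using Python 3's standard configparser, not supervisor's own parser. The helper name check_program_sections is hypothetical, and the list of autorestart values covers only the documented ones (supervisor may accept other boolean spellings).

```python
import configparser

TEST_INI = """\
[program:test]
command=/root/test
autostart=true ; start at supervisord start (default true)
autorestart=unexpected ; whether/when to restart (default unexpected)
startsecs=1
redirect_stderr=true
stdout_logfile=/tmp/supervisor/test_stdout.log
"""


def check_program_sections(text):
    """Return a list of problems found in [program:x] sections (empty if none)."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",), interpolation=None
    )
    parser.read_string(text)  # strict mode raises on a key repeated in one section
    problems = []
    for section in parser.sections():
        if not section.startswith("program:"):
            continue
        options = parser[section]
        if "command" not in options:
            problems.append("%s: missing command" % section)
        # Assumption: only the documented values are treated as valid here.
        restart = options.get("autorestart", "unexpected")
        if restart not in ("true", "false", "unexpected"):
            problems.append("%s: bad autorestart %r" % (section, restart))
    return problems


print(check_program_sections(TEST_INI))
```

Because configparser runs in strict mode by default, a key written twice in the same section (for example two autostart lines) raises DuplicateOptionError instead of silently keeping the last value.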

In /etc/supervisor/supervisord.conf, include the ini files from the conf.d directory:

;[group:thegroupname]
;programs=progname1,progname2  ; each refers to 'x' in [program:x] definitions
;priority=999                  ; the relative start priority (default 999)
; The [include] section can just contain the "files" setting.  This
; setting can list multiple files (separated by whitespace or
; newlines).  It can also contain wildcards.  The filenames are
; interpreted as relative to this file.  Included files *cannot*
; include files themselves.
[include]
files = conf.d/*.ini

Start supervisor

# supervisord -c /etc/supervisor/supervisord.conf

Check status

# supervisorctl status
test                             RUNNING   pid 16237, uptime 0:00:12

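Update the configuration (note that reload restarts supervisord together with every program it manages; supervisorctl reread followed by supervisorctl update applies only the changed program configs)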

# supervisorctl reload

For the complete set of options for a supervised program, refer to:

;[program:theprogramname]
;command=/bin/cat ; the program (relative uses PATH, can take args)
;process_name=%(program_name)s ; process_name expr (default %(program_name)s)
;numprocs=1 ; number of processes copies to start (def 1)
;directory=/tmp ; directory to cwd to before exec (def no cwd)
;umask=022 ; umask for process (default None)
;priority=999 ; the relative start priority (default 999)
;autostart=true ; start at supervisord start (default: true)
;startsecs=1 ; # of secs prog must stay up to be running (def. 1)
;startretries=3 ; max # of serial start failures when starting (default 3)
;autorestart=unexpected ; when to restart if exited after running (def: unexpected)
;exitcodes=0,2 ; 'expected' exit codes used with autorestart (default 0,2)
;stopsignal=QUIT ; signal used to kill process (default TERM)
;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10)
;stopasgroup=false ; send stop signal to the UNIX process group (default false)
;killasgroup=false ; SIGKILL the UNIX process group (def false)
;user=chrism ; setuid to this UNIX account to run the program
;redirect_stderr=true ; redirect proc stderr to stdout (default false)
;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO
;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)
;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10)
;stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)
;stdout_events_enabled=false ; emit events on stdout writes (default false)
;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO
;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)
;stderr_logfile_backups=10 ; # of stderr logfile backups (default 10)
;stderr_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)
;stderr_events_enabled=false ; emit events on stderr writes (default false)
;environment=A="1",B="2" ; process environment additions (def no adds)
;serverurl=AUTO ; override serverurl computation (childutils)
 

3. Testing

# kill -9 `pgrep test`    // kill the process
# supervisorctl status
test                             RUNNING   pid 15709, uptime 0:00:08    // the new pid and the short uptime show that the process was restarted automatically
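The same check can be scripted by comparing pids before and after the kill. This is a minimal sketch in Python 3 that parses status lines in the format shown in this guide; the function name parse_status is hypothetical and the two sample lines are copied from the output above.

```python
import re

# Matches lines such as:
#   test                             RUNNING   pid 16237, uptime 0:00:12
STATUS_LINE = re.compile(r"^(?P<name>\S+)\s+(?P<state>[A-Z]+)\s*(?P<detail>.*)$")
PID = re.compile(r"pid (\d+)")


def parse_status(output):
    """Return {program name: (state, pid or None)} from `supervisorctl status` text."""
    result = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        pid_match = PID.search(match.group("detail"))
        pid = int(pid_match.group(1)) if pid_match else None
        result[match.group("name")] = (match.group("state"), pid)
    return result


before = parse_status("test                             RUNNING   pid 16237, uptime 0:00:12")
after = parse_status("test                             RUNNING   pid 15709, uptime 0:00:08")

# A changed pid while the state is still RUNNING means supervisor restarted it.
restarted = after["test"][0] == "RUNNING" and before["test"][1] != after["test"][1]
print(restarted)
```

In a real script the text would come from running supervisorctl status, for example via subprocess.run(["supervisorctl", "status"], capture_output=True, text=True).stdout.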

4. Enabling web monitoring

Edit /etc/supervisor/supervisord.conf:

[inet_http_server]         ; inet (TCP) server disabled by default
port=192.168.1.232:9001        ; (ip_address:port specifier, *:port forall iface)
username=user              ; (default is no username (openserver))
password=123               ; (default is no password (openserver))
# supervisorctl reload
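Besides the web page, the same HTTP server exposes supervisor's XML-RPC interface under /RPC2, so process state can be queried from a script. The sketch below uses Python 3's standard xmlrpc.client and the address and credentials from the block above. The helper names rpc_url, summarize and fetch_summary are hypothetical, and fetch_summary only works against a reachable supervisord.

```python
from xmlrpc.client import ServerProxy


def rpc_url(host, port, username, password):
    """Build the XML-RPC endpoint URL; supervisor serves it under /RPC2."""
    return "http://%s:%s@%s:%d/RPC2" % (username, password, host, port)


def summarize(process_infos):
    """Reduce getAllProcessInfo() results to (name, statename, pid) tuples."""
    return [(p["name"], p["statename"], p["pid"]) for p in process_infos]


def fetch_summary(url):
    """Query a running supervisord; needs network access to the HTTP server."""
    server = ServerProxy(url)
    return summarize(server.supervisor.getAllProcessInfo())


# Address and credentials are the ones from the [inet_http_server] block.
url = rpc_url("192.168.1.232", 9001, "user", "123")
print(url)
```

Calling fetch_summary(url) on a host that can reach 192.168.1.232:9001 returns one tuple per supervised process. Since the credentials travel as HTTP basic auth in clear text, this interface is best kept on a trusted network.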

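5. Supervising common processes

  1. Nginx: set "daemon off;" in nginx.conf so that nginx stays in the foreground, where supervisor can track it.
  2. MySQL: mysqld_safe launches mysqld as a child process, so use pidproxy to forward signals to the real mysqld process (see http://supervisord.org/subprocess.html#pidproxy-program). The general form is:
    [program:mysql]
    command=/path/to/pidproxy /path/to/pidfile /path/to/mysqld_safe

     A concrete example follows. Supervisor does not run the command through a shell, so shell redirections (2>&1 > /dev/null) and a trailing & are not interpreted and are left out; use redirect_stderr and the logfile options instead:
    [program:mysql]
    command=/usr/bin/pidproxy /opt/mysql/data/mysqld.pid  /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf   --basedir=/usr/local/mysql --datadir=/opt/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=root --log-error=/opt/mysql/data/mysqld.err --pid-file=/opt/mysql/data/mysqld.pid --socket=/tmp/mysql.sock --port=3306
    redirect_stderr = true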

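6. Alerting on abnormal process events

If all you need is to restart crashed processes automatically and manage processes from the command line, the material above is sufficient. Supervisor 3.0 introduced events, and with this feature we can monitor process state in real time and raise alerts.

First, a look at how supervisor delivers events. Supervisor manages applications as its subprocesses, and a listener program likewise runs as a subprocess whose stdin, stdout and stderr have all been redirected. Events are handled as follows: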

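  • The listener subprocess writes READY\n to its stdout
  • When an event occurs, supervisor picks a listener in the READY state and writes the event (a header line followed by the payload) to its stdin
  • After handling the event, the listener reports the outcome on stdout as a result message: RESULT 2\nOK on success or RESULT 4\nFAIL on failure
  • The steps above repeat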
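The exchange above can be sketched in a few lines of Python 3. This is an illustrative loop written against in-memory streams, not supervisor's own implementation: the names parse_tokens, serve and record are hypothetical, the header values in the demo are made up, and the payload is assumed to be ASCII so that a character count equals the byte count given in len.

```python
import io


def parse_tokens(text):
    """Turn 'key:value key:value ...' into a dict."""
    return dict(token.split(":", 1) for token in text.split())


def serve(stdin, stdout, handler):
    """Run the listener loop until stdin reaches EOF."""
    while True:
        stdout.write("READY\n")            # announce readiness
        stdout.flush()
        header_line = stdin.readline()     # supervisor sends one header line...
        if not header_line:
            break                          # EOF: the pipe was closed
        header = parse_tokens(header_line)
        payload = stdin.read(int(header["len"]))  # ...then `len` characters of payload
        body = "OK" if handler(header, payload) else "FAIL"
        stdout.write("RESULT %d\n%s" % (len(body), body))
        stdout.flush()


# Simulated exchange; the header values below are made up for the demo.
seen = []


def record(header, payload):
    seen.append((header["eventname"], parse_tokens(payload)["processname"]))
    return True


payload = "processname:test groupname:test from_state:RUNNING expected:0 pid:16237"
header = ("ver:3.0 server:supervisor serial:1 pool:event_listener "
          "poolserial:1 eventname:PROCESS_STATE_EXITED len:%d\n" % len(payload))
out = io.StringIO()
serve(io.StringIO(header + payload), out, record)
print(repr(out.getvalue()))
print(seen)
```

A real listener would call serve(sys.stdin, sys.stdout, handler). Since stdout carries the protocol, any diagnostic output from the listener has to go to stderr.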

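Supervisor provides several event types, including process state events, supervisor state events, timer (tick) events, XML-RPC call events, process log events and others. Here we are mainly concerned with process state events. Supervisor defines the following states for a process, and every state transition triggers the corresponding event.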

[Figure: process state diagram]

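The listener can be written in any language, as long as it handles event messages according to supervisor's protocol. Python is recommended, because supervisor ships a childutils module that makes writing handlers easier.

There are two centralized process management tools on GitHub built on top of supervisor; each can manage the processes of multiple machines from a single page.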

https://github.com/mlazarov/supervisord-monitor
https://github.com/TAKEALOT/nodervisor

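See the documentation: http://supervisord.org/events.html

An event listener process listens for specific types of events. It runs as a subprocess that receives those events, which lets users build their own alerting or notification mechanism.

It is defined in the same way as a program ini file, for example conf.d/event_listener.ini: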

[eventlistener:event_listener]
command=/etc/supervisor/event.py
events=PROCESS_STATE
buffer_size=100
autostart=true
startsecs=1
autorestart=unexpected
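With events=PROCESS_STATE the listener receives every state change, so event.py has to decide which ones deserve an alert. The sketch below shows one possible policy in Python 3: alert on unexpected exits, FATAL and BACKOFF, and stay quiet otherwise. The names alert_message and notify are hypothetical, the choice of events is an assumption to adapt, and notify only writes to stderr as a stand-in for mail or a chat webhook.

```python
import sys

# Assumption: these transitions always warrant an alert in this setup.
ALWAYS_ALERT = {"PROCESS_STATE_FATAL", "PROCESS_STATE_BACKOFF"}


def alert_message(eventname, payload):
    """Return an alert string for a PROCESS_STATE event, or None to stay quiet."""
    fields = dict(token.split(":", 1) for token in payload.split())
    if eventname == "PROCESS_STATE_EXITED":
        if fields.get("expected") == "1":
            return None                    # exit code was listed in exitcodes
    elif eventname not in ALWAYS_ALERT:
        return None                        # e.g. STARTING, RUNNING, STOPPED
    return "ALERT %s: %s (from %s)" % (
        fields.get("processname"), eventname, fields.get("from_state"))


def notify(message):
    """Hypothetical delivery hook; replace with mail, chat webhook, etc."""
    sys.stderr.write(message + "\n")       # stdout is reserved for the protocol
    sys.stderr.flush()


# Payload in the shape supervisor sends for an unexpected exit of "test".
crash = "processname:test groupname:test from_state:RUNNING expected:0 pid:16237"
message = alert_message("PROCESS_STATE_EXITED", crash)
if message:
    notify(message)
```

Because the config runs /etc/supervisor/event.py directly, the script needs a shebang line and execute permission (chmod +x), and it must wrap this decision in the READY/RESULT loop of the listener protocol.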