Supervisor Process Supervision Guide

Official site: http://supervisord.org/

1. Installation

Host environment: CentOS 7, with the following kernel version:

#cat /proc/version
Linux version 3.10.0-229.el7.x86_64 (builder@kbuilder.dev.centos.org) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Fri Mar 6 11:36:42 UTC 2015
 
#yum install python-setuptools
#easy_install supervisor
 
#echo_supervisord_conf   // prints a sample config if the installation succeeded
#mkdir -m 755 -p /etc/supervisor/
#echo_supervisord_conf > /etc/supervisor/supervisord.conf    // create the main configuration file
#cd /etc/supervisor/
#mkdir -m 755 conf.d

2. Usage

In the /root directory, write a simple C program (test.c):

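#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* declares sleep() */

int main(int argc, char *argv[])
{
    /* Sleep forever so that supervisor has a long-running process to watch. */
    while (1)
    {
        sleep(10000);
    }
    return 0;
}

#gcc -o test test.c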

For more configuration options, see: http://supervisord.org/configuration.html

#more /etc/supervisor/conf.d/test.ini
[program:test]
command=/root/test
;directory= ; directory to cwd to before exec (def no cwd)
autostart=true ; start at supervisord start (default true)
autorestart=unexpected ; whether/when to restart (default unexpected)
startsecs=1 ; number of secs prog must stay running (def 1.)
redirect_stderr=true ; redirect proc stderr to stdout (default false)
; the /tmp/supervisor directory must already exist
stdout_logfile=/tmp/supervisor/test_stdout.log
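
A duplicated or misspelled key in a program section is easy to miss by eye. The following is a rough pre-check written with Python's standard configparser module. It is only a sketch: it is not how supervisord itself parses its files, and the names SAMPLE_INI, load_programs and missing_command are made up for this example.

```python
import configparser

# Hypothetical copy of conf.d/test.ini; in practice, read the real file instead.
SAMPLE_INI = """\
[program:test]
command=/root/test
autostart=true ; start at supervisord start (default true)
autorestart=unexpected ; whether/when to restart (default unexpected)
startsecs=1
redirect_stderr=true
stdout_logfile=/tmp/supervisor/test_stdout.log
"""


def load_programs(text):
    """Return {program_name: options} for every [program:x] section."""
    # strict mode (the default) raises DuplicateOptionError when a key
    # appears twice in the same section.
    parser = configparser.ConfigParser(
        inline_comment_prefixes=(";",),  # strip "; comment" after a value
        interpolation=None,              # leave %(program_name)s untouched
    )
    parser.read_string(text)
    programs = {}
    for section in parser.sections():
        if section.startswith("program:"):
            programs[section.split(":", 1)[1]] = dict(parser[section])
    return programs


def missing_command(programs):
    """List the programs that lack the mandatory 'command' option."""
    return [name for name, opts in programs.items() if "command" not in opts]


programs = load_programs(SAMPLE_INI)
print(programs["test"]["autorestart"])
print(missing_command(programs))
```

Feeding it a section that sets the same key twice raises configparser.DuplicateOptionError, which is exactly the kind of mistake worth catching before reloading supervisor.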

In /etc/supervisor/supervisord.conf, include the ini files from the conf.d directory:

;[group:thegroupname]
;programs=progname1,progname2  ; each refers to 'x' in [program:x] definitions
;priority=999                  ; the relative start priority (default 999)
; The [include] section can just contain the "files" setting.  This
; setting can list multiple files (separated by whitespace or
; newlines).  It can also contain wildcards.  The filenames are
; interpreted as relative to this file.  Included files *cannot*
; include files themselves.
[include]
files = conf.d/*.ini
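
The sample comment says the patterns are resolved relative to the main configuration file. The sketch below approximates that rule with Python's glob module, which helps to check which files a pattern would pick up. It is an illustration, not supervisord's own code, and included_files is a hypothetical helper name.

```python
import glob
import os


def included_files(main_conf, patterns):
    """Expand whitespace-separated patterns relative to main_conf's directory."""
    base = os.path.dirname(os.path.abspath(main_conf))
    matches = []
    for pattern in patterns.split():
        # Each pattern may contain wildcards such as conf.d/*.ini
        matches.extend(sorted(glob.glob(os.path.join(base, pattern))))
    return matches


if __name__ == "__main__":
    for path in included_files("/etc/supervisor/supervisord.conf", "conf.d/*.ini"):
        print(path)
```

A file in conf.d that does not end in .ini is ignored by this pattern, so a program definition saved as test.conf would never be loaded.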

Start supervisor

# supervisord -c /etc/supervisor/supervisord.conf

Check the status

# supervisorctl status
test                             RUNNING   pid 16237, uptime 0:00:12

Apply configuration changes (reload restarts supervisord together with the programs it manages)

# supervisorctl reload

For the complete list of options available for a supervised program, refer to this sample:

;[program:theprogramname]
;command=/bin/cat ; the program (relative uses PATH, can take args)
;process_name=%(program_name)s ; process_name expr (default %(program_name)s)
;numprocs=1 ; number of processes copies to start (def 1)
;directory=/tmp ; directory to cwd to before exec (def no cwd)
;umask=022 ; umask for process (default None)
;priority=999 ; the relative start priority (default 999)
;autostart=true ; start at supervisord start (default: true)
;startsecs=1 ; # of secs prog must stay up to be running (def. 1)
;startretries=3 ; max # of serial start failures when starting (default 3)
;autorestart=unexpected ; when to restart if exited after running (def: unexpected)
;exitcodes=0,2 ; 'expected' exit codes used with autorestart (default 0,2)
;stopsignal=QUIT ; signal used to kill process (default TERM)
;stopwaitsecs=10 ; max num secs to wait b4 SIGKILL (default 10)
;stopasgroup=false ; send stop signal to the UNIX process group (default false)
;killasgroup=false ; SIGKILL the UNIX process group (def false)
;user=chrism ; setuid to this UNIX account to run the program
;redirect_stderr=true ; redirect proc stderr to stdout (default false)
;stdout_logfile=/a/path ; stdout log path, NONE for none; default AUTO
;stdout_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)
;stdout_logfile_backups=10 ; # of stdout logfile backups (default 10)
;stdout_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)
;stdout_events_enabled=false ; emit events on stdout writes (default false)
;stderr_logfile=/a/path ; stderr log path, NONE for none; default AUTO
;stderr_logfile_maxbytes=1MB ; max # logfile bytes b4 rotation (default 50MB)
;stderr_logfile_backups=10 ; # of stderr logfile backups (default 10)
;stderr_capture_maxbytes=1MB ; number of bytes in 'capturemode' (default 0)
;stderr_events_enabled=false ; emit events on stderr writes (default false)
;environment=A="1",B="2" ; process environment additions (def no adds)
;serverurl=AUTO ; override serverurl computation (childutils)
 

3. Testing

# kill -9 `pgrep test`    // kill the process
# supervisorctl status
test                             RUNNING   pid 15709, uptime 0:00:08    // the pid has changed: the process was restarted automatically
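
The check above (still RUNNING, but under a new pid) can be automated. The sketch below parses the text printed by supervisorctl status and compares two snapshots. The regular expression only covers the line layout shown in this post, and parse_status and was_restarted are names invented for this example.

```python
import re

# Matches lines such as:
#   test                             RUNNING   pid 15709, uptime 0:00:08
STATUS_RE = re.compile(r"^(?P<name>\S+)\s+(?P<state>[A-Z]+)(?:\s+pid (?P<pid>\d+),)?")


def parse_status(output):
    """Return {name: (state, pid or None)} from 'supervisorctl status' text."""
    result = {}
    for line in output.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            pid = match.group("pid")
            result[match.group("name")] = (match.group("state"),
                                           int(pid) if pid else None)
    return result


def was_restarted(before, after, name):
    """True if name is RUNNING in both snapshots but under a different pid."""
    old, new = before.get(name), after.get(name)
    return bool(old and new
                and old[0] == new[0] == "RUNNING"
                and old[1] != new[1])
```

In practice the two snapshots could come from subprocess.run(["supervisorctl", "status"], capture_output=True, text=True).stdout, taken before and after the kill.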

4. Enabling the web monitoring interface

Edit /etc/supervisor/supervisord.conf:

[inet_http_server]         ; inet (TCP) server disabled by default
port=192.168.1.232:9001        ; (ip_address:port specifier, *:port forall iface)
username=user              ; (default is no username (openserver))
password=123               ; (default is no password (openserver))
# supervisorctl reload
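
The same HTTP server also exposes supervisor's XML-RPC interface under /RPC2, so process state can be queried from a script as well as from the browser. The sketch below assumes the [rpcinterface:supervisor] section from the sample configuration is still enabled and reuses the address and credentials shown above; rpc_url and process_states are hypothetical helper names.

```python
from urllib.parse import quote
from xmlrpc.client import ServerProxy


def rpc_url(host, port, username, password):
    """Build the XML-RPC endpoint URL with HTTP basic-auth credentials."""
    return "http://%s:%s@%s:%d/RPC2" % (
        quote(username, safe=""), quote(password, safe=""), host, port)


def process_states(url):
    """Return {process_name: state_name} as reported by supervisord."""
    server = ServerProxy(url)
    # supervisor.getAllProcessInfo() returns one dict per supervised process.
    return {info["name"]: info["statename"]
            for info in server.supervisor.getAllProcessInfo()}


# Example call (needs a reachable supervisord, so it is left commented out):
# print(process_states(rpc_url("192.168.1.232", 9001, "user", "123")))
```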

5. Supervising common processes

  1. Nginx: set "daemon off;" in nginx.conf so that nginx stays in the foreground.
  2. MySQL: use pidproxy to forward signals to the real mysqld process, see http://supervisord.org/subprocess.html#pidproxy-program. The general form is:
    [program:mysql]
    command=/path/to/pidproxy /path/to/pidfile /path/to/mysqld_safe

     A concrete example follows. supervisor does not run the command through a shell, so shell redirection and a trailing & must not be used:
    [program:mysql]
    command=/usr/bin/pidproxy /opt/mysql/data/mysqld.pid  /usr/local/mysql/bin/mysqld_safe --defaults-file=/etc/my.cnf   --basedir=/usr/local/mysql --datadir=/opt/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=root --log-error=/opt/mysql/data/mysqld.err --pid-file=/opt/mysql/data/mysqld.pid --socket=/tmp/mysql.sock --port=3306
    redirect_stderr = true

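6. Alerting on abnormal process events

If all you need is to restart crashed processes automatically and to manage processes from the command line, the sections above are enough. supervisor 3.0 introduced events, and with them we can monitor process state in real time and raise alerts.

First, a look at how supervisor delivers events. supervisor manages applications as child processes, and a listener program also runs as a child process whose stdin, stdout and stderr are redirected by supervisor. Events are handled as follows: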

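  • The child process acting as listener writes READY\n to stdout
  • When an event occurs, supervisor picks a listener in the READY state and writes the event to its stdin
  • After handling the event, the listener reports the outcome by writing RESULT 2\nOK or RESULT 4\nFAIL to stdout
  • The steps above repeat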
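
The handshake above can be tried without supervisord by running the loop on in-memory streams. The sketch below does that. serve_events and its handler callback are names invented for this example, and the payload length is counted in characters here, which only equals the byte count for ASCII payloads.

```python
import io


def serve_events(stdin, stdout, handler, max_events=None):
    """Run the READY / event / RESULT handshake on the given text streams."""
    handled = 0
    while max_events is None or handled < max_events:
        # Step 1: announce that we can accept an event.
        stdout.write("READY\n")
        stdout.flush()
        # Step 2: read the header line, then exactly 'len' characters of payload.
        header_line = stdin.readline()
        if not header_line:  # stdin was closed, nothing more to read
            break
        headers = dict(token.split(":", 1) for token in header_line.split())
        payload = stdin.read(int(headers["len"]))
        # Step 3: report the outcome of the handler.
        try:
            ok = bool(handler(headers, payload))
        except Exception:
            ok = False
        stdout.write("RESULT 2\nOK" if ok else "RESULT 4\nFAIL")
        stdout.flush()
        handled += 1
    return handled


if __name__ == "__main__":
    payload = "processname:test groupname:test from_state:RUNNING expected:0 pid:15709"
    header = ("ver:3.0 server:supervisor serial:1 pool:event_listener "
              "poolserial:1 eventname:PROCESS_STATE_EXITED len:%d\n" % len(payload))
    out = io.StringIO()
    serve_events(io.StringIO(header + payload), out, lambda h, p: True)
    print(repr(out.getvalue()))
```

Under supervisor, the same function would be called with sys.stdin and sys.stdout, and nothing except protocol messages may be written to stdout.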

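supervisor provides several event types, including process state events, supervisor state events, timer (tick) events, XML-RPC call events and process log events. We mainly care about the process state events. supervisor defines the following states for a process (STOPPED, STARTING, RUNNING, BACKOFF, STOPPING, EXITED, FATAL and UNKNOWN), and each state transition fires a corresponding event.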

[Figure: process state transition diagram]
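
Process state events are named PROCESS_STATE_ followed by the new state, and their payload is a line of key:value tokens. The sketch below turns such an event into an alert message. Which states count as abnormal is an assumption of this example, and ALERT_EVENTS, parse_payload and alert_message are illustrative names.

```python
# Events treated as abnormal in this sketch (an assumption, adjust as needed).
ALERT_EVENTS = {
    "PROCESS_STATE_EXITED",
    "PROCESS_STATE_FATAL",
    "PROCESS_STATE_BACKOFF",
}


def parse_payload(payload):
    """Split 'key:value key:value ...' into a dict."""
    return dict(token.split(":", 1) for token in payload.split())


def alert_message(eventname, payload):
    """Return an alert string for abnormal transitions, otherwise None."""
    if eventname not in ALERT_EVENTS:
        return None
    fields = parse_payload(payload)
    # expected:1 on an EXITED event means the exit code is listed in exitcodes,
    # so the exit is considered normal.
    if eventname == "PROCESS_STATE_EXITED" and fields.get("expected") == "1":
        return None
    state = eventname[len("PROCESS_STATE_"):]
    return "%s: %s (from %s)" % (fields.get("processname", "?"),
                                 state,
                                 fields.get("from_state", "?"))


if __name__ == "__main__":
    print(alert_message(
        "PROCESS_STATE_EXITED",
        "processname:test groupname:test from_state:RUNNING expected:0 pid:15709"))
```

The returned string could then be handed to whatever notification channel is in use, such as mail or a chat webhook.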

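The listener can be written in any language, as long as it handles event messages according to supervisor's protocol. Python is recommended, because supervisor ships a childutils module that makes writing a handler simpler.

There are also two centralized process management tools on GitHub, built on top of supervisor, that manage the processes of several machines from a single page: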

https://github.com/mlazarov/supervisord-monitor
https://github.com/TAKEALOT/nodervisor

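See the documentation: http://supervisord.org/events.html

An event listener process subscribes to specific event types and receives them as a supervisor subprocess. This lets users implement their own handling, such as alert notifications.

It is defined in the same way as a program, in an ini file such as conf.d/event_listener.ini: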

[eventlistener:event_listener]
command=/etc/supervisor/event.py
events=PROCESS_STATE
buffer_size=100
autostart=true
startsecs=1
autorestart=unexpected

The listener script, /etc/supervisor/event.py (the file must be executable, because the command runs it directly):

#!/usr/bin/env python
import sys

def write_stdout(s):
    # only eventlistener protocol messages may be sent to stdout
    sys.stdout.write(s)
    sys.stdout.flush()

def write_stderr(s):
    sys.stderr.write(s)
    sys.stderr.flush()

log_file = open('/tmp/event.log', 'a')

def main():
    while 1:
        # transition from ACKNOWLEDGED to READY
        write_stdout('READY\n')
        # read the header line and append it to the log file
        line = sys.stdin.readline()
        if not line:
            # stdin was closed, so supervisord is gone
            break
        log_file.write(line)
        # parse the "key:value" header tokens, then read the event payload
        headers = dict([x.split(':') for x in line.split()])
        data = sys.stdin.read(int(headers['len']))
        log_file.write(data + '\n')
        log_file.flush()
        # to react to one specific event, compare headers['eventname'] here,
        # e.g. with SUPERVISOR_STATE_CHANGE_STOPPING (the event type must
        # also be listed in events=)
        # transition from READY to ACKNOWLEDGED
        write_stdout('RESULT 2\nOK')

if __name__ == '__main__':
    main()

 

The header fields of the Event Notification Protocol are:

Key | Description | Example
ver | The event system protocol version | 3.0
server | The identifier of the supervisord sending the event (see the identifier value in the [supervisord] section of the config file) | supervisor
serial | An integer assigned to each event. No two events generated during the lifetime of a supervisord process will have the same serial number. The value is useful for functional testing and detecting event ordering anomalies. | 30
pool | The name of the event listener pool which generated this event | myeventpool
poolserial | An integer assigned to each event by the eventlistener pool which it is being sent from. No two events generated by the same eventlistener pool during the lifetime of a supervisord process will have the same poolserial number. This value can be used to detect event ordering anomalies. | 30
eventname | The specific event type name (see Event Types) | TICK_5
len | An integer indicating the number of bytes in the event payload, aka the PAYLOAD_LENGTH | 22

Example header line:

ver:3.0 server:supervisor serial:21 pool:listener poolserial:10 eventname:PROCESS_COMMUNICATION_STDOUT len:54
