Istio source code: pilot-agent source analysis (original)

This analysis is based on the Istio 1.1 source code; the logs and process listings, however, were captured on version 1.0.5.

Overall architecture

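The pilot code lives in the pilot repo and currently implements three commands:

  1. pilot-agent acts as the bridge between the API server and the proxy on each proxy node; it generates Envoy's initial configuration file and manages the Envoy lifecycle;
  2. pilot-discovery provides service discovery (cluster addresses) to the proxies;
  3. sidecar-injector provides the webhook service used for automatic sidecar injection;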

The diagram above shows the structure of an older version; newer versions may differ, but the overall architecture is similar.

The command-line arguments with which pilot-agent is started in the istio-proxy container are as follows:

$ ps -ef -www
istio-p+     1     0  0 Jan14 ?        00:01:00 /usr/local/bin/pilot-agent proxy sidecar --configPath /etc/istio/proxy --binaryPath /usr/local/bin/envoy --serviceCluster helloworld --drainDuration 45s --parentShutdownDuration 1m0s --discoveryAddress istio-pilot.istio-system:15007 --discoveryRefreshDelay 1s --zipkinAddress zipkin.istio-system:9411 --connectTimeout 10s --proxyAdminPort 15000 --controlPlaneAuthPolicy NONE

istio-p+   608     1  0 Jan15 ?        03:31:24 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev13.json --restart-epoch 13 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster helloworld --service-node sidecar~10.128.5.4~helloworld-v1-8f8dd85-cfz42.default~default.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warn --v2-config-only

The Envoy process is a child of pilot-agent; pilot-agent generates its configuration file, starts it and supervises it.
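Judging from the process listing above, the --service-node value appears to be four fields joined by "~": proxy type, pod IP, <pod name>.<namespace> and the DNS domain. A minimal Go sketch of that composition follows; the proxyRole type and serviceNode helper are illustrative names chosen here, not Istio's API.

```go
package main

import (
	"fmt"
	"strings"
)

// proxyRole holds the four pieces that make up the node identifier.
// The type and its field names are assumptions made for this sketch.
type proxyRole struct {
	Type      string // "sidecar", "router" or "ingress"
	IPAddress string // pod IP
	ID        string // <pod name>.<namespace>
	Domain    string // <namespace>.svc.cluster.local
}

// serviceNode joins the fields with "~", matching the shape of the
// --service-node argument seen in the ps output.
func serviceNode(r proxyRole) string {
	return strings.Join([]string{r.Type, r.IPAddress, r.ID, r.Domain}, "~")
}

func main() {
	role := proxyRole{
		Type:      "sidecar",
		IPAddress: "10.128.5.4",
		ID:        "helloworld-v1-8f8dd85-cfz42.default",
		Domain:    "default.svc.cluster.local",
	}
	fmt.Println(serviceNode(role))
}
```

With the values from the listing, the sketch prints the same string that is passed to Envoy as --service-node.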

pilot-agent command line

# /usr/local/bin/pilot-agent --help
Istio Pilot agent runs in the side car or gateway container and bootstraps Envoy.

Usage:
  pilot-agent [command]

Available Commands:
  help        Help about any command
  proxy       Envoy proxy agent
  request     Makes an HTTP request to the Envoy admin API
  version     Prints out build version information

Flags:
  -h, --help                          help for pilot-agent
      --log_as_json                   Whether to format output as JSON or in plain console-friendly format
      --log_caller string             Comma-separated list of scopes for which to include caller information, scopes can be any of [default, model]
      --log_output_level string       Comma-separated minimum per-scope logging level of messages to output, in the form of <scope>:<level>,<scope>:<level>,... where scope can be one of [default, model] and level can be one of [debug, info, warn, error, none] (default "default:info")
      --log_rotate string             The path for the optional rotating log file
      --log_rotate_max_age int        The maximum age in days of a log file beyond which the file is rotated (0 indicates no limit) (default 30)
      --log_rotate_max_backups int    The maximum number of log file backups to keep before older files are deleted (0 indicates no limit) (default 1000)
      --log_rotate_max_size int       The maximum size in megabytes of a log file beyond which the file is rotated (default 104857600)
      --log_stacktrace_level string   Comma-separated minimum per-scope logging level at which stack traces are captured, in the form of <scope>:<level>,<scope:level>,... where scope can be one of [default, model] and level can be one of [debug, info, warn, error, none] (default "default:none")
      --log_target stringArray        The set of paths where to output the log. This can be any path as well as the special values stdout and stderr (default [stdout])

Use "pilot-agent [command] --help" for more information about a command.

The help output of the proxy subcommand is as follows (common flags omitted):

$ /usr/local/bin/pilot-agent proxy --help
Envoy proxy agent

Usage:
  pilot-agent proxy [flags]

Flags:
      --applicationPorts stringSlice      Ports exposed by the application. Used to determine that Envoy is configured and ready to receive traffic.
      --availabilityZone string           Availability zone
      --binaryPath string                 Path to the proxy binary (default "/usr/local/bin/envoy")
      --bootstrapv2                       Use bootstrap v2 - DEPRECATED (default true)
      --concurrency int                   number of worker threads to run
      --configPath string                 Path to the generated configuration file directory (default "/etc/istio/proxy")
      --connectTimeout duration           Connection timeout used by Envoy for supporting services (default 1s)
      --controlPlaneAuthPolicy string     Control Plane Authentication Policy (default "NONE")
      --customConfigFile string           Path to the custom configuration file
      --disableInternalTelemetry          Disable internal telemetry
      --discoveryAddress string           Address of the discovery service exposing xDS (e.g. istio-pilot:8080) (default "istio-pilot:15007")
      --discoveryRefreshDelay duration    Polling interval for service discovery (used by EDS, CDS, LDS, but not RDS) (default 1s)
      --domain string                     DNS domain suffix. If not provided uses ${POD_NAMESPACE}.svc.cluster.local
      --drainDuration duration            The time in seconds that Envoy will drain connections during a hot restart (default 2s)
  -h, --help                              help for proxy
      --id string                         Proxy unique ID. If not provided uses ${POD_NAME}.${POD_NAMESPACE} from environment variables
      --ip string                         Proxy IP address. If not provided uses ${INSTANCE_IP} environment variable.
      --parentShutdownDuration duration   The time in seconds that Envoy will wait before shutting down the parent process during a hot restart (default 3s)
      --proxyAdminPort uint16             Port on which Envoy should listen for administrative commands (default 15000)
      --proxyLogLevel string              The log level used to start the Envoy proxy (choose from {trace, debug, info, warn, err, critical, off}) (default "warn")
      --serviceCluster string             Service cluster (default "istio-proxy")
      --serviceregistry string            Select the platform for service registry, options are {Kubernetes, Consul, CloudFoundry, Mock, Config} (default "Kubernetes")
      --statsdUdpAddress string           IP Address and Port of a statsd UDP listener (e.g. 10.75.241.127:9125)
      --statusPort uint16                 HTTP Port on which to serve pilot agent status. If zero, agent status will not be provided.
      --templateFile string               Go template bootstrap config
      --zipkinAddress string              Address of the Zipkin service (e.g. zipkin:9411)

When started as /usr/local/bin/pilot-agent proxy, one of the following proxy types can follow:

  • pilot-agent proxy sidecar

  • pilot-agent proxy router, used for the Istio gateways

    # ingressgateway or egressgateway
    $ ps -ef -www
    UID        PID  PPID  C STIME TTY          TIME CMD
    root         1     0  0 Jan15 ?        00:00:52 /usr/local/bin/pilot-agent proxy router -v 2 --discoveryRefreshDelay 1s --drainDuration 45s --parentShutdownDuration 1m0s --connectTimeout 10s --serviceCluster istio-ingressgateway --zipkinAddress zipkin:9411 --proxyAdminPort 15000 --controlPlaneAuthPolicy NONE --discoveryAddress istio-pilot:8080
    root        64     1  0 Jan15 ?        03:37:59 /usr/local/bin/envoy -c /etc/istio/proxy/envoy-rev1.json --restart-epoch 1 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster istio-ingressgateway --service-node router~10.128.45.4~istio-ingressgateway-78c6d8b8d7-sxvpx.istio-system~istio-system.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warn --v2-config-only
    
  • pilot-agent proxy ingress, used only in ingress mode

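pilot-agent code flow analysis

pilot-agent needs to watch the relevant certificates.

  • sidecar mode: watches the three files cert-chain.pem, key.pem and root-cert.pem under /etc/certs/;
  • ingress mode: watches the two files tls.crt and tls.key under /etc/istio/ingress-certs/;

The first log lines printed by pilot-agent at startup show the effective configuration and the monitored certificates: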

$ kubectl logs helloworld-v1-8f8dd85-cfz42 -c istio-proxy |more
2019-01-14T06:35:21.726068Z info    Version root@6f6ea1061f2b-docker.io/istio-1.0.5-c1707e45e71c75d74bf3a5dec8c7086f32f32fad-Clean
2019-01-14T06:35:21.726179Z info    Proxy role: model.Proxy{ClusterID:"", Type:"sidecar", IPAddress:"10.128.5.4", ID:"helloworld-v1-8f8dd85-cfz42.default", Domain:"default.svc.cluster.local", Metadata:map[string]string(nil)}
2019-01-14T06:35:21.726865Z info    Effective config: binaryPath: /usr/local/bin/envoy
configPath: /etc/istio/proxy
connectTimeout: 10s
discoveryAddress: istio-pilot.istio-system:15007
discoveryRefreshDelay: 1s
drainDuration: 45s
parentShutdownDuration: 60s
proxyAdminPort: 15000
serviceCluster: helloworld
zipkinAddress: zipkin.istio-system:9411

2019-01-14T06:35:21.726902Z info    Monitored certs: []envoy.CertSource{envoy.CertSource{Directory:"/etc/certs/", Files:[]string{"cert-chain.pem", "key.pem", "root-cert.pem"}}}
2019-01-14T06:35:21.727115Z info    Starting proxy agent
2019-01-14T06:35:21.728269Z info    Received new config, resetting budget
2019-01-14T06:35:21.728587Z info    Reconciling configuration (budget 10)
2019-01-14T06:35:21.728626Z info    Epoch 0 starting
2019-01-14T06:35:21.729334Z info    Envoy command: [-c /etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster helloworld --service-node sidecar~10.128.5.4~helloworld-v1-8f8dd85-cfz42.default~default.svc.cluster.local --max-obj-name-len 189 --allow-unknown-fields -l warn --v2-config-only]

This article mainly analyzes the logic of the proxy sidecar mode:

[Figure: pilot-agent main logic in proxy sidecar mode, including Envoy hot restart]

Main function entry point:

istio.io/istio/pilot/cmd/pilot-agent/main.go

func main() {
    if err := rootCmd.Execute(); err != nil {
        log.Errora(err)
        os.Exit(-1)
    }
}

Because the command line is pilot-agent proxy sidecar, the entry point that finally runs is the proxy subcommand:

    proxyCmd = &cobra.Command{
        Use:   "proxy",
        Short: "Envoy proxy agent",
        RunE: func(c *cobra.Command, args []string) error {
            // ...
            // Populate the proxy configuration with its default values
            proxyConfig := model.DefaultProxyConfig()

            // set all flags
            proxyConfig.CustomConfigFile = customConfigFile
            proxyConfig.ConfigPath = configPath
            proxyConfig.BinaryPath = binaryPath
            proxyConfig.ServiceCluster = serviceCluster
            proxyConfig.DrainDuration = types.DurationProto(drainDuration)
            proxyConfig.ParentShutdownDuration = types.DurationProto(parentShutdownDuration)
            proxyConfig.DiscoveryAddress = discoveryAddress
            proxyConfig.ConnectTimeout = types.DurationProto(connectTimeout)
            proxyConfig.StatsdUdpAddress = statsdUDPAddress
            proxyConfig.ProxyAdminPort = int32(proxyAdminPort)
            proxyConfig.Concurrency = int32(concurrency)


             // ...
            // 1. Start the status server
            // If a status port was provided, start handling status probes.
            if statusPort > 0 {
                parsedPorts, err := parseApplicationPorts()
                if err != nil {
                    return err
                }

                statusServer := status.NewServer(status.Config{
                    AdminPort:        proxyAdminPort,
                    StatusPort:       statusPort,
                    ApplicationPorts: parsedPorts,
                })
                go statusServer.Run(ctx)
            }

            // Initialize the envoyProxy object
            envoyProxy := envoy.NewProxy(proxyConfig, role.ServiceNode(), proxyLogLevel, pilotSAN, role.IPAddresses)

            agent := proxy.NewAgent(envoyProxy, proxy.DefaultRetry)
            watcher := envoy.NewWatcher(certs, agent.ConfigCh())

            // 2. Start the agent
            go agent.Run(ctx)

            // 3. Start the watcher
            go watcher.Run(ctx)

            // 4. The main goroutine waits for a termination signal
            stop := make(chan struct{})
            cmd.WaitSignal(stop)
            <-stop
            return nil
        }
    }

status server

If statusPort is set, the statusServer is started.

  • Readiness check: the path is /healthz/ready. Together with the configured applicationPorts, it queries Envoy's admin port to decide whether Envoy is ready to receive traffic on those ports.

--statusPort uint16 HTTP Port on which to serve pilot agent status. If zero, agent status will not be provided.

--applicationPorts stringSlice Ports exposed by the application. Used to determine that Envoy is configured and ready to receive traffic.

The check fetches all ports Envoy is currently listening on through the local admin port, e.g. http://127.0.0.1:15000/listeners, and then looks up each configured applicationPorts entry among them to decide whether Envoy is ready.

  • Application port check

    The check path is /url, with the target port given in the istio-app-probe-port header. The url from the request path is used for the probe, so the request finally issued is http://127.0.0.1:istio-app-probe-port/url; all header parameters are also forwarded to the probed service port;
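The readiness check described above boils down to a set-membership test. Here is a minimal Go sketch, assuming the listener list has already been fetched from the admin endpoint and is available as "host:port" strings; the function names and that input format are assumptions of this sketch, not the status server's actual code.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// listeningPorts extracts the port from each "host:port" listener address.
// Entries that do not parse are skipped.
func listeningPorts(listeners []string) map[uint16]bool {
	ports := make(map[uint16]bool)
	for _, l := range listeners {
		_, p, err := net.SplitHostPort(l)
		if err != nil {
			continue
		}
		n, err := strconv.ParseUint(p, 10, 16)
		if err != nil {
			continue
		}
		ports[uint16(n)] = true
	}
	return ports
}

// missingPorts returns the application ports Envoy is not yet listening on.
// An empty result means the proxy can be considered ready.
func missingPorts(listeners []string, appPorts []uint16) []uint16 {
	have := listeningPorts(listeners)
	var missing []uint16
	for _, p := range appPorts {
		if !have[p] {
			missing = append(missing, p)
		}
	}
	return missing
}

func main() {
	listeners := []string{"10.128.5.4:5000", "0.0.0.0:15001"}
	fmt.Println("ready for 5000:", len(missingPorts(listeners, []uint16{5000})) == 0)
	fmt.Println("missing:", missingPorts(listeners, []uint16{5000, 8080}))
}
```

Returning the missing ports rather than a plain boolean makes it easy to report which application port is still unconfigured.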

agent

The agent code lives in istio.io/istio/pilot/pkg/proxy/agent.go.

Interface definition:

type Agent interface {
    // ConfigCh returns the config channel used to send configuration updates.
    // Agent compares the current active configuration to the desired state and
    // initiates a restart if necessary. If the restart fails, the agent attempts
    // to retry with an exponential back-off.
    ConfigCh() chan<- interface{}

    // Run starts the agent control loop and awaits for a signal on the input
    // channel to exit the loop.
    Run(ctx context.Context)
}

The agent struct is defined as:

type agent struct {
    // proxy commands
    proxy Proxy

    // Retry state for Envoy restarts: restart time, budget, MaxRetries and InitialInterval.
    // A process that exits abnormally is normally retried 10 times with a back-off in between;
    // if it still fails after 10 attempts, the proxy container exits and a new one is started
    // retry configuration
    retry Retry

    // The desired configuration; currently the sha256 digest of the certificates
    // desired configuration state
    desiredConfig interface{}

    // The certificate sha256 digest associated with every active epoch
    // active epochs and their configurations
    epochs map[int]interface{}

    // The configuration currently in use; the sha256 digest of the certificates
    // current configuration is the highest epoch configuration
    currentConfig interface{}

    // Channel on which the watcher reports certificate changes
    // channel for posting desired configurations
    configCh chan interface{}

    // Channel used to supervise Envoy
    // channel for proxy exit notifications
    statusCh chan exitStatus

    // The abortCh channel for each epoch; at most 10 proxies may be restarting at the same time
    // channel for aborting running instances
    abortCh map[int]chan error
}

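External control of the agent struct is done mainly through channels:

  • configCh receives notifications of configuration changes. Currently the watcher goroutine watches the relevant certificates; when a certificate changes (notifications are delivered at most once per minimum delay, currently 10s), configCh receives the sha256 digest sent by the watcher;

  • statusCh is the status channel used after Envoy has been started, for monitoring the state of the Envoy process;

  • the proxy object does the main work of managing Envoy; it is initialized in the proxyCmd function:

    envoyProxy := envoy.NewProxy(proxyConfig, role.ServiceNode(), proxyLogLevel, pilotSAN, role.IPAddresses)
    

The Proxy interface is defined as follows: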

// Proxy defines command interface for a proxy
type Proxy interface {
    // Run command for a config, epoch, and abort channel
    Run(interface{}, int, <-chan error) error

    // Cleanup command for an epoch
    Cleanup(int)

    // Panic command is invoked with the desired config when all retries to
    // start the proxy fail just before the agent terminating
    Panic(interface{})
}

The main entry point of the agent is:

istio.io/istio/pilot/pkg/proxy/agent.go

func (a *agent) Run(ctx context.Context) {
    log.Info("Starting proxy agent")

    // Throttle processing up to smoothed 1 qps with bursts up to 10 qps.
    // High QPS is needed to process messages on all channels.
    rateLimiter := rate.NewLimiter(1, 10)

    var reconcileTimer *time.Timer
    for {
        err := rateLimiter.Wait(ctx)
        if err != nil {
            a.terminate()
            return
        }

        // maximum duration or duration till next restart
        var delay time.Duration = 1<<63 - 1
        if a.retry.restart != nil {
            // If a restart has been scheduled, wait only until that time
            delay = time.Until(*a.retry.restart)
        }

        // Stop the previous reconcileTimer and create a new one with the current delay
        if reconcileTimer != nil {
            reconcileTimer.Stop()
        }
        reconcileTimer = time.NewTimer(delay)

        select {
        // 1. A config was received: reconcile if it differs from the desired config, otherwise ignore it
        case config := <-a.configCh:
            if !reflect.DeepEqual(a.desiredConfig, config) {
                log.Infof("Received new config, resetting budget")
                a.desiredConfig = config

                // reset retry budget if and only if the desired config changes
                a.retry.budget = a.retry.MaxRetries
                a.reconcile()
            }

        // The default retry policy is
        /*
        DefaultRetry = Retry{
            MaxRetries:      10,
            InitialInterval: 200 * time.Millisecond,
        }*/

        // 2. The status of the Envoy proxy process changed
        case status := <-a.statusCh:
            // delete epoch record and update current config
            // avoid self-aborting on non-abort error
            delete(a.epochs, status.epoch)
            delete(a.abortCh, status.epoch)
            a.currentConfig = a.epochs[a.latestEpoch()]

            // errAbort means the epoch was cancelled deliberately, e.g. a.terminate() called after <-ctx.Done()
            if status.err == errAbort {
                log.Infof("Epoch %d aborted", status.epoch)
            } else if status.err != nil { // abnormal exit
                log.Warnf("Epoch %d terminated with an error: %v", status.epoch, status.err)

                // NOTE: due to Envoy hot restart race conditions, an error from the
                // process requires aggressive non-graceful restarts by killing all
                // existing proxy instances
                a.abortAll()
            } else {
                // normal exit
                log.Infof("Epoch %d exited normally", status.epoch)
            }

            // cleanup for the epoch
            // Remove the configuration file generated for this epoch
            a.proxy.Cleanup(status.epoch)

            // Schedule a retry after an error. The current config may be stale
            // because the proxy might already have been aborted; it changes on
            // abort, so a retry cannot make progress before the abort completes.
            // If no restart has been scheduled yet, schedule one.
            if status.err != nil {
                // skip retrying twice by checking retry restart delay
                // a.retry.restart == nil means no restart has been scheduled yet
                if a.retry.restart == nil {
                    if a.retry.budget > 0 {
                        delayDuration := a.retry.InitialInterval * (1 << uint(a.retry.MaxRetries-a.retry.budget))
                        restart := time.Now().Add(delayDuration)
                        a.retry.restart = &restart
                        a.retry.budget = a.retry.budget - 1
                        log.Infof("Epoch %d: set retry delay to %v, budget to %d", status.epoch, delayDuration, a.retry.budget)
                    } else {
                        // All retry attempts are used up and the proxy still cannot start: exit the container
                        log.Error("Permanent error: budget exhausted trying to fulfill the desired configuration")
                        a.proxy.Panic(status.epoch)
                        return
                    }
                } else { // a restart is already scheduled
                    log.Debugf("Epoch %d: restart already scheduled", status.epoch)
                }
            }

        // 3. The reconcileTimer fired
        case <-reconcileTimer.C:
            a.reconcile()

        case _, more := <-ctx.Done():
            if !more { // the channel was closed
                a.terminate()
                return
            }
        }
    }
}

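Simplified, this becomes:

for {
    // Set the reconcileTimer according to the current retry policy
    select {

    // If the received config differs from the desired config, call a.reconcile(); otherwise ignore it
    case config := <-a.configCh:
        a.reconcile()

    // Inspect the exit status: a deliberate errAbort, a normal exit, or an abnormal exit.
    // For an abnormal exit, schedule the next restart according to the retry policy;
    // a later loop iteration then arms reconcileTimer with that time
    case status := <-a.statusCh:
        // handle unexpected errors
        // if the retry budget is exhausted, leave the loop
        // otherwise, set the time of the next restart

    // The scheduled restart time has arrived
    case <-reconcileTimer.C:
        a.reconcile()

    // On cancellation, terminate everything
    case _, more := <-ctx.Done():
        if !more {
            a.terminate()
            return
        }
    }
}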

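After a configuration change, or once a restart has been scheduled following an abnormal exit, the function that is finally called is reconcile: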

func (a *agent) reconcile() {
    // cancel any scheduled restart
    a.retry.restart = nil

    log.Infof("Reconciling retry (budget %d)", a.retry.budget)

    // check that the config is current
    if reflect.DeepEqual(a.desiredConfig, a.currentConfig) {
        log.Infof("Desired configuration is already applied")
        return
    }

    // discover and increment the latest running epoch
    epoch := a.latestEpoch() + 1
    // buffer aborts to prevent blocking on failing proxy
    abortCh := make(chan error, maxAborts)
    a.epochs[epoch] = a.desiredConfig
    a.abortCh[epoch] = abortCh
    a.currentConfig = a.desiredConfig

    // Start the proxy in a new goroutine; abortCh lets the agent abort it, and its exit status is reported on statusCh
    go a.runWait(a.desiredConfig, epoch, abortCh)
}

After setting these fields, reconcile starts a new goroutine that launches the proxy:

// runWait runs the start-up command as a go routine and waits for it to finish
func (a *agent) runWait(config interface{}, epoch int, abortCh <-chan error) {
    log.Infof("Epoch %d starting", epoch)
    err := a.proxy.Run(config, epoch, abortCh)
    a.statusCh <- exitStatus{epoch: epoch, err: err}
}

runWait calls a.proxy.Run(config, epoch, abortCh) and puts the returned error value on the agent's statusCh channel.
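reconcile relies on latestEpoch(), which is not shown in the listings. A plausible minimal sketch follows, assuming it returns the highest key of the epochs map and -1 when the map is empty; that assumption is consistent with the "Epoch 0 starting" log line, but the body below is written for illustration.

```go
package main

import "fmt"

// latestEpoch returns the highest active epoch, or -1 when none is running,
// so that the first epoch started by reconcile is 0.
func latestEpoch(epochs map[int]interface{}) int {
	latest := -1
	for e := range epochs {
		if e > latest {
			latest = e
		}
	}
	return latest
}

func main() {
	epochs := map[int]interface{}{}

	// First config received: nothing is running yet.
	next := latestEpoch(epochs) + 1
	epochs[next] = "digest-v1"
	fmt.Println("first epoch:", next)

	// Certificates changed: a new epoch starts next to the old one.
	next = latestEpoch(epochs) + 1
	epochs[next] = "digest-v2"
	fmt.Println("epoch after cert change:", next)

	// The old epoch exits after draining and is removed from the map.
	delete(epochs, 0)
	fmt.Println("latest after epoch 0 exits:", latestEpoch(epochs))
}
```

Because the next epoch is always derived from the highest active one, epoch numbers keep increasing for as long as at least one Envoy instance is still running.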

The Envoy startup process can be summarized by looking at proxy.Run:

The envoy struct is defined as follows:

type envoy struct {
    config    meshconfig.ProxyConfig  // proxy configuration
    node      string
    extraArgs []string
    pilotSAN  []string
    opts      map[string]interface{}
    errChan   chan error
    nodeIPs   []string
}

istio.io/istio/pilot/pkg/proxy/envoy/proxy.go

func (e *envoy) Run(config interface{}, epoch int, abort <-chan error) error {
    var fname string
    // Note: the cert checking still works, the generated file is updated if certs are changed.
    // We just don't save the generated file, but use a custom one instead. Pilot will keep
    // monitoring the certs and restart if the content of the certs changes.
    // 1. If a custom config file is specified, use it; otherwise generate the default bootstrap config
    if len(e.config.CustomConfigFile) > 0 {
        // there is a custom configuration. Don't write our own config - but keep watching the certs.
        fname = e.config.CustomConfigFile
    } else {
        out, err := bootstrap.WriteBootstrap(&e.config, e.node, epoch, e.pilotSAN, e.opts, os.Environ(), e.nodeIPs)
        if err != nil {
            log.Errora("Failed to generate bootstrap config", err)
            os.Exit(1) // Prevent infinite loop attempting to write the file, let k8s/systemd report
            return err
        }
        fname = out
    }

    // spin up a new Envoy process
    args := e.args(fname, epoch)
    log.Infof("Envoy command: %v", args)

    /* #nosec */
    cmd := exec.Command(e.config.BinaryPath, args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Start(); err != nil {
        return err
    }

    // Set if the caller is monitoring Envoy, for example in tests or if Envoy runs in same
    // container with the app.
    if e.errChan != nil {
        // Caller passed a channel, will wait itself for termination
        go func() {
            e.errChan <- cmd.Wait()
        }()
        return nil
    }

    // The done channel delivers the final exit status of the envoy process
    done := make(chan error, 1)
    go func() {
        done <- cmd.Wait()
    }()

    // Wait on the abort and done channels: either kill Envoy or return its exit status
    select {
    case err := <-abort:
        log.Warnf("Aborting epoch %d", epoch)
        if errKill := cmd.Process.Kill(); errKill != nil {
            log.Warnf("killing epoch %d caused an error %v", epoch, errKill)
        }
        return err
    case err := <-done:
        return err
    }
}

bootstrap.WriteBootstrap generates the initial configuration file used by Envoy. The default template file path is /var/lib/istio/envoy/envoy_bootstrap_tmpl.json; a different file can be passed in with the templateFile parameter.

// istio.io/istio/pkg/bootstrap/bootstrap_config.go
DefaultCfgDir     = "/var/lib/istio/envoy/envoy_bootstrap_tmpl.json"

For the full content see file-envoy_bootstrap_tmpl-json

envoy_bootstrap_tmpl.json + meshconfig.ProxyConfig + epoch = the Envoy bootstrap file. The file name has the format "envoy-rev%d.json", where %d is replaced by the epoch value.

Once the Envoy bootstrap configuration has been written, the Envoy startup arguments are built; the main function is:

func (e *envoy) args(fname string, epoch int) []string {
    startupArgs := []string{"-c", fname,
        "--restart-epoch", fmt.Sprint(epoch),
        "--drain-time-s", fmt.Sprint(int(convertDuration(e.config.DrainDuration) / time.Second)),
        "--parent-shutdown-time-s", fmt.Sprint(int(convertDuration(e.config.ParentShutdownDuration) / time.Second)),
        "--service-cluster", e.config.ServiceCluster,
        "--service-node", e.node,
        "--max-obj-name-len", fmt.Sprint(e.config.StatNameLength),
        "--allow-unknown-fields",
    }

    startupArgs = append(startupArgs, e.extraArgs...)

    if e.config.Concurrency > 0 {
        startupArgs = append(startupArgs, "--concurrency", fmt.Sprint(e.config.Concurrency))
    }

    return startupArgs
}

At this point the Envoy configuration file and command line are complete; the process is then started with exec.Command.

This concludes the agent's startup and management flow for Envoy.

watcher

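The watcher logic is relatively simple: it watches the relevant certificates and, when they change (notifications are delivered at most once per minimum delay, 10s), sends the agent the sha256 digest of the current configuration. If the agent finds that the configuration has changed, it restarts Envoy and increments the epoch.

istio.io/istio/pilot/pkg/proxy/envoy/watcher.go

func (w *watcher) Run(ctx context.Context) {
    // kick start the proxy with partial state (in case there are no notifications coming)
    w.SendConfig()  // sends the digest to the agent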

    // monitor certificates
    certDirs := make([]string, 0, len(w.certs))
    for _, cert := range w.certs {
        certDirs = append(certDirs, cert.Directory)
    }

    go watchCerts(ctx, certDirs, watchFileEvents, defaultMinDelay, w.SendConfig)

    <-ctx.Done()
}

The body of watchCerts:

// watchCerts watches all certificate directories and calls the provided
// `updateFunc` method when changes are detected. This method is blocking
// so it should be run as a goroutine.
// updateFunc will not be called more than one time per minDelay.
func watchCerts(ctx context.Context, certsDirs []string, watchFileEventsFn watchFileEventsFn,
    minDelay time.Duration, updateFunc func()) {
    fw, err := fsnotify.NewWatcher()
    if err != nil {
        log.Warnf("failed to create a watcher for certificate files: %v", err)
        return
    }
    defer func() {
        if err := fw.Close(); err != nil {
            log.Warnf("closing watcher encounters an error %v", err)
        }
    }()

    // watch all directories
    for _, d := range certsDirs {
        if err := fw.Watch(d); err != nil {
            log.Warnf("watching %s encounters an error %v", d, err)
            return
        }
    }
    watchFileEventsFn(ctx, fw.Event, minDelay, updateFunc)
}

The watched certificates come from a secret in the namespace that is mounted into the pod at injection time. The deployment file is as follows:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  creationTimestamp: null
  name: helloworld-v2
spec:
  replicas: 1
  strategy: {}
  template:
    metadata:
      labels:
        app: helloworld
        version: v2
    spec:
      containers:
      - image: istio/examples-helloworld-v2
        imagePullPolicy: IfNotPresent
        name: helloworld
        ports:
        - containerPort: 5000
        resources:
          requests:
            cpu: 100m
      - args:
        - proxy
        - sidecar
        - --configPath
        - /etc/istio/proxy
        - --binaryPath
        - /usr/local/bin/envoy
        - --serviceCluster
        - helloworld
        - --drainDuration
        - 45s
        - --parentShutdownDuration
        - 1m0s
        - --discoveryAddress
        - istio-pilot.istio-system:15007
        - --discoveryRefreshDelay
        - 1s
        - --zipkinAddress
        - zipkin.istio-system:9411
        - --connectTimeout
        - 10s
        - --proxyAdminPort
        - "15000"
        - --controlPlaneAuthPolicy
        - NONE
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: INSTANCE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: ISTIO_META_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ISTIO_META_INTERCEPTION_MODE
          value: REDIRECT
        - name: ISTIO_METAJSON_LABELS
          value: |
            {"app":"helloworld","version":"v2"}
        image: docker.io/istio/proxyv2:1.0.5
        imagePullPolicy: IfNotPresent
        name: istio-proxy
        ports:
        - containerPort: 15090
          name: http-envoy-prom
          protocol: TCP
        resources:
          requests:
            cpu: 10m
        securityContext:
          readOnlyRootFilesystem: true
          runAsUser: 1337
        volumeMounts:
        - mountPath: /etc/istio/proxy
          name: istio-envoy
        - mountPath: /etc/certs/   # certificate mount directory, watched by the watcher
          name: istio-certs
          readOnly: true
      initContainers:
      - args:
        - -p
        - "15001"
        - -u
        - "1337"
        - -m
        - REDIRECT
        - -i
        - '*'
        - -x
        - ""
        - -b
        - "5000"
        - -d
        - ""
        image: docker.io/istio/proxy_init:1.0.5
        imagePullPolicy: IfNotPresent
        name: istio-init
        resources: {}
        securityContext:
          capabilities:
            add:
            - NET_ADMIN
          privileged: true
      volumes:
      - emptyDir:
          medium: Memory
        name: istio-envoy
      - name: istio-certs    # comes from the istio.default secret
        secret:
          optional: true
          secretName: istio.default

Verify with kubectl:

# kubectl get secrets istio.default
NAME            TYPE                    DATA   AGE
istio.default   istio.io/key-and-cert   3      17d

# kubectl get secrets istio.default -o yaml
apiVersion: v1
data:
  cert-chain.pem: xxxx
  key.pem: xxxx==
  root-cert.pem: xxxx==
kind: Secret
metadata:
  annotations:
    istio.io/service-account.name: default
  creationTimestamp: 2019-01-15T08:24:26Z
  name: istio.default
  namespace: default
  resourceVersion: "16685160"
  selfLink: /api/v1/namespaces/default/secrets/istio.default
  uid: f863ce6f-189e-11e9-ab53-00163e0c1552
type: istio.io/key-and-cert

Certificates are generated and managed by Citadel; see Keys and Certificates for details. The following commands show the certificate details:

# jq is a command-line tool for processing JSON, see
# https://www.ibm.com/developerworks/cn/linux/1612_chengg_jq/index.html
$ sudo yum install -y jq  

$ kubectl get secret -o json istio.default | jq -r '.data["cert-chain.pem"]' | base64 --decode | openssl x509 -noout -text

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            07:d1:25:13:1c:38:db:ef:89:8e:95:2e:d1:6c:b5:4d
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: O=k8s.cluster.local
        Validity
            Not Before: Jan 15 08:24:26 2019 GMT
            Not After : Apr 15 08:24:26 2019 GMT
        Subject: O=
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
                Public-Key: (2048 bit)
                Modulus:
                    00:d1:13:34:58:e7:b0:6e:b5:07:0e:bd:7f:d5:a0:
                    66:d9:4a:2a:6d:ec:bd:26:ab:22:26:31:c7:c9:48:
                    da:57:9a:5a:91:b3:7c:78:2c:c4:8d:14:4b:b1:b4:
                    a4:29:3d:26:d1:ad:8d:6e:6f:b0:27:64:31:93:cf:
                    43:be:f4:04:0a:d2:0f:e6:dc:45:4a:5d:38:65:c0:
                    08:44:25:5f:e8:2d:c3:2a:9a:6b:82:bb:27:81:59:
                    c7:f5:38:66:b1:f2:06:eb:94:46:34:47:c5:b1:9d:
                    01:59:04:e7:8e:df:bf:ed:17:f8:16:06:9d:85:c0:
                    e9:43:0f:3a:a0:b6:b9:64:50:3e:e1:26:e3:03:d4:
                    dc:43:08:ef:de:af:56:5c:d5:6c:c1:72:72:7b:f5:
                    05:f8:09:15:08:2d:f3:5b:c7:57:7d:1a:15:72:90:
                    3e:df:0a:e6:a1:e3:d9:81:9f:bb:9c:f2:c5:da:7f:
                    48:a6:4d:12:f7:5e:e8:21:99:1d:f0:95:d7:c5:1a:
                    34:d5:a7:56:79:4b:dd:82:a1:39:cc:d5:0b:e3:fa:
                    92:04:21:38:89:41:7c:9b:12:aa:c4:5f:93:c1:1d:
                    fc:bc:5a:6d:e5:3d:98:e1:9c:28:38:58:75:c6:e3:
                    27:5f:77:4b:10:b6:53:70:7b:25:fd:5c:44:28:67:
                    54:51
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication, TLS Web Client Authentication
            X509v3 Basic Constraints: critical
                CA:FALSE
            X509v3 Subject Alternative Name:
                URI:spiffe://cluster.local/ns/default/sa/default
    Signature Algorithm: sha256WithRSAEncryption
         68:fb:87:12:c6:d1:fb:c1:69:fa:ec:2e:30:ea:f7:4d:8f:9c:
         5b:54:a1:f9:a3:5f:ff:83:3f:76:c5:d6:9c:2b:cb:55:6a:66:
         49:b5:a2:bd:dd:71:06:7f:a4:f8:18:cd:0d:7a:3f:bf:4c:56:
         e0:35:5b:68:2d:71:72:e3:a2:7d:b9:90:f3:86:d0:1a:87:f6:
         31:7e:24:db:00:a8:69:df:54:7f:8b:b0:1d:7d:02:03:c0:26:
         1a:87:53:aa:e4:66:76:e9:80:1e:28:61:53:0c:53:c6:1e:80:
         7e:b0:2d:71:75:1b:42:9f:51:f4:5c:7b:53:ee:06:02:31:5d:
         71:2d:b5:4a:dd:58:25:a6:c4:24:ad:19:86:ac:24:87:99:ad:
         6b:be:c8:ae:84:7e:7d:86:0a:1d:44:a0:50:62:2d:8b:d6:79:
         ff:db:43:40:de:3b:ec:4e:2b:80:87:e5:1a:cb:1e:cf:e6:12:
         8e:96:10:46:f7:fa:ed:1c:bb:0b:11:35:41:c8:69:43:64:79:
         44:ea:a8:72:b7:27:2a:0d:a6:39:bb:34:b0:8b:e3:86:ba:3c:
         1e:b9:ee:b5:61:bb:c8:65:b0:8d:bd:a1:9c:29:64:7d:0b:2c:
         f9:9b:34:18:98:38:24:ad:85:b8:1e:59:41:09:1f:2e:a8:6d:
         ef:ee:0d:a8

In particular, the Subject Alternative Name field should be URI:spiffe://cluster.local/ns/default/sa/default
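The SAN value above encodes the workload identity. As a small illustration, the following Go sketch splits a URI of the shape spiffe://<trust domain>/ns/<namespace>/sa/<service account>; parseSpiffe is a helper written for this example and only handles that one shape.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseSpiffe splits a URI of the form
// spiffe://<trust domain>/ns/<namespace>/sa/<service account>.
func parseSpiffe(raw string) (trustDomain, namespace, serviceAccount string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if u.Scheme != "spiffe" || len(parts) != 4 || parts[0] != "ns" || parts[2] != "sa" {
		return "", "", "", fmt.Errorf("unexpected SPIFFE URI: %q", raw)
	}
	return u.Host, parts[1], parts[3], nil
}

func main() {
	td, ns, sa, err := parseSpiffe("spiffe://cluster.local/ns/default/sa/default")
	if err != nil {
		panic(err)
	}
	fmt.Printf("trust domain=%s namespace=%s service account=%s\n", td, ns, sa)
}
```

For the certificate shown above this yields the trust domain cluster.local, the namespace default and the service account default, matching the istio.io/service-account.name annotation on the secret.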

Envoy hot restart

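From the code analysis above, pilot-agent essentially prepares Envoy's initial configuration file and manages the Envoy lifecycle (initial start, restart after abnormal exit, termination). It does not take part in the hot restart procedure itself, which Envoy manages on its own.

There are currently two main ways to restart a service without downtime:

  1. Rolling deploy: run multiple service instances and migrate gradually by shifting traffic; this needs support from the infrastructure;
  2. Hot restart deploy: start multiple instances on the same host or in the same container, and let the service itself do the work;

Approach 1 offers safer, more controllable traffic migration and copes with complex situations, but it places higher demands on the infrastructure. Envoy uses approach 2, mainly because approach 1 is heavyweight in its infrastructure requirements and because Lyft was not containerized internally; the lightweight approach 2 makes Envoy's hot restart applicable in more scenarios.

The goals of Envoy hot restart are:

  1. Reloading the whole process (not just the configuration) must not drop connections;
  2. Statistics must stay consistent during the reload; the multiple instances involved in a restart must keep their statistics in step;
  3. Hot restart must still work with container-based immutable deployments; immutable containers on the same host can only cooperate through shared memory and the network;
  4. The rate at which the old process drains connections, and when it is shut down, must be configurable;

Envoy's hot restart is carefully engineered: besides ensuring that traffic is not lost and is drained gradually, it pays particular attention to Envoy's overall observability. The multiple instances present during a restart appear from the outside as one logical unit, sharing stats and global locks through shared memory. The instances communicate over UDS (unix domain sockets). New and old instances are distinguished by their epoch values; the process with epoch == 0 creates the shared memory.

The interaction during an Envoy hot restart is shown in the figure below; it works even when the two processes run in two different containers on the same host;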

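  1. The secondary process asks the primary process to close its admin port. The secondary takes over all admin duties, including stats flushing. From the operator's point of view there is only one logical Envoy process.
  2. The secondary loads its configuration and starts binding to listen sockets. At this stage it obtains the listen sockets from the primary over UDS. (In the current implementation Envoy does not use the SO_REUSEPORT socket option. This is mainly historical, because the option is only available on relatively recent kernels. At some point support for old kernels can be dropped and the code switched to this socket option.)
  3. Once the secondary process is fully initialized, it tells the primary to stop accepting new connections and start draining. The drain time is configurable with --drain-time-s (Envoy documents a default of 600 seconds; the pilot-agent listings above pass 45). During this period the primary closes connections gracefully; the further the drain has progressed, the more aggressively it closes them. Old connections are thus closed smoothly and re-established against the secondary process.
  4. During the drain phase the secondary is responsible for flushing stats. Most stats live in shared memory and do not need to be fetched over RPC. A few special stats, however, exist only in the primary during the drain, for example how many connections the primary still has open and how much memory it has allocated. The secondary fetches and reports these, which makes the drain rate easier to observe.
  5. Finally, once the drain time has elapsed, the secondary tells the primary to shut down. Any connections still open are closed. At this point the secondary becomes the primary, and a single Envoy process is running.
  6. The process repeats.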

