gRPC provides a health library that communicates a system's health to its clients. It works by exposing a standard service definition through the health/v1 API.
By using the health library, clients can gracefully avoid servers that are encountering issues. Most languages provide an implementation out of the box, which makes health checking interoperable between systems.
go run server/main.go -port=50051 -sleep=5s
go run server/main.go -port=50052 -sleep=10s
go run client/main.go
Clients have two ways to monitor a server's health. They can use Check() to probe the server's current health, or they can use Watch() to observe changes as they happen.
In most cases, clients do not need to check backend servers directly. Instead, the check happens transparently when a healthCheckConfig is specified in the service config. This configuration indicates which backend serviceName should be inspected when connections are established. An empty string ("") typically indicates that the overall health of the server should be reported.
// import grpc/health to enable transparent client side checking
import _ "google.golang.org/grpc/health"

// set up appropriate service config
serviceConfig := grpc.WithDefaultServiceConfig(`{
  "loadBalancingPolicy": "round_robin",
  "healthCheckConfig": {
    "serviceName": ""
  }
}`)

conn, err := grpc.Dial(..., serviceConfig)
See A17 - Client-Side Health Checking for more details.
Servers control their own serving status. They do this by inspecting the systems they depend on, then updating their status accordingly. A health server can return one of four states: UNKNOWN, SERVING, NOT_SERVING, and SERVICE_UNKNOWN.
UNKNOWN indicates that the current state is not yet known. This state is often seen when a server instance starts up.
SERVING means that the system is healthy and ready to service requests. Conversely, NOT_SERVING indicates that the system is currently unable to service requests.
SERVICE_UNKNOWN communicates that the serviceName requested by the client is not known to the server. This status is only reported by the Watch() call.
A server may toggle its health using healthServer.SetServingStatus("serviceName", servingStatus).