In this article I explain how to discover new microservices in Kubernetes and balance them automatically without losing the sticky session feature needed to manage stateful services.
To achieve this, I use a feature available in HAProxy since version 1.8 that updates the HAProxy configuration at run time and is described in this article:
https://www.haproxy.com/documentation/aloha/9-5/traffic-management/lb-layer7/dns-srv-records/.
The HAProxy microservice becomes the HTTP interface to the external world. It can be reached directly on the worker nodes through a NodePort, as done in this case, or through an ingress controller running on the master nodes.
HAProxy balances the traffic to the backend microservices and automatically follows them as they scale up and down. Properly configured, it manages sticky sessions through dynamic cookies, which are a hash of the IP address of the balanced pod.
The reference architecture for this scenario is a classic Kubernetes cluster with master and worker nodes, described in this picture:
Let me start by explaining the limits of Kubernetes services that justify the use of this architecture.
Services in Kubernetes
A Kubernetes Service, as the official documentation describes it, is an abstraction which defines a logical set of Pods and a policy by which to access them, sometimes called a micro-service.
The main types of services managed in Kubernetes are the following (I report the definitions from
https://kubernetes.io/docs/concepts/services-networking/service/#headless-services):
ClusterIP: Exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster. This is the default ServiceType.
NodePort: Exposes the service on each Node’s IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You’ll be able to contact the NodePort service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
LoadBalancer: Exposes the service externally using a cloud provider’s load balancer. NodePort and ClusterIP services, to which the external load balancer will route, are automatically created.
In all these cases the balancing is very simple, and there is no way to implement cookie-based sticky sessions or the richer balancing functionality that an external balancer can provide.
For this purpose a headless service can be used: no cluster IP is allocated, kube-proxy does not handle the service, and the platform does no load balancing or proxying for it.
Instead, this type of service provides SRV records listing all the active endpoints behind the service.
If the "headless-service" Service has a port named "http" with protocol TCP, you can do a DNS SRV query for "_http._tcp.headless-service.namespace.svc.cluster.local" to discover the service port number and all the active endpoints.
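As a small illustration of this naming rule, the following Python sketch assembles the SRV query name from its parts. The helper name srv_query_name and the default cluster domain cluster.local are assumptions made for the example, not part of any Kubernetes API.

```python
def srv_query_name(port_name, protocol, service, namespace,
                   cluster_domain="cluster.local"):
    """Build the DNS SRV name published for a named Service port.

    Pattern: _<port-name>._<protocol>.<service>.<namespace>.svc.<cluster-domain>
    (srv_query_name is a hypothetical helper written for this article.)
    """
    return "_{}._{}.{}.{}.svc.{}".format(
        port_name, protocol.lower(), service, namespace, cluster_domain
    )


if __name__ == "__main__":
    # A port named "http" with protocol TCP on headless-service in kube-system.
    print(srv_query_name("http", "TCP", "headless-service", "kube-system"))
```

The port name and the protocol are the only parts that carry a leading underscore; the service and namespace labels are used as they are.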
Let me give an example.
Below I create a deployment with a simple nginx pod, and a headless service bound to it through a selector that must match the labels of the pod template.
I use the kube-system namespace, but in a real scenario it’s better to work in a different namespace.
[root@kali haproxy]#cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-headless
  labels:
    app: nginx-headless
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-headless
  template:
    metadata:
      labels:
        app: nginx-headless
    spec:
      containers:
      - name: nginx-headless
        image: nginx:latest
        ports:
        - containerPort: 80
[root@kali haproxy]#kubectl apply -f nginx.yaml
[root@kali haproxy]#kubectl get pods|grep nginx-head
nginx-headless-6bf7c57748-dv54w 1/1 Running 0 10m
[root@kali haproxy]#cat service-headless.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: headless-service
  name: headless-service
spec:
  # clusterIP: None is what makes the service headless
  clusterIP: None
  ports:
  - port: 80
    protocol: TCP
    name: headeless-service
  selector:
    app: nginx-headless
[root@kali haproxy]#kubectl get service|grep head
headless-service ClusterIP None 80/TCP 12m
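The headless service makes it possible to discover all the IP addresses active behind headless-service with an SRV query sent to the virtual address of the kube-dns service, which is the IP address written to /etc/resolv.conf in every pod in Kubernetes. The first label of the SRV name is the port name from the Service manifest (here headeless-service) preceded by an underscore.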
[root@kali haproxy]#dig -t SRV _headeless-service._tcp.headless-service.kube-system.svc.cluster.local @10.0.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 21495
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. IN SRV
;; ANSWER SECTION:
_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. 5 IN SRV 0 100 80 10-1-43-243.headless-service.kube-system.svc.cluster.local.
;; ADDITIONAL SECTION:
10-1-43-243.headless-service.kube-system.svc.cluster.local. 5 IN A 10.1.43.243
The IP 10.1.43.243 is the address of the nginx pod. In fact:
[root@kali haproxy]#kubectl describe service headless-service
Name: headless-service
Namespace: kube-system
Labels: app=headless-service
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"headless-service"},"name":"headless-service","namespace"…
Selector: app=nginx-headless
Type: ClusterIP
IP: None
Port: headeless-service 80/TCP
TargetPort: 80/TCP
Endpoints: 10.1.43.243:80
Session Affinity: None
Events:
Now I scale the deployment to three replicas:
[root@kali haproxy]#kubectl scale deployment.v1.apps/nginx-headless --replicas=3
deployment.apps/nginx-headless scaled
[root@kali haproxy]# kubectl get pods |grep nginx
nginx-headless-6bf7c57748-dv54w 1/1 Running 0 25h
nginx-headless-6bf7c57748-lgh7t 1/1 Running 0 24h
nginx-headless-6bf7c57748-z272h 1/1 Running 0 24h
When the new pods are available, they become visible in the SRV DNS query. It’s good practice to configure an HTTP readinessProbe that returns 200 OK when the pod is ready to receive traffic, so that Kubernetes publishes only the pods that are truly available.
[root@kali haproxy]#dig -t SRV _headeless-service._tcp.headless-service.kube-system.svc.cluster.local @10.0.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1667
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 4
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. IN SRV
;; ANSWER SECTION:
_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. 5 IN SRV 0 33 80 10-1-122-6.headless-service.kube-system.svc.cluster.local.
_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. 5 IN SRV 0 33 80 10-1-43-243.headless-service.kube-system.svc.cluster.local.
_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. 5 IN SRV 0 33 80 10-1-75-201.headless-service.kube-system.svc.cluster.local.
;; ADDITIONAL SECTION:
10-1-75-201.headless-service.kube-system.svc.cluster.local. 5 IN A 10.1.75.201
10-1-43-243.headless-service.kube-system.svc.cluster.local. 5 IN A 10.1.43.243
10-1-122-6.headless-service.kube-system.svc.cluster.local. 5 IN A 10.1.122.6
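The SRV answers above already contain what a balancer needs: one target host and one port per ready pod. As a rough sketch of how such output could be consumed, the Python below extracts the records from dig's textual answer section. The names SrvRecord and parse_srv_answers are hypothetical, and the parser assumes dig's default output layout.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_answers(dig_output: str) -> List[SrvRecord]:
    """Extract SRV records from dig's textual output (assumed default layout)."""
    records = []
    for line in dig_output.splitlines():
        fields = line.split()
        # Answer lines: <name> <ttl> IN SRV <priority> <weight> <port> <target>
        # Lines starting with ';' are comments or the question section.
        if (len(fields) == 8 and not line.startswith(";")
                and fields[2:4] == ["IN", "SRV"]):
            records.append(SrvRecord(int(fields[4]), int(fields[5]),
                                     int(fields[6]), fields[7].rstrip(".")))
    return records


SAMPLE = """\
;_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. IN SRV
_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. 5 IN SRV 0 33 80 10-1-122-6.headless-service.kube-system.svc.cluster.local.
_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. 5 IN SRV 0 33 80 10-1-43-243.headless-service.kube-system.svc.cluster.local.
_headeless-service._tcp.headless-service.kube-system.svc.cluster.local. 5 IN SRV 0 33 80 10-1-75-201.headless-service.kube-system.svc.cluster.local.
10-1-75-201.headless-service.kube-system.svc.cluster.local. 5 IN A 10.1.75.201
"""

if __name__ == "__main__":
    for record in parse_srv_answers(SAMPLE):
        print(record.target, record.port, record.weight)
```

Each record carries the same port (80) and an equal weight, so a consumer only has to resolve the targets to the A records listed in the additional section.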
This feature of the Kubernetes headless service is what HAProxy uses to follow the scaling of microservices, as explained in the next section.
Haproxy DNS Discovery
HAProxy, starting from version 1.8, supports DNS service discovery, which can update the HAProxy configuration at run time (server status, IP addresses, ports and weights) without explicit changes to the configuration files.
To configure HAProxy for DNS discovery, the following configuration is needed:
resolvers test
    nameserver dns1 10.0.0.10:53
    hold timeout 600s
    hold refused 600s

frontend fe_main
    bind *:8889
    default_backend be_template

backend be_template
    balance roundrobin
    dynamic-cookie-key MYKEY
    cookie SRVID insert dynamic
    option tcp-check
    # SRV name: _<port name>._tcp.<service>.<namespace>.svc.cluster.local
    # (the port is named headeless-service in the Service manifest)
    server-template srv 10 _headeless-service._tcp.headless-service.kube-system.svc.cluster.local resolvers test check
The server-template directive creates a pool of 10 backend server slots that are configured automatically from the DNS SRV query response for the domain name “_headeless-service._tcp.headless-service.kube-system.svc.cluster.local”.
The query is sent to the kube-dns service, which in my case is 10.0.0.10:53. This is the same IP address that Kubernetes writes to /etc/resolv.conf in every pod.
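The slot idea can be pictured with a very small Python model. This is only a simplified sketch of the concept, not HAProxy's internal algorithm: the function fill_slots is hypothetical, and the slot names srv1 to srv10 follow from the prefix and count given to server-template.

```python
def fill_slots(prefix, slot_count, endpoints):
    """Map resolved endpoints onto a fixed number of named server slots.

    Simplified model: slots without an endpoint stay empty (None), and
    endpoints beyond slot_count find no slot, which is why the template
    size should be at least the largest number of replicas expected.
    """
    slots = {}
    for i in range(1, slot_count + 1):
        endpoint = endpoints[i - 1] if i <= len(endpoints) else None
        slots["{}{}".format(prefix, i)] = endpoint
    return slots


if __name__ == "__main__":
    resolved = [("10.1.122.6", 80), ("10.1.43.243", 80), ("10.1.75.201", 80)]
    for name, endpoint in fill_slots("srv", 10, resolved).items():
        print(name, endpoint)
```

With three pods resolved, three slots are in use and seven remain free for future scale-ups.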
The configuration also sets a cookie that is a hash of the microservice IP, like this:
Set-Cookie: SRVID=3af2f2e4d041847a; path=/
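To make the idea of a dynamic cookie concrete, here is a rough Python sketch that derives a 64-bit token from a secret key and a server address. It is an illustration under stated assumptions: HAProxy uses its own internal hash, so the function sketch_dynamic_cookie (a hypothetical name) will not reproduce the values HAProxy emits. It only shows that the cookie is a deterministic function of the key and the pod address.

```python
import hashlib


def sketch_dynamic_cookie(key, ip, port):
    """Derive a 16-hex-digit token from the secret key and server address.

    Illustration only: blake2b stands in for HAProxy's own hash, so the
    output differs from a real SRVID value.
    """
    material = "{}|{}|{}".format(key, ip, port).encode("utf-8")
    return hashlib.blake2b(material, digest_size=8).hexdigest()


if __name__ == "__main__":
    for ip in ("10.1.122.6", "10.1.43.243", "10.1.75.201"):
        print(ip, sketch_dynamic_cookie("MYKEY", ip, 80))
```

Because nothing random enters the computation, the same pod always maps to the same cookie as long as the key stays the same, and a pod that comes back with a new IP gets a new cookie.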
I suggest tuning the hold timeout appropriately, so that the last DNS result keeps being used when queries to kube-dns time out. This keeps the balancer robust even if the control nodes are unstable.
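The effect of a hold period can be sketched with a few lines of Python. This is a simplified model of the behaviour described above, not HAProxy's resolver code; the class HoldingResolver and its update method are hypothetical names.

```python
class HoldingResolver:
    """Keep the last good DNS answer for hold_seconds after a failure."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self.last_good = None      # last successful answer
        self.last_good_at = None   # time of that answer, in seconds

    def update(self, now, answer):
        """Feed one resolution attempt; answer is None when the query timed out."""
        if answer is not None:
            self.last_good, self.last_good_at = answer, now
            return answer
        if (self.last_good is not None
                and now - self.last_good_at <= self.hold_seconds):
            return self.last_good  # still inside the hold period
        return []                  # hold period expired, nothing usable


if __name__ == "__main__":
    resolver = HoldingResolver(hold_seconds=600)
    print(resolver.update(0, ["10.1.43.243"]))   # fresh answer
    print(resolver.update(300, None))            # timeout, answer kept
    print(resolver.update(700, None))            # timeout after the hold period
```

With a 600 second hold, a DNS outage shorter than ten minutes leaves the list of backends untouched in this model.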
The haproxy.cfg becomes:
[root@kali haproxy]#cat haproxy.cfg
global
    log syslogserver local1 info
    pidfile /var/run/hapee-1.7/hapee-lb.pid
    user root
    group root
    stats socket /var/run/hapee-lb.sock mode 666 level admin
    daemon
    external-check

defaults
    mode http
    log global
    option httplog
    timeout connect 10s
    timeout client 300s
    timeout server 300s

# management frontend: statistics page on port 8888
frontend public
    bind *:8888
    mode http
    stats enable
    stats realm HAProxy-lb1
    stats uri /stats
    stats refresh 30s
    stats auth yyyy:XXXXXX
    stats admin if TRUE
    stats hide-version

resolvers test
    nameserver dns1 10.0.0.10:53
    hold timeout 600s
    hold refused 600s

# traffic frontend: balanced HTTP on port 8889
frontend fe_main
    bind *:8889
    default_backend be_template

backend be_template
    balance roundrobin
    dynamic-cookie-key MYKEY
    cookie SRVID insert dynamic
    option tcp-check
    # SRV name: _<port name>._tcp.<service>.<namespace>.svc.cluster.local
    # (the port is named headeless-service in the Service manifest)
    server-template srv 10 _headeless-service._tcp.headless-service.kube-system.svc.cluster.local resolvers test check
Now I’m going to create the haproxy microservice by the following deployment:
[root@kali haproxy]#cat haproxy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: haproxy-headless
  labels:
    app: haproxy-headless
spec:
  replicas: 1
  selector:
    matchLabels:
      app: haproxy-headless
  template:
    metadata:
      labels:
        app: haproxy-headless
    spec:
      containers:
      - name: haproxy-headless
        image: haproxy:1.8
        ports:
        - containerPort: 8888
          name: management
        - containerPort: 8889
          name: http
        volumeMounts:
        # the hostPath below points at a single file, so no subPath is
        # needed (an absolute subPath is rejected by Kubernetes)
        - mountPath: "/usr/local/etc/haproxy/haproxy.cfg"
          name: hostpathvol
      volumes:
      - name: hostpathvol
        hostPath:
          path: /etc/haproxy.cfg
[root@kali haproxy]#kubectl apply -f haproxy-deployment.yaml
I used a hostPath volume to store the haproxy.cfg file. It’s better to use a Gluster or NFS volume, to avoid storing haproxy.cfg on every worker node of the cluster.
In production I suggest setting replicas to 2 so that HAProxy is highly available. Since the dynamic cookie is derived from the server address and the dynamic-cookie-key, replicas sharing the same configuration issue the same cookie for a given pod.
HAProxy must be reachable through a NodePort service, which I now configure to expose the management and traffic ports to the external world.
[root@kali haproxy]#cat haproxy-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: haproxy-headless
  labels:
    app: haproxy-headless
spec:
  type: NodePort
  ports:
  # a Service with more than one port needs a name on each port
  - name: management
    port: 8888
    targetPort: 8888
    protocol: TCP
    nodePort: 30888
  - name: http
    port: 8889
    targetPort: 8889
    protocol: TCP
    nodePort: 30889
  selector:
    app: haproxy-headless
[root@kali haproxy]#kubectl apply -f haproxy-service.yaml
[root@kali haproxy]#kubectl get service |grep haproxy
haproxy-headless NodePort 10.0.217.146 8888:30888/TCP,8889:30889/TCP 10d
HAProxy can be monitored at http://worker_ip_node:30888/stats. An external reverse proxy in front of Kubernetes can then balance the HTTP traffic to HAProxy on port 30889 of any worker node (http://worker_ip_node_01:30889/, http://worker_ip_node_02:30889/, etc.) rather than directly to the backend microservices, which in our case are simple nginx pods.
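To verify from outside the cluster that the cookie is being issued, a client can read the SRVID value from the first response and replay it on later requests, as a browser would. The following Python sketch uses only the standard library; the function names are hypothetical and the URL is the placeholder used in this article, to be replaced with a real worker node address.

```python
import urllib.request
from http.cookies import SimpleCookie


def extract_srvid(set_cookie_header):
    """Return the SRVID value from a Set-Cookie header, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    morsel = cookie.get("SRVID")
    return morsel.value if morsel is not None else None


def request_with_srvid(url, srvid):
    """Build a request that replays the sticky cookie."""
    return urllib.request.Request(
        url, headers={"Cookie": "SRVID={}".format(srvid)}
    )


def fetch_srvid(url):
    """Send one request without a cookie and return the SRVID that was set."""
    with urllib.request.urlopen(url, timeout=5) as response:
        header = response.headers.get("Set-Cookie", "")
    return extract_srvid(header) if header else None


if __name__ == "__main__":
    # Parsing the sample header shown earlier needs no network access.
    print(extract_srvid("SRVID=3af2f2e4d041847a; path=/"))
```

Against a live cluster, fetch_srvid("http://worker_ip_node:30889/") would return the cookie of the pod chosen by round robin, and request_with_srvid can then be passed to urllib.request.urlopen to send the follow-up requests with that cookie.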
Another approach for exposing HAProxy is to configure an ingress, reachable on the control nodes, that proxies all the HTTP traffic to the HAProxy service. In this case the service should be of type ClusterIP, without a NodePort, because the HAProxy service is contacted by the ingress controller and not from outside.
Conclusion
I hope I have helped you solve a big problem in managing microservices with Kubernetes: scaling containers that are not stateless.
If you have any questions, don’t hesitate to contact me.