Kubernetes cluster network architecture with Calico

In this article I will go deeper into the implementation of networking in a Kubernetes cluster, explaining a scenario implemented with the Calico network plugin.

Calico is an open source networking and network security solution for containers that can be easily integrated with Kubernetes through the Container Network Interface (CNI) specification, which is well described here.

I chose Calico because it is easy to understand and it gives us the chance to see how networking is managed by a Kubernetes cluster; every other network plugin can be integrated with the same approach.

I will work on a Kubernetes cluster, composed of a master and one worker, installed and configured with kubeadm following the Kubernetes documentation. This is the reference architecture used for explaining how Kubernetes networking works:

(figure: kubernetes-network-architecture-with-calico)

Below is the procedure for installing and configuring the Kubernetes cluster with the Calico network plugin.

Kubernetes cluster: installation and configuration

The Kubernetes cluster will be installed on two CentOS 7 servers: master-01 (10.30.200.1) and worker-01 (10.30.200.2).

These are the commands to execute on the master to install the packages, initialize the control plane with kubeadm, and configure kubectl access:

[root@master-01 ~]#update-alternatives --set iptables /usr/sbin/iptables-legacy
[root@master-01 ~]#cat <<EOF > /etc/yum.repos.d/kubernetes.repo
> [kubernetes]
> name=Kubernetes
> baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
> enabled=1
> gpgcheck=1
> repo_gpgcheck=1
> gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
> EOF
[root@master-01 ~]#setenforce 0
[root@master-01 ~]#sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
[root@master-01 ~]#yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
[root@master-01 ~]#cat <<EOF > /etc/sysctl.d/k8s.conf
> net.bridge.bridge-nf-call-ip6tables = 1
> net.bridge.bridge-nf-call-iptables = 1
> EOF
[root@master-01 ~]#sysctl --system
[root@master-01 ~]#swapoff -a
[root@master-01 ~]#kubeadm init --apiserver-advertise-address=10.30.200.1
[root@master-01 ~]#mkdir -p $HOME/.kube
[root@master-01 ~]#cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master-01 ~]#chown $(id -u):$(id -g) $HOME/.kube/config
[root@master-01 ~]#systemctl enable kubelet
[root@master-01 ~]#systemctl restart kubelet
[root@master-01 ~]#kubectl get pods -n kube-system
NAME                                                   READY   STATUS    RESTARTS   AGE
coredns-6955765f44-b5dkz                               0/1     Pending   0          4m13s
coredns-6955765f44-xqbm8                               0/1     Pending   0          4m13s
etcd-master-01                                         1/1     Running   0          4m
kube-apiserver-master-01                               1/1     Running   0          4m
kube-controller-manager-master-01                      1/1     Running   0          4m
kube-proxy-fszvv                                       1/1     Running   0          4m13s
kube-scheduler-master-01                               1/1     Running   0          4m

You must install a pod network add-on so that your pods can communicate with each other.

CoreDNS will not start up before a pod network is installed. kubeadm only supports Container Network Interface (CNI) based networks, which I will explain once the cluster is up and running.

The other Kubernetes core pods (apiserver, scheduler, controller-manager, etcd, kube-proxy) are already running because they run in the node network namespace and therefore do not depend on the pod network.

In fact, if you take a look at the files inside the kubelet manifest directory, which contains all the core pods to run at startup, you will find that all these pods run with hostNetwork: true.

[root@master-01 manifests]# pwd
/etc/kubernetes/manifests
[root@master-01 manifests]# grep "hostNetwork:\ true" *.yaml
etcd.yaml: hostNetwork: true
kube-apiserver.yaml: hostNetwork: true
kube-controller-manager.yaml: hostNetwork: true
kube-scheduler.yaml: hostNetwork: true

If you want to confirm that the apiserver, for example, is in the same network namespace as the node, you can verify that its network namespace is the same as that of the systemd daemon (PID 1). This means it's possible to contact the API server directly on the port where the process is listening, 6443 in this case, without any NAT involved.

[root@master-01  manifests]# ps -afe |grep apiserver
root 4312 4295 4 Dec22 ? 05:09:55 kube-apiserver --advertise-address=10.30.200.1 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
[root@master-01 manifests]# ls -ltr /proc/4312/ns/net
lrwxrwxrwx. 1 root root 0 Dec 22 18:25 /proc/4312/ns/net -> net:[4026531956]
[root@master-01 manifests]# ls -ltr /proc/1/ns/net
lrwxrwxrwx. 1 root root 0 Dec 22 18:25 /proc/1/ns/net -> net:[4026531956]
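
As a quick check, assuming anonymous access to the version endpoint is allowed (the kubeadm default), the API server can be reached directly on the node IP and port 6443, with no NAT in between:

[root@worker-01 ~]# curl -sk https://10.30.200.1:6443/version
{
  "major": "1",
  "minor": "17",
  ...
}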

Now I will get the authentication token and the SHA-256 hash of the Kubernetes certificate authority; both will be used to join the worker to the cluster:

[root@master-01 ~]# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
t9idi2.5p5miat5ghkjntav 23h 2019-12-22T18:00:25+01:00 authentication,signing The default bootstrap token generated by 'kubeadm init'. system:bootstrappers:kubeadm:default-node-token
[root@master-01 ~]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | \
> openssl dgst -sha256 -hex | sed 's/^.* //'
7152ba645c5c16a222df23d4f03b162ff5caee1959b4rf069395369221840c07

With this authentication info, it's possible to add the worker to the cluster (6443 is the port where the apiserver is listening):

[root@worker-01]# cat <<EOF > /etc/yum.repos.d/kubernetes.repo
> [kubernetes]
> name=Kubernetes
> baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
> enabled=1
> gpgcheck=1
> repo_gpgcheck=1
> gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
> EOF
[root@worker-01]# yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
[root@worker-01]# swapoff -a
[root@worker-01~]# kubeadm join --token t9idi2.5j8miat5ghjjntav 10.30.200.1:6443 --discovery-token-ca-cert-hash sha256:7152ba611c3456a222df23da303b162ff5caee1959b1ff069395369221840c07
W1220 18:39:55.148808 3408 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 18.03.0-ce. Latest validated version: 19.03
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details. Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

On the master, it's possible to show the node status. Authentication with the API server is performed with certificates signed by a certificate authority, which the apiserver references through the parameter --client-ca-file=/etc/kubernetes/pki/ca.crt.

[root@master-01 ~]# kubectl get nodes
NAME        STATUS   ROLES    AGE   VERSION
master-01   Ready    master   40m   v1.17.0
worker-01   Ready    <none>   63s   v1.17.0

The cluster is up and running, and we are ready to install Calico and explain how it works.

Kubernetes cluster: Calico Installation

There are three components of a Calico / Kubernetes integration:

  1. calico/node. It runs as a DaemonSet on all nodes of the cluster and contains the BGP agent necessary for Calico routing to occur, and the Felix agent, which programs network policy rules.
  2. cni-plugin. Each CNI plugin must be implemented as an executable that is invoked by the container management system, in this case the kubelet. This binary is installed by an init container inside the DaemonSet.
  3. calico/kube-controllers. It runs as a Deployment and manages network policy by watching the Kubernetes API for Pod, Namespace, and NetworkPolicy events and configuring Calico in response. It runs as a single pod managed by a ReplicaSet.

The calico.yaml manifest to apply contains all the info needed to install all the Calico components. With respect to the default configuration, I changed these parameters:

  1. The IP autodetection method, adding the variable IP_AUTODETECTION_METHOD="interface=ens160" to the calico-node container of the DaemonSet. In this way Calico uses the IP address of the ens160 interface for the BGP peering connections.
  2. The IPv4 pool used for assigning IP addresses to the pods of the cluster. The variable to change is CALICO_IPV4POOL_CIDR, which I set to 10.5.0.0/16 (the corresponding edit is sketched right after downloading the manifest below).

After that, I can install calico with these simple commands:

[root@master-01 manifests]#wget https://docs.projectcalico.org/v3.8/manifests/calico.yaml
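
Before applying the manifest, the two changes described above are made in the env section of the calico-node container. This is a minimal sketch of the edit (indentation as in the v3.8 manifest; adapt the interface name to your environment):

[root@master-01 manifests]#vi calico.yaml
...
            # in the env section of the calico-node container:
            - name: IP_AUTODETECTION_METHOD
              value: "interface=ens160"
            - name: CALICO_IPV4POOL_CIDR
              value: "10.5.0.0/16"
...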

[root@master-01 manifests]#kubectl apply -f calico.yaml
[root@master-01 ~]# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-778676476b-n84g8   1/1     Running   0          6d4h
calico-node-c2frd                          1/1     Running   0          3d5h
calico-node-ffwtk                          1/1     Running   0          3d5h
coredns-6955765f44-b5dkz                   1/1     Running   0          6d4h
coredns-6955765f44-xqbm8                   1/1     Running   0          6d4h
etcd-master-01                             1/1     Running   0          6d4h
kube-apiserver-master-01                   1/1     Running   0          6d4h
kube-controller-manager-master-01          1/1     Running   0          6d4h
kube-proxy-9dznf                           1/1     Running   0          6d4h
kube-proxy-fszvv                           1/1     Running   0          6d4h
kube-scheduler-master-01                   1/1     Running   0          6d4h
[root@master-01 ~]# kubectl get daemonset -n kube-system
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                 AGE
calico-node   2         2         2       2            2           beta.kubernetes.io/os=linux   6d21h

A number of custom resource definitions are also installed; they contain data and metadata used by Calico.

[root@master-01 ~]# kubectl get crd
NAME CREATED AT
bgpconfigurations.crd.projectcalico.org 2019-12-22T17:11:11Z
bgppeers.crd.projectcalico.org 2019-12-22T17:11:11Z
blockaffinities.crd.projectcalico.org 2019-12-22T17:11:11Z
clusterinformations.crd.projectcalico.org 2019-12-22T17:11:11Z
felixconfigurations.crd.projectcalico.org 2019-12-22T17:11:11Z
globalnetworkpolicies.crd.projectcalico.org 2019-12-22T17:11:11Z
globalnetworksets.crd.projectcalico.org 2019-12-22T17:11:11Z
hostendpoints.crd.projectcalico.org 2019-12-22T17:11:11Z
ipamblocks.crd.projectcalico.org 2019-12-22T17:11:11Z
ipamconfigs.crd.projectcalico.org 2019-12-22T17:11:11Z
ipamhandles.crd.projectcalico.org 2019-12-22T17:11:11Z
ippools.crd.projectcalico.org 2019-12-22T17:11:11Z
networkpolicies.crd.projectcalico.org 2019-12-22T17:11:11Z
networksets.crd.projectcalico.org 2019-12-22T17:11:11Z

All the components of the cluster are up and running, and we are ready to explain how Calico networking works in Kubernetes.

Kubernetes cluster: inside Calico Felix networking

Every node of the cluster runs a calico/node container that contains the BGP agent necessary for Calico routing. The result is a mesh network where every node has a peering connection with all the others. It's possible to go inside the calico-node pod and check the state of the mesh:

[root@master-01 manifests]# kubectl exec -it calico-node-c2frd -n kube-system -- /bin/sh
# ./calicoctl node status
Calico process is running.

IPv4 BGP status
+--------------+-------------------+-------+----------+-------------+
| PEER ADDRESS |     PEER TYPE     | STATE |  SINCE   |    INFO     |
+--------------+-------------------+-------+----------+-------------+
| 10.30.200.2  | node-to-node mesh | up    | 16:23:39 | Established |
+--------------+-------------------+-------+----------+-------------+

The address blocks that BGP announces for every node of the cluster belong to an IPPool that can be shown in this way:

[root@master-01 ~]# kubectl describe ippools.crd.projectcalico.org default-ipv4-ippool
Name:         default-ipv4-ippool
Namespace:    
Labels:       
Annotations:  projectcalico.org/metadata: {"uid":"ff22f0c1-49f4-477a-b1ba-3404555da89b","creationTimestamp":"2019-12-22T17:11:38Z"}
API Version:  crd.projectcalico.org/v1
Kind:         IPPool
Metadata:
  Creation Timestamp:  2019-12-22T17:11:38Z
  Generation:          1
  Resource Version:    2040
  Self Link:           /apis/crd.projectcalico.org/v1/ippools/default-ipv4-ippool
  UID:                 fcbb9e2e-31f1-4878-a865-f989ca90fa42
Spec:
  Block Size:     26
  Cidr:           10.5.0.0/16
  Ipip Mode:      Always
  Nat Outgoing:   true
  Node Selector:  all()
  Vxlan Mode:     Never
Events:           

This object is an instance of a custom resource definition, an extension of the Kubernetes API. In this case, it contains this information:

  1. Cidr: the IP pool used to derive the /26 subnet assigned to each node of the cluster.
  2. Ipip Mode: IP-in-IP encapsulation is used to forward IP packets from one node to the other. The original IP packet is encapsulated inside another IP packet whose source and destination addresses are those of the worker and the master. This encapsulation has lower overhead than VXLAN encapsulation, which is disabled here.

Don't confuse this Cidr with the --service-cluster-ip-range parameter of the apiserver, which is the IP range from which service cluster IPs are assigned. It must not overlap with any IP ranges assigned to pods by Calico. In our example, the service VIP range is 10.96.0.0/12, different from the pod range, which is 10.5.0.0/16.
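
A quick way to double-check both ranges on the master (values as configured in this cluster):

[root@master-01 ~]# kubectl get ippools.crd.projectcalico.org default-ipv4-ippool -o jsonpath='{.spec.cidr}'
10.5.0.0/16
[root@master-01 ~]# grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml
    - --service-cluster-ip-range=10.96.0.0/12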

The result of the BGP mesh is the following set of routes added on the two nodes of the cluster.

[root@master-01 manifests]# netstat -rnv
Kernel IP routing table
Destination     Gateway         Genmask          Flags   MSS Window  irtt Iface
0.0.0.0         10.30.200.254   0.0.0.0          UG        0 0          0 ens160
10.5.53.128     10.30.200.2     255.255.255.192  UG        0 0          0 tunl0
10.5.252.192    0.0.0.0         255.255.255.192  U         0 0          0 *               //ADDED BY FELIX BGP CLIENT
10.5.252.193    0.0.0.0         255.255.255.255  UH        0 0          0 cali3f91d23777d //ADDED BY Calico-cni
10.5.252.194    0.0.0.0         255.255.255.255  UH        0 0          0 cali84836540a5b //ADDED BY Calico-cni
10.5.252.195    0.0.0.0         255.255.255.255  UH        0 0          0 cali61f7d79e884 //ADDED BY Calico-cni
10.5.252.197    0.0.0.0         255.255.255.255  UH        0 0          0 cali12aa5f52300 //ADDED BY Calico-cni
10.30.200.0     0.0.0.0         255.255.255.0    U         0 0          0 ens160
[root@master-01 manifests]# ip addr show dev tunl0
27: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.5.252.192/32 brd 10.5.252.192 scope global tunl0
       valid_lft forever preferred_lft forever

[root@worker-01 ~]# netstat -rnv
Kernel IP routing table
Destination     Gateway         Genmask          Flags   MSS Window  irtt Iface
0.0.0.0         10.30.200.254   0.0.0.0          UG        0 0          0 ens160
10.5.53.128     0.0.0.0         255.255.255.192  U         0 0          0 *               //ADDED BY FELIX BGP CLIENT
10.5.53.138     0.0.0.0         255.255.255.255  UH        0 0          0 calif917f594b43 //ADDED BY Calico-cni
10.5.53.140     0.0.0.0         255.255.255.255  UH        0 0          0 calic94e94000cf //ADDED BY Calico-cni
10.5.252.192    10.30.200.1     255.255.255.192  UG        0 0          0 tunl0
10.30.200.0     0.0.0.0         255.255.255.0    U         0 0          0 ens160
169.254.0.0     0.0.0.0         255.255.0.0      U         0 0          0 ens160
[root@worker-01 ~]# ip addr show dev tunl0
15: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.5.53.128/32 brd 10.5.53.128 scope global tunl0
       valid_lft forever preferred_lft forever

Below is a graphic representation of the IP-in-IP tunneling implemented by the Felix agent running on both nodes of the cluster. Every Felix agent receives via BGP the subnet assigned to the other node and configures a route in its routing table so that traffic for that subnet is forwarded through the IP-in-IP tunnel.

I also show a hypothetical IP packet travelling on the network: there are two IP layers; the outer one carries the physical addresses of the two nodes and has its proto field set to IPIP; the inner packet contains the IP addresses of the pods involved in the communication. I will explain this in more detail later.

IP-in-IP encapsulation means one IP packet encapsulated inside another, and all the configuration is done by the calico-node pod running on each node of the cluster. Every pod in the cluster contacts other pods without any knowledge of this. The packet is encapsulated by the IP-in-IP tunnel and sent to the node where the destination pod is running. That node accepts the packet because the MAC address matches its network interface and the outer destination IP address is its own physical address.

In fact, if I send traffic from one pod to another, it's possible to see the encapsulated packets with tcpdump. As shown below, the source and destination IPs of the packets travelling on the network are the interface addresses of the two nodes, 10.30.200.2 (worker-01) and 10.30.200.1 (master-01), and the proto field of this outer IP packet is IPIP.

Inside this packet there is the original packet, whose source and destination IPs are those of the pods involved in the communication: the pod with IP 10.5.53.142 connects to port 80 of the pod with IP 10.5.252.197.
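
For reference, the capture below can be reproduced roughly as follows, using the two nginx pods created later in this article (pod name and IPs taken from that section; I assume an HTTP client such as curl is available in the client pod, otherwise any TCP connection to port 80 will do):

[root@worker-01 ~]# tcpdump -nevi any host 10.30.200.1
[root@master-01 ~]# kubectl exec nginx-deployment-54f57cf6bf-jmp9l -- curl -s http://10.5.252.197:80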

[root@worker-01 ~]# tcpdump -i any host 10.30.200.1
13:35:58.115982 In 00:50:56:c4:8b:93 ethertype IPv4 (0x0800), length 96: (tos 0x0, ttl 63, id 45577, offset 0, flags [DF], proto IPIP (4), length 80)
10.30.200.1 > 10.30.200.2: (tos 0x0, ttl 63, id 47726, offset 0, flags [DF], proto TCP (6), length 60)
10.5.53.142.40554 > 10.5.252.197.80: Flags [S], cksum 0x6d2a (correct), seq 3257238968, win 28000, options [mss 1400,sackOK,TS val 12992470 ecr 0,nop,wscale 7], length 0
--
13:35:58.116105 Out 00:50:56:b4:b8:01 ethertype IPv4 (0x0800), length 96: (tos 0x0, ttl 63, id 58858, offset 0, flags [DF], proto IPIP (4), length 80)
10.30.200.2 > 10.30.200.1: (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
10.5.252.197.80 > 10.5.53.142.40554: Flags [S.], cksum 0xe63d (correct), seq 462088542, ack 3257238969, win 27760, options [mss 1400,sackOK,TS val 331968281 ecr 12992470,nop,wscale 7], length 0
--
13:35:58.116315 In 00:50:56:c4:8b:93 ethertype IPv4 (0x0800), length 88: (tos 0x0, ttl 63, id 45578, offset 0, flags [DF], proto IPIP (4), length 72)
10.30.200.1 > 10.30.200.2: (tos 0x0, ttl 63, id 47727, offset 0, flags [DF], proto TCP (6), length 52)
10.5.53.142.40554 > 10.5.252.197.80: Flags [.], cksum 0x8063 (correct), seq 3257238969, ack 462088543, win 219, options [nop,nop,TS val 12992470 ecr 331968281], length 0

Now it's time to explain how the communication between the kubelet and the Calico CNI plugin happens inside a Kubernetes node, and how traffic is forwarded from the pod network namespace to the node network namespace before being forwarded to the other node through the tunnel interface.

Kubernetes cluster: inside the Calico CNI networking plugin

The interface between Kubernetes and the Calico plugin is the Container Network Interface, described in this GitHub project: https://github.com/containernetworking/cni/blob/master/SPEC.md

The goal of this specification is to define an interface between the container runtime, which in our case is the kubelet daemon, and the CNI plugin, which is Calico. The network configuration is a JSON file installed by Calico in the directory /etc/cni/net.d, the default directory where the kubelet looks for network plugins. This is the default configuration:

[root@worker-01 ~]# cat /etc/cni/net.d/10-calico.conflist
{
  "name": "k8s-pod-network",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "calico",
      "log_level": "info",
      "datastore_type": "kubernetes",
      "nodename": "worker-01",
      "mtu": 1440,
      "ipam": {
        "type": "calico-ipam"
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
      }
    },
    {
      "type": "portmap",
      "snat": true,
      "capabilities": {"portMappings": true}
    }
  ]
}

The network configuration includes mandatory fields; this is the meaning of the main parameters:

type: calico. The Calico CNI plugin, invoked as a binary by the kubelet and installed by the init container of the calico-node DaemonSet. It is responsible for inserting a network interface into the container network namespace (e.g. one end of a veth pair) and making any necessary changes on the host (e.g. setting up the other end of the veth).

type: calico-ipam. It's called by the plugin above, and it assigns the IP to the veth interface and sets up the routes consistent with the IP Address Management. Each host running calico/node has its own /26 subnet derived from CALICO_IPV4POOL_CIDR, which in our case is set to 10.5.0.0/16 (the per-node blocks can also be read from the blockaffinities custom resource, as shown in the sketch after this parameter list). The route inserted by Calico on master-01 is shown below: it means that the worker-01 node has been assigned the subnet 10.5.53.128/26 and it's reachable through the tunnel interface.

10.5.53.128/26 via 10.30.200.2 dev tunl0 proto bird onlink 

mtu: 1440. The MTU of the veth interface, set to 1440, lower than the default 1500, because the IP packets are forwarded inside an IP-in-IP tunnel and room must be left for the outer IP header.

type: k8s. This is for enabling the Kubernetes NetworkPolicy API.

kubeconfig: /etc/cni/net.d/calico-kubeconfig. This file contains the authentication certificate and key for read-only Kubernetes API access to the Pods resource in all namespaces. This is necessary in order to implement the network policy above.

type: portmap and snat: true. The Calico networking plugin supports hostPort, and this enables Calico to perform DNAT and SNAT for the pod hostPort feature. Kubernetes suggests using kubectl port-forward instead: https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/ .
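
As mentioned for calico-ipam above, the /26 block assigned to each node can also be read from the blockaffinities custom resource installed by Calico. This is a sketch; the output shown assumes the block names Calico generates for this cluster:

[root@master-01 ~]# kubectl get blockaffinities.crd.projectcalico.org -o custom-columns=NAME:.metadata.name,NODE:.spec.node,CIDR:.spec.cidr
NAME                        NODE        CIDR
master-01-10-5-252-192-26   master-01   10.5.252.192/26
worker-01-10-5-53-128-26    worker-01   10.5.53.128/26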

After creating the container, the kubelet calls the Calico plugin, installed in the /opt/cni/bin/ directory of each node, which makes the necessary changes on the host, assigning the IP to the interface and setting up the routes.
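
Conceptually, the call made by the container runtime for a new pod looks roughly like the following sketch. It is simplified and not literal: the angle-bracket placeholders are hypothetical, and the runtime passes the per-plugin JSON configuration derived from 10-calico.conflist on stdin.

CNI_COMMAND=ADD \
CNI_CONTAINERID=<container-id> \
CNI_NETNS=/proc/<container-pid>/ns/net \
CNI_IFNAME=eth0 \
CNI_PATH=/opt/cni/bin \
CNI_ARGS="IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=<pod-name>" \
/opt/cni/bin/calico < <per-plugin-config.json>

The plugin replies on stdout with the interfaces and IPs it configured, which the runtime records for the pod.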

To describe what the Calico plugin does, I will create an nginx deployment with two replicas. To force the scheduler to run pods also on the master, I have to delete the taint configured on it:

[root@master-01 ~]# kubectl taint nodes --all node-role.kubernetes.io/master-
node/master-01 untainted
[root@master-01 k8s-test-01]# vi nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
[root@master-01 k8s-test-01]# kubectl apply -f nginx-deployment.yaml
[root@master-01 k8s-test-01]# kubectl get pods -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP             NODE        NOMINATED NODE   READINESS GATES
nginx-deployment-54f57cf6bf-gn7x8   1/1     Running   0          3d22h   10.5.252.197   master-01   <none>           <none>
nginx-deployment-54f57cf6bf-jmp9l   1/1     Running   0          22m     10.5.53.142    worker-01   <none>           <none>

Let's look inside the network namespace of the nginx-deployment-54f57cf6bf-jmp9l pod and see how it is related to the node network namespace of worker-01.

After getting the container ID of the pod, I can log in to worker-01 to show the network configured by the Calico plugin:

[root@master-01 ~]# kubectl describe pod nginx-deployment-54f57cf6bf-jmp9l | grep Cont
Containers:
Container ID: docker://02f616bbb36d5165ded96a219ac7448203de68890a7d3b9a0df5b0a9bafaf0f6
ContainersReady True

On worker-01, after getting the PID of the nginx process from the container ID of the pod (02f616bbb36d), I can get the network namespace of the process and the veth network interface on the node, called cali892ef576711.

[root@worker-01 ~]# ln -s /var/run/docker/netns  /var/run/netns 
[root@worker-01 ~]# docker ps -a |grep 02f616bbb36d
02f616bbb36d 84581e99d807 "nginx -g 'daemon of…" 3 minutes ago Up 3 minutes k8s_nginx_nginx-deployment-54f57cf6bf-jmp9l_default_ee80161a-2064-4fb3-8c4d-b4fce949405e_0
[root@worker-01 ~]# docker inspect 02f616bbb36d | grep Pid
"Pid": 5800,
"PidMode": "",
"PidsLimit": 0,
[root@worker-01 ~]# ls -ltr /proc/5800/ns/net
lrwxrwxrwx 1 root root 0 Dec 30 22:44 /proc/5800/ns/net -> net:[4026532790]
[root@worker-01 ~]# ls -1i /var/run/netns
4026532651 1-upztggam3a
4026532790 5ff75f86ca72
4026532892 6ae49790cde0
4026531956 default
4026532562 ingress_sbox
[root@worker-01 ~]# ip netns exec 5ff75f86ca72 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if15: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP
link/ether 6a:cb:00:ef:d2:29 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.5.53.142/32 scope global eth0
valid_lft forever preferred_lft forever
[root@worker-01 ~]# ethtool -S cali892ef576711
NIC statistics:
peer_ifindex: 4
[root@worker-01 ~]# ip route show |grep cali892ef576711
10.5.53.142 dev cali892ef576711 scope link

Remember that a veth pair is a way to allow an isolated network namespace to communicate with the host network namespace: every packet sent to one of the two veth interfaces is received by the other. In this way, communication between the container and the external world is possible. In a standalone Docker configuration, the host side of each container's veth pair is attached to a Linux bridge together with the veth interfaces of all the containers of the same network. Calico doesn't attach the veth interface to any bridge: containers in the same pod share the pod's network namespace, pods on the same node are reached through the per-pod host routes shown above, and the IP-in-IP tunnel is used for routing between pods running on different nodes.
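
To see the veth mechanism in isolation, here is a minimal, self-contained sketch (hypothetical names and addresses, independent of the cluster):

# create an isolated namespace and a veth pair (hypothetical names)
ip netns add demo-ns
ip link add veth-host type veth peer name veth-pod
# move one end into the namespace, keep the other on the host
ip link set veth-pod netns demo-ns
ip addr add 192.168.100.1/24 dev veth-host
ip link set veth-host up
ip netns exec demo-ns ip addr add 192.168.100.2/24 dev veth-pod
ip netns exec demo-ns ip link set veth-pod up
# packets sent on one end come out of the other: the namespace is now reachable
ping -c 1 192.168.100.2

Calico does the same kind of wiring for each pod, but instead of bridging the host end it adds the /32 route seen above (10.5.53.142 dev cali892ef576711).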

Below is a picture that describes the changes done by the Calico CNI plugin on both nodes of the cluster. The scenario shows an IP packet sent into the IP-in-IP tunnel from a pod running on worker-01, with IP address 10.5.53.142, to a pod running on master-01, with IP address 10.5.252.197.

The picture clearly shows the role of the two Calico components:

calico-felix: it's responsible for populating the routing table of each node to permit routing, via the IP-in-IP tunnel, between the nodes of the cluster. The routing protocol used is BGP.

calico-cni: it's responsible for inserting a network interface into the container network namespace (e.g. one end of a veth pair) and making any necessary changes on the host (e.g. keeping the other end of the veth in the host namespace and adding a /32 route to it, since Calico does not use a bridge).

Conclusion

In this article I have explained how Kubernetes networking with the Calico plugin is implemented.

The important thing to understand is that the interaction between the kubelet and Calico is defined by the Container Network Interface, and this makes it possible to integrate any network plugin into Kubernetes, without changing the core Go modules, as long as its configuration is provided through the JSON file.

The integration, in the open source spirit, is open and well documented, and this has permitted the development of a lot of network plugins.

I hope that this article helped you to better understand this interesting topic of Kubernetes.
