Inside docker overlay network

Docker Overlay Network is a good way to create isolated layer two networks whose containers are distributed over different physical hosts. It’s possible to define two identical subnets without any overlap because the network interfaces of the containers are in different namespaces.

The frames between hosts are tunneled by the vxlan protocol, which manages up to 16777216 virtual networks: much more than the 4096 vlans permitted by the 802.1q protocol.
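For reference only (not part of the lab), a vxlan interface can also be created by hand with iproute2; the sketch below assumes a host interface named eth0, and the vxlan42 name and the id 4242 are arbitrary values chosen for the example:

# the vxlan id must fit in 24 bits, i.e. the range 0-16777215
ip link add vxlan42 type vxlan id 4242 dev eth0 dstport 4789
ip link set vxlan42 up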

In my laboratory I will create two overlay networks, called red and green, with the same subnet 10.65.10.0/24. Every network will have a different vxlan identifier: to all effects they are two different vlans.

The containers inside each vlan can communicate only between themselves, and all the containers can reach the internet. This approach mirrors the way openstack permits different tenants to create different vlans, even when they belong to the same subnet.

All the technical aspects behind the scenes will be explained. The reference architecture is the following:

[Image: docker-overlay-network]

In the laboratory four containers are created: namespace_01 and namespace_03 with one interface in the red overlay network; namespace_02 and namespace_04 with one interface in the green overlay network.

The docker engine stores all the information about overlay networks in a key-value store: in this case I have chosen the consul daemon.

The prerequisite of the article is to run a consul daemon and configure docker to use it: for that, please read the swarm docker cluster installation section of my article.
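As a minimal sketch of that setup (the consul image arguments and the 192.168.1.50 address are assumptions, not taken from the lab): run a single consul agent in dev mode and point the docker engine at it with the legacy cluster-store options.

# run a standalone consul agent (dev mode, no persistence)
docker run -d --name consul -p 8500:8500 consul agent -dev -client=0.0.0.0

# /etc/docker/daemon.json on every docker node, then restart the docker daemon
{
  "cluster-store": "consul://192.168.1.50:8500",
  "cluster-advertise": "eth0:2376"
}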

Let’s start with creating the two overlay networks and see what happens inside the linux network stack.

Inside Docker Network

For explaining how overlay networks are implemented, two of them are created: red and green. Both networks belong to the same subnet, 10.65.10.0/24. For doing that, the following commands need to be executed on one docker node only:

[root@docker-01 docker]# docker network create --subnet 10.65.10.0/24 --driver overlay red
249690033b0dab7f42cbe0a0582609024d3b8c672ce488a74afe558dc2b91807
[root@docker-01 docker]# docker network create --subnet 10.65.10.0/24 --driver overlay green
39849b4682f20de4af68b32cd5d291609e4f03632866e3875bf24844bdff1441
[root@docker-01 docker]# docker network ls|grep overlay
39849b4682f2 green overlay global
249690033b0d red overlay global

Now two containers, namespace_01 and namespace_03, are created in the red network using a very light linux image: busybox.

On swarm-01

[root@swarm-01 ~]# docker run --name namespace_01 -itd --network=red busybox
b3b648bd3b2aee6d03a4867471b8d257d915ed96122c746fb198978ccc453648
[root@swarm-01 ~]# docker ps -a |grep busybox
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b3b648bd3b2a busybox "sh" 2 hours ago Up 2 hours namespace_01

On swarm-02

[root@swarm-02 ~]# docker run --name namespace_03 -itd --network=red busybox
6276d25651858454e6637d9c87a4a336f6adeed40e38a142a8ec0a401cdeea5e
[root@swarm-02 ~]# docker ps -a|grep busybox
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6276d2565185 busybox "sh" 2 hours ago Up 2 hours namespace_03

Let’s explain what happened behind the scenes.

On both docker machines network namespaces are created: every docker container runs inside its own. To inspect the namespaces, we need to create the following symbolic links (they permit using the command ‘ip netns‘):

[root@swarm-02 ~]# ln -s /var/run/docker/netns /var/run/netns
[root@swarm-01 ~]# ln -s /var/run/docker/netns /var/run/netns

Here is the network information of the namespace_01 container:

[Image: docker-network-overlay_01]

As you can see, the namespace_01 container (the same holds for namespace_03) is inside a network namespace that isolates the container from the network of the docker host.

The namespace_01 container contains two interfaces: the first, eth0, belongs to the red overlay network (I will explain it later); the other, eth1, is configured to let the container reach the external world.

First, let’s talk about the eth1 interface.

This interface is part of a veth pair of directly connected virtual interfaces: the traffic sent to one interface of the pair is received on the other. It permits the container, that is its network namespace, to communicate with its default gateway, which is a network bridge. The image below shows the network namespace configuration and how to match the two veth interfaces of a pair.

[Image: docker-network-overlay_02]
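A quick way to match the two ends of a veth pair from the command line (a sketch; the interface index 96 is only an example value): the iflink file of an interface contains the ifindex of its peer.

# inside the container: print the ifindex of the peer of eth1 (e.g. 96)
docker exec -it namespace_01 cat /sys/class/net/eth1/iflink
# on the host: the interface with that index is the other end of the pair
ip -o link | grep '^96:'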

The default gateway of the container is the docker_gwbridge bridge, which forwards the packets to the external world through a source NAT.

The following image shows the network architecture described above:

[Image: docker-overlay-network-eth1]

The eth1 interface is the port to the external world. Only the external world can be reached because all other traffic is blocked by iptables rules.
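One way to verify both behaviors on the host (a sketch; the exact chain names depend on the docker version):

# the MASQUERADE rule performs the source NAT for the bridge subnet
iptables -t nat -S POSTROUTING | grep MASQUERADE
# the docker filter chains block the remaining traffic
iptables -S | grep -i DOCKER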

Let’s now look inside the red overlay network.

The red overlay network is a layer two network spanned over both nodes by a vxlan tunnel. The eth0 interfaces of both containers (namespace_01 and namespace_03) belong to this network. Each of them is part of another veth pair: the other side lives inside a hidden network namespace.

The following picture explains it:

[Image: Docker Overlay Network]

The frames of the red network reach both hosts because they are encapsulated inside udp segments by the vxlan protocol. Every overlay network has a different vxlan virtual network identifier (VNI) to avoid overlaps.
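The encapsulation can be observed directly on the physical interface of either host: vxlan travels over udp port 4789 by default (enp0s3 is a placeholder for the real interface name):

tcpdump -nei enp0s3 udp port 4789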

The following commands check the red network configuration on the swarm-02 node (the same applies to the swarm-01 node).

To check the container network configuration (namespace_03):

[root@swarm-02 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6276d2565185 busybox “sh” 2 hours ago Up 2 hours namespace_03
8fa7f190b3e6 swarm "/swarm join --addr=1" 13 days ago Up 13 days 2375/tcp infallible_allen
[root@swarm-02 ~]# docker exec -it 6276d2565185  ip addr show
93: eth0@if94: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
link/ether 02:42:0a:41:0a:03 brd ff:ff:ff:ff:ff:ff
inet 10.65.10.2/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:aff:fe41:a03/64 scope link
valid_lft forever preferred_lft forever
95: eth1@if96: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
link/ether 02:42:ac:12:00:02 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.3/16 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe12:2/64 scope link
valid_lft forever preferred_lft forever

The eth0 interface is part of a veth pair: the other side is connected to the veth interface (veth2) of a hidden namespace where the vxlan tunnel is configured.

The following shows the network configuration of the hidden network namespace:

[root@swarm-02 ~]# ip netns ls
80c64628846e (id: 5)
1-5eded2abc0 (id: 4)
2c7e95af1b56 (id: 0)
[root@swarm-02 ~]# ip netns exec 1-5eded2abc0 ip addr show
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
link/ether ce:87:1e:20:dc:93 brd ff:ff:ff:ff:ff:ff
inet 10.65.10.1/24 scope global br0
valid_lft forever preferred_lft forever
inet6 fe80::4c7d:6bff:fe90:ed4/64 scope link
valid_lft forever preferred_lft forever
92: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN
link/ether f2:7e:3a:a9:51:a6 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::f07e:3aff:fea9:51a6/64 scope link
valid_lft forever preferred_lft forever
94: veth2@if93: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP
link/ether ce:87:1e:20:dc:93 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::cc87:1eff:fe20:dc93/64 scope link
valid_lft forever preferred_lft forever
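The vxlan attributes of the vxlan1 interface, including the VNI, can be printed with the -d (details) flag of ip link; for the red network the VNI should be 256, as confirmed later by the data stored in consul:

ip netns exec 1-5eded2abc0 ip -d link show vxlan1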

In the forwarding table of the hidden namespace (1-5eded2abc0), the docker daemon has created an association between the mac address of the container on the other side of the vlan (namespace_01) and the ip address of the other docker host (192.168.1.51).

[root@swarm-02 ~]# ip netns exec 1-5eded2abc0 bridge fdb show dev vxlan1
2e:30:93:b6:b8:5c master br0
f2:7e:3a:a9:51:a6 master br0 permanent
02:42:0a:41:0a:02 master br0
f2:7e:3a:a9:51:a6 vlan 1 master br0 permanent
2e:30:93:b6:b8:5c dst 192.168.1.51 link-netnsid 0 self
02:42:0a:41:0a:03 dst 192.168.1.51 link-netnsid 0 self permanent
02:42:0a:41:0a:02 dst 192.168.1.52 link-netnsid 0 self permanent

Inside the docker container namespace_03 another association is created, between the mac address of the other container of the vlan (namespace_01) and its ip address. In this way no arp resolution is needed before encapsulating the traffic in the vxlan tunnel.

[root@swarm-02 ~]# docker exec -it 6276d2565185  arp -a
namespace_01.red (10.65.10.2) at 02:42:0a:41:0a:03 [ether] on eth0

The ping from namespace_03 to namespace_01 works correctly.

[root@swarm-02 ~]# docker exec -it 6276d2565185  ping 10.65.10.2
PING 10.65.10.2 (10.65.10.2): 56 data bytes
64 bytes from 10.65.10.2: seq=0 ttl=64 time=0.475 ms
64 bytes from 10.65.10.2: seq=1 ttl=64 time=0.421 ms
--- 10.65.10.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss

The icmp packet is sent out of the eth0 interface (no arp is needed), received by the veth2 interface of the hidden namespace, encapsulated in a udp packet and sent to the other vxlan endpoint on the swarm-01 node.

The following trace was captured with wireshark:

[Image: vxlan stack]

Now let’s create the green overlay network, with the same subnet 10.65.10.0/24, and two other containers: namespace_02 on swarm-01 and namespace_04 on swarm-02.

The green network is created by the following command:

[root@swarm-02 ~]# docker network create --subnet 10.65.10.0/24 --driver overlay green

On both swarm nodes the new namespaces are created:

[root@swarm-02 ~]# docker run --name namespace_04 -itd --network=green busybox
[root@swarm-02 ~]# docker ps -a |grep namespace_04
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fb43171af488 busybox "sh" 2 seconds ago Up 1 seconds namespace_04
[root@swarm-01 ~]# docker run --name namespace_02 -itd --network=green busybox
[root@swarm-01 ~]# docker ps -a |grep namespace_02
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3fdb91c71830 busybox "sh" 2 seconds ago Up 1 seconds namespace_02

The network configuration of the namespace_04 container (the configuration of namespace_02 mirrors it):

[root@swarm-02 ~]# docker exec -it fb43171af488 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
98: eth0@if99: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1450 qdisc noqueue
link/ether 02:42:0a:41:0a:05 brd ff:ff:ff:ff:ff:ff
inet 10.65.10.5/24 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::42:aff:fe41:a05/64 scope link
valid_lft forever preferred_lft forever
100: eth1@if101: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
link/ether 02:42:ac:12:00:03 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.3/16 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::42:acff:fe12:3/64 scope link
valid_lft forever preferred_lft forever

The eth0 interface is one side of a veth pair: the other side is inside another hidden namespace:

[root@swarm-02 ~]# ip netns ls
6c4007150d35 (id: 7)
1-05c633350b (id: 6)
80c64628846e (id: 5)
1-5eded2abc0 (id: 4)
2c7e95af1b56 (id: 0)
[root@swarm-02 ~]# ip netns exec 1-05c633350b ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP
link/ether ba:7e:49:48:f7:ab brd ff:ff:ff:ff:ff:ff
inet 10.65.10.4/24 scope global br0
valid_lft forever preferred_lft forever
inet6 fe80::600c:15ff:fef5:f4d3/64 scope link
valid_lft forever preferred_lft forever
97: vxlan1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UNKNOWN
link/ether ba:7e:49:48:f7:ab brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::b87e:49ff:fe48:f7ab/64 scope link
valid_lft forever preferred_lft forever
99: veth2@if98: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master br0 state UP
link/ether de:a7:e5:af:f1:ab brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::dca7:e5ff:feaf:f1ab/64 scope link
valid_lft forever preferred_lft forever

Now it’s possible to ping the namespace_02 container from namespace_04.

[root@swarm-02 ~]# docker exec -it fb43171af488 ping 10.65.10.5
PING 10.65.10.5 (10.65.10.5) 56(84) bytes of data.
64 bytes from 10.65.10.5: icmp_seq=1 ttl=64 time=0.128 ms
64 bytes from 10.65.10.5: icmp_seq=2 ttl=64 time=0.051 ms
^C
--- 10.65.10.5 ping statistics ---

The trace with wireshark:

[Image: vxlan protocol]

It’s not possible to ping a container inside the red network from the green network, even though they have the same subnet. The two networks are isolated:

[root@swarm-02 ~]# docker exec -it fb43171af488 ping 10.65.10.3
PING 10.65.10.3 (10.65.10.3) 56(84) bytes of data.
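The isolation can also be seen at the docker level: red and green are two distinct network objects that merely carry the same subnet. A quick comparison with a Go template (the output format is illustrative):

docker network inspect -f '{{.Name}}: {{range .IPAM.Config}}{{.Subnet}}{{end}}' red green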

Let’s now explain the role of the consul daemon, used to store all the information about the overlay networks.

The information about the two overlay networks can be shown in this way:

For the green network:

[root@swarm-01 ~]# docker network inspect green |grep Id
"Id": "0c8ed0040dcb0b3f231a6e69722c474e7ddc1a093cf36f98a72f1a19978ab6d4",
[root@swarm-01 ~]# curl -v http://localhost:8500/v1/kv/swarm/docker/network/v1.0/overlay/network/0c8ed0040dcb0b3f231a6e69722c474e7ddc1a093cf36f98a72f1a19978ab6d4/
* About to connect() to localhost port 8500 (#0)
* Trying ::1…
* Connected to localhost (::1) port 8500 (#0)
> GET /v1/kv/swarm/docker/network/v1.0/overlay/network/0c8ed0040dcb0b3f231a6e69722c474e7ddc1a093cf36f98a72f1a19978ab6d4/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8500
> Accept: */*
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 967
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Sat, 17 Dec 2016 19:16:06 GMT
< Content-Length: 340
[{"CreateIndex":523,"ModifyIndex":967,"LockIndex":0,"Key":"swarm/docker/network/v1.0/overlay/network/0c8ed0040dcb0b3f231a6e69722c474e7ddc1a093cf36f98a72f1a19978ab6d4/","Flags":3304740253564472344,"Value":"eyJtdHUiOjAsInNlY3VyZSI6ZmFsc2UsInN1Ym5ldHMiOlt7IlN1Ym5ldElQIjoiMTAuNjUuMTAuMC8yNCIsIkd3SVAiOiIxMC42NS4xMC40LzI0IiwiVm5pIjoyNTd9XX0="}][root@swarm-01 ~]

The Value, decoded from Base64, contains:

{"mtu":0,"secure":false,"subnets":[{"SubnetIP":"10.65.10.0/24","GwIP":"10.65.10.4/24","Vni":257}]}

This means that the green network traffic will be encapsulated with vxlan Vni 257 (see the trace above). For the red network:

[root@swarm-01 ~]# docker network inspect red |grep Id
"Id": "2302c1f694b200d91a601b00f9cfd45fe6a6a0ec28bb332bda00c9eda0a782e7",
[root@swarm-01 ~]# curl -v http://localhost:8500/v1/kv/swarm/docker/network/v1.0/overlay/network/2302c1f694b200d91a601b00f9cfd45fe6a6a0ec28bb332bda00c9eda0a782e7/
* About to connect() to localhost port 8500 (#0)
* Trying ::1…
* Connected to localhost (::1) port 8500 (#0)
> GET /v1/kv/swarm/docker/network/v1.0/overlay/network/2302c1f694b200d91a601b00f9cfd45fe6a6a0ec28bb332bda00c9eda0a782e7/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: localhost:8500
> Accept: */*
< HTTP/1.1 200 OK
< Content-Type: application/json
< X-Consul-Index: 78
< X-Consul-Knownleader: true
< X-Consul-Lastcontact: 0
< Date: Sat, 17 Dec 2016 20:54:30 GMT
< Content-Length: 338
* Connection #0 to host localhost left intact
[{"CreateIndex":68,"ModifyIndex":78,"LockIndex":0,"Key":"swarm/docker/network/v1.0/overlay/network/2302c1f694b200d91a601b00f9cfd45fe6a6a0ec28bb332bda00c9eda0a782e7/","Flags":3304740253564472344,"Value":"eyJtdHUiOjAsInNlY3VyZSI6ZmFsc2UsInN1Ym5ldHMiOlt7IlN1Ym5ldElQIjoiMTAuNjUuMTAuMC8yNCIsIkd3SVAiOiIxMC42NS4xMC4xLzI0IiwiVm5pIjoyNTZ9XX0="}][root@swarm-01 ~]

The Base64-decoded value is:

{"mtu":0,"secure":false,"subnets":[{"SubnetIP":"10.65.10.0/24","GwIP":"10.65.10.1/24","Vni":256}]}

In this case the Vni is 256 (see the trace above).
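For reference, a sketch of how the values above can be decoded (replace the placeholders with the real strings; consul can also return the value already decoded through the ?raw parameter):

# decode the base64 Value field copied from the curl output
echo '<Value-string>' | base64 -d
# or ask consul for the raw, already decoded value
curl -s "http://localhost:8500/v1/kv/swarm/docker/network/v1.0/overlay/network/<network-id>/?raw"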

The consul daemon is the component that stores all the information about the overlay networks and permits the docker daemon to configure correctly the hidden namespaces where the vxlan tunnels terminate.
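To see everything docker stored under its prefix, the consul kv endpoint can list the keys (a sketch, assuming the same swarm/docker prefix used above):

curl -s "http://localhost:8500/v1/kv/swarm/docker/network/v1.0/?keys" | python -m json.tool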

Conclusions

In this article I showed how to use docker to create overlay networks isolated from the docker hosts by network namespaces.

This approach permits the creation of secure, well segmented and isolated environments that can communicate only with the external world.

It is the same result obtained by openstack with neutron for providing network as a service to different tenants.

Don’t hesitate to contact me for any questions.
