Fix for Docker Inside Docker Networking issue on Kubernetes

Abhijeet Kamble
Dec 15, 2020


I would like to share my experience with an issue that I had been facing for a few days. We have a Continuous Integration server and a build farm of 200+ agents/slaves.

Recently we moved the agents/slaves to a Kubernetes cluster, and we started experiencing a network issue.

Problem

While running Docker Inside Docker (DinD), we experienced network-related issues.

The problem was that Docker inside Docker was unable to reach any URLs and failed with timeouts.

The architecture is complex when running docker build inside Kubernetes. On the host machine we run a pod containing a Docker daemon, which lets us run Docker builds from the CI server.

We started getting failures from docker build and docker run commands.

[root@c04pesbalXX abhi]# kubectl exec -it bamboo-agent-7bdcc66f49-dhqjq -n bamboo-agent-docker -- /bin/sh
sh-4.3#
sh-4.3# docker run -it hashicorp/terraform init
0.12.24: Pulling from hashicorp/terraform
c9b1b535fdd9: Pulling fs layer
011000b168e5: Pulling fs layer
4c096b23c4a8: Pulling fs layer
c9b1b535fdd9: Verifying Checksum
c9b1b535fdd9: Download complete
011000b168e5: Verifying Checksum
011000b168e5: Download complete
c9b1b535fdd9: Pull complete
4c096b23c4a8: Verifying Checksum
4c096b23c4a8: Download complete
011000b168e5: Pull complete
4c096b23c4a8: Pull complete
Digest: sha256:53fb1c0a78c8bb91c4a855c1b352ea7928f6fa65f8080dc7a845e240dd2a9bee
Status: Downloaded newer image for hashicorp/terraform:0.12.24
Initializing the backend...
Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Checking for available provider plugins...
Registry service unreachable.
This may indicate a network issue, or an issue with the requested Terraform Registry.
Error verifying checksum for provider "template"
The checksum for provider distribution from the Terraform Registry
did not match the source. This may mean that the distributed files
were changed after this version was released to the Registry.
Error: registry service is unreachable, check https://status.hashicorp.com/ for status updates
Error: unable to verify checksum

After checking, I found that the issue is not specific to the hashicorp/terraform:0.12.24 image. It affects any other Docker image, and any URL requested during a docker build command as well.

Solution

After doing a lot of research on this issue, I learned that it is network related. Thanks to https://mlohr.com/docker-mtu/ for the blog post that explains the issue in detail.

The issue is with the network MTUs. MTU stands for Maximum Transmission Unit: the largest protocol data unit that can be communicated in a single network-layer transaction.

So, I have a host with eth0 and docker0 interfaces, both with an MTU of 1500.
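To make the MTU idea concrete, here is a minimal sketch of how one could probe whether packets of a given size get through without fragmentation. It is not taken from the original investigation. It assumes a Linux `ping` from iputils (the `-M do` flag sets Don't Fragment and may not exist in BusyBox images), the helper names `icmp_payload_for_mtu` and `probe_mtu` are made up for illustration, and `example.com` is only a placeholder target.

```shell
#!/bin/sh
# Largest ICMP payload that fits in one unfragmented IPv4 packet for a given MTU.
# Overhead is 20 bytes of IPv4 header (no options) plus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Return success if a single Don't-Fragment ping sized for the given MTU
# reaches the host. Assumes iputils ping: -M do sets DF, -s sets payload size.
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 1 -W 2 -s "$(icmp_payload_for_mtu "$mtu")" "$host" >/dev/null 2>&1
}

# Example usage (placeholder host, left commented out):
#   probe_mtu example.com 1500 || echo "1500-byte packets do not pass"
#   probe_mtu example.com 1360 && echo "1360-byte packets pass"
```

If the larger size fails while the smaller one succeeds, that points toward an MTU mismatch somewhere on the path rather than a DNS or firewall problem.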

[root@c04pesbal472 kamblea]# ifconfig
datapath: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
ether ee:3c:60:db:dd:a1 txqueuelen 1000 (Ethernet)
RX packets 12930 bytes 1349718 (1.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:a8:a0:41:7e txqueuelen 0 (Ethernet)
RX packets 554487 bytes 91672085 (87.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 851473 bytes 1801995025 (1.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 100.81.21.120 netmask 255.255.252.0 broadcast 100.81.23.255
ether 00:50:56:97:38:5b txqueuelen 1000 (Ethernet)
RX packets 67531210 bytes 18015889819 (16.7 GiB)
RX errors 0 dropped 230695 overruns 0 frame 0
TX packets 59070841 bytes 10659101883 (9.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 100.80.45.166 netmask 255.255.248.0 broadcast 100.80.47.255
ether 00:50:56:97:e8:f7 txqueuelen 1000 (Ethernet)
RX packets 4913461 bytes 384821835 (366.9 MiB)
RX errors 0 dropped 347293 overruns 0 frame 0
TX packets 2228 bytes 93704 (91.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 146992447 bytes 27673873348 (25.7 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 146992447 bytes 27673873348 (25.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

On top of this host we have a pod running. If we exec into the pod, we see the following network configuration.

[root@c04pesbal332 abhi]# docker exec -it ab2b259e54b4 /bin/sh
sh-4.3# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:53:b3:ef:42 brd ff:ff:ff:ff:ff:ff
inet 192.168.255.129/25 brd 192.168.255.255 scope global docker0
valid_lft forever preferred_lft forever
108: eth0@if109: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default
link/ether 9a:7d:7f:60:13:4c brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.39.0.3/12 brd 10.47.255.255 scope global eth0
valid_lft forever preferred_lft forever
sh-4.3#

Although the docker0 interface shows an MTU of 1500, containers use the bridge network created by Docker (which is the default network), so that is the configuration to inspect:

sh-4.3# docker network inspect bridge
[
{
"Name": "bridge",
"Id": "abaa249b95dec904d5dd71523a2202a6b4ab4ec54a28f0a1ec59678969e63917",
"Created": "2020-12-10T11:56:33.411365327Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "192.168.255.129/25",
"Gateway": "192.168.255.129"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1360"
},
"Labels": {}
}
]
sh-4.3#

As the bridge configuration shows, we have set the MTU for Docker to 1360. To do the same, add the following to the /etc/docker/daemon.json file of the Docker daemon running inside the pod.

{
"mtu": 1360
}
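As a rough sketch of how this could be scripted, the helper below writes a daemon.json containing only the `mtu` key. The function name `write_docker_mtu` is hypothetical, and the sketch assumes the file has no other settings worth keeping, since it overwrites the target. How dockerd gets restarted depends on how your image starts it, so that step is left as a comment.

```shell
#!/bin/sh
# Write a minimal daemon.json that sets only the "mtu" key.
# NOTE: this overwrites the target file; merge by hand if it has other settings.
write_docker_mtu() {
    target=$1
    mtu=$2
    printf '{\n  "mtu": %d\n}\n' "$mtu" > "$target"
}

# Intended use inside the DinD pod (commented out because it changes system config):
#   write_docker_mtu /etc/docker/daemon.json 1360
#   ...restart dockerd in whatever way your image starts it...
#   docker network inspect bridge \
#     --format '{{ index .Options "com.docker.network.driver.mtu" }}'
```

After a restart, the last command should print the configured value, matching the `com.docker.network.driver.mtu` option seen in the inspect output above.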

To summarize the situation before the fix:

On Host Machine
eth0 -- 1500 MTU
docker0 -- 1500 MTU
On Pod
eth0 -- 1376 MTU
docker0 -- 1500 MTU

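Notice that both docker0 networks have an MTU of 1500. On the host this is fine, but on the pod it is what causes the issue: the inner Docker network allows packets larger than the pod's eth0 can carry, so large packets from the inner containers never get through. The Docker MTU inside the pod must be no larger than the MTU of the pod's eth0, which is itself lower than the 1500 of the host.

Once the lower MTU is in place (the Docker daemon has to be restarted to pick up changes to daemon.json), the network issue is resolved.

Thanks for reading.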
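The comparison described here can be sketched as a small check. This is only an illustration: the function names `iface_mtu` and `mtu_fits` are invented for this example, reading from `/sys/class/net` assumes a Linux pod, and the numbers come from the `ip addr` and `docker network inspect` outputs earlier in this post.

```shell
#!/bin/sh
# Read an interface's MTU from sysfs (Linux only).
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Succeed when the inner (Docker) MTU fits inside the outer (pod interface) MTU.
mtu_fits() {
    inner=$1
    outer=$2
    [ "$inner" -le "$outer" ]
}

# With the values seen in this post: pod eth0 reports 1376.
if mtu_fits 1500 1376; then
    echo "default Docker MTU 1500 fits"
else
    echo "default Docker MTU 1500 is too large for the pod interface"
fi

# Inside a real pod one could compare against the live value instead:
#   mtu_fits 1360 "$(iface_mtu eth0)" && echo "configured MTU fits"
```

A check like this could run at agent start-up so that a misconfigured pod fails early instead of timing out in the middle of a build.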
