Fix for Docker Inside Docker Networking Issue on Kubernetes

Abhijeet Kamble
Dec 15, 2020

I would like to share my experience with an issue I had been facing for a few days. We have a Continuous Integration server and a build farm of 200+ agents/slaves.

Recently we moved the agents/slaves to a Kubernetes cluster and started experiencing a network issue.

Problem

While running Docker inside Docker (DinD), we started experiencing network-related issues.

The problem was that the inner Docker daemon was unable to reach any URL and failed with timeouts.

The architecture is complex when running docker build inside Kubernetes: on each host machine we run a pod with its own Docker daemon, which executes the docker builds triggered from the CI server.

We started seeing docker build and docker run commands fail.

[root@c04pesbalXX abhi]# kubectl exec -it bamboo-agent-7bdcc66f49-dhqjq -n bamboo-agent-docker -- /bin/sh
sh-4.3#
sh-4.3# docker run -it hashicorp/terraform:0.12.24 init
0.12.24: Pulling from hashicorp/terraform
c9b1b535fdd9: Pulling fs layer
011000b168e5: Pulling fs layer
4c096b23c4a8: Pulling fs layer
c9b1b535fdd9: Verifying Checksum
c9b1b535fdd9: Download complete
011000b168e5: Verifying Checksum
011000b168e5: Download complete
c9b1b535fdd9: Pull complete
4c096b23c4a8: Verifying Checksum
4c096b23c4a8: Download complete
011000b168e5: Pull complete
4c096b23c4a8: Pull complete
Digest: sha256:53fb1c0a78c8bb91c4a855c1b352ea7928f6fa65f8080dc7a845e240dd2a9bee
Status: Downloaded newer image for hashicorp/terraform:0.12.24

Initializing the backend...

Successfully configured the backend "s3"! Terraform will automatically
use this backend unless the backend configuration changes.

Initializing provider plugins...
- Checking for available provider plugins...

Registry service unreachable.

This may indicate a network issue, or an issue with the requested Terraform Registry.

Error verifying checksum for provider "template"
The checksum for provider distribution from the Terraform Registry
did not match the source. This may mean that the distributed files
were changed after this version was released to the Registry.

Error: registry service is unreachable, check https://status.hashicorp.com/ for status updates

Error: unable to verify checksum

After investigating, it was clear the issue was not specific to the hashicorp/terraform:0.12.24 image: any other Docker image, and any URL fetched during a docker build, failed the same way.

Solution

After doing a lot of research, I found that it is a network-level issue. Thanks to https://mlohr.com/docker-mtu/ for the blog post that explains it in detail.

The issue is with the network MTUs. MTU stands for Maximum Transmission Unit, the largest protocol data unit that can be communicated in a single network-layer transaction.
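A quick way to confirm an MTU mismatch from inside the affected container is to send pings with the "don't fragment" flag at different sizes. This is only a diagnostic sketch: registry.terraform.io is just an example target, ping must be available in the image, and the sizes assume the 1376-byte overlay MTU shown in the interface listings below.

# 1348 bytes of ICMP payload + 28 bytes of headers = 1376, fits the overlay MTU, so this should succeed
ping -c 3 -M do -s 1348 registry.terraform.io

# 1472 + 28 = 1500 bytes, larger than the 1376-byte overlay MTU, so this hangs or times out
ping -c 3 -M do -s 1472 registry.terraform.io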

So, on the host we have eth0 and docker0, both with an MTU of 1500:

[root@c04pesbal472 kamblea]# ifconfig
datapath: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
ether ee:3c:60:db:dd:a1 txqueuelen 1000 (Ethernet)
RX packets 12930 bytes 1349718 (1.2 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:a8:a0:41:7e txqueuelen 0 (Ethernet)
RX packets 554487 bytes 91672085 (87.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 851473 bytes 1801995025 (1.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 100.81.21.120 netmask 255.255.252.0 broadcast 100.81.23.255
ether 00:50:56:97:38:5b txqueuelen 1000 (Ethernet)
RX packets 67531210 bytes 18015889819 (16.7 GiB)
RX errors 0 dropped 230695 overruns 0 frame 0
TX packets 59070841 bytes 10659101883 (9.9 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 100.80.45.166 netmask 255.255.248.0 broadcast 100.80.47.255
ether 00:50:56:97:e8:f7 txqueuelen 1000 (Ethernet)
RX packets 4913461 bytes 384821835 (366.9 MiB)
RX errors 0 dropped 347293 overruns 0 frame 0
TX packets 2228 bytes 93704 (91.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 146992447 bytes 27673873348 (25.7 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 146992447 bytes 27673873348 (25.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

On top of this host we have a pod running. If we exec into the pod and look at its network configuration:

[root@c04pesbal332 abhi]# docker exec -it ab2b259e54b4 /bin/sh
sh-4.3# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:53:b3:ef:42 brd ff:ff:ff:ff:ff:ff
inet 192.168.255.129/25 brd 192.168.255.255 scope global docker0
valid_lft forever preferred_lft forever
108: eth0@if109: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1376 qdisc noqueue state UP group default
link/ether 9a:7d:7f:60:13:4c brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.39.0.3/12 brd 10.47.255.255 scope global eth0
valid_lft forever preferred_lft forever
sh-4.3#

Though docker0 inside the pod is showing an MTU of 1500, containers use the bridge network created by Docker (the default network), so let's inspect it:

sh-4.3# docker network inspect bridge
[
{
"Name": "bridge",
"Id": "abaa249b95dec904d5dd71523a2202a6b4ab4ec54a28f0a1ec59678969e63917",
"Created": "2020-12-10T11:56:33.411365327Z",
"Scope": "local",
"Driver": "bridge",
"EnableIPv6": false,
"IPAM": {
"Driver": "default",
"Options": null,
"Config": [
{
"Subnet": "192.168.255.129/25",
"Gateway": "192.168.255.129"
}
]
},
"Internal": false,
"Attachable": false,
"Ingress": false,
"ConfigFrom": {
"Network": ""
},
"ConfigOnly": false,
"Containers": {},
"Options": {
"com.docker.network.bridge.default_bridge": "true",
"com.docker.network.bridge.enable_icc": "true",
"com.docker.network.bridge.enable_ip_masquerade": "true",
"com.docker.network.bridge.host_binding_ipv4": "0.0.0.0",
"com.docker.network.bridge.name": "docker0",
"com.docker.network.driver.mtu": "1360"
},
"Labels": {}
}
]
sh-4.3#

Looking at the bridge configuration, we have set the Docker MTU to 1360. To do the same, add the following to the /etc/docker/daemon.json file inside the pod:

{
"mtu": 1360
}
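The Docker daemon has to be restarted for the change to take effect. If dockerd inside the DinD container is started from an entrypoint script rather than systemd, the same value can also be passed as a flag. A minimal sketch, assuming the standard dockerd and docker CLIs:

# Equivalent to the daemon.json entry above, when dockerd is launched directly
dockerd --mtu=1360 &

# Where systemd manages Docker, keep daemon.json and restart the service instead
systemctl restart docker

# Verify that the default bridge picked up the new MTU (should print 1360)
docker network inspect bridge --format '{{ index .Options "com.docker.network.driver.mtu" }}'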

To recap:

On Host Machine
eth0 -- 1500 MTU
docker0 -- 1500 MTU
On Pod
eth0 -- 1376 MTU
docker0 -- 1500 MTU

Notice that docker0 has an MTU of 1500 in both places. On the host that is fine, but inside the pod a docker0 MTU of 1500 is larger than the pod's eth0 MTU of 1376, so packets from the inner containers that exceed 1376 bytes never make it out, which is what caused the timeouts. The Docker MTU inside the pod needs to be kept at or below the MTU of the pod's own network interface, which is why we set it to 1360.
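With 200+ agent pods we did not want to edit daemon.json by hand in each one. One way to roll the setting out is to ship the file as a ConfigMap and mount it at /etc/docker/daemon.json in the DinD container. The commands below are a sketch only; names such as docker-daemon-config and deployment/bamboo-agent are assumptions based on our setup.

# Store the daemon.json shown above in a ConfigMap (name is illustrative)
kubectl create configmap docker-daemon-config \
  --from-file=daemon.json=./daemon.json \
  -n bamboo-agent-docker

# After mounting the ConfigMap at /etc/docker/daemon.json in the pod spec,
# restart the agents so the inner Docker daemon picks up the new MTU
kubectl rollout restart deployment/bamboo-agent -n bamboo-agent-docker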

Once this is done and the Docker daemon inside the pod is restarted, the network issue is resolved.

Thanks for Reading.
