Understand Linux namespace by creating a Docker-like engine

This blog post aims to highlight the core features of Docker by guiding readers through the process of creating a new container management system.

TL;DR: According to the Stack Overflow 2022 survey, Docker has become a vital tool for software developers. It is now so ubiquitous that people are more likely to ask why it is not being used than why it is. A container is not a virtual machine, yet it is also easy to confuse a container with a process. This blog post highlights the core features of Docker by walking through the creation of a new container management system. Along the way, we distinguish between a container and a process and gain insight into the inner workings of Docker.

Mocker

What is the rationale behind using containers?

Storage problem

Suppose we need to run two MySQL instances, versions 5.5 and 8.0, on an Ubuntu server that does not have Docker installed. The two versions require different versions of the same libraries, which must be installed at the same path, "/root/sql-dependencies", and Ubuntu supports installing only one version of a library at a time. This leads to the first problem: how to isolate storage.

To solve this, we first download the two versions of the libraries to "/root/sql-dependencies-v1" and "/root/sql-dependencies-v2". Next, we need to make the MySQL 5.5 instance believe that "/root/sql-dependencies" is actually "/root/sql-dependencies-v1", and likewise make the MySQL 8.0 instance see "/root/sql-dependencies-v2" at that path. To achieve this, we can use the Linux mount namespace feature.

A mount namespace gives a process its own isolated view of the filesystem's mount points. The goal is to create one such space for each of the two MySQL instances and map the dependency path in each space to the correct path on the Ubuntu server. The first step is to create a new mount namespace.

Create a new mount namespace

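$ unshare --mount mysqld-5.0.0 start

"unshare --mount" creates a new mount namespace for the process "mysqld-5.0.0".

Bind the host path to the dependency path inside the mount namespace

$ mount --bind /root/sql-dependencies-v1 /root/sql-dependencies

"mount --bind" takes the source directory first and the mount point second, and it must run inside the new mount namespace so that only that MySQL instance sees "/root/sql-dependencies-v1" at "/root/sql-dependencies". Doing the same for MySQL 8.0 with "/root/sql-dependencies-v2" gives each instance its own isolated storage space, so both can work correctly on the same Ubuntu server.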
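Putting the two steps together, a minimal sketch might look like the following. It assumes root privileges and the util-linux "unshare" tool. The helper name "run_with_deps" is hypothetical, and "mysqld-5.0.0 start" and "mysqld-8.0.0 start" are the placeholder commands used in this post. The bind mount is made from inside the new mount namespace, so it is meant to be visible only to the instance started there.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): run a command in its own mount
# namespace, with /root/sql-dependencies bound to the directory given as $1.
# Must be run as root; the directory names follow the example in this post.
run_with_deps() {
    deps_dir="$1"
    shift
    # The inner shell runs inside the new mount namespace, so the bind mount
    # is set up there before the real command replaces the shell via exec.
    unshare --mount sh -c '
        mount --bind "$1" /root/sql-dependencies || exit 1
        shift
        exec "$@"
    ' sh "$deps_dir" "$@"
}

# Usage (as root):
#   run_with_deps /root/sql-dependencies-v1 mysqld-5.0.0 start
#   run_with_deps /root/sql-dependencies-v2 mysqld-8.0.0 start
```

Wrapping the mount and the start command in one call keeps the ordering explicit: the namespace is created first, the path is remapped inside it, and only then does the database process start.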

Network problem

Once we have resolved the library conflict by utilizing the mount namespace feature, the development team informs us that their SDK only supports accessing MySQL instances using the default port 3306. However, since we are running two different MySQL instances, we can't start both of them on the same port. This leads to the second problem of how to isolate the network.

By default, an Ubuntu server has a single network namespace. Whenever we start applications, their processes run in this default network namespace. However, two processes in the same network namespace cannot bind to the same port on the same IP address.

Our goal is to run two separate MySQL instances on the same port 3306, but with different IP addresses.

To accomplish this, we will create two network namespaces, each of which will host one of the MySQL instances.

$ unshare --net mysqld-5.0.0 start
$ unshare --net mysqld-8.0.0 start
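A freshly created network namespace contains only a loopback interface, and that interface starts in the down state. The sketch below, which assumes root privileges plus the "unshare" and "ip" tools, brings loopback up before starting the command. The helper name "run_in_new_netns" is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command in a fresh network namespace with the
# loopback interface enabled. Must be run as root.
run_in_new_netns() {
    # The inner shell already runs inside the new network namespace, so
    # "ip link set lo up" affects only that namespace's loopback device.
    unshare --net sh -c '
        ip link set lo up || exit 1
        exec "$@"
    ' sh "$@"
}

# Usage (as root):
#   run_in_new_netns ip -o link show      # lists only the "lo" interface
#   run_in_new_netns mysqld-8.0.0 start   # placeholder command from this post
```

Because each namespace has its own interfaces and port space, both instances can listen on port 3306 without conflict.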

To allow the MySQL instances running in the created network namespaces to be accessed from the host, we need to use some advanced techniques to map the networks in the namespaces to the default namespace. However, the details of these techniques are outside of the scope of this blog, so we will not cover them here.

At this point, we have successfully set up the two MySQL instances in separate network namespaces and resolved the port conflict. We can now deploy our developer applications to the physical host, which will be able to access both MySQL instances.

The architecture of a MySQL instance utilizing network and mount namespaces.

Another scenario to consider is when processes share the same network namespace but use different mount namespaces (this makes things pretty flexible).
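One way to check which namespaces two processes share is to compare the symlinks under /proc/<pid>/ns, which carry the namespace type and identifier. The following is a small sketch assuming a standard Linux /proc filesystem. The helper name "same_ns" and the PIDs in the usage comments are hypothetical, and reading the entries of another user's process generally requires root.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when two processes are in the
# same namespace of the given type.
# $1 = namespace type (net, mnt, pid, uts, ipc, ...), $2 and $3 = PIDs.
same_ns() {
    # Each link resolves to a string such as "net:[4026531992]".
    a=$(readlink "/proc/$2/ns/$1") || return 2
    b=$(readlink "/proc/$3/ns/$1") || return 2
    [ "$a" = "$b" ]
}

# Usage with two hypothetical PIDs:
#   same_ns net 1234 5678 && echo "shared network namespace"
#   same_ns mnt 1234 5678 || echo "separate mount namespaces"
```

The function returns 2 when either PID cannot be read, which keeps "not shared" distinct from "could not check".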

Create your first release

By using only the necessary features of the Linux kernel and some additional libraries, it is possible to create a new container management system, similar to Mocker. When we create a new container using network and process namespaces, the resulting output is as follows.

# Install on CentOS
# Tested with AWS Linux `ami-0f2eac25772cd4e36`
curl https://raw.githubusercontent.com/dinhanhhuy/mocker-k1s/main/mocker/install.sh | bash

# Basic usage
# Demo mocker file
$ cat golang1.Mockerfile
RUN wget https://github.com/dinhanhhuy/go-backend/releases/download/1.0.0/go-backend-linux-amd64
RUN chmod +x go-backend-linux-amd64
RUN echo 'this is go-backend 1000' > /root/.mocker/layer/builder_space/index.html
RUN ls -la
RUN ls -la

# Build image from Mockerfile
$ mocker build img1 golang1.Mockerfile
...
932ace9337a30e1f58841718c2624870
[INFO]: buding RUN chmod +x go-backend-linux-amd64
[INFO]: buding RUN echo 'this is go-backend 1000' > /root/.mocker/layer/builder_space/index.html
[INFO]: buding RUN ls -la
[INFO]: buding RUN ls -la
[INFO]: buding CMD ID=go-backend-1 WORK_DIR=/root/workspace /root/workspace/go-backend-linux-amd64
/root/.mocker/layer
|-- 2d15f0c52caa33fd679b2fdc8f14f642
|   |-- go-backend-linux-amd64
|   `-- index.html
|-- 90cb403239851d091015e1e2d98f489b
|   |-- entry.sh
|   |-- go-backend-linux-amd64
|   `-- index.html
|-- 932ace9337a30e1f58841718c2624870
|   `-- go-backend-linux-amd64
|-- ba8675a9a669696c12076f6c0b879a7d
|   |-- go-backend-linux-amd64
|   `-- index.html
|-- ddf58290b0367f3f695074b6ecde8985
|   `-- go-backend-linux-amd64
|-- f4d1ea5912242d002cc06053b6220342
|   |-- go-backend-linux-amd64
|   `-- index.html
`-- img1 -> 90cb403239851d091015e1e2d98f489b
 __________________________
< Build image img1 success >
 --------------------------
        \   ^__^
         \  (><)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
# List all images
$ mocker images
/root/.mocker/layer
|-- 2d15f0c52caa33fd679b2fdc8f14f642
|   |-- go-backend-linux-amd64
|   `-- index.html
|-- 90cb403239851d091015e1e2d98f489b
|   |-- entry.sh
|   |-- go-backend-linux-amd64
|   `-- index.html
|-- 932ace9337a30e1f58841718c2624870
|   `-- go-backend-linux-amd64
|-- ba8675a9a669696c12076f6c0b879a7d
|   |-- go-backend-linux-amd64
|   `-- index.html
|-- ddf58290b0367f3f695074b6ecde8985
|   `-- go-backend-linux-amd64
|-- f4d1ea5912242d002cc06053b6220342
|   |-- go-backend-linux-amd64
|   `-- index.html
`-- img1 -> 90cb403239851d091015e1e2d98f489b

# Run a new container with an isolated network
$ mocker run backend img1
ip backend, ip 10.0.0.5, pid 3832
 ___________________________________
< Success create backend, ip: 10.0.0.5 >
 -----------------------------------
        \   ^__^
         \  (oO)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

# List containers
$ mocker ps
backend 10.0.0.5

# Test connection
# The backend container is running at 10.0.0.5:10000
$ curl 10.0.0.5:10000
this is go-backend 1000

# Verify the host doesn't listen on 10000
$ netstat -lnpt | grep 10000
# exit 1

# Exec into the container and check its network
$ mocker exec backend
 ______________
< exec backend >
 -------------
        \   ^__^
         \  (..)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
[root@ip-172-31-20-239 /]# netstat -lnpt | grep 10000
tcp6       0      0 :::10000                :::*                    LISTEN      5/go-backend-linux-

Demo

To learn more about how each feature, such as "docker run" and "docker build", is implemented, please refer to the source code.

Compare to Docker

We aim to explore a Docker container and demonstrate that it is utilizing the same Linux namespaces as the ones we discussed earlier. To make things more interesting, we will SSH into a worker node of the Kubernetes cluster and attempt to inspect a random container. We will select a front-end website as our target and see how much we can delve into it.

$ docker ps
c84ad4878642   .../web-frontend-result-page
...

We first try to run the "netstat" command inside the selected container to list the ports it is listening on.

$ docker exec -it c84ad4878642 netstat -lnpt
OCI runtime exec failed: ... "netstat": executable file not found in $PATH: unknown

We encountered an error when trying to list the listening ports of the container using the "netstat" command, as the command was not available in the container image. However, we remembered that a container is simply a process running in a Linux namespace, so we downloaded the "netstat" CLI to the host and started the process "netstat -lnpt" in the container's network namespace to gather more information.

To start, we need to find the actual PID of the container.

$ docker inspect c84ad4878642
[
  {
    "State": {
      "Pid": 52507,
    ...

We used the PID to identify the Linux namespace to which the process was attached.

# List all Linux namespaces that process 52507 is attached to
$ lsns | grep 52507
4026539260 mnt         4 52507 root            /bin/sh -c ./env.sh
4026539261 uts         4 52507 root            /bin/sh -c ./env.sh
4026539262 pid         4 52507 root            /bin/sh -c ./env.sh

Unfortunately, the network namespace of the process is not listed here. However, we can use another technique to uncover it.

# Voilà, we found the network namespace
$ ip netns ls
c84ad4878642 (id: 77)
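Another way to see the namespaces of a process, sketched here on the assumption of a standard Linux /proc filesystem, is to read the symlinks under /proc/<pid>/ns directly. The helper name "ns_id" is made up for illustration. For a container process such as PID 52507, the call would generally need root.

```shell
#!/bin/sh
# Hypothetical helper: print the namespace identifier of a process,
# in the form "type:[inode]".
# $1 = PID, $2 = namespace type (net, mnt, pid, uts, ipc, user, cgroup).
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# Inspect the current shell; no root is needed for your own processes.
ns_id "$$" net

# For the container in this post (as root):
#   ns_id 52507 net
```

Two processes that print the same identifier for a given type are in the same namespace of that type.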

Using the previously discovered network namespace of the container, we can access it and debug it by running the "netstat" CLI we downloaded earlier.

$ ip netns exec c84ad4878642 netstat -lnpt

Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      52583/nginx: master

The process we went through helped us gain a better understanding of how Docker works internally. We were able to inspect the container's network namespace by injecting the "netstat -lnpt" process and using a CLI that wasn't present in the container's mount namespace.
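The same inspection can also be sketched with "nsenter" from util-linux, which joins the namespaces of a given PID. The helper below is hypothetical (the name "netns_exec" is not part of Docker or util-linux) and assumes root privileges and a running Docker daemon. It enters only the network namespace, so the command is still resolved from the host's filesystem.

```shell
#!/bin/sh
# Hypothetical helper: run a host binary inside the network namespace of a
# Docker container. $1 = container ID or name, remaining arguments = command.
# Must be run as root.
netns_exec() {
    container="$1"
    shift
    # Ask Docker for the PID of the container's main process.
    pid=$(docker inspect --format '{{.State.Pid}}' "$container") || return 1
    # Join only the network namespace of that PID; the mount namespace stays
    # the host's, so host tools such as netstat remain available.
    nsenter --target "$pid" --net "$@"
}

# Usage (as root):
#   netns_exec c84ad4878642 netstat -lnpt
```

This mirrors the manual steps above: find the PID, then start a host process in that PID's network namespace.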

Summary

Linux namespaces are a built-in feature of the Linux kernel that existed prior to the creation of Docker and containerd. As an example, let's look at the namespaces of a physical host that does not have Docker installed.

$ lsns
        NS TYPE   NPROCS   PID USER            COMMAND
4026531835 cgroup   1112     1 root            /sbin/init splash
4026531836 pid       652     1 root            /sbin/init splash
4026531837 user     1112     1 root            /sbin/init splash
4026531838 uts       705     1 root            /sbin/init splash
4026531839 ipc       639     1 root            /sbin/init splash
4026531840 mnt       634     1 root            /sbin/init splash
...


Containers are built from Linux namespaces, which means that a Docker container, like any Linux container, is essentially a process (or a group of processes) running inside its own set of namespaces. Docker builds on these Linux kernel features and adds a CLI, other interfaces, and standards that make them easier to use. With just the features of the Linux kernel and some additional libraries, we can create a complete container management system, as Mocker does.

Note

Two processes in the same network namespace can't bind to the same port on the same IP address.

We can assign multiple IPs to a Linux host. Processes can bind to the same port as long as each binds to a different IP on the host (e.g., 10.0.0.1:3306 and 10.0.0.2:3306).

Processes can share the same network namespace while using different mount namespaces (this makes things pretty flexible).

This architecture also applies to many solutions, such as Docker Compose (when services are configured to share a network namespace), Kubernetes pods with multiple containers, and sidecar models, in which containers share a network namespace but use different mount namespaces.

Containers are built on Linux namespaces (in most cases).

Docker mostly runs on Linux servers, but there is also an engine that can run Windows (x86-64) programs.

References