Create and Restore Container Checkpoints with CRIU, buildah, Podman and Docker

CRIU (Checkpoint/Restore In Userspace) is a utility that lets you checkpoint a running container or an individual application and save its state to disk. You can later use the saved data to restore the container, even after a reboot, at the exact point at which it was checkpointed. This makes operations such as container live migration, snapshots and remote debugging possible.

CRIU is integrated into major container engines such as Docker, Podman, LXC/LXD and OpenVZ, which use it to implement their checkpoint and restore functionality. It is also available in the package repositories of most Linux distributions.

Installing and Setting Up CRIU

As mentioned earlier, the major container engines already package criu as one of their dependencies. If you want to install criu separately, follow the official installation instructions.

Once criu is installed, you can check that it's working by running criu --version or criu --help:

[cloud_user@3965c7b6ce1c privreg]$ criu --version
Version: 3.15
GitID: 8004f94

[cloud_user@3965c7b6ce1c privreg]$ criu --help

Usage:
  criu dump|pre-dump -t PID [<options>]
  criu restore [<options>]
  criu check [--feature FEAT]
  criu page-server
  criu service [<options>]
  criu dedup
  criu lazy-pages -D DIR [<options>]

Commands:
  dump           checkpoint a process/tree identified by pid
  pre-dump       pre-dump task(s) minimizing their frozen time
  restore        restore a process/tree
  check          checks whether the kernel support is up-to-date
  page-server    launch page server
  service        launch service
  dedup          remove duplicates in memory dump
  cpuinfo dump   writes cpu information into image file
  cpuinfo check  validates cpu information read from image file
...
...
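
Beyond printing the version, the check subcommand listed in the help output above reports whether the running kernel supports what CRIU needs. The following is a minimal sketch that wraps it in a small helper. The function name criu_status and its messages are made up for illustration, and criu check generally needs to be run as root:

```shell
#!/bin/sh
# criu_status: hypothetical helper, not part of CRIU itself.
# Prints one line describing whether criu is installed and whether
# "criu check" succeeds on this kernel.
criu_status() {
    if ! command -v criu >/dev/null 2>&1; then
        echo "criu: not installed"
        return 1
    fi
    # "criu check" exits non-zero when required kernel features are
    # missing, or when it is run without sufficient privileges.
    if criu check >/dev/null 2>&1; then
        echo "criu: kernel check passed"
    else
        echo "criu: kernel check failed (try running as root)"
        return 2
    fi
}

# Report the status without aborting the shell on failure.
criu_status || true
```

Running the check once before the first checkpoint can save time, since a missing kernel feature otherwise only shows up as a failed dump.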

Create a checkpoint for a container

For this, let’s create a simple shell script that prints an incrementing integer every 30 seconds and save it as test.sh:

# create script test.sh
[root@3965c7b6ce1c criu]# cat ./test.sh
#!/bin/sh
i=0
while :; do
    echo $i
    i=$(expr $i + 1)
    sleep 30 
done


# check if file exists
[root@3965c7b6ce1c criu]# ll
total 4
-rw-rw-r--. 1 root root 48 Mar 31 10:02 test.sh


# assign execute perms
[root@3965c7b6ce1c criu]# chmod +x ./test.sh


# verify new permission
[root@3965c7b6ce1c criu]# ll
total 4
-rwxrwxr-x. 1 root root 48 Mar 31 10:02 test.sh

If we now run test.sh, we can see that it works as expected:

[root@3965c7b6ce1c criu]# ./test.sh 
0
1
2
3
4
^C

We can now use buildah to create a container image containing this simple shell script and podman to run it:

# Create a Dockerfile for building container image
[root@3965c7b6ce1c criu]# cat Dockerfile 
# Based on the busybox image
FROM busybox:latest
MAINTAINER Mohit Goyal noreply@mohitgoyal.co

WORKDIR /app
COPY test.sh .
RUN chmod +x test.sh

ENTRYPOINT ["/app/test.sh"]

# build container image with buildah
[root@3965c7b6ce1c criu]# buildah bud -f Dockerfile -t localhost/buildah-criu:2 .
STEP 1: FROM busybox:latest
STEP 2: MAINTAINER Mohit Goyal noreply@mohitgoyal.co
STEP 3: WORKDIR /app
STEP 4: COPY test.sh .
STEP 5: RUN chmod +x test.sh
STEP 6: ENTRYPOINT ["/app/test.sh"]
STEP 7: COMMIT localhost/buildah-criu:2
Getting image source signatures
Copying blob 2983725f2649 skipped: already exists  
Copying blob f77de914c46e done  
Copying config 3f06780153 done  
Writing manifest to image destination
Storing signatures
--> 3f06780153f
3f06780153fc1a10f3da35feb2ed3ed535dbbddb9f827f49c0e137279ee99033


# verify that image is listed
[root@3965c7b6ce1c criu]# buildah images
REPOSITORY                   TAG      IMAGE ID       CREATED          SIZE
localhost/buildah-criu       2        3f06780153fc   47 seconds ago   1.46 MB
localhost/buildah-criu       1        efba86e5b580   4 hours ago      1.46 MB
localhost/buildah-httpd      1        52427f76aec0   47 hours ago     637 MB
docker.io/library/registry   latest   eefcac9e3856   5 days ago       26.8 MB
docker.io/library/busybox    latest   a9d583973f65   3 weeks ago      1.45 MB


# use podman to run image and verify execution
[root@3965c7b6ce1c criu]# podman run -d --name criu-test localhost/buildah-criu:2
6af0c737bdd55f4598552cbc099a30e1b61aeb0aa882a816350015bd9e3844da

[root@3965c7b6ce1c criu]# podman ps -a
CONTAINER ID  IMAGE                              COMMAND               CREATED        STATUS                    PORTS                   NAMES
6af0c737bdd5  localhost/buildah-criu:2                                 7 seconds ago  Up 7 seconds ago                                  criu-test
a8db2a70b8d3  localhost/buildah-criu:1                                 4 hours ago    Exited (137) 4 hours ago                          crazy_galileo
ca522b3892f9  docker.io/library/registry:latest  /etc/docker/regis...  7 hours ago    Exited (2) 4 hours ago    0.0.0.0:5000->5000/tcp  myregistry

# after a few minutes
[root@3965c7b6ce1c criu]# podman logs criu-test 
0
1
2
3
4
5
6

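Note that saving a checkpoint requires running podman with root privileges. We can save the container's state with podman container checkpoint, using --export to write it to an archive. By default, the container is stopped once the checkpoint has been written: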

[root@3965c7b6ce1c criu]# podman container checkpoint criu-test --export /tmp/chkpt.tar.gz
6af0c737bdd55f4598552cbc099a30e1b61aeb0aa882a816350015bd9e3844da

[root@3965c7b6ce1c criu]# ls /tmp
chkpt.tar.gz                                                             systemd-private-f195ea96aa814e068832691f1c5ec48a-ModemManager.service-xscqcr
systemd-private-f195ea96aa814e068832691f1c5ec48a-chronyd.service-b78Jul  systemd-private-f195ea96aa814e068832691f1c5ec48a-rtkit-daemon.service-HdkICy
systemd-private-f195ea96aa814e068832691f1c5ec48a-colord.service-57JMnM   tracker-extract-files.1001
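
To see what the exported archive holds without unpacking it, its entries can be listed with tar. The helper below is only a sketch: the name list_checkpoint is hypothetical, and it assumes GNU tar or bsdtar, which detect the compression format when reading (newer Podman releases may not use gzip even when the file name ends in .gz). The exact file names inside the archive depend on the Podman and CRIU versions, so none are assumed here:

```shell
#!/bin/sh
# list_checkpoint: hypothetical helper that prints the entries of a
# checkpoint archive without extracting it.
# Assumes GNU tar or bsdtar, which detect the compression format on read.
list_checkpoint() {
    archive=$1
    if [ ! -f "$archive" ]; then
        echo "no such archive: $archive" >&2
        return 1
    fi
    tar -tf "$archive"
}

# Only list the archive from this post if it actually exists on this host.
if [ -f /tmp/chkpt.tar.gz ]; then
    list_checkpoint /tmp/chkpt.tar.gz
fi
```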

Restore the container from checkpoint

Now that we have saved the container, let’s reboot the host OS and wait a few minutes. For the purposes of this blog post, we’ll restore on the same host. However, you can also take the tar file written by podman container checkpoint --export, transfer it to a different host and restore it there.
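
Moving a checkpoint to another host could look roughly like the sketch below. It is not taken from the steps in this post: the helper name migrate_checkpoint and the host root@host-b.example are placeholders, and it assumes the destination has a compatible Podman and CRIU installed and can obtain the same container image. By default the script only prints the commands; set DRY_RUN=0 to run them:

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# migrate_checkpoint: hypothetical helper. Copies an exported checkpoint
# archive to another host and restores it there with podman.
#   $1 = archive path, $2 = ssh destination (placeholder below)
migrate_checkpoint() {
    archive=$1
    dest=$2
    run scp "$archive" "$dest:$archive"
    run ssh "$dest" podman container restore --import "$archive"
}

migrate_checkpoint /tmp/chkpt.tar.gz root@host-b.example
```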

To restore a container from a checkpoint, we can use the podman container restore command. For our purposes, we’ll create three separate containers from the same checkpoint:

[root@3965c7b6ce1c criu]# podman container restore --import /tmp/chkpt.tar.gz --name criu-restored-1
de385b63c7f11dc80744e8697711496f6cd2e2e42ede57e073b9fc71b57c8406
[root@3965c7b6ce1c criu]# podman container restore --import /tmp/chkpt.tar.gz --name criu-restored-2
f26825f90511222076cee6f1a1dde8ba9739781b64e6fba7e9f2bbb813e4753b
[root@3965c7b6ce1c criu]# podman container restore --import /tmp/chkpt.tar.gz --name criu-restored-3
4273f3c4c9bb6420cf47fac82f1e2d99b5182e6588f03725665060d012a8a226

[root@3965c7b6ce1c criu]# podman ps -a
CONTAINER ID  IMAGE                              COMMAND               CREATED             STATUS                     PORTS                   NAMES
4273f3c4c9bb  localhost/buildah-criu:2                                 About a minute ago  Up About a minute ago                              criu-restored-3
f26825f90511  localhost/buildah-criu:2                                 About a minute ago  Up About a minute ago                              criu-restored-2
de385b63c7f1  localhost/buildah-criu:2                                 About a minute ago  Up About a minute ago                              criu-restored-1
6af0c737bdd5  localhost/buildah-criu:2                                 20 minutes ago      Exited (0) 17 minutes ago                          criu-test
a8db2a70b8d3  localhost/buildah-criu:1                                 5 hours ago         Exited (137) 4 hours ago                           crazy_galileo
ca522b3892f9  docker.io/library/registry:latest  /etc/docker/regis...  8 hours ago         Exited (2) 5 hours ago     0.0.0.0:5000->5000/tcp  myregistry
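
The three restore commands above differ only in the container name, so they can be expressed as a loop. This is a sketch under assumptions: restore_copies is a made-up helper name, and the script defaults to printing the commands (set DRY_RUN=0 to execute them as root):

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# restore_copies: hypothetical helper that restores COUNT containers
# from one archive, naming them PREFIX-1 .. PREFIX-COUNT.
#   $1 = archive path, $2 = count, $3 = name prefix (optional)
restore_copies() {
    archive=$1
    count=$2
    prefix=${3:-criu-restored}
    n=1
    while [ "$n" -le "$count" ]; do
        run podman container restore --import "$archive" --name "$prefix-$n"
        n=$((n + 1))
    done
}

restore_copies /tmp/chkpt.tar.gz 3
```

Each restored container needs its own name, because the name stored in the checkpoint (criu-test) is still taken by the original container.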

If we now check the logs of each container, we can see that they all resumed from the counter value saved in the checkpoint:

[root@3965c7b6ce1c criu]# podman logs criu-restored-1
0
1
2
3
4
5
6
7
8
[root@3965c7b6ce1c criu]# podman logs criu-restored-2
0
1
2
3
4
5
6
7
8
[root@3965c7b6ce1c criu]# podman logs criu-restored-3
0
1
2
3
4
5
6
7
8

Since criu sits behind the major container engines, we can perform checkpoint and restore with other engines as well, such as Docker.

Working with Docker Engine

Docker requires experimental features to be enabled before you can use this functionality:

C:\>docker checkpoint --help
docker checkpoint is only supported on a Docker daemon with experimental features enabled

We can do so by setting "experimental": true in the dockerd configuration file and restarting the daemon. If you are using Docker Desktop, enable this functionality in the settings.
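
As a rough sketch of that configuration step on a Linux host, the helper below writes a configuration file containing only the "experimental" key. The function name enable_experimental is hypothetical, and the path /etc/docker/daemon.json in the comments is the usual default location, which may differ on your system. The helper deliberately refuses to overwrite an existing file, since that file may carry other settings:

```shell
#!/bin/sh
# enable_experimental: hypothetical helper. Creates a daemon configuration
# file containing only the "experimental" key. It refuses to touch an
# existing file, because that file may hold other settings that should
# be merged by hand.
enable_experimental() {
    conf=$1
    if [ -e "$conf" ]; then
        echo "$conf already exists; add \"experimental\": true to it manually" >&2
        return 1
    fi
    printf '{\n  "experimental": true\n}\n' > "$conf"
}

# Demonstrate against a scratch file rather than the real configuration.
demo=$(mktemp -d)/daemon.json
enable_experimental "$demo" && cat "$demo"

# On a typical Linux host the real file is /etc/docker/daemon.json, and
# the daemon is restarted afterwards, for example:
#   enable_experimental /etc/docker/daemon.json
#   systemctl restart docker
```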

To create a checkpoint, we can use the docker checkpoint command:

C:\>docker checkpoint --help

Usage:  docker checkpoint COMMAND

Manage checkpoints

Commands:
  create      Create a checkpoint from a running container
  ls          List checkpoints for a container
  rm          Remove a checkpoint

Run 'docker checkpoint COMMAND --help' for more information on a command.

To restore from a checkpoint, we need to use the docker start --checkpoint command:

C:\>docker start --checkpoint --help
"docker start" requires at least 1 argument.
See 'docker start --help'.

Usage:  docker start [OPTIONS] CONTAINER [CONTAINER...]

Start one or more stopped containers
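
Put together, a checkpoint and restore cycle with Docker might look like the sketch below. The container name criu-test and the checkpoint name chkpt1 are assumptions for illustration, as is the helper docker_checkpoint_cycle. The script prints the commands by default; set DRY_RUN=0 to run them against a daemon with experimental features enabled:

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command;
# set DRY_RUN=0 to actually execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# docker_checkpoint_cycle: hypothetical helper.
#   $1 = container name, $2 = checkpoint name
docker_checkpoint_cycle() {
    container=$1
    checkpoint=$2
    # Checkpointing stops the container unless --leave-running is given.
    run docker checkpoint create "$container" "$checkpoint"
    # Confirm that the checkpoint was recorded for this container.
    run docker checkpoint ls "$container"
    # Resume the stopped container from the saved state.
    run docker start --checkpoint "$checkpoint" "$container"
}

docker_checkpoint_cycle criu-test chkpt1
```

Unlike the Podman example, the checkpoint here is referenced by name and restored into the same container, rather than imported from an archive into a new one.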

The other steps, creating the image and the container, are the same, as they do not depend on the container engine.
