I do not want to discourage anyone reading the post (as it is interesting), but going further basic understanding of containers and Docker technology is required. This post will not explain what containers are. It focuses on one aspect of the Docker containers - persistent storage. Contrary to popular believe, containers can have and may need persistent storage. Docker volumes and volume plugins are the technologies for it.
Docker volumes are used to persist data to the container's writable layer. In this case the file system of the docker host. Volume plugins extend the capabilities of Docker volumes across different hosts and across different environments: for example instead of writing container data to the host's filesystem, the container will write data to an AWS EBS volume, or Azure Blob storage or a vSphere VMFS.
Let's take a step down from abstract world. We have a dockerized application: a voting application. It uses a PostgreSQL database to keep the results of the votes. The PostgreSQL DB needs a place to keep its data. We want that place to be outside the Docker host storage and since we are running in a vSphere environment, we'll use vSphere Storage for Docker. Putting it all in a picture would look like this (for simplicity, only PostgreSQL container is represented):
We'll start with the Docker host (the VM running on top of vSphere). Docker Engine is installed on the VM and it runs containers and creates volumes. The DB runs in the container and needs some storage. Let's take a 2 step approach:
First, create the volume. Docker Engine using vSphere Storage for Docker plugin (vDVS Plugin and vDVS vib) creates a virtual disks (vmdk's) on the ESXi host's datastore and maps it back to the Docker volume. Now we have a permanent storage space that we can use.
Second step: the same Docker engine presents the volume to the container and mounts it as a file system mount point in the container.
This makes it possible for the DB running inside the container to write in the vmdk from the vSphere datastore (of course, without knowing it does so). Pretty cool.
The vmdk is automatically attached to the Docker host (the VM). More, when the vmdk is created from Docker command line, it can be given attributes that apply to any vmdk. This means it can be created as:
- independent persistent or dependent (very important since this affects the ability to snapshot the vmdk or not)
- thick (eager or lazy zeroed) or thin
- read only
It can also be assigned a VSAN policy. The vmdk will persist data for the container and across container lifetime. The container can be destroyed, but the vmdk will keep existing on the datastore.
Let's recap: we are using a Docker volume plugin to present vSphere datastore storage space to an application running within a Docker container. Or shorter, the PostgreSQL DB running within the container writes data to vmdk.
Going back to the question - can we backup the container? Since the container itself is actually the runtime instance of Docker image (a template), it does not contain any persistent data. The only data that we need is actually written in vmdk. In this case, the answer is yes. We can back it up in the same way we can backup any vSphere VM. We will actually backup the vmdk attached to the docker host itself.
Probably the hardest question to answer when talking about containers is what data to protect as the container itself is just a runtime instance. By design, containers are ephemeral and immutable. The writable space of the container can be either directly in memory (tmpfs) or on a docker volume. If we need data to persist across container lifecycles, we need to use volumes. The volumes can be implemented by a multitude of storage technologies and this complicates the backup process. Container images represent the template from which containers are launched. They are also persistent data that could be source for backup.
Steps for installing the vSphere Docker Volume Service and testing volumes
- Docker host already exists and it has access to Docker Hub
- download the VIB file (got mine from here)
- logon to ESXi host, transfer and install the vib
- restart hostd and check the module has been loaded
- logon to the Dokcer host and install the plugin
- create a volume and inspect it - by default it will create the vmdk as independent persistent which will not allow snapshots to be taken - add option (-o attach-as=persistent) for dependent vmdks
- go to vSphere client to the datastore where the Docker host is configured and check for a new folder dockvols and for the VMDK of the volume created earlier
- since the volumes are not used by any container, they are not attached to the Docker host VM. Create a container and attach it the dependent volume
Lastly, create a backup job with the Docker host as source, exclude other disks and run it.