
Using Altair Grid Engine with Docker

It’s easier than you probably think

For HPC applications, containers are a great way to install software and ensure portability across clusters and clouds. Containers can encapsulate complete, pre-tested environments allowing users to mix and match different applications and versions without conflict. Software providers such as ESI OpenFoam, UberCloud, and others are increasingly packaging software in containers for ease of deployment.

Fortunately, native support for Docker in Altair Grid Engine makes running containerized applications a breeze. In this article, I’ll explain how to deploy, run, and manage containerized workloads on your Altair Grid Engine cluster and provide some insights into how Altair Grid Engine manages containerized workloads behind the scenes.

Dockerizing your cluster

If you don’t already have Docker installed on your compute hosts, this is a good place to start. Adding Docker shouldn’t break existing applications but testing things first on a non-production host is always a good idea. Adding Docker is like adding a Java runtime. The Docker Engine provides runtime support for containerized workloads that need it.

As a word of caution, don’t assume you can necessarily install the latest version of Docker on your cluster hosts. Docker APIs change like the weather, so you’ll want to download a stable Docker version supported with your version of Altair Grid Engine.

In this example, I’m running Altair Grid Engine v8.5.4 on CentOS 7 on Amazon Web Services (AWS). I used the AWS marketplace as an easy way to install a Grid Engine cluster. Consulting the Altair Grid Engine release notes, Docker version 17.03 is the latest supported Docker for Altair Grid Engine 8.5.4, so I’ll be using the free Docker Community Edition package (docker-ce-17.03.0.ce-1.el7.centos.x86_64) on my cluster compute hosts.

Once you have the Docker repositories configured (I’ll cover this shortly), you can use the yum list command to show the available Docker versions. The second column of the output is the version string for each Docker release; make a note of the one you need.

To show the available Docker 17.03 packages, I used this command:

$ sudo yum list docker-ce --showduplicates | sort -r | grep 17.03
docker-ce.x86_64            17.03.3.ce-1.el7                   docker-ce-stable
docker-ce.x86_64            17.03.2.ce-1.el7.centos            docker-ce-stable
docker-ce.x86_64            17.03.1.ce-1.el7.centos            docker-ce-stable
docker-ce.x86_64            17.03.0.ce-1.el7.centos            docker-ce-stable
docker-ce.x86_64            17.03.0.ce-1.el7.centos            @docker-ce-stable

Source: https://gist.githubusercontent.com/GJSissons/9066002066a43fbd659738fa2af9560b 

Depending on your OS, you might need to use different commands. The Docker CE documentation has detailed instructions for other Linux versions, including Debian, Fedora, and Ubuntu.
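
For example, on Ubuntu the rough equivalent uses apt instead of yum. The sketch below follows Docker’s documented repository setup; the version string shown is illustrative, so list what your repository actually offers with apt-cache madison first:

$ sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt-get update
$ apt-cache madison docker-ce      # analogous to the yum list command above
$ sudo apt-get install -y docker-ce=17.03.0~ce-0~ubuntu-xenial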

Since I need to install Docker on multiple hosts, it makes sense to build a script that installs Docker rather than typing the same commands on each host. The following script runs as root and (for me at least) properly installs Docker on my CentOS 7 Altair Grid Engine compute hosts.

#!/bin/bash
#
# Install docker on Grid Engine compute host
#
yum install -y yum-utils \
  device-mapper-persistent-data \
  lvm2
yum-config-manager \
  --add-repo \
  https://download.docker.com/linux/centos/docker-ce.repo
yum install -y --setopt=obsoletes=0 \
  docker-ce-17.03.0.ce-1.el7.centos \
  docker-ce-selinux-17.03.0.ce-1.el7.centos

Source: https://gist.githubusercontent.com/GJSissons/7228230edefd13b49e283fd9e4ddd4d3

The script performs these steps:

  • It installs required packages, including yum-utils, which contains the yum-config-manager utility used in the second command.
  • It adds the stable Docker repository to the yum environment so that yum can find the needed Docker packages. You’ll need to do this before the yum list command shown above will work.
  • The final command installs the required version of Docker (17.03). I got the long, nasty package name, complete with version string, from the yum list command.

The install command probably needs some explanation. Knowledgeable readers (the only kind who read this blog) probably expected to see something like “yum install docker-ce-17.03.0.ce-1.el7.centos”. This was my first guess too.
Just to prove that nothing is ever easy, I learned that installing older versions of Docker CE can be a little glitchy. A new “obsoletes” restriction was introduced in docker-ce 17.06.0, and for whatever reason, the yum repo applies the restriction to all versions of Docker. To avoid an error message (Package docker-ce-selinux is obsoleted by docker-ce ….) that prevented Docker from installing, I needed to set obsoletes to false on the yum command line and install docker-ce and docker-ce-selinux together. The issue is explained in detail here.

You’ll need to watch for this detail. It’s always the little things that cause the biggest headaches!
Once Docker is installed, you can start it and verify that it is working by running a few Docker commands and launching the hello-world container from Docker Hub. It’s a good idea to use systemctl to enable Docker so that it starts automatically when the node boots. You’ll probably want to add these commands to your own installation script.

$ sudo systemctl start docker
$ sudo systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
d1725b59e92d: Pull complete
Digest: sha256:0add3ace90ecb4adbf7777e9aacf18357296e799f81cabc9fde470971e499788
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

Source: https://gist.github.com/GJSissons/ac3365cfb890d38e19ba32b1ee9d65fe 

One more detail – to allow regular users to run Docker commands, you’ll want to add each of the users on your cluster to the docker group. The command below adds the user bill to the docker group.

$ sudo usermod -aG docker bill

Source: https://gist.githubusercontent.com/GJSissons/e63b593274e32003f60925cff30fe5a9 

Configuring Grid Engine to use Docker

Now we’ve reached the cool part. If you’ve installed Docker correctly, you don’t need to do anything else. Grid Engine should already know about Docker and any Docker images installed on each host.

The command below executed from a Grid Engine node illustrates this:

$ qhost -F docker,docker_images
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
ip-172-31-53-169        lx-amd64        2    1    1    2  0.01    7.6G  304.7M     0.0     0.0
    Host Resource(s):      hl:docker=1.000000
   hl:docker_images=hello-world:latest
ip-172-31-55-62         lx-amd64        2    1    1    2  0.01    7.6G  350.5M     0.0     0.0
    Host Resource(s):      hl:docker=1.000000
   hl:docker_images=centos:latest,hello-world:latest

Source: https://gist.githubusercontent.com/GJSissons/8e78ff93184926271590725803047285 

For people who don’t administer Grid Engine for a living: qhost shows the compute hosts in our cluster. I have a master host and two compute hosts, and you can see the AWS host names. The -F switch shows the value of specific resources on each host.

Altair Grid Engine v8.4.0 added two default resources to help manage Docker workloads (a quick way to verify them follows the list):

  • docker: a boolean resource that is auto-detected and has a value of 0 or 1 depending on whether Docker is installed on a host
  • docker_images: a comma-separated list (of type RESTRING) that lists the docker images available on the host
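
To confirm that these resources are defined on your cluster, you can list the complex (resource) configuration with qconf; you should see docker defined as a BOOL and docker_images as a RESTRING:

$ qconf -sc | grep docker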

Assuming your Altair Grid Engine environment is recognizing that Docker is installed on each host and seeing available images, you’re done! You’ve Dockerized your cluster and can start submitting and managing containerized applications.

Running and managing containerized jobs

Altair Grid Engine makes it easy to run jobs inside or outside of containers. To illustrate how this works, I’ve created a simple script called testjob.sh. The script does a few simple things like determining whether it’s running in a container and reporting its hostname and IP address. I added a sleep command because I wanted the script to run long enough that I could run Docker commands against the running container. In case readers are wondering, checking for the presence of the hidden file /.dockerenv is a useful trick to tell whether your script is running in a container.

#!/bin/sh
HOSTNAME=`hostname`
IP=`hostname -i`
if [ -f /.dockerenv ]; then
    echo "I'm living in a container!"
else
    echo "I'm living in the real world!";
fi
echo "My hostname is $HOSTNAME."
echo "My IP address is $IP."
echo "I'm going to sleep now. bye."
sleep 100

Source: https://gist.githubusercontent.com/GJSissons/c6d8b45efd64422f90bb8a0a18b69533 

I submit this script to Grid Engine as a normal, non-container job:

$ qsub testjob.sh
Your job 31 ("testjob.sh") has been submitted
$ qstat
job-ID     prior   name       user         state submit/start at     queue                          
-----------------------------------------------------------------------------------------
        31 0.55500 testjob.sh bill         r     11/14/2018 17:04:28 all.q@ip-172-31-55-... 

Source: https://gist.githubusercontent.com/GJSissons/9c6400d3d102dafd095d959ca1537707 

The script is assigned a job-ID (31) and gets dispatched to one of the compute hosts. The job output is logged in the user’s home directory, where we can see the output of the script. As expected, the job runs in the real world (as opposed to in a Docker container) on one of our AWS machine instances.

$ cat testjob.sh.o31
I'm living in the real world!
My hostname is ip-172-31-55-62.ec2.internal.
My IP address is 172.31.55.62.
I'm going to sleep now. bye.

Source: https://gist.githubusercontent.com/GJSissons/bf46ac3ca6a9537501f7d862e1606afb 

To run the job in a container, the process is almost identical. I just need to tell Altair Grid Engine that we want to use a Docker container and specify the Docker image to use. To do this, I use the -l switch (lowercase L) on the command line to request two resources: docker and docker_images. This will select hosts with the docker resource set to true and hosts where the list of available images contains our desired Docker image (centos:latest). We use wildcards to match the image name against the longer comma-delimited list of images available on each host. If the image is not available on a host, Altair Grid Engine can pull the image for you automatically, but for performance reasons, it is preferable to run on a host that already has the image stored locally.

From the Grid Engine user’s perspective, everything works the same way. Users can delete or manipulate container jobs like any other job. The containerized job shows up as Altair Grid Engine job 32 and runs in a container on one of our AWS hosts.

[root@ip-172-31-53-71 ~]# qsub -l docker,docker_images="*centos:latest*" testjob.sh
Your job 32 ("testjob.sh") has been submitted
[root@ip-172-31-53-71 ~]# qstat
job-ID  prior   name       user    state submit/start at     queue                   
--------------------------------------------------------------------------------------------
     32 0.55500 testjob.sh root    r     11/14/2018 17:15:52 all.q@ip-172-31-55-...

Source: https://gist.githubusercontent.com/GJSissons/cdbf994467507145cbebebb3afe05cd8 

If I monitor Docker on the execution host, I see that a Docker container has been started from the centos image. As a Grid Engine user, this is handled transparently for me, but it’s nice to know what’s going on.

$ docker ps
CONTAINER ID  IMAGE         COMMAND                CREATED         STATUS          NAMES
4539e0b94529  centos:latest "/uge_mnt/opt/uge-..." 11 seconds ago  Up 10 seconds  UGE_job_32.1

Source: https://gist.githubusercontent.com/GJSissons/477d103f97e52b5749796b9f86c86f79 

After the job completes, I see from the job’s output file that the job ran in the container shown in the docker ps output (4539e0b94529).

$ cat testjob.sh.o32
I'm living in a container!
My hostname is 4539e0b94529.
My IP address is 172.17.0.2.
I'm going to sleep now. bye.

Source: https://gist.githubusercontent.com/GJSissons/470690f77c4c0e3b8094b06e2b6eb81a 

Using soft resource requests to pull Docker images on the fly

In the example above, I knew that one of the compute hosts already had the required Docker image (centos:latest). Often, a needed image won’t be present on any cluster host. Altair Grid Engine can automatically download required images, but to do this, we need to use a soft resource request. A soft resource request tells Altair Grid Engine that the image is “nice to have” but not necessary to schedule the job on a host. In the example below, we specify a different Docker image (ubuntu:14.04) that we know is not available on either cluster host, and make its presence a soft request instead of a hard request.

$ qsub -l docker -soft -l docker_images="*ubuntu:14.04*" testjob.sh

Source: https://gist.githubusercontent.com/GJSissons/31f05d81730fd5f15722131cb20c918d 

Altair Grid Engine attempts to find a host with the needed ubuntu image, but when none is available, it schedules the job to a host fulfilling the hard resource requirement (docker) and automatically triggers the Docker daemon to download the needed image and start the container. Re-running the qhost command shows that our first compute host now has the needed image, and the job runs as before.

$ qhost -F docker,docker_images
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
ip-172-31-53-169        lx-amd64        2    1    1    2  0.01    7.6G  304.7M     0.0     0.0
    Host Resource(s):      hl:docker=1.000000
   hl:docker_images=hello-world:latest,ubuntu:14.04
ip-172-31-55-62         lx-amd64        2    1    1    2  0.01    7.6G  350.5M     0.0     0.0
    Host Resource(s):      hl:docker=1.000000
   hl:docker_images=centos:latest,hello-world:latest

Source: https://gist.githubusercontent.com/GJSissons/907d0e6b4f875265d1ab9cea7c73991e 

This is an important feature because it means that users can guarantee that their containerized jobs will run even when a required Docker image is not available on compute hosts.

Understanding what’s happening behind the scenes

To accomplish all this, Grid Engine did some clever things behind the scenes. First, because this was not a binary job, Grid Engine had to transfer the script from the submission host to the execution host. From there, the script was copied into a spool directory.

For the container to be able to see the script, the spool directory needs to be bound (a Docker term) into the container. The files in $SGE_ROOT are also bound into the container, and Altair Grid Engine automatically detects any other directories that may be required and binds them under the subdirectory /uge_mnt inside the container. Other directories are bound as well, including the user’s home directory (so that job output can be written where the user expects it) and any directories passed via the -o or -e switches on the qsub command line.

The docker inspect command gives us visibility into what happens when the job runs. I wanted to see details about the job, so I saved the output of docker inspect to a file while the job’s container was running, as shown:

$ docker inspect 4539e0b94529 > /tmp/inspect_output

Source: https://gist.githubusercontent.com/GJSissons/d50cf9ff6d30ebb71c37ca25492509f4 

There is too much detail in the docker inspect output to show it in full, so I’ve abbreviated it below to a few items of interest.

First, note that when the Docker job runs, the entry point is the sge_container_shepherd program that essentially “shepherds” the job along as it runs inside the container. This is one of the reasons the Grid Engine binaries need to be available inside the container, bound under /uge_mnt. Other bindings are shown as well, mapping /var, /opt, and /home/bill (our job ran as bill) so that these are accessible from within the container.

 {
        "Id": "4539e0b94529cf60d9178a545ca81506ca4da5e3bd0adc055029bd07c9609161",
        "Created": "2018-11-14T16:48:02.097747583Z",
        "Path": "/uge_mnt/opt/uge-8.5.4/bin/lx-amd64/sge_container_shepherd",
        "Args": [
            "-cm"
        ],
..
        "Name": "/UGE_job_32.1",
..
        "HostConfig": {
            "Binds": [
                "/var:/uge_mnt/var",
                "/opt:/uge_mnt/opt",
                "/root:/uge_mnt/home/bill"
            ],
..
      "Config": {
            "Hostname": "4539e0b94529",
            "Image": "centos:latest",
            "Volumes": null,
            "WorkingDir": "/uge_mnt/var/sge/default/unicloud/spool/ip-172-31-55-62/active_jobs/32.1",
            "Entrypoint": [
                "/uge_mnt/opt/uge-8.5.4/bin/lx-amd64/sge_container_shepherd"
            ],
	..
            "Labels": {
                "com.univa.gridengine.cell": "default",
                "com.univa.gridengine.job_number": "32",
                "com.univa.gridengine.root": "/opt/uge-8.5.4",
	..
            }
        },
	..
   }

Source: https://gist.githubusercontent.com/GJSissons/ce3bbb543d105c44ca97b4023f40c7f3 

The working directory is set to the spool directory for the job on the host and other information of relevance to the Grid Engine job is stored in Docker labels.
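
For example, if you want to map a running container back to its Grid Engine job, you can read one of these labels directly with an inspect format string (the label name is taken from the output above):

$ docker inspect -f '{{ index .Config.Labels "com.univa.gridengine.job_number" }}' 4539e0b94529
32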

From a Linux administrator’s perspective, understanding the process tree on the compute host is also instructive. The output of pstree (or ps auxf) is too verbose to show fully, but a stripped-down version of the process hierarchy is shown below.

Normally, when a Grid Engine user submits a non-containerized job, the process hierarchy on the execution host looks something like this:

sge       /opt/uge-8.5.4/bin/lx-amd64/sge_execd
sge        \_ sge_shepherd-35 -bg
bill        |    \_ /bin/sh /var/sge/default/unicloud/spool/ip-172-31-55-62/job_scripts/35
bill        |       \_ sleep 100

Source: https://gist.githubusercontent.com/GJSissons/3b540994648f4cbcee0f4114b70c5cbc

The Grid Engine jobs are children of the sge_execd process on the execution host, and execution is managed by an sge_shepherd process. The actual workloads run under the user ID of the user who submitted the job.

When the same job is run as a container job, the processes that comprise the job are children of the Docker daemon. In this view, we see that the sge_container_shepherd process running inside the container is the parent of the actual job.

root /usr/bin/dockerd
root \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock (abbreviated)
root     \_ docker-containerd-shim ad./var/run/docker/libcontainerd/ad683e376769691740b29e2
bill         \_ /uge_mnt/opt/uge-8.5.4/bin/lx-amd64/sge_container_shepherd -cm
bill             \_ /bin/sh /uge_mnt/var/sge/default/unicloud/spool/ip-172-31-55-62/job_scripts/33
bill                  \_ sleep 100

Source: https://gist.githubusercontent.com/GJSissons/908425f44c15d5a6415d383e364bdba7 

Other useful tips and cool stuff

Often an Altair Grid Engine job will want to manipulate data in a specific directory, for example an NFS share accessible from all compute hosts. Directories can be bound manually into the container in Docker’s HOST-DIR:CONTAINER-DIR format, using the -xd switch to pass docker options.

On the compute node, a directory called /nfs_share might contain shared data. In this case, we can bind the directory /data in the Docker container to the shared /nfs_share file system visible to the Docker host. The path passed on the Altair Grid Engine command line needs to refer to the path visible within the container.
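
What follows is a sketch of such a submission, reusing the centos image and testjob.sh script from the earlier examples (the bind specification uses Docker’s -v HOST-DIR:CONTAINER-DIR syntax, passed through via -xd):

$ qsub -l docker,docker_images="*centos:latest*" \
   -xd "-v /nfs_share:/data" testjob.sh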

The examples so far have dealt with scripts as opposed to binaries. Binaries are easier in some ways because the command invoked is assumed to already reside within the container. When starting a binary in a Docker container, it’s a good idea to specify the shell that should be used to launch it; otherwise the shell may default to csh, which is often not present in Docker containers. An example is shown below:

$ qsub -l docker,docker_images="*ubuntu:14.04*" -b y \
 -S /bin/sh hostname

Source: https://gist.githubusercontent.com/GJSissons/e1a654ce78dafa3dbacdcd2eaae73b99  

There are many more features in the Altair Grid Engine Docker integration, including support for array jobs, MPI parallel jobs, and access to GPU devices (a one-line array job example appears below). Grid Engine can also be used to launch and manage containers that package long-running services, where the entry point is built into the container image. We’ll cover some of these other topics in a follow-on article.
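
As a quick teaser, a containerized array job is submitted with the same -t switch as any other Grid Engine array job (a sketch reusing the earlier image; each task is scheduled, and containerized, independently):

$ qsub -t 1-10 -l docker,docker_images="*centos:latest*" testjob.sh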

At Altair we’ve been amassing a lot of experience running containers in production on Grid Engine clusters. If you have any comments or questions about this article, I’d love to get your feedback.