Part 2: Building Container Images for Data Science

Liming Tsai
Sep 21, 2020 · 4 min read

In the previous post, we talked about the benefits of container images and why they provide a consistent runtime environment for your AI workload. In part 2 of this series, we will explore tools such as buildah and podman to build OCI compatible images.

The source code used in this post is available on my GitHub repository.

Dockerless?

Docker Inc.'s contribution of Docker resulted in the explosion of container usage. The Docker tooling makes it very easy for us to build and deploy containers in any environment, such as on your laptop, within your data center, or in the cloud. Today, you can even run containers on your Synology NAS.

However, are there alternatives for running containers if you choose not to use Docker? Or if you don't want to run Docker as root and as a daemon?

There wasn't any open standard for containers until 2015, when the Open Container Initiative (OCI) was established.

The Open Container Initiative (OCI) is a lightweight, open governance structure (project), formed under the auspices of the Linux Foundation, for the express purpose of creating open industry standards around container formats and runtime. The OCI was launched on June 22nd 2015 by Docker, CoreOS and other leaders in the container industry

As such, today we have alternative tools such as buildah and podman.

Building the Base Image

I'm going to use Red Hat's UBI image for Python, ubi8/python-38, to build the base PyTorch image using requirements.txt.

Red Hat Universal Base Images (UBI) are OCI-compliant container base operating system images with complementary runtime languages and packages that are freely redistributable. Like previous base images, they are built from portions of Red Hat Enterprise Linux. UBI images can be obtained from the Red Hat container catalog, and be built and deployed anywhere.

With Red Hat Universal Base Image (UBI), you can now take advantage of the greater reliability, security, and performance of official Red Hat container images.

Using Source-to-Image (s2i), I'm able to quickly build the base image with the following structure:

$ find pytorch/
pytorch/
pytorch/.s2i
pytorch/.s2i/bin
pytorch/.s2i/bin/assemble
pytorch/requirements.txt

A custom assemble script is used to remove requirements.txt from the final image, as it is not necessary to keep it there.
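
Here is a minimal sketch of what such an assemble override can look like, assuming the standard s2i script location used by the ubi8/python-38 builder (the actual script in the repository may differ):

#!/bin/bash
# Run the builder image's original assemble script, which installs
# the packages listed in requirements.txt via pip.
/usr/libexec/s2i/assemble
rc=$?
# Drop requirements.txt from the application directory so it does not
# end up in the final image.
rm -f /opt/app-root/src/requirements.txt
exit $rc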

The requirements.txt file pins PyTorch to the stable 1.6.0 release:

future==0.18.2
numpy==1.19.2
Pillow==7.2.0
torch==1.6.0
torchvision==0.7.0

However, s2i uses Docker by default, so I will ask s2i to generate a Dockerfile for buildah to consume instead.

$ s2i build pytorch registry.access.redhat.com/ubi8/python-38:1-34.1599745032 --as-dockerfile=/tmp/pytorch/Dockerfile
$ cd /tmp/pytorch/ && buildah bud -f . -t pytorch:1.6.0
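
To my understanding, the --as-dockerfile flag makes s2i write out the generated Dockerfile together with the application sources and s2i scripts it would normally stream to Docker, so the build context typically ends up looking something like this (exact contents may vary):

$ find /tmp/pytorch -maxdepth 2
/tmp/pytorch
/tmp/pytorch/Dockerfile
/tmp/pytorch/upload
/tmp/pytorch/upload/scripts
/tmp/pytorch/upload/src

buildah bud then builds from this directory just as it would from any hand-written Dockerfile.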

During the build, the s2i assemble script invokes pip to install the Python dependencies, and buildah builds the image and commits it locally.

$ podman run -it --rm pytorch:1.6.0 pip freeze
future==0.18.2
numpy==1.19.2
Pillow==7.2.0
torch==1.6.0
torchvision==0.7.0

The image now consists of the following:

  • UBI8
  • Python 3.8
  • PyTorch 1.6.0 and dependencies

This can then serve as the base image that you can further customize.
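
For example, buildah can script further customization directly on top of this image. The following is a hypothetical sketch; the added package and container name are illustrative and not from the repository:

# Start a working container from the locally built base image
$ buildah from --name custom-pytorch pytorch:1.6.0
# Install an extra Python package inside the working container
$ buildah run custom-pytorch -- pip install pandas==1.1.2
# Commit the result as a new image
$ buildah commit custom-pytorch pytorch-pandas:1.6.0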

Running the Training Code

Using podman, we can run the PyTorch image against our training code.

We need to use the SELinux flag :Z to tell podman to relabel the volume; otherwise we will get a permission denied error.

$ podman run -it --rm -v /tmp/model:/tmp/model:Z -v ./mnist/train.py:/tmp/train.py:Z pytorch:1.6.0 python /tmp/train.py --save-model --model-path /tmp/model
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ../data/MNIST/raw/train-images-idx3-ubyte.gz
<snip>
Train Epoch: 14 [59520/60000 (99%)] Loss: 0.015479
Test set: Average loss: 0.0396, Accuracy: 9870/10000 (99%)
Model saved to /tmp/model/mnist_cnn.pt

The trained model will be saved to /tmp/model on the host machine.
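
You can confirm the artifact on the host; based on the training output above, the saved file is mnist_cnn.pt:

$ ls /tmp/model
mnist_cnn.pt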

Building a Training Image

A training image can be built using the same s2i structure. I can further define the Python dependencies required by my training script in requirements.txt.

mlflow==1.11.0
mlflow[extras]==1.11.0
boto3==1.15.1
botocore==1.18.1
s3transfer==0.3.3

My training script app.sh:

#!/bin/bash
/opt/app-root/src/train.py --save-model --model-path /tmp/model
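
Following the same layout as the pytorch builder directory, my mnist source directory looks roughly like this (a sketch of the structure described above; the actual repository layout may differ slightly):

$ find mnist/
mnist/
mnist/requirements.txt
mnist/app.sh
mnist/train.py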

Now, I can build my training image using the pytorch:1.6.0 builder image that we created earlier.

$ s2i build mnist pytorch:1.6.0 --as-dockerfile=/tmp/mnist/Dockerfile
$ cd /tmp/mnist/ && buildah bud -f . -t pytorch-mnist:latest

And I can just run my training code like this:

# Saving my model to /tmp/model
$ podman run -it --rm -v /tmp/model:/tmp/model:Z pytorch-mnist:latest

I can run this on my laptop or share it with collaborators in a consistent manner, without worrying about setup or compatibility issues. The image can also be deployed onto a Kubernetes cluster for training.

In this post, we discussed how you can build a base image containing all the necessary Python dependencies, and how an image can provide a consistent runtime environment, which may include a Jupyter Notebook. In the next post, we will explore deploying such images onto a Kubernetes platform and how we can run training at scale.
