HowTo:CS Launch GPU
This guide gives an example of using GPU resources with the Endeavour container cluster. It is based on Nvidia's example at: and adapted to our local teaching environment.
We use the MovieLens 20M dataset, on which the VAE-CF model is trained. MovieLens 20M is a movie rating dataset that includes 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. The goal of our model is to predict the rating a user would give a new movie, given that user's previous (movie, rating) pairs. The model is first trained on a dataset of movies and their ratings; the trained model then predicts a user's rating for a new movie.
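To make the (movie, rating) idea concrete, here is a minimal sketch of pulling one user's rating history out of a ratings file. It assumes the `userId,movieId,rating,timestamp` column layout of the MovieLens `ratings.csv` file. The sample rows and the helper names `sample_ratings` and `user_history` are made up for illustration and are not part of Nvidia's example.

```shell
# Tiny hand-made sample in the assumed ratings.csv layout (values are illustrative).
sample_ratings() {
cat <<'EOF'
userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
2,3,4.0,974820889
1,32,4.0,1112484819
EOF
}

# Print the (movieId,rating) pairs of one user, skipping the header line.
# Usage: user_history USER_ID < ratings.csv
user_history() {
    awk -F, -v uid="$1" 'NR > 1 && $1 == uid { print $2 "," $3 }'
}

sample_ratings | user_history 1
```

Each output line is one (movie, rating) pair for the chosen user. Pairs like these are what the model sees for a user before it predicts the rating of a movie that user has not rated yet.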
We assume that you already have a project available on the Endeavour cluster and are familiar with how the CS Launch service works. See: HowTo:CS_Launch. The guide is broken up into the following steps:
- Create your Docker image
- Create the container workload
- Create an ingress for JupyterLab
- Connect to Jupyter and run the model
Create Docker Image
If you want to skip this step, I have a pre-built image for this example; use the image:
Create Docker registry
You will need to host your Docker image in a Docker registry. One is available with our GitLab instance.
- Log in to
- Click the New project button.
- You can create a blank project; all we need is the container registry, which gets created automatically. Make the project Public for ease of use.
- From the menu on the left, select Deploy->Container Registry. This will give you the image registry URL that you will need for both uploading and deploying.
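As a rough sketch of how the registry URL is used for uploading, the snippet below builds an image reference from it and walks through the usual `docker login`, `docker build`, `docker push` sequence. The URL `registry.example.edu/myuser/gpu-example` is a hypothetical placeholder, not our actual registry, and the `run` helper with its `DRY_RUN` switch is only an illustration so that the commands are printed rather than executed by default.

```shell
# Hypothetical registry URL; substitute the one shown on your Container Registry page.
REGISTRY_URL="registry.example.edu/myuser/gpu-example"
IMAGE="${REGISTRY_URL}:latest"          # full image reference, including a tag
REGISTRY_HOST="${REGISTRY_URL%%/*}"     # host part only, used for docker login

# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run docker login "$REGISTRY_HOST"
run docker build -t "$IMAGE" .
run docker push "$IMAGE"
```

Once the printed commands look right, run the script again with `DRY_RUN=0` from the directory that holds the Dockerfile. The same image reference is what you would later give to the container workload when deploying.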
Build Docker Image
The docker image will be based on the Nvidia VAE for TensorFlow:
- SSH to
- Make a directory to hold the files:
mkdir gpu
- Download the image files:
wget --content-disposition -O
- Unzip the files:
- Download the dataset:
- Modify the Dockerfile:
vim Dockerfile
ARG
FROM ${FROM_IMAGE_NAME}
ADD requirements.txt .
RUN pip install -r requirements.txt
WORKDIR /code
COPY . .
RUN mkdir -p /data/ml-20m/extracted; \
    cd /data/ml-20m/extracted; \
    unzip /code/
ENTRYPOINT ["jupyter", "notebook", "--ip", "", "--port", "8888", "--allow-root"]