HowTo:CS Launch GPU

Introduction

This guide gives an example of using GPU resources with the Endeavour container cluster. It is based on Nvidia's example at https://catalog.ngc.nvidia.com/orgs/nvidia/resources/vae/setup, adapted to our local teaching environment.

Dataset

We use the MovieLens 20M dataset, on which the VAE-CF model was trained. MovieLens 20M is a movie rating dataset that includes 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. The goal of our model is to predict the rating a user would give a new movie, given that user's previous (movie, rating) pairs. The model is first trained on the existing ratings; the trained model then predicts the rating of a new movie for a user.
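
To make the (movie, rating) pairs concrete, here is a minimal sketch that summarizes ratings per user with awk. It assumes the ratings file uses the header userId,movieId,rating,timestamp, as ratings.csv in the MovieLens 20M archive does. The sample rows and the function name ratings_per_user are made up for illustration.

```shell
# Build a tiny sample with the same header as ratings.csv (values are made up).
sample=$(mktemp)
cat > "$sample" <<'EOF'
userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,4.0,1112484676
2,2,5.0,974820598
EOF

# Print "<userId> <number of ratings> <mean rating>" for each user,
# skipping the header line. The function name is hypothetical.
ratings_per_user() {
  awk -F, 'NR > 1 { n[$1]++; s[$1] += $3 }
           END { for (u in n) printf "%s %d %.2f\n", u, n[u], s[u] / n[u] }' "$1" | sort -n
}

ratings_per_user "$sample"
rm -f "$sample"
```

The same function can be pointed at the real ratings.csv once the dataset has been unzipped.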

Overview

The guide is broken up into the following steps. We assume that you already have a project available on the Endeavour cluster and are familiar with how the CS Launch service works. See: HowTo:CS_Launch

Create your docker image
Create the container workload
Create an ingress for JupyterLab
Connect to Jupyter and run the model

Create Docker Image

If you want to skip this step, you can use my pre-built image for this example: container.cs.vt.edu/carnold/gpu:latest

Create Docker registry

You will need to host your Docker image in a Docker registry. A Docker registry is available with our GitLab instance.

Log in to https://git.cs.vt.edu
Click the New project button
Create a blank project; all we need is the container registry, which is created automatically. Make the project Public for ease of use.
From the menu on the left, select Deploy->Container Registry. This gives you the image registry URL that you will need for both uploading and deploying.
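
As a rough sketch of what the resulting registry URL looks like, the helper below composes an image reference from a namespace, a project name, and a tag. The host container.cs.vt.edu is taken from the pre-built image mentioned above; the function name image_ref and the example namespace yourpid are hypothetical, so use the exact URL shown on your own Container Registry page.

```shell
# Registry host, taken from the pre-built image reference in this guide.
REGISTRY_HOST="container.cs.vt.edu"

# usage: image_ref <namespace> <project> [tag]
# Prints <host>/<namespace>/<project>:<tag>; the tag defaults to "latest".
image_ref() {
  printf '%s/%s/%s:%s\n' "$REGISTRY_HOST" "$1" "$2" "${3:-latest}"
}

# "yourpid" is a placeholder for your own GitLab namespace.
image_ref yourpid gpu

# Before pushing you will likely need to authenticate, for example:
#   docker login "$REGISTRY_HOST"
```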

Build Docker Image

The docker image will be based on the Nvidia VAE for TensorFlow: https://catalog.ngc.nvidia.com/orgs/nvidia/resources/vae_for_tensorflow

SSH to rlogin.cs.vt.edu
Make a directory to hold the files and change into it: mkdir gpu && cd gpu
Download the image files: wget --content-disposition https://api.ngc.nvidia.com/v2/resources/nvidia/vae_for_tensorflow/versions/20.06.3/zip -O vae_for_tensorflow_20.06.3.zip
Unzip the files: unzip vae_for_tensorflow_20.06.3.zip
Download the dataset: wget http://files.grouplens.org/datasets/movielens/ml-20m.zip
Modify the Dockerfile so that it matches the following: vim Dockerfile

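# Base image: Nvidia's TensorFlow 1.x container
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:20.06-tf1-py3
FROM ${FROM_IMAGE_NAME}

# Install the Python dependencies listed in requirements.txt
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the build directory (example code and ml-20m.zip) into /code
WORKDIR /code
COPY . .

# Extract the MovieLens dataset; && stops the build if any step fails
RUN mkdir -p /data/ml-20m/extracted && \
    cd /data/ml-20m/extracted && \
    unzip /code/ml-20m.zip

# Start Jupyter on port 8888, listening on all interfaces
ENTRYPOINT ["jupyter", "notebook", "--ip", "0.0.0.0", "--port", "8888", "--allow-root"]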
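
With the Dockerfile in place, the image can be built and pushed to your registry. The following is a minimal sketch, not the exact procedure for this cluster: the image reference is a placeholder, and the run and build_and_push helpers are hypothetical. By default it only prints the docker commands; set DRY_RUN=0 to actually run them from the directory that contains the Dockerfile.

```shell
# Placeholder image reference; replace with the URL from your Container Registry page.
IMAGE="${IMAGE:-container.cs.vt.edu/yourpid/gpu:latest}"
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build the image from the current directory, then push it to the registry.
build_and_push() {
  run docker build -t "$IMAGE" . &&
  run docker push "$IMAGE"
}

build_and_push
```

Pushing normally requires a prior docker login against the registry host.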
