HowTo:CS Launch GPU

From Computer Science Wiki
Revision as of 08:18, 11 July 2024 by Carnold

Introduction

This guide gives an example of using GPU resources with the Endeavour container cluster. It is based on NVIDIA's example at https://catalog.ngc.nvidia.com/orgs/nvidia/resources/vae/setup, adapted to our local teaching environment.

Dataset

We use the MovieLens 20M dataset, on which the VAE-CF model was trained. MovieLens 20M is a movie rating dataset containing 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. The goal of the model is to predict how a user would rate a movie they have not yet rated, given that user's previous (movie, rating) pairs. The model is trained on these (movie, rating) pairs; the trained model then predicts the rating of a new movie for a user.
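To make the per-user (movie, rating) histories concrete, here is a minimal Python sketch that groups MovieLens rows by user. It assumes the standard MovieLens 20M `ratings.csv` layout (columns `userId,movieId,rating,timestamp`); the sample rows are made up for illustration, and `load_user_histories` is a hypothetical helper, not part of NVIDIA's code.

```python
import csv
import io
from collections import defaultdict


def load_user_histories(lines):
    """Group MovieLens ratings.csv rows into {userId: [(movieId, rating), ...]}."""
    histories = defaultdict(list)
    for row in csv.DictReader(lines):
        histories[int(row["userId"])].append((int(row["movieId"]), float(row["rating"])))
    return dict(histories)


# Made-up rows in the ratings.csv format (userId,movieId,rating,timestamp).
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,2,3.5,1112486027\n"
    "1,29,3.5,1112484676\n"
    "2,3,4.0,974820889\n"
)
print(load_user_histories(sample))  # {1: [(2, 3.5), (29, 3.5)], 2: [(3, 4.0)]}
```

Inside the container, the same function could be given an open file such as `open("/data/ml-20m/extracted/ml-20m/ratings.csv", newline="")`; that path assumes the archive unpacks into an `ml-20m/` folder. Holding all 20 million ratings in Python lists this way can use a lot of memory, so treat it as an exploration aid rather than part of the training pipeline.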

Overview

The guide is broken up into the following steps. We assume that you already have a project available on the Endeavour cluster and are familiar with how the CS Launch service works. See: HowTo:CS_Launch

  • Create your Docker image
  • Create the container workload
  • Create an ingress for JupyterLab
  • Connect to Jupyter and run the model
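For the last step, it helps to confirm from a notebook cell that the workload was actually scheduled with a GPU before starting training. A minimal sketch, assuming the NVIDIA driver utility `nvidia-smi` is available inside the container (typical for NGC images on a GPU node, but not guaranteed); `parse_gpu_list` and `visible_gpus` are hypothetical helpers that parse the output of `nvidia-smi -L`. The subprocess call avoids `capture_output=`, which requires Python 3.7, in case the image ships an older Python.

```python
import re
import shutil
import subprocess

# One line of `nvidia-smi -L` output, e.g. "GPU 0: Tesla T4 (UUID: GPU-...)".
GPU_LINE = re.compile(r"GPU (\d+): (.+?) \(UUID: [^)]+\)")


def parse_gpu_list(text):
    """Return [(index, name), ...] for each GPU line; other lines (e.g. MIG entries) are skipped."""
    gpus = []
    for line in text.splitlines():
        match = GPU_LINE.fullmatch(line.strip())
        if match:
            gpus.append((int(match.group(1)), match.group(2)))
    return gpus


def visible_gpus():
    """Run `nvidia-smi -L` if it exists; return an empty list when the tool is absent."""
    if shutil.which("nvidia-smi") is None:
        return []
    # stdout=PIPE and universal_newlines keep this compatible with Python 3.6.
    result = subprocess.run(["nvidia-smi", "-L"], stdout=subprocess.PIPE,
                            universal_newlines=True, check=True)
    return parse_gpu_list(result.stdout)
```

Calling `visible_gpus()` in a notebook should return at least one entry on a GPU workload; an empty list suggests the container was not given a GPU. Within TensorFlow 1.x, `tf.test.is_gpu_available()` is an alternative check from inside the framework.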

Create Docker Image

To skip this step, use the pre-built image for this example: container.cs.vt.edu/carnold/gpu:latest

Create Docker registry

You will need to host your Docker image in a Docker registry. Our GitLab instance provides one.

  • Log in to https://git.cs.vt.edu
  • Click the New project button.
  • Create a blank project; all we need is its container registry, which is created automatically. Make the project Public for ease of use.
  • From the menu on the left, select Deploy->Container Registry. This shows your image registry URL, which you will need for both uploading and deploying the image.
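Judging by the pre-built image above, registry image names take the form `container.cs.vt.edu/<namespace>/<project>:<tag>`. The Python sketch below assembles a reference in that form, assuming that host and path layout (always check the URL GitLab shows under Deploy->Container Registry); `image_reference` is a hypothetical helper. It rejects uppercase path components because Docker requires repository names to be lowercase.

```python
import re

# Assumed registry host, inferred from the pre-built image container.cs.vt.edu/carnold/gpu:latest.
REGISTRY_HOST = "container.cs.vt.edu"

# Docker repository path component: lowercase alphanumerics joined by '.', '_', '__' or runs of '-'.
PATH_COMPONENT = re.compile(r"[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)*")
# Docker tag: a word character first, then up to 127 word characters, '.' or '-'.
TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def image_reference(namespace, project, tag="latest"):
    """Build '<host>/<namespace>/<project>:<tag>', rejecting names Docker would refuse."""
    # GitLab namespaces may be nested groups, e.g. "group/subgroup".
    for part in namespace.split("/") + [project]:
        if not PATH_COMPONENT.fullmatch(part):
            raise ValueError(f"invalid repository path component {part!r} (must be lowercase)")
    if not TAG.fullmatch(tag):
        raise ValueError(f"invalid tag {tag!r}")
    return f"{REGISTRY_HOST}/{namespace}/{project}:{tag}"


print(image_reference("carnold", "gpu"))  # container.cs.vt.edu/carnold/gpu:latest
```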

Build Docker Image

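The Docker image is based on NVIDIA's VAE-CF for TensorFlow: https://catalog.ngc.nvidia.com/orgs/nvidia/resources/vae_for_tensorflow. The Dockerfile expects the example code, requirements.txt, and the MovieLens 20M archive ml-20m.zip to be in the build context: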

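# Base image: NVIDIA's TensorFlow 1.x container (20.06 release, Python 3).
ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:20.06-tf1-py3
FROM ${FROM_IMAGE_NAME}

# Install the Python dependencies listed in requirements.txt.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the example code (including ml-20m.zip) from the build context into /code.
WORKDIR /code
COPY . .

# Extract MovieLens 20M into /data/ml-20m/extracted; '&&' stops the build if a step fails.
RUN mkdir -p /data/ml-20m/extracted && \
    unzip /code/ml-20m.zip -d /data/ml-20m/extracted

# Start Jupyter Notebook on port 8888, listening on all interfaces.
ENTRYPOINT ["jupyter", "notebook", "--ip", "0.0.0.0", "--port", "8888", "--allow-root"]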
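Because the Dockerfile copies requirements.txt and unzips ml-20m.zip from the build context, a build fails late if either file is missing. The Python sketch below checks the context first and then runs the standard `docker build -t` and `docker push` commands. It assumes you have already authenticated with `docker login container.cs.vt.edu`; `missing_build_inputs`, `build_commands`, and `build_and_push` are hypothetical helper names, and by default the function only returns the commands (dry run).

```python
import shutil
import subprocess
from pathlib import Path

# Files the Dockerfile needs in the build context.
REQUIRED_FILES = ("Dockerfile", "requirements.txt", "ml-20m.zip")


def missing_build_inputs(context_dir):
    """Return the required files that are absent from the build context."""
    context = Path(context_dir)
    return [name for name in REQUIRED_FILES if not (context / name).is_file()]


def build_commands(image, context_dir):
    """Return the docker CLI invocations that build and push `image`."""
    return [
        ["docker", "build", "-t", image, str(context_dir)],
        ["docker", "push", image],
    ]


def build_and_push(image, context_dir=".", dry_run=True):
    """Validate the context, then run (or, in dry-run mode, only return) the commands."""
    missing = missing_build_inputs(context_dir)
    if missing:
        raise FileNotFoundError("missing from build context: " + ", ".join(missing))
    commands = build_commands(image, context_dir)
    if not dry_run:
        if shutil.which("docker") is None:
            raise RuntimeError("docker CLI not found on PATH")
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands
```

For example, `build_and_push("container.cs.vt.edu/<namespace>/<project>:latest", ".", dry_run=False)` run from the directory holding the Dockerfile would build and upload the image, with the placeholders replaced by your own registry path.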