Section 3 (Week 3)

A walkthrough of first steps for implementing a project - finding a codebase and running a baseline.

Introduction

In this section, we’re going to try out some first steps for implementing a project: finding a good codebase, as well as running a baseline model. A baseline model is the first, simple model used to start a project. This simple model serves as a starting point in terms of accuracy, and provides a comparison with future, more complex models.

In general, if there are existing implementations of your task, try to reproduce results by running their code. If there is no implementation of your task, you could find a solved proxy task, and plug in your own dataset. For instance, if you want to build a fish breed classifier, you can find open source code for a cat breed classifier, and plug in your fish breed dataset with the right labels.

If you need a refresher on the terminal or git, take a look at this tutorial.

Quick Guides

If you want to follow along with the walkthrough we’ll perform in section, check out these instruction docs. You’ll have to sign in with your Stanford email to access them.

Case Study: Style Transfer

Le musée du Louvre in Impressionist style from Ng, Katanforoosh & Bensouda. (source)

Let’s say that you’re working on a project called “Improving Neural Style Transfer”.

Finding a Good GitHub Repository

A great starting point for implementing your project idea is finding a high-quality GitHub repository. Here’s a general guide:

(1) How many stars does the repository have? In general, the number of stars is a good indicator of the quality of the content and the size of the user base.

(2) What is the license for the repository? Repositories with MIT, Apache, GNU GPL, and BSD licenses are open source. This usually means that you can use the code as you like, with credit to the author(s) (in some cases, you may need to use the same license in your own codebase).

(3) Is the README clear and detailed? Are results reported? Is it possible to quickly set up the repository locally? Good repositories provide detailed instructions in the README to install and run the code, and provide numerical and qualitative results. This is really important: a repository with vague or outdated setup instructions can be extremely hard to use.

(4) Does the repository have links to download compatible datasets, and ideally, pretrained model weights? Connecting a dataset to a codebase can be difficult. Some repositories have instructions for how to download and use a compatible dataset. In addition, many repositories provide pretrained model weights (the parameter values for a network after it has been trained on a standard dataset), so that you don’t need to replicate that work.

(5) Is the code written in your preferred deep learning framework? (TensorFlow, PyTorch, Keras, etc.)

(6) Is the code readable and of good quality? Check that the code is commented and the repository folders are well structured.

(7) Does the repository provide references? You can usually find references at the bottom of the README. These provide context for the implementation.

Hands On

  • The GitHub repository neural-style seems to satisfy our criteria, so let’s focus on it. Take 5 minutes to read the README and run the repository on the example images it provides. You should see the message “Iteration …/1000”. What challenges did you encounter?

  • It seems that a few minutes is not enough to reproduce good results locally. What can you tweak to overcome this challenge?

  • Run the codebase on your own chosen images for content and style. Use 100 iterations (a sketch of the commands appears below).
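
If you’re following along, the workflow looks roughly like the sketch below. The URL points to one popular TensorFlow implementation; the script name, flag names, requirements file, and image filenames are assumptions, so substitute the repository your group actually chose and check its README for the exact usage.

    # A minimal sketch, assuming a TensorFlow-based neural-style repository.
    # The script name and flag names are assumptions; check the repo's README.
    git clone https://github.com/anishathalye/neural-style.git
    cd neural-style
    pip install -r requirements.txt
    # Some implementations also require downloading pretrained VGG weights first;
    # again, see the README.

    # Run on your own content and style images with fewer iterations so it
    # finishes in a reasonable amount of time on a CPU.
    python neural_style.py \
        --content my_content.jpg \
        --styles my_style.jpg \
        --output my_output.jpg \
        --iterations 100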

This section was designed to help you get started with existing codebases related to your project. Note that we only tested the model here; we did not train it. For your project, find a GitHub repository that is relevant and try to reproduce its results.

Tips for Building Your Own Baseline Model

If your task hasn’t been tackled before, you can start by building a simple algorithm that solves the problem without aiming for high accuracy.

You will often notice that your newly created neural network struggles and isn’t very accurate. A good first step is to try to overfit on a very small number of examples, and then gradually add more and more. By focusing on a few examples, you can run many variations of your architecture and hyperparameters in a short amount of time. When your model starts overfitting on a reasonable number of training examples, you can then analyze its generalization capabilities on unseen data. In C2M1 and C2M2, you will learn about the techniques used to help your model generalize to the real world.

Cloud Computing - Setting Up an AWS Instance

Training a deep learning model often requires a lot of computational power - that’s why we use specialized hardware, such as GPUs or TPUs. These processors can speed up training by one or two orders of magnitude compared to a CPU.

Cloud computing services such as Amazon Web Services (AWS), Google Cloud, and Microsoft Azure allow us to access powerful computer instances on-demand: we can have just the right amount of power, right when we need it! AI practitioners should know how to work with these remote computers in order to access the right hardware.

Next, we’ll walk through how to set up your own AWS instance.

Setting up an AWS Instance

There are different types of AWS instances. We will use the p2.xlarge for accelerated computing, which contains one NVIDIA K80 GPU.

  • Create an account here.
  • Sign in to your account here.
  • In the top right corner of the home page, click on the location name and set it to US West (Oregon). This AWS region has instances with GPUs and is close to most of us.
  • After selecting the region, click on EC2 under the Compute list.
  • In order to create a GPU instance, we’ll need to request a limit increase here. Choose Region as US West (Oregon), Instance Type “All P instances”, and New limit value as 4. For the use case, you can write something like “Training neural networks for the Stanford deep learning class”. AWS will contact you when your increase is approved; then continue with the following steps. If you don’t want to wait, you can use a t2.xlarge instance, which is much slower because it doesn’t have a GPU - make sure, though, that your account has the $500 credits loaded so you are not billed!
  • On the EC2 Dashboard view, click on the “Launch Instance” button.
  • Search for and select the Deep Learning AMI (Ubuntu 16.04) Version 26.0. This AMI (Amazon Machine Image) comes with pre-installed deep learning frameworks such as TensorFlow, PyTorch or Keras.
  • On the next page, select the p2.xlarge instance. Then, click on “Review and Launch”.
  • Then, click on the blue button “Launch”.
  • A pop-up window will appear asking for a key pair. You can either provide an existing one or create a new one. If you create one, download it and keep it somewhere it won’t be deleted (if you lose the key, you won’t be able to access your instance anymore!).
  • If you downloaded the key file, change its permissions in the terminal to user-only read and write. On Linux or macOS, you can do this with chmod 400 PEM_FILENAME, where PEM_FILENAME is the file with the key.
  • After this, click on the blue button “Launch Instances”.
  • Click on “View Instances” to check that it is “Running” and passed “2/2 status checks”. It will take some time to pass the checks but after that, you will be ready to ssh into the instance. Finally, note down the Public IP of the instance launched (it will be required in the next step).
  • SSH into your instance with ssh -i PEM_FILENAME ubuntu@PUBLIC_IP (the full connection workflow is sketched after this list).
  • Your machine comes with many Conda environments pre-installed: each one is a Python environment with deep learning libraries already installed. Look at the README for how to use them. For this section, we can use a Tensorflow environment (source activate tensorflow_p36).
  • VERY IMPORTANT! When you’re done using your instance, be sure to turn it off using the web interface! Otherwise your AWS account will be charged $0.90/hr.
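
Putting these steps together, a typical session looks something like the sketch below. The key filename and public IP are placeholders, and the conda environment name may differ between AMI versions, so check the README shown when you log in.

    # On your laptop: restrict the key's permissions, then SSH into the instance.
    chmod 400 PEM_FILENAME
    ssh -i PEM_FILENAME ubuntu@PUBLIC_IP    # use the Public IP you noted down earlier

    # On the instance: activate one of the pre-installed conda environments.
    source activate tensorflow_p36

    # Clone and run your project's code just as you would locally, e.g.
    # git clone <your repository URL>, then launch the training/inference script.

    # When you're finished, log out and STOP the instance from the EC2 console
    # so you aren't billed for idle hours.
    exit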

Now, let’s use this AWS instance to run our neural style transfer code. On the GPU, the full 1000 iterations finish far more quickly than they did on a CPU!

Additional AWS Info

  • We’ll be distributing AWS credits to all project groups. Stay tuned for announcements on Piazza.
  • AWS bills instances by the minute, so make sure to turn off your machine (and save your data, if needed!) when you’re done using it.
  • For those training large networks, the p3.2xlarge can be used, with one NVIDIA V100 GPU. This instance can train networks much faster, but is also 3x more expensive.

Colab Walkthrough

Let’s say that for your final project you want to work on Neural Style Transfer. You find a repository on GitHub that can serve as a starting point for your group. If you want to run this repository on your own computer, you can do so using the following steps (sketched as commands after the list):

  • Step 1: Installing git
  • Step 2: Using git to download the package
  • Step 3: Installing python3
  • Step 4: Installing pip3
  • Step 5: Installing virtualenv
  • Step 6: Creating a new virtual environment
  • Step 7: Installing the project’s package dependencies
  • Step 8: Downloading the project’s dataset / pretrained models
  • Step 9: Running the model

As you might notice, there are quite a few setup steps involved before you can even run the project, and getting everything configured can be difficult. Further, if you run the project on your own computer, chances are that it will take a few hours to run. This is because you are running the project on your computer’s CPU (these days, models are mostly trained and evaluated on GPUs, which can be orders of magnitude faster).

Instead of installing packages on your computer, if you want to test a project quickly, there is an alternative - Google Colab.

What is Google Colab?

Colab (short for Colaboratory) is a product offered by Google Research that allows machine learning researchers to work on projects in the browser. Similar to Google Docs, it lets you share projects with many people, and, best of all, it gives you free access to GPUs so you can quickly train models without any extra setup.

When working on your projects, we recommend initially testing your code on a portion of your dataset, either on your own computer or on Google Colab. You can use these tools as a quick way to see whether your code has bugs or whether the output of your model is reasonable. After this initial testing, you can copy your code and full dataset to your group’s AWS instance to run the full version of your model (testing locally or on Colab is free, whereas running on AWS uses your course credits).
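
In a Colab notebook, the same setup fits in a few cells, because any line starting with “!” is run as a shell command. A rough sketch follows; the repository URL, folder name, and run command are placeholders, and the flag names assume the same neural-style implementation as above.

    # Check that a GPU is attached (Runtime -> Change runtime type -> GPU).
    !nvidia-smi

    # Clone the repository and install its dependencies.
    !git clone https://github.com/<user>/<repository>.git
    %cd <repository>
    !pip install -r requirements.txt

    # Run the model for a small number of iterations to confirm everything works.
    !python neural_style.py --content my_content.jpg --styles my_style.jpg \
        --output my_output.jpg --iterations 100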

Getting Started

Follow the tutorial here to get started with Google Colab.

Further Reading

If you’re interested in learning more about style transfer, here are some of the most important papers on the topic. We’ll also be talking about style transfer in class, so stay tuned!