Deep Learning Learning Plan

This is my plan for getting myself up to speed with recent deep learning practice (as of the publishing date of this post). Comments and recommendations via GitHub issues are welcome and appreciated! The plan assumes some background in probability, linear algebra, and machine learning theory; if you’re following along, Part 1 of the Deep Learning book gives an overview of the prerequisite topics to cover.

My notes on these sources are publicly available, as are my experiments.

  1. Intro tutorials/posts.
  2. Scalar supervised learning theory
  3. Scalar supervised learning practice
    • Choose an environment.
      • Should be TensorFlow-based, given the wealth of ecosystem around it (e.g., Sonnet and T2T).
      • I tried TF-Slim and TensorLayer, but I still found Keras easiest to rapidly prototype in (and expand). It’s also easy to drop down into TensorFlow from the Keras models (see the sketch below).
      • Even with Keras, TF can be awkward to prototype in, so PyTorch is also worth considering.
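To make the “drop down into TensorFlow from Keras” point concrete, here’s a minimal sketch of the workflow I mean. (Assumptions: a recent TensorFlow 2.x with tf.keras; my actual experiments used standalone Keras on TF 1.x, and the layer sizes and clipping threshold here are placeholders.)

```python
# Minimal sketch: prototype the model in Keras, then drop down to raw
# TensorFlow ops (gradient tape, clipping) in the training step.
# Assumes TensorFlow 2.x; sizes and dataset are placeholders.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    grads = [tf.clip_by_norm(g, 1.0) for g in grads]  # plain TF op, outside Keras
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```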
    • Google MNIST
    • Lessons 0-4 from USF
    • Assignments 1-4 from Udacity
    • CIFAR-10
      • Extend to multiple GPUs
      • Visualizations (with TensorBoard): histogram summaries for weights/biases/activations, layer-by-layer gradient norm recordings (and how batch norm affects them), graph visualization, and cost over time (see the first sketch after this list).
      • Visualizations for trained kernels: the most-activating image from the input set per kernel, direct kernel image visualizations, synthesized maximizing inputs, and direct activation images (per Yosinski et al. 2015). For the maximizing inputs, use the regularization from the Yosinski paper (see the second sketch after this list).
      • Faster input pipeline and timing metrics for each stage of operation (see the input pipeline notes).
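A sketch of the TensorBoard recordings from the visualization item above, assuming the TF 2.x summary API (the log directory, step counter, and surrounding training loop are placeholders; activations could be logged similarly via a model over intermediate outputs):

```python
# Sketch: write per-layer weight histograms and gradient norms to TensorBoard.
# Assumes TensorFlow 2.x; `model` and `loss_fn` come from the CIFAR-10 setup.
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/cifar10")

def log_summaries(model, loss_fn, x, y, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    with writer.as_default():
        tf.summary.scalar("loss", loss, step=step)
        for var, grad in zip(model.trainable_variables, grads):
            tag = var.name.replace(":", "_")
            tf.summary.histogram(tag + "/values", var, step=step)
            tf.summary.scalar(tag + "/grad_norm", tf.norm(grad), step=step)
```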
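And a rough sketch of the maximizing-input visualization: gradient ascent on the input image, with plain L2 decay standing in for the fuller regularization scheme in the Yosinski et al. paper (assumes a trained functional tf.keras model; the layer name, channel, and hyperparameters are placeholders):

```python
# Sketch: gradient ascent on the input image to find an input that maximizes
# one channel of a chosen layer, with L2 decay as a simple regularizer.
import tensorflow as tf

def maximize_activation(model, layer_name, channel, steps=200, lr=1.0, l2_decay=0.01):
    feature_extractor = tf.keras.Model(model.inputs,
                                       model.get_layer(layer_name).output)
    image = tf.Variable(tf.random.uniform((1,) + model.input_shape[1:], 0.0, 1.0))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            objective = tf.reduce_mean(feature_extractor(image)[..., channel])
        grad = tape.gradient(objective, image)
        image.assign_add(lr * grad)              # ascend the activation
        image.assign(image * (1.0 - l2_decay))   # L2 decay regularizer
    return image.numpy()
```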
    • Assignment 2 from Stanford CS20S1
    • Lab 1 from MIT 6.S191
    • Stanford CS231n
    • Try out slightly less common techniques: compare initializations (orthogonal vs. LSUV vs. uniform), weight normalization vs. batch normalization vs. layer normalization, and Bayesian-inspired weight decay vs. early stopping vs. proximal regularization (see the sketch at the end of this list).
    • Replicate ResNet (He et al. 2015), DropConnect, Maxout, and Inception (do a fine-tuning example with Inception per this paper).
    • Do an end-to-end application from scratch. E.g., convert an equation image to LaTeX.
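For the initialization and normalization comparisons above, the idea is to build otherwise-identical models that differ in a single choice and compare learning curves; here’s a sketch assuming tf.keras (LSUV and weight normalization are omitted since they need a data-dependent init pass and an extra wrapper, e.g. from TensorFlow Addons, respectively):

```python
# Sketch: otherwise-identical CIFAR-10-sized models that differ only in the
# initializer or normalization layer, so each comparison is one flag apart.
import tensorflow as tf

def make_model(init="orthogonal", norm="batch"):
    initializer = {
        "orthogonal": tf.keras.initializers.Orthogonal(),
        "uniform": tf.keras.initializers.RandomUniform(-0.05, 0.05),
    }[init]
    norm_layer = {
        "batch": tf.keras.layers.BatchNormalization,
        "layer": tf.keras.layers.LayerNormalization,
    }[norm]
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, kernel_initializer=initializer,
                               input_shape=(32, 32, 3)),
        norm_layer(),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, kernel_initializer=initializer),
    ])
```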
  4. Sequence supervised learning
  5. Unsupervised and semi-supervised approaches