Course v3 Lesson 2 Notes

General notes on the previous lesson

  • No need to feel intimidated by all the good projects from lesson 1. Just use your imagination and start an interesting project!
  • Keep going: code and experiment –> “The whole game” –> concepts –> lesson 2

Creating your own dataset from Google Images

Download images

  • After opening the image search results page, run the following JavaScript code in the JavaScript console, opened by pressing Ctrl+Shift+J on Windows/Linux or Cmd+Opt+J on Mac.
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));


  • Clear up memory between runs: destroy/delete objects that are no longer needed and empty the cache.
del learn

Clean the data

  • Combine with human (expert) knowledge to delete unrelated samples in the training and validation sets. We can make use of the ImageCleaner widget from fastai.widgets.


  • Normally the trained model is pretty good at dealing with moderate amounts of noisy data. However, problems will occur if the noise in the data is not random but biased.

  • When putting the model in production, i.e. the inference phase, running it on a CPU rather than a GPU is recommended. A GPU is good at doing many things at the same time, but GPU inference incurs many hassles such as putting all the requests into a batch, queueing, etc. So unless there are a lot of simultaneous inference requests on a busy web server, a CPU is much easier to deal with.

  • Loading a trained model:

    • In the video, there are several steps involved:
      1. Define the classes, the same as those used when training the model
      2. Create an empty DataBunch using ImageDataBunch.single_from_classes(), with the same image transformations that were used in the training phase.
      3. Create a Learner using the DataBunch just created.
      4. Load the weights/parameters.
    • In the version-3 notebook, however, we can use Learner.export() to save everything needed (model, weights, classes, transformations, etc.) and use load_learner() to load it all at once.
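The export/load round trip above can be sketched with a plain-Python analogue. This is a hedged illustration using pickle and made-up placeholder contents, not fastai's actual implementation:

```python
import os
import pickle
import tempfile

# Hypothetical bundle standing in for what Learner.export() saves:
# weights, class names, and transform settings kept together in one file.
bundle = {
    "classes": ["black", "grizzly", "teddy"],        # placeholder class list
    "weights": {"layer1": [0.1, 0.2, 0.3]},          # placeholder for tensors
    "transforms": {"size": 224, "normalize": True},  # placeholder settings
}

path = os.path.join(tempfile.mkdtemp(), "export.pkl")

with open(path, "wb") as f:   # analogous to learn.export()
    pickle.dump(bundle, f)

with open(path, "rb") as f:   # analogous to load_learner()
    restored = pickle.load(f)

print(restored["classes"])
```

Because everything needed for inference travels in one file, the serving process never has to rebuild the training-time DataBunch by hand.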

When things go wrong

  • Learning rate
    • Too high: training loss and validation loss will be very large (divergence)
    • Too low: losses decrease too slowly, and training loss is higher than validation loss (underfitting).
  • Number of epochs
    • Too few: training loss is higher than validation loss
    • Too many: overfitting; the error rate improves and then gets worse again. Note: training loss smaller than validation loss does not by itself mean overfitting. A correctly trained model always ends with training loss smaller than validation loss.
  • Size of training data
    • Not enough training data may be one of the possible reasons. But start with a small amount of data first; don't waste too much time on gathering data.

Error rate

In the fastai library, error_rate equals 1 - accuracy, where accuracy is the percentage of inputs predicted correctly. The metric is always applied to the validation set.
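As a minimal illustration of the relationship (plain Python with made-up labels, not the fastai implementation):

```python
# Predicted vs. true labels for a tiny made-up validation set.
preds  = ["cat", "dog", "dog", "cat", "dog"]
labels = ["cat", "dog", "cat", "cat", "cat"]

correct = sum(p == y for p, y in zip(preds, labels))
accuracy = correct / len(labels)       # fraction predicted correctly
error_rate = 1 - accuracy              # fastai's error_rate metric

print(accuracy, error_rate)  # → 0.6 0.4
```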

Learning rate

Rule of thumb: start with 3e-3, then unfreeze the model and train again with learning rates in slice(lr, 3e-4), where lr is the value found by the learning rate finder at the steepest downward slope.

learn.fit_one_cycle(4, 3e-3)
learn.fit_one_cycle(4, slice(lr, 3e-4))

Stochastic Gradient Descent (SGD)

Linear Regression problem

  • A tensor is an n-dimensional array.
  • The binary infix operator @ in Python 3.5+ performs matrix multiplication (e.g. the dot product of matrices).
  • The most common loss/error function is mean squared error: $\frac{\sum(\hat{y}-y)^2}{n}$.
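A minimal sketch of mean squared error in plain Python (the numbers are made up; with NumPy or PyTorch tensors the @ operator would handle the matrix products):

```python
def mse(y_hat, y):
    """Mean squared error: sum of squared differences divided by n."""
    return sum((a - b) ** 2 for a, b in zip(y_hat, y)) / len(y)

y_hat = [2.5, 0.0, 2.0]   # predictions (made up)
y     = [3.0, -0.5, 2.0]  # targets (made up)

print(mse(y_hat, y))      # (0.25 + 0.25 + 0.0) / 3
```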

Gradient descent

  • In PyTorch, calling backward (torch.autograd.backward, usually via loss.backward()) computes the gradients of the loss with respect to the given tensors and stores them in their .grad attributes. We then subtract the gradients (scaled by the learning rate) from the parameters/coefficients tensor to make the loss smaller.
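To keep the idea self-contained, here is gradient descent for a linear model written in plain Python with hand-derived gradients, a stand-in for the PyTorch autograd version; the data is made up:

```python
# Fit y = a*x + b by gradient descent on mean squared error.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1

a, b = 0.0, 0.0                  # initial parameters
lr = 0.05                        # learning rate
n = len(xs)

for _ in range(2000):
    # Hand-derived MSE gradients (autograd computes these for you):
    # dMSE/da = 2/n * sum((a*x + b - y) * x), dMSE/db = 2/n * sum(a*x + b - y)
    grad_a = 2 / n * sum((a * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = 2 / n * sum((a * x + b - y) for x, y in zip(xs, ys))
    a -= lr * grad_a             # step against the gradient
    b -= lr * grad_b             # to make the loss smaller

print(round(a, 3), round(b, 3))  # converges toward a ≈ 2, b ≈ 1
```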

Stochastic Gradient Descent

  • SGD: gradient descent with mini-batches
  • An epoch consists of running through all the training data once.
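The same linear fit becomes SGD once each update uses a randomly-ordered mini-batch instead of the full dataset (a plain-Python sketch with made-up data; the batch size and learning rate are arbitrary choices):

```python
import random

random.seed(0)                        # reproducible shuffling
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]          # targets from y = 2x + 1

a, b = 0.0, 0.0
lr = 0.01
batch_size = 4
idx = list(range(len(xs)))

for epoch in range(1000):             # one epoch = one full pass over the data
    random.shuffle(idx)               # "stochastic": batches in random order
    for start in range(0, len(idx), batch_size):
        batch = idx[start:start + batch_size]
        m = len(batch)
        # Same MSE gradients as full-batch GD, but over the mini-batch only.
        grad_a = 2 / m * sum((a * xs[i] + b - ys[i]) * xs[i] for i in batch)
        grad_b = 2 / m * sum((a * xs[i] + b - ys[i]) for i in batch)
        a -= lr * grad_a
        b -= lr * grad_b

print(round(a, 2), round(b, 2))       # converges toward a ≈ 2, b ≈ 1
```

Each epoch still visits every sample exactly once; only the order and the per-step gradient estimate change, which is what makes mini-batch SGD cheap per update.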