Course v3 Lesson 3 Notes

Multi-label prediction with Planet Amazon dataset

The data block API

Classes involved:

  • Dataset: an abstract class that retrieves an item by index and returns the length of the dataset.
  • DataLoader: gets items from a Dataset and feeds them in batches to the model.
  • DataBunch: binds together the DataLoaders of the training set, the validation set and, optionally, a test set, ready to be passed to the learner.
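To make the division of labour between these three classes concrete, here is a minimal plain-Python sketch. It is not the PyTorch or fastai implementation; `ListDataset`, `SimpleLoader` and `SimpleBunch` are hypothetical names invented for illustration, and the file names and labels are made up.

```python
import random


class ListDataset:
    """Minimal dataset: retrieve an (item, label) pair by index, report its length."""

    def __init__(self, items, labels):
        if len(items) != len(labels):
            raise ValueError("items and labels must have the same length")
        self.items = items
        self.labels = labels

    def __getitem__(self, index):
        return self.items[index], self.labels[index]

    def __len__(self):
        return len(self.items)


class SimpleLoader:
    """Very small stand-in for a DataLoader: yields batches from a dataset."""

    def __init__(self, dataset, batch_size, shuffle=False, seed=0):
        self.dataset = dataset
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)

    def __iter__(self):
        indices = list(range(len(self.dataset)))
        if self.shuffle:
            self.rng.shuffle(indices)
        # Walk over the indices in steps of batch_size; the last batch may be smaller.
        for start in range(0, len(indices), self.batch_size):
            yield [self.dataset[i] for i in indices[start:start + self.batch_size]]


class SimpleBunch:
    """Binds the training and validation loaders together, like a DataBunch."""

    def __init__(self, train_ds, valid_ds, batch_size):
        self.train_dl = SimpleLoader(train_ds, batch_size, shuffle=True)
        self.valid_dl = SimpleLoader(valid_ds, batch_size)


# Made-up file names and labels, purely for illustration.
train_ds = ListDataset(["a.jpg", "b.jpg", "c.jpg"], ["cat", "dog", "cat"])
valid_ds = ListDataset(["d.jpg"], ["dog"])
bunch = SimpleBunch(train_ds, valid_ds, batch_size=2)
batch_sizes = [len(batch) for batch in bunch.train_dl]
```

Only the training loader shuffles here, which mirrors the usual practice of keeping validation order fixed.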

General steps using the data block API to create a DataBunch:

  • Where are the inputs and how to create them?
  • How to split the data into training and validation sets?
  • How to label the inputs?
  • How to add a test set?
  • What transforms to apply?
  • How to wrap in dataloaders and create the DataBunch?
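Two of the questions above, splitting and labelling, can be sketched in plain Python for a multi-label dataset. This is only an illustration of the idea, not the data block API itself: `random_split` and `multi_hot` are hypothetical helpers, the rows are made-up examples, and space-separated tags are an assumption about the label format.

```python
import random


def random_split(items, valid_pct=0.2, seed=42):
    """Randomly hold out a fraction of the items as the validation set."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    return shuffled[n_valid:], shuffled[:n_valid]


def multi_hot(tags, classes):
    """Turn a space-separated tag string into a 0/1 vector over all classes."""
    present = set(tags.split())
    return [1 if c in present else 0 for c in classes]


# Made-up rows: (image name, space-separated tags).
rows = [
    ("img_0", "haze primary"),
    ("img_1", "clear primary water"),
    ("img_2", "clear road"),
    ("img_3", "cloudy"),
    ("img_4", "clear primary"),
]

# The class list is every distinct tag seen in the data, in a fixed order.
classes = sorted({tag for _, tags in rows for tag in tags.split()})
train_rows, valid_rows = random_split(rows, valid_pct=0.2)
train_labels = [multi_hot(tags, classes) for _, tags in train_rows]
```

Unlike single-label classification, each image can have several ones in its label vector, which is what makes the problem multi-label.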

Data loading & training

  • Pay attention to the transforms, which should suit the characteristics of the specific dataset.
  • Changing the metrics does not change the resulting model. Metrics do not make the model better; they only show how the training is going.


  • We can collect the misclassified instances, e.g. from user feedback, and fine-tune the model on a new dataset containing just those instances, for example with a higher learning rate and/or more epochs.


  • Learning rate: after unfreezing and running the LR Finder, the curve may show no steep downward slope, only a flat stretch followed by a rise. In that case, set the start of the slice about 10x smaller than the point just before the curve starts going up, which is roughly the bottom of the curve. The end of the slice can be set to the learning rate used for the frozen stage divided by 5 to 10. At the time the video was recorded, there was still no good automatic learning rate finder.
  • Training data: progressive resizing makes the most of the data. Start training on scaled-down images, e.g. 64x64, then fine-tune the trained model on higher-resolution versions, e.g. 128x128, then 256x256, and so on. This is a form of transfer learning.
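The learning-rate rule of thumb above can be written down as a small helper. This is a sketch of the arithmetic only; `unfrozen_lr_slice` is a hypothetical function, and the two input values are made-up readings, not results from an actual LR Finder run.

```python
def unfrozen_lr_slice(lr_before_rise, frozen_lr, divisor=5):
    """Build the learning-rate slice for training after unfreezing.

    lr_before_rise: learning rate on the LR Finder plot just before the loss
                    starts going up.
    frozen_lr:      learning rate that was used while the model was frozen.
    divisor:        between 5 and 10, per the rule of thumb.
    """
    if not 5 <= divisor <= 10:
        raise ValueError("divisor should be between 5 and 10")
    start = lr_before_rise / 10  # 10x smaller than the point before the rise
    stop = frozen_lr / divisor   # frozen-stage rate divided by 5 to 10
    return slice(start, stop)


# Made-up readings: the loss starts rising around 1e-4, frozen stage used 1e-2.
lrs = unfrozen_lr_slice(lr_before_rise=1e-4, frozen_lr=1e-2)
```

In fastai v1 a slice like this is the kind of value passed as the maximum learning rate to `fit_one_cycle`, so that earlier layers train with the smaller rate and later layers with the larger one.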

Image segmentation with CamVid


  • Segmentation classifies every pixel in an image. The effect is to divide the image into regions, each labelled with a class.
  • To build a segmentation model, we can use the U-Net architecture instead of a plain CNN classifier.
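Classifying each pixel means the model produces one score per class for every pixel, and the predicted mask takes the highest-scoring class at each position. A minimal sketch with a made-up 2x2 "image" follows; `predict_mask` is a hypothetical helper, and the three class names are only an illustrative subset.

```python
def predict_mask(scores):
    """Pick the highest-scoring class index for every pixel.

    scores[row][col] is a list with one score per class for that pixel.
    Returns a mask with the same height and width, holding class indices.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


classes = ["Sky", "Road", "Car"]  # illustrative subset of class names

# Made-up per-pixel class scores for a 2x2 image.
scores = [
    [[2.0, 0.1, 0.3], [1.5, 0.2, 0.1]],
    [[0.1, 3.0, 0.5], [0.2, 0.4, 2.2]],
]

mask = predict_mask(scores)
named_mask = [[classes[i] for i in row] for row in mask]
```

The mask has the same spatial size as the input, which is why segmentation needs an architecture that restores full resolution at its output.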

Image regression (Regression with BIWI head pose dataset)

  • Regression is a kind of model whose output is a continuous number or a set of numbers.
  • The output of the head pose model is the position of the centre of the head in an image.
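Because the output is a pair of continuous coordinates rather than a class, the error is measured as a distance between predicted and true values. The sketch below assumes mean squared error as the loss, which is a common choice for regression but is not stated in these notes; the coordinates are made up.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of numbers."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(pred, target)]
    return sum(squared) / len(squared)


# Made-up (x, y) pixel coordinates of the head centre.
predicted_centre = (120.0, 80.0)
true_centre = (118.0, 84.0)

loss = mse(predicted_centre, true_centre)
```

A perfect prediction gives a loss of zero, and the loss grows with the square of the distance from the true point.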

Nonlinearity / Activation function

Linear operations alone, such as matrix multiplication or even convolution, cannot produce a model that is useful for tasks like image classification or text analysis, because a stack of linear operations is still just one linear operation. We add nonlinearities, also called activation functions, between the linear layers so that the model can approximate other functions. One of the most popular activation functions nowadays is the Rectified Linear Unit (ReLU):

$$f(x)=\begin{cases} 0 & \text{for } x \lt 0 \\ x & \text{for } x \geq 0 \end{cases}$$

which is equal to max(x, 0).
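A small sketch shows why the nonlinearity matters: two linear functions stacked directly collapse into a single linear function, while putting a ReLU between them does not. The weights and biases are arbitrary values chosen for illustration.

```python
def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, x otherwise."""
    return max(x, 0.0)


def linear(weight, bias):
    """Return the function x -> weight * x + bias."""
    return lambda x: weight * x + bias


f1 = linear(2.0, -1.0)   # f1(x) = 2x - 1
f2 = linear(-3.0, 0.5)   # f2(x) = -3x + 0.5


def stacked_linear(x):
    # f2(f1(x)) = -3(2x - 1) + 0.5 = -6x + 3.5, still a single linear function.
    return f2(f1(x))


def with_relu(x):
    # The ReLU in the middle zeroes out negative values of f1(x),
    # so the result is no longer one straight line.
    return f2(relu(f1(x)))


collapsed = linear(-6.0, 3.5)
```

For inputs where `f1(x)` is positive the two versions agree; they differ exactly where the ReLU clips, which is what lets stacks of such layers bend to fit more complicated shapes.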

The result that a combination of linear functions and nonlinearities can approximate arbitrary functions is known as the universal approximation theorem.

Accuracy is the percentage of predictions that are correct.

Other notes and Q&A

  • Tricks to deal with not enough memory

    • smaller batch size
    • lower image size
    • mixed precision training: train the model using half precision (16-bit) floating point numbers (Learner.to_fp16())
  • If we use a pre-trained model, we should normalize our data with the same stats the model was trained with, whatever the application, e.g. the ImageNet stats for a model pre-trained on ImageNet. This is because the model's weights were learned from inputs normalized with those specific stats.
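Using the same stats means subtracting the same per-channel mean and dividing by the same per-channel standard deviation that were used when the model was pre-trained. The sketch below uses the widely published ImageNet channel statistics; `normalize_pixel` is a hypothetical helper, and pixel values are assumed to be already scaled to the range 0 to 1.

```python
# Commonly used ImageNet per-channel statistics (R, G, B), for inputs in [0, 1].
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Normalize one RGB pixel channel by channel: (value - mean) / std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


# A pixel equal to the dataset mean maps to zero in every channel.
centred = normalize_pixel(IMAGENET_MEAN)

# A made-up mid-grey pixel.
grey = normalize_pixel((0.5, 0.5, 0.5))
```

Feeding the model inputs normalized with different stats shifts and rescales every channel relative to what the pre-trained weights expect.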
