In a previous post, I talked about how to use the h5py package to read the MAT-file that contains the bounding box information of the SVHN dataset. Once the bounding box data has been read successfully, we can start training a neural network for the SVHN recognition task. The bounding box data provided in the dataset gives the position, size and label of each digit in the image, which means that a 4-digit house number has 4 boxes in total, each bounding exactly 1 digit, as shown below:
Several days ago I was trying to train a neural network on the Street View House Numbers (SVHN) Dataset. I worked with the test set because it is relatively small, with only 13068 images. The bounding box information is recorded in digitStruct.mat, which can be loaded with Matlab. Each record in digitStruct has two fields: name, the name of the image file; and bbox, the bounding box information of that image.
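To make the record layout concrete, here is a minimal Python sketch of how one digitStruct record could be held in memory once it has been read. The class names `DigitBox` and `DigitStructRecord` and the `number()` helper are hypothetical, introduced only for illustration. The box field names (`left`, `top`, `width`, `height`, `label`) are assumed to mirror the fields stored under bbox, and the sketch assumes the digit 0 is stored as label 10. The sample values are made up, not taken from the dataset.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DigitBox:
    """One bounding box around a single digit (hypothetical container)."""
    left: float    # x coordinate of the box's left edge, in pixels
    top: float     # y coordinate of the box's top edge, in pixels
    width: float   # box width, in pixels
    height: float  # box height, in pixels
    label: int     # digit class; assumed convention: 10 stands for digit 0


@dataclass
class DigitStructRecord:
    """One record: the image file name plus one box per digit."""
    name: str
    bbox: List[DigitBox]

    def number(self) -> str:
        """Join the digit labels from left to right into the house number."""
        ordered = sorted(self.bbox, key=lambda box: box.left)
        # label % 10 maps the assumed label 10 back to the digit 0
        return "".join(str(box.label % 10) for box in ordered)


# Made-up values for illustration only.
record = DigitStructRecord(
    name="example.png",
    bbox=[
        DigitBox(left=43, top=7, width=19, height=30, label=2),
        DigitBox(left=62, top=9, width=20, height=30, label=10),
        DigitBox(left=83, top=8, width=18, height=30, label=5),
        DigitBox(left=101, top=8, width=19, height=30, label=7),
    ],
)

print(record.name, len(record.bbox), record.number())
```

A 4-digit house number therefore ends up as one record holding 4 boxes. Sorting by `left` is a simplification that assumes the digits read left to right in the image.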