SVHN Data Preparation

In the previous post, I talked about how to use the h5py package to read the MAT-file that contains the bounding box information of the SVHN dataset. Once the bounding box data has been read successfully, we can start preparing to train a neural network for the SVHN recognition task. The bounding box data provided in the dataset gives the position, size and label of each digit in the image. This means that for a 4-digit house number there are 4 boxes in total, each bounding just 1 digit, as shown below:

4 individual bounding boxes, one for each digit, in a house number image

What I would like to achieve is to train a model that recognizes the complete house number in one go, rather than recognizing the individual digits and then combining the results. The reason is that even if the model successfully recognizes 3 out of 4 digits of a 4-digit house number, the combined result is still wrong and useless. Therefore I need to create an enclosing bounding box for each image that contains all the individual boxes of that image, so that the big box bounds all the annotated digits.

The main process is to read the information of all the bounding boxes, and then calculate the position and size of the circumscribed bounding box:

  • (x, y) is the minimum of all the x values and the minimum of all the y values, where (x, y) denotes the top-left corner of a bounding box.
  • width is the maximum of the sums of x and width over all boxes, minus the x of the circumscribed box.
  • height is the maximum of the sums of y and height over all boxes, minus the y of the circumscribed box.
Position and size of the circumscribed bounding box for all digits.
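To make the arithmetic concrete, here is a minimal worked sketch on made-up numbers. The three boxes and the helper name enclosing_box are hypothetical and only illustrate the rule described above; they are not taken from the dataset.

```python
# Hypothetical boxes for a 3-digit house number, each given as
# (left, top, width, height). These values are invented for illustration.
boxes = [(10, 5, 8, 20), (19, 4, 9, 22), (29, 6, 8, 19)]


def enclosing_box(boxes):
    """Return (left, top, width, height) of the box enclosing all boxes."""
    left = min(b[0] for b in boxes)            # smallest x of any box
    top = min(b[1] for b in boxes)             # smallest y of any box
    right = max(b[0] + b[2] for b in boxes)    # largest right edge
    bottom = max(b[1] + b[3] for b in boxes)   # largest bottom edge
    # Width and height are measured from the enclosing box's own corner.
    return left, top, right - left, bottom - top


print(enclosing_box(boxes))  # (10, 4, 27, 22)
```

The right edges are 18, 28 and 37, so the width is 37 - 10 = 27; the bottom edges are 25, 26 and 25, so the height is 26 - 4 = 22.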

These steps translate into the following function. Note that it also changes the label of the digit zero from ‘10’ to ‘0’.

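from operator import add

def merge_bbox(f, idx=0):
    # get_img_boxes is defined elsewhere (not shown in this post);
    # it returns per-digit lists under 'left', 'top', 'width', 'height', 'label'.
    meta = get_img_boxes(f, idx)
    left = min(meta['left'])
    top = min(meta['top'])
    # Largest right/bottom edge of any digit box, measured from the merged origin.
    width = max(map(add, meta['left'], meta['width'])) - left
    height = max(map(add, meta['top'], meta['height'])) - top
    # Map the label 10 back to the digit 0.
    labels = [x if x != 10 else 0 for x in meta['label']]
    bbox = {'left': left, 'top': top, 'width': width, 'height': height, 'labels': labels}
    return bbox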
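As a quick sanity check on the result, one could verify that the merged box really contains every digit box, and read the labels back as a house number string. The sketch below is only an illustration: the helper names encloses and house_number, and the sample values, are hypothetical, and the dictionaries are assumed to have the same keys as those used in merge_bbox.

```python
def encloses(bbox, meta):
    """Return True if every digit box in meta lies inside bbox."""
    right = bbox['left'] + bbox['width']
    bottom = bbox['top'] + bbox['height']
    digit_boxes = zip(meta['left'], meta['top'], meta['width'], meta['height'])
    for left, top, width, height in digit_boxes:
        if left < bbox['left'] or top < bbox['top']:
            return False
        if left + width > right or top + height > bottom:
            return False
    return True


def house_number(bbox):
    """Join the per-digit labels into a string such as '207'."""
    # int() tolerates labels that were stored as floats.
    return ''.join(str(int(d)) for d in bbox['labels'])


# Invented sample data: per-digit metadata and a merged box for it.
meta = {'left': [10, 19, 29], 'top': [5, 4, 6],
        'width': [8, 9, 8], 'height': [20, 22, 19], 'label': [2, 10, 7]}
bbox = {'left': 10, 'top': 4, 'width': 27, 'height': 22, 'labels': [2, 0, 7]}

print(encloses(bbox, meta))
print(house_number(bbox))
```

A check like this is cheap to run over the whole dataset before training, since a box that fails it would crop away part of a digit.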

Here are some examples showing the individual bounding boxes and the circumscribed bounding box.

Figure showing individual bounding boxes and the corresponding circumscribed bounding box.
Individual bounding boxes for each digit are in blue, and the corresponding circumscribed bounding boxes are in red.

The source code for preparing the bounding box information can be found here.

Leo Mak
Make the world a better place, piece by piece.