Access SVHN data in Python using h5py

Several days ago I was trying to train a neural network on the Street View House Numbers (SVHN) Dataset. I worked with the test set because it is relatively small, with only 13068 images. The bounding box information is recorded in digitStruct.mat, which can be loaded with Matlab. Each record in digitStruct has two fields: name, the name of the image file, and bbox, the bounding box information for that image.

In general, to open a MAT-file in Python, we can use the loadmat() function from the scipy.io module, as below:

import os
from scipy.io import loadmat

# img_path is the folder that holds the extracted test set
digit_file = os.path.join(img_path, 'digitStruct.mat')
loadmat(digit_file)

However, the file cannot be loaded this way, and the following error is raised:

NotImplementedError: Please use HDF reader for matlab v7.3 files

Access MAT-file using h5py

The error tells us that the .mat file is in the v7.3 MAT format, which is based on HDF5. To read a v7.3 MAT-file in Python, we can use the h5py package. Here I shall only write down the most essential part needed to read the information stored in name and bbox. More details on h5py can be found on its documentation page.
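Before switching readers, the format can be confirmed by peeking at the first bytes of the file. The following is a minimal sketch, not part of the original workflow: the function name mat_file_flavour is made up for illustration, and it assumes the usual convention that MAT-files begin with a descriptive text such as "MATLAB 7.3 MAT-file" or "MATLAB 5.0 MAT-file".

```python
import os
import tempfile

def mat_file_flavour(path):
    """Guess the MAT-file flavour from the text at the start of the file."""
    with open(path, "rb") as fh:
        header = fh.read(128)
    if header.startswith(b"MATLAB 7.3 MAT-file"):
        return "v7.3"      # HDF5-based: open with h5py
    if header.startswith(b"MATLAB 5.0 MAT-file"):
        return "v5"        # older binary layout: scipy.io.loadmat can read it
    return "unknown"

# Demo on a fake 128-byte header (not a real MAT-file), just to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    fake = os.path.join(tmp, "fake.mat")
    with open(fake, "wb") as fh:
        fh.write(b"MATLAB 7.3 MAT-file".ljust(128))
    print(mat_file_flavour(fake))  # v7.3
```

On the real data this would be called as mat_file_flavour(digit_file).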

First, we need to open the .mat file:

import os
import h5py

digit_file = os.path.join(img_path, 'digitStruct.mat')
f = h5py.File(digit_file, 'r')

Let’s inspect what is stored in f:

>>> f.keys()
<KeysViewHDF5 ['#refs#', 'digitStruct']>

f['digitStruct'] is an h5py group, which behaves much like a Python dictionary. So we again look at what is stored in f['digitStruct']:

>>> f['digitStruct'].keys()
<KeysViewHDF5 ['bbox', 'name']>

Here is the real stuff. For ease of reference, we assign variables for the two fields in digitStruct, and check what is inside:

names = f['digitStruct/name']
bboxs = f['digitStruct/bbox']

>>> names
<HDF5 dataset "name": shape (13068, 1), type "|O">
>>> bboxs
<HDF5 dataset "bbox": shape (13068, 1), type "|O">

See the 13068? It means the information for each image is stored within these two datasets. Remember that each entry in both names and bboxs is an array of object references into the data file:

>>> names[0]
array([<HDF5 object reference>], dtype=object)
>>> names[0][0]
<HDF5 object reference>
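To experiment with object references without the full dataset, the sketch below builds a tiny in-memory HDF5 file that imitates this layout. Everything in it is an assumption for illustration (the file name demo.h5, the dataset name a, the stored numbers); it is not the real content of digitStruct.mat.

```python
import h5py
import numpy as np

# In-memory file: driver="core" with backing_store=False writes nothing to disk.
demo = h5py.File("demo.h5", "w", driver="core", backing_store=False)

# The actual data lives in its own dataset (hypothetical name 'a').
target = demo.create_dataset("#refs#/a", data=np.arange(3))

# A (1, 1) dataset whose single element is an object reference to the target.
pointers = demo.create_dataset("digitStruct/name", shape=(1, 1), dtype=h5py.ref_dtype)
pointers[0, 0] = target.ref

ref = pointers[0][0]      # same access pattern as names[0][0]
values = demo[ref][()]    # indexing the file with a reference follows it
print(values)             # [0 1 2]

demo.close()
```

The reference itself carries no data; only indexing the file object with it returns the dataset it points to.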

File name

We can access the actual data by passing this reference to the h5py file object as an index, and retrieve all the elements stored there with [()]:

>>> f[names[0][0]][()]
array([[ 49],
       [ 46],
       [112],
       [110],
       [103]], dtype=uint16)
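Each row of this array is the character code of one letter of the file name. A minimal sketch of the decoding step, using the values printed above as a plain nested list (so it runs without the data file):

```python
# The array printed above, written as a plain nested list.
codes = [[49], [46], [112], [110], [103]]

# Each row holds one character code; chr() turns it back into a character.
file_name = "".join(chr(row[0]) for row in codes)
print(file_name)  # 1.png
```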

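The function below then returns the file name of a specific image by index:

def get_img_name(f, idx=0):
    # Follow the object reference for image idx, then join the character codes into a string
    names = f['digitStruct/name']
    img_name = ''.join(map(chr, f[names[idx][0]][()].flatten()))
    return img_name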

Bounding boxes

Each item in bbox stores 5 elements: height, left, top, width, label. Bounding boxes are a little more complicated than file names, since an image can have more than one box. We can retrieve the bounding box information with this function:

bbox_prop = ['height', 'left', 'top', 'width', 'label']
def get_img_boxes(f, idx=0):
    """
    get the 'height', 'left', 'top', 'width', 'label' of bounding boxes of an image
    :param f: h5py.File
    :param idx: index of the image
    :return: dictionary
    """
    meta = { key : [] for key in bbox_prop}

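    # Look the dataset up through f so the function does not depend on a global variable
    box = f[f['digitStruct/bbox'][idx][0]]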
    for key in box.keys():
        if box[key].shape[0] == 1:
            meta[key].append(int(box[key][0][0]))
        else:
            for i in range(box[key].shape[0]):
                meta[key].append(int(f[box[key][i][0]][()].item()))

    return meta

A small tricky part here: if there is only one bounding box in the image, each field's value is stored directly in its Dataset. However, if there are several bounding boxes, the Dataset instead holds object references, one per box, each pointing to a separate Dataset that holds the value. Therefore we need an if ... else block to distinguish the two situations.
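Either way, the function returns parallel lists of equal length, one entry per box. If one record per box is more convenient, the lists can be regrouped. This is a sketch only: boxes_as_records is a hypothetical helper, and the sample values are copied from the 8644.png line of the test output in this post.

```python
def boxes_as_records(meta):
    """Turn {'height': [...], 'left': [...], ...} into one dict per bounding box."""
    keys = list(meta)
    return [dict(zip(keys, values)) for values in zip(*(meta[k] for k in keys))]

sample = {'height': [25, 25], 'left': [36, 54], 'top': [5, 6],
          'width': [18, 16], 'label': [5, 5]}
records = boxes_as_records(sample)
for record in records:
    print(record)
# {'height': 25, 'left': 36, 'top': 5, 'width': 18, 'label': 5}
# {'height': 25, 'left': 54, 'top': 6, 'width': 16, 'label': 5}
```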

And finally, here is a small test to check whether those functions work properly:

>>> import random
>>> n_images = f['digitStruct/name'].shape[0]
>>> for _ in range(5):
...     idx = random.randint(0, n_images - 1)
...     print(get_img_name(f, idx), get_img_boxes(f, idx))
...
8644.png {'height': [25, 25], 'left': [36, 54], 'top': [5, 6], 'width': [18, 16], 'label': [5, 5]}
9333.png {'height': [19, 19], 'left': [54, 64], 'top': [18, 16], 'width': [9, 8], 'label': [3, 7]}
11019.png {'height': [38, 38], 'left': [27, 39], 'top': [6, 8], 'width': [12, 22], 'label': [1, 5]}
3195.png {'height': [34, 34], 'left': [162, 182], 'top': [100, 94], 'width': [22, 16], 'label': [3, 8]}
4459.png {'height': [17, 17, 17], 'left': [78, 84, 92], 'top': [21, 23, 17], 'width': [8, 10, 11], 'label': [2, 10, 10]}
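Note the label 10 in the last line: in SVHN, label 10 stands for the digit 0. The sketch below shows two small post-processing steps on such a dictionary, mapping labels to digits and computing one box that encloses all digits. Both helper names are hypothetical, and treating left + width and top + height as the right and bottom edges is an assumed convention.

```python
def digits_from_labels(labels):
    """Map SVHN labels to digits: label 10 stands for the digit 0."""
    return [0 if label == 10 else label for label in labels]

def enclosing_box(meta):
    """Smallest (left, top, right, bottom) box covering every digit box."""
    left = min(meta['left'])
    top = min(meta['top'])
    right = max(l + w for l, w in zip(meta['left'], meta['width']))
    bottom = max(t + h for t, h in zip(meta['top'], meta['height']))
    return left, top, right, bottom

# Values copied from the 4459.png line above.
meta = {'height': [17, 17, 17], 'left': [78, 84, 92], 'top': [21, 23, 17],
        'width': [8, 10, 11], 'label': [2, 10, 10]}
print(digits_from_labels(meta['label']))  # [2, 0, 0]
print(enclosing_box(meta))                # (78, 17, 103, 40)
```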

Leo Mak