In my experiment, I train a multilayer CNN for street view house number recognition and check its accuracy on the test data. The code is written in Python using TensorFlow, a powerful library for implementing and training deep neural networks. The central unit of data in TensorFlow is the tensor: a set of primitive values shaped into an array of any number of dimensions, where a tensor's rank is its number of dimensions. Along with TensorFlow, I use some other libraries such as NumPy, Matplotlib, and SciPy.
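For instance, rank and shape can be inspected directly; in this sketch NumPy arrays stand in for TensorFlow tensors so the example is self-contained:

```python
import numpy as np

scalar = np.array(3.0)               # rank 0: a single value, no dimensions
vector = np.array([1.0, 2.0, 3.0])   # rank 1: shape (3,)
matrix = np.array([[1, 2], [3, 4]])  # rank 2: shape (2, 2)
batch = np.zeros((8, 32, 32, 3))     # rank 4: e.g. a batch of 8 RGB images

for t in (scalar, vector, matrix, batch):
    print(t.ndim, t.shape)  # ndim is the rank, shape lists the dimensions
```

The same `ndim`/`shape` distinction carries over to TensorFlow tensors.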

I perform my analysis using only the train and test datasets due to limited technical resources, and omit the extra dataset, which is almost 2.7 GB. To simplify the analysis, I delete all data points that have more than 5 digits. By preprocessing the data from the original SVHN dataset, I create a pickle file that is used in my experiment. For the implementation, I randomly shuffle the validation dataset, load the pickle file, and train a 7-layer convolutional neural network. Finally, I use the test data to check the accuracy of the trained model at detecting numbers from street house number images.
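The preprocessing step can be sketched roughly as follows; `build_pickle`, the `(image, digit_list)` sample format, and the file name are hypothetical stand-ins for illustration, not the actual code of the experiment:

```python
import pickle
import random

def build_pickle(samples, out_path, max_digits=5, seed=0):
    """Drop samples whose label has more than `max_digits` digits,
    shuffle the rest, and dump them to a pickle file.
    `samples` is assumed to be a list of (image, digit_list) pairs."""
    kept = [(img, digits) for img, digits in samples
            if len(digits) <= max_digits]
    random.Random(seed).shuffle(kept)
    with open(out_path, "wb") as f:
        pickle.dump(kept, f)
    return len(kept)

# Toy usage: None stands in for image arrays, labels are digit lists.
samples = [(None, [1, 9]), (None, [1, 2, 3, 4, 5, 6]), (None, [7])]
n = build_pickle(samples, "svhn_subset.pickle")
print(n)  # the 6-digit sample is dropped
```

At training time the pickle file is simply loaded back with `pickle.load`, avoiding a repeat of the preprocessing pass.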

At the very beginning of the experiment, the first convolution layer has 16 feature maps with 5×5 filters and produces a 28×28×16 output. A ReLU layer is added after each convolution layer to add more non-linearity to the decision-making process. After the first sub-sampling, the output size decreases to 14×14×16. The second convolution layer has 32 feature maps with 5×5 filters and produces a 10×10×32 output. Applying sub-sampling a second time gives an output size of 5×5×32. Finally, the third convolution layer has 2048 feature maps with the same filter size. It is worth mentioning that the stride size is 1 in my experiment, with zero padding. During the experiment, I use the dropout technique to reduce overfitting. Finally, a softmax regression layer is used to produce the final output.
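These sizes follow from standard convolution arithmetic. A small sketch checks them, assuming the 32×32 SVHN crops implied by the 28×28 output, "zero padding" meaning a padding of 0, and each sub-sampling being a 2×2 pooling that halves the spatial size:

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution: (size - kernel + 2*pad) // stride + 1."""
    return (size - kernel + 2 * pad) // stride + 1

def pool_out(size):
    """2x2 sub-sampling halves the spatial size."""
    return size // 2

s = 32              # assumed 32x32 SVHN input crop
s = conv_out(s, 5)  # conv1, 5x5 filters -> 28
s = pool_out(s)     # first sub-sampling -> 14
s = conv_out(s, 5)  # conv2, 5x5 filters -> 10
s = pool_out(s)     # second sub-sampling -> 5
print(s)  # 5, matching the 5x5x32 output above
```

The depth at each stage is just the number of feature maps in that layer, so only the spatial sizes need checking.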

Weights are initialized randomly using Xavier initialization, which keeps the weights in the right range; it automatically scales the initialization based on the number of input and output neurons. After building the model, I start training the network and log the accuracy, loss, and validation accuracy every 500 steps. Once the process is done, I get the test set accuracy. To minimize the loss, the AdaGrad optimizer is used. After reaching a suitable accuracy level, I stop training the network and save the learned parameters in a checkpoint file. When we need to perform detection, the program loads the checkpoint file without training the model again.
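The Xavier (Glorot) rule can be sketched in NumPy as follows; the uniform variant is shown, and the layer shape used here is illustrative rather than taken from the experiment:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Glorot/Xavier uniform initialization: draw from U(-limit, limit)
    with limit = sqrt(6 / (fan_in + fan_out)), which gives the weights
    a variance of 2 / (fan_in + fan_out)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_uniform(800, 64)  # illustrative layer: 800 inputs, 64 outputs
print(W.shape, float(np.abs(W).max()))
```

Because the spread shrinks as the fan-in and fan-out grow, activations and gradients stay in a workable range regardless of layer width, which is the "right range" property mentioned above.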

Initially, the model produced an accuracy of 89% with just 3000 steps. This is a great starting point, and certainly, with a few more rounds of training, the accuracy would reach my benchmark of 90%. However, I added some additional features to increase the accuracy. First, I added a dropout layer between the third convolution layer and the fully connected layer. This allows the network to become more robust and prevents overfitting. Secondly, I introduced exponential decay for the learning rate instead of keeping it constant. This helps the network take bigger steps at first, so that it learns fast, but over time, as we move closer to the global minimum, it takes smaller steps.
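The decay schedule can be sketched as a few lines of Python, mirroring the formula behind TensorFlow's `tf.train.exponential_decay`; the base rate 0.05, decay rate 0.96, and decay interval of 1000 steps here are illustrative values, not the ones used in the experiment:

```python
def decayed_lr(base_lr, step, decay_steps, decay_rate):
    """Exponential learning-rate decay:
    lr = base_lr * decay_rate ** (step / decay_steps)."""
    return base_lr * decay_rate ** (step / decay_steps)

for step in (0, 500, 1000, 3000):
    print(step, round(decayed_lr(0.05, step, 1000, 0.96), 5))
```

At step 0 the schedule returns the base rate unchanged, and every `decay_steps` further steps multiply it by another factor of `decay_rate`, giving exactly the "big steps first, small steps later" behavior described above.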

With these changes, the model is now able to produce an accuracy of 91.9% on the test set. Since the training and test sets are large, there is a chance of further improvement if the model is trained for a longer time.