experiment, I train a multilayer CNN for street view house number recognition
and check its accuracy on the test data. The code is written in Python using
TensorFlow, a powerful library for implementing and training deep neural
networks. The central unit of data in TensorFlow is the tensor. A tensor
consists of a set of primitive values shaped into an array of any number of
dimensions. A tensor’s rank is its number of dimensions. Along with
TensorFlow, I used functions from other libraries such as NumPy, Matplotlib,
and SciPy.
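To make the idea of tensor rank concrete, here is a minimal sketch that uses nested Python lists as a stand-in for tensors. The helper names `rank` and `shape` are hypothetical and not part of TensorFlow; they assume non-empty, regularly nested lists.

```python
# Nested lists as a stand-in for tensors (assumption: non-empty, regular nesting).

def rank(value):
    """Return the number of dimensions: 0 for a scalar, 1 for a vector, ..."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]
    return depth


def shape(value):
    """Return the size of each dimension, outermost first."""
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        value = value[0]
    return tuple(dims)


scalar = 3.0                                  # rank 0
vector = [1.0, 2.0, 3.0]                      # rank 1
matrix = [[1.0, 2.0], [3.0, 4.0]]             # rank 2
# A batch of 2 RGB images of 32x32 pixels: [batch, height, width, channels].
image_batch = [[[[0.0] * 3 for _ in range(32)] for _ in range(32)]
               for _ in range(2)]             # rank 4

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image_batch", image_batch)]:
    print(name, "rank:", rank(t), "shape:", shape(t))
```

A batch of colour images is the typical rank-4 input to a convolution layer, which is why the last example uses that layout.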

I perform my analysis using only the train and test datasets due to limited
technical resources, and omit the extra dataset, which is almost 2.7 GB. To
simplify the analysis, I delete all data points that have more than 5 digits.
Preprocessing the original SVHN dataset produces a pickle file, which is used
in my experiment. For the implementation, I randomly shuffle the validation
dataset, then use the pickle file to train a 7-layer convolutional neural
network. Finally, I use the test data to check the accuracy of the trained
model at detecting house numbers in street view images.
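The preprocessing steps described above (dropping samples with more than 5 digits, shuffling, and pickling) can be sketched as follows. This is an illustration only: the record layout, the function names, and the toy data are assumptions, not the actual preprocessing code used for the SVHN files.

```python
import pickle
import random

MAX_DIGITS = 5  # limit stated in the text


def filter_by_length(samples, max_digits=MAX_DIGITS):
    """Keep only samples whose label has at most `max_digits` digits."""
    return [s for s in samples if len(s["label"]) <= max_digits]


def shuffled(samples, seed=0):
    """Return a shuffled copy; a fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    out = list(samples)
    rng.shuffle(out)
    return out


# Toy stand-in records (hypothetical layout): image placeholder + digit list.
raw = [
    {"image": "img_a", "label": [1, 9]},
    {"image": "img_b", "label": [2, 4, 6, 8, 0, 1]},  # 6 digits: dropped
    {"image": "img_c", "label": [7]},
    {"image": "img_d", "label": [3, 3, 5, 0, 2]},     # exactly 5: kept
]

kept = shuffled(filter_by_length(raw))

# Serialise in memory; pickle.dump(obj, file) would write to disk instead.
blob = pickle.dumps({"train": kept})
restored = pickle.loads(blob)
print(len(raw), "->", len(restored["train"]), "samples after filtering")
```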

At the very beginning of the experiment, the first convolution layer has 16
feature maps with 5×5 filters and produces a 28x28x16 output. A ReLU layer is
also added after each convolution layer to add more non-linearity to the
decision-making process. After the first sub-sampling, the output size
decreases to 14x14x16. The
second convolution layer has 32 feature maps with 5×5 filters and produces a
10x10x32 output. Applying sub-sampling a second time gives an output size of
5x5x32.
Finally, the third convolution layer has 2048 feature maps with the same
filter size. Note that the stride is 1 throughout my experiment, with zero
padding.
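The spatial sizes quoted above (28, 14, 10, 5) follow from the standard output-size formula for convolution and pooling. The sketch below works through that arithmetic. It assumes a 32x32 input, no padding added around the input, and 2x2 sub-sampling with stride 2; none of these are stated explicitly in the text, but they are consistent with the reported sizes.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution: (W - F + 2P) // S + 1."""
    return (size - kernel + 2 * padding) // stride + 1


def pool_out(size, window=2, stride=2):
    """Spatial output size of a sub-sampling (pooling) layer."""
    return (size - window) // stride + 1


size = 32                # assumed input width/height
c1 = conv_out(size, 5)   # first 5x5 convolution
s1 = pool_out(c1)        # first sub-sampling
c2 = conv_out(s1, 5)     # second 5x5 convolution
s2 = pool_out(c2)        # second sub-sampling
c3 = conv_out(s2, 5)     # third 5x5 convolution

print(size, "->", c1, "->", s1, "->", c2, "->", s2, "->", c3)
# 32 -> 28 -> 14 -> 10 -> 5 -> 1
```

Under these assumptions the third convolution reduces each feature map to a single value, so its output can be flattened directly for the layers that follow. The depth of each output equals the number of feature maps in that layer and is not affected by sub-sampling.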
During my experiment, I use the dropout technique to reduce overfitting.
Finally, a softmax regression layer is used to get the final output.
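As a rough illustration of these two components, the sketch below implements inverted dropout and a numerically stable softmax in plain Python. The function names, the keep probability of 0.5, and the sample values are assumptions made for the example; the text does not state which values were used.

```python
import math
import random


def dropout(values, keep_prob, rng):
    """Inverted dropout: drop each value with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected sum is unchanged."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
activations = [0.5, 1.2, -0.3, 2.0, 0.8, 1.5]
dropped = dropout(activations, keep_prob=0.5, rng=rng)  # training time only
probs = softmax([2.0, 1.0, 0.1])

print("after dropout:", dropped)
print("softmax sums to:", sum(probs))
```

Dropout is applied only during training; at evaluation time the keep probability is set to 1 so that every activation passes through unchanged.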

The weights are initialized randomly using Xavier initialization, which keeps
the weights in the right range. It automatically scales the initialization
based on the number of output and input neurons. After building the model, I
start training the network and log the accuracy, loss, and validation accuracy
every 500 steps. Once the process is done, I get the test set accuracy. To
minimize the loss, the Adagrad optimizer is used.
After reaching a suitable accuracy level, I stop training the network and save
the learned parameters in a checkpoint file. When we need to perform
detection, the program loads the checkpoint file without training the model
again.
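The save-then-reload pattern can be illustrated with a small stand-in. TensorFlow has its own checkpoint mechanism; the sketch below deliberately uses `pickle` instead so that it runs without TensorFlow, and the function names, parameter names, and values are hypothetical.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, params, step):
    """Write the learned parameters and the training step to disk."""
    with open(path, "wb") as f:
        pickle.dump({"step": step, "params": params}, f)


def load_checkpoint(path):
    """Read a checkpoint back so the model can be used without retraining."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical learned parameters; a real model would hold weight tensors.
trained = {"conv1/weights": [0.1, -0.2], "fc/bias": [0.0]}

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "model.ckpt")
    save_checkpoint(ckpt_path, trained, step=3000)
    restored = load_checkpoint(ckpt_path)

print("restored step:", restored["step"])
```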

The model produced an accuracy of 89% with just 3000 steps. It’s a great
starting point, and after a few more rounds of training the accuracy should
reach my benchmark of 90%. However, I added some additional features to
increase accuracy. First, I added a dropout layer between the third
convolution layer and the fully connected layer. This makes the network more
robust and helps prevent overfitting. Second, I introduced exponential decay
of the learning rate instead of keeping it constant. This helps the network
take bigger steps at first so that it learns fast, but over time, as we move
closer to the global minimum, it takes smaller, noisier steps.
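The exponential decay schedule described above follows the formula lr = initial_lr * decay_rate ^ (step / decay_steps). The sketch below evaluates it in plain Python; the initial rate of 0.05, the decay rate of 0.95, and the decay interval of 1000 steps are hypothetical values, since the text does not report the ones used.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate,
                      staircase=False):
    """Learning rate after `step` training steps.

    With staircase=True the rate drops in discrete jumps every
    `decay_steps` steps instead of decaying continuously."""
    exponent = step // decay_steps if staircase else step / decay_steps
    return initial_lr * decay_rate ** exponent


steps = (0, 1000, 2000, 3000)
schedule = [exponential_decay(0.05, s, decay_steps=1000, decay_rate=0.95)
            for s in steps]

for s, lr in zip(steps, schedule):
    print("step", s, "learning rate", round(lr, 6))
```

The rate starts at its full value and shrinks by a constant factor per interval, which gives the large early steps and small late steps described in the text.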
With these changes, the model is now able to produce an accuracy of 91.9% on
the test set. Since the training and test sets are large, there is a chance of
further improvement if the model is trained for longer.