CNN for Digit Classification on MNIST

Create a CNN for the MNIST dataset.

Import some useful libraries.

# imports
%matplotlib inline
# %pylab osx
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
# Some additional libraries which we'll use just
# to produce some visualizations of our training

import IPython.display as ipyd
plt.style.use('ggplot')

Import the dataset

# Import MNIST data (this tutorial module ships with TensorFlow 1.x only)
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/temp/MNIST/", one_hot=True)

View a sample training image.

plt.imshow(mnist.train.images[0].reshape((28, 28)), cmap='gray')


Calculate the mean and standard deviation images.

# Take the mean across all images
mean_img = np.mean(mnist.train.images, axis=0)
print(mean_img.shape)
# Then plot the mean image.
plt.figure()
plt.imshow(mean_img.reshape((28, 28)), cmap='gray')


# Take the std across all images
std_img = np.std(mnist.train.images, axis=0)

# Then plot the std image.
plt.figure()
plt.imshow(std_img.reshape((28, 28)),cmap='gray')
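Both statistics are taken along axis 0, the image axis, so the result has one value per pixel position. A minimal sketch on a made-up array (toy_images is a hypothetical stand-in for mnist.train.images, not real MNIST data) shows what that axis choice does, and how subtracting the mean image centers the data:

```python
import numpy as np

# Three hypothetical flattened "images" with 4 pixels each (not real MNIST data).
toy_images = np.array([[0.0, 1.0, 0.0, 0.5],
                       [0.0, 1.0, 1.0, 0.5],
                       [0.0, 1.0, 0.5, 0.5]])

# axis=0 collapses the image axis, leaving one value per pixel position.
toy_mean = np.mean(toy_images, axis=0)
toy_std = np.std(toy_images, axis=0)

# Centering subtracts the mean image from every image via broadcasting.
toy_centered = toy_images - toy_mean

print(toy_mean.shape)  # one entry per pixel
print(toy_mean)
print(toy_std)
```

Pixels that never change across images get a standard deviation of zero, which is why the border of the MNIST std image is dark.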


print(mnist.train.images.shape)
print(mnist.validation.images.shape)
print(mnist.test.images.shape)

Training (mnist.train) » The inputs and their known outputs used to train the network. For example, if you feed in an image that you know represents a “nine”, this set tells the neural network that we expect a “nine” as the output.
- 55,000 data points
- mnist.train.images for inputs
- mnist.train.labels for outputs

Validation (mnist.validation) » Structured like the training set, but the data is used to measure model properties (classification error, for example) and, from these, to tune parameters such as the number of hidden units or to decide when to stop back-propagation.
- 5,000 data points
- mnist.validation.images for inputs
- mnist.validation.labels for outputs

Test (mnist.test) » The model has no access to this data before the test phase. It is used to evaluate the performance and accuracy of the model against “real life situations”. No further optimization happens beyond this point.
- 10,000 data points
- mnist.test.images for inputs
- mnist.test.labels for outputs
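Because the data was loaded with one_hot=True, every label in these three sets is a length-10 vector rather than a single digit. As a rough illustration (the helper name one_hot is hypothetical and not part of the MNIST loader), a digit maps to its label vector like this:

```python
def one_hot(label, n_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < n_classes:
        raise ValueError("label out of range")
    vec = [0.0] * n_classes
    vec[label] = 1.0
    return vec

nine = one_hot(9)
print(nine)
# Recover the digit by taking the index of the largest entry.
print(nine.index(max(nine)))
```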

Defining the Network

I’ll first reset the current graph so we can build a new one. We’ll use TensorFlow’s helper function for doing this.

from tensorflow.python.framework.ops import reset_default_graph
reset_default_graph()

And just to confirm, let’s see what’s in our graph (it should be empty right after the reset):

# We first get the graph that we used to compute the network
g = tf.get_default_graph()

# And can inspect everything inside of it
[op.name for op in g.get_operations()]

Creating placeholders and setting parameters

It’s a best practice to create placeholders before variable assignments when using TensorFlow. Here we’ll create placeholders for inputs (“Xs”) and outputs (“Ys”).

Placeholder ‘X’ (x in the code): represents the “space” allocated for the input, i.e. the images.
* Each input has 784 pixels, arranged as a 28 (width) x 28 (height) matrix.
* The ‘shape’ argument defines the tensor size by its dimensions.
* 1st dimension = None. Indicates that the batch can be of any size.
* 2nd dimension = 784. Indicates the number of pixels in a single flattened MNIST image.

Placeholder ‘Y’ (y in the code): represents the final output, i.e. the labels.
* 10 possible classes (0,1,2,3,4,5,6,7,8,9)
* The ‘shape’ argument defines the tensor size by its dimensions.
* 1st dimension = None. Indicates that the batch can be of any size.
* 2nd dimension = 10. Indicates the number of targets/outcomes.

dtype for both placeholders: if you are not sure, use tf.float32. The limitation here is that the softmax function presented later only accepts floating-point dtypes such as float32 or float64. For more dtypes, check TensorFlow’s documentation.

# Parameters
learning_rate = 0.001
training_iters = 200000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)

Note: Since x is currently [batch, height*width], we need to reshape it to a 4-D tensor to use it in a convolutional graph. To perform convolution, we have to use 4-dimensional tensors describing the number of images, height, width, and channels:

N x H x W x C

We’ll reshape our input placeholder by passing these new dimensions as the shape parameter, using -1 for the first dimension so that it is inferred from the input (the batch size stays whatever it was). We do this in conv_net.
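The effect of -1 can be checked without TensorFlow, since NumPy’s reshape treats it the same way. This is only a sketch: the batch of 5 zero-filled images is an assumption for illustration.

```python
import numpy as np

# 5 hypothetical flattened images, shaped like a mini-batch fed to x.
batch = np.zeros((5, 784), dtype=np.float32)

# -1 asks the library to infer that dimension from the total element count.
images_nhwc = batch.reshape(-1, 28, 28, 1)
print(images_nhwc.shape)
```

The inferred first dimension is the batch size, so the same reshape works for any number of images.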

Define some useful functions

# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')

Structure of the CNN

Convolutional Layer 1

Defining kernel weight and bias

Size of the filter/kernel: 5x5;
Input channels: 1 (greyscale);
32 feature maps (here, 32 feature maps means that 32 different filters are applied to each image, so the output of the convolution layer is 28x28x32). In this step, we create a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels].

Convolve with weight tensor and add biases.

Defining a function to create convolutional layers. To create a convolutional layer, we use tf.nn.conv2d (wrapped by the conv2d helper above). It computes a 2-D convolution given 4-D input and filter tensors.

Inputs:

tensor of shape [batch, in_height, in_width, in_channels]. x of shape [batch_size,28 ,28, 1]
a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]. W is of size [5, 5, 1, 32]
stride which is [1, 1, 1, 1]

Process:

Flattens the filter to a 2-D matrix with shape [5*5*1, 32].
Extracts image patches from the input tensor to form a virtual tensor of shape [batch, 28, 28, 5*5*1].
For each patch, right-multiplies the image patch vector by the filter matrix.

Output:

A Tensor (a 2-D convolution) named ‘add_7:0’ with shape (?, 28, 28, 32), where ? stands for the batch size.
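The three process steps can be sketched in NumPy. This is an illustration under stated assumptions, not TensorFlow’s actual implementation: the function name conv2d_same is hypothetical, the stride is fixed to 1, and the filter size is assumed to be odd so that zero padding keeps the 28x28 size.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1, zero-padded ('SAME') 2-D convolution via patch extraction.

    x: [batch, height, width, in_channels]
    w: [filter_height, filter_width, in_channels, out_channels]
    """
    n, h, width, c_in = x.shape
    fh, fw, _, c_out = w.shape
    # Zero-pad so the output keeps the input's height and width (odd filters).
    ph, pw = fh // 2, fw // 2
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw), (0, 0)), mode="constant")

    # Step 1: flatten the filter to [fh * fw * c_in, c_out].
    w_mat = w.reshape(fh * fw * c_in, c_out)

    # Step 2: gather one flattened patch per output position.
    patches = np.empty((n, h, width, fh * fw * c_in), dtype=x.dtype)
    for i in range(h):
        for j in range(width):
            patch = padded[:, i:i + fh, j:j + fw, :]
            patches[:, i, j, :] = patch.reshape(n, -1)

    # Step 3: multiply every patch vector by the filter matrix.
    return patches @ w_mat

rng = np.random.default_rng(0)
x_demo = rng.standard_normal((2, 28, 28, 1)).astype(np.float32)
w_demo = rng.standard_normal((5, 5, 1, 32)).astype(np.float32)
out_demo = conv2d_same(x_demo, w_demo)
print(out_demo.shape)
```

With a batch of 2 random images and a [5, 5, 1, 32] filter, the output has one 28x28 map per filter, matching the shape quoted in the text.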

Apply the ReLU activation Function

In this step, we go through all the outputs of the convolution layer (inside conv2d, the result of tf.nn.bias_add), and wherever a negative number occurs, we swap it out for a 0. This is called the ReLU activation function.
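As a small sketch of that rule (the activations array holds made-up values, not real layer outputs), ReLU is an element-wise maximum with zero:

```python
import numpy as np

# Hypothetical pre-activation values from a convolution layer.
activations = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# Negative entries become 0; non-negative entries pass through unchanged.
relu_out = np.maximum(activations, 0.0)
print(relu_out)
```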

Apply the max pooling

Defining a function to perform max pooling. Max pooling is an operation that keeps the maximum value in each window, simplifying the inputs by exploiting the spatial correlations between them.

Kernel size: 2x2 (each 2x2 window results in one output pixel)
Strides: dictates the sliding behaviour of the kernel. In this case it moves 2 pixels every time, so the windows do not overlap.
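A minimal NumPy sketch of non-overlapping 2x2 pooling follows. The helper name max_pool_2x2 is hypothetical, and it assumes an even height and width, which holds for the 28x28 and 14x14 maps used here.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on an NHWC array with even height/width."""
    n, h, w, c = x.shape
    # Split each spatial axis into (blocks, 2) and take the max inside each block.
    blocks = x.reshape(n, h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(2, 4))

# A 4x4 single-channel feature map holding the values 0..15 row by row.
feature_map = np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1)
pooled = max_pool_2x2(feature_map)
print(pooled[0, :, :, 0])
```

Each output pixel is the largest value of one 2x2 window, so the 4x4 map shrinks to 2x2.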

Convolutional Layer 2

Weights and Biases of kernels

Filter/kernel: 5x5 (25 pixels) ; Input channels: 32 (from the 1st Conv layer, we had 32 feature maps); 64 output feature maps
Notice: here, the input is 14x14x32, the filter is 5x5x32 and the output of the convolutional layer would be 14x14x64
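The sizes quoted for both layers follow from the rule that, with 'SAME' padding, the spatial output size is ceil(input size / stride). A short sketch of that arithmetic (the helper name same_out is hypothetical):

```python
import math

def same_out(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

size = 28
shapes = []
for out_channels in (32, 64):
    size = same_out(size, 1)  # 5x5 convolution, stride 1: size unchanged
    conv_shape = (size, size, out_channels)
    size = same_out(size, 2)  # 2x2 max pooling, stride 2: size halved
    pool_shape = (size, size, out_channels)
    shapes.append((conv_shape, pool_shape))

for conv_shape, pool_shape in shapes:
    print("conv:", conv_shape, "-> pool:", pool_shape)
```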

Apply the same process as in Convolutional Layer 1.

So, what is the output of the second layer, layer2?

It is 64 matrices of size [7x7].

Type: Fully Connected Layer.

You need a fully connected layer to use the Softmax and create the probabilities in the end. Fully connected layers take the high-level filtered images from the previous layer, that is, all 64 matrices, and convert them to an array.

So, each [7x7] matrix is converted to a [49x1] matrix, and then all 64 of them are concatenated, which makes an array of size [3136x1]. We connect it to another layer of size [1024x1], so the weight matrix between these 2 layers is [3136x1024].
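To make the sizes concrete, here is a sketch with zero-filled stand-ins. The names pooled2, w_fc and b_fc are hypothetical, and the batch of 3 is an assumption.

```python
import numpy as np

# Hypothetical output of the second pooling step for a batch of 3 images.
pooled2 = np.zeros((3, 7, 7, 64), dtype=np.float32)

# Flatten each image's 64 maps of 7x7 into one row of 3136 values.
flat = pooled2.reshape(-1, 7 * 7 * 64)

# Stand-ins for the fully connected weights and biases.
w_fc = np.zeros((7 * 7 * 64, 1024), dtype=np.float32)
b_fc = np.zeros(1024, dtype=np.float32)

# Matrix multiply, add bias, apply ReLU.
hidden = np.maximum(flat @ w_fc + b_fc, 0.0)
print(flat.shape, w_fc.shape, hidden.shape)
print(w_fc.size + b_fc.size)  # number of trainable values in this layer
```

Most of the network’s parameters sit in this one layer, a little over 3.2 million values.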

Weights and Biases between layer 2 and 3

Inputs: the size of a feature map from the last layer (7x7) multiplied by the number of feature maps (64), i.e. 3136; 1024 outputs, which are passed on to the Softmax layer

Optional phase for reducing overfitting - Dropout 3

This is a phase where the network “forgets” some features. At each training step on a mini-batch, some units are switched off at random so that they do not interact with the rest of the network. That is, their weights are not updated in that step, nor do they affect the learning of the other network nodes. This can be very useful for preventing overfitting in very large neural networks.
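A rough NumPy sketch of the idea follows, using the keep probability of 0.75 from the parameters. The function name inverted_dropout is hypothetical. It rescales the surviving units by 1/keep_prob, as tf.nn.dropout does, so the expected activation stays the same.

```python
import numpy as np

def inverted_dropout(x, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the rest."""
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

rng = np.random.default_rng(0)
units = np.ones((200, 1024))  # hypothetical fully connected activations
dropped = inverted_dropout(units, keep_prob=0.75, rng=rng)

print((dropped != 0).mean())  # fraction of units kept, close to 0.75
print(dropped.mean())         # close to 1.0 because kept units are scaled up
```

At evaluation time the keep probability is set to 1, so no unit is dropped and nothing is rescaled.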

Layer 4- Readout Layer (Softmax Layer)

Type: Softmax, Fully Connected Layer.

Weights and Biases

# Store layers weight & bias
weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Create model
def conv_net(x, weights, biases, dropout):
    
    x = tf.reshape(x, [-1, 28, 28, 1])

    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output, class prediction
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

Summary of my CNN

Now it is time to recall the structure of our network

0) Input - MNIST dataset

1) Convolutional and Max-Pooling

2) Convolutional and Max-Pooling

3) Fully Connected Layer

4) Processing - Dropout

5) Readout layer - Fully Connected

6) Outputs - Classified digits

Define functions and train the model

Define the loss function

We need to compare our output, the pred tensor returned by conv_net, with the ground truth for every item in the mini-batch. We can use cross-entropy to see how badly our CNN is doing, that is, to measure the error at the softmax layer.

As a toy example, consider a mini-batch of size 2 whose items have already been classified: the cross-entropy is small when the predicted probabilities concentrate on the correct classes and grows as they drift away from them.
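A minimal sketch of such a toy case is given here, assuming 3 classes instead of 10 to keep it short. The probability values and the helper names cross_entropy and batch_loss are made up for illustration.

```python
import math

def cross_entropy(probs, one_hot_label):
    """Cross-entropy between one predicted distribution and a one-hot label."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot_label) if t > 0)

# Ground truth for a mini-batch of size 2: class 1, then class 2.
labels = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two hypothetical sets of softmax outputs for the same mini-batch.
confident = [[0.05, 0.90, 0.05], [0.10, 0.10, 0.80]]
uncertain = [[0.40, 0.30, 0.30], [0.35, 0.35, 0.30]]

def batch_loss(batch_probs):
    """Mean cross-entropy over the mini-batch."""
    losses = [cross_entropy(p, t) for p, t in zip(batch_probs, labels)]
    return sum(losses) / len(losses)

print(batch_loss(confident))
print(batch_loss(uncertain))
```

The confident predictions give a much lower loss than the uncertain ones, which is the behaviour the optimizer exploits.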

Define the optimizer

We want to minimize the error of our network, which is measured by the cross-entropy. To do so, we have to compute the gradients of the loss with respect to the variables and apply those gradients to the variables. This is done by an optimizer such as GradientDescent or Adagrad; the code below uses Adam (tf.train.AdamOptimizer).
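The “compute gradients, then apply them” loop can be sketched with plain gradient descent on a one-variable toy loss. This is not the Adam update used in this notebook; the loss function and step_size are assumptions chosen only to show the mechanics.

```python
def loss(w):
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)

w = 0.0
step_size = 0.1
history = [loss(w)]
for _ in range(50):
    w -= step_size * grad(w)  # apply the gradient to the variable
    history.append(loss(w))

print(w, history[-1])
```

Each step moves the variable against the gradient, so the loss shrinks toward its minimum.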

Define prediction

Do you want to know how many of the cases in a mini-batch have been classified correctly? Let’s count them.

Define accuracy

It makes more sense to report accuracy as the average of correct cases.
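A NumPy sketch of both steps on a made-up mini-batch of 4 items and 3 classes (the logits and labels arrays are hypothetical): compare the arg-max of the prediction with the arg-max of the one-hot label, then average the matches.

```python
import numpy as np

# Hypothetical network outputs (logits) for 4 items and 3 classes.
logits = np.array([[2.0, 0.1, -1.0],
                   [0.2, 0.3, 3.0],
                   [1.5, 1.0, 0.0],
                   [0.0, 2.5, 0.1]])
# One-hot ground truth for the same 4 items.
labels = np.array([[1, 0, 0],
                   [0, 0, 1],
                   [0, 1, 0],
                   [0, 1, 0]])

# Prediction: an item is correct when both arg-max indices agree.
correct = np.argmax(logits, axis=1) == np.argmax(labels, axis=1)

# Accuracy: cast the booleans to floats and take the mean.
accuracy_value = correct.astype(np.float32).mean()
print(correct, accuracy_value)
```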

# Construct model
pred = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

Training the model

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x-mean_img, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x-mean_img,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")

    # Calculate accuracy for 256 mnist test images,
    # centered with the same mean image used during training
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: mnist.test.images[:256]-mean_img,
                                      y: mnist.test.labels[:256],
                                      keep_prob: 1.}))