Generate CNN training data.

Posted on January 02, 2017 in notebooks

When training convolutional neural networks (CNNs) doesn't work, it's difficult to know what went wrong. A good starting point for debugging these networks is to train them on clean data with clear patterns. In this notebook I create a simple image sequence of a moving square and attempt to predict its x (horizontal) coordinate.

In [1]:
import numpy as np

from matplotlib.pyplot import imshow
from matplotlib import pyplot as plt
from matplotlib import cm
%matplotlib inline 

This function can return both the x and y positions, but for now we just want to predict the x position of the square.

In [2]:
def moving_square(n_frames=100, return_x=True, return_y=True):
    
    '''
    Generate a sequence of images of a square bouncing around
    the frame, along with labels of its coordinates. Can be used
    as a simple performance test for convolutional networks.
    '''
    
    row = 120
    col = 160
    movie = np.zeros((n_frames, row, col, 3), dtype=float)
    labels = np.zeros((n_frames, 2), dtype=float)

    # initial position
    x = np.random.randint(20, col-20)
    y = np.random.randint(20, row-20)
    
    # Direction of motion
    directionx = -1
    directiony = 1
    
    # Size of the square
    w = 4
    
    for t in range(n_frames):
        #move
        x += directionx
        y += directiony
        
        #make square bounce off walls
        if y < 5 or y > row-5: 
            directiony *= -1
        if x < 5 or x > col-5: 
            directionx *= -1
            
        #draw square and record labels
        movie[t, y - w: y + w, x - w: x + w,  1] += 1
        labels[t] = np.array([x, y])
        
    #only return requested labels
    if return_x and return_y:
        return movie, labels
    elif return_x and not return_y:
        return movie, labels[:,0]
    else:
        return movie, labels[:,1]

Here we create the images and the labels of the x position. Both are numpy arrays containing 2000 samples each.

In [16]:
movie, labels = moving_square(2000, return_y=False)
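As a quick sanity check, the array shapes should match the sizes defined in moving_square (a small sketch; the expected values noted in the comments follow from the function above):

print(movie.shape)   # (2000, 120, 160, 3): frames, rows, columns, channels
print(labels.shape)  # (2000,): one x coordinate per frame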

Here we can see how the square bounces around the image to give us a wide range of positions.

In [22]:
fig, ax = plt.subplots(1, 10, figsize=(15, 6),
                       subplot_kw={'adjustable': 'box'})

axoff = np.vectorize(lambda ax:ax.axis('off'))
axoff(ax)

for i in range(10):
    frame = i*20
    ax[i].imshow(movie[frame])
    ax[i].set_title(labels[frame])
In [23]:
def split_data(X, Y, train_frac=.8):
    assert len(X) == len(Y)
    count = len(X)
    
    # index where the training set ends and the test set begins
    cutoff = int(count * train_frac)
    
    X_train = X[:cutoff]
    Y_train = Y[:cutoff]
    
    X_test = X[cutoff:]
    Y_test = Y[cutoff:]
    
    return X_train, Y_train, X_test, Y_test

    
movie_train, labels_train, movie_test, labels_test = split_data(movie, labels)
print('training samples: %s,   test samples: %s' %(len(movie_train), len(movie_test)))
training samples: 1600,   test samples: 400
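For reference, scikit-learn's train_test_split can do the same job, assuming a version that supports the shuffle parameter (it returns the arrays in a different order than the function above). Passing shuffle=False keeps the split contiguous, matching split_data:

from sklearn.model_selection import train_test_split

movie_train, movie_test, labels_train, labels_test = train_test_split(
    movie, labels, train_size=0.8, shuffle=False)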

Now that we have split our data into training and test sets, we can create the model to test. For this example, we'll use a 3-layer convolutional network with a dense layer at the end.

In [24]:
from keras.models import Model
from keras.layers import Input, Dense
from keras.layers import Convolution2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten
In [25]:
def cnn3_full1():

    img_in = Input(shape=(120, 160, 3), name='img_in')
    angle_in = Input(shape=(1,), name='angle_in')  # unused here; kept for the commented-out merge below

    x = Convolution2D(8, 3, 3)(img_in)
    x = Activation('relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    x = Convolution2D(16, 3, 3)(x)
    x = Activation('relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    x = Convolution2D(32, 3, 3)(x)
    x = Activation('relu')(x)
    x = MaxPooling2D(pool_size=(2, 2))(x)

    merged = Flatten()(x)
    #merged = merge([flat, angle_in], mode='concat', concat_axis=-1)

    x = Dense(256)(merged)
    x = Activation('linear')(x)
    x = Dropout(.2)(x)

    angle_out = Dense(1, name='angle_out')(x)

    model = Model(input=[img_in], output=[angle_out])
    return model

Since we are estimating a floating point value between 0 and 160 (the width of the image in pixels), we've used a linear activation and we'll use a mean squared error (mse) loss function.

In [35]:
model = cnn3_full1()
model.compile(loss='mse', optimizer='adam')
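As an optional sanity check before training, Keras can print each layer's output shape and parameter count (output omitted here):

model.summary()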
In [36]:
model.fit(movie_train, labels_train, batch_size=32, nb_epoch=10,
         validation_data=(movie_test, labels_test))
Train on 1600 samples, validate on 400 samples
Epoch 1/10
1600/1600 [==============================] - 19s - loss: 3231.6821 - val_loss: 1382.9762
Epoch 2/10
1600/1600 [==============================] - 25s - loss: 506.8916 - val_loss: 185.6907
Epoch 3/10
1600/1600 [==============================] - 25s - loss: 141.4330 - val_loss: 104.8761
Epoch 4/10
1600/1600 [==============================] - 25s - loss: 93.5230 - val_loss: 73.3963
Epoch 5/10
1600/1600 [==============================] - 27s - loss: 64.1208 - val_loss: 61.7327
Epoch 6/10
1600/1600 [==============================] - 26s - loss: 50.9679 - val_loss: 70.5726
Epoch 7/10
1600/1600 [==============================] - 33s - loss: 44.5662 - val_loss: 69.3478
Epoch 8/10
1600/1600 [==============================] - 26s - loss: 36.7403 - val_loss: 62.1258
Epoch 9/10
1600/1600 [==============================] - 32s - loss: 32.8558 - val_loss: 62.8279
Epoch 10/10
1600/1600 [==============================] - 33s - loss: 30.0092 - val_loss: 56.4622

Our model shows adequate training after 10 epochs. By comparing actual vs. predicted values on the test data, we can see that most predictions fall within a few pixels of the actual value. Now that we're sure our model can learn a simple environment, we can try it on more complicated ones.
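To put a number on this, a quick sketch that computes the absolute error over the test set (exact values will vary between runs):

predictions = model.predict(movie_test)
abs_errors = np.abs(predictions[:, 0] - labels_test)
print('mean abs error: %.2f px' % abs_errors.mean())
print('median abs error: %.2f px' % np.median(abs_errors))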

In [51]:
import pandas as pd
predictions = model.predict(movie_test[:400])
results = {'angle_pred':predictions[:,0],
           'angle_actual': labels_test[:400]}
df = pd.DataFrame(data=results)
df
Out[51]:
angle_actual angle_pred
0 56.0 55.233990
1 57.0 56.693146
2 58.0 57.861286
3 59.0 58.791832
4 60.0 59.036770
5 61.0 60.441486
6 62.0 61.689087
7 63.0 63.456985
8 64.0 64.348381
9 65.0 66.222473
10 66.0 67.130646
11 67.0 67.738564
12 68.0 67.963821
13 69.0 68.935982
14 70.0 69.549744
15 71.0 71.431396
16 72.0 72.390083
17 73.0 74.190582
18 74.0 74.819740
19 75.0 75.444885
20 76.0 75.461914
21 77.0 76.146454
22 78.0 76.133331
23 79.0 77.773743
24 80.0 78.428741
25 81.0 79.718063
26 82.0 79.724319
27 83.0 80.628311
28 84.0 81.303391
29 85.0 82.617683
... ... ...
370 122.0 121.601166
371 123.0 126.386185
372 124.0 126.329117
373 125.0 124.133636
374 126.0 121.341873
375 127.0 122.241898
376 128.0 118.468033
377 129.0 116.506271
378 130.0 115.494576
379 131.0 114.433235
380 132.0 113.245926
381 133.0 109.520828
382 134.0 108.485291
383 135.0 111.566132
384 136.0 113.819847
385 137.0 117.470505
386 138.0 125.346176
387 139.0 137.456451
388 140.0 143.929260
389 141.0 148.688904
390 142.0 151.356018
391 143.0 160.217545
392 144.0 159.777679
393 145.0 159.734161
394 146.0 155.533234
395 147.0 153.263718
396 148.0 144.463562
397 149.0 127.741280
398 150.0 111.309967
399 151.0 119.974045

400 rows × 2 columns

In [52]:
ax = df.plot()
ax.set_xlabel("samples")

ax.set_ylabel("x value")
[Plot: actual vs. predicted x value over the 400 test samples]

These results on the test data show the actual and predicted x values (labeled 'angle' in the code). Predictions in the middle of the image (x values between 40 and 100) are more accurate than predictions for x values below 40 or above 100. This is likely due to the few samples that occurred in those ranges.
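One way to verify this is to group the absolute error by the actual x value; a sketch using the df built above (the bin edges here are arbitrary choices):

df['abs_error'] = (df['angle_pred'] - df['angle_actual']).abs()
position_bins = pd.cut(df['angle_actual'], bins=[0, 40, 100, 160])
print(df.groupby(position_bins)['abs_error'].mean())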
