How to Create Image Recognition With Python?
Image recognition is one of the most widespread classes of machine learning problems. It aims to train machines to recognize images much as people do.
Image recognition is a supervised learning problem or, to be more precise, a classification problem.
This article presents a relatively simple approach to training a neural network to recognize digits. This approach uses an ordinary feedforward neural network. The accuracy of the model can be further improved using other techniques.
Creating the Basic Model
When creating the basic model, you should do at least the following five things:
1. Import modules, classes, and functions. In this article, we’re going to use the Keras library to handle the neural network and scikit-learn to get and prepare data.
2. Load data. This article shows how to recognize the digits written by hand. The function load_digits() from sklearn.datasets provides 1797 observations. Each observation has 64 features representing the pixels of one picture 8 px high and 8 px wide. Each feature is in the range 0–16, depending on the shade of grey of the pixel. The outputs represent the correct digits and have integer values in the range 0–9.
3. Transform and split data. We first need to binarize the outputs, i.e., make each of them a vector with the values 0 and 1. Then, we have to split the entire dataset into training and test sets. Finally, we standardize the inputs.
4. Create the classification model and train (fit) it. The simplest models have one input layer that is not explicitly added, one hidden layer, and one output layer. We use the training set to train our neural network.
5. Test the classification model. Finally, we test the performance of the network using the test set.
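Before looking at the full listing, steps 2 and 3 can be checked on their own. The following is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names (y_bin, x_train_std, and so on) are illustrative and do not appear in the article's listing.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, StandardScaler

# Step 2: load the data and inspect its shape and value range
x, y = load_digits(return_X_y=True)
print(x.shape, y.shape)      # (1797, 64) (1797,)
print(x.min(), x.max())      # smallest and largest pixel value
print(np.unique(y))          # the distinct digit labels

# Step 3a: binarize the outputs, giving one 0/1 column per digit
y_bin = LabelBinarizer().fit_transform(y)
print(y_bin.shape)           # (1797, 10)

# Step 3b: split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y_bin, test_size=0.3, random_state=0)

# Step 3c: standardize, fitting the scaler on the training set only
sc = StandardScaler()
x_train_std = sc.fit_transform(x_train)
x_test_std = sc.transform(x_test)
print(x_train_std.shape, x_test_std.shape)
```

Fitting the scaler on the training set and only applying it to the test set keeps information about the test data out of the preprocessing step.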
This is what the code looks like:
# 1. Import modules, classes and functions
import keras
from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, StandardScaler

# 2. Load data
x, y = load_digits(n_class=10, return_X_y=True)

# 3. Transform and split data
# Create the binary output
tr = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
y = tr.fit_transform(y)
# Split train and test data
x_train, x_test, y_train, y_test =\
    train_test_split(x, y, test_size=0.3, random_state=0)
# Standardize the input
sc = StandardScaler()
x_train, x_test = sc.fit_transform(x_train), sc.transform(x_test)

# 4. Create the classification model and train (fit) it
cl = Sequential()
# Add the hidden layer
cl.add(Dense(units=500, activation='relu', use_bias=True,
             kernel_initializer='uniform', bias_initializer='zeros',
             input_shape=(x_train.shape[1],)))
# Add the output layer
cl.add(Dense(units=10, activation='softmax', use_bias=True,
             kernel_initializer='uniform', bias_initializer='zeros'))
# Compile the classification model
cl.compile(loss='categorical_crossentropy', optimizer='adam',
           metrics=['accuracy'])
# Fit (train) the classification model
cl.fit(x_train, y_train, epochs=100, batch_size=10)

# 5. Test the classification model
result = cl.evaluate(x_test, y_test, batch_size=128)
# Print the loss and the accuracy measured on the test set
for i in range(2):
    print(f'{cl.metrics_names[i]}: {result[i]}')
The accuracy of the model is about 97.8 %. Your results might vary slightly from run to run, because the weights are initialized randomly.
You can experiment with the hyperparameters, such as the number of units in the hidden layer, the optimizer, the number of training epochs, and the batch size, to try to further improve the accuracy of the network.
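One way to try such changes systematically is to wrap the model creation in a function and loop over candidate values. The sketch below is only an illustration under stated assumptions: build_classifier is a hypothetical helper, the two layer widths are arbitrary choices, and the number of epochs is reduced to 5 so that the loop finishes quickly.

```python
from keras import Input
from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, StandardScaler

# Prepare the data in the same way as in the basic model
x, y = load_digits(return_X_y=True)
y = LabelBinarizer().fit_transform(y)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0)
sc = StandardScaler()
x_train, x_test = sc.fit_transform(x_train), sc.transform(x_test)


def build_classifier(hidden_units):
    """Hypothetical helper: a network with one hidden layer of the given width."""
    cl = Sequential([
        Input(shape=(x_train.shape[1],)),
        Dense(units=hidden_units, activation='relu'),
        Dense(units=10, activation='softmax'),
    ])
    cl.compile(loss='categorical_crossentropy', optimizer='adam',
               metrics=['accuracy'])
    return cl


# Train one model per candidate width and record its test accuracy
accuracies = {}
for hidden_units in (50, 500):
    cl = build_classifier(hidden_units)
    cl.fit(x_train, y_train, epochs=5, batch_size=10, verbose=0)
    _, accuracy = cl.evaluate(x_test, y_test, batch_size=128, verbose=0)
    accuracies[hidden_units] = accuracy
    print(f'{hidden_units} units: accuracy {accuracy:.3f}')
```

For a careful comparison, it is safer to judge the candidates on a separate validation set (for example with the validation_split argument of fit) and to keep the test set for the final check only.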
Making the Network Deeper
Deep neural networks have more than one hidden layer. Adding hidden layers might improve accuracy. The code is almost the same as in the previous case, with just one additional statement that adds another hidden layer:
# 1. Import modules, classes and functions
import keras
from keras.layers import Dense
from keras.models import Sequential
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer, StandardScaler

# 2. Load data
x, y = load_digits(n_class=10, return_X_y=True)

# 3. Transform and split data
# Create the binary output
tr = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
y = tr.fit_transform(y)
# Split train and test data
x_train, x_test, y_train, y_test =\
    train_test_split(x, y, test_size=0.3, random_state=0)
# Standardize the input
sc = StandardScaler()
x_train, x_test = sc.fit_transform(x_train), sc.transform(x_test)

# 4. Create the classification model and train (fit) it
cl = Sequential()
# Add the first hidden layer
cl.add(Dense(units=500, activation='relu', use_bias=True,
             kernel_initializer='uniform', bias_initializer='zeros',
             input_shape=(x_train.shape[1],)))
# Add the second hidden layer (its input shape is inferred from the first)
cl.add(Dense(units=500, activation='relu', use_bias=True,
             kernel_initializer='uniform', bias_initializer='zeros'))
# Add the output layer
cl.add(Dense(units=10, activation='softmax', use_bias=True,
             kernel_initializer='uniform', bias_initializer='zeros'))
# Compile the classification model
cl.compile(loss='categorical_crossentropy', optimizer='adam',
           metrics=['accuracy'])
# Fit (train) the classification model
cl.fit(x_train, y_train, epochs=100, batch_size=10)

# 5. Test the classification model
result = cl.evaluate(x_test, y_test, batch_size=128)
# Print the loss and the accuracy measured on the test set
for i in range(2):
    print(f'{cl.metrics_names[i]}: {result[i]}')
The accuracy increases slightly, to 98.3 %.
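The extra layer comes at a cost in model size. As a rough worked example, a fully connected layer has one weight per input-unit pair plus one bias per unit, so the number of trainable parameters in both networks can be counted by hand. The helper dense_params below is a hypothetical name used only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in one fully connected layer."""
    return n_inputs * n_units + n_units


# Basic model: 64 inputs -> 500 hidden units -> 10 outputs
basic = dense_params(64, 500) + dense_params(500, 10)

# Deeper model: 64 inputs -> 500 -> 500 -> 10 outputs
deeper = (dense_params(64, 500) + dense_params(500, 500)
          + dense_params(500, 10))

print(basic)   # 37510
print(deeper)  # 288010
```

The second hidden layer alone adds 250,500 parameters, which is worth keeping in mind given that the training set has only about 1,250 pictures.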
Convolutional Neural Networks and Other Improvements
Image recognition problems are often solved with even higher accuracy than we’ve obtained here. One way to improve a network for image recognition is to add convolutional and pooling layers, making it a convolutional neural network.
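As a minimal sketch of what such a network could look like for the digits data, the flat 64-feature rows are reshaped back into 8×8 single-channel images and passed through one convolutional and one pooling layer. The filter count, the kernel size, and the scaling by 16 are illustrative assumptions rather than tuned values, and the model is only defined here, not trained.

```python
from keras import Input
from keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from sklearn.datasets import load_digits

# Reshape each row of 64 features into an 8x8 image with one channel
x, y = load_digits(return_X_y=True)
x_img = x.reshape(-1, 8, 8, 1) / 16.0  # scale pixel values to the range 0-1

cnn = Sequential([
    Input(shape=(8, 8, 1)),
    # 32 filters of size 3x3: 8x8x1 -> 6x6x32
    Conv2D(filters=32, kernel_size=(3, 3), activation='relu'),
    # 2x2 max pooling: 6x6x32 -> 3x3x32
    MaxPooling2D(pool_size=(2, 2)),
    # Flatten to a vector of 3 * 3 * 32 = 288 values
    Flatten(),
    # One output unit per digit
    Dense(units=10, activation='softmax'),
])
cnn.compile(loss='categorical_crossentropy', optimizer='adam',
            metrics=['accuracy'])
cnn.summary()
```

Training would then proceed as before, with the image-shaped inputs in place of the flat ones and the same binarized outputs.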
Additionally, some form of regularization, such as dropout, can be used. For more information on how to do this with Keras, you can take a look at the official Keras documentation.
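To show what dropout does, here is a small NumPy sketch of the mechanism rather than a Keras model. During training, a random fraction of the activations is set to zero and the remaining ones are scaled up by 1 / (1 - rate); at inference time, the activations pass through unchanged. The array of ones is a stand-in for a hidden layer's output, and the rate of 0.5 is an arbitrary choice. In Keras, the Dropout layer applies this idea when placed between other layers.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.5                      # fraction of activations to drop
activations = np.ones((4, 6))   # stand-in for a hidden layer's output

# Training time: zero a random subset, rescale the rest by 1 / (1 - rate)
keep_mask = rng.random(activations.shape) >= rate
dropped = activations * keep_mask / (1.0 - rate)
print(dropped)

# Inference time: dropout is switched off, so the output equals the input
inference_output = activations
```

The rescaling keeps the expected value of each activation the same during training and inference, so the following layer needs no adjustment when dropout is switched off.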
Conclusion
This article is an introduction to implementing image recognition with Python and its machine learning libraries Keras and scikit-learn. Image recognition is a supervised learning task or, more precisely, a classification task. This is just the beginning, and there are many techniques to improve the accuracy of the presented classification model.
Thank you for reading.