Handwritten Digit Recognition Using Neural Networks
machine learning, artificial intelligence, python
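Introduction
Handwritten digit recognition is a branch of image recognition, which is widely used in computer vision with deep learning. Image recognition is one of the most basic, preliminary steps in almost any image- or video-related deep learning task. This article gives an overview of handwritten digit recognition and shows how image recognition extends to multiclass classification.
Before continuing, let us look at the difference between binary and multiclass image classification.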
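Binary Image Classification
In binary image classification, the model has two classes to predict, for example cats versus dogs.
Multiclass Image Classification
In multiclass image classification, the model has more than two classes to predict. For example, in Fashion-MNIST classification or handwritten digit recognition there are 10 classes to predict.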
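The difference between the two settings shows up in the model's output. The sketch below is a minimal pure-Python illustration, not part of the original example: a binary classifier typically produces one probability via a sigmoid, while a multiclass classifier produces one probability per class via a softmax and predicts the class with the highest probability. The scores used here are made-up values chosen only for illustration.

```python
import math

def sigmoid(score):
    """Map a single raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

def softmax(scores):
    """Map a list of raw scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Binary case: one hypothetical score gives the probability of "dog";
# the probability of "cat" is whatever remains.
p_dog = sigmoid(0.8)
p_cat = 1.0 - p_dog

# Multiclass case: ten hypothetical scores, one per digit 0-9.
scores = [0.1, 0.3, 2.5, 0.0, -1.0, 0.2, 0.4, 1.1, -0.5, 0.0]
probs = softmax(scores)

# The predicted class is the index of the largest probability.
predicted_digit = max(range(len(probs)), key=probs.__getitem__)
print(predicted_digit)  # 2
```

The final `Dense` layer with `activation='softmax'` in the Keras example plays the role of `softmax` here.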
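Handwritten Digit Recognition
This task is a case of multiclass image classification in which the model predicts which digit from 0 to 9 the input image shows.
In the MNIST digit recognition task, we use a CNN (convolutional neural network) to build a model that recognizes handwritten digits. We will download the MNIST dataset, which contains a training set of 60,000 images and a test set of 10,000 images. Each image is a 28x28-pixel grayscale picture of a single handwritten digit from 0 to 9.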
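Before images like these are fed to a network, pixel values are usually rescaled from 0..255 to 0..1 and each label is turned into a one-hot vector with 10 entries. The Keras example in this article does this with a division by 255 and `to_categorical`; the sketch below shows the same two steps in plain Python. The helper names `scale_pixels` and `one_hot` and the tiny 2x3 patch are hypothetical, introduced only for illustration.

```python
def scale_pixels(image):
    """Map 8-bit pixel intensities (0..255) to floats in [0.0, 1.0]."""
    return [[pixel / 255 for pixel in row] for row in image]

def one_hot(label, num_classes=10):
    """Return a vector of zeros with a single 1.0 at the label's index."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

# A made-up 2x3 patch standing in for a full 28x28 image.
patch = [[0, 128, 255],
         [64, 32, 255]]

scaled = scale_pixels(patch)
encoded = one_hot(3)

print(scaled[0][0], scaled[0][2])  # 0.0 1.0
print(encoded)
```

With one-hot labels, the target for the digit 3 is a vector whose fourth entry is 1, which is the form that the `categorical_crossentropy` loss expects.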
Implementation in Python
Example
# Digit recognition with a small CNN (Keras)
import keras
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, Dropout, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.utils import to_categorical
import matplotlib.pyplot as plt

# In a Jupyter notebook, uncomment the next line to render plots inline.
# %matplotlib inline

input_shape = (28, 28, 1)  # height, width, channels (grayscale)
num_classes = 10           # digits 0-9
batch_size = 128
epochs = 10

# Load the training and test images together with their labels.
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# Note: the second value printed here is Y_train.shape (the training labels),
# not the shape of the test set.
print("Training data shape {} , test data shape {}".format(X_train.shape, Y_train.shape))

# Display one training image as a sanity check.
img = X_train[1]
plt.imshow(img, cmap='gray')
plt.show()

# Add the channel dimension expected by Conv2D.
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1)
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1)

# One-hot encode the labels.
Y_train = to_categorical(Y_train, num_classes)
Y_test = to_categorical(Y_test, num_classes)

# Scale pixel values from 0..255 to 0..1.
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('x_train shape:', X_train.shape)
print('train samples ', X_train.shape[0])
print('test samples', X_test.shape[0])

# Two convolutional layers, max pooling, then a dense classifier head.
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

# Train, using the test set as validation data after each epoch.
history = model.fit(X_train, Y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(X_test, Y_test))

# Evaluate on the test set.
output_score = model.evaluate(X_test, Y_test, verbose=0)
print('Testing loss:', output_score[0])
print('Testing accuracy:', output_score[1])
Output
Training data shape (60000, 28, 28) , test data shape (60000,)
x_train shape: (60000, 28, 28, 1)
train samples  60000
test samples 10000
Epoch 1/10
469/469 [==============================] - 13s 10ms/step - loss: 2.2877 - accuracy: 0.1372 - val_loss: 2.2598 - val_accuracy: 0.2177
Epoch 2/10
469/469 [==============================] - 4s 9ms/step - loss: 2.2428 - accuracy: 0.2251 - val_loss: 2.2058 - val_accuracy: 0.3345
Epoch 3/10
469/469 [==============================] - 5s 10ms/step - loss: 2.1863 - accuracy: 0.3062 - val_loss: 2.1340 - val_accuracy: 0.4703
Epoch 4/10
469/469 [==============================] - 5s 10ms/step - loss: 2.1071 - accuracy: 0.3943 - val_loss: 2.0314 - val_accuracy: 0.5834
Epoch 5/10
469/469 [==============================] - 4s 9ms/step - loss: 1.9948 - accuracy: 0.4911 - val_loss: 1.8849 - val_accuracy: 0.6767
Epoch 6/10
469/469 [==============================] - 4s 10ms/step - loss: 1.8385 - accuracy: 0.5744 - val_loss: 1.6841 - val_accuracy: 0.7461
Epoch 7/10
469/469 [==============================] - 4s 10ms/step - loss: 1.6389 - accuracy: 0.6316 - val_loss: 1.4405 - val_accuracy: 0.7825
Epoch 8/10
469/469 [==============================] - 5s 10ms/step - loss: 1.4230 - accuracy: 0.6694 - val_loss: 1.1946 - val_accuracy: 0.8078
Epoch 9/10
469/469 [==============================] - 5s 10ms/step - loss: 1.2229 - accuracy: 0.6956 - val_loss: 0.9875 - val_accuracy: 0.8234
Epoch 10/10
469/469 [==============================] - 5s 11ms/step - loss: 1.0670 - accuracy: 0.7168 - val_loss: 0.8342 - val_accuracy: 0.8353
Testing loss: 0.8342439532279968
Testing accuracy: 0.8353000283241272
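Some numbers in this run can be checked by hand. The sketch below is an illustration added for this purpose, with hypothetical helper names. It traces the tensor shape through the layers of the example network, counts the trainable parameters, and computes the number of batches per epoch. It assumes the Keras defaults used in the example: stride-1 convolutions without padding, and a final partial batch that is kept rather than dropped.

```python
import math

def conv_out(size, kernel):
    """Output size along one axis for a stride-1 convolution without padding."""
    return size - kernel + 1

def conv_params(kernel, in_channels, filters):
    """kernel x kernel x in_channels weights per filter, plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """One weight per input-unit pair, plus one bias per unit."""
    return inputs * units + units

side, channels = 28, 1                    # 28x28 grayscale input

params = conv_params(3, channels, 32)     # Conv2D(32, 3x3)
side, channels = conv_out(side, 3), 32    # -> 26x26x32

params += conv_params(3, channels, 64)    # Conv2D(64, 3x3)
side, channels = conv_out(side, 3), 64    # -> 24x24x64

side //= 2                                # MaxPooling2D(2x2) -> 12x12x64
flat = side * side * channels             # Flatten -> 9216 values

params += dense_params(flat, 256)         # Dense(256); Dropout adds no parameters
params += dense_params(256, 10)           # Dense(10, softmax)

steps_per_epoch = math.ceil(60000 / 128)  # training images / batch size

print(flat, params, steps_per_epoch)      # 9216 2380938 469
```

The value 469 matches the `469/469` progress counter in the training log, and almost all of the parameters sit in the first dense layer that follows `Flatten`.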
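Conclusion
In this article, we looked at how to recognize handwritten digits using a neural network. The CNN in the example reached a test accuracy of about 83.5% on MNIST after 10 epochs.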