2020-11-08

Introductory Notes on Image Classification with TensorFlow

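The official documentation is the best resource there is, but following it does not always get you to a working result quickly.

The official tutorials only get to importing and training on your own dataset after you have worked through a good number of their demos. That does build a more solid foundation, but figuring out directly how to import, process, and run your own data is clearly more fun.

Official tutorials: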
https://tensorflow.google.cn/tutorials

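Setting Up the TensorFlow Environment

The GPU version is generally recommended. Installation is fairly simple as long as the versions (TensorFlow, CUDA, cuDNN) match each other.
https://blog.csdn.net/qq_41793080/article/details/104546053

~~That said, my GPU build on a GTX 1650 + 9300H was only about 1/3 faster than an i5-6300U? And with GPU0: it was even slower than the CPU? I don't understand why the gap is so small.~~
Update 2021-02-17: with CUDA acceleration, the 1650 is more than ten times faster than the 9300H when running a convolutional neural network.
Also, wrapping the code in with GPU0: does not enable the acceleration by itself; the following lines need to be added before training.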

physical_devices = tf.config.experimental.list_physical_devices('GPU')
for physical_device in physical_devices:
    tf.config.experimental.set_memory_growth(physical_device, True)

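Running the Official Demo

https://tensorflow.google.cn/tutorials/keras/classification
Compared with learning on handwritten digits, I personally find this one more interesting.
Copying and pasting it line by line gets it running in no time.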

Analyzing the Official Demo

The following is best read alongside the original tutorial.

First, load the keras.datasets.fashion_mnist dataset:

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

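Process the data a little: write down the class names and scale the 0-255 values to floats in the range 0-1 (I don't understand why; in my own tests, 0-255 sometimes works better than 0-1).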

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

train_images = train_images / 255.0
test_images = test_images / 255.0
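
To make the scaling step concrete, here is a minimal sketch on a made-up 2x2 array (not Fashion-MNIST data). It only shows that dividing a uint8 image by 255.0 yields floats between 0 and 1. A commonly cited reason for scaling is that small, similarly sized inputs tend to make gradient-based training better behaved, although, as noted above, the effect can vary in practice.

```python
import numpy as np

# Made-up 2x2 "image" with the same dtype as the Fashion-MNIST arrays (uint8).
image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# True division by a float produces a float64 array.
scaled = image / 255.0

print(scaled.dtype)
print(scaled.min(), scaled.max())
```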

Setting up the layers of the network model

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])

I strongly recommend reading this part alongside the original tutorial.
What the input_shape=(28, 28) here is for will be covered later.

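Original text:
The first layer in this network, `tf.keras.layers.Flatten`, transforms the format of the images from a two-dimensional array (28 x 28 pixels) to a one-dimensional array (28 x 28 = 784 pixels). Think of this layer as unstacking the rows of pixels in the image and lining them up. This layer has no parameters to learn; it only reformats the data.

After the pixels are flattened, the network consists of a sequence of two tf.keras.layers.Dense layers. These are densely connected, or fully connected, neural layers. The first Dense layer has 128 nodes (or neurons). The second (and last) layer returns a logits array of length 10. Each node contains a score that indicates which of the 10 classes the current image belongs to.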
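
The shapes involved are easier to see in plain NumPy. The sketch below is my own illustration, not code from the tutorial: it uses random numbers in place of real images and trained weights, and only mimics what Flatten and the two Dense layers compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up batch of 2 grayscale "images", 28 x 28 each.
batch = rng.random((2, 28, 28))

# Flatten: (2, 28, 28) -> (2, 784). No parameters, only a reshape.
flat = batch.reshape(len(batch), -1)

# Dense(128, activation='relu'): weights (784, 128), bias (128,).
# Random weights stand in for trained ones.
w1, b1 = rng.normal(size=(784, 128)) * 0.01, np.zeros(128)
hidden = np.maximum(flat @ w1 + b1, 0.0)

# Dense(10): weights (128, 10), bias (10,). The output is raw logits.
w2, b2 = rng.normal(size=(128, 10)) * 0.01, np.zeros(10)
logits = hidden @ w2 + b2

print(flat.shape, hidden.shape, logits.shape)
```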

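Compiling the model
That is, configuring the loss function, the optimizer, and the metrics.

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

Original text:
Before the model is ready for training, it needs a few more settings. These are added during the model's compile step:

Loss function - Measures how accurate the model is during training. You want to minimize this function to "steer" the model in the right direction.
Optimizer - Determines how the model is updated based on the data it sees and its loss function.
Metrics - Used to monitor the training and testing steps. The following example uses accuracy, the fraction of images that are correctly classified.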
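
For intuition, here is a rough NumPy sketch of what a sparse categorical cross-entropy on logits and the accuracy metric compute. The function name and the numbers are made up for illustration, and this is my understanding of the math rather than the Keras implementation itself.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean of -log(softmax(logits)[label]) over the batch."""
    # Subtract the row maximum first for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Made-up logits for 2 samples and 3 classes, with integer labels.
logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

loss = sparse_categorical_crossentropy_from_logits(logits, labels)

# Accuracy: fraction of samples whose highest logit is the true label.
accuracy = (logits.argmax(axis=1) == labels).mean()

print(loss, accuracy)
```

The labels are plain integers here, which is what "sparse" refers to: no one-hot encoding is needed.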

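Training the model

model.fit(train_images, train_labels, epochs=10)

Clearly, train_images and train_labels are the training images and the training labels. epochs is the number of passes over the training set.

Testing the model

# The model outputs logits, so append a Softmax layer to get probabilities
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
print("This Item is " + class_names[np.argmax(predictions[0])])

This prints the prediction for test_images[0].
This part of the tutorial contains a lot of extra code, most of which draws charts; see the original tutorial for the full code.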
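
What the Softmax layer and np.argmax do to the logits can be sketched in NumPy as follows. The logits are made up, and the softmax function is a hand-written stand-in, not the Keras layer.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

def softmax(logits):
    """Convert each row of logits into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up logits for one image; index 9 has the highest score.
logits = np.array([[0.1, 0.0, 0.2, 0.0, 0.1, 1.5, 0.0, 2.0, 0.3, 4.0]])
predictions = softmax(logits)

# argmax picks the class with the highest probability.
print("This Item is " + class_names[np.argmax(predictions[0])])
```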

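Modifying the Official Demo and Importing Your Own Data for Training

That is the end of the first article in the original tutorials.
Next, we can try training our own model by following the official demo.
Let's start with an emoji sticker classifier for fun.
Emoji sticker training material
The first step is, of course, importing our own images.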

Image Preprocessing Function

First, let's look at where the official training set comes from:

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
print("Type of train_images is: " + str(type(train_images)))

Type of train_images is: <class 'numpy.ndarray'>

Since it is a NumPy array, things get much simpler.

print(train_images.shape)

(60000, 28, 28)

That is, 60,000 grayscale images of 28*28 pixels.
The same goes for train_labels, which turns out to be

(60000,)

60,000 label IDs in the range 0-9. For the demo, for example, based on

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

we get: 0 corresponds to 'T-shirt/top', 1 to 'Trouser', and 9 to 'Ankle boot'.
So this is straightforward.
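
As a quick illustration of the label format, the sketch below maps a few integer label IDs back to names and counts how often each class occurs. The label values are made up for the example and are not read from the dataset.

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Made-up label IDs in the same form as train_labels (one integer per image).
labels = np.array([9, 0, 0, 3], dtype=np.uint8)

# Each ID is simply an index into class_names.
names = [class_names[i] for i in labels]

# Number of samples per class ID (length 10, one slot per class).
counts = np.bincount(labels, minlength=len(class_names))

print(names)
print(counts)
```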

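As in the demo, we first read fixed-size grayscale images into one matrix, and then build a matching matrix of labels.
As is well known, an image read with OpenCV-Python is already a NumPy array, and OpenCV is clearly more convenient when some image preprocessing is needed as well.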

# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# System API
import os

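# Image preprocessing: takes the path of an image file, reads it with cv.imread,
# and returns a 200*200 grayscale image
def ImagePreProcess(Path):
    imgtemp = cv.resize(cv.imread(Path), (200, 200))

    # BGR to GRAY (cv.imread returns the channels in BGR order)
    imgtemp = cv.cvtColor(imgtemp, cv.COLOR_BGR2GRAY)
    # Invert the intensities
    imgtemp = 255 - imgtemp

    return imgtemp

print(ImagePreProcess("Path Of Image File").shape)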

Output:

(200, 200)
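
To show what the grayscale conversion and the inversion do to the pixel values, here is a NumPy-only sketch. It uses the standard luma weights (0.299 R + 0.587 G + 0.114 B) and a made-up two-pixel image; the helper name is hypothetical, and the result only approximates what OpenCV computes internally. It also shows why the channel order matters, since OpenCV loads images as BGR.

```python
import numpy as np

def bgr_to_inverted_gray(bgr):
    """Grayscale conversion plus inversion for a BGR uint8 image."""
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    # Standard luma weights: 0.299 R + 0.587 G + 0.114 B.
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    gray = np.round(gray).astype(np.uint8)
    return 255 - gray

# Made-up 1x2 image in BGR order: one pure blue pixel, one white pixel.
bgr = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)

correct = bgr_to_inverted_gray(bgr)
# Reversing the channel axis simulates misreading the BGR data as RGB.
misread = bgr_to_inverted_gray(bgr[..., ::-1])

print(correct.shape, correct.tolist(), misread.tolist())
```

The blue pixel gets a different gray value depending on the assumed channel order, while the white pixel is unaffected.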

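Batch Image Import Function

Importing images in batches requires settling a few details first.
For convenience, we put all images of one class into the same folder, and then import every image in that folder.
The path must not contain Chinese characters! (cv.imread may fail to read such paths.)
Pass the folder path and the label of its files (an int) to LoadImages(Dir, label).
The function then feeds every image in the folder through ImagePreProcess() and collects the results in a list.

# Load Images From Folder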
def LoadImages(Dir,label):
    Input_image_list = []
    Input_label_list = []

    for Files in os.listdir(Dir):
        if(os.path.isfile(Dir + "/" + Files)):
            print("Reading " + Dir + "/" + Files)
            Input_image_list.append(ImagePreProcess(Dir + "/" + Files))
            Input_label_list.append(label)
    return Input_image_list,Input_label_list

For ease of importing, this function stores the NumPy arrays returned by ImagePreProcess() in a list and returns that list together with a list of matching labels.
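
Turning such lists into single arrays only works when every image has the same shape, which is why the preprocessing resizes everything. A minimal sketch, with all-zero arrays standing in for real preprocessed images:

```python
import numpy as np

# Stand-ins for preprocessed images: three 200 x 200 grayscale arrays.
image_list = [np.zeros((200, 200), dtype=np.uint8) for _ in range(3)]
label_list = [0, 0, 1]

# All items must share one shape, otherwise they cannot form a single array.
shapes = {img.shape for img in image_list}
assert len(shapes) == 1, f"inconsistent image shapes: {shapes}"

images = np.stack(image_list)   # adds a leading sample axis
labels = np.array(label_list)

print(images.shape, labels.shape)
```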

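Directory Structure

Since we may train on many kinds of images at once, and each kind needs both a training set and a test set, it helps to fix a directory layout.
For example, the emoji sticker training material I provide contains four kinds of stickers: HuaJi (滑稽), XiaoKongLong (小恐龙, little dinosaur), XiongMaoTou (熊猫头, panda head), and YingWu (鹦鹉, parrot).
We can then use the following directory structure: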

emoji
-> HuaJi
----> Test
-------> xxx.jpg
-------> xxx.jpg
-------> xxx.jpg
----> Train
-------> xxx.jpg
-------> xxx.jpg
-------> xxx.jpg
-> XiaoKongLong
----> Test
-------> xxx.jpg
-------> xxx.jpg
-------> xxx.jpg
----> Train
-------> xxx.jpg
-------> xxx.jpg
-------> xxx.jpg
-> XiongMaoTou
----> Test
-------> xxx.jpg
-------> xxx.jpg
-------> xxx.jpg
----> Train
-------> xxx.jpg
-------> xxx.jpg
-------> xxx.jpg
-> YingWu
----> Test
-------> xxx.jpg
-------> xxx.jpg
-------> xxx.jpg
----> Train
-------> xxx.jpg
-------> xxx.jpg
-------> xxx.jpg

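If this is unclear, download the archive and unpack it to have a look.
The test set goes in the Test folder.
The training set goes in the Train folder.

Now we can write the batch import code: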

# Input Config
Base_Dir   = './emoji'
Class_name = ['HuaJi','XiaoKongLong','XiongMaoTou','YingWu']
Source_Dir = ['HuaJi','XiaoKongLong','XiongMaoTou','YingWu']

# Load Train Images
image_list = []
label_list = []

for index in range(len(Class_name)):
    temp_image_list,temp_label_list = LoadImages(Base_Dir + "/" + Source_Dir[index] + '/Train',index)
    image_list = image_list + temp_image_list
    label_list = label_list + temp_label_list

train_images = np.array(image_list)
train_labels = np.array(label_list)

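Base_Dir is the location of the emoji folder described above.
Class_name holds the English label names. They are not used in training, but they make the results readable when printing output.
Source_Dir holds the folder of each image class under Base_Dir.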
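
The way the folder position becomes the label can be tried out without any real images. The sketch below is an illustration only: collect_samples is a hypothetical helper (not part of the code above), and it runs against a throwaway directory tree containing empty placeholder files.

```python
import tempfile
from pathlib import Path

def collect_samples(base_dir, source_dirs, split):
    """Return (path, label) pairs; the label is the index into source_dirs."""
    samples = []
    for label, name in enumerate(source_dirs):
        folder = Path(base_dir) / name / split
        for path in sorted(folder.iterdir()):
            if path.is_file():
                samples.append((str(path), label))
    return samples

# Build a tiny throwaway tree with empty placeholder files to try it out.
with tempfile.TemporaryDirectory() as base:
    for name in ['HuaJi', 'YingWu']:
        train_dir = Path(base) / name / 'Train'
        train_dir.mkdir(parents=True)
        (train_dir / 'a.jpg').touch()
        (train_dir / 'b.jpg').touch()

    samples = collect_samples(base, ['HuaJi', 'YingWu'], 'Train')

print([label for _, label in samples])
```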

The batch import is now done. Let's check the resulting arrays.

print(train_images.shape)
print(train_labels.shape)

(388, 200, 200)
(388,)

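Start Training

Now we can follow the demo's example and reuse its code.

Configuring the model:
Note that in keras.layers.Flatten(input_shape=(200, 200)), the input_shape=(200, 200) must match the shape of a single image array. For a 512*512*3 color image, for example, it should be changed to (512, 512, 3). How to make that change is covered later.

keras.layers.Flatten(input_shape=(200, 200)) flattens the matrix into a one-dimensional array.
keras.layers.Dense(128, activation='relu') is a hidden layer of 128 neurons with relu as the activation function.
keras.layers.Dense(4) outputs 4 classes, because we have exactly 4 kinds of stickers.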

# Configure Model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(200, 200)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(4)
])
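
To get a feel for the size of this model, the number of trainable parameters can be worked out by hand: a fully connected layer has one weight per input-output pair plus one bias per output. The helper name below is my own.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units

flat_size = 200 * 200                         # length of the Flatten output
hidden_params = dense_params(flat_size, 128)  # Dense(128)
output_params = dense_params(128, 4)          # Dense(4)
total_params = hidden_params + output_params

print(flat_size, hidden_params, output_params, total_params)
```

Almost all of the parameters sit in the first Dense layer, so the input resolution dominates the model size.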

Compiling the model:

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

To record the accuracy and loss during training, keep the return value of model.fit:

# Start Training
history = model.fit(train_images, train_labels, epochs=100)

Print and plot the data recorded during training:

# Draw Training Message
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist.tail())
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('accuracy / loss')
plt.plot(hist['epoch'], hist['accuracy'], 
           label='Train accuracy')
plt.twinx()
plt.plot(hist['epoch'], hist['loss'],'r', 
           label='Train loss')

plt.show()

The resulting plot: [image: training accuracy and loss against epoch]

Evaluating the Model

There is not much to explain in this step; just copy and paste.

# Test Model
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
print('\nTest accuracy:', test_acc)

probability_model = tf.keras.Sequential([model, 
                                         tf.keras.layers.Softmax()])

predictions = probability_model.predict(test_images)

def plot_image(i, predictions_array, true_label, img):
  predictions_array, true_label, img = predictions_array, true_label[i], img[i]
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])

  plt.imshow(img, cmap=plt.cm.binary)

  predicted_label = np.argmax(predictions_array)
  if predicted_label == true_label:
    color = 'blue'
  else:
    color = 'red'

  plt.xlabel("{} {:2.0f}% ({})".format(Class_name[predicted_label],
                                100*np.max(predictions_array),
                                Class_name[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
  predictions_array, true_label = predictions_array, true_label[i]
  plt.grid(False)
  plt.xticks(range(4))
  plt.yticks([])
  thisplot = plt.bar(range(4), predictions_array, color="#777777")
  plt.ylim([0, 1])
  predicted_label = np.argmax(predictions_array)

  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')

# Plot the first X test images, their predicted labels, and the true labels.
# Color correct predictions in blue and incorrect predictions in red.
num_rows = 5
num_cols = 4
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions[i], test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()

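Note, however, the following two values:

num_rows = 5
num_cols = 4

The loop plots num_rows*num_cols test images, so their product must not exceed the number of images in the test set.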

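Of course, just like the official demo, it takes only a few epochs to reach a high accuracy, and likewise there are cases where the predicted probability is very high yet the result is completely wrong. My guess is that the layer configuration of the model is too simple.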
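
Such confidently wrong predictions are easy to list once the softmax outputs are available. A small sketch, assuming an array of probabilities like the predictions computed earlier; the function name, the 0.9 threshold, and the numbers are all made up for illustration.

```python
import numpy as np

def confident_mistakes(predictions, labels, threshold=0.9):
    """Indices of samples predicted wrongly with probability >= threshold."""
    predicted = predictions.argmax(axis=1)
    confidence = predictions.max(axis=1)
    return np.flatnonzero((predicted != labels) & (confidence >= threshold))

# Made-up softmax outputs for 3 samples over 4 classes.
predictions = np.array([[0.97, 0.01, 0.01, 0.01],
                        [0.05, 0.92, 0.02, 0.01],
                        [0.40, 0.30, 0.20, 0.10]])
labels = np.array([0, 2, 1])

# Only sample 1 is both wrong and above the threshold.
print(confident_mistakes(predictions, labels))
```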

Saving the Trained Model

model.save("emojiModel")

Loading the model is just as simple:

model = keras.models.load_model("emojiModel")

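Changing the Number of Channels or Classes of the Training Material

This step is simple as well.

Update the image preprocessing:
change the color channel conversion and the resolution setting.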

# Image PreProcess
def ImagePreProcess(Path):
    imgtemp = cv.resize(cv.imread(Path),(512,512),)

    # BGR to GRAY
    #imgtemp = cv.cvtColor(imgtemp,cv.COLOR_BGR2GRAY)
    #imgtemp = 255 - imgtemp

    # BGR to RGB
    imgtemp = cv.cvtColor(imgtemp, cv.COLOR_BGR2RGB)

    #imgtemp = imgtemp / 255.0
    return imgtemp

Update the model configuration:

# Configure Model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(512, 512, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(4)
])

Note that input_shape must match the shape of your image arrays,
and the number in the final keras.layers.Dense(4) is the total number of image classes.
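
Rather than typing these two numbers by hand, they can be read off the data. A minimal sketch, with all-zero arrays standing in for the loaded images; the variable names input_shape and num_classes are my own:

```python
import numpy as np

Class_name = ['HuaJi', 'XiaoKongLong', 'XiongMaoTou', 'YingWu']

# Stand-in for the loaded data: 8 RGB images of 512 x 512 pixels.
train_images = np.zeros((8, 512, 512, 3), dtype=np.uint8)
train_labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])

input_shape = train_images.shape[1:]   # drop the leading sample axis
num_classes = len(Class_name)

# Every label must be a valid index into Class_name.
assert train_labels.min() >= 0 and train_labels.max() < num_classes

print(input_shape, num_classes)
```

These values could then be passed to Flatten(input_shape=...) and to the final Dense(...) layer.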

Update the plotting code:

def plot_value_array(i, predictions_array, true_label):
  predictions_array, true_label = predictions_array, true_label[i]
  plt.grid(False)
  plt.xticks(range(4))
  plt.yticks([])
  thisplot = plt.bar(range(4), predictions_array, color="#777777")
  plt.ylim([0, 1])
  predicted_label = np.argmax(predictions_array)

  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')

Replace every 4 in there with your number of classes.

The final complete code:

# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# System API
import os

# Input Config
Base_Dir   = './emoji'
Class_name = ['HuaJi','XiaoKongLong','XiongMaoTou','YingWu']
Source_Dir = ['HuaJi','XiaoKongLong','XiongMaoTou','YingWu']

# Image PreProcess
def ImagePreProcess(Path):
    imgtemp = cv.resize(cv.imread(Path),(512,512),)

    # BGR to GRAY
    #imgtemp = cv.cvtColor(imgtemp,cv.COLOR_BGR2GRAY)
    #imgtemp = 255 - imgtemp

    # BGR to RGB
    imgtemp = cv.cvtColor(imgtemp, cv.COLOR_BGR2RGB)

    #imgtemp = imgtemp / 255.0
    return imgtemp

# Load Images
def LoadImages(Dir,label):
    Input_image_list = []
    Input_label_list = []

    for Files in os.listdir(Dir):
        if(os.path.isfile(Dir + "/" + Files)):
            print("Reading " + Dir + "/" + Files)
            Input_image_list.append(ImagePreProcess(Dir + "/" + Files))
            Input_label_list.append(label)
    return Input_image_list,Input_label_list

# Load Train Images
image_list = []
label_list = []

for index in range(len(Class_name)):
    temp_image_list,temp_label_list = LoadImages(Base_Dir + "/" + Source_Dir[index] + '/Train',index)
    image_list = image_list + temp_image_list
    label_list = label_list + temp_label_list

train_images = np.array(image_list)
train_labels = np.array(label_list)
print(train_images.shape)
print(train_labels.shape)

# Load Test Images
image_list = []
label_list = []

for index in range(len(Class_name)):
    temp_image_list,temp_label_list = LoadImages(Base_Dir + "/" + Source_Dir[index] + '/Test',index)
    image_list = image_list + temp_image_list
    label_list = label_list + temp_label_list

test_images = np.array(image_list)
test_labels = np.array(label_list)

# Configure Model
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(512, 512, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(4)
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Start Training
history = model.fit(train_images, train_labels, epochs=100)

# Draw Training Message
hist = pd.DataFrame(history.history)
hist['epoch'] = history.epoch
print(hist.tail())
plt.figure()
plt.xlabel('Epoch')
plt.ylabel('accuracy / loss')
plt.plot(hist['epoch'], hist['accuracy'], 
           label='Train accuracy')
plt.twinx()
plt.plot(hist['epoch'], hist['loss'],'r', 
           label='Train loss')

plt.show()

# Test Model
test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)
print('\nTest accuracy:', test_acc)

probability_model = tf.keras.Sequential([model, 
                                         tf.keras.layers.Softmax()])

predictions = probability_model.predict(test_images)

def plot_image(i, predictions_array, true_label, img):
  predictions_array, true_label, img = predictions_array, true_label[i], img[i]
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])

  plt.imshow(img, cmap=plt.cm.binary)

  predicted_label = np.argmax(predictions_array)
  if predicted_label == true_label:
    color = 'blue'
  else:
    color = 'red'

  plt.xlabel("{} {:2.0f}% ({})".format(Class_name[predicted_label],
                                100*np.max(predictions_array),
                                Class_name[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
  predictions_array, true_label = predictions_array, true_label[i]
  plt.grid(False)
  plt.xticks(range(4))
  plt.yticks([])
  thisplot = plt.bar(range(4), predictions_array, color="#777777")
  plt.ylim([0, 1])
  predicted_label = np.argmax(predictions_array)

  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')

# Plot the first X test images, their predicted labels, and the true labels.
# Color correct predictions in blue and incorrect predictions in red.
num_rows = 5
num_cols = 4
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_image(i, predictions[i], test_labels, test_images)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()

model.save("emojiModel")