2020-12-11

Using InceptionResNetV2 for Image Classification

Swamped with work…

Related posts:
http://www.h13studio.com/TensorFlow图像分类入门笔记/
http://www.h13studio.com/TensorFlow卷积神经网络的理解/

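After the previous post, I tried making the network from the official demo more complex, hoping to improve reliability and reduce false positives.
But after a few casual attempts, there was no noticeable difference in accuracy.
So I searched the TensorFlow documentation for an example explaining how to design a network architecture. Unfortunately, I did not find one.
I then looked at the architectures of several well-known models and found I could not understand at all why they were designed the way they were.
So I decided to start by running a few common TensorFlow models.
Common TensorFlow models: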
https://github.com/keras-team/keras-applications

Common network models:

Model	Input	Top-1	Top-5	Size	Stem	References
VGG16	224	71.268	90.050	138.4M	14.7M	[paper] [tf-models]
VGG19	224	71.256	89.988	143.7M	20.0M	[paper] [tf-models]
ResNet50	224	74.928	92.060	25.6M	23.6M	[paper] [tf-models] [torch] [caffe]
ResNet101	224	76.420	92.786	44.7M	42.7M	[paper] [tf-models] [torch] [caffe]
ResNet152	224	76.604	93.118	60.4M	58.4M	[paper] [tf-models] [torch] [caffe]
ResNet50V2	299	75.960	93.034	25.6M	23.6M	[paper] [tf-models] [torch]
ResNet101V2	299	77.234	93.816	44.7M	42.6M	[paper] [tf-models] [torch]
ResNet152V2	299	78.032	94.162	60.4M	58.3M	[paper] [tf-models] [torch]
ResNeXt50	224	77.740	93.810	25.1M	23.0M	[paper] [torch]
ResNeXt101	224	78.730	94.294	44.3M	42.3M	[paper] [torch]
InceptionV3	299	77.898	93.720	23.9M	21.8M	[paper] [tf-models]
InceptionResNetV2	299	80.256	95.252	55.9M	54.3M	[paper] [tf-models]
Xception	299	79.006	94.452	22.9M	20.9M	[paper]
MobileNet(alpha=0.25)	224	51.582	75.792	0.5M	0.2M	[paper] [tf-models]
MobileNet(alpha=0.50)	224	64.292	85.624	1.3M	0.8M	[paper] [tf-models]
MobileNet(alpha=0.75)	224	68.412	88.242	2.6M	1.8M	[paper] [tf-models]
MobileNet(alpha=1.0)	224	70.424	89.504	4.3M	3.2M	[paper] [tf-models]
MobileNetV2(alpha=0.35)	224	60.086	82.432	1.7M	0.4M	[paper] [tf-models]
MobileNetV2(alpha=0.50)	224	65.194	86.062	2.0M	0.7M	[paper] [tf-models]
MobileNetV2(alpha=0.75)	224	69.532	89.176	2.7M	1.4M	[paper] [tf-models]
MobileNetV2(alpha=1.0)	224	71.336	90.142	3.5M	2.3M	[paper] [tf-models]
MobileNetV2(alpha=1.3)	224	74.680	92.122	5.4M	3.8M	[paper] [tf-models]
MobileNetV2(alpha=1.4)	224	75.230	92.422	6.2M	4.4M	[paper] [tf-models]
MobileNetV3(small)	224	68.076	87.800	2.6M	0.9M	[paper] [tf-models]
MobileNetV3(large)	224	75.556	92.708	5.5M	3.0M	[paper] [tf-models]
DenseNet121	224	74.972	92.258	8.1M	7.0M	[paper] [torch]
DenseNet169	224	76.176	93.176	14.3M	12.6M	[paper] [torch]
DenseNet201	224	77.320	93.620	20.2M	18.3M	[paper] [torch]
NASNetLarge	331	82.498	96.004	93.5M	84.9M	[paper] [tf-models]
NASNetMobile	224	74.366	91.854	7.7M	4.3M	[paper] [tf-models]
EfficientNet-B0	224	77.190	93.492	5.3M	4.0M	[paper] [tf-tpu]
EfficientNet-B1	240	79.134	94.448	7.9M	6.6M	[paper] [tf-tpu]
EfficientNet-B2	260	80.180	94.946	9.2M	7.8M	[paper] [tf-tpu]
EfficientNet-B3	300	81.578	95.676	12.3M	10.8M	[paper] [tf-tpu]
EfficientNet-B4	380	82.960	96.260	19.5M	17.7M	[paper] [tf-tpu]
EfficientNet-B5	456	83.702	96.710	30.6M	28.5M	[paper] [tf-tpu]
EfficientNet-B6	528	84.082	96.898	43.3M	41.0M	[paper] [tf-tpu]
EfficientNet-B7	600	84.430	96.840	66.7M	64.1M	[paper] [tf-tpu]

Note that the table above is a snapshot from 2020/12/12. For the current version, see:
https://github.com/keras-team/keras-applications

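We happen to be working on an image recognition project, so I picked one of these networks more or less at random to play with.
As the title says, we went with InceptionResNetV2.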

InceptionResNetV2

I still could not find official usage documentation or a tutorial, but I did find its source code.
Source code:
https://github.com/keras-team/keras-applications/blob/master/keras_applications/inception_resnet_v2.py

After consulting a few other blogs, I got a rough idea of how the model is used.

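The model ships with Keras, so there is no need to rebuild its architecture by hand; you just load it.
Training the network produces a weights file, which holds what the model has learned.
After training, you load that weights file whenever you use the model.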
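A minimal sketch of that train, save, reload cycle, assuming the `tf.keras.applications` API. The class count, the small input shape, and the file name `meme.weights.h5` are made up for illustration and are not taken from my Train.py. No real training happens here; the point is only that the inference side rebuilds the same architecture and then loads the file.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

NUM_CLASSES = 3            # hypothetical number of image categories
INPUT_SHAPE = (96, 96, 3)  # hypothetical; kept small so the sketch builds quickly


def build_model():
    """Build the same architecture for both training and inference."""
    return tf.keras.applications.InceptionResNetV2(
        include_top=True,
        weights=None,  # random initialisation, nothing is downloaded
        input_shape=INPUT_SHAPE,
        classes=NUM_CLASSES,
    )


# Training side: build, train, then write the learned weights to disk.
train_model = build_model()
# train_model.compile(...) and train_model.fit(...) would go here.
weights_path = os.path.join(tempfile.mkdtemp(), "meme.weights.h5")
train_model.save_weights(weights_path)

# Inference side: rebuild the identical architecture, then load the weights.
infer_model = build_model()
infer_model.load_weights(weights_path)

# Every weight tensor of the reloaded model matches the saved one.
weights_match = all(
    np.array_equal(a, b)
    for a, b in zip(train_model.get_weights(), infer_model.get_weights())
)
print(weights_match)
```

The `.weights.h5` suffix is chosen because newer Keras releases insist on it for `save_weights`, while older ones accept it as an ordinary HDF5 file name.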

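Using the model involves the following steps:
[Flowchart of the steps, drawn in Visio; image not reproduced here]
(Yes, really: even I ended up drawing a Visio diagram myself one day.)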

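A few things to watch out for:
The input image shape should ideally be (299, 299, 3) or (3, 299, 299).
Other resolutions also work. There is no need to go very high, and somewhat below 299 is fine too. In my tests on memes, (256, 256, 3) actually did slightly better than (299, 299, 3), though both worked well.
If you want to change the resolution, pass the shape to the model constructor InceptionResNetV2(), and make sure the shape used at training time matches the shape used at inference time! The shape you resize images to when reading them has to change accordingly!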
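One way to keep those shapes in sync is to define the shape once and derive the image loading from it. The sketch below is only an illustration: `INPUT_SHAPE` and `prepare_image` are hypothetical names, the random array stands in for a decoded photo, and using the library's `preprocess_input` for scaling is my assumption here, not necessarily what Train.py and Valid.py do.

```python
import numpy as np
import tensorflow as tf

# Single source of truth, imported by both the training and the inference script.
INPUT_SHAPE = (256, 256, 3)  # hypothetical setting, channels_last


def prepare_image(image, input_shape=INPUT_SHAPE):
    """Resize an HxWx3 image to the model input shape and add a batch axis."""
    height, width, _ = input_shape
    # Bilinear resize; the result is float32 regardless of the input dtype.
    resized = tf.image.resize(image, (height, width)).numpy()
    # Scale pixel values from [0, 255] to [-1, 1], as this model family expects.
    scaled = tf.keras.applications.inception_resnet_v2.preprocess_input(resized)
    return np.expand_dims(scaled, axis=0)


# Stand-in for a decoded photo of arbitrary size.
fake_photo = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
batch = prepare_image(fake_photo)
print(batch.shape)  # (1, 256, 256, 3)
```

The same `INPUT_SHAPE` would then be passed as `input_shape` to the constructor on both the training and the inference side.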
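That said, my Train.py and Valid.py both already mark the places you may need to configure.
For example, in the Train.py at the end of this article, change the [snippet missing from the source]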

The constructor's signature and official docstring:

def InceptionResNetV2(include_top=True,
                      weights='imagenet',
                      input_tensor=None,
                      input_shape=None,
                      pooling=None,
                      classes=1000,
                      **kwargs):
    """Instantiates the Inception-ResNet v2 architecture.
    Optionally loads weights pre-trained on ImageNet.
    Note that the data format convention used by the model is
    the one specified in your Keras config at `~/.keras/keras.json`.
    # Arguments
        include_top: whether to include the fully-connected
            layer at the top of the network.
        weights: one of `None` (random initialization),
              'imagenet' (pre-training on ImageNet),
              or the path to the weights file to be loaded.
        input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
            to use as image input for the model.
        input_shape: optional shape tuple, only to be specified
            if `include_top` is `False` (otherwise the input shape
            has to be `(299, 299, 3)` (with `'channels_last'` data format)
            or `(3, 299, 299)` (with `'channels_first'` data format).
            It should have exactly 3 inputs channels,
            and width and height should be no smaller than 75.
            E.g. `(150, 150, 3)` would be one valid value.
        pooling: Optional pooling mode for feature extraction
            when `include_top` is `False`.
            - `None` means that the output of the model will be
                the 4D tensor output of the last convolutional block.
            - `'avg'` means that global average pooling
                will be applied to the output of the
                last convolutional block, and thus
                the output of the model will be a 2D tensor.
            - `'max'` means that global max pooling will be applied.
        classes: optional number of classes to classify images
            into, only to be specified if `include_top` is `True`, and
            if no `weights` argument is specified.
    # Returns
        A Keras `Model` instance.
    # Raises
        ValueError: in case of invalid argument for `weights`,
            or invalid input shape.
    """

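I won't go through the code in detail; I'll just post it. If something is unclear, see the earlier posts.
The official source is not long anyway, and it is enough to roughly see which parameters it expects.
Even so, TensorFlow updates so often that within a few months my demo will probably stop running, and you will have to fix it yourself.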
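To make the parameters in the docstring above a bit more concrete, here is a small sketch written against `tf.keras.applications`. It is an illustration, not part of my demo: it uses `weights=None` so nothing is downloaded, and the two input shapes are picked only to show the documented behaviour (the `pooling='avg'` output and the 75-pixel minimum).

```python
import tensorflow as tf

InceptionResNetV2 = tf.keras.applications.InceptionResNetV2

# Feature extractor: no classification layer, global average pooling on top.
extractor = InceptionResNetV2(
    include_top=False,
    weights=None,               # 'imagenet' would download pre-trained weights
    input_shape=(150, 150, 3),  # the docstring's own example of a valid shape
    pooling="avg",
)
print(extractor.output_shape)  # (None, 1536): a 2D tensor of (batch, features)

# Width and height below 75 are rejected with a ValueError.
try:
    InceptionResNetV2(include_top=False, weights=None, input_shape=(64, 64, 3))
    too_small_rejected = False
except ValueError:
    too_small_rejected = True
print(too_small_rejected)  # True
```

With `pooling=None` the same call would return the 4D output of the last convolutional block instead.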

Code

https://github.com/h13-0/Tensorflow-Study-Note/tree/master/InceptionResNetV2/meme
Please download the entire repository before using it.

Results

Perfect!

Some Open Questions

Of course, I ran into a few questions along the way that I still have not resolved.

Problems when setting the number of image categories through classes

If I understand correctly, the constructor InceptionResNetV2 provides a classes parameter that sets the number of image categories the model has to distinguish.

classes: optional number of classes to classify images
    into, only to be specified if `include_top` is `True`, and
    if no `weights` argument is specified.

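To use a custom classes value, include_top must be set to True and weights to None.
But when I trained with my number of categories set directly through this parameter, the results were poor.
The source shows what it does: it adds a fully connected layer with softmax activation on top of the network, i.e. one more output layer.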

x = layers.Dense(classes, activation='softmax', name='predictions')(x)

But when I actually trained it this way, the results were poor (very poor). Adding one more layer by hand works better.
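For comparison, here is a sketch of the two routes side by side. It is one possible reading of "adding a layer by hand", not a copy of the code in the repository; the function names and the small input shape are hypothetical. One difference that may matter, though I have not verified that it explains the gap: the classes route requires weights=None, so the network trains from scratch, while the manual route uses include_top=False and can therefore also start from weights='imagenet'.

```python
import tensorflow as tf


def build_with_classes_argument(num_classes, input_shape=(299, 299, 3)):
    """Built-in route: the library appends Dense(classes, softmax) itself."""
    return tf.keras.applications.InceptionResNetV2(
        include_top=True,
        weights=None,  # required when classes differs from 1000
        classes=num_classes,
        input_shape=input_shape,
    )


def build_with_manual_head(num_classes, input_shape=(299, 299, 3), weights=None):
    """Manual route: headless backbone plus a hand-written output layer."""
    backbone = tf.keras.applications.InceptionResNetV2(
        include_top=False,
        weights=weights,  # pass 'imagenet' to start from pre-trained weights
        input_shape=input_shape,
        pooling="avg",    # global average pooling gives a 2D feature tensor
    )
    outputs = tf.keras.layers.Dense(
        num_classes, activation="softmax", name="predictions"
    )(backbone.output)
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)


# Small shape and random weights keep this sketch quick and offline.
model = build_with_manual_head(num_classes=3, input_shape=(96, 96, 3))
model.compile(
    optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]
)
print(model.output_shape)  # (None, 3)
```

Both functions end in the same kind of softmax layer, so any difference in results would have to come from the starting weights or from extra layers placed before the output.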

2020-02-22
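Update:
In real use I found that InceptionResNetV2 performs poorly on slightly more complex datasets, even worse than MobileNetV2. It is also slow and its weights file is large. For now I recommend Xception.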
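Since all of these networks live in `tf.keras.applications` and take the same constructor arguments, switching between them can be a one-line change. A rough sketch, with a hypothetical `BACKBONES` table and `build_classifier` helper, random weights, and a small input shape chosen only so it builds quickly:

```python
import tensorflow as tf

# Hypothetical lookup table; each entry is a real tf.keras.applications constructor.
BACKBONES = {
    "inception_resnet_v2": tf.keras.applications.InceptionResNetV2,
    "xception": tf.keras.applications.Xception,
    "mobilenet_v2": tf.keras.applications.MobileNetV2,
}


def build_classifier(backbone_name, num_classes, input_shape, weights=None):
    """Attach the same softmax head to whichever backbone is selected."""
    backbone = BACKBONES[backbone_name](
        include_top=False,
        weights=weights,  # pass 'imagenet' to start from pre-trained weights
        input_shape=input_shape,
        pooling="avg",
    )
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(
        backbone.output
    )
    return tf.keras.Model(inputs=backbone.input, outputs=outputs)


model = build_classifier("xception", num_classes=3, input_shape=(96, 96, 3))
print(model.output_shape)  # (None, 3)
```

Each network has its own `preprocess_input` function in its submodule, so the image loading side may need to change together with the backbone name.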

The benchmark figures on the Keras website seem to be a bit more accurate than the ones on GitHub?
https://keras.io/zh/applications/#%E6%A8%A1%E5%9E%8B%E6%A6%82%E8%A7%88

Also, the image preprocessing strategy and the training script will be covered in the next post:
http://www.h13studio.com/基于TensorFlow的keras-applications图像分类技术的一般使用方式/