A Detailed Look at Video Action Recognition Models, with Code Practice

This article is shared from the Huawei Cloud community post "Video Action Recognition", by HWCloudAI.

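Lab objectives

After working through this case you should:

Know how to train and run inference with the C3D model, and how to run inference with the I3D model.

Notes

This case recommends TensorFlow-1.13.1 and must run on a GPU; see the "ModelArts JupyterLab Hardware Specification Guide" to learn how to switch hardware specifications.
If this is your first time using JupyterLab, see the "ModelArts JupyterLab User Guide" to learn how to use it.
If you run into errors while using JupyterLab, see "ModelArts JupyterLab FAQ and Solutions" and try to resolve them.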

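Lab procedure

Case overview

Video action recognition analyzes a short video clip and determines which action the people in it are performing. It is related to image recognition but differs from it: image recognition classifies a single static picture, whereas video action recognition must consider not only the static content of each frame but also the spatiotemporal relationships between frames. For example, from a single picture of a person holding a half-open door, you cannot tell whether the door is being opened or closed.

Compared with image analysis, research on video analysis has a shorter history and is harder. The main difficulty is that it needs substantial computing resources: a video has to be decomposed into images for analysis, so the amount of data the model handles is very large. The temporal order of actions is also an essential factor, so the frames have to be linked through their temporal relationships before a decision is made. The model therefore has to take time into account, and adding a time dimension greatly increases the number of parameters.

Thanks to public datasets such as PASCAL VOC, ImageNet, and MS COCO, the image domain has produced many classic models. Are there classic models for video analysis too? There are, and this case introduces the classic models for video action recognition and puts them into practice with code.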

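1. Prepare the source code and data

This step prepares the source code and data the case needs. The resources are stored in OBS; we download them locally through the ModelArts SDK and extract them into the current directory. After extraction, the current directory contains data, dataset_subset, and other directories and files, which hold the pretrained parameter files, the dataset, the code files, and so on.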

import os
import moxing as mox
if not os.path.exists('videos'):
    mox.file.copy("obs://ai-course-common-26-bj4-v2/video/video.tar.gz", "./video.tar.gz")
    # Extract the archive with the tar command
    os.system("tar xf ./video.tar.gz")
    # Delete the archive with the rm command
    os.system("rm ./video.tar.gz")
INFO:root:Using MoXing-v1.17.3-
INFO:root:Using OBS-Python-SDK-3.20.7

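The previous lesson introduced three commonly used datasets for video action recognition: HMDB51, UCF-101, and Kinetics. This case uses a subset of UCF-101 as its demo dataset. Next, we play a video from UCF-101: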

video_name = "./data/v_TaiChi_g01_c01.avi"
from IPython.display import clear_output, Image, display, HTML
import time
import cv2
import base64
import numpy as np
def arrayShow(img):
    # Encode the frame (BGR, as read by OpenCV) as JPEG so the notebook can display it
    _,ret = cv2.imencode('.jpg', img) 
    return Image(data=ret) 
cap = cv2.VideoCapture(video_name)
while True:
    try:
        clear_output(wait=True)
        ret, frame = cap.read()
        if ret:
            img = arrayShow(frame)
            display(img)
            time.sleep(0.05)
        else:
            break
    except KeyboardInterrupt:
        # Stop playback when the cell is interrupted
        break
cap.release()

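2. Video action recognition models

In the image domain, ImageNet is a large-scale image recognition dataset. Since 2010, a steady stream of image algorithms has been trained on it, and deep learning models have evolved from AlexNet to VGG-16 and on to more complex architectures, with performance improving along the way. Their error rates on the 1,000-class recognition task are as follows:

Models that do well at image recognition can be reused for other image tasks: reusing the parameters of some of their layers improves training results. With ImageNet-based image models available, many models and tasks gained a better starting point for training, for example object detection, instance segmentation, face detection, and face recognition.

Can image models that train so well also be used to train video models? Yes. Research has shown that reusing an image model's architecture, or even its parameters, greatly helps the training of video models. But how can an image architecture be reused? First we need to see how video classification differs from image classification. If a video is viewed as a set of images, with each frame treated as one image, then a video classifier has to consider not only what appears in each image but also the spatiotemporal relationships between images before it can classify the action.

To capture those spatiotemporal relationships, the I3D paper reviews three earlier video classification models and proposes a more effective one, Two-Stream Inflated 3D ConvNets (I3D for short). The four models are introduced briefly below; see the original paper for details.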

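Old model 1: ConvNet + LSTM

This model builds on a mature, trained image model. A convolutional network extracts features from each frame, pools them, and makes a prediction, and an LSTM (long short-term memory) layer is added at the end of the model, as shown in the figure below. This lets the model take temporal structure into account and link contextual features to decide on the action. Its drawbacks are that it captures only coarse, high-level changes and recognizes small motions poorly, and that training takes a long time because every frame of the video has to pass through the network.

Old model 2: 3D convolutional network

3D convolution is like 2D convolution, with temporal information added to the convolution operation. Although this looks like a more natural way to process video, the extra kernel dimension increases the number of parameters and makes the model harder to train. This model does not reuse an image model; video data is fed directly into the 3D convolutional network for training.

Old model 3: Two-Stream network

The two streams of a Two-Stream network are a single RGB frame and a stack of 10 precomputed optical-flow frames. Both streams pass through image convolutional networks pretrained on ImageNet. Optical flow has two channels, horizontal and vertical, so the flow stream has twice as many input channels as it has flow frames. The model performs very well in both training and testing.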

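Optical flow video

Optical flow was mentioned above, so here is a short introduction. The name sounds technical and unfamiliar, but we experience this visual phenomenon every day. On a high-speed train, the scenery outside the window rushes backward, and the faster the train goes, the more the scenery blurs into a streak. The apparent direction and speed of that visual motion is optical flow. Conceptually, optical flow is an observation of object motion: it finds correspondences between adjacent frames by looking for correlation between them, computes how objects move from one frame to the next, and yields the instantaneous velocity of each pixel. A raw video has moving parts and a static background. Usually only the state of the moving parts matters, and optical flow is precisely the computed motion information for those parts.

Below are an original video and the optical-flow video computed from it.

Original video

Optical-flow video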
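To make "finding correspondences between adjacent frames" concrete, here is a minimal sketch that recovers one global displacement between two synthetic frames by exhaustive block matching. It is only an illustration under simplifying assumptions: real dense optical flow estimates a separate motion vector for every pixel, and this is not the method used to produce the flow data in this case. The function name estimate_shift and the test frames are made up for the example.

```python
import numpy as np

def estimate_shift(prev, curr, max_shift=3):
    """Return the (dy, dx) shift that best aligns prev with curr.

    Exhaustive block matching: try every integer shift in
    [-max_shift, max_shift] and keep the one with the smallest
    mean squared difference on the interior region.
    """
    h, w = prev.shape
    m = max_shift
    best, best_err = (0, 0), np.inf
    # Compare only the interior so every candidate shift stays in bounds
    ref = prev[m:h - m, m:w - m]
    for dy in range(-m, m + 1):
        for dx in range(-m, m + 1):
            cand = curr[m + dy:h - m + dy, m + dx:w - m + dx]
            err = np.mean((cand - ref) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

# Synthetic "video": a bright square that moves 1 px down and 2 px right
prev = np.zeros((32, 32), dtype=np.float32)
prev[10:16, 8:14] = 1.0
curr = np.roll(prev, shift=(1, 2), axis=(0, 1))

dy, dx = estimate_shift(prev, curr)
print("estimated motion (dy, dx):", (dy, dx))
```

The static background contributes nothing to the error, so the estimate is driven entirely by the moving square, which matches the idea that optical flow describes the moving parts of a video.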

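New model: Two-Stream Inflated 3D ConvNets

The new model makes the following structural improvements:

Inflate 2D convolutions into 3D. A mature image classification architecture is used directly, except that every 2D $N \times N$ filter and pooling kernel in the network becomes a 3D $N \times N \times N$ one;
Initialize the 3D filters from the pretrained 2D filters. The previous step reuses the image classification network; this step makes its pretrained parameters usable as well. The weights of each 2D filter are repeated N times along the third (time) dimension, and all values are then divided by N;
Adjust the shape and size of the receptive field. The new model adapts the Inception-v1 image classification architecture: the first two max-pooling layers use $1 \times 3 \times 3$ kernels with stride 1 in time, all other max-pooling layers keep symmetric kernels and strides, and the final average-pooling layer uses a $2 \times 7 \times 7$ kernel.
Keep the basic Two-Stream approach. A two-stream structure is still effective for capturing the spatiotemporal relationships between images.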
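The repeat-and-rescale initialization in the list above can be checked on a toy filter. The sketch below uses NumPy only; inflate_filter and the two response helpers are hypothetical names for this example, not functions from the I3D code. It shows why the values are divided by N: on a "static video" made of one image repeated N times, the inflated 3D filter gives the same response as the original 2D filter.

```python
import numpy as np

def inflate_filter(w2d, n):
    """Inflate a 2D filter (k, k) into a 3D filter (n, k, k).

    The 2D weights are repeated n times along a new time axis and
    rescaled by 1/n.
    """
    return np.repeat(w2d[np.newaxis, :, :], n, axis=0) / n

def response_2d(patch, w2d):
    # Response of one 2D filter at a single position
    return float(np.sum(patch * w2d))

def response_3d(clip, w3d):
    # Response of one 3D filter at a single position
    return float(np.sum(clip * w3d))

rng = np.random.default_rng(0)
w2d = rng.normal(size=(3, 3))
patch = rng.normal(size=(3, 3))

n = 3
w3d = inflate_filter(w2d, n)
# A "static video": the same image patch repeated n times along time
static_clip = np.repeat(patch[np.newaxis], n, axis=0)

print(w3d.shape)
print(np.isclose(response_2d(patch, w2d), response_3d(static_clip, w3d)))
```

Because the responses agree on static input, the inflated network starts from the behaviour the image model already learned and only has to learn the temporal part during video training.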

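The overall structure of the final model is shown in the figure below:

So far we have covered the classic datasets and classic models for video action recognition. Next we run two of these models in code: the C3D model (a 3D convolutional network) and the I3D model (Two-Stream Inflated 3D ConvNets).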

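C3D model structure

As explained under "Old model 2: 3D convolutional network", a 3D convolutional network is a fairly natural way to process video. Its accuracy is limited and its computation cost is high, but its structure is simple, and a very simple network is enough to recognize video actions. The figure below illustrates 3D convolution:

In a), 2D convolution is applied to a single image. In b), 2D convolution is applied to a video, treating the frames as channels. In c), 3D convolution is applied to a video, so temporal information is part of the input signal.

In a) and b) the output is a single 2D feature map, so whether or not the input carries temporal information, 2D convolution loses it. Only 3D convolution preserves temporal information in its output. The same holds for 2D versus 3D pooling.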
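The difference between cases b) and c) shows up directly in the output shapes. The following is a naive NumPy sketch (valid padding, stride 1, one filter, one channel), written for clarity rather than speed; the function names are made up for this example and this is not how keras implements Conv3D.

```python
import numpy as np

def conv2d_over_frames(video, kernel):
    """2D convolution that treats the L frames as input channels.

    video: (L, H, W), kernel: (L, k, k).
    Every output pixel sums over all frames, so the result is one
    2D map of shape (H-k+1, W-k+1) and the time axis is gone.
    """
    L, H, W = video.shape
    k = kernel.shape[-1]
    out = np.zeros((H - k + 1, W - k + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(video[:, y:y + k, x:x + k] * kernel)
    return out

def conv3d(video, kernel):
    """3D convolution: the kernel also slides along time.

    video: (L, H, W), kernel: (d, k, k) with d <= L.
    The output keeps a time axis of length L-d+1.
    """
    L, H, W = video.shape
    d, k, _ = kernel.shape
    out = np.zeros((L - d + 1, H - k + 1, W - k + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[t, y, x] = np.sum(video[t:t + d, y:y + k, x:x + k] * kernel)
    return out

video = np.random.default_rng(0).normal(size=(16, 8, 8))  # 16 frames of 8x8
out_2d = conv2d_over_frames(video, np.ones((16, 3, 3)))
out_3d = conv3d(video, np.ones((3, 3, 3)))
print(out_2d.shape)  # (6, 6)
print(out_3d.shape)  # (14, 6, 6)
```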

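The figure below shows a variant of the C3D network (for the original description, see Section 2.2 of the I3D paper):

The C3D architecture has 8 convolutional layers, 5 max-pooling layers, and 2 fully connected layers, followed by a softmax output layer.

All 3D convolution kernels are $3 \times 3 \times 3$ with stride 1. Training uses SGD with an initial learning rate of 0.003, which is divided by 2 every 150k iterations. Optimization stops at 1.9M iterations, about 13 epochs.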
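The step schedule just described is easy to write down. This is a small sketch of that rule only (initial rate 0.003, halved every 150k iterations); the function name is hypothetical, and it is unrelated to the onetenth_4_8_12 callback used later in this case, whose contents are not shown in the article.

```python
def c3d_learning_rate(iteration, base_lr=0.003, step=150_000, factor=0.5):
    """Step schedule: multiply the rate by `factor` once every `step` iterations."""
    return base_lr * factor ** (iteration // step)

for it in (0, 149_999, 150_000, 300_000, 1_900_000):
    print(it, c3d_learning_rate(it))
```

With these numbers the rate has been halved 12 times by the end of training at 1.9M iterations.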

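For data processing, the sampled video frames have size $c \times l \times h \times w$, where $c$ is the number of channels, $l$ the number of frames, $h$ the frame height, and $w$ the frame width. 3D convolution and pooling kernels have size $d \times k \times k$, where $d$ is the temporal depth of the kernel and $k$ its spatial size. The network takes sampled video frames as input and predicts a class label. All frames are resized to $128 \times 171$, roughly half the resolution of the frames in UCF-101. Each video is split into non-overlapping 16-frame clips, which serve as the network input. Finally the frames are cropped, so the input data has size $16 \times 112 \times 112$.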
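The clip preparation described above can be sketched with NumPy on a placeholder array. Two things here are assumptions for the example: the array is channels-last, (frames, height, width, channels), which is the layout OpenCV produces, rather than the $c \times l \times h \times w$ order used in the text, and the crop is an exact center crop, whereas the code later in this case uses fixed offsets of 8 and 30.

```python
import numpy as np

def split_into_clips(frames, clip_len=16):
    """Split (num_frames, H, W, C) into non-overlapping clips of clip_len frames.

    Trailing frames that do not fill a whole clip are dropped.
    """
    num_clips = frames.shape[0] // clip_len
    usable = frames[:num_clips * clip_len]
    return usable.reshape(num_clips, clip_len, *frames.shape[1:])

def center_crop(clips, size=112):
    """Crop the central size x size window from every frame."""
    h, w = clips.shape[2:4]
    top, left = (h - size) // 2, (w - size) // 2
    return clips[:, :, top:top + size, left:left + size, :]

# Stand-in for a decoded video already resized to 128 x 171 (height x width)
frames = np.zeros((50, 128, 171, 3), dtype=np.float32)
clips = center_crop(split_into_clips(frames))
print(clips.shape)  # (3, 16, 112, 112, 3)
```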

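3. Training the C3D model

Next we train the C3D model. Training consists of data preprocessing and model training. The dataset is UCF-101. Because the C3D model takes individual video frames as input, we first extract frames from the videos, that is, convert each video into images, and then feed the image data to the model for training.

In this case we randomly sampled part of UCF-101 to demonstrate training; interested readers can download the full UCF-101 dataset and train on it.

UCF-101 download

The dataset is stored in the directory dataset_subset.

The following code uses the cv2 library to convert the video files into image files.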

import cv2
import os
# Location of the video dataset
video_path = './dataset_subset/'
# Location where the generated image dataset is stored
save_path = './dataset/'
# Create the output directory if it does not exist
if not os.path.exists(save_path):
    os.mkdir(save_path)
# Get the list of actions
action_list = os.listdir(video_path)
# Iterate over all actions
for action in action_list:
    if not action.startswith("."):
        if not os.path.exists(save_path+action):
            os.mkdir(save_path+action)
        video_list = os.listdir(video_path+action)
        # Iterate over all videos
        for video in video_list:
            prefix = video.split('.')[0]
            if not os.path.exists(os.path.join(save_path, action, prefix)):
                os.mkdir(os.path.join(save_path, action, prefix))
            save_name = os.path.join(save_path, action, prefix) + '/'
            video_name = video_path+action+'/'+video
            # Open the video file; cap is the capture object used to read frames
            cap = cv2.VideoCapture(video_name)
            # Total number of frames in the video (not the frame rate)
            frame_total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            frame_count = 0
            for i in range(frame_total):
                ret, frame = cap.read()
                if ret:
                    # Write the frame to an image file
                    cv2.imwrite(save_name+str(10000+frame_count)+'.jpg',frame)
                    frame_count += 1
            # Release the capture before moving on to the next video
            cap.release()

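At this point the images converted frame by frame from the videos have been stored and are ready for model training.

4. Model training

First, we build the model structure.

The C3D model structure was introduced earlier. Here we build it with Conv3D, MaxPool3D, ZeroPadding3D, and the other layers provided by keras.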

from keras.layers import Dense,Dropout,Conv3D,Input,MaxPool3D,Flatten,Activation, ZeroPadding3D
from keras.regularizers import l2
from keras.models import Model, Sequential
# Input: 112x112 images, 16 frames, 3 channels
input_shape = (112,112,16,3)
# Weight decay rate
weight_decay = 0.005
# Number of classes; the dataset is UCF-101, so 101
nb_classes = 101
# Build the model structure
inputs = Input(input_shape)
x = Conv3D(64,(3,3,3),strides=(1,1,1),padding='same',
           activation='relu',kernel_regularizer=l2(weight_decay))(inputs)
x = MaxPool3D((2,2,1),strides=(2,2,1),padding='same')(x)
x = Conv3D(128,(3,3,3),strides=(1,1,1),padding='same',
           activation='relu',kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2,2,2),strides=(2,2,2),padding='same')(x)
x = Conv3D(128,(3,3,3),strides=(1,1,1),padding='same',
           activation='relu',kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2,2,2),strides=(2,2,2),padding='same')(x)
x = Conv3D(256,(3,3,3),strides=(1,1,1),padding='same',
           activation='relu',kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2,2,2),strides=(2,2,2),padding='same')(x)
x = Conv3D(256, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu',kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)
x = Flatten()(x)
x = Dense(2048,activation='relu',kernel_regularizer=l2(weight_decay))(x)
x = Dropout(0.5)(x)
x = Dense(2048,activation='relu',kernel_regularizer=l2(weight_decay))(x)
x = Dropout(0.5)(x)
x = Dense(nb_classes,kernel_regularizer=l2(weight_decay))(x)
x = Activation('softmax')(x)
model = Model(inputs, x)
Using TensorFlow backend.
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.

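The summary() method provided by keras prints the model structure, showing how the layers are built and the input and output of each layer.

model.summary()

The output is long and omitted here.

The model's input attribute shows the input shape, which is (batch size, height, width, frames, channels).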

model.input
<tf.Tensor 'input_1:0' shape=(?, 112, 112, 16, 3) dtype=float32>

Compared with image models, the data has an extra frames dimension, which reflects the role temporal relationships play in video analysis.
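The data-loading code later in this case builds each batch frames-first, as (batch, frames, height, width, channels), and then calls np.transpose with the axis order (0, 2, 3, 1, 4) to match the model input. A minimal sketch of what that reordering does, on a placeholder array with one marked value:

```python
import numpy as np

# A batch as assembled clip by clip: (batch, frames, height, width, channels)
batch = np.zeros((4, 16, 112, 112, 3), dtype=np.float32)
batch[0, 5, 10, 20, 1] = 7.0  # mark one value: frame 5, row 10, column 20, channel 1

# Move the frame axis behind the two spatial axes to match the model input
model_input = np.transpose(batch, (0, 2, 3, 1, 4))
print(model_input.shape)               # (4, 112, 112, 16, 3)
print(model_input[0, 10, 20, 5, 1])    # the marked value, now indexed row, column, frame
```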

Next, we convert the image files into the data format needed for training.

# Import the required libraries
from keras.optimizers import SGD,Adam
from keras.utils import np_utils
import numpy as np
import random
import cv2
import matplotlib.pyplot as plt
# Custom callbacks
from schedules import onetenth_4_8_12
INFO:matplotlib.font_manager:font search path ['/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/ttf', '/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/afm', '/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts']
INFO:matplotlib.font_manager:generated new fontManager

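Parameter definitions

img_path = save_path  # where the image files are stored
results_path = './results'  # where training results are saved
if not os.path.exists(results_path):
    os.mkdir(results_path)

Split the dataset: randomly draw 4/5 of the videos as the training set and use the rest as the validation set. The file names are stored in train_list and test_list in preparation for training.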

cates = os.listdir(img_path)
train_list = []
test_list = []
# Iterate over all action classes
for cate in cates:
    videos = os.listdir(os.path.join(img_path, cate))
    length = len(videos)//5
    # Training set size; randomly pick videos for the training set
    train= random.sample(videos, length*4)
    train_list.extend(train)
    # Add the remaining videos to the validation set
    for video in videos:
        if video not in train:
            test_list.append(video)
print("Training set:")    
print( train_list)
print("%d videos in total\n"%(len(train_list)))
print("Validation set:")            
print(test_list)
print("%d videos in total"%(len(test_list)))

The output is long and omitted here.
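The per-class 4/5 split above can be tried without any files on disk. The sketch below applies the same rule to an in-memory dictionary and adds a fixed random seed so the split is reproducible. The function name split_videos, the seed parameter, and the demo video names are assumptions for this example.

```python
import random

def split_videos(videos_by_class, seed=0):
    """Split each class 4/5 train, rest validation, with a reproducible shuffle."""
    rng = random.Random(seed)
    train, val = [], []
    for name in sorted(videos_by_class):
        videos = sorted(videos_by_class[name])
        # Same rule as the article: (number of videos // 5) * 4 go to training
        n_train = (len(videos) // 5) * 4
        picked = set(rng.sample(videos, n_train))
        train.extend(v for v in videos if v in picked)
        val.extend(v for v in videos if v not in picked)
    return train, val

demo = {
    "TaiChi": ["v_TaiChi_g01_c0%d" % i for i in range(1, 6)],
    "Punch": ["v_Punch_g01_c0%d" % i for i in range(1, 6)],
}
train, val = split_videos(demo)
print(len(train), len(val))  # 8 2
```

Splitting inside each class keeps every action represented in both sets, and using a set for the picked names avoids a linear scan per video.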

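Next we train the model.

First, define how the data is read. The process_data function reads one batch of data: the image data for 16 frames per sample, together with the labels. When reading images for training, it applies random cropping and flipping for data augmentation.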

def read_classes(path="./classInd.txt"):
    # Map each class name to its 1-based id as listed in classInd.txt
    classes = {}
    with open(path, "r") as f:
        for line in f:
            c_id, c_name = line.split()[:2]
            classes[c_name] = int(c_id)
    return classes
def process_data(img_path, file_list,batch_size=16,train=True):
    batch = np.zeros((batch_size,16,112,112,3),dtype='float32')
    labels = np.zeros(batch_size,dtype='int')
    classes_dict = read_classes()
    for i in range(batch_size):
        # Pick one video per batch slot; directory names look like v_<Class>_gXX_cXX
        file = random.choice(file_list)
        cate = file.split("_")[1]
        path = os.path.join(img_path, cate, file)
        img_list = os.listdir(path)
        img_list.sort()
        # First frame of the 16-frame window, kept inside the video
        symbol = min(len(img_list)//16, max(len(img_list)-16, 0))
        if train:
            # Random crop offsets (rows, then columns)
            crop_x = random.randint(0, 15)
            crop_y = random.randint(0, 58)
            # Random horizontal flip
            is_flip = random.randint(0, 1)
        else:
            # Fixed crop and no flip for validation
            crop_x, crop_y, is_flip = 8, 30, 0
        # Read 16 consecutive frames
        for j in range(min(16, len(img_list))):
            image = cv2.imread(os.path.join(path, img_list[symbol + j]))
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            image = cv2.resize(image, (171, 128))
            if is_flip == 1:
                image = cv2.flip(image, 1)
            batch[i, j] = image[crop_x:crop_x + 112, crop_y:crop_y + 112, :]
        labels[i] = classes_dict[cate] - 1
    return batch, labels
batch, labels = process_data(img_path, train_list)
print("Shape of each batch: %s"%(str(batch.shape)))
print("Shape of the labels: %s"%(str(labels.shape)))
Shape of each batch: (16, 16, 112, 112, 3)
Shape of the labels: (16,)

Define the data generators, which feed batches of data to the training function.

def generator_train_batch(train_list, batch_size, num_classes, img_path):
    while True:
        # Read one batch of training data
        x_train, x_labels = process_data(img_path, train_list, batch_size=batch_size,train=True)
        x = preprocess(x_train)
        # Convert to the format the model input expects
        y = np_utils.to_categorical(np.array(x_labels), num_classes)
        x = np.transpose(x, (0,2,3,1,4))
        yield x, y
def generator_val_batch(test_list, batch_size, num_classes, img_path):
    while True:
        # Read one batch of validation data (from test_list, not train_list)
        y_test,y_labels = process_data(img_path, test_list, batch_size=batch_size,train=False)
        x = preprocess(y_test)
        # Convert to the format the model input expects
        x = np.transpose(x,(0,2,3,1,4))
        y = np_utils.to_categorical(np.array(y_labels), num_classes)
        yield x, y

Define the preprocess function, which standardizes the input image data.

def preprocess(inputs):
    inputs[..., 0] -= 99.9
    inputs[..., 1] -= 92.1
    inputs[..., 2] -= 82.6
    inputs[..., 0] /= 65.8
    inputs[..., 1] /= 62.3
    inputs[..., 2] /= 60.3
    return inputs
# Training one epoch takes about 4 minutes
# Number of classes
num_classes = 101
# Batch size
batch_size = 4
# Number of epochs
epochs = 1
# Learning rate
lr = 0.005
# Optimizer
sgd = SGD(lr=lr, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Start training
history = model.fit_generator(generator_train_batch(train_list, batch_size, num_classes,img_path),
                              steps_per_epoch= len(train_list) // batch_size,
                              epochs=epochs,
                              callbacks=[onetenth_4_8_12(lr)],
                              validation_data=generator_val_batch(test_list, batch_size,num_classes,img_path),
                              validation_steps= len(test_list) // batch_size,
                              verbose=1)
# Save the trained weights
model.save_weights(os.path.join(results_path, 'weights_c3d.h5'))
WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/1
20/20 [==============================] - 442s 22s/step - loss: 28.7099 - acc: 0.9344 - val_loss: 27.7600 - val_acc: 1.0000
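The preprocess function above standardizes each RGB channel by modifying its argument in place. A variant that leaves its input untouched can be written with NumPy broadcasting, as sketched below. The mean and standard deviation values are copied from preprocess; the name standardize and the uint8 test clip are made up for the example.

```python
import numpy as np

# Per-channel statistics copied from the article's preprocess function (RGB order)
MEAN = np.array([99.9, 92.1, 82.6], dtype=np.float32)
STD = np.array([65.8, 62.3, 60.3], dtype=np.float32)

def standardize(clips):
    """Return (clips - MEAN) / STD without modifying the input array.

    MEAN and STD broadcast over the last (channel) axis.
    """
    return (clips.astype(np.float32) - MEAN) / STD

clips = np.full((2, 16, 112, 112, 3), 128, dtype=np.uint8)
out = standardize(clips)
print(out.shape, out[0, 0, 0, 0])
```

Returning a new array avoids surprises when the same raw batch is reused, at the cost of one extra copy in memory.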

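5. Model testing

Next we test the trained model. We pick a video file from UCF-101 at random as test data and read its frames. For every 16 frames passed to the model we run one action prediction, draw the predicted action and its probability on the frame, and play the video.

First, import the required libraries.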

from IPython.display import clear_output, Image, display, HTML
import time
import cv2
import base64
import numpy as np

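Build the model structure and load the weights.

from models import c3d_model
model = c3d_model()
model.load_weights(os.path.join(results_path, 'weights_c3d.h5'), by_name=True)  # load the weights just trained

Define the arrayShow function, which encodes an image array so it can be displayed.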

def arrayShow(img):
    _,ret = cv2.imencode('.jpg', img) 
    return Image(data=ret)

Preprocess the video, run prediction, draw the result on each frame, and play the video.

# Load all class names and ids
with open('./ucfTrainTestlist/classInd.txt', 'r') as f:
    class_names = f.readlines()
# Read the video file
video = './videos/v_Punch_g03_c01.avi'
cap = cv2.VideoCapture(video)
clip = []
# Feed the video frames to the model
while True:
    try:
        clear_output(wait=True)
        ret, frame = cap.read()
        if ret:
            tmp = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            clip.append(cv2.resize(tmp, (171, 128)))
            # Predict once 16 frames are buffered
            if len(clip) == 16:
                inputs = np.array(clip).astype(np.float32)
                inputs = np.expand_dims(inputs, axis=0)
                # Same per-channel standardization as in training
                inputs[..., 0] -= 99.9
                inputs[..., 1] -= 92.1
                inputs[..., 2] -= 82.6
                inputs[..., 0] /= 65.8
                inputs[..., 1] /= 62.3
                inputs[..., 2] /= 60.3
                inputs = inputs[:,:,8:120,30:142,:]
                inputs = np.transpose(inputs, (0, 2, 3, 1, 4))
                # Get the prediction
                pred = model.predict(inputs)
                label = np.argmax(pred[0])
                # Draw the prediction on the frame
                cv2.putText(frame, class_names[label].split(' ')[-1].strip(), (20, 20),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6,
                            (0, 0, 255), 1)
                cv2.putText(frame, "prob: %.4f" % pred[0][label], (20, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6,
                            (0, 0, 255), 1)
                # Drop the oldest frame so the window slides forward by one frame
                clip.pop(0)
            # Play the annotated video
            lines, columns, _ = frame.shape
            frame = cv2.resize(frame, (int(columns), int(lines)))
            img = arrayShow(frame)
            display(img)
            time.sleep(0.02)
        else:
            break
    except KeyboardInterrupt:
        # Stop playback when the cell is interrupted
        break
cap.release()
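The playback loop above keeps the last 16 frames in a list and removes the oldest one with clip.pop(0) after each prediction, so the model sees a window that slides forward one frame at a time. The same buffering can be sketched with collections.deque, which drops the oldest item automatically once maxlen is reached. Frame indices stand in for decoded images here, and sliding_windows is a hypothetical helper name.

```python
from collections import deque

def sliding_windows(frames, window=16):
    """Yield the latest `window` frames each time a new frame arrives, once the buffer is full."""
    buffer = deque(maxlen=window)  # the oldest frame is dropped automatically
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == window:
            yield list(buffer)

# Frame indices stand in for decoded images
windows = list(sliding_windows(range(20)))
print(len(windows))                      # 5
print(windows[0][0], windows[0][-1])     # 0 15
print(windows[-1][0], windows[-1][-1])   # 4 19
```

A 20-frame video therefore produces 5 predictions, one for each frame from the 16th onward.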

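6. I3D model

Earlier we briefly introduced the I3D model. The official I3D GitHub repository provides models pretrained on Kinetics along with prediction code. Next we try out how the I3D model predicts on a video.

First, import the required packages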

import numpy as np
import tensorflow as tf
import i3d
WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Define the parameters

# Input image size
_IMAGE_SIZE = 224
# Number of video frames
_SAMPLE_VIDEO_FRAMES = 79
# The input has two parts: RGB and optical flow
# Both the RGB and the optical-flow data were computed in advance
_SAMPLE_PATHS = {
    'rgb': 'data/v_CricketShot_g04_c01_rgb.npy',
    'flow': 'data/v_CricketShot_g04_c01_flow.npy',
}
# Several pretrained checkpoints are available
# The imagenet variants were inflated from 2D ImageNet weights; the others were pretrained on video data
_CHECKPOINT_PATHS = {
    'rgb': 'data/checkpoints/rgb_scratch/model.ckpt',
    'flow': 'data/checkpoints/flow_scratch/model.ckpt',
    'rgb_imagenet': 'data/checkpoints/rgb_imagenet/model.ckpt',
    'flow_imagenet': 'data/checkpoints/flow_imagenet/model.ckpt',
}
# File listing the class names
_LABEL_MAP_PATH = 'data/label_map.txt'
# Number of classes: 400
NUM_CLASSES = 400

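Define the parameter:

imagenet_pretrained: if True, load the checkpoints inflated from ImageNet weights (rgb_imagenet and flow_imagenet); if False, load the checkpoints pretrained on video data (rgb and flow)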

imagenet_pretrained = True

Load the action classes

kinetics_classes = [x.strip() for x in open(_LABEL_MAP_PATH)]
tf.logging.set_verbosity(tf.logging.INFO)

Build the RGB model

rgb_input = tf.placeholder(tf.float32, shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 3))
with tf.variable_scope('RGB', reuse=tf.AUTO_REUSE):
    rgb_model = i3d.InceptionI3d(NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
    rgb_logits, _ = rgb_model(rgb_input, is_training=False, dropout_keep_prob=1.0)
rgb_variable_map = {}
for variable in tf.global_variables():
    if variable.name.split('/')[0] == 'RGB':
        rgb_variable_map[variable.name.replace(':0', '')] = variable
rgb_saver = tf.train.Saver(var_list=rgb_variable_map, reshape=True)

Build the optical-flow model

flow_input = tf.placeholder(tf.float32,shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 2))
with tf.variable_scope('Flow', reuse=tf.AUTO_REUSE):
    flow_model = i3d.InceptionI3d(NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
    flow_logits, _ = flow_model(flow_input, is_training=False, dropout_keep_prob=1.0)
flow_variable_map = {}
for variable in tf.global_variables():
    if variable.name.split('/')[0] == 'Flow':
        flow_variable_map[variable.name.replace(':0', '')] = variable
flow_saver = tf.train.Saver(var_list=flow_variable_map, reshape=True)

Combine the two streams into the complete I3D model

model_logits = rgb_logits + flow_logits
model_predictions = tf.nn.softmax(model_logits)
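The two lines above fuse the streams by adding their logits and applying one softmax to the sum. The toy NumPy sketch below shows the effect with made-up scores for a 3-class problem: the RGB stream alone prefers class 0, the flow stream is more confident about class 1, and the fused prediction follows the sum. The softmax helper and all numbers are illustrative only.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Made-up logits for 3 classes, one row per video
rgb_scores = np.array([[2.0, 1.0, 0.1]])
flow_scores = np.array([[0.5, 2.5, 0.2]])

fused = softmax(rgb_scores + flow_scores)
print(rgb_scores.argmax(axis=-1), flow_scores.argmax(axis=-1), fused.argmax(axis=-1))
print(fused.sum(axis=-1))
```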

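Run the model to get the action prediction for the video.
The input is the RGB and optical-flow data defined at the beginning of this section: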

with tf.Session() as sess:
    feed_dict = {}
    if imagenet_pretrained:
        rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb_imagenet'])    # load the RGB stream checkpoint
    else:
        rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb'])
    tf.logging.info('RGB checkpoint restored')
    if imagenet_pretrained:
        flow_saver.restore(sess, _CHECKPOINT_PATHS['flow_imagenet'])  # load the flow stream checkpoint
    else:
        flow_saver.restore(sess, _CHECKPOINT_PATHS['flow'])
    tf.logging.info('Flow checkpoint restored')   
    start_time = time.time()
    rgb_sample = np.load(_SAMPLE_PATHS['rgb'])    # load the RGB stream input
    tf.logging.info('RGB data loaded, shape=%s', str(rgb_sample.shape))
    feed_dict[rgb_input] = rgb_sample
    flow_sample = np.load(_SAMPLE_PATHS['flow'])  # load the flow stream input
    tf.logging.info('Flow data loaded, shape=%s', str(flow_sample.shape))
    feed_dict[flow_input] = flow_sample
    out_logits, out_predictions = sess.run(
        [model_logits, model_predictions],
        feed_dict=feed_dict)
    out_logits = out_logits[0]
    out_predictions = out_predictions[0]
    sorted_indices = np.argsort(out_predictions)[::-1]
    print('Inference time in sec: %.3f' % float(time.time() - start_time))
    print('Norm of logits: %f' % np.linalg.norm(out_logits))
    print('\nTop classes and probabilities')
    for index in sorted_indices[:20]:
        print(out_predictions[index], out_logits[index], kinetics_classes[index])
WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from data/checkpoints/rgb_imagenet/model.ckpt
INFO:tensorflow:RGB checkpoint restored
INFO:tensorflow:Restoring parameters from data/checkpoints/flow_imagenet/model.ckpt
INFO:tensorflow:Flow checkpoint restored
INFO:tensorflow:RGB data loaded, shape=(1, 79, 224, 224, 3)
INFO:tensorflow:Flow data loaded, shape=(1, 79, 224, 224, 2)
Inference time in sec: 1.511
Norm of logits: 138.468643
Top classes and probabilities
1.0 41.813675 playing cricket
1.497162e-09 21.49398 hurling (sport)
3.8431236e-10 20.13411 catching or throwing baseball
1.549242e-10 19.22559 catching or throwing softball
1.1360187e-10 18.915354 hitting baseball
8.801105e-11 18.660116 playing tennis
2.4415466e-11 17.37787 playing kickball
1.153184e-11 16.627766 playing squash or racquetball
6.1318893e-12 15.996157 shooting goal (soccer)
4.391727e-12 15.662376 hammer throw
2.2134352e-12 14.9772005 golf putting
1.6307096e-12 14.67167 throwing discus
1.5456218e-12 14.618079 javelin throw
7.6690325e-13 13.917259 pumping fist
5.1929587e-13 13.527372 shot put
4.2681337e-13 13.331245 celebrating
2.7205462e-13 12.880901 applauding
1.8357015e-13 12.487494 throwing ball
1.6134511e-13 12.358444 dodgeball
1.1388395e-13 12.010078 tap dancing
