AI in Practice 1: Real-Time Face Detection

1. Background

The most common AI applications in computer vision are face detection, face recognition, liveness detection, body and behavior analysis, image recognition, image enhancement, and so on. All of these are fairly mature technologies by now; whether you look at commercial PaaS platforms or open-source models, the options are plentiful. A typical AI development workflow has the following steps:

  1. Feature analysis
  2. Data collection
  3. Data annotation
  4. Model training
  5. Model inference

Inference can run in the cloud or on the client, and each has its own use cases; for example, face detection is usually placed on the client while face recognition runs in the cloud. This series focuses on the engineering practice of on-device model inference for computer vision.

2. Project Overview

We deploy and run a model provided by Google's open-source project mediapipe on the client for inference. mediapipe provides the following capabilities:

  1. Face Detection
  2. Face Mesh (3D face landmarks)
  3. Iris
  4. Hands
  5. Pose
  6. Holistic (full-body pose)
  7. Hair Segmentation
  8. Object Detection
  9. Box Tracking
  10. Instant Motion Tracking
  11. Objectron
  12. KNIFT

mediapipe ships Bazel build targets for its examples; for instance, bazel build -c opt --config=android_arm64 mediapipe/examples/android/src/java/com/google/mediapipe/apps/handtrackinggpu:handtrackinggpu builds a runnable hand-tracking demo. As our mobile development framework we build on the open-source project github.com/terryky/and… , which uses the NDK to run and benchmark the performance of the TensorFlow Lite GPU Delegate. The whole app is based on the NativeActivity framework: it captures camera frames, then renders both the preview and the performance metrics. In this article we get the real-time face detection model running end to end.

3. Understanding NativeActivity

NativeActivity is the base class Android provides for apps written purely in C/C++. Even a pure C++ Android app still needs a Java-layer shell: the Android framework ships a central class, already written in Java, and the native library we write in C++ can run precisely because this central class invokes it through JNI. That central class is NativeActivity. Its core job is to call the callbacks in our C++ native library when specific events occur. For example, in the familiar lifecycle method NativeActivity.onStart, it calls the native library's onStartNative function:

protected void onStart() {
    super.onStart();
    onStartNative(mNativeHandle);
}

On the native side, Android provides two interfaces:

  1. native_activity.h
  2. android_native_app_glue.h
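
If we use native_activity.h directly, we have to fill in a callbacks table ourselves inside ANativeActivity_onCreate, the entry point that NativeActivity resolves in our shared library. A minimal sketch (the log tag and callback body here are ours):

#include <android/native_activity.h>
#include <android/log.h>

static void on_start (ANativeActivity *activity)
{
    __android_log_print (ANDROID_LOG_INFO, "demo", "onStart");
}

/* entry point that NativeActivity looks up in our .so and calls on creation */
extern "C" void ANativeActivity_onCreate (ANativeActivity *activity,
                                          void *savedState, size_t savedStateSize)
{
    activity->callbacks->onStart = on_start;
    /* ... register the other lifecycle callbacks as needed ... */
}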

android_native_app_glue.h wraps native_activity.h; with the glue library we only need to implement the void android_main(struct android_app* state) function.
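
A minimal android_main on top of the glue library might look like the following sketch (real apps also install onAppCmd/onInputEvent handlers on the android_app state):

#include <android_native_app_glue.h>

void android_main (struct android_app *state)
{
    while (1)
    {
        int events;
        struct android_poll_source *source;

        /* block until an event arrives, then let the glue dispatch it */
        while (ALooper_pollAll (-1, NULL, &events, (void **)&source) >= 0)
        {
            if (source != NULL)
                source->process (state, source);

            if (state->destroyRequested)
                return;
        }
    }
}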

More details on NativeActivity can be found in the official Android documentation: GameActivity | Android Developers.

4. Running the Model

The model we use: storage.googleapis.com/mediapipe-a… The overall flow is:

  1. Load the model;
  2. Convert the camera preview texture to RGBA;
  3. Feed the image data into the model engine for inference;
  4. Parse and render the results.

4.1 Loading the Model

First we need to read the model file into memory. The model file sits under the Android project's asset directory; we load it into std::vector<uint8_t> m_tflite_model_buf:

bool
asset_read_file (AAssetManager *assetMgr, char *fname, std::vector<uint8_t> &buf)
{
    AAsset *assetDescriptor = AAssetManager_open (assetMgr, fname, AASSET_MODE_BUFFER);
    if (assetDescriptor == NULL)
    {
        return false;
    }

    /* size the buffer to the asset, then read it in one shot */
    size_t fileLength = AAsset_getLength (assetDescriptor);
    buf.resize (fileLength);

    int64_t readSize = AAsset_read (assetDescriptor, buf.data(), buf.size());
    AAsset_close (assetDescriptor);

    return (readSize == (int64_t)buf.size());
}

asset_read_file (m_app->activity->assetManager,
                 (char *)BLAZEFACE_MODEL_PATH, m_tflite_model_buf);

tflite provides FlatBufferModel::BuildFromBuffer to load the model; it returns a std::unique_ptr to a tflite::FlatBufferModel:

std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromBuffer ((const char *)m_tflite_model_buf.data(),
                                              m_tflite_model_buf.size());

Once the model is loaded, we create the inference interpreter, tflite::Interpreter, from it. tflite provides the InterpreterBuilder helper to construct a tflite::Interpreter:

class InterpreterBuilder {
 public:
  InterpreterBuilder(const FlatBufferModel& model,
                     const OpResolver& op_resolver);
  ...
};

It takes the model plus an OpResolver. OpResolver is an abstract interface that returns the TFLite registration for a given op code or custom op name; it is the mechanism by which ops referenced in the flatbuffer model are mapped to executable function pointers (TfLiteRegistrations). InterpreterBuilder overloads the call operator:

TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter);
TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter,
                          int num_threads);

With the InterpreterBuilder in hand, we create the tflite::Interpreter:

std::unique_ptr<tflite::FlatBufferModel> model;
std::unique_ptr<tflite::Interpreter>     interpreter;
tflite::ops::builtin::BuiltinOpResolver  resolver;

tflite::InterpreterBuilder (*model, resolver) (&interpreter);

As shown above, InterpreterBuilder's call operator has two overloads, the second of which takes a thread count; alternatively, we can set the thread count manually via tflite::Interpreter::SetNumThreads:

int num_threads = std::thread::hardware_concurrency();

char *env_tflite_num_threads = getenv ("FORCE_TFLITE_NUM_THREADS");
if (env_tflite_num_threads)
{
    num_threads = atoi (env_tflite_num_threads);
    DBG_LOGI ("@@@@@@ FORCE_TFLITE_NUM_THREADS=%d\n", num_threads);
}
DBG_LOG ("@@@@@@ TFLITE_NUM_THREADS=%d\n", num_threads);

interpreter->SetNumThreads (num_threads);

Next we allocate buffer space for all tensors; the declaration of AllocateTensors (and its header comment) spells out when it must be called:

  // Update allocations for all tensors. This will redim dependent tensors
  // using the input tensor dimensionality as given. This is relatively
  // expensive. This *must be* called after the interpreter has been created
  // and before running inference (and accessing tensor buffers), and *must be*
  // called again if (and only if) an input tensor is resized. Returns status of
  // success or failure.
  TfLiteStatus AllocateTensors();
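
In our flow this is a single call made right after the interpreter is built; a sketch using the same error-handling style as the rest of the code:

if (interpreter->AllocateTensors() != kTfLiteOk)
{
    DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
    return -1;
}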

Next we query the interpreter for the model's configuration, mainly its input and output tensors:

int
tflite_get_tensor_by_name (tflite::Interpreter *interpreter, int io, const char *name, tflite_tensor_t *ptensor)
{
    memset (ptensor, 0, sizeof (*ptensor));

    int tensor_idx = -1;
    int io_idx = -1;
    int num_tensor = (io == 0) ? interpreter->inputs ().size() :
                                 interpreter->outputs().size();

    /* search the input (io == 0) or output (io == 1) tensor list by name */
    for (int i = 0; i < num_tensor; i ++)
    {
        tensor_idx = (io == 0) ? interpreter->inputs ()[i] :
                                 interpreter->outputs()[i];
        const char *tensor_name = interpreter->tensor(tensor_idx)->name;
        if (strcmp (tensor_name, name) == 0)
        {
            io_idx = i;
            break;
        }
    }

    if (io_idx < 0)
    {
        DBG_LOGE ("can't find tensor: \"%s\"\n", name);
        return -1;
    }

    /* resolve the tensor's data pointer according to its element type */
    void *ptr = NULL;
    TfLiteTensor *tensor = interpreter->tensor(tensor_idx);
    switch (tensor->type)
    {
    case kTfLiteUInt8:
        ptr = (io == 0) ? interpreter->typed_input_tensor <uint8_t>(io_idx) :
                          interpreter->typed_output_tensor<uint8_t>(io_idx);
        break;
    case kTfLiteFloat32:
        ptr = (io == 0) ? interpreter->typed_input_tensor <float>(io_idx) :
                          interpreter->typed_output_tensor<float>(io_idx);
        break;
    case kTfLiteInt64:
        ptr = (io == 0) ? interpreter->typed_input_tensor <int64_t>(io_idx) :
                          interpreter->typed_output_tensor<int64_t>(io_idx);
        break;
    default:
        DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
        return -1;
    }

    ptensor->idx    = tensor_idx;
    ptensor->io     = io;
    ptensor->io_idx = io_idx;
    ptensor->type   = tensor->type;
    ptensor->ptr    = ptr;
    ptensor->quant_scale = tensor->params.scale;
    ptensor->quant_zerop = tensor->params.zero_point;
    for (int i = 0; (i < 4) && (i < tensor->dims->size); i ++)
    {
        ptensor->dims[i] = tensor->dims->data[i];
    }
    return 0;
}

static tflite_tensor_t      s_detect_tensor_input;
static tflite_tensor_t      s_detect_tensor_scores;
static tflite_tensor_t      s_detect_tensor_bboxes;

tflite_get_tensor_by_name (interpreter.get(), 0, "input",          &s_detect_tensor_input);
tflite_get_tensor_by_name (interpreter.get(), 1, "regressors",     &s_detect_tensor_bboxes);
tflite_get_tensor_by_name (interpreter.get(), 1, "classificators", &s_detect_tensor_scores);

From the model configuration we can read the input image width and height the model expects:

int det_input_w = s_detect_tensor_input.dims[2];
int det_input_h = s_detect_tensor_input.dims[1];

4.2 Converting the Camera Preview Texture to RGBA

The model can only consume RGBA pixel data, so the texture captured from the camera has to be converted into an in-memory RGBA buffer:

static unsigned char *pui8 = NULL;
unsigned char *buf_ui8 = NULL;

if (pui8 == NULL)
    pui8 = (unsigned char *)malloc (w * h * 4);   /* RGBA, 1 byte per channel */
buf_ui8 = pui8;

/* draw the camera texture into the framebuffer (flipped vertically) ... */
draw_2d_texture_ex (srctex, 0, win_h - h, w, h, RENDER2D_FLIP_V);

/* ... then read the pixels back into CPU memory */
glPixelStorei (GL_PACK_ALIGNMENT, 4);
glReadPixels (0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buf_ui8);

In other words, we first draw the camera texture into the framebuffer, then read it back into a memory buffer with the OpenGL call glReadPixels.

Note: glReadPixels is an expensive, blocking call.

4.3 Feeding the Image Data to the Inference Engine

First, use the input tensor s_detect_tensor_input obtained above to get the input buffer the engine has allocated:

void *
get_blazeface_input_buf (int *w, int *h)
{
    *w = s_detect_tensor_input.dims[2];
    *h = s_detect_tensor_input.dims[1];
    return s_detect_tensor_input.ptr;
}
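
The caller fetches this buffer once per frame; a sketch (the local variable names here are ours), where the returned pointer is what the float conversion below writes into:

int w, h;
float *buf_fp32 = (float *)get_blazeface_input_buf (&w, &h);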

Then convert the RGBA bytes read back above to float and write them through buf_fp32 into the input tensor:

float mean = 128.0f;
float std  = 128.0f;

for (int y = 0; y < h; y ++)
{
    for (int x = 0; x < w; x ++)
    {
        int r = *buf_ui8 ++;
        int g = *buf_ui8 ++;
        int b = *buf_ui8 ++;
        buf_ui8 ++;          /* skip alpha */

        /* normalize each channel from [0, 255] to [-1, 1] */
        *buf_fp32 ++ = (float)(r - mean) / std;
        *buf_fp32 ++ = (float)(g - mean) / std;
        *buf_fp32 ++ = (float)(b - mean) / std;
    }
}

4.4 Parsing and Rendering the Results

Next, call the interpreter's Invoke() method to run inference:

    if (interpreter->Invoke() != kTfLiteOk)
    {
        DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
        return -1;
    }

Then decode the detection results:

static int
decode_bounds (std::list<face_t> &face_list, float score_thresh, int input_img_w, int input_img_h)
{
    face_t face_item;
    float  *scores_ptr = (float *)s_detect_tensor_scores.ptr;

    int i = 0;
    for (auto itr = s_anchors.begin(); itr != s_anchors.end(); i ++, itr ++)
    {
        fvec2 anchor = *itr;
        float score0 = scores_ptr[i];
        float score = 1.0f / (1.0f + exp(-score0));  /* sigmoid -> [0, 1] */
        if (score > score_thresh)
        {
            float *p = get_bbox_ptr (i);
            /* boundary box */
            float sx = p[0];
            float sy = p[1];
            float w  = p[2];
            float h  = p[3];
            /* offsets are relative to the anchor, in input-pixel coordinates;
             * normalize the box to [0, 1] */
            float cx = sx + anchor.x;
            float cy = sy + anchor.y;
            cx /= (float)input_img_w;
            cy /= (float)input_img_h;
            w  /= (float)input_img_w;
            h  /= (float)input_img_h;
            fvec2 topleft, btmright;
            topleft.x  = cx - w * 0.5f;
            topleft.y  = cy - h * 0.5f;
            btmright.x = cx + w * 0.5f;
            btmright.y = cy + h * 0.5f;
            face_item.score    = score;
            face_item.topleft  = topleft;
            face_item.btmright = btmright;
            /* landmark positions (6 keys) */
            for (int j = 0; j < kFaceKeyNum; j ++)
            {
                float lx = p[4 + (2 * j) + 0];
                float ly = p[4 + (2 * j) + 1];
                lx += anchor.x;
                ly += anchor.y;
                lx /= (float)input_img_w;
                ly /= (float)input_img_h;
                face_item.keys[j].x = lx;
                face_item.keys[j].y = ly;
            }
            face_list.push_back (face_item);
        }
    }
    return 0;
}
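
s_anchors holds the SSD anchor centers of the BlazeFace front model: 896 anchors for the 128×128 input (a 16×16 grid with 2 anchors per cell from the stride-8 feature map, plus an 8×8 grid with 6 per cell from the stride-16 maps). A minimal sketch of how such a list can be generated, assuming anchor centers in input-pixel coordinates to match the decode above (this helper is illustrative, not the project's exact code):

static std::list<fvec2> s_anchors;

static void
generate_anchors (int input_w, int input_h)
{
    /* (stride, anchors per cell) for the two anchor layers */
    struct { int stride; int anchor_num; } layers[] = {{8, 2}, {16, 6}};

    for (auto &layer : layers)
    {
        int grid_w = input_w / layer.stride;
        int grid_h = input_h / layer.stride;
        for (int y = 0; y < grid_h; y ++)
        {
            for (int x = 0; x < grid_w; x ++)
            {
                fvec2 anchor;
                anchor.x = (x + 0.5f) * layer.stride;   /* cell center in pixels */
                anchor.y = (y + 0.5f) * layer.stride;
                for (int n = 0; n < layer.anchor_num; n ++)
                    s_anchors.push_back (anchor);
            }
        }
    }
}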

face_t wraps a detection's score, its top-left and bottom-right corners, and the facial keypoints:

typedef struct _face_t
{
    float score;
    fvec2 topleft;
    fvec2 btmright;
    fvec2 keys[kFaceKeyNum];
} face_t;

With these coordinates we can draw a box around each detected "face":

(Screenshot: a bounding box drawn over the detected face in the live preview.)
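
A sketch of that drawing step, assuming a 2D helper like the draw_2d_rect utility in the base project's render code (the helper's name and signature are assumptions):

static void
render_detect_region (int ofstx, int ofsty, int texw, int texh,
                      std::list<face_t> &face_list)
{
    float col_red[] = {1.0f, 0.0f, 0.0f, 1.0f};

    for (auto itr = face_list.begin(); itr != face_list.end(); itr ++)
    {
        face_t face = *itr;

        /* map normalized [0, 1] coordinates back to view pixels */
        float x1 = face.topleft.x  * texw + ofstx;
        float y1 = face.topleft.y  * texh + ofsty;
        float x2 = face.btmright.x * texw + ofstx;
        float y2 = face.btmright.y * texh + ofsty;

        draw_2d_rect (x1, y1, x2 - x1, y2 - y1, col_red, 2.0f);
    }
}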

5. Summary

This article walked through the typical AI development workflow and common AI vision applications. Through the face detection feature, we covered the core TensorFlow Lite APIs: loading a model, feeding input data, running inference, and reading back the results.
