AI in Practice 1: Real-Time Face Detection

1. Background

The most common AI applications in computer vision are face detection, face recognition, liveness detection, body and behavior analysis, image recognition, image enhancement, and so on. All of these are fairly mature technologies by now; whether you look at commercial PaaS platforms or open-source models, the options are plentiful. A typical AI development workflow has the following steps:

  1. Feature analysis
  2. Data collection
  3. Data annotation
  4. Model training
  5. Model inference

Inference can run in the cloud or on the client, and each has its own use cases; for example, face detection is usually placed on the client while face recognition runs in the cloud. This series focuses on the engineering practice of on-device model inference for computer vision.

2. Project Overview

We deploy and run a model provided by Google's open-source project mediapipe on the client for inference. mediapipe provides the following capabilities:

  1. Face Detection
  2. Face Mesh (3D face landmarks)
  3. Iris
  4. Hands
  5. Pose
  6. Holistic (full-body pose)
  7. Hair Segmentation
  8. Object Detection
  9. Box Tracking
  10. Instant Motion Tracking
  11. Objectron
  12. KNIFT

mediapipe ships Bazel build targets for its examples; for instance, bazel build -c opt --config=android_arm64 mediapipe/examples/android/src/java/com/google/mediapipe/apps/handtrackinggpu:handtrackinggpu builds a runnable hand-tracking demo. As our mobile development framework we build on the open-source project github.com/terryky/and… , which uses the NDK to run and benchmark the performance of the TensorFlow Lite GPU Delegate. The whole app is based on the NativeActivity framework: it captures camera frames, then renders both the preview and the performance metrics. In this article we get the real-time face detection model running end to end.

3. Understanding NativeActivity

NativeActivity is the base class Android provides for apps written purely in C/C++. Even a pure C++ Android app still needs a Java-layer shell: the Android framework ships a central class, already written in Java, and the native library we write in C++ can run precisely because this central class invokes it through JNI. That central class is NativeActivity. Its core job is to call the callbacks in our C++ native library when specific events occur. For example, in the familiar lifecycle method NativeActivity.onStart, it calls the native library's onStartNative function:

protected void onStart() {
    super.onStart();
    onStartNative(mNativeHandle);
}

On the native side, Android provides two interfaces:

  1. native_activity.h
  2. android_native_app_glue.h
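
If we use native_activity.h directly, we have to fill in a callbacks table ourselves inside ANativeActivity_onCreate, the entry point that NativeActivity resolves in our shared library. A minimal sketch (the log tag and callback body here are ours):

#include <android/native_activity.h>
#include <android/log.h>

static void on_start (ANativeActivity *activity)
{
    __android_log_print (ANDROID_LOG_INFO, "demo", "onStart");
}

/* entry point that NativeActivity looks up in our .so and calls on creation */
extern "C" void ANativeActivity_onCreate (ANativeActivity *activity,
                                          void *savedState, size_t savedStateSize)
{
    activity->callbacks->onStart = on_start;
    /* ... register the other lifecycle callbacks as needed ... */
}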

android_native_app_glue.h wraps native_activity.h; with the glue library we only need to implement the void android_main(struct android_app* state) function.
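
A minimal android_main on top of the glue library might look like the following sketch (real apps also install onAppCmd/onInputEvent handlers on the android_app state):

#include <android_native_app_glue.h>

void android_main (struct android_app *state)
{
    while (1)
    {
        int events;
        struct android_poll_source *source;

        /* block until an event arrives, then let the glue dispatch it */
        while (ALooper_pollAll (-1, NULL, &events, (void **)&source) >= 0)
        {
            if (source != NULL)
                source->process (state, source);

            if (state->destroyRequested)
                return;
        }
    }
}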

More details on NativeActivity can be found in the official Android documentation: GameActivity | Android Developers.

4. Running the Model

The model we use: storage.googleapis.com/mediapipe-a… The overall flow is:

  1. Load the model;
  2. Convert the camera preview texture to RGBA;
  3. Feed the image data into the model engine for inference;
  4. Parse and render the results.

4.1 Loading the Model

First we need to read the model file into memory. The model file sits under the Android project's asset directory; we load it into std::vector<uint8_t> m_tflite_model_buf:

bool
asset_read_file (AAssetManager *assetMgr, char *fname, std::vector<uint8_t> &buf)
{
    AAsset *assetDescriptor = AAssetManager_open (assetMgr, fname, AASSET_MODE_BUFFER);
    if (assetDescriptor == NULL)
    {
        return false;
    }

    /* size the buffer to the asset, then read it in one shot */
    size_t fileLength = AAsset_getLength (assetDescriptor);
    buf.resize (fileLength);

    int64_t readSize = AAsset_read (assetDescriptor, buf.data(), buf.size());
    AAsset_close (assetDescriptor);

    return (readSize == (int64_t)buf.size());
}

asset_read_file (m_app->activity->assetManager,
                 (char *)BLAZEFACE_MODEL_PATH, m_tflite_model_buf);

tflite provides FlatBufferModel::BuildFromBuffer to load the model; it returns a std::unique_ptr to a tflite::FlatBufferModel:

std::unique_ptr<tflite::FlatBufferModel> model =
    tflite::FlatBufferModel::BuildFromBuffer ((const char *)m_tflite_model_buf.data(),
                                              m_tflite_model_buf.size());

Once the model is loaded, we create the inference interpreter, tflite::Interpreter, from it. tflite provides the InterpreterBuilder helper to construct a tflite::Interpreter:

class InterpreterBuilder {
 public:
  InterpreterBuilder(const FlatBufferModel& model,
                     const OpResolver& op_resolver);
  ...
};

It takes the model plus an OpResolver. OpResolver is an abstract interface that returns the TFLite registration for a given op code or custom op name; it is the mechanism by which ops referenced in the flatbuffer model are mapped to executable function pointers (TfLiteRegistrations). InterpreterBuilder overloads the call operator:

TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter);
TfLiteStatus operator()(std::unique_ptr<Interpreter>* interpreter,
                          int num_threads);

With the InterpreterBuilder in hand, we create the tflite::Interpreter:

std::unique_ptr<tflite::FlatBufferModel> model;
std::unique_ptr<tflite::Interpreter>     interpreter;
tflite::ops::builtin::BuiltinOpResolver  resolver;

tflite::InterpreterBuilder (*model, resolver) (&interpreter);

As shown above, InterpreterBuilder's call operator has two overloads, the second of which takes a thread count; alternatively, we can set the thread count manually via tflite::Interpreter::SetNumThreads:

int num_threads = std::thread::hardware_concurrency();

char *env_tflite_num_threads = getenv ("FORCE_TFLITE_NUM_THREADS");
if (env_tflite_num_threads)
{
    num_threads = atoi (env_tflite_num_threads);
    DBG_LOGI ("@@@@@@ FORCE_TFLITE_NUM_THREADS=%d\n", num_threads);
}
DBG_LOG ("@@@@@@ TFLITE_NUM_THREADS=%d\n", num_threads);

interpreter->SetNumThreads (num_threads);

Next we allocate buffer space for all tensors; the declaration of AllocateTensors (and its header comment) spells out when it must be called:

  // Update allocations for all tensors. This will redim dependent tensors
  // using the input tensor dimensionality as given. This is relatively
  // expensive. This *must be* called after the interpreter has been created
  // and before running inference (and accessing tensor buffers), and *must be*
  // called again if (and only if) an input tensor is resized. Returns status of
  // success or failure.
  TfLiteStatus AllocateTensors();
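
In our flow this is a single call made right after the interpreter is built; a sketch using the same error-handling style as the rest of the code:

if (interpreter->AllocateTensors() != kTfLiteOk)
{
    DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
    return -1;
}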

Next we query the interpreter for the model's configuration, mainly its input and output tensors:

int
tflite_get_tensor_by_name (tflite::Interpreter *interpreter, int io, const char *name, tflite_tensor_t *ptensor)
{
    memset (ptensor, 0, sizeof (*ptensor));

    int tensor_idx = -1;
    int io_idx = -1;
    int num_tensor = (io == 0) ? interpreter->inputs ().size() :
                                 interpreter->outputs().size();

    /* search the input (io == 0) or output (io == 1) tensor list by name */
    for (int i = 0; i < num_tensor; i ++)
    {
        tensor_idx = (io == 0) ? interpreter->inputs ()[i] :
                                 interpreter->outputs()[i];
        const char *tensor_name = interpreter->tensor(tensor_idx)->name;
        if (strcmp (tensor_name, name) == 0)
        {
            io_idx = i;
            break;
        }
    }

    if (io_idx < 0)
    {
        DBG_LOGE ("can't find tensor: \"%s\"\n", name);
        return -1;
    }

    /* resolve the tensor's data pointer according to its element type */
    void *ptr = NULL;
    TfLiteTensor *tensor = interpreter->tensor(tensor_idx);
    switch (tensor->type)
    {
    case kTfLiteUInt8:
        ptr = (io == 0) ? interpreter->typed_input_tensor <uint8_t>(io_idx) :
                          interpreter->typed_output_tensor<uint8_t>(io_idx);
        break;
    case kTfLiteFloat32:
        ptr = (io == 0) ? interpreter->typed_input_tensor <float>(io_idx) :
                          interpreter->typed_output_tensor<float>(io_idx);
        break;
    case kTfLiteInt64:
        ptr = (io == 0) ? interpreter->typed_input_tensor <int64_t>(io_idx) :
                          interpreter->typed_output_tensor<int64_t>(io_idx);
        break;
    default:
        DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
        return -1;
    }

    ptensor->idx    = tensor_idx;
    ptensor->io     = io;
    ptensor->io_idx = io_idx;
    ptensor->type   = tensor->type;
    ptensor->ptr    = ptr;
    ptensor->quant_scale = tensor->params.scale;
    ptensor->quant_zerop = tensor->params.zero_point;
    for (int i = 0; (i < 4) && (i < tensor->dims->size); i ++)
    {
        ptensor->dims[i] = tensor->dims->data[i];
    }
    return 0;
}

static tflite_tensor_t      s_detect_tensor_input;
static tflite_tensor_t      s_detect_tensor_scores;
static tflite_tensor_t      s_detect_tensor_bboxes;

tflite_get_tensor_by_name (interpreter.get(), 0, "input",          &s_detect_tensor_input);
tflite_get_tensor_by_name (interpreter.get(), 1, "regressors",     &s_detect_tensor_bboxes);
tflite_get_tensor_by_name (interpreter.get(), 1, "classificators", &s_detect_tensor_scores);

From the model configuration we can read the input image width and height the model expects:

int det_input_w = s_detect_tensor_input.dims[2];
int det_input_h = s_detect_tensor_input.dims[1];

4.2 Converting the Camera Preview Texture to RGBA

The model can only consume RGBA pixel data, so the texture captured from the camera has to be converted into an in-memory RGBA buffer:

static unsigned char *pui8 = NULL;
unsigned char *buf_ui8 = NULL;

if (pui8 == NULL)
    pui8 = (unsigned char *)malloc (w * h * 4);   /* RGBA, 1 byte per channel */
buf_ui8 = pui8;

/* draw the camera texture into the framebuffer (flipped vertically) ... */
draw_2d_texture_ex (srctex, 0, win_h - h, w, h, RENDER2D_FLIP_V);

/* ... then read the pixels back into CPU memory */
glPixelStorei (GL_PACK_ALIGNMENT, 4);
glReadPixels (0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buf_ui8);

In other words, we first draw the camera texture into the framebuffer, then read it back into a memory buffer with the OpenGL call glReadPixels.

Note: glReadPixels is an expensive, blocking call.

4.3 Feeding the Image Data to the Inference Engine

First, use the input tensor s_detect_tensor_input obtained above to get the input buffer the engine has allocated:

void *
get_blazeface_input_buf (int *w, int *h)
{
    *w = s_detect_tensor_input.dims[2];
    *h = s_detect_tensor_input.dims[1];
    return s_detect_tensor_input.ptr;
}
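
The caller fetches this buffer once per frame; a sketch (the local variable names here are ours), where the returned pointer is what the float conversion below writes into:

int w, h;
float *buf_fp32 = (float *)get_blazeface_input_buf (&w, &h);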

Then convert the RGBA bytes read back above to float and write them through buf_fp32 into the input tensor:

float mean = 128.0f;
float std  = 128.0f;

for (int y = 0; y < h; y ++)
{
    for (int x = 0; x < w; x ++)
    {
        int r = *buf_ui8 ++;
        int g = *buf_ui8 ++;
        int b = *buf_ui8 ++;
        buf_ui8 ++;          /* skip alpha */

        /* normalize each channel from [0, 255] to [-1, 1] */
        *buf_fp32 ++ = (float)(r - mean) / std;
        *buf_fp32 ++ = (float)(g - mean) / std;
        *buf_fp32 ++ = (float)(b - mean) / std;
    }
}

4.4 Parsing and Rendering the Results

Next, call the interpreter's Invoke() method to run inference:

    if (interpreter->Invoke() != kTfLiteOk)
    {
        DBG_LOGE ("ERR: %s(%d)\n", __FILE__, __LINE__);
        return -1;
    }

Then decode the detection results:

static int
decode_bounds (std::list<face_t> &face_list, float score_thresh, int input_img_w, int input_img_h)
{
    face_t face_item;
    float  *scores_ptr = (float *)s_detect_tensor_scores.ptr;

    int i = 0;
    for (auto itr = s_anchors.begin(); itr != s_anchors.end(); i ++, itr ++)
    {
        fvec2 anchor = *itr;
        float score0 = scores_ptr[i];
        float score = 1.0f / (1.0f + exp(-score0));  /* sigmoid -> [0, 1] */
        if (score > score_thresh)
        {
            float *p = get_bbox_ptr (i);
            /* boundary box */
            float sx = p[0];
            float sy = p[1];
            float w  = p[2];
            float h  = p[3];
            /* offsets are relative to the anchor, in input-pixel coordinates;
             * normalize the box to [0, 1] */
            float cx = sx + anchor.x;
            float cy = sy + anchor.y;
            cx /= (float)input_img_w;
            cy /= (float)input_img_h;
            w  /= (float)input_img_w;
            h  /= (float)input_img_h;
            fvec2 topleft, btmright;
            topleft.x  = cx - w * 0.5f;
            topleft.y  = cy - h * 0.5f;
            btmright.x = cx + w * 0.5f;
            btmright.y = cy + h * 0.5f;
            face_item.score    = score;
            face_item.topleft  = topleft;
            face_item.btmright = btmright;
            /* landmark positions (6 keys) */
            for (int j = 0; j < kFaceKeyNum; j ++)
            {
                float lx = p[4 + (2 * j) + 0];
                float ly = p[4 + (2 * j) + 1];
                lx += anchor.x;
                ly += anchor.y;
                lx /= (float)input_img_w;
                ly /= (float)input_img_h;
                face_item.keys[j].x = lx;
                face_item.keys[j].y = ly;
            }
            face_list.push_back (face_item);
        }
    }
    return 0;
}
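
s_anchors holds the SSD anchor centers of the BlazeFace front model: 896 anchors for the 128×128 input (a 16×16 grid with 2 anchors per cell from the stride-8 feature map, plus an 8×8 grid with 6 per cell from the stride-16 maps). A minimal sketch of how such a list can be generated, assuming anchor centers in input-pixel coordinates to match the decode above (this helper is illustrative, not the project's exact code):

static std::list<fvec2> s_anchors;

static void
generate_anchors (int input_w, int input_h)
{
    /* (stride, anchors per cell) for the two anchor layers */
    struct { int stride; int anchor_num; } layers[] = {{8, 2}, {16, 6}};

    for (auto &layer : layers)
    {
        int grid_w = input_w / layer.stride;
        int grid_h = input_h / layer.stride;
        for (int y = 0; y < grid_h; y ++)
        {
            for (int x = 0; x < grid_w; x ++)
            {
                fvec2 anchor;
                anchor.x = (x + 0.5f) * layer.stride;   /* cell center in pixels */
                anchor.y = (y + 0.5f) * layer.stride;
                for (int n = 0; n < layer.anchor_num; n ++)
                    s_anchors.push_back (anchor);
            }
        }
    }
}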

face_t wraps a detection's score, its top-left and bottom-right corners, and the facial keypoints:

typedef struct _face_t
{
    float score;
    fvec2 topleft;
    fvec2 btmright;
    fvec2 keys[kFaceKeyNum];
} face_t;

With these coordinates we can draw a box around each detected "face":

(Screenshot: a bounding box drawn over the detected face in the live preview.)
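
A sketch of that drawing step, assuming a 2D helper like the draw_2d_rect utility in the base project's render code (the helper's name and signature are assumptions):

static void
render_detect_region (int ofstx, int ofsty, int texw, int texh,
                      std::list<face_t> &face_list)
{
    float col_red[] = {1.0f, 0.0f, 0.0f, 1.0f};

    for (auto itr = face_list.begin(); itr != face_list.end(); itr ++)
    {
        face_t face = *itr;

        /* map normalized [0, 1] coordinates back to view pixels */
        float x1 = face.topleft.x  * texw + ofstx;
        float y1 = face.topleft.y  * texh + ofsty;
        float x2 = face.btmright.x * texw + ofstx;
        float y2 = face.btmright.y * texh + ofsty;

        draw_2d_rect (x1, y1, x2 - x1, y2 - y1, col_red, 2.0f);
    }
}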

5. Summary

This article walked through the typical AI development workflow and common AI vision applications. Through the face detection feature, we covered the core TensorFlow Lite APIs: loading a model, feeding input data, running inference, and reading back the results.
