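This article is a signed, first-publication piece for the Xitu (稀土) tech community. Reprinting is prohibited within 30 days of publication; after 30 days, reprinting without authorization is prohibited. Infringement will be pursued.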

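Introduction

This is the fifth article in the ArkUI Engine series. After the first four, readers should have a grasp of the two most important processes behind an ArkUI component: rendering and event binding. Rendering is the main flow in the Engine, but the Engine does more than draw UI: it also has a smoothness-monitoring system, the WatchDog mechanism.

After reading this article, you will understand the code-level details of how HarmonyOS's WatchDog mechanism works and how it decides that an ANR (Application Not Responding) has occurred, which helps with later performance monitoring and optimization.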

WatchDog

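Every UI system needs to monitor its own smoothness, and HarmonyOS is no exception. Consider the code below: tapping the Text enters an infinite loop, and if we then tap again, the familiar ANR dialog appears.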

Column() {
  Text(this.father.name)
    .width("200vp")
    .onClick(() => {
      // Infinite loop: the UI thread never returns from this handler
      while (true) {
      }
    })
}

The ANR dialog looks like this:

[Figure: the ANR dialog]

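ANR detection is in fact implemented by the WatchDog mechanism, so let's look at it in detail.

WatchDog initialization

The WatchDog mechanism involves two related types: the Watchers struct and the WatchDog class. WatchDog holds a map whose values are Watchers.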

namespace OHOS::Ace {
class ThreadWatcher;
struct Watchers {
    RefPtr<ThreadWatcher> jsWatcher;
    RefPtr<ThreadWatcher> uiWatcher;
};
class WatchDog final : public Referenced {
public:
    WatchDog();
    ~WatchDog() override;
    void Register(int32_t instanceId, const RefPtr<TaskExecutor>& taskExecutor, bool useUIAsJSThread);
    void Unregister(int32_t instanceId);
    void BuriedBomb(int32_t instanceId, uint64_t bombId);
    void DefusingBomb(int32_t instanceId);
private:
    std::unordered_map<int32_t, Watchers> watchMap_;
    ACE_DISALLOW_COPY_AND_MOVE(WatchDog);
};
} // namespace OHOS::Ace

In its constructor, WatchDog creates and starts an AnrThread:

WatchDog::WatchDog()
{
    AnrThread::Start();
#if defined(OHOS_PLATFORM) || defined(ANDROID_PLATFORM)
    AnrThread::PostTaskToTaskRunner(InitializeGcTrigger, GC_CHECK_PERIOD);
#endif
}

The definition of AnrThread is also simple. It provides an event loop: like Android's Looper, it keeps dispatching events.

namespace OHOS::Ace {
class AnrThread {
public:
    static void Start();
    static void Stop();
    using Task = std::function<void()>;
    static bool PostTaskToTaskRunner(Task&& task, uint32_t delayTime);
};
} // namespace OHOS::Ace

The dispatching capability comes from the TaskRunnerAdapter class, which abstracts event dispatching. The actual dispatching can be backed by any class capable of it, for example OHOS::AppExecFwk::EventRunner.

namespace OHOS::Ace {
namespace {
// A TaskRunnerAdapter is needed to dispatch events
RefPtr<TaskRunnerAdapter> g_anrThread;
} // namespace
void AnrThread::Start()
{
    if (!g_anrThread) {
        g_anrThread = TaskRunnerAdapterFactory::Create(false, "anr");
    }
}
void AnrThread::Stop()
{
    g_anrThread.Reset();
}
bool AnrThread::PostTaskToTaskRunner(Task&& task, uint32_t delayTime)
{
    if (!g_anrThread || !task) {
        return false;
    }
    if (delayTime > 0) {
        g_anrThread->PostDelayedTask(std::move(task), delayTime, {});
    } else {
        g_anrThread->PostTask(std::move(task), {});
    }
    return true;
}
} // namespace OHOS::Ace
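
To make "a class with an event loop that accepts delayed tasks" more concrete, here is a minimal single-threaded sketch. It is not TaskRunnerAdapter: MiniRunner, Post, RunUntil, and CountPeriodicRuns are hypothetical names, and time is a simulated millisecond counter so that the behavior is deterministic.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a task runner: tasks are ordered by due time.
class MiniRunner {
public:
    using Task = std::function<void()>;

    // Queue a task to run delayMs after the current simulated time.
    bool Post(Task task, uint32_t delayMs)
    {
        if (!task) {
            return false; // same null-task guard as PostTaskToTaskRunner
        }
        queue_.push(Entry { now_ + delayMs, seq_++, std::move(task) });
        return true;
    }

    // Advance simulated time to deadlineMs, running every task that becomes due.
    void RunUntil(uint64_t deadlineMs)
    {
        while (!queue_.empty() && queue_.top().due <= deadlineMs) {
            Entry entry = queue_.top();
            queue_.pop();
            now_ = entry.due;
            entry.task(); // a task may re-post itself, as a periodic check does
        }
        now_ = deadlineMs;
    }

private:
    struct Entry {
        uint64_t due;
        uint64_t seq; // keeps FIFO order among tasks with the same due time
        Task task;
        bool operator>(const Entry& other) const
        {
            return due != other.due ? due > other.due : seq > other.seq;
        }
    };
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> queue_;
    uint64_t now_ = 0;
    uint64_t seq_ = 0;
};

// Count how often a self-rescheduling task fires within totalMs.
inline int CountPeriodicRuns(uint32_t periodMs, uint64_t totalMs)
{
    MiniRunner runner;
    int runs = 0;
    std::function<void()> tick = [&]() {
        ++runs;
        runner.Post(tick, periodMs); // reschedule, like the periodic checks in this article
    };
    runner.Post(tick, periodMs);
    runner.RunUntil(totalMs);
    return runs;
}
```

In this sketch a task that re-posts itself with a fixed delay behaves like a periodic timer, which is the pattern the GC check and the watcher checks in this article rely on.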

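Initialization is straightforward: it starts a class with an event loop, which is later used to dispatch events in a loop. In addition, if the current platform defines either of the macros OHOS_PLATFORM or ANDROID_PLATFORM, a first event is posted to register the GC signal. That's right: the Engine needs to trigger GC through a signal, and it binds a custom signal, SIGNAL_FOR_GC (60), for this purpose.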


void InitializeGcTrigger()
{
    // Record watch dog thread as signal handling thread
    g_signalThread = pthread_self();
    int32_t result = BlockGcSignal();
    if (result != 0) {
        LOGE("Failed to block GC signal, errno = %{public}d", result);
        return;
    }
    // Start to receive GC signal
    signal(SIGNAL_FOR_GC, OnSignalReceive);
    // Start check GC signal
    CheckGcSignal();
}

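CheckGcSignal calls sigtimedwait to check whether the signal has arrived within a given time. The timeout here is zero, so the call is effectively a non-blocking poll. If the signal is received, AceEngine::Get().TriggerGarbageCollection() is called to run GC. On timeout, sigtimedwait returns a negative value and sets errno to EAGAIN. EINTR is also tolerated because the arrival of another signal can interrupt the sigtimedwait call. In both of these cases the function simply reschedules itself after GC_CHECK_PERIOD.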

void CheckGcSignal()
{
    // Check if GC signal is in pending signal set
    sigset_t sigSet;
    sigemptyset(&sigSet);
    sigaddset(&sigSet, SIGNAL_FOR_GC);
    struct timespec interval = {
        .tv_sec = 0,
        .tv_nsec = 0,
    };
    int32_t result = sigtimedwait(&sigSet, nullptr, &interval);
    if (result < 0) {
        if (errno != EAGAIN && errno != EINTR) {
            LOGE("Failed to wait signals, errno = %{public}d", errno);
            return;
        }
    } else {
        ACE_DCHECK(result == SIGNAL_FOR_GC);
        // Start GC
        LOGE("Receive GC signal");
        AceEngine::Get().TriggerGarbageCollection();
    }
    // Check again
    AnrThread::PostTaskToTaskRunner(CheckGcSignal, GC_CHECK_PERIOD);
}
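
The block-then-poll pattern used by CheckGcSignal can be tried in isolation with the sketch below. It assumes Linux (sigtimedwait is not available on every platform), uses SIGUSR1 in the usage instead of the engine's SIGNAL_FOR_GC, and uses sigprocmask for brevity, where a multi-threaded program would normally use pthread_sigmask. PollSignalOnce and BlockSignal are hypothetical helper names.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>
#include <time.h>

// Poll once for `signo` without blocking.
// Returns 1 if the signal was pending and has now been consumed,
// 0 if nothing was pending (EAGAIN) or the call was interrupted (EINTR),
// and -1 on any other error.
inline int PollSignalOnce(int signo)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);
    struct timespec zero = {}; // zero timeout: return immediately
    int result = sigtimedwait(&set, nullptr, &zero);
    if (result < 0) {
        return (errno == EAGAIN || errno == EINTR) ? 0 : -1;
    }
    return result == signo ? 1 : -1;
}

// Block `signo` so that it stays pending instead of being delivered to a handler.
inline bool BlockSignal(int signo)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, signo);
    return sigprocmask(SIG_BLOCK, &set, nullptr) == 0;
}
```

A typical use is to call BlockSignal(SIGUSR1) once, then call PollSignalOnce(SIGUSR1) periodically: it reports 0 until some thread or process sends the signal, then reports 1 exactly once for that pending signal.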

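At this point the WatchDog event loop is initialized and ready for the later “bury the bomb” and “defuse the bomb” operations.

The ANR mechanism

WatchDog exposes a Register method so that modules outside the Engine can register themselves. Once registered, they are monitored by WatchDog.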

void WatchDog::Register(int32_t instanceId, const RefPtr<TaskExecutor>& taskExecutor, bool useUIAsJSThread)
{
    Watchers watchers = {
        .jsWatcher = AceType::MakeRefPtr<ThreadWatcher>(instanceId, TaskExecutor::TaskType::JS),
        .uiWatcher = AceType::MakeRefPtr<ThreadWatcher>(instanceId, TaskExecutor::TaskType::UI, useUIAsJSThread),
    };
    watchers.uiWatcher->SetTaskExecutor(taskExecutor);
    if (!useUIAsJSThread) {
        watchers.jsWatcher->SetTaskExecutor(taskExecutor);
    } else {
        watchers.jsWatcher = nullptr;
    }
    const auto resExecutor = watchMap_.try_emplace(instanceId, watchers);
    if (!resExecutor.second) {
        LOGW("Duplicate instance id: %{public}d when register to watch dog", instanceId);
    }
}
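
Register relies on the return value of try_emplace to detect a duplicate instance id. The sketch below shows that standard-library behavior on a plain map: when the key already exists, nothing is inserted and the stored value is left untouched. RegisterOnce and the string payload are hypothetical and stand in for the Watchers value.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Returns true if `id` was newly registered, false if it was already present.
// On a duplicate, the existing entry is not modified.
inline bool RegisterOnce(std::unordered_map<int32_t, std::string>& registry, int32_t id, const std::string& name)
{
    const auto result = registry.try_emplace(id, name);
    return result.second; // second == false means the key already existed
}
```

Applied to WatchDog::Register, this means a second registration with the same instanceId only logs a warning, and the watchers stored by the first registration remain in watchMap_.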

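In the ArkTS environment, only the uiWatcher is ultimately kept in the Watchers struct (as the code above shows, jsWatcher is reset to nullptr when useUIAsJSThread is true). The uiWatcher is a ThreadWatcher object.

When a ThreadWatcher is constructed, it starts checking by posting a check task through AnrThread::PostTaskToTaskRunner: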

ThreadWatcher::ThreadWatcher(int32_t instanceId, TaskExecutor::TaskType type, bool useUIAsJSThread)
    : instanceId_(instanceId), type_(type), useUIAsJSThread_(useUIAsJSThread)
{
    InitThreadName();
    AnrThread::PostTaskToTaskRunner(
        [weak = Referenced::WeakClaim(this)]() {
            auto sp = weak.Upgrade();
            CHECK_NULL_VOID(sp);
            // Calls ThreadWatcher::Check
            sp->Check();
        },
        NORMAL_CHECK_PERIOD);
}
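
The posted lambda captures a weak reference and upgrades it before use, so a check task that outlives its watcher becomes a no-op. The same idea can be sketched with the standard library, using std::weak_ptr::lock as an analogue of weak.Upgrade(). Probe and MakeCheckTask are hypothetical names, not part of the engine.

```cpp
#include <cassert>
#include <functional>
#include <memory>

struct Probe {
    int checks = 0;
    void Check() { ++checks; }
};

// Build a task that holds only a weak reference to `probe`.
// The task returns true if the probe was still alive and was checked.
inline std::function<bool()> MakeCheckTask(const std::shared_ptr<Probe>& probe)
{
    std::weak_ptr<Probe> weak = probe;
    return [weak]() {
        auto sp = weak.lock(); // analogous to weak.Upgrade()
        if (!sp) {
            return false; // target already destroyed: skip silently
        }
        sp->Check();
        return true;
    };
}
```

Because the task does not own the object, queuing it never extends the object's lifetime, and running it after destruction is safe.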

The Check method is the core of the whole ANR mechanism. Here is the code:


void ThreadWatcher::Check()
{
    int32_t period = NORMAL_CHECK_PERIOD;
    if (!IsThreadStuck()) {
        if (state_ == State::FREEZE) {
            RawReport(RawEventType::RECOVER);
        }
        freezeCount_ = 0;
        state_ = State::NORMAL;
        canShowDialog_ = true;
        showDialogCount_ = 0;
    } else {
        if (state_ == State::NORMAL) {
            HiviewReport();
            RawReport(RawEventType::WARNING);
            state_ = State::WARNING;
            period = WARNING_CHECK_PERIOD;
        } else if (state_ == State::WARNING) {
            RawReport(RawEventType::FREEZE);
            state_ = State::FREEZE;
            period = FREEZE_CHECK_PERIOD;
            DetonatedBomb();
        } else {
            if (!canShowDialog_) {
                showDialogCount_++;
                if (showDialogCount_ >= ANR_DIALOG_BLOCK_TIME) {
                    canShowDialog_ = true;
                    showDialogCount_ = 0;
                }
            }
            if (++freezeCount_ >= 5) {
                RawReport(RawEventType::FREEZE);
                freezeCount_ = 0;
            }
            period = FREEZE_CHECK_PERIOD;
            DetonatedBomb();
        }
    }
    // After this check finishes, schedule the next one
    AnrThread::PostTaskToTaskRunner(
        [weak = Referenced::WeakClaim(this)]() {
            auto sp = weak.Upgrade();
            CHECK_NULL_VOID(sp);
            sp->Check();
        },
        period);
}

To make sense of the code above, let's summarize the three states it uses: NORMAL, WARNING, and FREEZE.
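
The transitions in Check() can be condensed into a small pure function. This is a simplified sketch that models only the state changes and whether DetonatedBomb() is reached; reporting and dialog bookkeeping are left out. NextState and DetonatesAfter are hypothetical names.

```cpp
#include <cassert>

enum class State { NORMAL, WARNING, FREEZE };

// One Check() round: `stuck` is the result of IsThreadStuck().
inline State NextState(State current, bool stuck)
{
    if (!stuck) {
        return State::NORMAL; // any healthy round resets the watcher
    }
    switch (current) {
        case State::NORMAL:
            return State::WARNING; // first stuck round
        case State::WARNING:
        case State::FREEZE:
            return State::FREEZE; // second and later stuck rounds
    }
    return State::FREEZE;
}

// In Check(), DetonatedBomb() is reached only in rounds that end in FREEZE.
inline bool DetonatesAfter(State current, bool stuck)
{
    return NextState(current, stuck) == State::FREEZE;
}
```

Read this way, a single stuck round only produces a warning; it takes at least two consecutive stuck rounds before the bomb queue is even examined.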


NORMAL

NORMAL is the healthy state. As we can see, when IsThreadStuck returns false, the state variable is set to NORMAL. Let's look at IsThreadStuck:

bool ThreadWatcher::IsThreadStuck()
{
        ...
        // The key check is here
        if (((loopTime_ - threadTag_) > (lastLoopTime_ - lastThreadTag_)) && (lastTaskId_ == taskId)) {
            std::string abilityName;
            if (AceEngine::Get().GetContainer(instanceId_) != nullptr) {
                abilityName = AceEngine::Get().GetContainer(instanceId_)->GetHostClassName();
            }
            LOGE("thread stuck, ability: %{public}s, instanceId: %{public}d, thread: %{public}s, looptime: %{public}d, "
                 "checktime: %{public}d",
                abilityName.c_str(), instanceId_, threadName_.c_str(), loopTime_, threadTag_);
            res = true;
        }
        lastTaskId_ = taskId;
        lastLoopTime_ = loopTime_;
        lastThreadTag_ = threadTag_;
    }
    CheckAndResetIfNeeded();
    PostCheckTask();
    return res;
}

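Two crucial variables are involved here: loopTime_ and threadTag_. Think about it: when an ANR happens, it must be because some message in the message loop took too long to execute. So how do we tell how long a message has been running? With these two variables.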


void ThreadWatcher::PostCheckTask()
{
    auto taskExecutor = taskExecutor_.Upgrade();
    if (taskExecutor) {
        // post task to specified thread to check it
        taskExecutor->PostTask(
            [weak = Referenced::WeakClaim(this)]() {
                auto sp = weak.Upgrade();
                CHECK_NULL_VOID(sp);
                // threadTag_ is incremented only when the task actually runs
                sp->TagIncrease();
            },
            type_);
        std::unique_lock<std::shared_mutex> lock(mutex_);
        // loopTime_ is incremented every time PostCheckTask is called
        ++loopTime_;
     ....
}
void ThreadWatcher::TagIncrease()
{
    std::unique_lock<std::shared_mutex> lock(mutex_);
    ++threadTag_;
}

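loopTime_: incremented every time the engine calls PostCheckTask.

threadTag_: incremented every time the posted task is actually scheduled.

Under normal conditions loopTime_ is roughly equal to threadTag_: PostCheckTask posts without delay, so the task should be scheduled promptly. In an abnormal situation, such as a long-running task or an infinite loop occupying the thread, the difference between the two keeps growing with each PostCheckTask call, and the thread is judged to be stuck. The check also compares the id of the current task with that of the previous one. If they are the same, that is strong evidence that this particular task is stuck.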
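
The counter logic can be simulated without any threads. The sketch below is a simplified model, not the engine's code: it ignores the elided parts of IsThreadStuck (including CheckAndResetIfNeeded), and it assumes that a responsive thread drains every queued probe before the next round. StuckProbe and Round are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct StuckProbe {
    int loopTime = 0;      // probes posted so far
    int threadTag = 0;     // probes actually executed so far
    int lastLoopTime = 0;
    int lastThreadTag = 0;
    uint32_t lastTaskId = 0;

    // One watchdog round. `currentTaskId` is the task the watched thread is
    // running; `responsive` says whether the thread gets to run queued probes.
    bool Round(uint32_t currentTaskId, bool responsive)
    {
        // Stuck if the backlog grew since the last round and the task is unchanged.
        bool stuck = (loopTime - threadTag) > (lastLoopTime - lastThreadTag) &&
                     lastTaskId == currentTaskId;
        lastTaskId = currentTaskId;
        lastLoopTime = loopTime;
        lastThreadTag = threadTag;
        ++loopTime; // PostCheckTask(): a new probe is posted
        if (responsive) {
            threadTag = loopTime; // TagIncrease() runs for every queued probe
        }
        return stuck;
    }
};
```

In this model a hang is flagged one round after the backlog starts to grow, and a change of task id clears the verdict even before the backlog has been drained.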

If there is no stall, the state variable is set to NORMAL.

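WARNING

WARNING is an intermediate state. As seen in IsThreadStuck above, after the check it calls PostCheckTask again, posting another check task into the message loop.

If IsThreadStuck returns true, state is immediately set to WARNING. If IsThreadStuck still returns true the next time Check runs, the state is escalated to FREEZE.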

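FREEZE

FREEZE is the state from which an ANR can be raised: IsThreadStuck has returned true in two consecutive checks, so DetonatedBomb is called to “detonate the bomb”.

Note the final else branch: the previous state was FREEZE and the thread is still stuck, so the state remains FREEZE. While the dialog is suppressed, each such round increments showDialogCount_, and once it reaches ANR_DIALOG_BLOCK_TIME (5), canShowDialog_ is set back to true. (canShowDialog_ controls whether the ANR dialog may be shown. It is set to false when a dialog is shown, so after 5 more rounds it becomes true again and the dialog can reappear.) Likewise, every round spent in FREEZE calls DetonatedBomb to “detonate the bomb”.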

        } else if (state_ == State::WARNING) {
            RawReport(RawEventType::FREEZE);
            state_ = State::FREEZE;
            period = FREEZE_CHECK_PERIOD;
            DetonatedBomb();
        } else {
            if (!canShowDialog_) {
                showDialogCount_++;
                if (showDialogCount_ >= ANR_DIALOG_BLOCK_TIME) {
                    canShowDialog_ = true;
                    showDialogCount_ = 0;
                }
            }
            if (++freezeCount_ >= 5) {
                RawReport(RawEventType::FREEZE);
                freezeCount_ = 0;
            }
            period = FREEZE_CHECK_PERIOD;
            DetonatedBomb();
        }
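
The dialog throttling can be isolated into a small model. This is a hypothetical simplification of the canShowDialog_ and showDialogCount_ bookkeeping: DialogGate and FreezeRound are made-up names, the cooldown length is a constructor parameter rather than ANR_DIALOG_BLOCK_TIME, and bombExpired stands in for the timeout test inside DetonatedBomb.

```cpp
#include <cassert>

class DialogGate {
public:
    explicit DialogGate(int cooldownRounds) : cooldownRounds_(cooldownRounds) {}

    // One stuck round in FREEZE. Returns true if the dialog is shown this round.
    bool FreezeRound(bool bombExpired)
    {
        // While suppressed, count rounds; re-enable the dialog after the cooldown.
        if (!canShow_ && ++blockedRounds_ >= cooldownRounds_) {
            canShow_ = true;
            blockedRounds_ = 0;
        }
        if (!bombExpired || !canShow_) {
            return false;
        }
        canShow_ = false; // suppress repeats until the cooldown elapses
        blockedRounds_ = 0;
        return true;
    }

private:
    int cooldownRounds_;
    int blockedRounds_ = 0;
    bool canShow_ = true;
};
```

With a cooldown of 3 rounds and a bomb that is expired in every round, this model shows the dialog in round 1, stays silent in rounds 2 and 3, and shows it again in round 4.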

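“Detonating”, “burying”, and “defusing” the bomb

The “detonation” mentioned above refers to the DetonatedBomb function, which triggers the ANR handling if the conditions are met.

Calling DetonatedBomb does not necessarily produce an ANR dialog. It checks whether the difference between the current time and the first entry in inputTaskIds_ (the bomb ID, which is compared directly against the current time in milliseconds) exceeds ANR_INPUT_FREEZE_TIME (5000, i.e. 5 s). If it does, this is unquestionably an ANR; otherwise it is just a stall. In the ANR case, if canShowDialog_ is true, ShowDialog is called to show the ANR dialog.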

void ThreadWatcher::DetonatedBomb()
{
    std::shared_lock<std::shared_mutex> lock(mutex_);
    // First check whether the inputTaskIds_ queue is empty
    if (inputTaskIds_.empty()) {
        return;
    }
    uint64_t currentTime = GetMilliseconds();
    uint64_t bombId = inputTaskIds_.front();
    if (currentTime - bombId > ANR_INPUT_FREEZE_TIME) {
        LOGE("Detonated the Bomb, which bombId is %{public}s and currentTime is %{public}s",
            std::to_string(bombId).c_str(), std::to_string(currentTime).c_str());
        if (canShowDialog_) {
            ShowDialog();
            canShowDialog_ = false;
            showDialogCount_ = 0;
        } else {
            LOGE("Can not show dialog when detonated the Bomb.");
        }
        // Once an ANR is confirmed, the whole bomb queue is cleared
        std::queue<uint64_t> empty;
        std::swap(empty, inputTaskIds_);
    }
}

inputTaskIds_ is simply a queue:

std::queue<uint64_t> inputTaskIds_;

Callers can “bury a bomb” with BuriedBomb so that key flows are subject to ANR checks:

void ThreadWatcher::BuriedBomb(uint64_t bombId)
{
    std::unique_lock<std::shared_mutex> lock(mutex_);
    inputTaskIds_.emplace(bombId);
}

Callers can likewise “defuse the bomb” with DefusingBomb:

void ThreadWatcher::DefusingBomb()
{
    auto taskExecutor = taskExecutor_.Upgrade();
    CHECK_NULL_VOID(taskExecutor);
    taskExecutor->PostTask(
        [weak = Referenced::WeakClaim(this)]() {
            auto sp = weak.Upgrade();
            if (sp) {
                sp->DefusingTopBomb();
            }
        },
        type_);
}

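Both are essentially insertions into and removals from this queue. Because DetonatedBomb first checks whether inputTaskIds_ is empty, a delayed message does not count as an ANR when the queue is empty.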
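
The three bomb operations can be put together in one small model. This sketch makes several assumptions: bomb IDs are millisecond timestamps (as the comparison in DetonatedBomb suggests), the caller passes the current time explicitly, and defusing removes the oldest bomb, since the body of DefusingTopBomb is not shown in this article. BombQueue and kFreezeMs are hypothetical names, and locking is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>

constexpr uint64_t kFreezeMs = 5000; // mirrors ANR_INPUT_FREEZE_TIME as described above

class BombQueue {
public:
    // "Bury": remember when the watched operation started.
    void Bury(uint64_t bombIdMs) { bombs_.push(bombIdMs); }

    // "Defuse" (assumed behavior): drop the oldest bomb, if any.
    void DefuseTop()
    {
        if (!bombs_.empty()) {
            bombs_.pop();
        }
    }

    // "Detonate": returns true if the oldest bomb is older than the threshold,
    // in which case every bomb is cleared. Assumes nowMs >= the bomb timestamp.
    bool Detonate(uint64_t nowMs)
    {
        if (bombs_.empty()) {
            return false; // nothing buried: a stall alone is not an ANR
        }
        if (nowMs - bombs_.front() > kFreezeMs) {
            std::queue<uint64_t>().swap(bombs_);
            return true;
        }
        return false;
    }

    std::size_t Size() const { return bombs_.size(); }

private:
    std::queue<uint64_t> bombs_;
};
```

The design consequence is that a frozen thread alone never raises an ANR in this model: some caller must have buried a bomb more than 5 seconds earlier and not yet defused it.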

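Summary

In this article we covered the WatchDog mechanism provided by the Engine and how its ANR detection works. Studying this source code gives us a better understanding of the ArkUI Engine as a whole and makes later monitoring and optimization easier.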