Introduction

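Anyone who has done Android development has run into a crash. After a crash you get a very detailed stack trace that shows the call context at the moment of the crash, which is extremely valuable when analyzing the problem. In this article we look at how that stack trace is obtained.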

Introduction to Unwind

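Recovering the call stack from the contents of memory is called stack unwinding (unwind). There is also an open-source implementation, libunwind. Studying this process reveals a good deal about how computers work. Let's start looking at unwinding on Android.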
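
To make the idea of "recovering a call stack from memory" concrete, here is a minimal sketch that walks a chain of frame records in a simulated stack. It is only an illustration of the general idea under simplifying assumptions: the FrameRecord type and WalkFramePointers function are hypothetical names, and the code examined in this article does not rely on a plain frame-pointer walk like this one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One simulated frame record: where the caller's record lives and where to return to.
struct FrameRecord {
  int prev_fp;              // index of the caller's record, or -1 at the outermost frame
  std::uint64_t return_pc;  // address execution returns to in the caller
};

// Follow the frame-pointer chain starting at `fp` and collect pcs, innermost first.
// The current pc is always reported; the walk stops at max_frames or at the end of the chain.
std::vector<std::uint64_t> WalkFramePointers(const std::vector<FrameRecord>& stack,
                                             std::uint64_t current_pc, int fp,
                                             std::size_t max_frames) {
  std::vector<std::uint64_t> pcs;
  pcs.push_back(current_pc);
  while (fp >= 0 && static_cast<std::size_t>(fp) < stack.size() && pcs.size() < max_frames) {
    pcs.push_back(stack[fp].return_pc);
    fp = stack[fp].prev_fp;
  }
  return pcs;
}
```

Each step needs only two things: the current frame's location and the saved return address. Those come from the register state and the process memory, which is why the rest of the article is about collecting exactly that.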

Android provides AndroidLocalUnwinder and AndroidRemoteUnwinder. The former obtains stack information for the current process, the latter for another process. We start with the former.

class AndroidLocalUnwinder : public AndroidUnwinder {
 public:
  AndroidLocalUnwinder() : AndroidUnwinder(getpid()) {
    initial_map_names_to_skip_.emplace_back(kUnwindstackLib);
  }
  AndroidLocalUnwinder(std::shared_ptr<Memory>& process_memory)
      : AndroidUnwinder(getpid(), process_memory) {
    initial_map_names_to_skip_.emplace_back(kUnwindstackLib);
  }
  AndroidLocalUnwinder(const std::vector<std::string>& initial_map_names_to_skip)
      : AndroidUnwinder(getpid(), initial_map_names_to_skip) {
    initial_map_names_to_skip_.emplace_back(kUnwindstackLib);
  }
  AndroidLocalUnwinder(const std::vector<std::string>& initial_map_names_to_skip,
                       const std::vector<std::string>& map_suffixes_to_ignore)
      : AndroidUnwinder(getpid(), initial_map_names_to_skip, map_suffixes_to_ignore) {
    initial_map_names_to_skip_.emplace_back(kUnwindstackLib);
  }
  virtual ~AndroidLocalUnwinder() = default;
 protected:
  static constexpr const char* kUnwindstackLib = "libunwindstack.so";
  bool InternalInitialize(ErrorData& error) override;
  bool InternalUnwind(std::optional<pid_t> tid, AndroidUnwinderData& data) override;
};

An application creates an AndroidLocalUnwinder object and then calls its Unwind method, for example:

 AndroidLocalUnwinder unwinder;
 AndroidUnwinderData data;
 unwinder.Unwind(tid, data);

After the call, data contains the stack information. First, the structure of AndroidUnwinderData:

struct AndroidUnwinderData {
  AndroidUnwinderData() = default;
  explicit AndroidUnwinderData(const size_t max_frames) : max_frames(max_frames) {}
  explicit AndroidUnwinderData(const bool show_all_frames) : show_all_frames(show_all_frames) {}
  void DemangleFunctionNames();
  std::string GetErrorString();
  std::vector<FrameData> frames;
  ErrorData error;
  std::optional<std::unique_ptr<Regs>> saved_initial_regs;
  const std::optional<size_t> max_frames;
  const bool show_all_frames = false;
};

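Each FrameData here corresponds to one function call record (one stack frame); more on that later. As the example above shows, the entry point is Unwind, so let's look at what it does: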

bool AndroidUnwinder::Unwind(std::optional<pid_t> tid, AndroidUnwinderData& data) {
  if (!Initialize(data.error)) {
    return false;
  }
  return InternalUnwind(tid, data);
}

The entry point has only two steps, Initialize and then InternalUnwind, but a lot happens inside. First, Initialize:

bool AndroidUnwinder::Initialize(ErrorData& error) {
  // Android stores the jit and dex file location only in the library
  // libart.so or libartd.so.
  static std::vector<std::string> search_libs [[clang::no_destroy]] = {"libart.so", "libartd.so"};
  std::call_once(initialize_, [this, &error]() {
    if (!InternalInitialize(error)) {
      initialize_status_ = false;
      return;
    }
    jit_debug_ = CreateJitDebug(arch_, process_memory_, search_libs);
#if defined(DEXFILE_SUPPORT)
    dex_files_ = CreateDexFiles(arch_, process_memory_, search_libs);
#endif
    initialize_status_ = true;
  });
  return initialize_status_;
}
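
The pattern used by Initialize above (do the work once, cache the outcome, return the cached outcome on every later call) can be tried in isolation. The following is a minimal sketch; LazyInit, DoSetup, and run_count are hypothetical names introduced for illustration only.

```cpp
#include <mutex>

class LazyInit {
 public:
  // Runs the setup at most once per object; later calls return the cached result.
  bool Initialize() {
    std::call_once(once_, [this]() {
      ++run_count_;
      status_ = DoSetup();
    });
    return status_;
  }
  int run_count() const { return run_count_; }

 private:
  bool DoSetup() { return true; }  // placeholder for the real, expensive work
  std::once_flag once_;
  bool status_ = false;
  int run_count_ = 0;
};
```

Because the flag is a member, the "once" guarantee is per object. std::call_once also makes concurrent callers wait until the first call has finished, so they all observe the same status.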

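The no_destroy attribute here means that search_libs is never destructed. Next is InternalInitialize. Because it is called from inside std::call_once, it also runs only once: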

bool AndroidLocalUnwinder::InternalInitialize(ErrorData& error) {
  arch_ = Regs::CurrentArch();
  maps_.reset(new LocalUpdatableMaps);
  if (!maps_->Parse()) {
    error.code = ERROR_MAPS_PARSE;
    return false;
  }
  if (process_memory_ == nullptr) {
    process_memory_ = Memory::CreateProcessMemoryThreadCached(getpid());
  }
  return true;
}

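This is where memory information starts to be read. It first gets the current architecture (most Android devices today are arm64) and then reads the memory map. Let's see how LocalUpdatableMaps does its Parse: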

bool LocalUpdatableMaps::Parse() {
  pthread_rwlock_wrlock(&maps_rwlock_);
  bool parsed = Maps::Parse();
  pthread_rwlock_unlock(&maps_rwlock_);
  return parsed;
}

Following the call further:

bool Maps::Parse() {
  std::shared_ptr<MapInfo> prev_map;
  return android::procinfo::ReadMapFile(GetMapsFile(),
                      [&](const android::procinfo::MapInfo& mapinfo) {
    // Mark a device map in /dev/ and not in /dev/ashmem/ specially.
    auto flags = mapinfo.flags;
    if (strncmp(mapinfo.name.c_str(), "/dev/", 5) == 0 &&
        strncmp(mapinfo.name.c_str() + 5, "ashmem/", 7) != 0) {
      flags |= unwindstack::MAPS_FLAGS_DEVICE_MAP;
    }
    maps_.emplace_back(
        MapInfo::Create(prev_map, mapinfo.start, mapinfo.end, mapinfo.pgoff, flags, mapinfo.name));
    prev_map = maps_.back();
  });
}
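
The only non-obvious part of the callback above is the pair of strncmp calls. A small standalone sketch of the same test, assuming a hypothetical helper name IsDeviceMapName, makes the rule easier to read:

```cpp
#include <string>

// True for names under /dev/ that are not under /dev/ashmem/.
bool IsDeviceMapName(const std::string& name) {
  const std::string dev = "/dev/";
  const std::string ashmem = "ashmem/";
  if (name.compare(0, dev.size(), dev) != 0) {
    return false;  // not under /dev/ at all
  }
  // The name starts with "/dev/", so position dev.size() is within bounds.
  return name.compare(dev.size(), ashmem.size(), ashmem) != 0;
}
```

Maps that pass this test get the MAPS_FLAGS_DEVICE_MAP flag, which the unwind loop checks later on.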

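This reads the process's memory maps and converts each entry into the predefined map format. The file read is "/proc/self/maps", which records the memory segments in the following format: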

emu64a:/proc/4584 # cat maps |more
12c00000-5ac00000 rw-p 00000000 00:00 0                                  [anon:dalvik-main space (region space)]
6f0d5000-6f363000 rw-p 00000000 00:00 0                                  [anon:dalvik-/system/framework/boot.art]
6f363000-6f3a5000 rw-p 00000000 00:00 0                                  [anon:dalvik-/system/framework/boot-core-libart.art]
6f3a5000-6f3ce000 rw-p 00000000 00:00 0                                  [anon:dalvik-/system/framework/boot-okhttp.art]
6f3ce000-6f410000 rw-p 00000000 00:00 0                                  [anon:dalvik-/system/framework/boot-bouncycastle.art]
6f410000-6f411000 rw-p 00000000 00:00 0                                  [anon:dalvik-/system/framework/boot-apache-xml.art]
6f411000-6f4a4000 r--p 00000000 fe:00 1299                               /system/framework/arm64/boot.oat
6f4a4000-6f788000 r-xp 00093000 fe:00 1299                               /system/framework/arm64/boot.oat
6f788000-6f789000 rw-p 00000000 00:00 0                                  [anon:.bss]
6f789000-6f79c000 rw-p 00000000 fe:00 1300                               /system/framework/arm64/boot.vdex
6f79c000-6f79d000 r--p 00377000 fe:00 1299                               /system/framework/arm64/boot.oat
6f79d000-6f79e000 rw-p 00378000 fe:00 1299                               /system/framework/arm64/boot.oat
6f79e000-6f7ac000 r--p 00000000 fe:00 1275                               /system/framework/arm64/boot-core-libart.oat
6f7ac000-6f7f0000 r-xp 0000e000 fe:00 1275                               /system/framework/arm64/boot-core-libart.oat
6f7f0000-6f7f1000 rw-p 00000000 00:00 0                                  [anon:.bss]
6f7f1000-6f7f4000 rw-p 00000000 fe:00 1276                               /system/framework/arm64/boot-core-libart.vdex
6f7f4000-6f7f5000 r--p 00052000 fe:00 1275                               /system/framework/arm64/boot-core-libart.oat
6f7f5000-6f7f6000 rw-p 00053000 fe:00 1275                               /system/framework/arm64/boot-core-libart.oat
6f7f6000-6f802000 r--p 00000000 fe:00 1290                               /system/framework/arm64/boot-okhttp.oat
6f802000-6f836000 r-xp 0000c000 fe:00 1290                               /system/framework/arm64/boot-okhttp.oat
6f836000-6f837000 rw-p 00000000 00:00 0                                  [anon:.bss]
6f837000-6f839000 rw-p 00000000 fe:00 1291                               /system/framework/arm64/boot-okhttp.vdex
6f839000-6f83a000 r--p 00040000 fe:00 1290                               /system/framework/arm64/boot-okhttp.oat
6f83a000-6f83b000 rw-p 00041000 fe:00 1290                               /system/framework/arm64/boot-okhttp.oat
6f83b000-6f843000 r--p 00000000 fe:00 1269                               /system/framework/arm64/boot-bouncycastle.oat
6f843000-6f858000 r-xp 00008000 fe:00 1269                               /system/framework/arm64/boot-bouncycastle.oat

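Each line contains the start and end of the virtual address range, the permissions, the offset within the mapped file that the range corresponds to, the device number, and the inode number, followed by the name of the mapping.
The next step is to parse this information out.
The parsing works as follows: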

inline bool ReadMapFileContent(char* content, const MapInfoParamsCallback& callback) {
  uint64_t start_addr;
  uint64_t end_addr;
  uint16_t flags;
  uint64_t pgoff;
  ino_t inode;
  char* line_start = content;
  char* next_line;
  char* name;
  bool shared;
  while (line_start != nullptr && *line_start != '\0') {
    bool parsed = ParseMapsFileLine(line_start, start_addr, end_addr, flags, pgoff,
                                    inode, &name, shared, &next_line);
    if (!parsed) {
      return false;
    }
    line_start = next_line;
    callback(start_addr, end_addr, flags, pgoff, inode, name, shared);
  }
  return true;
}

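This function walks the maps file content line by line. Reading the per-line parser below alongside the dump above makes the meaning of each field clearer: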

// Parses the given line p pointing at proc/<pid>/maps content buffer and returns true on success
// and false on failure parsing. The first new line character of line will be replaced by the
// null character and *next_line will point to the character after the null.
//
// Example of how a parsed line look line:
// 00400000-00409000 r-xp 00000000 fc:00 426998  /usr/lib/gvfs/gvfsd-http
static inline bool ParseMapsFileLine(char* p, uint64_t& start_addr, uint64_t& end_addr, uint16_t& flags,
                      uint64_t& pgoff, ino_t& inode, char** name, bool& shared, char** next_line) {
  // Make the first new line character null.
  *next_line = strchr(p, '\n');
  if (*next_line != nullptr) {
    **next_line = '\0';
    (*next_line)++;
  }
  char* end;
  // start_addr
  start_addr = strtoull(p, &end, 16);
  if (end == p || *end != '-') {
    return false;
  }
  p = end + 1;
  // end_addr
  end_addr = strtoull(p, &end, 16);
  if (end == p) {
    return false;
  }
  p = end;
  if (!PassSpace(&p)) {
    return false;
  }
  // flags
  flags = 0;
  if (*p == 'r') {
    flags |= PROT_READ;
  } else if (*p != '-') {
    return false;
  }
  p++;
  if (*p == 'w') {
    flags |= PROT_WRITE;
  } else if (*p != '-') {
    return false;
  }
  p++;
  if (*p == 'x') {
    flags |= PROT_EXEC;
  } else if (*p != '-') {
    return false;
  }
  p++;
  if (*p != 'p' && *p != 's') {
    return false;
  }
  shared = *p == 's';
  p++;
  if (!PassSpace(&p)) {
    return false;
  }
  // pgoff
  pgoff = strtoull(p, &end, 16);
  if (end == p) {
    return false;
  }
  p = end;
  if (!PassSpace(&p)) {
    return false;
  }
  // major:minor
  if (!PassXdigit(&p) || *p++ != ':' || !PassXdigit(&p) || !PassSpace(&p)) {
    return false;
  }
  // inode
  inode = strtoull(p, &end, 10);
  if (end == p) {
    return false;
  }
  p = end;
  if (*p != '\0' && !PassSpace(&p)) {
    return false;
  }
  // Assumes that the first new character was replaced with null.
  *name = p;
  return true;
}
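
For experimenting with the field layout, the same line format can be parsed far more compactly (and less efficiently) with sscanf. This is a sketch only, not the approach taken by ParseMapsFileLine above; MapsLine and ParseMapsLine are hypothetical names, and the input line is assumed to have no trailing newline.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

struct MapsLine {
  std::uint64_t start = 0;
  std::uint64_t end = 0;
  char perms[5] = {};  // e.g. "r-xp"
  std::uint64_t pgoff = 0;
  unsigned dev_major = 0;
  unsigned dev_minor = 0;
  std::uint64_t inode = 0;
  std::string name;  // may be empty for anonymous mappings
};

// Parses "start-end perms pgoff major:minor inode [name]"; returns false on malformed input.
bool ParseMapsLine(const char* line, MapsLine* out) {
  int name_pos = 0;
  int matched = std::sscanf(line, "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %" SCNu64 " %n",
                            &out->start, &out->end, out->perms, &out->pgoff, &out->dev_major,
                            &out->dev_minor, &out->inode, &name_pos);
  if (matched != 7) {
    return false;
  }
  // %n records where the name begins; it stays 0 if nothing followed the inode.
  out->name = name_pos > 0 ? line + name_pos : "";
  return true;
}
```

The hand-written parser in the source avoids sscanf and works in place on the buffer, which matters when many processes with hundreds of maps each are being profiled.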

Each parsed line is represented by a MapInfo, so it is worth reading MapInfo side by side with the parser:

// Represents virtual memory map (as obtained from /proc/*/maps).
//
// Note that we have to be surprisingly careful with memory usage here,
// since in system-wide profiling this data can take considerable space.
// (for example, 400 process * 400 maps * 128 bytes = 20 MB + string data).
class MapInfo {
 public:
  MapInfo(std::shared_ptr<MapInfo>& prev_map, uint64_t start, uint64_t end, uint64_t offset,
          uint64_t flags, SharedString name)
      : start_(start),
        end_(end),
        offset_(offset),
        flags_(flags),
        name_(name),
        elf_fields_(nullptr),
        prev_map_(prev_map) {}
  MapInfo(uint64_t start, uint64_t end, uint64_t offset, uint64_t flags, SharedString name)
      : start_(start),
        end_(end),
        offset_(offset),
        flags_(flags),
        name_(name),
        elf_fields_(nullptr) {}
  static inline std::shared_ptr<MapInfo> Create(std::shared_ptr<MapInfo>& prev_map,
                                                uint64_t start, uint64_t end, uint64_t offset,
                                                uint64_t flags, SharedString name) {
    auto map_info = std::make_shared<MapInfo>(prev_map, start, end, offset, flags, name);
    if (prev_map) {
      prev_map->next_map_ = map_info;
    }
    return map_info;
  }
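
Once the maps are parsed, the unwind loop shown later repeatedly asks which map contains a given pc (the maps_->Find calls). A minimal sketch of such a lookup follows, assuming the ranges are sorted by start address and do not overlap; Range and FindRange are hypothetical names, and the real Maps::Find may be implemented differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Range {
  std::uint64_t start;  // inclusive
  std::uint64_t end;    // exclusive
};

// Returns the index of the range containing pc, or -1 if no range contains it.
int FindRange(const std::vector<Range>& maps, std::uint64_t pc) {
  // First range whose start is strictly greater than pc.
  auto it = std::upper_bound(maps.begin(), maps.end(), pc,
                             [](std::uint64_t value, const Range& r) { return value < r.start; });
  if (it == maps.begin()) {
    return -1;  // pc is below every range
  }
  --it;  // last range starting at or before pc
  return pc < it->end ? static_cast<int>(it - maps.begin()) : -1;
}
```

A pc that falls in a gap between two ranges yields -1, which corresponds to the map_info == nullptr branch in the unwind loop.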

Next, the memory reading:

std::shared_ptr<Memory> Memory::CreateProcessMemoryThreadCached(pid_t pid) {
  if (pid == getpid()) {
    return std::shared_ptr<Memory>(new MemoryThreadCache(new MemoryLocal()));
  }
  return std::shared_ptr<Memory>(new MemoryThreadCache(new MemoryRemote(pid)));
}

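This only creates a Memory object; nothing is actually read yet. So what is the object for? To find out, we go back to Initialize and keep reading: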

bool AndroidUnwinder::Initialize(ErrorData& error) {
  // Android stores the jit and dex file location only in the library
  // libart.so or libartd.so.
  static std::vector<std::string> search_libs [[clang::no_destroy]] = {"libart.so", "libartd.so"};
  std::call_once(initialize_, [this, &error]() {
    if (!InternalInitialize(error)) {
      initialize_status_ = false;
      return;
    }
    jit_debug_ = CreateJitDebug(arch_, process_memory_, search_libs);
#if defined(DEXFILE_SUPPORT)
    dex_files_ = CreateDexFiles(arch_, process_memory_, search_libs);
#endif
    initialize_status_ = true;
  });
  return initialize_status_;
}

The JIT and dex information for Java code is stored in the virtual machine's shared library, so it has to be located through the ART libraries.
Next, CreateJitDebug:

std::unique_ptr<JitDebug> CreateJitDebug(ArchEnum arch, std::shared_ptr<Memory>& memory,
                                         std::vector<std::string> search_libs) {
  return CreateGlobalDebugImpl<Elf>(arch, memory, search_libs, "__jit_debug_descriptor");
}
template <typename Symfile>
std::unique_ptr<GlobalDebugInterface<Symfile>> CreateGlobalDebugImpl(
    ArchEnum arch, std::shared_ptr<Memory>& memory, std::vector<std::string> search_libs,
    const char* global_variable_name) {
  CHECK(arch != ARCH_UNKNOWN);
  // The interface needs to see real-time changes in memory for synchronization with the
  // concurrently running ART JIT compiler. Skip caching and read the memory directly.
  std::shared_ptr<Memory> jit_memory;
  MemoryCacheBase* cached_memory = memory->AsMemoryCacheBase();
  if (cached_memory != nullptr) {
    jit_memory = cached_memory->UnderlyingMemory();
  } else {
    jit_memory = memory;
  }
  switch (arch) {
    case ARCH_X86: {
      using Impl = GlobalDebugImpl<Symfile, uint32_t, Uint64_P>;
      static_assert(offsetof(typename Impl::JITCodeEntry, symfile_size) == 12, "layout");
      static_assert(offsetof(typename Impl::JITCodeEntry, seqlock) == 28, "layout");
      static_assert(sizeof(typename Impl::JITCodeEntry) == 32, "layout");
      static_assert(sizeof(typename Impl::JITDescriptor) == 48, "layout");
      return std::make_unique<Impl>(arch, jit_memory, search_libs, global_variable_name);
    }
    case ARCH_ARM: {
      using Impl = GlobalDebugImpl<Symfile, uint32_t, Uint64_A>;
      static_assert(offsetof(typename Impl::JITCodeEntry, symfile_size) == 16, "layout");
      static_assert(offsetof(typename Impl::JITCodeEntry, seqlock) == 32, "layout");
      static_assert(sizeof(typename Impl::JITCodeEntry) == 40, "layout");
      static_assert(sizeof(typename Impl::JITDescriptor) == 48, "layout");
      return std::make_unique<Impl>(arch, jit_memory, search_libs, global_variable_name);
    }
    case ARCH_ARM64:
    case ARCH_X86_64:
    case ARCH_RISCV64: {
      using Impl = GlobalDebugImpl<Symfile, uint64_t, Uint64_A>;
      static_assert(offsetof(typename Impl::JITCodeEntry, symfile_size) == 24, "layout");
      static_assert(offsetof(typename Impl::JITCodeEntry, seqlock) == 40, "layout");
      static_assert(sizeof(typename Impl::JITCodeEntry) == 48, "layout");
      static_assert(sizeof(typename Impl::JITDescriptor) == 56, "layout");
      return std::make_unique<Impl>(arch, jit_memory, search_libs, global_variable_name);
    }
    default:
      abort();
  }
}

This creates a GlobalDebugImpl according to the architecture; we only care about ARM64 for now.
First, what AsMemoryCacheBase does:

  MemoryCacheBase* AsMemoryCacheBase() override { return this; }

It simply returns itself. Next is UnderlyingMemory:

  const std::shared_ptr<Memory>& UnderlyingMemory() { return impl_; }
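
These two accessors exist so that a caller can step around a caching wrapper when it needs to see live memory. The sketch below reproduces that shape with a toy byte source; ByteSource, BufferSource, CachingSource, and Uncached are hypothetical names, not libunwindstack classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

class CachingSource;

class ByteSource {
 public:
  virtual ~ByteSource() = default;
  virtual bool ReadByte(std::size_t addr, std::uint8_t* out) = 0;
  // Plain sources report nullptr; the cache overrides this to return itself.
  virtual CachingSource* AsCache() { return nullptr; }
};

// Backing store whose contents can change at any time.
class BufferSource : public ByteSource {
 public:
  explicit BufferSource(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}
  bool ReadByte(std::size_t addr, std::uint8_t* out) override {
    if (addr >= bytes_.size()) return false;
    *out = bytes_[addr];
    return true;
  }
  void Poke(std::size_t addr, std::uint8_t value) { bytes_.at(addr) = value; }

 private:
  std::vector<std::uint8_t> bytes_;
};

// Decorator that remembers every byte it has read, so it can return stale data.
class CachingSource : public ByteSource {
 public:
  explicit CachingSource(std::shared_ptr<ByteSource> impl) : impl_(std::move(impl)) {}
  bool ReadByte(std::size_t addr, std::uint8_t* out) override {
    auto it = cache_.find(addr);
    if (it != cache_.end()) {
      *out = it->second;
      return true;
    }
    if (!impl_->ReadByte(addr, out)) return false;
    cache_[addr] = *out;
    return true;
  }
  CachingSource* AsCache() override { return this; }
  const std::shared_ptr<ByteSource>& Underlying() { return impl_; }

 private:
  std::shared_ptr<ByteSource> impl_;
  std::unordered_map<std::size_t, std::uint8_t> cache_;
};

// Same shape as the snippet in CreateGlobalDebugImpl: bypass the cache when there is one.
std::shared_ptr<ByteSource> Uncached(const std::shared_ptr<ByteSource>& source) {
  CachingSource* cache = source->AsCache();
  return cache != nullptr ? cache->Underlying() : source;
}
```

The comment in CreateGlobalDebugImpl gives the reason for unwrapping: the JIT compiler keeps modifying its data concurrently, so the debug interface must read memory directly rather than through a cache.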

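In our case this actually returns the MemoryLocal. After that the GlobalDebugImpl object is created.
Some background is needed here: how does gdb debug Java code? At run time Java code is turned by the JIT into directly executable machine code, so a mapping is needed, for example from a Java symbol to the address and range of its code.
This information is described by the following structures: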

  struct JITCodeEntry {
    Uintptr_T next;
    Uintptr_T prev;
    Uintptr_T symfile_addr;
    Uint64_T symfile_size;
    // Android-specific fields:
    Uint64_T timestamp;
    uint32_t seqlock;
  };
  struct JITDescriptor {
    uint32_t version;
    uint32_t action_flag;
    Uintptr_T relevant_entry;
    Uintptr_T first_entry;
    // Android-specific fields:
    uint8_t magic[8];
    uint32_t flags;
    uint32_t sizeof_descriptor;
    uint32_t sizeof_entry;
    uint32_t seqlock;
    Uint64_T timestamp;
  };
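
The layout numbers asserted in CreateGlobalDebugImpl for the 64-bit case can be checked by hand with a concrete instantiation of JITCodeEntry. The sketch below does that and also walks a small linked list of entries. JitCodeEntry64 and CollectSymfiles are hypothetical names, and the "addresses" are simply 1-based indexes into a table rather than real process addresses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// 64-bit flavor of the entry: pointers are stored as plain 64-bit integers.
struct alignas(8) JitCodeEntry64 {
  std::uint64_t next;
  std::uint64_t prev;
  std::uint64_t symfile_addr;
  std::uint64_t symfile_size;
  std::uint64_t timestamp;
  std::uint32_t seqlock;
};

// The same values as the static_asserts in the ARM64 / X86_64 / RISCV64 branch.
static_assert(offsetof(JitCodeEntry64, symfile_size) == 24, "layout");
static_assert(offsetof(JitCodeEntry64, seqlock) == 40, "layout");
static_assert(sizeof(JitCodeEntry64) == 48, "layout");

// Walks the list from `first`. An address is 1 + index into `table`; 0 terminates the list.
// The result size is capped at table.size() so a corrupt, cyclic list cannot loop forever.
std::vector<std::uint64_t> CollectSymfiles(const std::vector<JitCodeEntry64>& table,
                                           std::uint64_t first) {
  std::vector<std::uint64_t> result;
  for (std::uint64_t addr = first;
       addr != 0 && addr <= table.size() && result.size() < table.size();
       addr = table[addr - 1].next) {
    result.push_back(table[addr - 1].symfile_addr);
  }
  return result;
}
```

Each entry points at an in-memory symbol file (symfile_addr, symfile_size). That is what lets the unwinder look up an Elf object for a pc that lies in JIT-generated code.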

Next, DexFile:

std::unique_ptr<DexFiles> CreateDexFiles(ArchEnum arch, std::shared_ptr<Memory>& memory,
                                         std::vector<std::string> search_libs) {
  return CreateGlobalDebugImpl<DexFile>(arch, memory, search_libs, "__dex_debug_descriptor");
}

The flow here is the same as for JitDebug and can be examined in depth later.
At this point the Initialize flow is finished, and the actual unwinding begins:

bool AndroidLocalUnwinder::InternalUnwind(std::optional<pid_t> tid, AndroidUnwinderData& data) {
  if (!tid) {
    tid = android::base::GetThreadId();
  }
  if (static_cast<uint64_t>(*tid) == android::base::GetThreadId()) {
    // Unwind current thread.
    std::unique_ptr<Regs> regs(Regs::CreateFromLocal());
    RegsGetLocal(regs.get());
    return AndroidUnwinder::Unwind(regs.get(), data);
  }
  ThreadUnwinder unwinder(data.max_frames.value_or(max_frames_), maps_.get(), process_memory_);
  unwinder.SetJitDebug(jit_debug_.get());
  unwinder.SetDexFiles(dex_files_.get());
  std::unique_ptr<Regs>* initial_regs = nullptr;
  if (data.saved_initial_regs) {
    initial_regs = &data.saved_initial_regs.value();
  }
  unwinder.UnwindWithSignal(kThreadUnwindSignal, *tid, initial_regs,
                            data.show_all_frames ? nullptr : &initial_map_names_to_skip_,
                            &map_suffixes_to_ignore_);
  data.frames = unwinder.ConsumeFrames();
  data.error = unwinder.LastError();
  return data.frames.size() != 0;
}

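It first checks whether the unwind is for the current thread. If so, it unwinds directly; if not, it has to go through a signal. The former can be seen as a special case of the latter, so we go straight to the latter.
It first constructs a ThreadUnwinder object and sets a few fields, with no real logic yet. Next comes the main event, UnwindWithSignal: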

void ThreadUnwinder::UnwindWithSignal(int signal, pid_t tid, std::unique_ptr<Regs>* initial_regs,
                                      const std::vector<std::string>* initial_map_names_to_skip,
                                      const std::vector<std::string>* map_suffixes_to_ignore) {
  ClearErrors();
  if (tid == static_cast<pid_t>(android::base::GetThreadId())) {
    last_error_.code = ERROR_UNSUPPORTED;
    return;
  }
  if (!Init()) {
    return;
  }
  ThreadEntry* entry = SendSignalToThread(signal, tid);
  if (entry == nullptr) {
    return;
  }
  std::unique_ptr<Regs> regs(Regs::CreateFromUcontext(Regs::CurrentArch(), entry->GetUcontext()));
  if (initial_regs != nullptr) {
    initial_regs->reset(regs->Clone());
  }
  SetRegs(regs.get());
  UnwinderFromPid::Unwind(initial_map_names_to_skip, map_suffixes_to_ignore);
  // Tell the signal handler to exit and release the entry.
  entry->Wake();
  // Wait for the thread to indicate it is done with the ThreadEntry.
  // If this fails, the Wait command will log an error message.
  entry->Wait(WAIT_FOR_THREAD_TO_RESTART);
  ThreadEntry::Remove(entry);
}

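From the logic above we know that tid here is the target thread's tid, which differs from the current thread's, so what follows is initialization and then sending a signal to the target thread.
First, the initialization: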

bool UnwinderFromPid::Init() {
  CHECK(arch_ != ARCH_UNKNOWN);
  if (initted_) {
    return true;
  }
  initted_ = true;
  if (maps_ == nullptr) {
    if (pid_ == getpid()) {
      maps_ptr_.reset(new LocalMaps());
    } else {
      maps_ptr_.reset(new RemoteMaps(pid_));
    }
    if (!maps_ptr_->Parse()) {
      ClearErrors();
      last_error_.code = ERROR_INVALID_MAP;
      return false;
    }
    maps_ = maps_ptr_.get();
  }
  if (process_memory_ == nullptr) {
    if (pid_ == getpid()) {
      // Local unwind, so use thread cache to allow multiple threads
      // to cache data even when multiple threads access the same object.
      process_memory_ = Memory::CreateProcessMemoryThreadCached(pid_);
    } else {
      // Remote unwind should be safe to cache since the unwind will
      // be occurring on a stopped process.
      process_memory_ = Memory::CreateProcessMemoryCached(pid_);
    }
  }
  // jit_debug_ and dex_files_ may have already been set, for example in
  // AndroidLocalUnwinder::InternalUnwind.
  if (jit_debug_ == nullptr) {
    jit_debug_ptr_ = CreateJitDebug(arch_, process_memory_);
    SetJitDebug(jit_debug_ptr_.get());
  }
#if defined(DEXFILE_SUPPORT)
  if (dex_files_ == nullptr) {
    dex_files_ptr_ = CreateDexFiles(arch_, process_memory_);
    SetDexFiles(dex_files_ptr_.get());
  }
#endif
  return true;
}

maps_, process_memory_, jit_debug_, and dex_files_ were already set up by the caller, so nothing needs to be initialized again here. Next, sending the signal:

ThreadEntry* ThreadUnwinder::SendSignalToThread(int signal, pid_t tid) {
  static std::mutex action_mutex;
  std::lock_guard<std::mutex> guard(action_mutex);
  ThreadEntry* entry = ThreadEntry::Get(tid);
  entry->Lock();
  struct sigaction new_action = {.sa_sigaction = SignalHandler,
                                 .sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK};
  struct sigaction old_action = {};
  sigemptyset(&new_action.sa_mask);
  if (sigaction(signal, &new_action, &old_action) != 0) {  // Install the handler; once the signal is delivered, it runs on the target thread.
    Log::AsyncSafe("sigaction failed: %s", strerror(errno));
    ThreadEntry::Remove(entry);
    last_error_.code = ERROR_SYSTEM_CALL;
    return nullptr;
  }
  if (tgkill(getpid(), tid, signal) != 0) {  // Send the signal to the target thread.
    // Do not emit an error message, this might be expected. Set the
    // error and let the caller decide.
    if (errno == ESRCH) {
      last_error_.code = ERROR_THREAD_DOES_NOT_EXIST;
    } else {
      last_error_.code = ERROR_SYSTEM_CALL;
    }
    sigaction(signal, &old_action, nullptr);
    ThreadEntry::Remove(entry);
    return nullptr;
  }
  // Wait for the thread to get the ucontext. The number indicates
  // that we are waiting for the first Wake() call made by the thread.
  bool wait_completed = entry->Wait(WAIT_FOR_UCONTEXT);  // The current thread waits for the target thread's Wake().
  if (wait_completed) {
    return entry;
  }
  if (old_action.sa_sigaction == nullptr) {
    // If the wait failed, it could be that the signal could not be delivered
    // within the timeout. Add a signal handler that's simply going to log
    // something so that we don't crash if the signal eventually gets
    // delivered. Only do this if there isn't already an action set up.
    struct sigaction log_action = {.sa_sigaction = SignalLogOnly,
                                   .sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK};
    sigemptyset(&log_action.sa_mask);
    sigaction(signal, &log_action, nullptr);
  } else {
    sigaction(signal, &old_action, nullptr);
  }
  // Check to see if the thread has disappeared.
  if (tgkill(getpid(), tid, 0) == -1 && errno == ESRCH) {  // Signal 0 only probes whether the thread still exists; set the error code accordingly.
    last_error_.code = ERROR_THREAD_DOES_NOT_EXIST;
  } else {
    last_error_.code = ERROR_THREAD_TIMEOUT;
  }
  ThreadEntry::Remove(entry);
  return nullptr;
}

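The flow here installs a signal handler for the current process, has that handler run on the target thread, and then the current thread simply waits for notification. Communication between the two threads relies on ThreadEntry, whose Wait and Wake are quite concise: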

bool ThreadEntry::Wait(WaitType type) {
  static const std::chrono::duration wait_time(std::chrono::seconds(10));
  std::unique_lock<std::mutex> lock(wait_mutex_);
  if (wait_cond_.wait_for(lock, wait_time, [this, type] { return wait_value_ == type; })) {
    return true;
  } else {
    Log::AsyncSafe("Timeout waiting for %s", GetWaitTypeName(type));
    return false;
  }
}
void ThreadEntry::Wake() {
  wait_mutex_.lock();
  wait_value_++;
  wait_mutex_.unlock();
  wait_cond_.notify_one();
}

Next, SignalHandler. Keep in mind that this handler runs on the target thread:

static void SignalHandler(int, siginfo_t*, void* sigcontext) {
  android::base::ErrnoRestorer restore;
  ThreadEntry* entry = ThreadEntry::Get(android::base::GetThreadId(), false);
  if (!entry) {
    return;
  }
  entry->CopyUcontextFromSigcontext(sigcontext);
  // Indicate the ucontext is now valid.
  entry->Wake();
  // Pause the thread until the unwind is complete. This avoids having
  // the thread run ahead causing problems.
  // The number indicates that we are waiting for the second Wake() call
  // overall which is made by the thread requesting an unwind.
  if (entry->Wait(WAIT_FOR_UNWIND_TO_COMPLETE)) {
    // Do not remove the entry here because that can result in a deadlock
    // if the code cannot properly send a signal to the thread under test.
    entry->Wake();
  }
  // If the wait fails, the entry might have been freed, so only exit.
}

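The flow looks like a small state machine: the unwind thread starts waiting; the target thread saves its interrupted context and wakes the unwind thread; the target thread then waits for the unwind thread to wake it; finally the target thread wakes the unwind thread once more.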
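
The three-step exchange described above can be reproduced with a counter, a mutex, and a condition variable. The sketch below is a simplified stand-in for ThreadEntry: Handshake and RunHandshake are hypothetical names, the target is an ordinary thread rather than a signal handler, the timeout is one second instead of ten, and the wait values 1, 2, 3 are assumed from the "first Wake() call" and "second Wake() call" comments in the source.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

class Handshake {
 public:
  // Blocks until the Wake() counter equals `expected`, or times out after one second.
  bool Wait(int expected) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cond_.wait_for(lock, std::chrono::seconds(1),
                          [this, expected] { return count_ == expected; });
  }
  // Each call advances the protocol by one step.
  void Wake() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++count_;
    }
    cond_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  int count_ = 0;
};

// Runs the exchange once; returns true if every step happened in order.
bool RunHandshake() {
  Handshake entry;
  bool context_saved = false;
  bool target_ok = false;
  std::thread target([&] {
    context_saved = true;  // stands in for CopyUcontextFromSigcontext
    entry.Wake();          // step 1: the context is now valid
    if (entry.Wait(2)) {   // stay paused until the requester has finished unwinding
      target_ok = true;
      entry.Wake();        // step 3: about to resume
    }
  });
  bool ok = entry.Wait(1);                  // requester: wait for the context
  bool saw_context = ok && context_saved;   // safe to read only after a successful Wait
  entry.Wake();                             // step 2: unwind finished
  ok = entry.Wait(3) && ok;                 // wait for the target to acknowledge
  target.join();
  return ok && saw_context && target_ok;
}
```

Keeping the target paused between steps 1 and 2 is the important part: the unwinder reads the target's stack during that window, so the stack must not change underneath it.
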
The entry holds the saved context:

void ThreadEntry::CopyUcontextFromSigcontext(void* sigcontext) {
  ucontext_t* ucontext = reinterpret_cast<ucontext_t*>(sigcontext);
  // The only thing the unwinder cares about is the mcontext data.
  memcpy(&ucontext_.uc_mcontext, &ucontext->uc_mcontext, sizeof(ucontext->uc_mcontext));
}

So what is mcontext? The definitions below are the glibc ones for x86_64:

/* Structure to describe FPU registers.  */
typedef struct _libc_fpstate *fpregset_t;
/* Context to describe whole processor state.  */
typedef struct
  {
    gregset_t gregs;
    /* Note that fpregs is a pointer.  */
    fpregset_t fpregs;
    __extension__ unsigned long long __reserved1 [8];
} mcontext_t;
/* Userlevel context.  */
typedef struct ucontext
  {
    unsigned long int uc_flags;
    struct ucontext *uc_link;
    stack_t uc_stack;
    mcontext_t uc_mcontext;
    __sigset_t uc_sigmask;
    struct _libc_fpstate __fpregs_mem;
  } ucontext_t;

That is, the values of the general-purpose and floating-point registers. The general-purpose registers are laid out as follows:

/* Type for general register.  */
__extension__ typedef long long int greg_t;
/* Number of general registers.  */
#define NGREG   23
/* Container for all general registers.  */
typedef greg_t gregset_t[NGREG];
#ifdef __USE_GNU
/* Number of each register in the `gregset_t' array.  */
enum
{
  REG_R8 = 0,
# define REG_R8     REG_R8
  REG_R9,
# define REG_R9     REG_R9
  REG_R10,
# define REG_R10    REG_R10
  REG_R11,
# define REG_R11    REG_R11
  REG_R12,
# define REG_R12    REG_R12
  REG_R13,
# define REG_R13    REG_R13
  REG_R14,
# define REG_R14    REG_R14
  REG_R15,
# define REG_R15    REG_R15
  REG_RDI,
# define REG_RDI    REG_RDI
  REG_RSI,
# define REG_RSI    REG_RSI
  REG_RBP,
# define REG_RBP    REG_RBP
  REG_RBX,
# define REG_RBX    REG_RBX
  REG_RDX,
# define REG_RDX    REG_RDX
  REG_RAX,
# define REG_RAX    REG_RAX
  REG_RCX,
# define REG_RCX    REG_RCX
  REG_RSP,
# define REG_RSP    REG_RSP
  REG_RIP,
# define REG_RIP    REG_RIP
  REG_EFL,
# define REG_EFL    REG_EFL
  REG_CSGSFS,       /* Actually short cs, gs, fs, __pad0.  */
# define REG_CSGSFS REG_CSGSFS
  REG_ERR,
# define REG_ERR    REG_ERR
  REG_TRAPNO,
# define REG_TRAPNO REG_TRAPNO
  REG_OLDMASK,
# define REG_OLDMASK    REG_OLDMASK
  REG_CR2
# define REG_CR2    REG_CR2
};
#endif

The floating-point registers are represented as follows:

struct _libc_fpxreg
{
  unsigned short int significand[4];
  unsigned short int exponent;
  unsigned short int padding[3];
};
struct _libc_xmmreg
{
  __uint32_t    element[4];
};
struct _libc_fpstate
{
  /* 64-bit FXSAVE format.  */
  __uint16_t        cwd;
  __uint16_t        swd;
  __uint16_t        ftw;
  __uint16_t        fop;
  __uint64_t        rip;
  __uint64_t        rdp;
  __uint32_t        mxcsr;
  __uint32_t        mxcr_mask;
  struct _libc_fpxreg   _st[8];
  struct _libc_xmmreg   _xmm[16];
  __uint32_t        padding[24];
};
/* Structure to describe FPU registers.  */
typedef struct _libc_fpstate *fpregset_t;

With this, the unwind thread can obtain the target thread's register context.
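
The capture mechanism itself can be tried in a few lines on Linux. The sketch below installs an SA_SIGINFO handler, signals the calling thread, and copies the machine context out of the third handler argument, much as CopyUcontextFromSigcontext does. The names CaptureHandler, CaptureOwnContext, g_saved_context, and g_handler_ran are hypothetical, and SIGUSR1 is an arbitrary choice for the demo (it is assumed not to be blocked).

```cpp
#include <signal.h>
#include <string.h>
#include <ucontext.h>

namespace {

ucontext_t g_saved_context;               // filled in by the handler
volatile sig_atomic_t g_handler_ran = 0;  // set once the copy is complete

void CaptureHandler(int, siginfo_t*, void* sigcontext) {
  // With SA_SIGINFO, the third argument points at the interrupted thread's ucontext_t.
  const ucontext_t* uc = static_cast<const ucontext_t*>(sigcontext);
  memcpy(&g_saved_context.uc_mcontext, &uc->uc_mcontext, sizeof(uc->uc_mcontext));
  g_handler_ran = 1;
}

}  // namespace

// Installs the handler for SIGUSR1, signals the calling thread, then restores the old action.
// Returns true if the handler ran and saved a context.
bool CaptureOwnContext() {
  struct sigaction new_action = {};
  struct sigaction old_action = {};
  new_action.sa_sigaction = CaptureHandler;
  new_action.sa_flags = SA_SIGINFO | SA_RESTART;
  sigemptyset(&new_action.sa_mask);
  if (sigaction(SIGUSR1, &new_action, &old_action) != 0) {
    return false;
  }
  g_handler_ran = 0;
  bool raised = raise(SIGUSR1) == 0;  // the handler runs before raise() returns
  sigaction(SIGUSR1, &old_action, nullptr);
  return raised && g_handler_ran == 1;
}
```

The libunwindstack code differs in that it targets another thread with tgkill and keeps that thread parked inside the handler while the unwind runs; the sketch only demonstrates where the register snapshot comes from.
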
Now let's see how the unwind thread stores the register information:

Regs* Regs::CreateFromUcontext(ArchEnum arch, void* ucontext) {
  switch (arch) {
    case ARCH_X86:
      return RegsX86::CreateFromUcontext(ucontext);
    case ARCH_X86_64:
      return RegsX86_64::CreateFromUcontext(ucontext);
    case ARCH_ARM:
      return RegsArm::CreateFromUcontext(ucontext);
    case ARCH_ARM64:
      return RegsArm64::CreateFromUcontext(ucontext);
    case ARCH_RISCV64:
      return RegsRiscv64::CreateFromUcontext(ucontext);
    case ARCH_UNKNOWN:
    default:
      return nullptr;
  }
}

We only care about ARM64, so continuing:

Regs* RegsArm64::CreateFromUcontext(void* ucontext) {
  arm64_ucontext_t* arm64_ucontext = reinterpret_cast<arm64_ucontext_t*>(ucontext);
  RegsArm64* regs = new RegsArm64();
  memcpy(regs->RawData(), &arm64_ucontext->uc_mcontext.regs[0], ARM64_REG_LAST * sizeof(uint64_t));
  return regs;
}

This first casts the pointer to the arm64 context type and then copies out the register values.
The arm64 context structures:

struct arm64_mcontext_t {
  uint64_t fault_address;         // __u64
  uint64_t regs[ARM64_REG_LAST];  // __u64
  uint64_t pstate;                // __u64
  // Nothing else is used, so don't define it.
};
struct arm64_ucontext_t {
  uint64_t uc_flags;  // unsigned long
  uint64_t uc_link;   // struct ucontext*
  arm64_stack_t uc_stack;
  arm64_sigset_t uc_sigmask;
  // The kernel adds extra padding after uc_sigmask to match glibc sigset_t on ARM64.
  char __padding[128 - sizeof(arm64_sigset_t)];
  // The full structure requires 16 byte alignment, but our partial structure
  // doesn't, so force the alignment.
  arm64_mcontext_t uc_mcontext __attribute__((aligned(16)));
};
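
Why a single memcpy of the regs array is enough becomes clearer when the flat layout is compared with the field-by-field layout of the Linux arm64 signal context (31 general registers, then sp, then pc, then pstate). The sketch below checks that correspondence at compile time. KernelView, FlatView, Regs64, RegsFromContext, and the constant values are assumptions made for illustration; in particular, 33 is an assumed stand-in for ARM64_REG_LAST.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr int kGprCount = 31;  // x0..x30
constexpr int kSpIndex = 31;   // slot right after the general registers
constexpr int kPcIndex = 32;
constexpr int kRegCount = 33;  // assumed value standing in for ARM64_REG_LAST

// Field-by-field view, following the Linux arm64 sigcontext layout.
struct KernelView {
  std::uint64_t fault_address;
  std::uint64_t regs[kGprCount];
  std::uint64_t sp;
  std::uint64_t pc;
  std::uint64_t pstate;
};

// Flat view, shaped like arm64_mcontext_t above.
struct FlatView {
  std::uint64_t fault_address;
  std::uint64_t regs[kRegCount];
  std::uint64_t pstate;
};

static_assert(sizeof(KernelView) == sizeof(FlatView), "same size");
static_assert(offsetof(KernelView, sp) ==
                  offsetof(FlatView, regs) + kSpIndex * sizeof(std::uint64_t), "sp slot");
static_assert(offsetof(KernelView, pc) ==
                  offsetof(FlatView, regs) + kPcIndex * sizeof(std::uint64_t), "pc slot");

struct Regs64 {
  std::uint64_t raw[kRegCount];
  std::uint64_t sp() const { return raw[kSpIndex]; }
  std::uint64_t pc() const { return raw[kPcIndex]; }
};

// Copies x0..x30, sp, and pc out of the context in one block.
Regs64 RegsFromContext(const KernelView& context) {
  FlatView flat;
  std::memcpy(&flat, &context, sizeof(flat));
  Regs64 regs;
  std::memcpy(regs.raw, flat.regs, sizeof(regs.raw));
  return regs;
}
```

Since sp and pc sit immediately after x30 in the context, treating them as slots 31 and 32 of one array lets the unwinder address every register by index.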

Next, the registers are set on the unwinder:

  void SetRegs(Regs* regs) {
    regs_ = regs;
    arch_ = regs_ != nullptr ? regs->Arch() : ARCH_UNKNOWN;
  }

Nothing is interpreted here yet; we need to continue into Unwind:

void UnwinderFromPid::Unwind(const std::vector<std::string>* initial_map_names_to_skip,
                             const std::vector<std::string>* map_suffixes_to_ignore) {
  if (!Init()) {
    return;
  }
  Unwinder::Unwind(initial_map_names_to_skip, map_suffixes_to_ignore);
}

Init has already run, as we saw, so now we enter the core part:

void Unwinder::Unwind(const std::vector<std::string>* initial_map_names_to_skip,
                      const std::vector<std::string>* map_suffixes_to_ignore) {
  CHECK(arch_ != ARCH_UNKNOWN);
  ClearErrors();
  frames_.clear();
  // Clear any cached data from previous unwinds.
  process_memory_->Clear();
  if (maps_->Find(regs_->pc()) == nullptr) {
    regs_->fallback_pc();
  }
  bool return_address_attempt = false;
  bool adjust_pc = false;
  for (; frames_.size() < max_frames_;) {
    uint64_t cur_pc = regs_->pc();
    uint64_t cur_sp = regs_->sp();
    std::shared_ptr<MapInfo> map_info = maps_->Find(regs_->pc());
    uint64_t pc_adjustment = 0;
    uint64_t step_pc;
    uint64_t rel_pc;
    Elf* elf;
    bool ignore_frame = false;
    if (map_info == nullptr) {
      step_pc = regs_->pc();
      rel_pc = step_pc;
      // If we get invalid map via return_address_attempt, don't hide error for the previous frame.
      if (!return_address_attempt || last_error_.code == ERROR_NONE) {
        last_error_.code = ERROR_INVALID_MAP;
        last_error_.address = step_pc;
      }
      elf = nullptr;
    } else {
      ignore_frame =
          initial_map_names_to_skip != nullptr &&
          std::find(initial_map_names_to_skip->begin(), initial_map_names_to_skip->end(),
                    android::base::Basename(map_info->name())) != initial_map_names_to_skip->end();
      if (!ignore_frame && ShouldStop(map_suffixes_to_ignore, map_info->name())) {
        break;
      }
      elf = map_info->GetElf(process_memory_, arch_);
      step_pc = regs_->pc();
      rel_pc = elf->GetRelPc(step_pc, map_info.get());
      // Everyone except elf data in gdb jit debug maps uses the relative pc.
      if (!(map_info->flags() & MAPS_FLAGS_JIT_SYMFILE_MAP)) {
        step_pc = rel_pc;
      }
      if (adjust_pc) {
        pc_adjustment = GetPcAdjustment(rel_pc, elf, arch_);
      } else {
        pc_adjustment = 0;
      }
      step_pc -= pc_adjustment;
      // If the pc is in an invalid elf file, try and get an Elf object
      // using the jit debug information.
      if (!elf->valid() && jit_debug_ != nullptr && (map_info->flags() & PROT_EXEC)) {
        uint64_t adjusted_jit_pc = regs_->pc() - pc_adjustment;
        Elf* jit_elf = jit_debug_->Find(maps_, adjusted_jit_pc);
        if (jit_elf != nullptr) {
          // The jit debug information requires a non relative adjusted pc.
          step_pc = adjusted_jit_pc;
          elf = jit_elf;
        }
      }
    }
    FrameData* frame = nullptr;
    if (!ignore_frame) {
      if (regs_->dex_pc() != 0) {
        // Add a frame to represent the dex file.
        FillInDexFrame();
        // Clear the dex pc so that we don't repeat this frame later.
        regs_->set_dex_pc(0);
        // Make sure there is enough room for the real frame.
        if (frames_.size() == max_frames_) {
          last_error_.code = ERROR_MAX_FRAMES_EXCEEDED;
          break;
        }
      }
      frame = FillInFrame(map_info, elf, rel_pc, pc_adjustment);
      // Once a frame is added, stop skipping frames.
      initial_map_names_to_skip = nullptr;
    }
    adjust_pc = true;
    bool stepped = false;
    bool in_device_map = false;
    bool finished = false;
    if (map_info != nullptr) {
      if (map_info->flags() & MAPS_FLAGS_DEVICE_MAP) {
        // Do not stop here, fall through in case we are
        // in the speculative unwind path and need to remove
        // some of the speculative frames.
        in_device_map = true;
      } else {
        auto sp_info = maps_->Find(regs_->sp());
        if (sp_info != nullptr && sp_info->flags() & MAPS_FLAGS_DEVICE_MAP) {
          // Do not stop here, fall through in case we are
          // in the speculative unwind path and need to remove
          // some of the speculative frames.
          in_device_map = true;
        } else {
          bool is_signal_frame = false;
          if (elf->StepIfSignalHandler(rel_pc, regs_, process_memory_.get())) {
            stepped = true;
            is_signal_frame = true;
          } else if (elf->Step(step_pc, regs_, process_memory_.get(), &finished,
                               &is_signal_frame)) {
            stepped = true;
          }
          if (is_signal_frame && frame != nullptr) {
            // Need to adjust the relative pc because the signal handler
            // pc should not be adjusted.
            frame->rel_pc = rel_pc;
            frame->pc += pc_adjustment;
            step_pc = rel_pc;
          }
          elf->GetLastError(&last_error_);
        }
      }
    }
    if (frame != nullptr) {
      if (!resolve_names_ ||
          !elf->GetFunctionName(step_pc, &frame->function_name, &frame->function_offset)) {
        frame->function_name = "";
        frame->function_offset = 0;
      }
    }
    if (finished) {
      break;
    }
    if (!stepped) {
      if (return_address_attempt) {
        // Only remove the speculative frame if there are more than two frames
        // or the pc in the first frame is in a valid map.
        // This allows for a case where the code jumps into the middle of
        // nowhere, but there is no other unwind information after that.
        if (frames_.size() > 2 || (frames_.size() > 0 && maps_->Find(frames_[0].pc) != nullptr)) {
          // Remove the speculative frame.
          frames_.pop_back();
        }
        break;
      } else if (in_device_map) {
        // Do not attempt any other unwinding, pc or sp is in a device
        // map.
        break;
      } else {
        // Stepping didn't work, try this secondary method.
        if (!regs_->SetPcFromReturnAddress(process_memory_.get())) {
          break;
        }
        return_address_attempt = true;
      }
    } else {
      return_address_attempt = false;
      if (max_frames_ == frames_.size()) {
        last_error_.code = ERROR_MAX_FRAMES_EXCEEDED;
      }
    }
    // If the pc and sp didn't change, then consider everything stopped.
    if (cur_pc == regs_->pc() && cur_sp == regs_->sp()) {
      last_error_.code = ERROR_REPEATED_FRAME;
      break;
    }
  }
}

After all of that, we have the unwind result. Let's go through it piece by piece:

  if (maps_->Find(regs_->pc()) == nullptr) {
    regs_->fallback_pc();
  }

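First, look up the memory segment that contains the current pc. As you might guess, maps holds the memory segments of the process (mostly the mapped shared objects), and Find does a binary search to see which segment the pc falls into: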

std::shared_ptr<MapInfo> Maps::Find(uint64_t pc) {
  if (maps_.empty()) {
    return nullptr;
  }
  size_t first = 0;
  size_t last = maps_.size();
  while (first < last) {
    size_t index = (first + last) / 2;
    const auto& cur = maps_[index];
    if (pc >= cur->start() && pc < cur->end()) {
      return cur;
    } else if (pc < cur->start()) {
      last = index;
    } else {
      first = index + 1;
    }
  }
  return nullptr;
}
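
The same lookup can be written with the standard library. The following is a minimal sketch, not the libunwindstack code: Range and FindRange are hypothetical names, and the ranges are assumed to be sorted by start address and non-overlapping, as the entries of /proc/<pid>/maps are.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for MapInfo: one half-open range [start, end).
struct Range {
  uint64_t start;
  uint64_t end;
  std::string name;
};

// Returns the range containing pc, or nullptr if there is none.
// Precondition (assumed): ranges are sorted by start and do not overlap.
const Range* FindRange(const std::vector<Range>& ranges, uint64_t pc) {
  assert(std::is_sorted(ranges.begin(), ranges.end(),
                        [](const Range& a, const Range& b) { return a.start < b.start; }));
  // First range whose start is strictly greater than pc.
  auto it = std::upper_bound(
      ranges.begin(), ranges.end(), pc,
      [](uint64_t value, const Range& r) { return value < r.start; });
  if (it == ranges.begin()) {
    return nullptr;  // pc is below the lowest mapping (or the list is empty).
  }
  --it;  // Last range with start <= pc.
  return pc < it->end ? &*it : nullptr;
}
```

Note that the end address is exclusive, just like in Maps::Find: a pc equal to end() belongs to the next mapping, if any.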

Just as we guessed, it is a binary search. regs_->pc() simply reads the pc register:

uint64_t RegsArm64::pc() {
  return regs_[ARM64_REG_PC];
}

What if no map is found?

void RegsArm64::fallback_pc() {
  // As a last resort, try stripping the PC of the pointer
  // authentication code.
  regs_[ARM64_REG_PC] = strip_pac(regs_[ARM64_REG_PC], pac_mask_);
}

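On arm64, some features store extra information in the otherwise unused high bits of an address. HWASan, for example, keeps a tag in the top byte of a pointer, and pointer authentication (PAC, Armv8.3-A) puts a signature into the high bits of a return address in a similar way. The fallback therefore strips the authentication code to recover the plain pc address: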

static uint64_t strip_pac(uint64_t pc, uint64_t mask) {
  // If the target is aarch64 then the return address may have been
  // signed using the Armv8.3-A Pointer Authentication extension. The
  // original return address can be restored by stripping out the
  // authentication code using a mask or xpaclri. xpaclri is a NOP on
  // pre-Armv8.3-A architectures.
  if (mask) {
    pc &= ~mask;
  } else {
#if defined(__BIONIC__)
    pc = __bionic_clear_pac_bits(pc);
#endif
  }
  return pc;
}
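
To make the masking branch concrete, here is a small sketch of what `pc &= ~mask` does to a signed address. The mask value is an assumption chosen for illustration (a user-space pointer in a 39-bit address space, so every bit from 39 upward carries no address information); a real unwinder takes the mask from the target process rather than hard-coding it. StripWithMask and kExampleMask are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Clears every bit selected by pac_mask. A mask of 0 leaves pc untouched,
// which is why strip_pac above falls back to another mechanism in that case.
inline uint64_t StripWithMask(uint64_t pc, uint64_t pac_mask) {
  return pc & ~pac_mask;
}

// Hypothetical mask: bits 39..63 set, bits 0..38 clear.
const uint64_t kExampleMask = ~((1ULL << 39) - 1);

// Example: a return address whose upper bits hold a made-up signature.
//   signed   = 0x00ab007f12345678
//   stripped = 0x0000007f12345678
inline uint64_t ExampleStripped() {
  return StripWithMask(0x00ab007f12345678ULL, kExampleMask);
}
```

An address that was never signed passes through unchanged, so applying the strip as a last resort is harmless for ordinary pointers.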

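Next, the map info is used to unwind frame by frame until the stack depth reaches the limit. The default maximum depth on Android is 512 frames, which is more than enough in practice.
Now think about how we would unwind ourselves: first get the pc and sp, then find the memory segment the pc belongs to. That is exactly what the code does: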

    uint64_t cur_pc = regs_->pc();
    uint64_t cur_sp = regs_->sp();
    std::shared_ptr<MapInfo> map_info = maps_->Find(regs_->pc());

If map_info is null, the pc cannot be resolved. Here is how that case is handled:

    if (map_info == nullptr) {
      step_pc = regs_->pc();
      rel_pc = step_pc;
      // If we get invalid map via return_address_attempt, don't hide error for the previous frame.
      if (!return_address_attempt || last_error_.code == ERROR_NONE) {
        last_error_.code = ERROR_INVALID_MAP;
        last_error_.address = step_pc;
      }
      elf = nullptr;
    } 

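Normally maps always contains the pc: the pc must lie inside the address range the process can access, and every mapping of the process is recorded in maps. Can the lookup still fail? Sure, for example when the code jumps to a wild address.
Next, the case where the matching map info is found: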

      ignore_frame =
          initial_map_names_to_skip != nullptr &&
          std::find(initial_map_names_to_skip->begin(), initial_map_names_to_skip->end(),
                    android::base::Basename(map_info->name())) != initial_map_names_to_skip->end();
      if (!ignore_frame && ShouldStop(map_suffixes_to_ignore, map_info->name())) {
        break;
      }

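Some shared objects should not show up at the top of the stack, and they can be listed in initial_map_names_to_skip. Typical candidates are private libraries or libunwindstack.so itself (AndroidLocalUnwinder adds it in its constructor), since an unwinder walking its own frames is error-prone. The list only applies to the leading frames: once a frame has been added, initial_map_names_to_skip is reset to nullptr. map_suffixes_to_ignore behaves differently: reaching a map with a matching suffix stops the unwind altogether. For any other map, parsing continues: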

      elf = map_info->GetElf(process_memory_, arch_);
      step_pc = regs_->pc();
      rel_pc = elf->GetRelPc(step_pc, map_info.get());
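
The last line of this snippet turns the absolute pc into an ELF-relative one. A minimal sketch of that arithmetic follows. It reflects my understanding of what GetRelPc computes and should be read as an assumption, not as the library's code; MappedElf and ToRelPc are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one mapping that is backed by an ELF.
struct MappedElf {
  uint64_t map_start;   // Start address of the mapping in the process.
  uint64_t elf_offset;  // Offset of this mapping from the start of the ELF image.
  uint64_t load_bias;   // Assumed: virtual address minus file offset of the
                        // executable PT_LOAD segment (0 for most libraries).
};

// Absolute pc -> address as seen inside the ELF file.
inline uint64_t ToRelPc(uint64_t pc, const MappedElf& m) {
  return pc - m.map_start + m.elf_offset + m.load_bias;
}
```

For example, with a mapping that starts at 0x7000000000 and sits 0x1000 bytes into the ELF, a pc of 0x7000000234 becomes 0x1234, which is the kind of value printed in the "pc" column of a tombstone.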

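First get the Elf, then compute the relative pc. The address in pc is an absolute address in the virtual address space of the process, while everything inside a shared object is described relative to the object itself, so a conversion is needed. Let's look at how the Elf is obtained first: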

Elf* MapInfo::GetElf(const std::shared_ptr<Memory>& process_memory, ArchEnum expected_arch) {
  // Make sure no other thread is trying to add the elf to this map.
  std::lock_guard<std::mutex> guard(elf_mutex());
  if (elf().get() != nullptr) {
    return elf().get();
  }
  ScopedElfCacheLock elf_cache_lock;
  if (Elf::CachingEnabled() && !name().empty()) {
    if (Elf::CacheGet(this)) {
      return elf().get();
    }
  }
  elf().reset(new Elf(CreateMemory(process_memory)));
  // If the init fails, keep the elf around as an invalid object so we
  // don't try to reinit the object.
  elf()->Init();
  if (elf()->valid() && expected_arch != elf()->arch()) {
    // Make the elf invalid, mismatch between arch and expected arch.
    elf()->Invalidate();
  }
  if (!elf()->valid()) {
    set_elf_start_offset(offset());
  } else if (auto prev_real_map = GetPrevRealMap(); prev_real_map != nullptr &&
                                                    prev_real_map->flags() == PROT_READ &&
                                                    prev_real_map->offset() < offset()) {
    // If there is a read-only map then a read-execute map that represents the
    // same elf object, make sure the previous map is using the same elf
    // object if it hasn't already been set. Locking this should not result
    // in a deadlock as long as the invariant that the code only ever tries
    // to lock the previous real map holds true.
    std::lock_guard<std::mutex> guard(prev_real_map->elf_mutex());
    if (prev_real_map->elf() == nullptr) {
      // Need to verify if the map is the previous read-only map.
      prev_real_map->set_elf(elf());
      prev_real_map->set_memory_backed_elf(memory_backed_elf());
      prev_real_map->set_elf_start_offset(elf_start_offset());
      prev_real_map->set_elf_offset(prev_real_map->offset() - elf_start_offset());
    } else if (prev_real_map->elf_start_offset() == elf_start_offset()) {
      // Discard this elf, and use the elf from the previous map instead.
      set_elf(prev_real_map->elf());
    }
  }
  // Cache the elf only after all of the above checks since we might
  // discard the original elf we created.
  if (Elf::CachingEnabled()) {
    Elf::CacheAdd(this);
  }
  return elf().get();
}

It starts with elf(): if an Elf object is already attached to the map, it is returned right away. Here is the implementation:

inline std::shared_ptr<Elf>& elf() { return GetElfFields().elf_; }
MapInfo::ElfFields& MapInfo::GetElfFields() {
  ElfFields* elf_fields = elf_fields_.load(std::memory_order_acquire);
  if (elf_fields != nullptr) {
    return *elf_fields;
  }
  // Allocate and initialize the field in thread-safe way.
  std::unique_ptr<ElfFields> desired(new ElfFields());
  ElfFields* expected = nullptr;
  // Strong version is reliable. Weak version might randomly return false.
  if (elf_fields_.compare_exchange_strong(expected, desired.get())) {
    return *desired.release();  // Success: we transferred the pointer ownership to the field.
  } else {
    return *expected;  // Failure: 'expected' is updated to the value set by the other thread.
  }
}

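This allocates the ElfFields on first use, which then serves as the per-map cache of ELF information. A quick aside on the difference between the strong and weak compare-exchange. On Arm, atomic operations have traditionally been implemented with LL/SC (load-linked/store-conditional) instructions: after the load, the later store only takes effect if nothing has written to that address in the meantime. The condition is looser than a strict CAS and usually performs better, but the store-conditional can occasionally fail even though the value still equals the expected one. compare_exchange_weak exposes this spurious failure to the caller (it returns false although the values are equal), while compare_exchange_strong adds one more layer of protection on top of it and only fails when the values really differ. It is a neat detail.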

If that is hard to keep in mind, a simple rule works: use strong whenever the call is not inside a retry loop, and consider weak when it is.
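
Both sides of that rule can be sketched as follows. This is an illustration only; Fields, GetOrCreate and AddClamped are hypothetical names, and the first function merely mirrors the shape of GetElfFields above.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <memory>

struct Fields {  // Hypothetical payload.
  int value = 0;
};

// One-shot lazy initialization: there is no loop to absorb a spurious
// failure, so compare_exchange_strong is the right choice.
Fields& GetOrCreate(std::atomic<Fields*>& slot) {
  Fields* current = slot.load(std::memory_order_acquire);
  if (current != nullptr) {
    return *current;
  }
  std::unique_ptr<Fields> desired(new Fields());
  Fields* expected = nullptr;
  if (slot.compare_exchange_strong(expected, desired.get())) {
    return *desired.release();  // The slot now owns the object.
  }
  return *expected;  // Another thread won; our object is freed by unique_ptr.
}

// Retry loop: a spurious failure only costs one more iteration, so
// compare_exchange_weak is fine here.
int AddClamped(std::atomic<int>& counter, int delta, int max) {
  int old_value = counter.load(std::memory_order_relaxed);
  int new_value;
  do {
    new_value = std::min(old_value + delta, max);
    // On failure old_value is refreshed with the current counter value.
  } while (!counter.compare_exchange_weak(old_value, new_value));
  return new_value;
}
```

With weak in GetOrCreate, a spurious failure would leave expected as nullptr and the function would dereference a null pointer, which is exactly the situation the comment in GetElfFields warns about.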

Back to the logic above: the first call to elf() returns null, so the next step is to see how the Elf is built.

  if (Elf::CachingEnabled() && !name().empty()) {
    if (Elf::CacheGet(this)) {
      return elf().get();
    }
  }

The Elf cache is disabled by default, so nothing comes from the cache and the object has to be created explicitly:

  elf().reset(new Elf(CreateMemory(process_memory)));
  // If the init fails, keep the elf around as an invalid object so we
  // don't try to reinit the object.
  elf()->Init();
  if (elf()->valid() && expected_arch != elf()->arch()) {
    // Make the elf invalid, mismatch between arch and expected arch.
    elf()->Invalidate();
  }

Here is CreateMemory:

Memory* MapInfo::CreateMemory(const std::shared_ptr<Memory>& process_memory) {
  if (end() <= start()) {
    return nullptr;
  }
  set_elf_offset(0);
  // Fail on device maps.
  if (flags() & MAPS_FLAGS_DEVICE_MAP) {
    return nullptr;
  }
  // First try and use the file associated with the info.
  if (!name().empty()) {
    Memory* memory = GetFileMemory();
    if (memory != nullptr) {
      return memory;
    }
  }
  if (process_memory == nullptr) {
    return nullptr;
  }
  set_memory_backed_elf(true);
  // Need to verify that this elf is valid. It's possible that
  // only part of the elf file to be mapped into memory is in the executable
  // map. In this case, there will be another read-only map that includes the
  // first part of the elf file. This is done if the linker rosegment
  // option is used.
  std::unique_ptr<MemoryRange> memory(new MemoryRange(process_memory, start(), end() - start(), 0));
  if (Elf::IsValidElf(memory.get())) {
    set_elf_start_offset(offset());
    auto next_real_map = GetNextRealMap();
    // Might need to peek at the next map to create a memory object that
    // includes that map too.
    if (offset() != 0 || next_real_map == nullptr || offset() >= next_real_map->offset()) {
      return memory.release();
    }
    // There is a possibility that the elf object has already been created
    // in the next map. Since this should be a very uncommon path, just
    // redo the work. If this happens, the elf for this map will eventually
    // be discarded.
    MemoryRanges* ranges = new MemoryRanges;
    ranges->Insert(new MemoryRange(process_memory, start(), end() - start(), 0));
    ranges->Insert(new MemoryRange(process_memory, next_real_map->start(),
                                   next_real_map->end() - next_real_map->start(),
                                   next_real_map->offset() - offset()));
    return ranges;
  }
  auto prev_real_map = GetPrevRealMap();
  // Find the read-only map by looking at the previous map. The linker
  // doesn't guarantee that this invariant will always be true. However,
  // if that changes, there is likely something else that will change and
  // break something.
  if (offset() == 0 || prev_real_map == nullptr || prev_real_map->offset() >= offset()) {
    set_memory_backed_elf(false);
    return nullptr;
  }
  // Make sure that relative pc values are corrected properly.
  set_elf_offset(offset() - prev_real_map->offset());
  // Use this as the elf start offset, otherwise, you always get offsets into
  // the r-x section, which is not quite the right information.
  set_elf_start_offset(prev_real_map->offset());
  std::unique_ptr<MemoryRanges> ranges(new MemoryRanges);
  if (!ranges->Insert(new MemoryRange(process_memory, prev_real_map->start(),
                                      prev_real_map->end() - prev_real_map->start(), 0))) {
    return nullptr;
  }
  if (!ranges->Insert(new MemoryRange(process_memory, start(), end() - start(), elf_offset()))) {
    return nullptr;
  }
  return ranges.release();
}

The first thing it tries is to load the content of the backing so file, which is done in GetFileMemory:

Memory* MapInfo::GetFileMemory() {
  // Fail on device maps.
  if (flags() & MAPS_FLAGS_DEVICE_MAP) {
    return nullptr;
  }
  std::unique_ptr<MemoryFileAtOffset> memory(new MemoryFileAtOffset);
  if (offset() == 0) {
    if (memory->Init(name(), 0)) {
      return memory.release();
    }
    return nullptr;
  }
  // These are the possibilities when the offset is non-zero.
  // - There is an elf file embedded in a file, and the offset is
  //   the start of the elf in the file.
  // - There is an elf file embedded in a file, and the offset is
  //   the start of the executable part of the file. The actual start
  //   of the elf is in the read-only segment preceding this map.
  // - The whole file is an elf file, and the offset needs to be saved.
  //
  // Map in just the part of the file for the map. If this is not
  // a valid elf, then reinit as if the whole file is an elf file.
  // If the offset is a valid elf, then determine the size of the map
  // and reinit to that size. This is needed because the dynamic linker
  // only maps in a portion of the original elf, and never the symbol
  // file data.
  //
  // For maps with MAPS_FLAGS_JIT_SYMFILE_MAP, the map range is for a JIT function,
  // which can be smaller than elf header size. So make sure map_size is large enough
  // to read elf header.
  uint64_t map_size = std::max<uint64_t>(end() - start(), sizeof(ElfTypes64::Ehdr));
  if (!memory->Init(name(), offset(), map_size)) {
    return nullptr;
  }
  // Check if the start of this map is an embedded elf.
  uint64_t max_size = 0;
  if (Elf::GetInfo(memory.get(), &max_size)) {
    set_elf_start_offset(offset());
    if (max_size > map_size) {
      if (memory->Init(name(), offset(), max_size)) {
        return memory.release();
      }
      // Try to reinit using the default map_size.
      if (memory->Init(name(), offset(), map_size)) {
        return memory.release();
      }
      set_elf_start_offset(0);
      return nullptr;
    }
    return memory.release();
  }
  // No elf at offset, try to init as if the whole file is an elf.
  if (memory->Init(name(), 0) && Elf::IsValidElf(memory.get())) {
    set_elf_offset(offset());
    return memory.release();
  }
  // See if the map previous to this one contains a read-only map
  // that represents the real start of the elf data.
  if (InitFileMemoryFromPreviousReadOnlyMap(memory.get())) {
    return memory.release();
  }
  // Failed to find elf at start of file or at read-only map, return
  // file object from the current map.
  if (memory->Init(name(), offset(), map_size)) {
    return memory.release();
  }
  return nullptr;
}

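This maps the so file and has to handle several cases: the whole file is an ELF, only part of the file is an ELF, or the start of the ELF lives in the preceding map info. For example, if the whole file is an ELF and the map starts at file offset 0, the code is: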

  if (offset() == 0) {
    if (memory->Init(name(), 0)) {
      return memory.release();
    }
    return nullptr;
  }

If only part of the file is an ELF:

  uint64_t map_size = std::max<uint64_t>(end() - start(), sizeof(ElfTypes64::Ehdr));
  if (!memory->Init(name(), offset(), map_size)) {
    return nullptr;
  }

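Here offset() is assumed to be where the ELF starts, and at least sizeof(Ehdr) bytes are mapped so that the ELF file header can be read. The header contains the following: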

typedef struct
{
  unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
  Elf64_Half    e_type;             /* Object file type */
  Elf64_Half    e_machine;          /* Architecture */
  Elf64_Word    e_version;          /* Object file version */
  Elf64_Addr    e_entry;            /* Entry point virtual address */
  Elf64_Off     e_phoff;            /* Program header table file offset */
  Elf64_Off     e_shoff;            /* Section header table file offset */
  Elf64_Word    e_flags;            /* Processor-specific flags */
  Elf64_Half    e_ehsize;           /* ELF header size in bytes */
  Elf64_Half    e_phentsize;        /* Program header table entry size */
  Elf64_Half    e_phnum;            /* Program header table entry count */
  Elf64_Half    e_shentsize;        /* Section header table entry size */
  Elf64_Half    e_shnum;            /* Section header table entry count */
  Elf64_Half    e_shstrndx;         /* Section header string table index */
} Elf64_Ehdr;

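Once the header has been read from the ELF file, the so can be parsed. In other words, the goal at this point is to locate the ELF header.
Next, check whether the mapped part really is an ELF: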

  // Check if the start of this map is an embedded elf.
  uint64_t max_size = 0;
  if (Elf::GetInfo(memory.get(), &max_size)) {
    set_elf_start_offset(offset());
    if (max_size > map_size) {
      if (memory->Init(name(), offset(), max_size)) {
        return memory.release();
      }
      // Try to reinit using the default map_size.
      if (memory->Init(name(), offset(), map_size)) {
        return memory.release();
      }
      set_elf_start_offset(0);
      return nullptr;
    }
    return memory.release();
  }

GetInfo inspects the file: it checks whether it is an ELF and, if so, computes the size of the ELF data:

bool Elf::GetInfo(Memory* memory, uint64_t* size) {
  if (!IsValidElf(memory)) {
    return false;
  }
  *size = 0;
  uint8_t class_type;
  if (!memory->ReadFully(EI_CLASS, &class_type, 1)) {
    return false;
  }
  // Get the maximum size of the elf data from the header.
  if (class_type == ELFCLASS32) {
    ElfInterface32::GetMaxSize(memory, size);
  } else if (class_type == ELFCLASS64) {
    ElfInterface64::GetMaxSize(memory, size);
  } else {
    return false;
  }
  return true;
}

As a reminder, the way to tell whether a file is an ELF is to check the magic number in the file header:

bool Elf::IsValidElf(Memory* memory) {
  if (memory == nullptr) {
    return false;
  }
  // Verify that this is a valid elf file.
  uint8_t e_ident[SELFMAG + 1];
  if (!memory->ReadFully(0, e_ident, SELFMAG)) {
    return false;
  }
  if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
    return false;
  }
  return true;
}
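
Taken together, IsValidElf and the EI_CLASS read in GetInfo amount to looking at the first five bytes of the file. A self-contained sketch of that check is shown here. The constants are the values defined by the ELF format, redefined locally so the example does not depend on <elf.h>; ElfClass and ClassifyElf are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Values from the ELF format, redefined locally for this sketch.
const unsigned char kElfMagic[4] = {0x7f, 'E', 'L', 'F'};  // ELFMAG
const size_t kEiClass = 4;                                  // EI_CLASS
const uint8_t kElfClass32 = 1;                              // ELFCLASS32
const uint8_t kElfClass64 = 2;                              // ELFCLASS64

enum class ElfClass { kNotElf, kElf32, kElf64, kUnknownClass };

// Classifies a buffer that holds the beginning of a file.
ElfClass ClassifyElf(const uint8_t* data, size_t size) {
  if (data == nullptr || size < kEiClass + 1) {
    return ElfClass::kNotElf;  // Too short to hold magic plus class byte.
  }
  if (std::memcmp(data, kElfMagic, sizeof(kElfMagic)) != 0) {
    return ElfClass::kNotElf;  // Wrong magic: not an ELF, stop here.
  }
  if (data[kEiClass] == kElfClass32) {
    return ElfClass::kElf32;
  }
  if (data[kEiClass] == kElfClass64) {
    return ElfClass::kElf64;
  }
  return ElfClass::kUnknownClass;  // Magic matches, class byte is invalid.
}
```

The class byte is only consulted after the magic number has matched; it selects between the 32-bit and 64-bit header layouts and is not an alternative way of recognizing an ELF.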

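So the check is exactly the magic-number comparison we expected.
If the magic number does not match, GetInfo returns false right away and the file is not treated as an ELF. If it matches, GetInfo goes on to read the EI_CLASS byte of e_ident, which tells a 32-bit ELF (ELFCLASS32) from a 64-bit one (ELFCLASS64); any other value is rejected. After that it computes the size. How do you get the size of an ELF? It cannot simply be the file size, because an ELF embedded in another file is smaller than that file. Instead, the size can be worked out from the ELF layout: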

// This is an estimation of the size of the elf file using the location
// of the section headers and size. This assumes that the section headers
// are at the end of the elf file. If the elf has a load bias, the size
// will be too large, but this is acceptable.
template <typename ElfTypes>
void ElfInterfaceImpl<ElfTypes>::GetMaxSize(Memory* memory, uint64_t* size) {
  EhdrType ehdr;
  if (!memory->ReadFully(0, &ehdr, sizeof(ehdr))) {
    *size = 0;
    return;
  }
  // If this winds up as zero, the PT_LOAD reading will get a better value.
  uint64_t elf_size = ehdr.e_shoff + ehdr.e_shentsize * ehdr.e_shnum;
  // Search through the PT_LOAD values and if any result in a larger elf
  // size, use that.
  uint64_t offset = ehdr.e_phoff;
  for (size_t i = 0; i < ehdr.e_phnum; i++, offset += ehdr.e_phentsize) {
    PhdrType phdr;
    if (!memory->ReadFully(offset, &phdr, sizeof(phdr))) {
      break;
    }
    if (phdr.p_type == PT_LOAD) {
      uint64_t end_offset;
      if (__builtin_add_overflow(phdr.p_offset, phdr.p_memsz, &end_offset)) {
        continue;
      }
      if (end_offset > elf_size) {
        elf_size = end_offset;
      }
    }
  }
  *size = elf_size;
}

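See? That is the trick. The section header table usually sits at the end of the ELF file, so its offset plus its total size gives a first estimate. However, an ELF has two views, a linking view and an execution view. Section headers belong to the linking view and need not be present at run time, while the program headers used at run time are always there. So the code makes a second estimate from the PT_LOAD program headers and takes the larger of the two, which is safe.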
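
The arithmetic is easy to check by hand. Below is a small sketch of the same estimate on plain numbers instead of a Memory object; HeaderInfo, LoadSegment and EstimateElfSize are hypothetical names, and the overflow check is written out manually in place of __builtin_add_overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, trimmed-down view of the ELF header fields that matter here.
struct HeaderInfo {
  uint64_t shoff;      // e_shoff
  uint16_t shentsize;  // e_shentsize
  uint16_t shnum;      // e_shnum
};

// Hypothetical view of one PT_LOAD program header.
struct LoadSegment {
  uint64_t offset;  // p_offset
  uint64_t memsz;   // p_memsz
};

uint64_t EstimateElfSize(const HeaderInfo& h, const std::vector<LoadSegment>& loads) {
  // Estimate 1: the end of the section header table.
  uint64_t size = h.shoff + static_cast<uint64_t>(h.shentsize) * h.shnum;
  // Estimate 2: the furthest end of any PT_LOAD segment.
  for (const LoadSegment& seg : loads) {
    if (seg.memsz > UINT64_MAX - seg.offset) {
      continue;  // offset + memsz would overflow, ignore this entry.
    }
    uint64_t end = seg.offset + seg.memsz;
    if (end > size) {
      size = end;
    }
  }
  return size;
}
```

With e_shoff = 0x5000, e_shentsize = 64 and e_shnum = 10, the first estimate is 0x5000 + 640 = 0x5280. If the section header fields are all zero, only the PT_LOAD entries contribute, which is the case the comment "the PT_LOAD reading will get a better value" refers to.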
Here is the layout of a program header table entry:

typedef struct
{
  Elf64_Word    p_type;         /* Segment type */
  Elf64_Word    p_flags;        /* Segment flags */
  Elf64_Off     p_offset;       /* Segment file offset */
  Elf64_Addr    p_vaddr;        /* Segment virtual address */
  Elf64_Addr    p_paddr;        /* Segment physical address */
  Elf64_Xword   p_filesz;       /* Segment size in file */
  Elf64_Xword   p_memsz;        /* Segment size in memory */
  Elf64_Xword   p_align;        /* Segment alignment */
} Elf64_Phdr;

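With the ELF size in hand, the code tries to map the so again using that size. If there turns out to be no ELF at offset() at all, the remaining cases are tried, starting with the possibility that the offset is simply not the start of an ELF and the whole file is one: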

  // No elf at offset, try to init as if the whole file is an elf.
  if (memory->Init(name(), 0) && Elf::IsValidElf(memory.get())) {
    set_elf_offset(offset());
    return memory.release();
  }

Or maybe the preceding read-only map info with the same name holds the ELF header?

  // See if the map previous to this one contains a read-only map
  // that represents the real start of the elf data.
  if (InitFileMemoryFromPreviousReadOnlyMap(memory.get())) {
    return memory.release();
  }

Here is InitFileMemoryFromPreviousReadOnlyMap:

bool MapInfo::InitFileMemoryFromPreviousReadOnlyMap(MemoryFileAtOffset* memory) {
  // One last attempt, see if the previous map is read-only with the
  // same name and stretches across this map.
  auto prev_real_map = GetPrevRealMap();
  if (prev_real_map == nullptr || prev_real_map->flags() != PROT_READ ||
      prev_real_map->offset() >= offset()) {
    return false;
  }
  uint64_t map_size = end() - prev_real_map->end();
  if (!memory->Init(name(), prev_real_map->offset(), map_size)) {
    return false;
  }
  uint64_t max_size;
  if (!Elf::GetInfo(memory, &max_size) || max_size < map_size) {
    return false;
  }
  if (!memory->Init(name(), prev_real_map->offset(), max_size)) {
    return false;
  }
  set_elf_offset(offset() - prev_real_map->offset());
  set_elf_start_offset(prev_real_map->offset());
  return true;
}
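
The two offsets set at the end of this function are what later make relative pc values line up with the ELF file. The sketch below reproduces only that bookkeeping, without any file access, under the same preconditions as the function above (the previous map is read-only and starts at a smaller file offset). Mapping, ElfPlacement and PlaceFromPreviousMap are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified map entry.
struct Mapping {
  uint64_t start;      // Start address in the process.
  uint64_t end;        // End address (exclusive).
  uint64_t offset;     // File offset of the mapping.
  bool read_only;      // True if the flags are exactly PROT_READ.
};

struct ElfPlacement {
  uint64_t elf_start_offset;  // Where the ELF begins in the file.
  uint64_t elf_offset;        // Where the current map begins inside the ELF.
};

// Returns false when the previous map cannot be the start of the ELF.
bool PlaceFromPreviousMap(const Mapping& prev, const Mapping& cur, ElfPlacement* out) {
  assert(out != nullptr);
  if (!prev.read_only || prev.offset >= cur.offset) {
    return false;
  }
  out->elf_start_offset = prev.offset;
  out->elf_offset = cur.offset - prev.offset;
  return true;
}
```

For a library mapped as a read-only segment at file offset 0 followed by an executable segment at file offset 0x1000, this yields elf_start_offset = 0 and elf_offset = 0x1000. If the same pair sits inside a larger file at offsets 0x20000 and 0x21000, elf_start_offset becomes 0x20000 while elf_offset stays 0x1000.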
