ChatGPT is wildly popular, but its strength lies in NLU (natural language understanding), DM (dialogue management), and NLG (natural language generation); speech recognition and TTS playback are the two missing pieces. If your app has integrated ChatGPT but needs to speak the responses aloud, the TextToSpeech mechanism comes in handy.

1 Introduction

For voice interaction, the Android SDK provides the VoiceInteraction mechanism for voice interaction flows, the Recognition interfaces for speech recognition, and the TTS interfaces for speech playback.

The first has been covered before. This article focuses on the third part, TTS; a later article will analyze the second part, Android's standard Recognition mechanism.

Through the TextToSpeech mechanism, any app can conveniently use the system's built-in or a third-party TTS engine to play earcons and speech prompts. The engine can be the default provider chosen by the system, or a specific preferred engine designated by the app.

The default TTS engine can be found, and changed manually by the user, in device settings: Settings -> Accessibility -> Text-to-speech output -> Preferred engine
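The same information can also be queried programmatically, as the minimal sketch below shows; the file and function names are ours, not from the SDK. getEngines() lists the installed engines and getDefaultEngine() returns the user's preferred one.

// EngineQuery.kt — illustrative helper, not part of the original article
import android.content.Context
import android.speech.tts.TextToSpeech
import android.util.Log

fun logInstalledEngines(context: Context) {
  val tts = TextToSpeech(context) { /* init result not needed for this query */ }
  // Every installed engine that implements INTENT_ACTION_TTS_SERVICE shows up here
  tts.engines.forEach { engine ->
    Log.d("EngineQuery", "engine=${engine.name} label=${engine.label}")
  }
  // The engine currently preferred in Settings
  Log.d("EngineQuery", "default=${tts.defaultEngine}")
  tts.shutdown()
}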

[Figure: Android TextToSpeech mechanism]

The TextToSpeech mechanism has several advantages:

  • For a requesting app that needs TTS: no need to care about the concrete TTS implementation; the TextToSpeech API is ready to use out of the box
  • For an engine that provides TTS capability: no need to maintain complex TTS sequencing and logic; it only has to implement against the TextToSpeechService framework, without caring how the system wires the implementation to requests

This article walks through three parts of the TextToSpeech mechanism: how it is called, how an engine is implemented, and how the system schedules the work in between, so that the whole flow becomes clear.

2 Calling TextToSpeech

[Figure: Android TextToSpeech mechanism]

The TextToSpeech API exists for TTS callers and is fairly simple overall.

The most important pieces are the TextToSpeech() constructor that initializes the TTS interface and the OnInitListener callback invoked after initialization (a small sketch of acting on the init result follows the snippet below), plus speak() for speaking text and playEarcon() for playing earcons.

It is also important to handle the four callback results of a playback request: recording state when TTS playback starts, deciding the next action after playback finishes, or managing audio focus when playback errors out, and so on.

The old OnUtteranceCompletedListener was deprecated in API level 18; the more fine-grained UtteranceProgressListener should be used instead.

// TTSTest.kt
import android.content.Context
import android.os.Bundle
import android.speech.tts.TextToSpeech
import android.speech.tts.UtteranceProgressListener

class TTSTest(context: Context) {
  private val tts: TextToSpeech = TextToSpeech(context) { initResult -> ... }
​
  init {
    tts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
      override fun onStart(utteranceId: String?) { ... }
​
      override fun onDone(utteranceId: String?) { ... }
​
      override fun onStop(utteranceId: String?, interrupted: Boolean)  { ... }
      
      override fun onError(utteranceId: String?)  { ... }
     })
   }
  
  fun testTextToSpeech(context: Context) {
    tts.speak(
      "你好,轿车",
      TextToSpeech.QUEUE_ADD,
      Bundle(),
      "xxdtgfsf"
     )
​
    tts.playEarcon(
      EARCON_DONE,
      TextToSpeech.QUEUE_ADD,
      Bundle(),
      "yydtgfsf"
     )
   }
​
  companion object {
    const val EARCON_DONE = "earCon_done"
   }
}
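As referenced above, here is a minimal sketch of acting on the OnInitListener result before speaking. The class name, ttsReady flag, and locale choice are illustrative assumptions, not from the original snippet.

// TtsInitExample.kt — illustrative only
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

class TtsInitExample(context: Context) {
  // Set once the engine connection is established and dispatchOnInit(SUCCESS) arrives
  @Volatile private var ttsReady = false

  private val tts: TextToSpeech = TextToSpeech(context) { initResult ->
    ttsReady = initResult == TextToSpeech.SUCCESS
  }

  fun speakIfReady(text: CharSequence) {
    if (!ttsReady) return
    // Configure the language before speaking; missing voice data makes speak() pointless
    val langResult = tts.setLanguage(Locale.CHINESE)
    if (langResult == TextToSpeech.LANG_MISSING_DATA ||
      langResult == TextToSpeech.LANG_NOT_SUPPORTED
    ) return
    tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "init-example-utterance")
  }
}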

3 TextToSpeech system scheduling

3.1 init binding

[Figure: Android TextToSpeech mechanism]

Let's start from the implementation of TextToSpeech() to see what preparation the system and the TTS engine do before any TTS playback happens.

  1. The initTts() it triggers looks up which engine to connect to in the following order (see the constructor sketch at the end of this subsection for how an app names a preferred engine):

    • If a target engine package was specified when the TextToSpeech instance was constructed, that engine is preferred for the connection
    • Otherwise, the device's default engine is fetched and connected to; the setting comes from TtsEngines reading TTS_DEFAULT_SYNTH out of the system settings data (SettingsProvider)
    • If the default does not exist or is not installed, the highest ranked system engine is obtained from TtsEngines and connected to; "highest ranked" means the first engine in the list of all installed TTS Service implementations that belongs to the system image
  2. Either way the connection goes through connectToEngine(), which chooses a different internal Connection implementation for connect() depending on where the call comes from

    • If the call does not come from the system, DirectConnection is used

      • Its connect() is fairly simple: wrap an Intent with action INTENT_ACTION_TTS_SERVICE and call bindService(); AMS then performs the binding to the engine, which is not expanded on here
    • Otherwise SystemConnection is used, because system TTS requests can be numerous; the system cannot always create a new connection like a regular app and instead needs to cache and reuse it

      • Concretely, it grabs the interface proxy of TextToSpeechManagerService, the system service named texttospeech that manages TTS services, and directly calls its createSession() to create a session, caching the ITextToSpeechSession proxy it points to.

        The session is still an AIDL mechanism underneath; inside the TTS system service a dedicated TextToSpeechSessionConnection is created to bind to and cache the engine, which is not covered further here

    • In either case, once connected, the engine's ITextToSpeechService interface instance is cached and the Connection instance is stored in mServiceConnection for the outer class to use when speak() arrives. Note that at this point an async SetupConnectionAsyncTask is also started, which hands itself back to the engine as the ITextToSpeechCallback Binder interface so that results can be called back to the requester after processing

  3. If connect() completes with an OK result, the connection is also cached in mConnectingServiceConnection so that it can be released when the TTS instance is shut down, and SUCCESS is delivered to the requesting app via dispatchOnInit()

    • The implementation is simple: the result constant is called back to the OnInitListener interface passed in at initialization
  4. If the connection fails, dispatchOnInit() is called with ERROR

// TextToSpeech.java
public class TextToSpeech {
  public TextToSpeech(Context context, OnInitListener listener) {
    this(context, listener, null);
   }
​
  private TextToSpeech( ... ) {
     ...
    initTts();
   }
​
  private int initTts() {
    // Step 1: Try connecting to the engine that was requested.
    if (mRequestedEngine != null) {
      if (mEnginesHelper.isEngineInstalled(mRequestedEngine)) {
        if (connectToEngine(mRequestedEngine)) {
          mCurrentEngine = mRequestedEngine;
          return SUCCESS;
         }
         ...
       } else if (!mUseFallback) {
         ...
        dispatchOnInit(ERROR);
        return ERROR;
       }
     }
​
    // Step 2: Try connecting to the user's default engine.
    final String defaultEngine = getDefaultEngine();
     ...
​
    // Step 3: Try connecting to the highest ranked engine in the system.
    final String highestRanked = mEnginesHelper.getHighestRankedEngineName();
     ...
​
    dispatchOnInit(ERROR);
    return ERROR;
   }
​
  private boolean connectToEngine(String engine) {
    Connection connection;
    if (mIsSystem) {
      connection = new SystemConnection();
     } else {
      connection = new DirectConnection();
     }
​
    boolean bound = connection.connect(engine);
    if (!bound) {
      return false;
     } else {
      mConnectingServiceConnection = connection;
      return true;
     }
   }
}

The Connection inner class and its two subclasses:

// TextToSpeech.java
public class TextToSpeech {
   ...
  private abstract class Connection implements ServiceConnection {
    private ITextToSpeechService mService;
     ...
​
    private final ITextToSpeechCallback.Stub mCallback =
        new ITextToSpeechCallback.Stub() {
          public void onStop(String utteranceId, boolean isStarted)
              throws RemoteException {
            UtteranceProgressListener listener = mUtteranceProgressListener;
            if (listener != null) {
              listener.onStop(utteranceId, isStarted);
             }
           };
​
          @Override
          public void onSuccess(String utteranceId) { ... }
​
          @Override
          public void onError(String utteranceId, int errorCode) { ... }
​
          @Override
          public void onStart(String utteranceId) { ... }
           ...
         };
​
    @Override
    public void onServiceConnected(ComponentName componentName, IBinder service) {
      synchronized(mStartLock) {
        mConnectingServiceConnection = null;
​
        mService = ITextToSpeechService.Stub.asInterface(service);
        mServiceConnection = Connection.this;
​
        mEstablished = false;
        mOnSetupConnectionAsyncTask = new SetupConnectionAsyncTask();
        mOnSetupConnectionAsyncTask.execute();
       }
     } 
     ...
   }
​
  private class DirectConnection extends Connection {
    @Override
    boolean connect(String engine) {
      Intent intent = new Intent(Engine.INTENT_ACTION_TTS_SERVICE);
      intent.setPackage(engine);
      return mContext.bindService(intent, this, Context.BIND_AUTO_CREATE);
     }
     ...
   }
​
  private class SystemConnection extends Connection {
     ...
    boolean connect(String engine) {
      IBinder binder = ServiceManager.getService(Context.TEXT_TO_SPEECH_MANAGER_SERVICE);
       ...
      try {
        manager.createSession(engine, new ITextToSpeechSessionCallback.Stub() {
           ...
         });
        return true;
       } ...
     }
     ...
   }
}
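Relating back to step 1 of the list above: a requesting app can name a preferred engine through the public three-argument constructor, which becomes mRequestedEngine in initTts(). A minimal sketch follows; the package name is made up.

// PreferredEngine.kt — "com.example.tts.engine" is a hypothetical package name
import android.content.Context
import android.speech.tts.TextToSpeech
import android.util.Log

fun createTtsWithPreferredEngine(context: Context): TextToSpeech =
  TextToSpeech(
    context,
    TextToSpeech.OnInitListener { status ->
      // ERROR here typically means the requested engine is not installed
      Log.d("PreferredEngine", "TTS init status: $status")
    },
    "com.example.tts.engine"
  )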

3.2 speak playback

[Figure: Android TextToSpeech mechanism]

Next, let's look at the important speak() call and what the system does concretely.

  1. First, the remote-interface call behind speak() is wrapped into an Action instance and handed to the connected Connection instance cached at init time for scheduling.

    // TextToSpeech.java
    public class TextToSpeech {
       ...
      private Connection mServiceConnection;
      
      public int speak(final CharSequence text, ... ) {
        return runAction((ITextToSpeechService service) -> {
           ...
         }, ERROR, "speak");
       }
    ​
      private <R> R runAction(Action<R> action, R errorResult, String method) {
        return runAction(action, errorResult, method, true, true);
       }
    ​
      private <R> R runAction( ... ) {
        synchronized (mStartLock) {
           ...
          return mServiceConnection.runAction(action, errorResult, method, reconnect,
              onlyEstablishedConnection);
         }
       }
      
      private abstract class Connection implements ServiceConnection {
        public <R> R runAction( ... ) {
          synchronized (mStartLock) {
            try {
               ...
              return action.run(mService);
             }
             ...
           }
         }
       }
    }
    
  2. The Action itself first looks up the target text in the mUtterances map to check whether a local audio resource has been registered for it (the addSpeech() example in section 5 shows how such a mapping is set up):

    • If one is registered, the TTS engine's playAudio() is called to play it directly
    • Otherwise the text-to-audio interface speak() is called
    // TextToSpeech.java
    public class TextToSpeech {
       ...
      public int speak(final CharSequence text, ... ) {
        return runAction((ITextToSpeechService service) -> {
          Uri utteranceUri = mUtterances.get(text);
          if (utteranceUri != null) {
            return service.playAudio(getCallerIdentity(), utteranceUri, queueMode,
                getParams(params), utteranceId);
           } else {
            return service.speak(getCallerIdentity(), text, queueMode, getParams(params),
                utteranceId);
           }
         }, ERROR, "speak");
       }
       ...
    }
    

    What follows is the TextToSpeechService implementation stage.

4 TextToSpeechService implementation

[Figure: Android TextToSpeech mechanism]

  1. On the receiving side, TextToSpeechService wraps each speak or playAudio request into a SpeechItem and sends it to its internal SynthHandler

    The SynthHandler is bound to a HandlerThread named "SynthThread" that is started when TextToSpeechService is initialized.

    • A speak request is wrapped as a SynthesisSpeechItem for the handler
    • A playAudio request is wrapped as an AudioSpeechItem
    // TextToSpeechService.java
    public abstract class TextToSpeechService extends Service {
      private final ITextToSpeechService.Stub mBinder =
          new ITextToSpeechService.Stub() {
            @Override
            public int speak(
                IBinder caller,
                CharSequence text,
                int queueMode,
                Bundle params,
                String utteranceId) {
              SpeechItem item =
                  new SynthesisSpeechItem(
                      caller,
                      Binder.getCallingUid(),
                      Binder.getCallingPid(),
                      params,
                      utteranceId,
                      text);
              return mSynthHandler.enqueueSpeechItem(queueMode, item);
             }
    ​
            @Override
            public int playAudio( ... ) {
              SpeechItem item =
                  new AudioSpeechItem( ... );
               ...
             }
             ...
           };
       ...
    }
    
  2. After the SynthHandler receives the SpeechItem, it decides based on the queueMode value whether to stop() or to keep playing. If it plays, the further play operation is wrapped into a Message for the handler.

    // TextToSpeechService.java
      private class SynthHandler extends Handler {
         ...
        public int enqueueSpeechItem(int queueMode, final SpeechItem speechItem) {
          UtteranceProgressDispatcher utterenceProgress = null;
          if (speechItem instanceof UtteranceProgressDispatcher) {
            utterenceProgress = (UtteranceProgressDispatcher) speechItem;
           }
    ​
          if (!speechItem.isValid()) {
            if (utterenceProgress != null) {
              utterenceProgress.dispatchOnError(
                  TextToSpeech.ERROR_INVALID_REQUEST);
             }
            return TextToSpeech.ERROR;
           }
    ​
          if (queueMode == TextToSpeech.QUEUE_FLUSH) {
            stopForApp(speechItem.getCallerIdentity());
           } else if (queueMode == TextToSpeech.QUEUE_DESTROY) {
            stopAll();
           }
          Runnable runnable = new Runnable() {
            @Override
            public void run() {
              if (setCurrentSpeechItem(speechItem)) {
                speechItem.play();
                removeCurrentSpeechItem();
               } else {
                speechItem.stop();
               }
             }
           };
          Message msg = Message.obtain(this, runnable);
    ​
          msg.obj = speechItem.getCallerIdentity();
    ​
          if (sendMessage(msg)) {
            return TextToSpeech.SUCCESS;
           } else {
            if (utterenceProgress != null) {
              utterenceProgress.dispatchOnError(TextToSpeech.ERROR_SERVICE);
             }
            return TextToSpeech.ERROR;
           }
         }
         ...
       }
    
  3. play() continues by calling playImpl(). For a SynthesisSpeechItem, the SynthesisRequest instance created at construction time and a SynthesisCallback instance (here the implementation is PlaybackSynthesisCallback) are gathered and passed to onSynthesizeText() for further processing: one carries the request, the other is used to call back the results.

    // TextToSpeechService.java
      private abstract class SpeechItem {
         ...
        public void play() {
          synchronized (this) {
            if (mStarted) {
              throw new IllegalStateException("play() called twice");
             }
            mStarted = true;
           }
          playImpl();
         }
       }
    ​
      class SynthesisSpeechItem extends UtteranceSpeechItemWithParams {
        public SynthesisSpeechItem(
             ...
            String utteranceId,
            CharSequence text) {
          mSynthesisRequest = new SynthesisRequest(mText, mParams);
           ...
         }
         ...
        @Override
        protected void playImpl() {
          AbstractSynthesisCallback synthesisCallback;
          mEventLogger.onRequestProcessingStart();
          synchronized (this) {
             ...
            mSynthesisCallback = createSynthesisCallback();
            synthesisCallback = mSynthesisCallback;
           }
    ​
          TextToSpeechService.this.onSynthesizeText(mSynthesisRequest, synthesisCallback);
          if (synthesisCallback.hasStarted() && !synthesisCallback.hasFinished()) {
            synthesisCallback.done();
           }
         }
         ...
       }
    
  4. onSynthesizeText() is an abstract method that the engine must override to synthesize text into audio data; it is the most central piece of the TTS feature (a minimal engine-side sketch is given at the end of section 5).

    • The engine extracts the target text, parameters and other information from the SynthesisRequest, handles each case accordingly, and brings the data and timing back through the SynthesisCallback interfaces:
    • Before synthesizing any data, start() tells the system the sample rate of the generated audio, the PCM bit depth, the channel count, and so on. The PlaybackSynthesisCallback implementation creates a SynthesisPlaybackQueueItem for playback and hands it to the AudioPlaybackHandler to be queued and scheduled
    • Then the synthesized data is delivered back as byte[] through audioAvailable(), which takes the QueueItem created at start() and puts the audio data into it to begin playback
    • Finally, done() signals that synthesis has finished
    // PlaybackSynthesisCallback.java
    class PlaybackSynthesisCallback extends AbstractSynthesisCallback {
       ...
      @Override
      public int start(int sampleRateInHz, int audioFormat, int channelCount) {
        mDispatcher.dispatchOnBeginSynthesis(sampleRateInHz, audioFormat, channelCount);
    ​
        int channelConfig = BlockingAudioTrack.getChannelConfig(channelCount);
    ​
        synchronized (mStateLock) {
           ...
          SynthesisPlaybackQueueItem item = new SynthesisPlaybackQueueItem(
              mAudioParams, sampleRateInHz, audioFormat, channelCount,
              mDispatcher, mCallerIdentity, mLogger);
          mAudioTrackHandler.enqueue(item);
          mItem = item;
         }
    ​
        return TextToSpeech.SUCCESS;
       }
    ​
      @Override
      public int audioAvailable(byte[] buffer, int offset, int length) {
        SynthesisPlaybackQueueItem item = null;
        synchronized (mStateLock) {
           ...
          item = mItem;
         }
    ​
        final byte[] bufferCopy = new byte[length];
        System.arraycopy(buffer, offset, bufferCopy, 0, length);
        mDispatcher.dispatchOnAudioAvailable(bufferCopy);
    ​
        try {
          item.put(bufferCopy);
         }
         ...
        return TextToSpeech.SUCCESS;
       }
    ​
      @Override
      public int done() {
        int statusCode = 0;
        SynthesisPlaybackQueueItem item = null;
        synchronized (mStateLock) {
           ...
          mDone = true;
    ​
          if (mItem == null) {
            if (mStatusCode == TextToSpeech.SUCCESS) {
              mDispatcher.dispatchOnSuccess();
             } else {
              mDispatcher.dispatchOnError(mStatusCode);
             }
            return TextToSpeech.ERROR;
           }
    ​
          item = mItem;
          statusCode = mStatusCode;
         }
    ​
        if (statusCode == TextToSpeech.SUCCESS) {
          item.done();
         } else {
          item.stop(statusCode);
         }
        return TextToSpeech.SUCCESS;
       }
       ...
    }
    

    The logic for putting audio data into the QueueItem above and consuming it is as follows: essentially a put operation signals, via the Lock, the Condition that take() awaits on, and AudioTrack is finally called to play the data.

    // SynthesisPlaybackQueueItem.java
    final class SynthesisPlaybackQueueItem ... {
      void put(byte[] buffer) throws InterruptedException {
        try {
          mListLock.lock();
          long unconsumedAudioMs = 0;
           ...
          mDataBufferList.add(new ListEntry(buffer));
          mUnconsumedBytes += buffer.length;
          mReadReady.signal();
         } finally {
          mListLock.unlock();
         }
       }
    ​
      private byte[] take() throws InterruptedException {
        try {
          mListLock.lock();
    ​
          while (mDataBufferList.size() == 0 && !mStopped && !mDone) {
            mReadReady.await();
           }
           ...
          ListEntry entry = mDataBufferList.poll();
          mUnconsumedBytes -= entry.mBytes.length;
          mNotFull.signal();
    ​
          return entry.mBytes;
         } finally {
          mListLock.unlock();
         }
       }
    ​
      public void run() {
         ...
        final UtteranceProgressDispatcher dispatcher = getDispatcher();
        dispatcher.dispatchOnStart();
    ​
        if (!mAudioTrack.init()) {
          dispatcher.dispatchOnError(TextToSpeech.ERROR_OUTPUT);
          return;
         }
    ​
        try {
          byte[] buffer = null;
          while ((buffer = take()) != null) {
            mAudioTrack.write(buffer);
           }
    ​
         } ...
        mAudioTrack.waitAndRelease();
        dispatchEndStatus();
       }
    ​
      void done() {
        try {
          mListLock.lock();
          mDone = true;
          mReadReady.signal();
          mNotFull.signal();
         } finally {
          mListLock.unlock();
         }
       }
    }
    
  5. While notifying the QueueItem, the PlaybackSynthesisCallback above also sends the data and results to the requesting app through the UtteranceProgressDispatcher interface.

    // TextToSpeechService.java
      interface UtteranceProgressDispatcher {
        void dispatchOnStop();
    ​
        void dispatchOnSuccess();
    ​
        void dispatchOnStart();
    ​
        void dispatchOnError(int errorCode);
    ​
        void dispatchOnBeginSynthesis(int sampleRateInHz, int audioFormat, int channelCount);
    ​
        void dispatchOnAudioAvailable(byte[] audio);
    ​
        public void dispatchOnRangeStart(int start, int end, int frame);
       }
    

    In fact the implementation of this interface is the UtteranceSpeechItem instance with which TextToSpeechService handles the speak request; it dispatches callbacks to the requesting app through a CallbackMap that caches the ITextToSpeechCallback instances. (These callbacks are the Binder interfaces passed over via ITextToSpeechService and cached when TextToSpeech was initialized.)

      private abstract class UtteranceSpeechItem extends SpeechItem
        implements UtteranceProgressDispatcher  {
         ...
        @Override
        public void dispatchOnStart() {
          final String utteranceId = getUtteranceId();
          if (utteranceId != null) {
            mCallbacks.dispatchOnStart(getCallerIdentity(), utteranceId);
           }
         }
    ​
        @Override
        public void dispatchOnAudioAvailable(byte[] audio) {
          final String utteranceId = getUtteranceId();
          if (utteranceId != null) {
            mCallbacks.dispatchOnAudioAvailable(getCallerIdentity(), utteranceId, audio);
           }
         }
    ​
        @Override
        public void dispatchOnSuccess() {
          final String utteranceId = getUtteranceId();
          if (utteranceId != null) {
            mCallbacks.dispatchOnSuccess(getCallerIdentity(), utteranceId);
           }
         }
    ​
        @Override
        public void dispatchOnStop() { ... }
    ​
        @Override
        public void dispatchOnError(int errorCode) { ... }
    ​
        @Override
        public void dispatchOnBeginSynthesis(int sampleRateInHz, int audioFormat, int channelCount) { ... }
    ​
        @Override
        public void dispatchOnRangeStart(int start, int end, int frame) { ... }
       }
    ​
      private class CallbackMap extends RemoteCallbackList<ITextToSpeechCallback> {
         ...
        public void dispatchOnStart(Object callerIdentity, String utteranceId) {
          ITextToSpeechCallback cb = getCallbackFor(callerIdentity);
          if (cb == null) return;
          try {
            cb.onStart(utteranceId);
           } ...
         }
    ​
        public void dispatchOnAudioAvailable(Object callerIdentity, String utteranceId, byte[] buffer) {
          ITextToSpeechCallback cb = getCallbackFor(callerIdentity);
          if (cb == null) return;
          try {
            cb.onAudioAvailable(utteranceId, buffer);
           } ...
         }
    ​
        public void dispatchOnSuccess(Object callerIdentity, String utteranceId) {
          ITextToSpeechCallback cb = getCallbackFor(callerIdentity);
          if (cb == null) return;
          try {
            cb.onSuccess(utteranceId);
           } ...
         }
         ...
       }
    
  6. The ITextToSpeechCallback invocation is relayed through TextToSpeech to the requesting app's callback, which then performs the follow-up actions mentioned in the "Calling TextToSpeech" section

    // TextToSpeech.java
    public class TextToSpeech {
       ...
      private abstract class Connection implements ServiceConnection {
         ...
        private final ITextToSpeechCallback.Stub mCallback =
            new ITextToSpeechCallback.Stub() {
              @Override
              public void onStart(String utteranceId) {
                UtteranceProgressListener listener = mUtteranceProgressListener;
                if (listener != null) {
                  listener.onStart(utteranceId);
                 }
               }
               ...
             };
       }
    }
      
    // TTSTest.kt
    class TTSTest(context: Context) {
      init {
        tts.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
          override fun onStart(utteranceId: String?) { ... }
    ​
          override fun onDone(utteranceId: String?) { ... }
    ​
          override fun onStop(utteranceId: String?, interrupted: Boolean) { ... }
    ​
          override fun onError(utteranceId: String?)  { ... }
         })
       }
       ....
    }
    

5 Usage and implementation notes

A few usage suggestions for the TTS requester side:

  1. Remember to request audio focus of the appropriate type before TTS playback
  2. When the requesting app's Activity or Service is destroyed, e.g. in onDestroy(), call TextToSpeech's shutdown() to release the connection and resources
  3. addSpeech() can register a dedicated audio resource for a fixed piece of text (for example the few canned welcome phrases played after wake-up in a voice assistant), so that later requests for that text play the audio directly, skipping text-to-speech synthesis and improving efficiency (see the sketch after this list)
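Points 2 and 3 might look like the sketch below. The class name, greeting text, and R.raw.welcome resource are assumptions for illustration, not part of the original article.

// CannedTts.kt — illustrative only; R.raw.welcome must exist in the requesting app
import android.content.Context
import android.speech.tts.TextToSpeech

class CannedTts(private val context: Context) {

  private val tts: TextToSpeech = TextToSpeech(context) { initResult ->
    if (initResult == TextToSpeech.SUCCESS) {
      // Map the fixed greeting to a pre-recorded raw resource; later speak() calls with
      // the exact same text are served via playAudio() instead of running synthesis
      tts.addSpeech(GREETING, context.packageName, R.raw.welcome)
    }
  }

  fun speakGreeting() {
    tts.speak(GREETING, TextToSpeech.QUEUE_FLUSH, null, "greeting-utterance")
  }

  // Call from the hosting Activity/Service onDestroy() to release the engine connection
  fun release() {
    tts.stop()
    tts.shutdown()
  }

  companion object {
    private const val GREETING = "你好,我是语音助手"
  }
}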

There are also a few implementation suggestions for the TTS engine provider:

  1. Each engine implementation must cooperate properly with the SynthesisCallback: done() may only be called after start() has been executed on that callback and before it has finished. Otherwise TTS reports one of the following two errors:

    • Duplicate call to done()
    • done() was called before start() call
  2. The core job of a TTS engine is to synthesize text into speech audio data. Once the data is synthesized, the engine could of course play it directly and never return the audio, but it is recommended to hand the audio data back and let the system's AudioTrack play it: first, playback is then handled uniformly by the system; second, the requesting app can also obtain the audio data for caching and analysis (see the sketch after this list)
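Putting section 4's step 4 and the two suggestions above together, a hypothetical engine could implement TextToSpeechService roughly as sketched below. This is only an outline under assumptions: the class name, the 16 kHz mono format, and the synthesizePcm() placeholder are ours, not a real engine.

// HypotheticalTtsEngineService.kt — a minimal outline, not a production engine.
// Its manifest entry must export the service with the android.intent.action.TTS_SERVICE intent filter.
import android.media.AudioFormat
import android.speech.tts.SynthesisCallback
import android.speech.tts.SynthesisRequest
import android.speech.tts.TextToSpeech
import android.speech.tts.TextToSpeechService

class HypotheticalTtsEngineService : TextToSpeechService() {

  override fun onIsLanguageAvailable(lang: String?, country: String?, variant: String?): Int =
    TextToSpeech.LANG_AVAILABLE

  override fun onGetLanguage(): Array<String> = arrayOf("zho", "CHN", "")

  override fun onLoadLanguage(lang: String?, country: String?, variant: String?): Int =
    TextToSpeech.LANG_AVAILABLE

  override fun onStop() {
    // Abort any in-flight synthesis so QUEUE_FLUSH / stop() requests take effect quickly
  }

  override fun onSynthesizeText(request: SynthesisRequest, callback: SynthesisCallback) {
    // 1. Announce the PCM format exactly once before any audioAvailable()/done() call
    callback.start(16_000, AudioFormat.ENCODING_PCM_16BIT, 1)

    // 2. Synthesize (placeholder) and stream PCM back in chunks no larger than maxBufferSize;
    //    PlaybackSynthesisCallback puts each chunk into the SynthesisPlaybackQueueItem shown earlier
    val pcm = synthesizePcm(request.charSequenceText)
    val maxChunk = callback.maxBufferSize
    var offset = 0
    while (offset < pcm.size) {
      val len = minOf(maxChunk, pcm.size - offset)
      if (callback.audioAvailable(pcm, offset, len) != TextToSpeech.SUCCESS) break
      offset += len
    }

    // 3. done() only after start(), and only once — matching the two errors listed above
    callback.done()
  }

  // Placeholder for a real acoustic model; should return 16 kHz mono 16-bit PCM
  private fun synthesizePcm(text: CharSequence): ByteArray = ByteArray(0)
}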

6 Conclusion

As we have seen, the requesting app does not care about the implementation; a handful of TextToSpeech APIs are enough to perform TTS playback. Likewise, a TTS implementation only needs to follow the framework and callbacks defined by TextToSpeechService; the wiring to apps is done by the system.

Let's take a moment to recap the whole flow:

[Figure: Android TextToSpeech mechanism]

Flow:

  1. The requesting app calls the TextToSpeech constructor, and the system does the preparation work before playback, such as binding to and initializing the target TTS engine through a Connection
  2. The requesting app supplies the target text and calls speak()
  3. TextToSpeech checks whether a local audio resource was registered for the target text; if not, it continues by calling speak() on the ITextToSpeechService AIDL through the Connection
  4. On receipt, TextToSpeechService wraps the request into a SynthesisRequest and a SynthesisCallback instance used to call back results
  5. Both are then passed to the core implementation onSynthesizeText(), which parses the request and synthesizes the speech audio data
  6. After that, the key callbacks before and after synthesis are reported to the system through the SynthesisCallback, in particular for AudioTrack playback
  7. At the same time, the result of the speak request is reported to the requesting app, relayed via UtteranceProgressDispatcher, which in effect calls the ITextToSpeechCallback AIDL
  8. Finally, the callbacks set at TextToSpeech initialization are notified through UtteranceProgressListener

7 Related voice articles

  • How to build in-vehicle voice interaction: Google Voice Interaction has the answer

8 References / source code

  • OnUtteranceCompletedListener deprecation
  • TextToSpeech.java
  • TextToSpeechManagerService.java
  • TextToSpeechService.java