AiChat Module Based on Unity3D

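Introduction

As artificial intelligence advances, speech recognition has become an important part of modern applications. In Unity development, integrating speech recognition can noticeably improve the user experience, especially in games, VR/AR applications, and interactive exhibits. Unlike cloud-based speech recognition, offline recognition needs no network connection, which gives it better privacy protection and lower latency. Vosk, an open-source offline speech recognition library, is lightweight, accurate, and cross-platform, which makes it a good fit for Unity developers.

Vosk is built on deep neural networks combined with hidden Markov models (DNN-HMM). It supports more than 20 languages, including Chinese, English, French, and German, and offers pre-trained models in several sizes for different scenarios. Because it runs entirely offline, it is particularly suitable for applications with strict data-privacy requirements.

This article describes in detail how to configure the Vosk environment in a Unity project, laying the groundwork for speech recognition.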

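1. Vosk overview and features

Vosk is an open-source offline speech recognition engine built on the Kaldi speech recognition toolkit. Its core features are:

  • Fully offline: no network connection is required; all data is processed on the device, which protects data security and privacy
  • Multilingual: supports more than 20 languages, including Chinese, English, French, and German, which suits internationalized projects
  • Lightweight and efficient: models are small (the smallest is only about 12 MB) and memory usage is low, so it runs smoothly even on embedded devices such as the Raspberry Pi
  • High accuracy: based on deep learning; recognition accuracy can exceed 95% in quiet environments
  • Cross-platform: supports Windows, Linux, macOS, Android, and iOS
  • Real-time recognition: provides a streaming API for real-time recognition, with latency kept within 200 ms

Compared with other speech recognition solutions, Vosk performs well in resource consumption and response time, which makes it well suited to real-time voice interaction in Unity projects.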

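2. Environment setup prerequisites

Complete the following preparation before integrating Vosk:

2.1 System and Unity requirements

  • Unity version: 2019.4 or later is recommended, with .NET 4.x or later support. Note that the sample scripts in section 6 use UnityWebRequest.result, which is only available in Unity releases newer than 2019.4; on 2019.4, check isNetworkError and isHttpError instead
  • Operating system: a Windows, macOS, or Linux development environment
  • Storage: at least 1 GB of free space for model files (the standard Chinese model alone is about 1.3 GB, so allow more if you use it)

2.2 Download the Vosk files

  1. Vosk Unity plugin: get the Vosk C# bindings from GitHub (https://github.com/alphacep/vosk-unity-asr)
  2. Speech model: download the language model you need from the Vosk model page (https://alphacephei.com/vosk/models):
    • Small Chinese model (vosk-model-small-cn-0.22, about 42 MB): suited to mobile devices and embedded systems
    • Standard Chinese model (vosk-model-cn-0.22, about 1.3 GB): higher accuracy, suited to servers or high-performance devices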

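3. Unity project configuration steps

3.1 Create a Unity project and import Vosk

  1. Create a new Unity project or open an existing one
  2. Create a Plugins folder under Assets to hold the Vosk DLL files (such as libvosk.dll and vosk.dll)
  3. Import the downloaded Vosk Unity plugin files into the project

3.2 Import the model files

  1. Create a StreamingAssets folder under Assets if it does not already exist
  2. Put the downloaded model into the StreamingAssets folder, either as the archive (for example vosk-model-small-cn-0.22.zip) or as an unpacked folder
    • Note: the Vosk Model constructor expects the path of an unpacked model directory. The sample script in vosk-unity-asr can work with a zip archive because, as far as its source shows, it unpacks the archive at runtime before loading it. The VoskSpeechRecognizer script in section 6 passes a folder path straight to Model, so unzip the model if you use that script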
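
The Vosk C# `Model` constructor takes the path of a model directory, so an archive placed in StreamingAssets has to be extracted somewhere before it can be loaded. The following is a minimal sketch of that step in plain C#, assuming `System.IO.Compression` is available under the project's API compatibility level and that the archive contains one top-level folder named after the archive. `ModelUnpacker` and `EnsureUnpacked` are hypothetical names, not part of Vosk or Unity.

```csharp
using System.IO;
using System.IO.Compression;

public static class ModelUnpacker
{
    // Extracts zipPath into targetRoot the first time it is called and
    // returns the model directory that can be passed to new Model(...).
    // Assumption: the archive holds a single top-level folder whose name
    // equals the archive name without ".zip".
    public static string EnsureUnpacked(string zipPath, string targetRoot)
    {
        string modelName = Path.GetFileNameWithoutExtension(zipPath);
        string modelDir = Path.Combine(targetRoot, modelName);

        if (!Directory.Exists(modelDir))
        {
            Directory.CreateDirectory(targetRoot);
            ZipFile.ExtractToDirectory(zipPath, targetRoot);
        }

        if (!Directory.Exists(modelDir))
        {
            throw new DirectoryNotFoundException(
                "Archive did not contain the expected folder: " + modelDir);
        }
        return modelDir;
    }
}
```

In a Unity project, `targetRoot` would typically be a writable location such as `Application.persistentDataPath`. On Android, StreamingAssets lives inside the APK and cannot be opened with ordinary file APIs, so the archive would first have to be copied out (for example with UnityWebRequest) before a helper like this can read it.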

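3.3 Configure Player Settings

  1. Open "File > Build Settings > Player Settings"
  2. Under "Configuration", make sure the .NET setting is ".NET 4.x" or later (in recent Unity versions this option is labeled "Api Compatibility Level")
  3. Apply platform-specific settings as needed:
    • Windows: no special configuration required
    • Android: make sure the microphone permission (RECORD_AUDIO) is declared and granted
    • iOS: add a microphone usage description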

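4. Model selection and optimization

4.1 Model selection strategy

Choosing a model that fits the application scenario is essential:

Model type          Size          Typical use                         Hardware requirements
Small model         40-50 MB      Mobile devices, embedded systems    Low-end CPU, 256 MB+ RAM
Standard model      1.3-1.5 GB    Desktop applications, servers       Multi-core CPU, 2 GB+ RAM
Professional model  1.5 GB+       Professional speech recognition     High-performance CPU, 8 GB+ RAM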

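4.2 Performance optimization tips

  • Audio format: make sure the input audio is 16 kHz, 16-bit, mono, which is the standard input format for Vosk models
  • Preprocessing: apply audio filtering to reduce background noise
  • Resource management: release the recognizer when speech recognition is not needed to reduce memory usage
  • Multithreading: run recognition on a separate thread to avoid blocking the main thread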
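
The audio-format requirement above can be made concrete with a small conversion helper. Unity delivers microphone data as normalized floats, while the recognizer in section 6 is fed 16-bit PCM bytes. The sketch below shows one way to do that conversion in plain C#; `PcmConverter`, `FloatToPcm16`, and `StereoToMono` are hypothetical names chosen for illustration, and resampling to 16 kHz is deliberately left out.

```csharp
using System;

public static class PcmConverter
{
    // Converts normalized float samples (-1..1) to 16-bit little-endian PCM.
    // Values outside the range are clamped so the cast cannot overflow.
    public static byte[] FloatToPcm16(float[] samples)
    {
        byte[] bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)(clamped * short.MaxValue);
            bytes[i * 2] = (byte)(value & 0xFF);            // low byte first
            bytes[i * 2 + 1] = (byte)((value >> 8) & 0xFF); // then high byte
        }
        return bytes;
    }

    // Averages interleaved stereo frames (L, R, L, R, ...) into mono.
    public static float[] StereoToMono(float[] interleaved)
    {
        float[] mono = new float[interleaved.Length / 2];
        for (int i = 0; i < mono.Length; i++)
        {
            mono[i] = (interleaved[i * 2] + interleaved[i * 2 + 1]) * 0.5f;
        }
        return mono;
    }
}
```

Requesting 16000 Hz directly from the microphone, as the recognizer script in section 6 does with `Microphone.Start`, avoids the need for a separate resampling step when the device supports that rate.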

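5. Common problems and solutions

You may run into the following problems while configuring and using Vosk:

  1. Model fails to load

    • Cause: the model path is wrong or the model files are incomplete
    • Fix: check that the model is in the StreamingAssets folder and that all of its files are present
  2. Low recognition accuracy

    • Cause: ambient noise or a mismatched audio format
    • Fix: add an audio preprocessing step and make sure the input is 16 kHz, 16-bit, mono
  3. Performance problems

    • Cause: the model is too large or the hardware is underpowered
    • Fix: choose a model size that matches the device, or consider adding a loading screen
  4. Platform compatibility problems

    • Cause: library files built for a different platform
    • Fix: make sure the Vosk libraries you use were compiled for the target platform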
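
For the first problem in the list, a quick check before constructing the model can turn an opaque load failure into a readable message. The sketch below is plain C#; `ModelChecker` and `FindMissingEntries` are hypothetical names, and the entries to look for are supplied by the caller because the exact layout depends on the model. The names "am", "conf", and "graph" used in the usage note are an assumption about a typical Vosk model folder, not something this article specifies.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ModelChecker
{
    // Returns the entries (files or folders) that are missing from modelDir.
    // If modelDir itself does not exist, it is reported as the only missing item.
    public static List<string> FindMissingEntries(string modelDir, IEnumerable<string> requiredEntries)
    {
        var missing = new List<string>();
        if (!Directory.Exists(modelDir))
        {
            missing.Add(modelDir);
            return missing;
        }

        foreach (string entry in requiredEntries)
        {
            string path = Path.Combine(modelDir, entry);
            if (!Directory.Exists(path) && !File.Exists(path))
            {
                missing.Add(entry);
            }
        }
        return missing;
    }
}
```

A caller might run `ModelChecker.FindMissingEntries(modelDir, new[] { "am", "conf", "graph" })` and log the returned list before calling `new Model(modelDir)`; an empty list means every expected entry was found.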

6. AI chat code implementation based on Vosk

Main controller module

using UnityEngine;
using UnityEngine.Networking;
using UnityEngine.UI;
using TMPro;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public class AiChat : MonoBehaviour
{
    [Header("UI Bindings")]
    [SerializeField] public TMP_InputField inputField;       // Question input field
    [SerializeField] public Button submitButton;             // Submit button
    [SerializeField] Text answerText;                 // Answer text box
    [SerializeField] Toggle ttsToggle;                // Speech synthesis toggle

    [Header("API Settings")]
    [SerializeField] string apiUrl = "<API endpoint URL>";
    [SerializeField] string apiKey = "<API key>";

    [Header("Chat Settings")]
    [SerializeField] public string askTag = "";
    [SerializeField] public AIChatAssistant getRAGChat;


    private bool isStreaming = false;
    private Coroutine streamCoroutine;
    private StringBuilder fullResponse = new StringBuilder();

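    void Start()
    {
        // Check network reachability
        if (Application.internetReachability != NetworkReachability.NotReachable)
        {
            // Original behavior: bind the button directly to OnSubmitClicked
            // submitButton.onClick.AddListener(OnSubmitClicked);

            // The click is currently intercepted by the RAG chat module
            // (getRAGChat is an AIChatAssistant, a component not listed in this article):
            // first copy the question over, then let that component submit it
            submitButton.onClick.AddListener(OnSubmitClick);
            submitButton.onClick.AddListener(() => StartCoroutine(getRAGChat.SubmitQuestion()));

            // Initial state
            submitButton.interactable = true;
            answerText.text = "Waiting for a question...";
            // Speech synthesis is on by default
            ttsToggle.isOn = true;
        }
        else
        {
            answerText.text = "Please check your network connection!";
        }
    }

    // Click interception: forwards the question text to the RAG chat module
    void OnSubmitClick()
    {
        getRAGChat.questionInput = inputField.text;
    }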

    public void OnSubmitClicked()
    {
        if (string.IsNullOrWhiteSpace(inputField.text))
        {
            answerText.text = "<color=#FF0000>Please enter a valid question!</color>";
            return;
        }

        // If a request is already in progress, stop it first
        if (isStreaming && streamCoroutine != null)
        {
            StopCoroutine(streamCoroutine);
        }

        // Reset state
        isStreaming = true;
        fullResponse.Clear();
        answerText.text = "Thinking...";
        submitButton.interactable = false;

        // Start the streaming request
        streamCoroutine = StartCoroutine(StreamChatCompletion(inputField.text));
    }

    IEnumerator StreamChatCompletion(string userMessage)
    {
        // Build the request body
        var requestData = new RequestData
        {
            messages = new List<Message>
            {
                new Message { role = "user", content = userMessage + "," + askTag }
            },
            stream = true
        };

        string jsonPayload = JsonUtility.ToJson(requestData);
        byte[] payloadBytes = Encoding.UTF8.GetBytes(jsonPayload);

        // Create the web request
        using (UnityWebRequest request = new UnityWebRequest(apiUrl, "POST"))
        {
            request.uploadHandler = new UploadHandlerRaw(payloadBytes);
            // DownloadHandlerBuffer collects the whole body, so the streamed
            // chunks are only parsed after the request has finished
            request.downloadHandler = new DownloadHandlerBuffer();
            request.SetRequestHeader("Content-Type", "application/json");
            request.SetRequestHeader("Authorization", "Bearer " + apiKey);
            request.disposeDownloadHandlerOnDispose = true;

            // Send the request
            yield return request.SendWebRequest();

            if (request.result == UnityWebRequest.Result.ConnectionError ||
                request.result == UnityWebRequest.Result.ProtocolError)
            {
                Debug.LogError($"API Error: {request.error}");
                Debug.LogError($"Response Code: {request.responseCode}");
                Debug.LogError($"Response: {request.downloadHandler.text}");
                answerText.text = $"<color=#FF0000>Request failed: {request.error}</color>";
                isStreaming = false;
                submitButton.interactable = true;
                yield break;
            }

            // Read the full response
            string rawResponse = request.downloadHandler.text;
            Debug.Log($"Raw API Response: {rawResponse}");

            if (string.IsNullOrEmpty(rawResponse))
            {
                answerText.text = "<color=#FFA500>The server returned an empty response</color>";
                // Reset state here too, otherwise the button stays disabled
                isStreaming = false;
                submitButton.interactable = true;
                yield break;
            }

            // Split the response into lines
            string[] responseLines = rawResponse.Split('\n');
            bool receivedValidResponse = false;

            foreach (string line in responseLines)
            {
                if (string.IsNullOrWhiteSpace(line)) continue;

                string trimmedLine = line.Trim();

                // Skip SSE event markers such as "event:message"
                if (trimmedLine.StartsWith("event:")) continue;

                // Strip the SSE prefix (data: {...})
                string jsonStr = trimmedLine;
                if (trimmedLine.StartsWith("data:"))
                {
                    jsonStr = trimmedLine.Substring(5).Trim();
                }

                // End marker; checked after the prefix is removed so that
                // "data: [DONE]" is recognized as well
                if (jsonStr == "[DONE]")
                {
                    Debug.Log("Received [DONE] marker");
                    break;
                }

                try
                {
                    // If the chunk arrives wrapped in an extra pair of quotes
                    // (double-encoded JSON), strip them and undo the outer escaping.
                    // A plain JSON object is left untouched, because unescaping it
                    // would corrupt quotes and newlines inside string values.
                    string unescapedStr = jsonStr;
                    if (unescapedStr.Length >= 2 &&
                        unescapedStr.StartsWith("\"") && unescapedStr.EndsWith("\""))
                    {
                        unescapedStr = unescapedStr.Substring(1, unescapedStr.Length - 2)
                            .Replace("\\\"", "\"")
                            .Replace("\\\\", "\\");
                    }

                    Debug.Log($"Processing line: {unescapedStr}");

                    // Parse the JSON chunk
                    var response = JsonUtility.FromJson<StreamResponse>(unescapedStr);

                    // Extract the content delta
                    if (response != null && response.choices != null && response.choices.Length > 0)
                    {
                        if (response.choices[0].delta != null &&
                            !string.IsNullOrEmpty(response.choices[0].delta.content))
                        {
                            string content = response.choices[0].delta.content;
                            fullResponse.Append(content);
                            answerText.text = fullResponse.ToString();
                            receivedValidResponse = true;
                        }
                    }
                }
                catch (Exception e)
                {
                    Debug.LogWarning($"Parse error: {e.Message}\nRaw data: {jsonStr}");
                }

                yield return null; // Let the UI refresh between chunks
            }

            // Finished
            isStreaming = false;
            submitButton.interactable = true;

            if (receivedValidResponse)
            {
                // Text-to-speech: only when the toggle is on and an answer was received
                if (ttsToggle.isOn && UITTSController.Instance != null)
                {
                    UITTSController.Instance.OnConvertClick();
                }
            }
            else if (rawResponse.Contains("error"))
            {
                // Try to extract the error message
                try
                {
                    var errorResponse = JsonUtility.FromJson<ErrorResponse>(rawResponse);
                    answerText.text = $"<color=#FF0000>API error: {errorResponse.error.message}</color>";
                }
                catch
                {
                    answerText.text = $"<color=#FF0000>Unknown API error: {rawResponse}</color>";
                }
            }
            else
            {
                answerText.text = $"<color=#FFA500>No valid response received, raw data:\n{rawResponse}</color>";
            }
        }
    }

    // Request data structures
    [System.Serializable]
    private class RequestData
    {
        public List<Message> messages;
        public bool stream;
    }

    [System.Serializable]
    private class Message
    {
        public string role;
        public string content;
    }

    // Response data structures
    [System.Serializable]
    private class StreamResponse
    {
        public string id;
        public string @object;
        public int created;
        public string model;
        public Choice[] choices;
    }

    [System.Serializable]
    private class Choice
    {
        public int index;
        public Delta delta;
        public object logprobs;
        public string finish_reason;
    }

    [System.Serializable]
    private class Delta
    {
        public string content;
    }

    // Error response structures
    [System.Serializable]
    private class ErrorResponse
    {
        public ErrorInfo error;
    }

    [System.Serializable]
    private class ErrorInfo
    {
        public string message;
        public string type;
        public string code;
    }
}
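
The line handling inside StreamChatCompletion can also be isolated into a small function that is easy to test outside Unity. The sketch below assumes the endpoint uses the framing the script above looks for: one JSON chunk per "data:" line and a "[DONE]" line at the end. `SseParser` and `ExtractDataPayloads` are hypothetical names, and a real endpoint may frame its stream differently.

```csharp
using System;
using System.Collections.Generic;

public static class SseParser
{
    // Returns the payload of every "data:" line in order, stopping at the
    // "[DONE]" sentinel. Other lines (event:, id:, blank lines) are skipped.
    public static List<string> ExtractDataPayloads(string rawBody)
    {
        var payloads = new List<string>();
        string[] lines = rawBody.Split(new[] { '\n' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string line in lines)
        {
            string trimmed = line.Trim(); // also removes a trailing '\r'
            if (!trimmed.StartsWith("data:", StringComparison.Ordinal)) continue;

            string payload = trimmed.Substring(5).Trim();
            if (payload == "[DONE]") break;
            if (payload.Length > 0) payloads.Add(payload);
        }
        return payloads;
    }
}
```

Each returned string could then be handed to `JsonUtility.FromJson<StreamResponse>` as in the coroutine above. Keeping the framing logic separate from the JSON parsing makes it easier to see which of the two fails when a response cannot be read.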

TTS speech synthesis module

AudioManager.cs

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class AudioManager : MonoBehaviour
{
    private AudioSource audioSource;

    void Awake()
    {
        audioSource = gameObject.AddComponent<AudioSource>();
    }

    public IEnumerator DownloadAndPlayAudio(string url)
    {
        Debug.Log($"Starting audio download: {url}");

        // Tell Unity explicitly that the payload is MPEG audio
        using (UnityWebRequest www = UnityWebRequestMultimedia.GetAudioClip(url, AudioType.MPEG))
        {
            var audioHandler = (DownloadHandlerAudioClip)www.downloadHandler;
            audioHandler.streamAudio = true;
            audioHandler.compressed = false;

            // Timeout in seconds
            www.timeout = 15;
            var operation = www.SendWebRequest();

            while (!operation.isDone)
            {
                Debug.Log($"Download progress: {www.downloadProgress:P}");
                yield return null;
            }

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Download failed: {www.error} (HTTP {www.responseCode})");
                yield break;
            }

            Debug.Log($"Audio download complete, size: {www.downloadedBytes} bytes");
            AudioClip clip = DownloadHandlerAudioClip.GetContent(www);

            if (clip == null || clip.length == 0)
            {
                Debug.LogError("Audio decoding failed");
                yield break;
            }

            // Download once, play once
            audioSource.clip = clip;
            audioSource.Play();
            Debug.Log("Audio playback started");
        }
    }

    public void TogglePause()
    {
        if (audioSource.isPlaying)
        {
            audioSource.Pause();
        }
        else
        {
            audioSource.UnPause();
        }
    }

    public void StopPlayback()
    {
        audioSource.Stop();
    }

    public bool IsPlaying()
    {
        return audioSource.isPlaying;
    }

    public float GetPlaybackProgress()
    {
        if (audioSource.clip == null || Mathf.Approximately(audioSource.clip.length, 0f))
        {
            return 0f;
        }
        return Mathf.Clamp01(audioSource.time / audioSource.clip.length);
    }
}

BaiduTTSController.cs

using System.Collections;
using UnityEngine;
using UnityEngine.Networking;

public class BaiduTTSController : MonoBehaviour
{
    // Real credentials obtained from the Baidu AI Cloud console
    private const string CLIENT_ID = "<client ID>";
    private const string CLIENT_SECRET = "<client secret>";
    private string accessToken = "";

    // Fetch the access token asynchronously
    public IEnumerator GetAccessToken()
    {
        string url = $"<Baidu AI Cloud URL>client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}";

        using (UnityWebRequest www = UnityWebRequest.Get(url))
        {
            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Token request failed: {www.error}");
                yield break;
            }

            TokenResponse response = JsonUtility.FromJson<TokenResponse>(www.downloadHandler.text);
            if (response == null || string.IsNullOrEmpty(response.access_token))
            {
                Debug.LogError("Token response did not contain an access_token");
                yield break;
            }
            accessToken = response.access_token;
        }
    }

    // Create a speech synthesis task.
    // The callback receives the task ID and is only invoked on success
    public IEnumerator CreateTTSTask(string text, System.Action<string> callback)
    {
        string apiUrl = $"<API endpoint URL>access_token={accessToken}";

        CreateTaskRequest requestData = new CreateTaskRequest
        {
            text = text,
            format = "mp3-16k",
            voice = 0,
            lang = "zh",
            speed = 5,
            pitch = 5,
            volume = 5
        };

        using (UnityWebRequest www = new UnityWebRequest(apiUrl, "POST"))
        {
            byte[] bodyRaw = System.Text.Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestData));
            www.uploadHandler = new UploadHandlerRaw(bodyRaw);
            www.downloadHandler = new DownloadHandlerBuffer();
            www.SetRequestHeader("Content-Type", "application/json");

            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Task creation failed: {www.error}");
                yield break;
            }

            TaskCreateResponse response = JsonUtility.FromJson<TaskCreateResponse>(www.downloadHandler.text);
            callback?.Invoke(response.task_id);
        }
    }

    // Query the task status.
    // The callback receives the audio URL once the task has succeeded
    public IEnumerator QueryTaskStatus(string taskId, System.Action<string> callback)
    {
        string apiUrl = $"<API endpoint URL>access_token={accessToken}";

        QueryTaskRequest requestData = new QueryTaskRequest
        {
            task_ids = new string[] { taskId }
        };

        using (UnityWebRequest www = new UnityWebRequest(apiUrl, "POST"))
        {
            byte[] bodyRaw = System.Text.Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestData));
            www.uploadHandler = new UploadHandlerRaw(bodyRaw);
            www.downloadHandler = new DownloadHandlerBuffer();
            www.SetRequestHeader("Content-Type", "application/json");

            yield return www.SendWebRequest();

            if (www.result != UnityWebRequest.Result.Success)
            {
                Debug.LogError($"Status query failed: {www.error}");
                yield break;
            }

            TaskQueryResponse response = JsonUtility.FromJson<TaskQueryResponse>(www.downloadHandler.text);
            bool succeeded = response != null && response.tasks_info != null &&
                             response.tasks_info.Length > 0 &&
                             response.tasks_info[0].task_status == "Success";

            // Not finished yet (or failed): the caller polls again
            if (!succeeded) yield break;

            string audioUrl = response.tasks_info[0].task_result.speech_url;
            Debug.Log($"Audio URL received: {audioUrl}");

            // Verify the URL with a HEAD request, then invoke the callback exactly once
            using (UnityWebRequest headRequest = UnityWebRequest.Head(audioUrl))
            {
                yield return headRequest.SendWebRequest();
                if (headRequest.result == UnityWebRequest.Result.Success)
                {
                    callback?.Invoke(audioUrl);
                }
                else
                {
                    Debug.LogError($"Audio URL is not reachable: {headRequest.error}");
                }
            }
        }
    }

    // Data models
    [System.Serializable]
    private class TokenResponse
    {
        public string access_token;
    }

    [System.Serializable]
    private class CreateTaskRequest
    {
        public string text;
        public string format;
        public int voice;
        public string lang;
        public int speed;
        public int pitch;
        public int volume;
    }

    [System.Serializable]
    private class TaskCreateResponse
    {
        public string task_id;
    }

    [System.Serializable]
    private class QueryTaskRequest
    {
        public string[] task_ids;
    }

    [System.Serializable]
    private class TaskQueryResponse
    {
        public TaskInfo[] tasks_info;
    }

    [System.Serializable]
    private class TaskInfo
    {
        public string task_status;
        public TaskResult task_result;
    }

    [System.Serializable]
    private class TaskResult
    {
        public string speech_url;
    }
}

UITTSController.cs

using System.Collections;
using TMPro;
using UnityEngine;
using UnityEngine.UI;

public class UITTSController : MonoBehaviour
{
    public static UITTSController Instance { get; private set; }
    private void Awake()
    {
        if (Instance != null && Instance != this)
        {
            Destroy(gameObject);
            return;
        }
        Instance = this;
    }

    [Header("UI Components")]
    public Text inputField;
    public Button convertButton;
    public Text statusText;
    public Slider progressSlider;

    private BaiduTTSController ttsController;
    private AudioManager audioManager;

    void Start()
    {
        ttsController = gameObject.AddComponent<BaiduTTSController>();
        audioManager = gameObject.AddComponent<AudioManager>();

        // Check network reachability
        if (Application.internetReachability != NetworkReachability.NotReachable)
        {
            StartCoroutine(InitializeSystem());
            // Bind the button event
            convertButton.onClick.AddListener(OnConvertClick);
        }
        else
        {
            statusText.text = "Network connection failed";
        }
    }

    IEnumerator InitializeSystem()
    {
        statusText.text = "Initializing";
        yield return ttsController.GetAccessToken();
        statusText.text = "Text-to-speech ready";
        convertButton.interactable = true;
    }

    public void OnConvertClick()
    {
        if (string.IsNullOrEmpty(inputField.text)) return;

        StartCoroutine(ConvertProcess());
    }

    IEnumerator ConvertProcess()
    {
        convertButton.interactable = false;
        statusText.text = "Generating speech";

        // Create the task; the callback only runs if creation succeeds
        bool taskCreated = false;
        yield return ttsController.CreateTTSTask(inputField.text, (taskId) => {
            taskCreated = true;
            StartCoroutine(PollTaskStatus(taskId));
        });

        if (!taskCreated)
        {
            // Without this the button would stay disabled after a failed request
            statusText.text = "Failed to create speech task";
            convertButton.interactable = true;
        }
    }

    IEnumerator PollTaskStatus(string taskId)
    {
        float timeout = 30f;
        float pollInterval = 1f;
        bool isCompleted = false;

        while (timeout > 0 && !isCompleted)
        {
            statusText.text = $"Processing... {timeout}s left";

            // Wait for a single status query to finish
            yield return StartCoroutine(ttsController.QueryTaskStatus(taskId, (audioUrl) => {
                StartCoroutine(PlayAudio(audioUrl));
                isCompleted = true;
            }));

            if (isCompleted) break;

            yield return new WaitForSeconds(pollInterval);
            timeout -= pollInterval;
        }

        if (!isCompleted)
        {
            statusText.text = "Request timed out";
            Debug.LogError("Status polling timed out");
        }
        convertButton.interactable = true;
    }

    IEnumerator PlayAudio(string url)
    {
        statusText.text = "Loading audio...";
        yield return audioManager.DownloadAndPlayAudio(url);

        // DownloadAndPlayAudio returns without playing if the download failed
        statusText.text = audioManager.IsPlaying() ? "Playing" : "Playback failed";
        convertButton.interactable = true;

        // Update the progress bar
        while (audioManager.IsPlaying())
        {
            progressSlider.value = audioManager.GetPlaybackProgress();
            yield return null;
        }
    }
}

Vosk speech recognition module

VoskSpeechRecognizer.cs

using UnityEngine;
using UnityEngine.UI;
using System.Threading;
using System.Collections.Concurrent;
using System.Collections;
using Vosk;
using Newtonsoft.Json.Linq;
using TMPro;

public class VoskSpeechRecognizer : MonoBehaviour
{
    public Button toggleButton;
    public Text resultText;
    public TMP_InputField outputText;
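    // Model folder, relative to StreamingAssets (replace with your own folder).
    // Application.streamingAssetsPath must not be read in a field initializer,
    // so the full path is built in InitializeModel()
    public string modelPath = "vosk-model-small-cn-0.22";

    private VoskRecognizer recognizer;
    private AudioClip recordingClip;
    private volatile bool isRecording;   // also read by the recognition thread
    private Thread recognitionThread;
    private int sampleRate = 16000;

    // State shown in the UI; only applied to UI objects on the main thread
    private string displayText = "Speech recognition ready";
    private string threadStatus = "";
    private string partialResult = "";
    private string finalResult = "";

    private ConcurrentQueue<float[]> audioDataQueue = new ConcurrentQueue<float[]>();
    private ConcurrentQueue<string> statusQueue = new ConcurrentQueue<string>();
    private int lastPosition = 0;
    private bool modelInitialized = false;

    void Start()
    {
        displayText = "Initializing...";
        resultText.text = displayText;
        // Ask for microphone permission first; that coroutine then starts InitializeModel()
        StartCoroutine(RequestMicrophonePermission());
    }

    IEnumerator InitializeModel()
    {
        try
        {
            // Initialize Vosk. On Android, StreamingAssets is packed inside the APK,
            // so the model has to be copied to a readable folder first
            Vosk.Vosk.SetLogLevel(0);
            string fullModelPath = System.IO.Path.Combine(Application.streamingAssetsPath, modelPath);
            Model model = new Model(fullModelPath);
            recognizer = new VoskRecognizer(model, sampleRate);
            modelInitialized = true;

            displayText = "Ready, click the button to start recognition";
            toggleButton.interactable = true;
            toggleButton.onClick.AddListener(ToggleRecording);
        }
        catch (System.Exception e)
        {
            displayText = $"Initialization failed: {e.Message}";
            Debug.LogError(e);
        }
        yield return null;
    }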

    void Update()
    {
        // 1. Collect audio data on the main thread
        if (isRecording && recordingClip != null && Microphone.IsRecording(null))
        {
            int currentPosition = Microphone.GetPosition(null);

            if (currentPosition < lastPosition)
            {
                // The circular buffer wrapped around: read the tail first
                // so the samples before the wrap are not lost
                statusQueue.Enqueue("Audio buffer wrapped around");
                int tailCount = recordingClip.samples - lastPosition;
                if (tailCount > 0)
                {
                    float[] tail = new float[tailCount];
                    recordingClip.GetData(tail, lastPosition);
                    audioDataQueue.Enqueue(tail);
                }
                lastPosition = 0;
            }

            if (currentPosition > lastPosition)
            {
                float[] samples = new float[currentPosition - lastPosition];
                recordingClip.GetData(samples, lastPosition);
                audioDataQueue.Enqueue(samples);
                lastPosition = currentPosition;
            }
        }

        // 2. Handle status updates from the background thread
        while (statusQueue.TryDequeue(out string status))
        {
            threadStatus = status;
            Debug.Log(status);
        }

        // 3. Update the display text (priority: final result > partial result > thread status > default text)
        if (!string.IsNullOrEmpty(finalResult))
        {
            displayText = $"Final result: {finalResult}";
            outputText.text = finalResult;
        }
        else if (!string.IsNullOrEmpty(partialResult))
        {
            displayText = $"Live recognition: {partialResult}";
            outputText.text = partialResult;
        }
        else if (!string.IsNullOrEmpty(threadStatus))
        {
            displayText = threadStatus;
        }

        // 4. Update the UI
        resultText.text = displayText;
    }

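    void ToggleRecording()
    {
        if (!modelInitialized)
        {
            displayText = "Model is not initialized yet";
            return;
        }

        isRecording = !isRecording;
        toggleButton.GetComponentInChildren<Text>().text = isRecording ? "Stop" : "Start";

        if (isRecording)
        {
            // Start recording
            try
            {
                displayText = "Starting microphone...";

                // Reset state
                partialResult = "";
                finalResult = "";
                threadStatus = "";
                audioDataQueue = new ConcurrentQueue<float[]>();
                lastPosition = 0;

                // 10-second looping buffer on the default microphone
                recordingClip = Microphone.Start(null, true, 10, sampleRate);

                if (recordingClip == null)
                {
                    displayText = "Could not create the recording clip";
                    isRecording = false;
                    toggleButton.GetComponentInChildren<Text>().text = "Start";
                    return;
                }

                statusQueue.Enqueue("Microphone started");

                recognitionThread = new Thread(ProcessAudio);
                recognitionThread.IsBackground = true;
                recognitionThread.Start();
            }
            catch (System.Exception e)
            {
                displayText = $"Failed to start recording: {e.Message}";
                isRecording = false;
                toggleButton.GetComponentInChildren<Text>().text = "Start";
                Debug.LogError(e);
            }
        }
        else
        {
            // Stop recording
            displayText = "Stopping recording...";
            Microphone.End(null);

            // isRecording is already false, so the worker loop exits on its own;
            // wait briefly for it instead of calling Thread.Abort()
            bool workerStopped = recognitionThread == null || recognitionThread.Join(500);

            if (workerStopped)
            {
                // Flush whatever the recognizer still holds for the last utterance
                string flushed = JObject.Parse(recognizer.FinalResult())["text"]?.ToString();
                if (!string.IsNullOrEmpty(flushed))
                {
                    finalResult = flushed;
                }
            }

            if (!string.IsNullOrEmpty(finalResult))
            {
                displayText = $"Final result: {finalResult}";
            }
            else
            {
                displayText = "Recognition finished, no result";
            }
        }
    }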

    // Runs on the background thread: drains the audio queue and feeds Vosk
    void ProcessAudio()
    {
        statusQueue.Enqueue("Audio processing thread started");

        while (isRecording)
        {
            if (audioDataQueue.TryDequeue(out float[] samples))
            {
                // Convert float samples to 16-bit little-endian PCM bytes
                byte[] audioBytes = new byte[samples.Length * 2];
                for (int i = 0; i < samples.Length; i++)
                {
                    // Clamp first so out-of-range values cannot overflow the short
                    float clamped = Mathf.Clamp(samples[i], -1f, 1f);
                    short sample = (short)(clamped * short.MaxValue);
                    audioBytes[i * 2] = (byte)(sample & 0xFF);
                    audioBytes[i * 2 + 1] = (byte)((sample >> 8) & 0xFF);
                }

                try
                {
                    // AcceptWaveform returns true when an utterance is complete
                    if (recognizer.AcceptWaveform(audioBytes, audioBytes.Length))
                    {
                        var result = recognizer.Result();
                        finalResult = JObject.Parse(result)["text"]?.ToString() ?? "No text result";
                        partialResult = "";
                        statusQueue.Enqueue($"Final result: {finalResult}");
                    }
                    else
                    {
                        var partial = recognizer.PartialResult();
                        partialResult = JObject.Parse(partial)["partial"]?.ToString() ?? "Failed to parse partial result";
                        statusQueue.Enqueue($"Partial result: {partialResult}");
                    }
                }
                catch (System.Exception e)
                {
                    statusQueue.Enqueue($"Recognition error: {e.Message}");
                    Debug.LogError(e);
                }
            }
            else
            {
                // Nothing queued yet; yield the CPU briefly
                Thread.Sleep(10);
            }
        }
    }

    void OnApplicationQuit()
    {
        // Let the worker loop finish instead of aborting the thread
        isRecording = false;
        if (recognitionThread != null && recognitionThread.IsAlive)
        {
            recognitionThread.Join(500);
        }

        if (recognizer != null)
        {
            recognizer.Dispose();
        }

        Debug.Log("Vosk resources released");
    }

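    // Microphone permission check for mobile platforms.
    // Note: on Android, Application.RequestUserAuthorization may not show a prompt;
    // UnityEngine.Android.Permission.RequestUserPermission is the Android-specific API
    IEnumerator RequestMicrophonePermission()
    {
        if (Application.platform == RuntimePlatform.Android ||
            Application.platform == RuntimePlatform.IPhonePlayer)
        {
            displayText = "Requesting microphone permission...";
            yield return Application.RequestUserAuthorization(UserAuthorization.Microphone);

            if (!Application.HasUserAuthorization(UserAuthorization.Microphone))
            {
                displayText = "Microphone permission is required";
                yield break;
            }
        }

        // Continue with initialization
        StartCoroutine(InitializeModel());
    }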
}
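
The microphone clip used above is a looping buffer, so the write position eventually wraps back to zero and the samples just before the wrap must be read separately. The sketch below shows that wrap-around logic on a plain float array, independent of Unity; `RingBufferReader` and `ReadNew` are hypothetical names used only for illustration.

```csharp
using System;

public static class RingBufferReader
{
    // Copies the samples written since lastPos (inclusive) up to currentPos
    // (exclusive) out of a circular buffer, handling wrap-around.
    // Equal positions are treated as "nothing new".
    public static float[] ReadNew(float[] ring, int lastPos, int currentPos)
    {
        if (currentPos == lastPos) return new float[0];

        int count = currentPos > lastPos
            ? currentPos - lastPos
            : ring.Length - lastPos + currentPos;

        float[] result = new float[count];
        // First segment: from lastPos towards the end of the buffer
        int firstPart = Math.Min(count, ring.Length - lastPos);
        Array.Copy(ring, lastPos, result, 0, firstPart);
        // Second segment: from the start of the buffer (empty if no wrap occurred)
        Array.Copy(ring, 0, result, firstPart, count - firstPart);
        return result;
    }
}
```

With an 8-sample buffer, reading from position 6 to position 2 returns the samples at indices 6, 7, 0, and 1 in that order. In the Unity script the same idea applies with `AudioClip.GetData` as the copy operation and `AudioClip.samples` as the buffer length.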

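Conclusion

With the steps above, we have configured a Vosk offline speech recognition environment in a Unity project. As a lightweight, accurate offline solution, Vosk gives Unity developers a capable tool for voice interaction. Its offline operation suits applications with strict data-privacy requirements, and its cross-platform support means a single implementation can be deployed to many kinds of devices.

Getting the environment right is only the first step. In real projects you still need to tune parameters and optimize performance for your scenario. A sensible approach is to start testing with a small model, improve recognition quality step by step, and only then decide whether a larger model is needed.

As voice interaction continues to develop, offline recognition solutions such as Vosk will play a role in more application scenarios and give users a more natural and more secure way to interact.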

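Note: environment configuration is only the starting point. Real speech recognition requires C# scripts that capture audio input and call the Vosk API; the scripts in section 6 are one sample implementation. Refer to the official Vosk documentation and sample code for more details.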