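An AiChat Module Built on Unity3D
Introduction
With the rapid progress of artificial intelligence, speech recognition has become an important part of modern applications. In Unity development, integrating speech recognition can greatly improve the user experience, especially in games, VR/AR applications, and interactive exhibits. Unlike traditional cloud-based solutions, offline speech recognition needs no network connection and offers better privacy and lower latency. Vosk, an open-source offline speech recognition library, is lightweight, accurate, and cross-platform, which makes it a good fit for Unity developers.
Vosk is based on deep neural networks combined with hidden Markov models (DNN-HMM). It supports more than 20 languages, including Chinese, English, French, and German, and provides pretrained models in several sizes for different scenarios. Because it runs fully offline, it is particularly well suited to applications with strict data-privacy requirements.
This article explains how to set up Vosk in a Unity project and then lists the scripts of an AI chat module built around it: a chat controller, a TTS (speech synthesis) module, and a Vosk speech recognizer. Only the speech recognition part runs offline; the chat and TTS scripts call online APIs.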
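1. Vosk Overview and Features
Vosk is an open-source offline speech recognition engine built on the Kaldi speech recognition toolkit. Its core features are:
- Fully offline: no network connection is needed, and all processing happens on the device, which protects data security and privacy
- Multilingual: supports more than 20 languages, including Chinese, English, French, and German, which helps with internationalized projects
- Lightweight and efficient: small models are only about 40-50 MB, memory usage is low, and Vosk runs smoothly even on embedded devices such as the Raspberry Pi
- High accuracy: built on deep learning, with recognition accuracy that can exceed 95% in quiet environments
- Cross-platform: supports Windows, Linux, macOS, Android, and iOS
- Real-time recognition: a streaming API supports real-time recognition, with latency kept under 200 ms
Compared with other speech recognition solutions, Vosk does well on resource usage and response time, which makes it a good candidate for real-time voice interaction in Unity projects.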
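2. Preparing the Environment
Before integrating Vosk, complete the following preparation:
2.1 System and Unity Requirements
- Unity version: 2019.4 or later with .NET 4.x support is enough for Vosk itself. The scripts in section 6 use UnityWebRequest.result, which Unity 2019.4 does not provide, so they need a Unity 2020 or later release
- Operating system: a Windows, macOS, or Linux development environment
- Storage: at least 1 GB of free space (for the model files)
2.2 Downloading the Vosk Files
- Vosk Unity plugin: get the Vosk C# bindings from GitHub (https://github.com/alphacep/vosk-unity-asr)
- Speech models: download the language model you need from the Vosk model page (https://alphacephei.com/vosk/models):
  - Small Chinese model (vosk-model-small-cn-0.22, about 42 MB): suited to mobile devices and embedded systems
  - Standard Chinese model (vosk-model-cn-0.22, about 1.3 GB): more accurate, suited to servers or high-performance devices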
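3. Unity Project Configuration
3.1 Create the Unity Project and Import Vosk
- Create a new Unity project or open an existing one
- Create a Plugins folder under Assets to hold the Vosk DLL files (such as libvosk.dll and vosk.dll)
- Import the downloaded Vosk Unity plugin files into the project
3.2 Import the Model Files
- Create a StreamingAssets folder under Assets (if it does not already exist)
- Put the downloaded model archive (for example vosk-model-small-cn-0.22.zip) into the StreamingAssets folder
- Note: whether the archive must be extracted depends on the script that loads it. The sample script shipped with the vosk-unity-asr plugin unpacks the zip at runtime, so the archive can stay as it is. The Vosk Model class itself expects the path of an extracted model directory, so the VoskSpeechRecognizer script in section 6, which passes a folder path directly, needs the extracted folder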
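Because the Vosk Model constructor takes a directory path, a zipped model generally has to be unpacked once before it can be loaded. The following is a minimal sketch of that step, not the plugin's own code. It assumes the zip is an ordinary file on disk (true on desktop platforms; on Android, StreamingAssets live inside the APK and would first have to be copied out) and that the archive contains a single top-level folder named after the zip. The class name ModelUnpacker is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of Vosk or the vosk-unity-asr plugin.
public static class ModelUnpacker
{
    // Extracts zipPath into targetRoot (only once) and returns the model directory.
    // Assumption: the archive holds one top-level folder named after the zip file.
    public static string EnsureExtracted(string zipPath, string targetRoot)
    {
        string modelName = Path.GetFileNameWithoutExtension(zipPath);
        string modelDir = Path.Combine(targetRoot, modelName);

        // Skip the slow extraction if an earlier run already unpacked the model.
        if (Directory.Exists(modelDir))
        {
            return modelDir;
        }

        Directory.CreateDirectory(targetRoot);
        ZipFile.ExtractToDirectory(zipPath, targetRoot);
        return modelDir;
    }
}
```

In a Unity script this could be called with Path.Combine(Application.streamingAssetsPath, "vosk-model-small-cn-0.22.zip") as the zip path and Application.persistentDataPath as the target root, and the returned path passed to new Model(...). An interrupted extraction would leave a partial folder behind, so a production version would probably want a completion marker as well.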
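3.3 Configure Player Settings
- Open "File > Build Settings > Player Settings"
- In the "Configuration" section, make sure the .NET setting is ".NET 4.x" or later (in Unity 2019.4 the option is "Api Compatibility Level"; Unity 2018 calls the equivalent option "Scripting Runtime Version")
- Apply the settings for the target platform:
  - Windows: no special configuration needed
  - Android: make sure the required permission (microphone access) is declared
  - iOS: a microphone usage description must also be configured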
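4. Model Selection and Optimization
4.1 Choosing a Model
Choosing a model that matches the application scenario is essential:
| Model type | Size | Typical use | Hardware requirements |
|---|---|---|---|
| Small model | 40-50 MB | Mobile devices, embedded systems | Low-end CPU, 256 MB+ RAM |
| Standard model | 1.3-1.5 GB | Desktop applications, servers | Multi-core CPU, 2 GB+ RAM |
| Professional model | 1.5 GB+ | Professional speech recognition | High-performance CPU, 8 GB+ RAM |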
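The table can be turned into a simple startup check. The sketch below picks the largest tier whose memory figure the device meets. The enum, class, and method names are hypothetical, and the thresholds are only the memory figures from the table above, not limits enforced by Vosk.

```csharp
// Hypothetical helper that mirrors the table above.
public enum VoskModelTier { Small, Standard, Professional }

public static class ModelTierSelector
{
    // systemMemoryMb: total device memory in megabytes.
    public static VoskModelTier Choose(int systemMemoryMb)
    {
        if (systemMemoryMb >= 8192) return VoskModelTier.Professional; // 8 GB+
        if (systemMemoryMb >= 2048) return VoskModelTier.Standard;     // 2 GB+
        return VoskModelTier.Small;                                    // everything else
    }
}
```

In Unity, SystemInfo.systemMemorySize reports memory in megabytes and could be passed in directly. Memory is only one factor; download size and load time may still argue for the small model on a capable device.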
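4.2 Performance Tips
- Audio format: make sure the audio input is 16 kHz, 16-bit, mono, which is the standard input format for Vosk models
- Preprocessing: apply audio filtering to reduce background noise
- Resource management: release the recognizer when speech recognition is not needed, to reduce memory usage
- Multithreading: run recognition on a separate thread so it does not block the main thread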
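The audio-format requirement above usually means converting Unity's float samples to 16-bit PCM before they are handed to the recognizer. A minimal sketch of that conversion follows; it assumes the samples are already mono at 16 kHz (for example from Microphone.Start with a 16000 Hz frequency), and the class name Pcm16Converter is hypothetical.

```csharp
using System;

// Hypothetical helper: float samples in [-1, 1] to 16-bit little-endian PCM.
public static class Pcm16Converter
{
    public static byte[] ToPcm16(float[] samples)
    {
        byte[] bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp first so out-of-range input cannot overflow when cast to short.
            float clamped = Math.Max(-1f, Math.Min(1f, samples[i]));
            short value = (short)(clamped * short.MaxValue);
            bytes[i * 2] = (byte)(value & 0xFF);            // low byte
            bytes[i * 2 + 1] = (byte)((value >> 8) & 0xFF); // high byte
        }
        return bytes;
    }
}
```

The resulting buffer can be passed to VoskRecognizer.AcceptWaveform(bytes, bytes.Length). Stereo or differently sampled input would need downmixing and resampling first, which this sketch does not attempt.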
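5. Common Problems and Solutions
The following problems come up often when configuring and using Vosk:
Model fails to load
- Cause: the model path is wrong or the model files are incomplete
- Fix: check that the model is in the StreamingAssets folder and that its files are complete
Low recognition accuracy
- Cause: ambient noise or a mismatched audio format
- Fix: add an audio preprocessing step and make sure the input is 16 kHz, 16-bit, mono
Performance problems
- Cause: the model is too large or the hardware is underpowered
- Fix: choose a model size that matches the device, or add a loading screen to cover the model load time
Platform compatibility problems
- Cause: library files built for one platform do not work on another
- Fix: make sure to use Vosk library files compiled for the target platform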
6. Code for the Vosk-based AI Chat
Main controller module
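The controller script that follows reads a Server-Sent Events (SSE) style response and parses it line by line. The core of that parsing can be sketched in isolation as below. This assumes an OpenAI-style stream in which every chunk arrives as a "data: {...}" line and the stream ends with "data: [DONE]"; the article does not name the actual endpoint, and the class name SseParser is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: pulls the JSON payloads out of an SSE response body.
public static class SseParser
{
    // Returns the payload of each "data:" line, stopping at the [DONE] marker.
    public static List<string> ExtractDataPayloads(string rawResponse)
    {
        var payloads = new List<string>();
        foreach (string line in rawResponse.Split('\n'))
        {
            string trimmed = line.Trim(); // also removes a trailing '\r'

            // Only "data:" lines carry content; blank lines and "event:" fields are skipped.
            if (!trimmed.StartsWith("data:", StringComparison.Ordinal)) continue;

            string payload = trimmed.Substring(5).Trim();
            if (payload == "[DONE]") break; // end-of-stream marker
            if (payload.Length > 0) payloads.Add(payload);
        }
        return payloads;
    }
}
```

Each returned string can then be deserialized, for example with JsonUtility.FromJson. Unlike the full script, this sketch ignores lines that lack the "data:" prefix, so it would need loosening for servers that send bare JSON lines.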
using UnityEngine;
using UnityEngine.Networking;
using UnityEngine.UI;
using TMPro;
using System;
using System.Collections;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
public class AiChat : MonoBehaviour
{
[Header("UI Bindings")]
[SerializeField] public TMP_InputField inputField; // question input field
[SerializeField] public Button submitButton; // submit button
[SerializeField] Text answerText; // answer text box
[SerializeField] Toggle ttsToggle; // speech synthesis (TTS) switch
[Header("API Settings")]
[SerializeField] string apiUrl = "API endpoint URL";
[SerializeField] string apiKey = "API key";
[Header("Chat Settings")]
[SerializeField] public string askTag = "";
[SerializeField] public AIChatAssistant getRAGChat;
private bool isStreaming = false;
private Coroutine streamCoroutine;
private StringBuilder fullResponse = new StringBuilder();
void Start()
{
// Check network reachability
if (Application.internetReachability != NetworkReachability.NotReachable)
{
    // Originally the button was bound directly to the chat request:
    // submitButton.onClick.AddListener(OnSubmitClicked);
    // The click is now routed to the RAG chat module instead
    submitButton.onClick.AddListener(OnSubmitClick);
    // Then trigger the RAG chat component directly
    submitButton.onClick.AddListener(() => StartCoroutine(getRAGChat.SubmitQuestion()));
    // Initial state
    submitButton.interactable = true;
    answerText.text = "Waiting for a question...";
    // Speech synthesis is switched on by default
    ttsToggle.isOn = true;
}
else
{
    answerText.text = "Please check your network connection!";
}
}
// Click hook: forwards the input text to the RAG chat module
void OnSubmitClick()
{
getRAGChat.questionInput = inputField.text;
}
public void OnSubmitClicked()
{
if (string.IsNullOrWhiteSpace(inputField.text))
{
    answerText.text = "<color=#FF0000>Please enter a valid question!</color>";
    return;
}
// If a request is already in progress, stop it first
if (isStreaming && streamCoroutine != null)
{
    StopCoroutine(streamCoroutine);
}
// Reset state
isStreaming = true;
fullResponse.Clear();
answerText.text = "Thinking...";
submitButton.interactable = false;
// Start the streaming request
streamCoroutine = StartCoroutine(StreamChatCompletion(inputField.text));
}
IEnumerator StreamChatCompletion(string userMessage)
{
// Build the request payload
var requestData = new RequestData
{
messages = new List<Message>
{
new Message { role = "user", content = userMessage + "," + askTag }
},
stream = true
};
string jsonPayload = JsonUtility.ToJson(requestData);
byte[] payloadBytes = Encoding.UTF8.GetBytes(jsonPayload);
// Create the web request
using (UnityWebRequest request = new UnityWebRequest(apiUrl, "POST"))
{
request.uploadHandler = new UploadHandlerRaw(payloadBytes);
request.downloadHandler = new DownloadHandlerBuffer();
request.SetRequestHeader("Content-Type", "application/json");
request.SetRequestHeader("Authorization", "Bearer " + apiKey);
request.disposeDownloadHandlerOnDispose = true;
// Send the request. DownloadHandlerBuffer collects the whole body, so the
// stream is parsed only after the response has completed
yield return request.SendWebRequest();
if (request.result == UnityWebRequest.Result.ConnectionError ||
    request.result == UnityWebRequest.Result.ProtocolError)
{
    Debug.LogError($"API Error: {request.error}");
    Debug.LogError($"Response Code: {request.responseCode}");
    Debug.LogError($"Response: {request.downloadHandler.text}");
    answerText.text = $"<color=#FF0000>Request failed: {request.error}</color>";
    isStreaming = false;
    submitButton.interactable = true;
    yield break;
}
// Get the full response
string rawResponse = request.downloadHandler.text;
Debug.Log($"Raw API Response: {rawResponse}");
// Handle an empty response
if (string.IsNullOrEmpty(rawResponse))
{
    answerText.text = "<color=#FFA500>The server returned an empty response</color>";
    // Reset state so the submit button does not stay disabled
    isStreaming = false;
    submitButton.interactable = true;
    yield break;
}
// Split the response into lines
string[] responseLines = rawResponse.Split('\n');
bool receivedValidResponse = false;
foreach (string line in responseLines)
{
    if (string.IsNullOrWhiteSpace(line)) continue;
    string trimmedLine = line.Trim();
    // Strip the SSE prefix (data: {...}) first
    string jsonStr = trimmedLine;
    if (trimmedLine.StartsWith("data:"))
    {
        jsonStr = trimmedLine.Substring(5).Trim();
    }
    // Check the end marker after stripping, because it arrives as "data: [DONE]"
    if (jsonStr == "[DONE]")
    {
        Debug.Log("Received [DONE] marker");
        break;
    }
    // Skip event fields
    if (jsonStr.StartsWith("event:")) continue;
try
{
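// Some servers double-encode each chunk as a quoted JSON string.
// Unwrap and unescape only in that case: unescaping plain JSON would
// corrupt escaped quotes and newlines inside "content"
string unescapedStr = jsonStr;
if (unescapedStr.Length >= 2 && unescapedStr.StartsWith("\"") && unescapedStr.EndsWith("\""))
{
    unescapedStr = unescapedStr.Substring(1, unescapedStr.Length - 2)
        .Replace("\\\"", "\"")
        .Replace("\\\\", "\\")
        .Replace("\\n", "\n")
        .Replace("\\r", "\r")
        .Replace("\\t", "\t");
}
// Debug output
Debug.Log($"Processing line: {unescapedStr}");
// Parse the JSON
var response = JsonUtility.FromJson<StreamResponse>(unescapedStr);
// Extract the content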
if (response.choices != null && response.choices.Length > 0)
{
if (response.choices[0].delta != null &&
!string.IsNullOrEmpty(response.choices[0].delta.content))
{
string content = response.choices[0].delta.content;
fullResponse.Append(content);
answerText.text = fullResponse.ToString();
receivedValidResponse = true;
}
}
}
catch (Exception e)
{
Debug.LogWarning($"Parse error: {e.Message}\nRaw data: {jsonStr}");
}
yield return null; // give the UI a frame to refresh
}
// Finish up
isStreaming = false;
submitButton.interactable = true;
// Text-to-speech: read the answer aloud only if one was received and auto TTS is on
if (receivedValidResponse && ttsToggle.isOn && UITTSController.Instance != null)
{
    UITTSController.Instance.OnConvertClick();
}
if (!receivedValidResponse)
{
    // Try to extract an error message
    if (rawResponse.Contains("error"))
    {
        try
        {
            var errorResponse = JsonUtility.FromJson<ErrorResponse>(rawResponse);
            answerText.text = $"<color=#FF0000>API error: {errorResponse.error.message}</color>";
        }
        catch
        {
            answerText.text = $"<color=#FF0000>Unknown API error: {rawResponse}</color>";
        }
    }
    else if (fullResponse.Length > 0)
    {
        answerText.text = fullResponse.ToString();
    }
    else
    {
        answerText.text = $"<color=#FFA500>No valid response received. Raw data:\n{rawResponse}</color>";
    }
}
}
}
// Request data structures
[System.Serializable]
private class RequestData
{
public List<Message> messages;
public bool stream;
}
[System.Serializable]
private class Message
{
public string role;
public string content;
}
// Response data structures
[System.Serializable]
private class StreamResponse
{
public string id;
public string @object;
public int created;
public string model;
public Choice[] choices;
}
[System.Serializable]
private class Choice
{
public int index;
public Delta delta;
public object logprobs;
public string finish_reason;
}
[System.Serializable]
private class Delta
{
public string content;
}
// Error response structures
[System.Serializable]
private class ErrorResponse
{
public ErrorInfo error;
}
[System.Serializable]
private class ErrorInfo
{
public string message;
public string type;
public string code;
}
}
TTS speech synthesis module
AudioManager.cs
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;
public class AudioManager : MonoBehaviour
{
private AudioSource audioSource;
void Awake()
{
audioSource = gameObject.AddComponent<AudioSource>();
}
public IEnumerator DownloadAndPlayAudio(string url)
{
Debug.Log($"Downloading audio: {url}");
// Explicitly request MPEG (MP3) decoding
using (UnityWebRequest www = UnityWebRequestMultimedia.GetAudioClip(url, AudioType.MPEG))
{
    ((DownloadHandlerAudioClip)www.downloadHandler).streamAudio = true;
    ((DownloadHandlerAudioClip)www.downloadHandler).compressed = false;
    // Timeout in seconds
    www.timeout = 15;
    var operation = www.SendWebRequest();
    while (!operation.isDone)
    {
        Debug.Log($"Download progress: {www.downloadProgress:P}");
        yield return null;
    }
    if (www.result != UnityWebRequest.Result.Success)
    {
        Debug.LogError($"Download failed: {www.error} (HTTP {www.responseCode})");
        yield break;
    }
    Debug.Log($"Audio download finished, size: {www.downloadedBytes} bytes");
    AudioClip clip = DownloadHandlerAudioClip.GetContent(www);
    if (clip == null || clip.length == 0)
    {
        Debug.LogError("Failed to decode the audio");
        yield break;
    }
    // Play the decoded clip
    audioSource.clip = clip;
    audioSource.Play();
    Debug.Log("Audio playback started");
}
}
public void TogglePause()
{
if (audioSource.isPlaying)
{
audioSource.Pause();
}
else
{
audioSource.UnPause();
}
}
public void StopPlayback()
{
audioSource.Stop();
}
public bool IsPlaying()
{
return audioSource.isPlaying;
}
public float GetPlaybackProgress()
{
if (audioSource.clip == null || Mathf.Approximately(audioSource.clip.length, 0f))
{
return 0f;
}
return Mathf.Clamp01(audioSource.time / audioSource.clip.length);
}
}
BaiduTTSController.cs
using System.Collections;
using UnityEngine;
using UnityEngine.Networking;
public class BaiduTTSController : MonoBehaviour
{
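// Credentials obtained from the Baidu AI Cloud console
private const string CLIENT_ID = "ID";
private const string CLIENT_SECRET = "secret key";
private string accessToken = "";
// Fetch the access token asynchronously
public IEnumerator GetAccessToken()
{
    // Placeholder: put the Baidu AI Cloud token URL and its query prefix before client_id
    string url = $"Baidu AI Cloud token URL client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}";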
using (UnityWebRequest www = UnityWebRequest.Get(url))
{
yield return www.SendWebRequest();
if (www.result != UnityWebRequest.Result.Success)
{
Debug.LogError($"Token request failed: {www.error}");
yield break;
}
TokenResponse response = JsonUtility.FromJson<TokenResponse>(www.downloadHandler.text);
accessToken = response.access_token;
}
}
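// Create a speech synthesis task
public IEnumerator CreateTTSTask(string text, System.Action<string> callback)
{
    // Placeholder: put the task creation API URL before access_token
    string apiUrl = $"API endpoint URL access_token={accessToken}";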
CreateTaskRequest requestData = new CreateTaskRequest
{
text = text,
format = "mp3-16k",
voice = 0,
lang = "zh",
speed = 5,
pitch = 5,
volume = 5
};
using (UnityWebRequest www = new UnityWebRequest(apiUrl, "POST"))
{
byte[] bodyRaw = System.Text.Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestData));
www.uploadHandler = new UploadHandlerRaw(bodyRaw);
www.downloadHandler = new DownloadHandlerBuffer();
www.SetRequestHeader("Content-Type", "application/json");
yield return www.SendWebRequest();
if (www.result != UnityWebRequest.Result.Success)
{
Debug.LogError($"Task creation failed: {www.error}");
yield break;
}
TaskCreateResponse response = JsonUtility.FromJson<TaskCreateResponse>(www.downloadHandler.text);
callback?.Invoke(response.task_id);
}
}
// Query the task status
public IEnumerator QueryTaskStatus(string taskId, System.Action<string> callback)
{
    // Placeholder: put the task query API URL before access_token
    string apiUrl = $"API endpoint URL access_token={accessToken}";
QueryTaskRequest requestData = new QueryTaskRequest
{
task_ids = new string[] { taskId }
};
using (UnityWebRequest www = new UnityWebRequest(apiUrl, "POST"))
{
byte[] bodyRaw = System.Text.Encoding.UTF8.GetBytes(JsonUtility.ToJson(requestData));
www.uploadHandler = new UploadHandlerRaw(bodyRaw);
www.downloadHandler = new DownloadHandlerBuffer();
www.SetRequestHeader("Content-Type", "application/json");
yield return www.SendWebRequest();
if (www.result != UnityWebRequest.Result.Success)
{
Debug.LogError($"Status query failed: {www.error}");
yield break;
}
TaskQueryResponse response = JsonUtility.FromJson<TaskQueryResponse>(www.downloadHandler.text);
if (response.tasks_info != null && response.tasks_info.Length > 0 &&
    response.tasks_info[0].task_status == "Success")
{
    string audioUrl = response.tasks_info[0].task_result.speech_url;
    Debug.Log($"Audio URL received: {audioUrl}");
    // Verify the URL with a HEAD request; the callback fires once,
    // and only after this check has passed
    using (UnityWebRequest headRequest = UnityWebRequest.Head(audioUrl))
    {
        yield return headRequest.SendWebRequest();
        if (headRequest.result == UnityWebRequest.Result.Success)
        {
            callback?.Invoke(audioUrl);
        }
        else
        {
            Debug.LogError($"Audio URL is not reachable: {headRequest.error}");
        }
    }
}
}
}
// Data models
[System.Serializable]
private class TokenResponse
{
public string access_token;
}
[System.Serializable]
private class CreateTaskRequest
{
public string text;
public string format;
public int voice;
public string lang;
public int speed;
public int pitch;
public int volume;
}
[System.Serializable]
private class TaskCreateResponse
{
public string task_id;
}
[System.Serializable]
private class QueryTaskRequest
{
public string[] task_ids;
}
[System.Serializable]
private class TaskQueryResponse
{
public TaskInfo[] tasks_info;
}
[System.Serializable]
private class TaskInfo
{
public string task_status;
public TaskResult task_result;
}
[System.Serializable]
private class TaskResult
{
public string speech_url;
}
}
UITTSController.cs
using System.Collections;
using TMPro;
using UnityEngine;
using UnityEngine.UI;
public class UITTSController : MonoBehaviour
{
public static UITTSController Instance { get; private set; }
private void Awake()
{
if (Instance != null && Instance != this)
{
Destroy(gameObject);
return;
}
Instance = this;
}
[Header("UI Components")]
public Text inputField;
public Button convertButton;
public Text statusText;
public Slider progressSlider;
private BaiduTTSController ttsController;
private AudioManager audioManager;
void Start()
{
ttsController = gameObject.AddComponent<BaiduTTSController>();
audioManager = gameObject.AddComponent<AudioManager>();
// Check network reachability
if (Application.internetReachability != NetworkReachability.NotReachable)
{
    StartCoroutine(InitializeSystem());
    // Bind the button event
    convertButton.onClick.AddListener(OnConvertClick);
}
else
{
    statusText.text = "Network connection failed";
}
}
IEnumerator InitializeSystem()
{
statusText.text = "Initializing";
yield return ttsController.GetAccessToken();
statusText.text = "Speech playback ready";
convertButton.interactable = true;
}
public void OnConvertClick()
{
if (string.IsNullOrEmpty(inputField.text)) return;
StartCoroutine(ConvertProcess());
}
IEnumerator ConvertProcess()
{
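convertButton.interactable = false;
statusText.text = "Generating speech";
// Create the task; the callback only fires when creation succeeds
bool taskCreated = false;
yield return ttsController.CreateTTSTask(inputField.text, (taskId) => {
    taskCreated = true;
    StartCoroutine(PollTaskStatus(taskId));
});
// Re-enable the button if task creation failed, so the UI does not get stuck
if (!taskCreated)
{
    statusText.text = "Failed to create the speech task";
    convertButton.interactable = true;
}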
}
IEnumerator PollTaskStatus(string taskId)
{
float timeout = 30f;
float pollInterval = 1f;
bool isCompleted = false;
while (timeout > 0 && !isCompleted)
{
statusText.text = $"Processing... {timeout}s left";
// Wait for a single status query to finish
yield return StartCoroutine(ttsController.QueryTaskStatus(taskId, (audioUrl) => {
StartCoroutine(PlayAudio(audioUrl));
isCompleted = true;
}));
if (isCompleted) break;
yield return new WaitForSeconds(pollInterval);
timeout -= pollInterval;
}
if (!isCompleted)
{
statusText.text = "Request timed out";
Debug.LogError($"Status polling timed out for task {taskId}");
}
convertButton.interactable = true;
}
IEnumerator PlayAudio(string url)
{
statusText.text = "Loading audio...";
yield return audioManager.DownloadAndPlayAudio(url);
statusText.text = "Playing";
convertButton.interactable = true;
// Update the progress bar
while (audioManager.IsPlaying())
{
progressSlider.value = audioManager.GetPlaybackProgress();
yield return null;
}
}
}
Vosk speech recognition module
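The recognizer script in this module polls the microphone write position against a looping recording clip, so each frame it has to work out which samples are new, including the frame in which the buffer wraps around. The index arithmetic can be sketched on its own as below. The names are hypothetical, and the sketch assumes polling happens more often than one full buffer length, so equal positions mean that nothing new has arrived.

```csharp
using System.Collections.Generic;

// Hypothetical helper: which parts of a circular buffer were written since the last poll.
public static class RingBufferReader
{
    // Returns (offset, count) segments covering the samples written between
    // lastPosition and currentPosition in a buffer of bufferLength samples.
    public static List<(int offset, int count)> NewSegments(
        int lastPosition, int currentPosition, int bufferLength)
    {
        var segments = new List<(int offset, int count)>();
        if (currentPosition == lastPosition) return segments; // nothing new

        if (currentPosition > lastPosition)
        {
            segments.Add((lastPosition, currentPosition - lastPosition));
        }
        else
        {
            // The write head wrapped: read the tail first, then the start of the buffer.
            if (lastPosition < bufferLength)
                segments.Add((lastPosition, bufferLength - lastPosition));
            if (currentPosition > 0)
                segments.Add((0, currentPosition));
        }
        return segments;
    }
}
```

In a Unity Update loop, each segment would correspond to one AudioClip.GetData call with a float array of count samples at the given offset, with bufferLength taken from the clip's samples property.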
VoskSpeechRecognizer.cs
using UnityEngine;
using UnityEngine.UI;
using System.Threading;
using System.Collections.Concurrent;
using System.Collections;
using Vosk;
using Newtonsoft.Json.Linq;
using TMPro;
public class VoskSpeechRecognizer : MonoBehaviour
{
public Button toggleButton;
public Text resultText;
public TMP_InputField outputText;
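// Extracted model folder, relative to StreamingAssets; replace with your own model location
public string modelFolder = "Assets/vosk-model-small-cn-0.22";
// Resolved on first use: Unity does not allow Application.streamingAssetsPath in a field initializer
private string modelPath => System.IO.Path.Combine(Application.streamingAssetsPath, modelFolder);
private VoskRecognizer recognizer;
private AudioClip recordingClip;
private volatile bool isRecording; // also read by the recognition thread
private Thread recognitionThread;
private int sampleRate = 16000;
// Variables shared with the main thread
private string displayText = "Speech recognition ready";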
private string threadStatus = "";
private string partialResult = "";
private string finalResult = "";
private ConcurrentQueue<float[]> audioDataQueue = new ConcurrentQueue<float[]>();
private ConcurrentQueue<string> statusQueue = new ConcurrentQueue<string>();
private int lastPosition = 0;
private bool modelInitialized = false;
void Start()
{
displayText = "Initializing...";
resultText.text = displayText;
// Ask for microphone permission first; that coroutine then loads the model
StartCoroutine(RequestMicrophonePermission());
}
IEnumerator InitializeModel()
{
try
{
    // Initialize the Vosk environment
    Vosk.Vosk.SetLogLevel(0);
    Model model = new Model(modelPath);
    recognizer = new VoskRecognizer(model, sampleRate);
    modelInitialized = true;
    displayText = "Ready. Click the button to start recognition";
    toggleButton.interactable = true;
    toggleButton.onClick.AddListener(ToggleRecording);
}
catch (System.Exception e)
{
    displayText = $"Initialization failed: {e.Message}";
    Debug.LogError(e);
}
yield return null;
}
void Update()
{
// 1. Collect audio data on the main thread
if (isRecording && Microphone.IsRecording(null))
{
    int currentPosition = Microphone.GetPosition(null);
    if (recordingClip == null)
    {
        statusQueue.Enqueue("Error: the recording clip is null");
    }
    else
    {
        if (currentPosition < lastPosition)
        {
            // The looping buffer wrapped around: queue the tail before restarting at 0
            statusQueue.Enqueue("Audio buffer wrap-around detected");
            int tailCount = recordingClip.samples - lastPosition;
            if (tailCount > 0)
            {
                float[] tail = new float[tailCount];
                recordingClip.GetData(tail, lastPosition);
                audioDataQueue.Enqueue(tail);
            }
            lastPosition = 0;
        }
        if (currentPosition > lastPosition)
        {
            int sampleCount = currentPosition - lastPosition;
            float[] samples = new float[sampleCount];
            recordingClip.GetData(samples, lastPosition);
            audioDataQueue.Enqueue(samples);
            lastPosition = currentPosition;
        }
    }
}
// 2. Handle status updates from the background thread
while (statusQueue.TryDequeue(out string status))
{
    threadStatus = status;
    Debug.Log(status);
}
// 3. Update the display text (priority: final result > partial result > thread status > default text)
if (!string.IsNullOrEmpty(finalResult))
{
    displayText = $"Final result: {finalResult}";
    outputText.text = finalResult;
}
else if (!string.IsNullOrEmpty(partialResult))
{
    displayText = $"Live recognition: {partialResult}";
    outputText.text = partialResult;
}
else if (!string.IsNullOrEmpty(threadStatus))
{
    displayText = threadStatus;
}
// 4. Update the UI
resultText.text = displayText;
}
void ToggleRecording()
{
if (!modelInitialized)
{
    displayText = "The model has not finished initializing";
    return;
}
isRecording = !isRecording;
toggleButton.GetComponentInChildren<Text>().text = isRecording ? "Stop" : "Start";
if (isRecording)
{
    // Start recording
    try
    {
        displayText = "Starting the microphone...";
        // Reset state
        partialResult = "";
        finalResult = "";
        threadStatus = "";
        audioDataQueue = new ConcurrentQueue<float[]>();
        lastPosition = 0;
        recordingClip = Microphone.Start(null, true, 10, sampleRate);
        if (recordingClip == null)
        {
            displayText = "Could not create the recording clip";
            isRecording = false;
            return;
        }
        // The worker thread reports its own start through statusQueue
        recognitionThread = new Thread(ProcessAudio);
        recognitionThread.IsBackground = true;
        recognitionThread.Start();
    }
    catch (System.Exception e)
    {
        displayText = $"Failed to start recording: {e.Message}";
        isRecording = false;
        Debug.LogError(e);
    }
}
else
{
    // Stop recording
    displayText = "Stopping recording...";
    Microphone.End(null);
    isRecording = false;
    // ProcessAudio exits once isRecording is false, so wait for it rather than
    // calling Thread.Abort, which is unsafe and not supported on every runtime
    if (recognitionThread != null && recognitionThread.IsAlive)
    {
        recognitionThread.Join(500);
    }
    if (!string.IsNullOrEmpty(finalResult))
    {
        displayText = $"Final result: {finalResult}";
    }
    else
    {
        displayText = "Recognition finished, no result";
    }
}
}
void ProcessAudio()
{
statusQueue.Enqueue("Audio processing thread started");
while (isRecording)
{
    if (audioDataQueue.TryDequeue(out float[] samples))
    {
        // Convert the float samples to 16-bit little-endian PCM bytes
        byte[] audioBytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            // Clamp so out-of-range samples cannot overflow the cast
            float clamped = System.Math.Max(-1f, System.Math.Min(1f, samples[i]));
            short sample = (short)(clamped * short.MaxValue);
            audioBytes[i * 2] = (byte)(sample & 0xFF);
            audioBytes[i * 2 + 1] = (byte)(sample >> 8);
        }
        try
        {
            // Run speech recognition
            if (recognizer.AcceptWaveform(audioBytes, audioBytes.Length))
            {
                var result = recognizer.Result();
                finalResult = JObject.Parse(result)["text"]?.ToString() ?? "No text result";
                partialResult = "";
                statusQueue.Enqueue($"Final result: {finalResult}");
            }
            else
            {
                var partial = recognizer.PartialResult();
                partialResult = JObject.Parse(partial)["partial"]?.ToString() ?? "Failed to parse the partial result";
                statusQueue.Enqueue($"Partial result: {partialResult}");
            }
        }
        catch (System.Exception e)
        {
            statusQueue.Enqueue($"Recognition error: {e.Message}");
            Debug.LogError(e);
        }
    }
    else
    {
        Thread.Sleep(10);
    }
}
}
void OnApplicationQuit()
{
isRecording = false;
// Let the worker loop exit on its own instead of aborting the thread
if (recognitionThread != null && recognitionThread.IsAlive)
{
    recognitionThread.Join(500);
}
if (recognizer != null)
{
    recognizer.Dispose();
}
Debug.Log("Vosk resources released");
}
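// Microphone permission check for mobile platforms.
// On Android, Unity documents UnityEngine.Android.Permission for runtime permission
// requests, so verify that this call is sufficient for your target version
IEnumerator RequestMicrophonePermission()
{
    if (Application.platform == RuntimePlatform.Android ||
        Application.platform == RuntimePlatform.IPhonePlayer)
    {
        displayText = "Requesting microphone permission...";
        yield return Application.RequestUserAuthorization(UserAuthorization.Microphone);
        if (!Application.HasUserAuthorization(UserAuthorization.Microphone))
        {
            displayText = "Microphone permission is required";
            yield break;
        }
    }
    // Continue with initialization
    StartCoroutine(InitializeModel());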
}
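}
Conclusion
With the steps above, we have set up the Vosk offline speech recognition environment in a Unity project. As a lightweight, accurate offline solution, Vosk gives Unity developers a solid tool for building voice interaction. Running offline suits applications with strict data-privacy requirements, and cross-platform support lets a single implementation be deployed to many kinds of devices.
Getting the environment right is only the first step. In real projects the parameters and performance still have to be tuned for the specific scenario. A sensible approach is to start testing with a small model, improve the recognition results step by step, and then decide whether a larger model is needed.
As voice interaction technology keeps developing, offline recognition solutions such as Vosk will play a role in more and more scenarios and give users a more natural and more secure way to interact.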
Note: the scripts in section 6 are one way to handle audio input and call the Vosk API from C#. Refer to the official Vosk documentation and sample code for the details of the API.