How can I send audio to Nexmo Voice over a WebSocket?
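I am trying to implement Nexmo's Voice API with WebSockets in a .NET Core 2 Web API.
This API needs to:
1. Receive audio from a phone call through Nexmo
2. Run it through the Microsoft Cognitive speech-to-text API
3. Send the text to a bot
4. Run the bot's reply through Microsoft Cognitive text-to-speech
5. Send the speech back to Nexmo over the Voice API WebSocket
For now I have bypassed the bot step, since this is the first time I have tried connecting to a WebSocket.
When I use an echo method (sending the received audio straight back over the WebSocket), it works fine.
But when I try to send the speech produced by Microsoft text-to-speech, the call ends.
I can't find any documentation that implements anything beyond an echo.
The TextToSpeech and SpeechToText methods work as expected when used outside the WebSocket.
Here is the WebSocket handler with speech-to-text: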
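public static async Task Echo(HttpContext context, WebSocket webSocket)
{
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        // Drain the remaining fragments of the current message.
        // Note: each receive overwrites the buffer from the start.
        while (!result.EndOfMessage)
        {
            result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
        }
        // Await the task instead of blocking on .Result.
        var text = await SpeechToText.RecognizeSpeechFromBytesAsync(buffer);
        Console.WriteLine(text);
        // Wait for the next message; without this the loop never sees new audio or the close frame.
        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
}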
Here is the WebSocket handler with text-to-speech:
public static async Task Echo(HttpContext context, WebSocket webSocket)
{
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("Hello, this is a test", "en-US");
        await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, 0, ttsAudio.Length), WebSocketMessageType.Binary, true, CancellationToken.None);
        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(result.CloseStatus.Value, result.CloseStatusDescription, CancellationToken.None);
}
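Update, March 1, 2019
In reply to Sam Machin's comment:
I tried splitting the array into 640-byte chunks (I am using a 16 kHz sample rate), but Nexmo still hangs up the call and I still can't hear anything.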
public static async Task NexmoTextToSpeech(HttpContext context, WebSocket webSocket)
{
    var ttsAudio = await TextToSpeech.TransformTextToSpeechAsync("This is a test", "en-US");
    var buffer = new byte[1024 * 4];
    WebSocketReceiveResult result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    while (!result.CloseStatus.HasValue)
    {
        await SendSpeech(context, webSocket, ttsAudio);
        result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
    }
    await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing Socket", CancellationToken.None);
}

private static async Task SendSpeech(HttpContext context, WebSocket webSocket, byte[] ttsAudio)
{
    const int chunkSize = 640;
    var chunkCount = 1;
    var offset = 0;
    var lastFullChunck = ttsAudio.Length < (offset + chunkSize);
    try
    {
        while (!lastFullChunck)
        {
            await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, chunkSize), WebSocketMessageType.Binary, false, CancellationToken.None);
            offset = chunkSize * chunkCount;
            lastFullChunck = ttsAudio.Length < (offset + chunkSize);
            chunkCount++;
        }
        var lastMessageSize = ttsAudio.Length - offset;
        await webSocket.SendAsync(new ArraySegment<byte>(ttsAudio, offset, lastMessageSize), WebSocketMessageType.Binary, true, CancellationToken.None);
    }
    catch (Exception ex)
    {
    }
}
This is the exception that sometimes appears in the logs:
System.Net.WebSockets.WebSocketException (0x80004005): The remote
party closed the WebSocket connection without completing the close
handshake.
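Answer:
It looks like you are writing the whole audio clip to the WebSocket in one go. The Nexmo interface expects the audio as one 20 ms frame per message, which means you need to split the clip into 320-byte or 640-byte chunks (depending on whether you are using 8 kHz or 16 kHz) and write each chunk to the socket. If you try to write too large a message to the socket, it will close, as you are seeing.
For details, see https://developer.nexmo.com/voice/voice-api/guides/websockets#writing-audio-to-the-websocket.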
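The framing described above could be sketched roughly as follows. This is a minimal illustration, not the asker's or Nexmo's code: the class `NexmoAudioFrames` and its methods are hypothetical names, and it assumes the text-to-speech output is raw 16-bit mono linear PCM at the same sample rate the call was set up with (no WAV header). Padding the last frame with zeros and pausing 20 ms between sends are also assumptions that may need tuning. Each frame is sent with `endOfMessage` set to `true`, because passing `false` to `SendAsync` marks a chunk as a fragment of one larger WebSocket message rather than a message of its own.

```csharp
using System;
using System.Collections.Generic;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, named for illustration only.
public static class NexmoAudioFrames
{
    // 20 ms of 16-bit (2-byte) mono PCM: 320 bytes at 8 kHz, 640 bytes at 16 kHz.
    public static int FrameSizeBytes(int sampleRateHz) => sampleRateHz * 2 * 20 / 1000;

    // Splits the clip into fixed-size frames.
    // Assumption: a short final frame is padded with zeros (silence in linear PCM).
    public static List<byte[]> SplitIntoFrames(byte[] pcm, int sampleRateHz)
    {
        int frameSize = FrameSizeBytes(sampleRateHz);
        var frames = new List<byte[]>();
        for (int offset = 0; offset < pcm.Length; offset += frameSize)
        {
            var frame = new byte[frameSize]; // zero-initialized
            int count = Math.Min(frameSize, pcm.Length - offset);
            Buffer.BlockCopy(pcm, offset, frame, 0, count);
            frames.Add(frame);
        }
        return frames;
    }

    // Sends every frame as its own complete binary message.
    public static async Task SendFramesAsync(WebSocket webSocket, byte[] pcm, int sampleRateHz, CancellationToken ct)
    {
        foreach (var frame in SplitIntoFrames(pcm, sampleRateHz))
        {
            if (webSocket.State != WebSocketState.Open)
            {
                break; // the remote side closed the connection
            }
            await webSocket.SendAsync(new ArraySegment<byte>(frame), WebSocketMessageType.Binary, true, ct);
            // Assumption: rough real-time pacing; Task.Delay is not precise.
            await Task.Delay(20, ct);
        }
    }
}
```

In the question's handler, a call such as `await NexmoAudioFrames.SendFramesAsync(webSocket, ttsAudio, 16000, CancellationToken.None);` would take the place of `SendSpeech`. Keeping the splitting in a pure function means the frame sizes can be checked without an open socket.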
Tags: websocket, speech-recognition, text-to-speech, c#, nexmo. Source: https://codeday.me/bug/20191211/2106031.html