
C# - Finding pronunciation correctness


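I need to use the Microsoft speech SDK (System.Speech.Recognition) to assess the "quality" of a user's pronunciation. I am using the MS Speech Engine (US), so what I actually need is to find out how close the speaker's voice is to a "North American" accent.

One approach is to check how close the user's speech is to American English phonetic pronunciation. As mentioned in MSDN, this process seems to happen inside the speech SDK on its own, so I need to find a way to get at the result. Since we can also supply phonetics to the engine ourselves, I believe this is possible.

However, I do not have a clear idea of how to do it. So, what can I do to find out the quality of the user's pronunciation, that is, how close it is to North American English phonetic pronunciation? The user will only speak predefined sentences, for example "Hello World. I am here".

Please help.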

Update

Using the following code, I got some kind of "phonemes" (as described in MSDN):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Speech.Recognition;
using System.Speech.Synthesis;
using System.Windows.Forms;
using System.IO;

namespace US_Speech_Recognizer
{
    public class RecognizeSpeech
    {
        private SpeechRecognitionEngine sEngine; //Speech recognition engine
        private SpeechSynthesizer sSpeak; //Speech synthesizer
        string text3 = "";

        public RecognizeSpeech()
        {
            //Make the recognizer ready
            sEngine = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));


            //Load grammar
            Choices sentences = new Choices();
            sentences.Add(new string[] { "I am hungry" });

            GrammarBuilder gBuilder = new GrammarBuilder(sentences);

            Grammar g = new Grammar(gBuilder);

            sEngine.LoadGrammar(g);

            //Add a handler
            sEngine.SpeechRecognized +=new EventHandler<SpeechRecognizedEventArgs>(sEngine_SpeechRecognized);


            sSpeak = new SpeechSynthesizer();
            sSpeak.Rate = -2;



            //Computer speaks the words to get the phones
            Stream stream = new MemoryStream();
            sSpeak.SetOutputToWaveStream(stream);


            sSpeak.Speak("I was hungry");
            stream.Position = 0;
            sSpeak.SetOutputToNull();


            //Configure the recognizer to stream
            sEngine.SetInputToWaveStream(stream);

            sEngine.RecognizeAsync(RecognizeMode.Single);


        }


        //Handle a recognition result: collect the pronunciation of each recognized word
        private void sEngine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            string text = "";

            if (e.Result.Text == "I am hungry")
            {
                foreach (RecognizedWordUnit wordUnit in e.Result.Words)
                {
                    text = text + wordUnit.Pronunciation + "\n";
                }

                MessageBox.Show(e.Result.Text + "\n" + text);
            }


        }
    }
}

Here is the code fragment directly related to the phonemes (extracted from the code above):

   //Handle a recognition result: collect the pronunciation of each recognized word
    private void sEngine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        string text = "";

        if (e.Result.Text == "I am hungry")
        {
            foreach (RecognizedWordUnit wordUnit in e.Result.Words)
            {
                text = text + wordUnit.Pronunciation + "\n";
            }

            MessageBox.Show(e.Result.Text + "\n" + text);
        }


    }

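Below is my output. The phonemes I got are shown from the second line onward; the first line only shows the recognized sentence.

So, please tell me: according to MSDN, these are "phonemes". Are they actually phonemes? I ask because I have never seen anything like them before.

The code above was written based on this link: http://msdn.microsoft.com/en-us/library/microsoft.speech.recognition.srgsgrammar.srgstoken.pronunciation(v=office.14).aspx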

Solution:

OK, here is how I would approach the problem.

First, load the dictation engine with the pronunciation topic, which returns the phonemes spoken by the user (in the Recognition event).
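A minimal sketch of this first step with System.Speech might look like the following. The topic URI "grammar:dictation#pronunciation" is an assumption here (it is the form commonly used for the SAPI pronunciation topic, not something stated in the question), and the class name is made up for illustration. It needs Windows with an installed en-US recognizer and a microphone, and whether the phonemes arrive in the result text or in each word's Pronunciation property may depend on the engine, so the sketch prints both.

```csharp
using System;
using System.Globalization;
using System.Speech.Recognition;

// Hypothetical class name, for illustration only.
class PronunciationDictationSketch
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine(new CultureInfo("en-US")))
        {
            // ASSUMPTION: this topic URI selects the pronunciation dictation model.
            var grammar = new DictationGrammar("grammar:dictation#pronunciation");
            engine.LoadGrammar(grammar);
            engine.SetInputToDefaultAudioDevice();

            engine.SpeechRecognized += (sender, e) =>
            {
                // Print the raw result text and the engine confidence.
                Console.WriteLine("Text: " + e.Result.Text);
                Console.WriteLine("Confidence: " + e.Result.Confidence);

                // Print the per-word pronunciation reported by the engine.
                foreach (RecognizedWordUnit word in e.Result.Words)
                {
                    Console.WriteLine("  " + word.Text + " -> " + word.Pronunciation);
                }
            };

            // Keep recognizing until the user presses Enter.
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();
            engine.RecognizeAsyncCancel();
        }
    }
}
```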

Second, get the reference phonemes for the word using the ISpEnginePronunciation::GetPronunciations method (as I outlined elsewhere).

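Once you have the two sets of phonemes, you can compare them. Essentially, phonemes are separated by spaces, and each phoneme is represented by a short tag (described in the American English Phoneme Representation specification).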

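Given this, you should be able to compute a score by comparing the phonemes with any of a number of approximate string matching schemes (for example, Levenshtein distance).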
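One way the comparison could be sketched is a Levenshtein distance computed over phoneme tokens instead of characters, normalized into a similarity between 0 and 1. The class and method names below are hypothetical, the normalization by the longer sequence is one choice among several, and the phoneme strings in the usage note are made-up placeholders rather than real engine output.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PhonemeDistance
{
    // Split a space-separated phoneme string into individual phoneme tags.
    public static string[] Tokenize(string phonemes)
    {
        return phonemes.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Standard dynamic-programming Levenshtein distance, where one "symbol"
    // is a whole phoneme tag rather than a single character.
    public static int Levenshtein(string[] a, string[] b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                bool same = string.Equals(a[i - 1], b[j - 1], StringComparison.OrdinalIgnoreCase);
                int substitution = d[i - 1, j - 1] + (same ? 0 : 1);
                int deletion = d[i - 1, j] + 1;
                int insertion = d[i, j - 1] + 1;
                d[i, j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
        }
        return d[a.Length, b.Length];
    }

    // Similarity in [0, 1]: 1 means identical phoneme sequences.
    // ASSUMPTION: normalizing by the longer sequence is an illustrative choice.
    public static double Similarity(string reference, string spoken)
    {
        string[] r = Tokenize(reference);
        string[] s = Tokenize(spoken);
        int longest = Math.Max(r.Length, s.Length);
        if (longest == 0) return 1.0;
        return 1.0 - (double)Levenshtein(r, s) / longest;
    }
}
```

For example, `PhonemeDistance.Similarity("h eh l ow", "h ah l ow")` evaluates to 0.75: one substituted phoneme out of four.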

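You may find the problem simpler if you compare phone IDs rather than strings. ISpPhoneConverter::PhoneToId can convert the phoneme string into an array of phone IDs, one ID per phoneme. This gives you a pair of null-terminated integer arrays, which may be better suited to your comparison algorithm.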

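You can use the engine confidence to penalize matches, since a low engine confidence indicates that the incoming audio does not closely match the engine's idea of the phoneme.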
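The answer does not say how the penalty should be applied. As one possible sketch, a phoneme similarity in the range 0 to 1 could simply be multiplied by the engine confidence (for example, the Confidence value of a recognition result, which System.Speech reports as a value between 0 and 1). The class name, the method name, and the multiplicative formula are all assumptions made for illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class PronunciationScore
{
    // phonemeSimilarity: how closely the spoken phonemes match the reference, in [0, 1].
    // engineConfidence: the recognizer's confidence in its result, in [0, 1].
    // ASSUMPTION: a simple product is used as the penalty; other weightings are possible.
    public static double Penalize(double phonemeSimilarity, double engineConfidence)
    {
        // Clamp both inputs so that out-of-range values cannot inflate the score.
        double similarity = Math.Max(0.0, Math.Min(1.0, phonemeSimilarity));
        double confidence = Math.Max(0.0, Math.Min(1.0, engineConfidence));
        return similarity * confidence;
    }
}
```

With this choice, a perfect phoneme match recognized at a confidence of 0.5 scores 0.5, so uncertain recognitions cannot produce a high pronunciation score.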
