National Sun Yat-sen University (國立中山大學), thesis/dissertation: A Hybrid Design of Speech Recognition System for Chinese Names

Title	A Hybrid Design of Speech Recognition System for Chinese Names (混合式中文人名語音辨識系統之設計研究)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 92 (ROC calendar; 2003-04)	Language	Chinese
Degree	Master	Number of pages	61
Author	Po-Min Hsu (許博閔)
Advisor	Chih-Chien Chen (陳志堅)
Convenor	Chii-Maw Uang (汪啟茂)
Advisory Committee	Tsung Lee (李聰)
Date of Exam	2004-07-28	Date of Submission	2004-09-06
Keywords	phrase recognition, hidden Markov model, Karhunen-Loève transform, endpoint detection, Mel-cepstrum
Statistics	This thesis has been browsed 5862 times and downloaded 0 times.

Abstract
This thesis proposes a speech recognition system for Chinese names, designed and implemented with phrase-recognition techniques: the Karhunen-Loève transform (KLT), Mel-frequency cepstral coefficients (MFCC), the hidden Markov model (HMM), and the Viterbi algorithm. The KLT is the optimal transform in the minimum mean-square-error sense and has high energy compaction, so it can reduce the speech data while retaining most of the information. The HMM, now widely used in speech recognition, is a statistical model that describes speech production through a sequence of states and can therefore represent the time-varying characteristics of the speech signal. In the speaker-dependent case, in a laboratory environment, the system reaches a recognition rate of 93.97% within 3 seconds.
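As a rough illustration of the recognition step named in the abstract, the sketch below scores an observation sequence against one HMM per name with the Viterbi algorithm and returns the best-scoring name. It is a toy under stated assumptions, not the thesis's implementation: observations are discrete symbol indices rather than MFCC vectors, and the model parameters, the labels `name_a` and `name_b`, and the helpers `viterbi_log_score` and `recognize` are all hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Log of a probability, mapping 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi_log_score(obs, start, trans, emit):
    """Return (best log-probability, best state path) for a discrete HMM.

    obs   : list of observation symbol indices
    start : start[i]    = P(state i at t = 0)
    trans : trans[i][j] = P(state j at t + 1 | state i at t)
    emit  : emit[i][k]  = P(symbol k | state i)
    """
    n = len(start)
    # delta[i] = best log-probability of any path that ends in state i
    delta = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    backptr = []
    for symbol in obs[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            # Best predecessor state for landing in state j
            best_i = max(range(n), key=lambda i: delta[i] + safe_log(trans[i][j]))
            pointers.append(best_i)
            new_delta.append(
                delta[best_i] + safe_log(trans[best_i][j]) + safe_log(emit[j][symbol])
            )
        delta = new_delta
        backptr.append(pointers)
    # Backtrack from the best final state to recover the state path
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path


def recognize(obs, models):
    """Pick the model whose best Viterbi path has the highest log score."""
    return max(models, key=lambda name: viterbi_log_score(obs, *models[name])[0])


# Hypothetical two-state left-to-right models (start, trans, emit), toy values only.
MODELS = {
    "name_a": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "name_b": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], MODELS))
```

Working in log-probabilities avoids numerical underflow on long sequences. In a system like the one described, each frame would presumably be a feature vector and the emission term a continuous density, which this sketch does not attempt to model.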

Table of Contents
Acknowledgments
Abstract
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
  1-1 Research Motivation
  1-2 Research Method
  1-3 Thesis Organization
Chapter 2 Phrase Recognition Systems and Digital Speech Signal Processing
  2-1 Basic Architecture of Phrase Recognition
  2-2 Speech Preprocessing in the Recognition System
    2-2-1 Endpoint Detection
    2-2-2 Pre-emphasis
    2-2-3 Window Functions
  2-3 Speech Segmentation
  2-4 Feature Extraction from Speech Signals
    2-4-1 Cepstral Coefficients
    2-4-2 Cepstral Regression Coefficients
    2-4-3 Mel-Cepstral Coefficients
  2-5 Introduction to the Karhunen-Loève Transform
Chapter 3 Phrase Recognition System Based on Hidden Markov Models
  3-1 Hidden Markov Models of Speech Signals
  3-2 Building the Hidden Markov Model
  3-3 Training the Hidden Markov Model
    3-3-1 Expectation-Maximization Algorithm
    3-3-2 Parameter Reestimation
  3-4 Recognition Procedure of the Hidden Markov Model
Chapter 4 System Design and Experimental Results
  4-1 Database Planning and Construction
    4-1-1 Database Planning
    4-1-2 Database Construction
  4-2 System Implementation
  4-3 Experimental Results
    4-3-1 Training Count versus Recognition Rate
    4-3-2 Number of Model States versus Recognition Rate
    4-3-3 Name Selection Experiment Using the KLT
    4-3-4 Selection Experiment for Monosyllabic Chinese Surnames
    4-3-5 Recognition Rate and Recognition Time Estimation
Chapter 5 Conclusions and Suggestions
  5-1 Conclusions
  5-2 Suggestions
References
Appendix Table 1 List of Chinese Surnames

List of Figures
Figure 2-1 Flow of the phrase recognition system
Figure 2-2 Time-domain waveform of the Chinese monosyllable "陳" (Chen)
Figure 2-3 Energy and zero-crossing-rate endpoint detection applied to the Chinese monosyllable "陳" (Chen)
Figure 2-4 Adjacent frames overlapping by half in the time domain
Figure 2-5 (a) Sinusoidal signal in a single frame (b) Hamming window waveform (c) Result of multiplying by the Hamming window
Figure 2-6 Spectra of the rectangular, Hamming, and Hanning window functions
Figure 2-7 Result of a misjudgment caused by a threshold set too high
Figure 2-8 Endpoint re-estimation
Figure 2-9 Result of a misjudgment caused by a threshold set too low
Figure 2-10 Endpoint re-estimation
Figure 2-11 Model of human speech production
Figure 2-12 Procedure for computing cepstral coefficients
Figure 2-13 Correspondence between actual frequency and mel frequency
Figure 2-14 Filter bank
Figure 2-15 Mel-cepstrum flowchart
Figure 3-1 A speech signal and its hidden Markov model
Figure 3-2 Forward and backward variables
Figure 3-3 Finding the best path with the Viterbi algorithm
Figure 3-4 Phrase recognition flow
Figure 4-1 Flow of the basic HMM recognition system
Figure 4-2 Flow of the KLT+HMM recognition system
Figure 4-3 Flow of the surname recognition + KLT + HMM recognition system
Figure 4-4 Number of model states versus recognition rate

List of Tables
Table 2-1 Energy and zero-crossing rate of speech signals
Table 4-2 Training count versus recognition rate
Table 4-4 Name selection experiment using the KLT (a)
Table 4-5 Name selection experiment using the KLT (b)
Table 4-6 Results of the surname detection experiment
Table 4-7 Recognition rate and recognition time


Fulltext
Thesis access permission: not available on campus or off campus. Available: Campus: never; Off-campus: never.
Printed copies
Available: publicly available
