國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,中文地址語音辨識系統之設計研究,A Design of Mandarin Speech Recognition System for Addresses

論文名稱 Title	中文地址語音辨識系統之設計研究 A Design of Mandarin Speech Recognition System for Addresses
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	92 學年度第 2 學期 The spring semester of Academic Year 92	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	57
研究生 Author	張慶勇 Ching-Yung Chang
指導教授 Advisor	陳志堅 Chih-Chien Chen
召集委員 Convenor	汪啟茂 Chii-Maw Uang
口試委員 Advisory Committee	李聰 Tsung Lee
口試日期 Date of Exam	2004-07-28	繳交日期 Date of Submission	2004-09-06
關鍵字 Keywords	隱藏式馬可夫模型、語詞辨識、端點偵測、梅爾倒頻譜係數 Mel-frequency cepstrum coefficients, Hidden Markov model (HMM), phrase recognition, end-point detection
統計 Statistics	本論文已被瀏覽 5679 次，被下載 0 次 The thesis/dissertation has been browsed 5679 times, has been downloaded 0 times.

中文摘要
本論文探討如何利用梅爾倒頻譜參數、隱藏式馬可夫模型及維特比演算法等語詞辨識相關技術，來設計一套中文地址的語音辨識系統。隱藏式馬可夫模型目前被廣泛地應用在語音辨識，其利用雙重的隨機程序，用狀態(state)的轉移來描述語音產生的方式，以對應語音模型的時變特性。為了簡化系統，減少辨識所需時間，本論文利用中文單音結構的特性，結合單音辨識的方法來完成。此系統，在實驗室中，語者相依的環境下，平均60秒內可完成地址輸入的動作，辨識率達98%。
Abstract
This thesis proposes a Mandarin speech recognition system for addresses based on Mel-frequency cepstral coefficients (MFCC), the hidden Markov model (HMM), and the Viterbi algorithm. The HMM, now widely used in speech recognition, is a doubly stochastic process that describes speech production through state transitions, which lets it capture the time-varying properties of the speech signal. To simplify the system and reduce the time needed for recognition, the design exploits the monosyllabic structure of Mandarin by combining a monosyllable recognizer with the HMM. In the speaker-dependent case, in a laboratory environment, an address can be entered within 60 seconds on average, with a recognition rate of 98%.
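The abstract names the Viterbi algorithm as the decoding step for the HMM. The following is a minimal sketch of log-domain Viterbi decoding, not the thesis's implementation. It assumes a discrete-observation HMM for brevity, whereas a real recognizer would score MFCC feature vectors. The two-state left-to-right model and all of its probabilities (`LOG_INIT`, `LOG_TRANS`, `LOG_EMIT`) are hypothetical values chosen only for illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs, log_init, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for a discrete-observation HMM."""
    n_states = len(log_init)
    # delta[s]: best log-probability of any state path ending in s at the current frame
    delta = [log_init[s] + log_emit[s][obs[0]] for s in range(n_states)]
    backptr = []  # backptr[t][s]: best predecessor of state s at frame t + 1
    for o in obs[1:]:
        prev = delta
        delta, step_ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            step_ptr.append(best_prev)
            delta.append(prev[best_prev] + log_trans[best_prev][s] + log_emit[s][o])
        backptr.append(step_ptr)
    # Backtrack from the best final state to recover the full path.
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for step_ptr in reversed(backptr):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return delta[last], path


# Hypothetical two-state left-to-right model over two observation symbols (0 and 1).
LOG_INIT = [safe_log(p) for p in (1.0, 0.0)]
LOG_TRANS = [[safe_log(p) for p in row] for row in ((0.6, 0.4), (0.0, 1.0))]
LOG_EMIT = [[safe_log(p) for p in row] for row in ((0.9, 0.1), (0.2, 0.8))]

best_log_prob, best_path = viterbi([0, 0, 1, 1], LOG_INIT, LOG_TRANS, LOG_EMIT)
print(best_path, math.exp(best_log_prob))
```

Working in log-probabilities avoids numerical underflow on long utterances. In a word recognizer, one such score would be computed per candidate model and the highest-scoring model selected.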

目次 Table of Contents
Contents	Page
Acknowledgments	I
Abstract	II
Table of Contents	III
List of Figures and Tables	VI
Chapter 1 Introduction	1
1-1 Research Motivation and Objectives	1
1-2 Research Methods	2
1-3 Chapter Overview	2
Chapter 2 Basic Techniques of Speech Signal Processing	3
2-1 Basic Architecture of Phrase Recognition	3
2-2 Speech Preprocessing in the Recognition System	5
2-2-1 End-Point Detection	5
2-2-1-1 Signal Energy	5
2-2-1-2 Zero Crossing Rate	5
2-2-1-3 Maximum Likelihood Ratio (MLR) Test	7
2-3 Applying a Window Function	9
2-4 Feature Extraction	13
2-4-1 Cepstral Coefficients	14
2-4-2 Mel-Frequency Cepstral Coefficients	17
2-4-3 LPC-Based Cepstrum	20
2-4-3-1 Linear Predictive Coding	20
2-4-3-2 Computing the Cepstral Parameters	22
Chapter 3 Hidden Markov Models	23
3-1 Model Description	23
3-2 Parameter Initialization	23
3-3 Training the Hidden Markov Model	24
3-4 Expectation-Maximization (EM) Algorithm	27
3-5 Parameter Re-estimation	28
3-6 Recognition Procedure of the Hidden Markov Model	30
Chapter 4 Single-Tone Syllable Recognition	33
4-1 Characteristics of Mandarin Monosyllables	33
4-2 Single-Tone Syllable Recognition Experiments	35
Chapter 5 System Design and Experimental Results	42
5-1 Database Construction and Planning	42
5-2 System Design	45
5-3 Experimental Results	50
Chapter 6 Conclusions and Suggestions	53
6-1 Conclusions	53
6-2 Suggestions	54
References	55

List of Figures	Page
Fig. 2-1 Flow of the phrase recognition system	3
Fig. 2-2 Waveform of the word "two" with its signal energy and zero crossing rate	6
Fig. 2-3 Waveform and statistics of the utterance "高雄市-鼓山區" (Kaohsiung City, Gushan District)	9
Fig. 2-4 Magnitude spectra of various windows	13
Fig. 2-5 Speech production model	14
Fig. 2-6 Flowchart of cepstral analysis	15
Fig. 2-7 Example of the cepstrum analysis flow	16
Fig. 2-8 Relationship between the real frequency scale (Hz) and the perceived frequency scale (Mels)	14
Fig. 2-9 Flowchart for Mel-scale parameters	18
Fig. 2-10 Conversion relation between linear frequencies and Mel frequencies	19
Fig. 2-11 Mel-spaced filter	20
Fig. 3-1 Speech signal and its hidden Markov model	23
Fig. 3-2 Illustration of the forward procedure	26
Fig. 3-3 Illustration of the backward procedure	27
Fig. 3-4 Illustration of the forward-backward procedure	28
Fig. 3-5 Finding the best path with the Viterbi algorithm	31
Fig. 4-1 Two-stage recognition architecture	39
Fig. 5-1 Flowchart for filtering road names using the best Top-N monosyllable combinations	46
Fig. 5-2 Recognition architecture for county/city, township/district, and road name	47
Fig. 5-3 Recognition architecture for the lane, alley, number, and floor parts	49
Table 4-1 Structure of Mandarin monosyllables	33
Table 4-2 Specifications of the Mandarin 408-syllable corpus	35
Table 4-3 Results of the Mandarin 408-monosyllable recognition experiment (MFCC+HMM)	36
Table 4-4 Results of the Mandarin 408-monosyllable recognition experiment (LPCC+ML decision rule)	37
Table 4-5 Experimental results of two-stage recognition	40
Table 5-1 List of keyword databases	43
Table 5-2 Experimental parameter settings	43
Table 5-3 Recognition results for the keyword databases	44
Table 5-4 Recognition rates for lane, alley, number, and floor, grouped by number of characters	45
Table 5-5 Recognition of Taipei City road names	50
Table 5-6 Recognition of Taichung City road names	51
Table 5-7 Recognition of Kaohsiung City road names	51
Table 5-8 First-stage recognition (including district)	52
Table 5-9 System test results	52

參考文獻 References
[1] V. R. Algazi, K. L. Brown, M. J. Ready, D. H. Irvine, C. L. Cadwell, and S. Chung, "Transform Representation of the Spectra of Acoustic Speech Segments with Applications-I: General Approach and Application to Speech Recognition," IEEE Trans. Speech and Audio Processing, vol. 1, no. 2, April 1993.
[2] J. R. Deller, J. G. Proakis, and J. H. L. Hansen, Discrete-Time Processing of Speech Signals, New York: Macmillan Pub. Co., 1993.
[3] A. M. Kondoz, Digital Speech Coding, New York: John Wiley & Sons Inc., 1994.
[4] S. S. Stevens and J. Volkmann, "The relation of pitch to frequency: A revised scale," Am. J. Psychol., vol. 53, pp. 329-353, 1940.
[5] J. R. Deller, J. G. Proakis, and J. H. Hansen, Discrete-Time Processing of Speech Signals, Maxwell Macmillan International.
[6] S. B. Davis and P. Mermelstein, "Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences," IEEE Trans. ASSP, vol. 28, pp. 357-366, 1980.
[7] Tze Fen Li, "Speech recognition of mandarin monosyllables," Pattern Recognition, vol. 36, pp. 2713-2721, April 2003.
[8] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, pp. 257-286, Feb. 1989.
[9] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, New Jersey: Prentice Hall, Inc., 1993.
[10] J. A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models," International Computer Science Institute, April 1998.
[11] M. B. Gulmezoglu, V. Dzhafarov, M. Keskin, and A. Barkana, "A Novel Approach to Isolated Word Recognition," IEEE Trans. Speech and Audio Processing, vol. 7, pp. 620-628, Nov. 1999.
[12] J. F. Wang, C. H. Wu, S. H. Chang, and J. Y. Lee, "A Hierarchical Network Model Based on a C/V Segmental Algorithm for Isolated Mandarin Speech Recognition," IEEE Trans. Signal Processing, vol. 39, pp. 2141-2146, Sep. 1991.
[13] J. Taboada, S. Feijoo, R. Baisa, and C. Hernandez, "Explicit Estimation of Speech Boundaries," IEE Proc. Sci. Meas. Technol., vol. 141, pp. 153-159, May 1994.
[14] Y. Wu and Y. Li, "Robust Speech/Non-Speech Detection in Adverse Conditions Using the Fuzzy Polarity Correlation Method," 2000 IEEE International Conference on Systems, Man, and Cybernetics, vol. 4, pp. 2935-2939, Oct. 2000.
[15] B. H. Juang and L. R. Rabiner, "Mixture Autoregressive Hidden Markov Models for Speech Signals," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 33, pp. 1404-1413, 1985.
[16] 楊鎮光, "Visual Basic 與語音辨識－讓電腦聽話," June 17, 2002.
[17] 蘇浩岳, "電話語音查號系統之改進," M.S. thesis, Institute of Communications Engineering, National Chiao Tung University, June 1997.
[18] 洪一忠, "基於分段機率模型之國語單音節辨認," M.S. thesis, Institute of Electrical Engineering, National Taiwan University, June 1992.
[19] 黃銘崇, "不特定語者語詞辨識系統之特徵設計," M.S. thesis, Institute of Electrical Engineering, National Sun Yat-sen University, June 5, 2001.
[20] 賴昭華, "不特定語者中量語詞辨識系統之設計研究," M.S. thesis, Institute of Electrical Engineering, National Sun Yat-sen University, July 24, 2002.
[21] 侯政寬, "中文關鍵語詞搜尋系統之設計與研究," M.S. thesis, Institute of Electrical Engineering, National Sun Yat-sen University, July 2003.
[22] 陳豫德, "中文人名語音辨識系統之設計研究," M.S. thesis, Institute of Electrical Engineering, National Sun Yat-sen University, July 2003.
[23] 鄭博文, "雜訊環境下語音辨識系統之設計研究," M.S. thesis, Institute of Electrical Engineering, National Sun Yat-sen University, July 2003.

電子全文 Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission	校內校外均不公開 not available
開放時間 Available	校內 Campus：永不公開 not available	校外 Off-campus：永不公開 not available
紙本論文 Printed copies
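Availability information for printed copies is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available	已公開 available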



Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS