National Sun Yat-sen University (國立中山大學), thesis/dissertation, A Design of Speech Recognition System for Address in Cross-Strait Four Regions, Japan and Korea (兩岸四地與日韓地址語音辨識系統之設計研究)

Title	A Design of Speech Recognition System for Address in Cross-Strait Four Regions, Japan and Korea (兩岸四地與日韓地址語音辨識系統之設計研究)
Department	Department of Electrical Engineering
Year, semester	Fall semester of Academic Year 104
Language	Chinese
Degree	Master
Number of pages	106
Author	Jun-Zhi Wang (王俊智)
Advisor	Chih-Chien Chen (陳志堅)
Convenor	Chii-Maw Wang (汪啟茂)
Advisory Committee	Sheau-Shong Bor (柏小松); Tsung Lee (李聰)
Date of Exam	2015-07-29
Date of Submission	2015-12-23
Keywords	Hidden Markov model, Linear predicted cepstrum coefficients, Phrase tagging-correlation, Mel-frequency cepstrum coefficients, Monotone sub-classification
Statistics	The thesis/dissertation has been browsed 5703 times and downloaded 32 times.

中文摘要
「地址」是描述一個地點最直接、明瞭的方式。過去，人們欲找尋一個地方時，常以紙本地圖來查詢；近年來，隨著科技的演進與網路的發達，有了如Google Map, Apple Map等網路地圖的出現。如今，為追求使用的便利性，均可以聲控的方式，運用語音辨識技術，輸入地址。因此，在語音辨識技術的層面上，如何更精確有效地識別地址，不僅是學術上，同時亦是資訊工業上的一門重要課題。本論文運用語音的單音分類特性，結合字詞標籤的比對，針對傳統語音地址辨識系統，建立了強化的機制。首先，吾人錄製一輪2699個常用二字詞，作為系統之訓練語料，並依中文音節發音規則，分為404小類。其次，再運用音節中聲韻母信號的越零率平均值、標準差與聲母音長相對量，將音節特性，細分為6大類，以改善中文濁音聲母辨識混淆的問題。最後，以梅爾倒頻譜係數與線性預估倒頻譜係數，透過隱藏式馬可夫程序，產生單音節雙特徵參數模型。在辨識策略上，吾人係透過建立字詞標籤的方式，將資料庫中的每個字音皆建立一組標籤，並依單音分為404組比對碼簿。透過標籤資訊的比對，語音地址辨識系統，可不受所唸地址單詞字數多寡的限制，因而可改善地址漏唸與多唸時系統辨識錯誤的問題。在系統實作方面，吾人蒐集了台灣、大陸、香港、澳門、日本與韓國等六個地區，共約64萬筆地名路名資訊，結合Google API介面，以當地道路門牌地址為例，作語音地址的搜尋，於Linux Ubuntu 12.04之作業系統下，在輸入完整地址情況下，辨識率約為94.21%。
Abstract
An ‘address’ is the most direct and clear description of a location. In the past, people searched for places using paper maps. Today, thanks to advances in speech science and internet technology, online maps such as Google Map and Apple Map let users enter an address by voice. Improving the accuracy of address speech recognition is therefore not only an academic challenge but also a commercially important task for the information industry. In this thesis, two strategies, monotone (single-syllable) sub-classification and phrase tagging-correlation, are applied to improve the accuracy of a conventional recognition system for Mandarin addresses. First, 2,699 common two-syllable words are recorded as training material. Second, all monotones are grouped into 404 categories according to Mandarin pronunciation rules and further sub-classified into six classes according to the mean and standard deviation of their zero-crossing rate and the ratio of consonant length to vowel length, which alleviates confusion among Mandarin voiced consonants. Finally, Mel-frequency cepstrum coefficients (MFCC) and linear predicted cepstrum coefficients (LPCC) are computed, and a bi-parametric hidden Markov model is estimated for each syllable. For recognition, a strategy based on phrase tagging-correlation is designed: every syllable in the address database is given a set of tags, organized into 404 tag codebooks, one per monotone. Because the tagging-correlation between the spoken phrase and each candidate phrase is computed, the number of spoken words in an address phrase need not be exactly correct, so errors caused by omitted or inserted words can be remedied. A Mandarin speech recognition system for addresses in Taiwan, Mainland China, Hong Kong, Macao, Japan, and South Korea is implemented using the Google API interface on a PC running Ubuntu Linux 12.04. About 640,000 place names and road names were collected for this study, and when a complete address is spoken the recognition rate of the system is approximately 94.21%.
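
The zero-crossing-rate statistics that the abstract uses for sub-classification can be illustrated with a minimal sketch. This is not the thesis's implementation: the frame length, hop size, sample rate, and the function names `zero_crossing_rate` and `zcr_statistics` are assumptions chosen for illustration, and the consonant-to-vowel length ratio is not covered here.

```python
import math
import statistics


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def zcr_statistics(signal, frame_len=256, hop=128):
    """Mean and population standard deviation of the per-frame ZCR.

    frame_len and hop are illustrative values, not taken from the thesis.
    """
    rates = [
        zero_crossing_rate(signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]
    if not rates:
        raise ValueError("signal is shorter than one frame")
    return statistics.mean(rates), statistics.pstdev(rates)


def sine(freq_hz, n, sample_rate=8000):
    """Synthetic test tone (assumed 8 kHz sample rate)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


# A low-frequency tone stands in for a vowel-like sound and a
# high-frequency tone for a fricative-like sound.
low_mean, low_std = zcr_statistics(sine(200, 2048))
high_mean, high_std = zcr_statistics(sine(3000, 2048))
```

In this toy setup the high-frequency tone yields a much larger mean zero-crossing rate than the low-frequency one, which is the kind of contrast such features exploit when separating noisy consonant segments from vowel segments.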

目次 Table of Contents
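Thesis Approval Form
Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Method
  1.3 Background of the Research Topic
  1.4 Thesis Outline
Chapter 2 Speech Preprocessing and Related Techniques
  2.1 Pre-emphasis
  2.2 Hamming Window
  2.3 Speech Segmentation Techniques
    2.3.1 Speech/Non-speech Endpoint Detection
    2.3.2 Continuous Speech Boundary Detection
    2.3.3 Linear Prediction Error Energy (LPCEE)
Chapter 3 Speech Characteristic Analysis and Screening
  3.1 Speech Characteristic Analysis
    3.1.1 Analysis of Initials (Consonants)
    3.1.2 Analysis of Finals (Vowels)
  3.2 Classification Using Energy Waveforms
    3.2.1 Uniformity and Non-uniformity of Consonants
    3.2.2 Aspirated and Unaspirated Stops
    3.2.3 Fricatives and Affricates
  3.3 Monotone Classification Mechanism and Procedure
Chapter 4 Feature Extraction and Training
  4.1 Mel-Frequency Cepstrum Coefficients (MFCC)
  4.2 Linear Predicted Cepstrum Coefficients (LPCC)
  4.3 Hidden Markov Model
    4.3.1 Computing the Observation Probability
    4.3.2 Finding the Optimal State Transition Path
    4.3.3 Parameter Re-estimation
Chapter 5 Speech Coding and Database Construction
  5.1 Monotone Coding
  5.2 Database Construction and Matching
    5.2.1 Database Construction
    5.2.2 Multi-dimensional Information Index Matching
Chapter 6 Design, Training, and Performance Evaluation of the Recognition System
  6.1 Recognition System Flow and Architecture
  6.2 Training Strategy of the Recognition System
  6.3 Input to the Mandarin Address System
  6.4 Implementation Performance and Evaluation of the Recognition System
    6.4.1 System Parameter Settings
    6.4.2 Construction of System Simulation Data
    6.4.3 Mandarin Monotone Classification Experiment
    6.4.4 Recognition Results for Digits in the Mandarin Address System
    6.4.5 Recognition Results and Comparison for the Mandarin Address System
    6.4.6 Simulated Recognition Results for the Six-Country Mandarin Address System
Chapter 7 Conclusions, Future Work, and Suggestions
References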

電子全文 Fulltext
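This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release time
Available: Campus: available; Off-campus: available
etd-1122115-015847.pdf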
紙本論文 Printed copies
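Public-access information for printed copies is relatively complete from Academic Year 102 onward. For printed copies from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available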

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS