國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,國語、英語及越南語三語言語音辨識系統之設計研究 ,A Design of Trilingual Speech Recognition System for Chinese, English and Vietnamese

論文名稱 Title	國語、英語及越南語三語言語音辨識系統之設計研究 A Design of Trilingual Speech Recognition System for Chinese, English and Vietnamese
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	100 學年度第 2 學期 The spring semester of Academic Year 100	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	56
研究生 Author	曾俋穎 Yi-Ying Tzeng
指導教授 Advisor	陳志堅 Chih-Chien Chen
召集委員 Convenor	柏小松 Sheau-Shong Bor
口試委員 Advisory Committee	汪啟茂 Chii-Maw Uang
口試日期 Date of Exam	2012-07-25	繳交日期 Date of Submission	2012-09-10
關鍵字 Keywords	隱藏式馬可夫模型、音位結構學、線性預估倒頻譜係數、梅爾倒頻譜係數、語音辨識 Mel-frequency cepstral coefficients, Hidden Markov model, Phonotactics, Linear prediction cepstral coefficients, Speech recognition
統計 Statistics	本論文已被瀏覽 5628 次，被下載 465 次 The thesis/dissertation has been browsed 5628 times, has been downloaded 465 times.

中文摘要
認識一個國家的歷史文化與經濟背景，就能初步了解其語言之根基。國語是我們的母語，有超過十二億人口使用，佔世界第一。近年來，新興的中國不僅擁有強大的市場潛力與大量的勞動人力，而且中華文化，亦在亞洲形成影響甚遠的漢字文化圈。英語是與世界接軌的國際語言，英國的歷史文化底蘊與美國的強勢政治經濟地位，促使英語成為世界上最為廣泛使用的語言。越南地理位置與中國相近，受中華文化影響深遠，近十年的經濟開放政策，吸引大量外國企業的投資，與台灣有著密切的經濟往來。因此吾人希望建構一套三國語言系統，提供國人出外旅遊與語言學習之用。本論文探討國語、英語及越南語三語言語音辨識系統之設計與實作策略。針對三種不同語言之發音規則與特性，吾人歸納出國語404類、英語925類及越南語154類常用單音節，以梅爾倒頻譜係數與線性預估倒頻譜係數，來作單音節雙特徵參數之萃取，運用隱藏式馬可夫模型，來作音節辨識之統計依據。在AMD Athlon XP 2800+之個人電腦與Ubuntu 9.04之作業系統環境下，吾人針對國語82,000筆、英語20,868筆與越南語3,300筆語詞，運用隱藏式馬可夫模型與音位結構學之比對後，正確辨識率可分別達到88.16%、82.74%與87.45%，而平均辨識時間約在2秒以內。吾人並於上述架構下，設計三語言辨識系統，各選取100筆各個語言之常用語詞，對此300筆資料做語言別及語詞正確之判定，系統辨識率可達98%，而平均辨識時間約為2秒。
Abstract
History, culture, and economy form the foundation of a language. Mandarin Chinese, our native language, is spoken by more than 1.2 billion people, the largest speaker population in the world. In recent years, an emerging China has not only offered a large market and labor force but has also shaped a Chinese cultural sphere in Asia. British history and American political influence made English the most influential language of the 20th century. Vietnam has long been under the profound influence of Chinese culture, and its economic reform and opening over the past decade have attracted substantial foreign investment, including investment from Taiwan. Our objective is to build a trilingual system for travel, daily life, and language learning. This thesis investigates design and implementation strategies for a trilingual speech recognition system for Chinese, English, and Vietnamese. The system uses the speech features of 404 Chinese, 925 English, and 154 Vietnamese monosyllables as the basis for training and recognition. Mel-frequency cepstral coefficients and linear prediction cepstral coefficients serve as the two syllable feature sets, and hidden Markov models serve as the recognition model. On a personal computer with an AMD Athlon XP 2800+ processor running Ubuntu 9.04, applying phonotactic rules yields correct recognition rates of 88.16%, 82.74%, and 87.45% on databases of 82,000 Chinese, 30,795 English, and 3,300 Vietnamese phrases, respectively. Recognition in each system completes within 2 seconds. Furthermore, a trilingual language-speech recognition system is developed for 300 common words, 100 from each language. It achieves a 98% correct language-phrase recognition rate with a computation time of less than 2 seconds.
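As an illustration of the HMM scoring step summarized in the abstract, the following minimal sketch scores one observation sequence against several candidate models with the forward algorithm and returns the best-scoring (language, unit) pair. It is not the thesis implementation. The discrete observation symbols, the two-state left-to-right topology, all probability values, and the names `make_model`, `forward_log_likelihood`, `recognize`, `MODELS`, `unit_a`, and `unit_b` are assumptions made only for this example; a real system would score MFCC and LPCC feature vectors, most likely with continuous emission densities.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Return log(p), mapping a zero probability to -infinity."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def make_model(pi, a, b):
    """Convert initial, transition, and emission probabilities to log form."""
    return (
        [safe_log(p) for p in pi],
        [[safe_log(p) for p in row] for row in a],
        [[safe_log(p) for p in row] for row in b],
    )


def forward_log_likelihood(obs, log_pi, log_a, log_b):
    """Log P(obs | model) for a discrete-observation HMM (forward algorithm)."""
    n = len(log_pi)
    # Initialization: start in state i and emit the first symbol.
    alpha = [log_pi[i] + log_b[i][obs[0]] for i in range(n)]
    # Induction: sum over all predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log_a[i][j] for i in range(n)]) + log_b[j][o]
            for j in range(n)
        ]
    # Termination: sum over all final states.
    return log_sum_exp(alpha)


def recognize(obs, models):
    """Return the key of the model with the highest log-likelihood."""
    return max(models, key=lambda k: forward_log_likelihood(obs, *models[k]))


# Hypothetical toy models keyed by (language, unit); all values are made up.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    ("zh", "unit_a"): make_model([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    ("vi", "unit_b"): make_model([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}
```

Calling `recognize([0, 0, 1, 1], MODELS)` selects `("zh", "unit_a")`, because that toy model emits symbol 0 in its first state and symbol 1 in its second with high probability. Working in the log domain avoids numerical underflow on long observation sequences.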

目次 Table of Contents
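論文審定書 Thesis Approval Form	i
誌謝 Acknowledgements	ii
摘要 Abstract (Chinese)	iii
Abstract	iv
目錄 Table of Contents	v
圖次 List of Figures	vii
表次 List of Tables	viii
第一章 緒論 Chapter 1 Introduction	1
1-1 研究動機 Research Motivation	1
1-2 研究目的 Research Objectives	2
1-3 論文章節概要 Thesis Outline	3
第二章 基礎語音學介紹 Chapter 2 Introduction to Basic Phonetics	4
2-1 語系概覽 Overview of Language Families	4
2-1-1 國語 Mandarin Chinese	5
2-1-2 英語 English	6
2-1-3 越南語 Vietnamese	7
2-2 發音介紹 Introduction to Pronunciation	9
2-2-1 國語之發音 Mandarin Chinese Pronunciation	9
2-2-2 英語之發音 English Pronunciation	10
2-2-3 越南語之發音 Vietnamese Pronunciation	12
第三章 語音辨識系統的流程與數學原理 Chapter 3 Workflow and Mathematical Principles of the Speech Recognition System	15
3-1 辨識系統架構 Recognition System Architecture	15
3-2 音節切割 Syllable Segmentation	16
3-2-1 能量 Energy	16
3-2-2 越零率 Zero-Crossing Rate	17
3-3 特徵萃取前處理 Preprocessing for Feature Extraction	17
3-3-1 高頻預強濾波器 Pre-emphasis Filter	18
3-3-2 加視窗 Windowing	18
3-4 特徵萃取流程 Feature Extraction Procedure	20
3-4-1 線性預估倒頻譜係數 Linear Prediction Cepstral Coefficients	21
3-4-2 梅爾頻率倒頻譜係數 Mel-Frequency Cepstral Coefficients	26
3-5 隱藏式馬可夫模型 Hidden Markov Model	29
3-5-1 最佳期望值問題 Optimal Expected Value Problem	31
3-5-2 最佳狀態序列問題 Optimal State Sequence Problem	34
3-5-3 模型參數估算問題 Model Parameter Estimation Problem	35
第四章 辨識系統之訓練策略 Chapter 4 Training Strategies for the Recognition Systems	37
4-1 國語辨識系統 Mandarin Chinese Recognition System	37
4-2 英語辨識系統 English Recognition System	39
4-3 越南語辨識系統 Vietnamese Recognition System	41
4-4 三國語言辨識系統 Trilingual Recognition System	43
4-5 硬體環境與軟體規範 Hardware Environment and Software Specifications	44
第五章 結論與未來展望 Chapter 5 Conclusions and Future Work	45
參考文獻 References	46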

電子全文 Fulltext
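本電子全文僅授權使用者為學術研究之目的，進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定，切勿任意重製、散佈、改作、轉貼、播送，以免觸法。
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
論文使用權限 Thesis access permission：自定論文開放時間 user define
開放時間 Available：
校內 Campus：已公開 available
校外 Off-campus：已公開 available
etd-0910112-154149.pdf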
紙本論文 Printed copies
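紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊，請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
Availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available：已公開 available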
