National Sun Yat-sen University, thesis/dissertation, A Design of Trilingual Speech Recognition System for Chinese, Portuguese and Hindi

Title	A Design of Trilingual Speech Recognition System for Chinese, Portuguese and Hindi (國語、葡萄牙語及印地語三語言語音辨識系統之設計研究)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 100	Language	Chinese
Degree	Master	Number of pages	71
Author	Yu-an Wang (王裕安)
Advisor	Chih-Chien Chen (陳志堅)
Convenor	Chii-Maw Uang (汪啟茂)
Advisory Committee	Sheau-Shong Bor (柏小松)
Date of Exam	2012-07-25	Date of Submission	2012-09-10
Keywords	Speech recognition, Mel-frequency cepstral coefficients, Linear predicted cepstral coefficients, Hidden Markov model, Phonotactics
Statistics	The thesis/dissertation has been browsed 5649 times and downloaded 543 times.

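Chinese Abstract (English translation)
The BRICS countries account for a considerable share of the world's population, economic growth, and land area. First, China has a long history and is the most populous country in the world; in recent years it has developed rapidly on all fronts, its economic and trade ties with Taiwan have grown increasingly close, and it has joined the ranks of the world powers. Second, Brazil has the largest Portuguese-speaking population in the world; last year Foxconn, the major manufacturer under Taiwan's Hon Hai Group, decided to build factories in Brazil for Apple's iPad and iPhone, further expanding the reach of Taiwan's technology industry. Finally, India has flourished in recent years in software, telecommunications, aviation, and related industries, and is one of the fastest-developing countries in the world. When large Western companies outsource their information-management software overseas, India's low cost has put it in first place. Chinese, Portuguese, and Hindi together have more than 1.573 billion speakers, 22% of the world population. A trilingual speech recognition system for Mandarin, Portuguese, and Hindi would therefore be of great help both for communication across languages and for understanding local cultures. This thesis investigates the design and implementation strategies for such a system. Following the pronunciation rules of Mandarin, Portuguese, and Hindi, we derive the speech features of 404, 515, and 244 classes of common mono-syllables, respectively, as the main basis for training and recognition. The system extracts two sets of feature parameters, Mel-frequency cepstral coefficients and linear prediction cepstral coefficients, and uses hidden Markov models to recognize mono-syllables. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ CPU running Ubuntu 9.04, and using phonotactic rules for discrimination, the system reaches recognition rates of 87.69%, 85.14%, and 86.74% on 82,000 Chinese phrases, 30,000 Portuguese phrases, and 3,900 Hindi phrases, respectively. Using the same training framework to implement the trilingual recognition system, we select 100 phrases from each language, 300 phrases in total; the system reaches a correct language recognition rate of 98.00%, and the average recognition time of each system is within 2 seconds.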
Abstract
The BRICS countries (Brazil, Russia, India, China and South Africa) have contributed significantly to global economic growth in the past few years. China has not only the largest population in the world but also a long and splendid history. In recent years its rapid development in all respects, including growing economic and trade ties with Taiwan, has placed China among the world powers. Brazil is the largest Portuguese-speaking country in the world, and in 2011 the world-class manufacturer Foxconn Technology decided to build Apple iPad/iPhone factories there. India has flourished in the software, telecommunications and aviation industries over the last decade, and it has become a popular destination for offshore outsourcing because of the cost-reduction policies of Western companies. Chinese, Portuguese and Hindi together have over 1.573 billion speakers, accounting for over 22% of the world population. Our objective is therefore to establish a trilingual speech recognition system that helps verbal communication and cultural understanding across these languages. This thesis investigates the design and implementation strategies for a trilingual speech recognition system for Chinese, Portuguese and Hindi. Based on the pronunciation rules of each language, 404 Chinese, 515 Portuguese and 244 Hindi common mono-syllables are selected as the basis for training and recognition. Mel-frequency cepstral coefficients and linear prediction cepstral coefficients are used as the two syllable feature models, and hidden Markov models are used as the recognition model. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ CPU running Ubuntu 9.04, correct phrase recognition rates of 87.69%, 85.14% and 86.74% are reached using phonotactic rules for databases of 82,000 Chinese, 30,000 Portuguese and 3,900 Hindi phrases, respectively.
Furthermore, a trilingual recognition system for 300 common words, 100 from each language, is developed. It reaches a 98% correct language-phrase recognition rate, and the average computation time for each system is within 2 seconds.
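The abstract states that hidden Markov models serve as the recognition model for mono-syllables. As a rough illustration only, and not the thesis's implementation, the sketch below scores an observation sequence against several candidate models with the forward algorithm and picks the best-scoring one. Everything here is an assumption made for the example: the observations are discrete symbols rather than MFCC/LPCC feature vectors, the two toy models `syllable_a` and `syllable_b` and all their probabilities are invented, and the function names are hypothetical.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM using the scaled forward algorithm."""
    n_states = len(start)
    # alpha[i] is proportional to P(o_1..o_t, state_t = i); it is rescaled at
    # every step so that long sequences do not underflow.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob

def classify(obs, models):
    """Return the name of the model that gives obs the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical two-state left-to-right models over the symbols {0, 1}.
# Each entry is (start probabilities, transition matrix, emission matrix).
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "syllable_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),  # mostly 0s, then 1s
    "syllable_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),  # mostly 1s, then 0s
}

if __name__ == "__main__":
    print(classify([0, 0, 1, 1], MODELS))
    print(classify([1, 1, 0, 0], MODELS))
```

In a real recognizer each model would be trained on feature vectors, for example with Baum-Welch re-estimation, and the emission probabilities would usually be continuous densities; the scoring-and-argmax structure is the part this sketch is meant to show.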

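Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Objectives
  1.3 Thesis Outline
Chapter 2 Overview of the Chinese, Portuguese, and Hindi Languages
  2.1 Overview of Chinese
  2.2 Introduction to Portuguese and Hindi
    2.2.1 Language Family Overview
    2.2.2 Origins and Development
  2.3 Portuguese Pronunciation
    2.3.1 Phonetic Symbols
    2.3.2 Vowel Pronunciation Rules
    2.3.3 Consonant Pronunciation Rules
  2.4 Hindi Pronunciation
    2.4.1 Vowel Pronunciation Rules
    2.4.2 Consonant Pronunciation Rules
    2.4.3 Phonetic Symbols
    2.4.4 Conjunct Consonant Letters
    2.4.5 Syllables and Stress
Chapter 3 Speech Recognition System Flow and Mathematical Framework
  3.1 Syllable Segmentation
    3.1.1 Frame Energy
    3.1.2 Zero-Crossing Rate
    3.1.3 Linear Prediction Coefficient Error Energy
  3.2 Feature Extraction Procedure
    3.2.1 Speech Signal Preprocessing
    3.2.2 Mel-Frequency Cepstral Coefficients
    3.2.3 Linear Prediction Cepstral Coefficients
  3.3 Hidden Markov Model
    3.3.1 The State Probability Evaluation Problem
    3.3.2 The Optimal State Sequence Problem
    3.3.3 The Model Parameter Estimation Problem
Chapter 4 Training Strategies for the Recognition System
  4.1 Hardware Architecture and Specifications
  4.2 Construction of the Simulation Phrase Sets for the Three Languages
  4.3 Selection of Mono-Syllable Classes
  4.4 Selection of the Number of Training Iterations for Mono-Syllable Models
Chapter 5 System Implementation and Simulation
  5.1 Mandarin Common Word and Phrase Recognition System
  5.2 Portuguese Common Word and Phrase Recognition System
  5.3 Hindi Common Word and Phrase Recognition System
  5.4 Trilingual Common Word and Phrase Recognition System
Chapter 6 Conclusions and Future Work
References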

References
[1] Wikipedia (Chinese), http://zh.wikipedia.org/
[2] Wang Suoying (王鎖瑛), Portuguese Grammar (葡萄牙語語法), Shanghai Foreign Language Education Press, 1999.
[3] Jin Dinghan (金鼎漢), Basic Hindi Course, Vol. 1 (印地語基礎教程-第1冊), Peking University Press, 1992.
[4] Liu Anwu (劉安武), A History of Hindi Literature in India (印度印地語文學史), People's Literature Publishing House, 1987.
[5] Yin Hongyuan (殷洪元), Hindi Grammar (印地語語法), Peking University Press, 1993.
[6] SIL, http://www.sil.org/
[7] Omniglot, http://www.omniglot.com/
[8] Thomas F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Pearson, Taiwan, 2003.
[9] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon, Spoken Language Processing: A Guide to Theory, Algorithm and System Development, Pearson Education Taiwan Ltd, 2005.
[10] Wai C. Chu, Speech Coding Algorithms: Foundation and Evolution of Standardized Coders, John Wiley & Sons, Taiwan, 2003.

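Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law. Thesis access permission: user-defined availability. Available: Campus: available. Off-campus: available. etd-0910112-155529.pdf
Printed copies
Public-access information for printed copies is relatively complete from Academic Year 102 onward. To look up public-access information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available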

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS