博碩士論文 etd-0910112-155203 詳細資訊
Title page for etd-0910112-155203
論文名稱
Title
國語、義大利語及波斯語三語言語音辨識系統之設計研究
A Design of Trilingual Speech Recognition System for Chinese, Italian and Farsi
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
61
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2012-07-25
繳交日期
Date of Submission
2012-09-10
關鍵字
Keywords
語音辨識、線性預估倒頻譜係數、梅爾頻率倒頻譜係數、音位結構學、隱藏式馬可夫模型
Speech recognition, Linear predicted cepstral coefficients, Hidden Markov model, Mel-frequency cepstral coefficients, Phonotactics
統計
Statistics
本論文已被瀏覽 5647 次,被下載 237
This thesis/dissertation has been browsed 5647 times and downloaded 237 times.
中文摘要
中國、義大利與伊朗,這三個在語言、歷史、文化與經濟,看似差異頗大的國家,其實其相互間的交流,早就存在。公元四世紀時,統一中國北方的北魏王朝,與位於今日伊朗的波斯帝國,建立了緊密的經貿聯繫。考古學家經由近年北魏出土銀碗的外觀及材質判知,其與今日存於伊朗的薩珊式波斯銀器,極為相似。由此可知,當時中國與波斯雙方經貿往來之盛行。另外在公元十三世紀,中國的元朝時,義大利旅遊冒險家兼商人馬可波羅,來到東方的中國,帶回許多中國的器物,並寫下傳世著作「馬可波羅遊記」,述說當時中國的美好與進步,是今日中義雙邊往來之濫觴。現代東方社會所喜愛的亞曼尼西裝和法拉利跑車,均產於義大利,代表著亞歐間經貿文化交流的結果。因此,吾人希望建立一套三語言之語音辨識系統,能對國語、義大利語和波斯語之學習,能產生實質的助益。
本論文之語音辨識系統,運用線性預估倒頻譜係數和梅爾頻率倒頻譜係數,來作單音節雙特徵參數之萃取,再經隱藏式馬可夫模型之候選單音排序,最後以音位結構學之比對,來挑選出最佳的辨識結果。國語以錄製一輪2,699筆二字詞的方式,來作單音節訓練之依據;義大利語和波斯語,則使用了陰平與去聲兩類單音,共五輪十次的策略,來作訓練。針對82,000筆國語語詞、27,900筆義大利語詞與4,000筆波斯語詞之資料庫,本實作系統之語詞正確辨識率,可分別達到87.54%、87.48%與90.33%。而平均辨識時間,約在1.5秒之內。吾人運用上述訓練架構,建置一套三語言之辨識系統,各選取100筆各個語言之常用語詞,對此300筆資料做語言別及語詞正確之判定,系統辨識率可達98.67%,而平均辨識時間約為2秒。
Abstract
China, Italy and Iran seem quite different in language, history, culture and economy, yet the three countries have long interacted with one another. In the fourth century, the Chinese Northern Wei Dynasty established close economic and trade ties with the Persian Empire, located in present-day Iran. The Persian language is called Farsi in its native name. Silver bowls unearthed in China in recent years closely resemble, in appearance and material, the Sassanid Persian silverware preserved in Iran. From this, archaeologists infer that ancient China and Persia were close trading partners. In the thirteenth century, during the Chinese Yuan Dynasty, Marco Polo, an Italian traveler, adventurer and merchant, visited China and wrote the celebrated book “The Travels of Marco Polo”. The book describes his remarkable experiences in China and marks the beginning of Sino-Italian relations. Today, Armani suits and Ferrari sports cars, both made in Italy, are popular in modern China, which reflects the results of economic and cultural exchange between Asia and Europe. Our objective is therefore to design a trilingual speech recognition system that helps people learn Chinese, Italian and Farsi.
The system extracts two features from each monosyllable: linear predictive cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC). A hidden Markov model (HMM) then ranks the candidate monosyllables, and phonotactic matching selects the final recognition result. For the Chinese system, one round of recordings of 2,699 two-syllable words is used as the training corpus. For the Italian and Farsi systems, a database of 10 utterances per monosyllable is built by applying each language's pronunciation rules: every monosyllable is read in five rounds, and in each round it is pronounced once with tone 1 and once with tone 4. On databases of 82,000 Chinese, 27,900 Italian and 4,000 Farsi phrases, the systems reach correct recognition rates of 87.54%, 87.48% and 90.33%, respectively, with an average recognition time within 1.5 seconds for each system. Furthermore, a trilingual recognition system is built on the same training framework and tested on 300 common words, 100 from each language, where both the language and the word must be identified correctly. It achieves a 98.67% correct recognition rate with an average recognition time of about 2 seconds.
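The last two stages described in the abstract, ranking candidate monosyllables and then matching them against the vocabulary, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `best_word`, the rank-sum cost, and the toy syllables are assumptions made purely for illustration, and the ranked candidate lists stand in for the output of the HMM stage.

```python
def best_word(candidates, lexicon):
    """Pick the lexicon word best supported by per-syllable candidate lists.

    candidates: one list per syllable position, ordered best-first
                (index 0 is the top-ranked candidate for that position).
    lexicon:    iterable of words, each given as a tuple of syllables.
    Returns the word with the lowest sum of candidate ranks, or None
    if no word of matching length is covered by the candidate lists.
    """
    best, best_cost = None, None
    for word in lexicon:
        if len(word) != len(candidates):
            continue  # syllable count must match the segmented utterance
        cost = 0
        for syllable, ranked in zip(word, candidates):
            if syllable not in ranked:
                break  # this word is not supported by the candidates
            cost += ranked.index(syllable)
        else:
            # Reached only when every syllable was found in its list.
            if best_cost is None or cost < best_cost:
                best, best_cost = word, cost
    return best


# Toy data (hypothetical): two syllable positions, three candidates each.
candidates = [["li", "ni", "mi"], ["hao", "bao", "gao"]]
lexicon = [("ni", "hao"), ("mi", "gao"), ("li", "dao")]
print(best_word(candidates, lexicon))  # ('ni', 'hao')
```

In this toy case the top-ranked first syllable is "li", but no vocabulary word pairs it with any candidate second syllable, so the vocabulary constraint selects ("ni", "hao") instead. A real system would likely weight candidates by acoustic score rather than by rank alone.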
目次 Table of Contents
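Thesis Approval Form i
Acknowledgments ii
Abstract (Chinese) iii
Abstract (English) iv
Table of Contents v
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Research Methods 2
1.3 Thesis Outline 3
Chapter 2 Phonetics of the Three Languages 4
2.1 Mandarin Chinese 4
2.2 Italian 6
2.2.1 Alphabet 8
2.2.2 Vowel and Consonant Pronunciation Rules 9
2.2.3 Syllable Division in Pronunciation 14
2.2.4 Accent Marks and Pronunciation Rules 15
2.3 Farsi 16
2.3.1 Alphabet 16
2.3.2 Vowel and Consonant Pronunciation Rules 17
2.3.3 Special Pronunciations and Symbols in Farsi 20
2.3.4 Syllable and Stress Division in Farsi 21
Chapter 3 Process Architecture of the Speech Recognition System 22
3.1 Syllable Segmentation 23
3.1.1 Energy 23
3.1.2 Zero-Crossing Rate 24
3.1.3 Linear Prediction Coefficient Error Energy 24
3.2 Speech Signal Preprocessing 26
3.2.1 High-Frequency Pre-Emphasis 26
3.2.2 Windowing and Framing 26
3.3 Feature Extraction Procedure 28
3.3.1 Linear Predictive Cepstral Coefficients 28
3.3.2 Mel-Frequency Cepstral Coefficients 30
3.4 Hidden Markov Model 33
3.4.1 Model Parameter Initialization 35
3.4.2 Parameter Re-estimation 36
Chapter 4 Implementation Results and Recognition Performance of the Speech Recognition Systems 42
4.1 Mandarin Recognition System 42
4.2 Italian Large-Vocabulary Recognition System 44
4.3 Farsi Small-Vocabulary Recognition System 46
4.4 Trilingual Recognition System 48
Chapter 5 Conclusions and Future Work 50
References 51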
電子全文 Fulltext
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
開放時間 available 已公開 available
