Thesis/dissertation record for etd-0910112-151648
Title
A Design of Trilingual Speech Recognition System for Chinese, Taiwanese and Cantonese (國語、台語及粵語三語言語音辨識系統之設計研究)
Department
Year, semester
Language
Degree
Number of pages
59
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2012-07-25
Date of Submission
2012-09-10
Keywords
Phonotactics, hidden Markov model, linear predictive cepstral coefficients, Mel-frequency cepstral coefficients, speech recognition
Statistics
This thesis/dissertation has been browsed 5641 times and downloaded 413 times.
Abstract
Mandarin Chinese, Taiwanese and Cantonese all belong to the Chinese language family. According to statistics from the Summer Institute of Linguistics (SIL), USA, Chinese is spoken by more than 1.2 billion people, more than any other language in the world. The regions where these three languages are spoken play an important role in international trade; Hong Kong and Taiwan, for example, both have busy import and export harbors. Mandarin Chinese, Taiwanese and Cantonese are also all among the seven major Chinese dialects. Mandarin Chinese was recognized as a language by the United Nations early on, and Cantonese was recognized in 2006. Cantonese is widely spoken in Western countries: it is currently the fourth most spoken language in Australia and the third most spoken in Canada and the United States. Phonetically, all three are tonal languages, in which the meaning of a word changes with the pitch or duration of its pronunciation.
This thesis investigates design and implementation strategies for a trilingual speech recognition system for Mandarin Chinese, Taiwanese and Cantonese. Based on the pronunciation characteristics of the three languages, commonly used monosyllables are selected for each language and, together with the tones of each language, serve as the basis for system training and recognition. The implementation extracts two sets of feature parameters, Mel-frequency cepstral coefficients and linear predictive cepstral coefficients, uses hidden Markov models to recognize individual syllables, and finally applies a phonotactic cross-matching mechanism to identify the correct word or phrase. On a personal computer with an AMD Athlon XP 2800+ processor running Ubuntu 9.04, the system reaches correct phrase recognition rates of 88.03%, 86.00% and 86.79% on vocabulary databases of 82,000 Mandarin, 5,129 Taiwanese and 3,051 Cantonese phrases, respectively. Under the same training framework, a trilingual recognition system was also built: 100 common phrases were selected from each language, and for these 300 items the system determines both the language and the phrase. The correct language-phrase recognition rate reaches 97.66%.
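The syllable-recognition step described in the abstract can be illustrated with a minimal sketch: each candidate syllable gets its own hidden Markov model, and an utterance is assigned to the model that gives it the highest likelihood. The sketch below makes several assumptions that are not taken from the thesis: observations are discrete symbols (a real system would score MFCC and LPCC feature vectors), the two models and all probabilities are invented toy values, and the names `forward_log_likelihood`, `recognize` and `MODELS` are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    n_states = len(start)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        # Rescale at every step to avoid underflow on long sequences;
        # the log of the scale factors accumulates to log P(obs | model).
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_like += math.log(scale)
    return log_like


def recognize(obs, models):
    """Pick the model name whose HMM gives the observation the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Toy two-state, left-to-right models over the symbol alphabet {0, 1}.
# Each entry is (start probabilities, transition matrix, emission matrix).
# All numbers are illustrative assumptions, not values from the thesis.
MODELS = {
    "syllable_A": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "syllable_B": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], MODELS))
    print(recognize([1, 1, 0, 0], MODELS))
```

In this toy setup, `syllable_A` tends to emit symbol 0 first and symbol 1 later, while `syllable_B` does the opposite, so the two test sequences are assigned to different models. The phonotactic cross-matching and tone recognition stages mentioned in the abstract are not modeled here.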
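Table of Contents

Thesis approval form i
Acknowledgments ii
Abstract (Chinese) iii
Abstract iv
Table of contents v
List of figures vii
List of tables viii
Chapter 1 Introduction 1
1-1 Research motivation 1
1-2 Research objectives 3
1-3 Outline of the thesis 3
Chapter 2 Introduction to the phonetics of Mandarin, Taiwanese and Cantonese 4
2-1 Overview of the three languages 4
2-1-1 Origins of the three languages 4
2-1-2 Discussion of the three languages 6
2-2 Introduction to phonetic spelling 7
2-3 Introduction to tones 10
2-4 Introduction to tone sandhi 11
Chapter 3 Workflow and mathematical principles of the speech recognition system 15
3-1 Frame energy and zero-crossing rate for syllable segmentation 16
3-2 Linear prediction coefficient error energy for syllable segmentation 17
3-3 Pre-emphasis and windowing as preprocessing for feature extraction 18
3-4 Feature extraction workflow 19
3-4-1 Linear predictive cepstral coefficients 20
3-4-2 Linear predictive cepstral coefficients 26
3-5 Hidden Markov model 30
3-6 The three basic problems of hidden Markov models 32
3-7 Phonotactic cross-matching 39
3-8 Tone recognition 40
Chapter 4 Training strategies for the speech recognition system 43
4-1 Hardware and system parameters 43
4-2 Monosyllable models and training method 44
4-3 Construction of the simulated vocabulary 45
4-4 Relationship between the number of monosyllable training passes and the recognition rate 46
Chapter 4 Training strategies for the speech recognition system 48
References 49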
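Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost or broadcast it without authorization.
Thesis access permission: user-defined availability date
Available:
Campus: available
Off-campus: available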


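Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available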
