博碩士論文 etd-0812111-144756 詳細資訊
Title page for etd-0812111-144756
論文名稱
Title
葡文語音辨識系統之設計研究
A Design of Portuguese Speech Recognition System
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
72
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2011-07-18
繳交日期
Date of Submission
2011-08-12
關鍵字
Keywords
線性預估倒頻譜係數、梅爾倒頻譜係數、葡萄牙文語音辨識系統、隱藏式馬可夫模型、音位結構學
Phonotactics, Portuguese Speech recognition system, Linear predicted cepstral coefficients, Hidden Markov model, Mel-frequency cepstral coefficients
統計
Statistics
本論文已被瀏覽 5699 次,被下載 2357
This thesis has been viewed 5699 times and downloaded 2357 times.
中文摘要
近年來,電腦龍頭IBM與著名語音科技公司Nuance,相繼在市場上推出許多語音辨識之應用技術,兩家公司不約而同地加快在汽車、通訊及八大產業上之佈局與拓展。其範圍含括銀行、電子、能源與公用事業,醫療與生命科學,保險、媒體與娛樂,以及旅遊服務與交通運輸等產業。語音科技的成熟所能體現的智慧生活型態,是過去我們所無法想像的,如此完善的技術將帶給我們前所未有的舒適與愜意。2011年4月製造業大廠富士康決定在巴西投資120億美元,設立iPad 及iPhone製造工廠,而巴西是世界上最大葡萄牙語使用人口的國家。因此,建立一套葡萄牙文語音辨識系統,提供葡萄牙語學習,進而了解當地文化,擴展旅遊與生活視野,為本研究之主要目的。
本論文主要探討葡萄牙文語音辨識系統之設計與實作策略。系統之架構是以葡萄牙語常用單音節為基礎,依據葡萄牙語發音方式,歸納出303類單音以作為主要訓練與辨識之模型。每類單音由五輪之錄製而得,每輪依序由第1類錄至第303類,每類一次唸陰平一聲與去聲四聲兩音來詮釋非重音與重音之差別。經此訓練策略,每類單音將可獲得10次之訓練語料。本系統採用梅爾頻率倒頻譜係數與線性預估倒頻譜係數來作特徵參數之萃取,以隱藏式馬可夫模型來作單音之辨識,最後輔以音位結構學的判別,辨識出待測語詞之結果。研究結果顯示,在CPU時脈為 2.2 GHz的AMD Athlon XP 2800+之個人電腦與Ubuntu 9.04 作業系統下,針對3900筆葡萄牙文日常生活語詞之資料庫,本系統可獲得87.26%的正確辨識率。系統所需辨識時間約為1.5秒,而總訓練時間約為2小時。
Abstract
IBM, a leading computer company, and Nuance, a well-known speech technology firm, have released numerous speech recognition applications in recent years. Both companies are rapidly expanding into the automotive and communications sectors and into eight major industries: banking, electronics, energy and utilities, healthcare and life sciences, insurance, media and entertainment, retail, and travel and transportation. Mature speech technology enables a level of convenience in everyday life that was previously unimaginable. In April 2011, the manufacturer Foxconn decided to invest 12 billion US dollars in iPhone and iPad factories in Brazil, the country with the largest Portuguese-speaking population in the world. The objective of this research is therefore to build a Portuguese speech recognition system that supports learning Portuguese, understanding the local culture, and broadening horizons in travel and daily life.
This thesis investigates design and implementation strategies for a Portuguese speech recognition system. The system is built on 303 classes of common Portuguese monosyllables, derived from Portuguese pronunciation rules, which serve as the main training and recognition models. Each monosyllable is recorded in five rounds, and in each round it is read twice: first with the high pitch of tone 1 and then with the falling pitch of tone 4, to capture the difference between unstressed and stressed syllables. This yields 10 training utterances per monosyllable. Mel-frequency cepstral coefficients and linear predictive cepstral coefficients are used as the two feature sets, and hidden Markov models are used for monosyllable recognition; phonotactic rules are then applied to determine the recognized phrase. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ CPU running Ubuntu 9.04, the system achieves a correct recognition rate of 87.26% on a database of 3,900 everyday Portuguese phrases. The average recognition time is less than 1.5 seconds, and the total training time is about two hours.
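The recognition step described in the abstract, scoring an utterance against one hidden Markov model per monosyllable and choosing the best-scoring model, can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes discrete observation symbols (a hypothetical two-symbol codebook standing in for quantized MFCC or LPCC vectors), and the model names, the two-state left-to-right topology, and all probabilities are invented for illustration.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, via the forward algorithm."""
    n = len(start)
    # Initialisation: start probability times emission of the first symbol.
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    # Induction: sum over predecessor states, then emit the next symbol.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    # Termination: sum over all final states.
    return log_sum_exp(alpha)


def recognize(obs, models):
    """Return the label of the model with the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical toy models: (start, transition, emission) for two syllables.
MODELS = {
    "pa": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "to": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], MODELS))
    print(recognize([1, 1, 0, 0], MODELS))
```

Working in the log domain avoids numerical underflow on long observation sequences. A system with continuous features would typically replace the discrete emission table with a density over feature vectors.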
目次 Table of Contents
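Thesis Approval Form i
Abstract (Chinese) ii
Abstract (English) iii
Table of Contents iv
List of Figures vii
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Research Scope 3
1.4 Thesis Outline 4
Chapter 2 Introduction to Portuguese 5
2.1 Overview of the Language Family 5
2.2 Introduction to the Portuguese Language 8
2.2.1 Old Portuguese: The Galician-Portuguese Period 8
2.2.2 Old Portuguese: The Age of Discovery 9
2.2.3 Modern Portuguese 9
2.2.4 Current Usage of Portuguese 10
2.3 Development of Portuguese Writing 11
2.3.1 Vowel Pronunciation Rules 12
2.3.2 Consonant Pronunciation Rules 14
2.3.3 Phonetic Symbols 17
Chapter 3 Principles and Techniques of Speech Recognition 18
3.1 Speech Preprocessing 18
3.1.1 Syllable Segmentation 18
3.1.2 Audio Preprocessing 20
3.2 Introduction to Feature Parameters 23
3.2.1 Linear Predictive Cepstral Coefficients 23
3.2.2 Mel-Frequency Cepstral Coefficients 30
3.3 Introduction to Hidden Markov Models 35
3.3.1 Estimating State-Path Probabilities 36
3.3.2 Optimal State Sequence 40
3.3.3 Model Parameter Estimation 42
Chapter 4 System Flow and Architecture 44
4.1 System Flow 44
4.2 Selection of Monosyllables 47
4.3 Selection of the Number of Training Utterances per Monosyllable Model 49
4.4 Phonotactic Matching 53
4.5 Hardware Architecture and Specifications 55
Chapter 5 System Implementation and Simulation 56
5.1 Recognition System for Common Portuguese Words and Phrases 56
5.2 Recognition System for Portuguese Personal Names 58
Chapter 6 Conclusions and Future Work 60
References 62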
參考文獻 References
[1] SIL, http://www.sil.org/
[2] Wikipedia (Chinese edition), http://zh.wikipedia.org/
[3] 王鎖瑛, Portuguese Grammar (葡萄牙語語法), Shanghai Foreign Language Education Press.
[4] Omniglot, http://www.omniglot.com/
[5] Thomas F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice, Pearson, Taiwan, 2003.
[6] 王小川, Speech Signal Processing (語音訊號處理), Chuan Hwa Book Co., 2004.
[7] X. Huang, A. Acero, and H.W. Hon, Spoken Language Processing, Prentice Hall, Taiwan, 2000.
[8] Wai C. Chu, Speech Coding Algorithms, Wiley, Taiwan, 2003.
[9] S. Cerqueira Bisp dos Santos and A. Alcaim, “Reduced sets of subword units for continuous speech recognition of Portuguese,” Electronics Letters, pp. 586-588, Mar. 2000.
[10] V. Pera, F. Sa, P. Afonso, and R. Ferreira, “Audio-visual speech recognition in a Portuguese language based application,” IEEE International Conference on Industrial Technology (ICIT 2003), pp. 688-692, 2003.
[11] M. E. Dajer, J. C. Pereira and C. D. Maciel, “Nonlinear dynamical analysis of normal voices,” Seventh IEEE International Symposium on Multimedia, Dec. 2005.
[12] Andrade De A. Bresolin, A. D. D. Neto and P. J. Alsina, “Digit recognition using wavelet and SVM in Brazilian Portuguese,” IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP 2008), pp. 1545-1548, 2008.
電子全文 Fulltext
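This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.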
論文使用權限 Thesis access permission: 自定論文開放時間 user-defined release date
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
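Availability information for printed theses is relatively complete from academic year 102 (2013-14) onward. To inquire about printed theses from academic year 101 (2012-13) or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.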
開放時間 Available: 已公開 available
