Title page for etd-0827111-202317
A Design of Recognition Rate Improving Strategy For English Speech Recognition System
Keywords: Mel-frequency cepstral coefficients, phonotactics, hidden Markov model, linear predictive cepstral coefficients, English speech recognition system
This thesis has been browsed 5,696 times and downloaded 1,018 times.
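This thesis investigates strategies for improving the recognition rate of an English speech recognition system. The 989 monosyllables commonly used in English serve as the basis for training and recognition. Each monosyllable class is recorded in tone 1 for one round and in tone 4 for the next; the two tones alternate over fourteen recorded rounds, and the resulting voice characteristics form the training corpus. The pitch period is used to locate the trailing endpoint of the speech, which improves the accuracy of endpoint marking. The system uses Mel-frequency cepstral coefficients and linear predictive cepstral coefficients as feature parameters and hidden Markov models as the monosyllable recognition models, with the number of model states adjusted to 10; phonotactics is then used for matching to improve the recognition rate. On a notebook computer with a 2.4 GHz Intel Core i5 CPU M450 running Fedora 14, the system achieves a correct recognition rate of 92.94% on 6,812 English phrases, with an average recognition time within about 1.5 seconds.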
Britain established its maritime hegemony in 1588. English then spread with British colonial activity to North America, India, Africa, and Australia. After World War I ended in 1918, the U.S. became the most powerful nation in the world economy, and the world's financial center shifted from London to New York. After World War II ended in 1945, the U.S. came to play an indispensable role in international politics, economics, and technology. The United Nations, founded on October 24, 1945, adopted English, Chinese, French, Spanish, Arabic, and Russian as its six working languages. These historical events drove successive waves of language expansion and made English the most widely used international language. Besides its political, economic, and technological strengths, Britain is home to the British Museum, the largest comprehensive museum in the world. Located in London and established in 1753, the museum holds more than 13 million cultural and archaeological relics from around the world, so Britain's cultural resources are remarkably rich. Our objective is to build a language system that helps us learn English more effectively while broadening our horizons.
This thesis investigates strategies for improving the recognition rate of an English speech recognition system. The speech features of 989 common English monosyllables form the basis for training and recognition. A training database is built by reading each of the 989 monosyllables for 14 rounds, alternating between two tones: odd rounds use the high pitch of tone 1, and even rounds use the falling pitch of tone 4. A pitch-period frame method is applied to improve the accuracy of endpoint detection. Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) serve as the two feature sets, and a hidden Markov model (HMM) serves as the recognition model. The number of HMM states is set to 10, and phonotactic rules are used to further improve the recognition rate. On a notebook computer with a 2.4 GHz Intel Core i5 CPU M450 running Fedora 14, the system reaches a 92.94% correct phrase recognition rate on a database of 6,812 English phrases. The average computation time per phrase is within 1.5 seconds.
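The MFCC feature extraction named in the abstract can be sketched as follows. This is a minimal illustration of the standard pipeline (pre-emphasis, frame blocking, windowing, discrete Fourier transform, mel filter bank, discrete cosine transform), not the thesis's own implementation. The function names `mel_filter_bank` and `mfcc` are hypothetical, and every parameter value (16 kHz sampling rate, 400-sample frames, 160-sample hop, 20 filters, 12 coefficients, pre-emphasis factor 0.97) is an assumption, since this record does not state the settings used in the thesis.

```python
import numpy as np


def mel(f_hz):
    """Convert frequency in Hz to the mel scale."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)


def inv_mel(m):
    """Convert a mel-scale value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = inv_mel(np.linspace(0.0, mel(sample_rate / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):      # rising slope
            bank[i, k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope, peak of 1 at centre
            bank[i, k] = (right - k) / (right - centre)
    return bank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=20, n_ceps=12, pre_emph=0.97):
    """Return an (n_frames, n_ceps) array; all defaults are assumed values."""
    signal = np.asarray(signal, dtype=float)
    # 1. Pre-emphasis: boost high frequencies with a first-order filter.
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # 2. Frame blocking: overlapping frames of frame_len samples.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # 3. Windowing: taper each frame with a Hamming window.
    frames = frames * np.hamming(frame_len)
    # 4. Discrete Fourier transform: power spectrum of each zero-padded frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # 5. Mel filter bank: log energy in each triangular band.
    energies = power @ mel_filter_bank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # 6. Discrete cosine transform (type II), keeping coefficients 1..n_ceps.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    return log_energies @ basis.T


if __name__ == "__main__":
    # One second of a 440 Hz tone stands in for a recorded syllable.
    t = np.arange(16000) / 16000.0
    features = mfcc(np.sin(2 * np.pi * 440.0 * t))
    print(features.shape)
```

Each row of the returned array is one frame's feature vector. In a system like the one described, such vectors would be the observation sequence passed to the HMM for each syllable, possibly concatenated with LPCC features.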
Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables

Chapter 1 Introduction
1-1 Research Motivation
1-2 Research Objectives
1-3 Chapter Overview

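Chapter 2 The Historical Evolution of English and Its Pronunciation Characteristics
2-1 Classification and Usage of Language Families
2-2 History of the Evolution of English
2-3 English Pronunciation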

Chapter 3 Introduction to Speech Signal Processing Techniques
3-1 Syllable Endpoint Marking
3-1-1 Energy
3-1-2 Zero Crossing Rate
3-1-3 Linear Prediction Coefficients Error Energy
3-2 MFCC Feature Extraction
3-2-1 Pre-emphasis
3-2-2 Frame Blocking
3-2-3 Windowing
3-2-4 Discrete Fourier Transform
3-2-5 Mel-Frequency Filter Bank
3-2-6 Discrete Cosine Transform
3-2-7 LPC-Cepstrum

4-1 Signal Models
4-2 Introduction to the Hidden Markov Model
4-3 The Three Problems of the Hidden Markov Model
4-4 Building the Hidden Markov Model
4-4-1 Initialization
4-4-2 Computing the Probability of the State Observation Sequence
4-4-3 Parameter Estimation
4-5 Reestimation
4-5-1 Forward Procedure
4-5-2 Backward Procedure
4-5-3 Forward-Backward Procedure
4-5-4 Reestimation of the State Transition Probability Matrix
4-5-5 Reestimation of the State Observation Probability Matrix
4-6 Viterbi Algorithm

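Chapter 5 Introduction to the English Speech Recognition System
5-1 System Architecture
5-1-1 Model Training System Architecture
5-1-2 Recognition System Architecture
5-2 Equipment and Environment
5-3 Design Method
5-3-1 Phonetic Symbol Codes
5-3-2 Selection of Syllable Classes and Construction of the Phrase Database
5-3-3 Detection of Speech Activity and Cessation
5-3-4 Syllable Endpoint Marking
5-3-5 Acoustic Feature Extraction
5-3-6 HMM Training Method
5-3-7 Decision Method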

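Chapter 6 Recognition Strategy Study and Experimental Design
6-1 Experimental Parameter Settings and Number of Simulated Phrases
6-2 Experiments on the Number of Training Rounds and the Training Method
6-2-1 Number of Training Rounds for Monosyllable Class Models
6-2-2 Training Method for Monosyllable Class Models
6-3 Experiment on Changing the Endpoint Marking Method
6-3-1 Effect on the Recognition Rate of Endpoint Marking That Preserves Complete Consonants
6-4 Experiment with Multiple Acoustic Feature Parameters
6-4-1 Effect of Single versus Multiple Acoustic Features on the Recognition Rate
6-5 Experiment on Adjusting the Number of HMM States
6-5-1 Effect of the Number of HMM States on the Recognition Rate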

Chapter 7 Conclusions and Future Work
7-1 Conclusions
7-2 Future Work

References
