博碩士論文 etd-0703103-001255 詳細資訊
Title page for etd-0703103-001255
論文名稱
Title
結合隱藏式馬可夫模型與一階動態規劃演算法之連續語音辨識系統.
The Continuous Speech Recognition System Base on Hidden Markov Models with One-Stage Dynamic Programming Algorithm.
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
97
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2003-06-28
繳交日期
Date of Submission
2003-07-03
關鍵字
Keywords
一階動態規劃演算法、隱藏式馬可夫模型、連續語音辨識
Hidden Markov Models, Continuous Speech Recognition, One-Stage Dynamic Programming Algorithm
統計
Statistics
本論文已被瀏覽 5722 次,被下載 0
This thesis/dissertation has been browsed 5722 times and downloaded 0 times.
中文摘要
本文以隱藏式馬可夫模型(Hidden Markov Model)為架構,結合『一階動態規劃演算法』(One Stage Dynamic Programming Algorithm),對非特定語者,設計一連續語音辨識系統。
為期該架構能配合不同硬體之效能,文中設計多種語音特徵參數與之搭配,並針對隱藏式馬可夫模型之建立,進行演算法最佳化。最後運用『狀態長度』( State Duration)及語音暫態資訊(Speech Temporal Information)所擷取出之『音調轉換參數』來提升連續語音辨識率。
經語料庫實驗證實,文中所提出之語音辨識架構,在加入狀態長度及語音轉調特性參數後,其辨識率較傳統之一階動態規劃演算法提高18%,使系統對非特定語者,在獨立字之辨識率高於96%,連結字之辨識率亦可達到74%以上;若將此架構應用於特定語者時,連結字辨識率更可達92%以上。
Abstract
This work presents a speaker-independent continuous speech recognition system for Mandarin digits, built on Hidden Markov Models (HMMs) combined with the One-Stage Dynamic Programming Algorithm.
So that the architecture can match the performance of different hardware, several sets of speech feature parameters were designed to pair with it, and the algorithm used to build the HMMs was optimized. Finally, the “State Duration” and a “Tone Transition Property Parameter” extracted from speech temporal information were used to improve the continuous speech recognition rate.
Experiments on the test database show that, with “state duration” and the “tone transition property parameter” added, the proposed architecture raises the recognition rate by 18% over the conventional one-stage dynamic programming algorithm. For speaker-independent recognition, the system achieves a recognition rate above 96% on isolated words and 74% or higher on connected words. When the same architecture is applied to speaker-dependent recognition, the connected-word recognition rate exceeds 92%.
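As a rough illustration of the decoding idea named in the abstract, the following sketch runs a frame-synchronous one-stage dynamic programming search over left-to-right word HMMs. It is not the thesis's implementation: the function names (`one_stage_decode`, `log_emit`), the one-dimensional features, the unit-variance Gaussian emissions, the uniform transition cost, and the toy word models are all assumptions made for this example, and the state-duration bounds and tone transition parameter are left out.

```python
import math

NEG_INF = float("-inf")

def log_emit(x, mean):
    """Log-likelihood (up to a constant) of a 1-D unit-variance Gaussian."""
    return -0.5 * (x - mean) ** 2

def one_stage_decode(frames, word_states, log_trans=math.log(0.5)):
    """Return the best word sequence for `frames` (hypothetical helper).

    word_states maps each word to the list of state means of its
    left-to-right HMM. Every move (stay, advance, enter a word) costs
    the same log_trans, a simplification for this toy example.
    """
    T = len(frames)
    if T == 0:
        return []
    # Best score per (word, state) and the frame where that word began.
    score = {w: [NEG_INF] * len(m) for w, m in word_states.items()}
    start = {w: [0] * len(m) for w, m in word_states.items()}
    # Best word ending at each frame, kept for the final traceback.
    end_score = [NEG_INF] * T
    end_word = [None] * T
    end_start = [0] * T

    for t, x in enumerate(frames):
        # Score available for starting a new word at frame t.
        entry = 0.0 if t == 0 else end_score[t - 1]
        for w, means in word_states.items():
            old_s, old_b = score[w], start[w]
            new_s, new_b = [], []
            for s, mean in enumerate(means):
                best, begin = old_s[s], old_b[s]            # stay in state s
                if s > 0 and old_s[s - 1] > best:            # advance from s-1
                    best, begin = old_s[s - 1], old_b[s - 1]
                if s == 0 and entry > best:                  # enter the word
                    best, begin = entry, t
                new_s.append(best + log_trans + log_emit(x, mean))
                new_b.append(begin)
            score[w], start[w] = new_s, new_b
            if new_s[-1] > end_score[t]:
                end_score[t], end_word[t], end_start[t] = new_s[-1], w, new_b[-1]

    if end_word[T - 1] is None:
        raise ValueError("utterance too short for any word sequence")
    # Follow word-boundary back-pointers from the last frame to the first.
    words, t = [], T - 1
    while t >= 0:
        words.append(end_word[t])
        t = end_start[t] - 1
    return words[::-1]
```

With the toy models `{"yi": [1.0, 2.0], "er": [5.0, 6.0]}`, the frames `[1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 6.0, 6.0]` decode to `["yi", "er"]`. A bounded state duration could be sketched on top of this by limiting how many consecutive frames the "stay" move may be taken in each state, though how the thesis applies that limit is not described on this page.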
目次 Table of Contents
第一章 緒論(Introduction)
1.1 前言(Foreword) 1
1.2 研究動機與目標(Motive and Purpose) 2
1.3 研究背景及文獻回顧(Research Background and Literature Review) 4
1.4 章節概要(Chapter Outline) 6
第二章 語音前置處理(Pre-processing of Speech Signal)
2.1 簡介(Introduction) 7
2.2 去除直流偏壓(DC Bias Removing) 7
2.3 音框處理(Frame Processing) 8
2.3.1 語音訊號切割(Segment Speech Signal) 8
2.3.2 前置強波處理(Pre-emphasis) 9
2.3.3 加窗處理(Windowing) 10
第三章 語音辨識之特徵參數(Speech Features for Speech Recognition)
3.1 簡介(Introduction) 13
3.2 線性預測倒頻譜參數(Linear Predictive Cepstral Coefficient) 13
3.2.1 線性預測編碼(Linear Predictive Coding) 14
3.2.2 線性預測參數倒頻譜轉換(LPC to Cepstral Coefficient Conversion) 17
3.2.3 參數權重調整(Parameter Weighting) 18
3.2.4 差分倒頻譜參數(Differential Cepstral Coefficient) 19
3.3 梅爾倒頻譜參數(Mel-Frequency Cepstral Coefficient) 20
3.3.1 快速傅利葉轉換(Fast Fourier Transform) 20
3.3.2 梅爾頻譜(Mel-Frequency Spectrum) 21
3.3.3 梅爾倒頻譜轉換(Mel-Frequency Cepstral Conversion) 22
3.4 其他特徵向量(Other Feature Vectors) 24
3.4.1 能量參數(Energy Coefficient) 24
3.4.2 越零率(Zero-crossing Rate) 26
3.4.3 差分化參數及差量化參數(Differential Coefficient and Delta Coefficient) 28
3.4.4 頻率-時間參數(Frequency-Time Coefficient) 29
3.5 本系統之特徵參數(The Presentation of Speech Features for Speech Recognition System) 30
第四章 隱藏式馬可夫模型理論基礎與其建立方式(Theory and Implementation of Hidden Markov Models)
4.1 隱藏式馬可夫模型概述(Introduction of Hidden Markov Models) 31
4.2 隱藏式馬可夫模型之元素(Element of Hidden Markov Model) 32
4.3 隱藏式馬可夫模型之建立(Implementation of Hidden Markov Model) 33
4.4 機率估算(Probability Evaluation) 35
4.4.1 觀測機率(Observation Probability) 35
4.4.2 狀態轉移機率(State-Transition Probability) 37
4.4.3 正算程序(Forward Procedure) 37
4.4.4 逆算程序(Backward Procedure) 38
4.5 狀態序列之最佳化(Optimization of State Sequence) 41
4.5.1 維特比演算法(Viterbi Algorithm) 41
4.5.2 替代式維特比演算法(Alternative Viterbi Algorithm Implementation) 43
4.6 模型參數估測(Estimation of the Model Parameter) 45
4.6.1 切割K均值訓練程序(Segmental K-means Training Procedure) 45
4.6.2 波氏重估程序(Baum-Welch Re-estimation Procedure) 48
第五章 連結字辨識(Connect Word Recognition)
5.1 簡介(Introduction) 51
5.2 一階動態規劃演算法(One Stage Dynamic Programming Algorithm) 51
5.3 狀態邊界長度(Bounded State Duration) 54
5.4 音調轉換參數(Tone Transition Property Parameter) 57
5.5 結合狀態邊界長度及語調轉換特性參數之一階動態規劃演算法(Combining One-Stage Dynamic Programming Algorithm With Bounded State Duration and Tone Transition Property Parameter) 60
第六章 連續語音訊號辨識系統之製作(HMM-Base Continuous Speech Recognition System Implementation)
6.1 簡介(Introduction) 63
6.2 語音前置處理及特徵參數擷取(Pre-processing and Feature Extraction) 65
6.2.1 線性預測編碼為主軸之語音特徵參數組(Feature Coefficient Base on Linear Predictive Coding) 66
6.2.2 梅爾倒頻譜為主軸之語音特徵參數組(Feature Coefficient Base on Mel-Frequency Cepstral) 68
6.3 語音模型之訓練----建立語音辨識之隱藏式馬可夫模型(Training System---- Establish HMM of Speech Recognition) 69
6.3.1訓練語料庫(The Database for Training) 69
6.3.2 建立語音模型(Training of the Models) 70
6.4 語音辨識辨識程序(Speech Recognition System) 71
6.4.1 獨立字辨識程序(Isolate Word Recognition Procedure) 72
6.4.2 連結字辨識程序(Connect Word Procedure) 73
第七章 中文數字辨識實驗 (Mandarin Digits Recognition Experiment)
7.1 簡介(Introduction) 75
7.2 語料庫(Database) 75
7.3 實驗規劃(Experiments Scheme) 76
7.4 收斂臨界值對系統之影響(The Effect of Convergence Threshold) 78
7.4.1 實驗方法(Experiment Method) 78
7.4.2 實驗結果與討論(Experiment Result and Discussion) 78
7.5 音框重疊寬度及語音特徵參數之選用(Using of Overlap Size and Feature Coefficient) 80
7.5.1 實驗方法(Experiment Method) 80
7.5.2 實驗結果與討論(Experiment Result and Discussion) 80
7.6 狀態數對系統之影響(The Effect of State Number to The System) 83
7.6.1 實驗方法(Experiment Method) 83
7.6.2 實驗結果與討論(Experiment Result and Discussion) 84
7.7 狀態邊界長度對系統影響(The Effect of Bounded State Duration) 85
7.7.1 實驗方法(Experiment Method) 86
7.7.2 實驗結果與討論(Experiment Result and Discussion) 86
7.8 音調轉換參數對系統之影響(The Effect of Tone Transition Property Parameter) 86
7.8.1 實驗方法(Experiment Method) 86
7.8.2 實驗結果與討論(Experiment Result and Discussion) 86
7.9 非特定語者之系統應用於特定語者之影響(The Effect of Applying Independent Speaker System to Dependent Speaker System) 88
7.9.1 實驗方法(Experiment Method) 88
7.9.2 實驗結果與討論(Experiment Result and Discussion) 88
7.10 辨識演算法HMM與BPNN之比較(The Comparison Between Recognition Algorithm HMM and BPNN) 89
7.10.1 實驗方法(Experiment Method) 89
7.10.2 實驗結果與討論(Experiment Result and Discussion) 89
7.11 PC平台效能(The Performance on PC) 90
7.11.1 實驗方法(Experiment Method) 91
7.11.2 實驗結果與討論(Experiment Result and Discussion) 91
第八章 結論與展望(Conclusion and Prospect)
8.1 結論(Conclusion) 93
8.2 展望(Prospect) 94
參考文獻(Reference) 95
參考文獻 References
[1] X.D. Huang and K.F. Lee, “On Speaker-Independent, Speaker-Dependent, and Speaker-Adaptive Speech Recognition,” IEEE Trans on ASSP,1991
[2] P. Woodland, “Speech Recognition,” IEE, 1998.
[3] H. Sakoe and S. Chiba, “Dynamic Programming Optimization for Spoken Word Recognition,” IEEE Trans on ASSP, Vol.26, pp 43-49, Feb. 1978.
[4] C. Myers and L.R. Rabiner, “Performance Tradeoffs in Dynamic Time Warping Algorithms for Isolated Word Recognition,” IEEE Trans on ASSP, Vol.28, No.6, pp 623-635, Dec. 1980.
[5] D.P. Morgan and C.L. Scofield, Neural Networks and Speech Processing, Kluwer Academic, 1991.
[6] L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proc. IEEE, Vol.77, No.2, pp 257-286, Feb. 1989.
[7] L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall, pp 200-232, 1993.
[8] E. Keller, Fundamentals of Speech Synthesis and Speech Recognition Basic Concepts, State of the Art and Future Challenges, John Wiley and Sons, 1994.
[9] L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall 1993.
[10] L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
[11] S.B. Davis and P. Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” IEEE Trans on ASSP, Vol.28, No.4, pp357-366, Aug. 1980.
[12] A.V. Oppenheim and R.W. Schafer, with J.R. Buck, Discrete-Time Signal Processing, Prentice Hall, 1999.
[13] C. Becchetti and L. Prina Ricotti, Speech Recognition: Theory and C++ Implementation, John Wiley & Sons, Ltd, 1999.
[14] J.D. Markel and A.H. Gray, Jr., Linear Prediction of Speech, Springer-Verlag, Berlin/New York, 1976.
[15] A.H. Gray and J.D. Markel, “Distance measures for speech processing,” IEEE Trans. Acoust. Speech Signal Process., 24, pp 380-391, 1976.
[16] B.H. Juang, L.R. Rabiner, and J.G. Wilpon, “On the use of bandpass liftering in speech recognition,” IEEE Trans. Acoust. Speech Signal Process., 35(7), pp 947-959, 1987.
[17] X.D. Huang, Y. Ariki, and M.A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh University Press, Chap. 7, pp 187-205, 1990.
[18] Yumin Lee and Lin-Shan Lee, “Continuous Hidden Markov Models integrating transitional and instantaneous features for Mandarin syllable recognition,” Computer Speech and Language, vol. 7, pp 247-263, 1993.
[19] L.R. Rabiner and B.H. Juang, “Introduction to Hidden Markov Models,” IEEE ASSP Magazine, pp 4-16, Jun. 1986.
[20] L.R. Rabiner and C.H. Lee, “A frame-synchronous network search algorithm for connected word recognition,” Acoustics, Speech and Signal Processing, IEEE Transactions on, Vol. 37, Issue 11, Nov. 1989.
[21]T. K. Vintsyuk, “Element-wise recognition of continuous speech composed of words from a specified dictionary,” Kibernetika, vol. 7, pp. 133-143, Mar.-Apr. 1971
[22] L.R. Rabiner, J.G. Wilpon, and B.H. Juang, “A Segmental k-means training procedure for connected word recognition based on whole word reference patterns,” AT&T Tech. J., vol. 65, no. 3, pp 21-31, May/June 1986.
[23] H. Ney, “The use of a one-stage Dynamic Programming Algorithm for connected word recognition,” IEEE Trans. Acoustics, Speech, Signal Proc., vol. 32, no. 2, pp 263-271, April 1984.
[24]S. E. Levinson, L. R. Rabiner, and M. M. Sondhi, “An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition,” Bell Syst. Tech. J., vol. 62, no. 4, pp. 1035-1074, Apr. 1983
[25]A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Roy. Stat. Soc., vol. 39, no. 1, pp. 1-38, 1977.
[26]L. A. Liporace, “Maximum likelihood estimation for multivariate observations of Markov sources,” IEEE Trans. Informat. Theory, vol. IT-28, no. 5, pp. 729-734, 1982.
[27]B. H. Juang, “Maximum likelihood estimation for mixture multivariate stochastic observations of Markov chains,” AT&T Tech. J., vol. 64, no. 6, pp. 1235-1249, July-Aug. 1985.
[28]B. H. Juang, S. E. Levinson, and M. M. Sondhi, “Maximum likelihood estimation for multivariate mixture observations of Markov chains,” IEEE Trans. Informat. Theory, vol. IT-32, no. 2, pp. 307-309, Mar. 1986.
[29]rshtein, “Robust Parametric Modeling of Durations in Hidden Markov Models,” Speech and Audio Processing, IEEE Transactions on, Volume: 4 Issue: 3, pp. 240 -242 , May 1996.
[30]Ramachandrula, S. Thippur, “Connected phoneme HMMs with implicit duration modelling for better speech recognition,” Information, Communications and Signal Processing, 1997. ICICS., Proceedings of 1997 International Conference on, pp. 1024-1028 vol.2, 1997.
[31]P. Ramesh, J.G. Wilpon, “Modeling state durations in hidden Markov models for automatic speech recognition,” Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on , Volume: 1 , pp. 381-384 vol.1, 1992.
[32]H. Y. Gu; C. Y. Tseng; L. S. Lee, “Isolated-utterance speech recognition using hidden Markov models with bounded state durations,” Signal Processing, IEEE Transactions on, Volume: 39 Issue: 8 , pp. 1743 -1752, Aug. 1991.
[33]Wen-Shuo Chang and Chin-Teng Lin “Development of Principal Space HMM Algorithm and Its Application to Continuous Mandarin Digits Recognition,” National Chiao Tung University Department of Electrical and Control Engineering College of Electrical Engineering Dissertation of Master,2001
[34]張智星, Matlab程式設計與應用,清蔚科技出版社,1999.
[35]楊廷 江高飛, PC Matlab入門與實例應用,?眳p資訊,1993.
[36]鄭錦聰, Matlab程式設計基礎篇,全華科技圖書股份有限公司,1992.
[37]F. Jelinek, “Continuous speech recognition by statistical methods,” Proc. IEEE, vol. 64, pp. 532-536, Apr. 1976.
[38]R. Bakis, “Continuous speech word recognition via centi-second acoustic states,” in Proc. ASA Meeting (Washington, DC), Apr. 1976.
[39]陳松琳,”以類神經網路為架構之語音辨識系統,”國立中山大學電機系碩士論文,2002.
電子全文 Fulltext
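本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。
The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.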
論文使用權限 Thesis access permission:校內校外均不公開 not available
開放時間 Available:
校內 Campus:永不公開 not available
校外 Off-campus:永不公開 not available

紙本論文 Printed copies
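紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services (圖資處). We apologize for any inconvenience.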
開放時間 Available:已公開 available
