Title page for etd-0207113-161555
Title
高斯混合模型應用在自動語音辨識特徵補償於四種語言噪音性數字語料之評估
Gaussian Mixture Model with Application to Automatic Speech Recognition Feature Compensation in the Evaluation of Noisy Digital Corpora of Four Languages
Department
Year, semester
Language
Degree
Number of pages
48
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2013-01-30
Date of Submission
2013-02-07
Keywords
GMM, Noise Robustness, MMSE, AURORA 3.0
Statistics
The thesis/dissertation has been browsed 5703 times and downloaded 1734 times.
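Chinese Abstract
Traditional noise-robustness methods often use a Minimum Mean Square Error (MMSE) feature transformation to estimate clean feature parameters. To preserve the continuity and smoothness of the original features, this thesis adopts a noise-robustness method based on the Gaussian mixture model (GMM) that removes the noise rather than estimating clean speech. We assume that, in a parallel corpus, the less noisy recording is the clean one. Using trained GMMs, we find the mean vectors corresponding to the noise and, following the idea of mean subtraction, obtain the offset that the noise induces in the speech features. Finally, MMSE is used to estimate the distance between the noisy features of the parallel corpora, and this estimate is used to improve the features of the noisier speech. In the experiments, the AURORA 3.0 corpus serves as the benchmark for noise robustness. Each test utterance is first passed through a trained noise classifier, which determines its noise type and selects the corresponding GMM mapping model, yielding the noise mean vectors under that model. The noise mean vectors are then combined linearly with unequal weights to produce the noise feature, after which simple addition and subtraction achieve the noise reduction.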
Abstract
  Traditional noise-robustness methods usually apply a Minimum Mean Square Error (MMSE) feature transformation to estimate clean features. To maintain the smoothness and continuity of the original features, we use a noise-robustness method based on the Gaussian Mixture Model (GMM) that removes the noise instead of estimating the clean features. Our method assumes that the less noisy side of a parallel corpus is the clean one. We find the mean vectors corresponding to the noise by using trained Gaussian Mixture Models, and use the concept of mean subtraction to calculate the offset that the noise causes in the speech features. Finally, we estimate the distance between the noisy features of the parallel corpora by MMSE and subtract it from the noisier features. We use the AURORA 3.0 corpus to evaluate noise-robustness performance. Each test utterance is classified by a trained noise classifier, which selects the corresponding GMM mapping model; the noise mean vectors under this model are then combined linearly with unequal weights to generate the noise feature. Finally, the noise is removed by simple subtraction.
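The compensation step described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: it assumes a diagonal-covariance GMM that has already been trained on the noisier features, stores one offset vector per component (a simplification in the spirit of posterior-weighted bias removal, rather than the joint-vector mapping model the thesis trains), and omits the noise classifier that would choose among several such models. All function names, parameters, and the toy numbers are hypothetical.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Component posteriors p(k | x), computed in the log domain for stability."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]

def estimate_offsets(noisy, clean, weights, means, variances):
    """Per-component offset: posterior-weighted mean of (noisy - clean) over parallel frames."""
    k, d = len(weights), len(noisy[0])
    num = [[0.0] * d for _ in range(k)]
    den = [0.0] * k
    for x, y in zip(noisy, clean):
        post = posteriors(x, weights, means, variances)
        for j, p in enumerate(post):
            den[j] += p
            for i in range(d):
                num[j][i] += p * (x[i] - y[i])
    # Guard against components that received (almost) no frames.
    return [[n / max(den[j], 1e-12) for n in num[j]] for j in range(k)]

def compensate(x, weights, means, variances, offsets):
    """Subtract the unequal-weight (posterior-weighted) combination of offsets from a noisy frame."""
    post = posteriors(x, weights, means, variances)
    shift = [sum(p * off[i] for p, off in zip(post, offsets))
             for i in range(len(x))]
    return [xi - s for xi, s in zip(x, shift)]

# Toy, made-up parameters and frames (assumptions for illustration only).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
noisy = [[0.0, 0.0], [0.5, -0.5], [10.0, 10.0], [9.5, 10.5]]
clean = [[-1.0, -1.0], [-0.5, -1.5], [7.0, 7.0], [6.5, 7.5]]

offsets = estimate_offsets(noisy, clean, weights, means, variances)
restored = compensate([0.2, 0.1], weights, means, variances, offsets)
print(offsets)
print(restored)
```

With these toy frames the two components are far apart, so each frame is assigned almost entirely to one component; the estimated offsets come out very close to 1 per dimension for the first component and 3 for the second, and the test frame is shifted by roughly the first offset. In a fuller system, one such model per noise type would be kept and the classifier's decision would pick which set of offsets to apply.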
Table of Contents
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Motivation
1.2 Background and Literature Review
1.3 Thesis Organization
Chapter 2 Review of Related Work
2.1 Introduction to Gaussian Mixture Models
2.1.1 Parameter Estimation for a Single Gaussian Probability Density Function
2.1.2 Parameter Estimation for a Gaussian Mixture Density Function
2.2 K-means Clustering
2.3 Building the Gaussian Mixture Models
2.3.1 Noise Classifier
2.3.2 Mapping Model
2.4 Maximum a Posteriori
2.5 Minimum Mean Square Error
Chapter 3 Methodology
3.1 System Architecture
Chapter 4 Experiments
4.1 Experimental Corpora
4.2 Experimental Setup
4.2.1 Setup for GMM Compensation
4.2.2 Setup for the Effect of Classification Accuracy on GMM Compensation
4.3 Experimental Results
4.3.1 GMM Mapping Compensation Results with the Classifier
4.3.2 Effect of Classification Accuracy on GMM Mapping Compensation
4.4 Discussion
4.4.1 Comparison across Noise Environments
4.4.2 Comparison across the Three Matching Conditions
4.4.3 Comparison across the Four Languages and AURORA 2.0
4.4.4 Effect of Classification Accuracy on the Experiments
4.5 Discussion: Classifier
4.5.1 Differences in Classification Accuracy across Languages
4.5.2 Differences in Classification Accuracy across Matching Conditions
Chapter 5 Conclusion and Future Work
List of Tables
4.1 Contents of the AURORA 3.0 Danish corpus
4.2 Contents of the AURORA 3.0 Finnish corpus
4.3 Contents of the AURORA 3.0 German corpus
4.4 Contents of the AURORA 3.0 Spanish corpus
4.5 Experimental setup of the Danish noise classifier and mapping model
4.6 Experimental setup of the Finnish noise classifier and mapping model
4.7 Experimental setup of the German noise classifier and mapping model
4.8 Experimental setup of the Spanish noise classifier and mapping model
4.9 Baseline recognition results for the four AURORA 3.0 languages
4.10 Recognition results for the four AURORA 3.0 languages after GMM Mapping compensation
4.11 Relative improvement of GMM Mapping compensation for the four AURORA 3.0 languages
4.12 Recognition results for the four AURORA 3.0 languages after GMM Mapping compensation with ideal classification
4.13 Relative improvement of GMM Mapping with ideal classification for the four AURORA 3.0 languages
4.14 Recognition results for each language and noise environment in AURORA 3.0
4.15 Comparison of results with the classifier and with ideal classification
4.16 Noise classifier results for each AURORA 3.0 language
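List of Figures
3.1 Experimental architecture. x = x_11, ..., x_1d, ..., x_k1, ..., x_kd are the MFCC features of speech recorded with the HF microphone; y = y_11, ..., y_1d, ..., y_k1, ..., y_kd are the MFCC features of speech recorded with the CT microphone; z is the concatenated feature formed by joining x and y frame by frame; x̂ is the result of mapping compensation applied to test data that the classifier has judged to be CT or HF.
4.1 Baseline recognition rates by noise environment under the well-matched condition of AURORA 3.0
4.2 Recognition rates of the AURORA 3.0 baseline and after GMM Mapping
4.3 Training data distribution for the medium-mismatch condition in the four AURORA 3.0 languages
4.4 Relative improvement of GMM mapping: the four AURORA 3.0 languages compared with AURORA 2.0
4.5 Relationship between speech characteristics and relative improvement; cases with higher improvement were originally more affected by noise
4.6 Comparison of results with the classifier and with ideal classification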
References
[1] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113–120, 1979.
[2] A. D. Berstein and I. D. Shallom, “An hypothesized Wiener filtering approach to noisy speech recognition,” in Proceedings of 1991 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, pp. 913–916, Apr. 1991.
[3] C. Cerisara, S. Demange, and J. P. Haton, “On noise masking for automatic missing data speech recognition: A survey and discussion,” Computer Speech & Language, vol. 21, no. 3, pp. 443–457, 2007.
[4] D. Macho and Y. M. Cheng, “SNR-dependent waveform processing for improving the robustness of ASR front-end,” in Proceedings of 2001 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 305–308, 2001.
[5] S. Furui, “Cepstral analysis technique for automatic speaker verification,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 29, pp. 254–272, Apr. 1981.
[6] O. Viikki, D. Bye, and K. Laurila, “A recursive feature vector normalization approach for robust speech recognition in noise,” in Proceedings of 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, pp. 733–736, Nokia Research Center, May 1998.
[7] Ángel de la Torre, A. M. Peinado, J. C. Segura, J. L. Pérez-Córdoba, M. C. Benítez, and A. J. Rubio, “Histogram equalization of speech representation for robust speech recognition,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 3, pp. 355–366, 2005.
[8] K.-H. Wu, “Empirical mode decomposition for noise-robust automatic speech recognition,” Master’s thesis, Department of Computer Science and Engineering, National Sun Yat-sen University, 2010.
[9] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, “The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis,” Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.
[10] S.-C. Chiou, “Auditory based modification of MFCC feature extraction for robust automatic speech recognition,” Master’s thesis, Department of Computer Science and Engineering, National Sun Yat-sen University, 2009.
[11] M.-L. Hsu, “Data-driven rescaling of energy features for noisy speech recognition,” Master’s thesis, Department of Computer Science and Engineering, National Sun Yat-sen University, 2012.
[12] P.-F. Wu, “Voice command for Google Map,” Master’s thesis, Department of Computer Science and Engineering, National Sun Yat-sen University, 2012.
[13] D.-S. Kim, R. M. Kil, and S.-Y. Lee, “Auditory processing of speech signals for robust speech recognition in real-world noisy environments,” IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1, pp. 55–69, 1999.
[14] H.-B. Chen, “On the study of energy-based speech feature normalization and application to voice activity detection,” Master’s thesis, Dept. of Computer Science & Information Engineering, National Taiwan Normal University, 2007.
[15] Y.-C. Chen, “Abnormal pedestrian behavior analysis using trajectory features,” Master’s thesis, Department of Computer Science & Information Engineering, National Central University, 2006.
[16] B.-F. Yeh, “Gaussian mixture model-based feature compensation with application to noise-robust speech recognition,” Master’s thesis, Department of Computer Science and Engineering, National Sun Yat-sen University, 2012.
[17] H. Zen, Y. Nankaku, and K. Tokuda, “Continuous stochastic feature mapping based on trajectory HMMs,” IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 2, pp. 417–430, 2011.
[18] C.-P. Chen and J. A. Bilmes, “MVA processing of speech features,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 1, pp. 257–270, 2007.
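Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available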


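Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available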
