國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,資料驅動能量特徵調整於雜訊性語音辨識,Data-Driven Rescaling of Energy Features for Noisy Speech Recognition

論文名稱 Title	資料驅動能量特徵調整於雜訊性語音辨識 Data-Driven Rescaling of Energy Features for Noisy Speech Recognition
系所名稱 Department	資訊工程學系 Department of Computer Science and Engineering
畢業學年期 Year, semester	100 學年度第 2 學期 The spring semester of Academic Year 100	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	43
研究生 Author	許妙鸞 Miau Luan
指導教授 Advisor	陳嘉平 Chia-Ping Chen
召集委員 Convenor	吳宗憲 Chung-Hsien Wu
口試委員 Advisory Committee	王新民 Hsin-Min Wang
口試日期 Date of Exam	2012-06-29	繳交日期 Date of Submission	2012-07-18
關鍵字 Keywords	Teager能量、能量重刻、資料驅動、語音活動偵測、強健性語音辨識 voice activity detection, energy rescale, Teager energy, noise-robust speech recognition, data-driven
統計 Statistics	本論文已被瀏覽 5663 次，被下載 460 次 The thesis/dissertation has been browsed 5663 times, has been downloaded 460 times.

中文摘要
本論文主要探討能量特徵重刻技術對雜訊性語音辨識的影響。語音辨識系統常會受到環境雜訊的影響而導致辨識效能低落，使得語音強健性技術長久以來被視為一個非常重要的研究課題。然而過去有不少研究指出語音能量特徵對於雜訊環境下的語音辨識影響甚鉅，因此我們提出資料驅動能量特徵重刻法(Data-driven energy features rescaling, DEFR) 對能量特徵作進一步的調整。此方法分為語音活動偵測、分段對數尺度函數以及參數搜尋法三個部分。目的是希望能夠減少雜訊與乾淨語音特徵值的差異性。我們將此方法應用在梅爾倒頻譜參數與Teager 能量倒頻譜參數上，並且和均值消去法與均值正規化法作比較。我們採用Aurora 2.0 與Aurora 3.0 語料庫來驗證此方法之成效，由實驗結果證實本論文所提出之方法，能夠有效地提升辨識率。
Abstract
This thesis investigates the rescaling of energy features for noise-robust speech recognition. Environmental noise often degrades the performance of speech recognition systems, so noise-robustness techniques have long been regarded as an important research topic. Many earlier studies have pointed out that speech energy features strongly affect recognition in noisy environments. We therefore propose data-driven energy features rescaling (DEFR) to further adjust the energy features. The method consists of three parts: voice activity detection (VAD), a piecewise log rescaling function, and a parameter search algorithm. Its purpose is to reduce the mismatch between noisy and clean speech features. We apply the method to Mel-frequency cepstral coefficients (MFCC) and Teager energy cepstral coefficients (TECC), and compare it with mean subtraction (MS) and mean and variance normalization (MVN). We evaluate the method on the Aurora 2.0 and Aurora 3.0 databases, and the experimental results show that it effectively improves recognition accuracy.
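The abstract names two baseline normalizations (MS and MVN) and the Teager energy underlying TECC. The following is a minimal sketch of their textbook forms, not code from the thesis: the function names, the per-utterance normalization scope, and the `eps` guard are assumptions made for illustration, and the DEFR method itself is not reproduced because this record does not give its details.

```python
import numpy as np


def mean_subtraction(features):
    """MS: subtract the per-utterance mean of each feature dimension.

    `features` is assumed to be a (frames, dimensions) array.
    """
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)


def mean_variance_normalization(features, eps=1e-8):
    """MVN: zero mean and (approximately) unit variance per dimension.

    `eps` is a hypothetical guard against division by zero.
    """
    features = np.asarray(features, dtype=float)
    centered = features - features.mean(axis=0, keepdims=True)
    return centered / (features.std(axis=0, keepdims=True) + eps)


def teager_energy(signal):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbour on each side.
    """
    x = np.asarray(signal, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```

For a pure tone `A * cos(w * n)`, the Teager operator returns the constant `A**2 * sin(w)**2`, which is why it is often described as tracking both amplitude and frequency.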

目次 Table of Contents
List of Tables
List of Figures
Chapter 1 介紹 Introduction
1.1 研究動機與目的 Motivation and Objectives
1.2 背景 Background
1.3 論文架構 Thesis Organization
Chapter 2 特徵參數擷取 Feature Extraction
2.1 梅爾倒頻譜參數 Mel-Frequency Cepstral Coefficients
2.2 Teager能量倒頻譜參數 Teager Energy Cepstral Coefficients
2.2.1 Gamma-tone濾波器 Gamma-tone Filter
2.2.2 Teager能量評估法 Teager Energy Estimation
Chapter 3 能量特徵重刻 Energy Feature Rescaling
3.1 資料驅動能量特徵重刻法 Data-Driven Energy Features Rescaling
3.1.1 低頻譜之語音活動偵測 Low-Band Spectrum Voice Activity Detection
3.1.2 分段對數尺度函數 Piecewise Log Rescaling Function
3.1.3 參數搜尋法 Parameter Search Algorithm
Chapter 4 實驗 Experiments
4.1 辨識系統設定 Recognition System Setup
4.2 實驗語料 Experimental Corpora
4.2.1 Aurora 2.0
4.2.2 Aurora 3.0
4.3 效能評估方法 Performance Evaluation Method
4.4 實驗結果 Experimental Results
Chapter 5 結論與未來展望 Conclusion and Future Work
5.1 結論 Conclusion
5.2 未來展望 Future Work

電子全文 Fulltext
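This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission：自定論文開放時間 user define
開放時間 Available：校內 Campus：已公開 available；校外 Off-campus：已公開 available
etd-0718112-111800.pdf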
紙本論文 Printed copies
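Availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available：已公開 available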
