National Sun Yat-sen University, thesis/dissertation, Empirical Mode Decomposition for Noise-Robust Automatic Speech Recognition

Title	Empirical Mode Decomposition for Noise-Robust Automatic Speech Recognition (基於經驗模態分解之噪音強健性自動語音辨識)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester of Academic Year 98	Language	English
Degree	Master	Number of pages	44
Author	Kuo-hao Wu (吳國豪)
Advisor	Chia-Ping Chen (陳嘉平)
Convenor	Hsin-Min Wang (王新民)
Advisory Committee	Jui-Feng Yeh (葉瑞峰); Chung-Hsien Wu (吳宗憲)
Date of Exam	2010-07-28	Date of Submission	2010-08-25
Keywords	noise robustness, empirical mode decomposition, speech recognition
Statistics	The thesis/dissertation has been browsed 5652 times and downloaded 0 times.

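Chinese Abstract
This thesis proposes a novel technique based on empirical mode decomposition (EMD) and tests it on noise-robust automatic speech recognition systems. EMD generalizes Fourier analysis and is used to process nonlinear and non-stationary time functions, which in our case are speech feature sequences. When post-processing the log energy feature, we use the intrinsic mode functions (IMFs) obtained from the EMD analysis; sinusoidal functions are a special case of IMFs. We test the proposed method on the Aurora 2.0 and Aurora 3.0 corpora. On Aurora 2.0, we obtain a 44.9% relative improvement over the baseline on the mismatched (clean-training) tasks. On Aurora 3.0, we obtain a 49.5% relative improvement over the baseline on the high-mismatch tasks. These experimental results show that the proposed method yields a large improvement.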
Abstract
This thesis proposes a novel technique based on empirical mode decomposition (EMD) and examines it for noise-robust automatic speech recognition. EMD analysis generalizes Fourier analysis to nonlinear and non-stationary time functions, which in our case are the speech feature sequences. We use the intrinsic mode functions (IMFs) obtained from the EMD analysis, which include the sinusoidal functions as special cases, in the post-processing of the log energy feature. We evaluate the proposed method on the Aurora 2.0 and Aurora 3.0 databases. On Aurora 2.0, we obtain a 44.9% overall relative improvement over the baseline for the mismatched (clean-training) tasks. On Aurora 3.0, we obtain a 49.5% overall relative improvement over the baseline for the high-mismatch tasks. These results show that the proposed method leads to significant improvement.
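As a rough illustration of the idea in the abstract, the sketch below decomposes a one-dimensional feature sequence into IMFs by sifting with cubic-spline envelopes and then subtracts some of them from the sequence. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the fixed number of sifting iterations, the endpoint handling, and the choice to subtract the first (fastest-oscillating) IMFs are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def local_extrema(x):
    """Return indices of strict interior local maxima and minima of x."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] > 0))[0] + 1
    return maxima, minima


def envelope(x, idx):
    """Cubic-spline envelope through x[idx]; endpoints are added as knots
    (an assumed, simple way to handle the boundaries)."""
    n = len(x)
    knots = np.unique(np.concatenate(([0], idx, [n - 1])))
    return CubicSpline(knots, x[knots])(np.arange(n))


def sift_imf(x, n_sift=10):
    """Extract one IMF candidate with a fixed number of sifting passes
    (assumed stopping rule)."""
    h = x.copy()
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        # Subtract the mean of the upper and lower envelopes.
        h = h - 0.5 * (envelope(h, maxima) + envelope(h, minima))
    return h


def emd(x, max_imfs=4):
    """Decompose x into IMFs and a residual that sum back to x."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left to sift
        imf = sift_imf(residual)
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual


def subtract_imfs(log_energy, n_remove=1):
    """Post-process a log-energy sequence by subtracting its first
    n_remove IMFs (which IMFs to remove is an assumption here)."""
    imfs, _ = emd(log_energy)
    out = np.asarray(log_energy, dtype=float).copy()
    for imf in imfs[:n_remove]:
        out -= imf
    return out
```

For example, calling `subtract_imfs(seq, n_remove=1)` on a per-utterance log-energy sequence `seq` returns a sequence of the same length with the first extracted IMF removed. By construction, the IMFs and the residual returned by `emd` always sum back to the input.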

Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Organization of the Thesis
Chapter 2 Related Works
  2.1 Feature Extraction
  2.2 Common Techniques of Noise Robustness
    2.2.1 Spectral Subtraction
    2.2.2 MVA
  2.3 Spline Interpolation
    2.3.1 Cubic Spline
    2.3.2 Number of Unknowns
    2.3.3 Number of Knowns
    2.3.4 The Two Extra Constraints
Chapter 3 Methods
  3.1 EMD and Fourier Analysis
  3.2 Implementation
  3.3 Spline Fitting
  3.4 The Properties of EMD
  3.5 Features Post-Processing
Chapter 4 Experiments
  4.1 Database
    4.1.1 Aurora 2.0
    4.1.2 Aurora 3.0
  4.2 Experimental Setup
  4.3 Evaluation
  4.4 Results and Discussion
    4.4.1 Applying the EMD-Base Post-Processing to the Log-Energy Feature Sequences
    4.4.2 The EMD-Base Post-Processing
    4.4.3 Comparison the Results of Subtracting Different Numbers of IMFs
    4.4.4 Subtracting a Dynamic Number of IMFs
Chapter 5 Conclusion and Future Works
  5.1 Conclusion
  5.2 Future Works
Bibliography

Fulltext
The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please observe the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it.
Thesis access permission: not available on campus or off campus
Available: Campus: not available; Off-campus: not available
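Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Availability: available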

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS