Thesis/dissertation etd-0825110-171559: detailed information
Title page for etd-0825110-171559
Title
基於經驗模態分解之噪音強健性自動語音辨識
Empirical Mode Decomposition for Noise-Robust Automatic Speech Recognition
Department
Year, semester
Language
Degree
Number of pages
44
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2010-07-28
Date of Submission
2010-08-25
Keywords
noise robustness, empirical mode decomposition, speech recognition
Statistics
The thesis/dissertation has been browsed 5652 times and downloaded 0 times.
Abstract
In this thesis, a novel technique based on empirical mode decomposition (EMD) is proposed and evaluated for noise-robust automatic speech recognition. EMD generalizes Fourier analysis to nonlinear and non-stationary time functions, which in our case are speech feature sequences. We use the intrinsic mode functions (IMFs) obtained from the EMD analysis, which include sinusoidal functions as special cases, in the post-processing of the log-energy feature. We evaluate the proposed method on the Aurora 2.0 and Aurora 3.0 databases. On Aurora 2.0, we obtain a 44.9% overall relative improvement over the baseline on the mismatched (clean-training) tasks. On Aurora 3.0, we obtain a 49.5% overall relative improvement over the baseline on the high-mismatch tasks. These results indicate that the proposed method substantially improves recognition under mismatched conditions.
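To make the post-processing idea concrete, the following is a minimal sketch of EMD-style sifting applied to a one-dimensional feature sequence such as frame-level log energy. It is an illustration under stated assumptions, not the thesis's implementation: the envelopes here are piecewise-linear rather than the cubic splines listed in the table of contents, the number of sifting passes is fixed, and the function names (`local_extrema`, `envelope`, `sift`, `emd`, `subtract_imfs`) and default parameter values are hypothetical.

```python
def local_extrema(x):
    """Return (maxima, minima): indices of interior local extrema of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x, idx):
    """Piecewise-linear envelope through x at the indices idx.

    ASSUMPTION: linear interpolation stands in for cubic-spline fitting,
    and both end samples are used as extra knots.
    """
    knots = [0] + idx + [len(x) - 1]
    env, k = [], 0
    for i in range(len(x)):
        while knots[k + 1] < i:
            k += 1
        a, b = knots[k], knots[k + 1]
        t = (i - a) / (b - a)
        env.append((1 - t) * x[a] + t * x[b])
    return env


def sift(x, n_sift=10):
    """Extract one IMF candidate by repeatedly removing the envelope mean.

    ASSUMPTION: a fixed number of passes replaces a convergence criterion.
    """
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if not maxima or not minima:
            break
        upper = envelope(h, maxima)
        lower = envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h


def emd(x, max_imfs=4):
    """Decompose x into a list of IMFs plus a residue (x = sum(IMFs) + residue)."""
    imfs, residue = [], list(x)
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residue)
        if len(maxima) + len(minima) < 2:
            break  # too few oscillations left to extract another IMF
        imf = sift(residue)
        imfs.append(imf)
        residue = [r - c for r, c in zip(residue, imf)]
    return imfs, residue


def subtract_imfs(x, n_remove=1):
    """Remove the first n_remove (fastest-oscillating) IMFs from x."""
    imfs, _ = emd(x, max_imfs=n_remove)
    out = list(x)
    for imf in imfs:
        out = [o - c for o, c in zip(out, imf)]
    return out
```

Subtracting the first IMF removes the fastest oscillation in the sequence and leaves a smoother trajectory. How many IMFs to remove is a tuning choice; the table of contents indicates that the thesis compares fixed and dynamic numbers of subtracted IMFs (Sections 4.4.3 and 4.4.4).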
Table of Contents
List of Tables
List of Figures
Acknowledgments
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Organization of the Thesis
Chapter 2 Related Works
2.1 Feature Extraction
2.2 Common Techniques of Noise Robustness
2.2.1 Spectral Subtraction
2.2.2 MVA
2.3 Spline Interpolation
2.3.1 Cubic Spline
2.3.2 Number of Unknowns
2.3.3 Number of Knowns
2.3.4 The Two Extra Constraints
Chapter 3 Methods
3.1 EMD and Fourier Analysis
3.2 Implementation
3.3 Spline Fitting
3.4 The Properties of EMD
3.5 Features Post-Processing
Chapter 4 Experiments
4.1 Database
4.1.1 Aurora 2.0
4.1.2 Aurora 3.0
4.2 Experimental Setup
4.3 Evaluation
4.4 Results and Discussion
4.4.1 Applying the EMD-Base Post-Processing to the Log-Energy Feature Sequences
4.4.2 The EMD-Base Post-Processing
4.4.3 Comparison the Results of Subtracting Different Numbers of IMFs
4.4.4 Subtracting a Dynamic Number of IMFs
Chapter 5 Conclusion and Future Works
5.1 Conclusion
5.2 Future Works
Bibliography
References
[1] S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE
Transactions on Acoustics, Speech and Signal Processing, vol. 27, no. 2, pp. 113–120,
1979.
[2] A. Berstein and I. Shallom, “An hypothesized Wiener filtering approach to noisy speech
recognition,” in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing,
pp. 913–916, 1991.
[3] W. Zhu and D. O’Shaughnessy, “Incorporating frequency masking filtering in a standard
MFCC feature extraction algorithm,” in Proc. IEEE Intl. Conf. on Signal Processing,
pp. 617–620, 2004.
[4] B. Strope and A. Alwan, “A model of dynamic auditory perception and its application
to robust word recognition,” IEEE Transactions on Speech and Audio Processing, vol. 5,
no. 5, pp. 451–464, 1997.
[5] S. Furui, “Cepstral analysis technique for automatic speaker verification,” IEEE Transactions
on Acoustics, Speech and Signal Processing, vol. 29, no. 2, pp. 254–272, 1981.
[6] O. Viikki, D. Bye, and K. Laurila, “A recursive feature vector normalization approach
for robust speech recognition in noise,” in Proc. IEEE Intl. Conf. on Acoustics, Speech,
and Signal Processing, pp. 733–736, 1998.
[7] A. de La Torre, A. Peinado, J. Segura, J. Perez-Cordoba, M. Benitez, and A. Rubio,
“Histogram equalization of speech representation for robust speech recognition,” IEEE
Transactions on Speech and Audio Processing, vol. 13, no. 3, pp. 355–366, 2005.
[8] N. Huang, Z. Shen, S. Long, M. Wu, H. Shih, Q. Zheng, N. Yen, C. Tung, and H. Liu,
“The empirical mode decomposition and the Hilbert spectrum for nonlinear and nonstationary
time series analysis,” Proceedings of the Royal Society of London, Series A:
Mathematical, Physical and Engineering Sciences, vol. 454, pp. 903–995, 1998.
[9] C. Chen and J. Bilmes, “MVA processing of speech features,” IEEE Transactions on
Audio, Speech, and Language Processing, vol. 15, no. 1, pp. 257–270, 2007.
[10] S. Davis and P. Mermelstein, “Comparison of parametric representations for monosyllabic
word recognition in continuously spoken sentences,” IEEE Transactions on Acoustics,
Speech and Signal Processing, vol. 28, no. 4, pp. 357–366, 1980.
[11] S. McKinley and M. Levine, “Cubic spline interpolation,” Student Projects in Linear
Algebra, College of the Redwoods, vol. 20, 2006.
[12] H. Hermansky, “Perceptual linear predictive (PLP) analysis of speech,” Journal of the
Acoustical Society of America, vol. 87, no. 4, pp. 1738–1752, 1990.
[13] D. Pearce and H. Hirsch, “The AURORA experimental framework for the performance
evaluation of speech recognition systems under noisy conditions,” in ISCA ITRW
ASR2000, September 2000.
[14] Motorola Au/374/01, “Small vocabulary evaluation: Baseline mel-cepstrum performances
with speech endpoints,” October 2001.
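Fulltext
The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available on or off campus
Available:
Campus: not available
Off-campus: not available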

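Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: publicly available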