博碩士論文 etd-0718112-111800 詳細資訊
Title page for etd-0718112-111800
論文名稱
Title
資料驅動能量特徵調整於雜訊性語音辨識
Data-Driven Rescaling of Energy Features for Noisy Speech Recognition
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
43
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2012-06-29
繳交日期
Date of Submission
2012-07-18
關鍵字
Keywords
Teager能量、能量重刻、資料驅動、語音活動偵測、強健性語音辨識
voice activity detection, energy rescale, Teager energy, noise-robust speech recognition, data-driven
統計
Statistics
本論文已被瀏覽 5663 次,被下載 460
The thesis/dissertation has been browsed 5663 times and downloaded 460 times.
中文摘要
本論文主要探討能量特徵重刻技術對雜訊性語音辨識的影響。語音辨識系統常會受
到環境雜訊的影響而導致辨識效能低落,使得語音強健性技術長久以來被視為一個
非常重要的研究課題。然而過去有不少研究指出語音能量特徵對於雜訊環境下的語
音辨識影響甚鉅,因此我們提出資料驅動能量特徵重刻法(Data-driven energy features
rescaling, DEFR) 對能量特徵作進一步的調整。此方法分為語音活動偵測、分段對數
尺度函數以及參數搜尋法三個部分。目的是希望能夠減少雜訊與乾淨語音特徵值的差
異性。我們將此方法應用在梅爾倒頻譜參數與Teager 能量倒頻譜參數上,並且和均
值消去法與均值正規化法作比較。我們採用Aurora 2.0 與Aurora 3.0 語料庫來驗證此
方法之成效,由實驗結果證實本論文所提出之方法,能夠有效地提升辨識率。
Abstract
In this thesis, we investigate the rescaling of energy features for noise-robust speech recognition.
The performance of a speech recognition system degrades quickly under environmental
noise, so robustness techniques have long been an important research topic.
Many studies have pointed out that energy features have a great impact on
speech recognition in noisy environments. We therefore propose
data-driven energy feature rescaling (DEFR) to adjust the energy features. The method consists
of three parts: voice activity detection (VAD), a piecewise log rescaling function, and
a parameter search algorithm. Its purpose is to reduce the difference between noisy and clean
speech features. We apply the method to Mel-frequency cepstral coefficients (MFCC) and
Teager energy cepstral coefficients (TECC), and compare it with mean
subtraction (MS) and mean and variance normalization (MVN). We evaluate the method on the
Aurora 2.0 and Aurora 3.0 databases. The experimental results show
that the proposed method effectively improves recognition accuracy.
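The abstract names mean subtraction (MS) and mean and variance normalization (MVN) as baselines, and the Teager energy as the basis of TECC. The following is a minimal sketch of these three operations, assuming the textbook per-utterance definitions of MS and MVN and the standard discrete Teager-Kaiser operator. The thesis may use different variants, and DEFR itself is not reproduced here. All function names and the toy values are illustrative only.

```python
import math


def mean_subtraction(frames):
    """MS: subtract the per-utterance mean from each feature dimension."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]


def mean_variance_normalization(frames, eps=1e-12):
    """MVN: scale each dimension to zero mean and unit variance."""
    centered = mean_subtraction(frames)
    n = len(centered)
    dims = len(centered[0])
    # Population standard deviation per dimension; eps avoids division by zero.
    stds = [math.sqrt(sum(f[d] ** 2 for f in centered) / n) for d in range(dims)]
    return [[f[d] / (stds[d] + eps) for d in range(dims)] for f in centered]


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Toy log-energy track (hypothetical values): four frames, one dimension.
log_energy = [[1.0], [3.0], [5.0], [7.0]]
ms = mean_subtraction(log_energy)
mvn = mean_variance_normalization(log_energy)
```

Both normalizations operate on a whole utterance at once, so they need every frame before any output is produced. The Teager operator, by contrast, only needs a three-sample window.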
目次 Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation and Objectives
1.2 Background
1.3 Thesis Organization
Chapter 2 Feature Extraction
2.1 Mel-Frequency Cepstral Coefficients
2.2 Teager Energy Cepstral Coefficients
2.2.1 Gammatone Filter
2.2.2 Teager Energy Estimation
Chapter 3 Energy Feature Rescaling
3.1 Data-Driven Energy Feature Rescaling
3.1.1 Low-Frequency Spectrum Voice Activity Detection
3.1.2 Piecewise Log Rescaling Function
3.1.3 Parameter Search Method
Chapter 4 Experiments
4.1 Recognition System Setup
4.2 Experimental Corpora
4.2.1 Aurora 2.0
4.2.2 Aurora 3.0
4.3 Performance Evaluation Method
4.4 Experimental Results
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
參考文獻 References
[1] D. Dimitriadis, P. Maragos, and A. Potamianos, “On the Effects of Filterbank Design
and Energy Computation on Robust Speech Recognition,” IEEE Transactions on Audio,
Speech, and Language Processing, vol. 19, pp. 1504–1516, August 2011.
[2] W. Zhu and D. O’Shaughnessy, “Log-energy dynamic range normalization for robust
speech recognition,” in proceedings of 2005 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), Philadelphia, vol. 1, pp. 245–249, March
2005.
[3] T.-H. Hwang and S.-C. Chang, “Energy contour enhancement for noisy speech recognition,”
in proceedings of 4th International Symposium on Chinese Spoken Language
Processing (ISCSLP 2004), Hong Kong, pp. 249–252, December 2004.
[4] S. M. Ahadi, H. Sheikhzadeh, R. L. Brennan, and G. Freeman, “An energy normalization
scheme for improved robustness in speech recognition,” in proceedings of 8th
International Conference on Spoken Language Processing (ICSLP 2004), Korea, October
2004.
[5] R. Chengalvarayan, “Robust energy normalization using speech/nonspeech discriminator
for German connected digit recognition,” in proceedings of 6th European Conference
on Speech Communication and Technology (EUROSPEECH 1999), Hungary,
September 1999.
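[6] 陳鴻彬, “On the Study of Energy-Based Speech Feature Normalization and Application
to Voice Activity Detection,” Master's thesis, Department of Computer Science and Information Engineering, National Taiwan Normal University, 2007.
[7] 杜文祥, “Study on the Voice Activity Detection Techniques for Robust Speech Feature
Extraction,” Master's thesis, Department of Electrical Engineering, National Chi Nan University, 2007.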
[8] C. Garreton, N. B. Yoma, and M. Torres, “Channel Robust Feature Transformation
Based on Filter-Bank Energy Filtering,” IEEE Transactions on Audio, Speech, and Language
Processing, vol. 18, pp. 1082–1086, July 2010.
[9] X. Huang, “Minimizing speaker variation effects for speaker-independent speech recognition,”
in proceedings of the workshop on Speech and Natural Language, pp. 191–196,
1992.
[10] D. Y. Zhao and W. B. Kleijn, “HMM-Based Gain Modeling for Enhancement of Speech
in Noise,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15,
pp. 882–892, March 2007.
[11] J. Ming, R. Srinivasan, and D. Crookes, “A Corpus-Based Approach to Speech Enhancement
From Nonstationary Noise,” IEEE Transactions on Audio, Speech, and Language
Processing, vol. 19, pp. 822–836, May 2011.
[12] K. Ngo, A. Spriet, M. Moonen, J. Wouters, and S. H. Jensen, “A combined multichannel
Wiener filter-based noise reduction and dynamic range compression in hearing
aids,” Signal Processing, vol. 92, pp. 417–426, Feb 2012.
[13] R. Gomez, A. Lee, H. Saruwatari, and K. Shikano, “Robust speech recognition with
spectral subtraction in low SNR,” in proceedings of 8th International Conference on
Spoken Language Processing (ICSLP 2004), Korea, October 2004.
[14] D. Macho and Y. M. Cheng, “SNR-dependent waveform processing for improving the
robustness of ASR front-end,” in proceedings of 2001 IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, pp. 305–308, 2001.
[15] C. Cerisara, S. Demange, and J. P. Haton, “On noise masking for automatic missing data
speech recognition: A survey and discussion,” Computer Speech & Language, vol. 21,
no. 3, pp. 443–457, 2007.
[16] H. Veisi and H. Sameti, “The integration of principal component analysis and cepstral
mean subtraction in parallel model combination for robust speech recognition,” The 17th
International Conference on Digital Signal Processing (DSP 2011), Greece, vol. 21,
pp. 36–53, July 2011.
[17] O. Viikki and K. Laurila, “Cepstral domain segmental feature vector normalization for
noise robust speech recognition,” Speech Communication, vol. 25, pp. 133–147, August
1998.
[18] T. Claes, I. Dologlou, L. ten Bosch, and D. V. Compernolle, “A novel feature transformation
for vocal tract length normalization in automatic speech recognition,” IEEE
Transactions on Speech and Audio Processing, vol. 6, pp. 549–557, November 1998.
[19] Y. Obuchi and R. M. Stern, “Normalization of time-derivative parameters using histogram
equalization,” in proceedings of 8th European Conference on Speech Communication
and Technology (EUROSPEECH 2003), Switzerland, September 2003.
[20] H. Misra, S. Ikbal, S. Sivadas, and H. Bourlard, “Multi-resolution spectral entropy feature
for robust ASR,” in proceedings of 2005 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), Philadelphia, vol. 1, pp. 253–256, March
2005.
[21] P. Raghavan, R. Renomeron, C. Che, D.-S. Yuk, and J. Flanagan, “Speech recognition
in a reverberant environment using matched filter array (MFA) processing and
linguistic-tree maximum likelihood linear regression (LT-MLLR) adaptation,” in proceedings
of 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP), Phoenix, vol. 2, pp. 777–780, March 1999.
[22] J. L. Gauvain and C. H. Lee, “Maximum a posteriori estimation for multivariate Gaussian
mixture observations of Markov chains,” IEEE Transactions on Speech and Audio
Processing, vol. 2, pp. 291–298, April 1994.
[23] H. Veisi and H. Sameti, “An improved parallel model combination method for noisy
speech recognition,” in proceedings of 2009 IEEE Workshop on Automatic Speech
Recognition & Understanding (ASRU 2009), Italy, pp. 237–242, December 2009.
[24] M. Slaney, “An Efficient Implementation of the Patterson-Holdsworth Auditory Filter
Bank,” Apple Computer Perception Group Tech Rep, no. 35, 1993.
電子全文 Fulltext
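This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.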
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
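Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.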
開放時間 available 已公開 available
