Master's and Doctoral Theses: detailed record for etd-0801114-191423
Title page for etd-0801114-191423
Title
稀疏表示法應用於噪音強健性語音辨識
Noise Robust Speech Recognition using Sparse Representations
Department
Year, semester
Language
Degree
Number of pages
39
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2014-07-24
Date of Submission
2014-09-09
Keywords
語音辨識、噪音強健性、範本字典、特徵增強、稀疏表示法
sparse representation, noise robustness, exemplar dictionary, feature enhancement, speech recognition
Statistics
The thesis/dissertation has been browsed 5682 times and has been downloaded 1019 times.
Chinese Abstract
This thesis aims to improve noise robustness by using exemplar-based sparse representations. We describe how a stream of speech signals can be represented as a sparse linear combination of speech exemplars, where an exemplar is a time-frequency segment taken from the training data; a large number of spectrogram segments are selected to build a speech exemplar dictionary, whose entries serve as the atoms of the linear combination. Once the noise-corrupted speech has been written as a linear combination of exemplars, the combination coefficients can be used for feature enhancement, or combined with the phone-state information associated with each exemplar to compute the posterior probability of each phone state for every frame; finally, the Viterbi algorithm decodes these into the recognition result. Experiments on the AURORA-2 corpus show that although the method does not perform well at high SNRs, it achieves good results at low SNRs, indicating that this framework is quite helpful for noise robustness.
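As a rough summary of this formulation (the symbols y, A, x and the state-labeling function below are introduced only for illustration and are not taken from the thesis itself), the decomposition and its two uses can be written as

\[
\mathbf{y} \;\approx\; \mathbf{A}\mathbf{x}, \qquad \mathbf{x} \ge \mathbf{0},\ \ \mathbf{x}\ \text{sparse},
\]

where \(\mathbf{y}\) is a reshaped time-frequency window of the noisy utterance and the columns of \(\mathbf{A}\) are reshaped exemplar segments. Enhanced features are reconstructed from the activations of the speech exemplars, and a frame-level state score can be accumulated as

\[
p(q \mid \mathbf{y}) \;\propto\; \sum_{j:\ \mathrm{state}(j)=q} x_j ,
\]

after which Viterbi decoding over these scores yields the recognized word sequence.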
Abstract
This thesis aims to improve the noise robustness of automatic speech recognition by using a technique called exemplar-based sparse representation. We first describe how to model speech as a linear combination of exemplars. An exemplar is a time-frequency segment drawn from the training data; we extract a large number of exemplars to construct an exemplar dictionary, whose entries serve as the atoms of the linear combination. After modeling the test speech as a linear combination of dictionary exemplars, we can either perform feature enhancement directly or compute the posterior probability of each phone state for every frame, using the linear-combination coefficients together with the exemplars and their phonetic labels. Finally, we use the Viterbi search algorithm to obtain the recognition result. We evaluate the performance with experiments on the AURORA-2 corpus; the results show that although there is no improvement at high SNRs, there is a large improvement at low SNRs, which suggests that this framework offers a promising route to noise-robust speech recognition.
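To make the pipeline above concrete, the following is a minimal sketch in Python, assuming the non-negative sparse-coding setting of Gemmeke et al. [17]: a dictionary containing both speech and noise exemplars, with activations found by multiplicative updates under a generalized KL divergence plus an L1 sparsity penalty. The solver choice, the Wiener-style enhancement, and all variable names are assumptions for illustration, not details taken from the thesis.

import numpy as np

def sparse_activations(y, A, lam=1.0, n_iter=200, eps=1e-12):
    """Non-negative, sparsity-penalized activations x with y ~ A @ x.

    y   : (d,)   reshaped noisy time-frequency window (non-negative features)
    A   : (d, k) exemplar dictionary, one reshaped exemplar per column
    lam : weight of the L1 sparsity penalty on x
    """
    x = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(n_iter):
        recon = A @ x + eps
        # Multiplicative update for the generalized KL divergence with an L1 term:
        # x <- x * (A^T (y / (A x))) / (A^T 1 + lam)
        x *= (A.T @ (y / recon)) / (A.sum(axis=0) + lam + eps)
    return x

def enhance_window(y, A_speech, A_noise, lam=1.0):
    """Wiener-style feature enhancement from the speech/noise parts of the dictionary."""
    A = np.hstack([A_speech, A_noise])
    x = sparse_activations(y, A, lam)
    k_s = A_speech.shape[1]
    s_hat = A_speech @ x[:k_s]          # speech reconstructed from speech exemplars
    n_hat = A_noise @ x[k_s:]           # noise reconstructed from noise exemplars
    return y * s_hat / (s_hat + n_hat + 1e-12), x[:k_s]

def state_scores(x_speech, exemplar_states, n_states):
    """Accumulate activation mass per phone state to get frame-level state scores."""
    scores = np.zeros(n_states)
    for x_j, q in zip(x_speech, exemplar_states):
        scores[q] += x_j                # exemplar j votes for its labelled state q
    return scores / (scores.sum() + 1e-12)

The table of contents indicates that the thesis applies the decomposition to overlapping sliding windows and combines the per-window activations (Sections 3.3 and 3.4); the sketch above handles a single window only.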
Table of Contents
Thesis Certification
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Research Background and Motivation
Chapter 2 Literature Review
2.1 Exemplar-Based Sparse Representations for Continuous Digit Recognition
2.2 Sparse Representations for Missing-Data Imputation in Noise-Robust Speech Recognition
2.3 Exemplar-Based Sparse Representation Features: From TIMIT to LVCSR
Chapter 3 Research Objectives and Methods
3.1 Representing Noisy Speech with Sparse Representations
3.2 Solving for the Activation Vector
3.3 Achieving Continuity with Sliding Windows
3.4 Feature Enhancement Using the Window Activation Matrix
3.5 Computing State Likelihoods from Activation Vectors
Chapter 4 Experiments
4.1 Corpus Description
4.2 Experimental Setup and Procedure
4.2.1 Baseline Recognizer
4.2.2 Building the Exemplar Dictionary
4.2.3 Feature Enhancement
4.2.4 State Likelihoods
4.2.5 Speeding Up the Iterative Computation
4.3 Experimental Results and Comparison
Chapter 5 Conclusions and Future Work
References
References
[1] J. Lim and A. Oppenheim, “Enhancement and bandwidth compression of noisy speech,” Proceedings of the IEEE, vol. 67, pp. 1586 – 1604, Dec. 1979.
[2] Y. Ephraim and D. Malah, “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 32, pp. 1109 – 1121, Dec. 1984.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1 – 38, 1977.
[4] S. Furui, “Cepstral analysis technique for automatic speaker verification,” Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 29, pp. 254 – 272, Apr. 1981.
[5] O. Viikki and K. Laurila, “Cepstral domain segmental feature vector normalization for noise robust speech recognition,” Speech Communication, vol. 25, pp. 133 – 147, Aug. 1998.
[6] A. de la Torre, A. Peinado, J. Segura, J. Perez-Cordoba, M. Benitez, and A. Rubio, “Histogram equalization of speech representation for robust speech recognition,” Speech and Audio Processing, IEEE Transactions on, vol. 13, pp. 355 – 366, May 2005.
[7] H. Hermansky and N. Morgan, “RASTA processing of speech,” Speech and Audio Processing, IEEE Transactions on, vol. 2, pp. 578 – 589, Oct. 1994.
[8] C.-P. Chen and J. A. Bilmes, “MVA processing of speech features,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 15, pp. 257 – 270, Jan. 2007.
[9] P. J. Moreno, B. Raj, and R. M. Stern, “Data-driven environmental compensation for speech recognition: A unified approach,” Speech Communication, pp. 267 – 285, 1998.
[10] J.-L. Gauvain and C.-H. Lee, “Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains,” IEEE Transactions on Speech and Audio Processing, vol. 2, pp. 291 – 298, 1994.
[11] M. Gales and S. Young, “Cepstral parameter compensation for HMM recognition in noise,” Speech Communication, vol. 12, no. 3, pp. 231 – 239, 1993.
[12] Y. Gong, “A method of joint compensation of additive and convolutive distortions for speaker-independent speech recognition,” Speech and Audio Processing, IEEE Transactions on, vol. 13, pp. 975 – 983, Sep. 2005.
[13] T. N. Sainath, S. Maskey, D. Kanevsky, B. Ramabhadran, D. Nahamoo, and J. Hirschberg, “Sparse representations for text categorization,” in INTERSPEECH (T. Kobayashi, K. Hirose, and S. Nakamura, eds.), pp. 2266 – 2269, ISCA, 2010.
[14] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, pp. 210 – 227, Feb. 2009.
[15] J. F. Gemmeke, L. ten Bosch, L. Boves, and B. Cranen, “Using sparse representations for exemplar based continuous digit recognition,” in Proc. EUSIPCO, (Glasgow, Scotland), pp. 1755 – 1759, Aug. 24 – 28, 2009.
[16] J. F. Gemmeke and B. Cranen, “Using sparse representations for missing data imputation in noise robust speech recognition,” in Proc. EUSIPCO, (Lausanne, Switzerland), Aug. 25 - 29 2008.
[17] J. Gemmeke, T. Virtanen, and A. Hurmalainen, “Exemplar-based sparse representations for noise robust automatic speech recognition,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19, pp. 2067 – 2080, Sep. 2011.
[18] T. Sainath, B. Ramabhadran, M. Picheny, D. Nahamoo, and D. Kanevsky, “Exemplar-based sparse representation features: From TIMIT to LVCSR,” Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19, pp. 2598 – 2613, Nov. 2011.
[19] H. Nyquist, “Certain topics in telegraph transmission theory,” Transactions of the AIEE, vol. 47, pp. 617 – 644, 1928.
[20] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” Signal Processing, IEEE Transactions on, vol. 41, pp. 3397 – 3415, Dec. 1993.
[21] S. Chen, D. Donoho, and M. Saunders, “Atomic Decomposition by Basis Pursuit,” Technical Report 479, Department of Statistics, Stanford University, May 1995.
[22] E. J. Candes and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theor., vol. 51, pp. 4203 – 4215, Dec. 2005.
[23] D. Donoho, “Compressed sensing,” Information Theory, IEEE Transactions on, vol. 52, pp. 1289 – 1306, Apr. 2006.
[24] E. J. Candès, “Compressive sampling,” in Proceedings of the International Congress of Mathematicians: Madrid, August 22-30, 2006: invited lectures, pp. 1433 – 1452, 2006.
[25] J. Gemmeke, H. Van Hamme, B. Cranen, and L. Boves, “Compressive sensing for missing data imputation in noise robust speech recognition,” Selected Topics in Signal Processing, IEEE Journal of, vol. 4, pp. 272 – 287, Apr. 2010.
[26] MATLAB, version 7.11.0.584 (R2010b). Natick, Massachusetts: The MathWorks Inc., 2010.
[27] “MATLAB GPU computing support for NVIDIA CUDA-enabled GPUs.” http://www.mathworks.com/discovery/matlab-gpu.html.
[28] “NVIDIA CUDA Zone.” https://developer.nvidia.com/cuda-zone.
[29] “GPUmat - a C/C++ GPU engine for MATLAB based on NVIDIA CUDA.” http://sourceforge.net/projects/gpumat.
Fulltext
The electronic full text is licensed for academic research purposes only, for personal, non-profit searching, reading, and printing. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid infringement.
Thesis access permission: fully open on and off campus (unrestricted)
Available:
On campus: available
Off-campus: available


Printed copies
Availability information for printed copies is relatively complete only from academic year 102 (2013-2014) onward. To inquire about printed copies from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Availability: available
