Title page for etd-0812117-095430
Title
應用於語音辨識之高效率雜訊偵測與消除方法
Efficient Noise Detection and Elimination Method for Voice Recognition Applications
Department
Year, semester
Language
Degree
Number of pages
103
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2017-09-07
Date of Submission
2017-09-12
Keywords
additive white Gaussian noise (AWGN), quality improvement, voice recognition, noise elimination, noise detection
Statistics
The thesis/dissertation has been browsed 5643 times and downloaded 10 times.
Abstract
In recent years, voice recognition has been widely adopted in smart home and smartphone applications, where recognizing human speech makes devices more convenient to operate. Voice recognition also plays an important role in automotive electronics: drivers cannot free their hands to operate navigation systems or air conditioners, but they can control them safely by voice while driving.
However, as the feature size of semiconductor manufacturing technology shrinks, voice recognition circuits become more susceptible to external noise. Voice commands may then go unrecognized, which invalidates system operation and can even raise safety concerns. It is therefore critical to detect noise in voice signals accurately and then eliminate it, which is the research objective of this thesis.
In this thesis, we employ ideal and non-ideal additive white Gaussian noise (AWGN) to simulate various types of noise in voice signals and analyze their impact on the recognition rate. We also propose a noise detection technique that is more accurate than the existing methods in the literature. Our experimental results show that the proposed technique achieves a detection accuracy of more than 99%.
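As an illustration of what injecting AWGN into a signal involves, the following minimal Python sketch adds zero-mean white Gaussian noise at a chosen signal-to-noise ratio. It is not the thesis's noise model: this record does not say how the ideal and non-ideal variants are generated, and the function name `add_awgn`, the 200 Hz test tone, the 8 kHz sampling rate, and the 10 dB SNR are all assumptions made for the example.

```python
import math
import random


def add_awgn(signal, snr_db, seed=None):
    """Return a copy of `signal` with white Gaussian noise added at `snr_db` dB SNR."""
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(s * s for s in signal) / len(signal)
    # SNR(dB) = 10 * log10(P_signal / P_noise), so P_noise = P_signal / 10^(SNR/10).
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    # Independent zero-mean Gaussian samples, added to every signal sample.
    return [s + rng.gauss(0.0, sigma) for s in signal]


# Hypothetical test signal: 1 s of a 200 Hz tone sampled at 8 kHz.
fs = 8000
clean = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
noisy = add_awgn(clean, snr_db=10, seed=0)

# Measure the SNR actually achieved by this particular noise realization.
noise = [y - x for x, y in zip(clean, noisy)]
measured_snr_db = 10 * math.log10(
    sum(x * x for x in clean) / sum(e * e for e in noise)
)
```

Because the noise is random, `measured_snr_db` only approximates the requested value; the gap shrinks as the signal gets longer.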
We also investigate how effectively the previous methods in the literature eliminate noise. We find that the previous method enhances the recognition rate by only 4.69% for ideal AWGN and by 8.15% for non-ideal AWGN. However, by integrating the proposed noise detection technique with the previous method to develop a new noise elimination technique, the recognition rate is enhanced by 12% for ideal AWGN and by 42.11% for non-ideal AWGN.
One major problem with the previous noise elimination method is its computational complexity: the required execution time is 7.4 times the length of the target voice signal. To reduce the computation needed to eliminate noise, we further propose a novel noise elimination technique based on linear approximation. The execution time of the proposed technique is only 6.5% of that of the previous method. For ideal AWGN, the proposed technique still enhances the recognition rate by 12%; for non-ideal AWGN, the enhancement is 53.31%.
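To show why a linear approach can be cheap, the sketch below repairs samples that a detector has already flagged as noisy by drawing a straight line between the nearest clean neighbours. This is a generic linear-interpolation repair, not the thesis's algorithm, whose details this record does not give. The function name `repair_linear`, the boolean flag list, and the sample values are assumptions made for the example.

```python
def repair_linear(samples, noisy_flags):
    """Replace flagged samples by linear interpolation between clean neighbours."""
    repaired = list(samples)
    n = len(samples)
    i = 0
    while i < n:
        if not noisy_flags[i]:
            i += 1
            continue
        # Find the run [i, j) of consecutive flagged samples.
        j = i
        while j < n and noisy_flags[j]:
            j += 1
        left = i - 1   # last clean index before the run, or -1 if none
        right = j      # first clean index after the run, or n if none
        for k in range(i, j):
            if left >= 0 and right < n:
                # Point on the straight line between the two clean neighbours.
                t = (k - left) / (right - left)
                repaired[k] = samples[left] + t * (samples[right] - samples[left])
            elif left >= 0:
                repaired[k] = samples[left]    # run reaches the end: hold last clean value
            elif right < n:
                repaired[k] = samples[right]   # run starts at index 0: hold first clean value
            # If every sample is flagged, there is nothing to interpolate from.
        i = j
    return repaired


# Hypothetical example: a ramp whose samples at indices 2 and 3 are corrupted.
samples = [0.0, 1.0, 9.0, -7.0, 4.0, 5.0]
flags = [False, False, True, True, False, False]
repaired = repair_linear(samples, flags)  # the ramp 0..5 is restored
```

The repair visits each sample a constant number of times, so its cost grows linearly with the signal length. As a side calculation from the figures quoted above, 7.4 × 0.065 ≈ 0.48, so if both figures refer to the same signals, the proposed technique would run in roughly half the duration of the voice signal.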
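Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Contributions
1.3 Thesis Organization
Chapter 2 Background and Related Work
2.1 Error Tolerance of Voice Recognition
2.2 Additive White Gaussian Noise (AWGN)
2.3 Existing Methods for Restoring Impulsive Noise
2.4 Vector Autoregressive (VAR) Model
2.5 Kalman Filter
Chapter 3 AWGN Elimination Using the VAR Model and Kalman Filter
3.1 Experimental Setup
3.1.1 Environment and Noise Types
3.1.2 Evaluation Metric: Voice Recognition Rate
3.1.3 Voice Recognition Rate after Adding Noise
3.2 Algorithm Flow
3.3 Analysis and Discussion of Experimental Results
3.4 Discussion of Poor Restoration Results
3.5 Execution Time Analysis
Chapter 4 An AWGN Detection Method Based on Voice Frequency Characteristics
4.1 Analysis of Voice Frequency Characteristics and Development of the Detection Method
4.1.1 Outlier Detection
4.1.2 Turning-Point Rule
4.1.3 Waveform Trend Analysis
4.1.4 Ignoring Small Amplitude Variations
4.2 Forward and Backward Detection
4.3 Detection Flow
Chapter 5 Detection and Elimination of AWGN
5.1 AWGN Detection and Elimination Using the Proposed Detection Method and the VAR Model
5.1.1 VAR Model Prediction to Assist AWGN Detection
5.1.2 Restoration Using the Proposed Detection Method and the VAR Model
5.1.3 Experimental Flow
5.1.4 Analysis and Discussion of Experimental Results
5.1.5 Detection Accuracy
5.1.6 Execution Time Analysis
5.2 AWGN Detection and Elimination Using the Proposed Detection Method and Linear Approximation
5.2.1 Analysis of Voice Frequency Characteristics and Development of the Linear Prediction Method
5.2.2 Analysis and Discussion of Experimental Results
5.2.3 Recognition Rate Comparison
5.2.4 Detection Accuracy
5.2.5 Execution Time Analysis and Comparison
Chapter 6 Detection and Elimination of Non-ideal AWGN
6.1 Simulating Non-ideal Noise
6.2 Comparison of Voice Recognition Results for Non-ideal and Ideal AWGN
6.3 Non-ideal AWGN Elimination Using the VAR Model and Kalman Filter
6.3.1 Analysis and Discussion of Experimental Results
6.3.2 Comparison of Ideal and Non-ideal AWGN Elimination
6.4 Non-ideal AWGN Elimination Using the Proposed Detection Method and the VAR Model
6.4.1 Analysis and Discussion of Experimental Results
6.4.2 Comparison of Ideal and Non-ideal AWGN Elimination
6.5 Non-ideal AWGN Restoration Using the Proposed Detection Method and Linear Approximation
6.5.1 Analysis and Discussion of Experimental Results
6.5.2 Comparison of Ideal and Non-ideal AWGN Elimination
Chapter 7 Hardware Implementation
7.1 Purpose and Architecture of the Hardware Implementation
7.2 Pin Configuration and Operation Flow
7.3 Cost and Performance
Chapter 8 Conclusions and Future Work
Chapter 9 References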
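Fulltext
The electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as not to violate the law.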
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available


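Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available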
