Thesis/dissertation etd-0706102-135328: detailed record
Title page for etd-0706102-135328
Title
以類神經網路為架構之語音辨識系統
The Speech Recognition System using Neural Networks
Department
Year, semester
Language
Degree
Number of pages
78
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2002-06-28
Date of Submission
2002-07-06
Keywords
neural network, speech recognition, backpropagation
Statistics
This thesis/dissertation has been browsed 5702 times and downloaded 4659 times.
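Abstract (translated from the Chinese)
This thesis designs a speaker-independent Mandarin digit speech recognition system built on a backpropagation neural network (BPNN); its recognition rate reaches 95%. When the system is applied to a specific speaker, adaptive modification raises the recognition rate above 99%. To port the system to a digital signal processor (DSP) platform, a neuron cancellation rule is proposed for the neural network model. The rule removes about one third of the neurons and lowers the system's memory requirement by 20% to 40%, while the recognition rate still reaches 85%. For the output structure of the BPNN model, binary coding is proposed in place of the conventional one-to-one structure, to increase the number of words the system can recognize. For endpoint detection of the speech signal, another effective search rule is proposed: with or without noise interference, and without complex computation, it locates the voiced segments effectively.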
Abstract
This thesis describes an isolated-word, speaker-independent Mandarin digit speech recognition system based on a backpropagation neural network (BPNN). The recognition rate reaches 95%. When the system is applied to a new user with an adaptive modification method, the recognition rate exceeds 99%. To implement the speech recognition system on a digital signal processor (DSP), we propose a neuron-cancellation rule for the BPNN. Under this rule the system removes about 1/3 of its neurons and needs 20% to 40% less memory, while the recognition rate still reaches 85%. For the output structure of the BPNN, we propose a binary code to replace the conventional one-to-one model, which increases the number of words the system can recognize. In addition, we propose a new endpoint detection algorithm for the recorded signals. It locates the speech segments, whether or not noise is present, without complex computation.
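The difference between the one-to-one and binary-coded output structures can be illustrated with a minimal sketch. The record does not give the thesis's actual bit assignment, so the plain binary numbering of classes, the 0.5 threshold, and the function names `one_hot_targets`, `binary_targets`, and `decode_binary` are assumptions made for illustration only.

```python
def one_hot_targets(num_classes):
    """One-to-one layout: one output neuron per class."""
    return [[1 if j == i else 0 for j in range(num_classes)]
            for i in range(num_classes)]


def binary_targets(num_classes):
    """Binary-coded layout (assumed plain binary numbering of classes)."""
    # Number of bits needed to write the largest class index.
    width = max(1, (num_classes - 1).bit_length())
    return [[(i >> bit) & 1 for bit in reversed(range(width))]
            for i in range(num_classes)]


def decode_binary(outputs, threshold=0.5):
    """Threshold each output neuron and read the bits as a class index."""
    index = 0
    for value in outputs:
        index = (index << 1) | (1 if value >= threshold else 0)
    return index


digits = 10  # the ten Mandarin digits
print(len(one_hot_targets(digits)[0]))       # 10 output neurons
print(len(binary_targets(digits)[0]))        # 4 output neurons
print(binary_targets(digits)[9])             # [1, 0, 0, 1]
print(decode_binary([0.9, 0.2, 0.1, 0.8]))   # 9
```

With four output neurons the code has 16 possible patterns, so the same output layer could label up to 16 words. A decoded index that does not correspond to any trained class (10 to 15 here) would have to be treated as a rejection.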
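Table of Contents
Chapter 1 Introduction
1.1 Preface
1.2 Research background
1.3 Research motivation and objectives
Chapter 2 Analysis frame processing
2.1 Introduction
2.2 The two most commonly used analysis frames
2.3 Fixed-width and variable-width analysis frames
Chapter 3 Endpoint detection algorithms
3.1 Introduction
3.2 Introduction to the parameters of time-domain endpoint detection
3.3 Endpoint detection methods
Chapter 4 Feature parameter extraction
4.1 Introduction
4.2 Pre-emphasis processing
4.3 Cepstral parameters
4.4 Comparison of spectral and cepstral parameters
Chapter 5 Dynamic time warping algorithm
5.1 Introduction
5.2 Dynamic time warping
5.3 Building the speech template database
Chapter 6 Backpropagation neural network model
6.1 Introduction
6.2 Architecture of the multilayer perceptron
6.3 Backpropagation algorithm
6.4 Training method for the recognition system
6.5 Neuron cancellation rule
Chapter 7 Experimental methods
7.1 Preface
7.2 Speech sample database
7.3 Experimental design
Chapter 8 Experimental results
8.1 Comparison of endpoint detection algorithms
8.2 Selection of feature parameters
8.3 Effect of the learning rate on the system's learning performance
8.4 Speaker-independent Mandarin digit recognition system
8.5 Design of the backpropagation neural network output structure
8.6 Application of the neuron cancellation rule
8.7 Comparison of the DTW and BPNN recognition algorithms
8.8 Effect of applying the speaker-independent system to a specific speaker
Chapter 9 Conclusions and future work
9.1 Conclusions
9.2 Future work
References
References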
[1] X.D. Huang and K.F. Lee, “On Speaker-Independent, Speaker-Dependent, and Speaker-Adaptive Speech Recognition,” IEEE Trans on ASSP, 1991.
[2] P. Woodland, “Speech Recognition,” IEE, 1998.
[3] H. Sakoe and S. Chiba, “Dynamic Programming Optimization for Spoken Word Recognition,” IEEE Trans on ASSP, Vol.26, pp 43-49, Feb. 1978.
[4] C. Myers and L.R. Rabiner, “Performance Tradeoffs in Dynamic Time Warping Algorithms for Isolated Word Recognition,” IEEE Trans on ASSP, Vol.28, No.6, pp 623-635, Dec. 1980.
[5] D.P. Morgan and C.L. Scofield, Neural Networks and Speech Processing, Kluwer Academic, 1991.
[6] L.R. Rabiner, “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition,” Proceedings of the IEEE, Vol.77, No.2, pp 257-286, Feb. 1989.
[7] L.R. Rabiner and B.H. Juang, Fundamentals of Speech Recognition, Prentice Hall, pp 200-232, 1993.
[8] F. Runstein and F. Violaro, “An Isolated-Word Speech Recognition System Using Neural Networks,” IEEE Trans on ASSP, pp 550-553, 1996.
[9] G.D. Wu and C.T. Lin, “A Recurrent Neural Fuzzy Network for Word Boundary Detection in Variable Noise-Level Environments,” IEEE Trans on System, Man, and Cybernetics, Vol. 31, No. 1, pp 84-97, Feb. 2001.
[10] L.R. Rabiner and R.W. Schafer, Digital Processing of Speech Signals, Prentice Hall, 1978.
[11] A. Hussain, S.A. Samad and L.B. Fah, “Endpoint Detection of Speech Signal using Neural Network,” IEEE Trans on ASSP, pp 271-274, 2000.
[12] L.F. Lamel and L.R. Rabiner, “An Improved Endpoint Detector for Isolated Word Recognition,” IEEE Trans on ASSP, Vol.29, No.4, pp 777-785, Aug. 1981.
[13] E. Keller, Fundamentals of Speech Synthesis and Speech Recognition: Basic Concepts, State of the Art and Future Challenges, John Wiley and Sons, 1994.
[14] S.B. Davis and P. Mermelstein, “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” IEEE Trans on ASSP, Vol.28, No.4, pp 357-366, Aug. 1980.
[15] J.R. Deller, J.G. Proakis and J.H.L. Hansen, Discrete-Time Processing of Speech Signals, Macmillan, 1993.
[16] J.N. Holmes, Speech Synthesis and Recognition, Van Nostrand Reinhold, 1988.
[17] 葉怡成, 類神經網路模式應用與實作, 儒林出版社, 1993.
[18] 中國科學技術大學生物醫學工程跨系委員會, 神經網路及其應用, 儒林出版社, 1993.
[19] H.A. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach, Kluwer Academic, 1994.
[20] M.T. Hagan, H.B. Demuth and M. Beale, Neural Network Design, PWS, 1996.
[21] J.S. Jang, C.T. Sun and E. Mizutani, Neuro-Fuzzy and Soft Computing, Prentice Hall, 1997.
[22] K.J. Astrom and B. Wittenmark, Adaptive Control, 2nd ed., Addison Wesley, 1995.
[23] 蘇木春, 張孝德, 機器學習:類神經網路、模糊系統以及基因演算法則, 全華科技圖書, 1997.
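Full text
This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid breaking the law.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available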


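Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available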
