國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,DSP-Based非特定語言關鍵詞檢索與辨識系統,DSP-Based non-Language specific Keyword Retrieval and Recognition System

論文名稱 Title	DSP-Based非特定語言關鍵詞檢索與辨識系統 DSP-Based non-Language specific Keyword Retrieval and Recognition System
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	93 學年度第 2 學期 The spring semester of Academic Year 93	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	133
研究生 Author	林炳豪 Bing-Hau Lin
指導教授 Advisor	陳遵立 Tzuen-Lih Chern
召集委員 Convenor	吳永春 Yung-Chun Wu
口試委員 Advisory Committee	黃金請, 鄭志強, 高一智 Chin-Ching Huang; Chih-Chiang Cheng; I-Chih Kao
口試日期 Date of Exam	2005-06-25	繳交日期 Date of Submission	2005-07-11
關鍵字 Keywords	定點數位訊號處理器、關鍵詞辨識、關鍵詞檢索 fixed-point DSP, keyword recognition, keyword retrieval

中文摘要
本論文中，在PC平台與數位訊號處理器平台，建立一語音關鍵詞辨識系統與語音關鍵詞檢索系統。系統當中的關鍵詞與描述語句沒有語言上以及字數的限制，並且不需作語音模型訓練，系統詞庫具有可擴充性而不需要重新訓練語音模型。先在PC平台上去建立這樣的系統，並且將程式移植到定點運算的DSP板，在語音的訊號處理過程當中，會使用到大量的數學函式運算，要將其達到即時的效果顯現，才能讓系統實用化，並且定點運算訊號處理器在成本上相對於浮點有其優勢存在，讓系統可以更貼近使用者。經實驗測試結果，語音關鍵詞辨識與關鍵詞檢索系統都有不錯之辨識率與執行效率。
Abstract
This thesis builds a spoken keyword recognition system and a spoken keyword retrieval system on both a PC platform and a digital signal processor (DSP) platform. The keywords and descriptive sentences are not restricted in language or length, and no speech model training is required, so the vocabulary can be extended without retraining any models. The system is first built on the PC, and the program is then ported to a fixed-point DSP board. Speech signal processing relies on a large number of mathematical function evaluations, and these must run in real time for the system to be practical. Fixed-point DSPs also cost less than floating-point ones, which brings the system closer to end users. In experimental tests, both the keyword recognition and keyword retrieval systems achieved good recognition rates and execution efficiency.

目次 Table of Contents
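Chapter 1 Introduction 1
1.1 Foreword 1
1.2 Research Motivation and Objectives 2
1.3 Overview of Spoken Keyword Retrieval 3
1.4 Overview of Spoken Keyword Recognition 4
1.5 Thesis Organization 4
Chapter 2 System Architecture 5
2.1 Introduction to Keyword Retrieval Systems 5
2.2 Architecture of Keyword Retrieval Systems That Require Model Training 5
2.2.1 Theoretical Overview 5
2.2.2 System Characteristics 6
2.3 Architecture of Keyword Retrieval Systems That Do Not Require Model Training 7
2.3.1 Theoretical Overview 7
2.3.2 System Characteristics 8
2.4 Introduction to Keyword Recognition Systems 9
2.5 Architecture of Keyword Recognition Systems That Require Model Training 9
2.5.1 Theoretical Overview 9
2.5.2 System Characteristics 10
2.6 Architecture of Keyword Recognition Systems That Do Not Require Model Training 10
2.6.1 Theoretical Overview 10
2.6.2 System Characteristics 12
2.7 Architecture Adopted in This Study 12
Chapter 3 Speech Signal Acquisition and Preprocessing 13
3.1 Speech Signal Processing 13
3.2 Speech Preprocessing 14
3.3 DC Offset Removal 15
3.4 Frame Segmentation 16
3.5 Endpoint Detection 17
3.5.1 Endpoint Detection Algorithms 17
3.5.2 Parameters Used in Endpoint Detection 19
3.5.2.1 Sum-of-Squares Energy Parameter 19
3.5.2.2 Zero-Crossing Rate Parameter 20
3.5.2.3 Entropy 21
3.5.3 Endpoint Detection Methods 23
3.5.3.1 Energy-Contour Discrimination Method 23
3.5.3.2 R-S Endpoint Detection Method 25
3.6 Pre-emphasis 29
3.7 Hamming Window 30
Chapter 4 Feature Extraction 32
4.1 Linear Predictive Cepstral Coefficients 32
4.1.1 Overview of LPC 32
4.2 Mel-Frequency Cepstral Coefficients 34
4.2.1 Fast Fourier Transform 35
4.2.2 Mel Spectrum 36
4.2.3 Mel Channel Energies 38
4.2.4 Log-Energy Computation 40
4.2.5 Discrete Cosine Transform 40
4.3 Wavelet Transform Coefficients 41
4.3.1 Introduction to Wavelet Transform Theory 41
4.3.2 Discrete Wavelet Transform 42
4.4 Other Methods for Enhancing Feature Parameters 44
4.4.1 Log-Energy Parameter 44
4.4.2 Delta Cepstral Coefficients 45
4.4.3 Second-Order Difference Parameters 46
4.5 Feature Parameter Combination Used in This System 47
Chapter 5 Pattern Matching 49
5.1 Template Matching in Speech Recognition 49
5.2 Dynamic Programming Algorithm 50
5.3 Dynamic Time Warping Algorithm 52
5.4 One-Stage Dynamic Programming Algorithm 53
5.5 One-Stage Dynamic Programming Applied to Keyword Retrieval and Recognition 56
5.6 Constraints on the Warping Function 58
5.6.1 Search Path 59
5.6.2 Global Search Range Constraint 60
5.6.3 Step-Count Normalization 62
5.6.4 Local Constraints 62
Chapter 6 Non-Keyword Rejection 63
6.1 Purpose of Non-Keyword Rejection 63
6.2 Non-Keyword Rejection System Without Speech Model Training 64
6.2.1 Finding the Optimal Similarity-Ratio Threshold 66
6.2.2 Overly Small Similarity-Ratio Threshold Combined with a Distortion Threshold 68
6.2.3 Overly Large Similarity-Ratio Threshold Combined with a Distortion Threshold 70
Chapter 7 Hardware Architecture 72
7.1 PC BASE 72
7.1.1 Keyword Retrieval System 72
7.1.2 Keyword Recognition System 73
7.2 DSP BASE 75
7.2.1 Development and Overview of DSPs 75
7.2.2 Characteristics of DSPs 76
7.2.3 DSP Architecture 77
7.2.4 Applications of DSPs 79
7.2.5 Overview of the ADSP-BF533 EZ-KIT Lite System 80
7.2.6 Overview of Resources Provided for DSP System Development 82
7.2.7 DSP-Based Keyword Recognition System 83
Chapter 8 Experimental Results 85
8.1 Experimental Environment 85
8.1.1 Hardware Specifications 85
8.1.2 Software Environment 85
8.1.3 System Parameters 86
8.2 Experimental Method and Test Samples 86
8.2.1 Experimental Method 86
8.2.2 Description of Test Samples 87
8.3 Experimental Data and Results 88
8.3.1 Keyword Retrieval System 88
8.3.2 Keyword Recognition System 91
8.3.3 Non-Keyword Rejection System 93
8.3.4 Keyword Recognition System with Non-Keyword Rejection 98
8.4 ADSP-BF533 EZ-KIT Lite Performance 98
Chapter 9 Conclusions and Future Work 100
9.1 Conclusions 100
9.2 Future Work 100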


電子全文 Fulltext
The electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
論文使用權限 Thesis access permission：校內一年後公開，校外永不公開 (available on campus after one year; never available off campus)
開放時間 Available：校內 Campus：已公開 available；校外 Off-campus：永不公開 not available
紙本論文 Printed copies
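Public-access information for printed theses is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services.
開放時間 Available：已公開 available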

