National Sun Yat-sen University, thesis/dissertation, DSP Based Speech keyword Retrieval and Recognition System

Title	DSP Based Speech keyword Retrieval and Recognition System (DSP BASE之語音關鍵詞檢索與辨識系統)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 92	Language	Chinese
Degree	Master	Number of pages	140
Author	Bo-Ya Juang (莊博雅)
Advisor	Tzuen-lih Chern (陳遵立)
Convenor	Yung-Chun Wu (吳永春)
Advisory Committee	Chin-Ching Huang (黃金請); I-Chih Kao (高一智); Zai-Jun Yu (俞再鈞)
Date of Exam	2004-07-10	Date of Submission	2004-07-27
Keywords	DSP, keyword retrieval, keyword recognition

Abstract
This thesis builds a speech keyword recognition system and a speech keyword retrieval system on the same basic algorithm and implements them on both a PC platform and a digital signal processor (DSP) platform. The architecture requires no speech model training: the system vocabulary can be extended without retraining any model, and neither keywords nor descriptive sentences are restricted in length or language. In preprocessing, the speech signal undergoes DC bias removal and framing, after which Rabiner and Sambur (R-S) endpoint detection removes the silent segments of each keyword; pre-emphasis and a Hamming window are then applied to prepare the frames for feature extraction. The feature vector has 24 dimensions: 12 Mel-frequency cepstral coefficients (MFCCs) plus 12 first-order delta coefficients. Pattern matching, the core of the system, uses a modified dynamic programming method (dynamic time warping) combined with the one-pass algorithm to find the optimal path. To make the system portable to the DSP platform, non-keyword rejection is decided by an optimal likelihood-ratio threshold, so that a single threshold serves all keywords. This removes the need for multiple thresholds in conventional rejection schemes based on minimum distortion and thereby lowers the memory requirement. In the experiments, both the keyword recognition system and the keyword retrieval system achieved good recognition rates and execution efficiency.
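The preprocessing and matching steps named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the pre-emphasis coefficient of 0.95, the toy frame sizes, and the unconstrained step pattern are all assumptions chosen for illustration. Endpoint detection and MFCC extraction are omitted, and plain dynamic time warping stands in for the modified dynamic programming and one-pass search that the thesis describes.

```python
import math


def remove_dc(samples):
    """Subtract the mean so the waveform is centered on zero."""
    mean = sum(samples) / len(samples)
    return [s - mean for s in samples]


def split_frames(samples, frame_len, hop):
    """Cut the signal into overlapping frames; a trailing partial frame is dropped."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def pre_emphasize(frame, alpha=0.95):
    """High-pass filter y[n] = x[n] - alpha * x[n-1]; alpha is an assumed value."""
    return [frame[0]] + [frame[n] - alpha * frame[n - 1]
                         for n in range(1, len(frame))]


def hamming(frame):
    """Weight one frame with a Hamming window."""
    size = len(frame)
    if size < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1)))
            for n, x in enumerate(frame)]


def dtw_distance(template, query):
    """Accumulated Euclidean cost of the best alignment of two vector sequences.

    Uses the basic step pattern (horizontal, vertical, diagonal) with no
    slope or global path constraints, which is an assumption of this sketch.
    """
    inf = float("inf")
    rows, cols = len(template), len(query)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = math.dist(template[i - 1], query[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # template advances
                                     cost[i][j - 1],      # query advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[rows][cols]


def front_end(samples, frame_len=4, hop=2):
    """Chain the steps in the order the abstract lists them (toy frame sizes)."""
    frames = split_frames(remove_dc(samples), frame_len, hop)
    return [hamming(pre_emphasize(f)) for f in frames]
```

A smaller `dtw_distance` means a closer match between a stored keyword template and an input utterance. In a system like the one summarized above, such distances would feed the non-keyword rejection decision; the threshold rule itself is not reproduced here because the record does not give its details.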

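Table of Contents
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Description of Speech Keyword Search
1.4 Description of Speech Keyword Recognition
1.5 Organization of the Thesis
Chapter 2 Overview of Keyword Search Systems
2.1 Architectures That Require Trained Speech Models
2.1.1 Theory Overview
2.1.2 System Block Diagram
2.1.3 System Characteristics
2.2 Architectures That Do Not Require Trained Speech Models
2.2.1 Theory Overview
2.2.2 System Block Diagram
2.2.3 System Characteristics
Chapter 3 Overview of Keyword Recognition Systems
3.1 Architectures That Require Trained Speech Models
3.1.1 Theory Overview
3.1.2 System Block Diagram
3.1.3 System Characteristics
3.2 Architectures That Do Not Require Trained Speech Models
3.2.1 Theory Overview
3.2.2 System Block Diagram
3.2.3 System Characteristics
3.3 Architecture Adopted in This Study
Chapter 4 Speech Signal Acquisition and Preprocessing
4.1 Speech Signal Processing
4.2 Speech Preprocessing
4.3 DC Bias Removal
4.4 Framing
4.5 Endpoint Detection
4.5.1 Endpoint Detection Algorithms
4.5.2 Parameters Used in Endpoint Detection
4.5.2.1 Sum-of-Squares Energy
4.5.2.2 Zero-Crossing Rate
4.5.2.3 Entropy
4.5.3 Endpoint Detection Methods
4.5.3.1 Energy-Contour Method
4.5.3.2 R-S Endpoint Detection
4.5.3.3 EE Endpoint Detection
4.6 Pre-emphasis
4.7 Hamming Window
Chapter 5 Feature Extraction
5.1 Linear Prediction Cepstral Coefficients
5.1.1 Introduction to LPC
5.1.2 Autocorrelation Theorem
5.1.3 Linear Prediction Analysis
5.1.4 Cepstral Analysis
5.2 Mel-Frequency Cepstral Coefficients
5.2.1 Fast Fourier Transform
5.2.2 Mel Spectrum
5.2.3 Mel Channel Energies
5.2.4 Log-Energy Computation
5.2.5 Discrete Cosine Transform
5.3 Other Feature Enhancement Methods
5.3.1 Log-Energy Parameter
5.3.2 Delta Cepstral Parameters
5.3.2 Second-Order Difference Parameters
5.3.3 Bandpass Liftering Window
5.3.4 Channel-Effect Removal
5.4 Feature Set Used in This System
Chapter 6 Pattern Matching
6.1 Template Matching in Speech Recognition
6.2 Dynamic Programming Algorithm
6.3 Dynamic Time Warping Algorithm
6.4 One-Stage Dynamic Programming Algorithm
6.5 One-Stage Dynamic Programming for Keyword Search and Recognition
6.6 Warping Function Constraints
6.6.1 Search Path
6.6.2 Global Search Range Constraint
6.6.3 Step-Count Normalization
6.6.4 Local Constraints
Chapter 7 Non-Keyword Rejection
7.1 Purpose of Non-Keyword Rejection
7.2 Non-Keyword Rejection Without Trained Speech Models
7.2.1 Finding the Optimal Likelihood-Ratio Threshold
7.2.2 Too-Small Likelihood-Ratio Threshold Combined with a Distortion Threshold
7.2.3 Too-Large Likelihood-Ratio Threshold Combined with a Distortion Threshold
Chapter 8 System Architecture
8.1 Architecture Using HTK for Feature Extraction
8.1.1 Introduction to HTK
8.1.2 Extracting Speech Features with HTK
8.2 PC-Based Architecture
8.2.1 Keyword Retrieval System
8.2.2 Keyword Recognition System
8.3 DSP-Based Architecture
8.3.1 Development and Overview of DSPs
8.3.2 Characteristics of DSPs
8.3.3 DSP Architecture
8.3.4 DSP Applications
8.3.5 Overview of the ADSP-21161 System
8.3.6 Resources Provided for DSP System Development
8.3.7 DSP Recording Interface
8.3.8 DSP-Based Keyword Recognition System
Chapter 9 Experimental Results
9.1 Experimental Environment
9.1.1 Hardware Specifications
9.1.2 Software Environment
9.1.3 System Parameters
9.2 Experimental Method and Test Samples
9.2.1 Experimental Method
9.2.2 Test Samples
9.3 Experimental Data
9.3.1 Keyword Search System
9.3.2 Keyword Recognition System
9.3.3 Non-Keyword Rejection System
9.3.4 Keyword Recognition System with Non-Keyword Rejection
9.4 ADSP-21161 Performance
Chapter 10 Conclusions and Future Work
10.1 Conclusions
10.2 Future Work
References
Appendix 1 Test Sentences and Keywords of the Intonation Database
Appendix 2 Test Sentence Contents and File Descriptions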

Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: not available on campus or off campus. Available: Campus: not available; Off-campus: not available.
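Printed copies
Public information on printed theses is relatively complete from Academic Year 102 onward. To look up public information on printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Available: publicly available.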

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us : 2452 Thesis Format Review Team , Mail │ Development and operations : Knowledge Innovation Division, LIS