National Sun Yat-sen University, thesis/dissertation, 中文關鍵語詞搜尋系統之設計研究 (A Design of Mandarin Keyword Spotting System)

Title	中文關鍵語詞搜尋系統之設計研究 (A Design of Mandarin Keyword Spotting System)
Department	Department of Electrical Engineering
Year, semester	Fall semester of Academic Year 91	Language	Chinese
Degree	Master	Number of pages	102
Author	王怡理 Yi-Lii Wang
Advisor	陳志堅 Chih-Chien Chen
Convenor	汪啟茂 Chii-Maw Wang
Advisory Committee	李聰 Tsung Lee
Date of Exam	2003-01-08	Date of Submission	2003-02-07
Keywords	Hidden Markov Model, Keyword Spotting, Speech Recognition, Linear Predictive Coding (LPC), Cepstrum, Spoken Dialogue System
Statistics	This thesis/dissertation has been browsed 5664 times and downloaded 0 times.

Abstract
This thesis presents the design of a Mandarin keyword spotting system built on linear predictive coding (LPC), vector quantization (VQ), discrete hidden Markov models (HMMs), and the Viterbi algorithm. Drawing on the principles of various dialogue systems, we further developed the keyword spotting platform into a prototype Taiwan Railway natural language ticket reservation system that takes its input through spoken dialogue. During a reservation, the system asks the user five questions: name and ID number, departure station, destination station, train type and number of tickets, and travel time. These dialogues serve to verify the feasibility and practicality of the approach. After the user confirms the order by voice, the requested train tickets are issued and, in a laboratory environment, correctly printed within 90 seconds on average for an individual user.
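The abstract names a pipeline of LPC features, vector quantization, discrete HMMs, and Viterbi decoding. As a rough illustration of the last two stages only, the following Python sketch decodes a sequence of VQ codeword indices with a small discrete HMM in the log domain. The names `safe_log` and `viterbi_decode`, the two-state left-to-right toy model, and all probability values are assumptions made for illustration; they are not taken from the thesis.

```python
import math

def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf

def viterbi_decode(obs, log_pi, log_a, log_b):
    """Return (best_path, best_log_prob) for a discrete-observation HMM.

    obs    : list of VQ codeword indices (hypothetical input)
    log_pi : log initial-state probabilities, length N
    log_a  : N x N log transitions, log_a[i][j] = log P(state j | state i)
    log_b  : N x M log emissions, log_b[j][k] = log P(codeword k | state j)
    """
    n_states = len(log_pi)
    # delta[j]: best log score of any state path ending in state j
    delta = [log_pi[j] + log_b[j][obs[0]] for j in range(n_states)]
    backptr = []  # backptr[t-1][j]: best predecessor of state j at frame t
    for o in obs[1:]:
        prev, delta, ptr = delta, [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: prev[i] + log_a[i][j])
            ptr.append(best_i)
            delta.append(prev[best_i] + log_a[best_i][j] + log_b[j][o])
        backptr.append(ptr)
    # Pick the best final state, then follow the back-pointers to frame 0.
    state = max(range(n_states), key=lambda j: delta[j])
    best_log_prob = delta[state]
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best_log_prob

# Toy two-state left-to-right model over a two-entry codebook (made-up values).
pi = [1.0, 0.0]
A = [[0.5, 0.5], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
log_pi = [safe_log(p) for p in pi]
log_a = [[safe_log(p) for p in row] for row in A]
log_b = [[safe_log(p) for p in row] for row in B]

path, log_prob = viterbi_decode([0, 0, 1, 1], log_pi, log_a, log_b)
print(path, math.exp(log_prob))
```

Working with log probabilities turns the products along a path into sums, which avoids numerical underflow on long observation sequences. In a keyword spotter, one common arrangement is to score each keyword model this way and compare the best score against a background or filler model; whether the thesis does so is not stated in this record.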

Table of Contents
Abstract
Acknowledgments
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
  1.1 Motivation
  1.2 Research Topic
  1.3 Chapter Overview
Chapter 2 Speech Data
  2.1 Vector Quantization Codebook
    2.1.1 The Vector Space Covered by the Codebook: The 1312 Mandarin Syllables
    2.1.2 Enriching the Vector Space
    2.1.3 Codebook Size
  2.2 Keywords
  2.3 Chapter Summary
Chapter 3 Speech Coding
  3.1 Vocal Tract Model
    3.1.1 Linear Prediction
    3.1.2 All-Pole System
    3.1.3 Computing the Coefficients a_k
    3.1.4 Linear Prediction Error and the Input Signal
  3.2 Lattice Structure System
    3.2.1 Iterating from Lower-Order to Higher-Order System Coefficients
    3.2.2 Reflection Coefficients and Estimation Error
    3.2.3 Computing the Reflection Coefficients
  3.3 Chapter Summary
Chapter 4 Feature Extraction Methods
  4.1 Speech Signal Preprocessing: Spectral Analysis
  4.2 Linear Predictive Coding Model
  4.3 LPC-based Cepstrum Processing Procedure
    4.3.1 Pre-emphasis
    4.3.2 Framing and Windowing
    4.3.3 LPC-based Cepstrum Conversion
    4.3.4 Temporal Cepstral Derivative
  4.4 Chapter Summary
Chapter 5 Speech Models
  5.1 Basic Concepts of Speech Models
  5.2 Hidden Markov Models
    5.2.1 Hidden Markov Chains
    5.2.2 Parameters of the Hidden Markov Model
    5.2.3 Three Directions for Applying Hidden Markov Models to Practical Problems
  5.3 Solutions to the Three Basic Problems of Hidden Markov Models
    5.3.1 Solution to Problem 1: Forward and Backward Algorithm
    5.3.2 Solution to Problem 2: Viterbi Algorithm
    5.3.3 Solution to Problem 3: Baum-Welch (EM) Method
  5.4 Implementation Details for Speech Hidden Markov Models
    5.4.1 Form of the Hidden Markov Model for Speech Signals
    5.4.2 Choosing the Number of States
    5.4.3 Parameter Initialization
    5.4.4 Maintaining Numerical Precision in the Forward and Backward Procedures
    5.4.5 Implementing the Viterbi Algorithm
    5.4.6 Multiple-Utterance Training Method
  5.5 Chapter Summary
Chapter 6 Matching
  6.1 Word Recognition Method
  6.2 Keyword Spotting Method
  6.3 Application of Word Segmentation
  6.4 Chapter Summary
Chapter 7 Dialogue System
  7.1 Design of the Spoken Dialogue System
    7.1.1 Dialogue System Flow
  7.2 Dialogue System Strategies
    7.2.1 System Prompting Strategy
    7.2.2 System Update Strategy
  7.3 User Response Model
  7.4 Channel Effect Simulation Model
  7.5 Chapter Summary
Chapter 8 Research Results and Contributions
  8.1 Speech Classes
    8.1.1 Description of the Speech Classes
    8.1.2 WSpeech Class
    8.1.3 WLPC Class
    8.1.4 WCep Class
    8.1.5 WVQ Class
    8.1.6 WSeg Class
    8.1.7 WFFT Class
  8.2 System Classes
    8.2.1 WHMM Class
    8.2.2 WTicket Class
    8.2.3 WTrain Class
  8.3 Taiwan Railway Natural Speech Ticket Reservation System
  8.4 Chapter Summary
Chapter 9 Future Work and Conclusions
  9.1 Design of a Colloquial Dialogue System
  9.2 Database Content and Functions
  9.3 Conclusions
Appendix Table 1 The 1312 Mandarin Syllables
Appendix Table 2 Keyword Set
References

References
1. Ben Gold and Nelson Morgan, Speech and Audio Signal Processing, John Wiley & Sons Inc., 2000.
2. L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, N.J., 1993.
3. John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice-Hall, 1996.
4. P. E. Papamichalis, Practical Approaches to Speech Coding.
5. J. Makhoul, "Linear prediction: A tutorial review", Proceedings of the IEEE, vol. 63, pp. 561-580, April 1975.
6. B. H. Juang and L. R. Rabiner, "Mixture Autoregressive Hidden Markov Models for Speech Signals", IEEE Trans. ASSP, vol. ASSP-33, pp. 1404-1413, 1985.
7. John R. Deller, Jr., John G. Proakis, and John H. L. Hansen, Discrete-Time Processing of Speech Signals, Prentice Hall, Inc., New Jersey, 1987.
8. L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition", Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
9. Jeff A. Bilmes, "A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models", International Computer Science Institute, Berkeley, CA, 1998.
10. Lawrence R. Rabiner, Jay G. Wilpon, and Frank K. Soong, "High Performance Connected Digit Recognition Using Hidden Markov Models", IEEE Trans. ASSP, vol. 37, no. 8, August 1989.
11. Jay G. Wilpon, L. R. Rabiner, Chin-Hui Lee, and E. R. Goldman, "Automatic Recognition of Keywords in Unconstrained Speech Using Hidden Markov Models", IEEE Trans. ASSP, vol. 38, no. 11, November 1990.
12. Eng-Fong Huang, Hsiao-Chuan Wang, and Frank K. Soong, "A Fast Algorithm for Large Vocabulary Keyword Spotting Application", IEEE Trans. ASSP, vol. 2, no. 3, July 1994.
13. J. T. Foote, S. J. Young, G. J. F. Jones, and K. Sparck Jones, "Unconstrained Keyword Spotting Using Phone Lattices with Application to Spoken Document Retrieval", Academic Press Limited, 1997.
14. Bor-shen Lin and Lin-shan Lee, "Computer-Aided Analysis and Design for Spoken Dialogue System Based on Quantitative Simulations", IEEE Trans. ASSP, vol. 9, no. 5, July 2001.
15. Victor Zue, Stephanie Seneff, and James R. Glass, "Jupiter: A Telephone-based Conversational Interface for Weather Information", IEEE Trans. ASSP, vol. 8, no. 1, January 2000.
16. Victor Zue and James R. Glass, "Conversational Interfaces: Advances and Challenges", Proceedings of the IEEE, vol. 88, no. 8, August 2000.
17. Biing-Hwang Juang and Sadaoki Furui, "Automatic Recognition and Understanding of Spoken Language - A First Step Toward Natural Human-Machine Communication", Proceedings of the IEEE, vol. 88, no. 8, August 2000.
18. Paul M. Embree and Damon Danieli, C++ Algorithms for Digital Signal Processing, Prentice-Hall Inc., N.J., 1999.

Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available on campus or off campus
Available: Campus: never available; Off-campus: never available
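Printed copies
Public availability information for printed theses is relatively complete only from Academic Year 102 onward. To look up the availability of printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Availability: available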


Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS