National Sun Yat-sen University, thesis/dissertation, A Design of Text-Dependent and Text-Independent Speaker Recognition System

Title	A Design of Text-Dependent and Text-Independent Speaker Recognition System (original title: 特定語句與不特定語句語者辨識系統之設計研究)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 101	Language	Chinese
Degree	Master	Number of pages	59
Author	Sin-ciao You (游薪樵)
Advisor	Chih-Chien Chen (陳志堅)
Convenor	Chii-Maw Uang (汪啟茂)
Advisory Committee	Sheau-Shong Bor (柏小松); Tsung Lee (李聰)
Date of Exam	2013-07-10	Date of Submission	2013-09-12
Keywords	Gaussian mixture model, Karhunen-Loeve transform, hidden Markov model, Mel frequency cepstral coefficients, speaker recognition, linear predictive cepstral coefficients
Statistics	The thesis/dissertation has been browsed 5688 times and downloaded 86 times.

Abstract
Identity recognition has long been an important issue in science and technology. It appears today in access control systems for building facilities and in login systems for information equipment. Access control can rely on digital identification such as RFID, but a card can be lost or lent to others, so biometrics can greatly enhance the security of the system. Biometric systems identify a person through features such as the face, fingerprint, iris pattern, or voiceprint. A suitable feature must be universal, unique, and permanent: everyone has it, it differs from person to person, and it does not change over time. Face recognition can fail because of cosmetic makeup. Fingerprint systems can be defeated when fingerprint samples are obtained by others, for example by duplicating them with translucent tape. Iris recognition is highly accurate, but its high system cost is its biggest drawback. The voice is innate to every person, convenient to use, and comparatively inexpensive to build a system around. Our objective is therefore to design a reliable, convenient, and cost-effective speaker recognition system based on voiceprint features.
In this thesis, both text-dependent and text-independent speaker recognition systems are designed and implemented. The text-dependent systems are based on the hidden Markov model and the Gaussian mixture model. The text-independent system applies the Gaussian mixture model, the Karhunen-Loeve transform, and the Bhattacharyya distance in the initial classification, and uses canonical analysis (canonical correlation analysis) for the final decision. For the text-dependent system, three recordings of pre-designed sentences per speaker are used for training, and a correct recognition rate of 93.33% is reached for 110 speakers. For the text-independent system, 40 seconds of training material per speaker and 5-second test segments give a correct recognition rate of 95.8% for 1000 speakers.
On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ CPU running Ubuntu 9.04, the average recognition time of each of the two systems is about 1 second.
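The abstract names the Bhattacharyya distance as one ingredient of the initial classification in the text-independent system, but this record gives no formula. As a rough illustration only, the Python sketch below computes the standard Bhattacharyya distance between two Gaussians and shortlists the closest enrolled speakers. The function names, the diagonal-covariance simplification, the single-Gaussian speaker models, and the sample values are all assumptions made for this example, not details taken from the thesis.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians with diagonal covariances.

    Assumption for this sketch: each covariance matrix is diagonal, so it is
    passed as a list of per-dimension variances.
    """
    distance = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        distance += 0.125 * (m1 - m2) ** 2 / v  # mean-separation term
        distance += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance-mismatch term
    return distance


def closest_speakers(test_model, enrolled, top_n=3):
    """Return the names of the top_n enrolled speakers nearest to test_model."""
    test_mean, test_var = test_model
    scored = [
        (bhattacharyya_diag(test_mean, test_var, mean, var), name)
        for name, (mean, var) in enrolled.items()
    ]
    return [name for _, name in sorted(scored)[:top_n]]


# Hypothetical two-dimensional speaker models: (mean vector, variance vector).
enrolled = {
    "speaker_a": ([0.0, 0.0], [1.0, 1.0]),
    "speaker_b": ([2.0, 0.0], [1.0, 1.0]),
    "speaker_c": ([5.0, 5.0], [2.0, 2.0]),
}
test_model = ([0.2, -0.1], [1.0, 1.0])

candidates = closest_speakers(test_model, enrolled, top_n=2)
print(candidates)
```

In a two-stage design like the one the abstract describes, a shortlist of this kind would be handed to a second, more expensive classifier for the final decision; how the thesis actually combines the stages is not stated in this record.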

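Table of Contents
Thesis Approval Form
Acknowledgments
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation
  1.2 Research Method
  1.3 Thesis Outline
  1.4 Speaker Recognition
Chapter 2 Workflow and Techniques of Speaker Recognition Systems
  2.1 Preprocessing
    2.1.1 Framing
    2.1.2 Energy
    2.1.3 Zero Crossing Rate
    2.1.4 Pre-Emphasis
    2.1.5 Windowing
  2.2 Feature Extraction
    2.2.1 Linear Predictive Cepstral Coefficients
    2.2.2 Mel Frequency Cepstral Coefficients
  2.3 Hidden Markov Model
    2.3.1 Model Parameter Initialization
    2.3.2 Optimal State Sequence
    2.3.3 Model Parameter Estimation
  2.4 Gaussian Mixture Model
    2.4.1 Model Description
    2.4.2 Initial Parameters
    2.4.3 Expectation-Maximization Algorithm
    2.4.4 Maximum a Posteriori Criterion (MAP)
  2.5 Karhunen-Loeve Transform Recognition System
    2.5.1 Karhunen-Loeve Transform
    2.5.2 Bhattacharyya Distance
    2.5.3 Canonical Analysis
Chapter 3 Implementation Results and Recognition Performance of the Speaker Recognition Systems
  3.1 Text-Dependent Hidden Markov Model Recognition System
  3.2 Text-Dependent Gaussian Mixture Model Recognition System
  3.3 Text-Independent Gaussian Mixture Model Recognition System
  3.4 Text-Independent Karhunen-Loeve Transform Recognition System
  3.5 Text-Independent Hybrid Karhunen-Loeve Transform and Gaussian Mixture Model System
  3.6 Text-Independent Canonical Correlation System
Chapter 4 Conclusions and Future Work
References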

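Fulltext
This electronic fulltext is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law. Thesis access permission: user-defined release time. Available: Campus: available; Off-campus: available. etd-0812113-132458.pdf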
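Printed copies
Public access information for printed theses is relatively complete only from Academic Year 102 onward. To inquire about public access information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience. Availability: available (already public)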

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452 Thesis Format Review Team, Mail │ Development and operations: Knowledge Innovation Division, LIS