Thesis/Dissertation etd-0812113-132458 Details
Title
特定語句與不特定語句語者辨識系統之設計研究
A Design of Text-Dependent and Text-Independent Speaker Recognition System
Department
Year, semester
Language
Degree
Number of pages
59
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2013-07-10
Date of Submission
2013-09-12
Keywords
Gaussian mixture model, Karhunen-Loeve transform, Hidden Markov model, Mel frequency cepstral coefficients, Speaker recognition, Linear predictive cepstral coefficients
Statistics
This thesis has been viewed 5688 times and downloaded 86 times.
Chinese Abstract
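Identity recognition has long been an important topic in science and technology. Today it can be found in many places, such as access control systems for buildings and login systems for information equipment. In access control, biometric recognition can further strengthen system security beyond digital identification methods such as RFID. Biometric systems identify people by biological traits such as the face, fingerprints, the iris, or the voiceprint. Such a trait must be unique, universal, and permanent: everyone must have it, it must differ from person to person, and it must not change over time. Face recognition accuracy can drop because of makeup; fingerprint recognition can be defeated if someone else obtains a fingerprint sample; iris recognition is highly accurate, but its high system cost is its biggest drawback. The voice, by contrast, is innate and distinctive to each person, convenient to use, and comparatively inexpensive to deploy. We therefore aim to use voiceprint features to design a speaker recognition system that is reliable, economical, secure, and convenient.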
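This thesis designs and implements speaker recognition systems for two operating modes: text-dependent and text-independent. The text-dependent system is based on hidden Markov models and Gaussian mixture models. The text-independent system is built around Gaussian mixture models, the Karhunen-Loeve transform, and the Bhattacharyya distance, and additionally uses canonical correlation analysis to make the final speaker decision. In the text-dependent system, each speaker is trained with three recordings, and the correct recognition rate for 110 speakers is 93.33%. The text-independent system, trained on 40 seconds of audio per speaker and tested on 5-second utterances, achieves a recognition rate of 95.8% for 1,000 speakers. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ CPU running Ubuntu 9.04, both systems have an average recognition time of about 1 second.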
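The text-independent system compares speaker distributions with the Bhattacharyya distance. For two Gaussians with means μ1, μ2 and covariances Σ1, Σ2, and averaged covariance Σ = (Σ1 + Σ2)/2, the standard closed form is D_B = (1/8)(μ1 − μ2)ᵀ Σ⁻¹ (μ1 − μ2) + (1/2) ln(det Σ / √(det Σ1 · det Σ2)). As a rough illustration only, the Python sketch below fits a single Gaussian to each speaker's feature frames, assumes diagonal covariances (a simplification that is most plausible after a decorrelating step such as the Karhunen-Loeve transform, which is not shown), and ranks enrolled speakers by their distance to the test utterance. The function names, the variance floor, and the toy feature vectors are hypothetical and are not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple

Gaussian = Tuple[List[float], List[float]]  # (per-dimension means, per-dimension variances)


def fit_diag_gaussian(frames: Sequence[Sequence[float]]) -> Gaussian:
    """Estimate mean and variance per dimension (diagonal covariance assumption)."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Hypothetical floor of 1e-6 keeps every variance positive so log() and division are safe.
    var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, 1e-6) for d in range(dim)]
    return mean, var


def bhattacharyya_diag(m1: Sequence[float], v1: Sequence[float],
                       m2: Sequence[float], v2: Sequence[float]) -> float:
    """Bhattacharyya distance between two Gaussians with diagonal covariances."""
    dist = 0.0
    for a, va, b, vb in zip(m1, v1, m2, v2):
        v = 0.5 * (va + vb)  # one diagonal entry of the averaged covariance
        dist += 0.125 * (a - b) ** 2 / v  # mean-separation term
        dist += 0.5 * (math.log(v) - 0.5 * (math.log(va) + math.log(vb)))  # covariance term
    return dist


def rank_speakers(test_frames: Sequence[Sequence[float]],
                  models: Dict[str, Gaussian]) -> List[Tuple[str, float]]:
    """Return (speaker, distance) pairs sorted from closest to farthest."""
    tm, tv = fit_diag_gaussian(test_frames)
    scores = [(name, bhattacharyya_diag(tm, tv, m, v)) for name, (m, v) in models.items()]
    return sorted(scores, key=lambda item: item[1])


# Toy two-dimensional "feature frames" for two hypothetical enrolled speakers.
models = {
    "spk_a": fit_diag_gaussian([[0.0, 1.0], [0.2, 1.1], [-0.1, 0.9]]),
    "spk_b": fit_diag_gaussian([[3.0, -1.0], [3.1, -0.8], [2.9, -1.2]]),
}
print(rank_speakers([[0.1, 1.0], [0.0, 1.05]], models)[0][0])  # spk_a
```

With diagonal covariances the determinant and inverse reduce to per-dimension products and divisions, which keeps the sketch free of external dependencies; a full-covariance version would need a linear-algebra library.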
Abstract
Identification has long been an important issue in science and technology. Today it is found in access control systems for facilities and in login systems for information equipment. Using RFID cards to recognize authorized users is one example of access control; however, a card can be lost or lent to others. Biometrics can therefore be applied to greatly enhance system security. Facial features, fingerprints, iris patterns, and voiceprints are commonly used for biometric identification because they are unique, universal, and permanent. Face recognition can fail because of cosmetic makeup. Fingerprint systems can be fooled by samples duplicated with translucent tape. Iris recognition performs best, but it is extremely expensive. A voiceprint system is convenient and much more economical to build. Our objective is therefore to design a reliable, convenient, and cost-effective speaker recognition system for the identification task.
In this thesis, both text-dependent and text-independent speaker recognition systems are designed and implemented. The text-dependent system is built on hidden Markov models and Gaussian mixture models. The text-independent system applies Gaussian mixture models, the Karhunen-Loeve transform, and the Bhattacharyya distance for initial classification, and canonical correlation analysis for the final decision. For the 110-speaker text-dependent system, three pre-designed sentences are recorded per speaker for training, and a correct recognition rate of 93.33% is reached. For the text-independent system, 40 seconds of training material is collected per speaker; with 5-second test utterances, a 1,000-speaker system achieves a correct recognition rate of 95.8%. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ CPU running Ubuntu 9.04, the average computation time of each system is about 1 second.
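Both systems rely on Gaussian mixture models. A common way to use GMMs for closed-set speaker identification is to score the test frames against every enrolled speaker's model and choose the speaker with the highest average log-likelihood. The Python sketch below shows only that scoring step, under stated assumptions: diagonal covariances, models that are already trained (for example by expectation-maximization, not shown), and a hypothetical model layout of `(weight, means, variances)` tuples. It illustrates the general technique and is not the thesis's implementation; the function names and toy models are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: a GMM is a list of (weight, means, variances) components.
GMM = List[Tuple[float, List[float], List[float]]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def gmm_log_likelihood(x: Sequence[float], gmm: GMM) -> float:
    """log sum_k w_k N(x; mu_k, Sigma_k), computed with log-sum-exp to avoid underflow."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify(frames: Sequence[Sequence[float]], speaker_models: Dict[str, GMM]) -> str:
    """Return the speaker whose GMM gives the highest average per-frame log-likelihood."""
    def score(gmm: GMM) -> float:
        return sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
    return max(speaker_models, key=lambda name: score(speaker_models[name]))


# Toy one-dimensional models for two hypothetical speakers.
speaker_models = {
    "spk_a": [(0.5, [0.0], [1.0]), (0.5, [4.0], [1.0])],
    "spk_b": [(1.0, [10.0], [2.0])],
}
print(identify([[0.2], [3.8], [-0.5]], speaker_models))  # spk_a
```

Averaging per frame does not change which speaker wins for a given utterance, but it makes scores comparable across utterances of different lengths, which matters if a rejection threshold is added for verification.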
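Table of Contents
Thesis Approval Certificate
Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Methods
1.3 Thesis Organization
1.4 Speaker Recognition
Chapter 2 Procedures and Techniques of the Speaker Recognition System
2.1 Preprocessing
2.1.1 Framing
2.1.2 Energy
2.1.3 Zero Crossing Rate
2.1.4 Pre-Emphasis
2.1.5 Windowing
2.2 Feature Extraction
2.2.1 Linear Predictive Cepstral Coefficients
2.2.2 Mel-Frequency Cepstral Coefficients
2.3 Hidden Markov Model
2.3.1 Model Parameter Initialization
2.3.2 Optimal State Sequence
2.3.3 Model Parameter Estimation
2.4 Gaussian Mixture Model
2.4.1 Model Description
2.4.2 Initial Parameters
2.4.3 Expectation-Maximization Algorithm
2.4.4 Maximum a Posteriori (MAP) Criterion
2.5 Karhunen-Loeve Transform Recognition System
2.5.1 Karhunen-Loeve Transform
2.5.2 Bhattacharyya Distance
2.5.3 Canonical Correlation Analysis
Chapter 3 Implementation Results and Recognition Performance of the Speaker Recognition Systems
3.1 Text-Dependent Hidden Markov Model Recognition System
3.2 Text-Dependent Gaussian Mixture Model Recognition System
3.3 Text-Independent Gaussian Mixture Model Recognition System
3.4 Text-Independent Karhunen-Loeve Transform Recognition System
3.5 Text-Independent Hybrid Karhunen-Loeve Transform and Gaussian Mixture Model System
3.6 Text-Independent Canonical Correlation System
Chapter 4 Conclusions and Future Work
References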
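Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available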


Printed copies
Available: available
