國立中山大學 National Sun Yat-sen University, 學位論文 thesis/dissertation

論文名稱 Title	不特定語句中量語者辨識系統之設計研究 A design of text-independent medium-size speaker recognition system
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	90 學年度第 2 學期 The spring semester of Academic Year 90	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	47
研究生 Author	鄭順德 Shun-De Zheng
指導教授 Advisor	陳志堅 Chih-Chien Chen
召集委員 Convenor	汪啟茂 Chii-Maw Uang
口試委員 Advisory Committee	李聰 Tsung Lee
口試日期 Date of Exam	2002-07-24	繳交日期 Date of Submission	2002-09-13
關鍵字 Keywords	梅爾倒頻譜係數、向量量化、語者辨識系統、混合高斯模型 Gaussian mixture model, Vector quantization, Speaker recognition, Mel-frequency cepstrum coefficients

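Abstract (translated from the Chinese)
The aim of this thesis is to build a text-independent speaker recognition system for a medium-size speaker population. The databases used are a self-recorded TV speech database and the TIMIT database, which together contain speech from 400 speakers, with 75 seconds of speech per speaker. The quality of a text-independent speaker recognition system depends on its recognition accuracy and its recognition speed, and accuracy is generally affected by the number of speakers. This thesis uses a Vector Quantization Gaussian Mixture Model (VQGMM) to partition the data of a large speaker population so that recognition accuracy does not fall quickly as the number of speakers grows. Each individual Gaussian component in the mixture model is used to model one characteristic of a speaker's broad acoustic space. The experimental results show that, in text-independent mode, the Gaussian mixture model speaker recognition system performs exceptionally well. In addition, pre-clustering makes speaker training much faster than a conventional GMM without clustering, saving up to half of the training time. The thesis also compares diagonal and full covariance matrices: the full covariance matrix gives a slightly higher recognition rate, but its training time is more than three times that of the diagonal one.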
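The diagonal versus full covariance comparison can be made concrete by counting free parameters. The sketch below is only an illustration: the function name `gmm_free_parameters` and the example values (32 components, 20-dimensional features) are assumptions, not settings taken from the thesis. Parameter count is one plausible contributor to the longer full-covariance training time, not a full explanation of the reported factor.

```python
def gmm_free_parameters(num_components, dim, covariance="diag"):
    """Count the free parameters of a GMM with M components in D dimensions.

    Hypothetical helper for illustration; not code from the thesis.
    """
    weights = num_components - 1          # mixture weights sum to 1
    means = num_components * dim          # one D-dimensional mean per component
    if covariance == "diag":
        cov = num_components * dim        # D variances per component
    elif covariance == "full":
        # a symmetric D x D matrix has D(D+1)/2 distinct entries
        cov = num_components * dim * (dim + 1) // 2
    else:
        raise ValueError("covariance must be 'diag' or 'full'")
    return weights + means + cov


# Example values (assumed, not from the thesis): M = 32, D = 20
diag_count = gmm_free_parameters(32, 20, "diag")
full_count = gmm_free_parameters(32, 20, "full")
print(diag_count, full_count)
```

With these assumed values the full-covariance model has several times as many parameters to estimate, and each likelihood evaluation costs on the order of D squared operations per component instead of D.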
Abstract
This thesis presents text-independent speaker identification results for medium-size populations of up to 400 speakers on a TV speech database and the TIMIT database. The system is based on Gaussian mixture speaker models, and experiments are conducted on both databases. The TV database results show medium-size population performance under TV conditions. These are believed to be the first speaker identification experiments on the complete 400-speaker TV database and the largest text-independent speaker identification task reported to date. Identification accuracies of 94.5% on the TV database and 98.5% on the TIMIT database are obtained.
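Identification with Gaussian mixture speaker models is commonly done by scoring the test frames against each speaker's model and picking the speaker with the highest total log-likelihood. The following is a minimal sketch of that standard rule for diagonal-covariance models, assuming feature frames (for example MFCC vectors) are already extracted. The function names `diag_gmm_loglik` and `identify`, and the layout of `models`, are hypothetical and not taken from the thesis.

```python
import numpy as np


def diag_gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of frames (T, D) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]              # (T, M, D)
    # log N(x | mu_m, diag(var_m)) for every frame and component
    log_comp = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                           # (T, M)
    log_weighted = log_comp + np.log(weights)
    # numerically stable log-sum-exp over the M components
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.sum())


def identify(frames, models):
    """Return the name of the speaker model with the highest log-likelihood.

    models maps a speaker name to a (weights, means, variances) tuple.
    """
    scores = {name: diag_gmm_loglik(frames, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get)
```

In use, `models` would hold one trained parameter tuple per enrolled speaker, and `identify(test_frames, models)` would return the best-scoring speaker name.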

目次 Table of Contents
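Chapter 1 Introduction
1-1 Research Motivation
1-2 Overview of Speaker Recognition
1-3 Research Background and Objectives
1-4 Chapter Overview
Chapter 2 Speaker Recognition Systems and Techniques
2-1 Basic Techniques of Speaker Identification
2-2 Feature Extraction
2-2-1 Cepstral Coefficients
2-2-2 Power Spectral Density
2-2-3 Mel-Frequency Cepstral Coefficients
2-3 Gaussian Mixture Models and Speaker Recognition
2-3-1 Model Description
2-3-2 Parameter Initialization
2-3-3 Maximum Likelihood Estimation
2-3-4 Expectation-Maximization Algorithm
2-3-5 Speaker Identification
2-4 Vector Quantization Gaussian Mixture Model (VQGMM)
2-4-1 VQGMM System Architecture
2-4-2 Computational Complexity of Conventional GMM versus VQGMM
Chapter 3 Experimental Design, Data, and Discussion
3-1 Databases and System Parameter Design
3-2 Experimental Results
Chapter 4 Conclusions and Future Work
Appendix
List of TV Speakers
List of TIMIT Speakers
Proofs of Properties
References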


電子全文 Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it. 論文使用權限 Thesis access permission: available on campus, permanently restricted off campus. 開放時間 Available: 校內 Campus: available; 校外 Off-campus: not available.
紙本論文 Printed copies
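Public-access information for printed theses is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. 開放時間 Available: available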
