Thesis/dissertation etd-0913102-043719: detailed record
Title page for etd-0913102-043719
Title
不特定語句中量語者辨識系統之設計研究
A design of text-independent medium-size speaker recognition system
Department
Year, semester
Language
Degree
Number of pages
47
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2002-07-24
Date of Submission
2002-09-13
Keywords
Gaussian mixture model, Vector quantization, Speaker recognition, Mel-frequency cepstral coefficients
Statistics
This thesis/dissertation has been viewed 5699 times and downloaded 49 times.
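Abstract (translated from the Chinese)
The purpose of this thesis is to build a text-independent speaker recognition system for a medium-size speaker population. The databases used are a self-recorded TV speech database and the TIMIT database, containing speech from 400 speakers, with 75 seconds of speech per speaker.
The quality of a text-independent speaker recognition system is determined by its recognition accuracy and its recognition speed. Recognition accuracy is generally affected by the number of speakers. This thesis uses a Vector Quantization Gaussian Mixture Model (VQGMM) to partition a database with a large number of speakers, so that recognition accuracy does not fall quickly as the number of speakers grows. Each individual Gaussian distribution in the Gaussian mixture model is used to model one characteristic of a speaker's broad acoustic space. Experimental results show that, in text-independent mode, the GMM-based speaker recognition system performs extremely well. In addition, the pre-clustering idea makes speaker training much faster than the conventional GMM approach without clustering, saving up to half of the training time. The thesis also compares the performance of diagonal and full covariance matrices: the latter gives a slightly higher recognition rate than the former, but its training time is more than three times as long.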
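
The pre-clustering idea described above can be illustrated with a minimal sketch. It is not the thesis's VQGMM implementation, whose details are not given in this record. It assumes plain k-means as the vector quantizer and uses each VQ cell to seed one diagonal-covariance Gaussian component. The function names (`assign`, `vq_codebook`, `init_gmm_from_vq`) and the `var_floor` parameter are hypothetical.

```python
import numpy as np

def assign(X, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for each frame."""
    d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_codebook(X, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in vector quantizer (an assumption)."""
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(X, codebook)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old codeword if a cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook

def init_gmm_from_vq(X, k, var_floor=1e-3, seed=0):
    """Seed a k-component diagonal-covariance GMM from the VQ cells."""
    codebook = vq_codebook(X, k, seed=seed)
    labels = assign(X, codebook)
    weights = np.zeros(k)
    means = codebook.copy()
    # Fall back to the global variance for cells with fewer than two frames.
    variances = np.tile(X.var(axis=0), (k, 1))
    for j in range(k):
        members = X[labels == j]
        weights[j] = len(members) / len(X)
        if len(members) > 1:
            variances[j] = np.maximum(members.var(axis=0), var_floor)
    return weights, means, variances
```

The returned parameters would normally serve only as a starting point for EM re-estimation. Cells that end up empty receive zero weight and would need handling before that step.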

Abstract
This thesis presents text-independent speaker identification results for medium-size populations of up to 400 speakers, using TV speech and the TIMIT database. A system based on Gaussian mixture speaker models is used for speaker identification, and experiments are conducted on the TV database and the TIMIT database. The TV database results show medium-size population performance under TV conditions. These are believed to be the first speaker identification experiments on the complete 400-speaker TV database and the largest text-independent speaker identification task reported to date. Identification accuracies of 94.5% on the TV database and 98.5% on the TIMIT database were obtained.
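
Identification with Gaussian mixture speaker models is commonly done by scoring the test frames against every speaker's model and choosing the speaker with the highest average log-likelihood. The sketch below assumes diagonal covariances and feature frames (for example MFCC vectors) that have already been extracted. The names `avg_log_likelihood` and `identify`, and the dictionary layout of the models, are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

def avg_log_likelihood(X, weights, means, variances):
    """Average per-frame log-likelihood of X (T, D) under a diagonal GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                 # (T, M, D)
    maha = (diff ** 2 / variances[None, :, :]).sum(axis=2)   # (T, M)
    log_det = np.log(variances).sum(axis=1)                  # (M,)
    log_comp = -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + maha)
    log_weighted = log_comp + np.log(weights)[None, :]
    # Log-sum-exp over components, shifted by the row maximum for stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return frame_ll.mean()

def identify(X, models):
    """Return the best-scoring speaker label and all scores.

    models maps a speaker label to a (weights, means, variances) tuple.
    """
    scores = {name: avg_log_likelihood(X, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores
```

Averaging over frames keeps scores comparable across utterances of different length; it does not change which speaker wins for a given utterance.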
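Table of Contents
Chapter 1 Introduction
1-1 Research Motivation
1-2 Overview of Speaker Recognition
1-3 Research Background and Objectives
1-4 Chapter Outline

Chapter 2 Speaker Recognition Systems and Techniques
2-1 Basic Techniques of Speaker Recognition
2-2 Feature Extraction
2-2-1 Cepstral Coefficients
2-2-2 Power Spectral Density
2-2-3 Mel-Frequency Cepstral Coefficients

2-3 Gaussian Mixture Models and Speaker Recognition
2-3-1 Model Description
2-3-2 Parameter Initialization
2-3-3 Maximum Likelihood Estimation
2-3-4 Expectation-Maximization Algorithm
2-3-5 Speaker Identification
2-4 Vector Quantization Gaussian Mixture Model (VQGMM)
2-4-1 VQGMM System Architecture
2-4-2 Comparison of the Computational Complexity of Conventional GMM and VQGMM

Chapter 3 Experimental Design, Data, and Discussion
3-1 Databases and System Parameter Design
3-2 Experimental Results

Chapter 4 Conclusions and Future Work
Appendix
List of TV Speakers
List of TIMIT Speakers
Proofs of Properties
References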
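Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: available on campus, never available off campus (restricted)
Available:
Campus: available
Off-campus: not available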

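Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available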
