國立中山大學,National Sun Yat-sen University,學位論文,thesis/dissertation,多時段不特定語句語者辨識用電視影音資料庫之設計研究,A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition

論文名稱 Title	多時段不特定語句語者辨識用電視影音資料庫之設計研究 A Design of Multi-Session, Text Independent, TV-Recorded Audio-Video Database for Speaker Recognition
系所名稱 Department	電機工程學系 Department of Electrical Engineering
畢業學年期 Year, semester	94 學年度第 2 學期 The spring semester of Academic Year 94	語文別 Language	中文 Chinese
學位類別 Degree	碩士 Master	頁數 Number of pages	51
研究生 Author	王龍政 Long-Cheng Wang
指導教授 Advisor	陳志堅 Chih-Chien Chen
召集委員 Convenor	汪啟茂 Chii-Maw Uang
口試委員 Advisory Committee	李聰 Tsung Lee
口試日期 Date of Exam	2006-07-25	繳交日期 Date of Submission	2006-09-07
關鍵字 Keywords	梅爾頻率倒頻譜係數、語者辨識、不特定語句、向量量化、高斯混合模型 Speaker recognition, Text independent, Vector quantization, Gaussian mixture model, Mel-frequency cepstrum coefficients

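中文摘要 Chinese Abstract
In this thesis we build a four-session TV-recorded audio-video database. On this database we apply Mel-frequency cepstral coefficients and Gaussian mixture models to study text-independent recognition of a large speaker population, both within a single session and across sessions. We first validate the reliability of the system on single-session speaker recognition: experiments show that the recognition rate can reach 90% with 3000 TV speakers. For cross-session recognition, which we study with 800 TV speakers, the accuracy is only 67%. Speaker data recorded in different sessions vary in recording environment, in the speaker's state of mind, and in other unknown characteristics, so recognition performance drops sharply; this is a problem that future research urgently needs to overcome. The main contribution of this thesis is a usable multi-session, large-population audio-video speaker database for future research on this difficult problem.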
Abstract
This thesis collects a four-session, text-independent, TV-recorded audio-video database for speaker recognition. The speaker data are used to verify the applicability of a design methodology based on Mel-frequency cepstral coefficients and Gaussian mixture models. Both single-session and multi-session problems are discussed. Experimental results indicate that a recognition rate of 90% can be achieved on a single-session 3000-speaker corpus, whereas only 67% is obtained on a two-session 800-speaker dataset. Multi-session performance drops sharply because of variability in the recording environment, the speakers' mood during recording, and other unknown factors. Improving system performance under multi-session conditions remains a challenging task for future work, and a multi-session, large-scale speaker database such as this one is indispensable to that task.
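
The abstract describes closed-set speaker identification with one Gaussian mixture model per speaker. The following is a minimal sketch of that general technique, not the thesis implementation. It assumes diagonal covariances, initializes the means from randomly chosen frames (the thesis table of contents lists vector quantization for initialization, which is not reproduced here), and uses synthetic 12-dimensional vectors in place of real MFCC frames. All function names, speaker labels, and parameter values are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of every frame under every diagonal Gaussian, shape (N, M)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def train_gmm(X, n_mix=4, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM to frames X (N, D) with plain EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Assumption: random-frame initialization stands in for VQ initialization.
    means = X[rng.choice(n, n_mix, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-3, (n_mix, 1))
    weights = np.full(n_mix, 1.0 / n_mix)
    for _ in range(n_iter):
        # E-step: posterior probability of each mixture for each frame.
        log_p = np.log(weights) + log_gauss_diag(X, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and floored variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = resp.T @ X / nk[:, None]
        variances = np.maximum(resp.T @ X ** 2 / nk[:, None] - means ** 2, 1e-3)
    return weights, means, variances

def avg_log_likelihood(X, gmm):
    """Average per-frame log-likelihood of utterance X under one speaker model."""
    weights, means, variances = gmm
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    return float(np.mean(logsumexp(log_p, axis=1)))

def identify(X, models):
    """Return the label of the speaker model that scores X highest."""
    return max(models, key=lambda name: avg_log_likelihood(X, models[name]))

# Synthetic stand-in for MFCC frames: each hypothetical speaker has its own offset.
rng = np.random.default_rng(0)
centers = {"spk_a": 0.0, "spk_b": 3.0, "spk_c": 6.0}

def fake_features(center, n_frames):
    return center + rng.normal(size=(n_frames, 12))

models = {name: train_gmm(fake_features(c, 300)) for name, c in centers.items()}
test_utterance = fake_features(3.0, 100)
print(identify(test_utterance, models))
```

In this toy setup the training and test frames come from the same distribution, which corresponds to the single-session case. The multi-session degradation reported in the abstract arises when the test frames are drawn under different recording conditions than the training frames, which this sketch does not model.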

目次 Table of Contents
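Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1-1 Research Motivation
  1-2 Overview of Speaker Recognition
  1-3 Research Method
  1-4 Chapter Outline
Chapter 2 Speech Signal Processing and Feature Extraction
  2-1 Speech Signal Processing
  2-2 Frame Energy and Zero-Crossing Rate
    2-2-1 Frame Energy Measurement
    2-2-2 Zero-Crossing Rate
    2-2-3 Endpoint Detection
  2-3 Acoustic Feature Extraction
    2-3-1 Cepstral Coefficients
    2-3-2 Mel-Frequency Cepstral Coefficients
Chapter 3 TV Audio-Video Database
  3-1 Motivation
  3-2 Contents of the TV Audio-Video Database
  3-3 Procedure for Building the TV Audio-Video Database
    3-3-1 Collecting Speaker Audio-Video Files
    3-3-2 Extracting Raw Speaker Audio Files
    3-3-3 Processing Raw Speaker Audio Files
  3-4 Usage Notes for the TV Audio-Video Database
Chapter 4 Speaker Recognition Based on Gaussian Mixture Models
  4-1 Model Description
  4-2 Model Interpretation
  4-3 Vector Quantization and Parameter Initialization
  4-4 Maximum Likelihood Estimation
  4-5 Expectation-Maximization Algorithm
  4-6 Speaker Identification
Chapter 5 Speaker Recognition Experiments
  5-1 Gaussian Mixture Model Recognition Experiments on the TV Audio-Video and TIMIT Databases
  5-2 Same-Session and Cross-Session Text-Independent Recognition Experiments for the Same Speaker
  5-3 Cross-Session Medium-Population and Same-Session Large-Population Text-Independent Recognition Experiments
    5-3-1 Cross-Session Recognition with a Medium Speaker Population
    5-3-2 Same-Session Recognition with a Large Speaker Population
  5-4 Cross-Session Speaker Recognition Experiments with a System Design That Adapts Speaker Models
Chapter 6 Conclusions and Future Work
  6-1 Conclusions
  6-2 Future Work
References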


電子全文 Fulltext
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it. 論文使用權限 Thesis access permission: not available on campus or off campus.
紙本論文 Printed copies
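Availability information for printed copies is relatively complete from Academic Year 102 onward. For printed theses from Academic Year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. 開放時間 Available: available.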
