Title page for etd-0905108-021052
Title
多時段不特定語句語者辨識用數位攝影機影音資料庫之設計研究
A Design of Multi-session Text-independent Digital Camcorder Audio-Video Database for Speaker Recognition
Department
Year, semester
Language
Degree
Number of pages
43
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2008-07-25
Date of Submission
2008-09-05
Keywords
Speaker Recognition, Automatic Speech Recognition System, Text-to-Speech System, Biometrics
Statistics
This thesis has been browsed 5646 times and downloaded 0 times.
Abstract
In this thesis, an audio-video database for speaker recognition is built using a digital camcorder. The database contains text-independent audio-video recordings of 1,500 speakers, each recorded in three different sessions, together with 20 still images extracted from each speaker's video in every session. It is hoped that the completed database will provide a training and evaluation mechanism for the design of person-identification systems that use both voice and face features.
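Table of Contents
Acknowledgments
Abstract
Contents
List of Figures
Chapter 1 Introduction
1-1 Research Motivation and Objectives
1-2 Overview of Speaker Recognition
1-3 Chapter Outline
Chapter 2 Speech Processing Techniques
2-1 Related Areas of Speech Processing
2-2 Speech Recognition Techniques
2-2-1 Endpoint Detection
2-2-2 Energy
2-2-3 Zero Crossing Rate
2-2-4 Maximum Likelihood Rate (MLR)
2-3 Window Function
2-4 Feature Extraction
2-4-1 Linear Predictive Coding (LPC)
2-4-2 Mel-Frequency Cepstrum Coefficients (MFCC)
Chapter 3 Domestic and International Audio-Video Databases for Speaker Recognition
3-1 Domestic Speech Databases
3-1-1 Mandarin Speech Database MAT-160
3-1-2 Mandarin Speech Database MAT-400
3-1-3 Mandarin Speech Database MAT-2500
3-1-4 Microphone Speech Corpus TCC-300Edu
3-1-5 Mandarin Continuous Digit Speech Database
3-1-6 Mandarin Broadcast News Corpus MATBN
3-2 International Speech Databases
3-2-1 TIMIT Speech Database
3-2-2 NTIMIT Speech Database
3-2-3 CTIMIT Speech Database
3-2-4 NIST Standard Conversational Telephone Speech Database
3-2-5 AVTIMIT Audio-Video Database
3-2-6 VidTIMIT Audio-Video Database
Chapter 4 Experimental Design, Data, and Discussion
4-1 Parameter Design for Database Construction
4-2 Experimental Results
Chapter 5 Conclusions and Future Work
5-1 Conclusions
5-2 Future Work
References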
References
[1] Lawrence Rabiner and Biing-Hwang Juang, “Fundamentals of Speech Recognition,” Prentice Hall, 1993.

[2] D. A. Reynolds, “Speaker identification and verification using Gaussian mixture speaker models,” Speech Communication, Vol. 17, pp. 91-108, March 1995.

[3] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum Likelihood from Incomplete Data via the EM Algorithm,” Harvard University and Educational Testing Service, Dec. 1976.

[4] Biing-Hwang Juang, Wu Chou, and Chin-Hui Lee, “Minimum Classification Error Rate Methods for Speech Recognition,” IEEE Trans. on Speech and Audio Processing, Vol. 5, No. 3, May 1997.

[5] W. Chou, B. H. Juang, and C. H. Lee, “Segmental GPD Training of HMM Based Speech Recognizer,” in Proceedings of ICASSP, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 473-476, 1992.

[6] C. M. del Alamo, F. J. Caminero Gil, C. dela Torre Munilla, and L. Hernandez Gomez, “Discriminative Training of GMM for Speaker Identification,” in Proceedings of ICASSP, IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 89-92, 1996.

[7] Li Lee and Richard Rose, “A Frequency Warping Approach to Speaker Normalization,” IEEE Trans. on Speech and Audio Processing, Vol. 6, No. 1, January 1998.

[8] L. Welling, S. Kanthak, and H. Ney, “Improved Method for Vocal Tract Normalization,” IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 761-764, 1999.

[9] W. M. Fisher and G. R. Doddington, “The DARPA Speech Recognition Research Database: Specifications and Status,” in Proc. DARPA Workshop Speech Recognition, Feb. 1986, pp. 93-99.

[10] L. F. Lamel and J. L. Gauvain, “Cross-Lingual Experiments with Phone Recognition,” in Proc. Int. Conf. Acoustic Speech Signal Processing, 1993, pp. 507-510.

[11] John R. Deller, John G. Proakis, and John H. Hansen, “Discrete-Time Processing of Speech Signals,” Maxwell Macmillan International.

[12] Y. Linde, A. Buzo, and R. Gray, “An Algorithm for Vector Quantizer Design,” IEEE Transactions on Communications, Vol. 28, pp. 84-95, 1980.

[13] J. Moody, S. Slomka, and J. Pelecanos, “On the Convergence of Gaussian Mixture Models: Improvements Through Vector Quantization,” ICSLP 98.

[14] A. P. Dempster and N. M. Laird, “Maximum-Likelihood for Incomplete Data via the EM Algorithm,” J. Royal Statist. Soc. Ser. B, pp. 39, 1977.

[15] L. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition,” Prentice Hall Signal Processing Series, 1993.

[16] 王小川, “語音訊號處理” (Speech Signal Processing), 全華, 2004.

[17] 戴顯權, “資料壓縮” (Data Compression), 紳藍, 2002.
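Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available on campus or off campus
Available:
Campus: not available
Off-campus: not available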

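Printed copies
Public availability information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Availability: available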
