Master's/Doctoral Thesis etd-0716104-171952: Details
Title page for etd-0716104-171952
Title
Person Identification Based on Karhunen-Loeve Transform
Department
Year, semester
Language
Degree
Number of pages
108
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2004-05-26
Date of Submission
2004-07-16
Keywords
Gaussian Mixture Model, Karhunen-Loeve Transform, Vector Quantizer, Genetic Algorithm, Hard-Limited Karhunen-Loeve Transform
Statistics
This thesis/dissertation has been viewed 5669 times and downloaded 0 times.
Chinese Abstract
The research topic of this dissertation is person identification based on the Karhunen-Loeve transform (KLT). It covers two major subjects: speaker identification and face recognition. Improving the recognition rate, reducing the computational cost, and adding robustness are therefore the three focal points of this work.
First, we design feature extraction methods that greatly reduce the computational cost while maintaining a high recognition rate. This dissertation proposes using the KLT for feature extraction. When the KLT represents a random process, its transform coefficients are mutually uncorrelated; because these coefficients give the minimum truncation error and the maximum energy packing, a small number of basis vectors suffices to capture most of the useful information in the original data. The computational cost of data transformation, training, and testing is thus greatly reduced, and in terms of completeness of data representation the KLT is regarded as the optimal linear transform. However, the feature vectors derived from the KLT require a large number of floating-point multiplications during data transformation, corpus training, and testing, which makes real-time operation difficult. By extracting the discriminative information at the zero crossings of the KLT-derived eigenvectors, we propose the hard-limited KLT (HLKLT), based on a structural approximation of the basis; at the cost of a slight loss in recognition rate, this method greatly accelerates recognition while retaining very high accuracy.
Besides speaker identification, we also apply the HLKLT to face recognition. We design a criterion that finds the best structural approximation of the KL eigenfaces, completing the hard-limiting process successfully. The experimental results are quite satisfactory.
Second, we design the classifier so as to shorten the recognition time and raise the recognition rate. The Gaussian mixture model (GMM) already has considerable discriminative power, but it consumes a great deal of computational resources, so this dissertation combines the KLT and the GMM. In the first step, the KLT selects the group of candidates closest to the test speaker, i.e., the speakers that differ most are discarded. The GMM then picks the closest speaker out of the selected candidate group. Experiments confirm that this not only reduces the recognition time but also increases the recognition rate.
Furthermore, this dissertation proposes a strategy that combines the genetic algorithm with vector quantization: the genetic algorithm is introduced to search for the optimal vector quantizer codebook, avoiding the drawback of conventional vector quantizers, which may converge to locally rather than globally optimal solutions. Experiments confirm that this strategy yields a better recognition rate than a conventional vector quantizer.
Finally, we address the design of a system that works under heavy noise. The strategy described above, with the KLT as a first-stage preselection followed by a second-stage GMM decision on MFCC features, not only increases the recognition rate but also reduces the recognition time.
Abstract


In this dissertation, person identification systems based on the Karhunen-Loeve transform (KLT) are investigated. Both speaker and face recognition are considered in our design. Among the many system design issues, three important problems are addressed in this thesis: how to improve the correct classification rate, how to reduce the computational cost, and how to increase the robustness of the system.

Improvement of the correct classification rate and reduction of the computational cost of the person identification system can both be accomplished by an appropriate feature design methodology. The KLT and the hard-limited KLT (HLKLT) are proposed here to extract class-related features. Theoretically, the KLT is the optimal transform in the minimum mean-square-error and maximal energy-packing sense: the transformed data are completely uncorrelated, and most of the classification information is concentrated in the first few coordinates. Therefore, a satisfactory correct classification rate can be achieved using only the first few KLT-derived eigenfeatures.
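The KLT feature extraction described above can be sketched as follows. This is an illustrative PCA-style implementation, not the thesis's actual code; the function name and array shapes are assumptions.

```python
import numpy as np

def klt_features(samples, k):
    """Project samples onto the k KLT (PCA) eigenvectors with the
    largest eigenvalues (maximum energy packing)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, ::-1][:, :k]   # top-k eigenvectors as columns
    return centered @ basis, basis, mean

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))        # 200 sample vectors of dimension 16
feats, basis, mu = klt_features(X, 4)
print(feats.shape)                    # (200, 4)
```

Keeping only the first few columns of the eigenvector matrix is what reduces the cost of the subsequent training and testing stages.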

In the above data transformation process, the transformed data are calculated from the inner products of the original samples with the selected eigenvectors; the computation is necessarily floating-point arithmetic. If this linear transformation can be reduced to integer arithmetic, the time used for both person feature training and person classification will be greatly reduced. The hard-limiting process (HLKLT) extracts the zero-crossing information of the eigenvectors, which is hypothesized to contain the information important for classification. This kind of feature tremendously simplifies the linear transformation, since the computation becomes pure integer arithmetic.
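A minimal sketch of the hard-limiting idea, under the assumptions that the eigenvectors come from a prior KLT step and that the input samples are integer-valued (e.g. PCM speech):

```python
import numpy as np

rng = np.random.default_rng(1)
basis = rng.normal(size=(16, 4))      # hypothetical KLT eigenvectors (columns)

# Hard-limiting keeps only the zero-crossing (sign) structure of each
# eigenvector, replacing every component by +1 or -1.
hl_basis = np.where(basis >= 0, 1, -1).astype(np.int64)

# For integer-valued input, the projection now needs only integer
# additions and subtractions -- no floating-point multiplications.
x = rng.integers(-128, 128, size=16)  # stand-in for integer speech samples
hl_feature = x @ hl_basis
print(hl_feature.dtype)               # int64
```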

In this thesis, it is demonstrated that the hard-limited KL transform has a much simpler structure than the KL transform and possesses approximately the same excellent performance for both the speaker identification system and the face recognition system.

Moreover, a hybrid KLT/GMM speaker identification system is proposed in this thesis to improve the classification rate and to save computational time. The increase in the correct rate comes from the fact that two different sets of speech features are applied in the hybrid system: the KLT features and the MFCC features of the Gaussian mixture speaker model (GMM).

Furthermore, this hybrid system performs classification sequentially. In the first stage, the relatively fast KLT features serve as an initial candidate-selection tool to discard the speakers with larger separability from the test speaker. In the second stage, the GMM makes the ultimate decision among the remaining candidates. Therefore, only a small portion of the speakers needs to be discriminated in the time-consuming GMM stage. Our results show that the combination benefits both classification accuracy and computational cost.
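The two-stage decision can be illustrated as follows. The Euclidean distance measure, the candidate count, and the `gmm_loglik` callable are placeholders standing in for the thesis's actual models:

```python
import numpy as np

def identify(test_feat, klt_refs, gmm_loglik, n_candidates=5):
    """Two-stage speaker identification sketch.

    test_feat:   KLT feature vector of the test utterance.
    klt_refs:    (n_speakers, d) per-speaker KLT reference vectors.
    gmm_loglik:  callable speaker_index -> GMM log-likelihood of the
                 test utterance (the expensive second stage).
    """
    # Stage 1: cheap KLT distances prune the full speaker set down to
    # the n_candidates closest speakers.
    dists = np.linalg.norm(klt_refs - test_feat, axis=1)
    candidates = np.argsort(dists)[:n_candidates]
    # Stage 2: run the costly GMM scoring only on the survivors.
    return max(candidates, key=gmm_loglik)

refs = np.eye(4)                      # 4 hypothetical reference speakers
winner = identify(np.array([0.9, 0.1, 0.0, 0.0]), refs,
                  gmm_loglik=lambda s: -abs(int(s)), n_candidates=2)
print(winner)                         # 0
```

With N speakers, the GMM is evaluated only n_candidates times instead of N times, which is where the time saving comes from.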

The above hybrid KLT/GMM design is also applied to a robust speaker identification system. Under both additive white Gaussian noise (AWGN) and car-noise environments, it is demonstrated that accuracy improvements and computational savings over the conventional GMM model can be achieved.

A genetic algorithm (GA) is proposed in this thesis to improve the speaker identification performance of the vector quantizer (VQ) by avoiding the local minima typically incurred in the LBG process. The results indicate that this scheme is useful for our recognition application in practice.
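A toy GA codebook search in this spirit might look like the following. The operators here (truncation selection and Gaussian mutation, with crossover omitted for brevity) are illustrative assumptions, not the thesis's actual design:

```python
import numpy as np

def distortion(data, codebook):
    # Mean squared distance from each vector to its nearest codeword.
    d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).mean()

def ga_codebook(data, n_codewords, pop=20, gens=50, sigma=0.1, seed=0):
    """Toy GA search for a VQ codebook: random init from the training
    vectors, truncation selection, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # Initial population: random subsets of the training vectors.
    popn = np.stack([data[rng.choice(n, n_codewords, replace=False)]
                     for _ in range(pop)])
    for _ in range(gens):
        fit = np.array([distortion(data, cb) for cb in popn])
        order = np.argsort(fit)           # lower distortion = fitter
        parents = popn[order[:pop // 2]]  # truncation selection (elitist)
        children = parents + rng.normal(0, sigma, parents.shape)
        popn = np.concatenate([parents, children])
    fit = np.array([distortion(data, cb) for cb in popn])
    return popn[fit.argmin()]
```

Because fitness is evaluated over whole codebooks rather than refined cluster-by-cluster as in LBG, the population search can escape codebooks that LBG would converge to as local minima.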
Table of Contents
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1-1 Motivation and Objectives
1-2 Main Body of the Thesis
1-3 Contributions
1-4 Organization of the Thesis
Chapter 2 Literature Review
Chapter 3 Text-Independent Speaker Identification Based on the Karhunen-Loeve Transform
3-1 Introduction
3-2 Long-Term Speech Spectrum
3-3 Karhunen-Loeve Transform
3-4 Hard-Limited Karhunen-Loeve Transform
3-5 Quadratic Classifier
3-6 Experimental Results
3-7 Conclusions
Chapter 4 Vector Quantizers and Genetic Algorithms for Speaker Identification
4-1 Introduction
4-2 Vector Quantizer
4-3 Genetic Algorithm
4-4 Searching for the Optimal VQ Codebook with the Genetic Algorithm
4-5 Experimental Results
4-6 Conclusions
Chapter 5 Large-Population Speaker Identification
5-1 Introduction
5-2 Mel-Frequency Cepstral Coefficients
5-3 Bhattacharyya Distance
5-4 Gaussian Mixture Model
5-5 Speaker Identification Combining the KLT and the GMM
5-6 Experimental Results
5-7 Conclusions
Chapter 6 Robust Speaker Identification
6-1 Introduction
6-2 Mean-Subtracted Mel-Frequency Cepstral Coefficients
6-3 Robust Speaker Identification Combining the KLT and the GMM
6-4 Experimental Results
6-5 Conclusions
Chapter 7 Application of the Hard-Limited KLT to Face Recognition
7-1 Introduction
7-2 KL Eigenfaces
7-3 Practical Considerations in Deriving Hard-Limited Eigenfaces
7-4 Experimental Results
7-5 Conclusions
Chapter 8 Conclusions
8-1 Summary
8-2 Future Work
References
Appendix: Proofs of Properties
Vita
Publications
Fulltext
The electronic full text is licensed only for personal, non-profit retrieval, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: not available on or off campus
Available:
Campus: never available
Off-campus: never available

Printed copies
Information on the public availability of printed theses is relatively complete for academic year 102 and later. For printed theses from academic year 101 and earlier, please contact the library's printed-thesis service desk. We apologize for any inconvenience.
Availability: publicly available
