Title page for etd-0801106-191501
Title
Speaker and Emotion Recognition System of Gaussian Mixture Model
Department
Year, semester
Language
Degree
Number of pages
73
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2006-07-22
Date of Submission
2006-08-01
Keywords
Speaker and Emotion Recognition System, Gaussian Mixture Model, DSP
Statistics
This thesis has been viewed 5642 times and downloaded 19 times.
Abstract (Chinese)
In this thesis, a speaker and emotion recognition system is built on both a PC and a digital signal processor (DSP) platform. Most existing work treats speaker recognition and emotion recognition separately rather than combining them in a single system; this thesis integrates both into one system. Speech is captured by a microphone, feature extraction is performed on the DSP, and pattern matching then yields the recognition result.

The recognition system is divided into four subsystems: speech pre-processing, speaker model training, speaker and emotion recognition, and speaker verification. In pre-processing, speech is captured by a microphone, transferred through the DSP board into SRAM, and then pre-processed. Speaker model training uses Gaussian mixture models to estimate each speaker's means, coefficients of variation, and mixture weights, which serve as the reference for the whole recognition system. Speaker identification relies mainly on the probability density to determine the speaker's identity, while emotion recognition uses changes in the coefficients of variation. Speaker verification confirms whether the user is the same speaker registered in the system database.

The DSP-based recognition system comprises two parts: hardware configuration and algorithm implementation. A fixed-point DSP board is used, and the recognition algorithm is based on the Gaussian mixture model. Compared with floating-point processors, fixed-point DSPs have a cost advantage, bringing the system closer to end users.
Abstract
In this thesis, a speaker and emotion recognition system is implemented on both a PC and a digital signal processor (DSP). Most speaker and emotion recognition systems are built separately rather than combined in the same system; this thesis shows how the two can be combined in a single system. The voice is picked up by a microphone, features are extracted on the DSP, and pattern matching then produces the recognition result.
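The feature-extraction front end mentioned above (detailed in sections 2.3.5-2.3.6 of the table of contents) applies pre-emphasis, frame blocking, and a Hamming window before spectral analysis. The following is a minimal Python sketch of those three steps, not the thesis's actual code; the pre-emphasis coefficient 0.97 and the 256-sample frame with 50% overlap are assumed typical values, since the thesis's parameters are not given on this page:

```python
import math

def preemphasize(signal, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1]; alpha=0.97 is an assumed typical value."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (length-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def frames(signal, size=256, step=128):
    """Split a signal into overlapping frames and apply the Hamming window to each."""
    win = hamming(size)
    out = []
    for start in range(0, len(signal) - size + 1, step):
        out.append([signal[start + i] * win[i] for i in range(size)])
    return out
```

Each windowed frame would then be passed to the FFT and Mel filterbank stages (sections 2.3.7-2.3.8) to produce the MFCC feature vectors.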
The recognition system is divided into four subsystems: speech pre-processing, speaker model training, speaker and emotion recognition, and speaker verification. Pre-processing captures the voice with a microphone and conveys it through the DSP board to SRAM, where it is pre-processed. Speaker model training uses the Gaussian mixture model to estimate each speaker's means, coefficients of variation, and mixture weights, which serve as the reference for the whole recognition system. Speaker identification mainly uses the probability density to determine the speaker's identity, while emotion recognition exploits changes in the coefficient of variation. Speaker verification checks whether the user matches the speaker registered in the system database.
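The identification step described above can be sketched as follows: each speaker is represented by a trained GMM, and the claimed identity is the model with the highest accumulated log-likelihood over all feature frames. This is a simplified illustration with diagonal covariances and toy parameter values, not the thesis's implementation:

```python
import math

def gmm_logpdf(x, weights, means, variances):
    """Log-likelihood of one feature vector x under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_g = 0.0
        for xi, mui, vi in zip(x, mu, var):
            log_g += -0.5 * (math.log(2 * math.pi * vi) + (xi - mui) ** 2 / vi)
        component_logs.append(math.log(w) + log_g)
    m = max(component_logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in component_logs))

def identify(feature_frames, models):
    """Return the speaker whose GMM gives the highest total log-likelihood."""
    scores = {name: sum(gmm_logpdf(x, *gmm) for x in feature_frames)
              for name, gmm in models.items()}
    return max(scores, key=scores.get)
```

For example, with two hypothetical one-mixture models centered at 0 and 5, frames near 0 are attributed to the first speaker; a real system would score MFCC vectors against models trained with the EM algorithm (section 2.4.3).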
The DSP-based recognition system includes two parts: hardware setup and implementation of the recognition algorithm. A fixed-point DSP board is used, and the recognition algorithm is the Gaussian mixture model. Compared with floating-point devices, fixed-point DSPs cost much less, which brings the system closer to end users.
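A fixed-point DSP represents signal values as scaled integers rather than floats. As an illustration only (the thesis does not state its number format on this page), a common convention is Q15, in which a 16-bit integer encodes a value in [-1, 1); a multiply produces a 32-bit product that is shifted back down:

```python
Q = 15
SCALE = 1 << Q  # 32768: one unit in Q15

def to_q15(x):
    """Quantize a real number in [-1, 1) to a 16-bit Q15 integer, with saturation."""
    return max(-SCALE, min(SCALE - 1, int(round(x * SCALE))))

def q15_mul(a, b):
    """Fixed-point multiply: the 32-bit product is shifted right by Q to stay in Q15."""
    return (a * b) >> Q

def from_q15(q):
    """Convert a Q15 integer back to a float, for inspection."""
    return q / SCALE
```

This is why fixed-point hardware is cheaper: multiplies and shifts on integers replace a floating-point unit, at the cost of limited dynamic range and quantization error that the implementation must manage.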
Table of Contents
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables

Chapter 1  Introduction
1.1 Preface
1.2 Overview of Speaker Recognition
1.3 Overview of Emotion Recognition
1.4 Research Motivation
1.5 Chapter Outline

Chapter 2  Speaker and Emotion Recognition System
2.1 Introduction
2.2 System Architecture
2.2.1 Feature Extraction Procedure
2.2.2 Speaker Model Training Procedure
2.2.3 Speaker and Emotion Recognition Procedure
2.2.4 Speaker Verification Procedure
2.3 Feature Extraction
2.3.1 DC Bias Removal
2.3.2 Speech Normalization
2.3.3 Frame Blocking
2.3.4 Endpoint Detection Algorithm
2.3.5 Pre-emphasis
2.3.6 Hamming Window
2.3.7 Fast Fourier Transform
2.3.8 Mel-Frequency Cepstral Coefficients
2.3.8.1 Mel Spectrum
2.3.8.2 Mel Channel Energy
2.3.8.3 Log-Energy Computation
2.3.8.4 Discrete Cosine Transform
2.3.8.5 Delta Coefficients
2.4 Gaussian Mixture Model
2.4.1 Model Description
2.4.2 Maximum Likelihood Estimation
2.4.3 Expectation-Maximization Algorithm
2.4.4 Expectation-Maximization in the Log Domain
2.4.5 Emotion Recognition
2.4.6 Speaker Identification

Chapter 3  System Architecture
3.1 PC-Based System
3.1.1 Recording System
3.1.2 Training System
3.1.3 Speaker and Emotion Recognition System
3.2 DSP-Based System
3.2.1 Development and Overview of DSPs
3.2.2 Features of DSPs
3.2.3 DSP Architecture
3.2.4 DSP Applications
3.2.5 Overview of the ADSP-BF533 EZ-KIT Lite System
3.2.6 Overview of DSP Development Resources
3.2.7 DSP-Based Speaker and Emotion Recognition System
3.3 DSP Speaker Recognition User Interface

Chapter 4  Experimental Methods and Results
4.1 Experimental Environment
4.1.1 Hardware Specifications
4.1.2 Software Environment
4.1.3 System Parameters
4.2 Experiment Design
4.2.1 Comparison of Different Emotions
4.2.2 Comparison of Different Speakers under the Same Emotion
4.2.3 Emotion Recognition Rate
4.2.4 Effect of Different Gaussian Mixture Models on Recognition Rate

Chapter 5  Conclusions and Future Work
5.1 Conclusions
5.2 Future Work

References
References
[1] Cannon, W.B., "Again the James-Lange theory of emotion: a critical examination and an alternative theory", Am. J. Psychol., vol. 39, pp. 106-124, 1931.
[2] Strongman, K.T., The Psychology of Emotion, translated by 游恆山, Wu-Chou Publishing, Taipei, 1987.
[3] Cornelius, R.R., The Science of Emotion: Research and Tradition in the Psychology of Emotion, Prentice-Hall, Upper Saddle River, NJ, 1996.
[4] Cornelius, R.R., "Theoretical Approaches to Emotion", ISCA Workshop on Speech and Emotion, Vassar College, Poughkeepsie, NY, USA, 2000.
[5] Picard, R.W., Vyzas, E., and Healey, J., "Toward Machine Emotional Intelligence: Analysis of Affective Physiological State", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 10, October 2001.
[6] Markel, J.D., Oshika, B., and Gray, A.H., "Long-Term Feature Averaging for Speaker Recognition", IEEE Trans. Acoust., Speech and Signal Processing, vol. ASSP-25, pp. 330-337, August 1977.
[7] Rudasi, L. and Zahorian, S.A., "Text-independent talker identification with neural networks", in Proc. IEEE ICASSP, May 1991, pp. 389-392.
[8] 陳高斌, "Application of SOM-PNN Hybrid Neural Networks to Speaker Identification", I-Shou University, 2001.
[9] 黃俊豪, "Feature Design for Text-Independent Speaker Identification with a Large Speaker Population", National Sun Yat-sen University, 2000.
[10] Pereira, C., "Dimensions of emotional meaning in speech", ISCA Workshop on Speech and Emotion, Speech, Hearing and Language Research Centre, Macquarie University, Australia, 2000.
[11] Davitz, J.R., "Auditory correlates of vocal expression of emotional feeling", in The Communication of Emotional Meaning, McGraw-Hill, New York, 1964.
[12] Mozziconacci, S.J.L., "Speech variability and emotion: Production and perception", Ph.D. thesis, Eindhoven, The Netherlands, 1998.
[13] Iida, A., Campbell, N., Iga, S., Higuchi, F., and Yasumura, M., "A Speech Synthesis System with Emotion for Assisting Communication", ISCA Workshop on Speech and Emotion, Keio Research Institute at SFC, Keio University, ATR Information Sciences Division, 2000.
[14] Paeschke, A. and Sendlmeier, W.F., "Prosodic Characteristics of Emotional Speech: Measurement of Fundamental Frequency Movements", ISCA Workshop on Speech and Emotion, Technical University Berlin, Germany, 2000.
[15] Amir, N., Ron, S., and Laor, N., "Analysis of an emotional speech corpus in Hebrew based on objective criteria", ISCA Workshop on Speech and Emotion, Holon Academic Institute of Technology, Holon, Israel, 2000.
[16] Roach, P., "Techniques for the Phonetic Description of Emotional Speech", ISCA Workshop on Speech and Emotion, School of Linguistics and Applied Language Studies, University of Reading, U.K., 2000.
[17] Nicholson, J., Takahashi, K., and Nakatsu, R., "Emotion recognition in speech using neural networks", ATR Media Integration & Communications Research Lab, in Proc. 6th International Conference on Neural Information Processing (ICONIP '99), vol. 2, 1999.
[18] Yamada, T., Hashimoto, H., and Tosa, N., "Pattern recognition of emotion with Neural Network", in Proc. 1995 IEEE IECON 21st International Conference, vol. 1, 1995.
[19] Polzin, T.S., "Detecting Verbal and Non-verbal Cues in the Communication of Emotion", Ph.D. thesis, Carnegie Mellon University, 1998.
[20] Fukuda, S. and Kostov, V., "Extracting emotion from voice", IEEE International Conference on Systems, Man, and Cybernetics, 1999.
[21] Nwe, T.L. and Wei, F.S., "Speech Based Emotion Classification", in Proc. IEEE Region 10 International Conference on Electrical and Electronic Technology (TENCON), vol. 1, 2001.
[22] Sato, J. and Morishima, S., "Emotion modeling in speech production using emotion space", Faculty of Engineering, Seikei University, IEEE International Workshop on Robot and Human Communication, 1996.
[23] Rabiner, L. and Juang, B.-H., Fundamentals of Speech Recognition, PTR Prentice-Hall, Englewood Cliffs, NJ, 1993.
[24] The HTK Book (for HTK Version 3.1), 2003.
[25] Reynolds, D.A. and Rose, R.C., "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models", IEEE Transactions on Speech and Audio Processing, vol. 3, no. 1, January 1995.
[26] 鍾偉仁, "A Preliminary Study of Speaker Identification and Verification", National Taiwan University, 2000.
[27] Bhattacharyya, S., Srikanthan, T., and Krishnamurthy, P., "Ideal GMM parameters & posterior log likelihood for speaker verification", in Proc. IEEE, September 2001.
[28] Campbell, J.P., "Speaker Recognition: A Tutorial", Proc. IEEE, vol. 85, pp. 1437-1462, September 1997.
[29] 林宸生, Digital Signals: Image and Speech Processing, Chuan Hwa Technology, Taipei, 1997.
[30] VisualDSP++ 3.0 Getting Started Guide for Blackfin Family DSPs, Analog Devices Corp., April 2002.
[31] VisualDSP User's Guide for Blackfin Family DSPs, Analog Devices Corp., April 2002.
[32] VisualDSP++ 3.0 Getting Started Guide for SHARC Family DSPs, Analog Devices Corp., May 2002.
[33] VisualDSP User's Guide for SHARC Family DSPs, Analog Devices Corp., May 2002.
[34] 顏銘祥, "A DSP-Based Real-Time Text-Independent Speaker Recognition System", Master's thesis, Institute of Electrical Engineering, National Sun Yat-sen University, 2004.
[35] 林炳豪, "A DSP-Based Language-Independent Keyword Spotting and Recognition System", Master's thesis, Institute of Electrical Engineering, National Sun Yat-sen University, 2005.
[36] ADSP-21161N EZ-KIT Lite Evaluation System Manual, Analog Devices Corp., May 2002.
[37] ADSP-21161N EZ-KIT Lite Evaluation System Manual, Analog Devices Corp., May 2002.
[38] Soong, F. et al., "A vector quantization approach to speaker recognition", in Proc. IEEE ICASSP, 1985, pp. 379-382.
[39] Rudasi, L. and Zahorian, S.A., "Text-Independent Talker Identification With Neural Networks", in Proc. IEEE
[40] Gish, H. and Schmidt, M., "Text-independent speaker identification", IEEE Signal Processing Magazine, vol. 11, no. 4, October 1994.
[41] Davis, D.A., Rose, R.C., and Smith, M.J.T., "PC-based TMS320C30 implementation of the Gaussian Mixture Model text-independent speaker recognition system", in Proc. Int. Conf. Signal Processing Appl., Technol., November 1992, pp. 967-973.
[42] Atal, B., "Automatic recognition of speakers from their voices", Proc. IEEE, vol. 64, pp. 460-475, April 1976.
[43] Matsui, T. and Furui, S., "A text-independent speaker recognition method robust against utterance variations", in Proc. IEEE ICASSP, 1991, pp. 388-380.
[44] Higgins, A., Bahler, L., and Porter, J., "Voice identification using nearest neighbor distance measure", in Proc. IEEE ICASSP, April 1993, pp. II-375-II-378.
[45] Atal, B., "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification", J. Acoust. Soc. Amer., vol. 55, pp. 1304-1312, June 1974.
[46] Bogner, R.E., "On talker verification via orthogonal parameters", IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 1-12, February 1981.
[47] Assembler Manual for ADSP-21xx Family DSPs, Analog Devices Corp., May 2002.
[48] Linker & Utilities Manual for ADSP-21xx Family DSPs, Analog Devices Corp., May 2002.
[49] C Compiler & Library Manual for ADSP-21xx Family DSPs, Analog Devices Corp., May 2002.
[50] 陳松琳, "A Speech Recognition System Based on Neural Networks", Master's thesis, Department of Electrical Engineering, National Sun Yat-sen University, 2001.
[51] 謝芳易, "A Continuous Speech Recognition System Combining Hidden Markov Models and the One-Stage Dynamic Programming Algorithm", Master's thesis, Department of Electrical Engineering, National Sun Yat-sen University, 2003.
[52] 古詩峰, "A Large-Population Speaker Identification System Based on Wavelet Transform Features Using Microphone and Telephone Speech", Chang Gung University, 2002.
[53] 陳明熒, PC-Based Speech Recognition in Practice, Flag Publishing, Taipei, 1994.
Fulltext
This electronic fulltext is licensed to users solely for personal, non-commercial academic research: searching, reading, and printing. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it.
Thesis access permission: available on campus one year after submission; permanently withheld off campus.
Available:
Campus: available
Off-campus: not available


Printed copies
Public-access information for printed copies is relatively complete from academic year 102 onward. To check the availability of printed copies from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
