National Sun Yat-sen University, thesis/dissertation, A Feature Design of Multi-Language Identification System

Title	A Feature Design of Multi-Language Identification System (多國語言辨識系統之特徵設計研究)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 91	Language	Chinese
Degree	Master	Number of pages	75
Author	Jun-Ching Lin (林俊青)
Advisor	Chih-Chien Chen (陳志堅)
Convenor	Tsung Lee (李聰)
Advisory Committee	汪啟茂
Date of Exam	2002-07-24	Date of Submission	2003-07-17
Keywords	Language Identification, Gaussian Mixture Model, Linear Predictive Coding, Cepstrum, Delta Cepstrum

Abstract
This thesis builds a multi-language identification system for 10 languages: Mandarin, Japanese, Korean, Tamil, Vietnamese, English, French, German, Spanish, and Farsi. Starting from a study of the feature parameters of these languages, it experiments with several feature-matching methods. The system extracts language features with three kinds of speech parameters (cepstrum coefficients, delta cepstrum coefficients, and linear predictive coding coefficients) and combines two recognition methods, the Gaussian mixture model and the N-gram model, to classify the language. The experiments give a preliminary demonstration of the system's feasibility.
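The feature-plus-model pipeline named in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the frame length, hop size, number of coefficients, and all function names are assumptions chosen for illustration. A single diagonal Gaussian per language stands in for the full Gaussian mixture model, and the linear predictive coding and N-gram stages are omitted.

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping Hamming-windowed frames."""
    n = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hamming(frame_len)

def real_cepstrum(frames, n_coef=12):
    """Real cepstrum: inverse FFT of the log magnitude spectrum.
    Coefficient 0 (overall energy) is dropped."""
    spec = np.abs(np.fft.rfft(frames, axis=1))
    ceps = np.fft.irfft(np.log(spec + 1e-10), axis=1)
    return ceps[:, 1:n_coef + 1]

def delta(feat, width=2):
    """Delta (time-derivative) features by linear regression over
    +/- `width` neighbouring frames, with edge padding."""
    t = len(feat)
    p = np.pad(feat, ((width, width), (0, 0)), mode="edge")
    num = sum(k * (p[width + k:width + k + t] - p[width - k:width - k + t])
              for k in range(1, width + 1))
    den = 2 * sum(k * k for k in range(1, width + 1))
    return num / den

def fit_gaussian(feat):
    """Simplification: one diagonal Gaussian per language, not a mixture."""
    return feat.mean(axis=0), feat.var(axis=0) + 1e-6

def avg_log_likelihood(feat, model):
    """Average per-frame log-likelihood under a diagonal Gaussian."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (feat - mean) ** 2 / var)
    return ll.sum(axis=1).mean()

def identify(feat, models):
    """Return the language whose model scores the frames highest."""
    return max(models, key=lambda lang: avg_log_likelihood(feat, models[lang]))
```

In use, each utterance would be framed, converted to cepstrum features, and concatenated with its delta features, for example `np.hstack([c, delta(c)])`. One model is then fitted per language on training speech, and `identify` picks the best-scoring language for a test utterance.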

Table of Contents
Chapter 1 Introduction
  1.1 Abstract
  1.2 Overview of the thesis system
Chapter 2 Theory Framework
  2.1 Discriminative information across languages
    2.1.1 Overview
    2.1.2 Phonology overview
    2.1.3 Prosody overview
  2.2 Noise-handling techniques
    2.2.1 Spectral Subtraction
    2.2.2 Cepstrum Mean Subtraction
  2.3 Time-Filter
    2.3.1 Delta filter
    2.3.2 RASTA (relative spectral) filter
    2.3.3 Discrete Cosine Transform (DCT)
    2.3.5 Linear Prediction Coefficient (LPC)
  2.4 Probabilistic framework
    2.4.1 Design of the maximum posterior probability (Post Probability) method
    2.4.2 Frame-Based Approach
    2.4.3 Segment-Based Approach
Chapter 3 System Design
  3.1 System overview
    3.1.1 Outline of the OGI Database
    3.1.2 System evaluation
  3.2 Baseline system architecture
  3.3 Extraction of speech feature parameters
    3.3.1 Overview
    3.3.2 Speech signal processing
    3.3.3 Cepstrum filter
    3.3.4 Cepstrum subtraction
    3.3.5 Delta cepstrum filter
    3.3.6 Linear Prediction Coefficient (LPC)
    3.3.7 Vector Quantization (VQ)
    3.3.8 Parameter estimation for the Gaussian Mixture Model (GMM)
      3.3.8.1 Parameter estimation for a single Gaussian probability density function
      3.3.8.2 Parameter estimation for a Gaussian mixture density function
    3.3.9 N-Gram algorithm
    3.3.10 System integration
Chapter 4 Analysis of Experimental Results
  4.1 Overview
  4.2 Performance of each individual module
  4.3 Performance analysis by corpus range
  4.4 Performance analysis by number of speakers
  4.5 Performance analysis of multi-language sets
  4.6 Summary of experimental results
Chapter 5 Conclusions and Future Work
  5.1 Summary
  5.2 Future work
References

List of Figures and Tables
Figure 1.1: Plan of the multi-language identification design
Figure 2.1: Frequency response
Figure 2.2: Frequency response of the delta filter
Figure 2.3: Experiment with the RASTA filter
Figure 2.4: Concept of the cepstral time matrix
Figure 3.1: Baseline system architecture
Figure 3.2: Commonly used window functions
Figure 3.3: Concept of the cepstrum filter
Figure 3.4: Meaning of vector quantization in two-dimensional space
Figure 3.5: Four clusters in the k-means algorithm
Figure 3.6: Result of n-gram combined with vector quantization
Figure 3.7: Result of matching speech sequences
Figure 4.1: Experimental language identification accuracy by corpus range
Figure 4.2: Experimental language identification accuracy by number of speakers
Figure 4.3: Language grouping based on correlation of identification performance
Table 3.1: Lexical features of the languages
Table 4.1: Summary of the individual modules used in the system
Table 4.2: System performance with various module designs
Table 4.3: Result model and system performance using four Indo-European languages
Table 4.4: Result model and system performance using four non-Indo-European languages
Table 4.5: Result model and system performance using four different languages
Table 4.6: Result model for Indo-European vs. non-Indo-European languages


