National Sun Yat-sen University, thesis/dissertation, A Design of Arabic Speech Recognition System

Title	A Design of Arabic Speech Recognition System (阿拉伯文語音辨識系統之設計研究)
Department	Department of Electrical Engineering
Year, semester	Spring semester of Academic Year 99	Language	Chinese
Degree	Master	Number of pages	72
Author	Shih-Chung Lee (李世群)
Advisor	Chih-Chien Chen (陳志堅)
Convenor	Erl-Huei Lu (盧而輝)
Advisory Committee	Tsung Lee (李聰); Sheau-Shong Bor (柏小松); Chii-Maw Uang (汪啟茂)
Date of Exam	2011-07-18	Date of Submission	2011-08-19
Keywords	linear predicted cepstral coefficients, Mel-frequency cepstral coefficients, Arabic speech recognition system, phonotactics, hidden Markov model
Statistics	The thesis/dissertation has been browsed 5671 times and downloaded 549 times.

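Chinese Abstract
The Arab world is one of the most remarkable regions on earth, with more than 2,800 years of history, the teachings of Islam, and an unparalleled cultural heritage. It comprises 24 countries and territories in which Arabic is the official language. According to 2009 statistics from the Summer Institute of Linguistics in the USA, about 221 million people speak Arabic, making it the fourth most spoken language in the world. Since 1973, repeated oil embargoes by the Arab world have caused global energy crises that damaged the world economy and seriously affected the national security of many countries. Unless green energy production becomes more efficient, humanity's heavy dependence on fossil fuels cannot be fully replaced. The aim of this research is to build an Arabic language recognition system that helps us learn Arabic, appreciate the beauty of Arab culture, and broaden our understanding of Islam. This thesis examines design and implementation strategies for an Arabic speech recognition system. Using Arabic pronunciation rules, the system selects the speech features of 302 common monosyllables as the main basis for training and recognition. The training database is recorded by pronouncing each monosyllable consecutively in two tones, tone 1 (yinping, high level) and tone 4 (qusheng, falling), to bring out the distinction between unstressed and stressed syllables in Arabic. Tone 1 keeps the pitch high, while tone 4 falls from high to low. After one monosyllable class has been read in tones 1 and 4, the next class is read; one round through all 302 classes yields two training utterances per monosyllable. This thesis uses five rounds, giving 10 training utterances per monosyllable. Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) are extracted as feature parameters, hidden Markov models (HMM) recognize the monosyllables, and a final phonotactic matching step produces the recognition result. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ CPU running Ubuntu 9.04, the system achieves correct recognition rates of 86.31% on a database of 3,600 common Arabic phrases and 93.90% on a database of 590 Arabic personal names. The average recognition time for both databases is under 1 second, and training the system takes about two hours.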
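The recording scheme described above (302 monosyllable classes, each read in tone 1 and then tone 4, repeated for five rounds) can be checked with a short enumeration. This is only an illustrative sketch: the integer syllable IDs and the name `recording_schedule` are assumptions made for the example and do not come from the thesis.

```python
from collections import Counter

NUM_SYLLABLES = 302  # monosyllable classes, as stated in the abstract
TONES = (1, 4)       # tone 1: high level pitch, tone 4: falling pitch
NUM_ROUNDS = 5       # rounds of recording, as stated in the abstract


def recording_schedule(num_syllables=NUM_SYLLABLES, tones=TONES, num_rounds=NUM_ROUNDS):
    """Yield (round, syllable_id, tone) tuples in recording order.

    Syllable IDs are hypothetical placeholders (0..num_syllables-1).
    Within a round, each syllable is read in every tone before moving on
    to the next syllable.
    """
    for rnd in range(1, num_rounds + 1):
        for syllable_id in range(num_syllables):
            for tone in tones:
                yield rnd, syllable_id, tone


schedule = list(recording_schedule())
utterances_per_syllable = Counter(syllable_id for _, syllable_id, _ in schedule)

print(len(schedule))                          # 3020 utterances in total
print(set(utterances_per_syllable.values()))  # {10}: ten utterances per syllable
```

The totals follow directly from 302 × 2 × 5 = 3,020 utterances, or 10 per monosyllable, which matches the "5 rounds, 10 utterances" training scheme in the abstract.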
Abstract
The Arab world is one of the most spectacular regions on earth, notable for its more than 2,800 years of history, the Islamic religion, and a magnificent culture. It consists of 24 countries and territories where Arabic is spoken. According to 2009 statistics from the Summer Institute of Linguistics, USA, about 221 million people speak Arabic, ranking it fourth among the world's languages. Since 1973, petroleum embargoes imposed by the Arab world have shaken the global economy and seriously harmed national security. Fossil energy will remain irreplaceable until an efficient green energy alternative becomes feasible. Our objective is to build a language system that helps us learn Arabic, appreciate the beauty of Arab culture, and widen our view of religions. This thesis investigates design and implementation strategies for an Arabic speech recognition system. The system uses the speech features of 302 common Arabic monosyllables as the basis for training and recognition. The training database, built by applying Arabic pronunciation rules, contains 10 utterances per monosyllable. These utterances are collected by reading the full set of monosyllables in 5 rounds, with each monosyllable pronounced twice per round in different tones: first with the high pitch of tone 1, then with the falling pitch of tone 4. Mel-frequency cepstral coefficients (MFCC) and linear predictive cepstral coefficients (LPCC) serve as the two syllable feature representations, and hidden Markov models (HMM) serve as the recognition model. On a personal computer with a 2.2 GHz AMD Athlon XP 2800+ CPU running Ubuntu 9.04, and with phonotactic rules applied, the system reaches correct phrase recognition rates of 86.31% on a 3,600-entry Arabic phrase database and 93.90% on a database of 590 Arabic personal names. The average computation time for each system is less than 1 second, and training takes about two hours.
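To illustrate how hidden Markov models can serve as the recognition model, the sketch below scores an observation sequence against several candidate models with the scaled forward algorithm and picks the most likely one. It is a toy example under loud assumptions: the observations are discrete symbols rather than MFCC or LPCC vectors, and the two models, their parameters, and the labels `syl_a` and `syl_b` are invented for the example. None of this is taken from the thesis.

```python
import math


def forward_log_likelihood(obs, start_p, trans_p, emit_p):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    n_states = len(start_p)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans_p[i][j] for i in range(n_states)) * emit_p[j][obs[t]]
                for j in range(n_states)
            ]
        # Rescale at every step to avoid underflow; the log-likelihood is
        # the sum of the logs of the scaling factors.
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        alpha = [a / scale for a in alpha]
        log_lik += math.log(scale)
    return log_lik


def recognize(obs, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda label: forward_log_likelihood(obs, *models[label]))


# Two hypothetical left-to-right models over the symbol alphabet {0, 1}.
TRANS = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "syl_a": ([1.0, 0.0], TRANS, [[0.9, 0.1], [0.2, 0.8]]),  # mostly 0s, then 1s
    "syl_b": ([1.0, 0.0], TRANS, [[0.1, 0.9], [0.8, 0.2]]),  # mostly 1s, then 0s
}

print(recognize([0, 0, 1, 1], MODELS))  # syl_a
print(recognize([1, 1, 0, 0], MODELS))  # syl_b
```

In a system like the one described, there would be one trained model per monosyllable, and the emission probabilities would be defined over continuous feature vectors; the maximum-likelihood selection step would stay the same.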

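Table of Contents
Thesis Approval Certificate  i
Acknowledgments  ii
Chinese Abstract  iii
English Abstract  iv
Table of Contents  v
List of Figures  viii
List of Tables  ix
Chapter 1 Introduction  1
  1-1 Research Motivation  1
  1-2 Research Objectives  2
  1-3 Thesis Outline  3
Chapter 2 Introduction to Arabic Phonetics  4
  2-1 Overview of the Language Family  4
    2-1-1 Origins of the Arabic Language  5
    2-1-2 Geographic Distribution of Arabic  6
  2-2 The Arabic Alphabet  9
  2-3 Arabic Pronunciation  12
    2-3-1 Arabic Consonants  12
    2-3-2 Arabic Vowels  14
    2-3-3 Arabic Syllables and Stress Determination  17
Chapter 3 Speech Recognition Pipeline and Mathematical Principles  18
  3-1 Preprocessing  19
    3-1-1 Framing  19
    3-1-2 Frame Energy  19
    3-1-3 Zero-Crossing Rate  20
    3-1-4 Linear Prediction Coefficient Error Energy  20
    3-1-5 Pre-emphasis  21
    3-1-6 Windowing  22
  3-2 Feature Extraction  23
    3-2-1 Linear Predictive Cepstral Coefficients  24
    3-2-2 Mel-Frequency Cepstrum  29
  3-3 Hidden Markov Model  34
    3-3-1 Optimal Expectation (Evaluation) Problem  36
    3-3-2 Optimal State Sequence Problem  40
    3-3-3 Model Parameter Estimation Problem  41
  3-4 Phonotactic Cross-Matching  44
Chapter 4 Training Strategies for the Speech Recognition System  45
  4-1 Monosyllable Model Classification Strategy  45
  4-2 Simulated Vocabulary Construction  46
  4-3 Monosyllable Model Training Methods  47
    4-3-1 Number of Training Utterances versus Recognition Rate  47
    4-3-2 Recording Time versus Recognition Rate  48
    4-3-3 Recording Different Numbers of Monosyllables per Session  51
    4-3-4 Recording Each Monosyllable in Two Tones per Round  53
Chapter 5 Implementation and Recognition Performance of the Arabic Speech Recognition System  55
  5-1 Common Arabic Phrase Recognition System  55
  5-2 Arabic Personal Name Recognition System  57
  5-3 Hardware Environment and Software Specifications  59
Chapter 6 Conclusions and Future Work  60
References  61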

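Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release time
Available: Campus: available; Off-campus: available
etd-0819111-212604.pdf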
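Printed copies
Public access information for printed theses is relatively complete from Academic Year 102 onward. To look up public access information for printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available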

Office of Library and Information Services, National Sun Yat-sen University │ Contact Us: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS