National Sun Yat-sen University (國立中山大學), thesis/dissertation, A Design of Chinese Poem Retrieval System by Acoustic Input

Title	A Design of Chinese Poem Retrieval System by Acoustic Input (朗誦式詩詞歌賦搜尋系統之設計研究)
Department	Department of Electrical Engineering
Year, semester	Fall semester of Academic Year 104	Language	Chinese
Degree	Master	Number of pages	72
Author	Yu-Jyun Jhao (趙俞竣)
Advisor	Chih-Chien Chen (陳志堅)
Convenor	Sheau-Shong Bor (柏小松)
Advisory Committee	Tsung Lee (李聰); Chii-Maw Wang (汪啟茂)
Date of Exam	2015-07-29	Date of Submission	2016-01-19
Keywords	Phonotactics, Hidden Markov model, Mel-frequency cepstral coefficients, Linear predicted cepstral coefficients, Speech recognition system
Statistics	The thesis/dissertation has been browsed 5721 times and downloaded 51 times.

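Chinese abstract (English translation)
Poetry has a long history. In remote antiquity, before writing existed as a medium, poetry was passed down through spoken language and music, in the form of myths, legends, and folk songs. Over the long course of history, poets and writers drew on the wisdom of their predecessors, blended in the culture of their time and their personal feelings, and often produced enduring masterpieces. Today there are competition programs dedicated to promoting Chinese culture that test contestants’ literary attainments while giving viewers aesthetic inspiration and creative ideas. This thesis therefore aims to build a poem retrieval system that incorporates speech recognition: by reciting a fragment of a poem, the user can find the author’s background and the full text of the poem, so that Chinese culture can be appreciated more deeply and spread more effectively. The system uses 2,699 two-character words as its training database, which covers the intonation of all Mandarin syllables. Training utterances are recorded and, for each class of Mandarin syllable, two sets of feature parameters are extracted using linear predictive cepstral coefficients and Mel-frequency cepstral coefficients; hidden Markov models are then used to compute their acoustic statistical properties. In the recognition system, a phonotactic strategy removes unreasonable answers from the candidate results to obtain the final word. In addition, the thesis carries out a further statistical analysis of the laboratory’s existing 2,699-entry training database and proposes a method for reducing the amount of training: while keeping the recognition rate acceptable, the user’s training load is halved, which saves time and makes the system more efficient to use. Experiments with input from different speakers were also conducted, achieving speaker-independent results, and multitasking is used to exploit multi-core computer resources so that recognition processes run in parallel, reducing recognition time. The content collected by the system includes the Thirteen Classics, Han rhapsodies, Tang poems, Song lyrics, Yuan songs, modern verse, and texts from elementary and junior high school curricula. The user chooses any line to recite and obtains the author and full-text information on a web page. On an Intel Core 2 Quad 2.5 GHz CPU running Ubuntu 10.04, the system was validated on 12,390 test items and achieved a 96.3% correct recognition rate, with a required time of about 1.3 seconds.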
Abstract
Poetry has played an enduring role in Chinese literature. In ancient times, before Chinese written characters were invented, it spread widely among the people through oral legends and songs. Many masterpieces were created that drew on the wisdom of earlier generations, reflected the culture of their age, and expressed the sentiments of the poets. Recently, TV poem and lyric contests dedicated to the promotion of Chinese culture have become so popular that competitors push themselves to their limits, and viewers are inspired by the aesthetic standards on display. In this thesis, a Chinese poem retrieval system that takes one spoken phrase as input is designed and implemented using speech technologies, to help users find the author’s background and the whole content of a poem. It is hoped that the beauty of Chinese culture can be appreciated more deeply and communicated more effectively. A database of 2,699 two-syllable Mandarin phrases, covering all possible pronunciations, is selected to train the system. Bi-parametric features are first extracted from each recorded syllable, based on linear predictive cepstral coefficients and Mel-frequency cepstral coefficients. The hidden Markov model is then applied to estimate the associated probabilistic properties. Finally, in the recognition phase, phonotactic strategies are adopted to rule out unreasonable answers, obtain the correct phrase, and show the related information about the poem. Furthermore, a statistical analysis is applied to the training database of 2,699 phrases. It suggests that only half of them are needed for training if a minor increase in error rate is accepted. Multi-speaker and multi-core computation schemes have also been implemented to increase the versatility and efficiency of the system. The system collects texts from the Thirteen Classics, Han Rhapsody, Tang Poem, Song Lyric, Yuan Drama, Modern Verse, and school textbooks. By reciting any one phrase of a poem, the user can retrieve the author’s background and the whole content of the poem. On an Intel Core 2 Quad 2.5 GHz CPU running Ubuntu 10.04, a 96.3% correct recognition rate is achieved on a test database of 12,390 phrases. The system recognition time per poem is about 1.3 seconds.
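
The phonotactic step mentioned in the abstract, ruling out unreasonable answers among the recognition candidates, might look roughly like the sketch below. The record does not describe the actual procedure, so this is an illustration under stated assumptions only: each syllable position is assumed to come with a ranked list of candidates and log-likelihood scores (for example from per-syllable HMMs), and a candidate pair is treated as "reasonable" only if it appears in a lexicon of known two-syllable phrases. The function name `pick_phrase`, the data layout, and the toy data are all hypothetical.

```python
from itertools import product


def pick_phrase(candidates_1, candidates_2, lexicon):
    """Return the best-scoring two-syllable phrase that is in the lexicon.

    candidates_1, candidates_2: lists of (syllable, log_likelihood) pairs,
        one list per syllable position (assumed recognizer output).
    lexicon: set of (syllable, syllable) tuples treated as valid phrases.
    Returns ((syllable_1, syllable_2), score), or None if nothing is valid.
    """
    best = None
    for (s1, ll1), (s2, ll2) in product(candidates_1, candidates_2):
        if (s1, s2) not in lexicon:
            continue  # ruled out: not a permitted combination
        score = ll1 + ll2  # log-likelihoods add across positions
        if best is None or score > best[1]:
            best = ((s1, s2), score)
    return best


# Toy data invented for illustration; not taken from the thesis.
lexicon = {("chun1", "tian1"), ("chuang2", "qian2"), ("ming2", "yue4")}
first = [("chuan2", -10.0), ("chuang2", -11.5), ("chun1", -14.0)]
second = [("qian2", -9.0), ("tian1", -9.5)]
result = pick_phrase(first, second, lexicon)
```

In this toy case the acoustically best first syllable, `chuan2`, forms no valid phrase with either second candidate, so the filter falls back to the best valid pair. This is the kind of correction the abstract attributes to the phonotactic strategy.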

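Table of Contents
Thesis Approval Certificate
Acknowledgments
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Research Motivation and Objectives
  1.2 Research Methods
  1.3 Chapter Overview
Chapter 2 Background on Chinese Poetry
  2.1 Remote Antiquity
  2.2 Pre-Qin Period
  2.3 Qin Dynasty
  2.4 Han Dynasty
  2.5 Wei, Jin, and Northern and Southern Dynasties
  2.6 Sui Dynasty
  2.7 Tang Dynasty
  2.8 Song Dynasty
  2.9 Yuan Dynasty
  2.10 Conclusion
Chapter 3 Introduction to Phonetics
  3.1 Speech Organs
  3.2 Speech Production
Chapter 4 Speech Processing Techniques
  4.1 Speech Preprocessing
    4.1.1 DC Offset
    4.1.2 Pre-emphasis
    4.1.3 Framing
    4.1.4 Window Functions
  4.2 Speech Segmentation
    4.2.1 Energy and Zero-Crossing Rate
    4.2.2 Linear Prediction Error Energy (Lpccee)
    4.2.3 Entropy
    4.2.4 Maximum Likelihood Ratio (MLR)
    4.2.5 Pitch
  4.3 Feature Extraction
    4.3.1 Fourier Transform
    4.3.2 Cepstrum
    4.3.3 Linear Predictive Cepstral Coefficients
    4.3.4 Mel-Frequency Cepstral Coefficients
Chapter 5 Hidden Markov Models
  5.1 The HMM
  5.2 HMM Training
  5.3 HMM Recognition
Chapter 6 System Architecture and Implementation Results
  6.1 Hardware
  6.2 Model Training Strategy
  6.3 Recognition System Architecture
  6.4 Phonotactics
  6.5 Recognition Results
  6.6 Reducing the Amount of Training Data
Chapter 7 Future Work
References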


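Fulltext
This electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization. Thesis access permission: user-defined release time. Available: Campus: available; Off-campus: available. etd-0008116-143632.pdf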
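Printed copies
Public availability information for printed theses is relatively complete from Academic Year 102 onward. To inquire about printed theses from Academic Year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. Availability: available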
