博碩士論文 etd-0008116-143632 詳細資訊
Title page for etd-0008116-143632
論文名稱
Title
朗誦式詩詞歌賦搜尋系統之設計研究
A Design of Chinese Poem Retrieval System by Acoustic Input
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
72
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2015-07-29
繳交日期
Date of Submission
2016-01-19
關鍵字
Keywords
音位結構學、隱藏式馬可夫模型、梅爾頻率倒頻譜係數、語音辨識系統、線性預估倒頻譜係數
Phonotactics, Hidden Markov model, Mel-frequency cepstral coefficients, Linear predicted cepstral coefficients, Speech recognition system
統計
Statistics
本論文已被瀏覽 5721 次,被下載 51 次
The thesis/dissertation has been browsed 5721 times, has been downloaded 51 times.
中文摘要
詩歌源遠流長,在沒有文字作為載體的遠古時期,詩歌透過神話傳說和民間歌謠等語言與音樂的方式來流傳。在歷史的長河中,騷人墨客吸取前人的智慧,融合當時的文化背景與個人情懷,常能完成曠古鉅作。至今,吾人可看到以發揚中華文化為宗旨的比賽節目,考驗參賽者的文學造詣,同時亦讓觀眾能在比賽中,獲得美感的啟發與創意的靈感。因此,本論文希望建立一套結合語音辨識技術的詩詞歌賦搜尋系統,透過片段詩詞的朗誦,找出作者的背景與詩歌的全部內容,期能使世人,能更加深入地賞析與有效地傳揚中華文化。

本系統選擇2,699筆二字詞,作為訓練資料庫,其中包含了所有中文音節的抑揚頓挫。吾人錄製訓練語料,對每一類中文音節,運用線性預估倒頻譜係數及梅爾頻率倒頻譜係數,萃取語音聲紋之雙特徵參數,並透過隱藏式馬可夫模型,計算其聲學統計特性。在辨識系統中,吾人搭配音位結構學的策略,去除在辨識候選結果中不合理的答案,得到最終正確之語詞。

此外,論文中針對過去實驗室2,699筆訓練資料庫,做了進一步的統計分析,提出降低訓練量的方法,在辨識率尚可接受的情況下,降低使用者一半的訓練量,省下時間,讓使用上更有效率。吾人亦對不同人的輸入做了實驗,達到語者獨立的結果,並結合多工的辦法,善用多核心電腦資源,讓辨識程序同時平行進行,降低辨識時間。

本系統蒐集之內容,含括十三經、漢賦、唐詩、宋詞、元曲、近代新詩,及國中小學文章。使用者從中選擇任一句朗誦,便可在網頁獲得作者與全文資訊。在Intel Core 2 Quad 2.5 GHz CPU及Ubuntu 10.04之作業系統環境下,系統透過測試12,390筆資料作驗證,可獲得96.3 % 的正確辨識率,而所需時間約為1.3秒。
Abstract
Poetry has a long and enduring place in Chinese literature. In ancient times, before Chinese characters were invented, it spread among the people through oral legends and folk songs. Over the centuries, poets drew on the wisdom of their predecessors, the culture of their age, and their own sentiments to create many masterpieces. Today, popular televised poetry and lyric contests dedicated to promoting Chinese culture test the literary attainment of the contestants and give viewers aesthetic inspiration. This thesis designs and implements a Chinese poem retrieval system driven by acoustic input: using speech technology, it lets a user recite a single phrase to retrieve the author’s background and the full text of the poem. The hope is that the beauty of Chinese culture can be appreciated more deeply and communicated more effectively.

A database of 2,699 two-syllable Mandarin phrases, covering all possible syllable pronunciations, is selected to train the system. From each recorded syllable, a two-part feature set is extracted, consisting of linear predictive cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC). A hidden Markov model (HMM) is then trained to estimate the acoustic statistics of each syllable class. In the recognition phase, phonotactic strategies rule out unreasonable candidates to obtain the final phrase, and the related information about the poem is displayed.
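The candidate-pruning step can be illustrated with a minimal Python sketch. The abstract does not spell out the phonotactic rules, so this sketch assumes the simplest possible form: a set of admissible syllable sequences (`lexicon`) against which every combination of per-syllable candidates is checked. The function name `pick_phrase`, the pinyin-style syllable labels, and the log-likelihood scores are all hypothetical and are not taken from the thesis.

```python
from itertools import product


def pick_phrase(candidates, lexicon):
    """Return the best-scoring syllable sequence that is an admissible phrase.

    candidates: one list per syllable position, each holding
                (syllable, log_likelihood) pairs, e.g. from HMM scoring.
    lexicon:    set of syllable tuples treated as admissible (an assumption
                standing in for the thesis's phonotactic rules).
    """
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        syllables = tuple(s for s, _ in combo)
        if syllables not in lexicon:
            continue  # inadmissible sequence: discard this candidate
        score = sum(ll for _, ll in combo)
        if score > best_score:
            best, best_score = syllables, score
    return best, best_score


# Hypothetical candidate lists for a two-syllable phrase (scores are made up).
candidates = [
    [("min2", -10.0), ("ming2", -11.0)],
    [("yue4", -9.0), ("ye4", -9.5)],
]
lexicon = {("ming2", "yue4"), ("ming2", "tian1")}
print(pick_phrase(candidates, lexicon))
```

In this toy input the acoustically best first syllable (`min2`) is rejected because no admissible phrase starts with it, so the lower-scoring but admissible pair is returned. This is the general effect the abstract attributes to the phonotactic step.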

Furthermore, a statistical analysis of the 2,699-phrase training database shows that the amount of training data can be halved if a minor increase in error rate is acceptable, which saves the user training time. Experiments with input from different speakers achieve speaker-independent recognition, and a multi-core scheme runs the recognition tasks in parallel to shorten recognition time.

The system’s corpus covers the Thirteen Classics, Han rhapsodies, Tang poems, Song lyrics, Yuan drama, modern verse, and school textbook texts. By reciting any one phrase of a poem, the user obtains the author’s background and the full text of the poem on a web page. On an Intel Core 2 Quad 2.5 GHz CPU running Ubuntu 10.04, the system achieves a 96.3% correct recognition rate on a test database of 12,390 phrases, and recognition takes about 1.3 seconds.
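The retrieval step, from a recognized phrase to the author and full text, can be sketched as a simple index lookup. This is only an illustration under stated assumptions: the record layout (`author`, `title`, `lines`) and the functions `build_index` and `lookup` are hypothetical, and the abstract does not describe how the thesis actually stores or searches its corpus.

```python
def build_index(poems):
    """Map every line of every poem to the poems that contain it."""
    index = {}
    for poem in poems:
        for line in poem["lines"]:
            index.setdefault(line, []).append(poem)
    return index


def lookup(index, recognized_line):
    """Return (author, title, full text) for each poem containing the line."""
    return [
        (p["author"], p["title"], "，".join(p["lines"]))
        for p in index.get(recognized_line, [])
    ]


# A one-poem corpus for illustration (hypothetical record layout).
poems = [
    {
        "author": "李白",
        "title": "靜夜思",
        "lines": ["床前明月光", "疑是地上霜", "舉頭望明月", "低頭思故鄉"],
    },
]
index = build_index(poems)
print(lookup(index, "舉頭望明月"))
```

Keying the index on whole lines keeps a lookup to a single dictionary access. A line shared by several poems simply returns several records.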
目次 Table of Contents
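Thesis Approval Form i
Acknowledgments ii
Abstract (Chinese) iii
Abstract (English) iv
List of Figures vii
List of Tables ix
Chapter 1 Introduction 1
1.1 Motivation and Objectives 1
1.2 Research Methods 2
1.3 Chapter Overview 2
Chapter 2 Background on Chinese Poetry 3
2.1 Remote Antiquity 3
2.2 Pre-Qin Period 3
2.3 Qin Dynasty 5
2.4 Han Dynasty 6
2.5 Wei, Jin, and Northern and Southern Dynasties 8
2.6 Sui Dynasty 8
2.7 Tang Dynasty 8
2.8 Song Dynasty 11
2.9 Yuan Dynasty 12
2.10 Concluding Remarks 13
Chapter 3 Introduction to Phonetics 14
3.1 Speech Organs 14
3.2 Speech Production 17
Chapter 4 Speech Processing Techniques 19
4.1 Speech Preprocessing 19
4.1.1 DC Offset 19
4.1.2 Pre-emphasis 21
4.1.3 Framing 21
4.1.4 Window Functions 22
4.2 Speech Segmentation 23
4.2.1 Energy and Zero-Crossing Rate 23
4.2.2 Linear Prediction Error Energy (Lpccee) 25
4.2.3 Entropy 26
4.2.4 Maximum Likelihood Ratio (MLR) 27
4.2.5 Pitch 29
4.3 Feature Extraction 31
4.3.1 Fourier Transform 31
4.3.2 Cepstrum 32
4.3.3 Linear Predictive Cepstral Coefficients 33
4.3.4 Mel-Frequency Cepstral Coefficients 37
Chapter 5 Hidden Markov Model 42
5.1 The HMM 42
5.2 HMM Training 44
5.3 HMM Recognition 46
Chapter 6 System Architecture and Implementation Results 47
6.1 Hardware 47
6.2 Model Training Strategy 48
6.3 Recognition System Architecture 49
6.4 Phonotactics 51
6.5 Recognition Results 52
6.6 Reducing the Training Data 57
Chapter 7 Future Work 59
References 60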
參考文獻 References
[1] 羅林,中國文學史講義,鼎茂圖書出版,民國98年
[2] 張建業,中國詩歌史,文津出版,民國84年
[3] 王孝,中國文學史,台灣商務印書館出版,民國78年
[4] 門巋與張燕瑾,中國俗文學史,文津出版,民國84年
[5] 臺靜農,中國俗文學史(上),台灣大學出版,民國98年
[6] 臺靜農,中國俗文學史(下),台灣大學出版,民國98年
[7] 蔣勳,藝術概論,東華書局出版,民國84年
[8] Thomas F. Quatieri, “Discrete-Time Speech Signal Processing: Principles and Practice,” Prentice Hall, Taiwan, 2005
[9] 毛壽彭,流體力學,五南圖書出版,民國73年
[10] 王小川,語音訊號處理,全華出版,民國93年
[11] 維基百科,https://en.wikipedia.org/wiki/DC_bias
[12] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon, “Spoken Language Processing: A Guide to Theory, Algorithm and System Development,” Pearson Education, Taiwan, 2005
[13] Bernard Gold and Nelson Morgan, “Speech and Audio Signal Processing: Processing and Perception of Speech and Music,” New York, 2000
[14] L. Lamel, L. Rabiner, A. Rosenberg and J. Wilpon, “An Improved Endpoint Detector for Isolated Word Recognition,” IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 29, pp. 406-412, 1981
[15] M.H. Savoji, “A Robust Algorithm for Accurate Endpointing of Speech,” Speech Communication, Vol. 8, pp. 45-60, 1989
[16] 潘睿慈, “特定語者中文語詞辨識系統之設計研究” ,國立中山大學電機工程研究所碩士論文,2005
[17] Jialin Shen, Jeihweih Hung and Linshan Lee, “Robust Entropy-based Endpoint Detection for Speech Recognition in Noisy Environments,” International Conference on Spoken Language Processing, Sydney, 1998
[18] V.R. Algazi, K.L. Brown, M.J. Ready, D.H. Irvine, C.L. Cadwell and Sang Chung, “Transform Representation of the Spectra of Acoustic Speech Segments with Applications-I: General Approach and Application to Speech Recognition,” IEEE Trans. Speech and Audio Processing, Vol. 1, No. 2, 1993
[19] Alan V. Oppenheim and Ronald W. Schafer, “Discrete-Time Signal Processing,” Prentice Hall, 1993
[20] 蔡旭曜, “哼唱式卡拉OK歌曲搜尋系統之設計研究” ,國立中山大學電機工程研究所碩士論文,2003
[21] 維基百科,https://zh.wikipedia.org/zh-tw/傅里叶变换
[22] J.R. Deller, J.G. Proakis and J.H.L. Hansen, “Discrete Time Processing of Speech Signals,” New York: Macmillan Pub. Co., 1993
[23] John Coleman, “Introducing Speech and Language Processing,” University of Cambridge, 2004
[24] Wai C. Chu, “Speech Coding Algorithms: Foundation and Evolution of Standardized Coders,” John Wiley & Sons, Taiwan, 2003
電子全文 Fulltext
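The electronic full text is licensed to users only for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China, and do not reproduce, distribute, adapt, repost, or broadcast the text without authorization, so as to avoid violating the law.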
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
開放時間 available 已公開 available
