論文使用權限 Thesis access permission: 校內立即公開,校外一年後公開 Available on campus immediately; available off campus after one year
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available
論文名稱 Title |
利用中文字第一個注音符號之手機中文輸入法與語者分割與分群方法之各個擊破 Chinese Input Method Based on First Mandarin Phonetic Alphabet for Mobile Devices and an Approach in Speaker Diarization with Divide-and-Conquer |
||
系所名稱 Department |
|||
畢業學年期 Year, semester |
語文別 Language |
||
學位類別 Degree |
頁數 Number of pages |
79 |
|
研究生 Author |
|||
指導教授 Advisor |
|||
召集委員 Convenor |
|||
口試委員 Advisory Committee |
|||
口試日期 Date of Exam |
2008-07-28 |
繳交日期 Date of Submission |
2008-09-09 |
關鍵字 Keywords |
divide-and-conquer, Chinese input method, speaker diarization |
||
統計 Statistics |
本論文已被瀏覽 5689 次,被下載 1043 次 This thesis has been viewed 5689 times and downloaded 1043 times. |
中文摘要 |
在這篇論文中有兩個研究的主題。 第一、我們實作一個高效率的中文輸入法。 第二、我們將各個擊破(Divide-and-Conquer) 的機制使用在語者分割與分群(Speaker Diarization)的問題裡。 我們所實作的中文輸入法是將輸入的第一個注音符號序列 轉換成中文字串。這意思是說使用者對於每個字只需要輸入 它的第一個注音符號,因此相對於其他目前的輸入法來說是很有效率的。 我們的實作是使用動態規畫(dynamic programming)的機制與 語言模型(language model)。為了降低時間複雜度,語言模型 所使用的字彙只使用單字詞、雙字詞與三字詞。 語者分割與分群的系統是由切割與分群這兩個單元所構成的。 各個擊破的方法實質上是實作在分群的單元中。而我 們在評估效能時所使用的分數是定義在 2003 Rich Transcription Evaluation Plan 中。值得注意的是,與基本的效能來作比較之後, 我們的方法在沒有降低語者分割與分群的正確率之下減少了執行的時間。 |
Abstract |
This thesis addresses two research topics. First, we implement an efficient Chinese input method. Second, we apply a divide-and-conquer scheme to the speaker diarization problem. The input method converts a sequence of first Mandarin phonetic symbols into a Chinese character string (a sentence): the user types only the first phonetic symbol of each character, which makes input efficient relative to existing methods. The implementation is based on dynamic programming and language models. To reduce time complexity, the vocabulary of the language model is restricted to 1-, 2-, and 3-character words. The speaker diarization system consists of a segmentation module and a clustering module, and the divide-and-conquer scheme is implemented in the clustering module. We evaluate the system with the speaker diarization score defined in the 2003 Rich Transcription Evaluation Plan. Compared with the baseline, our method significantly reduces processing time without reducing diarization accuracy. |
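The dynamic-programming decoding described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes a unigram language model (the abstract does not state the model order), and the toy lexicon, its probabilities, and the names `LEXICON`, `build_index`, and `decode` are all hypothetical.

```python
import math

# Hypothetical toy lexicon: word -> (first-symbol key, unigram probability).
# Each key holds the first Mandarin phonetic symbol of every character in the word.
LEXICON = {
    "你": ("ㄋ", 0.05),
    "年": ("ㄋ", 0.03),
    "好": ("ㄏ", 0.04),
    "很": ("ㄏ", 0.06),
    "嗎": ("ㄇ", 0.05),
    "你好": ("ㄋㄏ", 0.02),
    "你好嗎": ("ㄋㄏㄇ", 0.002),
}


def build_index(lexicon):
    """Map each first-symbol key to its most probable word and log-probability."""
    index = {}
    for word, (key, prob) in lexicon.items():
        score = math.log(prob)
        if key not in index or score > index[key][1]:
            index[key] = (word, score)
    return index


def decode(symbols, index, max_len=3):
    """Return the most probable character string for a first-symbol sequence.

    best[i] is the best log-probability of any segmentation of symbols[:i]
    into words of 1 to max_len characters; back[i] records the last word used.
    Returns None when no segmentation exists.
    """
    n = len(symbols)
    best = [-math.inf] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            entry = index.get(symbols[start:end])
            if entry is None or best[start] == -math.inf:
                continue
            word, score = entry
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = (start, word)
    if n > 0 and back[n] is None:
        return None
    # Follow the back-pointers from the end of the input to recover the words.
    words = []
    pos = n
    while pos > 0:
        pos, word = back[pos]
        words.append(word)
    return "".join(reversed(words))


INDEX = build_index(LEXICON)
print(decode("ㄋㄏㄇ", INDEX))  # prints 你好嗎
```

Restricting words to at most three characters bounds the inner loop, so the sketch runs in time linear in the input length. With a higher-order language model, the table would also have to track the preceding word rather than keeping one best score per position.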
目次 Table of Contents |
1 Introduction
  1.1 Motivation
    1.1.1 Chinese Input Method
    1.1.2 Speaker Diarization
  1.2 An Introduction to Chinese Input Method
  1.3 An Introduction to Speaker Diarization
  1.4 Thesis Organization
2 Chinese Input Method
  2.1 Review
    2.1.1 Some Common Input Methods
    2.1.2 Researches about Input Methods
  2.2 System Overview
    2.2.1 Language Model
    2.2.2 Dynamic Programming
    2.2.3 An Example
  2.3 Experiments
    2.3.1 Data
    2.3.2 Results
    2.3.3 Discussion
3 Speaker Diarization
  3.1 Review
    3.1.1 Speaker Diarization System
    3.1.2 Segmentation
    3.1.3 Clustering
  3.2 System Overview
  3.3 Experiments
    3.3.1 Data
    3.3.2 Results
4 Conclusion and Future Work
  4.1 Input Method
  4.2 Speaker Diarization |
電子全文 Fulltext |
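本電子全文僅授權使用者為學術研究之目的,進行個人非營利性質之檢索、閱讀、列印。請遵守中華民國著作權法之相關規定,切勿任意重製、散佈、改作、轉貼、播送,以免觸法。 This electronic full text is licensed to users solely for academic research, for personal, non-profit searching, reading, and printing. Please comply with the Copyright Act of the Republic of China: do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law. |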
紙本論文 Printed copies |
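紙本論文的公開資訊在102學年度以後相對較為完整。如果需要查詢101學年度以前的紙本論文公開資訊,請聯繫圖資處紙本論文服務櫃台。如有不便之處敬請見諒。 Availability information for printed theses is relatively complete from academic year 102 (2013-14) onward. To ask about the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services (圖資處). We apologize for any inconvenience. 開放時間 Available: 已公開 available |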