National Sun Yat-sen University, thesis/dissertation, Voice Command for Google Map

Title	Voice Command for Google Map (聲控Google地圖)
Department	Department of Computer Science and Engineering
Year, semester	Spring semester, academic year 100 (2011-2012)	Language	Chinese
Degree	Master	Number of pages	53
Author	Po-feng Wu (吳柏鋒)
Advisor	Chia-Ping Chen (陳嘉平)
Convenor	Chung-Hsien Wu (吳宗憲)
Advisory Committee	Hsin-Min Wang (王新民); Liang-Chih Yu (禹良治)
Date of Exam	2012-04-26	Date of Submission	2012-05-18
Keywords	voice command, decoder, Google Map
Statistics	This thesis has been viewed 5,674 times and downloaded 576 times.

Abstract
In this research, we integrate voice command technology into Google Maps, so that some map operations normally performed with the mouse or keyboard can be performed by voice instead. The main difference between our system and state-of-the-art real-time speech processing systems is that all speech processing is computed on the client side. For the corpus, we recorded the names of 100 popular scenic spots in Taiwan, together with a set of specific map control commands, as training data. In our experiments, we train the acoustic models in different ways, design the dictionary and language models, and evaluate the performance of the system. In actual use, the user moves the map center step by step to the target location through three kinds of voice operations: location, control, and coordinate. We measured the overall search time for several specific locations with different users. A search takes 20.8 seconds on average, most of which is spent in the recording stage.
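
The sequential search described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the command vocabulary, the `MapState` class, the `apply_command` function, the pan step, and the approximate coordinates are all hypothetical, chosen only to show how location, control, and coordinate commands could each update the map center in turn once an utterance has been recognized.

```python
from dataclasses import dataclass, replace

# Hypothetical gazetteer with approximate coordinates, for illustration only.
LOCATIONS = {
    "taipei 101": (25.034, 121.565),
    "sun moon lake": (23.857, 120.916),
}

PAN_STEP_DEG = 0.01  # assumed size of one pan step, in degrees

# Assumed control vocabulary: pan directions as (lat, lng) signs, zoom deltas.
PAN = {"up": (1, 0), "down": (-1, 0), "left": (0, -1), "right": (0, 1)}
ZOOM = {"zoom in": 1, "zoom out": -1}


@dataclass(frozen=True)
class MapState:
    lat: float
    lng: float
    zoom: int = 12


def apply_command(state: MapState, kind: str, arg) -> MapState:
    """Return the map state after one recognized voice command."""
    if kind == "location":
        # Jump the map center to a named place.
        lat, lng = LOCATIONS[arg]
        return replace(state, lat=lat, lng=lng)
    if kind == "control":
        if arg in PAN:
            dlat, dlng = PAN[arg]
            return replace(
                state,
                lat=state.lat + dlat * PAN_STEP_DEG,
                lng=state.lng + dlng * PAN_STEP_DEG,
            )
        if arg in ZOOM:
            return replace(state, zoom=state.zoom + ZOOM[arg])
    if kind == "coordinate":
        # Set the map center directly from a (lat, lng) pair.
        lat, lng = arg
        return replace(state, lat=lat, lng=lng)
    raise ValueError(f"unrecognized command: {kind!r} {arg!r}")


if __name__ == "__main__":
    state = MapState(lat=22.627, lng=120.266)  # arbitrary starting center
    commands = [
        ("location", "sun moon lake"),
        ("control", "zoom in"),
        ("control", "up"),
    ]
    for kind, arg in commands:
        state = apply_command(state, kind, arg)
    print(state)
```

Keeping the state immutable and returning a new `MapState` per command makes each step of the sequence easy to inspect. In a browser client the same dispatch would presumably end in calls to the map API rather than in a dataclass update.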

Table of Contents
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Voice Command Systems
1.4 Google Maps Applications
1.5 The Relationship Between Server and Client in Web-Based Speech Recognition Systems
1.6 Thesis Organization
Chapter 2 Automatic Speech Recognition System Architecture
2.1 Hidden Markov Models and the HTK Toolkit
2.2 Model Unit Set
2.3 Speech Feature Extraction
2.4 Acoustic Models
2.5 Using HTK Tool Commands for Recognition
Chapter 3 A Chinese Voice Command System for Google Maps
3.1 Introduction to Chinese and the Zhuyin Phonetic Symbol System
3.2 Decoder
3.3 System Architecture
3.4 Analysis of Recording Time Information
3.5 Silverlight Out-of-Browser Support
Chapter 4 Corpus and Experiments
4.1 Collection of the Speech Corpus
4.2 Model Training and Grammar Settings
4.3 Experimental Evaluation
4.4 Map Search Scenarios
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
Appendix A

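Fulltext
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined availability
Available: Campus: available; Off-campus: available
etd-0518112-140702.pdf
Printed copies
Public information about printed theses is relatively complete from academic year 102 onward. To inquire about printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available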

Office of Library and Information Services, National Sun Yat-sen University │ Contact: 2452, Thesis Format Review Team │ Development and operations: Knowledge Innovation Division, LIS