Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available
Title: 聲控Google地圖 (Voice Command for Google Map)
Department:
Year, semester:
Language: Chinese
Degree:
Number of pages: 53
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2012-04-26
Date of Submission: 2012-05-18
Keywords: voice command, decoder, Google Maps
Statistics: This thesis has been viewed 5,674 times and downloaded 576 times.
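Chinese Abstract
In this thesis, we integrate voice-command technology into Google Maps: some map operations that normally require a mouse or keyboard can instead be performed by voice. Compared with the latest real-time speech processing technologies, the main difference of our system is that all speech computation is executed on the client. For the corpus, we recorded 100 popular Taiwanese scenic spots and a set of specific map-control commands as training data. In our experiments, we trained the acoustic models with different training methods, designed the dictionary and language models, and evaluated the performance of the system. In actual use, voice-command operations for location, control, and coordinates move the map center step by step to the target scenic spot. Evaluated by different users on several specific locations, the complete search process takes 20.8 seconds on average, most of which is spent in the recording stage.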
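The abstract describes recognition that runs entirely on the client using trained HMM acoustic models, a dictionary, and a language model. This page does not say which search the thesis's decoder (Sections 2.1 and 3.2 in the table of contents) implements, so the sketch below is only a generic illustration of the Viterbi algorithm commonly used to decode HMMs. It assumes a toy discrete model: the state names, the quantized observation symbols "q"/"l", and all probabilities are hypothetical. An HTK-trained recognizer would instead use continuous (typically Gaussian-mixture) emission densities over feature vectors such as MFCCs.

```python
import math
from typing import Dict, List, Sequence, Tuple

NEG_INF = float("-inf")


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    log_start: Dict[str, float],
    log_trans: Dict[str, Dict[str, float]],
    log_emit: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Most likely HMM state sequence for `obs`, computed in the log domain."""
    # delta[s] = best log-score of any partial path that ends in state s.
    delta = {s: log_start.get(s, NEG_INF) + log_emit[s].get(obs[0], NEG_INF)
             for s in states}
    backptrs: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_delta: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Pick the predecessor p maximizing delta[p] + log P(s | p).
            best = max(states, key=lambda p: delta[p] + log_trans[p].get(s, NEG_INF))
            new_delta[s] = (delta[best] + log_trans[best].get(s, NEG_INF)
                            + log_emit[s].get(o, NEG_INF))
            ptr[s] = best
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


def logs(table: Dict[str, float]) -> Dict[str, float]:
    """Convert a probability table to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical two-state model: "sil" = silence, "speech" = speech;
# observations are quantized frames, "q" = quiet and "l" = loud.
STATES = ["sil", "speech"]
START = logs({"sil": 0.9, "speech": 0.1})
TRANS = {"sil": logs({"sil": 0.7, "speech": 0.3}),
         "speech": logs({"sil": 0.2, "speech": 0.8})}
EMIT = {"sil": logs({"q": 0.9, "l": 0.1}),
        "speech": logs({"q": 0.2, "l": 0.8})}

if __name__ == "__main__":
    path, score = viterbi(["q", "l", "l"], STATES, START, TRANS, EMIT)
    # path is ['sil', 'speech', 'speech']; probability is about 0.124416
    print(path, math.exp(score))
```

Working with log-probabilities avoids the numerical underflow that products of many small per-frame probabilities would cause over a long utterance.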
Abstract |
In this research, we integrate voice-command technology into Google Maps, so that some Google Maps search operations can be controlled by voice instead of the mouse or keyboard. The main difference between our system and state-of-the-art real-time speech processing systems is that all speech computation runs on the client side. For our corpus, we chose the top 100 scenic spots in Taiwan and a set of specific control commands as training data. In our experiments, we train the acoustic models in different ways, design the dictionary and language models, and evaluate the performance of the system. In actual use, the map center can be moved step by step to a specific location through voice-command operations for location, control, and coordinates. We measured the overall search time for several specific locations with different users: a search takes 20.8 seconds on average, most of which is spent in the recording stage.
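The abstract says the map center is moved step by step through location, control, and coordinate commands, but this page does not define those command sets. The Python sketch below shows one hypothetical way to turn a sequence of recognized commands into map-view updates. Everything specific here is an assumption rather than the thesis's design: the command names, the single gazetteer entry (approximate coordinates for Taipei 101), the pan-step rule, and the reading of "coordinate" commands as picking a cell of a 3x3 grid over the current view. The table of contents indicates the actual system was built on Silverlight; this sketch only models view state and does not call any Google Maps API.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Tuple, Union

Arg = Union[str, Tuple[int, int]]

# Hypothetical gazetteer for "location" commands: spoken name -> (lat, lng).
# Coordinates are approximate and only for illustration.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Taipei 101": (25.0340, 121.5645),
}

# Hypothetical pan step: 90 degrees at zoom 0, halved at each zoom level.
BASE_STEP_DEG = 90.0
MAX_LAT = 85.0  # stay inside (roughly) the Web Mercator latitude limit


@dataclass(frozen=True)
class MapView:
    lat: float
    lng: float
    zoom: int


def _step(view: MapView) -> float:
    return BASE_STEP_DEG / (2 ** view.zoom)


def _moved(view: MapView, dlat: float, dlng: float) -> MapView:
    lat = max(-MAX_LAT, min(MAX_LAT, view.lat + dlat))
    lng = (view.lng + dlng + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return replace(view, lat=lat, lng=lng)


def apply_command(view: MapView, kind: str, arg: Arg) -> MapView:
    """Return the view after one recognized command; the input view is unchanged."""
    if kind == "location":
        lat, lng = GAZETTEER[arg]  # jump straight to a known spot
        return replace(view, lat=lat, lng=lng)
    if kind == "control":
        d = _step(view)
        pans = {"north": (d, 0.0), "south": (-d, 0.0),
                "east": (0.0, d), "west": (0.0, -d)}
        if arg in pans:
            return _moved(view, *pans[arg])
        if arg == "zoom in":
            return replace(view, zoom=min(view.zoom + 1, 21))
        if arg == "zoom out":
            return replace(view, zoom=max(view.zoom - 1, 0))
    if kind == "coordinate":
        # Hypothetical 3x3 grid over the view: (row, col), (0, 0) = top-left.
        row, col = arg
        d = _step(view)
        return _moved(view, (1 - row) * d, (col - 1) * d)
    raise ValueError(f"unrecognized command: {kind} {arg!r}")


def run(view: MapView, commands: Iterable[Tuple[str, Arg]]) -> MapView:
    """Apply commands in order, as a user would speak them one after another."""
    for kind, arg in commands:
        view = apply_command(view, kind, arg)
    return view


if __name__ == "__main__":
    start = MapView(lat=23.7, lng=121.0, zoom=7)
    print(run(start, [("location", "Taipei 101"),
                      ("control", "zoom in"),
                      ("control", "north")]))
```

Keeping `MapView` immutable makes each spoken command a pure function of the previous view, so recovering from a misrecognized command only requires keeping the earlier view.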
Table of Contents
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Research Background
  1.2 Research Motivation
  1.3 Voice Command Systems
  1.4 Google Maps Applications
  1.5 The Server-Client Relationship in Web-Based Speech Recognition Systems
  1.6 Thesis Organization
Chapter 2 Architecture of an Automatic Speech Recognition System
  2.1 Hidden Markov Models and the HTK Toolkit
  2.2 Model Unit Set
  2.3 Speech Feature Extraction
  2.4 Acoustic Models
  2.5 Recognition Using HTK Tool Commands
Chapter 3 A Chinese Voice Command System for Google Maps
  3.1 Introduction to Chinese and the Zhuyin Phonetic Symbol System
  3.2 Decoder
  3.3 System Architecture
  3.4 Analysis of Recording Time Information
  3.5 Silverlight Out-of-Browser Support
Chapter 4 Corpus and Experiments
  4.1 Speech Corpus Collection
  4.2 Model Training and Grammar Settings
  4.3 Experimental Evaluation
  4.4 Map Search Scenarios
Chapter 5 Conclusion and Future Work
  5.1 Conclusion
  5.2 Future Work
Appendix A
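Full Text
This electronic full text is licensed to users only for personal, non-commercial searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan), and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.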
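Printed Copies
Public-access information for printed theses is more complete from academic year 102 (ROC calendar) onward. For public-access information on printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available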