博碩士論文 etd-0518112-140702 詳細資訊
Title page for etd-0518112-140702
論文名稱
Title
聲控Google地圖
Voice Command for Google Map
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
53
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2012-04-26
繳交日期
Date of Submission
2012-05-18
關鍵字
Keywords
聲控、解碼器、Google地圖
voice command, decoder, Google Map
統計
Statistics
本論文已被瀏覽 5674 次,被下載 576
The thesis/dissertation has been browsed 5674 times and downloaded 576 times.
中文摘要
本論文中,我們整合聲控技術於Google地圖。也就是說,我們可以將原本利用滑鼠或鍵盤的部分地圖操作,改由聲控來進行。相較於最新的即時語音處理技術,我們系統最大的不同,在於所有的語音運算處理都是在客戶端上作執行。在語料庫部分,我們錄製了100個熱門的台灣景點與一些特定的地圖控制指令來作為訓練語料。在我們的實驗中,使用了不同的訓練方式來訓練聲學模型、設計字典和語言模型並估算我們系統的效能。在系統實際使用情況,透過位置、控制和座標不同部分的聲控操作,便可循序地移動地圖中心到達指定的搜尋景點。不同使用者針對數個特定位置進行估算,整體搜尋過程平均花費20.8秒,其中大部分時間都是花費在錄音階段。
Abstract
In this research, we integrate voice command techniques into Google Maps, so that
some of the map operations normally performed with the mouse or keyboard can instead
be carried out by voice. The biggest difference between our system and state-of-the-art
real-time speech processing systems is that all speech computation is performed on the
client side. For our corpus, we recorded the names of 100 popular scenic spots in Taiwan,
together with a set of specific map control commands, as training data. In our
experiments, we trained the acoustic models in several different ways, designed the
dictionary and language models, and evaluated the performance of the system. In actual
use, voice command operations for location, control, and coordinates move the map center
step by step to the location being searched for. We measured the overall search time for
several specific locations with different users: a search takes 20.8 seconds on average,
most of which is spent in the recording stage.
目次 Table of Contents
Acknowledgments
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Voice Command Systems
1.4 Google Maps Applications
1.5 The Server-Client Relationship in Web-Based Speech Recognition Systems
1.6 Thesis Organization
Chapter 2 Automatic Speech Recognition System Architecture
2.1 Hidden Markov Models and the HTK Toolkit
2.2 Model Unit Set
2.3 Speech Feature Extraction
2.4 Acoustic Models
2.5 Using HTK Tool Commands for Recognition
Chapter 3 A Mandarin Voice Command System for Google Maps
3.1 Introduction to Chinese and the Zhuyin Phonetic Symbol System
3.2 Decoder
3.3 System Architecture
3.4 Analysis of Recording Time Information
3.5 Silverlight Out-of-Browser Support
Chapter 4 Corpus and Experiments
4.1 Speech Corpus Collection
4.2 Model Training and Grammar Configuration
4.3 Experimental Evaluation
4.4 Map Search Scenarios
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
Appendix A
電子全文 Fulltext
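This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid violating the law.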
論文使用權限 Thesis access permission:自定論文開放時間 user define
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
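Public availability information for printed theses is relatively complete from academic year 102 onward. To look up availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.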
開放時間 available 已公開 available
