Title page for etd-1117115-161646
Title
A Design of Multimedia Search System (多媒體搜尋系統之設計研究)
Department
Year, semester
Language
Degree
Number of pages
88
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2016-07-27
Date of Submission
2016-07-11
Keywords
Color model, image recognition system, local binary pattern, Gaussian mixture model, speaker recognition system, Mel-frequency cepstral coefficients
Statistics
This thesis has been viewed 5701 times and downloaded 24 times.
Abstract
Multimedia plays an important role in daily life. By combining text, audio, images, animation, and video, it conveys messages and ideas more clearly and more engagingly than plain text. As multimedia and Internet technologies have advanced, the use of multimedia for product presentation and educational training has become increasingly common. Multimedia teaching lets people interact with technology products, makes learning more active and convenient, and strengthens teaching activities.
In this thesis, a multimedia search system for images and speech is developed. A mobile phone or camera captures images of paintings, and a recording device records speeches. After image or speaker recognition, the system returns complete and correct information about the painting or the speaker.
For painting recognition, the image captured by a mobile phone or camera is first preprocessed: the painting is segmented and its size is normalized. Features are then extracted using the YCbCr color model, projection features, and local binary patterns. The final answer is the training painting whose features lie at the minimum distance from those of the test image. With a mobile phone and a camera as the test capture devices, the system reaches recognition rates of 92.06% and 90.07%, respectively, on a database of 20,160 paintings, running on a laptop with a 2.6 GHz Intel Core i5 CPU under Windows 7.
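The feature-extraction and matching steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`rgb_to_ycbcr`, `lbp_histogram`, `nearest_painting`) are hypothetical, and the YCbCr conversion assumes the common full-range JFIF/BT.601 weights. The LBP is the basic 3x3 operator rather than the rotation-invariant or uniform variants, and the distance is assumed to be Euclidean. Segmentation, size normalization, and projection features are omitted.

```python
from typing import Dict, List, Sequence, Tuple

Gray = List[List[int]]  # grayscale image: rows of 0-255 intensities


def rgb_to_ycbcr(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert one 8-bit RGB pixel to full-range YCbCr (assumed JFIF/BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def lbp_histogram(img: Gray) -> List[float]:
    """Normalized 256-bin histogram of basic 3x3 LBP codes over interior pixels."""
    # Eight neighbors visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    height, width = len(img), len(img[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                # A neighbor at least as bright as the center sets its bit.
                if img[y + dy][x + dx] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    if total == 0:  # image too small to contain an interior pixel
        return [0.0] * 256
    return [count / total for count in hist]


def nearest_painting(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> str:
    """Return the gallery key whose feature vector is closest to the query (Euclidean)."""
    def distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    return min(gallery, key=lambda name: distance(query, gallery[name]))
```

In use, one histogram would be computed per training painting (for example on the Y channel), stored in `gallery`, and each photographed test image would be matched with `nearest_painting`.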
For speaker recognition, Mel-frequency cepstral coefficients serve as the feature parameters, and a Gaussian mixture model is built for each speaker from 40 seconds of training audio. Testing uses 10-second recordings of the same speakers made at different times and locations. The system achieves a correct recognition rate of 83.4% for 2,000 speakers on a personal computer with a 1.8 GHz Intel Core i5-3337U CPU under Ubuntu 14.04.
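The decision rule implied by per-speaker Gaussian mixture models can be sketched as follows: score the test frames against every speaker's model and pick the speaker with the highest average log-likelihood. This is an illustration under stated assumptions, not the thesis code. The names `log_gaussian_diag`, `gmm_log_likelihood`, and `identify_speaker` are hypothetical, diagonal covariances are assumed, and the frames are assumed to be MFCC vectors that have already been extracted. Model training (for example with the EM algorithm) is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, mean vector, variance vector).
# Diagonal covariance is an assumption made for this sketch.
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames: Sequence[Sequence[float]], gmm: Sequence[Component]) -> float:
    """Average per-frame log-likelihood of the frames under one speaker's GMM."""
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        # Log-sum-exp keeps the sum over components numerically stable.
        total += peak + math.log(sum(math.exp(value - peak) for value in logs))
    return total / len(frames)


def identify_speaker(frames: Sequence[Sequence[float]], models: Dict[str, List[Component]]) -> str:
    """Return the speaker whose model gives the highest average log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))
```

Averaging over frames makes scores comparable across test clips of different lengths; it does not change which speaker wins for a given clip.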
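Table of Contents
Thesis Approval Form
Thesis Public Release Authorization
Acknowledgments
Abstract (Chinese)
Abstract (English)
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Objectives and Methods
1.3 Thesis Outline
Chapter 2 Introduction to Multimedia
2.1 Digital Images
2.1.1 Resolution
2.1.2 Color Modes
2.1.3 Image Formats
2.2 Digital Audio
Chapter 3 History of Western and Chinese Art
3.1 Overview of Western Art History
3.1.1 The Renaissance
3.1.2 The Baroque Period
3.1.3 The Rococo Period
3.1.4 The Neoclassical Period
3.1.5 The Romantic Period
3.1.6 The Realist Period
3.2 Overview of Chinese Art History
3.2.1 Prehistory to the Qin and Han Dynasties
3.2.2 Wei, Jin, and the Northern and Southern Dynasties
3.2.3 Sui and Tang Dynasties
3.2.4 Five Dynasties and the Northern and Southern Song
3.2.5 Yuan Dynasty
3.2.6 Ming Dynasty
3.2.7 Qing Dynasty
3.2.8 Republican Era
Chapter 4 Introduction to Image and Speech Processing
4.1 Color Models
4.1.1 HSV Color Space
4.1.2 YCbCr Color Space
4.2 Smoothing Filters
4.2.1 Mean Filter
4.2.2 Median Filter
4.3 Local Binary Patterns
4.3.1 Rotation-Invariant Local Binary Patterns
4.3.2 Uniform Local Binary Patterns
4.4 The Speech Production System
4.5 Mel-Frequency Cepstral Coefficients
Chapter 5 Architecture Design of the Multimedia Recognition System
5.1 Image Recognition System Workflow
5.2 Image Preprocessing
5.2.1 Grayscale Conversion
5.2.2 Brightness Adjustment
5.2.3 Vertical and Horizontal Projection
5.2.4 Size Normalization
5.3 Local Binary Pattern Feature Extraction
5.4 Speaker Search System Workflow
5.5 Speech Preprocessing
5.5.1 Noise Removal
5.5.2 Pre-emphasis
5.5.3 Hamming Window
5.6 Gaussian Mixture Model
5.6.1 Model Overview
5.6.2 Parameter Initialization
5.6.3 Maximum Likelihood Estimation
5.6.4 Expectation-Maximization Algorithm
Chapter 6 Implementation Results and Performance Evaluation of the Recognition Systems
6.1 Hardware, Software, and Development Platform
6.2 Building the Painting Image Models
6.3 Image Feature Matching
6.4 Image Recognition System Performance
6.5 Speaker Search System Database Design and Recognition Performance
Chapter 7 Conclusions and Future Work
References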