Title page for etd-0708117-234728
Title
以基於區域的卷積神經網路實現空拍影片之魟魚偵測和辨識
Stingray Detection and Recognition of Aerial Videos with Region-based Convolution Neural Network
Department
Year, semester
Language
Degree
Number of pages
103
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2017-09-22
Date of Submission
2017-10-06
Keywords
Object Detection, Convolution Neural Network, Deep Learning, Machine Vision, Aerial Imaging
Statistics
This thesis/dissertation has been viewed 5674 times and downloaded 366 times.
Chinese Abstract (translated)
In recent years, image processing has advanced remarkably with the rise of deep learning, and many problems that were once intractable have become solvable. In ecological research, ecologists often use cameras to collect image or video data, which are then analyzed after manual interpretation and counting. Some of this interpretation is extremely time-consuming for humans, and as the amount of collected data grows, research progress is often slowed. In our case, ecological researchers fly unmanned aerial vehicles along the coast of the Dongsha Islands to capture remote-sensing aerial footage for a statistical survey of stingrays. Traditionally, after data collection, a large amount of human time must be spent watching the videos to locate the stingrays by eye and tally their sizes and numbers. The researchers hope to replace this manual work with automated computer interpretation, but because the shape and color of stingrays closely resemble the background, generic object detection techniques cannot capture them effectively. This thesis develops a deep-learning-based method for automatic stingray detection: a region-based convolutional neural network serves as the basic single-image detection model, and post-processing that exploits the predictability of object trajectories and the temporal consistency of object positions is added, yielding a method suited to detecting moving objects in video. The goal of this study is to develop software that automatically detects stingrays in aerial videos, saving ecological researchers the time currently spent organizing their data.
Abstract
In recent years, image processing has made major breakthroughs thanks to the emergence of deep learning, and many problems that were difficult to handle in the past can now be solved with deep learning methods. Image processing serves as an assistive tool in many research fields. In ecological research, researchers often use photography equipment to collect image or video data and then process them for further analysis. Processing some types of data manually is tedious and time-consuming, and as the data grow, research progress can slow down. In this thesis, the situation we face is that biology researchers use an unmanned aerial vehicle (UAV) to record aerial video along the coast of the Dongsha Islands, and they need to locate the stingrays in those videos and count them. Since traditional detection methods have difficulty detecting stingrays, we develop a deep-learning-based method for automatic stingray detection. It uses a region-based convolutional neural network (CNN) as the basic model for frame-wise detection. To increase its capability, temporal information, such as the predictability of moving trajectories and the consistency of an object's location over time, is further integrated into the model. The goal of this study is to develop a system that can automatically handle detection tasks for aerial videos. We hope this achievement helps ecological researchers save time in processing video data.
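The pipeline the abstract describes, frame-wise detections from a still-image detector refined by temporal consistency across adjacent frames, can be sketched roughly as follows. This is an illustrative toy, not the thesis's implementation: the IoU threshold, the score bonus, and the single-previous-frame lookback are all assumptions made for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def rescore_with_temporal_consistency(frames, iou_thr=0.5, bonus=0.2):
    """frames: one list of (box, score) detections per video frame.

    A detection that overlaps a detection in the previous frame is more
    likely to be a real moving object than a one-frame false alarm, so
    its confidence is boosted (capped at 1.0); isolated detections keep
    their original score and can be dropped by a later threshold.
    """
    out = [list(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        rescored = []
        for box, score in cur:
            supported = any(iou(box, pb) >= iou_thr for pb, _ in prev)
            rescored.append((box, min(1.0, score + bonus) if supported else score))
        out.append(rescored)
    return out
```

A fuller version in the spirit of the thesis would also predict each object's next position from its trajectory before matching, rather than matching boxes in place.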
Table of Contents
Thesis Certification i
Abstract (Chinese) ii
ABSTRACT iii
Table of Contents iv
List of Figures vi
List of Tables viii
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation 2
1.3 Thesis Organization 3
Chapter 2 Related Work and Literature Review 4
2.1 Rule-based Methods 4
2.2 Machine Learning Methods 5
2.3 Deep Learning Methods 7
2.4 Review of Deep Learning Literature 8
Chapter 3 Object Detection with Region-based Convolutional Networks 12
3.1 Artificial Neural Networks 12
3.1.1 Introduction to Artificial Neural Networks 12
3.1.2 Operation of Artificial Neural Networks 14
3.1.3 Error Backpropagation Algorithm 15
3.1.4 Learning Rate and Momentum 20
3.1.5 Mini-batch Gradient Descent 21
3.2 Region-based Convolutional Neural Network: Faster R-CNN 22
3.2.1 Network Layers 23
3.2.2 Multi-layer Convolutional Network 27
3.2.3 Region Proposal Network 31
3.2.4 Region Recognition Network 35
Chapter 4 Stingray Detection in Video Sequences 38
4.1 Video Detection with Faster R-CNN 38
4.2 Video Detection with Faster R-CNN and Temporal Information 40
4.2.1 Moving Trajectory Information 40
4.2.2 Detection Consistency between Adjacent Frames 44
4.3 Object Detection Evaluation Methods 45
Chapter 5 Experimental Data and Results 48
5.1 Stingray Image Data 48
5.1.1 Experimental Data 48
5.1.2 Data Augmentation 51
5.2 Experimental Setup 52
5.2.1 Hardware and Operating Platform 52
5.2.2 Parameter Settings 52
5.3 Baseline Methods 53
5.3.1 Selective Search 53
5.3.2 Histogram of Oriented Gradients 54
5.3.3 Support Vector Machine 54
5.4 Network Model Training Results 57
5.5 Experimental Results and Analysis 59
Chapter 6 Conclusion and Future Work 92
References 93
References
[1] J. Canny, “A Computational Approach to Edge Detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[2] R. O. Duda and P. E. Hart, “Use of the Hough Transformation to Detect Lines and Curves in Pictures,” Commun. ACM, vol. 15, no. 1, pp. 11–15, Jan. 1972.
[3] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), 2005, vol. 1, pp. 886–893 vol. 1.
[4] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, vol. 2, pp. 1150–1157 vol.2.
[5] “Robust real-time face recognition,” in 2013 Africon, 2013, pp. 1–5.
[6] C. H. Lampert, M. B. Blaschko, and T. Hofmann, “Beyond sliding windows: Object localization by efficient subwindow search,” in 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[7] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, “Selective Search for Object Recognition,” International Journal of Computer Vision, vol. 104, 2013.
[8] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient Graph-Based Image Segmentation,” Int. J. Comput. Vision, vol. 59, no. 2, pp. 167–181, Sep. 2004.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105.
[10] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.
[11] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[12] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” arXiv:1311.2901 [cs], Nov. 2013.
[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-Based Convolutional Networks for Accurate Object Detection and Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 142–158, Jan. 2016.
[14] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv:1409.1556 [cs], Sep. 2014.
[15] R. Girshick, “Fast R-CNN,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.
[16] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PP, no. 99, pp. 1–1, 2016.
[17] W. Han et al., “Seq-NMS for Video Object Detection,” arXiv:1602.08465 [cs], Feb. 2016.
[18] K. Kang, W. Ouyang, H. Li, and X. Wang, “Object Detection from Video Tubelets with Convolutional Neural Networks,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 817–825.
[19] L. Wang, W. Ouyang, X. Wang, and H. Lu, “Visual Tracking with Fully Convolutional Networks,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 3119–3127.
[20] S. Haykin, Neural Networks: A Comprehensive Foundation (3rd Edition). Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 2007.
[21] http://book.paddlepaddle.org/02.recognize_digits/
[22] M. Everingham, L. V. Gool, C. K. I. Williams, J. Winn, and A. Zisserman, The PASCAL Visual Object Classes (VOC) challenge. 2009.
[23] Y. Jia et al., “Caffe: Convolutional Architecture for Fast Feature Embedding,” in Proceedings of the 22nd ACM International Conference on Multimedia, New York, NY, USA, 2014, pp. 675–678.
[24] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, “LIBLINEAR: A Library for Large Linear Classification,” J. Mach. Learn. Res., vol. 9, pp. 1871–1874, Jun. 2008.
Fulltext
This electronic fulltext is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release time
Available:
Campus: available
Off-campus: available


Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up public-access information for printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
