Thesis details: title page for etd-0901120-173847
Title
時空注意的局部區域提議方法
A Local Region Proposal Method with Attentive Temporal-Spatial Pathways
Department
Year, semester
Language
Degree
Number of pages
74
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2020-10-07
Date of Submission
2020-10-01
Keywords
Hybrid Attention Mechanism, Attention Mechanism, Deep Reinforcement Learning, Object Detection
Statistics
This thesis has been viewed 85 times and downloaded 26 times.
Abstract
Mainstream object detection systems typically combine global detection with exhaustive search to locate the objects in an image and produce the final detections. In practice, however, the objects of interest often occupy only a small part of the image, so much of this computation is wasted. To address this problem, this thesis proposes an improved region proposal method based on a hybrid attention mechanism and deep reinforcement learning. The core module of the model, the Hybrid Attention Model (HAM), is a deep reinforcement learning search system: it decides which part of the image to search and stops automatically once the regions of interest have been searched. First, HAM sequentially selects the RoIs of local regions for detection from the global region proposals (Regions of Interest, RoIs) generated by Region Proposal Networks (RPN), and a β-softmax is used in the model so that training converges faster and to a better result. This model is called HAM-beta. Next, the original masking operation on the feature map is improved: a mask map is instead used for masking during the attention process. This change gives the model different advantages on different datasets. This model is called HAM-beta-mask. Experimental results on multiple datasets show that both models outperform drl-RPN.
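The abstract names three ideas without defining them: a sequential search that stops on its own, a β-softmax, and a mask map applied during attention. The toy sketch below shows one way these pieces could fit together. It is not the thesis's implementation. It assumes that β-softmax means a softmax with an inverse-temperature factor β, that the mask map is a boolean grid of already visited locations, and it replaces the learned stop action with a fixed threshold. The names `beta_softmax`, `sequential_search`, and `stop_threshold` are hypothetical.

```python
import numpy as np


def beta_softmax(scores, beta=1.0, mask=None):
    """Softmax over a 2-D score map with an assumed inverse temperature `beta`.

    `mask` is a boolean map; True marks excluded (already visited)
    locations, which receive zero probability.
    """
    logits = beta * np.asarray(scores, dtype=float)
    if mask is not None:
        if np.all(mask):
            raise ValueError("every location is masked")
        logits = np.where(mask, -np.inf, logits)
    logits = logits - np.max(logits)  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def sequential_search(scores, beta=1.0, stop_threshold=0.1, max_steps=10):
    """Visit locations one at a time, most attended first.

    Hypothetical stop rule: stop when the best remaining location has an
    unmasked attention weight below `stop_threshold`. In the thesis the
    stop decision is learned with reinforcement learning instead.
    """
    scores = np.asarray(scores, dtype=float)
    mask = np.zeros(scores.shape, dtype=bool)  # mask map of visited cells
    base = beta_softmax(scores, beta)          # unmasked attention, for the stop rule
    visited = []
    for _ in range(max_steps):
        if mask.all():
            break
        attn = beta_softmax(scores, beta, mask)
        idx = np.unravel_index(np.argmax(attn), attn.shape)
        if base[idx] < stop_threshold:
            break
        visited.append(tuple(int(i) for i in idx))
        mask[idx] = True  # exclude this cell from later attention steps
    return visited


toy_scores = [[0.0, 2.0], [1.0, 0.0]]
print(sequential_search(toy_scores, beta=1.0))
print(sequential_search(toy_scores, beta=3.0))
```

In this toy setup a larger β concentrates the attention weights on the highest-scoring cell, so the search visits fewer locations before the threshold rule stops it.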
Table of Contents
Thesis approval form
Chinese abstract
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1 Motivation and Literature review
1-2 Organization of thesis
2. Research background
2-1 Convolutional Neural Network
2-1-1 Convolution layer
2-1-2 Pooling Layer
2-1-3 Fully Connected Layer
2-2 Convolutional Neural Network Model
2-2-1 ResNet, Deep Learning Network
2-3 Faster R-CNN
2-3-1 Feature Extractor
2-3-2 Region Proposal Network (RPN)
2-3-3 ROI Alignment
2-3-4 Detector
2-4 drl-RPN
2-5 Convolutional Gated Recurrent Unit (Conv-GRU)
2-6 Reinforcement Learning
2-6-1 Value function and policy
2-6-2 Policy Gradient
2-6-3 REINFORCE Algorithm
3. A Local Region Proposal Method with Attentive Temporal-Spatial Pathways
3-1 HAM - feature extraction network
3-1-1 res101 and FPN@P4
3-2 HAM - hybrid attention
3-2-1 Hybrid Attention
3-2-2 State space
3-2-3 Action space
3-2-4 HAM - operation procedure
3-2-5 Reward design
3-3 Model training
4. Experiments
4-1 Environments
4-2 Experiment datasets
4-2-1 Pascal VOC
4-2-2 Colonoscopy dataset
4-2-3 Laryngoscope dataset
4-3 Experimental details
4-4 Evaluation index
4-4-1 Confusion Matrix
4-4-2 Average Precision
4-4-3 The evaluation method of the dataset
4-5 Experiment result
4-5-1 Experiment of Pascal VOC
4-5-2 Experiment of colonoscopy
4-5-3 Experiment of laryngoscope
5. Conclusions and future work
5-1 Conclusions
5-2 Future work
References
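Full text
This electronic full text is licensed to users solely for personal, non-profit searching, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China. Do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid breaking the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available

Printed copies
Public availability information for printed theses is relatively complete from academic year 102 onward. To look up public availability information for printed theses from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available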
