Detailed record for thesis/dissertation etd-0619117-151814
Title (Chinese): 以加強式學習實現適應性視覺伺服於機器手臂控制
Title (English): Adaptive Image-Based Visual Servoing of Robot Manipulators by Reinforcement Learning
Department:
Year, semester:
Language:
Degree:
Number of pages: 54
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2017-07-18
Date of Submission: 2017-07-19
Keywords: Visual servoing, Robot arm, Q-learning, Reinforcement learning
Statistics
The thesis/dissertation has been browsed 5632 times and has been downloaded 148 times.
Chinese Abstract
This thesis introduces the Q-learning method from reinforcement learning into an image-based visual servoing system to design an intelligent gain controller, and applies it to robot manipulator control. After an image processing algorithm extracts features from the desired image and the current image, the error distance between the image feature vectors is computed; this error distance is generalized to form the state space of Q-learning. The action space consists of control gain values. According to the state the arm obtains from the image, a greedy algorithm selects an appropriate action to control the arm's motion. To further improve the accuracy with which the arm approaches the desired image, this thesis adds an attenuation value on top of the original action space: when the current feature error is smaller than a given amount, each action is attached to an attenuation value that reduces the control output, so that the arm is more accurate and stable near the target position. The proposed method allows the manipulator, during image-based visual servoing, to avoid both the overshoot caused by an overly large fixed control gain and the overly slow arm motion caused by an overly small one. Since Q-learning can learn without any prior knowledge of the environment, it is well suited to decision-making control problems. In Q-learning, a learning agent obtains rewards by interacting with the environment and adjusts its policy according to the strength of the reward values; after many long interactions that accumulate sufficient experience, the agent eventually learns an optimal policy. The learned control gain values allow the system to reach the target state stably and effectively reduce the time required to reach it. To verify the proposed method, a seven-axis robot manipulator is used in both simulation and real-robot experimental environments. This study is also compared with a fixed-control-gain method to verify its effectiveness.
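As a rough illustration of the state and action design described above, the following Python sketch discretizes the image-feature error norm into Q-learning states and uses an ε-greedy rule to pick a control gain, attenuating it once the error drops below a threshold. The bin edges, gain set, attenuation factor, and threshold are assumptions made for the example, not values taken from the thesis.

```python
import numpy as np

# Bin edges, candidate gains, the attenuation factor, and the near-goal
# threshold below are illustrative assumptions, not values from the thesis.
ERROR_BINS = np.array([5.0, 20.0, 50.0, 100.0, 200.0])   # feature-error norm bin edges (pixels)
GAINS = np.array([0.05, 0.1, 0.2, 0.4, 0.8])              # candidate control gains (actions)
ATTENUATION = 0.5                                          # scale applied to the gain near the goal
NEAR_GOAL_THRESHOLD = 20.0                                 # error norm below which attenuation applies


def state_from_error(feature_error):
    """Discretize the norm of the image-feature error vector into a state index."""
    return int(np.digitize(np.linalg.norm(feature_error), ERROR_BINS))


def select_gain(q_table, state, feature_error_norm, epsilon=0.1, rng=None):
    """Epsilon-greedy choice of a control gain, attenuated once the error is small."""
    rng = np.random.default_rng() if rng is None else rng
    if rng.random() < epsilon:
        action = int(rng.integers(len(GAINS)))     # explore a random gain
    else:
        action = int(np.argmax(q_table[state]))    # exploit the current Q estimates
    gain = GAINS[action]
    if feature_error_norm < NEAR_GOAL_THRESHOLD:
        gain *= ATTENUATION                        # finer, more stable motion near the target
    return action, gain


# Example: a Q-table with one row per discrete error state and one column per candidate gain.
q_table = np.zeros((len(ERROR_BINS) + 1, len(GAINS)))
```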
Abstract
The main objective of this thesis is to design an intelligent gain controller for a robot arm based on reinforcement learning and to apply it to image-based visual servoing. An image processing algorithm extracts the features of the desired image and the current image, and the resulting image feature error is used to generate the state space of Q-learning. The action space consists of control gains, and the ε-greedy method chooses a suitable action for the robot arm to take according to the input state. To make the control system more flexible, this thesis introduces an attenuation value on top of the original action space: when the current feature error is less than a threshold, each action is accompanied by an attenuation value that reduces the amount of control, so that the arm is more accurate and stable in the vicinity of the target position. The learning method addresses two problems of fixed gains in visual servoing: a large fixed control gain leads to overshoot, whereas a small one causes the system to converge slowly. Moreover, Q-learning does not require any prior knowledge of the environment, which makes it suitable for decision-making controllers. In Q-learning, a learning agent obtains rewards by interacting with the environment, adjusts its policy according to the strength of those rewards, and tries to maximize the reward over time. After a number of learning iterations, the controller can output a series of control gains that reach the goal efficiently. The proposed method is implemented on a 7-axis robot arm in both simulation and a real experimental environment, and the results are compared with those of a fixed-control-gain method to verify the efficiency of the proposed method.
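The two update steps mentioned in the abstract can be sketched as follows. This is a minimal illustration assuming the classical IBVS velocity command with the image-Jacobian pseudoinverse and a standard tabular Q-learning update; `image_jacobian`, `alpha`, and `gamma` are placeholder assumptions rather than the thesis's actual implementation.

```python
import numpy as np

# 'image_jacobian' (the interaction matrix relating feature velocities to the
# camera/arm velocity) and the reward computation are placeholders; alpha and
# gamma are assumed learning parameters, not the thesis's settings.

def ibvs_velocity(gain, image_jacobian, feature_error):
    """Classical IBVS command: v = -lambda * pinv(L) * e."""
    return -gain * np.linalg.pinv(image_jacobian) @ feature_error


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
```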
Table of Contents
Thesis Approval Form ... i
Acknowledgements ... iii
Chinese Abstract ... iv
Abstract ... v
Table of Contents ... vi
List of Figures ... viii
List of Tables ... ix
Chapter 1 Introduction ... 1
1-1 Research Motivation and Objectives ... 1
1-2 Literature Review ... 2
1-3 Thesis Organization ... 3
Chapter 2 Related Techniques and Background ... 4
2-1 The ORB (Oriented FAST and Rotated BRIEF) Algorithm ... 4
2-2 Reinforcement Learning ... 6
2-3 Q-Learning ... 7
Chapter 3 System Architecture and Research Method ... 9
3-1 Building the State Space ... 10
3-2 Action Space ... 13
3-3 Robot Manipulator Jacobian Matrix ... 17
3-4 Reward Function ... 18
3-5 Update Function ... 19
3-6 Overall System Algorithm ... 20
Chapter 4 Simulation and Implementation Results ... 22
4-1 Simulation Description ... 22
4-1-1 Simulation Environment ... 23
4-1-2 Seven-Axis Robot Manipulator Specifications ... 24
4-2 Simulation Experiments ... 25
4-3 Real-Robot Experiment Description ... 29
4-3-2 Real-Robot Environment ... 30
4-3-3 Real-Robot Experiments ... 32
Chapter 5 Conclusions and Future Work ... 41
References ... 42
Fulltext
This electronic fulltext is authorized for users to search, read, and print for personal, non-profit academic research purposes only. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan) and do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release schedule
Available:
On campus: available
Off campus: available


Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To inquire about the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
