Title page for etd-0113114-142816
Title
具經驗轉移功能之加強式學習於步態平衡之應用
Gait Balancing by Q-Learning with a Knowledge Transfer Function
Department
Year, semester
Language
Degree
Number of pages
90
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2014-01-28
Date of Submission
2014-02-13
Keywords
Biped robot, Zero moment point (ZMP), Reinforcement learning
Statistics
The thesis/dissertation has been browsed 5698 times and has been downloaded 90 times.
Abstract (Chinese)
The aim of this thesis is to apply reinforcement learning to the walking-balance control of a biped robot: without using any knowledge of the robot's dynamic model, the robot learns how to keep its forward gait balanced. To bring the biped robot closer to real-world use, making its actions continuous within a controllable range is a question worth exploring, because continuous actions fit real environments better than simple discrete actions do, and they also raise learning efficiency. On top of the reinforcement learning framework, a terrain-coupling algorithm then learns how to transfer each source task to the correct target task, so that the robot can adapt to various terrains and reach a balanced state.
While Q-learning drives the zero moment point (ZMP) of each single-support pose toward a stable, balanced state, we designed a learning architecture that lets the biped robot walk stably and in balance in a real environment: the robot swings its arms and legs simultaneously to change its current pose and shift the ZMP. Table-based Q-learning can only map a finite set of discrete actions and states. We therefore propose a learning architecture that handles reinforcement learning in a continuous action space by embedding a self-organizing state-aggregation mechanism into the algorithm, which also simplifies the complex problem of controlling many motors.
In addition, this thesis proposes Knowledge Transfer Learning, which converts experience across tasks: different kinds of experience are treated as source tasks and applied to a seesaw environment, so that the robot can move forward stably on the seesaw. The walking experience gained on uphill, downhill, and flat terrain serves as the source tasks. The results are presented in a video at http://youtu.be/mVahCHBFWyo
Abstract
The purpose of this thesis is to apply reinforcement learning to the walking-balance control of a biped robot, learning how to walk forward with good balance without involving any knowledge of the robot's dynamic model. To employ the biped robot in a real environment, we need to extend its discrete action space into a continuous action domain. This issue is worth exploring because continuous actions are more applicable to real environments than simple discrete actions and can enhance learning efficiency. We then use reinforcement learning to transfer knowledge, learning how to map various source tasks to the correct target tasks and adapting them to different terrains to maintain balance.
Q-learning trains the zero moment point (ZMP) of each pose into a stable, well-balanced state. To make the biped robot walk stably and with good balance in a real environment, we designed a learning structure in which the balance controller uses the motions of the robot's arms and legs to shift the ZMP. Whereas conventional Q-learning generates a mapping between a finite action set and a discrete state space, the proposed reinforcement learning algorithm handles continuous action domains by means of a self-organizing state-aggregation mechanism.
In addition, this thesis presents a terrain-coupling algorithm that converts different experiences: various source tasks and experiences are applied to a seesaw environment, with the purpose of making the robot move forward stably on a seesaw. The walking experiences on uphill, downhill, and flat ground are used as the source tasks. The research results are presented in a video on YouTube: http://youtu.be/mVahCHBFWyo
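As a sketch of the tabular Q-learning update that the continuous-action extension builds on (the ZMP state bins, the three-way action set, and the learning constants below are illustrative assumptions, not the thesis's actual design):

```python
import random

# Hypothetical discretization: ZMP offset bucketed into coarse state bins.
STATES = range(10)           # e.g. ZMP offset bins
ACTIONS = [-1, 0, 1]         # e.g. shift the arm/leg swing back, hold, forward
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# Q-table: one entry per (state, action) pair, initialized to zero.
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def choose_action(s):
    """Epsilon-greedy selection over the discrete action set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(s, a)])

def update(s, a, r, s_next):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
```

The table-based form makes the limitation the abstract mentions concrete: the agent can only ever select from the fixed `ACTIONS` list, which is what the proposed state-aggregation mechanism relaxes toward a continuous action range.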
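The transfer idea can be illustrated with a minimal sketch, under the assumption that transfer is realized by initializing the target task's Q-table from a weighted blend of source-task tables (the blending rule and the toy tables below are hypothetical, not the thesis's algorithm):

```python
def transfer_init(source_tables, weights):
    """Initialize a target Q-table as a weighted blend of source-task Q-tables.

    source_tables: dicts mapping (state, action) -> value, e.g. tables learned
    on uphill, downhill, and flat terrain.
    weights: assumed relevance of each source task to the target (seesaw) task.
    """
    total = sum(weights)
    target = {}
    for table, w in zip(source_tables, weights):
        for key, value in table.items():
            target[key] = target.get(key, 0.0) + (w / total) * value
    return target

# Example: two toy source tasks, the first weighted twice as heavily.
uphill = {(0, 0): 1.0, (0, 1): 0.0}
flat   = {(0, 0): 0.0, (0, 1): 1.0}
q0 = transfer_init([uphill, flat], [2.0, 1.0])
```

Starting the seesaw task from such a blend, rather than from zeros, is one way a learner can reuse uphill, downhill, and flat-ground experience instead of learning the new terrain from scratch.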
Table of Contents
Abstract (Chinese) i
ABSTRACT ii
LIST OF FIGURES vi
LIST OF TABLES viii

I. INTRODUCTION 1
1.1 Motivation 1
1.2 Objective 2
1.3 Organization of Thesis 3

II. BACKGROUND 4
2.1 Reinforcement Learning 4
2.2 Q-learning Algorithm 6
2.3 Related Works 8

III. PROPOSED METHOD 10
3.1 Policy Update 11
3.2 State Space Construction 12
3.3 Action Space 15
3.3.1 Continuous Action 21
3.3.2 Action Bias Mean 22
3.3.3 Action Bias Variance 24
3.4 Reward 24
3.5 Continuous-Action Q-learning 27
3.6 Knowledge Transfer Learning 30
3.6.1 State Space Construction 31
3.6.2 Action Space 32
3.6.3 Reward 32

IV. SIMULATION 34
4.1 Simulation Environment 34
4.1.2 Biped Robot 37
4.2 Using Q-learning for Gait Balancing 40
4.2.1 With & Without Continuous Action Q-Agent 41
4.2.2 Compare with Other Method 46
4.3 The Adaptability to Different Terrains 48
4.4 Knowledge Transfer Learning 52

V. EXPERIMENT 57
5.1 Experiment Environment 57
5.2 Biped Robot 59
5.3 The Results of Experiment 64
5.3.1 Results of Bioloid Robot Walking on Various Environments 65
5.3.2 Using Knowledge Transfer Learning for Walking on Seesaw 70

VI. CONCLUSION 72
6.1 Summary 72
6.2 Future Work 73
REFERENCES 75
References
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, 1998.
[2] L. Hu and Z. Sun, “Reinforcement Learning Method-Based Stable Gait Synthesis for Biped Robot,” Control, Automation, Robotics and Vision Conference, Vol. 2, pp. 1017–1022, 2004.
[3] N. J. Nilsson, “Introduction to Machine Learning,” Robotics Laboratory, Department of Computer Science, Stanford University, pp. 159–174, 1997.
[4] J. Morimoto and C. G. Atkeson, “Learning Biped Locomotion,” IEEE Robotics & Automation Magazine, pp. 41–51, 2007.
[5] J. Morimoto, G. Cheng, C. G. Atkeson, and G. Zeglin, “A Simple Reinforcement Learning Algorithm for Biped Robot,” IEEE International Conference on Robotics & Automation, pp. 3030–3035, 2004.
[6] A. W. Salatian, K. Y. Yi, and Y. F. Zheng, “Reinforcement Learning for a Biped Robot to Climb Sloping Surfaces,” Journal of Robotic Systems, Vol. 14, No. 4, pp. 283–296.
[7] Napoleon, S. Nakaura, and M. Sampei, “Balance Control Analysis of Humanoid Robot Based on ZMP Feedback Control,” IEEE/RSJ International Conference on Intelligent Robots and Systems, Vol. 3, pp. 2437–2442, 2002.
[8] T. Suzuki, T. Tsuji, and K. Ohnishi, “Trajectory Planning of Biped Robot for Running Motion,” IECON 2005, 31st Annual Conference of the IEEE Industrial Electronics Society, pp. 1815–1820, 2005.
[9] D. Tlalolini, C. Chevallereau, and Y. Aoustin, “Human-Like Walking: Optimal Motion of a Bipedal Robot with Toe-Rotation Motion,” IEEE/ASME Transactions on Mechatronics, pp. 310–320, 2011.
[10] K. C. Choi, H. J. Lee, and M. Cheol, “Fuzzy Posture Control for Biped Walking Robot Based on Force Sensor for ZMP,” SICE-ICASE International Joint Conference, pp. 1185–1189, 2006.
[11] K. Suwanratchatamanee and M. Matsumoto, “Balance Control of Robot and Human-Robot Interaction with Haptic Sensing Foots,” HSI '09, 2nd Conference on Human System Interactions, pp. 68–74, 2009.
[12] T. S. Li, Y. T. Su, S. H. Liu, J. J. Hu, and C. C. Chen, “Dynamic Balance Control for Biped Robot Walking Using Sensor Fusion, Kalman Filter, and Fuzzy Logic,” IEEE Transactions on Industrial Electronics, pp. 4394–4408, 2012.
[13] K. S. Hwang and Y. J. Chen, “An Adaptive State Aggregation Approach to Q-Learning with Real-Valued Action Function,” IEEE International Conference on Systems, Man, and Cybernetics, pp. 164–170, 2010.
[14] J. S. Li, Gait Balancing of Biped Robots by Reinforcement Learning, Master Thesis, Department of Electrical Engineering, National Sun Yat-sen University, 2013.
[15] Webots Reference Manual. [Online]. Available: http://www.cyberbotics.com/reference/. [Accessed 6 June 2013].
[16] Webots User Guide. [Online]. Available: http://www.cyberbotics.com/guide/. [Accessed 7 June 2013].
[17] Dynamixel AX-12A. [Online]. Available: http://www.robotis.com/. [Accessed 10 December 2013].
[18] AS-FS Force Sensor. [Online]. Available: http://www.robotsfx.com/robot/AS_FS.html. [Accessed 12 December 2013].
[19] AGB65-ADC. [Online]. Available: http://www.robotsfx.com/robot/AGB65_ADC.html. [Accessed 12 December 2013].
Fulltext
This electronic full text is licensed to users only for personal, non-profit retrieval, reading, and printing for the purpose of academic research. Please comply with the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid violating the law.
Thesis access permission: user-defined release date
Available:
Campus: available
Off-campus: available


Printed copies
Public-access information for printed theses is relatively complete from academic year 102 onward. To look up the access status of printed theses from academic year 101 or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
