Thesis/Dissertation etd-0726113-161151: Detailed Record
Title page for etd-0726113-161151
Title
Gait Balancing of Biped Robots by Reinforcement Learning
Department
Year, semester
Language
Degree
Number of pages
77
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2013-07-19
Date of Submission
2013-08-26
Keywords
Reinforcement learning, Biped robot, Zero moment point (ZMP)
Statistics
The thesis/dissertation has been browsed 5724 times and has been downloaded 721 times.
Chinese Abstract
In research on robot walking, building a biped robot model with 18 dimensions and keeping the robot balanced throughout the walking process requires a large amount of mathematical computation and derivation. The goal of this thesis is to use reinforcement learning to achieve balanced walking control of a biped robot.
Balanced biped walking must take into account the position of the robot's zero moment point (ZMP). If the ZMP can be controlled appropriately, the robot can walk stably on flat ground and even on inclined surfaces. During walking, the robot is most likely to fall while it is supported by a single leg, so this research focuses on the balance problem of single-leg support. Balance is controlled by using motions of the robot's arms and legs to shift the ZMP, so that the ZMP can be kept in a stable state. Moreover, using combined arm-and-leg motions as the actions that change the ZMP also simplifies the otherwise complex problem of controlling the robot's many motors. In this thesis, the agent's balance learning incorporates human walking experience as the evaluation basis for learning, which not only improves learning efficiency but also makes the robot's walking gait more similar to human behavior.
In addition, the proposed method integrates the balance algorithm with the balance control scheme and applies them to biped walking on flat ground and on a seesaw, enabling the robot to walk steadily. Finally, simulations and a physical implementation demonstrate the feasibility and efficiency of this balance learning method. The results are presented in a video on YouTube: http://youtu.be/05a0hamjt9Q
Abstract
In humanoid biped robot research, building an 18-dimensional robot model and using it to keep the robot's behavior balanced requires a large amount of mathematical derivation and computation. This thesis presents a study of biped walking control using reinforcement learning.
To keep its balance while walking, a biped robot must take the position of its zero moment point (ZMP) into account. If the ZMP can be kept in an ideal state, the robot can walk steadily on flat ground and even on a slope. During walking, the robot is most likely to fall while standing on one leg, so this research focuses mainly on how the robot keeps its balance on one leg. The balance control scheme uses motions of the robot's arms and legs to shift the ZMP and maintain it in a stable state. In addition, this scheme simplifies the complexity of controlling many servo motors. In this thesis, the agent learns to control the ZMP with the help of balance control experience drawn from human walking, which not only improves learning efficiency but also makes the robot's walking gait more like human behavior.
Furthermore, the proposed method integrates the balance learning algorithm with the balance control scheme and applies it to biped walking on flat ground and on a seesaw, making the walk more stable. Finally, several simulations demonstrate the feasibility and effectiveness of the proposed learning scheme. The research results are presented in a video on YouTube: http://youtu.be/05a0hamjt9Q
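As a rough illustration of the learning scheme described in the abstract, the sketch below shows tabular Q-learning (the algorithm named in Section 2.2) over a discretized ZMP state, with a small set of combined arm/leg actions that shift the ZMP and a reward that favors keeping the ZMP near the center of the support foot. The environment interface (read_zmp, apply_action), the bin count, the action set, and the learning parameters are hypothetical placeholders chosen for illustration, not the simulation API or parameters used in the thesis.

```python
import random
from collections import defaultdict

# Illustrative assumptions: bin count, action count, and learning
# parameters are placeholders, not the thesis's actual settings.
N_BINS = 11                      # ZMP bins along the foot's x-axis
ACTIONS = list(range(9))         # combined arm/leg postures that shift the ZMP
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = defaultdict(float)           # Q[(state, action)] -> estimated return

def discretize(zmp_x, half_foot=0.05):
    """Map a continuous ZMP x-position (meters) to a bin index."""
    zmp_x = max(-half_foot, min(half_foot, zmp_x))
    return int((zmp_x + half_foot) / (2 * half_foot) * (N_BINS - 1))

def reward(state):
    """Continuous-style reward: highest when the ZMP is at the foot center."""
    center = (N_BINS - 1) / 2
    return 1.0 - abs(state - center) / center

def choose_action(state):
    """Epsilon-greedy action selection over the learned Q-values."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next):
    """One-step Q-learning backup."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def train(sim, episodes=500, steps=200):
    """Train against a simulator exposing read_zmp() / apply_action().

    These two calls are hypothetical stand-ins for whatever interface
    the Webots model actually provides."""
    for _ in range(episodes):
        s = discretize(sim.read_zmp())
        for _ in range(steps):
            a = choose_action(s)
            sim.apply_action(a)            # move arms/legs to shift the ZMP
            s_next = discretize(sim.read_zmp())
            q_update(s, a, reward(s_next), s_next)
            s = s_next
```

The continuous-style reward above only mirrors the idea of penalizing ZMP deviation from a stable region; the thesis also studies a discrete reward variant (Sections 3.4.1 and 3.4.2).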
Table of Contents
Chinese Abstract i
Abstract ii
TABLE OF CONTENTS iii
LIST OF FIGURES v
LIST OF TABLES vii
I. INTRODUCTION 1
1.1 Preface 1
1.2 Motivation and Objective 2
1.3 Organization of thesis 3
II. BACKGROUND 4
2.1 Reinforcement Learning 4
2.2 Q-learning Algorithm 6
2.3 Related Works 8
III. PROPOSED METHOD 10
3.1 Policy Update 10
3.2 State Space Construction 11
3.3 Action Space 13
3.4 Reward 17
3.4.1 Discrete Reward 19
3.4.2 Continuous Reward 20
3.5 Learning Process 21
IV. SIMULATION 23
4.1 Simulation Model 23
4.1.1 Simulation Environment 23
4.1.2 Biped Robot 25
4.2 One Leg Balance 28
4.3 Walking on Plain 32
4.3.1 Discrete Reward 33
4.3.2 Continuous Reward 34
4.4 Walking on Seesaw 37
4.4.1 Discrete Reward 38
4.4.2 Continuous Reward 39
4.5 Adaptability of a Policy in Different Environments 41
V. EXPERIMENT 45
5.1 Experiment Environment 45
5.2 Biped Robot 47
5.3 The Results of Experiment 52
5.3.1 Walking on Plain 53
5.3.2 Walking on Seesaw 55
5.3.3 Discussion of Experiment Results 57
VI. CONCLUSION 58
6.1 Summary 58
6.2 Future Work 58
REFERENCES 60
VITA 64
Fulltext
This electronic fulltext is licensed for personal, non-profit retrieval, reading, and printing for academic research purposes only. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined availability date
Available:
Campus: available
Off-campus: available


Printed copies
Availability information for printed copies is relatively complete from academic year 102 (2013/14) onward. To inquire about the availability of printed copies from academic year 101 (2012/13) or earlier, please contact the printed thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
