博碩士論文 etd-1028114-170500 詳細資訊
Title page for etd-1028114-170500
論文名稱
Title
基於關鍵狀態的逆向增強式學習演算法
Inverse Reinforcement Learning based on Critical State
系所名稱
Department
畢業學年期
Year, semester
語文別
Language
學位類別
Degree
頁數
Number of pages
57
研究生
Author
指導教授
Advisor
召集委員
Convenor
口試委員
Advisory Committee
口試日期
Date of Exam
2014-11-22
繳交日期
Date of Submission
2014-11-28
關鍵字
Keywords
加強式學習、逆向加強式學習、獎懲函數、獎懲特徵建構、學徒學習
Reward Feature Construction, Apprenticeship Learning, Inverse Reinforcement Learning, Reward Function, Reinforcement Learning
統計
Statistics
本論文已被瀏覽 5702 次,被下載 663 次。
The thesis/dissertation has been browsed 5702 times, has been downloaded 663 times.
中文摘要 Chinese Abstract
Reinforcement learning lets a learning agent obtain reward information by interacting with a dynamic environment and uses that information to update its policy, thereby optimizing control. A key element of reinforcement learning is the reward function: the most succinct expression of the expert's intention. In complex problems, however, the reward function is often difficult to specify, and for this reason inverse reinforcement learning (IRL) has attracted attention. IRL is mainly used to recover the reward function of a Markov Decision Process. Conventional IRL algorithms require a set of reward-function indexes and a set of demonstration trajectories, but in complex problems it is often hard to choose appropriate reward indexes, so the entire state space is sometimes used directly as the index set. This thesis proposes an Inverse Reinforcement Learning algorithm based on Critical States (IRLCS): given one set of correct demonstration trajectories and one set of incorrect demonstration trajectories, it compares the two, extracts suitable critical states from the whole state space to serve as reward indexes, and derives a succinct, meaningful reward function. Experimental results show that, compared with using the entire state space, the learned policy is closer to the expert's and the computational cost is greatly reduced. The results of this thesis are presented in a video on YouTube: http://youtu.be/cMaOdoTt4Hw.
Abstract
Reinforcement Learning (RL) enables an agent to learn by interacting with a dynamic environment. One fundamental assumption of existing RL algorithms is that the reward function, the most succinct representation of the designer's intention, must be provided beforehand, yet in complex problems it is difficult to specify an appropriate reward function. The goal of inverse reinforcement learning (IRL) is to find a reward function for a Markov Decision Process. An IRL process requires a set of reward indexes and good example traces demonstrated by an expert; in complex problems, however, selecting a suitable set of reward indexes is difficult. In this thesis, the Inverse Reinforcement Learning based on Critical State (IRLCS) algorithm is proposed to find a succinct and meaningful reward function. IRLCS selects a set of reward indexes from the whole state space by comparing the differences between good and bad demonstrations. Experimental results show that IRLCS finds a strategy close to the expert's strategy while saving a large amount of computation time. The research results are presented in a video on YouTube: http://youtu.be/cMaOdoTt4Hw.
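The record itself contains no code, but the core step the abstract describes, choosing reward indexes (critical states) by contrasting how often the good and bad demonstrations visit each state, can be sketched briefly. The following Python snippet is a minimal illustration, not the author's implementation: the trajectory format (lists of state ids), the visit-frequency comparison, and the threshold cutoff are all assumptions made for the example.

```python
from collections import Counter

def visit_frequency(traces):
    """Normalized visit counts over all states appearing in the trajectories."""
    counts = Counter(state for trace in traces for state in trace)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}

def select_critical_states(good_traces, bad_traces, threshold=0.1):
    """Pick states whose visit frequency differs markedly between the good
    (expert) and bad (failed) demonstrations; these become the reward indexes.
    `threshold` is an assumed cutoff, not a value taken from the thesis."""
    good_f = visit_frequency(good_traces)
    bad_f = visit_frequency(bad_traces)
    states = set(good_f) | set(bad_f)
    return sorted(s for s in states
                  if abs(good_f.get(s, 0.0) - bad_f.get(s, 0.0)) > threshold)

# Toy example: state 2 occurs only in the good traces and state 4 only in the
# bad traces, so both are selected as critical states.
good = [[0, 1, 2, 3], [0, 2, 3]]
bad = [[0, 1, 4, 3], [0, 4, 3]]
print(select_critical_states(good, bad))   # -> [2, 4]
```

Keeping only such contrasting states makes the reward feature vector far smaller than the full state space, which is where the computational savings reported in the abstract come from.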
目次 Table of Contents
摘要 i
Abstract ii
LIST OF FIGURES v
LIST OF TABLES vi
I. INTRODUCTION 1
1.1 Preface 1
1.2 Motivation and Objective 2
1.3 Markov Decision Process 3
1.4 Reinforcement Learning 3
1.5 Q-learning Algorithm 5
1.6 Inverse Reinforcement Learning 7
1.7 TrAdaBoost 8
1.8 Related Works 9
1.9 Organization of thesis 10
II. PROPOSED METHOD 11
2.1 Apprenticeship Learning 11
2.2 Inverse Reinforcement Learning Via Orthogonal Projection 12
2.2.1 Reward Index 12
2.2.2 Iteration Algorithm 14
2.3 Reward Index Construction 18
2.3.1 Impurity Function 19
2.3.2 Visit Frequency of States 21
2.3.3 Visit Frequency of State-Action Pairs 22
2.4 Inverse Reinforcement Learning Based on Critical State 25
III. EXPERIMENT 32
3.1 Experiment Environment 32
3.2 Purpose of the experiment 35
3.3 The Results of Experiment 35
3.3.1 Collision-avoidance behavior 36
3.3.2 Driving-as-fast-as-possible behavior 40
3.4 Conclusion of Experiment Results 43
IV. CONCLUSION 44
4.1 Summary 44
4.2 Future Work 45
REFERENCES 46
參考文獻 References
[1] S. Levine, Z. Popovic, and V. Koltun, “Feature construction for inverse reinforcement learning,” Advances in Neural Information Processing Systems, vol. 23, 2010.
[2] C. J. C. H. Watkins and P. Dayan, “Technical note: Q-learning,” Machine Learning, vol. 8, no. 3–4, pp. 279–292, 1992.
[3] A. Ng and S. Russell, “Algorithms for inverse reinforcement learning,” Proceedings of the 17th International Conference on Machine Learning, pp. 663–670, 2000.
[4] P. Abbeel and A. Ng, “Apprenticeship learning via inverse reinforcement learning,” Proceedings of the 21st International Conference on Machine Learning, p. 1, 2004.
[5] R. E. Schapire, “A brief introduction to boosting,” Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp. 1401-1406, 1999.
[6] W. Dai, Q. Yang, G. Xue, and Y. Yu, “Boosting for transfer learning,” Proceedings of the 24th International Conference on Machine Learning, pp. 193–200, New York, NY, USA, 2007.
[7] J. Kolter, P. Abbeel, and A. Ng, “Hierarchical apprenticeship learning with application to quadruped locomotion,” Advances in Neural Information Processing Systems, vol. 20, 2008.
[8] P. Abbeel, A. Coates, and A. Ng, “Autonomous helicopter aerobatics through apprenticeship learning,” International Journal of Robotics Research, vol. 29, no. 13, pp. 1608–1639, 2010.
[9] P. Abbeel, D. Dolgov, A. Ng, and S. Thrun, “Apprenticeship learning for motion planning with application to parking lot navigation,” IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1083–1090, 2008.
[10] S. Chung and H. Huang, “A mobile robot that understands pedestrian spatial behaviors,” IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5861–5866, 2010.
[11] S.-Y. Chen, H. Qian, J. Fan, Z.-J. Jin, and M.-L. Zhu, “Modified reward function on abstract features in inverse reinforcement learning,” Journal of Zhejiang University - Science C, vol. 11, no. 9, pp. 718–723, 2010.
[12] D. Grollman and A. Billard, “Donut as I do: Learning from failed demonstrations,” International Conference on Robotics and Automation, Shanghai, 2011.
[13] R. Balian, “Entropy, a Protean concept,” Poincaré Seminar 2003, pp. 119–144.
[14] M. Lopes, F. Melo, and L. Montesano, “Active learning for reward estimation in inverse reinforcement learning,” Machine Learning and Knowledge Discovery in Databases, pp. 31–46, 2009.
[15] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, 1998.
[16] J. Tang, A. Singh, N. Goehausen, and P. Abbeel, “Parameterized maneuver learning for autonomous helicopter flight,” IEEE International Conference on Robotics and Automation (ICRA), pp. 1142–1148, 2010.
電子全文 Fulltext
The electronic full text is licensed only for users to search, read, and print it personally and non-commercially for the purpose of academic research. Please observe the relevant provisions of the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization, to avoid infringement.
論文使用權限 Thesis access permission:校內校外完全公開 unrestricted
開放時間 Available:
校內 Campus: 已公開 available
校外 Off-campus: 已公開 available


紙本論文 Printed copies
Public-access information for printed theses is relatively complete from academic year 102 (ROC calendar) onward. To look up access information for printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
開放時間 Available: 已公開 available
