Title page for etd-1025115-185021
Title
Applying the Concept of Fuzzy Logic to Inverse Reinforcement Learning
Department
Year, semester
Language
Degree
Number of pages
52
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2015-11-06
Date of Submission
2015-11-26
Keywords
Inverse reinforcement learning, Reward function, Fuzzy theory, Reinforcement learning, AdaBoost, Apprenticeship learning
Statistics
The thesis/dissertation has been browsed 5675 times and downloaded 10 times.
Abstract (Chinese)
Inverse reinforcement learning derives a reward function from expert demonstrations, where the reward function consists of a reward weight and a reward feature. Given a reward function, the agent uses it to learn a policy, compares the resulting behavior with the expert's, and revises the reward function accordingly; this cycle repeats until the agent behaves like the expert. This thesis integrates inverse reinforcement learning with fuzzy concepts. First, the dissimilarity between the agent and the expert defines the learning weight used to adjust the reward function: the higher the dissimilarity, the less the agent's behavior resembles the expert's demonstration and the larger the required weight adjustment; conversely, as the dissimilarity decreases, the agent's behavior approaches the expert's and the adjustment shrinks toward a minimum. Second, a fuzzy concept lets the agent form the reward value of the current state from the information provided by adjacent states in the environment, which speeds up the propagation of reward features and lets the agent approach the expert's behavior quickly. Finally, the proposed method is validated in simulated maze, mountain-car, and soccer-robot environments, and the simulation results show a clear improvement in learning speed.
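The two ideas in the abstract — a weight adjustment scaled by agent–expert dissimilarity, and a reward value blended from neighboring states — can be sketched in Python. This is a minimal illustrative sketch, not the thesis's actual implementation: the grid-world setup, function names, and the 4-neighborhood averaging used as a stand-in for the fuzzy neighborhood concept are all assumptions.

```python
import numpy as np

def dissimilarity_weight(expert_features, agent_features, eta=1.0):
    """Scale the reward-weight update by how far the agent's feature
    counts are from the expert's: a large gap means a large adjustment,
    a small gap means the adjustment shrinks toward zero."""
    d = np.linalg.norm(expert_features - agent_features)
    return eta * d

def fuzzy_reward_propagation(reward, alpha=0.3):
    """Blend each grid state's reward with the mean of its 4-neighborhood,
    so reward information spreads to adjacent states (an illustrative
    stand-in for the thesis's fuzzy neighborhood concept)."""
    padded = np.pad(reward, 1, mode="edge")
    neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    return (1 - alpha) * reward + alpha * neighbors

# Toy 4x4 maze with a single goal reward in one corner: after one
# propagation step, states adjacent to the goal also carry reward.
reward = np.zeros((4, 4))
reward[3, 3] = 1.0
reward = fuzzy_reward_propagation(reward)
```

After one call, only the goal's neighbors pick up reward; repeated calls diffuse it further across the grid, which is what lets the agent pick up the expert's reward signal faster than waiting for point rewards alone.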
Abstract
This is a study of reinforcement learning, in which an agent interacts with a dynamic environment, obtains a reward function R, and updates its policy until learning and behavior converge. In complex and difficult tasks, however, the reward function R is especially hard to specify. Inverse reinforcement learning solves this problem: a policy π and a reward function R are defined from expert demonstrations, the policy π′ learned from the agent's reward function R′ is compared with π, and R′ is updated until π′ produces the same behavior as the expert. The comparison procedure uses the error to adjust the weights, exploiting the level of dissimilarity to adjust the reward function R′, together with a fuzzy concept in the reinforcement-learning stage to refine the policy. This method approximates the expert policy faster.
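The compare-and-update loop the abstract describes can be sketched as a minimal linear-reward IRL iteration in Python. This is a hedged sketch, not the thesis's algorithm: the linear reward R′ = w · φ, the learning rate, the stopping test, and the toy policy/feature functions in the example are all illustrative assumptions.

```python
import numpy as np

def irl_update_loop(expert_mu, compute_policy, feature_expectations,
                    n_features, lr=0.5, tol=1e-3, max_iter=100):
    """Minimal inverse-RL loop: hold a linear reward R'(s) = w . phi(s),
    derive a policy pi' from it, compare pi''s feature expectations with
    the expert's, and adjust w by the gap until the behaviors match."""
    w = np.zeros(n_features)
    for _ in range(max_iter):
        policy = compute_policy(w)          # inner RL step under R' = w . phi
        mu = feature_expectations(policy)   # agent's visitation features
        gap = expert_mu - mu                # dissimilarity drives the update
        if np.linalg.norm(gap) < tol:
            break                           # pi' matches the expert closely
        w += lr * gap                       # larger gap -> larger adjustment
    return w

# Toy check with identity features and a policy that is just the clipped
# weight vector: the loop should drive w toward the expert's features.
expert = np.array([0.8, 0.2])
w = irl_update_loop(expert, lambda v: np.clip(v, 0, 1), lambda p: p, 2)
```

Each pass halves the remaining gap in this toy, so the weights converge geometrically to the expert's feature expectations; the thesis's contribution is making that adjustment proportional to the measured dissimilarity rather than a fixed schedule.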
Table of Contents
Thesis Certification
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background
2.1 Markov Decision Processes
2.1.1 Reinforcement Learning
2.1.2 Q-Learning
2.2 Inverse Reinforcement Learning
2.2.1 Inverse Reinforcement Learning
2.3 The AdaBoost Algorithm
2.4 Introduction to Fuzzy Theory
Chapter 3 Proposed Method
3.1 Inverse Reinforcement Learning with the AdaBoost Concept
3.1.1 An AdaBoost-IRL Adjustment Example
3.2 Inverse Reinforcement Learning with the Dissimilarity Concept
3.3 Fuzzy Theory Combined with IRL-Based RL
Chapter 4 Simulations and Discussion
4.1 Maze Simulation
4.2 Mountain Car Simulation Environment
4.3 Simulated Soccer Robots
4.4 Simulated Soccer Robots with the Fuzzy Method
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Fulltext
This electronic fulltext is authorized for personal, non-profit academic retrieval, reading, and printing only. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without permission, to avoid violating the law.
Thesis access permission: user-defined embargo
Available:
Campus: available
Off-campus: available


Printed copies
Public-access information for printed copies is relatively complete from academic year 102 onward. To look up access information for printed copies from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
