Title page for etd-1025115-185021
Title
Applying the Concept of Fuzzy Logic to Inverse Reinforcement Learning
Department
Year, semester
Language
Degree
Number of pages
52
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2015-11-06
Date of Submission
2015-11-26
Keywords
Inverse reinforcement learning, Reward function, Fuzzy theory, Reinforcement learning, AdaBoost, Apprenticeship learning
Statistics
The thesis/dissertation has been browsed 5675 times and downloaded 10 times.
Abstract (Chinese)
Inverse reinforcement learning derives a reward function from expert demonstrations, where the reward function consists of a reward weight and a reward feature. Given a reward function, the agent uses it to learn a policy, compares the resulting behavior with the expert's, and revises the reward function accordingly; this cycle repeats until the agent behaves like the expert. This thesis integrates inverse reinforcement learning with fuzzy concepts. First, the dissimilarity between the agent and the expert defines the learning weight used to adjust the reward function: the higher the dissimilarity, the less the agent's behavior resembles the expert's demonstration and the larger the required weight adjustment; conversely, as the dissimilarity decreases, the agent's behavior approaches the expert's and the adjustment shrinks toward a minimum. Second, a fuzzy concept lets the agent form the reward value of the current state from the information provided by adjacent states in the environment, which speeds up the propagation of reward features and lets the agent approach the expert's behavior quickly. Finally, the proposed method is validated in simulated maze, mountain-car, and soccer-robot environments, and the simulation results show a clear improvement in learning speed.
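The two ideas in the abstract — a weight adjustment scaled by agent–expert dissimilarity, and a reward value blended from neighboring states — can be sketched in Python. This is a minimal illustrative sketch, not the thesis's actual implementation: the grid-world setup, function names, and the 4-neighborhood averaging used as a stand-in for the fuzzy neighborhood concept are all assumptions.

```python
import numpy as np

def dissimilarity_weight(expert_features, agent_features, eta=1.0):
    """Scale the reward-weight update by how far the agent's feature
    counts are from the expert's: a large gap means a large adjustment,
    a small gap means the adjustment shrinks toward zero."""
    d = np.linalg.norm(expert_features - agent_features)
    return eta * d

def fuzzy_reward_propagation(reward, alpha=0.3):
    """Blend each grid state's reward with the mean of its 4-neighborhood,
    so reward information spreads to adjacent states (an illustrative
    stand-in for the thesis's fuzzy neighborhood concept)."""
    padded = np.pad(reward, 1, mode="edge")
    neighbors = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                 padded[1:-1, :-2] + padded[1:-1, 2:]) / 4.0
    return (1 - alpha) * reward + alpha * neighbors

# Toy 4x4 maze with a single goal reward in one corner: after one
# propagation step, states adjacent to the goal also carry reward.
reward = np.zeros((4, 4))
reward[3, 3] = 1.0
reward = fuzzy_reward_propagation(reward)
```

After one call, only the goal's neighbors pick up reward; repeated calls diffuse it further across the grid, which is what lets the agent pick up the expert's reward signal faster than waiting for point rewards alone.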
Abstract
This is a study of reinforcement learning, in which an agent interacts with a dynamic environment, obtains a reward function R, and updates its policy until learning and behavior converge. In complex and difficult tasks, however, the reward function R is especially hard to specify. Inverse reinforcement learning solves this problem: a policy π and a reward function R are defined from expert demonstrations, the policy π′ learned from the agent's reward function R′ is compared with π, and R′ is updated until π′ produces the same behavior as the expert. The comparison procedure uses the error to adjust the weights, exploiting the level of dissimilarity to adjust the reward function R′, together with a fuzzy concept in the reinforcement-learning stage to refine the policy. This method approximates the expert policy faster.
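The compare-and-update loop the abstract describes can be sketched as a minimal linear-reward IRL iteration in Python. This is a hedged sketch, not the thesis's algorithm: the linear reward R′ = w · φ, the learning rate, the stopping test, and the toy policy/feature functions in the example are all illustrative assumptions.

```python
import numpy as np

def irl_update_loop(expert_mu, compute_policy, feature_expectations,
                    n_features, lr=0.5, tol=1e-3, max_iter=100):
    """Minimal inverse-RL loop: hold a linear reward R'(s) = w . phi(s),
    derive a policy pi' from it, compare pi''s feature expectations with
    the expert's, and adjust w by the gap until the behaviors match."""
    w = np.zeros(n_features)
    for _ in range(max_iter):
        policy = compute_policy(w)          # inner RL step under R' = w . phi
        mu = feature_expectations(policy)   # agent's visitation features
        gap = expert_mu - mu                # dissimilarity drives the update
        if np.linalg.norm(gap) < tol:
            break                           # pi' matches the expert closely
        w += lr * gap                       # larger gap -> larger adjustment
    return w

# Toy check with identity features and a policy that is just the clipped
# weight vector: the loop should drive w toward the expert's features.
expert = np.array([0.8, 0.2])
w = irl_update_loop(expert, lambda v: np.clip(v, 0, 1), lambda p: p, 2)
```

Each pass halves the remaining gap in this toy, so the weights converge geometrically to the expert's feature expectations; the thesis's contribution is making that adjustment proportional to the measured dissimilarity rather than a fixed schedule.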
Table of Contents
Thesis Certification
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background
2.1 Markov Decision Processes
2.1.1 Reinforcement Learning
2.1.2 Q-Learning
2.2 Inverse Reinforcement Learning
2.2.1 Inverse Reinforcement Learning
2.3 The AdaBoost Algorithm
2.4 Introduction to Fuzzy Theory
Chapter 3 Proposed Method
3.1 Inverse Reinforcement Learning with the AdaBoost Concept
3.1.1 An AdaBoost-IRL Adjustment Example
3.2 Inverse Reinforcement Learning with the Dissimilarity Concept
3.3 Fuzzy Theory Combined with IRL-Based RL
Chapter 4 Simulations and Discussion
4.1 Maze Simulation
4.2 Mountain Car Simulation Environment
4.3 Simulated Soccer Robots
4.4 Simulated Soccer Robots with the Fuzzy Method
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Fulltext
This electronic fulltext is authorized for personal, non-profit academic retrieval, reading, and printing only. Please observe the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without permission, to avoid violating the law.
Thesis access permission: user-defined embargo
Available:
Campus: available
Off-campus: available


Printed copies
Public-access information for printed copies is relatively complete from academic year 102 onward. To look up access information for printed copies from academic year 101 or earlier, please contact the printed-thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
