Title page for etd-0025116-130314
Title
Adaptive Exploration Strategies for Reinforcement Learning
Department
Year, semester
Language
Degree
Number of pages
34
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2016-01-23
Date of Submission
2016-01-25
Keywords
Reinforcement learning, Tabu search, state aggregation, trade-off between exploration and exploitation, decision tree, ε-greedy
Statistics
The thesis/dissertation has been browsed 5683 times and downloaded 15 times.
Abstract
In reinforcement learning, an agent learns to achieve a goal by trial and error. When the method is applied in a real environment, however, it is difficult to decide how to partition the state space. A second problem arises when the agent takes actions according to its policy during learning: it must balance exploitation and exploration, that is, whether to explore new regions to gain experience or to exploit existing knowledge to obtain the maximum reward. To address these problems, this thesis first proposes a decision-tree-based algorithm that adaptively partitions the state space, and then introduces decreasing Tabu search and an adaptive exploration strategy on top of it to handle the exploitation-exploration trade-off. Decreasing Tabu search places each action the agent takes into a Tabu list; when the list is full, the first action in the list is released, and the size of the list shrinks according to the number of times the goal has been reached successfully. The adaptive exploration strategy adjusts the exploration rate based on information entropy rather than keeping it fixed or tuning it by hand. Finally, maze simulations are used to validate the practicality of the proposed method, and the simulation results show that it clearly improves the learning speed.
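A minimal Python sketch of the two exploration mechanisms the abstract describes may help make them concrete. The details the abstract leaves open are filled with illustrative assumptions, not the thesis's exact formulas: the softmax-entropy mapping, the epsilon bounds, and the shrink-by-one rule on success are all my own choices.

```python
import math
import random
from collections import deque

def entropy_epsilon(q_values, eps_min=0.05, eps_max=0.9):
    """Map the normalized entropy of a softmax over Q-values to an
    exploration rate: near-uniform Q-values (high entropy, little learned)
    give a large epsilon; one clearly dominant action gives a small one.
    The specific mapping is an assumption, not the thesis's formula."""
    m = max(q_values)
    exp_q = [math.exp(q - m) for q in q_values]   # shift for numerical stability
    total = sum(exp_q)
    probs = [e / total for e in exp_q]
    h = -sum(p * math.log(p) for p in probs if p > 0)
    h_max = math.log(len(q_values))               # entropy of the uniform distribution
    return eps_min + (eps_max - eps_min) * (h / h_max if h_max > 0 else 1.0)

class DecreasingTabuList:
    """FIFO Tabu list whose capacity shrinks as the goal is reached more often."""

    def __init__(self, capacity=8, min_capacity=1):
        self.capacity = capacity
        self.min_capacity = min_capacity
        self.items = deque()

    def add(self, state_action):
        if len(self.items) >= self.capacity:
            self.items.popleft()                  # release the first (oldest) action
        self.items.append(state_action)

    def __contains__(self, state_action):
        return state_action in self.items

    def on_goal_reached(self):
        # Fewer actions are forbidden late in training, so exploitation
        # gradually takes over as successes accumulate (shrink-by-one is
        # an illustrative choice).
        self.capacity = max(self.min_capacity, self.capacity - 1)
        while len(self.items) > self.capacity:
            self.items.popleft()

def select_action(q_row, state, tabu):
    """Entropy-driven epsilon-greedy choice over actions not in the Tabu list."""
    eps = entropy_epsilon(q_row)
    allowed = [a for a in range(len(q_row)) if (state, a) not in tabu]
    if not allowed:                               # everything Tabu: fall back to all actions
        allowed = list(range(len(q_row)))
    if random.random() < eps:
        action = random.choice(allowed)
    else:
        action = max(allowed, key=lambda a: q_row[a])
    tabu.add((state, action))                     # taken action enters the Tabu list
    return action
```

In a Q-learning loop one would call select_action(Q[s], s, tabu) at each step and tabu.on_goal_reached() at the end of each successful episode, so the Tabu list steadily loosens as the agent succeeds.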
Table of Contents
Thesis Approval Form
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
List of Symbols
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis Organization
Chapter 2 Background
2.1 Markov Decision Processes
2.2 Reinforcement Learning
2.2.1 Q-Learning
2.3 Tabu Search
Chapter 3 Proposed Method
3.1 Decision-Tree-Based Adaptive State Space Partitioning
3.1.1 Partial Markov Decision Processes
3.1.2 Building the Decision Tree
3.2 The Exploitation-Exploration Trade-off
3.2.1 Decreasing Tabu Search
3.2.2 Adaptive Exploration Strategy
Chapter 4 Experimental Results
4.1 Maze Simulation Experiments
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
References
[1] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[2] L. P. Kaelbling, M. L. Littman, and A. W. Moore, “Reinforcement Learning: A Survey,” Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[3] 李天岩, “熵 (Entropy),” 數學傳播 (Mathmedia), vol. 13, no. 3.
[4] C. J. C. H. Watkins and P. Dayan, “Technical Note: Q-learning,” Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
[5] M. Abramson and H. Wechsler, “Tabu Search Exploration for On-policy Reinforcement Learning,” in Proceedings of the International Joint Conference on Neural Networks, vol. 4, pp. 2910-2915, 2003.
[6] X. Zhang and Z. Liu, “An Optimized Q-Learning Algorithm Based on the Thinking of Tabu Search,” in Proceedings of the International Symposium on Computational Intelligence and Design (ISCID '08), vol. 1, pp. 533-536, 2008.
[7] M. Tokic, “Adaptive ε-greedy Exploration in Reinforcement Learning Based on Value Difference,” Advances in Artificial Intelligence, vol. 6359, pp. 203-210, 2010.
[8] A. F. Atiya, A. G. Parlos, and L. Ingber, “A Reinforcement Learning Method Based on Adaptive Simulated Annealing,” in Proceedings of the 46th IEEE International Midwest Symposium on Circuits and Systems (MWSCAS '03), vol. 1, pp. 121-124, 2003.
[9] M. A. Wiering and H. van Hasselt, “Ensemble Algorithms in Reinforcement Learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 38, no. 4, pp. 930-936, 2008.
[10] M. Coggan, “Exploration and Exploitation in Reinforcement Learning,” in Proceedings of the 4th International Conference on Computational Intelligence and Multimedia Applications, pp. 1-44, 2001.
[11] T. K. Das, A. Gosavi, S. Mahadevan, and N. Marchalleck, “Solving Semi-Markov Decision Problems Using Average Reward Reinforcement Learning,” Management Science, vol. 45, pp. 560-574, 1999.
[12] R. Dearden, N. Friedman, and D. Andre, “Model Based Bayesian Exploration,” in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 150-159, 1999.
[13] 陳昱仁, “基於自我組織決策樹多重代理人之策略分享機制” (A strategy-sharing mechanism for multiple agents based on self-organizing decision trees), Ph.D. dissertation, National Chung Cheng University, Chiayi, 2009.
[14] 羅嘉耀, “以一種創新的自適應性探索策略革新加強式學習理論之架構” (Innovating the framework of reinforcement learning with a novel adaptive exploration strategy), Ph.D. dissertation, National Chung Cheng University, Chiayi, 2012.
[15] W.-Y. Loh and Y.-S. Shih, “Split Selection Methods for Classification Trees,” Statistica Sinica, vol. 7, pp. 815-840, 1997.
Fulltext
This electronic fulltext is licensed only for personal, non-commercial retrieval, reading, and printing for the purpose of academic research. Please comply with the Copyright Act of the Republic of China; do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: user-defined release period
Available:
On campus: available
Off campus: available


Printed copies
Availability information for printed copies is relatively complete from the 102nd academic year (2013-2014) onward. To look up the availability of printed copies from the 101st academic year (2012-2013) or earlier, please contact the printed thesis service desk at the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
