Title page for etd-0217117-155910 (Master's/Doctoral Thesis Details)
Title
Knowledge Sharing Approaches Based on Reinforcement Learning for Distributed Agents System
Department
Year, semester
Language
Degree
Number of pages
62
Author
Advisor
Convenor
Advisory Committee
Date of Exam
2017-03-10
Date of Submission
2017-03-17
Keywords
Ant colony algorithm, Distributed computing, Reinforcement learning, Knowledge sharing, Knowledge merging
Statistics
The thesis/dissertation has been browsed 5657 times and downloaded 716 times.
Abstract (Chinese, translated)
To eliminate the complicated and chaotic knowledge-exchange behavior that arises when a large group of learning agents shares experiences, and to let experience sharing quickly supply useful environmental information that supplements each agent's insufficient learning experience, this thesis proposes a cloud-based information-integration mechanism. Each learning agent communicates only with a cloud server, which removes the complex agent-to-agent knowledge exchange; the server collects the learning experiences of all agents, merges them, and shares the result with agents whose experience is lacking. When uploading its own experience, an agent applies the pheromone concept from the ant colony algorithm to assess the importance of that experience; the assessment becomes a weight that the server uses when merging the learning experiences of multiple agents. To accommodate the large volume of experience data, the cloud server adopts a distributed storage architecture, and the massive data are processed with the Apache Hadoop software framework, whose MapReduce processing model is a distributed computing architecture that handles large amounts of data quickly and effectively. Each learning agent then requests the merged experience from the cloud server and integrates it with its own, achieving the goal of experience sharing. Finally, the proposed method was implemented on a self-built small server, with a total of 360 learning agents simulated on multiple PCs, randomly distributed in and learning simultaneously from the same environment; the results show that the method effectively improves learning performance.
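The pheromone-based weighting and server-side merging described above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual implementation: the evaporation/deposit parameters and the weighted-average merge rule are assumptions chosen to mirror the description (pheromone remaining along a visited-state trace becomes the weight used when combining agents' Q-tables).

```python
def pheromone_weight(trace, evaporation=0.1, deposit=1.0):
    """Accumulate pheromone along a visited-state trace with evaporation.

    The pheromone remaining at the end of the episode serves as the
    importance weight of the uploaded experience. Parameter values are
    hypothetical, not taken from the thesis.
    """
    pheromone = 0.0
    for _ in trace:
        # each visited state deposits pheromone; earlier deposits evaporate
        pheromone = (1.0 - evaporation) * pheromone + deposit
    return pheromone

def merge_q_tables(experiences):
    """Merge per-agent Q-tables as a pheromone-weighted average.

    experiences: list of (weight, q_table) pairs, where each q_table
    maps (state, action) -> Q value.
    """
    weighted_sum, total_weight = {}, {}
    for weight, q_table in experiences:
        for sa, q in q_table.items():
            weighted_sum[sa] = weighted_sum.get(sa, 0.0) + weight * q
            total_weight[sa] = total_weight.get(sa, 0.0) + weight
    return {sa: weighted_sum[sa] / total_weight[sa] for sa in weighted_sum}
```

An agent with a higher-pheromone trace thus contributes proportionally more to the merged Q value for each state-action pair.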
Abstract
In a multi-agent system with a tremendous number of agents sharing knowledge with one another, the exchange activity becomes too complicated to manage. This thesis proposes a method in which every agent communicates only with a server, alleviating the complexity of the experience-exchange activity. The server collects the learning knowledge uploaded by all agents, merges it, and shares the result with agents that lack similar experience. Each agent uses the pheromone mechanism of the ant colony algorithm to evaluate whether an experience is worth uploading to the server; the pheromone remaining along the trace of visited states becomes the weight with which the server combines the collected experiences. To deal with massive data processing, the thesis employs the open-source Apache Hadoop framework together with the MapReduce programming model. Agents then integrate the shared experience with their own knowledge, achieving knowledge sharing and significantly increasing learning efficiency. The proposed approach was implemented on a self-built server and personal computers; simulation results with 360 learning agents demonstrate its performance.
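The server-side merge follows the MapReduce pattern the abstract mentions. The sketch below imitates Hadoop's map/shuffle/reduce phases in plain Python for illustration only; the function names and the weighted-average reducer are assumptions, not code from the thesis or from Hadoop itself.

```python
from collections import defaultdict

def map_experience(weight, q_table):
    """Map phase: emit one ((state, action), (weight*Q, weight)) pair
    per entry of an agent's uploaded Q-table."""
    for (state, action), q in q_table.items():
        yield (state, action), (weight * q, weight)

def shuffle(pairs):
    """Shuffle phase: group intermediate values by key, as the
    Hadoop framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_q(values):
    """Reduce phase: weighted average of the Q values for one key."""
    weighted_sum = sum(wq for wq, _ in values)
    total_weight = sum(w for _, w in values)
    return weighted_sum / total_weight

def merge(agent_records):
    """Run the full pipeline over (weight, q_table) records."""
    pairs = [p for w, q in agent_records for p in map_experience(w, q)]
    return {key: reduce_q(vals) for key, vals in shuffle(pairs).items()}
```

Because the reduce step is per-key and associative, the merge parallelizes naturally across a Hadoop cluster when the number of agents and states grows large.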
Table of Contents
Thesis Certification i
Abstract (Chinese) iii
Abstract iv
List of Figures ix
List of Tables xi
Chapter 1 Introduction 1
1.1 Motivation 1
1.2 Thesis Organization 2
Chapter 2 Literature Review 3
2.1 Markov Decision Processes 3
2.1.1 Reinforcement Learning 4
2.1.2 Q-Learning 5
2.2 Ant Colony Algorithm 7
2.2.1 Ant Colony Algorithm 7
2.2.2 Pheromone Update in the Ant Colony Algorithm 8
2.3 Distributed Systems 10
2.3.1 Distributed Storage Systems 11
2.3.2 Distributed Computing 12
Chapter 3 Methodology 14
3.1 Building an Experience-Sharing Mechanism for Multiple Learning Agents 14
3.2 Designing a Weighting Function with Ant Colony Theory 16
3.2.1 Pheromone Mechanism 17
3.2.2 Weighting Function 18
3.3 Distributed System Application 20
3.3.1 Data Storage Structure 20
3.3.2 MapReduce Data Processing and Experience Merging 22
3.4 Overall Workflow and Algorithms 26
3.4.1 Individual Learning: Upload Mode 27
3.4.2 Experience Merging Mode 29
3.4.3 Individual Learning: Download Mode 30
Chapter 4 Simulation Experiments and Implementation Results 33
4.1 Maze Simulation Experiment 33
4.2 Implementation Results 41
Chapter 5 Conclusions and Future Work 47
5.1 Conclusions 47
5.2 Future Work 47
References 48
References
[1] A. V. Ivanov, and A. A. Petrovsky, “First-order Markov Property of The Auditory Spiking Neuron Model Response,” Signal Processing Conference, Florence, Italy, 4-8 Sept. 2006.
[2] K. Ito, Y. Imoto, H. Taguchi, and A. Gofuku, “A Study of Reinforcement Learning with Knowledge Sharing,” in Proc. of IEEE Int. Conf. on Robotics and Biomimetics, pp. 175-179, Hong Kong, China, 22-26 Aug. 2004.
[3] Z. Jin, W. Y. Liu, and J. Jin, “State-Clusters Shared Cooperative Multi-Agent Reinforcement Learning,” Asian Control Conference ASCC, pp. 129-135, 27-29 Aug. 2009.
[4] M. N. Ahmadabadi, and M. Asadpour, “Expertness Based Cooperative Q-Learning,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 32, no. 1, Feb. 2002.
[5] B. N. Araabi, S. Mastoureshgh, and M. N. Ahmadabadi, “A Study on Expertise of Agents and Its Effects on Cooperative Q-Learning,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 37, no. 2, pp. 1083-1094, Apr. 2007.
[6] A. Anuntapat, A. Thammano, and O. Wongwirat, “Searching Optimization Route by Using Pareto Solution with Ant Algorithm for Mobile Robot in Rough Terrain Environment,” Control, Automation, Robotics and Vision (ICARCV), International Conference, Phuket, Thailand, 13-15 Nov. 2016.
[7] J. Li, J. Cheng, Y. Zhao, F. Yang, Y. Huang, H. Chen, and R. Zhao, “A Comparison of General-Purpose Distributed Systems for Data Processing,” Big Data IEEE International Conference, pp. 378-383, Washington D.C., USA, 5-8 Dec. 2016.
[8] K. Ito, A. Gofuku, Y. Imoto, and M. Takeshita, “A study of reinforcement learning with knowledge sharing for distributed autonomous system,” Computational Intelligence in Robotics and Automation, Proceedings IEEE, pp. 1120-1125, Kobe, Japan, 16-20 July. 2003.
[9] J. Pinto, P. Jain, and T. Kumar, “Hadoop distributed computing clusters for fault prediction,” Computer Science and Engineering Conference ICSEC, Chiang Mai, Thailand, 14-17 Dec. 2016.
[10] T. Tateyama, S. Kawata, and Y. Shimomura, “Parallel Reinforcement Learning Systems using Exploration Agents and Dyna-Q Algorithm,” in Proc. SICE Annu. Conf., pp. 2774-2778, Takamatsu, Japan, 17-20 Sept. 2007.
[11] M. Hussin, Y. C. Lee, and A. Y. Zomaya, “Efficient Energy Management using Adaptive Reinforcement Learning-based Scheduling in Large-Scale Distributed Systems,” in International Conf. on Parallel Proc., pp. 385-393, Taipei City, Taiwan, 13-16 Sept. 2011.
[12] H. Karaoğuz, and H. Bozma, “Merging Appearance-Based Spatial Knowledge in Multirobot Systems,” Intelligent Robots and Systems (IROS), IEEE/RSJ International Conference, pp. 5107-5112, Daejeon, Korea, 9-14 Oct. 2016.
[13] K.S. Hwang, W. C. Jiang, and Y. J. Chen, “Model Learning and Knowledge Sharing for a Multiagent System with Dyna-Q Learning,” IEEE Transactions on Cybernetics, vol. 45, no. 5, pp. 964-976, May. 2015.
[14] K.S. Hwang, W. C. Jiang, Y. J. Chen, and W. H. Wang, “Reinforcement Learning with Model Sharing for Multi-Agent Systems,” System Science and Engineering ICSSE, pp. 293-296, Budapest, Hungary, 4-6 July. 2013.
[15] A. Lazarowska, “Parameters Influence on the Performance of an Ant Algorithm for Safe Ship Trajectory Planning,” Cybernetics (CYBCONF), IEEE International Conference, Gdynia, Poland, 24-26 June. 2015.
[16] X. Huang, H. Zhou, and W. Wu, “Hadoop Job Scheduling Based on Mixed Ant-Genetic Algorithm,” Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), International Conference, Xi'an, China, 17-19 Sept. 2015.
Fulltext
This electronic full text is licensed for personal, non-profit retrieval, reading, and printing for academic research purposes only. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available


Printed copies
Availability information for printed copies is relatively complete from academic year 102 onward. To check the availability of printed theses from academic year 101 or earlier, please contact the printed-thesis service desk of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
