Title page for etd-0728116-144121
Title: A reinforcement learning method with implicit critics from a bystander
Department:
Year, semester:
Language:
Degree:
Number of pages: 54
Author:
Advisor:
Convenor:
Advisory Committee:
Date of Exam: 2016-08-17
Date of Submission: 2016-08-28
Keywords: Stochastic reinforcement learning, Reinforcement learning, Actor critic, Deep learning, Facial expression recognition
Statistics: The thesis/dissertation has been browsed 5679 times and downloaded 769 times.
Abstract (Chinese)
In reinforcement learning, an agent gains experience through countless trials; with continued training, it can learn the behaviors needed to complete different tasks. In a human-machine collaborative environment, however, the agent not only interacts with the environment but is also closely coupled to the user's behavior and intentions. This thesis uses the well-known actor-critic architecture of reinforcement learning as the scaffold of its Q-learning mechanism and introduces the concept of stochastic action generation to solve the problem that traditional Q-learning cannot produce continuous actions. The proposed learning architecture, Actor Critic-Q (ACQ) learning, is further extended so that, in addition to learning designated behaviors from a built-in reward function, a robot can also revise its built-in behavior patterns by regularly observing the user's facial expressions, gaining experience from contact with the user and thereby learning customized behavior. In other words, the learning agent not only learns a behavior policy through the predefined reward function but can also revise its original action policy based on interaction with people, achieving policy customization. For recognizing the user's facial expressions, this thesis trains a deep learning model; after training, each recognized expression is converted into one of two cases, positive or negative, and this signal serves as the reward signal of the other ACQ. The results of the swamp-crossing experiment show that the proposed dual-ACQ learning architecture indeed allows the robot to shift from the habitual behavior learned with the built-in reward function to behavior that conforms to the user's implicit critics.
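The conversion of a recognized expression into a positive or negative reward, as described above, can be illustrated with a short sketch. This is a hypothetical Python illustration, not the thesis' actual code; the label set and the `expression_to_reward` helper are assumptions introduced here for clarity.

```python
# Minimal sketch: collapse a recognized facial expression into the
# dichotomous (positive/negative) reward signal described in the abstract.
# The expression labels and their grouping are illustrative assumptions.

POSITIVE_EXPRESSIONS = {"happy", "surprise"}            # assumed "positive" group
NEGATIVE_EXPRESSIONS = {"angry", "sad", "disgust", "fear"}  # assumed "negative" group

def expression_to_reward(label: str) -> float:
    """Map a classifier output label to a reward for the second ACQ module."""
    if label in POSITIVE_EXPRESSIONS:
        return +1.0
    if label in NEGATIVE_EXPRESSIONS:
        return -1.0
    return 0.0  # neutral or unrecognized: no implicit critic this step
```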
Abstract
In reinforcement learning, an agent gathers experience through numerous trials and gradually learns behavior policies for different tasks. In a human-computer cooperative environment, however, the agent interacts not only with the environment but also with the humans in it. This thesis adopts the actor-critic model, one of the popular reinforcement learning methods, as the scaffold of the proposed Q-learning and introduces the concept of generating actions with a stochastic function to overcome the inability of traditional Q-learning to produce continuous actions. Building on this, the thesis designs a reinforcement learning architecture, called Actor Critic-Q (ACQ), that allows an agent to learn a behavior policy from its original reward function while also modifying its built-in behavior by observing the user's emotions. That is, the agent gains experience from the user's implicit critics, facial expressions in this case, to learn customized behavior. For facial expression recognition, deep learning is applied to train the recognition ability of the learning agent. At recall, the recognized expression is classified into a dichotomy, good or bad, and this signal serves as the reward signal for the other module of the dual ACQ. The experiments show that the proposed dual-ACQ architecture allows the agent to move from the behavior policy learned with the original reward function to a compromise policy that takes the user's implicit critics into account.
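To make the ideas summarized in the abstract concrete, the sketch below illustrates a stochastic actor producing continuous actions, a TD-style critic update, and a second reward channel derived from the user's expression. The tabular representation, the specific update rule, and the blending weight `beta` are assumptions made for illustration only; they are not taken from the thesis.

```python
import numpy as np

class ACQSketch:
    """Illustrative actor-critic-style learner with Gaussian continuous actions.

    This is a simplified stand-in for the ACQ idea: the environment reward and
    the expression-derived reward are blended before the temporal-difference
    update. All hyperparameters here are arbitrary illustrative choices.
    """

    def __init__(self, n_states, alpha=0.1, gamma=0.9, sigma=0.5):
        self.q = np.zeros(n_states)    # critic: value estimate per state
        self.mu = np.zeros(n_states)   # actor: mean continuous action per state
        self.alpha, self.gamma, self.sigma = alpha, gamma, sigma

    def act(self, s):
        # Stochastic action generation: sample a continuous action around the
        # actor's current mean for state s.
        return np.random.normal(self.mu[s], self.sigma)

    def update(self, s, a, env_reward, user_reward, s_next, beta=0.5):
        # Blend the built-in reward with the implicit critic from the user.
        r = env_reward + beta * user_reward
        td_error = r + self.gamma * self.q[s_next] - self.q[s]
        self.q[s] += self.alpha * td_error
        # Shift the action mean toward actions that yielded positive TD error.
        self.mu[s] += self.alpha * td_error * (a - self.mu[s])

# Hypothetical one-step interaction:
agent = ACQSketch(n_states=10)
a = agent.act(s=0)
agent.update(s=0, a=a, env_reward=-0.1, user_reward=+1.0, s_next=1)
```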
Table of Contents
Abstract (Chinese) ..... i
Abstract ..... ii
Table of Contents ..... iv
List of Figures ..... vi
List of Tables ..... viii
I. Introduction ..... 1
1.1 Motivation ..... 1
1.2 Thesis organization ..... 2
II. Background ..... 3
2.1 Reinforcement learning ..... 3
2.2 Actor Critic ..... 3
2.3 Continuous State and Action Q-Learning ..... 4
2.3.1 Adaptive Critic Methods ..... 4
2.3.2 CMAC-Based Q-learning ..... 5
2.3.3 Q-AHC ..... 5
2.3.4 Proposed method ..... 5
2.4 Stochastic reinforcement learning ..... 6
2.5 Deep learning ..... 7
2.6 Optical flow ..... 12
III. Proposed Method ..... 13
3.1 Facial expression recognition with a stacked sparse autoencoder ..... 13
3.2 Actor-Critic-Q ..... 16
3.3 Actor-Critic-Q with continuous actions ..... 20
3.4 Dual ACQ ..... 24
IV. Simulation Results ..... 28
4.1 Facial expression recognition results ..... 28
4.2 Comparison of discrete and continuous actions ..... 29
4.3 Comparison of the effects of different … ..... 31
4.4 Human intervention ..... 34
4.4.1 Maze 1 ..... 34
4.4.2 Maze 2 ..... 37
V. Conclusions and Future Work ..... 40
5.1 Conclusion ..... 40
5.2 Future work ..... 40
REFERENCES ..... 41
Fulltext
This electronic full text is licensed only for personal, non-profit searching, reading, and printing for academic research purposes. Please comply with the relevant provisions of the Copyright Act of the Republic of China (Taiwan); do not reproduce, distribute, adapt, repost, or broadcast it without authorization, so as to avoid infringement.
Thesis access permission: unrestricted (fully open on and off campus)
Available:
Campus: available
Off-campus: available


Printed copies
Information on the availability of printed copies is relatively complete from academic year 102 (2013-14) onward. To check the availability of printed copies from academic year 101 or earlier, please contact the printed thesis service counter of the Office of Library and Information Services. We apologize for any inconvenience.
Available: available
