[Reinforcement Learning-12] Sutton Textbook Chapter 10: On-policy Control with Approximation

Author: Irealist
Date: 2020-06-17 21:35


In this chapter we return to the control problem and look at parametric approximation of the action-value function, \(\hat{q}(s, a, w)\). We continue to consider only on-policy methods; the off-policy case is covered in Chapter 11.

10.1 Episodic Semi-gradient Control

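Extending the semi-gradient prediction methods of Chapter 9 to action values works the same way as in earlier chapters: put q wherever v appeared. With an update target \(U_t\) that approximates \(q_\pi(S_t, A_t)\), the general gradient-descent update is

\[ w_{t+1} = w_t + \alpha \left[ U_t - \hat{q}(S_t, A_t, w_t) \right] \nabla \hat{q}(S_t, A_t, w_t) \]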

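For example, the one-step Sarsa version uses \(R_{t+1} + \gamma \hat{q}(S_{t+1}, A_{t+1}, w_t)\) as the update target, which gives the following update. This method is called episodic semi-gradient one-step Sarsa.

\[ w_{t+1} = w_t + \alpha \left[ R_{t+1} + \gamma \hat{q}(S_{t+1}, A_{t+1}, w_t) - \hat{q}(S_t, A_t, w_t) \right] \nabla \hat{q}(S_t, A_t, w_t) \]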
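To make the one-step Sarsa update concrete, here is a minimal Python sketch of a single weight update. It assumes a linear approximator, \(\hat{q}(s, a, w) = w^\top x(s, a)\), so the gradient with respect to w is just the feature vector. The names `q_hat`, `sarsa_update`, and the toy feature vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def q_hat(w, x):
    """Linear action-value estimate: q_hat(s, a, w) = w . x(s, a)."""
    return float(np.dot(w, x))

def sarsa_update(w, x, reward, x_next, alpha, gamma, terminal):
    """Return the weights after one semi-gradient one-step Sarsa update.

    x and x_next are (assumed) feature vectors for (S_t, A_t) and
    (S_{t+1}, A_{t+1}). For a linear q_hat the gradient w.r.t. w is x.
    """
    # At a terminal transition there is no next action value to bootstrap from.
    target = reward if terminal else reward + gamma * q_hat(w, x_next)
    td_error = target - q_hat(w, x)
    return w + alpha * td_error * x

# Toy usage with made-up one-hot features.
w = np.zeros(3)
x = np.array([1.0, 0.0, 0.0])
x_next = np.array([0.0, 1.0, 0.0])
w = sarsa_update(w, x, reward=-1.0, x_next=x_next, alpha=0.5, gamma=1.0, terminal=False)
print(w)
```

The update is "semi-gradient" because the target is treated as a constant: only \(\hat{q}(S_t, A_t, w_t)\) is differentiated, even though the target also depends on w.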

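To solve the control problem, the action-value prediction method has to be combined with techniques for policy improvement and action selection. Techniques for continuous action spaces are still an active research topic with no clear answer yet. For a discrete action space that is not too large, the greedy selection covered earlier is enough: evaluate \(\hat{q}\) for every available action, take the greedy action, and improve the policy by following a soft version of the greedy policy such as ε-greedy.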
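For a small discrete action set, the selection step can be sketched as follows. This is only an illustration under assumptions: `epsilon_greedy` is a hypothetical helper, and `q_values` stands for the values \(\hat{q}(s, a, w)\) already computed for each action in the current state.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index from a 1-D array of q_hat(s, a, w) values."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return int(rng.integers(len(q_values)))
    # Exploit: choose a greedy action, breaking ties at random.
    best = np.flatnonzero(q_values == np.max(q_values))
    return int(rng.choice(best))

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.7, 0.3])  # made-up estimates for three actions
action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)
```

With `epsilon=0.0` the choice is purely greedy. A small positive `epsilon` keeps every action reachable, which on-policy control needs in order to keep improving its estimates.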

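The pseudocode follows the same loop as tabular Sarsa. At each step, take action A and observe R and S'. If S' is terminal, update w toward the target R and start the next episode. Otherwise choose A' from \(\hat{q}(S', \cdot, w)\) (for example ε-greedily), update w with the one-step Sarsa target, and set S ← S', A ← A'.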
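As a runnable counterpart to the pseudocode, here is a minimal Python sketch of episodic semi-gradient Sarsa. Everything environment-specific is an assumption made for illustration: a hypothetical five-state corridor with a reward of -1 per step, one-hot features over (state, action) pairs, and the step size and ε values. It is not the textbook's example.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # hypothetical corridor: states 0..4, actions 0=left, 1=right
GOAL = N_STATES - 1

def step(state, action):
    """Toy environment: reward -1 per step, episode ends at the right end."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    return -1.0, next_state, next_state == GOAL

def features(state, action):
    """One-hot feature vector x(s, a), so q_hat(s, a, w) = w . x(s, a)."""
    x = np.zeros(N_STATES * N_ACTIONS)
    x[state * N_ACTIONS + action] = 1.0
    return x

def choose_action(w, state, epsilon, rng):
    """epsilon-greedy over q_hat(state, ., w), ties broken at random."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    q = np.array([w @ features(state, a) for a in range(N_ACTIONS)])
    return int(rng.choice(np.flatnonzero(q == q.max())))

def semi_gradient_sarsa(episodes=300, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(N_STATES * N_ACTIONS)
    for _ in range(episodes):
        state = 0
        action = choose_action(w, state, epsilon, rng)
        while True:
            reward, next_state, done = step(state, action)
            x = features(state, action)   # gradient of the linear q_hat w.r.t. w
            if done:
                # Terminal transition: the target is just the reward.
                w += alpha * (reward - w @ x) * x
                break
            next_action = choose_action(w, next_state, epsilon, rng)
            target = reward + gamma * (w @ features(next_state, next_action))
            w += alpha * (target - w @ x) * x
            state, action = next_state, next_action
    return w

w = semi_gradient_sarsa()
greedy = [int(np.argmax([w @ features(s, a) for a in range(N_ACTIONS)]))
          for s in range(GOAL)]
print(greedy)
```

With one-hot features this reduces to tabular Sarsa. Swapping `features` for a coarser encoding such as tile coding is what makes the update generalize across states. In this corridor one would expect the learned greedy policy to move right in every state, though the exact weights depend on the seed.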
