[Reinforcement Learning-12] Sutton Textbook Chapter 10: On-policy Control with Approximation

Author

Irealist

Date

2020-06-17 21:35


In this chapter we return to the control problem and look at parametric approximation of the action-value function \(\hat{q}(s, a, w)\). We continue to consider only on-policy methods; the off-policy case is covered in Chapter 11.

10.1 Episodic Semi-gradient Control

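Extending the semi-gradient prediction methods of Chapter 9 to action values works the same way as in earlier chapters: put q where v used to be. The update target \(U_t\) can be any approximation of \(q_\pi(S_t, A_t)\), and the general gradient-descent update for action-value prediction is

\[w_{t+1} = w_t + \alpha \left[ U_t - \hat{q}(S_t, A_t, w_t) \right] \nabla \hat{q}(S_t, A_t, w_t)\]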

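Taking one-step Sarsa as an example, the target is \(U_t = R_{t+1} + \gamma \hat{q}(S_{t+1}, A_{t+1}, w_t)\), which gives the update below. This method is called episodic semi-gradient one-step Sarsa.

\[w_{t+1} = w_t + \alpha \left[ R_{t+1} + \gamma \hat{q}(S_{t+1}, A_{t+1}, w_t) - \hat{q}(S_t, A_t, w_t) \right] \nabla \hat{q}(S_t, A_t, w_t)\]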
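To make the one-step Sarsa update concrete, here is a minimal sketch for the special case of a linear approximator, \(\hat{q}(s, a, w) = w^\top x(s, a)\), where the gradient with respect to \(w\) is simply the feature vector. The function names (`q_hat`, `sarsa_update`), the feature vectors, and the step sizes are assumptions made for illustration and do not come from the textbook.

```python
from typing import List, Sequence


def q_hat(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear action-value estimate: q_hat(s, a, w) = w . x(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sarsa_update(
    w: Sequence[float],
    x: Sequence[float],       # features of the current pair (S_t, A_t)
    reward: float,            # R_{t+1}
    x_next: Sequence[float],  # features of the next pair (S_{t+1}, A_{t+1})
    alpha: float,
    gamma: float,
    terminal: bool = False,
) -> List[float]:
    """One semi-gradient Sarsa step; returns the new weight vector."""
    # On a terminal transition there is nothing to bootstrap from.
    target = reward if terminal else reward + gamma * q_hat(w, x_next)
    td_error = target - q_hat(w, x)
    # "Semi-gradient": the target is treated as a constant, so only
    # grad q_hat(S_t, A_t, w) = x appears in the update.
    return [wi + alpha * td_error * xi for wi, xi in zip(w, x)]


w = [0.0, 0.0]
w = sarsa_update(w, x=[1.0, 0.0], reward=1.0, x_next=[0.0, 1.0],
                 alpha=0.5, gamma=0.9)
print(w)  # [0.5, 0.0]
```

With one-hot features this reduces to the tabular Sarsa update, which is a handy sanity check.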

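To solve the control problem, the action-value prediction method has to be combined with techniques for policy improvement and action selection. Techniques for continuous action spaces are still an open research topic with no clear answer yet. When the action set is discrete and not too large, the greedy approach covered earlier can be used: compute \(\hat{q}(S_{t+1}, a, w_t)\) for every action \(a\) available in the next state and pick the action with the largest value, softened to ε-greedy so that exploration continues.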
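For a small discrete action set, the action selection described above can be sketched as follows. This is only an illustration: the name `epsilon_greedy`, the random tie-breaking, and the optional `rng` argument are assumptions, and the caller is assumed to have already evaluated \(\hat{q}\) for every action.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(
    q_values: Sequence[float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Return an action index: random with probability epsilon, else greedy."""
    rng = rng or random  # fall back to the module-level generator
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties at random so that equal estimates do not bias exploration.
    ties = [a for a, q in enumerate(q_values) if q == best]
    return rng.choice(ties)


print(epsilon_greedy([0.1, 0.7, 0.3], epsilon=0.0))  # 1
```

Evaluating every action costs one forward pass of \(\hat{q}\) per action, which is why this approach is limited to action sets that are not too large.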

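The textbook gives pseudocode for the complete algorithm, episodic semi-gradient Sarsa for estimating \(\hat{q} \approx q_*\).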
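As a rough companion to the algorithm, the sketch below runs episodic semi-gradient Sarsa end to end on a toy problem. Everything specific here is an assumption made for illustration rather than something from the textbook: the five-cell corridor task, the reward of -1 per step, the one-hot features, and the hyperparameters.

```python
import random
from typing import List, Tuple

N_STATES = 5        # hypothetical corridor: cells 0..4, cell 4 is terminal
ACTIONS = (-1, +1)  # action 0 moves left, action 1 moves right


def features(s: int, a: int) -> List[float]:
    """One-hot encoding of the (state, action) pair."""
    x = [0.0] * (N_STATES * len(ACTIONS))
    x[s * len(ACTIONS) + a] = 1.0
    return x


def q_hat(w: List[float], s: int, a: int) -> float:
    return sum(wi * xi for wi, xi in zip(w, features(s, a)))


def choose(w: List[float], s: int, epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    qs = [q_hat(w, s, a) for a in range(len(ACTIONS))]
    best = max(qs)
    return rng.choice([a for a, q in enumerate(qs) if q == best])


def step(s: int, a: int) -> Tuple[float, int, bool]:
    """Move within the corridor; every step costs -1 until the goal."""
    s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
    return -1.0, s_next, s_next == N_STATES - 1


def semi_gradient_sarsa(episodes: int = 200, alpha: float = 0.1,
                        gamma: float = 1.0, epsilon: float = 0.1,
                        seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    w = [0.0] * (N_STATES * len(ACTIONS))
    for _ in range(episodes):
        s = 0
        a = choose(w, s, epsilon, rng)
        while True:
            r, s_next, done = step(s, a)
            x = features(s, a)  # gradient of the linear q_hat
            if done:
                # Terminal transition: the target is just the reward.
                delta = r - q_hat(w, s, a)
                w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
                break
            a_next = choose(w, s_next, epsilon, rng)
            delta = r + gamma * q_hat(w, s_next, a_next) - q_hat(w, s, a)
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
            s, a = s_next, a_next
    return w


w = semi_gradient_sarsa()
for s in range(N_STATES - 1):
    print(s, [round(q_hat(w, s, a), 2) for a in range(len(ACTIONS))])
```

The printed rows are the estimated values of (left, right) in each non-terminal cell. The next action is chosen before the weights are updated and is then actually executed, which is what makes this an on-policy method.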
