2023-Q3-AI 11. Reinforcement learning (DDQN, A2C)

 

11.1. Video / Materials

Video: https://youtube.com/live/Q4Pq9sPAdt0?feature=share

Jamboard: https://jamboard.google.com/d/1m30FA0PM0uk6foKXSdJyDFwIEMBeVPNF9q-3t9Dj8nQ/edit?usp=sharing

Materials:

Rainbow DQN: https://arxiv.org/abs/1710.02298

DQN Nature paper: https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf

PyTorch DQN tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

Introduction to reinforcement learning (freeCodeCamp): https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419



Previous year's video

Video: https://youtu.be/tiaoLNMWZUA

Jamboard: https://jamboard.google.com/d/18gFXn4E36cP9P25wSKpvGgEv1mlAnKfDQbB7fGTDPi0/viewer


11.2. Implement DQN

Based on the materials and video from 11.1, implement DQN using the template provided.

Submit the code and screenshots with the results.

Template: http://share.yellowrobot.xyz/quick/2023-7-24-0418A562-1288-44B9-A6E2-68B5FC4F85E4.zip
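The template itself is not reproduced here, so the following is only a minimal sketch of the two pieces a DQN implementation usually needs: epsilon-greedy action selection and the one-step TD target. The function names `epsilon_greedy` and `dqn_target` are hypothetical and the Q-values are plain Python lists; in the template they would most likely be network outputs (for example PyTorch tensors).

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action), otherwise exploit (argmax)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def dqn_target(reward, next_q_values, done, gamma=0.99):
    """One-step TD target: r + gamma * max_a' Q(s', a'); just r at the end of an episode.

    next_q_values is assumed to come from the target network evaluated at s'.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# The loss is then computed between Q(s, a) and this target,
# for example as a squared or Huber error over a sampled batch.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
target = dqn_target(1.0, [0.5, 2.0], done=False, gamma=0.9)
```

With `epsilon=0.0` the selection is purely greedy, which is convenient when evaluating a trained agent.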


11.3. Implement priority replay memory

Based on the materials and video from 11.1, implement "Priority replay memory" using the template provided.

Submit the code and screenshots with the results.

Template: http://share.yellowrobot.xyz/quick/2023-7-24-3E7F61BC-A5D5-4E3F-ACDB-EB007CFB6350.zip
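As a rough illustration of the idea, here is a sketch of the proportional variant of prioritized replay: a transition is sampled with probability proportional to its priority raised to `alpha`, and importance-sampling weights (controlled by `beta`) correct for the resulting bias. The class name, method names and default values are assumptions, not taken from the template, and sampling is done in linear time over a list rather than with a sum-tree.

```python
import random


class PrioritizedReplayMemory:
    """Hypothetical list-based proportional prioritized replay buffer."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha      # 0 gives uniform sampling, 1 gives fully prioritized
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def push(self, transition):
        # New transitions get the current maximum priority so they are seen at least once.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights, normalised by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return [self.data[i] for i in idxs], idxs, weights

    def update_priorities(self, idxs, td_errors):
        # Priority is the absolute TD error of the transition plus a small constant.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training step one would multiply each sample's loss by its weight and, after the update, call `update_priorities` with the new TD errors of the sampled indices.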


11.4. Implement DDQN

Implement DDQN, based on the template from task 11.3.

Submit the code and screenshots with the results.

Equation: http://share.yellowrobot.xyz/upic/4ba1e3183aab2da81825b9e20c1ec7cf_1690201798.jpg
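The linked image is not reproduced here and may use different notation. The standard Double DQN target selects the next action with the online network and evaluates it with the target network: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)). A minimal sketch with plain lists standing in for the two networks' outputs (the function name `ddqn_target` is hypothetical):

```python
def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN target for a single transition.

    next_q_online: Q-values of s' from the online network (used to choose the action)
    next_q_target: Q-values of s' from the target network (used to evaluate it)
    """
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# The online network prefers action 1, so the target network's value for action 1 is used,
# even though the target network itself rates action 0 higher.
y = ddqn_target(1.0, [0.2, 0.9], [5.0, 3.0], done=False, gamma=0.5)
```

Plain DQN would take the maximum of the target network's values instead; decoupling selection from evaluation is what reduces the overestimation of Q-values.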


11.5. Homework - Dueling DDQN + MountainCar

  1. Based on the code from 11.4, adapt the agent to a new environment, MountainCar: https://gym.openai.com/envs/MountainCar-v0

  2. Implement the Dueling DDQN model architecture.

  3. Submit the code and screenshots with the results.

Model scheme: http://share.yellowrobot.xyz/upic/4a9027de9c537cfd84832e5597d4e617_1690201812.jpg

Model description: https://arxiv.org/abs/1511.06581
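In the dueling architecture from the paper above, the network splits into a state-value stream V(s) and an advantage stream A(s, a), which are combined as Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The sketch below shows only this aggregation step on plain numbers; the function name is hypothetical, and in a real model both inputs would be the outputs of two heads on top of shared layers.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with the mean-advantage baseline."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]


# Three actions, as in MountainCar (push left, no push, push right).
q = dueling_q_values(2.0, [1.0, 0.0, -1.0])
```

Subtracting the mean advantage makes the decomposition identifiable: the average of the resulting Q-values equals V(s).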


Terminology

Agent — the trainee and decision-maker.

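Environment — the world the agent interacts with; it responds to each action with a new state and a reward.

Action — one of the moves available to the agent; the set of all possible actions is called the action space.

State — the current situation of the agent in the environment, as described by the observation the agent receives.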

Reward — for each action chosen by the agent, the environment provides a reward; usually a scalar value.

Policy — the agent's decision-making function (control strategy) that reflects the mapping from situation to actions.

Value function — a mapping from states to real numbers, where the value of a state is the expected long-term (cumulative, usually discounted) reward obtained by starting from this state and following a given policy.

Model-free — learns the policy (or value function) directly from experience, without using or estimating the dynamics of the environment (transition and reward functions).

Model-based — uses a known or learned transition function (and reward function) to plan and derive the policy.
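To tie the terms together, here is a small model-free sketch: tabular Q-learning on a made-up corridor environment. Everything in it (the `Corridor` class, the reward of 1.0 at the goal, the hyperparameters) is an assumption chosen for illustration. The environment returns a state and a reward for each action, the policy is epsilon-greedy over the table `q`, and `q` plays the role of the (action-)value function.

```python
import random


class Corridor:
    """Hypothetical toy environment: start in cell 0, reach the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 0 moves left, action 1 moves right; walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def train(episodes=200, gamma=0.9, lr=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    q = [[0.0, 0.0] for _ in range(env.length)]  # Q[state][action]
    for _ in range(episodes):
        state = env.reset()
        for _ in range(100):  # step limit per episode
            # Policy: epsilon-greedy with respect to the current Q-table.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = env.step(action)
            # Model-free update: only the sampled transition is used,
            # never the environment's transition function itself.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += lr * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
```

After training, moving right should have the higher value in every non-terminal cell, which is the optimal policy for this corridor.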

 
