Video: https://youtube.com/live/Q4Pq9sPAdt0?feature=share
Jamboard: https://jamboard.google.com/d/1m30FA0PM0uk6foKXSdJyDFwIEMBeVPNF9q-3t9Dj8nQ/edit?usp=sharing
Materials:
Rainbow DQN: https://arxiv.org/abs/1710.02298
DQN (Nature paper): https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf
PyTorch DQN tutorial: https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
An introduction to reinforcement learning (freeCodeCamp): https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419
Previous year's materials:
Video: https://youtu.be/tiaoLNMWZUA
Jamboard: https://jamboard.google.com/d/18gFXn4E36cP9P25wSKpvGgEv1mlAnKfDQbB7fGTDPi0/viewer
Based on the materials and video from task 11.1, implement DQN using the provided template.
Submit the code and screenshots of the results.
Template: http://share.yellowrobot.xyz/quick/2023-7-24-0418A562-1288-44B9-A6E2-68B5FC4F85E4.zip
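A minimal NumPy sketch of two pieces a DQN implementation needs: epsilon-greedy action selection and the TD target computed from a separate target network. The function names and array shapes are assumptions, not the template's API; in a PyTorch version the online network would be trained to minimize a Huber or MSE loss between Q(s, a) and these targets.

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, otherwise the argmax of Q."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """TD targets y = r + gamma * max_a' Q_target(s', a').

    rewards:       (batch,)
    next_q_target: (batch, n_actions), target-network Q-values for the next states
    dones:         (batch,), 1.0 where s' is terminal, so no bootstrapping there
    """
    return rewards + gamma * (1.0 - dones) * next_q_target.max(axis=1)


rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.7, 0.2]), epsilon=0.1, rng=rng)
targets = dqn_targets(np.array([1.0, 0.0]),
                      np.array([[0.5, 2.0], [1.0, 3.0]]),
                      np.array([0.0, 1.0]), gamma=0.9)
```

The second transition is terminal, so its target is just its reward (0.0); the first bootstraps from the largest target-network Q-value (1 + 0.9 * 2.0).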
Based on the materials and video from task 11.1, implement prioritized replay memory using the provided template.
Submit the code and screenshots of the results.
Template: http://share.yellowrobot.xyz/quick/2023-7-24-3E7F61BC-A5D5-4E3F-ACDB-EB007CFB6350.zip
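One common approach is the proportional variant of prioritized experience replay (Schaul et al.), which Rainbow also uses. Transition i is sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha, and the sampling bias is corrected with importance-sampling weights w_i = (N * P(i))^(-beta), normalized by their maximum. The sketch below stores priorities in a plain array with O(N) sampling instead of a sum tree. The class name, method names and the default alpha/beta values are assumptions and may not match the template.

```python
import numpy as np


class PrioritizedReplay:
    """Minimal proportional prioritized replay (hypothetical names, array-based, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5, seed=0):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0            # next slot to write (circular buffer)
        self.rng = np.random.default_rng(seed)

    def push(self, transition):
        # New transitions get the current max priority so they are likely to be replayed soon.
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        n = len(self.data)
        scaled = self.priorities[:n] ** self.alpha
        probs = scaled / scaled.sum()                   # P(i) = p_i^alpha / sum_k p_k^alpha
        idx = self.rng.choice(n, size=batch_size, p=probs)
        weights = (n * probs[idx]) ** (-beta)           # importance-sampling correction
        weights /= weights.max()                        # largest weight becomes 1
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Priority = |TD error| + eps, computed after the learning step on this batch.
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

In a training loop, the returned weights multiply the per-sample loss, and the batch's new TD errors are passed back to `update_priorities`.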
Implement DDQN (Double DQN) based on the template from task 11.3.
Submit the code and screenshots of the results.
Equation: http://share.yellowrobot.xyz/upic/4ba1e3183aab2da81825b9e20c1ec7cf_1690201798.jpg
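The linked equation image is not reproduced here. The sketch below assumes it is the standard Double DQN target (van Hasselt et al.), where the online network selects the next action and the target network evaluates it: y = r + gamma * Q_target(s', argmax_a' Q_online(s', a')). Function and argument names are illustrative.

```python
import numpy as np


def ddqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets: y = r + gamma * Q_target(s', argmax_a' Q_online(s', a')).

    next_q_online / next_q_target: (batch, n_actions) Q-values for s' from each network
    dones: (batch,), 1.0 where s' is terminal (no bootstrapping)
    """
    best_actions = next_q_online.argmax(axis=1)         # online network selects the action
    rows = np.arange(len(best_actions))
    evaluated = next_q_target[rows, best_actions]       # target network evaluates it
    return rewards + gamma * (1.0 - dones) * evaluated


y = ddqn_targets(rewards=np.array([1.0, 1.0]),
                 next_q_online=np.array([[1.0, 5.0], [3.0, 0.0]]),
                 next_q_target=np.array([[2.0, 0.5], [4.0, 9.0]]),
                 dones=np.array([0.0, 0.0]), gamma=0.5)
```

For the same inputs, the plain DQN target r + gamma * max_a' Q_target(s', a') would give [2.0, 5.5] instead of [1.25, 3.0]. Decoupling selection from evaluation is what reduces the overestimation caused by taking a max over noisy estimates.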
Based on the code from task 11.4, add support for a new environment, MountainCar (MountainCar-v0): https://gym.openai.com/envs/MountainCar-v0
Implement the Dueling DDQN model architecture.
Submit the code and screenshots of the results.
Model scheme: http://share.yellowrobot.xyz/upic/4a9027de9c537cfd84832e5597d4e617_1690201812.jpg
Model description: https://arxiv.org/abs/1511.06581
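Following the dueling architecture in the linked paper, the network splits into a state-value stream V(s) and an advantage stream A(s, a). These are recombined as Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The forward-pass sketch below uses NumPy with random weights and MountainCar-v0 sizes: 2 observation values (position and velocity) and 3 actions. The hidden size, layer layout and names are assumptions. In the template these would be PyTorch layers trained with the DDQN target.

```python
import numpy as np


def dueling_aggregate(value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    value:      (batch, 1)
    advantages: (batch, n_actions)
    Subtracting the mean advantage makes the V/A split identifiable.
    """
    return value + advantages - advantages.mean(axis=1, keepdims=True)


class DuelingQNetwork:
    """Forward pass only, random weights; a hypothetical stand-in for the template's model."""

    def __init__(self, state_dim=2, n_actions=3, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (state_dim, hidden))   # shared trunk
        self.b1 = np.zeros(hidden)
        self.wv = rng.normal(0.0, 0.1, (hidden, 1))           # value stream
        self.bv = np.zeros(1)
        self.wa = rng.normal(0.0, 0.1, (hidden, n_actions))   # advantage stream
        self.ba = np.zeros(n_actions)

    def forward(self, states):
        h = np.maximum(states @ self.w1 + self.b1, 0.0)       # ReLU features, (batch, hidden)
        value = h @ self.wv + self.bv                         # V(s), (batch, 1)
        advantages = h @ self.wa + self.ba                    # A(s, a), (batch, n_actions)
        return dueling_aggregate(value, advantages)


net = DuelingQNetwork()
states = np.array([[-0.5, 0.0], [0.3, 0.02]])   # (position, velocity) pairs
q_values = net.forward(states)                   # shape (2, 3)
greedy_actions = q_values.argmax(axis=1)
```

Because the mean advantage is subtracted, averaging Q over actions recovers V(s) exactly.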
Agent — the trainee and decision-maker.
Environment — the world the agent interacts with; it responds to each action with a new state and a reward.
Action — a single move the agent can make; the set of all possible actions is the action space.
State — the agent's current situation in the environment (e.g., position and velocity in MountainCar).
Reward — the feedback signal, usually a scalar, that the environment returns after each action the agent takes.
Policy — the agent's decision-making function (control strategy), a mapping from states to actions.
Value function — a mapping from states to real numbers, where the value of a state is the expected long-term (discounted) reward obtained by starting in that state and following a given policy.
Model-free — learns the optimal policy (or value function) directly from experience, without learning or using the dynamics of the environment (transition and reward functions). DQN is model-free.
Model-based — uses the transition function (and reward function), given or learned, to plan or evaluate the optimal policy.
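To make the value-function and model-free definitions concrete, here is a small sketch. It computes the discounted return that a value function estimates, and it performs one tabular Q-learning update. That update is model-free because it uses only a sampled transition (s, a, r, s') and never the transition function. The discount factor and learning rate values are arbitrary illustrative choices.

```python
import numpy as np


def discounted_return(rewards, gamma=0.99):
    """G_t = r_{t+1} + gamma * r_{t+2} + gamma^2 * r_{t+3} + ...  (computed backwards)."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def q_learning_update(q_table, s, a, r, s_next, done, lr=0.1, gamma=0.99):
    """One tabular Q-learning step from a single sampled transition (s, a, r, s').

    Model-free: it never queries the environment's transition or reward function.
    """
    target = r if done else r + gamma * np.max(q_table[s_next])
    q_table[s, a] += lr * (target - q_table[s, a])
    return q_table


g = discounted_return([-1.0, -1.0, -1.0], gamma=0.5)   # -1 - 0.5 - 0.25
q = np.zeros((2, 2))
q[1] = [0.0, 2.0]
q = q_learning_update(q, s=0, a=1, r=1.0, s_next=1, done=False, lr=0.5, gamma=0.5)
```

DQN follows the same update idea but replaces the table with a neural network that approximates Q(s, a).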