Zoom / Video after the lesson: https://zoom.us/j/3167417956?pwd=Q2NoNWp2a3M2Y2hRSHBKZE1Wcml4Zz09
Materials:
Policy Gradient: https://www.freecodecamp.org/news/an-introduction-to-policy-gradients-with-cartpole-and-doom-495b5ef2207f/
PPO: https://arxiv.org/abs/1707.06347
To run on Google Colab or on Windows, install:
!pip install swig
!pip install box2d-py
!pip install gym
Windows: Microsoft C++ Build Tools (needed to build box2d-py): https://visualstudio.microsoft.com/visual-cpp-build-tools/
Video: https://www.youtube.com/live/mWcrkTFUG8o
Video: https://youtu.be/t93-leAFnSY
Ready-made examples:
https://share.yellowrobot.xyz/quick/2025-5-24-482A3AAA-086B-4BD0-A1E3-01146C564C34.zip
Template: http://share.yellowrobot.xyz/quick/2024-1-17-A61A64D6-6498-4DA2-86A1-A1989C2C35C5.zip
Equations:
Submit the source code and screenshots of your best results.
Updated Template: https://share.yellowrobot.xyz/quick/2024-5-15-B454CD9C-7CFF-4228-82FB-BFCBC6D7C613.zip
Equations:
Submit the source code and screenshots of your best results.
Template:
http://share.yellowrobot.xyz/quick/2023-5-7-F6A7C309-67A8-453D-BD97-592083ACD5D8.zip
Equations:
Submit the source code and screenshots of your best results.
Implement the PPO model using a Q(s, a) model instead of A (the advantage); use the DQN or DDQN version.
Submit the source code and screenshots of your best results.
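A minimal sketch of the two pieces this task combines, assuming discrete actions and Q-values passed in as plain Python lists (in the real task they come from the DQN/DDQN networks; all function and argument names here are hypothetical, not from the template): the Double DQN target used to train Q(s, a), and the PPO clipped objective with Q(s, a) plugged in where A(s, a) normally goes.

```python
# Sketch under assumptions: discrete actions, Q-values given as plain lists
# (one entry per action) instead of network outputs. Names are hypothetical.

def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online net picks a', the target net scores it."""
    if done:
        return reward  # no bootstrapping past a terminal state
    best_a = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_a]


def ppo_q_objective(p_new, p_old, q_sa, eps=0.2):
    """PPO clipped surrogate for one sample, with Q(s, a) in place of A(s, a)."""
    ratio = p_new / p_old  # plain probability ratio pi_new / pi_old
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * q_sa, clipped * q_sa)  # maximize this; the loss is its negative


# Usage: regression target for the Q-network, objective for the policy update
y = ddqn_target(reward=1.0, done=False, q_online_next=[0.1, 0.5],
                q_target_next=[2.0, 3.0], gamma=0.9)
obj = ppo_q_objective(p_new=0.9, p_old=0.5, q_sa=2.0)
print(y, obj)
```

Because Q(s, a) has no baseline subtracted (unlike A(s, a) = Q(s, a) - V(s)), the policy updates are likely to be noisier than in standard PPO.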
parser.add_argument('-gamma', default=0.8, type=float)
(a gamma of 0.8 makes the agent live longer)
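A minimal sketch of what `-gamma` controls, assuming it is the discount factor used to turn per-step rewards into returns for the policy-gradient loss (the function name `discounted_returns` is hypothetical, not from the template). With gamma = 0.8, rewards more than a few steps ahead contribute very little to each return.

```python
# Sketch: discounted returns G_t = r_t + gamma * G_{t+1}, computed backwards
# over one finished episode. `discounted_returns` is a hypothetical helper.
def discounted_returns(rewards, gamma=0.8):
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g  # fold the future return into the current step
        returns.append(g)
    returns.reverse()      # back to time order: returns[t] == G_t
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.8))
```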
https://cse.buffalo.edu/~avereshc/rl_fall19/lecture_19_Policy_Gradients_Baselines.pdf
https://medium.com/1mgofficial/reinforcement-learning-101-a-quick-start-guide-cdb827981e89
TRPO
PPO ⚠️ The PPO objective does NOT use log p; it uses a plain ratio of probabilities (my original video included log p).
https://github.com/nikhilbarhate99/PPO-PyTorch/blob/bd8b8bf6832dfcfb9125374fd61c0d359e621607/PPO.py
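A small sketch contrasting the two losses, assuming per-sample probabilities and advantages as plain floats (function names are hypothetical): the vanilla policy gradient weights log π(a|s), while PPO's clipped objective uses the plain ratio π_new(a|s) / π_old(a|s). Implementations that compute the ratio as exp(log p_new - log p_old) still use the plain ratio, just in a numerically convenient form.

```python
import math


def pg_loss(prob, advantage):
    """Vanilla policy gradient (REINFORCE-style) loss: uses log pi(a|s)."""
    return -math.log(prob) * advantage


def ppo_loss(p_new, p_old, advantage, eps=0.2):
    """PPO clipped loss: plain ratio pi_new / pi_old, no log in the objective."""
    ratio = p_new / p_old
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return -min(ratio * advantage, clipped * advantage)


def ratio_from_log_probs(logp_new, logp_old):
    """Same ratio from log-probabilities: exp(log a - log b) == a / b."""
    return math.exp(logp_new - logp_old)


print(pg_loss(0.5, 2.0), ppo_loss(0.3, 0.6, 1.0))
print(ratio_from_log_probs(math.log(0.3), math.log(0.6)))
```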
A2C
https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f
https://medium.com/deeplearningmadeeasy/advantage-actor-critic-a2c-implementation-944e98616b
https://www.52coding.com.cn/2018/01/06/RL%20-%20Policy%20Gradient/
https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
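A minimal sketch of the one-step A2C update for a single transition, assuming plain floats instead of network outputs (the function name and arguments are hypothetical): the critic's V(s) acts as the baseline, and the advantage r + γV(s') - V(s) weights the actor's log-probability.

```python
import math


def a2c_losses(log_prob, value, next_value, reward, done, gamma=0.99):
    """One-step A2C losses for one transition (plain floats, no autograd)."""
    # TD target; do not bootstrap from V(s') when the episode has ended
    target = reward + (0.0 if done else gamma * next_value)
    advantage = target - value          # A(s, a) ~ r + gamma * V(s') - V(s)
    actor_loss = -log_prob * advantage  # in PyTorch the advantage would be detached
    critic_loss = advantage ** 2        # squared TD error trains V(s)
    return advantage, actor_loss, critic_loss


print(a2c_losses(math.log(0.5), value=1.0, next_value=2.0,
                 reward=1.0, done=False, gamma=0.9))
```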