Video: https://youtube.com/live/wgW5p9Hfm8A?feature=share
Jamboard: https://jamboard.google.com/d/1fD-P_7yR8R1cKwzUHaA1II2VkhOVxMTCa2XCrHnuUCg/edit?usp=sharing
Materials: Policy Gradient: https://www.freecodecamp.org/news/an-introduction-to-policy-gradients-with-cartpole-and-doom-495b5ef2207f/ PPO: https://arxiv.org/abs/1707.06347
Jamboard shared: vecins.valters@gmail.com
Youtube RTMP key: ea11-mrgb-4jg2-4ajc-d4hr
Video: https://youtu.be/t93-leAFnSY
Jamboard: https://jamboard.google.com/d/1KolS31GjEtTkd9rvZZFFu6Kzgj0jG0EOCMALzLnGFO8/edit?usp=sharing
Template: http://share.yellowrobot.xyz/quick/2023-5-7-A9B14AF3-F09D-4169-992D-F282A07A366B.zip
Submit the source code and screenshots of the best results.
Template: http://share.yellowrobot.xyz/quick/2023-5-7-5CCE3179-3499-4E0D-9894-4FF10A5538AF.zip
Submit the source code and screenshots of the best results.
Template: http://share.yellowrobot.xyz/quick/2023-5-7-F6A7C309-67A8-453D-BD97-592083ACD5D8.zip
Equations: http://share.yellowrobot.xyz/upic/4e251be9772c476a3d7e15156369e99e_1683463127.jpg
Submit the source code and screenshots of the best results.
parser.add_argument('-gamma', default=0.8, type=float)
This gamma value makes the agent live longer.
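For reference, gamma is the discount factor applied to future rewards. A minimal sketch of how it might enter the return computation, assuming one episode's rewards are stored in a plain list; `discounted_returns` is a hypothetical helper for illustration and is not taken from the template:

```python
def discounted_returns(rewards, gamma=0.8):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


if __name__ == "__main__":
    # Three steps with reward 1.0 each (made-up numbers for illustration).
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.8))
```

With a smaller gamma, distant rewards contribute less to each step's return; with gamma close to 1, they count almost fully.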
TRPO
PPO ⚠️ The PPO objective does NOT contain log p; it uses only a plain ratio of probabilities (my original video had log p included).
https://github.com/nikhilbarhate99/PPO-PyTorch/blob/bd8b8bf6832dfcfb9125374fd61c0d359e621607/PPO.py
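A minimal sketch of the clipped PPO objective built on a plain probability ratio, written in pure Python for readability. The function name `ppo_clip_objective` and the default `epsilon=0.2` are assumptions for illustration (0.2 is the value used in the PPO paper's experiments), not the template's code:

```python
def ppo_clip_objective(new_probs, old_probs, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized)."""
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        # Plain ratio of action probabilities: new policy over old policy.
        ratio = p_new / p_old
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Made-up probabilities and advantages for illustration.
    print(ppo_clip_objective([0.6, 0.2], [0.3, 0.4], [1.0, -1.0]))
```

Many implementations compute the same ratio as exp(log p_new - log p_old) for numerical stability; the result is still a ratio of probabilities, not a log p term in the loss.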
A2C
https://towardsdatascience.com/understanding-actor-critic-methods-931b97b6df3f
https://medium.com/deeplearningmadeeasy/advantage-actor-critic-a2c-implementation-944e98616b
https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html