
AI/RL

(1)

What is PPO? Proximal Policy Optimization Algorithms, https://arxiv.org/abs/1707.06347. From the abstract: We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. PPO is used to train an agent to learn an optimal policy in sequential decision-making tasks..

