AI/RL (1)

What is PPO?
Proximal Policy Optimization Algorithms https://arxiv.org/abs/1707.06347
We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent. Whereas standar..
PPO: PPO is used to train an agent to learn an optimal policy in sequential decision-making tasks; it is a reinf..
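
The abstract quoted above mentions optimizing a "surrogate" objective. As a rough illustration, here is a minimal sketch of the clipped form of that objective described in the linked paper, written in plain Python. The function name `clipped_surrogate`, its argument names, and the default `epsilon=0.2` are assumptions chosen for this example and do not come from the post itself.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratios:     probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages: advantage estimates for the same samples
    epsilon:    clipping range (illustrative default, an assumption here)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("at least one sample is required")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Keep the ratio inside [1 - epsilon, 1 + epsilon].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # A ratio above 1 + epsilon with a positive advantage gets clipped.
    print(clipped_surrogate([1.5], [1.0]))
    # A ratio below 1 - epsilon with a negative advantage gets clipped too.
    print(clipped_surrogate([0.5], [-1.0]))
```

In practice this quantity is maximized with stochastic gradient ascent over minibatches of sampled data, which needs an automatic differentiation library; the sketch only evaluates the objective for fixed numbers.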