Agents are independent, but multiple agents can share the same Behavior
Agent Properties
Mean reward should increase during training
Training should be ended from the Python side (Ctrl-C keyboard interrupt) to trigger saving the .nn model file
TODO: add support for --resume to load checkpoints
To train behaviors we need to define 3 entities for a given environment: Observations, Actions, and Reward signals
Through training, an agent learns a Policy, which is an optimal mapping from observations to actions
Heuristic behaviors: behaviors defined as a hard-coded set of rules. Maybe a good place to put our navmesh-based control for "classical" AI (e.g. if the agent collides, turn 90 deg and continue forward)
Behaviors are like a function: f(observations) = actions
Goal of agent: discover a behavior (a Policy) that maximizes a reward
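A minimal Python sketch of this view, a behavior as a function from observations to actions, using the hard-coded collision rule above. The observation layout and action encoding are made up for illustration and are not an ML-Agents API:

```python
import numpy as np

# Hypothetical hard-coded ("heuristic") behavior: f(observations) = actions.
# Assumption: obs[0] is a collision flag; the action is (forward speed, turn).
def heuristic_behavior(observations: np.ndarray) -> np.ndarray:
    collided = observations[0] > 0.5
    turn = np.pi / 2 if collided else 0.0   # turn 90 degrees on collision
    forward_speed = 1.0                      # otherwise keep moving forward
    return np.array([forward_speed, turn], dtype=np.float32)
```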
RL algorithms
Imitation Learning: can be combined with RL to dramatically reduce the time the agent takes to solve the environment.
To summarize, there are 3 training methods: BC, GAIL and RL (PPO or SAC). These can be used independently or together.
Leveraging either BC or GAIL requires recorded demonstrations, which are provided as input to the training algorithms.
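Rough sketch of how BC and GAIL typically sit alongside the extrinsic RL reward in a behavior's trainer config; written as a Python dict rather than the actual YAML file, with placeholder values and demo path (see the parameters link below for the real field reference):

```python
# Sketch only: the real trainer config is YAML; values and the demo path here
# are placeholders, field names follow the ML-Agents configuration docs.
behavior_config = {
    "trainer": "ppo",
    "reward_signals": {
        "extrinsic": {"strength": 1.0, "gamma": 0.99},    # plain RL reward
        "gail": {"strength": 0.01, "gamma": 0.99,
                 "demo_path": "Demos/Expert.demo"},        # reward learned from demos
    },
    "behavioral_cloning": {
        "demo_path": "Demos/Expert.demo",                  # recorded demonstrations
        "strength": 0.5,
    },
}
```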
Training with curriculum
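A minimal sketch of the curriculum idea (not the ML-Agents curriculum file format): as a progress measure such as mean reward crosses thresholds, advance a lesson and make the task harder. The wall-height parameter and values are hypothetical:

```python
# Hypothetical curriculum: each time progress crosses a threshold, advance one
# lesson and raise the difficulty parameter handed to the environment.
thresholds = [0.1, 0.3, 0.5]           # progress values that advance the lesson
wall_heights = [1.5, 2.0, 2.5, 4.0]    # one parameter value per lesson

def wall_height_for(progress: float) -> float:
    lesson = sum(progress >= t for t in thresholds)
    return wall_heights[lesson]
```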
Environment Parameter Randomization
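A minimal sketch of the idea, with hypothetical parameter names: resample selected environment parameters from given ranges (e.g. at each reset) so the policy does not overfit to a single fixed setting:

```python
import random

# Hypothetical parameters and ranges; resample before each episode / reset.
param_ranges = {"gravity": (7.0, 12.0), "obstacle_scale": (0.5, 1.5)}

def sample_environment_parameters() -> dict:
    return {name: random.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
```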
Model types
Parameters: https://github.com/Unity-Technologies/ml-agents/blob/release_1/docs/Training-Configuration-File.md
Proximal Policy Optimization
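Rough sketch of the kind of PPO hyperparameters documented in the configuration file linked above, written as a Python dict with illustrative values only:

```python
# Field names follow the linked Training-Configuration-File.md; values are
# only illustrative, not tuned recommendations.
ppo_hyperparameters = {
    "batch_size": 1024,      # experiences per gradient update
    "buffer_size": 10240,    # experiences collected before each model update
    "learning_rate": 3.0e-4,
    "beta": 5.0e-3,          # entropy regularization strength
    "epsilon": 0.2,          # PPO policy-update clipping range
    "lambd": 0.95,           # GAE lambda
    "num_epoch": 3,          # passes over the buffer per update
    "time_horizon": 64,      # steps collected per agent before adding to the buffer
    "max_steps": 5.0e5,      # total training steps
}
```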