Agents act independently, but can share the same Behavior (every instance with the same Behavior Name is driven by the same policy)
Agent Properties
Mean reward should increase during training
End training from the terminal running the Python trainer (Ctrl-C keyboard interrupt); this triggers saving the .nn model file
TODO Add support for --resume to load up checkpoints
To train behaviors we need to define 3 entities at every step of a given environment: observations, actions, and reward signals
Through training, an agent learns a policy, which is an optimal mapping from observations to actions
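A minimal sketch of those three entities wired into an Agent subclass (hypothetical ExampleAgent, target field, and reward values; the float[] action signature matches the release_1 C# package, later releases use ActionBuffers):

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;

// Hypothetical agent: pushes a rigidbody toward a target.
public class ExampleAgent : Agent
{
    public Transform target;
    Rigidbody m_Body;

    public override void Initialize()
    {
        m_Body = GetComponent<Rigidbody>();
    }

    // Observations: what the policy sees each step (8 floats here).
    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);   // 3 floats
        sensor.AddObservation(target.localPosition);      // 3 floats
        sensor.AddObservation(m_Body.velocity.x);
        sensor.AddObservation(m_Body.velocity.z);
    }

    // Actions: what the policy outputs each step (2 continuous values here).
    public override void OnActionReceived(float[] vectorAction)
    {
        var force = new Vector3(vectorAction[0], 0f, vectorAction[1]);
        m_Body.AddForce(force * 10f);

        // Reward signal: small per-step penalty plus a terminal bonus.
        float distance = Vector3.Distance(transform.localPosition, target.localPosition);
        if (distance < 1.5f)
        {
            AddReward(1.0f);
            EndEpisode();
        }
        else
        {
            AddReward(-0.001f);
        }
    }
}
```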
Heuristic behaviors are defined as a hard-coded set of rules; maybe a good place to put our navmesh-based control for "classical" AI (e.g. if the agent collides, turn 90° and continue forward)
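A sketch of how such a rule could live in the Agent.Heuristic override (hypothetical class, collision flag, and turn-rate values; release_1 float[] signature), so the hard-coded behavior uses the same action interface as a trained model:

```csharp
using UnityEngine;
using Unity.MLAgents;

// Hypothetical heuristic: drive forward, turn 90 degrees after a collision.
public class HeuristicNavAgent : Agent
{
    bool m_Collided;          // set by OnCollisionEnter below
    float m_TurnRemaining;    // degrees left to turn after a collision

    void OnCollisionEnter(Collision collision)
    {
        m_Collided = true;
    }

    // Heuristic fills the action buffer instead of a trained policy:
    // actionsOut[0] = forward throttle, actionsOut[1] = turn rate.
    public override void Heuristic(float[] actionsOut)
    {
        if (m_Collided)
        {
            m_Collided = false;
            m_TurnRemaining = 90f;
        }

        if (m_TurnRemaining > 0f)
        {
            actionsOut[0] = 0f;      // stop moving forward
            actionsOut[1] = 1f;      // turn at full rate
            m_TurnRemaining -= 3f;   // assumes ~3 degrees per decision step
        }
        else
        {
            actionsOut[0] = 1f;      // full speed ahead
            actionsOut[1] = 0f;
        }
    }

    // OnActionReceived applies the actions exactly as it would for a trained model.
    public override void OnActionReceived(float[] vectorAction)
    {
        transform.Rotate(0f, vectorAction[1] * 3f, 0f);
        transform.Translate(Vector3.forward * vectorAction[0] * 0.1f);
    }
}
```

This path is used when the Behavior Type on the Behavior Parameters component is set to Heuristic Only.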
Behaviors are like a function: f(observations) = actions
Goal of agent: discover a behavior (a Policy) that maximizes a reward
RL algorithms
Imitation Learning: we can combine it with RL to dramatically reduce the time the agent takes to solve the environment.
To summarize, ML-Agents provides 3 training methods: BC, GAIL, and RL (PPO or SAC), which can be used independently or together.
Using either BC or GAIL requires recorded demonstrations as input to the training algorithm.
Training with curriculum
gradually increase the task difficulty for agents as training progresses
Environment Parameter Randomization
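Both curriculum lessons and parameter randomization feed values into the scene as environment parameters, which the agent can read on reset. A minimal sketch (release_1 API; the "obstacle_count" key, default value, and class name are made-up for illustration):

```csharp
using UnityEngine;
using Unity.MLAgents;

// Hypothetical agent that reads a difficulty parameter supplied by the trainer
// (curriculum lesson or randomization sampler) at the start of each episode.
public class ParameterizedAgent : Agent
{
    int m_ObstacleCount;

    public override void OnEpisodeBegin()
    {
        // Falls back to the default (2) when no trainer is connected.
        m_ObstacleCount = (int)Academy.Instance.EnvironmentParameters
            .GetWithDefault("obstacle_count", 2f);

        // ...rebuild the scene with m_ObstacleCount obstacles...
    }
}
```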
Model types
Parameters: https://github.com/Unity-Technologies/ml-agents/blob/release_1/docs/Training-Configuration-File.md
Proximal Policy Optimization