Fixes handling of time-out signal in RSL-RL and RL-Games wrapper (#375)
# Description
When an episode ends, one of three cases applies:
1. **bad** terminations (terminated dones): the agent receives a termination
penalty
2. **timeout** terminations (truncated dones), which split into two cases:
   * infinite-horizon: the agent bootstraps from the value of the terminal state
   * finite-horizon: no penalty and no bootstrapping
Currently, the last case (finite-horizon timeouts) is not handled, which
causes issues when training RL tasks with a finite horizon (for instance,
Nikita's agile locomotion work).
This MR adds an `is_finite_horizon` flag to `RLTaskEnvCfg` to cover this
case. The env wrappers read the flag and each decide how to handle
finite-horizon timeouts.
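
As a rough illustration of how a wrapper might consume the flag, the sketch below merges the two done signals and exposes the time-out mask only when the task is not finite-horizon. It is not the code from this MR: `EnvCfgSketch`, `build_wrapper_outputs`, `td_target`, and the `"time_outs"` key are hypothetical names chosen for the example, and only `is_finite_horizon` comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class EnvCfgSketch:
    """Hypothetical stand-in for the env config; only the flag is modeled."""

    is_finite_horizon: bool = False


def build_wrapper_outputs(terminated, truncated, cfg):
    """Merge done signals and decide whether to expose time-outs.

    Returns (dones, extras). The "time_outs" key is an assumed convention
    for telling the learner which dones should be bootstrapped.
    """
    dones = [t or u for t, u in zip(terminated, truncated)]
    extras = {}
    if not cfg.is_finite_horizon:
        # Infinite-horizon: let the learner bootstrap on truncated episodes.
        extras["time_outs"] = list(truncated)
    # Finite-horizon: omit the mask, so a time-out is a plain done
    # (no bootstrapping, and no extra penalty is added here).
    return dones, extras


def td_target(reward, next_value, done, time_out, gamma=0.99):
    """One-step target that bootstraps unless the episode truly ended."""
    if done and not time_out:
        return reward
    return reward + gamma * next_value


terminated = [True, False, False]
truncated = [False, True, False]

dones_inf, extras_inf = build_wrapper_outputs(
    terminated, truncated, EnvCfgSketch(is_finite_horizon=False)
)
dones_fin, extras_fin = build_wrapper_outputs(
    terminated, truncated, EnvCfgSketch(is_finite_horizon=True)
)
```

With the flag set, the learner sees no time-out mask, so a truncated episode produces the same target as any other done; with the flag unset, the truncated environment still bootstraps from the next-state value.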
## Type of change
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
## Checklist
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./orbit.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have run all the tests with `./orbit.sh --test` and they pass
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there