Commit f1ba9c3a authored by Bikram Pandit, committed by GitHub

Resets step reward buffer properly when weight is zero (#2392)

# Description

This pull request fixes a bug where `_step_reward` could retain stale
values when a reward term's weight was dynamically changed back to zero.
Previously, when a reward term had zero weight, the computation skipped
updating `_step_reward`, assuming that it would stay correct.
However, if the weight was first changed from zero to nonzero and then
back to zero during runtime (e.g., in curriculum settings), stale
nonzero values could persist, causing incorrect reward visualizations or
logging.

This change explicitly zeroes the term's column of `_step_reward`
whenever that reward term has zero weight, so the buffer stays correct
regardless of dynamic weight changes.
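
To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the Isaac Lab implementation: `Term`, `compute`, `run_curriculum`, and the `reset_on_zero` flag are hypothetical stand-ins, and a flat list takes the place of the `_step_reward` tensor. It only illustrates how a zero → nonzero → zero weight schedule leaves a stale entry unless the zero-weight branch resets it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Term:
    """Hypothetical reward term: a value function and a mutable weight."""

    func: Callable[[], float]  # returns the raw (unweighted) term value
    weight: float


def compute(terms: list[Term], step_reward: list[float], dt: float, reset_on_zero: bool) -> float:
    """Return the total reward and update `step_reward` in place, one slot per term."""
    total = 0.0
    for term_idx, term in enumerate(terms):
        if term.weight == 0.0:
            if reset_on_zero:
                step_reward[term_idx] = 0.0  # mirrors the behavior after this fix
            continue  # without the reset, the slot keeps its previous value
        value = term.func() * term.weight * dt
        total += value
        step_reward[term_idx] = value / dt
    return total


def run_curriculum(reset_on_zero: bool) -> list[float]:
    """Step once per weight in a zero -> nonzero -> zero schedule."""
    terms = [Term(func=lambda: 2.0, weight=0.0)]
    step_reward = [0.0]
    for weight in (0.0, 0.5, 0.0):
        terms[0].weight = weight
        compute(terms, step_reward, dt=0.1, reset_on_zero=reset_on_zero)
    return step_reward


stale = run_curriculum(reset_on_zero=False)  # last nonzero value persists
fixed = run_curriculum(reset_on_zero=True)   # slot is cleared on the final step
```

With `reset_on_zero=False` the final slot still holds the value from the step where the weight was `0.5`; with `reset_on_zero=True` it is back to zero, which is what logging and visualization code would expect.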

Fixes #2391 

No new dependencies are introduced by this change.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update

## Screenshots

_Not applicable._

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent 1208f76b
@@ -41,6 +41,7 @@ Guidelines for modifications:
 * Anton Bjørndahl Mortensen
 * Arjun Bhardwaj
 * Ashwin Varghese Kuruttukulam
+* Bikram Pandit
 * Brayden Zhang
 * Cameron Upright
 * Calvin Yu
@@ -140,9 +140,10 @@ class RewardManager(ManagerBase):
         # reset computation
         self._reward_buf[:] = 0.0
         # iterate over all the reward terms
-        for name, term_cfg in zip(self._term_names, self._term_cfgs):
+        for term_idx, (name, term_cfg) in enumerate(zip(self._term_names, self._term_cfgs)):
             # skip if weight is zero (kind of a micro-optimization)
             if term_cfg.weight == 0.0:
+                self._step_reward[:, term_idx] = 0.0
                 continue
             # compute term's value
             value = term_cfg.func(self._env, **term_cfg.params) * term_cfg.weight * dt
@@ -152,7 +153,7 @@ class RewardManager(ManagerBase):
             self._episode_sums[name] += value
             # Update current reward for this step.
-            self._step_reward[:, self._term_names.index(name)] = value / dt
+            self._step_reward[:, term_idx] = value / dt
         return self._reward_buf