Resets step reward buffer properly when weight is zero (#2392)

# Description

This pull request fixes a bug where `_step_reward` could retain stale values when a reward term's weight was dynamically changed back to zero.

Previously, when a reward term had zero weight, the computation skipped updating that term's column of `_step_reward`, assuming it would already hold zero. However, if the weight was changed from zero to nonzero and then back to zero at runtime (e.g., in curriculum settings), the last nonzero value persisted, causing incorrect reward visualizations or logging.

This change explicitly sets the term's column of `_step_reward` to zero whenever the term's weight is zero, ensuring correctness regardless of dynamic weight changes.

Fixes #2391

No new dependencies are introduced by this change.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update

## Screenshots

_Not applicable._

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
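The stale-value scenario described in this PR can be reproduced outside Isaac Lab with a small stand-in. This is a minimal sketch, not the `RewardManager` implementation: the function names, the list-of-lists buffer (rows are environments, columns are reward terms), and the constant-valued term are all hypothetical. It runs a curriculum-like weight schedule of 0 → 1 → 0 once with the old "skip only" behavior and once with the fixed "zero the column, then skip" behavior.

```python
# Hypothetical stand-in for the per-term step-reward buffer (not the Isaac Lab API).
# step_reward[env][term_idx] holds the unscaled-by-dt reward of each term.


def compute_step_reward(step_reward, term_fns, weights, dt, clear_zero_weight):
    """Fill step_reward in place and return the summed reward per env.

    clear_zero_weight=False mimics the old behavior (zero-weight terms are skipped
    and their column is left untouched); True mimics the fix (column is zeroed).
    """
    num_envs = len(step_reward)
    total = [0.0] * num_envs
    for term_idx, (fn, weight) in enumerate(zip(term_fns, weights)):
        if weight == 0.0:
            if clear_zero_weight:
                # The fix: explicitly reset this term's column before skipping it.
                for env in range(num_envs):
                    step_reward[env][term_idx] = 0.0
            continue
        raw = fn(num_envs)
        for env in range(num_envs):
            value = raw[env] * weight * dt
            total[env] += value
            step_reward[env][term_idx] = value / dt
    return total


def run_schedule(clear_zero_weight):
    """Simulate a curriculum that switches one term's weight 0 -> 1 -> 0."""
    num_envs, dt = 2, 0.01
    term_fns = [lambda n: [1.0] * n]  # a single term that always returns 1.0
    step_reward = [[0.0] for _ in range(num_envs)]
    for weight in (0.0, 1.0, 0.0):
        compute_step_reward(step_reward, term_fns, [weight], dt, clear_zero_weight)
    return step_reward


print(run_schedule(clear_zero_weight=False))  # stale: [[1.0], [1.0]]
print(run_schedule(clear_zero_weight=True))   # fixed: [[0.0], [0.0]]
```

With the old behavior, the value written while the weight was 1.0 survives into the final zero-weight step, which is exactly what a logger reading the buffer would then report.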

Unverified commit f1ba9c3a, authored May 09, 2025 by Bikram Pandit and committed by GitHub on May 09, 2025.

Showing 4 additions and 2 deletions across 2 files:

- `CONTRIBUTORS.md` (+1 -0)
- `source/isaaclab/isaaclab/managers/reward_manager.py` (+3 -2)

--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -41,6 +41,7 @@ Guidelines for modifications:
 * Anton Bjørndahl Mortensen
 * Arjun Bhardwaj
 * Ashwin Varghese Kuruttukulam
+* Bikram Pandit
 * Brayden Zhang
 * Cameron Upright
 * Calvin Yu

--- a/source/isaaclab/isaaclab/managers/reward_manager.py
+++ b/source/isaaclab/isaaclab/managers/reward_manager.py
@@ -140,9 +140,10 @@ class RewardManager(ManagerBase):
        # reset computation
        self._reward_buf[:] = 0.0
        # iterate over all the reward terms
-        for name, term_cfg in zip(self._term_names, self._term_cfgs):
+        for term_idx, (name, term_cfg) in enumerate(zip(self._term_names, self._term_cfgs)):
            # skip if weight is zero (kind of a micro-optimization)
            if term_cfg.weight == 0.0:
+                self._step_reward[:, term_idx] = 0.0
                continue
            # compute term's value
            value = term_cfg.func(self._env, **term_cfg.params) * term_cfg.weight * dt
@@ -152,7 +153,7 @@ class RewardManager(ManagerBase):
            self._episode_sums[name] += value
            # Update current reward for this step.
-            self._step_reward[:, self._term_names.index(name)] = value / dt
+            self._step_reward[:, term_idx] = value / dt
        return self._reward_buf
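Besides zeroing the buffer column, the diff replaces the `self._term_names.index(name)` lookup with the index produced by `enumerate`. `list.index` performs a linear search on every call, whereas `enumerate` yields the column index as the loop advances. The sketch below, using hypothetical term names and weights, checks that both forms address the same column, assuming term names are unique.

```python
# Hypothetical term names and weights (stand-ins for RewardManager state).
term_names = ["track_lin_vel", "action_rate", "feet_air_time"]
term_weights = [1.0, 0.0, 0.5]

# Old form: list.index() scans term_names for every term looked up.
old_columns = [term_names.index(name) for name, _ in zip(term_names, term_weights)]

# New form: enumerate() supplies the column index directly.
new_columns = [term_idx for term_idx, (name, _) in enumerate(zip(term_names, term_weights))]

# With unique term names, both forms write to the same buffer column.
assert old_columns == new_columns == [0, 1, 2]
```

Having `term_idx` in scope also lets the zero-weight branch clear its column without a second lookup.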