Commit f1ba9c3a (unverified), authored by Bikram Pandit, committed by GitHub

Resets step reward buffer properly when weight is zero (#2392)

# Description

This pull request fixes a bug where `_step_reward` could retain stale
values when a reward term's weight was dynamically changed back to zero.
Previously, when a reward term had zero weight, the computation skipped
updating `_step_reward`, assuming that it would stay correct.
However, if the weight was first changed from zero to nonzero and then
back to zero during runtime (e.g., in curriculum settings), stale
nonzero values could persist, causing incorrect reward visualizations or
logging.

This change explicitly sets the term's column of
`reward_manager._step_reward` to zero whenever a reward term has zero
weight, so the buffer stays correct regardless of dynamic weight changes.
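The failure mode can be reproduced with a small, self-contained sketch. `TermCfg`, `ToyRewardManager`, and `run_curriculum` below are hypothetical stand-ins written for illustration; they use plain Python lists rather than tensors and are not the actual `RewardManager` implementation. The `reset_zero_weight` flag toggles between the old skip-only behavior and the behavior this change introduces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TermCfg:
    """Hypothetical stand-in for a reward term configuration."""

    func: Callable[[], float]
    weight: float


class ToyRewardManager:
    """Simplified, single-environment sketch of the per-term step reward buffer."""

    def __init__(self, terms: List[TermCfg], reset_zero_weight: bool):
        self.terms = terms
        self.reset_zero_weight = reset_zero_weight
        self.step_reward = [0.0] * len(terms)

    def compute(self, dt: float) -> float:
        total = 0.0
        for term_idx, cfg in enumerate(self.terms):
            if cfg.weight == 0.0:
                # With the fix, the entry is cleared before skipping the term.
                if self.reset_zero_weight:
                    self.step_reward[term_idx] = 0.0
                continue
            value = cfg.func() * cfg.weight * dt
            total += value
            self.step_reward[term_idx] = value / dt
        return total


def run_curriculum(reset_zero_weight: bool) -> List[float]:
    """Toggle one term's weight 0 -> 1.5 -> 0 and return the final buffer."""
    terms = [TermCfg(func=lambda: 2.0, weight=0.0)]
    manager = ToyRewardManager(terms, reset_zero_weight)
    manager.compute(dt=0.5)
    terms[0].weight = 1.5
    manager.compute(dt=0.5)
    terms[0].weight = 0.0
    manager.compute(dt=0.5)
    return manager.step_reward


stale = run_curriculum(reset_zero_weight=False)  # old behavior: value persists
fixed = run_curriculum(reset_zero_weight=True)   # new behavior: value is cleared
```

In this sketch the skip-only variant leaves the last nonzero entry in the buffer after the weight returns to zero, while the variant that resets before `continue` reports zero.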

Fixes #2391 

No new dependencies are introduced by this change.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] This change requires a documentation update

## Screenshots

_Not applicable._

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent 1208f76b
@@ -41,6 +41,7 @@ Guidelines for modifications:
 * Anton Bjørndahl Mortensen
 * Arjun Bhardwaj
 * Ashwin Varghese Kuruttukulam
+* Bikram Pandit
 * Brayden Zhang
 * Cameron Upright
 * Calvin Yu
@@ -140,9 +140,10 @@ class RewardManager(ManagerBase):
         # reset computation
         self._reward_buf[:] = 0.0
         # iterate over all the reward terms
-        for name, term_cfg in zip(self._term_names, self._term_cfgs):
+        for term_idx, (name, term_cfg) in enumerate(zip(self._term_names, self._term_cfgs)):
             # skip if weight is zero (kind of a micro-optimization)
             if term_cfg.weight == 0.0:
+                self._step_reward[:, term_idx] = 0.0
                 continue
             # compute term's value
             value = term_cfg.func(self._env, **term_cfg.params) * term_cfg.weight * dt
@@ -152,7 +153,7 @@ class RewardManager(ManagerBase):
             self._episode_sums[name] += value
             # Update current reward for this step.
-            self._step_reward[:, self._term_names.index(name)] = value / dt
+            self._step_reward[:, term_idx] = value / dt
         return self._reward_buf
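The diff also swaps the per-iteration `self._term_names.index(name)` lookup for the loop index supplied by `enumerate`. The following is a minimal sketch of how the two relate, using hypothetical term names: they agree when names are unique, while `list.index` always returns the first match and rescans the list on each call.

```python
# Hypothetical reward term names, used for illustration only.
term_names = ["track_velocity", "action_rate", "alive_bonus"]

# Index found by searching the list for each name (linear scan per lookup).
by_index = [term_names.index(name) for name in term_names]

# Index taken directly from the loop counter.
by_enumerate = [term_idx for term_idx, _ in enumerate(term_names)]

# With a repeated name, list.index reports the first occurrence only.
duplicated = ["a", "b", "a"]
first_match = [duplicated.index(name) for name in duplicated]
```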