Unverified Commit 8dabd3f1 authored by ooctipus's avatar ooctipus Committed by GitHub

Fixes Termination Manager logging to report aggregated percentage of...

Fixes Termination Manager logging to report aggregated percentage of environments done due to each term.  (#3107)

# Description

Currently Termination Manager write current step's done count for each
term if reset is detected. This leads to two problem.
1. User sees different counts just by varying num_envs
2. the count can be over-count or under-count depending on when reset
happens, as pointed out in #2977 (Thanks, @Kyu3224)

The cause of the bug is because we are reporting current step status
into a buffer that suppose to record episodic done. So instead of write
the entire buffer base on current value, we ask the update to respect
the non-reseting environment's old value, and instead of reporting
count, we report percentage of environment that was done due to the
particular term.

Test on Isaac-Velocity-Rough-Anymal-C-v0

Before fix:
<img width="786" height="323" alt="Screenshot from 2025-08-06 22-16-20"
src="https://github.com/user-attachments/assets/4838d612-7f0e-4232-a07e-688b547e91db"
/>
Red: num_envs = 4096, Orange: num_envs = 1024

After fix:

<img width="786" height="323" alt="Screenshot from 2025-08-06 22-16-12"
src="https://github.com/user-attachments/assets/e6e55c21-17ed-42ca-8d94-a19d08611f86"
/>
Red: num_envs = 4096, Orange: num_envs = 1024

Note that curve of the same color ran on same seed, and curves matched
exactly, the only difference is the data gets reported in termination.
The percentage version is a lot more clear in conveying how agent
currently fails, and how much percentage of agent fails, and shows that
increasing num_envs to 4096 helps improve agent avoiding termination by
`base_contact` much quicker than num_envs=1024. Such message is a bit
hard to tell in first image.

<!--
Example:

| Before | After |
| ------ | ----- |
| _gif/png before_ | _gif/png after_ |

To upload images to a PR -- simply drag and drop an image while in edit
mode and it should upload the image directly. You can then paste that
source into the above before/after sections.
-->

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

<!--
As you go through the checklist above, you can mark something as done by
putting an x character in it

For example,
- [x] I have done this task
- [ ] I have not done this task
-->
parent f70e695f
[package]
# Note: Semantic Versioning is used: https://semver.org/
version = "0.44.9"
version = "0.44.10"
# Description
title = "Isaac Lab framework for Robot Learning"
......
Changelog
---------
0.44.10 (2025-08-06)
~~~~~~~~~~~~~~~~~~~
Fixed
^^^^^
* The old termination manager in :class:`~isaaclab.managers.TerminationManager` term_done logging logs the instantaneous
term done count at reset. This let to inaccurate aggregation of termination count, obscuring the what really happening
during the traing. Instead we log the episodic term done.
0.44.9 (2025-07-30)
~~~~~~~~~~~~~~~~~~~
......
......@@ -60,10 +60,9 @@ class TerminationManager(ManagerBase):
# call the base class constructor (this will parse the terms config)
super().__init__(cfg, env)
self._term_name_to_term_idx = {name: i for i, name in enumerate(self._term_names)}
# prepare extra info to store individual termination term information
self._term_dones = dict()
for term_name in self._term_names:
self._term_dones[term_name] = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
self._term_dones = torch.zeros((self.num_envs, len(self._term_names)), device=self.device, dtype=torch.bool)
# create buffer for managing termination per environment
self._truncated_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
self._terminated_buf = torch.zeros_like(self._truncated_buf)
......@@ -139,9 +138,10 @@ class TerminationManager(ManagerBase):
env_ids = slice(None)
# add to episode dict
extras = {}
for key in self._term_dones.keys():
last_episode_done_stats = self._term_dones.float().mean(dim=0)
for i, key in enumerate(self._term_names):
# store information
extras["Episode_Termination/" + key] = torch.count_nonzero(self._term_dones[key][env_ids]).item()
extras["Episode_Termination/" + key] = last_episode_done_stats[i].item()
# reset all the reward terms
for term_cfg in self._class_term_cfgs:
term_cfg.func.reset(env_ids=env_ids)
......@@ -161,7 +161,7 @@ class TerminationManager(ManagerBase):
self._truncated_buf[:] = False
self._terminated_buf[:] = False
# iterate over all the termination terms
for name, term_cfg in zip(self._term_names, self._term_cfgs):
for i, term_cfg in enumerate(self._term_cfgs):
value = term_cfg.func(self._env, **term_cfg.params)
# store timeout signal separately
if term_cfg.time_out:
......@@ -169,7 +169,10 @@ class TerminationManager(ManagerBase):
else:
self._terminated_buf |= value
# add to episode dones
self._term_dones[name][:] = value
rows = value.nonzero(as_tuple=True)[0] # indexing is cheaper than boolean advance indexing
if rows.numel() > 0:
self._term_dones[rows] = False
self._term_dones[rows, i] = True
# return combined termination signal
return self._truncated_buf | self._terminated_buf
......@@ -182,7 +185,7 @@ class TerminationManager(ManagerBase):
Returns:
The corresponding termination term value. Shape is (num_envs,).
"""
return self._term_dones[name]
return self._term_dones[:, self._term_name_to_term_idx[name]]
def get_active_iterable_terms(self, env_idx: int) -> Sequence[tuple[str, Sequence[float]]]:
"""Returns the active terms as iterable sequence of tuples.
......@@ -196,8 +199,8 @@ class TerminationManager(ManagerBase):
The active terms.
"""
terms = []
for key in self._term_dones.keys():
terms.append((key, [self._term_dones[key][env_idx].float().cpu().item()]))
for i, key in enumerate(self._term_names):
terms.append((key, [self._term_dones[env_idx, i].float().cpu().item()]))
return terms
"""
......@@ -217,7 +220,7 @@ class TerminationManager(ManagerBase):
if term_name not in self._term_names:
raise ValueError(f"Termination term '{term_name}' not found.")
# set the configuration
self._term_cfgs[self._term_names.index(term_name)] = cfg
self._term_cfgs[self._term_name_to_term_idx[term_name]] = cfg
def get_term_cfg(self, term_name: str) -> TerminationTermCfg:
"""Gets the configuration for the specified term.
......@@ -234,7 +237,7 @@ class TerminationManager(ManagerBase):
if term_name not in self._term_names:
raise ValueError(f"Termination term '{term_name}' not found.")
# return the configuration
return self._term_cfgs[self._term_names.index(term_name)]
return self._term_cfgs[self._term_name_to_term_idx[term_name]]
"""
Helper functions.
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment