Fixes action reset of pre_trained_policy_action (#1623)

# Description

Currently, the [PreTrainedPolicyAction](https://github.com/isaac-sim/IsaacLab/blob/v1.4.0/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/navigation/mdp/pre_trained_policy_action.py#L24) class does not reset the actions in the low-level observations when a new episode starts. In my custom legged robot navigation task, the behavior was correct only during the first training episode but failed from the second episode onward.

At the start of a new episode, the action observations are not reset and retain the last actions from the previous episode. This can impact training, as in my case, where the actions at the end of an episode differ significantly from those required at the beginning of an episode.

This PR resolves the issue by resetting the low-level action observations at the beginning of each new episode.

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Commit de76c2e9, authored Jan 08, 2025 by Nicola Loi, committed by GitHub Jan 08, 2025
4 changed files
--- a/CONTRIBUTORS.md
+++ b/CONTRIBUTORS.md
@@ -66,6 +66,7 @@ Guidelines for modifications:
 * Michael Gussert
 * Michael Noseworthy
 * Muhong Guo
+* Nicola Loi
 * Nuralem Abizov
 * Oyindamola Omotuyi
 * Özhan Özen

--- a/source/extensions/omni.isaac.lab_tasks/config/extension.toml
+++ b/source/extensions/omni.isaac.lab_tasks/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.10.17"
+version = "0.10.18"

 # Description
 title = "Isaac Lab Environments"

--- a/source/extensions/omni.isaac.lab_tasks/docs/CHANGELOG.rst
+++ b/source/extensions/omni.isaac.lab_tasks/docs/CHANGELOG.rst
 Changelog
 ---------

+0.10.18 (2025-01-03)
+~~~~~~~~~~~~~~~~~~~~
+
+Fixed
+^^^^^
+
+* Fixed the reset of the actions in the function overriding of the low level observations of :class:`omni.isaac.lab_tasks.manager_based.navigation.mdp.PreTrainedPolicyAction`.
+
+
 0.10.17 (2024-12-17)
 ~~~~~~~~~~~~~~~~~~~~


--- a/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/navigation/mdp/pre_trained_policy_action.py
+++ b/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/navigation/mdp/pre_trained_policy_action.py
@@ -50,8 +50,14 @@ class PreTrainedPolicyAction(ActionTerm):
        self._low_level_action_term: ActionTerm = cfg.low_level_actions.class_type(cfg.low_level_actions, env)
        self.low_level_actions = torch.zeros(self.num_envs, self._low_level_action_term.action_dim, device=self.device)

+        def last_action():
+            # reset the low level actions if the episode was reset
+            if hasattr(env, "episode_length_buf"):
+                self.low_level_actions[env.episode_length_buf == 0, :] = 0
+            return self.low_level_actions
+
        # remap some of the low level observations to internal observations
-        cfg.low_level_observations.actions.func = lambda dummy_env: self.low_level_actions
+        cfg.low_level_observations.actions.func = lambda dummy_env: last_action()
        cfg.low_level_observations.actions.params = dict()
        cfg.low_level_observations.velocity_commands.func = lambda dummy_env: self._raw_actions
        cfg.low_level_observations.velocity_commands.params = dict()
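
The effect of the change above can be illustrated with a minimal sketch. It is not the Isaac Lab implementation: plain Python lists stand in for the torch tensors, and `ToyEnv`, `ToyPolicyAction`, and `last_action_stale` are hypothetical names introduced only for this example. The one assumption carried over from the diff is that `episode_length_buf` is `0` for environments that were just reset.

```python
# Simplified stand-in for the fix: plain lists replace torch tensors, and
# ToyEnv / ToyPolicyAction are hypothetical names, not Isaac Lab classes.


class ToyEnv:
    """Holds the per-environment step counter that the fix inspects."""

    def __init__(self, num_envs):
        self.episode_length_buf = [0] * num_envs


class ToyPolicyAction:
    """Caches the last low-level action of every environment."""

    def __init__(self, env, action_dim):
        self.env = env
        self.low_level_actions = [[0.0] * action_dim for _ in env.episode_length_buf]

    def last_action_stale(self):
        # Behaviour before the fix: the cache is returned unchanged.
        return self.low_level_actions

    def last_action(self):
        # Behaviour after the fix: zero the rows of freshly reset environments.
        if hasattr(self.env, "episode_length_buf"):
            for i, steps in enumerate(self.env.episode_length_buf):
                if steps == 0:
                    self.low_level_actions[i] = [0.0] * len(self.low_level_actions[i])
        return self.low_level_actions


env = ToyEnv(num_envs=2)
term = ToyPolicyAction(env, action_dim=3)

# Pretend both environments ran for a while and produced non-zero actions.
term.low_level_actions = [[0.5, -0.2, 0.1], [0.9, 0.9, 0.9]]
env.episode_length_buf = [0, 12]  # environment 0 was just reset

stale = [row[:] for row in term.last_action_stale()]
fixed = [row[:] for row in term.last_action()]

print(stale[0])  # the previous episode's action leaks into the new episode
print(fixed[0])  # zeroed by the reset check
print(fixed[1])  # the still-running environment keeps its last action
```

Performing the reset lazily inside the observation function, as the diff does, keeps the change local to the action term; the `hasattr` guard presumably covers the case where the environment has not yet created `episode_length_buf` at the time the observation is first evaluated.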