Adds dexterous cube manipulation environment for Allegro hand (#499)

# Description This MR adds the dexterous cube manipulation environment from IsaacGymEnvs. The implementation is mostly based on the standard AllegroHand environment. However, it includes the following components from the Dextreme work: * Randomization of mass, joint PD gains, friction, and initial state distribution * Tuning of RL-Games from Dextreme work * Exponential moving average (bounded joint position) action term However, it does the following differently since it led to better convergence: * Changes the way out-of-reach termination is computed. Original work seems to do it w.r.t. the goal position, but that seems unnecessary * Removed goal position racking reward. It was tuned too high, which made learning difficult and is not needed ## Type of change - New feature (non-breaking change which adds functionality) ## Screenshots Trained with RSL-RL https://github.com/isaac-orbit/orbit/assets/12863862/8f51e468-2d93-4520-9689-f1e8f1a898e6 ## Checklist - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./orbit.sh --format` - [x] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [x] I have run all the tests with `./orbit.sh --test` and they pass - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Adds dexterous cube manipulation environment for Allegro hand (#499)
# Description This MR adds the dexterous cube manipulation environment from IsaacGymEnvs. The implementation is mostly based on the standard AllegroHand environment. However, it includes the following components from the Dextreme work: * Randomization of mass, joint PD gains, friction, and initial state distribution * Tuning of RL-Games from Dextreme work * Exponential moving average (bounded joint position) action term However, it does the following differently since it led to better convergence: * Changes the way out-of-reach termination is computed. Original work seems to do it w.r.t. the goal position, but that seems unnecessary * Removed goal position racking reward. It was tuned too high, which made learning difficult and is not needed ## Type of change - New feature (non-breaking change which adds functionality) ## Screenshots Trained with RSL-RL https://github.com/isaac-orbit/orbit/assets/12863862/8f51e468-2d93-4520-9689-f1e8f1a898e6 ## Checklist - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./orbit.sh --format` - [x] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [x] I have run all the tests with `./orbit.sh --test` and they pass - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there
78692be5 · Mayank Mittal · GitHub · b1781712 · 78692be5 · 78692be5
Unverified Commit 78692be5 authored Apr 17, 2024 by Mayank Mittal Committed by GitHub Apr 17, 2024
21 changed files
--- a/docs/source/_static/tasks/manipulation/allegro_cube.jpg
+++ b/docs/source/_static/tasks/manipulation/allegro_cube.jpg
--- a/docs/source/features/environments.rst
+++ b/docs/source/features/environments.rst
@@ -66,11 +66,14 @@ for the reach environment:
    +----------------+---------------------+-----------------------------------------------------------------------------+
    | |cabi-franka|  | |cabi-franka-link|  | Grasp the handle of a cabinet's drawer and open it with the Franka robot    |
    +----------------+---------------------+-----------------------------------------------------------------------------+
+    | |cube-allegro| | |cube-allegro-link| | In-hand reorientation of a cube using Allegro hand                          |
+    +----------------+---------------------+-----------------------------------------------------------------------------+
 .. |reach-franka| image:: ../_static/tasks/manipulation/franka_reach.jpg
 .. |reach-ur10| image:: ../_static/tasks/manipulation/ur10_reach.jpg
 .. |lift-cube| image:: ../_static/tasks/manipulation/franka_lift.jpg
 .. |cabi-franka| image:: ../_static/tasks/manipulation/franka_open_drawer.jpg
+.. |cube-allegro| image:: ../_static/tasks/manipulation/allegro_cube.jpg
 .. |reach-franka-link| replace:: `Isaac-Reach-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/joint_pos_env_cfg.py>`__
 .. |reach-ur10-link| replace:: `Isaac-Reach-UR10-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/joint_pos_env_cfg.py>`__
@@ -78,7 +81,7 @@ for the reach environment:
 .. |lift-cube-ik-abs-link| replace:: `Isaac-Lift-Cube-Franka-IK-Abs-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/config/franka/ik_abs_env_cfg.py>`__
 .. |lift-cube-ik-rel-link| replace:: `Isaac-Lift-Cube-Franka-IK-Rel-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/config/franka/ik_rel_env_cfg.py>`__
 .. |cabi-franka-link| replace:: `Isaac-Open-Drawer-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/joint_pos_env_cfg.py>`__
+.. |cube-allegro-link| replace:: `Isaac-Repose-Cube-Allegro-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro/allegro_env_cfg.py>`__
 Locomotion
 ----------

--- a/source/extensions/omni.isaac.orbit_tasks/config/extension.toml
+++ b/source/extensions/omni.isaac.orbit_tasks/config/extension.toml
 [package]
 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.6.0"
+version = "0.6.1"
 # Description
 title = "ORBIT Environments"

--- a/source/extensions/omni.isaac.orbit_tasks/docs/CHANGELOG.rst
+++ b/source/extensions/omni.isaac.orbit_tasks/docs/CHANGELOG.rst
 Changelog
 ---------
+0.6.1 (2024-04-16)
+~~~~~~~~~~~~~~~~~~
+Added
+^^^^^
+* Added a new environment ``Isaac-Repose-Cube-Allegro-v0`` and ``Isaac-Repose-Allegro-Cube-NoVelObs-v0``
+  for the Allegro hand to reorient a cube. It is based on the IsaacGymEnvs Allegro hand environment.
 0.6.0 (2024-03-10)
 ~~~~~~~~~~~~~~~~~~

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""In-hand object reorientation environment.
+These environments are based on the `dexterous cube manipulation`_ environments
+provided in IsaacGymEnvs repository from NVIDIA. However, they contain certain
+modifications and additional features.
+.. _dexterous cube manipulation: https://github.com/NVIDIA-Omniverse/IsaacGymEnvs/blob/main/isaacgymenvs/tasks/allegro_hand.py
+"""
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""Configurations for in-hand manipulation environments."""
+# We leave this file empty since we don't want to expose any configs in this package directly.
+# We still need this file to import the "config" module in the parent package.
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+import gymnasium as gym
+from . import agents, allegro_env_cfg
+##
+# Register Gym environments.
+##
+##
+# Full kinematic state observations.
+##
+gym.register(
+    id="Isaac-Repose-Cube-Allegro-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": allegro_env_cfg.AllegroCubeEnvCfg,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AllegroCubePPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+)
+gym.register(
+    id="Isaac-Repose-Cube-Allegro-Play-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": allegro_env_cfg.AllegroCubeEnvCfg_PLAY,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AllegroCubePPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+)
+##
+# Kinematic state observations without velocity information.
+##
+gym.register(
+    id="Isaac-Repose-Cube-Allegro-NoVelObs-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": allegro_env_cfg.AllegroCubeNoVelObsEnvCfg,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AllegroCubeNoVelObsPPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+)
+gym.register(
+    id="Isaac-Repose-Cube-Allegro-NoVelObs-Play-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": allegro_env_cfg.AllegroCubeNoVelObsEnvCfg_PLAY,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AllegroCubeNoVelObsPPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+)
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/agents/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/agents/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+from . import rsl_rl_cfg  # noqa: F401, F403
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/agents/rl_games_ppo_cfg.yaml
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/agents/rl_games_ppo_cfg.yaml
+params:
+  seed: 42
+  # environment wrapper clipping
+  env:
+    clip_observations: 5.0
+    clip_actions: 1.0
+  algo:
+    name: a2c_continuous
+  model:
+    name: continuous_a2c_logstd
+  network:
+    name: actor_critic
+    separate: False
+    space:
+      continuous:
+        mu_activation: None
+        sigma_activation: None
+        mu_init:
+          name: default
+        sigma_init:
+          name: const_initializer
+          val: 0
+        fixed_sigma: True
+    mlp:
+      units: [512, 256, 128]
+      activation: elu
+      d2rl: False
+      initializer:
+        name: default
+      regularizer:
+        name: None
+  load_checkpoint: False
+  load_path: ''
+  config:
+    name: allegro_cube
+    env_name: rlgpu
+    device: 'cuda:0'
+    device_name: 'cuda:0'
+    multi_gpu: False
+    ppo: True
+    mixed_precision: False
+    normalize_input: True
+    normalize_value: True
+    value_bootstrap: True
+    num_actors: -1  # configured from the script (based on num_envs)
+    reward_shaper:
+      scale_value: 0.1
+    normalize_advantage: True
+    gamma: 0.998
+    tau: 0.95
+    learning_rate: 5e-4
+    lr_schedule: adaptive
+    schedule_type: standard
+    kl_threshold: 0.016
+    score_to_win: 100000
+    max_epochs: 5000
+    save_best_after: 500
+    save_frequency: 200
+    print_stats: True
+    grad_norm: 1.0
+    entropy_coef: 0.002
+    truncate_grads: True
+    e_clip: 0.2
+    horizon_length: 24
+    minibatch_size: 16384  # 32768
+    mini_epochs: 5
+    critic_coef: 4
+    clip_value: True
+    seq_length: 4
+    bounds_loss_coef: 0.0005
+    player:
+      #render: True
+      deterministic: True
+      games_num: 100000
+      print_stats: True
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/agents/rsl_rl_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/agents/rsl_rl_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+from omni.isaac.orbit.utils import configclass
+from omni.isaac.orbit_tasks.utils.wrappers.rsl_rl import (
+    RslRlOnPolicyRunnerCfg,
+    RslRlPpoActorCriticCfg,
+    RslRlPpoAlgorithmCfg,
+)
+@configclass
+class AllegroCubePPORunnerCfg(RslRlOnPolicyRunnerCfg):
+    num_steps_per_env = 24
+    max_iterations = 5000
+    save_interval = 50
+    experiment_name = "allegro_cube"
+    empirical_normalization = True
+    policy = RslRlPpoActorCriticCfg(
+        init_noise_std=1.0,
+        actor_hidden_dims=[512, 256, 128],
+        critic_hidden_dims=[512, 256, 128],
+        activation="elu",
+    )
+    algorithm = RslRlPpoAlgorithmCfg(
+        value_loss_coef=1.0,
+        use_clipped_value_loss=True,
+        clip_param=0.2,
+        entropy_coef=0.002,
+        num_learning_epochs=5,
+        num_mini_batches=4,
+        learning_rate=0.001,
+        schedule="adaptive",
+        gamma=0.998,
+        lam=0.95,
+        desired_kl=0.01,
+        max_grad_norm=1.0,
+    )
+@configclass
+class AllegroCubeNoVelObsPPORunnerCfg(AllegroCubePPORunnerCfg):
+    experiment_name = "allegro_cube_no_vel_obs"
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/allegro_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/config/allegro_hand/allegro_env_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+from omni.isaac.orbit.utils import configclass
+import omni.isaac.orbit_tasks.manipulation.inhand.inhand_env_cfg as inhand_env_cfg
+##
+# Pre-defined configs
+##
+from omni.isaac.orbit_assets import ALLEGRO_HAND_CFG  # isort: skip
+@configclass
+class AllegroCubeEnvCfg(inhand_env_cfg.InHandObjectEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # switch robot to allegro hand
+        self.scene.robot = ALLEGRO_HAND_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
+@configclass
+class AllegroCubeEnvCfg_PLAY(AllegroCubeEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # make a smaller scene for play
+        self.scene.num_envs = 50
+        # disable randomization for play
+        self.observations.policy.enable_corruption = False
+        # remove termination due to timeouts
+        self.terminations.time_out = None
+##
+# Environment configuration with no velocity observations.
+##
+@configclass
+class AllegroCubeNoVelObsEnvCfg(AllegroCubeEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # switch observation group to no velocity group
+        self.observations.policy = inhand_env_cfg.ObservationsCfg.NoVelocityKinematicObsGroupCfg()
+@configclass
+class AllegroCubeNoVelObsEnvCfg_PLAY(AllegroCubeNoVelObsEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # make a smaller scene for play
+        self.scene.num_envs = 50
+        # disable randomization for play
+        self.observations.policy.enable_corruption = False
+        # remove termination due to timeouts
+        self.terminations.time_out = None
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/inhand_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/inhand_env_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+from __future__ import annotations
+from dataclasses import MISSING
+import omni.isaac.orbit.sim as sim_utils
+from omni.isaac.orbit.assets import ArticulationCfg, AssetBaseCfg, RigidObjectCfg
+from omni.isaac.orbit.envs import RLTaskEnvCfg
+from omni.isaac.orbit.managers import EventTermCfg as EventTerm
+from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
+from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
+from omni.isaac.orbit.managers import RewardTermCfg as RewTerm
+from omni.isaac.orbit.managers import SceneEntityCfg
+from omni.isaac.orbit.managers import TerminationTermCfg as DoneTerm
+from omni.isaac.orbit.scene import InteractiveSceneCfg
+from omni.isaac.orbit.sim.simulation_cfg import PhysxCfg, SimulationCfg
+from omni.isaac.orbit.sim.spawners.materials.physics_materials_cfg import RigidBodyMaterialCfg
+from omni.isaac.orbit.utils import configclass
+from omni.isaac.orbit.utils.assets import ISAAC_NUCLEUS_DIR
+from omni.isaac.orbit.utils.noise import AdditiveGaussianNoiseCfg as Gnoise
+import omni.isaac.orbit_tasks.manipulation.inhand.mdp as mdp
+##
+# Scene definition
+##
+@configclass
+class InHandObjectSceneCfg(InteractiveSceneCfg):
+    """Configuration for a scene with an object and a dexterous hand."""
+    # robots
+    robot: ArticulationCfg = MISSING
+    # objects
+    object: RigidObjectCfg = RigidObjectCfg(
+        prim_path="{ENV_REGEX_NS}/object",
+        spawn=sim_utils.UsdFileCfg(
+            usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/Blocks/DexCube/dex_cube_instanceable.usd",
+            rigid_props=sim_utils.RigidBodyPropertiesCfg(
+                kinematic_enabled=False,
+                disable_gravity=False,
+                enable_gyroscopic_forces=True,
+                solver_position_iteration_count=8,
+                solver_velocity_iteration_count=0,
+                sleep_threshold=0.005,
+                stabilization_threshold=0.0025,
+                max_depenetration_velocity=1000.0,
+            ),
+            mass_props=sim_utils.MassPropertiesCfg(density=400.0),
+        ),
+        init_state=RigidObjectCfg.InitialStateCfg(pos=(0.0, -0.19, 0.56), rot=(1.0, 0.0, 0.0, 0.0)),
+    )
+    # lights
+    light = AssetBaseCfg(
+        prim_path="/World/light",
+        spawn=sim_utils.DistantLightCfg(color=(0.95, 0.95, 0.95), intensity=1000.0),
+    )
+    dome_light = AssetBaseCfg(
+        prim_path="/World/domeLight",
+        spawn=sim_utils.DomeLightCfg(color=(0.02, 0.02, 0.02), intensity=1000.0),
+    )
+##
+# MDP settings
+##
+@configclass
+class CommandsCfg:
+    """Command specifications for the MDP."""
+    object_pose = mdp.InHandReOrientationCommandCfg(
+        asset_name="object",
+        init_pos_offset=(0.0, 0.0, -0.04),
+        update_goal_on_success=True,
+        orientation_success_threshold=0.1,
+        make_quat_unique=False,
+        marker_pos_offset=(-0.2, -0.06, 0.08),
+        debug_vis=True,
+    )
+@configclass
+class ActionsCfg:
+    """Action specifications for the MDP."""
+    joint_pos = mdp.EMAJointPositionToLimitsActionCfg(
+        asset_name="robot",
+        joint_names=[".*"],
+        alpha=0.95,
+        rescale_to_limits=True,
+    )
+@configclass
+class ObservationsCfg:
+    """Observation specifications for the MDP."""
+    @configclass
+    class KinematicObsGroupCfg(ObsGroup):
+        """Observations with full-kinematic state information.
+        This does not include acceleration or force information.
+        """
+        # observation terms (order preserved)
+        # -- robot terms
+        joint_pos = ObsTerm(func=mdp.joint_pos_limit_normalized, noise=Gnoise(std=0.005))
+        joint_vel = ObsTerm(func=mdp.joint_vel_rel, scale=0.2, noise=Gnoise(std=0.01))
+        # -- object terms
+        object_pos = ObsTerm(
+            func=mdp.root_pos_w, noise=Gnoise(std=0.002), params={"asset_cfg": SceneEntityCfg("object")}
+        )
+        object_quat = ObsTerm(
+            func=mdp.root_quat_w, params={"asset_cfg": SceneEntityCfg("object"), "make_quat_unique": False}
+        )
+        object_lin_vel = ObsTerm(
+            func=mdp.root_lin_vel_w, noise=Gnoise(std=0.002), params={"asset_cfg": SceneEntityCfg("object")}
+        )
+        object_ang_vel = ObsTerm(
+            func=mdp.root_ang_vel_w,
+            scale=0.2,
+            noise=Gnoise(std=0.002),
+            params={"asset_cfg": SceneEntityCfg("object")},
+        )
+        # -- command terms
+        goal_pose = ObsTerm(func=mdp.generated_commands, params={"command_name": "object_pose"})
+        goal_quat_diff = ObsTerm(
+            func=mdp.goal_quat_diff,
+            params={"asset_cfg": SceneEntityCfg("object"), "command_name": "object_pose", "make_quat_unique": False},
+        )
+        # -- action terms
+        last_action = ObsTerm(func=mdp.last_action)
+        def __post_init__(self):
+            self.enable_corruption = True
+            self.concatenate_terms = True
+    @configclass
+    class NoVelocityKinematicObsGroupCfg(KinematicObsGroupCfg):
+        """Observations with partial kinematic state information.
+        In contrast to the full-kinematic state group, this group does not include velocity information
+        about the robot joints and the object root frame. This is useful for tasks where velocity information
+        is not available or has a lot of noise.
+        """
+        def __post_init__(self):
+            # call parent post init
+            super().__post_init__()
+            # set unused terms to None
+            self.joint_vel = None
+            self.object_lin_vel = None
+            self.object_ang_vel = None
+    # observation groups
+    policy: KinematicObsGroupCfg = KinematicObsGroupCfg()
+@configclass
+class EventCfg:
+    """Configuration for randomization."""
+    # startup
+    # -- robot
+    robot_physics_material = EventTerm(
+        func=mdp.randomize_rigid_body_material,
+        mode="startup",
+        params={
+            "asset_cfg": SceneEntityCfg("robot", body_names=".*"),
+            "static_friction_range": (0.7, 1.3),
+            "dynamic_friction_range": (0.7, 1.3),
+            "restitution_range": (0.0, 0.0),
+            "num_buckets": 250,
+        },
+    )
+    robot_scale_mass = EventTerm(
+        func=mdp.randomize_rigid_body_mass,
+        mode="startup",
+        params={
+            "asset_cfg": SceneEntityCfg("robot", body_names=".*"),
+            "mass_range": (0.95, 1.05),
+            "operation": "scale",
+        },
+    )
+    robot_joint_stiffness_and_damping = EventTerm(
+        func=mdp.randomize_actuator_gains,
+        mode="startup",
+        params={
+            "asset_cfg": SceneEntityCfg("robot", joint_names=".*"),
+            "stiffness_range": (0.3, 3.0),  # default: 3.0
+            "damping_range": (0.75, 1.5),  # default: 0.1
+            "operation": "scale",
+            "distribution": "log_uniform",
+        },
+    )
+    # -- object
+    object_physics_material = EventTerm(
+        func=mdp.randomize_rigid_body_material,
+        mode="startup",
+        params={
+            "asset_cfg": SceneEntityCfg("object", body_names=".*"),
+            "static_friction_range": (0.7, 1.3),
+            "dynamic_friction_range": (0.7, 1.3),
+            "restitution_range": (0.0, 0.0),
+            "num_buckets": 250,
+        },
+    )
+    object_scale_mass = EventTerm(
+        func=mdp.randomize_rigid_body_mass,
+        mode="startup",
+        params={
+            "asset_cfg": SceneEntityCfg("object"),
+            "mass_range": (0.4, 1.6),
+            "operation": "scale",
+        },
+    )
+    # reset
+    reset_object = EventTerm(
+        func=mdp.reset_root_state_uniform,
+        mode="reset",
+        params={
+            "pose_range": {"x": [-0.01, 0.01], "y": [-0.01, 0.01], "z": [-0.01, 0.01]},
+            "velocity_range": {},
+            "asset_cfg": SceneEntityCfg("object", body_names=".*"),
+        },
+    )
+    reset_robot_joints = EventTerm(
+        func=mdp.reset_joints_within_limits_range,
+        mode="reset",
+        params={
+            "position_range": {".*": [0.2, 0.2]},
+            "velocity_range": {".*": [0.0, 0.0]},
+            "use_default_offset": True,
+            "operation": "scale",
+        },
+    )
+@configclass
+class RewardsCfg:
+    """Reward terms for the MDP."""
+    # -- task
+    # track_pos_l2 = RewTerm(
+    #     func=mdp.track_pos_l2,
+    #     weight=-10.0,
+    #     params={"object_cfg": SceneEntityCfg("object"), "command_name": "object_pose"},
+    # )
+    track_orientation_inv_l2 = RewTerm(
+        func=mdp.track_orientation_inv_l2,
+        weight=1.0,
+        params={"object_cfg": SceneEntityCfg("object"), "rot_eps": 0.1, "command_name": "object_pose"},
+    )
+    success_bonus = RewTerm(
+        func=mdp.success_bonus,
+        weight=250.0,
+        params={"object_cfg": SceneEntityCfg("object"), "command_name": "object_pose"},
+    )
+    # -- penalties
+    joint_vel_l2 = RewTerm(func=mdp.joint_vel_l2, weight=-2.5e-5)
+    action_l2 = RewTerm(func=mdp.action_l2, weight=-0.0001)
+    action_rate_l2 = RewTerm(func=mdp.action_rate_l2, weight=-0.01)
+    # -- optional penalties (these are disabled by default)
+    # object_away_penalty = RewTerm(
+    #     func=mdp.is_terminated_term,
+    #     weight=-0.0,
+    #     params={"term_keys": "object_out_of_reach"},
+    # )
+@configclass
+class TerminationsCfg:
+    """Termination terms for the MDP."""
+    time_out = DoneTerm(func=mdp.time_out, time_out=True)
+    max_consecutive_success = DoneTerm(
+        func=mdp.max_consecutive_success, params={"num_success": 50, "command_name": "object_pose"}
+    )
+    object_out_of_reach = DoneTerm(func=mdp.object_away_from_robot, params={"threshold": 0.3})
+    # object_out_of_reach = DoneTerm(
+    #     func=mdp.object_away_from_goal, params={"threshold": 0.24, "command_name": "object_pose"}
+    # )
+##
+# Environment configuration
+##
+@configclass
+class InHandObjectEnvCfg(RLTaskEnvCfg):
+    """Configuration for the in hand reorientation environment."""
+    # Scene settings
+    scene: InHandObjectSceneCfg = InHandObjectSceneCfg(num_envs=8192, env_spacing=0.6)
+    # Simulation settings
+    sim: SimulationCfg = SimulationCfg(
+        physics_material=RigidBodyMaterialCfg(
+            static_friction=1.0,
+            dynamic_friction=1.0,
+        ),
+        physx=PhysxCfg(
+            bounce_threshold_velocity=0.2,
+            gpu_max_rigid_contact_count=2**20,
+            gpu_max_rigid_patch_count=2**23,
+        ),
+    )
+    # Basic settings
+    observations: ObservationsCfg = ObservationsCfg()
+    actions: ActionsCfg = ActionsCfg()
+    commands: CommandsCfg = CommandsCfg()
+    # MDP settings
+    rewards: RewardsCfg = RewardsCfg()
+    terminations: TerminationsCfg = TerminationsCfg()
+    events: EventCfg = EventCfg()
+    def __post_init__(self):
+        """Post initialization."""
+        # general settings
+        self.decimation = 4
+        self.episode_length_s = 20.0
+        # simulation settings
+        self.sim.dt = 1.0 / 120.0
+        # change viewer settings
+        self.viewer.eye = (2.0, 2.0, 2.0)
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""This sub-module contains the functions that are specific to the in-hand manipulation environments."""
+from omni.isaac.orbit.envs.mdp import *  # noqa: F401, F403
+from .commands import *  # noqa: F401, F403
+from .events import *  # noqa: F401, F403
+from .observations import *  # noqa: F401, F403
+from .rewards import *  # noqa: F401, F403
+from .terminations import *  # noqa: F401, F403
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/commands/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/commands/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""Sub-module containing command terms for 3D orientation goals."""
+from .commands_cfg import InHandReOrientationCommandCfg  # noqa: F401
+from .orientation_command import InHandReOrientationCommand  # noqa: F401
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/commands/commands_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/commands/commands_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+from dataclasses import MISSING
+import omni.isaac.orbit.sim as sim_utils
+from omni.isaac.orbit.managers import CommandTermCfg
+from omni.isaac.orbit.markers import VisualizationMarkersCfg
+from omni.isaac.orbit.utils import configclass
+from omni.isaac.orbit.utils.assets import ISAAC_NUCLEUS_DIR
+from .orientation_command import InHandReOrientationCommand
+@configclass
+class InHandReOrientationCommandCfg(CommandTermCfg):
+    """Configuration for the uniform 3D orientation command term.
+    Please refer to the :class:`InHandReOrientationCommand` class for more details.
+    """
+    class_type: type = InHandReOrientationCommand
+    resampling_time_range: tuple[float, float] = (1e6, 1e6)  # no resampling based on time
+    asset_name: str = MISSING
+    """Name of the asset in the environment for which the commands are generated."""
+    init_pos_offset: tuple[float, float, float] = (0.0, 0.0, 0.0)
+    """Position offset of the asset from its default position.
+    This is used to account for the offset typically present in the object's default position
+    so that the object is spawned at a height above the robot's palm. When the position command
+    is generated, the object's default position is used as the reference and the offset specified
+    is added to it to get the desired position of the object.
+    """
+    make_quat_unique: bool = MISSING
+    """Whether to make the quaternion unique or not.
+    If True, the quaternion is made unique by ensuring the real part is positive.
+    """
+    orientation_success_threshold: float = MISSING
+    """Threshold for the orientation error to consider the goal orientation to be reached."""
+    update_goal_on_success: bool = MISSING
+    """Whether to update the goal orientation when the goal orientation is reached."""
+    marker_pos_offset: tuple[float, float, float] = (0.0, 0.0, 0.0)
+    """Position offset of the marker from the object's desired position.
+    This is useful to position the marker at a height above the object's desired position.
+    Otherwise, the marker may occlude the object in the visualization.
+    """
+    visualizer_cfg: VisualizationMarkersCfg = VisualizationMarkersCfg(
+        prim_path="/Visuals/Command/goal_marker",
+        markers={
+            "goal": sim_utils.UsdFileCfg(
+                usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/Blocks/DexCube/dex_cube_instanceable.usd",
+                scale=(1.0, 1.0, 1.0),
+            ),
+        },
+    )
+    """Configuration for the visualization markers. Default is a cube marker."""
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/commands/orientation_command.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/commands/orientation_command.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""Sub-module containing command generators for 3D orientation goals for objects."""
+from __future__ import annotations
+import torch
+from collections.abc import Sequence
+from typing import TYPE_CHECKING
+import omni.isaac.orbit.utils.math as math_utils
+from omni.isaac.orbit.assets import RigidObject
+from omni.isaac.orbit.managers import CommandTerm
+from omni.isaac.orbit.markers.visualization_markers import VisualizationMarkers
+if TYPE_CHECKING:
+    from omni.isaac.orbit.envs import RLTaskEnv
+    from .commands_cfg import InHandReOrientationCommandCfg
+class InHandReOrientationCommand(CommandTerm):
+    """Command term that generates 3D pose commands for in-hand manipulation task.
+    This command term generates 3D orientation commands for the object. The orientation commands
+    are sampled uniformly from the 3D orientation space. The position commands are the default
+    root state of the object.
+    The constant position commands is to encourage that the object does not move during the task.
+    For instance, the object should not fall off the robot's palm.
+    Unlike typical command terms, where the goals are resampled based on time, this command term
+    does not resample the goals based on time. Instead, the goals are resampled when the object
+    reaches the goal orientation. The goal orientation is considered to be reached when the
+    orientation error is below a certain threshold.
+    """
+    cfg: InHandReOrientationCommandCfg
+    """Configuration for the command term."""
+    def __init__(self, cfg: InHandReOrientationCommandCfg, env: RLTaskEnv):
+        """Initialize the command term class.
+        Args:
+            cfg: The configuration parameters for the command term.
+            env: The environment object.
+        """
+        # initialize the base class
+        super().__init__(cfg, env)
+        # object
+        self.object: RigidObject = env.scene[cfg.asset_name]
+        # create buffers to store the command
+        # -- command: (x, y, z)
+        init_pos_offset = torch.tensor(cfg.init_pos_offset, dtype=torch.float, device=self.device)
+        self.pos_command_e = self.object.data.default_root_state[:, :3] + init_pos_offset
+        self.pos_command_w = self.pos_command_e + self._env.scene.env_origins
+        # -- orientation: (w, x, y, z)
+        self.quat_command_w = torch.zeros(self.num_envs, 4, device=self.device)
+        self.quat_command_w[:, 0] = 1.0  # set the scalar component to 1.0
+        # -- unit vectors
+        self._X_UNIT_VEC = torch.tensor([1.0, 0, 0], device=self.device).repeat((self.num_envs, 1))
+        self._Y_UNIT_VEC = torch.tensor([0, 1.0, 0], device=self.device).repeat((self.num_envs, 1))
+        self._Z_UNIT_VEC = torch.tensor([0, 0, 1.0], device=self.device).repeat((self.num_envs, 1))
+        # -- metrics
+        self.metrics["orientation_error"] = torch.zeros(self.num_envs, device=self.device)
+        self.metrics["position_error"] = torch.zeros(self.num_envs, device=self.device)
+        self.metrics["consecutive_success"] = torch.zeros(self.num_envs, device=self.device)
+    def __str__(self) -> str:
+        msg = "InHandManipulationCommandGenerator:\n"
+        msg += f"\tCommand dimension: {tuple(self.command.shape[1:])}\n"
+        return msg
+    """
+    Properties
+    """
+    @property
+    def command(self) -> torch.Tensor:
+        """The desired goal pose in the environment frame. Shape is (num_envs, 7)."""
+        return torch.cat((self.pos_command_e, self.quat_command_w), dim=-1)
+    """
+    Implementation specific functions.
+    """
+    def _update_metrics(self):
+        # logs data
+        # -- compute the orientation error
+        self.metrics["orientation_error"] = math_utils.quat_error_magnitude(
+            self.object.data.root_quat_w, self.quat_command_w
+        )
+        # -- compute the position error
+        self.metrics["position_error"] = torch.norm(self.object.data.root_pos_w - self.pos_command_w, dim=1)
+        # -- compute the number of consecutive successes
+        successes = self.metrics["orientation_error"] < self.cfg.orientation_success_threshold
+        self.metrics["consecutive_success"] += successes.float()
+    def _resample_command(self, env_ids: Sequence[int]):
+        # sample new orientation targets
+        rand_floats = 2.0 * torch.rand((len(env_ids), 2), device=self.device) - 1.0
+        # rotate randomly about x-axis and then y-axis
+        quat = math_utils.quat_mul(
+            math_utils.quat_from_angle_axis(rand_floats[:, 0] * torch.pi, self._X_UNIT_VEC[env_ids]),
+            math_utils.quat_from_angle_axis(rand_floats[:, 1] * torch.pi, self._Y_UNIT_VEC[env_ids]),
+        )
+        # make sure the quaternion real-part is always positive
+        self.quat_command_w[env_ids] = math_utils.quat_unique(quat) if self.cfg.make_quat_unique else quat
+    def _update_command(self):
+        # update the command if goal is reached
+        if self.cfg.update_goal_on_success:
+            # compute the goal resets
+            goal_resets = self.metrics["orientation_error"] < self.cfg.orientation_success_threshold
+            goal_reset_ids = goal_resets.nonzero(as_tuple=False).squeeze(-1)
+            # resample the goals
+            self._resample(goal_reset_ids)
+    def _set_debug_vis_impl(self, debug_vis: TYPE_CHECKING):
+        # set visibility of markers
+        # note: parent only deals with callbacks. not their visibility
+        if debug_vis:
+            # create markers if necessary for the first time
+            if not hasattr(self, "goal_marker_visualizer"):
+                self.goal_marker_visualizer = VisualizationMarkers(self.cfg.visualizer_cfg)
+            # set visibility
+            self.goal_marker_visualizer.set_visibility(True)
+        else:
+            if hasattr(self, "goal_marker_visualizer"):
+                self.goal_marker_visualizer.set_visibility(False)
+    def _debug_vis_callback(self, event):
+        # add an offset to the marker position to visualize the goal
+        marker_pos = self.pos_command_w + torch.tensor(self.cfg.marker_pos_offset, device=self.device)
+        marker_quat = self.quat_command_w
+        # visualize the goal marker
+        self.goal_marker_visualizer.visualize(translations=marker_pos, orientations=marker_quat)
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/events.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/events.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""Functions specific to the in-hand dexterous manipulation environments."""
+from __future__ import annotations
+import torch
+from typing import TYPE_CHECKING, Literal
+from omni.isaac.orbit.assets import Articulation
+from omni.isaac.orbit.managers import EventTermCfg, ManagerTermBase, SceneEntityCfg
+from omni.isaac.orbit.utils.math import sample_uniform
+if TYPE_CHECKING:
+    from omni.isaac.orbit.envs import BaseEnv
+class reset_joints_within_limits_range(ManagerTermBase):
+    """Reset an articulation's joints to a random position in the given limit ranges.
+    This function samples random values for the joint position and velocities from the given limit ranges.
+    The values are then set into the physics simulation.
+    The parameters to the function are:
+    * :attr:`position_range` - a dictionary of position ranges for each joint. The keys of the dictionary are the
+      joint names (or regular expressions) of the asset.
+    * :attr:`velocity_range` - a dictionary of velocity ranges for each joint. The keys of the dictionary are the
+      joint names (or regular expressions) of the asset.
+    * :attr:`use_default_offset` - a boolean flag to indicate if the ranges are offset by the default joint state.
+      Defaults to False.
+    * :attr:`asset_cfg` - the configuration of the asset to reset. Defaults to the entity named "robot" in the scene.
+    * :attr:`operation` - whether the ranges are scaled values of the joint limits, or absolute limits.
+       Defaults to "abs".
+    The dictionary values are a tuple of the form ``(a, b)``. Based on the operation, these values are
+    interpreted differently:
+    * If the operation is "abs", the values are the absolute minimum and maximum values for the joint, i.e.
+      the joint range becomes ``[a, b]``.
+    * If the operation is "scale", the values are the scaling factors for the joint limits, i.e. the joint range
+      becomes ``[a * min_joint_limit, b * max_joint_limit]``.
+    If the ``a`` or the ``b`` value is ``None``, the joint limits are used instead.
+    Note:
+        If the dictionary does not contain a key, the joint position or joint velocity is set to the default value for
+        that joint.
+    """
+    def __init__(self, cfg: EventTermCfg, env: BaseEnv):
+        # initialize the base class
+        super().__init__(cfg, env)
+        # check if the cfg has the required parameters
+        if "position_range" not in cfg.params or "velocity_range" not in cfg.params:
+            raise ValueError(
+                "The term 'reset_joints_within_range' requires parameters: 'position_range' and 'velocity_range'."
+                f" Received: {list(cfg.params.keys())}."
+            )
+        # parse the parameters
+        asset_cfg: SceneEntityCfg = cfg.params.get("asset_cfg", SceneEntityCfg("robot"))
+        use_default_offset = cfg.params.get("use_default_offset", False)
+        operation = cfg.params.get("operation", "abs")
+        # check if the operation is valid
+        if operation not in ["abs", "scale"]:
+            raise ValueError(
+                f"For event 'reset_joints_within_limits_range', unknown operation: '{operation}'."
+                " Please use 'abs' or 'scale'."
+            )
+        # extract the used quantities (to enable type-hinting)
+        self._asset: Articulation = env.scene[asset_cfg.name]
+        default_joint_pos = self._asset.data.default_joint_pos[0]
+        default_joint_vel = self._asset.data.default_joint_vel[0]
+        # create buffers to store the joint position range
+        self._pos_ranges = self._asset.data.soft_joint_pos_limits[0].clone()
+        # parse joint position ranges
+        pos_joint_ids = []
+        for joint_name, joint_range in cfg.params["position_range"].items():
+            # find the joint ids
+            joint_ids = self._asset.find_joints(joint_name)[0]
+            pos_joint_ids.extend(joint_ids)
+            # set the joint position ranges based on the given values
+            if operation == "abs":
+                if joint_range[0] is not None:
+                    self._pos_ranges[joint_ids, 0] = joint_range[0]
+                if joint_range[1] is not None:
+                    self._pos_ranges[joint_ids, 1] = joint_range[1]
+            elif operation == "scale":
+                if joint_range[0] is not None:
+                    self._pos_ranges[joint_ids, 0] *= joint_range[0]
+                if joint_range[1] is not None:
+                    self._pos_ranges[joint_ids, 1] *= joint_range[1]
+            else:
+                raise ValueError(
+                    f"Unknown operation: '{operation}' for joint position ranges. Please use 'abs' or 'scale'."
+                )
+            # add the default offset
+            if use_default_offset:
+                self._pos_ranges[joint_ids] += default_joint_pos[joint_ids].unsqueeze(1)
+        # store the joint pos ids (used later to sample the joint positions)
+        self._pos_joint_ids = torch.tensor(pos_joint_ids, device=self._pos_ranges.device)
+        self._pos_ranges = self._pos_ranges[self._pos_joint_ids]
+        # create buffers to store the joint velocity range
+        self._vel_ranges = torch.stack(
+            [-self._asset.data.soft_joint_vel_limits[0], self._asset.data.soft_joint_vel_limits[0]], dim=1
+        )
+        # parse joint velocity ranges
+        vel_joint_ids = []
+        for joint_name, joint_range in cfg.params["velocity_range"].items():
+            # find the joint ids
+            joint_ids = self._asset.find_joints(joint_name)[0]
+            vel_joint_ids.extend(joint_ids)
+            # set the joint position ranges based on the given values
+            if operation == "abs":
+                if joint_range[0] is not None:
+                    self._vel_ranges[joint_ids, 0] = joint_range[0]
+                if joint_range[1] is not None:
+                    self._vel_ranges[joint_ids, 1] = joint_range[1]
+            elif operation == "scale":
+                if joint_range[0] is not None:
+                    self._vel_ranges[joint_ids, 0] = joint_range[0] * self._vel_ranges[joint_ids, 0]
+                if joint_range[1] is not None:
+                    self._vel_ranges[joint_ids, 1] = joint_range[1] * self._vel_ranges[joint_ids, 1]
+            else:
+                raise ValueError(
+                    f"Unknown operation: '{operation}' for joint velocity ranges. Please use 'abs' or 'scale'."
+                )
+            # add the default offset
+            if use_default_offset:
+                self._vel_ranges[joint_ids] += default_joint_vel[joint_ids].unsqueeze(1)
+        # store the joint vel ids (used later to sample the joint positions)
+        self._vel_joint_ids = torch.tensor(vel_joint_ids, device=self._vel_ranges.device)
+        self._vel_ranges = self._vel_ranges[self._vel_joint_ids]
+    def __call__(
+        self,
+        env: BaseEnv,
+        env_ids: torch.Tensor,
+        position_range: dict[str, tuple[float | None, float | None]],
+        velocity_range: dict[str, tuple[float | None, float | None]],
+        use_default_offset: bool = False,
+        asset_cfg: SceneEntityCfg = SceneEntityCfg("robot"),
+        operation: Literal["abs", "scale"] = "abs",
+    ):
+        # get default joint state
+        joint_pos = self._asset.data.default_joint_pos[env_ids].clone()
+        joint_vel = self._asset.data.default_joint_vel[env_ids].clone()
+        # sample random joint positions for each joint
+        if len(self._pos_joint_ids) > 0:
+            joint_pos_shape = (len(env_ids), len(self._pos_joint_ids))
+            joint_pos[:, self._pos_joint_ids] = sample_uniform(
+                self._pos_ranges[:, 0], self._pos_ranges[:, 1], joint_pos_shape, device=joint_pos.device
+            )
+            # clip the joint positions to the joint limits
+            joint_pos_limits = self._asset.data.soft_joint_pos_limits[0, self._pos_joint_ids]
+            joint_pos = joint_pos.clamp(joint_pos_limits[:, 0], joint_pos_limits[:, 1])
+        # sample random joint velocities for each joint
+        if len(self._vel_joint_ids) > 0:
+            joint_vel_shape = (len(env_ids), len(self._vel_joint_ids))
+            joint_vel[:, self._vel_joint_ids] = sample_uniform(
+                self._vel_ranges[:, 0], self._vel_ranges[:, 1], joint_vel_shape, device=joint_vel.device
+            )
+            # clip the joint velocities to the joint limits
+            joint_vel_limits = self._asset.data.soft_joint_vel_limits[0, self._vel_joint_ids]
+            joint_vel = joint_vel.clamp(-joint_vel_limits, joint_vel_limits)
+        # set into the physics simulation
+        self._asset.write_joint_state_to_sim(joint_pos, joint_vel, env_ids=env_ids)
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/observations.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/observations.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""Functions specific to the in-hand dexterous manipulation environments."""
+import torch
+from typing import TYPE_CHECKING
+import omni.isaac.orbit.utils.math as math_utils
+from omni.isaac.orbit.assets import RigidObject
+from omni.isaac.orbit.envs import RLTaskEnv
+from omni.isaac.orbit.managers import SceneEntityCfg
+if TYPE_CHECKING:
+    from .commands import InHandReOrientationCommand
+def goal_quat_diff(
+    env: RLTaskEnv, asset_cfg: SceneEntityCfg, command_name: str, make_quat_unique: bool
+) -> torch.Tensor:
+    """Goal orientation relative to the asset's root frame.
+    The quaternion is represented as (w, x, y, z). The real part is always positive.
+    """
+    # extract useful elements
+    asset: RigidObject = env.scene[asset_cfg.name]
+    command_term: InHandReOrientationCommand = env.command_manager.get_term(command_name)
+    # obtain the orientations
+    goal_quat_w = command_term.command[:, 3:7]
+    asset_quat_w = asset.data.root_quat_w
+    # compute quaternion difference
+    quat = math_utils.quat_mul(asset_quat_w, math_utils.quat_conjugate(goal_quat_w))
+    # make sure the quaternion real-part is always positive
+    return math_utils.quat_unique(quat) if make_quat_unique else quat
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/rewards.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/rewards.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""Functions specific to the in-hand dexterous manipulation environments."""
+import torch
+from typing import TYPE_CHECKING
+import omni.isaac.orbit.utils.math as math_utils
+from omni.isaac.orbit.assets import RigidObject
+from omni.isaac.orbit.envs import RLTaskEnv
+from omni.isaac.orbit.managers import SceneEntityCfg
+if TYPE_CHECKING:
+    from .commands import InHandReOrientationCommand
+def success_bonus(
+    env: RLTaskEnv, command_name: str, object_cfg: SceneEntityCfg = SceneEntityCfg("object")
+) -> torch.Tensor:
+    """Bonus reward for successfully reaching the goal.
+    The object is considered to have reached the goal when the object orientation is within the threshold.
+    The reward is 1.0 if the object has reached the goal, otherwise 0.0.
+    Args:
+        env: The environment object.
+        command_name: The command term to be used for extracting the goal.
+        object_cfg: The configuration for the scene entity. Default is "object".
+    """
+    # extract useful elements
+    asset: RigidObject = env.scene[object_cfg.name]
+    command_term: InHandReOrientationCommand = env.command_manager.get_term(command_name)
+    # obtain the goal orientation
+    goal_quat_w = command_term.command[:, 3:7]
+    # obtain the threshold for the orientation error
+    threshold = command_term.cfg.orientation_success_threshold
+    # calculate the orientation error
+    dtheta = math_utils.quat_error_magnitude(asset.data.root_quat_w, goal_quat_w)
+    return dtheta <= threshold
+def track_pos_l2(
+    env: RLTaskEnv, command_name: str, object_cfg: SceneEntityCfg = SceneEntityCfg("object")
+) -> torch.Tensor:
+    """Reward for tracking the object position using the L2 norm.
+    The reward is the distance between the object position and the goal position.
+    Args:
+        env: The environment object.
+        command_term: The command term to be used for extracting the goal.
+        object_cfg: The configuration for the scene entity. Default is "object".
+    """
+    # extract useful elements
+    asset: RigidObject = env.scene[object_cfg.name]
+    command_term: InHandReOrientationCommand = env.command_manager.get_term(command_name)
+    # obtain the goal position
+    goal_pos_e = command_term.command[:, 0:3]
+    # obtain the object position in the environment frame
+    object_pos_e = asset.data.root_pos_w - env.scene.env_origins
+    return torch.norm(goal_pos_e - object_pos_e, p=2, dim=-1)
+def track_orientation_inv_l2(
+    env: RLTaskEnv,
+    command_name: str,
+    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
+    rot_eps: float = 1e-3,
+) -> torch.Tensor:
+    """Reward for tracking the object orientation using the inverse of the orientation error.
+    The reward is the inverse of the orientation error between the object orientation and the goal orientation.
+    Args:
+        env: The environment object.
+        command_name: The command term to be used for extracting the goal.
+        object_cfg: The configuration for the scene entity. Default is "object".
+        rot_eps: The threshold for the orientation error. Default is 1e-3.
+    """
+    # extract useful elements
+    asset: RigidObject = env.scene[object_cfg.name]
+    command_term: InHandReOrientationCommand = env.command_manager.get_term(command_name)
+    # obtain the goal orientation
+    goal_quat_w = command_term.command[:, 3:7]
+    # calculate the orientation error
+    dtheta = math_utils.quat_error_magnitude(asset.data.root_quat_w, goal_quat_w)
+    return 1.0 / (dtheta + rot_eps)
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/terminations.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/inhand/mdp/terminations.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+"""Functions specific to the in-hand dexterous manipulation environments."""
+import torch
+from typing import TYPE_CHECKING
+from omni.isaac.orbit.envs import RLTaskEnv
+from omni.isaac.orbit.managers import SceneEntityCfg
+if TYPE_CHECKING:
+    from .commands import InHandReOrientationCommand
+def max_consecutive_success(env: RLTaskEnv, num_success: int, command_name: str) -> torch.Tensor:
+    """Check if the task has been completed consecutively for a certain number of times.
+    Args:
+        env: The environment object.
+        num_success: Threshold for the number of consecutive successes required.
+        command_name: The command term to be used for extracting the goal.
+    """
+    command_term: InHandReOrientationCommand = env.command_manager.get_term(command_name)
+    return command_term.metrics["consecutive_success"] >= num_success
+def object_away_from_goal(
+    env: RLTaskEnv,
+    threshold: float,
+    command_name: str,
+    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
+) -> torch.Tensor:
+    """Check if object has gone far from the goal.
+    The object is considered to be out-of-reach if the distance between the goal and the object is greater
+    than the threshold.
+    Args:
+        env: The environment object.
+        threshold: The threshold for the distance between the robot and the object.
+        command_name: The command term to be used for extracting the goal.
+        object_cfg: The configuration for the scene entity. Default is "object".
+    """
+    # extract useful elements
+    command_term: InHandReOrientationCommand = env.command_manager.get_term(command_name)
+    asset = env.scene[object_cfg.name]
+    # object pos
+    asset_pos_e = asset.data.root_pos_w - env.scene.env_origins
+    goal_pos_e = command_term.command[:, :3]
+    return torch.norm(asset_pos_e - goal_pos_e, p=2, dim=1) > threshold
+def object_away_from_robot(
+    env: RLTaskEnv,
+    threshold: float,
+    asset_cfg: SceneEntityCfg = SceneEntityCfg("robot"),
+    object_cfg: SceneEntityCfg = SceneEntityCfg("object"),
+) -> torch.Tensor:
+    """Check if object has gone far from the robot.
+    The object is considered to be out-of-reach if the distance between the robot and the object is greater
+    than the threshold.
+    Args:
+        env: The environment object.
+        threshold: The threshold for the distance between the robot and the object.
+        asset_cfg: The configuration for the robot entity. Default is "robot".
+        object_cfg: The configuration for the object entity. Default is "object".
+    """
+    # extract useful elements
+    robot = env.scene[asset_cfg.name]
+    object = env.scene[object_cfg.name]
+    # compute distance
+    dist = torch.norm(robot.data.root_pos_w - object.data.root_pos_w, dim=1)
+    return dist > threshold
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/importer.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/importer.py
@@ -12,7 +12,7 @@ import pkgutil
 import sys
-def import_packages(package_name: str, blacklist_pkgs: list[str] = None):
+def import_packages(package_name: str, blacklist_pkgs: list[str] | None = None):
    """Import all sub-packages in a package recursively.
    It is easier to use this function to import all sub-packages in a package recursively