Adds Franka cabinet opening environment from IsaacGymEnvs (#44)

# Description This MR adds the cabinet environment from IsaacGymEnvs. The training is checked for RSL-RL and RL-Games ## Type of change - New feature (non-breaking change which adds functionality) ## Checklist - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./orbit.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there --------- Co-authored-by: zrene <rene.zurbruegg@gmail.com>

Adds Franka cabinet opening environment from IsaacGymEnvs (#44)
# Description This MR adds the cabinet environment from IsaacGymEnvs. The training is checked for RSL-RL and RL-Games ## Type of change - New feature (non-breaking change which adds functionality) ## Checklist - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./orbit.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there --------- Co-authored-by: zrene <rene.zurbruegg@gmail.com>
e7506fea · Mayank Mittal · GitHub · 2dc138a5 · e7506fea · e7506fea
Unverified Commit e7506fea authored Mar 10, 2024 by Mayank Mittal Committed by GitHub Mar 10, 2024
31 changed files
--- a/docs/source/_static/tasks/manipulation/franka_open_drawer.jpg
+++ b/docs/source/_static/tasks/manipulation/franka_open_drawer.jpg
--- a/docs/source/features/environments.rst
+++ b/docs/source/features/environments.rst
@@ -45,6 +45,12 @@ Manipulation

 Environments based on fixed-arm manipulation tasks.

+For many of these tasks, we include configurations with different arm action spaces. For example,
+for the reach environment:
+
+* |lift-cube-link|: Franka arm with joint position control
+* |lift-cube-ik-abs-link|: Franka arm with absolute IK control
+* |lift-cube-ik-rel-link|: Franka arm with relative IK control

 .. table::
    :widths: 33 37 30
@@ -58,15 +64,20 @@ Environments based on fixed-arm manipulation tasks.
    +----------------+---------------------+-----------------------------------------------------------------------------+
    | |lift-cube|    | |lift-cube-link|    | Pick a cube and bring it to a sampled target position with the Franka robot |
    +----------------+---------------------+-----------------------------------------------------------------------------+
+    | |cabi-franka|  | |cabi-franka-link|  | Grasp the handle of a cabinet's drawer and open it with the Franka robot    |
+    +----------------+---------------------+-----------------------------------------------------------------------------+

 .. |reach-franka| image:: ../_static/tasks/manipulation/franka_reach.jpg
 .. |reach-ur10| image:: ../_static/tasks/manipulation/ur10_reach.jpg
 .. |lift-cube| image:: ../_static/tasks/manipulation/franka_lift.jpg
+.. |cabi-franka| image:: ../_static/tasks/manipulation/franka_open_drawer.jpg

-
-.. |reach-franka-link| replace:: `Isaac-Reach-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/env_cfg.py>`__
-.. |reach-ur10-link| replace:: `Isaac-Reach-UR10-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/env_cfg.py>`__
+.. |reach-franka-link| replace:: `Isaac-Reach-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/joint_pos_env_cfg.py>`__
+.. |reach-ur10-link| replace:: `Isaac-Reach-UR10-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/joint_pos_env_cfg.py>`__
 .. |lift-cube-link| replace:: `Isaac-Lift-Cube-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/config/franka/joint_pos_env_cfg.py>`__
+.. |lift-cube-ik-abs-link| replace:: `Isaac-Lift-Cube-Franka-IK-Abs-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/config/franka/ik_abs_env_cfg.py>`__
+.. |lift-cube-ik-rel-link| replace:: `Isaac-Lift-Cube-Franka-IK-Rel-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/config/franka/ik_rel_env_cfg.py>`__
+.. |cabi-franka-link| replace:: `Isaac-Open-Drawer-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/joint_pos_env_cfg.py>`__


 Locomotion

--- a/source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/ui/viewport_camera_controller.py
+++ b/source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/ui/viewport_camera_controller.py
@@ -65,6 +65,9 @@ class ViewportCameraController:
            # at each rendering step.
            if self.cfg.asset_name is None:
                raise ValueError(f"No asset name provided for viewer with origin type: '{self.cfg.origin_type}'.")
+        else:
+            # set the camera origin to the center of the world
+            self.update_view_to_world()

        # subscribe to post update event so that camera view can be updated at each rendering step
        app_interface = omni.kit.app.get_app_interface()

--- a/source/extensions/omni.isaac.orbit/omni/isaac/orbit/managers/manager_term_cfg.py
+++ b/source/extensions/omni.isaac.orbit/omni/isaac/orbit/managers/manager_term_cfg.py
@@ -103,12 +103,13 @@ class CommandTermCfg:
 class CurriculumTermCfg(ManagerTermBaseCfg):
    """Configuration for a curriculum term."""

-    func: Callable[..., float | dict[str, float]] = MISSING
+    func: Callable[..., float | dict[str, float] | None] = MISSING
    """The name of the function to be called.

    This function should take the environment object, environment indices
    and any other parameters as input and return the curriculum state for
-    logging purposes.
+    logging purposes. If the function returns None, the curriculum state
+    is not logged.
    """



--- a/source/extensions/omni.isaac.orbit_tasks/config/extension.toml
+++ b/source/extensions/omni.isaac.orbit_tasks/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.5.7"
+version = "0.6.0"

 # Description
 title = "ORBIT Environments"

--- a/source/extensions/omni.isaac.orbit_tasks/docs/CHANGELOG.rst
+++ b/source/extensions/omni.isaac.orbit_tasks/docs/CHANGELOG.rst
 Changelog
 ---------

+0.6.0 (2024-03-10)
+~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added a new environment ``Isaac-Open-Drawer-Franka-v0`` for the Franka arm to open a drawer. It is
+  based on the IsaacGymEnvs cabinet environment.
+
+Fixed
+^^^^^
+
+* Fixed logging of extra information for RL-Games wrapper. It expected the extra information to be under the
+  key ``"episode"``, but Orbit used the key ``"log"``. The wrapper now remaps the key to ``"episode"``.
+
+
 0.5.7 (2024-02-28)
 ~~~~~~~~~~~~~~~~~~


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""Manipulation environments to open drawers in a cabinet."""
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/agents/sb3_ppo_cfg.yaml
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/agents/sb3_ppo_cfg.yaml
-# Reference: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml#L32
-seed: 42
-
-# 512×1500×64
-n_timesteps: 49152000
-policy: 'MlpPolicy'
-n_steps: 64
-batch_size: 4096
-gae_lambda: 0.95
-gamma: 0.99
-n_epochs: 20
-ent_coef: 0.0
-learning_rate: !!float 5e-4
-clip_range: 0.2
-policy_kwargs: "dict(
-                  log_std_init=-2,
-                  ortho_init=False,
-                  activation_fn=nn.ELU,
-                  net_arch=[256, 128, 64]
-                )"
-
-
-target_kl: 0.008
-max_grad_norm: 1.0
-
-# Uses VecNormalize class to normalize obs
-normalize_input: True
-# Uses VecNormalize class to normalize rew
-normalize_value: True
-clip_obs: 5.0
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/cabinet_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/cabinet_env_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Copyright (c) 2022-2023, The ORBIT Project Developers.
+# All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+
+from __future__ import annotations
+
+from dataclasses import MISSING
+
+import omni.isaac.orbit.sim as sim_utils
+from omni.isaac.orbit.actuators.actuator_cfg import ImplicitActuatorCfg
+from omni.isaac.orbit.assets import ArticulationCfg, AssetBaseCfg
+from omni.isaac.orbit.envs import RLTaskEnvCfg
+from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
+from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
+from omni.isaac.orbit.managers import RandomizationTermCfg as RandTerm
+from omni.isaac.orbit.managers import RewardTermCfg as RewTerm
+from omni.isaac.orbit.managers import SceneEntityCfg
+from omni.isaac.orbit.managers import TerminationTermCfg as DoneTerm
+from omni.isaac.orbit.scene import InteractiveSceneCfg
+from omni.isaac.orbit.sensors import FrameTransformerCfg
+from omni.isaac.orbit.sensors.frame_transformer import OffsetCfg
+from omni.isaac.orbit.utils import configclass
+from omni.isaac.orbit.utils.assets import ISAAC_NUCLEUS_DIR
+
+from . import mdp
+
+##
+# Pre-defined configs
+##
+from omni.isaac.orbit.markers.config import FRAME_MARKER_CFG  # isort: skip
+
+
+FRAME_MARKER_SMALL_CFG = FRAME_MARKER_CFG.copy()
+FRAME_MARKER_SMALL_CFG.markers["frame"].scale = (0.10, 0.10, 0.10)
+
+
+##
+# Scene definition
+##
+
+
+@configclass
+class CabinetSceneCfg(InteractiveSceneCfg):
+    """Configuration for the cabinet scene with a robot and a cabinet.
+
+    This is the abstract base implementation, the exact scene is defined in the derived classes
+    which need to set the robot and end-effector frames
+    """
+
+    # robots, Will be populated by agent env cfg
+    robot: ArticulationCfg = MISSING
+    # End-effector, Will be populated by agent env cfg
+    ee_frame: FrameTransformerCfg = MISSING
+
+    cabinet = ArticulationCfg(
+        prim_path="{ENV_REGEX_NS}/Cabinet",
+        spawn=sim_utils.UsdFileCfg(
+            usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/Sektion_Cabinet/sektion_cabinet_instanceable.usd",
+            activate_contact_sensors=False,
+        ),
+        init_state=ArticulationCfg.InitialStateCfg(
+            pos=(0.8, 0, 0.4),
+            rot=(0.0, 0.0, 0.0, 1.0),
+            joint_pos={
+                "door_left_joint": 0.0,
+                "door_right_joint": 0.0,
+                "drawer_bottom_joint": 0.0,
+                "drawer_top_joint": 0.0,
+            },
+        ),
+        actuators={
+            "drawers": ImplicitActuatorCfg(
+                joint_names_expr=["drawer_top_joint", "drawer_bottom_joint"],
+                effort_limit=87.0,
+                velocity_limit=100.0,
+                stiffness=10.0,
+                damping=1.0,
+            ),
+            "doors": ImplicitActuatorCfg(
+                joint_names_expr=["door_left_joint", "door_right_joint"],
+                effort_limit=87.0,
+                velocity_limit=100.0,
+                stiffness=10.0,
+                damping=2.5,
+            ),
+        },
+    )
+
+    # Frame definitions for the cabinet.
+    cabinet_frame = FrameTransformerCfg(
+        prim_path="{ENV_REGEX_NS}/Cabinet/sektion",
+        debug_vis=True,
+        visualizer_cfg=FRAME_MARKER_SMALL_CFG.replace(prim_path="/Visuals/CabinetFrameTransformer"),
+        target_frames=[
+            FrameTransformerCfg.FrameCfg(
+                prim_path="{ENV_REGEX_NS}/Cabinet/drawer_handle_top",
+                name="drawer_handle_top",
+                offset=OffsetCfg(
+                    pos=(0.305, 0.0, 0.01),
+                    rot=(0.5, 0.5, -0.5, -0.5),  # align with end-effector frame
+                ),
+            ),
+        ],
+    )
+
+    # plane
+    plane = AssetBaseCfg(
+        prim_path="/World/GroundPlane",
+        init_state=AssetBaseCfg.InitialStateCfg(),
+        spawn=sim_utils.GroundPlaneCfg(),
+        collision_group=-1,
+    )
+
+    # lights
+    light = AssetBaseCfg(
+        prim_path="/World/light",
+        spawn=sim_utils.DomeLightCfg(color=(0.75, 0.75, 0.75), intensity=3000.0),
+    )
+
+
+##
+# MDP settings
+##
+
+
+@configclass
+class CommandsCfg:
+    """Command terms for the MDP."""
+
+    null_command = mdp.NullCommandCfg()
+
+
+@configclass
+class ActionsCfg:
+    """Action specifications for the MDP."""
+
+    body_joint_pos: mdp.JointPositionActionCfg = MISSING
+    finger_joint_pos: mdp.BinaryJointPositionActionCfg = MISSING
+
+
+@configclass
+class ObservationsCfg:
+    """Observation specifications for the MDP."""
+
+    @configclass
+    class PolicyCfg(ObsGroup):
+        """Observations for policy group."""
+
+        joint_pos = ObsTerm(func=mdp.joint_pos_rel)
+        joint_vel = ObsTerm(func=mdp.joint_vel_rel)
+        cabinet_joint_pos = ObsTerm(
+            func=mdp.joint_pos_rel,
+            params={"asset_cfg": SceneEntityCfg("cabinet", joint_names=["drawer_top_joint"])},
+        )
+        cabinet_joint_vel = ObsTerm(
+            func=mdp.joint_vel_rel,
+            params={"asset_cfg": SceneEntityCfg("cabinet", joint_names=["drawer_top_joint"])},
+        )
+        rel_ee_drawer_distance = ObsTerm(func=mdp.rel_ee_drawer_distance)
+
+        actions = ObsTerm(func=mdp.last_action)
+
+        def __post_init__(self):
+            self.enable_corruption = True
+            self.concatenate_terms = True
+
+    # observation groups
+    policy: PolicyCfg = PolicyCfg()
+
+
+@configclass
+class RandomizationCfg:
+    """Configuration for randomization."""
+
+    robot_physics_material = RandTerm(
+        func=mdp.randomize_rigid_body_material,
+        mode="startup",
+        params={
+            "asset_cfg": SceneEntityCfg("robot", body_names=".*"),
+            "static_friction_range": (0.8, 1.25),
+            "dynamic_friction_range": (0.8, 1.25),
+            "restitution_range": (0.0, 0.0),
+            "num_buckets": 16,
+        },
+    )
+
+    cabinet_physics_material = RandTerm(
+        func=mdp.randomize_rigid_body_material,
+        mode="startup",
+        params={
+            "asset_cfg": SceneEntityCfg("cabinet", body_names="drawer_handle_top"),
+            "static_friction_range": (1.0, 1.25),
+            "dynamic_friction_range": (1.25, 1.5),
+            "restitution_range": (0.0, 0.0),
+            "num_buckets": 16,
+        },
+    )
+
+    reset_all = RandTerm(func=mdp.reset_scene_to_default, mode="reset")
+
+    reset_robot_joints = RandTerm(
+        func=mdp.reset_joints_by_offset,
+        mode="reset",
+        params={
+            "position_range": (-0.1, 0.1),
+            "velocity_range": (0.0, 0.0),
+        },
+    )
+
+
+@configclass
+class RewardsCfg:
+    """Reward terms for the MDP."""
+
+    # 1. Approach the handle
+    approach_ee_handle = RewTerm(func=mdp.approach_ee_handle, weight=2.0, params={"threshold": 0.2})
+    align_ee_handle = RewTerm(func=mdp.align_ee_handle, weight=0.5)
+
+    # 2. Grasp the handle
+    approach_gripper_handle = RewTerm(func=mdp.approach_gripper_handle, weight=5.0, params={"offset": MISSING})
+    align_grasp_around_handle = RewTerm(func=mdp.align_grasp_around_handle, weight=0.125)
+    grasp_handle = RewTerm(
+        func=mdp.grasp_handle,
+        weight=0.5,
+        params={
+            "threshold": 0.03,
+            "open_joint_pos": MISSING,
+            "asset_cfg": SceneEntityCfg("robot", joint_names=MISSING),
+        },
+    )
+
+    # 3. Open the drawer
+    open_drawer_bonus = RewTerm(
+        func=mdp.open_drawer_bonus,
+        weight=7.5,
+        params={"asset_cfg": SceneEntityCfg("cabinet", joint_names=["drawer_top_joint"])},
+    )
+    multi_stage_open_drawer = RewTerm(
+        func=mdp.multi_stage_open_drawer,
+        weight=1.0,
+        params={"asset_cfg": SceneEntityCfg("cabinet", joint_names=["drawer_top_joint"])},
+    )
+
+    # 4. Penalize actions for cosmetic reasons
+    action_rate_l2 = RewTerm(func=mdp.action_rate_l2, weight=-1e-2)
+    joint_vel = RewTerm(func=mdp.joint_vel_l2, weight=-0.0001)
+
+
+@configclass
+class TerminationsCfg:
+    """Termination terms for the MDP."""
+
+    time_out = DoneTerm(func=mdp.time_out, time_out=True)
+
+
+##
+# Environment configuration
+##
+
+
+@configclass
+class CabinetEnvCfg(RLTaskEnvCfg):
+    """Configuration for the cabinet environment."""
+
+    # Scene settings
+    scene: CabinetSceneCfg = CabinetSceneCfg(num_envs=4096, env_spacing=2.0)
+    # Basic settings
+    observations: ObservationsCfg = ObservationsCfg()
+    actions: ActionsCfg = ActionsCfg()
+    commands: CommandsCfg = CommandsCfg()
+    # MDP settings
+    rewards: RewardsCfg = RewardsCfg()
+    terminations: TerminationsCfg = TerminationsCfg()
+    randomization: RandomizationCfg = RandomizationCfg()
+
+    def __post_init__(self):
+        """Post initialization."""
+        # general settings
+        self.decimation = 1
+        self.episode_length_s = 8.0
+        self.viewer.eye = (-2.0, 2.0, 2.0)
+        self.viewer.lookat = (0.8, 0.0, 0.5)
+        # simulation settings
+        self.sim.dt = 1 / 60  # 60Hz
+        self.sim.physx.bounce_threshold_velocity = 0.2
+        self.sim.physx.bounce_threshold_velocity = 0.01
+        self.sim.physx.friction_correlation_distance = 0.00625
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""Configurations for the cabinet environments."""
+
+# We leave this file empty since we don't want to expose any configs in this package directly.
+# We still need this file to import the "config" module in the parent package.
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+
+import gymnasium as gym
+
+from . import agents, ik_abs_env_cfg, ik_rel_env_cfg, joint_pos_env_cfg
+
+##
+# Register Gym environments.
+##
+
+##
+# Joint Position Control
+##
+
+gym.register(
+    id="Isaac-Open-Drawer-Franka-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": joint_pos_env_cfg.FrankaCabinetEnvCfg,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
+
+gym.register(
+    id="Isaac-Open-Drawer-Franka-Play-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": joint_pos_env_cfg.FrankaCabinetEnvCfg_PLAY,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
+
+
+##
+# Inverse Kinematics - Absolute Pose Control
+##
+
+gym.register(
+    id="Isaac-Open-Drawer-Franka-IK-Abs-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": ik_abs_env_cfg.FrankaCabinetEnvCfg,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
+
+gym.register(
+    id="Isaac-Open-Drawer-Franka-IK-Abs-Play-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": ik_abs_env_cfg.FrankaCabinetEnvCfg_PLAY,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
+
+##
+# Inverse Kinematics - Relative Pose Control
+##
+
+gym.register(
+    id="Isaac-Open-Drawer-Franka-IK-Rel-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": ik_rel_env_cfg.FrankaCabinetEnvCfg,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
+
+gym.register(
+    id="Isaac-Open-Drawer-Franka-IK-Rel-Play-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": ik_rel_env_cfg.FrankaCabinetEnvCfg_PLAY,
+        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/agents/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/agents/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from . import rsl_rl_cfg  # noqa: F401, F403
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/agents/rl_games_ppo_cfg.yaml
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/agents/rl_games_ppo_cfg.yaml
@@ -40,36 +40,37 @@ params:
  load_path: ''

  config:
-    name: cabinet
+    name: franka_open_drawer
    env_name: rlgpu
    device: 'cuda:0'
    device_name: 'cuda:0'
+    multi_gpu: False
    ppo: True
    mixed_precision: False
-    normalize_input: True
-    normalize_value: True
-    num_actors: -1
+    normalize_input: False
+    normalize_value: False
+    num_actors: -1  # configured from the script (based on num_envs)
    reward_shaper:
      scale_value: 1
-    normalize_advantage: True
+    normalize_advantage: False
    gamma: 0.99
    tau: 0.95
    learning_rate: 5e-4
    lr_schedule: adaptive
    kl_threshold: 0.008
-    score_to_win: 100000000
-    max_epochs: 300 #300
-    save_best_after: 100
-    save_frequency: 100
+    score_to_win: 200
+    max_epochs: 400
+    save_best_after: 50
+    save_frequency: 50
    print_stats: True
    grad_norm: 1.0
-    entropy_coef: 0.0
+    entropy_coef: 0.001
    truncate_grads: True
    e_clip: 0.2
-    horizon_length: 32 #64
+    horizon_length: 96
    minibatch_size: 4096
-    mini_epochs: 8
+    mini_epochs: 5
    critic_coef: 4
    clip_value: True
-    seq_len: 4
+    seq_length: 4
    bounds_loss_coef: 0.0001
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/agents/rsl_rl_ppo_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/agents/rsl_rl_ppo_cfg.py
@@ -3,7 +3,7 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause

-# TODO: be moved once CabinetEnv exists
+from omni.isaac.orbit.utils import configclass

 from omni.isaac.orbit_tasks.utils.wrappers.rsl_rl import (
    RslRlOnPolicyRunnerCfg,
@@ -11,34 +11,31 @@ from omni.isaac.orbit_tasks.utils.wrappers.rsl_rl import (
    RslRlPpoAlgorithmCfg,
 )

-CABINET_RSL_RL_PPO_CFG = RslRlOnPolicyRunnerCfg(
-    num_steps_per_env=64,
-    max_iterations=200,
-    save_interval=50,
-    experiment_name="cabinet",
-    run_name="",
-    resume=False,
-    load_run=-1,
-    load_checkpoint=-1,
-    empirical_normalization=False,
-    policy=RslRlPpoActorCriticCfg(
+
+@configclass
+class CabinetPPORunnerCfg(RslRlOnPolicyRunnerCfg):
+    num_steps_per_env = 96
+    max_iterations = 400
+    save_interval = 50
+    experiment_name = "franka_open_drawer"
+    empirical_normalization = False
+    policy = RslRlPpoActorCriticCfg(
        init_noise_std=1.0,
        actor_hidden_dims=[256, 128, 64],
        critic_hidden_dims=[256, 128, 64],
        activation="elu",
-    ),
-    algorithm=RslRlPpoAlgorithmCfg(
+    )
+    algorithm = RslRlPpoAlgorithmCfg(
        value_loss_coef=1.0,
        use_clipped_value_loss=True,
        clip_param=0.2,
-        entropy_coef=0.00,
-        num_learning_epochs=8,
-        num_mini_batches=8,
+        entropy_coef=1e-3,
+        num_learning_epochs=5,
+        num_mini_batches=4,
        learning_rate=5.0e-4,
        schedule="adaptive",
        gamma=0.99,
        lam=0.95,
-        desired_kl=0.008,
+        desired_kl=0.02,
        max_grad_norm=1.0,
-    ),
-)
+    )
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/ik_abs_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/ik_abs_env_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from omni.isaac.orbit.controllers.differential_ik_cfg import DifferentialIKControllerCfg
+from omni.isaac.orbit.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
+from omni.isaac.orbit.utils import configclass
+
+from . import joint_pos_env_cfg
+
+##
+# Pre-defined configs
+##
+from omni.isaac.orbit_assets.franka import FRANKA_PANDA_HIGH_PD_CFG  # isort: skip
+
+
+@configclass
+class FrankaCabinetEnvCfg(joint_pos_env_cfg.FrankaCabinetEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+
+        # Set Franka as robot
+        # We switch here to a stiffer PD controller for IK tracking to be better.
+        self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
+
+        # Set actions for the specific robot type (franka)
+        self.actions.body_joint_pos = DifferentialInverseKinematicsActionCfg(
+            asset_name="robot",
+            joint_names=["panda_joint.*"],
+            body_name="panda_hand",
+            controller=DifferentialIKControllerCfg(command_type="pose", use_relative_mode=False, ik_method="dls"),
+            body_offset=DifferentialInverseKinematicsActionCfg.OffsetCfg(pos=[0.0, 0.0, 0.107]),
+        )
+
+
+@configclass
+class FrankaCabinetEnvCfg_PLAY(FrankaCabinetEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # make a smaller scene for play
+        self.scene.num_envs = 50
+        self.scene.env_spacing = 2.5
+        # disable randomization for play
+        self.observations.policy.enable_corruption = False
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/ik_rel_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/ik_rel_env_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from omni.isaac.orbit.controllers.differential_ik_cfg import DifferentialIKControllerCfg
+from omni.isaac.orbit.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
+from omni.isaac.orbit.utils import configclass
+
+from . import joint_pos_env_cfg
+
+##
+# Pre-defined configs
+##
+from omni.isaac.orbit_assets.franka import FRANKA_PANDA_HIGH_PD_CFG  # isort: skip
+
+
+@configclass
+class FrankaCabinetEnvCfg(joint_pos_env_cfg.FrankaCabinetEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+
+        # Set Franka as robot
+        # We switch here to a stiffer PD controller for IK tracking to be better.
+        self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
+
+        # Set actions for the specific robot type (franka)
+        self.actions.body_joint_pos = DifferentialInverseKinematicsActionCfg(
+            asset_name="robot",
+            joint_names=["panda_joint.*"],
+            body_name="panda_hand",
+            controller=DifferentialIKControllerCfg(command_type="pose", use_relative_mode=True, ik_method="dls"),
+            scale=0.5,
+            body_offset=DifferentialInverseKinematicsActionCfg.OffsetCfg(pos=[0.0, 0.0, 0.107]),
+        )
+
+
+@configclass
+class FrankaCabinetEnvCfg_PLAY(FrankaCabinetEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # make a smaller scene for play
+        self.scene.num_envs = 50
+        self.scene.env_spacing = 2.5
+        # disable randomization for play
+        self.observations.policy.enable_corruption = False
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/joint_pos_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/joint_pos_env_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from omni.isaac.orbit.sensors import FrameTransformerCfg
+from omni.isaac.orbit.sensors.frame_transformer.frame_transformer_cfg import OffsetCfg
+from omni.isaac.orbit.utils import configclass
+
+from omni.isaac.orbit_tasks.manipulation.cabinet import mdp
+from omni.isaac.orbit_tasks.manipulation.cabinet.cabinet_env_cfg import CabinetEnvCfg
+
+##
+# Pre-defined configs
+##
+from omni.isaac.orbit_assets.franka import FRANKA_PANDA_CFG  # isort: skip
+from omni.isaac.orbit_tasks.manipulation.cabinet.cabinet_env_cfg import FRAME_MARKER_SMALL_CFG  # isort: skip
+
+
+@configclass
+class FrankaCabinetEnvCfg(CabinetEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+
+        # Set franka as robot
+        self.scene.robot = FRANKA_PANDA_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
+
+        # Set Actions for the specific robot type (franka)
+        self.actions.body_joint_pos = mdp.JointPositionActionCfg(
+            asset_name="robot",
+            joint_names=["panda_joint.*"],
+            scale=1.0,
+            use_default_offset=True,
+        )
+        self.actions.finger_joint_pos = mdp.BinaryJointPositionActionCfg(
+            asset_name="robot",
+            joint_names=["panda_finger.*"],
+            open_command_expr={"panda_finger_.*": 0.04},
+            close_command_expr={"panda_finger_.*": 0.0},
+        )
+
+        # Listens to the required transforms
+        # IMPORTANT: The order of the frames in the list is important. The first frame is the tool center point (TCP)
+        # the other frames are the fingers
+        self.scene.ee_frame = FrameTransformerCfg(
+            prim_path="{ENV_REGEX_NS}/Robot/panda_link0",
+            debug_vis=False,
+            visualizer_cfg=FRAME_MARKER_SMALL_CFG.replace(prim_path="/Visuals/EndEffectorFrameTransformer"),
+            target_frames=[
+                FrameTransformerCfg.FrameCfg(
+                    prim_path="{ENV_REGEX_NS}/Robot/panda_hand",
+                    name="ee_tcp",
+                    offset=OffsetCfg(
+                        pos=(0.0, 0.0, 0.1034),
+                    ),
+                ),
+                FrameTransformerCfg.FrameCfg(
+                    prim_path="{ENV_REGEX_NS}/Robot/panda_leftfinger",
+                    name="tool_leftfinger",
+                    offset=OffsetCfg(
+                        pos=(0.0, 0.0, 0.046),
+                    ),
+                ),
+                FrameTransformerCfg.FrameCfg(
+                    prim_path="{ENV_REGEX_NS}/Robot/panda_rightfinger",
+                    name="tool_rightfinger",
+                    offset=OffsetCfg(
+                        pos=(0.0, 0.0, 0.046),
+                    ),
+                ),
+            ],
+        )
+
+        # override rewards
+        self.rewards.approach_gripper_handle.params["offset"] = 0.04
+        self.rewards.grasp_handle.params["open_joint_pos"] = 0.04
+        self.rewards.grasp_handle.params["asset_cfg"].joint_names = ["panda_finger_.*"]
+
+
+@configclass
+class FrankaCabinetEnvCfg_PLAY(FrankaCabinetEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # make a smaller scene for play
+        self.scene.num_envs = 50
+        self.scene.env_spacing = 2.5
+        # disable randomization for play
+        self.observations.policy.enable_corruption = False
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/mdp/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/mdp/__init__.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""This sub-module contains the functions that are specific to the cabinet environments."""
+
+from omni.isaac.orbit.envs.mdp import *  # noqa: F401, F403
+
+from .observations import *  # noqa: F401, F403
+from .rewards import *  # noqa: F401, F403
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/mdp/observations.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/mdp/observations.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from __future__ import annotations
+
+import torch
+from typing import TYPE_CHECKING
+
+from omni.isaac.orbit.assets import ArticulationData
+from omni.isaac.orbit.sensors import FrameTransformerData
+
+if TYPE_CHECKING:
+    from omni.isaac.orbit.envs import RLTaskEnv
+
+
+def rel_ee_object_distance(env: RLTaskEnv) -> torch.Tensor:
+    """The distance between the end-effector and the object."""
+    ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
+    object_data: ArticulationData = env.scene["object"].data
+
+    return object_data.root_pos_w - ee_tf_data.target_pos_w[..., 0, :]
+
+
+def rel_ee_drawer_distance(env: RLTaskEnv) -> torch.Tensor:
+    """The distance between the end-effector and the object."""
+    ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
+    cabinet_tf_data: FrameTransformerData = env.scene["cabinet_frame"].data
+
+    return cabinet_tf_data.target_pos_w[..., 0, :] - ee_tf_data.target_pos_w[..., 0, :]
+
+
+def fingertips_pos(env: RLTaskEnv) -> torch.Tensor:
+    """The position of the fingertips relative to the environment origins."""
+    ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
+    fingertips_pos = ee_tf_data.target_pos_w[..., 1:, :] - env.scene.env_origins.unsqueeze(1)
+
+    return fingertips_pos.view(env.num_envs, -1)
+
+
+def ee_pos(env: RLTaskEnv) -> torch.Tensor:
+    """The position of the end-effector relative to the environment origins."""
+    ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
+    ee_pos = ee_tf_data.target_pos_w[..., 0, :] - env.scene.env_origins
+
+    return ee_pos
+
+
+def ee_quat(env: RLTaskEnv) -> torch.Tensor:
+    """The orientation of the end-effector in the environment frame."""
+    ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
+    ee_quat = ee_tf_data.target_quat_w[..., 0, :]
+    # make first element of quaternion positive
+    ee_quat[ee_quat[:, 0] < 0] *= -1
+
+    return ee_quat
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/mdp/rewards.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/mdp/rewards.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from __future__ import annotations
+
+import torch
+from typing import TYPE_CHECKING
+
+from omni.isaac.orbit.managers import SceneEntityCfg
+from omni.isaac.orbit.utils.math import matrix_from_quat
+
+if TYPE_CHECKING:
+    from omni.isaac.orbit.envs import RLTaskEnv
+
+
+def approach_ee_handle(env: RLTaskEnv, threshold: float) -> torch.Tensor:
+    r"""Reward the robot for reaching the drawer handle using inverse-square law.
+
+    It uses a piecewise function to reward the robot for reaching the handle.
+
+    .. math::
+
+        reward = \begin{cases}
+            2 * (1 / (1 + distance^2))^2 & \text{if } distance \leq threshold \\
+            (1 / (1 + distance^2))^2 & \text{otherwise}
+        \end{cases}
+
+    """
+    ee_tcp_pos = env.scene["ee_frame"].data.target_pos_w[..., 0, :]
+    handle_pos = env.scene["cabinet_frame"].data.target_pos_w[..., 0, :]
+
+    # Compute the distance of the end-effector to the handle
+    distance = torch.norm(handle_pos - ee_tcp_pos, dim=-1, p=2)
+
+    # Reward the robot for reaching the handle
+    reward = 1.0 / (1.0 + distance**2)
+    reward = torch.pow(reward, 2)
+    return torch.where(distance <= threshold, 2 * reward, reward)
+
+
+def align_ee_handle(env: RLTaskEnv) -> torch.Tensor:
+    """Reward for aligning the end-effector with the handle.
+
+    The reward is based on the alignment of the gripper with the handle. It is computed as follows:
+
+    .. math::
+
+        reward = 0.5 * (align_z^2 + align_x^2)
+
+    where :math:`align_z` is the dot product of the z direction of the gripper and the -x direction of the handle
+    and :math:`align_x` is the dot product of the x direction of the gripper and the -y direction of the handle.
+    """
+    ee_tcp_quat = env.scene["ee_frame"].data.target_quat_w[..., 0, :]
+    handle_quat = env.scene["cabinet_frame"].data.target_quat_w[..., 0, :]
+
+    ee_tcp_rot_mat = matrix_from_quat(ee_tcp_quat)
+    handle_mat = matrix_from_quat(handle_quat)
+
+    # get current x and y direction of the handle
+    handle_x, handle_y = handle_mat[..., 0], handle_mat[..., 1]
+    # get current x and z direction of the gripper
+    ee_tcp_x, ee_tcp_z = ee_tcp_rot_mat[..., 0], ee_tcp_rot_mat[..., 2]
+
+    # make sure gripper aligns with the handle
+    # in this case, the z direction of the gripper should be close to the -x direction of the handle
+    # and the x direction of the gripper should be close to the -y direction of the handle
+    # dot product of z and x should be large
+    align_z = torch.bmm(ee_tcp_z.unsqueeze(1), -handle_x.unsqueeze(-1)).squeeze(-1).squeeze(-1)
+    align_x = torch.bmm(ee_tcp_x.unsqueeze(1), -handle_y.unsqueeze(-1)).squeeze(-1).squeeze(-1)
+    return 0.5 * (torch.sign(align_z) * align_z**2 + torch.sign(align_x) * align_x**2)
+
+
+def align_grasp_around_handle(env: RLTaskEnv) -> torch.Tensor:
+    """Bonus for correct hand orientation around the handle.
+
+    The correct hand orientation is when the left finger is above the handle and the right finger is below the handle.
+    """
+    # Target object position: (num_envs, 3)
+    handle_pos = env.scene["cabinet_frame"].data.target_pos_w[..., 0, :]
+    # Fingertips position: (num_envs, n_fingertips, 3)
+    ee_fingertips_w = env.scene["ee_frame"].data.target_pos_w[..., 1:, :]
+    lfinger_pos = ee_fingertips_w[..., 0, :]
+    rfinger_pos = ee_fingertips_w[..., 1, :]
+
+    # Check if hand is in a graspable pose
+    is_graspable = (rfinger_pos[:, 2] < handle_pos[:, 2]) & (lfinger_pos[:, 2] > handle_pos[:, 2])
+
+    # bonus if left finger is above the drawer handle and right below
+    return is_graspable
+
+
+def approach_gripper_handle(env: RLTaskEnv, offset: float = 0.04) -> torch.Tensor:
+    """Reward the robot's gripper reaching the drawer handle with the right pose.
+
+    This function returns the distance of fingertips to the handle when the fingers are in a grasping orientation
+    (i.e., the left finger is above the handle and the right finger is below the handle). Otherwise, it returns zero.
+    """
+    # Target object position: (num_envs, 3)
+    handle_pos = env.scene["cabinet_frame"].data.target_pos_w[..., 0, :]
+    # Fingertips position: (num_envs, n_fingertips, 3)
+    ee_fingertips_w = env.scene["ee_frame"].data.target_pos_w[..., 1:, :]
+    lfinger_pos = ee_fingertips_w[..., 0, :]
+    rfinger_pos = ee_fingertips_w[..., 1, :]
+
+    # Compute the distance of each finger from the handle
+    lfinger_dist = torch.abs(lfinger_pos[:, 2] - handle_pos[:, 2])
+    rfinger_dist = torch.abs(rfinger_pos[:, 2] - handle_pos[:, 2])
+
+    # Check if hand is in a graspable pose
+    is_graspable = (rfinger_pos[:, 2] < handle_pos[:, 2]) & (lfinger_pos[:, 2] > handle_pos[:, 2])
+
+    return is_graspable * ((offset - lfinger_dist) + (offset - rfinger_dist))
+
+
+def grasp_handle(env: RLTaskEnv, threshold: float, open_joint_pos: float, asset_cfg: SceneEntityCfg) -> torch.Tensor:
+    """Reward for closing the fingers when being close to the handle.
+
+    The :attr:`threshold` is the distance from the handle at which the fingers should be closed.
+    The :attr:`open_joint_pos` is the joint position when the fingers are open.
+
+    Note:
+        It is assumed that zero joint position corresponds to the fingers being closed.
+    """
+    ee_tcp_pos = env.scene["ee_frame"].data.target_pos_w[..., 0, :]
+    handle_pos = env.scene["cabinet_frame"].data.target_pos_w[..., 0, :]
+    gripper_joint_pos = env.scene[asset_cfg.name].data.joint_pos[:, asset_cfg.joint_ids]
+
+    distance = torch.norm(handle_pos - ee_tcp_pos, dim=-1, p=2)
+    is_close = distance <= threshold
+
+    return is_close * torch.sum(open_joint_pos - gripper_joint_pos, dim=-1)
+
+
+def open_drawer_bonus(env: RLTaskEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
+    """Bonus for opening the drawer given by the joint position of the drawer.
+
+    The bonus is given when the drawer is open. If the grasp is around the handle, the bonus is doubled.
+    """
+    drawer_pos = env.scene[asset_cfg.name].data.joint_pos[:, asset_cfg.joint_ids[0]]
+    is_graspable = align_grasp_around_handle(env).float()
+
+    return (is_graspable + 1.0) * drawer_pos
+
+
+def multi_stage_open_drawer(env: RLTaskEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
+    """Multi-stage bonus for opening the drawer.
+
+    Depending on the drawer's position, the reward is given in three stages: easy, medium, and hard.
+    This helps the agent to learn to open the drawer in a controlled manner.
+    """
+    drawer_pos = env.scene[asset_cfg.name].data.joint_pos[:, asset_cfg.joint_ids[0]]
+    is_graspable = align_grasp_around_handle(env).float()
+
+    open_easy = (drawer_pos > 0.01) * 0.5
+    open_medium = (drawer_pos > 0.2) * is_graspable
+    open_hard = (drawer_pos > 0.3) * is_graspable
+
+    return open_easy + open_medium + open_hard
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/lift_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/lift_env_cfg.py
@@ -198,7 +198,7 @@ class LiftEnvCfg(RLTaskEnvCfg):
    """Configuration for the lifting environment."""

    # Scene settings
-    scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5, replicate_physics=False)
+    scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5)
    # Basic settings
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/__init__.py
@@ -5,18 +5,22 @@

 import gymnasium as gym

-from . import agents, env_cfg
+from . import agents, ik_abs_env_cfg, ik_rel_env_cfg, joint_pos_env_cfg

 ##
 # Register Gym environments.
 ##

+##
+# Joint Position Control
+##
+
 gym.register(
    id="Isaac-Reach-Franka-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
    disable_env_checker=True,
    kwargs={
-        "env_cfg_entry_point": env_cfg.FrankaReachEnvCfg,
+        "env_cfg_entry_point": joint_pos_env_cfg.FrankaReachEnvCfg,
        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
@@ -28,9 +32,66 @@ gym.register(
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
    disable_env_checker=True,
    kwargs={
-        "env_cfg_entry_point": env_cfg.FrankaReachEnvCfg_PLAY,
+        "env_cfg_entry_point": joint_pos_env_cfg.FrankaReachEnvCfg_PLAY,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
+    },
+)
+
+
+##
+# Inverse Kinematics - Absolute Pose Control
+##
+
+gym.register(
+    id="Isaac-Reach-Franka-IK-Abs-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": ik_abs_env_cfg.FrankaReachEnvCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
+
+gym.register(
+    id="Isaac-Reach-Franka-IK-Abs-Play-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": ik_abs_env_cfg.FrankaReachEnvCfg_PLAY,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
+
+##
+# Inverse Kinematics - Relative Pose Control
+##
+
+gym.register(
+    id="Isaac-Reach-Franka-IK-Rel-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": ik_rel_env_cfg.FrankaReachEnvCfg,
+        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
+        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
+    },
+    disable_env_checker=True,
+)
+
+gym.register(
+    id="Isaac-Reach-Franka-IK-Rel-Play-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    kwargs={
+        "env_cfg_entry_point": ik_rel_env_cfg.FrankaReachEnvCfg_PLAY,
        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
    },
+    disable_env_checker=True,
 )
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/ik_abs_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/ik_abs_env_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from omni.isaac.orbit.controllers.differential_ik_cfg import DifferentialIKControllerCfg
+from omni.isaac.orbit.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
+from omni.isaac.orbit.utils import configclass
+
+from . import joint_pos_env_cfg
+
+##
+# Pre-defined configs
+##
+from omni.isaac.orbit_assets.franka import FRANKA_PANDA_HIGH_PD_CFG  # isort: skip
+
+
+@configclass
+class FrankaReachEnvCfg(joint_pos_env_cfg.FrankaReachEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+
+        # Set Franka as robot
+        # We switch here to a stiffer PD controller for IK tracking to be better.
+        self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
+
+        # Set actions for the specific robot type (franka)
+        self.actions.body_joint_pos = DifferentialInverseKinematicsActionCfg(
+            asset_name="robot",
+            joint_names=["panda_joint.*"],
+            body_name="panda_hand",
+            controller=DifferentialIKControllerCfg(command_type="pose", use_relative_mode=False, ik_method="dls"),
+            body_offset=DifferentialInverseKinematicsActionCfg.OffsetCfg(pos=[0.0, 0.0, 0.107]),
+        )
+
+
+@configclass
+class FrankaReachEnvCfg_PLAY(FrankaReachEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # make a smaller scene for play
+        self.scene.num_envs = 50
+        self.scene.env_spacing = 2.5
+        # disable randomization for play
+        self.observations.policy.enable_corruption = False
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/ik_rel_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/ik_rel_env_cfg.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from omni.isaac.orbit.controllers.differential_ik_cfg import DifferentialIKControllerCfg
+from omni.isaac.orbit.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
+from omni.isaac.orbit.utils import configclass
+
+from . import joint_pos_env_cfg
+
+##
+# Pre-defined configs
+##
+from omni.isaac.orbit_assets.franka import FRANKA_PANDA_HIGH_PD_CFG  # isort: skip
+
+
+@configclass
+class FrankaReachEnvCfg(joint_pos_env_cfg.FrankaReachEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+
+        # Set Franka as robot
+        # We switch here to a stiffer PD controller for IK tracking to be better.
+        self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
+
+        # Set actions for the specific robot type (franka)
+        self.actions.body_joint_pos = DifferentialInverseKinematicsActionCfg(
+            asset_name="robot",
+            joint_names=["panda_joint.*"],
+            body_name="panda_hand",
+            controller=DifferentialIKControllerCfg(command_type="pose", use_relative_mode=True, ik_method="dls"),
+            scale=0.5,
+            body_offset=DifferentialInverseKinematicsActionCfg.OffsetCfg(pos=[0.0, 0.0, 0.107]),
+        )
+
+
+@configclass
+class FrankaReachEnvCfg_PLAY(FrankaReachEnvCfg):
+    def __post_init__(self):
+        # post init of parent
+        super().__post_init__()
+        # make a smaller scene for play
+        self.scene.num_envs = 50
+        self.scene.env_spacing = 2.5
+        # disable randomization for play
+        self.observations.policy.enable_corruption = False
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/env_cfg.py
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/__init__.py
@@ -5,7 +5,7 @@

 import gymnasium as gym

-from . import agents, env_cfg
+from . import agents, joint_pos_env_cfg

 ##
 # Register Gym environments.
@@ -16,7 +16,7 @@ gym.register(
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
    disable_env_checker=True,
    kwargs={
-        "env_cfg_entry_point": env_cfg.UR10ReachEnvCfg,
+        "env_cfg_entry_point": joint_pos_env_cfg.UR10ReachEnvCfg,
        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:UR10ReachPPORunnerCfg",
    },
@@ -27,7 +27,7 @@ gym.register(
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
    disable_env_checker=True,
    kwargs={
-        "env_cfg_entry_point": env_cfg.UR10ReachEnvCfg_PLAY,
+        "env_cfg_entry_point": joint_pos_env_cfg.UR10ReachEnvCfg_PLAY,
        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
        "rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:UR10ReachPPORunnerCfg",
    },

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/env_cfg.py
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/reach_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/reach_env_cfg.py
@@ -179,7 +179,7 @@ class ReachEnvCfg(RLTaskEnvCfg):
    """Configuration for the reach end-effector pose tracking environment."""

    # Scene settings
-    scene: ReachSceneCfg = ReachSceneCfg(num_envs=4096, env_spacing=2.5, replicate_physics=False)
+    scene: ReachSceneCfg = ReachSceneCfg(num_envs=4096, env_spacing=2.5)
    # Basic settings
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/rl_games.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/rl_games.py
@@ -253,6 +253,9 @@ class RlGamesVecEnvWrapper(IVecEnv):
        extras = {
            k: v.to(device=self._rl_device, non_blocking=True) if hasattr(v, "to") else v for k, v in extras.items()
        }
+        # remap extras from "log" to "episode"
+        if "log" in extras:
+            extras["episode"] = extras.pop("log")

        return obs_and_states, rew, dones, extras


--- a/source/standalone/environments/state_machine/lift_cube_sm.py
+++ b/source/standalone/environments/state_machine/lift_cube_sm.py
@@ -110,9 +110,7 @@ def infer_state_machine(
            sm_state[tid] = PickSmState.APPROACH_ABOVE_OBJECT
            sm_wait_time[tid] = 0.0
    elif state == PickSmState.APPROACH_ABOVE_OBJECT:
-        des_ee_pose[tid] = object_pose[tid]
-        # TODO: This is causing issues.
-        # des_ee_pose[tid] = wp.transform_multiply(des_ee_pose[tid], offset[tid])
+        des_ee_pose[tid] = wp.transform_multiply(offset[tid], object_pose[tid])
        gripper_state[tid] = GripperState.OPEN
        # TODO: error between current and desired ee pose below threshold
        # wait for a while
@@ -181,12 +179,16 @@ class PickAndLiftSm:
        self.sm_dt = torch.full((self.num_envs,), self.dt, device=self.device)
        self.sm_state = torch.full((self.num_envs,), 0, dtype=torch.int32, device=self.device)
        self.sm_wait_time = torch.zeros((self.num_envs,), device=self.device)
+
        # desired state
        self.des_ee_pose = torch.zeros((self.num_envs, 7), device=self.device)
        self.des_gripper_state = torch.full((self.num_envs,), 0.0, device=self.device)
+
        # approach above object offset
        self.offset = torch.zeros((self.num_envs, 7), device=self.device)
        self.offset[:, 2] = 0.1
+        self.offset[:, -1] = 1.0  # warp expects quaternion as (x, y, z, w)
+
        # convert to warp
        self.sm_dt_wp = wp.from_torch(self.sm_dt, wp.float32)
        self.sm_state_wp = wp.from_torch(self.sm_state, wp.int32)
@@ -204,12 +206,17 @@ class PickAndLiftSm:

    def compute(self, ee_pose: torch.Tensor, object_pose: torch.Tensor, des_object_pose: torch.Tensor):
        """Compute the desired state of the robot's end-effector and the gripper."""
+        # convert all transformations from (w, x, y, z) to (x, y, z, w)
+        ee_pose = ee_pose[:, [0, 1, 2, 4, 5, 6, 3]]
+        object_pose = object_pose[:, [0, 1, 2, 4, 5, 6, 3]]
+        des_object_pose = des_object_pose[:, [0, 1, 2, 4, 5, 6, 3]]
+
        # convert to warp
        ee_pose_wp = wp.from_torch(ee_pose.contiguous(), wp.transform)
        object_pose_wp = wp.from_torch(object_pose.contiguous(), wp.transform)
        des_object_pose_wp = wp.from_torch(des_object_pose.contiguous(), wp.transform)
-        # run state machine

+        # run state machine
        wp.launch(
            kernel=infer_state_machine,
            dim=self.num_envs,
@@ -227,8 +234,10 @@ class PickAndLiftSm:
            device=self.device,
        )

+        # convert transformations back to (w, x, y, z)
+        des_ee_pose = self.des_ee_pose[:, [0, 1, 2, 6, 3, 4, 5]]
        # convert to torch
-        return torch.cat([self.des_ee_pose, self.des_gripper_state.unsqueeze(-1)], dim=-1)
+        return torch.cat([des_ee_pose, self.des_gripper_state.unsqueeze(-1)], dim=-1)


 def main():

--- a/source/standalone/environments/state_machine/open_cabinet_sm.py
+++ b/source/standalone/environments/state_machine/open_cabinet_sm.py
+# Copyright (c) 2022-2024, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""
+Script to run an environment with a cabinet opening state machine.
+
+The state machine is implemented in the kernel function `infer_state_machine`.
+It uses the `warp` library to run the state machine in parallel on the GPU.
+
+.. code-block:: bash
+
+    ./orbit.sh -p source/standalone/environments/state_machine/lift_cube_sm.py --num_envs 32
+
+"""
+
+"""Launch Omniverse Toolkit first."""
+
+import argparse
+
+from omni.isaac.orbit.app import AppLauncher
+
+# add argparse arguments
+parser = argparse.ArgumentParser(description="Pick and lift state machine for cabinet environments.")
+parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
+parser.add_argument(
+    "--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations."
+)
+parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
+# append AppLauncher cli args
+AppLauncher.add_app_launcher_args(parser)
+# parse the arguments
+args_cli = parser.parse_args()
+
+# launch omniverse app
+app_launcher = AppLauncher(headless=args_cli.headless)
+simulation_app = app_launcher.app
+
+"""Rest everything else."""
+
+import gymnasium as gym
+import torch
+import traceback
+from collections.abc import Sequence
+
+import carb
+import warp as wp
+
+from omni.isaac.orbit.sensors import FrameTransformer
+
+import omni.isaac.orbit_tasks  # noqa: F401
+from omni.isaac.orbit_tasks.manipulation.cabinet.cabinet_env_cfg import CabinetEnvCfg
+from omni.isaac.orbit_tasks.utils.parse_cfg import parse_env_cfg
+
+# initialize warp
+wp.init()
+
+
+class GripperState:
+    """States for the gripper."""
+
+    OPEN = wp.constant(1.0)
+    CLOSE = wp.constant(-1.0)
+
+
+class OpenDrawerSmState:
+    """States for the cabinet drawer opening state machine."""
+
+    REST = wp.constant(0)
+    APPROACH_INFRONT_HANDLE = wp.constant(1)
+    APPROACH_HANDLE = wp.constant(2)
+    GRASP_HANDLE = wp.constant(3)
+    OPEN_DRAWER = wp.constant(4)
+    RELEASE_HANDLE = wp.constant(5)
+
+
+class OpenDrawerSmWaitTime:
+    """Additional wait times (in s) for states for before switching."""
+
+    REST = wp.constant(0.5)
+    APPROACH_INFRONT_HANDLE = wp.constant(1.25)
+    APPROACH_HANDLE = wp.constant(1.0)
+    GRASP_HANDLE = wp.constant(1.0)
+    OPEN_DRAWER = wp.constant(3.0)
+    RELEASE_HANDLE = wp.constant(0.2)
+
+
+@wp.kernel
+def infer_state_machine(
+    dt: wp.array(dtype=float),
+    sm_state: wp.array(dtype=int),
+    sm_wait_time: wp.array(dtype=float),
+    ee_pose: wp.array(dtype=wp.transform),
+    handle_pose: wp.array(dtype=wp.transform),
+    des_ee_pose: wp.array(dtype=wp.transform),
+    gripper_state: wp.array(dtype=float),
+    handle_approach_offset: wp.array(dtype=wp.transform),
+    handle_grasp_offset: wp.array(dtype=wp.transform),
+    drawer_opening_rate: wp.array(dtype=wp.transform),
+):
+    # retrieve thread id
+    tid = wp.tid()
+    # retrieve state machine state
+    state = sm_state[tid]
+    # decide next state
+    if state == OpenDrawerSmState.REST:
+        des_ee_pose[tid] = ee_pose[tid]
+        gripper_state[tid] = GripperState.OPEN
+        # wait for a while
+        if sm_wait_time[tid] >= OpenDrawerSmWaitTime.REST:
+            # move to next state and reset wait time
+            sm_state[tid] = OpenDrawerSmState.APPROACH_INFRONT_HANDLE
+            sm_wait_time[tid] = 0.0
+    elif state == OpenDrawerSmState.APPROACH_INFRONT_HANDLE:
+        des_ee_pose[tid] = wp.transform_multiply(handle_approach_offset[tid], handle_pose[tid])
+        gripper_state[tid] = GripperState.OPEN
+        # TODO: error between current and desired ee pose below threshold
+        # wait for a while
+        if sm_wait_time[tid] >= OpenDrawerSmWaitTime.APPROACH_INFRONT_HANDLE:
+            # move to next state and reset wait time
+            sm_state[tid] = OpenDrawerSmState.APPROACH_HANDLE
+            sm_wait_time[tid] = 0.0
+    elif state == OpenDrawerSmState.APPROACH_HANDLE:
+        des_ee_pose[tid] = handle_pose[tid]
+        gripper_state[tid] = GripperState.OPEN
+        # TODO: error between current and desired ee pose below threshold
+        # wait for a while
+        if sm_wait_time[tid] >= OpenDrawerSmWaitTime.APPROACH_HANDLE:
+            # move to next state and reset wait time
+            sm_state[tid] = OpenDrawerSmState.GRASP_HANDLE
+            sm_wait_time[tid] = 0.0
+    elif state == OpenDrawerSmState.GRASP_HANDLE:
+        des_ee_pose[tid] = wp.transform_multiply(handle_grasp_offset[tid], handle_pose[tid])
+        gripper_state[tid] = GripperState.CLOSE
+        # wait for a while
+        if sm_wait_time[tid] >= OpenDrawerSmWaitTime.GRASP_HANDLE:
+            # move to next state and reset wait time
+            sm_state[tid] = OpenDrawerSmState.OPEN_DRAWER
+            sm_wait_time[tid] = 0.0
+    elif state == OpenDrawerSmState.OPEN_DRAWER:
+        des_ee_pose[tid] = wp.transform_multiply(drawer_opening_rate[tid], handle_pose[tid])
+        gripper_state[tid] = GripperState.CLOSE
+        # wait for a while
+        if sm_wait_time[tid] >= OpenDrawerSmWaitTime.OPEN_DRAWER:
+            # move to next state and reset wait time
+            sm_state[tid] = OpenDrawerSmState.RELEASE_HANDLE
+            sm_wait_time[tid] = 0.0
+    elif state == OpenDrawerSmState.RELEASE_HANDLE:
+        des_ee_pose[tid] = ee_pose[tid]
+        gripper_state[tid] = GripperState.CLOSE
+        # wait for a while
+        if sm_wait_time[tid] >= OpenDrawerSmWaitTime.RELEASE_HANDLE:
+            # move to next state and reset wait time
+            sm_state[tid] = OpenDrawerSmState.RELEASE_HANDLE
+            sm_wait_time[tid] = 0.0
+    # increment wait time
+    sm_wait_time[tid] = sm_wait_time[tid] + dt[tid]
+
+
+class OpenDrawerSm:
+    """A simple state machine in a robot's task space to open a drawer in the cabinet.
+
+    The state machine is implemented as a warp kernel. It takes in the current state of
+    the robot's end-effector and the object, and outputs the desired state of the robot's
+    end-effector and the gripper. The state machine is implemented as a finite state
+    machine with the following states:
+
+    1. REST: The robot is at rest.
+    2. APPROACH_HANDLE: The robot moves towards the handle of the drawer.
+    3. GRASP_HANDLE: The robot grasps the handle of the drawer.
+    4. OPEN_DRAWER: The robot opens the drawer.
+    5. RELEASE_HANDLE: The robot releases the handle of the drawer. This is the final state.
+    """
+
+    def __init__(self, dt: float, num_envs: int, device: torch.device | str = "cpu"):
+        """Initialize the state machine.
+
+        Args:
+            dt: The environment time step.
+            num_envs: The number of environments to simulate.
+            device: The device to run the state machine on.
+        """
+        # save parameters
+        self.dt = float(dt)
+        self.num_envs = num_envs
+        self.device = device
+        # initialize state machine
+        self.sm_dt = torch.full((self.num_envs,), self.dt, device=self.device)
+        self.sm_state = torch.full((self.num_envs,), 0, dtype=torch.int32, device=self.device)
+        self.sm_wait_time = torch.zeros((self.num_envs,), device=self.device)
+
+        # desired state
+        self.des_ee_pose = torch.zeros((self.num_envs, 7), device=self.device)
+        self.des_gripper_state = torch.full((self.num_envs,), 0.0, device=self.device)
+
+        # approach infront of the handle
+        self.handle_approach_offset = torch.zeros((self.num_envs, 7), device=self.device)
+        self.handle_approach_offset[:, 0] = -0.1
+        self.handle_approach_offset[:, -1] = 1.0  # warp expects quaternion as (x, y, z, w)
+
+        # handle grasp offset
+        self.handle_grasp_offset = torch.zeros((self.num_envs, 7), device=self.device)
+        self.handle_grasp_offset[:, 0] = 0.025
+        self.handle_grasp_offset[:, -1] = 1.0  # warp expects quaternion as (x, y, z, w)
+
+        # drawer opening rate
+        self.drawer_opening_rate = torch.zeros((self.num_envs, 7), device=self.device)
+        self.drawer_opening_rate[:, 0] = -0.015
+        self.drawer_opening_rate[:, -1] = 1.0  # warp expects quaternion as (x, y, z, w)
+
+        # convert to warp
+        self.sm_dt_wp = wp.from_torch(self.sm_dt, wp.float32)
+        self.sm_state_wp = wp.from_torch(self.sm_state, wp.int32)
+        self.sm_wait_time_wp = wp.from_torch(self.sm_wait_time, wp.float32)
+        self.des_ee_pose_wp = wp.from_torch(self.des_ee_pose, wp.transform)
+        self.des_gripper_state_wp = wp.from_torch(self.des_gripper_state, wp.float32)
+        self.handle_approach_offset_wp = wp.from_torch(self.handle_approach_offset, wp.transform)
+        self.handle_grasp_offset_wp = wp.from_torch(self.handle_grasp_offset, wp.transform)
+        self.drawer_opening_rate_wp = wp.from_torch(self.drawer_opening_rate, wp.transform)
+
+    def reset_idx(self, env_ids: Sequence[int] | None = None):
+        """Reset the state machine."""
+        if env_ids is None:
+            env_ids = slice(None)
+        # reset state machine
+        self.sm_state[env_ids] = 0
+        self.sm_wait_time[env_ids] = 0.0
+
+    def compute(self, ee_pose: torch.Tensor, handle_pose: torch.Tensor):
+        """Compute the desired state of the robot's end-effector and the gripper."""
+        # convert all transformations from (w, x, y, z) to (x, y, z, w)
+        ee_pose = ee_pose[:, [0, 1, 2, 4, 5, 6, 3]]
+        handle_pose = handle_pose[:, [0, 1, 2, 4, 5, 6, 3]]
+        # convert to warp
+        ee_pose_wp = wp.from_torch(ee_pose.contiguous(), wp.transform)
+        handle_pose_wp = wp.from_torch(handle_pose.contiguous(), wp.transform)
+
+        # run state machine
+        wp.launch(
+            kernel=infer_state_machine,
+            dim=self.num_envs,
+            inputs=[
+                self.sm_dt_wp,
+                self.sm_state_wp,
+                self.sm_wait_time_wp,
+                ee_pose_wp,
+                handle_pose_wp,
+                self.des_ee_pose_wp,
+                self.des_gripper_state_wp,
+                self.handle_approach_offset_wp,
+                self.handle_grasp_offset_wp,
+                self.drawer_opening_rate_wp,
+            ],
+            device=self.device,
+        )
+
+        # convert transformations back to (w, x, y, z)
+        des_ee_pose = self.des_ee_pose[:, [0, 1, 2, 6, 3, 4, 5]]
+        # convert to torch
+        return torch.cat([des_ee_pose, self.des_gripper_state.unsqueeze(-1)], dim=-1)
+
+
+def main():
+    # parse configuration
+    env_cfg: CabinetEnvCfg = parse_env_cfg(
+        "Isaac-Open-Drawer-Franka-IK-Abs-v0",
+        use_gpu=not args_cli.cpu,
+        num_envs=args_cli.num_envs,
+        use_fabric=not args_cli.disable_fabric,
+    )
+    # create environment
+    env = gym.make("Isaac-Open-Drawer-Franka-IK-Abs-v0", cfg=env_cfg)
+    # reset environment at start
+    env.reset()
+
+    # create action buffers (position + quaternion)
+    actions = torch.zeros(env.unwrapped.action_space.shape, device=env.unwrapped.device)
+    actions[:, 3] = 1.0
+    # desired object orientation (we only do position control of object)
+    desired_orientation = torch.zeros((env.unwrapped.num_envs, 4), device=env.unwrapped.device)
+    desired_orientation[:, 1] = 1.0
+    # create state machine
+    open_sm = OpenDrawerSm(env_cfg.sim.dt * env_cfg.decimation, env.unwrapped.num_envs, env.unwrapped.device)
+
+    while simulation_app.is_running():
+        # run everything in inference mode
+        with torch.inference_mode():
+            # step environment
+            dones = env.step(actions)[-2]
+
+            # observations
+            # -- end-effector frame
+            ee_frame_tf: FrameTransformer = env.unwrapped.scene["ee_frame"]
+            tcp_rest_position = ee_frame_tf.data.target_pos_w[..., 0, :].clone() - env.unwrapped.scene.env_origins
+            tcp_rest_orientation = ee_frame_tf.data.target_quat_w[..., 0, :].clone()
+            # -- handle frame
+            cabinet_frame_tf: FrameTransformer = env.unwrapped.scene["cabinet_frame"]
+            cabinet_position = cabinet_frame_tf.data.target_pos_w[..., 0, :].clone() - env.unwrapped.scene.env_origins
+            cabinet_orientation = cabinet_frame_tf.data.target_quat_w[..., 0, :].clone()
+
+            # advance state machine
+            actions = open_sm.compute(
+                torch.cat([tcp_rest_position, tcp_rest_orientation], dim=-1),
+                torch.cat([cabinet_position, cabinet_orientation], dim=-1),
+            )
+
+            # reset state machine
+            if dones.any():
+                open_sm.reset_idx(dones.nonzero(as_tuple=False).squeeze(-1))
+
+    # close the environment
+    env.close()
+
+
+if __name__ == "__main__":
+    try:
+        # run the main execution
+        main()
+    except Exception as err:
+        carb.log_error(err)
+        carb.log_error(traceback.format_exc())
+        raise
+    finally:
+        # close sim app
+        simulation_app.close()