[Newton] Adds policy transfer script for sim2sim transfer from Newton to physX (#3565)

# Description This PR adds a play script to physX-based IsaacLab to make it possible to play a Newton-based trained policy. Tested environments are H1/G1/Anymal-D but other exisiting Newton environment should transfer as well.  Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. Fixes # (issue)  ## Type of change  - Bug fix (non-breaking change which fixes an issue) - New feature (non-breaking change which adds functionality) - Breaking change (existing functionality will not work without user modification) - Documentation update ## Screenshots Please attach before and after screenshots of the change if applicable.  ## Checklist - [ ] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [ ] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [ ] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there  --------- Signed-off-by: Milad Rakhsha <mrakhsha@nvidia.com>

[Newton] Adds policy transfer script for sim2sim transfer from Newton to physX (#3565)
# Description This PR adds a play script to physX-based IsaacLab to make it possible to play a Newton-based trained policy. Tested environments are H1/G1/Anymal-D but other exisiting Newton environment should transfer as well.  Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change. Fixes # (issue)  ## Type of change  - Bug fix (non-breaking change which fixes an issue) - New feature (non-breaking change which adds functionality) - Breaking change (existing functionality will not work without user modification) - Documentation update ## Screenshots Please attach before and after screenshots of the change if applicable.  ## Checklist - [ ] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [ ] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [ ] I have made corresponding changes to the documentation - [ ] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there  --------- Signed-off-by: Milad Rakhsha <mrakhsha@nvidia.com>
902bded7 · Milad-Rakhsha-NV · GitHub · faa96dfc · 902bded7 · 902bded7
Unverified Commit 902bded7 authored Sep 29, 2025 by Milad-Rakhsha-NV Committed by GitHub Sep 29, 2025
6 changed files
--- a/docs/source/experimental-features/newton-physics-integration/sim-to-sim.rst
+++ b/docs/source/experimental-features/newton-physics-integration/sim-to-sim.rst
@@ -2,38 +2,40 @@

 Sim-to-Sim Policy Transfer
 ==========================
-This section provides examples of sim-to-sim policy transfer using the Newton backend. Sim-to-sim transfer is an essential step before real robot deployment because it verifies that policies work across different simulators. Policies that pass sim-to-sim verification are much more likely to succeed on real robots.
+This section provides examples of sim-to-sim policy transfer between PhysX and Newton backends. Sim-to-sim transfer is an essential step before real robot deployment because it verifies that policies work across different simulators. Policies that pass sim-to-sim verification are much more likely to succeed on real robots.


 Overview
 --------

-This guide shows how to run a PhysX-trained policy on the Newton backend. While the method works for any robot and physics engine, it has only been tested with Unitree G1, Unitree H1, and ANYmal-D robots using PhysX-trained policies.
+This guide shows how to transfer policies between PhysX and Newton backends in both directions. The main challenge is that different physics engines may parse the same robot model with different joint and link ordering.

-PhysX-trained policies expect joints and links in a specific order determined by how PhysX parses the robot model. However, Newton may parse the same robot with different joint and link ordering.
+Policies trained in one backend expect joints and links in a specific order determined by how that backend parses the robot model. When transferring to another backend, the joint ordering may be different, requiring remapping of observations and actions.

 In the future, we plan to solve this using **robot schema** that standardizes joint and link ordering across different backends.

-Currently, we solve this by remapping observations and actions using joint mappings defined in YAML files. These files specify joint names in both PhysX order (source) and Newton order (target). During policy execution, we use this mapping to reorder observations and actions so they work correctly with Newton.
+Currently, we solve this by remapping observations and actions using joint mappings defined in YAML files. These files specify joint names in both source and target backend orders. During policy execution, we use this mapping to reorder observations and actions so they work correctly with the target backend.
+
+The method has been tested with Unitree G1, Unitree Go2, Unitree H1, and ANYmal-D robots for both transfer directions.


 What you need
 ~~~~~~~~~~~~~

- A policy checkpoint trained with PhysX (RSL-RL).
- A joint mapping YAML for your robot under ``scripts/newton_sim2sim/mappings/``.
- The provided player script: ``scripts/newton_sim2sim/rsl_rl_transfer.py``.
+- A policy checkpoint trained with either PhysX or Newton (RSL-RL).
+- A joint mapping YAML for your robot under ``scripts/sim2sim_transfer/config/``.
+- The provided player script: ``scripts/sim2sim_transfer/rsl_rl_transfer.py``.

 To add a new robot, create a YAML file with two lists where each joint name appears exactly once in both:

 .. code-block:: yaml

   # Example structure
-   source_joint_names:  # PhysX joint order
+   source_joint_names:  # Source backend joint order
     - joint_1
     - joint_2
     # ...
-   target_joint_names:  # Newton joint order
+   target_joint_names:  # Target backend joint order
     - joint_1
     - joint_2
     # ...
@@ -41,14 +43,14 @@ To add a new robot, create a YAML file with two lists where each joint name appe
 The script automatically computes the necessary mappings for locomotion tasks.


-How to run
-~~~~~~~~~~
+PhysX-to-Newton Transfer
+~~~~~~~~~~~~~~~~~~~~~~~~

-Use this command template to run a PhysX-trained policy with Newton:
+To run a PhysX-trained policy with the Newton backend, use this command template:

 .. code-block:: bash

-   ./isaaclab.sh -p scripts/newton_sim2sim/rsl_rl_transfer.py \
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
       --task=<TASK_ID> \
       --num_envs=32 \
       --checkpoint <PATH_TO_PHYSX_CHECKPOINT> \
@@ -60,11 +62,11 @@ Here are examples for different robots:

 .. code-block:: bash

-   ./isaaclab.sh -p scripts/newton_sim2sim/rsl_rl_transfer.py \
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
       --task=Isaac-Velocity-Flat-G1-v0 \
       --num_envs=32 \
       --checkpoint <PATH_TO_PHYSX_CHECKPOINT> \
-       --policy_transfer_file scripts/newton_sim2sim/mappings/sim2sim_g1.yaml
+       --policy_transfer_file scripts/sim2sim_transfer/config/physx_to_newton_g1.yaml


 2. Unitree H1
@@ -72,28 +74,94 @@ Here are examples for different robots:

 .. code-block:: bash

-   ./isaaclab.sh -p scripts/newton_sim2sim/rsl_rl_transfer.py \
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
       --task=Isaac-Velocity-Flat-H1-v0 \
       --num_envs=32 \
       --checkpoint <PATH_TO_PHYSX_CHECKPOINT> \
-       --policy_transfer_file scripts/newton_sim2sim/mappings/sim2sim_h1.yaml
+       --policy_transfer_file scripts/sim2sim_transfer/config/physx_to_newton_h1.yaml
+
+
+3. Unitree Go2

+.. code-block:: bash

-3. ANYmal-D
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
+       --task=Isaac-Velocity-Flat-Go2-v0 \
+       --num_envs=32 \
+       --checkpoint <PATH_TO_PHYSX_CHECKPOINT> \
+       --policy_transfer_file scripts/sim2sim_transfer/config/physx_to_newton_go2.yaml
+
+
+4. ANYmal-D


 .. code-block:: bash

-   ./isaaclab.sh -p scripts/newton_sim2sim/rsl_rl_transfer.py \
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
       --task=Isaac-Velocity-Flat-Anymal-D-v0 \
       --num_envs=32 \
       --checkpoint <PATH_TO_PHYSX_CHECKPOINT> \
-       --policy_transfer_file scripts/newton_sim2sim/mappings/sim2sim_anymal_d.yaml
+       --policy_transfer_file scripts/sim2sim_transfer/config/physx_to_newton_anymal_d.yaml
+
+Note that to run this, you need to checkout the Newton-based branch of IsaacLab such as ``feature/newton``.
+
+Newton-to-PhysX Transfer
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+To transfer Newton-trained policies to PhysX-based IsaacLab, use the reverse mapping files:
+
+Here are examples for different robots:
+
+1. Unitree G1
+
+.. code-block:: bash
+
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
+       --task=Isaac-Velocity-Flat-G1-v0 \
+       --num_envs=32 \
+       --checkpoint <PATH_TO_NEWTON_CHECKPOINT> \
+       --policy_transfer_file scripts/sim2sim_transfer/config/newton_to_physx_g1.yaml
+
+
+2. Unitree H1
+
+.. code-block:: bash
+
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
+       --task=Isaac-Velocity-Flat-H1-v0 \
+       --num_envs=32 \
+       --checkpoint <PATH_TO_NEWTON_CHECKPOINT> \
+       --policy_transfer_file scripts/sim2sim_transfer/config/newton_to_physx_h1.yaml
+
+
+3. Unitree Go2
+
+.. code-block:: bash
+
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
+       --task=Isaac-Velocity-Flat-Go2-v0 \
+       --num_envs=32 \
+       --checkpoint <PATH_TO_NEWTON_CHECKPOINT> \
+       --policy_transfer_file scripts/sim2sim_transfer/config/newton_to_physx_go2.yaml
+
+
+4. ANYmal-D
+
+.. code-block:: bash
+
+   ./isaaclab.sh -p scripts/sim2sim_transfer/rsl_rl_transfer.py \
+       --task=Isaac-Velocity-Flat-Anymal-D-v0 \
+       --num_envs=32 \
+       --checkpoint <PATH_TO_NEWTON_CHECKPOINT> \
+       --policy_transfer_file scripts/sim2sim_transfer/config/newton_to_physx_anymal_d.yaml

+The key difference is using the ``newton_to_physx_*.yaml`` mapping files instead of ``physx_to_newton_*.yaml`` files. Also note that you need to checkout a PhysX-based IsaacLab branch such as ``main``.

-Notes and limitations
+Notes and Limitations
 ~~~~~~~~~~~~~~~~~~~~~

- This transfer method has only been tested with Unitree G1, Unitree H1, and ANYmal-D using PhysX-trained policies.
- The observation remapping assumes a locomotion layout with base observations followed by joint observations. For different observation layouts, you'll need to modify ``scripts/newton_sim2sim/policy_mapping.py``.
+- Both transfer directions have been tested with Unitree G1, Unitree Go2, Unitree H1, and ANYmal-D robots.
+- PhysX-to-Newton transfer uses ``physx_to_newton_*.yaml`` mapping files.
+- Newton-to-PhysX transfer requires the corresponding ``newton_to_physx_*.yaml`` mapping files and the PhysX branch of IsaacLab.
+- The observation remapping assumes a locomotion layout with base observations followed by joint observations. For different observation layouts, you'll need to modify the ``get_joint_mappings`` function in ``scripts/sim2sim_transfer/rsl_rl_transfer.py``.
 - When adding new robots or backends, make sure both source and target have identical joint names, and that the YAML lists reflect how each backend orders these joints.
--- a/scripts/sim2sim_transfer/config/newton_to_physx_anymal_d.yaml
+++ b/scripts/sim2sim_transfer/config/newton_to_physx_anymal_d.yaml
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Joint names in the source physics engine where policy is trained (Newton)
+source_joint_names:
+  - "LF_HAA"
+  - "LF_HFE"
+  - "LF_KFE"
+  - "LH_HAA"
+  - "LH_HFE"
+  - "LH_KFE"
+  - "RF_HAA"
+  - "RF_HFE"
+  - "RF_KFE"
+  - "RH_HAA"
+  - "RH_HFE"
+  - "RH_KFE"
+
+# Joint names in the target physics engine where policy is deployed (PhysX)
+target_joint_names:
+  - "LF_HAA"
+  - "LH_HAA"
+  - "RF_HAA"
+  - "RH_HAA"
+  - "LF_HFE"
+  - "LH_HFE"
+  - "RF_HFE"
+  - "RH_HFE"
+  - "LF_KFE"
+  - "LH_KFE"
+  - "RF_KFE"
+  - "RH_KFE"
--- a/scripts/sim2sim_transfer/config/newton_to_physx_g1.yaml
+++ b/scripts/sim2sim_transfer/config/newton_to_physx_g1.yaml
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Joint names in the source physics engine where policy is trained (Newton)
+source_joint_names:
+  - "left_hip_pitch_joint"
+  - "left_hip_roll_joint"
+  - "left_hip_yaw_joint"
+  - "left_knee_joint"
+  - "left_ankle_pitch_joint"
+  - "left_ankle_roll_joint"
+  - "right_hip_pitch_joint"
+  - "right_hip_roll_joint"
+  - "right_hip_yaw_joint"
+  - "right_knee_joint"
+  - "right_ankle_pitch_joint"
+  - "right_ankle_roll_joint"
+  - "torso_joint"
+  - "left_shoulder_pitch_joint"
+  - "left_shoulder_roll_joint"
+  - "left_shoulder_yaw_joint"
+  - "left_elbow_pitch_joint"
+  - "left_elbow_roll_joint"
+  - "left_five_joint"
+  - "left_six_joint"
+  - "left_three_joint"
+  - "left_four_joint"
+  - "left_zero_joint"
+  - "left_one_joint"
+  - "left_two_joint"
+  - "right_shoulder_pitch_joint"
+  - "right_shoulder_roll_joint"
+  - "right_shoulder_yaw_joint"
+  - "right_elbow_pitch_joint"
+  - "right_elbow_roll_joint"
+  - "right_five_joint"
+  - "right_six_joint"
+  - "right_three_joint"
+  - "right_four_joint"
+  - "right_zero_joint"
+  - "right_one_joint"
+  - "right_two_joint"
+
+# Joint names in the target physics engine where policy is deployed (PhysX)
+target_joint_names:
+  - "left_hip_pitch_joint"
+  - "right_hip_pitch_joint"
+  - "torso_joint"
+  - "left_hip_roll_joint"
+  - "right_hip_roll_joint"
+  - "left_shoulder_pitch_joint"
+  - "right_shoulder_pitch_joint"
+  - "left_hip_yaw_joint"
+  - "right_hip_yaw_joint"
+  - "left_shoulder_roll_joint"
+  - "right_shoulder_roll_joint"
+  - "left_knee_joint"
+  - "right_knee_joint"
+  - "left_shoulder_yaw_joint"
+  - "right_shoulder_yaw_joint"
+  - "left_ankle_pitch_joint"
+  - "right_ankle_pitch_joint"
+  - "left_elbow_pitch_joint"
+  - "right_elbow_pitch_joint"
+  - "left_ankle_roll_joint"
+  - "right_ankle_roll_joint"
+  - "left_elbow_roll_joint"
+  - "right_elbow_roll_joint"
+  - "left_five_joint"
+  - "left_three_joint"
+  - "left_zero_joint"
+  - "right_five_joint"
+  - "right_three_joint"
+  - "right_zero_joint"
+  - "left_six_joint"
+  - "left_four_joint"
+  - "left_one_joint"
+  - "right_six_joint"
+  - "right_four_joint"
+  - "right_one_joint"
+  - "left_two_joint"
+  - "right_two_joint"
--- a/scripts/sim2sim_transfer/config/newton_to_physx_go2.yaml
+++ b/scripts/sim2sim_transfer/config/newton_to_physx_go2.yaml
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Joint names in the source physics engine where policy is trained (Newton)
+source_joint_names:
+  - "FL_hip_joint"
+  - "FL_thigh_joint"
+  - "FL_calf_joint"
+  - "FR_hip_joint"
+  - "FR_thigh_joint"
+  - "FR_calf_joint"
+  - "RL_hip_joint"
+  - "RL_thigh_joint"
+  - "RL_calf_joint"
+  - "RR_hip_joint"
+  - "RR_thigh_joint"
+  - "RR_calf_joint"
+# Joint names in the target physics engine where policy is deployed (PhysX)
+target_joint_names:
+  - "FL_hip_joint"
+  - "FR_hip_joint"
+  - "RL_hip_joint"
+  - "RR_hip_joint"
+  - "FL_thigh_joint"
+  - "FR_thigh_joint"
+  - "RL_thigh_joint"
+  - "RR_thigh_joint"
+  - "FL_calf_joint"
+  - "FR_calf_joint"
+  - "RL_calf_joint"
+  - "RR_calf_joint"
--- a/scripts/sim2sim_transfer/config/newton_to_physx_h1.yaml
+++ b/scripts/sim2sim_transfer/config/newton_to_physx_h1.yaml
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+# Joint names in the source physics engine where policy is trained (Newton)
+source_joint_names:
+  - "left_hip_yaw"
+  - "left_hip_roll"
+  - "left_hip_pitch"
+  - "left_knee"
+  - "left_ankle"
+  - "right_hip_yaw"
+  - "right_hip_roll"
+  - "right_hip_pitch"
+  - "right_knee"
+  - "right_ankle"
+  - "torso"
+  - "left_shoulder_pitch"
+  - "left_shoulder_roll"
+  - "left_shoulder_yaw"
+  - "left_elbow"
+  - "right_shoulder_pitch"
+  - "right_shoulder_roll"
+  - "right_shoulder_yaw"
+  - "right_elbow"
+
+# Joint names in the target physics engine where policy is deployed (PhysX)
+target_joint_names:
+  - "left_hip_yaw"
+  - "right_hip_yaw"
+  - "torso"
+  - "left_hip_roll"
+  - "right_hip_roll"
+  - "left_shoulder_pitch"
+  - "right_shoulder_pitch"
+  - "left_hip_pitch"
+  - "right_hip_pitch"
+  - "left_shoulder_roll"
+  - "right_shoulder_roll"
+  - "left_knee"
+  - "right_knee"
+  - "left_shoulder_yaw"
+  - "right_shoulder_yaw"
+  - "left_ankle"
+  - "right_ankle"
+  - "left_elbow"
+  - "right_elbow"
--- a/scripts/sim2sim_transfer/rsl_rl_transfer.py
+++ b/scripts/sim2sim_transfer/rsl_rl_transfer.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers (https://github.com/isaac-sim/IsaacLab/blob/main/CONTRIBUTORS.md).
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""Script to play a checkpoint of an RL agent from RSL-RL with policy transfer capabilities."""
+
+"""Launch Isaac Sim Simulator first."""
+
+import argparse
+import os
+import sys
+
+from isaaclab.app import AppLauncher
+
+# local imports
+sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), "../.."))
+from scripts.reinforcement_learning.rsl_rl import cli_args  # isort: skip
+
+# add argparse arguments
+parser = argparse.ArgumentParser(description="Play an RL agent with RSL-RL with policy transfer.")
+parser.add_argument("--video", action="store_true", default=False, help="Record videos during training.")
+parser.add_argument("--video_length", type=int, default=200, help="Length of the recorded video (in steps).")
+parser.add_argument(
+    "--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations."
+)
+parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
+parser.add_argument("--task", type=str, default=None, help="Name of the task.")
+parser.add_argument(
+    "--agent", type=str, default="rsl_rl_cfg_entry_point", help="Name of the RL agent configuration entry point."
+)
+parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment")
+parser.add_argument("--real-time", action="store_true", default=False, help="Run in real-time, if possible.")
+# Joint ordering arguments
+parser.add_argument(
+    "--policy_transfer_file",
+    type=str,
+    default=None,
+    help="Path to YAML file containing joint mapping configuration for policy transfer between physics engines.",
+)
+# append RSL-RL cli arguments
+cli_args.add_rsl_rl_args(parser)
+# append AppLauncher cli args
+AppLauncher.add_app_launcher_args(parser)
+# parse the arguments
+args_cli, hydra_args = parser.parse_known_args()
+# always enable cameras to record video
+if args_cli.video:
+    args_cli.enable_cameras = True
+
+# clear out sys.argv for Hydra
+sys.argv = [sys.argv[0]] + hydra_args
+
+# launch omniverse app
+app_launcher = AppLauncher(args_cli)
+simulation_app = app_launcher.app
+
+"""Rest everything follows."""
+
+import gymnasium as gym
+import os
+import time
+import torch
+import yaml
+
+from rsl_rl.runners import DistillationRunner, OnPolicyRunner
+
+from isaaclab.envs import (
+    DirectMARLEnv,
+    DirectMARLEnvCfg,
+    DirectRLEnvCfg,
+    ManagerBasedRLEnvCfg,
+    multi_agent_to_single_agent,
+)
+from isaaclab.utils.assets import retrieve_file_path
+from isaaclab.utils.dict import print_dict
+
+from isaaclab_rl.rsl_rl import RslRlBaseRunnerCfg, RslRlVecEnvWrapper, export_policy_as_jit, export_policy_as_onnx
+
+import isaaclab_tasks  # noqa: F401
+from isaaclab_tasks.utils import get_checkpoint_path
+from isaaclab_tasks.utils.hydra import hydra_task_config
+
+# PLACEHOLDER: Extension template (do not remove this comment)
+
+
+def get_joint_mappings(args_cli, action_space_dim):
+    """Get joint mappings based on command line arguments.
+
+    Args:
+            args_cli: Command line arguments
+            action_space_dim: Dimension of the action space (number of joints)
+
+    Returns:
+            tuple: (source_to_target_list, target_to_source_list, source_to_target_obs_list)
+    """
+    num_joints = action_space_dim
+    if args_cli.policy_transfer_file:
+        # Load from YAML file
+        try:
+            with open(args_cli.policy_transfer_file) as file:
+                config = yaml.safe_load(file)
+        except Exception as e:
+            raise RuntimeError(f"Failed to load joint mapping from {args_cli.policy_transfer_file}: {e}")
+
+        source_joint_names = config["source_joint_names"]
+        target_joint_names = config["target_joint_names"]
+        # Find joint mapping
+        source_to_target = []
+        target_to_source = []
+
+        # Create source to target mapping
+        for joint_name in source_joint_names:
+            if joint_name in target_joint_names:
+                source_to_target.append(target_joint_names.index(joint_name))
+            else:
+                raise ValueError(f"Joint '{joint_name}' not found in target joint names")
+
+        # Create target to source mapping
+        for joint_name in target_joint_names:
+            if joint_name in source_joint_names:
+                target_to_source.append(source_joint_names.index(joint_name))
+            else:
+                raise ValueError(f"Joint '{joint_name}' not found in source joint names")
+        print(f"[INFO] Loaded joint mapping for policy transfer from YAML: {args_cli.policy_transfer_file}")
+        assert (
+            len(source_to_target) == len(target_to_source) == num_joints
+        ), "Number of source and target joints must match"
+    else:
+        # Use identity mapping (one-to-one)
+        identity_map = list(range(num_joints))
+        source_to_target, target_to_source = identity_map, identity_map
+
+    # Create observation mapping (first 12 values stay the same for locomotion examples, then map joint-related values)
+    obs_map = (
+        [0, 1, 2]
+        + [3, 4, 5]
+        + [6, 7, 8]
+        + [9, 10, 11]
+        + [i + 12 + num_joints * 0 for i in source_to_target]
+        + [i + 12 + num_joints * 1 for i in source_to_target]
+        + [i + 12 + num_joints * 2 for i in source_to_target]
+    )
+
+    return source_to_target, target_to_source, obs_map
+
+
+@hydra_task_config(args_cli.task, args_cli.agent)
+def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agent_cfg: RslRlBaseRunnerCfg):
+    """Play with RSL-RL agent with policy transfer capabilities."""
+
+    # override configurations with non-hydra CLI arguments
+    agent_cfg = cli_args.update_rsl_rl_cfg(agent_cfg, args_cli)
+    env_cfg.scene.num_envs = args_cli.num_envs if args_cli.num_envs is not None else env_cfg.scene.num_envs
+
+    # set the environment seed
+    # note: certain randomizations occur in the environment initialization so we set the seed here
+    env_cfg.seed = agent_cfg.seed
+    env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device
+
+    # specify directory for logging experiments
+    log_root_path = os.path.join("logs", "rsl_rl", agent_cfg.experiment_name)
+    log_root_path = os.path.abspath(log_root_path)
+    print(f"[INFO] Loading experiment from directory: {log_root_path}")
+    if args_cli.checkpoint:
+        resume_path = retrieve_file_path(args_cli.checkpoint)
+    else:
+        resume_path = get_checkpoint_path(log_root_path, agent_cfg.load_run, agent_cfg.load_checkpoint)
+
+    log_dir = os.path.dirname(resume_path)
+
+    # set the log directory for the environment (works for all environment types)
+    env_cfg.log_dir = log_dir
+
+    # create isaac environment
+    env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
+
+    # convert to single-agent instance if required by the RL algorithm
+    if isinstance(env.unwrapped, DirectMARLEnv):
+        env = multi_agent_to_single_agent(env)
+
+    # wrap for video recording
+    if args_cli.video:
+        video_kwargs = {
+            "video_folder": os.path.join(log_dir, "videos", "play"),
+            "step_trigger": lambda step: step == 0,
+            "video_length": args_cli.video_length,
+            "disable_logger": True,
+        }
+        print("[INFO] Recording videos during training.")
+        print_dict(video_kwargs, nesting=4)
+        env = gym.wrappers.RecordVideo(env, **video_kwargs)
+
+    # wrap around environment for rsl-rl
+    env = RslRlVecEnvWrapper(env, clip_actions=agent_cfg.clip_actions)
+
+    print(f"[INFO]: Loading model checkpoint from: {resume_path}")
+    # load previously trained model
+    if agent_cfg.class_name == "OnPolicyRunner":
+        runner = OnPolicyRunner(env, agent_cfg.to_dict(), log_dir=None, device=agent_cfg.device)
+    elif agent_cfg.class_name == "DistillationRunner":
+        runner = DistillationRunner(env, agent_cfg.to_dict(), log_dir=None, device=agent_cfg.device)
+    else:
+        raise ValueError(f"Unsupported runner class: {agent_cfg.class_name}")
+    runner.load(resume_path)
+
+    # obtain the trained policy for inference
+    policy = runner.get_inference_policy(device=env.unwrapped.device)
+
+    # extract the neural network module
+    # we do this in a try-except to maintain backwards compatibility.
+    try:
+        # version 2.3 onwards
+        policy_nn = runner.alg.policy
+    except AttributeError:
+        # version 2.2 and below
+        policy_nn = runner.alg.actor_critic
+
+    # extract the normalizer
+    if hasattr(policy_nn, "actor_obs_normalizer"):
+        normalizer = policy_nn.actor_obs_normalizer
+    elif hasattr(policy_nn, "student_obs_normalizer"):
+        normalizer = policy_nn.student_obs_normalizer
+    else:
+        normalizer = None
+
+    # export policy to onnx/jit
+    export_model_dir = os.path.join(os.path.dirname(resume_path), "exported")
+    export_policy_as_jit(policy_nn, normalizer=normalizer, path=export_model_dir, filename="policy.pt")
+    export_policy_as_onnx(policy_nn, normalizer=normalizer, path=export_model_dir, filename="policy.onnx")
+
+    dt = env.unwrapped.step_dt
+
+    # reset environment
+    obs = env.get_observations()
+    timestep = 0
+
+    # Get joint mappings for policy transfer
+    _, target_to_source, obs_map = get_joint_mappings(args_cli, env.action_space.shape[1])
+
+    # Create torch tensors for mappings
+    device = args_cli.device if args_cli.device else "cuda:0"
+    target_to_source_tensor = torch.tensor(target_to_source, device=device) if target_to_source else None
+    obs_map_tensor = torch.tensor(obs_map, device=device) if obs_map else None
+
+    def remap_obs(obs):
+        """Remap the observation to the target observation space."""
+        if obs_map_tensor is not None:
+            obs = obs[:, obs_map_tensor]
+        return obs
+
+    def remap_actions(actions):
+        """Remap the actions to the target action space."""
+        if target_to_source_tensor is not None:
+            actions = actions[:, target_to_source_tensor]
+        return actions
+
+    # simulate environment
+    while simulation_app.is_running():
+        start_time = time.time()
+        # run everything in inference mode
+        with torch.inference_mode():
+            # agent stepping
+            actions = policy(remap_obs(obs))
+            # env stepping
+            obs, _, _, _ = env.step(remap_actions(actions))
+        if args_cli.video:
+            timestep += 1
+            # Exit the play loop after recording one video
+            if timestep == args_cli.video_length:
+                break
+
+        # time delay for real-time evaluation
+        sleep_time = dt - (time.time() - start_time)
+        if args_cli.real_time and sleep_time > 0:
+            time.sleep(sleep_time)
+
+    # close the simulator
+    env.close()
+
+
+if __name__ == "__main__":
+    # run the main function
+    main()
+    # close sim app
+    simulation_app.close()