Adds Gymnasium spaces showcase tasks (#2109)

# Description

This PR adds a set of Direct-workflow tasks that showcase the definition and use of the various Gymnasium observation and action spaces supported in Isaac Lab.

## Type of change

- New feature (non-breaking change which adds functionality)
- This change requires a documentation update

## Screenshots

![image](https://github.com/user-attachments/assets/36b526ac-0eb7-45fa-81fa-3d0a09c1c1c5)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Co-authored-by: Kelly Guo <kellyg@nvidia.com>

Commit c5fe1b5f, authored Mar 21, 2025 by Toni-SM, committed by GitHub Mar 21, 2025
41 changed files
--- a/docs/source/_static/css/custom.css
+++ b/docs/source/_static/css/custom.css
@@ -82,3 +82,31 @@ a {
    padding-top: 0.0rem !important;
    padding-bottom: 0.0rem !important;
 }
+
+/* showcase task tables */
+
+.showcase-table {
+    min-width: 75%;
+}
+
+.showcase-table td {
+    border-color: gray;
+    border-style: solid;
+    border-width: 1px;
+}
+
+.showcase-table p {
+    margin: 0;
+    padding: 0;
+}
+
+.showcase-table .rot90 {
+    transform: rotate(-90deg);
+    margin: 0;
+    padding: 0;
+}
+
+.showcase-table .center {
+    text-align: center;
+    vertical-align: middle;
+}
--- a/docs/source/overview/environments.rst
+++ b/docs/source/overview/environments.rst
@@ -335,6 +335,121 @@ Others
 .. |quadcopter| image:: ../_static/tasks/others/quadcopter.jpg
 .. |humanoid_amp| image:: ../_static/tasks/others/humanoid_amp.jpg

+Spaces showcase
+~~~~~~~~~~~~~~~
+
+The |cartpole_showcase| folder contains showcase tasks (based on the *Cartpole* and *Cartpole-Camera* Direct tasks)
+for the definition/use of the various Gymnasium observation and action spaces supported in Isaac Lab.
+
+.. |cartpole_showcase| replace:: `cartpole_showcase <https://github.com/isaac-sim/IsaacLab/tree/main/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase>`__
+
+.. note::
+
+    Currently, only Isaac Lab's Direct workflow supports the definition of observation and action spaces other than ``Box``.
+    See Direct workflow's :py:obj:`~isaaclab.envs.DirectRLEnvCfg.observation_space` / :py:obj:`~isaaclab.envs.DirectRLEnvCfg.action_space`
+    documentation for more details.
+
+The following tables summarize the different pairs of showcased spaces for the *Cartpole* and *Cartpole-Camera* tasks.
+In the task names, replace ``<OBSERVATION>`` and ``<ACTION>`` with the observation and action spaces to explore during training and evaluation.
+
+.. raw:: html
+
+    <table class="showcase-table">
+    <caption>
+      <p>Showcase spaces for the <strong>Cartpole</strong> task</p>
+      <p><code>Isaac-Cartpole-Showcase-&lt;OBSERVATION&gt;-&lt;ACTION&gt;-Direct-v0</code></p>
+    </caption>
+    <tbody>
+      <tr>
+        <td colspan="2" rowspan="2"></td>
+        <td colspan="3" class="center">action space</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;Box</strong></td>
+        <td><strong>&nbsp;Discrete</strong></td>
+        <td><strong>&nbsp;MultiDiscrete</strong></td>
+      </tr>
+      <tr>
+        <td rowspan="5" class="rot90 center"><p>observation</p><p>space</p></td>
+        <td><strong>&nbsp;Box</strong></td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;Discrete</strong></td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;MultiDiscrete</strong></td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;Dict</strong></td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;Tuple</strong></td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+      </tr>
+    </tbody>
+    </table>
+    <br>
+    <table class="showcase-table">
+    <caption>
+        <p>Showcase spaces for the <strong>Cartpole-Camera</strong> task</p>
+        <p><code>Isaac-Cartpole-Camera-Showcase-&lt;OBSERVATION&gt;-&lt;ACTION&gt;-Direct-v0</code></p>
+    </caption>
+    <tbody>
+      <tr>
+        <td colspan="2" rowspan="2"></td>
+        <td colspan="3" class="center">action space</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;Box</strong></td>
+        <td><strong>&nbsp;Discrete</strong></td>
+        <td><strong>&nbsp;MultiDiscrete</strong></td>
+      </tr>
+      <tr>
+        <td rowspan="5" class="rot90 center"><p>observation</p><p>space</p></td>
+        <td><strong>&nbsp;Box</strong></td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;Discrete</strong></td>
+        <td class="center">-</td>
+        <td class="center">-</td>
+        <td class="center">-</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;MultiDiscrete</strong></td>
+        <td class="center">-</td>
+        <td class="center">-</td>
+        <td class="center">-</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;Dict</strong></td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+      </tr>
+      <tr>
+        <td><strong>&nbsp;Tuple</strong></td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+        <td class="center">x</td>
+      </tr>
+    </tbody></table>

 Multi-agent
 ------------
@@ -404,6 +519,42 @@ Comprehensive List of Environments
      -
      - Direct
      - **rl_games** (PPO), **skrl** (IPPO, PPO, MAPPO)
+    * - Isaac-Cartpole-Camera-Showcase-Box-Box-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Camera-Showcase-Box-Discrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Camera-Showcase-Box-MultiDiscrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Camera-Showcase-Dict-Box-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Camera-Showcase-Dict-Discrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Camera-Showcase-Dict-MultiDiscrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Camera-Showcase-Tuple-Box-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Camera-Showcase-Tuple-Discrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Camera-Showcase-Tuple-MultiDiscrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
    * - Isaac-Cartpole-Depth-Camera-Direct-v0
      -
      - Direct
@@ -432,6 +583,66 @@ Comprehensive List of Environments
      -
      - Manager Based
      - **rl_games** (PPO)
+    * - Isaac-Cartpole-Showcase-Box-Box-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Box-Discrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Box-MultiDiscrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Dict-Box-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Dict-Discrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Dict-MultiDiscrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Discrete-Box-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Discrete-Discrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Discrete-MultiDiscrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-MultiDiscrete-Box-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-MultiDiscrete-Discrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-MultiDiscrete-MultiDiscrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Tuple-Box-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Tuple-Discrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
+    * - Isaac-Cartpole-Showcase-Tuple-MultiDiscrete-Direct-v0
+      -
+      - Direct
+      - **skrl** (PPO)
    * - Isaac-Cartpole-v0
      -
      - Manager Based
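Note on the task names documented in `environments.rst`: the task IDs listed there follow directly from the two showcase tables. The sketch below enumerates them from the observation/action pairs marked with `x`; the helper name `showcase_task_ids` is hypothetical and exists only for illustration, it is not part of Isaac Lab.

```python
from itertools import product

# Action spaces shared by both showcase tables.
ACTIONS = ["Box", "Discrete", "MultiDiscrete"]

# Observation spaces marked "x" in each table (the Camera task has no
# Discrete or MultiDiscrete observation variants).
OBSERVATIONS = {
    "Cartpole": ["Box", "Discrete", "MultiDiscrete", "Dict", "Tuple"],
    "Cartpole-Camera": ["Box", "Dict", "Tuple"],
}


def showcase_task_ids(task):
    """Return the showcase task IDs for 'Cartpole' or 'Cartpole-Camera'."""
    return [
        f"Isaac-{task}-Showcase-{obs}-{act}-Direct-v0"
        for obs, act in product(OBSERVATIONS[task], ACTIONS)
    ]


cartpole_ids = showcase_task_ids("Cartpole")
camera_ids = showcase_task_ids("Cartpole-Camera")
print(len(cartpole_ids), len(camera_ids))  # 15 9
```

The counts match the 15 Cartpole and 9 Cartpole-Camera rows added to the comprehensive list of environments.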

--- a/scripts/reinforcement_learning/skrl/play.py
+++ b/scripts/reinforcement_learning/skrl/play.py
@@ -69,7 +69,7 @@ import skrl
 from packaging import version

 # check for minimum supported skrl version
-SKRL_VERSION = "1.4.1"
+SKRL_VERSION = "1.4.2"
 if version.parse(skrl.__version__) < version.parse(SKRL_VERSION):
    skrl.logger.error(
        f"Unsupported skrl version: {skrl.__version__}. "

--- a/scripts/reinforcement_learning/skrl/train.py
+++ b/scripts/reinforcement_learning/skrl/train.py
@@ -71,7 +71,7 @@ import skrl
 from packaging import version

 # check for minimum supported skrl version
-SKRL_VERSION = "1.4.1"
+SKRL_VERSION = "1.4.2"
 if version.parse(skrl.__version__) < version.parse(SKRL_VERSION):
    skrl.logger.error(
        f"Unsupported skrl version: {skrl.__version__}. "

--- a/source/isaaclab_rl/setup.py
+++ b/source/isaaclab_rl/setup.py
@@ -42,7 +42,7 @@ PYTORCH_INDEX_URL = ["https://download.pytorch.org/whl/cu118"]
 # Extra dependencies for RL agents
 EXTRAS_REQUIRE = {
    "sb3": ["stable-baselines3>=2.1"],
-    "skrl": ["skrl>=1.4.1"],
+    "skrl": ["skrl>=1.4.2"],
    "rl-games": ["rl-games==1.6.1", "gym"],  # rl-games still needs gym :(
    "rsl-rl": ["rsl-rl-lib>=2.1.1"],
 }
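Note on the skrl version bump: `play.py` and `train.py` compare versions through `packaging.version.parse` rather than comparing raw strings. The sketch below uses a simplified, standard-library-only stand-in (integer tuples, valid only for plain `X.Y.Z` release strings) to illustrate why a raw string comparison would be unreliable. The helpers `version_tuple` and `is_supported` are hypothetical and are not taken from the scripts.

```python
MINIMUM = "1.4.2"  # minimum skrl version after this change


def version_tuple(text):
    """Parse a plain 'X.Y.Z' release string into a tuple of ints.

    Simplified stand-in for packaging.version.parse: it does not handle
    pre-release or local version suffixes.
    """
    return tuple(int(part) for part in text.split("."))


def is_supported(installed):
    """Return True if the installed version meets the minimum."""
    return version_tuple(installed) >= version_tuple(MINIMUM)


print(is_supported("1.4.1"))   # False
print(is_supported("1.4.2"))   # True
print(is_supported("1.10.0"))  # True
print("1.10.0" >= "1.4.2")     # False: lexicographic comparison gets this wrong
```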

--- a/source/isaaclab_rl/test/test_skrl_wrapper.py
+++ b/source/isaaclab_rl/test/test_skrl_wrapper.py
@@ -43,7 +43,7 @@ class TestSKRLVecEnvWrapper(unittest.TestCase):
                    cls.registered_tasks.append(task_spec.id)
        # sort environments by name
        cls.registered_tasks.sort()
-        cls.registered_tasks = cls.registered_tasks[:5]
+        cls.registered_tasks = cls.registered_tasks[:3]

        # this flag is necessary to prevent a bug where the simulation gets stuck randomly when running the
        # test on many environments.

--- a/source/isaaclab_tasks/config/extension.toml
+++ b/source/isaaclab_tasks/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.10.25"
+version = "0.10.26"

 # Description
 title = "Isaac Lab Environments"

--- a/source/isaaclab_tasks/docs/CHANGELOG.rst
+++ b/source/isaaclab_tasks/docs/CHANGELOG.rst
 Changelog
 ---------

+0.10.26 (2025-03-18)
+~~~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added Gymnasium spaces showcase tasks (``Isaac-Cartpole-Showcase-*-Direct-v0`` and ``Isaac-Cartpole-Camera-Showcase-*-Direct-v0``).
+
+
 0.10.25 (2025-03-10)
 ~~~~~~~~~~~~~~~~~~~~


--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/__init__.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/__init__.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""Cartpole environment showcase for the supported Gymnasium spaces."""
+
+from .cartpole import *  # noqa
+from .cartpole_camera import *  # noqa
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/__init__.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/__init__.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""
+Cartpole balancing environment.
+"""
+
+import gymnasium as gym
+
+from . import agents
+
+###########################
+# Register Gym environments
+###########################
+
+###
+# Observation space as Box
+###
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Box-Box-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:BoxBoxEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_box_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Box-Discrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:BoxDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_discrete_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Box-MultiDiscrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:BoxMultiDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_multidiscrete_ppo_cfg.yaml",
+    },
+)
+
+###
+# Observation space as Discrete
+###
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Discrete-Box-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DiscreteBoxEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_discrete_box_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Discrete-Discrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DiscreteDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_discrete_discrete_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Discrete-MultiDiscrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DiscreteMultiDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_discrete_multidiscrete_ppo_cfg.yaml",
+    },
+)
+
+###
+# Observation space as MultiDiscrete
+###
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-MultiDiscrete-Box-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:MultiDiscreteBoxEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_multidiscrete_box_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-MultiDiscrete-Discrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:MultiDiscreteDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_multidiscrete_discrete_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-MultiDiscrete-MultiDiscrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:MultiDiscreteMultiDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_multidiscrete_multidiscrete_ppo_cfg.yaml",
+    },
+)
+
+###
+# Observation space as Dict
+###
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Dict-Box-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DictBoxEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_box_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Dict-Discrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DictDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_discrete_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Dict-MultiDiscrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DictMultiDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_multidiscrete_ppo_cfg.yaml",
+    },
+)
+
+###
+# Observation space as Tuple
+###
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Tuple-Box-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:TupleBoxEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_box_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Tuple-Discrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:TupleDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_discrete_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Showcase-Tuple-MultiDiscrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:TupleMultiDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_multidiscrete_ppo_cfg.yaml",
+    },
+)
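Note on the registrations above: the fifteen `gym.register` calls differ only in the observation/action pair, which determines the task ID, the config class name, and the agent YAML file name. The sketch below makes that naming pattern explicit by building the same arguments in a loop. It is an illustration, not a proposed refactor; `registration_spec` is a hypothetical helper, and `"pkg"` / `"pkg.agents"` are placeholders for `__name__` and `agents.__name__`.

```python
OBSERVATIONS = ["Box", "Discrete", "MultiDiscrete", "Dict", "Tuple"]
ACTIONS = ["Box", "Discrete", "MultiDiscrete"]


def registration_spec(package, agents_package, obs, act):
    """Build the arguments of one registration for an (obs, act) pair."""
    return {
        "id": f"Isaac-Cartpole-Showcase-{obs}-{act}-Direct-v0",
        "entry_point": f"{package}.cartpole_env:CartpoleShowcaseEnv",
        "disable_env_checker": True,
        "kwargs": {
            # Config classes are named <Observation><Action>EnvCfg.
            "env_cfg_entry_point": f"{package}.cartpole_env_cfg:{obs}{act}EnvCfg",
            # Agent files are named skrl_<observation>_<action>_ppo_cfg.yaml.
            "skrl_cfg_entry_point": (
                f"{agents_package}:skrl_{obs.lower()}_{act.lower()}_ppo_cfg.yaml"
            ),
        },
    }


# "pkg" and "pkg.agents" are placeholders, not real module names.
specs = [
    registration_spec("pkg", "pkg.agents", obs, act)
    for obs in OBSERVATIONS
    for act in ACTIONS
]
print(len(specs))  # 15
print(specs[-1]["kwargs"]["skrl_cfg_entry_point"])
```

Each dictionary has the same keys as the keyword arguments used in the calls above, so it could in principle be passed as `gym.register(**spec)`. The explicit calls in the PR are easier to search for, which may be why they were written out.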
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/__init__.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/__init__.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_box_box_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_box_box_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see gaussian_model parameters
+    class: GaussianMixin
+    clip_actions: False
+    clip_log_std: True
+    min_log_std: -20.0
+    max_log_std: 2.0
+    initial_log_std: 0.0
+    network:
+      - name: net
+        input: OBSERVATIONS
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: OBSERVATIONS
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_box_box"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
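Note on the training budget in `skrl_box_box_ppo_cfg.yaml`: the values of `rollouts`, `learning_epochs`, `mini_batches`, and `timesteps` together fix how many PPO updates a run performs. The arithmetic below is a rough sketch under two assumptions: one update is triggered every `rollouts` timesteps, and `kl_threshold: 0.0` means no early stopping inside an update. `num_envs` is a hypothetical value, since the number of environments is set at launch and does not appear in this file.

```python
# Values copied from the agent/trainer sections of the YAML file.
rollouts = 32
learning_epochs = 8
mini_batches = 8
timesteps = 4800

# Hypothetical: the number of parallel environments is not part of this file.
num_envs = 4096

# One PPO update per `rollouts` environment steps (assumption).
updates = timesteps // rollouts

# Each update runs every epoch over every mini-batch.
gradient_steps = updates * learning_epochs * mini_batches

# Transitions collected per update, and per mini-batch.
samples_per_update = rollouts * num_envs
samples_per_mini_batch = samples_per_update // mini_batches

print(updates)                 # 150
print(gradient_steps)          # 9600
print(samples_per_mini_batch)  # 16384
```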
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_box_discrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_box_discrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see categorical_model parameters
+    class: CategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net
+        input: OBSERVATIONS
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: OBSERVATIONS
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_box_discrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
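Note on `unnormalized_log_prob: True` in `skrl_box_discrete_ppo_cfg.yaml`: with this setting the policy head's outputs are treated as unnormalized log-probabilities (logits), which a softmax turns into a categorical distribution over the discrete actions. The sketch below shows that step in plain Python. It is not skrl code, and the logit values are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert unnormalized log-probabilities into probabilities."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample one discrete action index from the categorical distribution."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical policy output for a two-action Discrete space.
logits = [2.0, 0.5]
probs = softmax(logits)
action = sample_action(logits, random.Random(42))
print(probs, action)
```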
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_box_multidiscrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_box_multidiscrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see multicategorical_model parameters
+    class: MultiCategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net
+        input: OBSERVATIONS
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: OBSERVATIONS
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_box_multidiscrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
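Note on `MultiCategoricalMixin` in `skrl_box_multidiscrete_ppo_cfg.yaml`: a MultiDiscrete action consists of several independent discrete choices. One common way to model it, sketched below in plain Python, is to split a flat logit vector into one chunk per sub-action and treat each chunk as its own categorical distribution. This is an assumption about the general technique, not a copy of skrl's implementation, and the `nvec` and logit values are hypothetical.

```python
def split_logits(flat_logits, nvec):
    """Split a flat logit vector into one chunk per sub-action."""
    if len(flat_logits) != sum(nvec):
        raise ValueError("expected sum(nvec) logits")
    chunks, start = [], 0
    for n in nvec:
        chunks.append(flat_logits[start:start + n])
        start += n
    return chunks


def greedy_action(flat_logits, nvec):
    """Pick the highest-logit choice independently for each sub-action."""
    return [
        max(range(len(chunk)), key=chunk.__getitem__)
        for chunk in split_logits(flat_logits, nvec)
    ]


# Hypothetical MultiDiscrete([3, 2]) space: 3 + 2 = 5 logits in total.
nvec = [3, 2]
logits = [0.1, 1.5, -0.3, 0.7, 0.2]
print(split_logits(logits, nvec))   # [[0.1, 1.5, -0.3], [0.7, 0.2]]
print(greedy_action(logits, nvec))  # [1, 0]
```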
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_dict_box_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_dict_box_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+# obs["joint-positions"]  obs["joint-velocities"]
+#           │                        │
+#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
+#    ┃   net_pos  ┃           ┃   net_vel  ┃
+#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    └──────┬─────┘           └─────┬──────┘
+#           │                       │
+#           └─────────▶(+)◀─────────┘
+#                       │
+#                 ┏━━━━━▼━━━━━┓
+#                 ┃    net    ┃
+#                 ┡━━━━━━━━━━━┩
+#                 │ identity  │
+# shared          └─────┬─────┘
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see gaussian_model parameters
+    class: GaussianMixin
+    clip_actions: False
+    clip_log_std: True
+    min_log_std: -20.0
+    max_log_std: 2.0
+    initial_log_std: 0.0
+    network:
+      - name: net_pos
+        input: OBSERVATIONS["joint-positions"]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS["joint-velocities"]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos + net_vel
+        layers: []
+        activations: []
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net_pos
+        input: OBSERVATIONS["joint-positions"]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS["joint-velocities"]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos + net_vel
+        layers: []
+        activations: []
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_dict_box"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
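
Note on the `GaussianMixin` settings in the config above: `clip_log_std`, `min_log_std`, `max_log_std` and `initial_log_std` bound the policy's log standard deviation. A minimal sketch of that bounding in plain Python, assuming a single scalar log-std that is clamped before being exponentiated. The helper name `clamped_std` is hypothetical and not part of skrl.

```python
import math

MIN_LOG_STD = -20.0  # min_log_std from the config
MAX_LOG_STD = 2.0    # max_log_std from the config


def clamped_std(log_std: float) -> float:
    """Clamp a log standard deviation to the configured range, then exponentiate."""
    clipped = max(MIN_LOG_STD, min(MAX_LOG_STD, log_std))
    return math.exp(clipped)


# initial_log_std: 0.0 corresponds to a unit standard deviation
initial_std = clamped_std(0.0)
# values outside the range saturate at the bounds
upper_std = clamped_std(5.0)
```

Under these assumptions the policy starts with a standard deviation of 1 and can never exceed `exp(2.0)`.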
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_dict_discrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_dict_discrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+# obs["joint-positions"]  obs["joint-velocities"]
+#           │                        │
+#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
+#    ┃   net_pos  ┃           ┃   net_vel  ┃
+#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    └──────┬─────┘           └─────┬──────┘
+#           │                       │
+#           └─────────▶(+)◀─────────┘
+#                       │
+#                 ┏━━━━━▼━━━━━┓
+#                 ┃    net    ┃
+#                 ┡━━━━━━━━━━━┩
+#                 │ identity  │
+# shared          └─────┬─────┘
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see categorical_model parameters
+    class: CategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net_pos
+        input: OBSERVATIONS["joint-positions"]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS["joint-velocities"]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos + net_vel
+        layers: []
+        activations: []
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net_pos
+        input: OBSERVATIONS["joint-positions"]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS["joint-velocities"]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos + net_vel
+        layers: []
+        activations: []
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_dict_discrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
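
Note on `unnormalized_log_prob: True` in the `CategoricalMixin` policy above: the network output is treated as unnormalized log-probabilities (logits), so any real values are acceptable and are normalized into a distribution. A minimal sketch of that normalization, with made-up logits for a hypothetical 3-way `Discrete` action space:

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Normalize unnormalized log-probabilities (logits) into probabilities."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# hypothetical policy outputs, one logit per discrete action
probs = softmax([2.0, 0.0, -1.0])
```

The action is then sampled from `probs`; larger logits receive larger probability mass.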
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_dict_multidiscrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_dict_multidiscrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+# obs["joint-positions"]  obs["joint-velocities"]
+#           │                        │
+#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
+#    ┃   net_pos  ┃           ┃   net_vel  ┃
+#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    └──────┬─────┘           └─────┬──────┘
+#           │                       │
+#           └─────────▶(+)◀─────────┘
+#                       │
+#                 ┏━━━━━▼━━━━━┓
+#                 ┃    net    ┃
+#                 ┡━━━━━━━━━━━┩
+#                 │ identity  │
+# shared          └─────┬─────┘
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see multicategorical_model parameters
+    class: MultiCategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net_pos
+        input: OBSERVATIONS["joint-positions"]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS["joint-velocities"]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos + net_vel
+        layers: []
+        activations: []
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net_pos
+        input: OBSERVATIONS["joint-positions"]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS["joint-velocities"]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos + net_vel
+        layers: []
+        activations: []
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_dict_multidiscrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
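
Note on how `rollouts`, `learning_epochs`, `mini_batches` and `timesteps` interact: a rough worked example, assuming one PPO update is triggered every `rollouts` environment steps and that the collected samples are split evenly into mini-batches. The environment count of 4096 is an assumption for illustration only; it is not set anywhere in these files.

```python
ROLLOUTS = 32        # agent: rollouts
LEARNING_EPOCHS = 8  # agent: learning_epochs
MINI_BATCHES = 8     # agent: mini_batches
TIMESTEPS = 4800     # trainer: timesteps


def update_budget(num_envs: int) -> dict[str, int]:
    """Derive per-update sample counts from the config values above."""
    samples_per_update = ROLLOUTS * num_envs
    return {
        "policy_updates": TIMESTEPS // ROLLOUTS,
        "samples_per_update": samples_per_update,
        "mini_batch_size": samples_per_update // MINI_BATCHES,
        "gradient_steps_per_update": LEARNING_EPOCHS * MINI_BATCHES,
    }


# 4096 parallel environments is an assumed value
budget = update_budget(num_envs=4096)
```

With these numbers a full run performs 150 updates of 64 gradient steps each.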
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_discrete_box_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_discrete_box_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                  one_hot(obs)
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see gaussian_model parameters
+    class: GaussianMixin
+    clip_actions: False
+    clip_log_std: True
+    min_log_std: -20.0
+    max_log_std: 2.0
+    initial_log_std: 0.0
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null  # pre-processor should not be used with Discrete/MultiDiscrete observations
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_discrete_box"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
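
Note on `one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)` in the config above: the integer observation is expanded into a one-hot vector before it reaches `net`. The sketch below illustrates the idea for `Discrete` and `MultiDiscrete` observations in plain Python. It is not skrl's implementation, and the helper names `one_hot` and `one_hot_multi` are hypothetical.

```python
def one_hot(index: int, n: int) -> list[float]:
    """Encode a Discrete(n) observation as a vector of length n."""
    if not 0 <= index < n:
        raise ValueError(f"index {index} is outside Discrete({n})")
    return [1.0 if i == index else 0.0 for i in range(n)]


def one_hot_multi(indices: list[int], nvec: list[int]) -> list[float]:
    """Encode a MultiDiscrete(nvec) observation by concatenating one-hot vectors."""
    encoded: list[float] = []
    for index, n in zip(indices, nvec):
        encoded.extend(one_hot(index, n))
    return encoded


discrete_input = one_hot(2, 4)               # Discrete(4), observation 2
multi_input = one_hot_multi([1, 0], [3, 2])  # MultiDiscrete([3, 2])
```

In this sketch the network input width is `n` for `Discrete(n)` and `sum(nvec)` for `MultiDiscrete(nvec)`.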
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_discrete_discrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_discrete_discrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                  one_hot(obs)
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see categorical_model parameters
+    class: CategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null  # pre-processor should not be used with Discrete/MultiDiscrete observations
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_discrete_discrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
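
Note on `learning_rate_scheduler: KLAdaptiveLR` with `kl_threshold: 0.008`: the scheduler adjusts the learning rate based on the measured KL divergence between the old and new policy. A minimal sketch of that kind of rule follows. The values for `kl_factor`, `lr_factor`, `min_lr` and `max_lr` are assumptions believed to match the library defaults; check the skrl documentation before relying on them.

```python
def adapt_learning_rate(
    lr: float,
    kl: float,
    kl_threshold: float = 0.008,  # from the config
    kl_factor: float = 2.0,       # assumed
    lr_factor: float = 1.5,       # assumed
    min_lr: float = 1e-6,         # assumed
    max_lr: float = 1e-2,         # assumed
) -> float:
    """Shrink the learning rate when KL is too large, grow it when KL is too small."""
    if kl > kl_threshold * kl_factor:
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        return min(lr * lr_factor, max_lr)
    return lr


decreased = adapt_learning_rate(5.0e-04, kl=0.02)
increased = adapt_learning_rate(5.0e-04, kl=0.001)
unchanged = adapt_learning_rate(5.0e-04, kl=0.008)
```

The configured `learning_rate: 5.0e-04` is therefore only a starting point under this scheduler.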
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_discrete_multidiscrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_discrete_multidiscrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                  one_hot(obs)
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see multicategorical_model parameters
+    class: MultiCategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null  # pre-processor should not be used with Discrete/MultiDiscrete observations
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_discrete_multidiscrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_multidiscrete_box_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_multidiscrete_box_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                  one_hot(obs)
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see gaussian_model parameters
+    class: GaussianMixin
+    clip_actions: False
+    clip_log_std: True
+    min_log_std: -20.0
+    max_log_std: 2.0
+    initial_log_std: 0.0
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null  # pre-processor should not be used with Discrete/MultiDiscrete observations
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_multidiscrete_box"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
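
Note on `clip_predicted_values: True` with `value_clip: 0.2`: the value prediction used in the loss is kept within a band around the value estimate recorded during the rollout. A minimal scalar sketch, assuming the common PPO formulation; the function name is hypothetical.

```python
def clip_predicted_value(predicted: float, sampled: float, value_clip: float = 0.2) -> float:
    """Limit how far a new value prediction may move from the rollout-time estimate."""
    delta = max(-value_clip, min(value_clip, predicted - sampled))
    return sampled + delta


# a prediction far above the stored estimate is pulled back to sampled + 0.2
clipped_high = clip_predicted_value(predicted=1.0, sampled=0.5)
# a prediction inside the band is left unchanged
inside_band = clip_predicted_value(predicted=0.55, sampled=0.5)
```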
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_multidiscrete_discrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_multidiscrete_discrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                  one_hot(obs)
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see categorical_model parameters
+    class: CategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null  # pre-processor should not be used with Discrete/MultiDiscrete observations
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_multidiscrete_discrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
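
Note on `ratio_clip: 0.2`: this is the clipping range of the PPO surrogate objective. A minimal per-sample sketch of the standard clipped surrogate loss, written in plain Python rather than with tensors; the function name is hypothetical.

```python
import math


def clipped_surrogate_loss(
    new_log_prob: float,
    old_log_prob: float,
    advantage: float,
    ratio_clip: float = 0.2,
) -> float:
    """Per-sample PPO policy loss (negated clipped surrogate objective)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - ratio_clip, min(1.0 + ratio_clip, ratio))
    return -min(ratio * advantage, clipped_ratio * advantage)


# the probability ratio is 2, so a positive advantage is credited only up to 1.2
loss_positive = clipped_surrogate_loss(math.log(2.0), 0.0, advantage=1.0)
# with a negative advantage the unclipped (more pessimistic) term is kept
loss_negative = clipped_surrogate_loss(math.log(2.0), 0.0, advantage=-1.0)
```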
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_multidiscrete_multidiscrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_multidiscrete_multidiscrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#                  one_hot(obs)
+#                       │
+#                ┏━━━━━━▼━━━━━┓
+#                ┃     net    ┃
+#                ┡━━━━━━━━━━━━┩
+#                │ linear(32) │
+#                │ elu        │
+#                │ linear(32) │
+#                │ elu        │
+#                └──────┬─────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see multicategorical_model parameters
+    class: MultiCategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net
+        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null  # pre-processor should not be used with Discrete/MultiDiscrete observations
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_multidiscrete_multidiscrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
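
Note on `discount_factor: 0.99` and `lambda: 0.95`: these are the two parameters of generalized advantage estimation (GAE). A minimal sketch for a single environment follows. Reward scaling (`rewards_shaper_scale`) and advantage normalization are deliberately omitted, and the function name is hypothetical.

```python
def compute_gae(
    rewards: list[float],
    values: list[float],
    next_value: float,
    dones: list[bool],
    discount: float = 0.99,
    lam: float = 0.95,
) -> tuple[list[float], list[float]]:
    """Return (advantages, returns) for one rollout of a single environment."""
    advantages = [0.0] * len(rewards)
    advantage = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # bootstrap from the next stored value, or from next_value at the rollout end
        bootstrap = next_value if t == len(rewards) - 1 else values[t + 1]
        delta = rewards[t] + discount * not_done * bootstrap - values[t]
        advantage = delta + discount * lam * not_done * advantage
        advantages[t] = advantage
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = compute_gae(
    rewards=[1.0, 1.0], values=[0.0, 0.0], next_value=0.0, dones=[False, True]
)
```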
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_tuple_box_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_tuple_box_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#         obs[0]                   obs[1]
+#           │                        │
+#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
+#    ┃   net_pos  ┃           ┃   net_vel  ┃
+#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    └──────┬─────┘           └─────┬──────┘
+#           │                       │
+#           └─────────▶(*)◀─────────┘
+#                       │
+#                 ┏━━━━━▼━━━━━┓
+#                 ┃    net    ┃
+#                 ┡━━━━━━━━━━━┩
+#                 │ identity  │
+# shared          └─────┬─────┘
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see gaussian_model parameters
+    class: GaussianMixin
+    clip_actions: False
+    clip_log_std: True
+    min_log_std: -20.0
+    max_log_std: 2.0
+    initial_log_std: 0.0
+    network:
+      - name: net_pos
+        input: OBSERVATIONS[0]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS[1]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos * net_vel
+        layers: []
+        activations: []
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net_pos
+        input: OBSERVATIONS[0]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS[1]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos * net_vel
+        layers: []
+        activations: []
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_tuple_box"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
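
Note on the merge step: the `Tuple` configs combine the two branches with `net_pos * net_vel`, whereas the `Dict` configs use `net_pos + net_vel`. Both are element-wise, so the merged feature vector keeps the branch width of 16 instead of growing as a concatenation would. A minimal sketch with short made-up feature vectors; the helper name `merge` is hypothetical.

```python
def merge(pos_features: list[float], vel_features: list[float], op: str) -> list[float]:
    """Combine two equally sized branch outputs element-wise."""
    if len(pos_features) != len(vel_features):
        raise ValueError("branch outputs must have the same width")
    if op == "+":
        return [p + v for p, v in zip(pos_features, vel_features)]
    if op == "*":
        return [p * v for p, v in zip(pos_features, vel_features)]
    raise ValueError(f"unsupported merge operator: {op}")


summed = merge([1.0, 2.0], [3.0, 4.0], "+")    # Dict-style merge
product = merge([1.0, 2.0], [3.0, 4.0], "*")   # Tuple-style merge
```

Because `net` has `layers: []`, the merged vector is passed unchanged to the policy and value output layers.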
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_tuple_discrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_tuple_discrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#         obs[0]                   obs[1]
+#           │                        │
+#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
+#    ┃   net_pos  ┃           ┃   net_vel  ┃
+#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    └──────┬─────┘           └─────┬──────┘
+#           │                       │
+#           └─────────▶(*)◀─────────┘
+#                       │
+#                 ┏━━━━━▼━━━━━┓
+#                 ┃    net    ┃
+#                 ┡━━━━━━━━━━━┩
+#                 │ identity  │
+# shared          └─────┬─────┘
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see categorical_model parameters
+    class: CategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net_pos
+        input: OBSERVATIONS[0]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS[1]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos * net_vel
+        layers: []
+        activations: []
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net_pos
+        input: OBSERVATIONS[0]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS[1]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos * net_vel
+        layers: []
+        activations: []
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_tuple_discrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
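
Review note on the agent and trainer settings above: with `memory_size: -1` the rollout memory holds `rollouts` steps per environment, and each update splits that memory into `mini_batches` chunks. The sketch below works through the arithmetic, assuming the agent updates once every `rollouts` timesteps. `PpoSchedule` and `NUM_ENVS` are hypothetical names introduced for illustration, and the environment count of 4096 is an assumption (it is not part of this config).

```python
from dataclasses import dataclass


@dataclass
class PpoSchedule:
    """Values copied from the `agent` and `trainer` sections of the YAML."""

    rollouts: int
    mini_batches: int
    learning_epochs: int
    timesteps: int

    def updates(self) -> int:
        # One policy update every `rollouts` environment steps (assumed).
        return self.timesteps // self.rollouts

    def samples_per_update(self, num_envs: int) -> int:
        # The memory stores `rollouts` transitions for each parallel environment.
        return self.rollouts * num_envs

    def mini_batch_size(self, num_envs: int) -> int:
        return self.samples_per_update(num_envs) // self.mini_batches

    def gradient_steps(self) -> int:
        # Each update runs `learning_epochs` passes over `mini_batches` chunks.
        return self.updates() * self.learning_epochs * self.mini_batches


NUM_ENVS = 4096  # hypothetical: the number of environments is set outside this file

schedule = PpoSchedule(rollouts=32, mini_batches=8, learning_epochs=8, timesteps=4800)
print(schedule.updates())                  # 150
print(schedule.mini_batch_size(NUM_ENVS))  # 16384
print(schedule.gradient_steps())           # 9600
```

Under the same assumption, the camera configs in this PR (`rollouts: 64`, `timesteps: 32000`) would perform 500 updates.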
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_tuple_multidiscrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/agents/skrl_tuple_multidiscrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#         obs[0]                   obs[1]
+#           │                        │
+#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
+#    ┃   net_pos  ┃           ┃   net_vel  ┃
+#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    │ linear(16) │           │ linear(16) │
+#    │ elu        │           │ elu        │
+#    └──────┬─────┘           └─────┬──────┘
+#           │                       │
+#           └─────────▶(*)◀─────────┘
+#                       │
+#                 ┏━━━━━▼━━━━━┓
+#                 ┃    net    ┃
+#                 ┡━━━━━━━━━━━┩
+#                 │ identity  │
+# shared          └─────┬─────┘
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see multicategorical_model parameters
+    class: MultiCategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: net_pos
+        input: OBSERVATIONS[0]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS[1]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos * net_vel
+        layers: []
+        activations: []
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: net_pos
+        input: OBSERVATIONS[0]
+        layers: [16, 16]
+        activations: elu
+      - name: net_vel
+        input: OBSERVATIONS[1]
+        layers: [16, 16]
+        activations: elu
+      - name: net
+        input: net_pos * net_vel
+        layers: []
+        activations: []
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 32
+  learning_epochs: 8
+  mini_batches: 8
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 5.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: RunningStandardScaler
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 2.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 0.1
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_direct_tuple_multidiscrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 4800
+  environment_info: log
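
Review note on the two `kl_threshold` entries above: the one under `learning_rate_scheduler_kwargs` (0.008) drives the `KLAdaptiveLR` scheduler, while the agent-level `kl_threshold: 0.0` is a separate setting (to my understanding an early-stopping threshold, which 0 leaves disabled). The sketch below shows the usual KL-adaptive rule in plain Python. The function name is hypothetical, and the defaults for `kl_factor`, `lr_factor`, `min_lr`, and `max_lr` are assumptions, not values taken from this config.

```python
import math


def kl_adaptive_lr(
    lr: float,
    kl: float,
    kl_threshold: float = 0.008,  # from learning_rate_scheduler_kwargs
    kl_factor: float = 2.0,       # assumed default
    lr_factor: float = 1.5,       # assumed default
    min_lr: float = 1e-6,         # assumed default
    max_lr: float = 1e-2,         # assumed default
) -> float:
    """Return the next learning rate given the measured KL divergence."""
    if kl > kl_threshold * kl_factor:
        # The policy moved too far: shrink the step size.
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        # The policy barely moved: grow the step size.
        return min(lr * lr_factor, max_lr)
    # Inside the tolerance band: keep the learning rate unchanged.
    return lr


lr = 5.0e-4  # `learning_rate` from the config
for kl in (0.02, 0.008, 0.001):
    print(kl, kl_adaptive_lr(lr, kl))
```

With these assumed factors the learning rate only changes when the KL divergence leaves the band [0.004, 0.016].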
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/cartpole_env.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/cartpole_env.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from __future__ import annotations
+
+import gymnasium as gym
+import torch
+
+from isaaclab_tasks.direct.cartpole.cartpole_env import CartpoleEnv, CartpoleEnvCfg
+
+
+class CartpoleShowcaseEnv(CartpoleEnv):
+    cfg: CartpoleEnvCfg
+
+    def _pre_physics_step(self, actions: torch.Tensor) -> None:
+        self.actions = actions.clone()
+
+    def _apply_action(self) -> None:
+        # fundamental spaces
+        # - Box
+        if isinstance(self.single_action_space, gym.spaces.Box):
+            target = self.cfg.action_scale * self.actions
+        # - Discrete
+        elif isinstance(self.single_action_space, gym.spaces.Discrete):
+            # 0 -> no effort, 1 -> negative effort, 2 -> positive effort
+            target = torch.zeros((self.num_envs, 1), dtype=torch.float32, device=self.device)
+            target = torch.where(self.actions == 1, -self.cfg.action_scale, target)
+            target = torch.where(self.actions == 2, self.cfg.action_scale, target)
+        # - MultiDiscrete
+        elif isinstance(self.single_action_space, gym.spaces.MultiDiscrete):
+            # magnitude (actions[:, 0]): 0 -> zero, 1 -> half scale, 2 -> full scale
+            target = torch.zeros((self.num_envs, 1), dtype=torch.float32, device=self.device)
+            target = torch.where(self.actions[:, [0]] == 1, self.cfg.action_scale / 2.0, target)
+            target = torch.where(self.actions[:, [0]] == 2, self.cfg.action_scale, target)
+            # direction (actions[:, 1]): 0 -> negative, 1 -> positive
+            target = torch.where(self.actions[:, [1]] == 0, -target, target)
+        else:
+            raise NotImplementedError(f"Action space {type(self.single_action_space)} not implemented")
+
+        # set target
+        self.cartpole.set_joint_effort_target(target, joint_ids=self._cart_dof_idx)
+
+    def _get_observations(self) -> dict:
+
+        # fundamental spaces
+        # - Box
+        if isinstance(self.single_observation_space["policy"], gym.spaces.Box):
+            obs = torch.cat(
+                (
+                    self.joint_pos[:, self._pole_dof_idx[0]].unsqueeze(dim=1),
+                    self.joint_vel[:, self._pole_dof_idx[0]].unsqueeze(dim=1),
+                    self.joint_pos[:, self._cart_dof_idx[0]].unsqueeze(dim=1),
+                    self.joint_vel[:, self._cart_dof_idx[0]].unsqueeze(dim=1),
+                ),
+                dim=-1,
+            )
+        # - Discrete
+        elif isinstance(self.single_observation_space["policy"], gym.spaces.Discrete):
+            data = (
+                torch.cat(
+                    (
+                        self.joint_pos[:, self._pole_dof_idx[0]].unsqueeze(dim=1),
+                        self.joint_pos[:, self._cart_dof_idx[0]].unsqueeze(dim=1),
+                        self.joint_vel[:, self._pole_dof_idx[0]].unsqueeze(dim=1),
+                        self.joint_vel[:, self._cart_dof_idx[0]].unsqueeze(dim=1),
+                    ),
+                    dim=-1,
+                )
+                >= 0
+            )
+            # encode the sign bits (pole pos, cart pos, pole vel, cart vel) as an index in [0, 15]
+            obs = torch.zeros((self.num_envs,), dtype=torch.int32, device=self.device)
+            obs = torch.where(discretization_indices(data, [False, False, False, True]), 1, obs)
+            obs = torch.where(discretization_indices(data, [False, False, True, False]), 2, obs)
+            obs = torch.where(discretization_indices(data, [False, False, True, True]), 3, obs)
+            obs = torch.where(discretization_indices(data, [False, True, False, False]), 4, obs)
+            obs = torch.where(discretization_indices(data, [False, True, False, True]), 5, obs)
+            obs = torch.where(discretization_indices(data, [False, True, True, False]), 6, obs)
+            obs = torch.where(discretization_indices(data, [False, True, True, True]), 7, obs)
+            obs = torch.where(discretization_indices(data, [True, False, False, False]), 8, obs)
+            obs = torch.where(discretization_indices(data, [True, False, False, True]), 9, obs)
+            obs = torch.where(discretization_indices(data, [True, False, True, False]), 10, obs)
+            obs = torch.where(discretization_indices(data, [True, False, True, True]), 11, obs)
+            obs = torch.where(discretization_indices(data, [True, True, False, False]), 12, obs)
+            obs = torch.where(discretization_indices(data, [True, True, False, True]), 13, obs)
+            obs = torch.where(discretization_indices(data, [True, True, True, False]), 14, obs)
+            obs = torch.where(discretization_indices(data, [True, True, True, True]), 15, obs)
+        # - MultiDiscrete
+        elif isinstance(self.single_observation_space["policy"], gym.spaces.MultiDiscrete):
+            zeros = torch.zeros((self.num_envs,), dtype=torch.int32, device=self.device)
+            ones = torch.ones_like(zeros)
+            obs = torch.cat(
+                (
+                    torch.where(
+                        discretization_indices(self.joint_pos[:, self._pole_dof_idx[0]].unsqueeze(dim=1) >= 0, [True]),
+                        ones,
+                        zeros,
+                    ).unsqueeze(dim=1),
+                    torch.where(
+                        discretization_indices(self.joint_pos[:, self._cart_dof_idx[0]].unsqueeze(dim=1) >= 0, [True]),
+                        ones,
+                        zeros,
+                    ).unsqueeze(dim=1),
+                    torch.where(
+                        discretization_indices(self.joint_vel[:, self._pole_dof_idx[0]].unsqueeze(dim=1) >= 0, [True]),
+                        ones,
+                        zeros,
+                    ).unsqueeze(dim=1),
+                    torch.where(
+                        discretization_indices(self.joint_vel[:, self._cart_dof_idx[0]].unsqueeze(dim=1) >= 0, [True]),
+                        ones,
+                        zeros,
+                    ).unsqueeze(dim=1),
+                ),
+                dim=-1,
+            )
+        # composite spaces
+        # - Tuple
+        elif isinstance(self.single_observation_space["policy"], gym.spaces.Tuple):
+            obs = (self.joint_pos, self.joint_vel)
+        # - Dict
+        elif isinstance(self.single_observation_space["policy"], gym.spaces.Dict):
+            obs = {"joint-positions": self.joint_pos, "joint-velocities": self.joint_vel}
+        else:
+            raise NotImplementedError(
+                f"Observation space {type(self.single_observation_space['policy'])} not implemented"
+            )
+
+        observations = {"policy": obs}
+        return observations
+
+
+def discretization_indices(x: torch.Tensor, condition: list[bool]) -> torch.Tensor:
+    """Return a boolean mask selecting the rows of ``x`` whose entries all match ``condition``."""
+    return torch.all(x == torch.tensor(condition, device=x.device), dim=-1)
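
Review note on `cartpole_env.py`: the sixteen `torch.where` calls in the `Discrete` observation branch amount to reading the four sign bits as a binary number, and the action branches are small lookup tables. The sketch below mirrors that logic in plain Python for a single environment, without `torch`. The function names are hypothetical, and the `action_scale` of 100.0 is an assumed value (it is defined in `CartpoleEnvCfg`, which is not shown in this diff).

```python
def encode_signs(pole_pos: float, cart_pos: float, pole_vel: float, cart_vel: float) -> int:
    """Map the signs of the four joint values to an index in [0, 15].

    The bit order (most significant first) follows the environment code:
    pole position, cart position, pole velocity, cart velocity.
    """
    index = 0
    for value in (pole_pos, cart_pos, pole_vel, cart_vel):
        index = (index << 1) | int(value >= 0)
    return index


def decode_discrete(action: int, action_scale: float) -> float:
    """Discrete(3) action: 0 -> no effort, 1 -> negative, 2 -> positive."""
    return {0: 0.0, 1: -action_scale, 2: action_scale}[action]


def decode_multidiscrete(magnitude: int, direction: int, action_scale: float) -> float:
    """MultiDiscrete action: magnitude in {0, 1, 2}, direction in {0, 1}."""
    value = {0: 0.0, 1: action_scale / 2.0, 2: action_scale}[magnitude]
    return -value if direction == 0 else value


ACTION_SCALE = 100.0  # hypothetical value for illustration

print(encode_signs(-0.1, 0.2, -0.3, 0.4))         # 5, i.e. [False, True, False, True]
print(decode_discrete(1, ACTION_SCALE))           # -100.0
print(decode_multidiscrete(1, 1, ACTION_SCALE))   # 50.0
```

The bit-shift form makes it easier to see that every sign combination maps to a distinct index, with index 0 reserved for the all-negative case.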
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/cartpole_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole/cartpole_env_cfg.py
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/__init__.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/__init__.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""
+Cartpole balancing environment with camera.
+"""
+
+import gymnasium as gym
+
+from . import agents
+
+###########################
+# Register Gym environments
+###########################
+
+###
+# Observation space as Box
+###
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Box-Box-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:BoxBoxEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_box_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Box-Discrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:BoxDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_discrete_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Box-MultiDiscrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:BoxMultiDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_multidiscrete_ppo_cfg.yaml",
+    },
+)
+
+###
+# Observation space as Dict
+###
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Dict-Box-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:DictBoxEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_box_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Dict-Discrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:DictDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_discrete_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Dict-MultiDiscrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:DictMultiDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_multidiscrete_ppo_cfg.yaml",
+    },
+)
+
+###
+# Observation space as Tuple
+###
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Tuple-Box-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:TupleBoxEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_box_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Tuple-Discrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:TupleDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_discrete_ppo_cfg.yaml",
+    },
+)
+
+gym.register(
+    id="Isaac-Cartpole-Camera-Showcase-Tuple-MultiDiscrete-Direct-v0",
+    entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
+    disable_env_checker=True,
+    kwargs={
+        "env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:TupleMultiDiscreteEnvCfg",
+        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_multidiscrete_ppo_cfg.yaml",
+    },
+)
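
Review note on the registrations above: the nine tasks follow one naming pattern over three observation spaces and three action spaces, and the `entry_point` strings use the `module:attribute` form. The sketch below enumerates the pattern and shows how such a string can be resolved with `importlib`. The helper names are hypothetical, and the resolver is demonstrated on a standard-library target rather than on the Isaac Lab modules. How the `.yaml` entry points are located is not covered here.

```python
import collections
import importlib
from itertools import product

OBSERVATION_SPACES = ("Box", "Dict", "Tuple")
ACTION_SPACES = ("Box", "Discrete", "MultiDiscrete")


def showcase_tasks() -> dict[str, tuple[str, str]]:
    """Return {task id: (env cfg class name, skrl YAML file name)}."""
    tasks = {}
    for obs, act in product(OBSERVATION_SPACES, ACTION_SPACES):
        task_id = f"Isaac-Cartpole-Camera-Showcase-{obs}-{act}-Direct-v0"
        cfg_class = f"{obs}{act}EnvCfg"
        yaml_file = f"skrl_{obs.lower()}_{act.lower()}_ppo_cfg.yaml"
        tasks[task_id] = (cfg_class, yaml_file)
    return tasks


def load_entry_point(entry_point: str):
    """Resolve a 'package.module:attribute' string to the named object."""
    module_name, attr_name = entry_point.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


for task_id, (cfg_class, yaml_file) in showcase_tasks().items():
    print(task_id, cfg_class, yaml_file)

# Demonstration on a standard-library target.
print(load_entry_point("collections:OrderedDict") is collections.OrderedDict)
```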
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/__init__.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/__init__.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_box_box_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_box_box_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#            ┏━━━━━━━━━━▼━━━━━━━━━━┓
+#            ┃ features_extractor  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━┩
+#            │ conv2d(32, 8, 4)    │
+#            │ relu                │
+#            │ conv2d(64, 4, 2)    │
+#            │ relu                │
+#            │ conv2d(64, 3, 1)    │
+#            │ relu                │
+#            │ flatten             │
+#            └──────────┬──────────┘
+#                       │
+#                ┏━━━━━━▼━━━━━━┓
+#                ┃     net     ┃
+#                ┡━━━━━━━━━━━━━┩
+#                │ linear(512) │
+#                │ elu         │
+#                └──────┬──────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see gaussian_model parameters
+    class: GaussianMixin
+    clip_actions: False
+    clip_log_std: True
+    min_log_std: -20.0
+    max_log_std: 2.0
+    initial_log_std: 0.0
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS, (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+        activations: relu
+      - name: net
+        input: features_extractor
+        layers: [512]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS, (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+        activations: relu
+      - name: net
+        input: features_extractor
+        layers: [512]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_box_box"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
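
Review note on the `features_extractor` above: with `padding: 0`, each convolution shrinks the image, so the size of the flattened vector fed to `linear(512)` depends on the camera resolution. The sketch below applies the standard output-size formula `floor((n + 2p - k) / s) + 1` to the three layers. The helper names are hypothetical, and the 100x100 and 84x84 resolutions are assumptions for illustration, since the image size is not stated in this config.

```python
def conv2d_out(size: int, kernel_size: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a 2D convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (out_channels, kernel_size, stride, padding), copied from the YAML above.
LAYERS = [(32, 8, 4, 0), (64, 4, 2, 0), (64, 3, 1, 0)]


def flattened_features(height: int, width: int) -> int:
    """Number of features after the conv stack and the flatten layer."""
    channels = 0
    for out_channels, kernel_size, stride, padding in LAYERS:
        height = conv2d_out(height, kernel_size, stride, padding)
        width = conv2d_out(width, kernel_size, stride, padding)
        channels = out_channels
    return channels * height * width


print(flattened_features(100, 100))  # 5184 (64 channels x 9 x 9)
print(flattened_features(84, 84))    # 3136 (64 channels x 7 x 7)
```

The input channel count does not affect the result, because only the last layer's `out_channels` and the final spatial size enter the flattened length.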
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_box_discrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_box_discrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#            ┏━━━━━━━━━━▼━━━━━━━━━━┓
+#            ┃ features_extractor  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━┩
+#            │ conv2d(32, 8, 4)    │
+#            │ relu                │
+#            │ conv2d(64, 4, 2)    │
+#            │ relu                │
+#            │ conv2d(64, 3, 1)    │
+#            │ relu                │
+#            │ flatten             │
+#            └──────────┬──────────┘
+#                       │
+#                ┏━━━━━━▼━━━━━━┓
+#                ┃     net     ┃
+#                ┡━━━━━━━━━━━━━┩
+#                │ linear(512) │
+#                │ elu         │
+#                └──────┬──────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see categorical_model parameters
+    class: CategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS, (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+        activations: relu
+      - name: net
+        input: features_extractor
+        layers: [512]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS, (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+        activations: relu
+      - name: net
+        input: features_extractor
+        layers: [512]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_box_discrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_box_multidiscrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_box_multidiscrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#                      obs
+#                       │
+#            ┏━━━━━━━━━━▼━━━━━━━━━━┓
+#            ┃ features_extractor  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━┩
+#            │ conv2d(32, 8, 4)    │
+#            │ relu                │
+#            │ conv2d(64, 4, 2)    │
+#            │ relu                │
+#            │ conv2d(64, 3, 1)    │
+#            │ relu                │
+#            │ flatten             │
+#            └──────────┬──────────┘
+#                       │
+#                ┏━━━━━━▼━━━━━━┓
+#                ┃     net     ┃
+#                ┡━━━━━━━━━━━━━┩
+#                │ linear(512) │
+#                │ elu         │
+#                └──────┬──────┘
+# shared                │
+# ......................│.......................
+# non-shared            │
+#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#           ┃  policy|value output  ┃
+#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#           │ linear(num_actions|1) │
+#           └───────────┬───────────┘
+#                       ▼
+models:
+  separate: False
+  policy:  # see multicategorical_model parameters
+    class: MultiCategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS, (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+        activations: relu
+      - name: net
+        input: features_extractor
+        layers: [512]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS, (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+        activations: relu
+      - name: net
+        input: features_extractor
+        layers: [512]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_box_multidiscrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_dict_box_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_dict_box_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#         obs["camera"]      obs["joint-velocities"]
+#               │                        │
+#    ┏━━━━━━━━━━▼━━━━━━━━━━┓             │
+#    ┃ features_extractor  ┃             │
+#    ┡━━━━━━━━━━━━━━━━━━━━━┩             │
+#    │ conv2d(32, 8, 4)    │             │
+#    │ relu                │             │
+#    │ conv2d(64, 4, 2)    │             │
+#    │ relu                │             │
+#    │ conv2d(64, 3, 1)    │             │
+#    │ relu                │             │
+#    │ flatten             │             │
+#    │ linear(512)         │             │
+#    │ tanh                │             │
+#    │ linear(16)          │             │
+#    │ tanh                │             │
+#    └──────────┬──────────┘             │
+#               │                        │
+#               └─▶(concatenate)◀────────┘
+#                        │
+#                 ┏━━━━━━▼━━━━━┓
+#                 ┃     net    ┃
+#                 ┡━━━━━━━━━━━━┩
+#                 │ linear(32) │
+#                 │ elu        │
+#                 │ linear(32) │
+#                 │ elu        │
+#                 └──────┬─────┘
+# shared                 │
+# .......................│.......................
+# non-shared             │
+#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#            ┃  policy|value output  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#            │ linear(num_actions|1) │
+#            └───────────┬───────────┘
+#                        ▼
+models:
+  separate: False
+  policy:  # see gaussian_model parameters
+    class: GaussianMixin
+    clip_actions: False
+    clip_log_std: True
+    min_log_std: -20.0
+    max_log_std: 2.0
+    initial_log_std: 0.0
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: net
+        input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: net
+        input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_dict_box"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
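The `features_extractor` stack above uses three unpadded convolutions, so the size of the tensor reaching `flatten` depends on the camera resolution. A minimal sketch of that arithmetic follows, assuming a hypothetical 100x100 input image (the resolution is not stated in this configuration); `conv2d_out` and `flattened_features` are illustrative helpers, not part of skrl or Isaac Lab.

```python
def conv2d_out(size: int, kernel_size: int, stride: int, padding: int = 0) -> int:
    """Output length of one spatial dimension after a 2D convolution (dilation 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# (out_channels, kernel_size, stride) as listed in the features_extractor, padding 0
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def flattened_features(height: int, width: int) -> int:
    """Number of values produced by `flatten` for an input of the given size."""
    channels = 0
    for channels, kernel_size, stride in CONV_LAYERS:
        height = conv2d_out(height, kernel_size, stride)
        width = conv2d_out(width, kernel_size, stride)
    return channels * height * width


# Assumed 100x100 input: each side shrinks 100 -> 24 -> 11 -> 9,
# so `flatten` yields 64 * 9 * 9 values feeding `linear: 512`.
print(flattened_features(100, 100))  # 5184
```

A different camera resolution only changes the input width of `linear: 512`; the rest of the network is unaffected.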
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_dict_discrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_dict_discrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#         obs["camera"]      obs["joint-velocities"]
+#               │                        │
+#    ┏━━━━━━━━━━▼━━━━━━━━━━┓             │
+#    ┃ features_extractor  ┃             │
+#    ┡━━━━━━━━━━━━━━━━━━━━━┩             │
+#    │ conv2d(32, 8, 4)    │             │
+#    │ relu                │             │
+#    │ conv2d(64, 4, 2)    │             │
+#    │ relu                │             │
+#    │ conv2d(64, 3, 1)    │             │
+#    │ relu                │             │
+#    │ flatten             │             │
+#    │ linear(512)         │             │
+#    │ tanh                │             │
+#    │ linear(16)          │             │
+#    │ tanh                │             │
+#    └──────────┬──────────┘             │
+#               │                        │
+#               └─▶(concatenate)◀────────┘
+#                        │
+#                 ┏━━━━━━▼━━━━━┓
+#                 ┃     net    ┃
+#                 ┡━━━━━━━━━━━━┩
+#                 │ linear(32) │
+#                 │ elu        │
+#                 │ linear(32) │
+#                 │ elu        │
+#                 └──────┬─────┘
+# shared                 │
+# .......................│.......................
+# non-shared             │
+#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#            ┃  policy|value output  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#            │ linear(num_actions|1) │
+#            └───────────┬───────────┘
+#                        ▼
+models:
+  separate: False
+  policy:  # see categorical_model parameters
+    class: CategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: net
+        input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: net
+        input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_dict_discrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
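The `agent` and `trainer` values above determine how much data each PPO update sees. The sketch below works through that arithmetic under the assumption that every rollout step stores one sample per parallel environment and that the collected samples are split evenly into `mini_batches`; the environment count of 512 is a hypothetical value, not taken from this configuration.

```python
# Values copied from the agent/trainer sections of the configuration
ROLLOUTS = 64
LEARNING_EPOCHS = 4
MINI_BATCHES = 32
TIMESTEPS = 32000


def ppo_batch_stats(num_envs: int) -> dict:
    """Derived PPO quantities for a given (assumed) number of parallel environments."""
    samples = ROLLOUTS * num_envs  # one sample per environment per rollout step
    return {
        "samples_per_update": samples,
        "mini_batch_size": samples // MINI_BATCHES,
        "gradient_steps_per_update": LEARNING_EPOCHS * MINI_BATCHES,
        "updates_in_training": TIMESTEPS // ROLLOUTS,
    }


stats = ppo_batch_stats(num_envs=512)  # 512 is a hypothetical environment count
for name, value in stats.items():
    print(f"{name}: {value}")
```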
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_dict_multidiscrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_dict_multidiscrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#         obs["camera"]      obs["joint-velocities"]
+#               │                        │
+#    ┏━━━━━━━━━━▼━━━━━━━━━━┓             │
+#    ┃ features_extractor  ┃             │
+#    ┡━━━━━━━━━━━━━━━━━━━━━┩             │
+#    │ conv2d(32, 8, 4)    │             │
+#    │ relu                │             │
+#    │ conv2d(64, 4, 2)    │             │
+#    │ relu                │             │
+#    │ conv2d(64, 3, 1)    │             │
+#    │ relu                │             │
+#    │ flatten             │             │
+#    │ linear(512)         │             │
+#    │ tanh                │             │
+#    │ linear(16)          │             │
+#    │ tanh                │             │
+#    └──────────┬──────────┘             │
+#               │                        │
+#               └─▶(concatenate)◀────────┘
+#                        │
+#                 ┏━━━━━━▼━━━━━┓
+#                 ┃     net    ┃
+#                 ┡━━━━━━━━━━━━┩
+#                 │ linear(32) │
+#                 │ elu        │
+#                 │ linear(32) │
+#                 │ elu        │
+#                 └──────┬─────┘
+# shared                 │
+# .......................│.......................
+# non-shared             │
+#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#            ┃  policy|value output  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#            │ linear(num_actions|1) │
+#            └───────────┬───────────┘
+#                        ▼
+models:
+  separate: False
+  policy:  # see multicategorical_model parameters
+    class: MultiCategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: net
+        input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: net
+        input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_dict_multidiscrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
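With a MultiDiscrete action space, the policy head emits one block of logits per sub-action. The following is a conceptual sketch of that split, not skrl's implementation: it assumes a hypothetical `nvec` of `[3, 2]` (three magnitudes, two directions) and picks the greedy action per block instead of sampling from the categorical distributions.

```python
def split_logits(logits: list, nvec: list) -> list:
    """Split a flat logit vector into one group per sub-action."""
    if len(logits) != sum(nvec):
        raise ValueError("expected one logit per choice of every sub-action")
    groups, start = [], 0
    for n in nvec:
        groups.append(logits[start:start + n])
        start += n
    return groups


def greedy_actions(logits: list, nvec: list) -> list:
    """Index of the largest logit in each group (greedy choice, no sampling)."""
    return [max(range(len(group)), key=group.__getitem__) for group in split_logits(logits, nvec)]


# Hypothetical logits for nvec = [3, 2]: the first three belong to the
# magnitude sub-action, the last two to the direction sub-action.
print(greedy_actions([0.1, 2.0, -1.0, 0.3, 0.9], [3, 2]))  # [1, 1]
```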
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_tuple_box_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_tuple_box_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#             obs[0]                  obs[1]
+#               │                       │
+#    ┏━━━━━━━━━━▼━━━━━━━━━━┓    ┏━━━━━━━▼━━━━━━━━┓
+#    ┃ features_extractor  ┃    ┃ proprioception ┃
+#    ┡━━━━━━━━━━━━━━━━━━━━━┩    ┡━━━━━━━━━━━━━━━━┩
+#    │ conv2d(32, 8, 4)    │    │ linear(16)     │
+#    │ relu                │    │ elu            │
+#    │ conv2d(64, 4, 2)    │    │ linear(8)      │
+#    │ relu                │    │ elu            │
+#    │ conv2d(64, 3, 1)    │    └───────┬────────┘
+#    │ relu                │            │
+#    │ flatten             │            │
+#    │ linear(512)         │            │
+#    │ tanh                │            │
+#    │ linear(16)          │            │
+#    │ tanh                │            │
+#    └──────────┬──────────┘            │
+#               │                       │
+#               └─▶(concatenate)◀───────┘
+#                        │
+#                 ┏━━━━━━▼━━━━━┓
+#                 ┃     net    ┃
+#                 ┡━━━━━━━━━━━━┩
+#                 │ linear(32) │
+#                 │ elu        │
+#                 │ linear(32) │
+#                 │ elu        │
+#                 └──────┬─────┘
+# shared                 │
+# .......................│.......................
+# non-shared             │
+#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#            ┃  policy|value output  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#            │ linear(num_actions|1) │
+#            └───────────┬───────────┘
+#                        ▼
+models:
+  separate: False
+  policy:  # see gaussian_model parameters
+    class: GaussianMixin
+    clip_actions: False
+    clip_log_std: True
+    min_log_std: -20.0
+    max_log_std: 2.0
+    initial_log_std: 0.0
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS[0], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: proprioception
+        input: OBSERVATIONS[1]
+        layers: [16, 8]
+        activations: elu
+      - name: net
+        input: concatenate([features_extractor, proprioception])
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS[0], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: proprioception
+        input: OBSERVATIONS[1]
+        layers: [16, 8]
+        activations: elu
+      - name: net
+        input: concatenate([features_extractor, proprioception])
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_tuple_box"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
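The `layers` and `activations` entries in the configuration above line up one to one, as the diagram in the file header shows: a list assigns one activation per layer (`null` for `flatten`), while a single name such as `elu` applies to every layer. A minimal sketch of that pairing follows; `pair_activations` is a hypothetical helper that mirrors the diagram and is not skrl's parser.

```python
def pair_activations(layers: list, activations) -> list:
    """Pair each layer with its activation; a single name is repeated for all layers."""
    if isinstance(activations, str):
        activations = [activations] * len(layers)
    if len(activations) != len(layers):
        raise ValueError("expected exactly one activation entry per layer")
    return list(zip(layers, activations))


# features_extractor: None stands in for YAML `null` (no activation after flatten)
features_extractor = pair_activations(
    ["conv2d(32, 8, 4)", "conv2d(64, 4, 2)", "conv2d(64, 3, 1)", "flatten", "linear(512)", "linear(16)"],
    ["relu", "relu", "relu", None, "tanh", "tanh"],
)
# proprioception: the single `elu` entry is applied after both linear layers
proprioception = pair_activations([16, 8], "elu")

for layer, activation in features_extractor + proprioception:
    print(layer, "->", activation)
```

Under this reading, `net` receives 16 values from `features_extractor` and 8 from `proprioception`, for 24 concatenated inputs.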
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_tuple_discrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_tuple_discrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#             obs[0]                  obs[1]
+#               │                       │
+#    ┏━━━━━━━━━━▼━━━━━━━━━━┓    ┏━━━━━━━▼━━━━━━━━┓
+#    ┃ features_extractor  ┃    ┃ proprioception ┃
+#    ┡━━━━━━━━━━━━━━━━━━━━━┩    ┡━━━━━━━━━━━━━━━━┩
+#    │ conv2d(32, 8, 4)    │    │ linear(16)     │
+#    │ relu                │    │ elu            │
+#    │ conv2d(64, 4, 2)    │    │ linear(8)      │
+#    │ relu                │    │ elu            │
+#    │ conv2d(64, 3, 1)    │    └───────┬────────┘
+#    │ relu                │            │
+#    │ flatten             │            │
+#    │ linear(512)         │            │
+#    │ tanh                │            │
+#    │ linear(16)          │            │
+#    │ tanh                │            │
+#    └──────────┬──────────┘            │
+#               │                       │
+#               └─▶(concatenate)◀───────┘
+#                        │
+#                 ┏━━━━━━▼━━━━━┓
+#                 ┃     net    ┃
+#                 ┡━━━━━━━━━━━━┩
+#                 │ linear(32) │
+#                 │ elu        │
+#                 │ linear(32) │
+#                 │ elu        │
+#                 └──────┬─────┘
+# shared                 │
+# .......................│.......................
+# non-shared             │
+#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#            ┃  policy|value output  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#            │ linear(num_actions|1) │
+#            └───────────┬───────────┘
+#                        ▼
+models:
+  separate: False
+  policy:  # see categorical_model parameters
+    class: CategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS[0], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: proprioception
+        input: OBSERVATIONS[1]
+        layers: [16, 8]
+        activations: elu
+      - name: net
+        input: concatenate([features_extractor, proprioception])
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS[0], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: proprioception
+        input: OBSERVATIONS[1]
+        layers: [16, 8]
+        activations: elu
+      - name: net
+        input: concatenate([features_extractor, proprioception])
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_tuple_discrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_tuple_multidiscrete_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/agents/skrl_tuple_multidiscrete_ppo_cfg.yaml
+seed: 42
+
+
+# Models are instantiated using skrl's model instantiator utility
+# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
+#
+#             obs[0]                  obs[1]
+#               │                       │
+#    ┏━━━━━━━━━━▼━━━━━━━━━━┓    ┏━━━━━━━▼━━━━━━━━┓
+#    ┃ features_extractor  ┃    ┃ proprioception ┃
+#    ┡━━━━━━━━━━━━━━━━━━━━━┩    ┡━━━━━━━━━━━━━━━━┩
+#    │ conv2d(32, 8, 4)    │    │ linear(16)     │
+#    │ relu                │    │ elu            │
+#    │ conv2d(64, 4, 2)    │    │ linear(8)      │
+#    │ relu                │    │ elu            │
+#    │ conv2d(64, 3, 1)    │    └───────┬────────┘
+#    │ relu                │            │
+#    │ flatten             │            │
+#    │ linear(512)         │            │
+#    │ tanh                │            │
+#    │ linear(16)          │            │
+#    │ tanh                │            │
+#    └──────────┬──────────┘            │
+#               │                       │
+#               └─▶(concatenate)◀───────┘
+#                        │
+#                 ┏━━━━━━▼━━━━━┓
+#                 ┃     net    ┃
+#                 ┡━━━━━━━━━━━━┩
+#                 │ linear(32) │
+#                 │ elu        │
+#                 │ linear(32) │
+#                 │ elu        │
+#                 └──────┬─────┘
+# shared                 │
+# .......................│.......................
+# non-shared             │
+#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
+#            ┃  policy|value output  ┃
+#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
+#            │ linear(num_actions|1) │
+#            └───────────┬───────────┘
+#                        ▼
+models:
+  separate: False
+  policy:  # see multicategorical_model parameters
+    class: MultiCategoricalMixin
+    unnormalized_log_prob: True
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS[0], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: proprioception
+        input: OBSERVATIONS[1]
+        layers: [16, 8]
+        activations: elu
+      - name: net
+        input: concatenate([features_extractor, proprioception])
+        layers: [32, 32]
+        activations: elu
+    output: ACTIONS
+  value:  # see deterministic_model parameters
+    class: DeterministicMixin
+    clip_actions: False
+    network:
+      - name: features_extractor
+        input: permute(OBSERVATIONS[0], (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
+        layers:
+          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
+          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
+          - flatten
+          - linear: 512
+          - linear: 16
+        activations: [relu, relu, relu, null, tanh, tanh]
+      - name: proprioception
+        input: OBSERVATIONS[1]
+        layers: [16, 8]
+        activations: elu
+      - name: net
+        input: concatenate([features_extractor, proprioception])
+        layers: [32, 32]
+        activations: elu
+    output: ONE
+
+
+# Rollout memory
+# https://skrl.readthedocs.io/en/latest/api/memories/random.html
+memory:
+  class: RandomMemory
+  memory_size: -1  # automatically determined (same as agent:rollouts)
+
+
+# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
+# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
+agent:
+  class: PPO
+  rollouts: 64
+  learning_epochs: 4
+  mini_batches: 32
+  discount_factor: 0.99
+  lambda: 0.95
+  learning_rate: 1.0e-04
+  learning_rate_scheduler: KLAdaptiveLR
+  learning_rate_scheduler_kwargs:
+    kl_threshold: 0.008
+  state_preprocessor: null
+  state_preprocessor_kwargs: null
+  value_preprocessor: RunningStandardScaler
+  value_preprocessor_kwargs: null
+  random_timesteps: 0
+  learning_starts: 0
+  grad_norm_clip: 1.0
+  ratio_clip: 0.2
+  value_clip: 0.2
+  clip_predicted_values: True
+  entropy_loss_scale: 0.0
+  value_loss_scale: 1.0
+  kl_threshold: 0.0
+  rewards_shaper_scale: 1.0
+  time_limit_bootstrap: False
+  mixed_precision: False
+  # logging and checkpoint
+  experiment:
+    directory: "cartpole_camera_direct_tuple_multidiscrete"
+    experiment_name: ""
+    write_interval: auto
+    checkpoint_interval: auto
+
+
+# Sequential trainer
+# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
+trainer:
+  class: SequentialTrainer
+  timesteps: 32000
+  environment_info: log
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/cartpole_camera_env.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/cartpole_camera_env.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from __future__ import annotations
+
+import gymnasium as gym
+import torch
+
+from isaaclab_tasks.direct.cartpole.cartpole_camera_env import CartpoleCameraEnv, CartpoleRGBCameraEnvCfg
+
+
+class CartpoleCameraShowcaseEnv(CartpoleCameraEnv):
+    """Cartpole camera task that showcases different Gymnasium action and observation spaces."""
+
+    cfg: CartpoleRGBCameraEnvCfg
+
+    def _pre_physics_step(self, actions: torch.Tensor) -> None:
+        self.actions = actions.clone()
+
+    def _apply_action(self) -> None:
+        # fundamental spaces
+        # - Box
+        if isinstance(self.single_action_space, gym.spaces.Box):
+            target = self.cfg.action_scale * self.actions
+        # - Discrete: 0 -> zero effort, 1 -> negative effort, 2 -> positive effort
+        elif isinstance(self.single_action_space, gym.spaces.Discrete):
+            target = torch.zeros((self.num_envs, 1), dtype=torch.float32, device=self.device)
+            target = torch.where(self.actions == 1, -self.cfg.action_scale, target)
+            target = torch.where(self.actions == 2, self.cfg.action_scale, target)
+        # - MultiDiscrete: actions[:, 0] selects the magnitude, actions[:, 1] the direction
+        elif isinstance(self.single_action_space, gym.spaces.MultiDiscrete):
+            # magnitude: 0 -> zero, 1 -> half of action_scale, 2 -> full action_scale
+            target = torch.zeros((self.num_envs, 1), dtype=torch.float32, device=self.device)
+            target = torch.where(self.actions[:, [0]] == 1, self.cfg.action_scale / 2.0, target)
+            target = torch.where(self.actions[:, [0]] == 2, self.cfg.action_scale, target)
+            # direction: 0 -> negative, any other value -> positive
+            target = torch.where(self.actions[:, [1]] == 0, -target, target)
+        else:
+            raise NotImplementedError(f"Action space {type(self.single_action_space)} not implemented")
+
+        # set target
+        self._cartpole.set_joint_effort_target(target, joint_ids=self._cart_dof_idx)
+
+    def _get_observations(self) -> dict:
+        # get camera data
+        data_type = "rgb" if "rgb" in self.cfg.tiled_camera.data_types else "depth"
+        if "rgb" in self.cfg.tiled_camera.data_types:
+            # scale the RGB values by 1/255
+            camera_data = self._tiled_camera.data.output[data_type] / 255.0
+            # subtract the per-image, per-channel mean for better training results
+            mean_tensor = torch.mean(camera_data, dim=(1, 2), keepdim=True)
+            camera_data -= mean_tensor
+        elif "depth" in self.cfg.tiled_camera.data_types:
+            camera_data = self._tiled_camera.data.output[data_type]
+            # replace infinite depth values with 0
+            camera_data[camera_data == float("inf")] = 0
+
+        # fundamental spaces
+        # - Box
+        if isinstance(self.single_observation_space["policy"], gym.spaces.Box):
+            obs = camera_data
+        # composite spaces
+        # - Tuple
+        elif isinstance(self.single_observation_space["policy"], gym.spaces.Tuple):
+            obs = (camera_data, self.joint_vel)
+        # - Dict
+        elif isinstance(self.single_observation_space["policy"], gym.spaces.Dict):
+            obs = {"joint-velocities": self.joint_vel, "camera": camera_data}
+        else:
+            raise NotImplementedError(
+                f"Observation space {type(self.single_observation_space['policy'])} not implemented"
+            )
+
+        observations = {"policy": obs}
+        return observations
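The Discrete and MultiDiscrete branches of `_apply_action` above can be read as small lookup rules. The sketch below restates them for a single environment in plain Python so the mapping can be checked without a simulator; the function names and the `action_scale` value of 100.0 are assumptions made for illustration, not values taken from this file.

```python
def discrete_to_effort(action: int, action_scale: float) -> float:
    """Discrete action: 0 -> zero effort, 1 -> negative effort, 2 -> positive effort."""
    if action == 1:
        return -action_scale
    if action == 2:
        return action_scale
    return 0.0


def multidiscrete_to_effort(magnitude: int, direction: int, action_scale: float) -> float:
    """MultiDiscrete action: magnitude picks 0, half or full scale; direction 0 flips the sign."""
    if magnitude == 1:
        effort = action_scale / 2.0
    elif magnitude == 2:
        effort = action_scale
    else:
        effort = 0.0
    return -effort if direction == 0 else effort


ACTION_SCALE = 100.0  # assumed value, for illustration only

print(discrete_to_effort(1, ACTION_SCALE))
print(multidiscrete_to_effort(1, 0, ACTION_SCALE))
print(multidiscrete_to_effort(2, 1, ACTION_SCALE))
```

The environment applies the same rules to whole batches with `torch.where`, which avoids Python-level branching across environments.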
--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/cartpole_camera_env_cfg.py
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase/cartpole_camera/cartpole_camera_env_cfg.py