Upgrades environments from Gym 0.21 to Gymnasium 0.29 (#234)

# Description

Currently, we are downgrading many libraries to be able to use Gym 0.21.0. However, this workaround is fragile and causes issues when installing new Python packages, as highlighted in #204. The problem becomes more significant with Python 3.10 in Isaac Sim 2023.1. This MR upgrades the repository to use the Gymnasium environment class.

## Type of Change

- Bug fix (non-breaking change which fixes an issue)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./orbit.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Signed-off-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Co-authored-by: David Hoeller <dhoeller@ethz.ch>
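For downstream code still written against the Gym 0.21 loop, the visible API change is that `step` now returns five values (observations, rewards, terminated, truncated/time-out, extras) instead of four, and `reset` returns `(obs, extras)`. Below is a minimal sketch, not part of this MR, of how a Gymnasium-style 5-tuple could be folded back into the legacy 4-tuple for a single (non-vectorized) environment. The name `to_legacy_step` is hypothetical, and the `"TimeLimit.truncated"` key assumes the convention of Gym's `TimeLimit` wrapper.

```python
from typing import Any, Dict, Tuple


def to_legacy_step(
    step_result: Tuple[Any, float, bool, bool, Dict[str, Any]],
) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Convert a Gymnasium-style 5-tuple into a Gym 0.21-style 4-tuple (single env)."""
    obs, reward, terminated, truncated, info = step_result
    # the legacy API has a single `done` flag covering both cases
    done = terminated or truncated
    # copy so that the caller's info dict is not mutated
    info = dict(info)
    if done:
        # flag episodes that ended only because of the time limit, not a terminal state
        info["TimeLimit.truncated"] = truncated and not terminated
    return obs, reward, done, info


# usage: an episode cut off by the time limit rather than by a terminal state
obs, reward, done, info = to_legacy_step(([0.0], 1.0, False, True, {}))
print(done, info)  # True {'TimeLimit.truncated': True}
```

The `RLTaskEnv` in this MR is vectorized and returns batched tensors, so the same idea would apply element-wise there.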

Unverified commit cd2c4f1d authored Nov 07, 2023 by Mayank Mittal; committed by GitHub on Nov 07, 2023.
65 changed files
--- a/docs/source/api/orbit_tasks.isaac_env.rst
+++ b/docs/source/api/orbit_tasks.isaac_env.rst
@@ -4,7 +4,7 @@ omni.isaac.orbit_tasks.isaac_env
 We use OpenAI Gym registry to register the environment and their default configuration file.
 The default configuration file is passed to the argument "kwargs" in the Gym specification registry.
 The string is parsed into respective configuration container which needs to be passed to the environment
-class. This is done using the function :meth:`load_default_env_cfg` in the sub-module
+class. This is done using the function :meth:`load_cfg_from_registry` in the sub-module
 :mod:`omni.isaac.orbit.utils.parse_cfg`.


@@ -17,12 +17,12 @@ class. This is done using the function :meth:`load_default_env_cfg` in the sub-m

 .. code-block:: python

-   import gym
+   import gymnasium as gym
   import omni.isaac.orbit_tasks
-   from omni.isaac.orbit_tasks.utils.parse_cfg import load_default_env_cfg
+   from omni.isaac.orbit_tasks.utils.parse_cfg import load_cfg_from_registry

   task_name = "Isaac-Cartpole-v0"
-   cfg = load_default_env_cfg(task_name)
+   cfg = load_cfg_from_registry(task_name, "env_cfg_entry_point")
   env = gym.make(task_name, cfg=cfg)
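The second argument to `load_cfg_from_registry` names which entry in the registered `kwargs` to read. The toy sketch below illustrates that lookup pattern. `REGISTRY`, `register` and `lookup_cfg_entry_point` are hypothetical stand-ins written for illustration, not Orbit or Gymnasium APIs, and `"agent_cfg_entry_point"` is an invented key that shows why a key argument helps once a task carries more than one configuration entry point.

```python
from typing import Any, Dict

# hypothetical in-memory stand-in for the Gymnasium registry: task id -> registration data
REGISTRY: Dict[str, Dict[str, Any]] = {}


def register(task_id: str, entry_point: str, kwargs: Dict[str, Any]) -> None:
    """Store the environment entry point and its keyword arguments under a task id."""
    REGISTRY[task_id] = {"entry_point": entry_point, "kwargs": dict(kwargs)}


def lookup_cfg_entry_point(task_id: str, entry_point_key: str) -> str:
    """Return the configuration entry-point string stored under ``entry_point_key``."""
    if task_id not in REGISTRY:
        raise KeyError(f"Task '{task_id}' is not registered.")
    kwargs = REGISTRY[task_id]["kwargs"]
    if entry_point_key not in kwargs:
        raise ValueError(f"Task '{task_id}' has no '{entry_point_key}'. Available: {sorted(kwargs)}")
    return kwargs[entry_point_key]


register(
    "Isaac-Cartpole-v0",
    entry_point="omni.isaac.orbit_tasks.classic.cartpole:CartpoleEnv",
    kwargs={
        "env_cfg_entry_point": "omni.isaac.orbit_tasks.classic.cartpole:cartpole_cfg.yaml",
        # hypothetical second key, only to show why the key argument is useful
        "agent_cfg_entry_point": "my_agents:cartpole_ppo.yaml",
    },
)
print(lookup_cfg_entry_point("Isaac-Cartpole-v0", "env_cfg_entry_point"))
```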



--- a/docs/source/refs/issues.rst
+++ b/docs/source/refs/issues.rst
 Known issues
 ============

-Installation errors due to gym==0.21.0
--------------------------------------
-
-When installing the gym package, you may encounter the following error:
-
-.. code-block::
-
-    error in gym setup command: 'extras_require' must be a dictionary whose values are strings or lists of
-    strings containing valid project/version requirement specifiers.
-    ----------------------------------------
-    ERROR: Could not find a version that satisfies the requirement gym==0.21.0 (from omni-isaac-orbit-envs[all])
-    (from versions: 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6,
-    ...
-    0.15.7, 0.16.0, 0.17.0, 0.17.1, 0.17.2, 0.17.3, 0.18.0, 0.18.3, 0.19.0, 0.20.0, 0.21.0, 0.22.0, 0.23.0,
-    0.23.1, 0.24.0, 0.24.1, 0.25.0, 0.25.1, 0.25.2, 0.26.0, 0.26.1, 0.26.2)
-    ERROR: No matching distribution found for gym==0.21.0
-
-
-This issue arises since the ``setuptools`` package from version 67.0 onwards does not support malformed version strings.
-Since the OpenAI Gym package that is no longer being maintained (`issue link <https://github.com/openai/gym/issues/3200>`_),
-the current workaround is to install the ``setuptools`` package version 66.0.0. You can do this by running the following
-command:
-
-.. code-block:: bash
-
-    ./orbit.sh -p -m pip install -U setuptools==66
-
 Regression in Isaac Sim 2022.2.1
 --------------------------------


--- a/docs/source/setup/installation.rst
+++ b/docs/source/setup/installation.rst
@@ -157,7 +157,7 @@ utilities to manage extensions:

   optional arguments:
      -h, --help           Display the help content.
-      -i, --install        Install the extensions inside Isaac Orbit.
+      -i, --install        Install the extensions inside Orbit.
      -e, --extra          Install extra dependencies such as the learning frameworks.
      -f, --format         Run pre-commit to format the code and check lints.
      -p, --python         Run the python executable (python.sh) provided by Isaac Sim.

--- a/docs/source/setup/sample.rst
+++ b/docs/source/setup/sample.rst
@@ -141,7 +141,7 @@ format.
   .. code:: bash

      # install python module (for robomimic)
-      ./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[robomimic]'
+      ./orbit.sh -e robomimic
      # split data
      ./orbit.sh -p source/standalone//workflows/robomimic/tools/split_train_val.py logs/robomimic/Isaac-Lift-Franka-v0/hdf_dataset.hdf5 --ratio 0.2

@@ -171,7 +171,7 @@ from the environments into the respective libraries function argument and return
   .. code:: bash

      # install python module (for stable-baselines3)
-      ./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[sb3]'
+      ./orbit.sh -e sb3
      # run script for training
      # note: we enable cpu flag since SB3 doesn't optimize for GPU anyway
      ./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --headless --cpu
@@ -184,7 +184,7 @@ from the environments into the respective libraries function argument and return
   .. code:: bash

      # install python module (for skrl)
-      ./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[skrl]'
+      ./orbit.sh -e skrl
      # run script for training
      ./orbit.sh -p source/standalone/workflows/skrl/train.py --task Isaac-Reach-Franka-v0 --headless
      # run script for playing with 32 environments
@@ -196,7 +196,7 @@ from the environments into the respective libraries function argument and return
   .. code:: bash

      # install python module (for rl-games)
-      ./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[rl_games]'
+      ./orbit.sh -e rl_games
      # run script for training
      ./orbit.sh -p source/standalone/workflows/rl_games/train.py --task Isaac-Ant-v0 --headless
      # run script for playing with 32 environments
@@ -208,7 +208,7 @@ from the environments into the respective libraries function argument and return
   .. code:: bash

      # install python module (for rsl-rl)
-      ./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[rsl_rl]'
+      ./orbit.sh -e rsl_rl
      # run script for training
      ./orbit.sh -p source/standalone/workflows/rsl_rl/train.py --task Isaac-Reach-Franka-v0 --headless
      # run script for playing with 32 environments

--- a/docs/source/tutorials_envs/00_gym_env.rst
+++ b/docs/source/tutorials_envs/00_gym_env.rst
@@ -39,11 +39,12 @@ an environment by calling ``gym.make``. The environments are registered in the `
    gym.register(
        id="Isaac-Cartpole-v0",
        entry_point="omni.isaac.orbit_tasks.classic.cartpole:CartpoleEnv",
-        kwargs={"cfg_entry_point": "omni.isaac.orbit_tasks.classic.cartpole:cartpole_cfg.yaml"},
+        disable_env_checker=True,
+        kwargs={"env_cfg_entry_point": "omni.isaac.orbit_tasks.classic.cartpole:cartpole_cfg.yaml"},
    )

-The ``cfg_entry_point`` argument is used to load the default configuration for the environment. The default
-configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_cfg.load_default_env_cfg` function.
+The ``env_cfg_entry_point`` argument is used to load the default configuration for the environment. The default
+configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_cfg.load_cfg_from_registry` function.
 The configuration entry point can correspond to both a YAML file or a python configuration
 class. The default configuration can be overridden by passing a custom configuration instance to the ``gym.make``
 function as shown later in the tutorial.
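Since the entry point may name either a YAML file or a Python configuration class, one way such a `"module:attribute"` string could be resolved is sketched below with `importlib`. `resolve_cfg_entry_point` is a hypothetical helper that assumes YAML files live next to the module that names them; it is not the implementation of `load_cfg_from_registry`.

```python
import importlib
import os
from typing import Any


def resolve_cfg_entry_point(entry_point: str) -> Any:
    """Resolve ``"package.module:name"`` to a YAML file path or a Python attribute."""
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected 'module:attribute', got '{entry_point}'.")
    module = importlib.import_module(module_name)
    if attr_name.endswith((".yaml", ".yml")):
        # assumption: YAML configs sit in the same directory as the module
        return os.path.join(os.path.dirname(module.__file__), attr_name)
    # otherwise, treat the name as a Python object (e.g. a configuration class)
    return getattr(module, attr_name)


# usage with standard-library modules standing in for a task package
print(resolve_cfg_entry_point("collections:OrderedDict"))
print(resolve_cfg_entry_point("json:cfg.yaml"))
```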

--- a/docs/source/tutorials_envs/02_wrappers.rst
+++ b/docs/source/tutorials_envs/02_wrappers.rst
@@ -26,13 +26,13 @@ For example, here is how you would wrap an environment to enforce that reset is

    """Rest everything follows."""

-    import gym
+    import gymnasium as gym

    import omni.isaac.orbit_tasks  # noqa: F401
-    from omni.isaac.orbit_tasks.utils import load_default_env_cfg
+    from omni.isaac.orbit_tasks.utils import load_cfg_from_registry

    # create base environment
-    cfg = load_default_env_cfg("Isaac-Reach-Franka-v0")
+    cfg = load_cfg_from_registry("Isaac-Reach-Franka-v0", "env_cfg_entry_point")
    env = gym.make("Isaac-Reach-Franka-v0", cfg=cfg)
    # wrap environment to enforce that reset is called before step
    env = gym.wrappers.OrderEnforcing(env)
@@ -105,7 +105,7 @@ for 200 steps, and saves it in the ``videos`` folder at a step interval of 1500
    """Rest everything follows."""


-    import gym
+    import gymnasium as gym

    # adjust camera resolution and pose
    env_cfg.viewer.resolution = (640, 480)

--- a/orbit.sh
+++ b/orbit.sh
@@ -185,7 +185,7 @@ print_help () {
    echo -e "\nusage: $(basename "$0") [-h] [-i] [-e] [-f] [-p] [-s] [-o] [-v] [-d] [-c] -- Utility to manage extensions in Orbit."
    echo -e "\noptional arguments:"
    echo -e "\t-h, --help           Display the help content."
-    echo -e "\t-i, --install        Install the extensions inside Isaac Orbit."
+    echo -e "\t-i, --install        Install the extensions inside Orbit."
    echo -e "\t-e, --extra          Install extra dependencies such as the learning frameworks."
    echo -e "\t-f, --format         Run pre-commit to format the code and check lints."
    echo -e "\t-p, --python         Run the python executable (python.sh) provided by Isaac Sim."
@@ -220,9 +220,6 @@ while [[ $# -gt 0 ]]; do
            # this does not check dependencies between extensions
            export -f extract_python_exe
            export -f install_orbit_extension
-            # downgrade setuptools to avoid issues with OpenAI Gym
-            # Check the `Known Issues` section in the documentation
-            $(extract_python_exe) -m pip install --upgrade setuptools==66
            # source directory
            find -L "${ORBIT_PATH}/source/extensions" -mindepth 1 -maxdepth 1 -type d -exec bash -c 'install_orbit_extension "{}"' \;
            # unset local variables
@@ -235,8 +232,17 @@ while [[ $# -gt 0 ]]; do
            # install the python packages for supported reinforcement learning frameworks
            echo "[INFO] Installing extra requirements such as learning frameworks..."
            python_exe=$(extract_python_exe)
+            # check if specified which rl-framework to install
+            if [ -z "$2" ]; then
+                echo "[INFO] Installing all rl-frameworks..."
+                framework_name="all"
+            else
+                echo "[INFO] Installing rl-framework: $2"
+                framework_name=$2
+                shift # past argument
+            fi
            # install the rl-frameworks specified
-            ${python_exe} -m pip install -e ${ORBIT_PATH}/source/extensions/omni.isaac.orbit_tasks[all]
+            ${python_exe} -m pip install -e ${ORBIT_PATH}/source/extensions/omni.isaac.orbit_tasks["${framework_name}"]
            shift # past argument
            ;;
        -c|--conda)
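The new `-e` branch takes the extras group from the argument after the flag and falls back to `all` when that argument is empty or missing. The Python mirror below makes the resulting `pip` target and the argument shifting explicit. `extras_install_target` is a hypothetical name; the framework names it would receive (`sb3`, `skrl`, `rl_games`, `rsl_rl`, `robomimic`) are the ones used in `sample.rst`.

```python
from typing import List, Tuple


def extras_install_target(args: List[str], orbit_path: str) -> Tuple[str, List[str]]:
    """Mirror the ``-e|--extra`` branch of orbit.sh.

    ``args`` starts at the flag itself (``$1 $2 ...`` inside the shell loop). Returns the
    editable pip target and the arguments left for the rest of the loop.
    """
    if not args or args[0] not in ("-e", "--extra"):
        raise ValueError("args must start with '-e' or '--extra'")
    # like `[ -z "$2" ]`: only an empty or missing second argument selects "all"
    if len(args) < 2 or args[1] == "":
        framework_name, remaining = "all", args[1:]
    else:
        # any following token is taken as the framework name and shifted away
        framework_name, remaining = args[1], args[2:]
    target = f"{orbit_path}/source/extensions/omni.isaac.orbit_tasks[{framework_name}]"
    return target, remaining


print(extras_install_target(["-e", "sb3"], "/path/to/orbit"))
# ('/path/to/orbit/source/extensions/omni.isaac.orbit_tasks[sb3]', [])
```

As written, the script only checks whether `$2` is empty, so `./orbit.sh -e -f` would pass `-f` as the framework name; when installing all frameworks together with other flags, `-e` is presumably safest as the last argument.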

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -27,7 +27,7 @@ extra_standard_library = [
    "tensordict",
    "bpy",
    "matplotlib",
-    "gym",
+    "gymnasium",
    "scipy",
    "hid",
    "yaml",

--- a/source/extensions/omni.isaac.contrib_tasks/docs/README.md
+++ b/source/extensions/omni.isaac.contrib_tasks/docs/README.md
@@ -18,9 +18,12 @@ itself. However, its various instances should be included in directories within
 The environments should then be registered in the `omni/isaac/contrib_tasks/__init__.py`:

 ```python
+import gymnasium as gym
+
 gym.register(
    id="Isaac-Contrib-<my-awesome-env>-v0",
    entry_point="omni.isaac.contrib_tasks.<your-env-package>:<your-env-class>",
+    disable_env_checker=True,
    kwargs={"cfg_entry_point": "omni.isaac.contrib_tasks.<your-env-package-cfg>:<your-env-class-cfg>"},
 )
 ```
--- a/source/extensions/omni.isaac.contrib_tasks/omni/isaac/contrib_tasks/__init__.py
+++ b/source/extensions/omni.isaac.contrib_tasks/omni/isaac/contrib_tasks/__init__.py
@@ -9,7 +9,7 @@
 We use OpenAI Gym registry to register the environment and their default configuration file.
 The default configuration file is passed to the argument "kwargs" in the Gym specification registry.
 The string is parsed into respective configuration container which needs to be passed to the environment
-class. This is done using the function :meth:`load_default_env_cfg` in the sub-module
+class. This is done using the function :meth:`load_cfg_from_registry` in the sub-module
 :mod:`omni.isaac.orbit.utils.parse_cfg`.

 Note:
@@ -18,18 +18,18 @@ Note:
    the kwarg argument :obj:`cfg` while creating the environment.

 Usage:
-    >>> import gym
+    >>> import gymnasium as gym
    >>> import omni.isaac.contrib_tasks
-    >>> from omni.isaac.orbit_tasks.utils.parse_cfg import load_default_env_cfg
+    >>> from omni.isaac.orbit_tasks.utils.parse_cfg import load_cfg_from_registry
    >>>
    >>> task_name = "Isaac-Contrib-<my-registered-env-name>-v0"
-    >>> cfg = load_default_env_cfg(task_name)
+    >>> cfg = load_cfg_from_registry(task_name, "env_cfg_entry_point")
    >>> env = gym.make(task_name, cfg=cfg)
 """

 from __future__ import annotations

-import gym  # noqa: F401
+import gymnasium as gym  # noqa: F401
 import os
 import toml


--- a/source/extensions/omni.isaac.contrib_tasks/setup.py
+++ b/source/extensions/omni.isaac.contrib_tasks/setup.py
@@ -28,6 +28,10 @@ setup(
    include_package_data=True,
    python_requires=">=3.7",
    packages=["omni.isaac.contrib_tasks"],
-    classifiers=["Natural Language :: English", "Programming Language :: Python :: 3.7"],
+    classifiers=[
+        "Natural Language :: English",
+        "Programming Language :: Python :: 3.10",
+        "Isaac Sim :: 2023.1.0-hotfix.1",
+    ],
    zip_safe=False,
 )
--- a/source/extensions/omni.isaac.orbit/config/extension.toml
+++ b/source/extensions/omni.isaac.orbit/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.9.37"
+version = "0.9.38"

 # Description
 title = "ORBIT framework for Robot Learning"

--- a/source/extensions/omni.isaac.orbit/docs/CHANGELOG.rst
+++ b/source/extensions/omni.isaac.orbit/docs/CHANGELOG.rst
 Changelog
 ---------

+0.9.38 (2023-11-07)
+~~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Upgraded the :class:`omni.isaac.orbit.envs.RLTaskEnv` class to support Gym 0.29.0 environment definition.
+
+Added
+^^^^^
+
+* Added computation of ``time_outs`` and ``terminated`` signals inside the termination manager. These follow the
+  definition mentioned in `Gym 0.29.0 <https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/>`_.
+* Added proper handling of observation and action spaces in the :class:`omni.isaac.orbit.envs.RLTaskEnv` class.
+  These now follow closely to how Gym VecEnv handles the spaces.
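The termination manager now tracks two signals: terms whose configuration sets `time_out` feed `time_outs`, all other terms feed `terminated`, and the reset signal is their logical OR. The pure-Python sketch below shows that bookkeeping for a small batch of environments. `TermResult` and `combine_terminations` are hypothetical and only mirror the behavior described in the manager's docstring, not its actual code; the term names are invented.

```python
from typing import List, NamedTuple, Tuple


class TermResult(NamedTuple):
    """Hypothetical per-term output: its per-env flags and whether it is a time-out term."""

    name: str
    flags: List[bool]
    time_out: bool


def combine_terminations(terms: List[TermResult], num_envs: int) -> Tuple[List[bool], List[bool], List[bool]]:
    """Return ``(dones, terminated, time_outs)``, each a list of length ``num_envs``."""
    terminated = [False] * num_envs
    time_outs = [False] * num_envs
    for term in terms:
        # route each term into one of the two Gymnasium-style signals
        target = time_outs if term.time_out else terminated
        for env_id, flag in enumerate(term.flags):
            target[env_id] = target[env_id] or flag
    # the reset signal is the union (logical OR) of both signals
    dones = [t or to for t, to in zip(terminated, time_outs)]
    return dones, terminated, time_outs


terms = [
    TermResult("time_out", [False, True, False], time_out=True),
    TermResult("robot_fell", [True, False, False], time_out=False),
]
print(combine_terminations(terms, num_envs=3))
# ([True, True, False], [True, False, False], [False, True, False])
```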
+
+
 0.9.37 (2023-11-06)
 ~~~~~~~~~~~~~~~~~~~


--- a/source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/rl_task_env.py
+++ b/source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/rl_task_env.py
@@ -5,12 +5,14 @@

 from __future__ import annotations

-import gym
+import gymnasium as gym
 import math
 import numpy as np
 import torch
 from typing import Any, ClassVar, Dict, Sequence, Tuple, Union

+from omni.isaac.version import get_version
+
 from omni.isaac.orbit.command_generators import CommandGeneratorBase
 from omni.isaac.orbit.managers import CurriculumManager, RewardManager, TerminationManager

@@ -41,10 +43,16 @@ Note:
 """


-VecEnvStepReturn = Tuple[VecEnvObs, torch.Tensor, torch.Tensor, Dict]
+VecEnvStepReturn = Tuple[VecEnvObs, torch.Tensor, torch.Tensor, torch.Tensor, Dict]
 """The environment signals processed at the end of each step.

-It contains the observation, reward, termination signal and additional information for each sub-environment.
+The tuple contains batched information for each sub-environment. The information is stored in the following order:
+
+1. **Observations**: The observations from the environment.
+2. **Rewards**: The rewards from the environment.
+3. **Terminated Dones**: Whether the environment reached a terminal state, such as task success or robot falling etc.
+4. **Timeout Dones**: Whether the environment reached a timeout state, such as end of max episode length.
+5. **Extras**: A dictionary containing additional information from the environment.
 """


@@ -72,40 +80,43 @@ class RLTaskEnv(BaseEnv, gym.Env):

    is_vector_env: ClassVar[bool] = True
    """Whether the environment is a vectorized environment."""
-    metadata: ClassVar[dict[str, Any]] = {"render.modes": ["human", "rgb_array"]}
+    metadata: ClassVar[dict[str, Any]] = {
+        "render_modes": [None, "human", "rgb_array"],
+        "isaac_sim_version": get_version(),
+    }
    """Metadata for the environment."""

    cfg: RLTaskEnvCfg
    """Configuration for the environment."""

-    def __init__(self, cfg: RLTaskEnvCfg, **kwargs):
+    def __init__(self, cfg: RLTaskEnvCfg, render_mode: str | None = None, **kwargs):
+        """Initialize the environment.
+
+        Args:
+            cfg: The configuration for the environment.
+            render_mode: The render mode for the environment. Defaults to None, which
+                is similar to ``"human"``.
+        """
        # initialize the base class to setup the scene.
        super().__init__(cfg=cfg)
+        # store the render mode
+        self.render_mode = render_mode

        # initialize data and constants
        # -- counter for curriculum
        self.common_step_counter = 0
        # -- init buffers
-        self.reset_buf = torch.ones(self.num_envs, device=self.device, dtype=torch.long)
-        self.reward_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.float)
        self.episode_length_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.long)
        # -- allocate dictionary to store metrics
        self.extras = {}
-        # print the environment information
-        print("[INFO]: Completed setting up the environment...")

        # setup the action and observation spaces for Gym
-        # -- observation space
-        self.observation_space = gym.spaces.Dict()
-        for group_name, group_dim in self.observation_manager.group_obs_dim.items():
-            self.observation_space[group_name] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=group_dim)
-        # -- action space (unbounded since we don't impose any limits)
-        action_dim = sum(self.action_manager.action_term_dim)
-        self.action_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(action_dim,))
-
+        self._configure_gym_env_spaces()
        # perform randomization at the start of the simulation
        if "startup" in self.randomization_manager.available_modes:
            self.randomization_manager.randomize(mode="startup")
+        # print the environment information
+        print("[INFO]: Completed setting up the environment...")

    """
    Properties.
@@ -147,44 +158,54 @@ class RLTaskEnv(BaseEnv, gym.Env):
    Operations - MDP
    """

-    def reset(self) -> VecEnvObs:
-        """Resets all the environments and returns observations.
+    def reset(self, seed: int | None = None, options: dict[str, Any] | None = None) -> tuple[VecEnvObs, dict]:
+        """Resets all the environments and returns observations and extras.

        Note:
            This function (if called) must **only** be called before the first call to :meth:`step`, i.e.
            after the environment is created. After that, the :meth:`step` function handles the reset
            of terminated sub-environments.

+        Args:
+            seed: The seed to use for randomization. Defaults to None, in which case the seed is not set.
+            options: Additional information to specify how the environment is reset. Defaults to None.
+
+                Note:
+                    This is not used in the current implementation. It is mostly there for compatibility with
+                    Gymnasium environment definition.
+
        Returns:
-            Observations from the environment.
+            A tuple containing the observations and extras.
        """
+        # set the seed
+        if seed is not None:
+            gym.Env.reset(self, seed=seed)
+            self.seed(seed)
        # reset state of scene
        indices = torch.arange(self.num_envs, dtype=torch.int64, device=self.device)
        self._reset_idx(indices)
        # return observations
-        return self.observation_manager.compute()
+        return self.observation_manager.compute(), self.extras

    def step(self, action: torch.Tensor) -> VecEnvStepReturn:
-        """Apply actions on the environment and reset terminated environments.
+        """Run one timestep of the environment's dynamics and reset terminated environments.

-        This function deals with various timeline events (play, pause and stop) for clean execution.
-        When the simulation is stopped all the physics handles expire and we cannot perform any read or
-        write operations. The timeline event is only detected after every `sim.step()` call. Hence, at
-        every call we need to check the status of the simulator. The logic is as follows:
+        The environment dynamics may comprise of many steps of the physics engine. The number of steps
+        is controlled by the :attr:`RLTaskEnvCfg.decimation` parameter in the configuration. This means
+        that the agent control can happen at a slower rate than the physics simulation. This is useful
+        for real-time control of the robot, where the control loop may be slower than the frequency of
+        the actual dynamics.

-        1. If the simulation is stopped, the environment is closed and the simulator is shutdown.
-        2. If the simulation is paused, we step the simulator until it is playing.
-        3. If the simulation is playing, we set the actions and step the simulator.
+        The function also handles resetting of the terminated environments, at the end of the physics
+        stepping and computation of the reward and terminated signals. This is because it is not
+        possible to reset the sub-environments individually due to the vectorized implementation
+        of sub-environments in the simulator.

        Args:
-            action: Actions to apply on the simulator.
+            action: The actions to apply on the environment. Shape is ``(num_envs, action_dim)``.

        Returns:
-            VecEnvStepReturn: A tuple containing:
-                - (VecEnvObs) observations from the environment
-                - (torch.Tensor) reward from the environment
-                - (torch.Tensor) whether the current episode is completed or not
-                - (dict) misc information
+            A tuple containing the observations, rewards, resets (terminated and truncated) and extras.
        """
        # process actions
        self.action_manager.process_action(action)
@@ -206,13 +227,14 @@ class RLTaskEnv(BaseEnv, gym.Env):
        # -- update env counters (used for curriculum generation)
        self.episode_length_buf += 1  # step in current episode (per env)
        self.common_step_counter += 1  # total step (common for all envs)
-
-        # compute MDP signals
        # -- check terminations
-        self.reset_buf = self.termination_manager.compute().to(torch.long)
+        self.reset_buf = self.termination_manager.compute()
+        self.reset_terminated = self.termination_manager.terminated
+        self.reset_time_outs = self.termination_manager.time_outs
        # -- reward computation
        self.reward_buf = self.reward_manager.compute(dt=self.step_dt)
-        # -- reset envs that terminated and log the episode information
+
+        # -- reset envs that terminated/timed-out and log the episode information
        reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
        if len(reset_env_ids) > 0:
            self._reset_idx(reset_env_ids)
@@ -221,11 +243,14 @@ class RLTaskEnv(BaseEnv, gym.Env):
        # -- step interval randomization
        if "interval" in self.randomization_manager.available_modes:
            self.randomization_manager.randomize(mode="interval", dt=self.step_dt)
+        # -- compute observations
+        # note: done after reset to get the correct observations for reset envs
+        self.obs_buf = self.observation_manager.compute()

        # return observations, rewards, resets and extras
-        return self.observation_manager.compute(), self.reward_buf, self.reset_buf, self.extras
+        return self.obs_buf, self.reward_buf, self.reset_terminated, self.reset_time_outs, self.extras

-    def render(self, mode: str = "human") -> np.ndarray | None:
+    def render(self) -> np.ndarray | None:
        """Run rendering without stepping through the physics.

        By convention, if mode is:
@@ -234,9 +259,6 @@ class RLTaskEnv(BaseEnv, gym.Env):
        - **rgb_array**: Return an numpy.ndarray with shape (x, y, 3), representing RGB values for an
          x-by-y pixel image, suitable for turning into a video.

-        Args:
-            mode: The mode to render with. Defaults to "human".
-
        Returns:
            The rendered image as a numpy array if mode is "rgb_array".

@@ -249,15 +271,15 @@ class RLTaskEnv(BaseEnv, gym.Env):
        # run a rendering step of the simulator
        self.sim.render()
        # decide the rendering mode
-        if mode == "human":
+        if self.render_mode == "human" or self.render_mode is None:
            return None
-        elif mode == "rgb_array":
+        elif self.render_mode == "rgb_array":
            # check that if any render could have happened
            if self.sim.render_mode.value < self.sim.RenderMode.PARTIAL_RENDERING.value:
                raise RuntimeError(
-                    f"Cannot render '{mode}' when the simulation render mode is '{self.sim.render_mode.name}'."
-                    f" Please set the simulation render mode to '{self.sim.RenderMode.PARTIAL_RENDERING.name}' or "
-                    f" '{self.sim.RenderMode.FULL_RENDERING.name}'."
+                    f"Cannot render '{self.render_mode}' when the simulation render mode is"
+                    f" '{self.sim.render_mode.name}'. Please set the simulation render mode to:"
+                    f"'{self.sim.RenderMode.PARTIAL_RENDERING.name}' or '{self.sim.RenderMode.FULL_RENDERING.name}'."
                )
            # create the annotator if it does not exist
            if not hasattr(self, "_rgb_annotator"):
@@ -282,7 +304,7 @@ class RLTaskEnv(BaseEnv, gym.Env):
                return rgb_data[:, :, :3]
        else:
            raise NotImplementedError(
-                f"Render mode '{mode}' is not supported. Please use: {self.metadata['render.modes']}."
+                f"Render mode '{self.render_mode}' is not supported. Please use: {self.metadata['render_modes']}."
            )

    def close(self):
@@ -296,9 +318,37 @@ class RLTaskEnv(BaseEnv, gym.Env):
            super().close()

    """
-    Implementation specifics.
+    Helper functions.
    """

+    def _configure_gym_env_spaces(self):
+        """Configure the action and observation spaces for the Gym environment."""
+        # observation space (unbounded since we don't impose any limits)
+        self.single_observation_space = gym.spaces.Dict()
+        for group_name, group_term_names in self.observation_manager.active_terms.items():
+            # extract quantities about the group
+            has_concatenated_obs = self.observation_manager.group_obs_concatenate[group_name]
+            group_dim = self.observation_manager.group_obs_dim[group_name]
+            group_term_dim = self.observation_manager.group_obs_term_dim[group_name]
+            # check if group is concatenated or not
+            # if not concatenated, then we need to add each term separately as a dictionary
+            if has_concatenated_obs:
+                self.single_observation_space[group_name] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=group_dim)
+            else:
+                self.single_observation_space[group_name] = gym.spaces.Dict(
+                    {
+                        term_name: gym.spaces.Box(low=-np.inf, high=np.inf, shape=term_dim)
+                        for term_name, term_dim in zip(group_term_names, group_term_dim)
+                    }
+                )
+        # action space (unbounded since we don't impose any limits)
+        action_dim = sum(self.action_manager.action_term_dim)
+        self.single_action_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(action_dim,))
+
+        # batch the spaces for vectorized environments
+        self.observation_space = gym.vector.utils.batch_space(self.single_observation_space, self.num_envs)
+        self.action_space = gym.vector.utils.batch_space(self.single_action_space, self.num_envs)
+
    def _reset_idx(self, env_ids: Sequence[int]):
        """Reset environments based on specified indices.

@@ -341,6 +391,3 @@ class RLTaskEnv(BaseEnv, gym.Env):

        # reset the episode length buffer
        self.episode_length_buf[env_ids] = 0
-        #  -- add information to extra if timeout occurred due to episode length
-        # Note: this is used by algorithms like PPO where time-outs are handled differently
-        self.extras["time_outs"] = self.termination_manager.time_outs
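With `terminated` and time-out flags returned separately by `step`, a learner can bootstrap through time-outs while still giving true terminal states zero future value. Because the returned observations for reset environments are already post-reset, the next-state value is not available for timed-out environments. The hypothetical helper below, written with plain lists, substitutes the current-state value in that case; this is one common approximation and not something this MR prescribes.

```python
from typing import List


def one_step_td_targets(
    rewards: List[float],
    terminated: List[bool],
    time_outs: List[bool],
    values_now: List[float],
    values_next: List[float],
    gamma: float,
) -> List[float]:
    """Compute r + gamma * V for each sub-environment, respecting the two done signals."""
    targets = []
    for r, term, timeout, v_now, v_next in zip(rewards, terminated, time_outs, values_now, values_next):
        if term:
            # true terminal state: no future return to bootstrap from
            targets.append(r)
        elif timeout:
            # episode cut off externally; the returned observation is already post-reset,
            # so approximate the missing next-state value with the current-state value
            targets.append(r + gamma * v_now)
        else:
            targets.append(r + gamma * v_next)
    return targets


print(one_step_td_targets([1.0, 1.0, 1.0], [False, True, False], [False, False, True],
                          [10.0, 10.0, 10.0], [5.0, 5.0, 5.0], gamma=0.5))
# [3.5, 1.0, 6.0]
```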
--- a/source/extensions/omni.isaac.orbit/omni/isaac/orbit/managers/observation_manager.py
+++ b/source/extensions/omni.isaac.orbit/omni/isaac/orbit/managers/observation_manager.py
@@ -91,6 +91,11 @@ class ObservationManager(ManagerBase):
        """Shape of observation tensor for each term in each group."""
        return self._group_obs_term_dim

+    @property
+    def group_obs_concatenate(self) -> dict[str, bool]:
+        """Whether the observation terms are concatenated in each group."""
+        return self._group_obs_concatenate
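The new `group_obs_concatenate` property is what `_configure_gym_env_spaces` uses to decide whether a group becomes a single `Box` or a nested `Dict` of per-term boxes, after which `gym.vector.utils.batch_space` prepends the number of environments. The resulting shapes can be sketched without Gymnasium as below. Shapes are plain tuples, the per-term dimensions are simplified to a dictionary keyed by term name, and `batched_obs_shapes` plus the group and term names are hypothetical, so this only illustrates the layout rather than building real space objects.

```python
from typing import Dict, Tuple, Union

Shape = Tuple[int, ...]


def batched_obs_shapes(
    group_obs_dim: Dict[str, Shape],
    group_obs_term_dim: Dict[str, Dict[str, Shape]],
    group_obs_concatenate: Dict[str, bool],
    num_envs: int,
) -> Dict[str, Union[Shape, Dict[str, Shape]]]:
    """Return the per-group observation shapes after batching over ``num_envs``."""
    spaces: Dict[str, Union[Shape, Dict[str, Shape]]] = {}
    for group_name, concatenated in group_obs_concatenate.items():
        if concatenated:
            # one flat vector per environment for the whole group
            spaces[group_name] = (num_envs, *group_obs_dim[group_name])
        else:
            # one entry per term, each batched separately
            spaces[group_name] = {
                term: (num_envs, *dim) for term, dim in group_obs_term_dim[group_name].items()
            }
    return spaces


shapes = batched_obs_shapes(
    group_obs_dim={"policy": (12,), "critic": (20,)},
    group_obs_term_dim={
        "policy": {"joint_pos": (6,), "joint_vel": (6,)},
        "critic": {"base_vel": (3,), "height_scan": (17,)},
    },
    group_obs_concatenate={"policy": True, "critic": False},
    num_envs=4,
)
print(shapes)
# {'policy': (4, 12), 'critic': {'base_vel': (4, 3), 'height_scan': (4, 17)}}
```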
+
    """
    Operations.
    """

--- a/source/extensions/omni.isaac.orbit/omni/isaac/orbit/managers/termination_manager.py
+++ b/source/extensions/omni.isaac.orbit/omni/isaac/orbit/managers/termination_manager.py
@@ -26,8 +26,20 @@ class TerminationManager(ManagerBase):
    argument and returns a boolean tensor of shape ``(num_envs,)``. The termination manager
    computes the termination signal as the union (logical or) of all the termination terms.

+    Following the `Gymnasium API <https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/>`_,
+    the termination signal is computed as the logical OR of the following signals:
+
+    * **Time-out**: This signal is set to true if the environment has ended due to an externally defined condition
+      (that is, one outside the scope of an MDP). For example, the episode may be truncated once it reaches the
+      maximum episode length.
+    * **Terminated**: This signal is set to true if the environment has reached a terminal state defined by the
+      environment. This state may correspond to task success, task failure, robot falling, etc.
+
+    These signals can be individually accessed using the :attr:`time_outs` and :attr:`terminated` properties.
+
    The termination terms are parsed from a config class containing the manager's settings and each term's
-    parameters. Each termination term should instantiate the :class:`TerminationTermCfg` class.
+    parameters. Each termination term should instantiate the :class:`TerminationTermCfg` class. The term's
+    :attr:`TerminationTermCfg.time_out` flag decides whether it contributes to the time-out or the terminated signal.
    """

    _env: RLTaskEnv
@@ -46,8 +58,8 @@ class TerminationManager(ManagerBase):
        for term_name in self._term_names:
            self._episode_dones[term_name] = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
        # create buffer for managing termination per environment
-        self._done_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
-        self._time_out_buf = torch.zeros_like(self._done_buf)
+        self._truncated_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
+        self._terminated_buf = torch.zeros_like(self._truncated_buf)

    def __str__(self) -> str:
        """Returns: A string representation for termination manager."""
@@ -79,12 +91,26 @@ class TerminationManager(ManagerBase):
    @property
    def dones(self) -> torch.Tensor:
        """The net termination signal. Shape is ``(num_envs,)``."""
-        return self._done_buf
+        return self._truncated_buf | self._terminated_buf

    @property
    def time_outs(self) -> torch.Tensor:
-        """The timeout signal. Shape is ``(num_envs,)``."""
-        return self._time_out_buf
+        """The timeout signal (reaching max episode length). Shape is ``(num_envs,)``.
+
+        This signal is set to true if the environment has ended due to an externally defined condition
+        (that is, one outside the scope of an MDP). For example, the episode may be truncated once it reaches
+        the maximum episode length.
+        """
+        return self._truncated_buf
+
+    @property
+    def terminated(self) -> torch.Tensor:
+        """The terminated signal (reaching a terminal state). Shape is ``(num_envs,)``.
+
+        This signal is set to true if the environment has reached a terminal state defined by the environment.
+        This state may correspond to task success, task failure, robot falling, etc.
+        """
+        return self._terminated_buf

    """
    Operations.
@@ -122,20 +148,20 @@ class TerminationManager(ManagerBase):
            The combined termination signal of shape ``(num_envs,)``.
        """
        # reset computation
-        self._done_buf[:] = False
-        self._time_out_buf[:] = False
+        self._truncated_buf[:] = False
+        self._terminated_buf[:] = False
        # iterate over all the termination terms
        for name, term_cfg in zip(self._term_names, self._term_cfgs):
            value = term_cfg.func(self._env, **term_cfg.params)
-            # update total termination
-            self._done_buf |= value
            # store timeout signal separately
            if term_cfg.time_out:
-                self._time_out_buf |= value
+                self._truncated_buf |= value
+            else:
+                self._terminated_buf |= value
            # add to episode dones
            self._episode_dones[name] |= value
-        # return termination signal
-        return self._done_buf
+        # return combined termination signal
+        return self._truncated_buf | self._terminated_buf

    """
    Operations - Term settings.

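The split between the two buffers can be sketched in plain Python, with lists standing in for the `(num_envs,)` boolean tensors. `TermCfg` and `compute_terminations` are hypothetical stand-ins for `TerminationTermCfg` and the manager's `compute()`. The point illustrated is that each term feeds exactly one buffer, chosen by its `time_out` flag, and that `dones` is the logical OR of both. In Gymnasium 0.29 these two flags are what `env.step()` returns as the `terminated` and `truncated` entries of its 5-tuple.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class TermCfg:
    """Hypothetical stand-in for TerminationTermCfg: a term function plus the time_out flag."""

    func: Callable[[int], list[bool]]
    time_out: bool = False


def compute_terminations(terms: dict[str, TermCfg], num_envs: int):
    """Split term outputs into Gymnasium-style (terminated, truncated) flags."""
    terminated = [False] * num_envs
    truncated = [False] * num_envs
    for cfg in terms.values():
        value = cfg.func(num_envs)
        # route the term into exactly one of the two buffers, based on its time_out flag
        target = truncated if cfg.time_out else terminated
        for i, flag in enumerate(value):
            target[i] = target[i] or flag
    # the combined "dones" signal is the logical OR of both buffers
    dones = [a or b for a, b in zip(terminated, truncated)]
    return terminated, truncated, dones


terms = {
    "time_out": TermCfg(func=lambda n: [False, True, False, False], time_out=True),
    "base_contact": TermCfg(func=lambda n: [False, False, True, True]),
}
terminated, truncated, dones = compute_terminations(terms, num_envs=4)
print(terminated)  # [False, False, True, True]
print(truncated)  # [False, True, False, False]
print(dones)  # [False, True, True, True]
```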
--- a/source/extensions/omni.isaac.orbit/omni/isaac/orbit/sim/simulation_context.py
+++ b/source/extensions/omni.isaac.orbit/omni/isaac/orbit/sim/simulation_context.py
@@ -292,13 +292,13 @@ class SimulationContext(_SimulationContext):
                # hide the viewport and disable updates
                self._viewport_context.updates_enabled = False  # pyright: ignore [reportOptionalMemberAccess]
                self._viewport_window.visible = False  # pyright: ignore [reportOptionalMemberAccess]
-                # reset the throttle counter
-                self._render_throttle_counter = 0
            elif mode == self.RenderMode.NO_RENDERING:
                # hide the viewport and disable updates
                if self._viewport_context is not None:
                    self._viewport_context.updates_enabled = False  # pyright: ignore [reportOptionalMemberAccess]
                    self._viewport_window.visible = False  # pyright: ignore [reportOptionalMemberAccess]
+                # reset the throttle counter
+                self._render_throttle_counter = 0
            else:
                raise ValueError(f"Unsupported render mode: {mode}! Please check `RenderMode` for details.")
            # update render mode
@@ -403,14 +403,21 @@ class SimulationContext(_SimulationContext):
            self._render_throttle_counter += 1
            if self._render_throttle_counter % self._render_throttle_period == 0:
                self._render_throttle_counter = 0
-                # here we don't render viewport so don't need to flush flatcache
-                super().render()
+                # here we don't render viewport so don't need to flush fabric data
+                # note: we don't call super().render() anymore because it flushes the fabric data
+                self.set_setting("/app/player/playSimulations", False)
+                self._app.update()
+                self.set_setting("/app/player/playSimulations", True)
        else:
-            # manually flush the flatcache data to update Hydra textures
+            # manually flush the fabric data to update Hydra textures
            if self._fabric_iface is not None:
                self._fabric_iface.update(0.0, 0.0)
            # render the simulation
-            super().render()
+            # note: we don't call super().render() anymore because it performs the above flush internally
+            #  and we don't want to do it twice. We may remove this once we drop support for Isaac Sim 2022.2.
+            self.set_setting("/app/player/playSimulations", False)
+            self._app.update()
+            self.set_setting("/app/player/playSimulations", True)

    """
    Operations - Override (extension)

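The throttling in `render()` increments `_render_throttle_counter` on every call and only updates the app when the counter is a multiple of `_render_throttle_period`. Switching the render mode resets the counter. The counter logic can be sketched as follows, using a hypothetical `RenderThrottle` class in place of `SimulationContext`. No app update is performed here; `should_render()` only reports when one would happen.

```python
class RenderThrottle:
    """Hypothetical helper mirroring the counter logic of the throttled render path."""

    def __init__(self, period: int):
        self.period = period
        self.counter = 0

    def should_render(self) -> bool:
        # increment on every call and fire once every `period` calls
        self.counter += 1
        if self.counter % self.period == 0:
            self.counter = 0
            return True
        return False

    def reset(self) -> None:
        # what changing the render mode does to the counter
        self.counter = 0


throttle = RenderThrottle(period=3)
print([throttle.should_render() for _ in range(7)])
# [False, False, True, False, False, True, False]
```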
--- a/source/extensions/omni.isaac.orbit/setup.py
+++ b/source/extensions/omni.isaac.orbit/setup.py
@@ -25,18 +25,17 @@ INSTALL_REQUIRES = [
    # devices
    "hidapi",
    # gym
-    "gym==0.21.0",
-    "importlib-metadata~=4.13.0",
-    "setuptools<=66",  # setuptools 67.0 breaks gym
+    "gymnasium==0.29.0",
    # procedural-generation
    "trimesh",
-    "pyglet==1.5.27",  # pyglet 2.0 requires python 3.8
+    "pyglet==1.5.27; python_version < '3.8'",  # pyglet 2.0 requires python 3.8
+    "pyglet; python_version >= '3.8'",
 ]

 # Installation operation
 setup(
    name="omni-isaac-orbit",
-    author="NVIDIA, ETH Zurich, and University of Toronto",
+    author="ORBIT Project Developers",
    maintainer="Mayank Mittal",
    maintainer_email="mittalma@ethz.ch",
    url=EXTENSION_TOML_DATA["package"]["repository"],
@@ -48,6 +47,10 @@ setup(
    python_requires=">=3.7",
    install_requires=INSTALL_REQUIRES,
    packages=["omni.isaac.orbit"],
-    classifiers=["Natural Language :: English", "Programming Language :: Python :: 3.7"],
+    classifiers=[
+        "Natural Language :: English",
+        "Programming Language :: Python :: 3.10",
+        "Isaac Sim :: 2023.1.0-hotfix.1",
+    ],
    zip_safe=False,
 )
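The two pyglet lines rely on PEP 508 environment markers, so exactly one of them applies to any given interpreter. pip evaluates these markers itself (the `packaging` library implements the evaluation). The sketch below only approximates `python_version` as a `(major, minor)` tuple comparison, and `applicable_requirements` is a hypothetical helper.

```python
from __future__ import annotations

import sys

# (requirement, marker predicate on (major, minor)) pairs mirroring the two entries in setup.py
PYGLET_REQUIREMENTS = [
    ("pyglet==1.5.27", lambda version: version < (3, 8)),  # python_version < '3.8'
    ("pyglet", lambda version: version >= (3, 8)),  # python_version >= '3.8'
]


def applicable_requirements(python_version: tuple[int, int]) -> list[str]:
    """Return the pyglet requirements whose marker holds for the given Python version."""
    return [req for req, marker in PYGLET_REQUIREMENTS if marker(python_version)]


print(applicable_requirements((3, 7)))  # ['pyglet==1.5.27']
print(applicable_requirements((3, 10)))  # ['pyglet']
print(applicable_requirements(tuple(sys.version_info[:2])))
```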
--- a/source/extensions/omni.isaac.orbit/test/deps/test_torch.py
+++ b/source/extensions/omni.isaac.orbit/test/deps/test_torch.py
@@ -6,6 +6,7 @@
 from __future__ import annotations

 import torch
+import torch.utils.benchmark as benchmark
 import unittest


@@ -124,6 +125,30 @@ class TestTorchOperations(unittest.TestCase):
        my_slice = my_tensor[torch.tensor([0, 1]), ...]
        self.assertNotEqual(my_slice.untyped_storage().data_ptr(), my_tensor.untyped_storage().data_ptr())

+    def test_logical_or(self):
+        """Test that logical OR and bitwise OR agree on boolean tensors and compare their speed."""
+
+        size = (400, 300, 5)
+        my_tensor_1 = torch.rand(size, device="cuda:0") > 0.5
+        my_tensor_2 = torch.rand(size, device="cuda:0") < 0.5
+
+        # check the speed of logical or
+        timer_logical_or = benchmark.Timer(
+            stmt="torch.logical_or(my_tensor_1, my_tensor_2)",
+            globals={"my_tensor_1": my_tensor_1, "my_tensor_2": my_tensor_2},
+        )
+        timer_bitwise_or = benchmark.Timer(
+            stmt="my_tensor_1 | my_tensor_2", globals={"my_tensor_1": my_tensor_1, "my_tensor_2": my_tensor_2}
+        )
+
+        print("Time for logical or:", timer_logical_or.timeit(number=1000))
+        print("Time for bitwise or:", timer_bitwise_or.timeit(number=1000))
+        # check that logical or works as expected
+        output_logical_or = torch.logical_or(my_tensor_1, my_tensor_2)
+        output_bitwise_or = my_tensor_1 | my_tensor_2
+
+        self.assertTrue(torch.equal(output_logical_or, output_bitwise_or))
+

 if __name__ == "__main__":
    unittest.main()
--- a/source/extensions/omni.isaac.orbit_tasks/config/extension.toml
+++ b/source/extensions/omni.isaac.orbit_tasks/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.5.0"
+version = "0.5.1"

 # Description
 title = "ORBIT Environments"

--- a/source/extensions/omni.isaac.orbit_tasks/docs/CHANGELOG.rst
+++ b/source/extensions/omni.isaac.orbit_tasks/docs/CHANGELOG.rst
 Changelog
 ---------

+0.5.1 (2023-11-04)
+~~~~~~~~~~~~~~~~~~
+
+Fixed
+^^^^^
+
+* Fixed the wrappers to different learning frameworks to use the new :class:`omni.isaac.orbit.envs.RLTaskEnv` class.
+  The :class:`RLTaskEnv` class inherits from the :class:`gymnasium.Env` class (Gymnasium 0.29.0).
+* Fixed the registration of tasks in the Gym registry based on the Gymnasium 0.29.0 API.
+
+Changed
+^^^^^^^
+
+* Removed the inheritance of all the RL-framework specific wrappers from the :class:`gymnasium.Wrapper` class.
+  This is because the wrappers don't comply with the new Gymnasium 0.29.0 API. The wrappers now only inherit
+  from their respective RL-framework specific base classes.
+
+
 0.5.0 (2023-10-30)
 ~~~~~~~~~~~~~~~~~~


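Dropping `gym.Wrapper` means each wrapper has to forward the attributes it used to inherit. The composition pattern can be sketched as follows, assuming a hypothetical `FrameworkVecEnv` (standing in for a framework base class such as RL-Games' `IVecEnv`) and a hypothetical `DummyEnv` (standing in for `RLTaskEnv`). Strings stand in for the actual spaces.

```python
from __future__ import annotations


class FrameworkVecEnv:
    """Hypothetical stand-in for an RL framework's vectorized-env base class."""

    def step(self, actions):
        raise NotImplementedError


class DummyEnv:
    """Hypothetical stand-in for the base environment; it is its own unwrapped env."""

    render_mode = None
    observation_space = "obs-space"
    action_space = "act-space"

    @property
    def unwrapped(self):
        return self


class FrameworkWrapper(FrameworkVecEnv):
    """Wraps an env by composition and forwards what gym.Wrapper used to provide."""

    def __init__(self, env):
        self.env = env

    @property
    def render_mode(self):
        return self.env.render_mode

    @property
    def observation_space(self):
        return self.env.observation_space

    @property
    def action_space(self):
        return self.env.action_space

    @property
    def unwrapped(self):
        # walk down to the bare environment, as gym.Wrapper.unwrapped does
        return self.env.unwrapped


env = DummyEnv()
wrapped = FrameworkWrapper(env)
print(wrapped.unwrapped is env)  # True
print(isinstance(wrapped, FrameworkVecEnv))  # True
```

Because such a wrapper is not a `gym.Wrapper`, a Gymnasium wrapper placed on top of it would not find the interface it expects. This is why the RL-Games wrapper documents that it must be the last one in the chain.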
--- a/source/extensions/omni.isaac.orbit_tasks/docs/README.md
+++ b/source/extensions/omni.isaac.orbit_tasks/docs/README.md
@@ -17,28 +17,31 @@ This looks like as follows:
 omni/isaac/orbit_tasks/locomotion/
 ├── __init__.py
 └── velocity
-    ├── a1
-    │   └── flat_terrain_cfg.py
-    ├── anymal_c
-    │   └── flat_terrain_cfg.py
+    ├── config
+    │   └── anymal_c
+    │       ├── agent  # <- this is where we store the learning agent configurations
+    │       ├── __init__.py  # <- this is where we register the environment and configurations to gym registry
+    │       ├── flat_env_cfg.py
+    │       └── rough_env_cfg.py
    ├── __init__.py
-    ├── velocity_cfg.py
-    └── velocity_env.py
+    └── velocity_env_cfg.py  # <- this is the base task configuration
 ```

-The environments are then registered in the `omni/isaac/orbit_tasks/__init__.py`:
+The environments are then registered in the `omni/isaac/orbit_tasks/locomotion/velocity/config/anymal_c/__init__.py`:

 ```python
 gym.register(
-    id="Isaac-Velocity-Anymal-C-v0",
-    entry_point="omni.isaac.orbit_tasks.locomotion.velocity:LocomotionEnv",
-    kwargs={"cfg_entry_point": "omni.isaac.orbit_tasks.locomotion.velocity.anymal_c.flat_terrain_cfg:FlatTerrainCfg"},
+    id="Isaac-Velocity-Rough-Anymal-C-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
+    kwargs={"env_cfg_entry_point": f"{__name__}.rough_env_cfg:AnymalCRoughEnvCfg"},
 )

 gym.register(
-    id="Isaac-Velocity-A1-v0",
-    entry_point="omni.isaac.orbit_tasks.locomotion.velocity:LocomotionEnv",
-    kwargs={"cfg_entry_point": "omni.isaac.orbit_tasks.locomotion.velocity.a1.flat_terrain_cfg:FlatTerrainCfg"},
+    id="Isaac-Velocity-Flat-Anymal-C-v0",
+    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
+    kwargs={"env_cfg_entry_point": f"{__name__}.flat_env_cfg:AnymalCFlatEnvCfg"},
 )
 ```


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/__init__.py
@@ -9,7 +9,7 @@
 We use OpenAI Gym registry to register the environment and their default configuration file.
 The default configuration file is passed to the argument "kwargs" in the Gym specification registry.
 The string is parsed into respective configuration container which needs to be passed to the environment
-class. This is done using the function :meth:`load_default_env_cfg` in the sub-module
-:mod:`omni.isaac.orbit.utils.parse_cfg`.
+class. This is done using the function :func:`load_cfg_from_registry` in the sub-module
+:mod:`omni.isaac.orbit_tasks.utils.parse_cfg`.

 Note:
@@ -18,12 +18,12 @@ Note:
    the kwarg argument :obj:`cfg` while creating the environment.

 Usage:
-    >>> import gym
+    >>> import gymnasium as gym
    >>> import omni.isaac.orbit_tasks
-    >>> from omni.isaac.orbit_tasks.utils.parse_cfg import load_default_env_cfg
+    >>> from omni.isaac.orbit_tasks.utils.parse_cfg import load_cfg_from_registry
    >>>
    >>> task_name = "Isaac-Cartpole-v0"
-    >>> cfg = load_default_env_cfg(task_name)
+    >>> cfg = load_cfg_from_registry(task_name, "env_cfg_entry_point")
    >>> env = gym.make(task_name, cfg=cfg)
 """

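The `"module.path:attribute"` strings used as `*_cfg_entry_point` kwargs can be resolved with `importlib`. This is a hedged sketch, not the actual `load_cfg_from_registry` implementation. `resolve_entry_point` is hypothetical, stdlib targets stand in for Orbit configuration classes, and YAML entry points such as `rl_games_ppo_cfg.yaml` are not handled. Non-string entry points are returned unchanged, since the Anymal registrations pass config classes directly.

```python
from __future__ import annotations

import importlib
from typing import Any


def resolve_entry_point(entry_point: Any) -> Any:
    """Resolve a "module.path:attribute" string into the object it names (hypothetical helper).

    Objects that are not strings (e.g. a config class registered directly) are returned unchanged.
    """
    if not isinstance(entry_point, str):
        return entry_point
    module_name, _, attr_name = entry_point.partition(":")
    if not attr_name:
        raise ValueError(f"Expected 'module:attribute', got {entry_point!r}")
    # import the module, then look up the attribute on it
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# registry-style kwargs, with stdlib targets standing in for Orbit configuration classes
kwargs = {"env_cfg_entry_point": "collections:OrderedDict", "rsl_rl_cfg_entry_point": dict}
cfg_cls = resolve_entry_point(kwargs["env_cfg_entry_point"])
print(cfg_cls.__name__)  # OrderedDict
print(resolve_entry_point(kwargs["rsl_rl_cfg_entry_point"]) is dict)  # True
```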

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/ant/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/ant/__init__.py
@@ -7,7 +7,7 @@
 Ant locomotion environment (similar to OpenAI Gym Ant-v2).
 """

-import gym
+import gymnasium as gym

 from . import agents


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/ant/ant_env.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/ant/ant_env.py
@@ -5,7 +5,7 @@

 from __future__ import annotations

-import gym.spaces
+import gymnasium as gym
 import math
 import torch


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/__init__.py
@@ -7,7 +7,7 @@
 Cartpole balancing environment.
 """

-import gym
+import gymnasium as gym

 from . import agents


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env.py
@@ -5,7 +5,7 @@

 from __future__ import annotations

-import gym.spaces
+import gymnasium as gym
 import math
 import torch


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/humanoid/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/humanoid/__init__.py
@@ -7,7 +7,7 @@
 Humanoid locomotion environment (similar to OpenAI Gym Humanoid-v2).
 """

-import gym
+import gymnasium as gym

 from . import agents


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/humanoid/humanoid_env.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/humanoid/humanoid_env.py
@@ -5,7 +5,7 @@

 from __future__ import annotations

-import gym.spaces
+import gymnasium as gym
 import math
 import torch


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/config/anymal_b/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/config/anymal_b/__init__.py
@@ -3,7 +3,7 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause

-import gym
+import gymnasium as gym

 from . import agents, flat_env_cfg, rough_env_cfg

@@ -14,6 +14,7 @@ from . import agents, flat_env_cfg, rough_env_cfg
 gym.register(
    id="Isaac-Velocity-Flat-Anymal-B-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": flat_env_cfg.AnymalBFlatEnvCfg,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalBFlatPPORunnerCfg,
@@ -23,6 +24,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Flat-Anymal-B-Play-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": flat_env_cfg.AnymalBFlatEnvCfg_PLAY,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalBFlatPPORunnerCfg,
@@ -32,6 +34,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Rough-Anymal-B-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": rough_env_cfg.AnymalBRoughEnvCfg,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalBRoughPPORunnerCfg,
@@ -41,6 +44,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Rough-Anymal-B-Play-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": rough_env_cfg.AnymalBRoughEnvCfg_PLAY,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalBRoughPPORunnerCfg,

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/config/anymal_c/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/config/anymal_c/__init__.py
@@ -3,7 +3,7 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause

-import gym
+import gymnasium as gym

 from . import agents, flat_env_cfg, rough_env_cfg

@@ -14,6 +14,7 @@ from . import agents, flat_env_cfg, rough_env_cfg
 gym.register(
    id="Isaac-Velocity-Flat-Anymal-C-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": flat_env_cfg.AnymalCFlatEnvCfg,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalCFlatPPORunnerCfg,
@@ -24,6 +25,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Flat-Anymal-C-Play-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": flat_env_cfg.AnymalCFlatEnvCfg_PLAY,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalCFlatPPORunnerCfg,
@@ -33,6 +35,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Rough-Anymal-C-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": rough_env_cfg.AnymalCRoughEnvCfg,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalCRoughPPORunnerCfg,
@@ -42,6 +45,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Rough-Anymal-C-Play-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": rough_env_cfg.AnymalCRoughEnvCfg_PLAY,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalCRoughPPORunnerCfg,

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/config/anymal_d/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/config/anymal_d/__init__.py
@@ -3,7 +3,7 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause

-import gym
+import gymnasium as gym

 from . import agents, flat_env_cfg, rough_env_cfg

@@ -14,6 +14,7 @@ from . import agents, flat_env_cfg, rough_env_cfg
 gym.register(
    id="Isaac-Velocity-Flat-Anymal-D-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": flat_env_cfg.AnymalDFlatEnvCfg,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalDFlatPPORunnerCfg,
@@ -23,6 +24,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Flat-Anymal-D-Play-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": flat_env_cfg.AnymalDFlatEnvCfg_PLAY,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalDFlatPPORunnerCfg,
@@ -32,6 +34,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Rough-Anymal-D-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": rough_env_cfg.AnymalDRoughEnvCfg,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalDRoughPPORunnerCfg,
@@ -41,6 +44,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Rough-Anymal-D-Play-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": rough_env_cfg.AnymalDRoughEnvCfg_PLAY,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalDRoughPPORunnerCfg,

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/config/unitree_a1/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/config/unitree_a1/__init__.py
@@ -3,7 +3,7 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause

-import gym
+import gymnasium as gym

 from . import agents, flat_env_cfg, rough_env_cfg

@@ -14,6 +14,7 @@ from . import agents, flat_env_cfg, rough_env_cfg
 gym.register(
    id="Isaac-Velocity-Flat-Unitree-A1-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": flat_env_cfg.UnitreeA1FlatEnvCfg,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.UnitreeA1FlatPPORunnerCfg,
@@ -23,6 +24,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Flat-Unitree-A1-Play-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": flat_env_cfg.UnitreeA1FlatEnvCfg_PLAY,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.UnitreeA1FlatPPORunnerCfg,
@@ -32,6 +34,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Rough-Unitree-A1-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": rough_env_cfg.UnitreeA1RoughEnvCfg,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.UnitreeA1RoughPPORunnerCfg,
@@ -41,6 +44,7 @@ gym.register(
 gym.register(
    id="Isaac-Velocity-Rough-Unitree-A1-Play-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": rough_env_cfg.UnitreeA1RoughEnvCfg_PLAY,
        "rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.UnitreeA1RoughPPORunnerCfg,

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/velocity_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/locomotion/velocity/velocity_env_cfg.py
@@ -65,7 +65,7 @@ class MySceneCfg(InteractiveSceneCfg):
        offset=RayCasterCfg.OffsetCfg(pos=(0.0, 0.0, 20.0)),
        attach_yaw_only=True,
        pattern_cfg=patterns.GridPatternCfg(resolution=0.1, size=[1.6, 1.0]),
-        debug_vis=True,
+        debug_vis=False,
        mesh_prim_paths=["/World/ground"],
    )
    contact_forces = ContactSensorCfg(prim_path="{ENV_REGEX_NS}/Robot/.*", history_length=3, track_air_time=True)

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/__init__.py
@@ -7,7 +7,7 @@
 Environment for lifting an object with fixed-base robot.
 """

-import gym
+import gymnasium as gym

 from . import agents

@@ -18,6 +18,7 @@ from . import agents
 gym.register(
    id="Isaac-Lift-Franka-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.lift_env_cfg:LiftEnvCfg",
        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/lift_env.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/lift_env.py
@@ -5,7 +5,7 @@

 from __future__ import annotations

-import gym.spaces
+import gymnasium as gym
 import math
 import torch


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/__init__.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/__init__.py
@@ -5,7 +5,7 @@

 """Environment for end-effector pose tracking task for fixed-arm robots."""

-import gym
+import gymnasium as gym

 from . import agents

@@ -16,6 +16,7 @@ from . import agents
 gym.register(
    id="Isaac-Reach-Franka-v0",
    entry_point="omni.isaac.orbit.envs:RLTaskEnv",
+    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.reach_env_cfg:ReachEnvCfg",
        "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/reach_env.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/reach_env.py
@@ -5,7 +5,7 @@

 from __future__ import annotations

-import gym.spaces
+import gymnasium as gym
 import math
 import torch


--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/parse_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/parse_cfg.py
@@ -7,7 +7,7 @@

 from __future__ import annotations

-import gym
+import gymnasium as gym
 import importlib
 import inspect
 import os
@@ -52,7 +52,7 @@ def load_cfg_from_registry(task_name: str, entry_point_key: str) -> dict | Any:
        ValueError: If the entry point key is not available in the gym registry for the task.
    """
    # obtain the configuration entry point
-    cfg_entry_point = gym.spec(task_name)._kwargs.pop(entry_point_key)
+    cfg_entry_point = gym.spec(task_name).kwargs.pop(entry_point_key)
    # check if entry point exists
    if cfg_entry_point is None:
        raise ValueError(

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/rl_games.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/rl_games.py
@@ -33,7 +33,7 @@ for RL-Games :class:`Runner` class:

 from __future__ import annotations

-import gym
+import gymnasium as gym
 import torch

 from rl_games.common import env_configurations
@@ -49,10 +49,10 @@ Vectorized environment wrapper.
 """


-class RlGamesVecEnvWrapper(gym.Wrapper):
-    """Wraps around Isaac Orbit environment for RL-Games.
+class RlGamesVecEnvWrapper(IVecEnv):
+    """Wraps around Orbit environment for RL-Games.

-    This class wraps around the Isaac Orbit environment. Since RL-Games works directly on
+    This class wraps around the Orbit environment. Since RL-Games works directly on
    GPU buffers, the wrapper handles moving of buffers from the simulation environment
    to the same device as the learning agent. Additionally, it performs clipping of
    observations and actions.
@@ -69,6 +69,13 @@ class RlGamesVecEnvWrapper(gym.Wrapper):
    checks if these attributes exist. If they don't then the wrapper defaults to zero as number
    of privileged observations.

+    .. caution::
+
+        This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow
+        the :class:`gym.Wrapper` interface. Any subsequent wrappers will need to be modified to work with this
+        wrapper.
+
+
    Reference:
        https://github.com/Denys88/rl_games/blob/master/rl_games/common/ivecenv.py
        https://github.com/NVIDIA-Omniverse/IsaacGymEnvs
@@ -85,30 +92,77 @@ class RlGamesVecEnvWrapper(gym.Wrapper):

        Raises:
            ValueError: The environment is not inherited from :class:`RLTaskEnv`.
+            ValueError: If specified, the privileged observations (critic) are not of type :obj:`gym.spaces.Box`.
        """
        # check that input is valid
        if not isinstance(env.unwrapped, RLTaskEnv):
            raise ValueError(f"The environment must be inherited from RLTaskEnv. Environment type: {type(env)}")
-        # initialize gym wrapper
-        gym.Wrapper.__init__(self, env)
-        # initialize rl-games vec-env
-        IVecEnv.__init__(self)
+        # initialize the wrapper
+        self.env = env
        # store provided arguments
        self._rl_device = rl_device
        self._clip_obs = clip_obs
        self._clip_actions = clip_actions
+        self._sim_device = env.unwrapped.device
+
        # information about spaces for the wrapper
-        self.observation_space = self.env.observation_space
-        self.action_space = self.env.action_space
+        # note: rl-games expects per-environment (unbatched) observation and action spaces
+        self.rlg_observation_space = self.unwrapped.single_observation_space["policy"]
+        self.rlg_action_space = self.unwrapped.single_action_space
        # information for privileged observations
-        self.state_space = getattr(self.env, "state_space", None)
-        self.num_states = getattr(self.env, "num_states", 0)
-        # print information about wrapper
-        print("[INFO]: RL-Games Environment Wrapper:")
-        print(f"\t\t Observations clipping: {clip_obs}")
-        print(f"\t\t Actions clipping     : {clip_actions}")
-        print(f"\t\t Agent device         : {rl_device}")
-        print(f"\t\t Asymmetric-learning  : {self.num_states != 0}")
+        self.rlg_state_space = self.unwrapped.single_observation_space.get("critic")
+        if self.rlg_state_space is not None:
+            if not isinstance(self.rlg_state_space, gym.spaces.Box):
+                raise ValueError(f"Privileged observations must be of type Box. Type: {type(self.rlg_state_space)}")
+            self.rlg_num_states = self.rlg_state_space.shape[0]
+        else:
+            self.rlg_num_states = 0
+
+    def __str__(self):
+        """Returns the wrapper name and the :attr:`env` representation string."""
+        return (
+            f"<{type(self).__name__}{self.env}>"
+            f"\n\tObservations clipping: {self._clip_obs}"
+            f"\n\tActions clipping     : {self._clip_actions}"
+            f"\n\tAgent device         : {self._rl_device}"
+            f"\n\tAsymmetric-learning  : {self.rlg_num_states != 0}"
+        )
+
+    def __repr__(self):
+        """Returns the string representation of the wrapper."""
+        return str(self)
+
+    """
+    Properties -- Gym.Wrapper
+    """
+
+    @property
+    def render_mode(self) -> str | None:
+        """Returns the :attr:`Env` :attr:`render_mode`."""
+        return self.env.render_mode
+
+    @property
+    def observation_space(self) -> gym.Space:
+        """Returns the :attr:`Env` :attr:`observation_space`."""
+        return self.env.observation_space
+
+    @property
+    def action_space(self) -> gym.Space:
+        """Returns the :attr:`Env` :attr:`action_space`."""
+        return self.env.action_space
+
+    @classmethod
+    def class_name(cls) -> str:
+        """Returns the class name of the wrapper."""
+        return cls.__name__
+
+    @property
+    def unwrapped(self) -> RLTaskEnv:
+        """Returns the base environment of the wrapper.
+
+        This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
+        """
+        return self.env.unwrapped

    """
    Properties
@@ -120,40 +174,46 @@ class RlGamesVecEnvWrapper(gym.Wrapper):

    def get_env_info(self) -> dict:
        """Returns the Gym spaces for the environment."""
-        # fill the env info dict
-        env_info = {"observation_space": self.observation_space, "action_space": self.action_space}
-        # add information about privileged observations space
-        if self.num_states > 0:
-            env_info["state_space"] = self.state_space
-
-        return env_info
+        return {
+            "observation_space": self.rlg_observation_space,
+            "action_space": self.rlg_action_space,
+            "state_space": self.rlg_state_space,
+        }

    """
    Operations - MDP
    """

+    def seed(self, seed: int = -1) -> int:  # noqa: D102
+        return self.unwrapped.seed(seed)
+
    def reset(self):  # noqa: D102
-        obs_dict = self.env.reset()
+        obs_dict, _ = self.env.reset()
        # process observations and states
        return self._process_obs(obs_dict)

    def step(self, actions):  # noqa: D102
+        # move actions to sim-device
+        actions = actions.detach().clone().to(device=self._sim_device)
        # clip the actions
-        actions = torch.clamp(actions.clone(), -self._clip_actions, self._clip_actions)
+        actions = torch.clamp(actions, -self._clip_actions, self._clip_actions)
        # perform environment step
-        obs_dict, rew, dones, extras = self.env.step(actions)
+        obs_dict, rew, terminated, truncated, extras = self.env.step(actions)
        # process observations and states
        obs_and_states = self._process_obs(obs_dict)
        # move buffers to rl-device
        # note: we perform clone to prevent issues when rl-device and sim-device are the same.
-        rew = rew.to(self._rl_device)
-        dones = dones.to(self._rl_device)
+        rew = rew.to(device=self._rl_device)
+        dones = (terminated | truncated).to(device=self._rl_device)
        extras = {
            k: v.to(device=self._rl_device, non_blocking=True) if hasattr(v, "to") else v for k, v in extras.items()
        }

        return obs_and_states, rew, dones, extras

+    def close(self):  # noqa: D102
+        return self.env.close()
+
    """
    Helper functions
    """
@@ -163,34 +223,29 @@ class RlGamesVecEnvWrapper(gym.Wrapper):

        Note:
            States typically refers to privileged observations for the critic function. It is typically used in
-            asymmetric actor-critic algorithms [1].
+            asymmetric actor-critic algorithms.

        Args:
-            obs: The current observations from environment.
+            obs_dict: The current observations from environment.

        Returns:
-            If environment provides states, then a dictionary
-                containing the observations and states is returned. Otherwise just the observations tensor
-                is returned.
-
-        Reference:
-            1. Pinto, Lerrel, et al. "Asymmetric actor critic for image-based robot learning."
-               arXiv preprint arXiv:1710.06542 (2017).
+            If environment provides states, then a dictionary containing the observations and states is returned.
+            Otherwise just the observations tensor is returned.
        """
        # process policy obs
        obs = obs_dict["policy"]
        # clip the observations
        obs = torch.clamp(obs, -self._clip_obs, self._clip_obs)
        # move the buffer to rl-device
-        obs = obs.to(self._rl_device).clone()
+        obs = obs.to(device=self._rl_device).clone()

        # check if asymmetric actor-critic or not
-        if self.num_states > 0:
+        if self.rlg_num_states > 0:
            # acquire states from the environment if it exists
            try:
                states = obs_dict["critic"]
-            except AttributeError:
+            except KeyError:
-                raise NotImplementedError("Environment does not define key `critic` for privileged observations.")
+                raise NotImplementedError("Environment does not define key 'critic' for privileged observations.")
            # clip the states
            states = torch.clamp(states, -self._clip_obs, self._clip_obs)
            # move buffers to rl-device

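The RL-Games wrapper merges Gymnasium's separate `terminated` and `truncated` flags into the single `dones` signal that the older four-tuple step API expects. Below is a minimal sketch of that conversion. It uses plain Python lists in place of the batched tensors, and `to_legacy_step` is a hypothetical helper name:

```python
from __future__ import annotations

from typing import Any


def to_legacy_step(
    obs: Any,
    rew: list[float],
    terminated: list[bool],
    truncated: list[bool],
    extras: dict[str, Any],
) -> tuple[Any, list[float], list[bool], dict[str, Any]]:
    """Convert a Gymnasium-style 5-tuple step result into a Gym 0.21-style 4-tuple.

    Hypothetical helper: lists stand in for the batched tensors used by the wrapper.
    """
    # a sub-environment is "done" if it reached a terminal state or hit its time limit
    dones = [term or trunc for term, trunc in zip(terminated, truncated)]
    return obs, rew, dones, extras
```

After merging, `dones` can no longer distinguish a time-limit reset from a true termination. For this reason the RSL-RL wrapper also keeps `truncated` under `extras["time_outs"]`.
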
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/rsl_rl/vecenv_wrapper.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/rsl_rl/vecenv_wrapper.py
@@ -17,22 +17,28 @@ The following example shows how to wrap an environment for RSL-RL:

 from __future__ import annotations

-import gym
-import gym.spaces
+import gymnasium as gym
 import torch

+from rsl_rl.env import VecEnv
+
 from omni.isaac.orbit.envs import RLTaskEnv


-class RslRlVecEnvWrapper(gym.Wrapper):
-    """Wraps around Isaac Orbit environment for RSL-RL library
+class RslRlVecEnvWrapper(VecEnv):
+    """Wraps around the Orbit environment for the RSL-RL library.
+
+    To use asymmetric actor-critic, the environment's observations must contain the group "critic", which
+    corresponds to the privileged observations. The wrapper stores its dimension in :attr:`num_privileged_obs` (int),
+    which the learning agent uses to allocate buffers in the trajectory memory. Since privileged observations are
+    optional for some environments, the wrapper checks whether this group exists. If it does not, the wrapper
+    defaults to zero as the number of privileged observations.
+
+    .. caution::

-    To use asymmetric actor-critic, the environment instance must have the attributes :attr:`num_states` (int)
-    and :attr:`state_space` (:obj:`gym.spaces.Box`). These are used by the learning agent to allocate buffers in
-    the trajectory memory. Additionally, the method :meth:`_get_observations()` should have the key "critic"
-    which corresponds to the privileged observations. Since this is optional for some environments, the wrapper
-    checks if these attributes exist. If they don't then the wrapper defaults to zero as number of privileged
-    observations.
+        This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow
+        the :class:`gym.Wrapper` interface. Any subsequent wrappers will need to be modified to work with this
+        wrapper.

    Reference:
        https://github.com/leggedrobotics/rsl_rl/blob/master/rsl_rl/env/vec_env.py
@@ -41,6 +47,9 @@ class RslRlVecEnvWrapper(gym.Wrapper):
    def __init__(self, env: RLTaskEnv):
        """Initializes the wrapper.

+        Note:
+            The wrapper calls :meth:`reset` at the start since the RSL-RL runner does not call reset.
+
        Args:
            env: The environment to wrap around.

@@ -51,28 +60,74 @@ class RslRlVecEnvWrapper(gym.Wrapper):
        if not isinstance(env.unwrapped, RLTaskEnv):
            raise ValueError(f"The environment must be inherited from RLTaskEnv. Environment type: {type(env)}")
        # initialize the wrapper
-        gym.Wrapper.__init__(self, env)
+        self.env = env
        # store information required by wrapper
-        orbit_env: RLTaskEnv = self.env.unwrapped
-        self.num_envs = orbit_env.num_envs
-        self.num_actions = orbit_env.action_manager.total_action_dim
-        self.num_obs = orbit_env.observation_manager.group_obs_dim["policy"][0]
+        self.num_envs = self.unwrapped.num_envs
+        self.device = self.unwrapped.device
+        self.max_episode_length = self.unwrapped.max_episode_length
+        self.num_actions = self.unwrapped.action_manager.total_action_dim
+        self.num_obs = self.unwrapped.observation_manager.group_obs_dim["policy"][0]
+        # -- privileged observations
+        if "critic" in self.unwrapped.observation_manager.group_obs_dim:
+            self.num_privileged_obs = self.unwrapped.observation_manager.group_obs_dim["critic"][0]
+        else:
+            self.num_privileged_obs = 0
        # reset at the start since the RSL-RL runner does not call reset
        self.env.reset()

+    def __str__(self):
+        """Returns the wrapper name and the :attr:`env` representation string."""
+        return f"<{type(self).__name__}{self.env}>"
+
+    def __repr__(self):
+        """Returns the string representation of the wrapper."""
+        return str(self)
+
+    """
+    Properties -- Gym.Wrapper
+    """
+
+    @property
+    def render_mode(self) -> str | None:
+        """Returns the :attr:`Env` :attr:`render_mode`."""
+        return self.env.render_mode
+
+    @property
+    def observation_space(self) -> gym.Space:
+        """Returns the :attr:`Env` :attr:`observation_space`."""
+        return self.env.observation_space
+
+    @property
+    def action_space(self) -> gym.Space:
+        """Returns the :attr:`Env` :attr:`action_space`."""
+        return self.env.action_space
+
+    @classmethod
+    def class_name(cls) -> str:
+        """Returns the class name of the wrapper."""
+        return cls.__name__
+
+    @property
+    def unwrapped(self) -> RLTaskEnv:
+        """Returns the base environment of the wrapper.
+
+        This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
+        """
+        return self.env.unwrapped
+
    """
    Properties
    """

-    def get_observations(self) -> torch.Tensor:
+    def get_observations(self) -> tuple[torch.Tensor, dict]:
        """Returns the current observations of the environment."""
-        obs_dict = self.env.unwrapped.observation_manager.compute()
+        obs_dict = self.unwrapped.observation_manager.compute()
        return obs_dict["policy"], {"observations": obs_dict}

    @property
    def episode_length_buf(self) -> torch.Tensor:
        """The episode length buffer."""
-        return self.env.unwrapped.episode_length_buf
+        return self.unwrapped.episode_length_buf

    @episode_length_buf.setter
    def episode_length_buf(self, value: torch.Tensor):
@@ -80,22 +135,34 @@ class RslRlVecEnvWrapper(gym.Wrapper):

        Note: This is needed to perform random initialization of episode lengths in RSL-RL.
        """
-        self.env.unwrapped.episode_length_buf = value
+        self.unwrapped.episode_length_buf = value

    """
    Operations - MDP
    """

-    def reset(self) -> tuple[torch.Tensor, dict]:
+    def seed(self, seed: int = -1) -> int:  # noqa: D102
+        return self.unwrapped.seed(seed)
+
+    def reset(self) -> tuple[torch.Tensor, dict]:  # noqa: D102
        # reset the environment
-        obs_dict = self.env.reset()
+        obs_dict, _ = self.env.reset()
        # return observations
        return obs_dict["policy"], {"observations": obs_dict}

    def step(self, actions: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, dict]:
        # record step information
-        obs_dict, rew, dones, extras = self.env.step(actions)
-        # return step information
+        obs_dict, rew, terminated, truncated, extras = self.env.step(actions)
+        # compute dones for compatibility with RSL-RL
+        dones = (terminated | truncated).to(dtype=torch.long)
+        # move extra observations to the extras dict
        obs = obs_dict["policy"]
        extras["observations"] = obs_dict
+        # move time out information to the extras dict
+        extras["time_outs"] = truncated
+
+        # return the step information
        return obs, rew, dones, extras
+
+    def close(self):  # noqa: D102
+        return self.env.close()
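
Storing `truncated` under `extras["time_outs"]` lets the learner treat time-limit resets differently from true terminations. A common use is value bootstrapping. When an episode is truncated, the discounted value estimate is added back to the reward instead of assuming a remaining return of zero. The sketch below uses lists instead of tensors. `bootstrap_time_outs` is a hypothetical name and not part of the RSL-RL API:

```python
from __future__ import annotations


def bootstrap_time_outs(
    rewards: list[float], values: list[float], time_outs: list[bool], gamma: float
) -> list[float]:
    """Add gamma * value to the reward of every sub-environment truncated by a time limit.

    Hypothetical helper: a truncated episode did not reach a terminal state, so its
    remaining return is approximated with the supplied value estimate instead of zero.
    """
    return [r + gamma * v if t else r for r, v, t in zip(rewards, values, time_outs)]
```

Real terminations keep their raw reward. Only time-outs receive the bootstrapped term.
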
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/sb3.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/sb3.py
@@ -17,7 +17,6 @@ The following example shows how to wrap an environment for Stable-Baselines3:

 from __future__ import annotations

-import gym
 import numpy as np
 import torch
 from typing import Any
@@ -65,8 +64,8 @@ Vectorized environment wrapper.
 """


-class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
-    """Wraps around Isaac Orbit environment for Stable Baselines3.
+class Sb3VecEnvWrapper(VecEnv):
+    """Wraps around Orbit environment for Stable Baselines3.

    Isaac Sim internally implements a vectorized environment. However, since it is
    still considered a single environment instance, Stable Baselines tries to wrap
@@ -74,10 +73,15 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
    is not inheriting from their :class:`VecEnv`. Thus, this class thinly wraps
    over the environment from :class:`RLTaskEnv`.

+    Note:
+        While Stable-Baselines3 supports the Gym 0.26+ API, its vectorized environment interface
+        still follows the old API, closer to Gym 0.21: :meth:`reset` returns only the observations
+        and :meth:`step` returns a single ``dones`` array. Thus, we implement the old API here.
+
    We also add monitoring functionality that computes the un-discounted episode
    return and length. This information is added to the info dicts under key `episode`.

-    In contrast to Isaac Orbit environment, stable-baselines expect the following:
+    In contrast to the Orbit environment, Stable-Baselines3 expects the following:

    1. numpy datatype for MDP signals
    2. a list of info dicts for each sub-environment (instead of a dict)
@@ -85,16 +89,24 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
       to the one after reset. The "real" final observation is passed using the info dicts
       under the key ``terminal_observation``.

-    Warning:
+    .. warning::
+
        By the nature of physics stepping in Isaac Sim, it is not possible to forward the
-        simulation buffers without performing a physics step. Thus, reset is performed only
-        at the start of :meth:`step()` function before the actual physics step is taken.
-        Thus, the returned observations for terminated environments is still the final
-        observation and not the ones after the reset.
+        simulation buffers without performing a physics step. Thus, reset is performed
+        inside the :meth:`step()` function after the actual physics step is taken.
+        As a result, the returned observations for terminated environments are the ones after the reset.
+
+    .. caution::
+
+        This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow
+        the :class:`gym.Wrapper` interface. Any subsequent wrappers will need to be modified to work with this
+        wrapper.

    Reference:
-        https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html
-        https://stable-baselines3.readthedocs.io/en/master/common/monitor.html
+
+    1. https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html
+    2. https://stable-baselines3.readthedocs.io/en/master/common/monitor.html
+
    """

    def __init__(self, env: RLTaskEnv):
@@ -110,12 +122,43 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
        if not isinstance(env.unwrapped, RLTaskEnv):
            raise ValueError(f"The environment must be inherited from RLTaskEnv. Environment type: {type(env)}")
        # initialize the wrapper
-        gym.Wrapper.__init__(self, env)
+        self.env = env
+        # collect common information
+        self.num_envs = self.unwrapped.num_envs
+        self.sim_device = self.unwrapped.device
+        self.render_mode = self.unwrapped.render_mode
        # initialize vec-env
-        VecEnv.__init__(self, self.env.num_envs, self.env.observation_space, self.env.action_space)
+        observation_space = self.unwrapped.single_observation_space["policy"]
+        action_space = self.unwrapped.single_action_space
+        VecEnv.__init__(self, self.num_envs, observation_space, action_space)
        # add buffer for logging episodic information
-        self._ep_rew_buf = torch.zeros(self.env.num_envs, dtype=torch.float, device=self.env.device)
-        self._ep_len_buf = torch.zeros(self.env.num_envs, dtype=torch.float, device=self.env.device)
+        self._ep_rew_buf = torch.zeros(self.num_envs, device=self.sim_device)
+        self._ep_len_buf = torch.zeros(self.num_envs, device=self.sim_device)
+
+    def __str__(self):
+        """Returns the wrapper name and the :attr:`env` representation string."""
+        return f"<{type(self).__name__}{self.env}>"
+
+    def __repr__(self):
+        """Returns the string representation of the wrapper."""
+        return str(self)
+
+    """
+    Properties -- Gym.Wrapper
+    """
+
+    @classmethod
+    def class_name(cls) -> str:
+        """Returns the class name of the wrapper."""
+        return cls.__name__
+
+    @property
+    def unwrapped(self) -> RLTaskEnv:
+        """Returns the base environment of the wrapper.
+
+        This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
+        """
+        return self.env.unwrapped

    """
    Properties
@@ -133,31 +176,43 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
    Operations - MDP
    """

+    def seed(self, seed: int | None = None) -> list[int | None]:  # noqa: D102
+        return [self.unwrapped.seed(seed)] * self.unwrapped.num_envs
+
    def reset(self) -> VecEnvObs:  # noqa: D102
-        obs_dict = self.env.reset()
+        obs_dict, _ = self.env.reset()
        # convert data types to numpy depending on backend
        return self._process_obs(obs_dict)

-    def step(self, actions: np.ndarray) -> VecEnvStepReturn:  # noqa: D102
+    def step_async(self, actions):  # noqa: D102
        # convert input to numpy array
-        actions = np.asarray(actions)
+        if not isinstance(actions, torch.Tensor):
+            actions = np.asarray(actions)
+            actions = torch.from_numpy(actions).to(device=self.sim_device, dtype=torch.float32)
+        else:
+            actions = actions.to(device=self.sim_device, dtype=torch.float32)
        # convert to tensor
-        actions = torch.from_numpy(actions).to(device=self.env.device)
-        # record step information
-        obs_dict, rew, dones, extras = self.env.step(actions)
+        self._async_actions = actions

+    def step_wait(self) -> VecEnvStepReturn:  # noqa: D102
+        # record step information
+        obs_dict, rew, terminated, truncated, extras = self.env.step(self._async_actions)
        # update episode un-discounted return and length
        self._ep_rew_buf += rew
        self._ep_len_buf += 1
+        # compute reset ids
+        dones = terminated | truncated
        reset_ids = (dones > 0).nonzero(as_tuple=False)

        # convert data types to numpy depending on backend
        # Note: RLTaskEnv uses torch backend (by default).
        obs = self._process_obs(obs_dict)
-        rew = rew.cpu().numpy()
-        dones = dones.cpu().numpy()
+        rew = rew.detach().cpu().numpy()
+        terminated = terminated.detach().cpu().numpy()
+        truncated = truncated.detach().cpu().numpy()
+        dones = dones.detach().cpu().numpy()
        # convert extra information to list of dicts
-        infos = self._process_extras(obs, dones, extras, reset_ids)
+        infos = self._process_extras(obs, terminated, truncated, extras, reset_ids)

        # reset info for terminated environments
        self._ep_rew_buf[reset_ids] = 0
@@ -165,59 +220,70 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):

        return obs, rew, dones, infos

-    """
-    Unused methods.
-    """
-
-    def step_async(self, actions):  # noqa: D102
-        self._async_actions = actions
+    def close(self):  # noqa: D102
+        self.env.close()

-    def step_wait(self):  # noqa: D102
-        return self.step(self._async_actions)
-
-    def get_attr(self, attr_name, indices):  # noqa: D102
-        raise NotImplementedError
+    def get_attr(self, attr_name, indices=None):  # noqa: D102
+        # resolve indices
+        if indices is None:
+            indices = slice(None)
+            num_indices = self.num_envs
+        else:
+            num_indices = len(indices)
+        # obtain attribute value
+        attr_val = getattr(self.env, attr_name)
+        # return the value
+        if not isinstance(attr_val, torch.Tensor):
+            return [attr_val] * num_indices
+        else:
+            return attr_val[indices].detach().cpu().numpy()

    def set_attr(self, attr_name, value, indices=None):  # noqa: D102
-        raise NotImplementedError
+        raise NotImplementedError("Setting attributes is not supported.")

    def env_method(self, method_name: str, *method_args, indices=None, **method_kwargs):  # noqa: D102
-        raise NotImplementedError
+        if method_name == "render":
+            # gymnasium does not support changing render mode at runtime
+            return self.env.render()
+        else:
+            # note: this is a thin pass-through provided for completeness; SB3 does not need it.
+            # the target method must accept the `indices` keyword argument.
+            env_method = getattr(self.env, method_name)
+            return env_method(*method_args, indices=indices, **method_kwargs)

    def env_is_wrapped(self, wrapper_class, indices=None):  # noqa: D102
-        raise NotImplementedError
+        raise NotImplementedError("Checking if environment is wrapped is not supported.")

    def get_images(self):  # noqa: D102
-        raise NotImplementedError
+        raise NotImplementedError("Getting images is not supported.")

    """
    Helper functions.
    """

-    def _process_obs(self, obs_dict) -> np.ndarray:
+    def _process_obs(
+        self, obs_dict: dict[str, torch.Tensor | dict[str, torch.Tensor]]
+    ) -> np.ndarray | dict[str, np.ndarray]:
        """Convert observations into NumPy data type."""
        # Sb3 doesn't support asymmetric observation spaces, so we only use "policy"
        obs = obs_dict["policy"]
        # Note: RLTaskEnv uses torch backend (by default).
-        if self.env.sim.backend == "torch":
-            if isinstance(obs, dict):
-                for key, value in obs.items():
-                    obs[key] = value.detach().cpu().numpy()
-            else:
-                obs = obs.detach().cpu().numpy()
-        elif self.env.sim.backend == "numpy":
-            pass
+        if isinstance(obs, dict):
+            for key, value in obs.items():
+                obs[key] = value.detach().cpu().numpy()
+        elif isinstance(obs, torch.Tensor):
+            obs = obs.detach().cpu().numpy()
        else:
-            raise NotImplementedError(f"Unsupported backend for simulation: {self.env.sim.backend}")
+            raise NotImplementedError(f"Unsupported data type: {type(obs)}")
        return obs

-    def _process_extras(self, obs, dones, extras, reset_ids) -> list[dict[str, Any]]:
+    def _process_extras(
+        self,
+        obs: np.ndarray | dict[str, np.ndarray],
+        terminated: np.ndarray,
+        truncated: np.ndarray,
+        extras: dict,
+        reset_ids: torch.Tensor,
+    ) -> list[dict[str, Any]]:
        """Convert miscellaneous information into dictionary for each sub-environment."""
        # create empty list of dictionaries to fill
-        infos: list[dict[str, Any]] = [dict.fromkeys(extras.keys()) for _ in range(self.env.num_envs)]
+        infos: list[dict[str, Any]] = [dict.fromkeys(extras.keys()) for _ in range(self.num_envs)]
        # fill-in information for each sub-environment
        # Note: This loop becomes slow when number of environments is large.
-        for idx in range(self.env.num_envs):
+        for idx in range(self.num_envs):
            # fill-in episode monitoring info
            if idx in reset_ids:
                infos[idx]["episode"] = dict()
@@ -225,14 +291,13 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
                infos[idx]["episode"]["l"] = float(self._ep_len_buf[idx])
            else:
                infos[idx]["episode"] = None
+            # fill-in bootstrap information
+            infos[idx]["TimeLimit.truncated"] = truncated[idx] and not terminated[idx]
            # fill-in information from extras
            for key, value in extras.items():
-                # 1. remap the key for time-outs for what SB3 expects
-                # 2. remap extra episodes information safely
-                # 3. for others just store their values
-                if key == "time_outs":
-                    infos[idx]["TimeLimit.truncated"] = bool(value[idx])
-                elif key == "episode":
+                # 1. remap extra episodes information safely
+                # 2. for others just store their values
+                if key == "log":
                    # only log this data for episodes that are terminated
                    if infos[idx]["episode"] is not None:
                        for sub_key, sub_value in value.items():
@@ -240,7 +305,7 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
                else:
                    infos[idx][key] = value[idx]
            # add information about terminal observation separately
-            if dones[idx] == 1:
+            if idx in reset_ids:
                # extract terminal observations
                if isinstance(obs, dict):
                    terminal_obs = dict.fromkeys(obs.keys())

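Stable-Baselines3 expects one info dict per sub-environment. Episode statistics appear only for environments that just reset, and the `TimeLimit.truncated` flag is set only for pure time-outs. Below is a simplified sketch of that layout. It uses lists instead of tensors, and `build_infos` is a hypothetical helper. The `"r"` key for the episode return follows the SB3 `Monitor` convention; it is an assumption here because that line of the hunk is not shown:

```python
from __future__ import annotations

from typing import Any


def build_infos(
    terminated: list[bool],
    truncated: list[bool],
    ep_returns: list[float],
    ep_lengths: list[int],
) -> list[dict[str, Any]]:
    """Create one info dict per sub-environment in the layout Stable-Baselines3 expects."""
    infos: list[dict[str, Any]] = []
    for term, trunc, ret, length in zip(terminated, truncated, ep_returns, ep_lengths):
        info: dict[str, Any] = {}
        if term or trunc:
            # monitoring info is only reported for sub-environments that just reset
            info["episode"] = {"r": ret, "l": float(length)}
        else:
            info["episode"] = None
        # SB3 bootstraps only on pure time-outs, not on real terminations
        info["TimeLimit.truncated"] = trunc and not term
        infos.append(info)
    return infos
```
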
--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/skrl.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/utils/wrappers/skrl.py
@@ -93,9 +93,9 @@ Vectorized environment wrapper.


 def SkrlVecEnvWrapper(env: RLTaskEnv):
-    """Wraps around Isaac Orbit environment for skrl.
+    """Wraps around the Orbit environment for skrl.

-    This function wraps around the Isaac Orbit environment. Since the :class:`RLTaskEnv` environment
+    This function wraps around the Orbit environment. Since the :class:`RLTaskEnv` environment
    wrapping functionality is defined within the skrl library itself, this implementation
    is maintained for compatibility with the structure of the extension that contains it.
    Internally it calls the :func:`wrap_env` from the skrl library API.

--- a/source/extensions/omni.isaac.orbit_tasks/setup.py
+++ b/source/extensions/omni.isaac.orbit_tasks/setup.py
@@ -22,18 +22,21 @@ INSTALL_REQUIRES = [
    "numpy",
    "torch",
    "torchvision>=0.14.1",  # ensure compatibility with torch 1.13.1
-    "protobuf==3.20.2",
+    "protobuf>=3.20.2",
    # data collection
    "h5py",
+    # basic logger
+    "tensorboard",
+    # video recording
+    "moviepy",
 ]

 # Extra dependencies for RL agents
 EXTRAS_REQUIRE = {
-    "sb3": ["stable-baselines3>=1.5,<=1.8", "tensorboard"],
+    "sb3": ["stable-baselines3>=2.0"],
    "skrl": ["skrl>=0.10.0"],
-    "rl_games": ["rl-games==1.5.2"],
-    # TODO: Uncomment when rsl_rl is updated to public.
-    # "rsl_rl": ["rsl_rl@git+https://github.com/leggedrobotics/rsl_rl.git"],
+    "rl_games": ["rl-games==1.6.1"],
+    "rsl_rl": ["rsl_rl@git+https://github.com/leggedrobotics/rsl_rl.git"],
    "robomimic": ["robomimic@git+https://github.com/ARISE-Initiative/robomimic.git"],
 }
 # cumulation of all extra-requires
@@ -43,7 +46,7 @@ EXTRAS_REQUIRE["all"] = list(itertools.chain.from_iterable(EXTRAS_REQUIRE.values
 # Installation operation
 setup(
    name="omni-isaac-orbit_tasks",
-    author="NVIDIA, ETH Zurich, and University of Toronto",
+    author="ORBIT Project Developers",
    maintainer="Mayank Mittal",
    maintainer_email="mittalma@ethz.ch",
    url=EXTENSION_TOML_DATA["package"]["repository"],
@@ -55,6 +58,10 @@ setup(
    install_requires=INSTALL_REQUIRES,
    extras_require=EXTRAS_REQUIRE,
    packages=["omni.isaac.orbit_tasks"],
-    classifiers=["Natural Language :: English", "Programming Language :: Python :: 3.7"],
+    classifiers=[
+        "Natural Language :: English",
+        "Programming Language :: Python :: 3.10",
+        "Isaac Sim :: 2023.1.0-hotfix.1",
+    ],
    zip_safe=False,
 )
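
The `all` extra is built by chaining the requirement lists of every other extra. The standalone sketch below applies the same pattern to a subset of the entries above. The order-preserving de-duplication via `dict.fromkeys` is an addition for this sketch and is not part of the original line:

```python
import itertools

EXTRAS_REQUIRE = {
    "sb3": ["stable-baselines3>=2.0"],
    "skrl": ["skrl>=0.10.0"],
    "rl_games": ["rl-games==1.6.1"],
}
# the right-hand side is evaluated before the "all" key is inserted, so "all" does not include itself
# (de-duplication with dict.fromkeys is an assumption added for this sketch)
EXTRAS_REQUIRE["all"] = list(dict.fromkeys(itertools.chain.from_iterable(EXTRAS_REQUIRE.values())))
```
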
--- a/source/extensions/omni.isaac.orbit_tasks/test/test_environments.py
+++ b/source/extensions/omni.isaac.orbit_tasks/test/test_environments.py
@@ -20,8 +20,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
-import gym.envs
+import gymnasium as gym
 import torch
 import traceback
 import unittest
@@ -42,7 +41,7 @@ class TestEnvironments(unittest.TestCase):
    def setUpClass(cls):
        # acquire all Isaac environments names
        cls.registered_tasks = list()
-        for task_spec in gym.envs.registry.all():
+        for task_spec in gym.registry.values():
            if "Isaac" in task_spec.id:
                cls.registered_tasks.append(task_spec.id)
        # sort environments by name
@@ -70,19 +69,20 @@ class TestEnvironments(unittest.TestCase):
            env: RLTaskEnv = gym.make(task_name, cfg=env_cfg)

            # reset environment
-            obs = env.reset()
+            obs, _ = env.reset()
            # check signal
            self.assertTrue(self._check_valid_tensor(obs))

            # simulate environment for 1000 steps
-            for _ in range(1000):
-                # sample actions from -1 to 1
-                actions = 2 * torch.rand((env.num_envs, env.action_space.shape[0]), device=env.device) - 1
-                # apply actions
-                transition = env.step(actions)
-                # check signals
-                for data in transition:
-                    self.assertTrue(self._check_valid_tensor(data), msg=f"Invalid data: {data}")
+            with torch.inference_mode():
+                for _ in range(1000):
+                    # sample actions from -1 to 1
+                    actions = 2 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1
+                    # apply actions
+                    transition = env.step(actions)
+                    # check signals
+                    for data in transition:
+                        self.assertTrue(self._check_valid_tensor(data), msg=f"Invalid data: {data}")

            # close the environment
            print(f">>> Closing environment: {task_name}")
@@ -108,9 +108,9 @@ class TestEnvironments(unittest.TestCase):
            valid_tensor = True
            for value in data.values():
                if isinstance(value, dict):
-                    return TestEnvironments._check_valid_tensor(value)
+                    valid_tensor &= TestEnvironments._check_valid_tensor(value)
                elif isinstance(value, torch.Tensor):
-                    valid_tensor = valid_tensor and not torch.any(torch.isnan(value))
+                    valid_tensor &= not torch.any(torch.isnan(value))
            return valid_tensor
        else:
            raise ValueError(f"Input data of invalid type: {type(data)}.")

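The `_check_valid_tensor` fix replaces the early `return` on the first nested dict with an accumulating `&=`, so every entry is inspected. Below is a pure-Python sketch of the same recursion. It uses floats in place of tensors, and `check_valid` is a hypothetical name:

```python
from __future__ import annotations

import math
from typing import Any


def check_valid(data: Any) -> bool:
    """Return True if no float in ``data`` (a float or a nested dict of floats) is NaN."""
    if isinstance(data, float):
        return not math.isnan(data)
    if isinstance(data, dict):
        valid = True
        for value in data.values():
            # accumulate instead of returning early, so every entry is inspected
            valid &= check_valid(value)
        return valid
    raise ValueError(f"Input data of invalid type: {type(data)}.")
```

With the previous early return, `{"a": {"b": 1.0}, "c": float("nan")}` would be reported as valid, because the NaN under `"c"` is never reached.
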
--- a/source/extensions/omni.isaac.orbit_tasks/test/test_record_video.py
+++ b/source/extensions/omni.isaac.orbit_tasks/test/test_record_video.py
@@ -19,7 +19,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import os
 import torch
 import traceback
@@ -42,7 +42,7 @@ class TestRecordVideoWrapper(unittest.TestCase):
    def setUpClass(cls):
        # acquire all Isaac environments names
        cls.registered_tasks = list()
-        for task_spec in gym.envs.registry.all():
+        for task_spec in gym.registry.values():
            if "Isaac" in task_spec.id:
                cls.registered_tasks.append(task_spec.id)
        # sort environments by name
@@ -73,25 +73,24 @@ class TestRecordVideoWrapper(unittest.TestCase):
            env_cfg.sim.shutdown_app_on_stop = False

            # create environment
-            env: RLTaskEnv = gym.make(task_name, cfg=env_cfg)
+            env: RLTaskEnv = gym.make(task_name, cfg=env_cfg, render_mode="rgb_array")

            # directory to save videos
            videos_dir = os.path.join(self.videos_dir, task_name)
            # wrap environment to record videos
            env = gym.wrappers.RecordVideo(
-                env, videos_dir, step_trigger=self.step_trigger, video_length=self.video_length
+                env, videos_dir, step_trigger=self.step_trigger, video_length=self.video_length, disable_logger=True
            )

            # reset environment
            env.reset()
            # simulate environment
-            for _ in range(500):
-                # compute zero actions
-                actions = 2 * torch.rand((env.num_envs, env.action_space.shape[0]), device=env.device) - 1
-                # apply actions
-                _ = env.step(actions)
-                # render environment
-                env.render(mode="human")
+            with torch.inference_mode():
+                for _ in range(500):
+                    # sample actions from -1 to 1
+                    actions = 2 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1
+                    # apply actions
+                    _ = env.step(actions)

            # close the simulator
            env.close()
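The test loops draw actions with `2 * torch.rand(shape) - 1`. `torch.rand` samples uniformly from [0, 1), and the affine map 2u - 1 rescales that range to [-1, 1). The diff replaces `(env.num_envs, env.action_space.shape[0])` with `env.action_space.shape`, which suggests the Orbit action space already includes the batch dimension. Below is a stdlib-only sketch of the same sampling. Nested lists stand in for a tensor, and the (4, 7) shape is illustrative:

```python
import random


def sample_uniform_actions(shape: tuple[int, ...], rng: random.Random) -> list:
    """Return a nested list of the given (non-empty) shape with entries in [-1, 1).

    Mirrors ``2 * torch.rand(shape) - 1``: ``rng.random()`` draws from [0, 1),
    so 2u - 1 lands in [-1, 1).
    """
    if len(shape) == 1:
        return [2.0 * rng.random() - 1.0 for _ in range(shape[0])]
    # recurse over the leading (batch) dimension
    return [sample_uniform_actions(shape[1:], rng) for _ in range(shape[0])]


# e.g. 4 environments with 7 action dimensions each (illustrative numbers)
actions = sample_uniform_actions((4, 7), random.Random(0))
print(len(actions), len(actions[0]))  # 4 7
```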

--- a/source/extensions/omni.isaac.orbit_tasks/test/wrappers/test_rl_games_wrapper.py
+++ b/source/extensions/omni.isaac.orbit_tasks/test/wrappers/test_rl_games_wrapper.py
+# Copyright (c) 2022-2023, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from __future__ import annotations
+
+"""Launch Isaac Sim Simulator first."""
+
+import os
+
+from omni.isaac.orbit.app import AppLauncher
+
+# launch the simulator
+app_experience = f"{os.environ['EXP_PATH']}/omni.isaac.sim.python.gym.headless.kit"
+app_launcher = AppLauncher(headless=True, experience=app_experience)
+simulation_app = app_launcher.app
+
+
+"""Rest everything follows."""
+
+
+import gymnasium as gym
+import torch
+import traceback
+import unittest
+
+import carb
+import omni.usd
+
+from omni.isaac.orbit.envs import RLTaskEnvCfg
+
+import omni.isaac.orbit_tasks  # noqa: F401
+from omni.isaac.orbit_tasks.utils.parse_cfg import parse_env_cfg
+from omni.isaac.orbit_tasks.utils.wrappers.rl_games import RlGamesVecEnvWrapper
+
+
+class TestRlGamesVecEnvWrapper(unittest.TestCase):
+    """Test that RL-Games VecEnv wrapper works as expected."""
+
+    @classmethod
+    def setUpClass(cls):
+        # acquire all Isaac environments names
+        cls.registered_tasks = list()
+        for task_spec in gym.registry.values():
+            if "Isaac" in task_spec.id:
+                cls.registered_tasks.append(task_spec.id)
+        # sort environments by name
+        cls.registered_tasks.sort()
+        # only pick the first three environments to test
+        cls.registered_tasks = cls.registered_tasks[:3]
+        # print all existing task names
+        print(">>> All registered environments:", cls.registered_tasks)
+
+    def setUp(self) -> None:
+        # common parameters
+        self.num_envs = 512
+        self.use_gpu = True
+
+    def test_random_actions(self):
+        """Run random actions and check environments return valid signals."""
+        for task_name in self.registered_tasks:
+            print(f">>> Running test for environment: {task_name}")
+            # create a new stage
+            omni.usd.get_context().new_stage()
+            # parse configuration
+            env_cfg: RLTaskEnvCfg = parse_env_cfg(task_name, use_gpu=self.use_gpu, num_envs=self.num_envs)
+            # note: we don't want to shutdown the app on stop during the tests since we reload the stage
+            env_cfg.sim.shutdown_app_on_stop = False
+
+            # create environment
+            env = gym.make(task_name, cfg=env_cfg)
+            # wrap environment (positional args: RL device, observation clip, action clip)
+            env = RlGamesVecEnvWrapper(env, "cuda:0", 100, 100)
+
+            # reset environment
+            obs = env.reset()
+            # check signal
+            self.assertTrue(self._check_valid_tensor(obs))
+
+            # simulate environment for 100 steps
+            with torch.inference_mode():
+                for _ in range(100):
+                    # sample actions from -1 to 1
+                    actions = 2 * torch.rand(env.action_space.shape, device=env.device) - 1
+                    # apply actions
+                    transition = env.step(actions)
+                    # check signals
+                    for data in transition:
+                        self.assertTrue(self._check_valid_tensor(data), msg=f"Invalid data: {data}")
+
+            # close the environment
+            print(f">>> Closing environment: {task_name}")
+            env.close()
+
+    """
+    Helper functions.
+    """
+
+    @staticmethod
+    def _check_valid_tensor(data: torch.Tensor | dict) -> bool:
+        """Checks if given data does not have corrupted values.
+
+        Args:
+            data: Data buffer.
+
+        Returns:
+            True if the data is valid.
+        """
+        if isinstance(data, torch.Tensor):
+            return not torch.any(torch.isnan(data))
+        elif isinstance(data, dict):
+            valid_tensor = True
+            for value in data.values():
+                if isinstance(value, dict):
+                    valid_tensor &= TestRlGamesVecEnvWrapper._check_valid_tensor(value)
+                elif isinstance(value, torch.Tensor):
+                    valid_tensor &= not torch.any(torch.isnan(value))
+            return valid_tensor
+        else:
+            raise ValueError(f"Input data of invalid type: {type(data)}.")
+
+
+if __name__ == "__main__":
+    try:
+        unittest.main()
+    except Exception as err:
+        carb.log_error(err)
+        carb.log_error(traceback.format_exc())
+        raise
+    finally:
+        # close sim app
+        simulation_app.close()
--- a/source/extensions/omni.isaac.orbit_tasks/test/wrappers/test_rsl_rl_wrapper.py
+++ b/source/extensions/omni.isaac.orbit_tasks/test/wrappers/test_rsl_rl_wrapper.py
+# Copyright (c) 2022-2023, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from __future__ import annotations
+
+"""Launch Isaac Sim Simulator first."""
+
+import os
+
+from omni.isaac.orbit.app import AppLauncher
+
+# launch the simulator
+app_experience = f"{os.environ['EXP_PATH']}/omni.isaac.sim.python.gym.headless.kit"
+app_launcher = AppLauncher(headless=True, experience=app_experience)
+simulation_app = app_launcher.app
+
+
+"""Rest everything follows."""
+
+
+import gymnasium as gym
+import torch
+import traceback
+import unittest
+
+import carb
+import omni.usd
+
+from omni.isaac.orbit.envs import RLTaskEnvCfg
+
+import omni.isaac.orbit_tasks  # noqa: F401
+from omni.isaac.orbit_tasks.utils.parse_cfg import parse_env_cfg
+from omni.isaac.orbit_tasks.utils.wrappers.rsl_rl import RslRlVecEnvWrapper
+
+
+class TestRslRlVecEnvWrapper(unittest.TestCase):
+    """Test that RSL-RL VecEnv wrapper works as expected."""
+
+    @classmethod
+    def setUpClass(cls):
+        # acquire all Isaac environments names
+        cls.registered_tasks = list()
+        for task_spec in gym.registry.values():
+            if "Isaac" in task_spec.id:
+                cls.registered_tasks.append(task_spec.id)
+        # sort environments by name
+        cls.registered_tasks.sort()
+        # only pick the first three environments to test
+        cls.registered_tasks = cls.registered_tasks[:3]
+        # print all existing task names
+        print(">>> All registered environments:", cls.registered_tasks)
+
+    def setUp(self) -> None:
+        # common parameters
+        self.num_envs = 512
+        self.use_gpu = True
+
+    def test_random_actions(self):
+        """Run random actions and check environments return valid signals."""
+        for task_name in self.registered_tasks:
+            print(f">>> Running test for environment: {task_name}")
+            # create a new stage
+            omni.usd.get_context().new_stage()
+            # parse configuration
+            env_cfg: RLTaskEnvCfg = parse_env_cfg(task_name, use_gpu=self.use_gpu, num_envs=self.num_envs)
+            # note: we don't want to shutdown the app on stop during the tests since we reload the stage
+            env_cfg.sim.shutdown_app_on_stop = False
+
+            # create environment
+            env = gym.make(task_name, cfg=env_cfg)
+            # wrap environment
+            env = RslRlVecEnvWrapper(env)
+
+            # reset environment
+            obs, extras = env.reset()
+            # check signal
+            self.assertTrue(self._check_valid_tensor(obs))
+            self.assertTrue(self._check_valid_tensor(extras))
+
+            # simulate environment for 1000 steps
+            with torch.inference_mode():
+                for _ in range(1000):
+                    # sample actions from -1 to 1
+                    actions = 2 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1
+                    # apply actions
+                    transition = env.step(actions)
+                    # check signals
+                    for data in transition:
+                        self.assertTrue(self._check_valid_tensor(data), msg=f"Invalid data: {data}")
+
+            # close the environment
+            print(f">>> Closing environment: {task_name}")
+            env.close()
+
+    """
+    Helper functions.
+    """
+
+    @staticmethod
+    def _check_valid_tensor(data: torch.Tensor | dict) -> bool:
+        """Checks if given data does not have corrupted values.
+
+        Args:
+            data: Data buffer.
+
+        Returns:
+            True if the data is valid.
+        """
+        if isinstance(data, torch.Tensor):
+            return not torch.any(torch.isnan(data))
+        elif isinstance(data, dict):
+            valid_tensor = True
+            for value in data.values():
+                if isinstance(value, dict):
+                    valid_tensor &= TestRslRlVecEnvWrapper._check_valid_tensor(value)
+                elif isinstance(value, torch.Tensor):
+                    valid_tensor &= not torch.any(torch.isnan(value))
+            return valid_tensor
+        else:
+            raise ValueError(f"Input data of invalid type: {type(data)}.")
+
+
+if __name__ == "__main__":
+    try:
+        unittest.main()
+    except Exception as err:
+        carb.log_error(err)
+        carb.log_error(traceback.format_exc())
+        raise
+    finally:
+        # close sim app
+        simulation_app.close()
--- a/source/extensions/omni.isaac.orbit_tasks/test/wrappers/test_sb3_wrapper.py
+++ b/source/extensions/omni.isaac.orbit_tasks/test/wrappers/test_sb3_wrapper.py
+# Copyright (c) 2022-2023, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from __future__ import annotations
+
+"""Launch Isaac Sim Simulator first."""
+
+import os
+
+from omni.isaac.orbit.app import AppLauncher
+
+# launch the simulator
+app_experience = f"{os.environ['EXP_PATH']}/omni.isaac.sim.python.gym.headless.kit"
+app_launcher = AppLauncher(headless=True, experience=app_experience)
+simulation_app = app_launcher.app
+
+
+"""Rest everything follows."""
+
+
+import gymnasium as gym
+import numpy as np
+import torch
+import traceback
+import unittest
+
+import carb
+import omni.usd
+
+from omni.isaac.orbit.envs import RLTaskEnvCfg
+
+import omni.isaac.orbit_tasks  # noqa: F401
+from omni.isaac.orbit_tasks.utils.parse_cfg import parse_env_cfg
+from omni.isaac.orbit_tasks.utils.wrappers.sb3 import Sb3VecEnvWrapper
+
+
+class TestStableBaselines3VecEnvWrapper(unittest.TestCase):
+    """Test that Stable-Baselines3 VecEnv wrapper works as expected."""
+
+    @classmethod
+    def setUpClass(cls):
+        # acquire all Isaac environments names
+        cls.registered_tasks = list()
+        for task_spec in gym.registry.values():
+            if "Isaac" in task_spec.id:
+                cls.registered_tasks.append(task_spec.id)
+        # sort environments by name
+        cls.registered_tasks.sort()
+        # only pick the first three environments to test
+        cls.registered_tasks = cls.registered_tasks[:3]
+        # print all existing task names
+        print(">>> All registered environments:", cls.registered_tasks)
+
+    def setUp(self) -> None:
+        # common parameters
+        self.num_envs = 512
+        self.use_gpu = True
+
+    def test_random_actions(self):
+        """Run random actions and check environments return valid signals."""
+        for task_name in self.registered_tasks:
+            print(f">>> Running test for environment: {task_name}")
+            # create a new stage
+            omni.usd.get_context().new_stage()
+            # parse configuration
+            env_cfg: RLTaskEnvCfg = parse_env_cfg(task_name, use_gpu=self.use_gpu, num_envs=self.num_envs)
+            # note: we don't want to shutdown the app on stop during the tests since we reload the stage
+            env_cfg.sim.shutdown_app_on_stop = False
+
+            # create environment
+            env = gym.make(task_name, cfg=env_cfg)
+            # wrap environment
+            env = Sb3VecEnvWrapper(env)
+
+            # reset environment
+            obs = env.reset()
+            # check signal
+            self.assertTrue(self._check_valid_array(obs))
+
+            # simulate environment for 1000 steps
+            with torch.inference_mode():
+                for _ in range(1000):
+                    # sample actions from -1 to 1
+                    actions = 2 * np.random.rand(env.num_envs, *env.action_space.shape) - 1
+                    # apply actions
+                    transition = env.step(actions)
+                    # check signals
+                    for data in transition:
+                        self.assertTrue(self._check_valid_array(data), msg=f"Invalid data: {data}")
+
+            # close the environment
+            print(f">>> Closing environment: {task_name}")
+            env.close()
+
+    """
+    Helper functions.
+    """
+
+    @staticmethod
+    def _check_valid_array(data: np.ndarray | dict | list) -> bool:
+        """Checks if given data does not have corrupted values.
+
+        Args:
+            data: Data buffer.
+
+        Returns:
+            True if the data is valid.
+        """
+        if isinstance(data, np.ndarray):
+            return not np.any(np.isnan(data))
+        elif isinstance(data, dict):
+            valid_array = True
+            for value in data.values():
+                if isinstance(value, dict):
+                    valid_array &= TestStableBaselines3VecEnvWrapper._check_valid_array(value)
+                elif isinstance(value, np.ndarray):
+                    valid_array &= not np.any(np.isnan(value))
+            return valid_array
+        elif isinstance(data, list):
+            valid_array = True
+            for value in data:
+                valid_array &= TestStableBaselines3VecEnvWrapper._check_valid_array(value)
+            return valid_array
+        else:
+            raise ValueError(f"Input data of invalid type: {type(data)}.")
+
+
+if __name__ == "__main__":
+    try:
+        unittest.main()
+    except Exception as err:
+        carb.log_error(err)
+        carb.log_error(traceback.format_exc())
+        raise
+    finally:
+        # close sim app
+        simulation_app.close()
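The SB3 test samples actions with shape `(num_envs, *action_space.shape)`. `np.random.rand` takes each dimension as a separate integer argument. Stable-Baselines3's `VecEnv.action_space` describes a single environment, so the batch size has to be prepended. Passing the shape tuple itself as one argument raises a `TypeError`. Below is a stdlib-only sketch of that calling convention. `count_entries` is a hypothetical stand-in for `np.random.rand`, and the (6,) shape is illustrative:

```python
def count_entries(*dims: int) -> int:
    """Stand-in for the ``np.random.rand(d0, d1, ...)`` calling convention.

    Every dimension must be passed as its own integer argument.
    """
    total = 1
    for d in dims:
        if not isinstance(d, int):
            raise TypeError(f"dimension must be an int, got {type(d).__name__}")
        total *= d
    return total


def batched_shape(num_envs: int, single_shape: tuple[int, ...]) -> tuple[int, ...]:
    """Prepend the number of environments to a per-environment action shape."""
    return (num_envs, *single_shape)


single_action_shape = (6,)  # illustrative per-environment action shape
print(batched_shape(512, single_action_shape))  # (512, 6)
print(count_entries(512, *single_action_shape))  # 3072
try:
    count_entries(512, single_action_shape)  # the tuple arrives as one argument
except TypeError as err:
    print(err)  # dimension must be an int, got tuple
```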
--- a/source/standalone/environments/list_envs.py
+++ b/source/standalone/environments/list_envs.py
@@ -27,7 +27,7 @@ simulation_app = app_launcher.app

 """Rest everything follows."""

-import gym
+import gymnasium as gym
 from prettytable import PrettyTable

 import omni.isaac.contrib_tasks  # noqa: F401
@@ -47,10 +47,10 @@ def main():
    # count of environments
    index = 0
    # acquire all Isaac environments names
-    for task_spec in gym.envs.registry.all():
+    for task_spec in gym.registry.values():
        if "Isaac" in task_spec.id:
            # add details to table
-            table.add_row([index + 1, task_spec.id, task_spec.entry_point, task_spec._kwargs["env_cfg_entry_point"]])
+            table.add_row([index + 1, task_spec.id, task_spec.entry_point, task_spec.kwargs["env_cfg_entry_point"]])
            # increment count
            index += 1

@@ -61,6 +61,8 @@ if __name__ == "__main__":
    try:
        # run the main function
        main()
+    except Exception:
+        # re-raise unchanged; the `finally` block still closes the app
+        raise
    finally:
        # close the app
        simulation_app.close()
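In Gymnasium, `gym.registry` is a plain dictionary that maps environment ids to `EnvSpec` objects. The script therefore iterates over `.values()` instead of calling Gym 0.21's `gym.envs.registry.all()`. It also reads registration keyword arguments from the public `EnvSpec.kwargs` attribute rather than the private `_kwargs`. The sketch below mimics the table-building loop. `SpecStub` is a hypothetical stand-in for `EnvSpec`, and the ids and entry points are placeholders:

```python
from dataclasses import dataclass, field


@dataclass
class SpecStub:
    """Hypothetical stand-in for gymnasium's EnvSpec (only the fields used here)."""

    id: str
    entry_point: str
    kwargs: dict = field(default_factory=dict)


def isaac_rows(registry: dict[str, SpecStub]) -> list[list]:
    """Build table rows like list_envs.py: keep ids containing "Isaac", number from 1."""
    rows = []
    for spec in registry.values():  # the registry is a dict keyed by env id
        if "Isaac" in spec.id:
            rows.append([len(rows) + 1, spec.id, spec.entry_point, spec.kwargs["env_cfg_entry_point"]])
    return rows


registry = {
    "CartPole-v1": SpecStub("CartPole-v1", "some.module:CartPoleEnv"),
    "Isaac-Demo-v0": SpecStub(
        "Isaac-Demo-v0", "some.module:DemoEnv", {"env_cfg_entry_point": "some.module:DemoCfg"}
    ),
}
print(isaac_rows(registry))
```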
--- a/source/standalone/environments/random_agent.py
+++ b/source/standalone/environments/random_agent.py
@@ -15,7 +15,7 @@ import argparse
 from omni.isaac.orbit.app import AppLauncher

 # add argparse arguments
-parser = argparse.ArgumentParser(description="Random agent for Isaac Orbit environments.")
+parser = argparse.ArgumentParser(description="Random agent for Orbit environments.")
 parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
 parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
 parser.add_argument("--task", type=str, default=None, help="Name of the task.")
@@ -31,7 +31,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import torch
 import traceback

@@ -43,12 +43,15 @@ from omni.isaac.orbit_tasks.utils import parse_env_cfg


 def main():
-    """Random actions agent with Isaac Orbit environment."""
+    """Random actions agent with Orbit environment."""
    # parse configuration
    env_cfg = parse_env_cfg(args_cli.task, use_gpu=not args_cli.cpu, num_envs=args_cli.num_envs)
    # create environment
    env = gym.make(args_cli.task, cfg=env_cfg)

+    # print info (this is a vectorized environment)
+    print(f"[INFO]: Gym observation space: {env.observation_space}")
+    print(f"[INFO]: Gym action space: {env.action_space}")
    # reset environment
    env.reset()
    # simulate environment
@@ -56,9 +59,9 @@ def main():
        # run everything in inference mode
        with torch.inference_mode():
            # sample actions from -1 to 1
-            actions = 2 * torch.rand((env.num_envs, env.action_space.shape[0]), device=env.device) - 1
+            actions = 2 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1
            # apply actions
-            _, _, _, _ = env.step(actions)
+            env.step(actions)

    # close the simulator
    env.close()
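The old `_, _, _, _ = env.step(actions)` line was dropped because Gymnasium's `Env.step` returns five values, `(obs, reward, terminated, truncated, info)`. Gym 0.21 returned four. Gymnasium's `reset` likewise returns `(obs, info)`. The sketch below collapses a Gymnasium-style step result into the legacy 4-tuple and shows why the old unpacking fails. The tuple contents are placeholders, and `to_legacy_step` is an illustrative helper, not part of either library:

```python
def to_legacy_step(result: tuple) -> tuple:
    """Collapse a Gymnasium-style step result into the Gym 0.21 4-tuple.

    Gymnasium splits episode end into ``terminated`` and ``truncated``;
    the legacy ``done`` flag is true if either one is set.
    """
    obs, reward, terminated, truncated, info = result
    # scalar flags only; batched tensors would need `terminated | truncated`
    done = terminated or truncated
    return obs, reward, done, info


step_result = ("obs", 1.0, False, True, {})  # truncated, not terminated
print(to_legacy_step(step_result))  # ('obs', 1.0, True, {})

try:
    _, _, _, _ = step_result  # the old Gym 0.21 unpacking pattern
except ValueError:
    print("a 5-tuple cannot be unpacked into 4 names")
```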

--- a/source/standalone/environments/state_machine/play_lift.py
+++ b/source/standalone/environments/state_machine/play_lift.py
@@ -36,7 +36,7 @@ simulation_app = app_launcher.app

 """Rest everything else."""

-import gym
+import gymnasium as gym
 import torch
 import traceback
 from enum import Enum

--- a/source/standalone/environments/teleoperation/teleop_se3_agent.py
+++ b/source/standalone/environments/teleoperation/teleop_se3_agent.py
@@ -15,7 +15,7 @@ import argparse
 from omni.isaac.orbit.app import AppLauncher

 # add argparse arguments
-parser = argparse.ArgumentParser(description="Keyboard teleoperation for Isaac Orbit environments.")
+parser = argparse.ArgumentParser(description="Keyboard teleoperation for Orbit environments.")
 parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
 parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to simulate.")
 parser.add_argument("--device", type=str, default="keyboard", help="Device for interacting with environment")
@@ -33,7 +33,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import torch
 import traceback


--- a/source/standalone/environments/zero_agent.py
+++ b/source/standalone/environments/zero_agent.py
@@ -15,7 +15,7 @@ import argparse
 from omni.isaac.orbit.app import AppLauncher

 # add argparse arguments
-parser = argparse.ArgumentParser(description="Zero agent for Isaac Orbit environments.")
+parser = argparse.ArgumentParser(description="Zero agent for Orbit environments.")
 parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
 parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
 parser.add_argument("--task", type=str, default=None, help="Name of the task.")
@@ -30,7 +30,7 @@ simulation_app = app_launcher.app

 """Rest everything follows."""

-import gym
+import gymnasium as gym
 import torch
 import traceback

@@ -42,12 +42,15 @@ from omni.isaac.orbit_tasks.utils import parse_env_cfg


 def main():
-    """Zero actions agent with Isaac Orbit environment."""
+    """Zero actions agent with Orbit environment."""
    # parse configuration
    env_cfg = parse_env_cfg(args_cli.task, use_gpu=not args_cli.cpu, num_envs=args_cli.num_envs)
    # create environment
    env = gym.make(args_cli.task, cfg=env_cfg)

+    # print info (this is a vectorized environment)
+    print(f"[INFO]: Gym observation space: {env.observation_space}")
+    print(f"[INFO]: Gym action space: {env.action_space}")
    # reset environment
    env.reset()
    # simulate environment
@@ -55,9 +58,9 @@ def main():
        # run everything in inference mode
        with torch.inference_mode():
            # compute zero actions
-            actions = torch.zeros((env.num_envs, env.action_space.shape[0]), device=env.device)
+            actions = torch.zeros(env.action_space.shape, device=env.unwrapped.device)
            # apply actions
-            _, _, _, _ = env.step(actions)
+            env.step(actions)

    # close the simulator
    env.close()

--- a/source/standalone/workflows/rl_games/play.py
+++ b/source/standalone/workflows/rl_games/play.py
@@ -37,7 +37,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import math
 import os
 import torch

--- a/source/standalone/workflows/rl_games/train.py
+++ b/source/standalone/workflows/rl_games/train.py
@@ -41,7 +41,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import math
 import os
 import traceback
@@ -96,13 +96,14 @@ def main():
    clip_actions = agent_cfg["params"]["env"].get("clip_actions", math.inf)

    # create isaac environment
-    env = gym.make(args_cli.task, cfg=env_cfg)
+    env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
    # wrap for video recording
    if args_cli.video:
        video_kwargs = {
            "video_folder": os.path.join(log_dir, "videos"),
            "step_trigger": lambda step: step % args_cli.video_interval == 0,
            "video_length": args_cli.video_length,
+            "disable_logger": True,
        }
        print("[INFO] Recording videos during training.")
        print_dict(video_kwargs, nesting=4)
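Gymnasium's `RecordVideo` wrapper calls `step_trigger` with its running step counter. When the predicate returns `True` and no clip is in progress, it starts a new clip of at most `video_length` frames. `disable_logger=True` suppresses the MoviePy progress output when each clip is written. The wrapper takes its frames from `env.render()`, which is why `gym.make` now passes `render_mode="rgb_array"` when `--video` is set. The sketch below only shows which steps fire the trigger, assuming an illustrative `video_interval` of 2000:

```python
from typing import Callable


def make_step_trigger(video_interval: int) -> Callable[[int], bool]:
    """Return a predicate that is true on every ``video_interval``-th step, including step 0."""
    return lambda step: step % video_interval == 0


trigger = make_step_trigger(video_interval=2000)  # illustrative interval
print([step for step in range(10_000) if trigger(step)])  # [0, 2000, 4000, 6000, 8000]
```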

--- a/source/standalone/workflows/robomimic/collect_demonstrations.py
+++ b/source/standalone/workflows/robomimic/collect_demonstrations.py
@@ -3,7 +3,7 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause

-"""Script to collect demonstrations with Isaac Orbit environments."""
+"""Script to collect demonstrations with Orbit environments."""

 from __future__ import annotations

@@ -15,7 +15,7 @@ import argparse
 from omni.isaac.orbit.app import AppLauncher

 # add argparse arguments
-parser = argparse.ArgumentParser(description="Collect demonstrations for Isaac Orbit environments.")
+parser = argparse.ArgumentParser(description="Collect demonstrations for Orbit environments.")
 parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
 parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to simulate.")
 parser.add_argument("--task", type=str, default=None, help="Name of the task.")
@@ -35,7 +35,7 @@ simulation_app = app_launcher.app


 import contextlib
-import gym
+import gymnasium as gym
 import os
 import torch
 import traceback

--- a/source/standalone/workflows/robomimic/play.py
+++ b/source/standalone/workflows/robomimic/play.py
@@ -15,7 +15,7 @@ import argparse
 from omni.isaac.orbit.app import AppLauncher

 # add argparse arguments
-parser = argparse.ArgumentParser(description="Play policy trained using robomimic for Isaac Orbit environments.")
+parser = argparse.ArgumentParser(description="Play policy trained using robomimic for Orbit environments.")
 parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
 parser.add_argument("--task", type=str, default=None, help="Name of the task.")
 parser.add_argument("--checkpoint", type=str, default=None, help="Pytorch model checkpoint to load.")
@@ -31,7 +31,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import torch
 import traceback

@@ -46,7 +46,7 @@ from omni.isaac.orbit_tasks.utils import parse_env_cfg


 def main():
-    """Run a trained policy from robomimic with Isaac Orbit environment."""
+    """Run a trained policy from robomimic with Orbit environment."""
    # parse configuration
    env_cfg = parse_env_cfg(args_cli.task, use_gpu=not args_cli.cpu, num_envs=1)
    # modify configuration

--- a/source/standalone/workflows/robomimic/train.py
+++ b/source/standalone/workflows/robomimic/train.py
@@ -54,7 +54,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""

 import argparse
-import gym
+import gymnasium as gym
 import json
 import numpy as np
 import os

--- a/source/standalone/workflows/rsl_rl/play.py
+++ b/source/standalone/workflows/rsl_rl/play.py
@@ -36,7 +36,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import os
 import torch
 import traceback

--- a/source/standalone/workflows/rsl_rl/train.py
+++ b/source/standalone/workflows/rsl_rl/train.py
@@ -47,7 +47,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import os
 import torch
 import traceback
@@ -88,13 +88,14 @@ def main():
    log_dir = os.path.join(log_root_path, log_dir)

    # create isaac environment
-    env = gym.make(args_cli.task, cfg=env_cfg)
+    env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
    # wrap for video recording
    if args_cli.video:
        video_kwargs = {
            "video_folder": os.path.join(log_dir, "videos"),
            "step_trigger": lambda step: step % args_cli.video_interval == 0,
            "video_length": args_cli.video_length,
+            "disable_logger": True,
        }
        print("[INFO] Recording videos during training.")
        print_dict(video_kwargs, nesting=4)

--- a/source/standalone/workflows/sb3/play.py
+++ b/source/standalone/workflows/sb3/play.py
@@ -33,7 +33,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import torch
 import traceback


--- a/source/standalone/workflows/sb3/train.py
+++ b/source/standalone/workflows/sb3/train.py
@@ -43,7 +43,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import os
 import traceback
 from datetime import datetime
@@ -95,6 +95,7 @@ def main():
            "video_folder": os.path.join(log_dir, "videos"),
            "step_trigger": lambda step: step % args_cli.video_interval == 0,
            "video_length": args_cli.video_length,
+            "disable_logger": True,
        }
        print("[INFO] Recording videos during training.")
        print_dict(video_kwargs, nesting=4)

--- a/source/standalone/workflows/skrl/play.py
+++ b/source/standalone/workflows/skrl/play.py
@@ -38,7 +38,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import torch
 import traceback


--- a/source/standalone/workflows/skrl/train.py
+++ b/source/standalone/workflows/skrl/train.py
@@ -48,7 +48,7 @@ simulation_app = app_launcher.app
 """Rest everything follows."""


-import gym
+import gymnasium as gym
 import traceback
 from datetime import datetime

@@ -97,13 +97,14 @@ def main():
    dump_pickle(os.path.join(log_dir, "params", "agent.pkl"), experiment_cfg)

    # create isaac environment
-    env = gym.make(args_cli.task, cfg=env_cfg)
+    env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
    # wrap for video recording
    if args_cli.video:
        video_kwargs = {
            "video_folder": os.path.join(log_dir, "videos"),
            "step_trigger": lambda step: step % args_cli.video_interval == 0,
            "video_length": args_cli.video_length,
+            "disable_logger": True,
        }
        print("[INFO] Recording videos during training.")
        print_dict(video_kwargs, nesting=4)