Cleans up the `omni.isaac.lab.envs` submodule (#548)

# Description

Earlier, it was unclear where the configuration classes and corresponding classes belonged inside the `omni.isaac.lab.envs` module. This MR reorganizes the module to ensure parity between the class and its respective configuration class. The MR also fixes docstrings with the hope of making things cleaner.

## Type of change

- Bug fix (non-breaking change which fixes an issue)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have run all the tests with `./isaaclab.sh --test` and they pass
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Unverified Commit 3ef7e678 authored Jun 25, 2024 by Mayank Mittal Committed by GitHub Jun 25, 2024
22 changed files
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -38,7 +38,7 @@ repos:
      - id: pyupgrade
        args: ["--py310-plus"]
        # FIXME: This is a hack because Pytorch does not like: torch.Tensor | dict aliasing
-        exclude: "source/extensions/omni.isaac.lab/omni/isaac/lab/envs/types.py"
+        exclude: "source/extensions/omni.isaac.lab/omni/isaac/lab/envs/common.py"
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.6
    hooks:

--- a/docs/index.rst
+++ b/docs/index.rst
@@ -69,7 +69,7 @@ Table of Contents
   :maxdepth: 2
   :caption: Features

-   source/features/workflows
+   source/features/task_workflows
   source/features/multi_gpu
   source/features/tiled_rendering
   source/features/environments

--- a/docs/source/api/lab/omni.isaac.lab.envs.rst
+++ b/docs/source/api/lab/omni.isaac.lab.envs.rst
@@ -16,11 +16,11 @@

    ManagerBasedEnv
    ManagerBasedEnvCfg
-    ViewerCfg
    ManagerBasedRLEnv
    ManagerBasedRLEnvCfg
    DirectRLEnv
    DirectRLEnvCfg
+    ViewerCfg

 Manager Based Environment
 -------------------------
@@ -32,10 +32,6 @@ Manager Based Environment
    :members:
    :exclude-members: __init__, class_type

-.. autoclass:: ViewerCfg
-    :members:
-    :exclude-members: __init__
-
 Manager Based RL Environment
 ----------------------------

@@ -63,3 +59,10 @@ Direct RL Environment
    :inherited-members:
    :show-inheritance:
    :exclude-members: __init__, class_type
+
+Common
+------
+
+.. autoclass:: ViewerCfg
+    :members:
+    :exclude-members: __init__
--- a/docs/source/features/task_workflows.rst
+++ b/docs/source/features/task_workflows.rst
+.. _feature-workflows:
+
+
+Task Design Workflows
+=====================
+
+.. currentmodule:: omni.isaac.lab
+
+Environments define the interface between the agent and the simulation. In the simplest case, the environment provides
+the agent with the current observations and executes the actions provided by the agent. In a Markov Decision Process
+(MDP) formulation, the environment can also provide additional information such as the current reward, done flag, and
+information about the current episode.
+
+While the environment interface is simple to understand, its implementation can vary significantly depending on the
+complexity of the task. In the context of reinforcement learning (RL), the environment implementation can be broken down
+into several components, such as the reward function, observation function, termination function, and reset function.
+Each of these components can be implemented in different ways depending on the complexity of the task and the desired
+level of modularity.
+
+We provide two different workflows for designing environments with the framework:
+
+* **Manager-based**: The environment is decomposed into individual components (or managers) that handle different
+  aspects of the environment (such as computing observations, applying actions, and applying randomization). The
+  user defines configuration classes for each component and the environment is responsible for coordinating the
+  managers and calling their functions.
+* **Direct**: The user defines a single class that implements the entire environment directly without the need for
+  separate managers. This class is responsible for computing observations, applying actions, and computing rewards.
+
+Both workflows have their own advantages and disadvantages. The manager-based workflow is more modular and allows
+different components of the environment to be swapped out easily. This is useful when prototyping the environment
+and experimenting with different configurations. On the other hand, the direct workflow is more efficient and allows
+for more fine-grained control over the environment logic. This is useful when optimizing the environment for performance
+or when implementing complex logic that is difficult to decompose into separate components.
+
+
+Manager-Based Environments
+--------------------------
+
+A majority of environment implementations follow a similar structure. The environment processes the input actions,
+steps through the simulation, computes observations and reward signals, applies randomization, and resets the terminated
+environments. Motivated by this, the environment can be decomposed into individual components that handle each of these tasks.
+For example, the observation manager is responsible for computing the observations, the reward manager is responsible for
+computing the rewards, and the termination manager is responsible for computing the termination signal. This approach
+is known as the manager-based environment design in the framework.
+
+Manager-based environments promote modular implementations of tasks by decomposing the task into individual
+components that are managed by separate classes. Each component of the task, such as rewards, observations, and
+terminations, can be specified as an individual configuration class that is then passed to the corresponding
+manager class. Each manager is then responsible for parsing its configuration and processing the contents specified
+in it.
+
+The coordination between the different managers is orchestrated by the class :class:`envs.ManagerBasedRLEnv`.
+It takes in a task configuration class instance (:class:`envs.ManagerBasedRLEnvCfg`) that contains the configurations
+for each of the components of the task. Based on the configurations, the scene is set up and the task is initialized.
+Afterwards, while stepping through the environment, all the managers are called sequentially to perform the necessary
+operations.
+
+For their own tasks, we expect the user to mainly define the task configuration class and use the existing
+:class:`envs.ManagerBasedRLEnv` class for the task implementation. The task configuration class should inherit from
+the base class :class:`envs.ManagerBasedRLEnvCfg` and contain variables assigned to various configuration classes
+for each component (such as the ``ObservationCfg`` and ``RewardCfg``).
+
+.. dropdown:: Example for defining the reward function for the Cartpole task using the manager-style
+    :icon: plus
+
+    The following class is a part of the Cartpole environment configuration class. The :class:`RewardsCfg` class
+    defines individual terms that compose the reward function. Each reward term is defined by its function
+    implementation, weight and additional parameters to be passed to the function. Users can define multiple
+    reward terms and their weights to be used in the reward function.
+
+    .. literalinclude:: ../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/classic/cartpole/cartpole_env_cfg.py
+        :language: python
+        :pyobject: RewardsCfg
+
+
+Through this approach, it is possible to easily vary the implementations of the task by switching some components
+while leaving the rest of the code intact. This flexibility is desirable when prototyping the environment and
+experimenting with different configurations. It also makes it easy to collaborate with others on implementing an
+environment, since contributors may choose to use different combinations of configurations for their own task
+specifications.
+
+.. seealso::
+
+    We provide a more detailed tutorial for setting up an environment using the manager-based workflow at
+    :ref:`tutorial-create-manager-rl-env`.
+
+
+Direct Environments
+-------------------
+
+The direct-style environment aligns more closely with traditional implementations of environments,
+where a single script directly implements the reward function, observation function, resets, and all the other components
+of the environment. This approach does not require the manager classes. Instead, users have complete freedom
+to implement their task through the APIs from the base class :class:`envs.DirectRLEnv`. For users migrating from the `IsaacGymEnvs`_
+and `OmniIsaacGymEnvs`_ frameworks, this workflow may be more familiar.
+
+When defining an environment with the direct-style implementation, we expect the user to define a single class that
+implements the entire environment. The task class should inherit from the base :class:`envs.DirectRLEnv` class and should
+have its corresponding configuration class that inherits from :class:`envs.DirectRLEnvCfg`. The task class is responsible
+for setting up the scene, processing the actions, computing the rewards, observations, resets, and termination signals.
+
+.. dropdown:: Example for defining the reward function for the Cartpole task using the direct-style
+    :icon: plus
+
+    The following function is a part of the Cartpole environment class and is responsible for computing the rewards.
+
+    .. literalinclude:: ../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/direct/cartpole/cartpole_env.py
+        :language: python
+        :pyobject: CartpoleEnv._get_rewards
+        :dedent: 4
+
+    It calls the :meth:`compute_rewards` function, which is compiled with PyTorch JIT for performance.
+
+    .. literalinclude:: ../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/direct/cartpole/cartpole_env.py
+        :language: python
+        :pyobject: compute_rewards
+
+This approach provides more transparency in the implementations of the environments, as logic is defined within the task
+class instead of abstracted with the use of managers. This may be beneficial when implementing complex logic that is
+difficult to decompose into separate components. Additionally, the direct-style implementation may bring more performance
+benefits for the environment, as it allows implementing large chunks of logic with optimized frameworks such as
+`PyTorch JIT`_ or `Warp`_. This may be valuable when scaling up training to the point where individual
+operations in the environment need to be optimized.
+
+.. seealso::
+
+    We provide a more detailed tutorial for setting up an RL environment using the direct workflow at
+    :ref:`tutorial-create-direct-rl-env`.
+
+
+.. _IsaacGymEnvs: https://github.com/isaac-sim/IsaacGymEnvs
+.. _OmniIsaacGymEnvs: https://github.com/isaac-sim/OmniIsaacGymEnvs
+.. _PyTorch JIT: https://pytorch.org/docs/stable/jit.html
+.. _Warp: https://github.com/NVIDIA/warp
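
The contrast between the two workflows described in `task_workflows.rst` above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Isaac Lab code: `CartpoleState`, `RewardTerm`, `RewardManager`, and `DirectCartpole` are hypothetical names invented for this sketch, and scalar floats stand in for the batched tensors a real environment would use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CartpoleState:
    """Hypothetical, scalar stand-in for the simulation state."""

    pole_pos: float
    terminated: bool


def is_alive(state: CartpoleState) -> float:
    """Return 1.0 while the episode is running, 0.0 once terminated."""
    return 0.0 if state.terminated else 1.0


def pole_pos_l2(state: CartpoleState) -> float:
    """Squared deviation of the pole from the upright position."""
    return state.pole_pos ** 2


# --- Manager-based style: reward terms are declared as data ---
@dataclass
class RewardTerm:
    func: Callable[[CartpoleState], float]
    weight: float


class RewardManager:
    """Hypothetical manager that sums weighted reward terms."""

    def __init__(self, terms: dict[str, RewardTerm]) -> None:
        self.terms = terms

    def compute(self, state: CartpoleState) -> float:
        return sum(term.weight * term.func(state) for term in self.terms.values())


manager = RewardManager(
    {
        "alive": RewardTerm(func=is_alive, weight=1.0),
        "pole_pos": RewardTerm(func=pole_pos_l2, weight=-1.0),
    }
)


# --- Direct style: the same logic written inline in one class ---
class DirectCartpole:
    rew_scale_alive = 1.0
    rew_scale_pole_pos = -1.0

    def get_rewards(self, state: CartpoleState) -> float:
        alive = 0.0 if state.terminated else 1.0
        return self.rew_scale_alive * alive + self.rew_scale_pole_pos * state.pole_pos ** 2
```

Both variants compute the same reward. In the manager-style variant a term can be swapped by editing the dictionary alone, while in the direct-style variant all of the logic is visible in a single method.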
--- a/docs/source/features/workflows.rst
+++ b/docs/source/features/workflows.rst
-.. _feature-workflows:
-
-
-Task Design Workflows
-=====================
-
-.. currentmodule:: omni.isaac.lab
-
-Reinforcement learning environments can be implemented using two different workflows: Manager-based and Direct.
-This page outlines the two workflows, explaining their benefits and usecases.
-
-In addition, multi-GPU and multi-node reinforcement learning support is explained, along with the tiled rendering API,
-which can be used for efficient vectorized rendering across environments.
-
-
-Manager-Based Environments
--------------------------
-
-Manager-based environments promote modular implementations of reinforcement learning tasks
-through the use of Managers. Each component of the task, such as rewards, observations, termination
-can all be specified as individual configuration classes that are then passed to the corresponding
-manager classes. Each manager is responsible for parsing the configurations and processing
-the contents specified in each config class. The manager implementations are taken care of by
-the base class :class:`envs.ManagerBasedRLEnv`.
-
-With this approach, it is simple to switch implementations of some components in the task
-while leaving the remaining of the code intact. This is desirable when collaborating with others
-on implementing a reinforcement learning environment, where contributors may choose to use
-different combinations of configurations for the reinforcement learning components of the task.
-
-A class definition of a manager-based environment consists of defining a task configuration class that
-inherits from :class:`envs.ManagerBasedRLEnvCfg`. This class should contain variables assigned to various
-configuration classes for each of the components of the RL task, such as the ``ObservationCfg``
-or ``RewardCfg``. The entry point of the environment becomes the base class :class:`envs.ManagerBasedRLEnv`,
-which will process the main task config and iterate through the individual configuration classes that are defined
-in the task config class.
-
-An example of implementing the reward function for the Cartpole task using the manager-based implementation is as follow:
-
-.. code-block:: python
-
-    @configclass
-    class RewardsCfg:
-        """Reward terms for the MDP."""
-
-        # (1) Constant running reward
-        alive = RewTerm(func=mdp.is_alive, weight=1.0)
-        # (2) Failure penalty
-        terminating = RewTerm(func=mdp.is_terminated, weight=-2.0)
-        # (3) Primary task: keep pole upright
-        pole_pos = RewTerm(
-            func=mdp.joint_pos_target_l2,
-            weight=-1.0,
-            params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"]), "target": 0.0},
-        )
-        # (4) Shaping tasks: lower cart velocity
-        cart_vel = RewTerm(
-            func=mdp.joint_vel_l1,
-            weight=-0.01,
-            params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"])},
-        )
-        # (5) Shaping tasks: lower pole angular velocity
-        pole_vel = RewTerm(
-            func=mdp.joint_vel_l1,
-            weight=-0.005,
-            params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"])},
-        )
-
-.. seealso::
-
-    We provide a more detailed tutorial for setting up a RL environment using the manager-based workflow at
-    `Creating a manager-based RL Environment <../tutorials/03_envs/create_rl_env.html>`_.
-
-
-Direct Environments
-------------------
-
-The direct-style environment more closely aligns with traditional implementations of reinforcement learning environments,
-where a single script implements the reward function, observation function, resets, and all other components
-of the environment. This approach does not use the Manager classes. Instead, users are left with the freedom
-to implement the APIs from the base class :class:`envs.DirectRLEnv`. For users migrating from the IsaacGymEnvs
-or OmniIsaacGymEnvs framework, this workflow will have a closer implementation to the previous frameworks.
-
-When defining an environment following the direct-style implementation, a task configuration class inheriting from
-:class:`envs.DirectRLEnvCfg` is used for defining task environment configuration variables, such as the number
-of observations and actions. Adding configuration classes for the managers are not required and will not be processed
-by the base class. In addition to the configuration class, the logic of the task should be defined in a new
-task class that inherits from the base class :class:`envs.DirectRLEnv`. This class will then implement the main
-task logics, including setting up the scene, processing the actions, computing resets, rewards, and observations.
-
-This approach may bring more performance benefits for the environment, as it allows implementing large chunks
-of logic with optimized frameworks such as `PyTorch Jit <https://pytorch.org/docs/stable/jit.html>`_ or
-`Warp <https://github.com/NVIDIA/warp>`_. This may be important when scaling up training for large and complex
-environments. Additionally, data may be cached in class variables and reused in multiple APIs for the class.
-This method provides more transparency in the implementations of the environments, as logic is defined
-within the task class instead of abstracted with the use the Managers.
-
-An example of implementing the reward function for the Cartpole task using the Direct-style implementation is as follow:
-
-.. code-block:: python
-
-   def _get_rewards(self) -> torch.Tensor:
-        total_reward = compute_rewards(
-            self.cfg.rew_scale_alive,
-            self.cfg.rew_scale_terminated,
-            self.cfg.rew_scale_pole_pos,
-            self.cfg.rew_scale_cart_vel,
-            self.cfg.rew_scale_pole_vel,
-            self.joint_pos[:, self._pole_dof_idx[0]],
-            self.joint_vel[:, self._pole_dof_idx[0]],
-            self.joint_pos[:, self._cart_dof_idx[0]],
-            self.joint_vel[:, self._cart_dof_idx[0]],
-            self.reset_terminated,
-        )
-        return total_reward
-
-   @torch.jit.script
-   def compute_rewards(
-       rew_scale_alive: float,
-       rew_scale_terminated: float,
-       rew_scale_pole_pos: float,
-       rew_scale_cart_vel: float,
-       rew_scale_pole_vel: float,
-       pole_pos: torch.Tensor,
-       pole_vel: torch.Tensor,
-       cart_pos: torch.Tensor,
-       cart_vel: torch.Tensor,
-       reset_terminated: torch.Tensor,
-   ):
-       rew_alive = rew_scale_alive * (1.0 - reset_terminated.float())
-       rew_termination = rew_scale_terminated * reset_terminated.float()
-       rew_pole_pos = rew_scale_pole_pos * torch.sum(torch.square(pole_pos), dim=-1)
-       rew_cart_vel = rew_scale_cart_vel * torch.sum(torch.abs(cart_vel), dim=-1)
-       rew_pole_vel = rew_scale_pole_vel * torch.sum(torch.abs(pole_vel), dim=-1)
-       total_reward = rew_alive + rew_termination + rew_pole_pos + rew_cart_vel + rew_pole_vel
-       return total_reward
-
-.. seealso::
-
-    We provide a more detailed tutorial for setting up a RL environment using the direct workflow at
-    `Creating a Direct Workflow RL Environment <../tutorials/03_envs/create_direct_rl_env.html>`_.
--- a/docs/source/how-to/wrap_rl_env.rst
+++ b/docs/source/how-to/wrap_rl_env.rst
@@ -77,7 +77,7 @@ The camera's pose and image resolution can be configured through the
 .. dropdown:: Default parameters of the ViewerCfg class:
    :icon: code

-    .. literalinclude:: ../../../source/extensions/omni.isaac.lab/omni/isaac/lab/envs/base_env_cfg.py
+    .. literalinclude:: ../../../source/extensions/omni.isaac.lab/omni/isaac/lab/envs/common.py
        :language: python
        :pyobject: ViewerCfg


--- a/docs/source/tutorials/03_envs/create_direct_rl_env.rst
+++ b/docs/source/tutorials/03_envs/create_direct_rl_env.rst
-.. _tutorial-create-oige-rl-env:
+.. _tutorial-create-direct-rl-env:


 Creating a Direct Workflow RL Environment
@@ -103,7 +103,7 @@ Defining Rewards

 Reward function should be defined in the ``_get_rewards(self)`` API, which returns the reward
 buffer as a return value. Within this function, the task is free to implement the logic of
-the reward function. In this example, we implement a Pytorch jitted function that computes
+the reward function. In this example, we implement a PyTorch JIT function that computes
 the various components of the reward function.

 .. code-block:: python

--- a/docs/source/tutorials/03_envs/create_base_env.rst
+++ b/docs/source/tutorials/03_envs/create_base_env.rst
-.. _tutorial-create-base-env:
+.. _tutorial-create-manager-base-env:


 Creating a Manager-Based Base Environment

--- a/docs/source/tutorials/03_envs/create_rl_env.rst
+++ b/docs/source/tutorials/03_envs/create_rl_env.rst
-.. _tutorial-create-rl-env:
+.. _tutorial-create-manager-rl-env:


 Creating a Manager-Based RL Environment
@@ -6,7 +6,7 @@ Creating a Manager-Based RL Environment

 .. currentmodule:: omni.isaac.lab

-Having learnt how to create a base environment in :ref:`tutorial-create-base-env`, we will now look at how to create a manager-based
+Having learnt how to create a base environment in :ref:`tutorial-create-manager-base-env`, we will now look at how to create a manager-based
 task environment for reinforcement learning.

 The base environment is designed as a sense-act environment where the agent can send commands to the environment
@@ -56,7 +56,7 @@ The script for running the environment ``run_cartpole_rl_env.py`` is present in
 The Code Explained
 ~~~~~~~~~~~~~~~~~~

-We already went through parts of the above in the :ref:`tutorial-create-base-env` tutorial to learn
+We already went through parts of the above in the :ref:`tutorial-create-manager-base-env` tutorial to learn
 about how to specify the scene, observations, actions and events. Thus, in this tutorial, we
 will focus only on the RL components of the environment.

@@ -144,7 +144,7 @@ Tying it all together
 ---------------------

 With all the above components defined, we can now create the :class:`ManagerBasedRLEnvCfg` configuration for the
-cartpole environment. This is similar to the :class:`ManagerBasedEnvCfg` defined in :ref:`tutorial-create-base-env`,
+cartpole environment. This is similar to the :class:`ManagerBasedEnvCfg` defined in :ref:`tutorial-create-manager-base-env`,
 only with the added RL components explained in the above sections.

 .. literalinclude:: ../../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/classic/cartpole/cartpole_env_cfg.py

--- a/docs/source/tutorials/03_envs/index.rst
+++ b/docs/source/tutorials/03_envs/index.rst
@@ -10,8 +10,8 @@ different aspects of the framework to create a simulation environment for agent
    :maxdepth: 1
    :titlesonly:

-    create_base_env
-    create_rl_env
+    create_manager_base_env
+    create_manager_rl_env
    create_direct_rl_env
    register_rl_env_gym
    run_rl_training
--- a/docs/source/tutorials/03_envs/register_rl_env_gym.rst
+++ b/docs/source/tutorials/03_envs/register_rl_env_gym.rst
@@ -63,7 +63,7 @@ in the environment name, the entry point to the environment class, and the entry
 environment configuration class.

 .. note::
-    The ``gymnasium`` registry is a global registry. Hence, it is important to ensure that the
+    The :mod:`gymnasium` registry is a global registry. Hence, it is important to ensure that the
    environment names are unique. Otherwise, the registry will throw an error when registering
    the environment.

@@ -76,7 +76,7 @@ call for the cartpole environment in the ``omni.isaac.lab_tasks.manager_based.cl
 .. literalinclude:: ../../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/classic/cartpole/__init__.py
   :language: python
   :lines: 10-
-   :emphasize-lines: 11, 12, 15
+   :emphasize-lines: 4, 11, 12, 15

 The ``id`` argument is the name of the environment. As a convention, we name all the environments
 with the prefix ``Isaac-`` to make it easier to search for them in the registry. The name of the
@@ -96,54 +96,22 @@ configuration is loaded using the :meth:`omni.isaac.lab_tasks.utils.parse_env_cf
 It is then passed to the :meth:`gymnasium.make` function to create the environment instance.
 The configuration entry point can be either a YAML file or a Python configuration class.

-Direct Environemtns
+Direct Environments
 ^^^^^^^^^^^^^^^^^^^

-For direct-based environments, the following shows the registration call for the cartpole environment
-in the ``omni.isaac.lab_tasks.direct.cartpole`` sub-package:
+For direct-based environments, the environment registration follows a similar pattern. Instead of
+registering the environment's entry point as the :class:`~omni.isaac.lab.envs.ManagerBasedRLEnv` class,
+we register the environment's entry point as the implementation class of the environment.
+Additionally, we add the suffix ``-Direct`` to the environment name to differentiate it from the
+manager-based environments.

-.. code-block:: python
+As an example, the following shows the registration call for the cartpole environment in the
+``omni.isaac.lab_tasks.direct.cartpole`` sub-package:

-  import gymnasium as gym
-
-  from . import agents
-  from .cartpole_env import CartpoleEnv, CartpoleEnvCfg
-
-  ##
-  # Register Gym environments.
-  ##
-
-  gym.register(
-      id="Isaac-Cartpole-Direct-v0",
-      entry_point="omni.isaac.lab_tasks.direct.cartpole:CartpoleEnv",
-      disable_env_checker=True,
-      kwargs={
-          "env_cfg_entry_point": CartpoleEnvCfg,
-          "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
-          "rsl_rl_cfg_entry_point": agents.rsl_rl_ppo_cfg.CartpolePPORunnerCfg,
-          "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
-          "sb3_cfg_entry_point": f"{agents.__name__}:sb3_ppo_cfg.yaml",
-      },
-  )
-
-The ``id`` argument is the name of the environment. As a convention, we name all the environments
-with the prefix ``Isaac-`` to make it easier to search for them in the registry.
-For direct environments, we also add the suffix ``-Direct``. The name of the
-environment is typically followed by the name of the task, and then the name of the robot.
-For instance, for legged locomotion with ANYmal C on flat terrain, the environment is called
-``Isaac-Velocity-Flat-Anymal-C-Direct-v0``. The version number ``v<N>`` is typically used to specify different
-variations of the same environment. Otherwise, the names of the environments can become too long
-and difficult to read.
-
-The ``entry_point`` argument is the entry point to the environment class. The entry point is a string
-of the form ``<module>:<class>``. In the case of the cartpole environment, the entry point is
-``omni.isaac.lab_tasks.direct.cartpole:CartpoleEnv``. The entry point is used to import the environment class
-when creating the environment instance.
-
-The ``env_cfg_entry_point`` argument specifies the default configuration for the environment. The default
-configuration is loaded using the :meth:`omni.isaac.lab_tasks.utils.parse_env_cfg` function.
-It is then passed to the :meth:`gymnasium.make` function to create the environment instance.
-The configuration entry point can be both a YAML file or a python configuration class.
+.. literalinclude:: ../../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/direct/cartpole/__init__.py
+   :language: python
+   :lines: 10-31
+   :emphasize-lines: 5, 12, 13, 16
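
The entry points registered here are strings of the form `<module>:<class>`. As a rough illustration of how such a string can be turned into a class, the sketch below resolves one with the standard library. `load_entry_point` is a hypothetical helper written for this example; it is not the loader that `gymnasium` uses internally, and `collections:OrderedDict` is only a stand-in for an entry point such as the cartpole environment class.

```python
import importlib
from collections import OrderedDict


def load_entry_point(entry_point: str):
    """Resolve a ``<module>:<attribute>`` string to the object it names."""
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected '<module>:<class>', got: {entry_point!r}")
    # Import the module by its dotted path, then look up the attribute on it.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in entry point from the standard library.
resolved = load_entry_point("collections:OrderedDict")
```

Because the class is only imported when the string is resolved, a registry can hold many entry points without importing every environment module up front.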


 Creating the environment
@@ -181,7 +149,7 @@ Now that we have gone through the code, let's run the script and see the result:
   ./isaaclab.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32


-This should open a stage with everything similar to the previous :ref:`tutorial-create-rl-env` tutorial.
+This should open a stage with everything similar to the :ref:`tutorial-create-manager-rl-env` tutorial.
 To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal.

 In addition, you can also change the simulation device from GPU to CPU by adding the ``--cpu`` flag:

--- a/source/extensions/omni.isaac.lab/config/extension.toml
+++ b/source/extensions/omni.isaac.lab/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.18.0"
+version = "0.18.1"

 # Description
 title = "Isaac Lab framework for Robot Learning"

--- a/source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
+++ b/source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
 Changelog
 ---------

+0.18.1 (2024-06-25)
+~~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Made the parity between each class and its configuration class explicitly visible in the :mod:`omni.isaac.lab.envs`
+  module. This makes it easier to follow where definitions are located and how they are related. This should not be
+  a breaking change, as the classes are still accessible through the same module.
+
+
 0.18.0 (2024-06-13)
-~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~

 Fixed
 ^^^^^
@@ -23,8 +34,8 @@ Changed
 Fixed
 ^^^^^

-* Fixed the orientation reset logic in :func:`omni.isaac.lab.envs.mdp.events.reset_root_state_uniform` to make it relative to the default orientation.
-  Earlier, the position was sampled relative to the default and the orientation not.
+* Fixed the orientation reset logic in :func:`omni.isaac.lab.envs.mdp.events.reset_root_state_uniform` to make it relative to
+  the default orientation. Earlier, the position was sampled relative to the default and the orientation not.


 0.17.12 (2024-06-13)

--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/__init__.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/__init__.py
@@ -11,28 +11,35 @@ observations and executes the actions provided by the agent. However, the
 environment can also provide additional information such as the current
 reward, done flag, and information about the current episode.

-Based on these, there are two types of environments:
-
-* :class:`ManagerBasedEnv`: The manager-based workflow base environment which
-  only provides the agent with the
-  current observations and executes the actions provided by the agent.
-* :class:`ManagerBasedRLEnv`: The manager-based workflow RL task environment which
-  besides the functionality of
-  the base environment also provides additional Markov Decision Process (MDP)
-  related information such as the current reward, done flag, and information.
+There are two types of environment design workflows:
+
+* **Manager-based**: The environment is decomposed into individual components (or managers)
+  for different aspects (such as computing observations, applying actions, and applying
+  randomization). The users mainly configure the managers and the environment coordinates the
+  managers and calls their functions.
+* **Direct**: The user implements all the necessary functionality directly in a single class
+  without the need for additional managers.

-In addition, RL task environments can use the direct workflow implementation:
+Based on these workflows, there are the following environment classes:
+
+* :class:`ManagerBasedEnv`: The manager-based workflow base environment which only provides the
+  agent with the current observations and executes the actions provided by the agent.
+* :class:`ManagerBasedRLEnv`: The manager-based workflow RL task environment which besides the
+  functionality of the base environment also provides additional Markov Decision Process (MDP)
+  related information such as the current reward, done flag, and information.
+* :class:`DirectRLEnv`: The direct workflow RL task environment which provides an interface for
+  implementing scene setup, computing dones, performing resets, and computing rewards and observations.

-* :class:`DirectRLEnv`: The direct workflow RL task environment which provides implementations
-  for implementing scene setup, computing dones, performing resets, and computing
-  reward and observation.
+For more information about the workflow design patterns, see the `Task Design Workflows`_ section.

+.. _`Task Design Workflows`: https://isaac-sim.github.io/IsaacLab/source/features/task_workflows.html
 """

 from . import mdp, ui
-from .base_env_cfg import ManagerBasedEnvCfg, ViewerCfg
+from .common import VecEnvObs, VecEnvStepReturn, ViewerCfg
 from .direct_rl_env import DirectRLEnv
+from .direct_rl_env_cfg import DirectRLEnvCfg
 from .manager_based_env import ManagerBasedEnv
+from .manager_based_env_cfg import ManagerBasedEnvCfg
 from .manager_based_rl_env import ManagerBasedRLEnv
-from .rl_env_cfg import DirectRLEnvCfg, ManagerBasedRLEnvCfg
-from .types import VecEnvObs, VecEnvStepReturn
+from .manager_based_rl_env_cfg import ManagerBasedRLEnvCfg
--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/types.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/types.py
@@ -6,7 +6,61 @@
 from __future__ import annotations

 import torch
-from typing import Dict
+from typing import Dict, Literal
+
+from omni.isaac.lab.utils import configclass
+
+##
+# Configuration.
+##
+
+
+@configclass
+class ViewerCfg:
+    """Configuration of the scene viewport camera."""
+
+    eye: tuple[float, float, float] = (7.5, 7.5, 7.5)
+    """Initial camera position (in m). Default is (7.5, 7.5, 7.5)."""
+
+    lookat: tuple[float, float, float] = (0.0, 0.0, 0.0)
+    """Initial camera target position (in m). Default is (0.0, 0.0, 0.0)."""
+
+    cam_prim_path: str = "/OmniverseKit_Persp"
+    """The camera prim path to record images from. Default is "/OmniverseKit_Persp",
+    which is the default camera in the viewport.
+    """
+
+    resolution: tuple[int, int] = (1280, 720)
+    """The resolution (width, height) of the camera specified using :attr:`cam_prim_path`.
+    Default is (1280, 720).
+    """
+
+    origin_type: Literal["world", "env", "asset_root"] = "world"
+    """The frame in which the camera position (eye) and target (lookat) are defined in. Default is "world".
+
+    Available options are:
+
+    * ``"world"``: The origin of the world.
+    * ``"env"``: The origin of the environment defined by :attr:`env_index`.
+    * ``"asset_root"``: The center of the asset defined by :attr:`asset_name` in environment :attr:`env_index`.
+    """
+
+    env_index: int = 0
+    """The environment index for frame origin. Default is 0.
+
+    This quantity is only effective if :attr:`origin_type` is set to "env" or "asset_root".
+    """
+
+    asset_name: str | None = None
+    """The asset name in the interactive scene for the frame origin. Default is None.
+
+    This quantity is only effective if :attr:`origin_type` is set to "asset_root".
+    """
+
+
+##
+# Types.
+##

 VecEnvObs = Dict[str, torch.Tensor | Dict[str, torch.Tensor]]
 """Observation returned by the environment.
@@ -31,7 +85,7 @@ Note:

 """

-VecEnvStepReturn = tuple[VecEnvObs, torch.Tensor, torch.Tensor, torch.Tensor, Dict]
+VecEnvStepReturn = tuple[VecEnvObs, torch.Tensor, torch.Tensor, torch.Tensor, dict]
 """The environment signals processed at the end of each step.

 The tuple contains batched information for each sub-environment. The information is stored in the following order:

--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/direct_rl_env.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/direct_rl_env.py
@@ -21,21 +21,21 @@ import omni.isaac.core.utils.torch as torch_utils
 import omni.kit.app
 from omni.isaac.version import get_version

-from omni.isaac.lab.envs.types import VecEnvObs, VecEnvStepReturn
 from omni.isaac.lab.managers import EventManager
 from omni.isaac.lab.scene import InteractiveScene
 from omni.isaac.lab.sim import SimulationContext
 from omni.isaac.lab.utils.noise import NoiseModel
 from omni.isaac.lab.utils.timer import Timer

-from .rl_env_cfg import DirectRLEnvCfg
+from .common import VecEnvObs, VecEnvStepReturn
+from .direct_rl_env_cfg import DirectRLEnvCfg
 from .ui import ViewportCameraController


 class DirectRLEnv(gym.Env):
-    """The superclass for the direct workflow reinforcement learning-based environments.
+    """The superclass for the direct workflow to design environments.

-    This class implements the core functionality for reinforcement learning-based
+    This class implements the core functionality for reinforcement learning (RL)
    environments. It is designed to be used with any RL library. The class is designed
    to be used with vectorized environments, i.e., the environment is expected to be run
    in parallel with multiple sub-environments.
@@ -70,6 +70,8 @@ class DirectRLEnv(gym.Env):

        Args:
            cfg: The configuration object for the environment.
+            render_mode: The render mode for the environment. Defaults to None, which
+                is similar to ``"human"``.

        Raises:
            RuntimeError: If a simulation context already exists. The environment must always create one
@@ -165,10 +167,11 @@ class DirectRLEnv(gym.Env):
        self.reset_time_outs = torch.zeros_like(self.reset_terminated)
        self.reset_buf = torch.zeros(self.num_envs, dtype=torch.bool, device=self.sim.device)
        self.actions = torch.zeros(self.num_envs, self.cfg.num_actions, device=self.sim.device)
+
        # setup the action and observation spaces for Gym
        self._configure_gym_env_spaces()

-        # -- noise cfg for adding action and observation noise
+        # setup noise cfg for adding action and observation noise
        if self.cfg.action_noise_model:
            self._action_noise_model: NoiseModel = self.cfg.action_noise_model.class_type(
                self.num_envs, self.cfg.action_noise_model, self.device
@@ -177,6 +180,7 @@ class DirectRLEnv(gym.Env):
            self._observation_noise_model: NoiseModel = self.cfg.observation_noise_model.class_type(
                self.num_envs, self.cfg.observation_noise_model, self.device
            )
+
        # perform events at the start of the simulation
        if self.cfg.events:
            if "startup" in self.event_manager.available_modes:
@@ -260,19 +264,27 @@ class DirectRLEnv(gym.Env):
    def step(self, action: torch.Tensor) -> VecEnvStepReturn:
        """Execute one time-step of the environment's dynamics.

-        The environment steps forward at a fixed time-step, while the physics simulation is
-        decimated at a lower time-step. This is to ensure that the simulation is stable. These two
-        time-steps can be configured independently using the :attr:`DirectRLEnvCfg.decimation` (number of
-        simulation steps per environment step) and the :attr:`DirectRLEnvCfg.physics_dt` (physics time-step).
-        Based on these parameters, the environment time-step is computed as the product of the two.
+        The environment steps forward at a fixed time-step, while the physics simulation is decimated at a
+        lower time-step. This is to ensure that the simulation is stable. These two time-steps can be configured
+        independently using the :attr:`DirectRLEnvCfg.decimation` (number of simulation steps per environment step)
+        and the :attr:`DirectRLEnvCfg.sim.dt` (physics time-step). Based on these parameters, the environment
+        time-step is computed as the product of the two.
+
+        This function performs the following steps:
+
+        1. Pre-process the actions before stepping through the physics.
+        2. Apply the actions to the simulator and step through the physics in a decimated manner.
+        3. Compute the reward and done signals.
+        4. Reset environments that have terminated or reached the maximum episode length.
+        5. Apply interval events if they are enabled.
+        6. Compute observations.

        Args:
            action: The actions to apply on the environment. Shape is (num_envs, action_dim).

        Returns:
-            A tuple containing the observations and extras.
+            A tuple containing the observations, rewards, resets (terminated and truncated) and extras.
        """
-
        # add action noise
        if self.cfg.action_noise_model:
            action = self._action_noise_model.apply(action.clone())
@@ -317,6 +329,7 @@ class DirectRLEnv(gym.Env):
        self.obs_buf = self._get_observations()

        # add observation noise
+        # note: we apply no noise to the state space (since it is used for critic networks)
        if self.cfg.observation_noise_model:
            self.obs_buf["policy"] = self._observation_noise_model.apply(self.obs_buf["policy"])

@@ -424,6 +437,10 @@ class DirectRLEnv(gym.Env):
            # update closing status
            self._is_closed = True

+    """
+    Operations - Debug Visualization.
+    """
+
    def set_debug_vis(self, debug_vis: bool) -> bool:
        """Toggles the environment debug visualization.

@@ -477,6 +494,7 @@ class DirectRLEnv(gym.Env):
        self.observation_space = gym.vector.utils.batch_space(self.single_observation_space["policy"], self.num_envs)
        self.action_space = gym.vector.utils.batch_space(self.single_action_space, self.num_envs)

+        # optional state space for asymmetric actor-critic architectures
        if self.num_states > 0:
            self.single_observation_space["critic"] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(self.num_states,))
            self.state_space = gym.vector.utils.batch_space(self.single_observation_space["critic"], self.num_envs)
@@ -488,49 +506,103 @@ class DirectRLEnv(gym.Env):
            env_ids: List of environment ids which must be reset
        """
        self.scene.reset(env_ids)
-        # apply events such as randomizations for environments that need a reset
+
+        # apply events such as randomization for environments that need a reset
        if self.cfg.events:
            if "reset" in self.event_manager.available_modes:
                self.event_manager.apply(env_ids=env_ids, mode="reset")
+
+        # reset noise models
        if self.cfg.action_noise_model:
            self._action_noise_model.reset(env_ids)
        if self.cfg.observation_noise_model:
            self._observation_noise_model.reset(env_ids)
+
        # reset the episode length buffer
        self.episode_length_buf[env_ids] = 0

-    # this can be done through configs as well
+    """
+    Implementation-specific functions.
+    """
+
    def _setup_scene(self):
-        pass
+        """Setup the scene for the environment.

-    def _set_debug_vis_impl(self, debug_vis: bool):
-        """Set debug visualization into visualization objects.
+        This function is responsible for creating the scene objects and setting up the scene for the environment.
+        The scene creation can happen through :class:`omni.isaac.lab.scene.InteractiveSceneCfg` or through
+        directly creating the scene objects and registering them with the scene manager.

-        This function is responsible for creating the visualization objects if they don't exist
-        and input ``debug_vis`` is True. If the visualization objects exist, the function should
-        set their visibility into the stage.
+        We leave the implementation of this function to the derived classes. If the environment does not require
+        any explicit scene setup, the function can be left empty.
        """
-        raise NotImplementedError(f"Debug visualization is not implemented for {self.__class__.__name__}.")
+        pass

    @abstractmethod
    def _pre_physics_step(self, actions: torch.Tensor):
-        return NotImplementedError
+        """Pre-process actions before stepping through the physics.
+
+        This function is responsible for pre-processing the actions before stepping through the physics.
+        It is called before the physics stepping (which is decimated).
+
+        Args:
+            actions: The actions to apply on the environment. Shape is (num_envs, action_dim).
+        """
+        raise NotImplementedError(f"Please implement the '_pre_physics_step' method for {self.__class__.__name__}.")

    @abstractmethod
    def _apply_action(self):
-        return NotImplementedError
+        """Apply actions to the simulator.
+
+        This function is responsible for applying the actions to the simulator. It is called at each
+        physics time-step.
+        """
+        raise NotImplementedError(f"Please implement the '_apply_action' method for {self.__class__.__name__}.")

    @abstractmethod
    def _get_observations(self) -> VecEnvObs:
-        return NotImplementedError
+        """Compute and return the observations for the environment.
+
+        Returns:
+            The observations for the environment.
+        """
+        raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")

    def _get_states(self) -> VecEnvObs | None:
-        return None
+        """Compute and return the states for the environment.
+
+        The state-space is used for asymmetric actor-critic architectures. It is configured
+        using the :attr:`DirectRLEnvCfg.num_states` parameter.
+
+        Returns:
+            The states for the environment. If the environment does not have a state-space, the function
+            returns None.
+        """
+        return None  # noqa: R501

    @abstractmethod
    def _get_rewards(self) -> torch.Tensor:
-        return NotImplementedError
+        """Compute and return the rewards for the environment.
+
+        Returns:
+            The rewards for the environment. Shape is (num_envs,).
+        """
+        raise NotImplementedError(f"Please implement the '_get_rewards' method for {self.__class__.__name__}.")

    @abstractmethod
    def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
-        return NotImplementedError
+        """Compute and return the done flags for the environment.
+
+        Returns:
+            A tuple containing the done flags for termination and time-out.
+            Shape of individual tensors is (num_envs,).
+        """
+        raise NotImplementedError(f"Please implement the '_get_dones' method for {self.__class__.__name__}.")
+
+    def _set_debug_vis_impl(self, debug_vis: bool):
+        """Set debug visualization into visualization objects.
+
+        This function is responsible for creating the visualization objects if they don't exist
+        and input ``debug_vis`` is True. If the visualization objects exist, the function should
+        set their visibility into the stage.
+        """
+        raise NotImplementedError(f"Debug visualization is not implemented for {self.__class__.__name__}.")
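The call order documented in ``DirectRLEnv.step`` above can be illustrated with a small stand-in class. This is only a sketch in plain Python: ``ToyDirectEnv`` and the strings it logs are hypothetical names chosen for illustration and are not part of Isaac Lab. It mirrors the six documented steps and shows that the action is applied once per physics step, i.e. ``decimation`` times per environment step.

```python
class ToyDirectEnv:
    """Hypothetical stand-in (not Isaac Lab) that records the call order of one step."""

    def __init__(self, decimation):
        # number of physics steps per environment step
        self.decimation = decimation
        self.log = []

    def step(self, action):
        # 1. pre-process the actions once per environment step
        self.log.append("pre_physics_step")
        # 2. apply the actions and step the physics `decimation` times
        for _ in range(self.decimation):
            self.log.append("apply_action")
            self.log.append("sim_step")
        # 3. compute the reward and done signals
        self.log.append("compute_rewards_and_dones")
        # 4. reset environments that terminated or timed out
        self.log.append("reset_idx")
        # 5. apply interval events (only if enabled in the real class)
        self.log.append("interval_events")
        # 6. compute observations last, so they reflect any resets
        self.log.append("get_observations")
        return self.log


env = ToyDirectEnv(decimation=2)
for name in env.step(action=None):
    print(name)
```

With ``decimation=2`` the log contains two ``apply_action``/``sim_step`` pairs between the pre-processing entry and the reward/done entry.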
--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/rl_env_cfg.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/rl_env_cfg.py
@@ -8,88 +8,23 @@ from dataclasses import MISSING
 from omni.isaac.lab.scene import InteractiveSceneCfg
 from omni.isaac.lab.sim import SimulationCfg
 from omni.isaac.lab.utils import configclass
-from omni.isaac.lab.utils.noise.noise_cfg import NoiseModelCfg
+from omni.isaac.lab.utils.noise import NoiseModelCfg

-from .base_env_cfg import ManagerBasedEnvCfg, ViewerCfg
-from .ui import BaseEnvWindow, ManagerBasedRLEnvWindow
+from .common import ViewerCfg
+from .ui import BaseEnvWindow


 @configclass
-class ManagerBasedRLEnvCfg(ManagerBasedEnvCfg):
-    """Configuration for a reinforcement learning environment with the manager-based workflow."""
+class DirectRLEnvCfg:
+    """Configuration for an RL environment defined with the direct workflow.

-    # ui settings
-    ui_window_class_type: type | None = ManagerBasedRLEnvWindow
-
-    # general settings
-    is_finite_horizon: bool = False
-    """Whether the learning task is treated as a finite or infinite horizon problem for the agent.
-    Defaults to False, which means the task is treated as an infinite horizon problem.
-
-    This flag handles the subtleties of finite and infinite horizon tasks:
-
-    * **Finite horizon**: no penalty or bootstrapping value is required by the the agent for
-      running out of time. However, the environment still needs to terminate the episode after the
-      time limit is reached.
-    * **Infinite horizon**: the agent needs to bootstrap the value of the state at the end of the episode.
-      This is done by sending a time-limit (or truncated) done signal to the agent, which triggers this
-      bootstrapping calculation.
-
-    If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal
-    is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out
-    (or truncated) done signal is sent to the agent.
-
-    Note:
-        The base :class:`ManagerBasedRLEnv` class does not use this flag directly. It is used by the environment
-        wrappers to determine what type of done signal to send to the corresponding learning agent.
+    Please refer to the :class:`omni.isaac.lab.envs.direct_rl_env.DirectRLEnv` class for more details.
    """

-    episode_length_s: float = MISSING
-    """Duration of an episode (in seconds).
-
-    Based on the decimation rate and physics time step, the episode length is calculated as:
-
-    .. code-block:: python
-
-        episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))
-
-    For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds,
-    then the episode length in steps is 100.
-    """
-
-    # environment settings
-    rewards: object = MISSING
-    """Reward settings.
-
-    Please refer to the :class:`omni.isaac.lab.managers.RewardManager` class for more details.
-    """
-
-    terminations: object = MISSING
-    """Termination settings.
-
-    Please refer to the :class:`omni.isaac.lab.managers.TerminationManager` class for more details.
-    """
-
-    curriculum: object = MISSING
-    """Curriculum settings.
-
-    Please refer to the :class:`omni.isaac.lab.managers.CurriculumManager` class for more details.
-    """
-
-    commands: object = MISSING
-    """Command settings.
-
-    Please refer to the :class:`omni.isaac.lab.managers.CommandManager` class for more details.
-    """
-
-
-@configclass
-class DirectRLEnvCfg(ManagerBasedEnvCfg):
-    """Configuration for a reinforcement learning environment with the direct workflow."""
-
    # simulation settings
    viewer: ViewerCfg = ViewerCfg()
    """Viewer configuration. Default is ViewerCfg()."""
+
    sim: SimulationCfg = SimulationCfg()
    """Physics simulation configuration. Default is SimulationCfg()."""

@@ -113,14 +48,6 @@ class DirectRLEnvCfg(ManagerBasedEnvCfg):
    This means that the control action is updated every 10 simulation steps.
    """

-    # environment settings
-    scene: InteractiveSceneCfg = MISSING
-    """Scene settings.
-
-    Please refer to the :class:`omni.isaac.lab.scene.InteractiveSceneCfg` class for more details.
-    """
-
-    # general settings
    is_finite_horizon: bool = False
    """Whether the learning task is treated as a finite or infinite horizon problem for the agent.
    Defaults to False, which means the task is treated as an infinite horizon problem.
@@ -156,29 +83,39 @@ class DirectRLEnvCfg(ManagerBasedEnvCfg):
    then the episode length in steps is 100.
    """

+    # environment settings
+    scene: InteractiveSceneCfg = MISSING
+    """Scene settings.
+
+    Please refer to the :class:`omni.isaac.lab.scene.InteractiveSceneCfg` class for more details.
+    """
+
+    events: object = None
+    """Event settings. Defaults to None, in which case no events are applied through the event manager.
+
+    Please refer to the :class:`omni.isaac.lab.managers.EventManager` class for more details.
+    """
+
    num_observations: int = MISSING
-    """The size of the observation for each environment."""
+    """The dimension of the observation space from each environment instance."""

    num_states: int = 0
-    """The size of the state-space for each environment. Default is 0.
+    """The dimension of the state-space from each environment instance. Default is 0, which means no state-space is defined.

-    This is used for asymmetric actor-critic and defines the observation space for the critic.
+    This is useful for asymmetric actor-critic and defines the observation space for the critic.
    """

-    num_actions: int = MISSING
-    """The size of the action space for each environment."""
+    observation_noise_model: NoiseModelCfg | None = None
+    """The noise model to apply to the computed observations from the environment. Default is None, which means no noise is added.

-    events: object = None
-    """Settings for specifying domain randomization terms during training.
-       Please refer to the :class:`omni.isaac.lab.managers.EventManager` class for more details.
+    Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details.
    """

+    num_actions: int = MISSING
+    """The dimension of the action space for each environment."""
+
    action_noise_model: NoiseModelCfg | None = None
-    """Settings for adding noise to the action buffer.
-       Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details.
-    """
+    """The noise model applied to the actions provided to the environment. Default is None, which means no noise is added.

-    observation_noise_model: NoiseModelCfg | None = None
-    """Settings for adding noise to the observation buffer.
    Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details.
    """
--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/manager_based_env.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/manager_based_env.py
@@ -12,13 +12,13 @@ from typing import Any
 import carb
 import omni.isaac.core.utils.torch as torch_utils

-from omni.isaac.lab.envs.types import VecEnvObs
 from omni.isaac.lab.managers import ActionManager, EventManager, ObservationManager
 from omni.isaac.lab.scene import InteractiveScene
 from omni.isaac.lab.sim import SimulationContext
 from omni.isaac.lab.utils.timer import Timer

-from .base_env_cfg import ManagerBasedEnvCfg
+from .common import VecEnvObs
+from .manager_based_env_cfg import ManagerBasedEnvCfg
 from .ui import ViewportCameraController


@@ -251,7 +251,7 @@ class ManagerBasedEnv:
        The environment steps forward at a fixed time-step, while the physics simulation is
        decimated at a lower time-step. This is to ensure that the simulation is stable. These two
        time-steps can be configured independently using the :attr:`ManagerBasedEnvCfg.decimation` (number of
-        simulation steps per environment step) and the :attr:`ManagerBasedEnvCfg.physics_dt` (physics time-step).
+        simulation steps per environment step) and the :attr:`ManagerBasedEnvCfg.sim.dt` (physics time-step).
        Based on these parameters, the environment time-step is computed as the product of the two.

        Args:

--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/base_env_cfg.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/base_env_cfg.py
@@ -10,7 +10,6 @@ configuring the environment instances, viewer settings, and simulation parameter
 """

 from dataclasses import MISSING
-from typing import Literal

 import omni.isaac.lab.envs.mdp as mdp
 from omni.isaac.lab.managers import EventTermCfg as EventTerm
@@ -18,52 +17,10 @@ from omni.isaac.lab.scene import InteractiveSceneCfg
 from omni.isaac.lab.sim import SimulationCfg
 from omni.isaac.lab.utils import configclass

+from .common import ViewerCfg
 from .ui import BaseEnvWindow


-@configclass
-class ViewerCfg:
-    """Configuration of the scene viewport camera."""
-
-    eye: tuple[float, float, float] = (7.5, 7.5, 7.5)
-    """Initial camera position (in m). Default is (7.5, 7.5, 7.5)."""
-
-    lookat: tuple[float, float, float] = (0.0, 0.0, 0.0)
-    """Initial camera target position (in m). Default is (0.0, 0.0, 0.0)."""
-
-    cam_prim_path: str = "/OmniverseKit_Persp"
-    """The camera prim path to record images from. Default is "/OmniverseKit_Persp",
-    which is the default camera in the viewport.
-    """
-
-    resolution: tuple[int, int] = (1280, 720)
-    """The resolution (width, height) of the camera specified using :attr:`cam_prim_path`.
-    Default is (1280, 720).
-    """
-
-    origin_type: Literal["world", "env", "asset_root"] = "world"
-    """The frame in which the camera position (eye) and target (lookat) are defined in. Default is "world".
-
-    Available options are:
-
-    * ``"world"``: The origin of the world.
-    * ``"env"``: The origin of the environment defined by :attr:`env_index`.
-    * ``"asset_root"``: The center of the asset defined by :attr:`asset_name` in environment :attr:`env_index`.
-    """
-
-    env_index: int = 0
-    """The environment index for frame origin. Default is 0.
-
-    This quantity is only effective if :attr:`origin` is set to "env" or "asset_root".
-    """
-
-    asset_name: str | None = None
-    """The asset name in the interactive scene for the frame origin. Default is None.
-
-    This quantity is only effective if :attr:`origin` is set to "asset_root".
-    """
-
-
 @configclass
 class DefaultEventManagerCfg:
    """Configuration of the default event manager.
@@ -82,6 +39,7 @@ class ManagerBasedEnvCfg:
    # simulation settings
    viewer: ViewerCfg = ViewerCfg()
    """Viewer configuration. Default is ViewerCfg()."""
+
    sim: SimulationCfg = SimulationCfg()
    """Physics simulation configuration. Default is SimulationCfg()."""


--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/manager_based_rl_env.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/manager_based_rl_env.py
@@ -17,9 +17,9 @@ from omni.isaac.version import get_version

 from omni.isaac.lab.managers import CommandManager, CurriculumManager, RewardManager, TerminationManager

+from .common import VecEnvStepReturn
 from .manager_based_env import ManagerBasedEnv
-from .rl_env_cfg import ManagerBasedRLEnvCfg
-from .types import VecEnvStepReturn
+from .manager_based_rl_env_cfg import ManagerBasedRLEnvCfg


 class ManagerBasedRLEnv(ManagerBasedEnv, gym.Env):
@@ -84,9 +84,11 @@ class ManagerBasedRLEnv(ManagerBasedEnv, gym.Env):

        # setup the action and observation spaces for Gym
        self._configure_gym_env_spaces()
+
        # perform events at the start of the simulation
        if "startup" in self.event_manager.available_modes:
            self.event_manager.apply(mode="startup")
+
        # print the environment information
        print("[INFO]: Completed setting up the environment...")


--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/manager_based_rl_env_cfg.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/envs/manager_based_rl_env_cfg.py
+# Copyright (c) 2022-2024, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+from dataclasses import MISSING
+
+from omni.isaac.lab.utils import configclass
+
+from .manager_based_env_cfg import ManagerBasedEnvCfg
+from .ui import ManagerBasedRLEnvWindow
+
+
+@configclass
+class ManagerBasedRLEnvCfg(ManagerBasedEnvCfg):
+    """Configuration for a reinforcement learning environment with the manager-based workflow."""
+
+    # ui settings
+    ui_window_class_type: type | None = ManagerBasedRLEnvWindow
+
+    # general settings
+    is_finite_horizon: bool = False
+    """Whether the learning task is treated as a finite or infinite horizon problem for the agent.
+    Defaults to False, which means the task is treated as an infinite horizon problem.
+
+    This flag handles the subtleties of finite and infinite horizon tasks:
+
+    * **Finite horizon**: no penalty or bootstrapping value is required by the agent for
+      running out of time. However, the environment still needs to terminate the episode after the
+      time limit is reached.
+    * **Infinite horizon**: the agent needs to bootstrap the value of the state at the end of the episode.
+      This is done by sending a time-limit (or truncated) done signal to the agent, which triggers this
+      bootstrapping calculation.
+
+    If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal
+    is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out
+    (or truncated) done signal is sent to the agent.
+
+    Note:
+        The base :class:`ManagerBasedRLEnv` class does not use this flag directly. It is used by the environment
+        wrappers to determine what type of done signal to send to the corresponding learning agent.
+    """
+
+    episode_length_s: float = MISSING
+    """Duration of an episode (in seconds).
+
+    Based on the decimation rate and physics time step, the episode length is calculated as:
+
+    .. code-block:: python
+
+        episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))
+
+    For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds,
+    then the episode length in steps is 100.
+    """
+
+    # environment settings
+    rewards: object = MISSING
+    """Reward settings.
+
+    Please refer to the :class:`omni.isaac.lab.managers.RewardManager` class for more details.
+    """
+
+    terminations: object = MISSING
+    """Termination settings.
+
+    Please refer to the :class:`omni.isaac.lab.managers.TerminationManager` class for more details.
+    """
+
+    curriculum: object = MISSING
+    """Curriculum settings.
+
+    Please refer to the :class:`omni.isaac.lab.managers.CurriculumManager` class for more details.
+    """
+
+    commands: object = MISSING
+    """Command settings.
+
+    Please refer to the :class:`omni.isaac.lab.managers.CommandManager` class for more details.
+    """
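The episode-length formula in the ``episode_length_s`` docstring above can be checked with a few lines of plain Python. This is a minimal sketch: the helper name ``episode_length_steps`` and its parameter names are assumptions made for illustration, not Isaac Lab API. It also makes explicit that the environment time-step is the product of the decimation and the physics time-step.

```python
import math


def episode_length_steps(episode_length_s, decimation, sim_dt):
    """Return the number of environment steps in one episode (illustrative helper)."""
    # environment time-step = decimation * physics time-step
    step_dt = decimation * sim_dt
    # round up so the episode covers at least `episode_length_s` seconds
    return math.ceil(episode_length_s / step_dt)


# values from the docstring: decimation 10, physics time-step 0.01 s, episode of 10 s
print(episode_length_steps(episode_length_s=10.0, decimation=10, sim_dt=0.01))
```

Because the division is done in floating point, values that are not exactly representable may round up by one step; the docstring example evaluates to 100.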
--- a/source/extensions/omni.isaac.lab_tasks/docs/CHANGELOG.rst
+++ b/source/extensions/omni.isaac.lab_tasks/docs/CHANGELOG.rst
@@ -46,7 +46,7 @@ Added
 Changed
 ^^^^^^^

-* Set default device for RSL RL and SB3 configs to "cuda:0".
+* Changed the default device for RSL RL and SB3 configs to "cuda:0".

 0.7.3 (2024-05-21)
 ~~~~~~~~~~~~~~~~~~
@@ -54,7 +54,7 @@ Changed
 Added
 ^^^^^

-* Introduce ``--max_iterations`` argument to training scripts for specifying number of training iterations.
+* Introduced ``--max_iterations`` argument to training scripts for specifying number of training iterations.

 0.7.2 (2024-05-13)
 ~~~~~~~~~~~~~~~~~~
@@ -62,7 +62,8 @@ Added
 Added
 ^^^^^

-* Add Shadow Hand environments: ``Isaac-Shadow-Hand-Direct-v0``, ``Isaac-Shadow-Hand-OpenAI-FF-Direct-v0``, ``Isaac-Shadow-Hand-OpenAI-LSTM-Direct-v0``.
+* Added Shadow Hand environments: ``Isaac-Shadow-Hand-Direct-v0``, ``Isaac-Shadow-Hand-OpenAI-FF-Direct-v0``,
+  and ``Isaac-Shadow-Hand-OpenAI-LSTM-Direct-v0``.


 0.7.1 (2024-05-09)
@@ -80,7 +81,9 @@ Added
 Changed
 ^^^^^^^

-* Renamed all references of ``BaseEnv``, ``RLTaskEnv``, and ``OIGEEnv`` to :class:`omni.isaac.lab.envs.ManagerBasedEnv`, :class:`omni.isaac.lab.envs.ManagerBasedRLEnv`, and :class:`omni.isaac.lab.envs.DirectRLEnv`.
+* Renamed all references of ``BaseEnv``, ``RLTaskEnv``, and ``OIGEEnv`` to
+  :class:`omni.isaac.lab.envs.ManagerBasedEnv`, :class:`omni.isaac.lab.envs.ManagerBasedRLEnv`,
+  and :class:`omni.isaac.lab.envs.DirectRLEnv` respectively.
 * Split environments into ``manager_based`` and ``direct`` folders.

 Added