Adapts tutorial on base and RL environment (#283)

# Description

This MR adapts the environment tutorials. It reorganizes these tutorials and also modifies the content to make them more complete.

## Type of change

- This change requires a documentation update

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./orbit.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

e49048f9 · Mayank Mittal · GitHub · eb75c536
Unverified Commit e49048f9 authored Dec 12, 2023 by Mayank Mittal Committed by GitHub Dec 12, 2023
19 changed files
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -62,10 +62,10 @@ For more information about the framework, please refer to the `paper <https://ar
   :maxdepth: 1
   :caption: Tutorials (Environments)

-   source/tutorials/03_envs/00_gym_env
-   source/tutorials/03_envs/01_create_base_env
-   source/tutorials/03_envs/02_create_rl_env
-   source/tutorials/03_envs/03_wrappers
+   source/tutorials/03_envs/base_env
+   source/tutorials/03_envs/rl_env
+   source/tutorials/03_envs/gym_registry
+   source/tutorials/03_envs/rl_training


 .. toctree::

--- a/docs/source/how_to_guides/01_assets/cartpole.rst
+++ b/docs/source/how_to_guides/01_assets/cartpole.rst
@@ -23,7 +23,7 @@ the rail. The attached pole has 1 DOF that allows it to rotate freely.

 .. TODO: Add isaac sim screenshot and replace GIF with a webdb

-In :ref:`creating-base-env` participants will learn to control the
+In :ref:`tutorial-create-base-env` participants will learn to control the
 pole to stabilize the cart, but this tutorial focuses on merely constructing
 the :class:`ArticulationCfg` that defines the cartpole.


--- a/docs/source/tutorials/03_envs/03_wrappers.rst
+++ b/docs/source/tutorials/03_envs/03_wrappers.rst
+.. _how-to-env-wrappers:
+
 Using environment wrappers
 ==========================


--- a/docs/source/how_to_guides/index.rst
+++ b/docs/source/how_to_guides/index.rst
@@ -48,3 +48,12 @@ The `control` section focuses on how to implement controllers within Orbit.
    Markers <04_controllers/ik_controller>

 Please refer to the individual guides in each section for detailed instructions and examples.
+
+
+Gym
+---
+
+.. toctree::
+    :maxdepth: 1
+
+    Gym Wrappers <05_gym/wrappers>
--- a/docs/source/tutorials/00_sim/applauncher.rst
+++ b/docs/source/tutorials/00_sim/applauncher.rst
 Launching Isaac Sim from AppLauncher
-==============================
+====================================

 .. currentmodule:: omni.isaac.orbit


--- a/docs/source/tutorials/03_envs/00_gym_env.rst
+++ b/docs/source/tutorials/03_envs/00_gym_env.rst
-Running an RL environment
-=========================
-
-In this tutorial, we will learn how to run existing learning environments provided in the ``omni.isaac.orbit_tasks``
-extension. All the environments included in Orbit follow the :class:`gymnasium.Env` interface, which means that they can be used
-with any reinforcement learning framework that supports OpenAI Gym. However, since the environments are implemented
-in a vectorized fashion, they can only be used with frameworks that support vectorized environments.
-
-Many common frameworks come with their own desired definitions of a vectorized environment and require the returned data
-to follow their supported data types and data structures. For example, ``stable-baselines3`` uses ``numpy`` arrays, while
-``rsl-rl``, ``rl-games``, or ``skrl`` use ``torch.Tensor``. We provide wrappers for these different frameworks, which can be found
-in the ``omni.isaac.orbit_tasks.utils.wrappers`` module.
-
-
-The Code
-~~~~~~~~
-
-The tutorial corresponds to the ``zero_agent.py`` script in the ``orbit/source/standalone/environments`` directory.
-
-
-.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
-   :language: python
-   :emphasize-lines: 34-35,41-44,49-55
-   :linenos:
-
-The Code Explained
-~~~~~~~~~~~~~~~~~~
-
-Using gym registry for environments
-----------------------------------
-
-All environments are registered using the ``gym`` registry, which means that you can create an instance of
-an environment by calling ``gym.make``. The environments are registered in the ``__init__.py`` file of the
-``omni.isaac.orbit_tasks`` extension with the following syntax:
-
-.. code-block:: python
-
-    # Cartpole environment
-    gym.register(
-        id="Isaac-Cartpole-v0",
-        entry_point="omni.isaac.orbit_tasks.classic.cartpole:CartpoleEnv",
-        disable_env_checker=True,
-        kwargs={"env_cfg_entry_point": "omni.isaac.orbit_tasks.classic.cartpole:cartpole_cfg.yaml"},
-    )
-
-The ``env_cfg_entry_point`` argument is used to load the default configuration for the environment. The default
-configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_cfg.load_cfg_from_registry` function.
-The configuration entry point can be either a YAML file or a Python configuration
-class. The default configuration can be overridden by passing a custom configuration instance to the ``gym.make``
-function as shown later in the tutorial.
-
-To register all the environments provided by the ``omni.isaac.orbit_tasks`` extension with the ``gym`` registry,
-we must import the module at the start of the script.
-
-.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
-   :language: python
-   :lines: 33-35
-   :linenos:
-   :lineno-start: 33
-
-.. note::
-
-    As a convention, we name all the environments in ``omni.isaac.orbit_tasks`` extension with the prefix ``Isaac-``.
-    For more complicated environments, we follow the pattern: ``Isaac-<TaskName>-<RobotName>-v<N>``,
-    where `N` is used to specify different observations or action spaces within the same task definition. For example,
-    for legged locomotion with ANYmal C, the environment is called ``Isaac-Velocity-Anymal-C-v0``.
-
-
-In this tutorial, the task name is read from the command line. The task name is used to load the default configuration
-as well as to create the environment instance. In addition, other parsed command line arguments such as the
-number of environments, the simulation device, and whether to render, are used to override the default configuration.
-
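The override flow described above (parsed command-line values replacing defaults in the loaded configuration) can be sketched with `argparse` and dataclasses. `SceneCfg`, `EnvCfg`, `build_parser` and `apply_overrides` are hypothetical stand-ins and their default values are assumptions; the actual script's helpers and configuration classes may differ. Only the flags `--task`, `--num_envs` and `--cpu` come from the commands in this tutorial.

```python
import argparse
from dataclasses import dataclass, field


@dataclass
class SceneCfg:
    num_envs: int = 4096  # hypothetical default


@dataclass
class EnvCfg:
    scene: SceneCfg = field(default_factory=SceneCfg)
    device: str = "cuda:0"  # hypothetical default


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run an environment with zero actions.")
    parser.add_argument("--task", type=str, required=True, help="Registered environment name.")
    parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to spawn.")
    parser.add_argument("--cpu", action="store_true", default=False, help="Use the CPU pipeline.")
    return parser


def apply_overrides(cfg: EnvCfg, args: argparse.Namespace) -> EnvCfg:
    # Only override values that were actually passed on the command line.
    if args.num_envs is not None:
        cfg.scene.num_envs = args.num_envs
    if args.cpu:
        cfg.device = "cpu"
    return cfg


# Equivalent to: --task Isaac-Cartpole-v0 --num_envs 32 --cpu
args = build_parser().parse_args(["--task", "Isaac-Cartpole-v0", "--num_envs", "32", "--cpu"])
cfg = apply_overrides(EnvCfg(), args)
```

Keeping `--num_envs` at `None` by default lets the configuration's own default win unless the user asks otherwise.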
-.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
-   :language: python
-   :lines: 42-45
-   :linenos:
-   :lineno-start: 42
-
-
-Running the environment
-----------------------
-
-Once the environment is created, the rest of the execution follows the standard reset and step loop.
-
-.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
-   :language: python
-   :lines: 45-55
-   :linenos:
-   :lineno-start: 45
-
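A zero agent simply feeds an all-zero batch of actions into the reset/step loop. The sketch below uses a hypothetical `DummyVecEnv` with no simulator behind it, whose `step` returns a Gymnasium-style five-tuple; it only illustrates the batched shapes and omits the check on whether the simulation is still running that the real script needs.

```python
from typing import Dict, List, Tuple

Obs = Dict[str, List[List[float]]]  # group name -> (num_envs, obs_dim) batch


class DummyVecEnv:
    """Hypothetical stand-in for a vectorized environment (no simulator involved)."""

    def __init__(self, num_envs: int, obs_dim: int, action_dim: int) -> None:
        self.num_envs = num_envs
        self.obs_dim = obs_dim
        self.action_dim = action_dim
        self.step_count = 0

    def _obs(self) -> Obs:
        # Every environment instance reports the current step count as its observation.
        return {"policy": [[float(self.step_count)] * self.obs_dim for _ in range(self.num_envs)]}

    def reset(self) -> Tuple[Obs, dict]:
        self.step_count = 0
        return self._obs(), {}

    def step(self, actions: List[List[float]]) -> Tuple[Obs, List[float], List[bool], List[bool], dict]:
        assert len(actions) == self.num_envs, "one action row per environment instance"
        self.step_count += 1
        rewards = [0.0] * self.num_envs
        terminated = [False] * self.num_envs
        truncated = [False] * self.num_envs
        return self._obs(), rewards, terminated, truncated, {}


def run_zero_agent(env: DummyVecEnv, num_steps: int) -> Obs:
    obs, _ = env.reset()
    for _ in range(num_steps):
        # A zero agent sends an all-zero batch of actions at every step.
        actions = [[0.0] * env.action_dim for _ in range(env.num_envs)]
        obs, _, _, _, _ = env.step(actions)
    return obs
```

For example, `run_zero_agent(DummyVecEnv(num_envs=32, obs_dim=4, action_dim=1), num_steps=5)` returns a `"policy"` batch with 32 rows of 4 values each.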
-Similar to previous tutorials, to ensure a safe exit when running the script, we need to add checks
-for whether the simulation is stopped or not.
-
-.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
-   :language: python
-   :lines: 57-59
-   :linenos:
-   :lineno-start: 57
-
-
-The Code Execution
-~~~~~~~~~~~~~~~~~~
-
-Now that we have gone through the code, let's run the script and see the result:
-
-.. code-block:: bash
-
-   ./orbit.sh -p source/standalone/environments/zero_agent.py --task Isaac-Cartpole-v0 --num_envs 32
-
-
-This should open a stage with a ground plane, lights and 32 cartpoles spawned in a grid. The cartpoles
-fall down since no actions are applied to them. To stop the simulation,
-you can either close the window, or press the ``STOP`` button in the UI, or press ``Ctrl+C``
-in the terminal.
-
-.. note::
-    When running environments with GPU pipeline, the states in the scene are not synced with the USD
-    interface. Therefore values in the UI may appear wrong when simulation is running. Although objects
-    may be updating in the Viewport, attribute values in the UI will not update along with them.
-
-    To enable USD synchronization, please use the CPU pipeline with ``--cpu`` and disable flatcache by setting
-    ``use_flatcache`` to False in the environment configuration.
--- a/docs/source/tutorials/03_envs/01_create_base_env.rst
+++ b/docs/source/tutorials/03_envs/01_create_base_env.rst
-.. _creating-base-env:
-
-Creating a Base Environment
-===========================
-
-In Orbit, there are two types of environments: :class:`BaseEnv` and
-:class:`RLTaskEnv`. Base environments contain a robot, its action and
-observation spaces as well as randomizations (and handling of resets) to be applied to the
-environment. Typically, a :class:`BaseEnv` is utilized if one wants
-to evaluate an existing control algorithm, mechanical design or do traditional
-robot control but doesn't plan on doing RL. This workflow is commonly used in
-other simulators such as Gazebo, MuJoCo, etc. :class:`BaseEnv` doesn't
-contain rewards and terminations, which are common in RL settings. If
-interested in doing RL in Orbit, this tutorial is still a good starting point
-as :class:`RLTaskEnv` inherits from :class:`BaseEnv` and there's a lot of shared functionality.
-
-In this tutorial, we will look at the base class :class:`BaseEnv` and its
-corresponding configuration class :class:`BaseEnvCfg` and
-discuss the different configuration classes that need to be implemented to
-create a new environment. We will use the Cartpole environment with simple PD
-control as an example to illustrate the different steps in developing a
-new :class:`BaseEnv`.
-
-The Code
-~~~~~~~~
-
-The tutorial corresponds to the ``cartpole_base_env.py`` script in the ``orbit/source/standalone/tutorials/04_envs`` directory.
-
-.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
-   :language: python
-
-All environments in Orbit inherit from the base class :class:`BaseEnv`.
-
-The base class :class:`BaseEnv` wraps around many intricacies of the
-simulation and provides a simple interface for the user to implement their own
-environment. At the core, the base class provides the following
-functionality:
-
-* :meth:`__init__` method to create the environment instances and initialize
-  different components
-* :meth:`reset` and :meth:`step` methods that are used to interact with
-  the environment
-* :meth:`close` method to close the environment
-* :meth:`load_managers` method to load the managers that handle actions,
-  observations and any randomizations
-
-The base class :class:`BaseEnv` is defined in the file ``base_env.py``:
-
-.. dropdown:: :fa:`eye,mr-1` Code for ``base_env.py``
-
-   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/base_env.py
-      :language: python
-      :lines: 45-323
-
-To customize a :class:`BaseEnv` one needs to implement a :class:`BaseEnvCfg`
-which configures the action space, observation space and randomizations
-associated with the environment. These are utilized by their associated :class:`ManagerBase`
-classes to interact with the environment.
-
-The base class :class:`BaseEnvCfg` is defined in the file ``base_env_cfg.py``:
-
-.. dropdown:: :fa:`eye,mr-1` Code for ``base_env_cfg.py``
-
-   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/base_env_cfg.py
-      :language: python
-      :lines: 56-91
-
-The Code Explained
-~~~~~~~~~~~~~~~~~~
-
-Designing the scene
-------------------
-
-The first step in creating a new environment is to configure the scene by
-implementing an :class:`InteractiveSceneCfg`. This will then be used to construct
-a :class:`InteractiveScene` which handles spawning of the objects in the scene.
-
-In this tutorial, we will be using the configuration from ``cartpole_scene.py``.
-See :ref:`tutorial-interactive-scene` for a tutorial on how to create it.
-
-The scene used here consists of a ground plane, the cartpole and some lights.
-
-.. dropdown:: :fa:`eye,mr-1` Code for :class:`CartpoleSceneCfg` class in ``cartpole_scene.py``
-
-   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_scene.py
-      :language: python
-
-
-Defining actions
----------------
-
-The action space of the Cartpole environment in this example is force control
-of the sliding cart portion of the cartpole,
-moving it horizontally along the rail to balance the pole vertically.
-
-The :class:`ActionTerm` developed in this tutorial implements PD control. In
-:meth:`process_actions`, the PD controller calculates the control input
-given the current joint position and velocity of the ``cart_to_pole`` joint.
-This method, together with :meth:`apply_actions`, is called by the action
-manager at each step of the environment to determine the next action and then
-apply it.
-
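A PD law of the kind described above can be written as a short batched function. This is a minimal sketch assuming a zero target velocity, so the derivative term simply damps the measured joint velocity; the function name and the gains in the usage example are hypothetical and are not taken from ``cartpole_base_env.py``.

```python
from typing import List


def pd_control(
    target_pos: List[float],
    joint_pos: List[float],
    joint_vel: List[float],
    kp: float,
    kd: float,
) -> List[float]:
    """Batched PD law, one entry per environment: effort = kp * (target - pos) - kd * vel."""
    return [kp * (t - q) - kd * qd for t, q, qd in zip(target_pos, joint_pos, joint_vel)]


# Hypothetical gains: a 0.1 rad error yields a restoring effort of about -1.0.
efforts = pd_control([0.0], [0.1], [0.0], kp=10.0, kd=1.0)
```

Raising `kp` makes the response to position error stiffer, while raising `kd` adds damping against oscillation; this is the trade-off referred to later when the tutorial suggests tuning the P and D gains.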
-
-.. dropdown:: :fa:`eye,mr-1` Code for :class:`ActionsCfg` class in ``cartpole_base_env.py``
-
-   .. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
-      :language: python
-      :start-after: # Cartpole Action Configuration
-      :end-before: # Cartpole Observation Configuration
-
-Defining observations
----------------------
-
-The observation space of the environment defines the observed state at
-each time step.
-
-The returned observations will be a dictionary with the keys corresponding to the group names
-and the values corresponding to the observation tensors of shape ``(num_envs, obs_dims)``.
-
-This allows the user to define multiple
-observation groups that can then be used for different learning paradigms (e.g. for asymmetric actor-critic, one group could be for the RL
-policy while the other is for the RL-critic).
-
-While not prescriptive in the base class, it is recommended that the user always define the ``policy`` group which is used as the default
-observation group for the environment. This is essential because various wrappers read this group name to unwrap the observations dictionary
-for their respective frameworks.
-
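To make the group/term structure concrete, the following sketch computes a dictionary of observation groups, each a `(num_envs, obs_dims)` batch built by concatenating its terms along the last dimension. The term functions and the `critic` group are hypothetical illustrations, not Orbit's `ObservationManager` or its MDP functions.

```python
from typing import Callable, Dict, List

Batch = List[List[float]]  # shape (num_envs, dims)
ObsTerm = Callable[[Dict[str, Batch]], Batch]


def joint_pos_rel(state: Dict[str, Batch]) -> Batch:
    """Joint positions relative to their defaults, per environment."""
    return [
        [p - d for p, d in zip(pos, default)]
        for pos, default in zip(state["joint_pos"], state["default_joint_pos"])
    ]


def joint_vel(state: Dict[str, Batch]) -> Batch:
    """Raw joint velocities, per environment."""
    return [list(vel) for vel in state["joint_vel"]]


def compute_observations(groups: Dict[str, List[ObsTerm]], state: Dict[str, Batch]) -> Dict[str, Batch]:
    """Evaluate every term in every group and concatenate them per environment."""
    observations: Dict[str, Batch] = {}
    for group_name, terms in groups.items():
        term_batches = [term(state) for term in terms]
        num_envs = len(term_batches[0])
        observations[group_name] = [
            [value for batch in term_batches for value in batch[env_idx]]
            for env_idx in range(num_envs)
        ]
    return observations


# Two environments, two joints each (cart slider and pole hinge).
state = {
    "joint_pos": [[0.5, 0.1], [0.0, -0.2]],
    "default_joint_pos": [[0.0, 0.0], [0.0, 0.0]],
    "joint_vel": [[1.0, 2.0], [3.0, 4.0]],
}
obs = compute_observations({"policy": [joint_pos_rel, joint_vel], "critic": [joint_vel, joint_pos_rel]}, state)
policy_obs = obs["policy"]  # the group a framework wrapper would typically pick out
```

Because each group is computed independently, an asymmetric actor-critic setup can give the critic different (or more) terms than the policy without affecting the `policy` group.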
-In the cartpole environment, the observation is computed by the :class:`ObservationManager` class. This class is responsible for computing
-the observations for the environment by reading data from the various buffers and sensors. More details on the observation manager can be
-found in the `MDP managers <../api/orbit/omni.isaac.orbit.managers.html#observation-manager>`_ section.
-
-.. dropdown:: :fa:`eye,mr-1` Code for :class:`ObservationsCfg` class in ``cartpole_base_env.py``
-
-   .. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
-      :language: python
-      :start-after: # Cartpole Observation Configuration
-      :end-before: # Cartpole Randomization Configuration
-
-Defining randomizations
-----------------------
-
-Often in robotics, randomness is used to more closely emulate the real world.
-In Orbit, :class:`RandomizationManager` is used to manage randomness and define
-which environment terms are randomized via its :class:`RandomizationCfg`.
-In addition, it handles reset calls, so even if you don't want to randomize anything,
-you still need to define a :class:`RandomizationCfg` to handle the reset calls.
-
-In this example, the initial slider-to-cart joint position and velocity are randomized
-to be within (-1.0, 1.0) meters and (-0.1, 0.1) meters per second, respectively. Also, the pole joint's position
-and velocity are randomized slightly to make the problem more challenging.
-
-When developing your own environments, feel free to add more :class:`RandTerm` as needed or use the
-ones pre-packaged with Orbit.
-
-Randomization terms have a mode associated with them, as denoted by the ``mode`` argument of
-:class:`RandTerm`. The available modes are ``"interval"``, ``"reset"`` and ``"startup"``.
-
-Randomization Modes
-####################
-
-* ``"interval"`` mode executes randomization at a given fixed interval.
-* ``"reset"`` mode executes randomization on every call to an environment's :meth:`reset`.
-* ``"startup"`` mode executes randomization only once at environment startup.
-
-In this example, the randomization terms use ``"reset"`` mode, indicating that the
-randomization will be applied upon each call to :meth:`reset`.
-
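The three modes differ only in *when* a term fires. The following hypothetical scheduler (not Orbit's `RandomizationManager`) makes that explicit and includes a reset-mode term that samples the cart position in (-1.0, 1.0) m, as described above. It assumes one fixed timer per interval term; the real manager may handle intervals differently, for example per environment.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class RandTermSketch:
    """Hypothetical randomization term: a function plus the mode that schedules it."""

    func: Callable[[Sequence[int]], None]
    mode: str  # "startup", "reset" or "interval"
    interval_s: Optional[float] = None  # only used by "interval" terms


class RandomizationSchedulerSketch:
    """Decides when each term runs; the terms themselves decide what changes."""

    def __init__(self, terms: List[RandTermSketch], num_envs: int) -> None:
        self.terms = terms
        self.all_env_ids = list(range(num_envs))
        self.elapsed_s = [0.0] * len(terms)
        # "startup" terms run exactly once, when the scheduler is created.
        self._apply("startup", self.all_env_ids)

    def _apply(self, mode: str, env_ids: Sequence[int]) -> None:
        for term in self.terms:
            if term.mode == mode:
                term.func(env_ids)

    def on_reset(self, env_ids: Sequence[int]) -> None:
        # "reset" terms run only for the environments that are being reset.
        self._apply("reset", env_ids)

    def on_step(self, dt: float) -> None:
        # "interval" terms run whenever their own timer reaches the interval.
        for idx, term in enumerate(self.terms):
            if term.mode != "interval":
                continue
            self.elapsed_s[idx] += dt
            if self.elapsed_s[idx] >= term.interval_s:
                self.elapsed_s[idx] = 0.0
                term.func(self.all_env_ids)


# Usage: per-environment cart positions and simple call counters.
cart_pos = [0.0] * 4
call_counts = {"startup": 0, "interval": 0}


def reset_cart_position(env_ids: Sequence[int]) -> None:
    for i in env_ids:
        cart_pos[i] = random.uniform(-1.0, 1.0)


def count_startup(env_ids: Sequence[int]) -> None:
    call_counts["startup"] += 1


def count_interval(env_ids: Sequence[int]) -> None:
    call_counts["interval"] += 1


scheduler = RandomizationSchedulerSketch(
    terms=[
        RandTermSketch(reset_cart_position, mode="reset"),
        RandTermSketch(count_startup, mode="startup"),
        RandTermSketch(count_interval, mode="interval", interval_s=0.5),
    ],
    num_envs=4,
)
random.seed(0)
scheduler.on_reset([1, 3])  # only environments 1 and 3 are re-randomized
for _ in range(4):
    scheduler.on_step(dt=0.25)  # 1.0 s in total, so the interval term fires twice
```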
-.. dropdown:: :fa:`eye,mr-1` Code for :class:`RandomizationCfg` class in ``cartpole_base_env.py``
-
-   .. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
-      :language: python
-      :start-after: # Cartpole Randomization Configuration
-      :end-before: # Cartpole Environment Configuration
-
-Tying it all together
---------------------
-In this section we will integrate the scene, observation, action and randomization
-configurations built in the previous sections to fully configure the Cartpole
-:class:`BaseEnv`.
-
-.. dropdown:: :fa:`eye,mr-1` Code for :class:`CartpoleEnvCfg` class in ``cartpole_base_env.py``
-
-   .. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
-      :language: python
-      :start-after: # Cartpole Environment Configuration
-      :end-before: # Main
-
-.. note:: To modify any configuration of the :class:`BaseEnvCfg`, you can use :meth:`__post_init__`
-   as is done in this example.
-
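Because `__post_init__` runs right after the generated `__init__`, it is a convenient place to adjust nested defaults in one spot. This sketch uses plain `dataclasses` (Orbit's configuration classes use its own `configclass` decorator instead), and the field names `decimation` and `sim.dt`, as well as their values, are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimCfgSketch:
    dt: float = 0.01  # physics step in seconds (hypothetical default)


@dataclass
class CartpoleEnvCfgSketch:
    decimation: int = 1  # physics steps per environment step (hypothetical)
    sim: SimCfgSketch = field(default_factory=SimCfgSketch)

    def __post_init__(self) -> None:
        # Runs after the generated __init__, so nested defaults can be adjusted here.
        self.decimation = 2
        self.sim.dt = 0.005

    @property
    def step_dt(self) -> float:
        """Duration of one environment step: decimation * physics dt."""
        return self.decimation * self.sim.dt


cfg = CartpoleEnvCfgSketch()
```

Note that values assigned in `__post_init__` take precedence over constructor arguments, so `CartpoleEnvCfgSketch(decimation=4).decimation` is still 2.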
-
-The main method
---------------
-Lastly, we define the main method, which handles resetting and stepping the environment.
-At each iteration, we send the target position (the action) to the environment and
-receive back the observation, which is then printed to the console.
-
-.. dropdown:: :fa:`eye,mr-1` Code for the main method in ``cartpole_base_env.py``
-
-   .. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
-      :language: python
-      :start-after: # Main
-
-
-The Code Execution
-~~~~~~~~~~~~~~~~~~
-
-Now that we have gone through the code, let's run the environment.
-
-As an example, to run the Cartpole base environment script, you can use the following command.
-
-.. code-block:: bash
-
-   ./orbit.sh -p source/standalone/tutorials/04_envs/cartpole_base_env.py
-
-
-This should open a stage with a ground plane, lights, and a cartpole.
-The simulation should be playing with the cartpole attempting to balance itself
-such that the pole is vertical. Feel free to modify the P and D gains
-to improve the cart's ability to balance the pole vertically.
-
-To stop the simulation,
-you can either close the window, or press the ``STOP`` button in the UI, or press ``Ctrl+C`` in the terminal
-where you started the simulation.
--- a/docs/source/tutorials/03_envs/02_create_rl_env.rst
+++ b/docs/source/tutorials/03_envs/02_create_rl_env.rst
-Creating an RL Environment
-==========================
-
-In Orbit, we provide a set of environments that are ready to use. However, you may want to create your own
-RL environment for your application. This tutorial will show you how to create a new RL environment from scratch.
-
-As a practice, we maintain all the environments that are *officially* provided in the ``omni.isaac.orbit_tasks``
-extension. It is recommended to add your environment to the extension ``omni.isaac.contrib_tasks``. This way, you can
-easily update your environment when the API changes and you can also contribute your environment to the community.
-
-In this tutorial, we will look at the configuration class :class:`RLTaskEnvCfg` that is used to
-configure your learning agent and discuss the different classes you need to create to configure
-your RL task. We will use the Cartpole balancing task environment as an example to illustrate the
-different components.
-
-The Code
-~~~~~~~~
-
-All RL environments in Orbit inherit from the base class :class:`RLTaskEnv`. The base class follows the ``gym.Env``
-interface and provides the basic functionality for an environment. Similar to `IsaacGym <https://sites.google.com/view/isaacgym-nvidia>`_,
-all environments designed in Orbit are *vectorized* implementations. This means that multiple environment
-instances are packed into the simulation and the user can interact with the environment by passing in a batch of actions.
-
-.. note::
-
-   While the environment itself is implemented as a vectorized environment, we do not
-   inherit from :class:`gym.vector.VectorEnv`. This is mainly because the class adds
-   various methods (for wait and asynchronous updates) which are not required.
-   Additionally, each RL library typically has its own definition for a vectorized
-   environment. Thus, to reduce complexity, we directly use :class:`gym.Env` here
-   and leave it up to library-defined wrappers to take care of wrapping this
-   environment for their agents.
-
-The base class :class:`RLTaskEnv` wraps around many intricacies of the simulation and
-provides a simple interface for the user to implement their own environment. At the core, the
-base class provides the following functionality:
-
-* :meth:`__init__` method to create the environment instances and initialize different components
-* :meth:`reset` and :meth:`step` methods that are used to interact with the environment
-* :meth:`render` method to render the environment
-* :meth:`close` method to close the environment
-
-All environments are registered using the :func:`gym.register` method. This method takes in the name of the
-environment, the class that implements the environment and the configuration file for the environment.
-The name of the environment is used to create the environment using the :func:`gym.make` method.
-
-The base class :class:`RLTaskEnv` is defined in the file ``rl_task_env.py``:
-
-.. dropdown:: :fa:`eye,mr-1` Code for ``rl_task_env.py``
-
-   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/rl_task_env.py
-      :language: python
-      :linenos:
-
-Similar to other components of Orbit, instead of directly modifying the base
-class :class:`RLTaskEnv`, users can simply implement a configuration
-:class:`RLTaskEnvCfg` which will then be used to construct a
-:class:`RLTaskEnv` instance.
-
-This tutorial continues with the Cartpole example, this time creating an ``RLTaskEnvCfg``
-to define the task of balancing the pole from a reinforcement learning perspective.
-
-.. dropdown:: :fa:`eye,mr-1` Code for ``cartpole_env_cfg.py``
-
-   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
-      :language: python
-      :linenos:
-
-
-
-The Code Explained
-~~~~~~~~~~~~~~~~~~
-
-Designing the scene
-------------------
-
-The first step in creating a new environment is to design the scene in which the agent will operate.
-The scene used in this tutorial is the same one used in the tutorial :ref:`creating-base-env`, so we won't
-go over it in detail again here.
-
-Also see :ref:`tutorial-interactive-scene` for even more details on scene creation.
-
-Designing the Action and Observation Spaces
-------------------------------------------
-
-Again, the :class:`ActionTerm` and :class:`ObservationTerm` used here are the same as those
-used in :ref:`creating-base-env`, so you can reference that tutorial for more details.
-
-
-Designing the Rewards
------------------------
-
-The :class:`RewardsCfg` configures the :class:`RewardManager` to dictate how the agent receives rewards from the
-environment. In this example we define a few reward terms to guide our agent to a robust policy to balance the pole.
-
-To define a reward term, you need to provide the function that computes the reward
-as well as the weighting associated with it.
-
-There are a few reward functions pre-defined in ``source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/mdp/rewards.py``
-that will be used in the Cartpole environment, but when creating your own environments, feel free to add more to
-your task config as you see fit.
-
-The various reward terms used in this environment will be explained in the following sections. Feel free to skip over this
-if you are already familiar with the cartpole example.
-
-.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
-   :language: python
-   :start-after: # Rewards configuration
-   :end-before: # Terminations configuration
-
-* **Alive Reward Term**: Agent receives a reward at each step that it is
-  not terminated which is weighted by a factor of 1.0. This term is used to encourage the agent to
-  stay alive for as long as possible.
-* **Terminating Reward Term**: Agent receives a reward at each step that it is in terminated state
-  which is weighted by a factor of -2.0. This term is similarly used to penalize the agent for terminating.
-* **Pole Angle Reward Term**: Agent receives a reward based upon the L2 norm of the current pole angle
-  compared to the target pole angle which is weighted by a factor of -1.0. This term is used to encourage
-  the agent to keep the pole angle close to the desired angle.
-* **Cart Velocity Reward Term**: Agent receives a reward based upon the L1 norm of the current cart velocity
-  which is weighted by a factor of -0.01. This term is used to encourage the agent to keep the cart velocity
-  low.
-* **Pole Velocity Reward Term**: Agent receives a reward based upon the L1 norm of the current pole velocity
-  which is weighted by a factor of -0.005. This term is used to encourage the agent to keep the pole velocity
-  low.
-
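The reward manager combines the terms above into a weighted sum. The following single-environment sketch uses the weights listed above; the term names and `cartpole_reward` are hypothetical, the pole-angle term is written as a squared error (a common form for such "L2" penalties), and any extra scaling the real manager may apply, for instance by the step duration, is left out.

```python
from typing import Dict

# Weights taken from the reward terms described above.
REWARD_WEIGHTS: Dict[str, float] = {
    "alive": 1.0,
    "terminating": -2.0,
    "pole_pos": -1.0,
    "cart_vel": -0.01,
    "pole_vel": -0.005,
}


def cartpole_reward(
    pole_angle: float,
    cart_vel: float,
    pole_vel: float,
    terminated: bool,
    target_angle: float = 0.0,
) -> float:
    """Weighted sum of the individual reward terms for one environment."""
    terms = {
        "alive": 0.0 if terminated else 1.0,
        "terminating": 1.0 if terminated else 0.0,
        "pole_pos": (pole_angle - target_angle) ** 2,  # squared error to the target angle
        "cart_vel": abs(cart_vel),  # L1 norm of a single joint velocity
        "pole_vel": abs(pole_vel),
    }
    return sum(REWARD_WEIGHTS[name] * value for name, value in terms.items())
```

For example, a non-terminated step with a pole angle of 0.1 rad, a cart velocity of 1.0 and a pole velocity of 2.0 yields about 1.0 - 0.01 - 0.01 - 0.01 = 0.97, so the alive bonus dominates as long as the pole stays near upright.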
-Designing the Termination Criteria
----------------------------------
-
-In RL tasks, it is important to define when an episode is terminated. This is because the agent needs to know when
-to reset the environment and start a new episode.
-
-The :class:`TerminationsCfg` configures what constitutes an episode as
-terminated. In this example, we want the task to terminate when either of the following conditions is met:
-
-* **Episode length:** The episode length is greater than the defined ``max_episode_length``.
-* **Cart out of bounds:** The cart goes outside of the bounds [-3, 3].
-
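The two criteria can be sketched as per-environment boolean flags. This hypothetical helper keeps the out-of-bounds flag (`terminated`) separate from the episode-length flag (`time_out`), mirroring Gymnasium's distinction between termination and truncation; the function name and signature are assumptions, and the length check follows the "greater than" wording above.

```python
from typing import List, Tuple


def compute_dones(
    episode_length: List[int],
    cart_pos: List[float],
    max_episode_length: int,
    bounds: Tuple[float, float] = (-3.0, 3.0),
) -> Tuple[List[bool], List[bool]]:
    """Return (terminated, time_out) flags, one entry per environment."""
    lower, upper = bounds
    # Episode-length criterion, as described above.
    time_out = [steps > max_episode_length for steps in episode_length]
    # Cart-out-of-bounds criterion: positions on the boundary still count as inside.
    terminated = [not (lower <= x <= upper) for x in cart_pos]
    return terminated, time_out
```

Any environment with either flag set would then be reset, while an RL library can still treat a time-out differently from a genuine failure when bootstrapping value estimates.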
-.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
-   :language: python
-   :start-after: # Terminations configuration
-   :end-before: # Curriculum configuration
-
-
-As with :class:`RewardTermCfg`, you can define additional :class:`DoneTermCfg` for any additional criteria
-for which you want to terminate the episode and add them to the :class:`TerminationsCfg`.
-
-Curriculum and Commands
-----------------------
-
-Curriculum
-^^^^^^^^^^
-
-Often when training an agent, it is useful to start with a simple task and gradually increase the difficulty
-as the agent learns. This is the idea behind curriculum learning. In Orbit, we provide a :class:`CurriculumManager`
-that can be used to define a curriculum for your environment. In this tutorial we won't implement one for simplicity,
-but you can see an example of a curriculum definition in the included Lift task for inspiration.
-
-.. TODO: add tutorial that explains how to use the curriculum manager and reference it here.
-
-We use a simple pass-through curriculum in this example to define a curriculum manager that does not modify the
-environment.
-
-.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
-   :language: python
-   :start-after: # Curriculum configuration
-   :end-before: ##
-
-Commands
-^^^^^^^^
-
-Additionally, you can define commands that are sent to the environment at the start of each episode. This is
-useful for resetting the environment to a specific state or for providing additional information to the agent. In this
-example, we don't use any commands, but you can see an example of a command definition in the included Lift task for
-inspiration.
-
-.. TODO: add tutorial that explains how to use the command manager and reference it here.
-
-We use the :class:`NullCommandGeneratorCfg` in this example to define a command generator that does not generate
-any commands. It is provided to the user as a convenience class to avoid having to define a new command generator.
-There are other command generators that are provided in
-``source/extensions/omni.isaac.orbit/omni/isaac/orbit/command_generators``.
-
-.. dropdown:: :fa:`eye,mr-1` Code for ``null_command_generator.py``
-
-   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit/omni/isaac/orbit/command_generators/null_command_generator.py
-      :language: python
-      :linenos:
-
-Combining Everything
--------------------
-
-Now we want to construct an :class:`RLTaskEnvCfg` that ties all of the
-task configurations together. This object will be used to construct the
-:class:`RLTaskEnv` to be used for RL training.
-
-Again, this is similar to the :class:`BaseEnvCfg` defined in :ref:`creating-base-env`,
-only with the added RL components explained in the above sections:
-
-* Curriculum
-* Rewards
-* Terminations
-* Commands
-
-.. TODO: Explain __post_init__
-
-Registering the environment
---------------------------
-
-Before you can run your environment, you need to register your environment with the OpenAI Gym interface.
-
-To register an environment, call the :meth:`gym.register` method in the :mod:`__init__.py` file of your environment package
-(for instance, in ``omni.isaac.contrib_tasks.__init__.py``). This has the following components:
-
-* **Name of the environment:** This should ideally be in the format :const:`Isaac-\<EnvironmentName\>-\<Robot\>-\<Version\>`.
-  However, this is not a strict requirement and you can use any name you want.
-* **Entry point:** This is the import path of the environment class. This is used to instantiate the environment.
-* **Config entry point:** This is the import path of the environment configuration file. This is used to instantiate the environment configuration.
-  The configuration file can be either a YAML file or a Python dataclass.
-
-As examples of this in the ``omni.isaac.orbit_tasks`` package, we have the following:
-
-.. dropdown:: :fa:`eye,mr-1` Registering an environment with a YAML configuration file
-
-   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/__init__.py
-      :language: python
-      :lines: 52-56
-      :linenos:
-      :lineno-start: 52
-
-.. dropdown:: :fa:`eye,mr-1` Registering an environment with a Python dataclass configuration file
-
-   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/__init__.py
-      :language: python
-      :lines: 84-88
-      :linenos:
-      :lineno-start: 84
-
-
-The Code Execution
-~~~~~~~~~~~~~~~~~~
-
-Now that we have gone through the code, let's run the environment. All environments registered in the ``omni.isaac.orbit_tasks``
-and ``omni.isaac.contrib_tasks`` packages are automatically available in the included standalone environments and workflows scripts.
-
-As an example, to run the Cartpole RL environment, you can use the following command.
-
-.. code-block:: bash
-
-   ./orbit.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32
-
-
-This should open a stage with a ground plane, lights, and 32 Cartpole agents initialized at different
-random configurations. The simulation should be playing with each Cartpole moving randomly.
-To stop the simulation, you can either close the window, or press the ``STOP`` button in the UI, or press ``Ctrl+C`` in the terminal
-where you started the simulation.
--- a/docs/source/tutorials/03_envs/base_env.rst
+++ b/docs/source/tutorials/03_envs/base_env.rst
+.. _tutorial-create-base-env:
+
+
+Creating a Base Environment
+===========================
+
+.. currentmodule:: omni.isaac.orbit
+
+Environments bring together different aspects of the simulation such as
+the scene, observation and action spaces, randomizations, etc. to create a
+coherent interface for various applications. In Orbit, environments are
+implemented as :class:`envs.BaseEnv` and :class:`envs.RLTaskEnv` classes.
+The two classes are very similar, but :class:`envs.RLTaskEnv` is tailored to
+reinforcement learning tasks and additionally contains rewards, terminations,
+curriculum and command generation. The :class:`envs.BaseEnv` class is useful for
+traditional robot control and doesn't contain rewards and terminations.
+
+In this tutorial, we will look at the base class :class:`envs.BaseEnv` and its
+corresponding configuration class :class:`envs.BaseEnvCfg`. We will use the
+cartpole environment from earlier to illustrate the different components
+in creating a new :class:`envs.BaseEnv` environment.
+
+
+The Code
+~~~~~~~~
+
+The tutorial corresponds to the ``cartpole_base_env.py`` script in the ``orbit/source/standalone/tutorials/04_envs``
+directory.
+
+.. dropdown:: Code for cartpole_base_env.py
+   :icon: code
+
+   .. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
+      :language: python
+      :emphasize-lines: 51-55, 58-75, 78-111, 114-133, 138-142, 147, 151, 156-157, 163-164
+      :linenos:
+
+
+The Code Explained
+~~~~~~~~~~~~~~~~~~
+
+The base class :class:`envs.BaseEnv` wraps around many intricacies of the simulation interaction
+and provides a simple interface for the user to run the simulation and interact with it. It
+is composed of the following components:
+
+* :class:`scene.InteractiveScene` - The scene that is used for the simulation.
+* :class:`managers.ActionManager` - The manager that handles actions.
+* :class:`managers.ObservationManager` - The manager that handles observations.
+* :class:`managers.RandomizationManager` - The manager that handles randomizations.
+
+By configuring these components, the user can create different variations of the same environment
+with minimal effort. In this tutorial, we will go through the different components of the
+:class:`envs.BaseEnv` class and how to configure them to create a new environment.
+
+Designing the scene
+-------------------
+
+The first step in creating a new environment is to configure its scene. For the cartpole
+environment, we will be using the scene from the previous tutorial. Thus, we omit the
+scene configuration here. For more details on how to configure a scene, see
+:ref:`tutorial-interactive-scene`.
+
+Defining actions
+----------------
+
+In the previous tutorial, we directly input the action to the cartpole using
+the :meth:`assets.Articulation.set_joint_effort_target` method. In this tutorial, we will
+use the :class:`managers.ActionManager` to handle the actions.
+
+The action manager can comprise multiple :class:`managers.ActionTerm` instances. Each action term
+is responsible for applying *control* over a specific aspect of the environment. For instance,
+for a robotic arm, we can have two action terms: one for controlling the joints of the arm,
+and the other for controlling the gripper. This composition allows the user to define
+different control schemes for different aspects of the environment.
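+
+To make the composition concrete, here is a plain-Python sketch of how an action manager could split
+one flat action vector across its terms. It does not use Orbit's API: ``HypotheticalActionTerm`` and
+``HypotheticalActionManager`` are illustrative stand-ins, and we assume that each term consumes a
+contiguous slice of the action vector in the order the terms are declared.
+
+.. code-block:: python
+
+   from dataclasses import dataclass
+
+
+   @dataclass
+   class HypotheticalActionTerm:
+       """Illustrative stand-in for an action term (not Orbit's ActionTerm)."""
+
+       name: str
+       action_dim: int
+       scale: float = 1.0
+
+       def process(self, actions):
+           # Scale the raw actions before they would be applied to the simulation.
+           return [self.scale * a for a in actions]
+
+
+   class HypotheticalActionManager:
+       """Splits one flat action vector across its terms, in declaration order."""
+
+       def __init__(self, terms):
+           self.terms = terms
+
+       @property
+       def total_action_dim(self):
+           return sum(term.action_dim for term in self.terms)
+
+       def process_action(self, actions):
+           if len(actions) != self.total_action_dim:
+               raise ValueError(f"expected {self.total_action_dim} actions, got {len(actions)}")
+           processed, start = {}, 0
+           for term in self.terms:
+               end = start + term.action_dim
+               processed[term.name] = term.process(actions[start:end])
+               start = end
+           return processed
+
+
+   # A 7-joint arm and a 1-DoF gripper, each controlled by its own (hypothetical) term.
+   manager = HypotheticalActionManager([
+       HypotheticalActionTerm("arm_joints", action_dim=7, scale=0.5),
+       HypotheticalActionTerm("gripper", action_dim=1),
+   ])
+   processed = manager.process_action([1.0] * 7 + [-1.0])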
+
+In the cartpole environment, we want to control the force applied to the cart to balance the pole.
+Thus, we will create an action term that controls the force applied to the cart.
+
+.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
+   :language: python
+   :pyobject: ActionsCfg
+   :linenos:
+   :lineno-start: 51
+
+Defining observations
+---------------------
+
+While the scene defines the state of the environment, the observations define the states
+that are observable by the agent. These observations are used by the agent to make decisions
+on what actions to take. In Orbit, the observations are computed by the
+:class:`managers.ObservationManager` class.
+
+Similar to the action manager, the observation manager can comprise multiple observation terms.
+These are further grouped into observation groups which are used to define different observation
+spaces for the environment. For instance, for hierarchical control, we may want to define
+two observation groups: one for the low-level controller and the other for the high-level
+controller. It is assumed that all the observation terms in a group have the same dimensions,
+so that they can be concatenated into a single tensor.
+
+For this tutorial, we will only define one observation group named ``"policy"``. While the name
+is not strictly enforced, this group is required by various wrappers in Orbit.
+We define a group by inheriting from the :class:`managers.ObservationGroupCfg` class. This class
+collects different observation terms and helps define common properties for the group, such
+as enabling noise corruption or concatenating the observations into a single tensor.
+
+The individual terms are defined using the :class:`managers.ObservationTermCfg` class.
+This class takes in the :attr:`managers.ObservationTermCfg.func` that specifies the function or
+callable class that computes the observation for that term. It includes other parameters for
+defining the noise model, clipping, scaling, etc. However, we leave these parameters to their
+default values for this tutorial.
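+
+The grouping can be illustrated with a small plain-Python sketch. Here, each group maps term names to
+functions of a (hypothetical) state dictionary, and a group with concatenation enabled flattens its
+terms into one list in declaration order. The names and the dictionary-based state are assumptions for
+illustration and are not Orbit's :class:`managers.ObservationManager` implementation.
+
+.. code-block:: python
+
+   def compute_group(term_funcs, state, concatenate=True):
+       """Evaluate every term of one observation group on the given state."""
+       values = {name: func(state) for name, func in term_funcs.items()}
+       if not concatenate:
+           return values
+       # Concatenate the terms in declaration order into a single flat observation.
+       flat = []
+       for value in values.values():
+           flat.extend(value)
+       return flat
+
+
+   # Hypothetical simulation state for a single environment instance.
+   state = {"joint_pos": [0.1, -0.2], "joint_vel": [0.0, 0.3], "goal": [1.0, 2.0]}
+
+   # Two groups for hierarchical control: one per controller level.
+   groups = {
+       "low_level": {
+           "joint_pos": lambda s: s["joint_pos"],
+           "joint_vel": lambda s: s["joint_vel"],
+       },
+       "high_level": {
+           "joint_pos": lambda s: s["joint_pos"],
+           "goal": lambda s: s["goal"],
+       },
+   }
+   observations = {name: compute_group(terms, state) for name, terms in groups.items()}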
+
+.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
+   :language: python
+   :pyobject: ObservationsCfg
+   :linenos:
+   :lineno-start: 58
+
+Defining randomizations
+-----------------------
+
+At this point, we have defined the scene, actions and observations for the cartpole environment.
+The general idea for all these components is to define the configuration classes and then
+pass them to the corresponding managers. The randomization manager is no different.
+
+The :class:`managers.RandomizationManager` class is responsible for randomizing everything related
+to the simulation state. This includes randomizing (or resetting) the scene, randomizing physical
+properties (such as mass, friction, etc.), and visual properties (such as colors, textures, etc.).
+Each of these is specified through the :class:`managers.RandomizationTermCfg` class, which
+takes in the :attr:`managers.RandomizationTermCfg.func` that specifies the function or callable
+class that performs the randomization. Additionally, it expects the **mode** of randomization.
+The mode specifies when the randomization term should be applied. It is possible to specify
+your own mode, but Orbit provides three modes out of the box:
+
+* ``"startup"`` - Randomize only once at environment startup.
+* ``"reset"`` - Randomize on every call to an environment's reset.
+* ``"interval"`` - Randomize at a given fixed interval.
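+
+The three modes can be sketched as a small dispatcher in plain Python. This is a hypothetical
+illustration of *when* each mode fires, not Orbit's :class:`managers.RandomizationManager`: for
+simplicity, the ``"interval"`` mode is counted in simulation steps here, while the actual manager
+may measure intervals differently.
+
+.. code-block:: python
+
+   class HypotheticalRandomizationManager:
+       """Calls randomization functions according to their mode."""
+
+       def __init__(self, terms, interval_steps):
+           self.terms = terms  # list of (mode, function) pairs
+           self.interval_steps = interval_steps
+           self.step_count = 0
+
+       def apply(self, mode):
+           for term_mode, func in self.terms:
+               if term_mode == mode:
+                   func()
+
+       def on_startup(self):
+           self.apply("startup")
+
+       def on_reset(self):
+           self.apply("reset")
+
+       def on_step(self):
+           self.step_count += 1
+           if self.step_count % self.interval_steps == 0:
+               self.apply("interval")
+
+
+   calls = {"startup": 0, "reset": 0, "interval": 0}
+
+
+   def make_counter(key):
+       # Each "randomization" simply records that it was called.
+       def _term():
+           calls[key] += 1
+
+       return _term
+
+
+   manager = HypotheticalRandomizationManager(
+       [("startup", make_counter("startup")), ("reset", make_counter("reset")),
+        ("interval", make_counter("interval"))],
+       interval_steps=4,
+   )
+   manager.on_startup()
+   for _ in range(3):  # three episodes of ten steps each
+       manager.on_reset()
+       for _ in range(10):
+           manager.on_step()
+
+After these 30 steps, the startup term has run once, the reset term three times, and the
+interval term seven times (at steps 4, 8, ..., 28).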
+
+For this example, we randomize the pole's mass on startup. This is done only once since this operation
+is expensive and we don't want to do it on every reset. We also randomize the initial joint states of
+the cart and the pole at every reset.
+
+.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
+   :language: python
+   :pyobject: RandomizationCfg
+   :linenos:
+   :lineno-start: 78
+
+Tying it all together
+---------------------
+
+Having defined the scene and manager configurations, we can now define the environment configuration
+through the :class:`envs.BaseEnvCfg` class. This class takes in the scene, action, observation and
+randomization configurations.
+
+In addition to these, it also takes in the :attr:`envs.BaseEnvCfg.sim` which defines the simulation
+parameters such as the timestep, gravity, etc. This is initialized to the default values, but can
+be modified as needed. We recommend doing so by defining the :meth:`__post_init__` method in the
+derived configuration class (here, ``CartpoleEnvCfg``), which is called after the configuration is initialized.
+
+.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
+   :language: python
+   :pyobject: CartpoleEnvCfg
+   :linenos:
+   :lineno-start: 114
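+
+The behavior of :meth:`__post_init__` can be seen in isolation with Python's standard ``dataclasses``
+module, which provides a hook of the same name. The sketch below is hypothetical: ``HypotheticalSimCfg``
+and ``HypotheticalEnvCfg`` only mimic the idea of a nested simulation configuration and a ``decimation``
+setting, and are not Orbit's configuration classes.
+
+.. code-block:: python
+
+   from dataclasses import dataclass, field
+
+
+   @dataclass
+   class HypotheticalSimCfg:
+       dt: float = 0.01
+       gravity: tuple = (0.0, 0.0, -9.81)
+
+
+   @dataclass
+   class HypotheticalEnvCfg:
+       # default_factory gives every instance its own nested simulation config.
+       sim: HypotheticalSimCfg = field(default_factory=HypotheticalSimCfg)
+       decimation: int = 1
+
+       def __post_init__(self):
+           # Runs right after the generated __init__, so defaults can be overridden here.
+           self.decimation = 2
+           self.sim.dt = 0.005
+
+
+   cfg = HypotheticalEnvCfg()
+   # In this sketch, one environment step spans `decimation` simulation steps.
+   env_step_dt = cfg.sim.dt * cfg.decimation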
+
+Running the simulation
+----------------------
+
+Lastly, we revisit the simulation execution loop. This is now much simpler since we have
+abstracted away most of the details into the environment configuration. We only need to
+call the :meth:`envs.BaseEnv.reset` method to reset the environment and :meth:`envs.BaseEnv.step`
+method to step the environment. Both methods return the observation and an info dictionary
+which may contain additional information provided by the environment. These can be used by an
+agent for decision-making.
+
+The :class:`envs.BaseEnv` class does not have any notion of terminations since that concept is
+specific to episodic tasks. Thus, the user is responsible for defining the termination condition
+for the environment. In this tutorial, we reset the simulation at regular intervals.
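+
+Stripped of the simulation, the user-defined reset logic amounts to a step counter. The following
+plain-Python sketch assumes only that ``reset()`` and ``step()`` each return an observation and an
+info dictionary, as described above; ``DummyEnv`` and the interval of 300 steps are hypothetical.
+
+.. code-block:: python
+
+   class DummyEnv:
+       """Stand-in for a base environment without any notion of termination."""
+
+       def __init__(self):
+           self.t = 0
+           self.num_resets = 0
+
+       def reset(self):
+           self.t = 0
+           self.num_resets += 1
+           return [0.0], {}
+
+       def step(self, action):
+           self.t += 1
+           return [float(self.t)], {}
+
+
+   env = DummyEnv()
+   reset_interval = 300  # hypothetical: reset every 300 steps
+   count = 0
+   while count < 1000:
+       # The user decides when to reset, since the environment never terminates on its own.
+       if count % reset_interval == 0:
+           obs, info = env.reset()
+       obs, info = env.step(action=[0.0])
+       count += 1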
+
+.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
+   :language: python
+   :pyobject: main
+   :linenos:
+   :lineno-start: 136
+
+An important thing to note above is that the entire simulation loop is wrapped inside the
+:meth:`torch.inference_mode` context manager. This is because the environment uses PyTorch
+operations under the hood, and we want to ensure that gradients are not computed for the
+simulation operations, so that the simulation is not slowed down by the overhead of PyTorch's
+autograd engine.
+
+
+The Code Execution
+~~~~~~~~~~~~~~~~~~
+
+To run the base environment made in this tutorial, you can use the following command:
+
+.. code-block:: bash
+
+   ./orbit.sh -p source/standalone/tutorials/04_envs/cartpole_base_env.py --num_envs 32
+
+
+This should open a stage with a ground plane, light source, and cartpoles. The simulation should be
+playing with random actions on the cartpole. Additionally, it opens a UI window on the bottom
+right corner of the screen named ``"Orbit"``. This window contains different UI elements that
+can be used for debugging and visualization.
+
+To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal where you
+started the simulation.
+
+In this tutorial, we learned about the different managers that help define a base environment. We
+include more examples of defining the base environment in the ``orbit/source/standalone/tutorials/04_envs``
+directory. For completeness, they can be run using the following commands:
+
+.. code-block:: bash
+
+   # Floating cube environment with custom action term for PD control
+   ./orbit.sh -p source/standalone/tutorials/04_envs/cube_base_env.py --num_envs 32
+
+   # Quadrupedal locomotion environment with a policy that interacts with the environment
+   ./orbit.sh -p source/standalone/tutorials/04_envs/quadruped_base_env.py --num_envs 32
+
+
+In the following tutorial, we will look at the :class:`envs.RLTaskEnv` class and how to use it
+to create a Markov Decision Process (MDP).
--- a/docs/source/tutorials/03_envs/gym_registry.rst
+++ b/docs/source/tutorials/03_envs/gym_registry.rst
+Registering an Environment
+==========================
+
+.. currentmodule:: omni.isaac.orbit
+
+In the previous tutorial, we learned how to create a custom cartpole environment. We manually
+created an instance of the environment by importing the environment class and its configuration
+class.
+
+.. dropdown:: Environment creation in the previous tutorial
+   :icon: code
+
+   .. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_rl_env.py
+      :language: python
+      :lines: 39-50
+
+While straightforward, this approach does not scale well to a large suite of environments.
+In this tutorial, we will show how to use the :meth:`gymnasium.register` method to register
+environments with the ``gymnasium`` registry. This allows us to create the environment through
+the :meth:`gymnasium.make` function.
+
+
+.. dropdown:: Environment creation in this tutorial
+   :icon: code
+
+   .. literalinclude:: ../../../../source/standalone/environments/random_agent.py
+      :language: python
+      :lines: 40-50
+
+
+The Code
+~~~~~~~~
+
+The tutorial corresponds to the ``random_agent.py`` script in the ``orbit/source/standalone/environments`` directory.
+
+.. dropdown:: Code for random_agent.py
+   :icon: code
+
+   .. literalinclude:: ../../../../source/standalone/environments/random_agent.py
+      :language: python
+      :emphasize-lines: 40-42, 47-50
+      :linenos:
+
+
+The Code Explained
+~~~~~~~~~~~~~~~~~~
+
+The :class:`envs.RLTaskEnv` class inherits from the :class:`gymnasium.Env` class to follow
+a standard interface. However, unlike the traditional Gym environments, the :class:`envs.RLTaskEnv`
+implements a *vectorized* environment. This means that multiple environment instances
+are running simultaneously in the same process, and all the data is returned in a batched
+fashion.
+
+Using the gym registry
+----------------------
+
+To register an environment, we use the :meth:`gymnasium.register` method. This method takes
+in the environment name, the entry point to the environment class, and the entry point to the
+environment configuration class. For the cartpole environment, the following shows the registration
+call in the ``omni.isaac.orbit_tasks.classic.cartpole`` sub-package:
+
+.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/__init__.py
+   :language: python
+   :lines: 10-
+   :emphasize-lines: 11, 12, 15
+
+The ``id`` argument is the name of the environment. As a convention, we name all the environments
+with the prefix ``Isaac-`` to make it easier to search for them in the registry. The name of the
+environment is typically followed by the name of the task, and then the name of the robot.
+For instance, for legged locomotion with ANYmal C on flat terrain, the environment is called
+``Isaac-Velocity-Flat-Anymal-C-v0``. The version suffix ``v<N>`` is typically used to distinguish different
+variations of the same environment, since spelling out every variation in the name would make the
+names too long and difficult to read.
+
+The ``entry_point`` argument is the entry point to the environment class. The entry point is a string
+of the form ``<module>:<class>``. In the case of the cartpole environment, the entry point is
+``omni.isaac.orbit.envs:RLTaskEnv``. The entry point is used to import the environment class
+when creating the environment instance.
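+
+To illustrate how such a string can be turned into a class, here is a plain-Python sketch based on
+:func:`importlib.import_module`. It illustrates the idea only and is not the code ``gymnasium`` uses
+internally; the helper name ``load_entry_point`` is hypothetical.
+
+.. code-block:: python
+
+   import importlib
+
+
+   def load_entry_point(entry_point: str):
+       """Resolve a ``"<module>:<attribute>"`` string to the object it names."""
+       module_name, sep, attr_path = entry_point.partition(":")
+       if not sep or not module_name or not attr_path:
+           raise ValueError(f"expected '<module>:<attribute>', got {entry_point!r}")
+       obj = importlib.import_module(module_name)
+       # Follow dotted attribute paths such as "Outer.Inner".
+       for attr in attr_path.split("."):
+           obj = getattr(obj, attr)
+       return obj
+
+
+   # With Orbit installed, "omni.isaac.orbit.envs:RLTaskEnv" would be resolved the same way;
+   # a standard-library class keeps this sketch runnable anywhere.
+   ordered_dict_cls = load_entry_point("collections:OrderedDict")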
+
+The ``env_cfg_entry_point`` argument specifies the default configuration for the environment. The default
+configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_env_cfg` function.
+It is then passed to the :meth:`gymnasium.make` function to create the environment instance.
+The configuration entry point can be either a YAML file or a Python configuration class.
+
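+.. note::
+    The ``gymnasium`` registry is a global registry. Hence, it is important to ensure that the
+    environment names are unique. Otherwise, registering a second environment under an existing
+    name overrides the earlier entry (``gymnasium`` only emits a warning), which can lead to the
+    wrong environment being created.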
+
+Creating the environment
+------------------------
+
+To populate the ``gym`` registry with all the environments provided by the ``omni.isaac.orbit_tasks``
+extension, we must import the module at the start of the script. This will execute the ``__init__.py``
+file which iterates over all the sub-packages and registers their respective environments.
+
+.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
+   :language: python
+   :lines: 40-41
+   :linenos:
+   :lineno-start: 40
+
+In this tutorial, the task name is read from the command line. The task name is used to parse
+the default configuration as well as to create the environment instance. In addition, other
+parsed command line arguments such as the number of environments, the simulation device,
+and whether to render, are used to override the default configuration.
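+
+The override pattern itself can be sketched with :mod:`argparse` and :func:`dataclasses.replace`.
+The sketch below is hypothetical: ``HypotheticalTaskCfg`` and its default values stand in for the
+registered default configuration, and only the ``--task``, ``--num_envs`` and ``--cpu`` flags used
+in this tutorial are modeled.
+
+.. code-block:: python
+
+   import argparse
+   from dataclasses import dataclass, replace
+
+
+   @dataclass(frozen=True)
+   class HypotheticalTaskCfg:
+       num_envs: int = 4096
+       device: str = "cuda:0"
+
+
+   def parse_cfg(argv):
+       parser = argparse.ArgumentParser()
+       parser.add_argument("--task", type=str, required=True)
+       parser.add_argument("--num_envs", type=int, default=None)
+       parser.add_argument("--cpu", action="store_true")
+       args = parser.parse_args(argv)
+
+       cfg = HypotheticalTaskCfg()  # stand-in for the task's default configuration
+       # Only arguments that were actually given override the defaults.
+       if args.num_envs is not None:
+           cfg = replace(cfg, num_envs=args.num_envs)
+       if args.cpu:
+           cfg = replace(cfg, device="cpu")
+       return args.task, cfg
+
+
+   task, cfg = parse_cfg(["--task", "Isaac-Cartpole-v0", "--num_envs", "32"])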
+
+.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
+   :language: python
+   :lines: 47-50
+   :linenos:
+   :lineno-start: 47
+
+Once the environment is created, the rest of the execution follows the standard resetting and stepping.
+
+
+The Code Execution
+~~~~~~~~~~~~~~~~~~
+
+Now that we have gone through the code, let's run the script and see the result:
+
+.. code-block:: bash
+
+   ./orbit.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32
+
+
+This should open a stage with everything similar to the previous :ref:`tutorial-create-rl-env` tutorial.
+To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal.
+
+In addition, you can also change the simulation device from GPU to CPU by adding the ``--cpu`` flag:
+
+.. code-block:: bash
+
+   ./orbit.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32 --cpu
+
+With the ``--cpu`` flag, the simulation will run on the CPU. This is useful for debugging the simulation.
+However, the simulation will run much slower than on the GPU.
--- a/docs/source/tutorials/03_envs/rl_env.rst
+++ b/docs/source/tutorials/03_envs/rl_env.rst
+.. _tutorial-create-rl-env:
+
+
+Creating an RL Environment
+==========================
+
+.. currentmodule:: omni.isaac.orbit
+
+Having learnt how to create a base environment in :ref:`tutorial-create-base-env`, we will now look at how to create a
+task environment for reinforcement learning.
+
+The base environment is designed as a sense-act environment where the agent can send commands to the environment
+and receive observations from the environment. This minimal interface is sufficient for many applications such as
+traditional motion planning and controls. However, many applications require a task specification which often
+serves as the learning objective for the agent. For instance, in a navigation task, the agent may be required to
+reach a goal location. To this end, we use the :class:`envs.RLTaskEnv` class which extends the base environment
+to include a task specification.
+
+Similar to other components in Orbit, instead of directly modifying the base class :class:`envs.RLTaskEnv`, we
+encourage users to simply implement a configuration :class:`envs.RLTaskEnvCfg` for their task environment.
+This practice allows us to separate the task specification from the environment implementation, making it easier
+to reuse components of the same environment for different tasks.
+
+In this tutorial, we will configure the cartpole environment using the :class:`envs.RLTaskEnvCfg` to create a task
+for balancing the pole upright. We will learn how to specify the task using reward terms, termination criteria,
+curriculum and commands.
+
+
+The Code
+~~~~~~~~
+
+For this tutorial, we use the cartpole environment defined in the ``omni.isaac.orbit_tasks.classic.cartpole`` module.
+
+.. dropdown:: Code for cartpole_env_cfg.py
+   :icon: code
+
+   .. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
+      :language: python
+      :emphasize-lines: 63-68, 124-149, 152-162, 165-169, 187-192
+      :linenos:
+
+The script for running the environment ``cartpole_rl_env.py`` is present in the
+``orbit/source/standalone/tutorials/04_envs`` directory. The script is similar to the
+``cartpole_base_env.py`` script in the previous tutorial, except that it uses the
+:class:`envs.RLTaskEnv` instead of the :class:`envs.BaseEnv`.
+
+.. dropdown:: Code for cartpole_rl_env.py
+   :icon: code
+
+   .. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_rl_env.py
+      :language: python
+      :emphasize-lines: 46-50, 64-65
+      :linenos:
+
+
+The Code Explained
+~~~~~~~~~~~~~~~~~~
+
+We already went through parts of the above in the :ref:`tutorial-create-base-env` tutorial to learn
+how to specify the scene, observations, actions and randomizations. Thus, in this tutorial, we
+will focus only on the RL components of the environment.
+
+In Orbit, we provide various implementations of different terms in the :mod:`envs.mdp` module. We will use
+some of these terms in this tutorial, but users are free to define their own terms as well. These
+are usually placed in their task-specific sub-package
+(for instance, in :mod:`omni.isaac.orbit_tasks.classic.cartpole.mdp`).
+
+
+Defining rewards
+----------------
+
+The :class:`managers.RewardManager` is used to compute the reward terms for the agent. Similar to the other
+managers, its terms are configured using the :class:`managers.RewardTermCfg` class. The
+:class:`managers.RewardTermCfg` class specifies the function or callable class that computes the reward
+as well as the weighting associated with it. It also takes in a dictionary of arguments, ``params``,
+that are passed to the reward function when it is called.
+
+For the cartpole task, we will use the following reward terms:
+
+* **Alive Reward**: Encourage the agent to stay alive for as long as possible.
+* **Terminating Reward**: Penalize the agent for terminating.
+* **Pole Angle Reward**: Encourage the agent to keep the pole at the desired upright position.
+* **Cart Velocity Reward**: Encourage the agent to keep the cart velocity as small as possible.
+* **Pole Velocity Reward**: Encourage the agent to keep the pole velocity as small as possible.
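+
+Conceptually, the total reward is a weighted sum of the individual terms, each called with its
+``params``. The plain-Python sketch below shows this for three of the terms; the function names,
+weights and the dictionary-based state are hypothetical and are not the values used in the cartpole
+configuration. The actual manager may also apply further scaling (for instance, by the time step).
+
+.. code-block:: python
+
+   from dataclasses import dataclass, field
+   from typing import Callable
+
+
+   @dataclass
+   class HypotheticalRewardTerm:
+       func: Callable[..., float]
+       weight: float
+       params: dict = field(default_factory=dict)
+
+
+   def is_alive(state):
+       return 0.0 if state["terminated"] else 1.0
+
+
+   def is_terminated(state):
+       return 1.0 if state["terminated"] else 0.0
+
+
+   def squared_deviation(state, target):
+       return (state["pole_angle"] - target) ** 2
+
+
+   def compute_reward(terms, state):
+       """Return the total reward and each term's weighted contribution."""
+       contributions = {
+           name: term.weight * term.func(state, **term.params) for name, term in terms.items()
+       }
+       return sum(contributions.values()), contributions
+
+
+   terms = {
+       "alive": HypotheticalRewardTerm(is_alive, weight=1.0),
+       "terminating": HypotheticalRewardTerm(is_terminated, weight=-2.0),
+       "pole_pos": HypotheticalRewardTerm(squared_deviation, weight=-1.0, params={"target": 0.0}),
+   }
+   total, contributions = compute_reward(terms, {"terminated": False, "pole_angle": 0.5})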
+
+.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
+   :language: python
+   :pyobject: RewardsCfg
+   :linenos:
+   :lineno-start: 124
+
+Defining termination criteria
+-----------------------------
+
+Most learning tasks happen over a finite number of steps that we call an episode. For instance, in the cartpole
+task, we want the agent to balance the pole for as long as possible. However, if the agent reaches an unstable
+or unsafe state, we want to terminate the episode. On the other hand, if the agent is able to balance the pole
+for a long time, we want to terminate the episode and start a new one so that the agent can learn to balance the
+pole from a different starting configuration.
+
+The ``TerminationsCfg`` class configures what constitutes an episode termination. In this example,
+we want the task to terminate when either of the following conditions is met:
+
+* **Episode Length**: The episode length is greater than the defined ``max_episode_length``.
+* **Cart out of bounds**: The cart goes outside of the bounds [-3, 3].
+
+The flag :attr:`managers.TerminationTermCfg.time_out` specifies whether the term is a time-out (truncation) term
+or a terminated term. These are used to indicate the two types of terminations as described in `Gymnasium's documentation
+<https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/>`_.
+
+.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
+   :language: python
+   :pyobject: TerminationsCfg
+   :linenos:
+   :lineno-start: 152
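+
+The distinction can be sketched in plain Python: each term reports whether its condition is met, and
+its ``time_out`` flag decides whether that counts as truncation or termination. The class and lambda
+conditions below are hypothetical stand-ins that mirror the two criteria listed above.
+
+.. code-block:: python
+
+   from dataclasses import dataclass
+   from typing import Callable
+
+
+   @dataclass
+   class HypotheticalTerminationTerm:
+       func: Callable[[dict], bool]
+       time_out: bool = False
+
+
+   def compute_dones(terms, state):
+       """Return ``(terminated, truncated)`` for a single environment instance."""
+       terminated = truncated = False
+       for term in terms.values():
+           if term.func(state):
+               if term.time_out:
+                   truncated = True
+               else:
+                   terminated = True
+       return terminated, truncated
+
+
+   terms = {
+       "time_out": HypotheticalTerminationTerm(
+           lambda s: s["episode_length"] >= s["max_episode_length"], time_out=True
+       ),
+       "cart_out_of_bounds": HypotheticalTerminationTerm(lambda s: abs(s["cart_pos"]) > 3.0),
+   }
+
+Keeping the two flags apart matters because, as the linked Gymnasium page explains, a truncated
+episode should still be bootstrapped from the value of its last state, while a terminated one should not.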
+
+Defining commands
+-----------------
+
+For various goal-conditioned tasks, it is useful to specify the goals or commands for the agent. These are
+handled through the :class:`managers.CommandManager`. The command manager handles resampling and updating the
+commands at each step. It can also be used to provide the commands as an observation to the agent.
+
+For this simple task, we do not use any commands. This is specified by using a command term with the
+:class:`envs.mdp.NullCommandCfg` configuration. However, you can see an example of command definitions in the
+locomotion or manipulation tasks.
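+
+As a rough illustration of what resampling means, the following plain-Python sketch holds a scalar
+command (for instance, a hypothetical target velocity) fixed and draws a new one at a fixed step
+interval. The class name, the uniform sampling range and the step-based interval are assumptions
+made for illustration; Orbit's command terms may be configured differently.
+
+.. code-block:: python
+
+   import random
+
+
+   class HypotheticalCommandTerm:
+       """Holds a command and resamples it every ``resample_every`` steps."""
+
+       def __init__(self, resample_every, low, high, seed=0):
+           self.resample_every = resample_every
+           self.low, self.high = low, high
+           self.rng = random.Random(seed)  # seeded for reproducibility
+           self.steps = 0
+           self.command = self._sample()
+
+       def _sample(self):
+           return self.rng.uniform(self.low, self.high)
+
+       def step(self):
+           self.steps += 1
+           if self.steps % self.resample_every == 0:
+               self.command = self._sample()
+           return self.command
+
+
+   term = HypotheticalCommandTerm(resample_every=5, low=-1.0, high=1.0)
+   initial_command = term.command
+   history = [term.step() for _ in range(10)]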
+
+.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
+   :language: python
+   :pyobject: CommandsCfg
+   :linenos:
+   :lineno-start: 63
+
+Defining curriculum
+-------------------
+
+Often, when training a learning agent, it helps to start with a simple task and gradually increase the
+task's difficulty as the agent training progresses. This is the idea behind curriculum learning. In Orbit,
+we provide a :class:`managers.CurriculumManager` class that can be used to define a curriculum for your environment.
+
+In this tutorial we don't implement a curriculum for simplicity, but you can see an example of a
+curriculum definition in the other locomotion or manipulation tasks.
+We use a simple pass-through curriculum to define a curriculum manager that does not modify the environment.
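+
+For intuition, a curriculum term can be as simple as a rule that raises a difficulty level once the
+agent succeeds often enough. The sketch below is a hypothetical plain-Python illustration (threshold,
+level cap and function names are assumptions), next to the pass-through variant that never changes
+the level.
+
+.. code-block:: python
+
+   def pass_through(level, success_rate):
+       """Curriculum that leaves the task unchanged."""
+       return level
+
+
+   def success_based(level, success_rate, threshold=0.8, max_level=5):
+       """Raise the difficulty by one level once the success rate reaches the threshold."""
+       if success_rate >= threshold and level < max_level:
+           return level + 1
+       return level
+
+
+   level = 0
+   for success_rate in [0.5, 0.85, 0.9, 0.6, 0.95]:  # hypothetical per-iteration success rates
+       level = success_based(level, success_rate)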
+
+.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
+   :language: python
+   :pyobject: CurriculumCfg
+   :linenos:
+   :lineno-start: 165
+
+Tying it all together
+---------------------
+
+With all the above components defined, we can now create the :class:`envs.RLTaskEnvCfg` configuration for the
+cartpole environment. This is similar to the :class:`envs.BaseEnvCfg` defined in :ref:`tutorial-create-base-env`,
+only with the added RL components explained in the above sections.
+
+.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
+   :language: python
+   :pyobject: CartpoleEnvCfg
+   :linenos:
+   :lineno-start: 177
+
+Running the simulation loop
+---------------------------
+
+Coming back to the ``cartpole_rl_env.py`` script, the simulation loop is similar to the previous tutorial.
+The only difference is that we create an instance of :class:`envs.RLTaskEnv` instead of the
+:class:`envs.BaseEnv`. Consequently, the :meth:`envs.RLTaskEnv.step` method now returns additional signals
+such as the reward and termination status. The information dictionary also maintains logging of quantities
+such as the reward contribution from individual terms, the termination status of each term, the episode length, etc.
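+
+The batched, auto-resetting behavior can be mimicked with a small plain-Python stand-in. The class
+below is hypothetical: it returns the five values of the Gymnasium ``step`` API (observation, reward,
+terminated, truncated, extras), resets each instance independently once it times out, and logs a
+made-up ``"Episode/mean_length"`` entry. Orbit's actual logging keys may differ.
+
+.. code-block:: python
+
+   class HypotheticalVecEnv:
+       """Toy vectorized environment whose instances time out independently."""
+
+       def __init__(self, num_envs, max_episode_length):
+           self.num_envs = num_envs
+           self.max_episode_length = max_episode_length
+           # Staggered start so that the instances do not all reset at the same step.
+           self.episode_length = list(range(num_envs))
+
+       def step(self, actions):
+           self.episode_length = [n + 1 for n in self.episode_length]
+           rewards = [1.0] * self.num_envs
+           terminated = [False] * self.num_envs
+           truncated = [n >= self.max_episode_length for n in self.episode_length]
+           extras = {"log": {}}
+           done_ids = [i for i in range(self.num_envs) if terminated[i] or truncated[i]]
+           if done_ids:
+               lengths = [self.episode_length[i] for i in done_ids]
+               extras["log"]["Episode/mean_length"] = sum(lengths) / len(lengths)
+           # Only the finished instances are reset; the others keep running.
+           for i in done_ids:
+               self.episode_length[i] = 0
+           obs = [[float(n)] for n in self.episode_length]
+           return obs, rewards, terminated, truncated, extras
+
+
+   env = HypotheticalVecEnv(num_envs=3, max_episode_length=5)
+   for _ in range(3):
+       obs, rewards, terminated, truncated, extras = env.step([0.0] * env.num_envs)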
+
+.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_rl_env.py
+   :language: python
+   :pyobject: main
+   :linenos:
+   :lineno-start: 44
+
+
+The Code Execution
+~~~~~~~~~~~~~~~~~~
+
+Similar to the previous tutorial, we can run the environment by executing the ``cartpole_rl_env.py`` script.
+
+.. code-block:: bash
+
+   ./orbit.sh -p source/standalone/tutorials/04_envs/cartpole_rl_env.py --num_envs 32
+
+
+This should open a similar simulation as in the previous tutorial. However, this time, the environment
+returns more signals that specify the reward and termination status. Additionally, the individual
+environments reset themselves when they terminate based on the termination criteria specified in the
+configuration.
+
+To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal
+where you started the simulation.
+
+In this tutorial, we learnt how to create a task environment for reinforcement learning. We do this
+by extending the base environment to include the rewards, terminations, commands and curriculum terms.
+We also learnt how to use the :class:`envs.RLTaskEnv` class to run the environment and receive various
+signals from it.
+
+While it is possible to manually create an instance of the :class:`envs.RLTaskEnv` class for a desired task,
+this is not scalable as it requires specialized scripts for each task. Thus, we exploit the
+:meth:`gymnasium.make` function to create the environment with the gym interface. We will learn how to do this
+in the next tutorial.
--- a/docs/source/tutorials/03_envs/rl_training.rst
+++ b/docs/source/tutorials/03_envs/rl_training.rst
+Training with an RL Agent
+=========================
+
+.. currentmodule:: omni.isaac.orbit
+
+In the previous tutorials, we covered how to define an RL task environment, register
+it into the ``gym`` registry, and interact with it using a random agent. We now move
+on to the next step: training an RL agent to solve the task.
+
+Although the :class:`envs.RLTaskEnv` conforms to the :class:`gymnasium.Env` interface,
+it is not exactly a ``gym`` environment. The inputs and outputs of the environment are
+not NumPy arrays, but rather torch tensors with the first dimension being the
+number of environment instances.
+
+Additionally, most RL libraries expect their own variation of an environment interface.
+For example, `Stable-Baselines3`_ expects the environment to conform to its
+`VecEnv API`_, which expects batched NumPy arrays (plus a list of per-environment info
+dictionaries) instead of torch tensors. Similarly, `RSL-RL`_ and `RL-Games`_ expect a
+different interface. Since there is no one-size-fits-all solution, we do not base the
+:class:`envs.RLTaskEnv` on any particular learning library. Instead, we implement wrappers
+to convert the environment into the expected interface. These are specified in the
+:mod:`omni.isaac.orbit_tasks.utils.wrappers` module.
+
+In this tutorial, we will use `Stable-Baselines3`_ to train an RL agent to solve the
+cartpole balancing task.
+
+.. caution::
+
+  Wrapping the environment with the respective learning framework's wrapper should happen last,
+  i.e. after all other wrappers have been applied. This is because the learning framework's wrapper
+  changes the environment's API, which may then no longer be compatible with :class:`gymnasium.Env`.
+
+The Code
+--------
+
+For this tutorial, we use the training script from the `Stable-Baselines3`_ workflow in the
+``orbit/source/standalone/workflows/sb3`` directory.
+
+.. dropdown:: Code for train.py
+    :icon: code
+
+    .. literalinclude:: ../../../../source/standalone/workflows/sb3/train.py
+      :language: python
+      :emphasize-lines: 61, 69, 74-76, 96-110, 125-126, 114-123
+      :linenos:
+
+The Code Explained
+------------------
+
+.. currentmodule:: omni.isaac.orbit_tasks.utils
+
+Most of the code above is boilerplate code to create logging directories, save the parsed configurations,
+and setting up different Stable-Baselines3 components. For this tutorial, the important part is creating
+the environment and wrapping it with the Stable-Baselines3 wrapper.
+
+There are three wrappers used in the code above:
+
+1. :class:`gymnasium.wrappers.RecordVideo`: This wrapper records a video of the environment
+   and saves it to the specified directory. This is useful for visualizing the agent's behavior
+   during training.
+2. :class:`wrappers.sb3.Sb3VecEnvWrapper`: This wrapper converts the environment
+   into a Stable-Baselines3 compatible environment.
+3. `stable_baselines3.common.vec_env.VecNormalize`_: This wrapper normalizes the
+   environment's observations and rewards.
+
+Each of these wrappers wraps around the previous wrapper by following ``env = wrapper(env, *args, **kwargs)``
+repeatedly. The final environment is then used to train the agent. For more information on how these
+wrappers work, please refer to the :ref:`how-to-env-wrappers` documentation.
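+
+The nesting can be pictured with a plain-Python sketch. ``ToyEnv``, ``Wrapper``, ``RecordWrapper`` and
+``ScaleRewardWrapper`` are hypothetical classes, not the ``gymnasium`` or Stable-Baselines3 wrappers
+used above; they only show that each wrapper forwards calls to the object it wraps, so the order of
+wrapping determines what each layer sees.
+
+.. code-block:: python
+
+   class ToyEnv:
+       def step(self, action):
+           return [float(action)], 1.0  # (observation, reward)
+
+
+   class Wrapper:
+       """Forwards every call to the wrapped environment."""
+
+       def __init__(self, env):
+           self.env = env
+
+       def step(self, action):
+           return self.env.step(action)
+
+       @property
+       def unwrapped(self):
+           # Walk down the chain to the innermost environment.
+           return self.env.unwrapped if isinstance(self.env, Wrapper) else self.env
+
+
+   class RecordWrapper(Wrapper):
+       def __init__(self, env):
+           super().__init__(env)
+           self.history = []
+
+       def step(self, action):
+           result = self.env.step(action)
+           self.history.append(result)
+           return result
+
+
+   class ScaleRewardWrapper(Wrapper):
+       def __init__(self, env, scale):
+           super().__init__(env)
+           self.scale = scale
+
+       def step(self, action):
+           obs, reward = self.env.step(action)
+           return obs, reward * self.scale
+
+
+   # env = wrapper(env, *args, **kwargs), applied repeatedly.
+   env = ToyEnv()
+   env = RecordWrapper(env)
+   env = ScaleRewardWrapper(env, scale=0.5)
+   obs, reward = env.step(2)
+
+Because ``RecordWrapper`` sits inside ``ScaleRewardWrapper``, it records the unscaled reward of ``1.0``
+while the caller receives ``0.5``. The same reasoning explains why the learning framework's wrapper,
+which changes the environment's interface, is applied last.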
+
+The Code Execution
+------------------
+
+We train a PPO agent from Stable-Baselines3 to solve the cartpole balancing task.
+
+Training the agent
+~~~~~~~~~~~~~~~~~~
+
+There are three main ways to train the agent. Each of them has its own advantages and disadvantages.
+It is up to you to decide which one you prefer based on your use case.
+
+Headless execution
+""""""""""""""""""
+
+If the ``--headless`` flag is set, the simulation is not rendered during training. This is useful
+when training on a remote server or when you do not need to see the simulation. It typically speeds
+up training since only the physics simulation step is performed.
+
+.. code-block:: bash
+
+  ./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 --headless
+
+
+Headless execution with off-screen render
+"""""""""""""""""""""""""""""""""""""""""
+
+Since the above command does not render the simulation, it is not possible to visualize the agent's
+behavior during training. To visualize it, we pass the ``--offscreen_render`` flag, which enables
+off-screen rendering. Additionally, we pass the ``--video`` flag, which records a video of the
+agent's behavior during training.
+
+.. code-block:: bash
+
+  ./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 --headless --offscreen_render --video
+
+The videos are saved to the ``logs/sb3/Isaac-Cartpole-v0/<run-dir>/videos`` directory. You can open these videos
+using any video player.
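The following sketch summarizes how these flags relate to each other. It is a hypothetical, stand-alone stand-in for the argument parsing in ``train.py``: in the actual script, ``--headless`` and ``--offscreen_render`` are presumably registered by ``AppLauncher.add_app_launcher_args``, while ``--video`` decides whether the environment is created with ``render_mode="rgb_array"`` so that frames are available for recording. The helper names ``build_parser``, ``render_mode_from_args`` and ``video_dir`` are illustrative only.

```python
from __future__ import annotations

import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser with the flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Sketch of the SB3 training flags.")
    parser.add_argument("--task", type=str, default=None, help="Name of the task.")
    parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
    # in train.py these two are assumed to come from AppLauncher.add_app_launcher_args()
    parser.add_argument("--headless", action="store_true", help="Run without rendering the simulation.")
    parser.add_argument("--offscreen_render", action="store_true", help="Enable off-screen rendering.")
    parser.add_argument("--video", action="store_true", help="Record videos during training.")
    return parser


def render_mode_from_args(args: argparse.Namespace) -> str | None:
    # frames are only requested from the environment when a video is recorded
    return "rgb_array" if args.video else None


def video_dir(task: str, run_dir: str) -> str:
    # documented layout: logs/sb3/<task>/<run-dir>/videos
    return os.path.join("logs", "sb3", task, run_dir, "videos")


args = build_parser().parse_args(
    ["--task", "Isaac-Cartpole-v0", "--num_envs", "64", "--headless", "--offscreen_render", "--video"]
)
print(render_mode_from_args(args))  # rgb_array
```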
+
+Interactive execution
+"""""""""""""""""""""
+
+.. currentmodule:: omni.isaac.orbit
+
+While the above two methods are useful for training the agent, they do not let you interact with the
+simulation to see what is happening. In this case, you can omit the ``--headless`` flag and run the
+training script as follows:
+
+.. code-block:: bash
+
+  ./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64
+
+This opens the Isaac Sim window, where you can watch the agent train in the environment. However,
+rendering the simulation on screen slows down training. As a workaround, you can switch between
+different render modes in the ``"Orbit"`` window that is docked in the bottom-right corner of the
+screen. To learn more about these render modes, please check the
+:class:`sim.SimulationContext.RenderMode` class.
+
+Viewing the logs
+~~~~~~~~~~~~~~~~
+
+On a separate terminal, you can monitor the training progress by executing the following command:
+
+.. code:: bash
+
+   # execute from the root directory of the repository
+   ./orbit.sh -p -m tensorboard.main --logdir logs/sb3/Isaac-Cartpole-v0
+
+Playing the trained agent
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Once the training is complete, you can visualize the trained agent by executing the following command:
+
+.. code:: bash
+
+   # execute from the root directory of the repository
+   ./orbit.sh -p source/standalone/workflows/sb3/play.py --task Isaac-Cartpole-v0 --num_envs 32
+
+By default, the above command loads the latest checkpoint from the ``logs/sb3/Isaac-Cartpole-v0``
+directory. You can also load a particular checkpoint by passing the ``--checkpoint`` flag.
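As a rough illustration of what "latest checkpoint" means, the hypothetical helper below picks the most recently written ``.zip`` file (the format Stable-Baselines3 uses when saving models) inside the most recent run directory. It assumes that run directories have names that sort chronologically, such as timestamps; it is not the lookup code used by ``play.py``.

```python
from __future__ import annotations

from pathlib import Path


def find_latest_checkpoint(log_root: str | Path, pattern: str = "*.zip") -> Path:
    """Hypothetical helper: newest checkpoint file inside the newest run directory.

    Assumes run directories have chronologically sortable names (e.g. timestamps).
    """
    runs = sorted(p for p in Path(log_root).iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"No run directories found in '{log_root}'.")
    latest_run = runs[-1]
    checkpoints = list(latest_run.glob(pattern))
    if not checkpoints:
        raise FileNotFoundError(f"No files matching '{pattern}' in '{latest_run}'.")
    # the most recently modified file is treated as the latest checkpoint
    return max(checkpoints, key=lambda p: p.stat().st_mtime)


# example: find_latest_checkpoint("logs/sb3/Isaac-Cartpole-v0")
```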
+
+.. _Stable-Baselines3: https://stable-baselines3.readthedocs.io/en/master/
+.. _VecEnv API: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecenv-api-vs-gym-api
+.. _`stable_baselines3.common.vec_env.VecNormalize`: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize
+.. _RL-Games: https://github.com/Denys88/rl_games
+.. _RSL-RL: https://github.com/leggedrobotics/rsl_rl
--- a/source/extensions/omni.isaac.orbit/omni/isaac/orbit/sim/simulation_context.py
+++ b/source/extensions/omni.isaac.orbit/omni/isaac/orbit/sim/simulation_context.py
@@ -74,10 +74,10 @@ class SimulationContext(_SimulationContext):
        events) are updated. There are three main components that can be updated when the simulation is rendered:

        1. **UI elements and other extensions**: These are UI elements (such as buttons, sliders, etc.) and other
-            extensions that are running in the background that need to be updated when the simulation is running.
+           extensions that are running in the background that need to be updated when the simulation is running.
        2. **Cameras**: These are typically based on Hydra textures and are used to render the scene from different
           viewpoints. They can be attached to a viewport or be used independently to render the scene.
-        3. **`Viewports`_**: These are windows where you can see the rendered scene.
+        3. **Viewports**: These are windows where you can see the rendered scene.

        Updating each of the above components has a different overhead. For example, updating the viewports is
        computationally expensive compared to updating the UI elements. Therefore, it is useful to be able to

--- a/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
+++ b/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
@@ -60,7 +60,6 @@ class CartpoleSceneCfg(InteractiveSceneCfg):
 ##


-# Actions configuration
 @configclass
 class CommandsCfg:
    """Command terms for the MDP."""
@@ -76,7 +75,6 @@ class ActionsCfg:
    joint_effort = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=100.0)


-# Observations configuration
 @configclass
 class ObservationsCfg:
    """Observation specifications for the MDP."""
@@ -97,7 +95,6 @@ class ObservationsCfg:
    policy: PolicyCfg = PolicyCfg()


-# Randomization configuration
 @configclass
 class RandomizationCfg:
    """Configuration for randomization."""
@@ -124,7 +121,6 @@ class RandomizationCfg:
    )


-# Rewards configuration
 @configclass
 class RewardsCfg:
    """Reward terms for the MDP."""
@@ -153,7 +149,6 @@ class RewardsCfg:
    )


-# Terminations configuration
 @configclass
 class TerminationsCfg:
    """Termination terms for the MDP."""
@@ -167,7 +162,6 @@ class TerminationsCfg:
    )


-# Curriculum configuration
 @configclass
 class CurriculumCfg:
    """Configuration for the curriculum."""

--- a/source/standalone/tutorials/04_envs/cartpole_base_env.py
+++ b/source/standalone/tutorials/04_envs/cartpole_base_env.py
@@ -4,7 +4,8 @@
 # SPDX-License-Identifier: BSD-3-Clause

 """
-This script demonstrates how to use the environment concept that combines a scene with an action, observation and randomization manager.
+This script demonstrates how to create a simple environment with a cartpole. It combines the concepts of
+scene, action, observation and randomization managers to create an environment.
 """

 from __future__ import annotations
@@ -17,8 +18,8 @@ import argparse
 from omni.isaac.orbit.app import AppLauncher

 # add argparse arguments
-parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
-parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to spawn.")
+parser = argparse.ArgumentParser(description="This script demonstrates a simple cartpole environment.")
+parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")

 # append AppLauncher cli args
 AppLauncher.add_app_launcher_args(parser)
@@ -37,94 +38,26 @@ import traceback
 import carb

 import omni.isaac.orbit.envs.mdp as mdp
-from omni.isaac.orbit.assets import RigidObject
 from omni.isaac.orbit.envs import BaseEnv, BaseEnvCfg
 from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
 from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
 from omni.isaac.orbit.managers import RandomizationTermCfg as RandTerm
 from omni.isaac.orbit.managers import SceneEntityCfg
-from omni.isaac.orbit.managers.action_manager import ActionTerm, ActionTermCfg
 from omni.isaac.orbit.utils import configclass

-from omni.isaac.orbit_tasks.classic.cartpole import CartpoleSceneCfg
-
-# Cartpole Action Configuration
-
-
-class CartpoleActionTerm(ActionTerm):
-    _asset: RigidObject
-    """The articulation asset on which the action term is applied."""
-
-    def __init__(self, cfg, env: BaseEnv):
-        super().__init__(cfg, env)
-        self._raw_actions = torch.zeros(env.num_envs, 1, device=self.device)
-        self._processed_actions = torch.zeros(env.num_envs, 1, device=self.device)
-
-        # gains of controller
-        self.p_gain = 1500.0
-        self.d_gain = 10.0
-
-        # extract the joint id of the slider_to_cart joint
-        joint_ids, _ = self._asset.find_joints(["slider_to_cart", "cart_to_pole"])
-        self.slider_to_cart_joint_id = joint_ids[0]
-        self.cart_to_pole_joint_id = joint_ids[1]
-
-    """
-    Properties.
-    """
-
-    @property
-    def action_dim(self) -> int:
-        return self._raw_actions.shape[1]
-
-    @property
-    def raw_actions(self) -> torch.Tensor:
-        return self._raw_actions
-
-    @property
-    def processed_actions(self) -> torch.Tensor:
-        return self._processed_actions
-
-    """
-    Operations
-    """
-
-    def process_actions(self, actions: torch.Tensor):
-        # store the raw actions
-        self._raw_actions[:] = actions
-
-        joint_pos = (
-            self._asset.data.joint_pos[:, self.cart_to_pole_joint_id]
-            - self._asset.data.default_joint_pos[:, self.cart_to_pole_joint_id]
-        )
-        joint_vel = (
-            self._asset.data.joint_vel[:, self.cart_to_pole_joint_id]
-            - self._asset.data.default_joint_vel[:, self.cart_to_pole_joint_id]
-        )
-
-        self._processed_actions[:] = self.p_gain * (actions - joint_pos) - self.d_gain * joint_vel
-
-    def apply_actions(self):
-        # set slider joint target
-        self._asset.set_joint_effort_target(self.processed_actions, joint_ids=[self.slider_to_cart_joint_id])
-
-
-@configclass
-class CartpoleActionTermCfg(ActionTermCfg):
-    class_type: CartpoleActionTerm = CartpoleActionTerm
+from omni.isaac.orbit_tasks.classic.cartpole.cartpole_env_cfg import CartpoleSceneCfg


 @configclass
 class ActionsCfg:
-    """Action specifications for the MDP."""
+    """Action specifications for the environment."""

-    joint_pos = CartpoleActionTermCfg(asset_name="robot")
+    joint_efforts = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=5.0)


-# Cartpole Observation Configuration
 @configclass
 class ObservationsCfg:
-    """Observation specifications for the MDP."""
+    """Observation specifications for the environment."""

    @configclass
    class PolicyCfg(ObsGroup):
@@ -142,14 +75,21 @@ class ObservationsCfg:
    policy: PolicyCfg = PolicyCfg()


-# Cartpole Randomization Configuration
-
-
 @configclass
 class RandomizationCfg:
    """Configuration for randomization."""

-    # reset
+    # on startup
+    add_pole_mass = RandTerm(
+        func=mdp.add_body_mass,
+        mode="startup",
+        params={
+            "asset_cfg": SceneEntityCfg("robot", body_names=["pole"]),
+            "mass_range": (0.1, 0.5),
+        },
+    )
+
+    # on reset
    reset_cart_position = RandTerm(
        func=mdp.reset_joints_by_offset,
        mode="reset",
@@ -171,57 +111,57 @@ class RandomizationCfg:
    )


-# Cartpole Environment Configuration
-
-
 @configclass
 class CartpoleEnvCfg(BaseEnvCfg):
-    """Configuration for the locomotion velocity-tracking environment."""
+    """Configuration for the cartpole environment."""

    # Scene settings
-    scene: CartpoleSceneCfg = CartpoleSceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5, replicate_physics=False)
+    scene = CartpoleSceneCfg(num_envs=1024, env_spacing=2.5)
    # Basic settings
-    observations: ObservationsCfg = ObservationsCfg()
-    actions: ActionsCfg = ActionsCfg()
-    randomization: RandomizationCfg = RandomizationCfg()
+    observations = ObservationsCfg()
+    actions = ActionsCfg()
+    randomization = RandomizationCfg()

    def __post_init__(self):
        """Post initialization."""
-        # general settings
-        self.decimation = 4
-        self.episode_length_s = 20.0
+        # viewer settings
+        self.viewer.eye = [4.5, 0.0, 6.0]
+        self.viewer.lookat = [0.0, 0.0, 2.0]
+        # step settings
+        self.decimation = 4  # env step every 4 sim steps: 200Hz / 4 = 50Hz
        # simulation settings
-        self.sim.dt = 0.005
-        self.sim.disable_contact_processing = True
-
-
-# Main
+        self.sim.dt = 0.005  # sim step every 5ms: 200Hz


 def main():
    """Main function."""
-
+    # create the environment configuration
+    env_cfg = CartpoleEnvCfg()
+    env_cfg.scene.num_envs = args_cli.num_envs
    # setup base environment
-    env = BaseEnv(cfg=CartpoleEnvCfg())
-    obs = env.reset()
-
-    target_position = torch.zeros(env.num_envs, 1, device=env.device)
+    env = BaseEnv(cfg=env_cfg)

    # simulate physics
    count = 0
    while simulation_app.is_running():
-        # reset
-        if count % 300 == 0:
-            env.reset()
-            count = 0
-
-        # step env
-        obs, _ = env.step(target_position)
-
-        # print current orientation of pole
-        print(obs["policy"][0][1].item())
-        # update counter
-        count += 1
+        with torch.inference_mode():
+            # reset
+            if count % 300 == 0:
+                count = 0
+                env.reset()
+                print("-" * 80)
+                print("[INFO]: Resetting environment...")
+            # sample random actions
+            joint_efforts = torch.randn_like(env.action_manager.action)
+            # step the environment
+            obs, _ = env.step(joint_efforts)
+            # print current orientation of pole
+            print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
+            # update counter
+            count += 1
+
+    # close the environment
+    env.close()


 if __name__ == "__main__":

--- a/source/standalone/tutorials/04_envs/cartpole_rl_env.py
+++ b/source/standalone/tutorials/04_envs/cartpole_rl_env.py
+# Copyright (c) 2022-2023, The ORBIT Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""
+This script demonstrates how to call the RL environment for the cartpole balancing task.
+"""
+
+from __future__ import annotations
+
+"""Launch Isaac Sim Simulator first."""
+
+
+import argparse
+
+from omni.isaac.orbit.app import AppLauncher
+
+# add argparse arguments
+parser = argparse.ArgumentParser(description="This script demonstrates the RL environment for cartpole balancing.")
+parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")
+
+# append AppLauncher cli args
+AppLauncher.add_app_launcher_args(parser)
+# parse the arguments
+args_cli = parser.parse_args()
+
+# launch omniverse app
+app_launcher = AppLauncher(args_cli)
+simulation_app = app_launcher.app
+
+"""Rest everything follows."""
+
+import torch
+import traceback
+
+import carb
+
+from omni.isaac.orbit.envs import RLTaskEnv
+
+from omni.isaac.orbit_tasks.classic.cartpole.cartpole_env_cfg import CartpoleEnvCfg
+
+
+def main():
+    """Main function."""
+    # create the environment configuration
+    env_cfg = CartpoleEnvCfg()
+    env_cfg.scene.num_envs = args_cli.num_envs
+    # setup RL environment
+    env = RLTaskEnv(cfg=env_cfg)
+
+    # simulate physics
+    count = 0
+    while simulation_app.is_running():
+        with torch.inference_mode():
+            # reset
+            if count % 300 == 0:
+                count = 0
+                env.reset()
+                print("-" * 80)
+                print("[INFO]: Resetting environment...")
+            # sample random actions
+            joint_efforts = torch.randn_like(env.action_manager.action)
+            # step the environment
+            obs, rew, terminated, truncated, info = env.step(joint_efforts)
+            # print current orientation of pole
+            print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
+            # update counter
+            count += 1
+
+    # close the environment
+    env.close()
+
+
+if __name__ == "__main__":
+    try:
+        # run the main execution
+        main()
+    except Exception as err:
+        carb.log_error(err)
+        carb.log_error(traceback.format_exc())
+        raise
+    finally:
+        # close sim app
+        simulation_app.close()
--- a/source/standalone/tutorials/04_envs/floating_cube.py
+++ b/source/standalone/tutorials/04_envs/floating_cube.py
@@ -4,8 +4,17 @@
 # SPDX-License-Identifier: BSD-3-Clause

 """
-This script demonstrates the base environment concept that combines a scene with an action,
-observation and randomization manager for a floating cube.
+This script creates a simple environment with a floating cube. The cube is controlled by a PD
+controller to track an arbitrary target position.
+
+While going through this tutorial, we recommend paying attention to how a custom action term
+is defined. The action term is responsible for processing the raw actions and applying them to the
+scene entities. The rest of the environment is similar to the previous tutorials.
+
+.. code-block:: bash
+
+    # Run the script
+    ./orbit.sh -p source/standalone/tutorials/04_envs/floating_cube.py --num_envs 32
 """

 from __future__ import annotations
@@ -18,7 +27,7 @@ import argparse
 from omni.isaac.orbit.app import AppLauncher

 # add argparse arguments
-parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
+parser = argparse.ArgumentParser(description="This script demonstrates base environment with a floating cube.")
 parser.add_argument("--num_envs", type=int, default=64, help="Number of environments to spawn.")

 # append AppLauncher cli args
@@ -31,6 +40,7 @@ app_launcher = AppLauncher(args_cli)
 simulation_app = app_launcher.app

 """Rest everything follows."""
+
 import torch
 import traceback

@@ -40,59 +50,41 @@ import omni.isaac.orbit.envs.mdp as mdp
 import omni.isaac.orbit.sim as sim_utils
 from omni.isaac.orbit.assets import AssetBaseCfg, RigidObject, RigidObjectCfg
 from omni.isaac.orbit.envs import BaseEnv, BaseEnvCfg
+from omni.isaac.orbit.managers import ActionTerm, ActionTermCfg
 from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
 from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
 from omni.isaac.orbit.managers import RandomizationTermCfg as RandTerm
 from omni.isaac.orbit.managers import SceneEntityCfg
-from omni.isaac.orbit.managers.action_manager import ActionTerm, ActionTermCfg
 from omni.isaac.orbit.scene import InteractiveSceneCfg
 from omni.isaac.orbit.terrains import TerrainImporterCfg
 from omni.isaac.orbit.utils import configclass

 ##
-# Scene definition
+# Custom action term
 ##


-@configclass
-class MySceneCfg(InteractiveSceneCfg):
-    """Example scene configuration."""
-
-    # add terrain
-    terrain = TerrainImporterCfg(prim_path="/World/ground", terrain_type="plane", debug_vis=False)
-
-    # add cube
-    cube: RigidObjectCfg = RigidObjectCfg(
-        prim_path="{ENV_REGEX_NS}/cube",
-        spawn=sim_utils.CuboidCfg(
-            size=(0.2, 0.2, 0.2),
-            rigid_props=sim_utils.RigidBodyPropertiesCfg(max_depenetration_velocity=1.0),
-            mass_props=sim_utils.MassPropertiesCfg(mass=1.0),
-            physics_material=sim_utils.RigidBodyMaterialCfg(),
-            visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.5, 0.0, 0.0)),
-        ),
-        init_state=RigidObjectCfg.InitialStateCfg(pos=(0.0, 0.0, 5)),
-    )
-
-    # lights
-    light = AssetBaseCfg(
-        prim_path="/World/light",
-        spawn=sim_utils.DistantLightCfg(color=(0.75, 0.75, 0.75), intensity=3000.0),
-    )
-
+class CubeActionTerm(ActionTerm):
+    """Simple action term that implements a PD controller to track a target position.

-##
-# Action Term
-##
+    The action term is applied to the cube asset. It involves two steps:

+    1. **Process the raw actions**: Typically, this includes any transformations of the raw actions
+       that are required to map them to the desired space. This is called once per environment step.
+    2. **Apply the processed actions**: This step applies the processed actions to the asset.
+       It is called once per simulation step.

-class CubeActionTerm(ActionTerm):
-    """Simple action term that implements a PD controller to track a target position."""
+    In this case, the action term simply applies the raw actions to the cube asset. The raw actions
+    are the desired target positions of the cube in the environment frame. The pre-processing step
+    simply copies the raw actions to the processed actions as no additional processing is required.
+    The processed actions are then applied to the cube asset by implementing a PD controller to
+    track the target position.
+    """

    _asset: RigidObject
    """The articulation asset on which the action term is applied."""

-    def __init__(self, cfg: ActionTermCfg, env: BaseEnv):
+    def __init__(self, cfg: CubeActionTermCfg, env: BaseEnv):
        # call super constructor
        super().__init__(cfg, env)
        # create buffers
@@ -100,8 +92,8 @@ class CubeActionTerm(ActionTerm):
        self._processed_actions = torch.zeros(env.num_envs, 3, device=self.device)
        self._vel_command = torch.zeros(self.num_envs, 6, device=self.device)
        # gains of controller
-        self.p_gain = 5.0
-        self.d_gain = 0.5
+        self.p_gain = cfg.p_gain
+        self.d_gain = cfg.d_gain

    """
    Properties.
@@ -113,7 +105,6 @@ class CubeActionTerm(ActionTerm):

    @property
    def raw_actions(self) -> torch.Tensor:
-        # desired: (x, y, z)
        return self._raw_actions

    @property
@@ -144,10 +135,16 @@ class CubeActionTermCfg(ActionTermCfg):
    """Configuration for the cube action term."""

    class_type: type = CubeActionTerm
+    """The class corresponding to the action term."""
+
+    p_gain: float = 5.0
+    """Proportional gain of the PD controller."""
+    d_gain: float = 0.5
+    """Derivative gain of the PD controller."""


 ##
-# Observation Term
+# Custom observation term
 ##


@@ -158,6 +155,41 @@ def base_position(env: BaseEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
    return asset.data.root_pos_w - env.scene.env_origins


+##
+# Scene definition
+##
+
+
+@configclass
+class MySceneCfg(InteractiveSceneCfg):
+    """Example scene configuration.
+
+    The scene comprises a ground plane, a light source and floating cubes (gravity disabled).
+    """
+
+    # add terrain
+    terrain = TerrainImporterCfg(prim_path="/World/ground", terrain_type="plane", debug_vis=False)
+
+    # add cube
+    cube: RigidObjectCfg = RigidObjectCfg(
+        prim_path="{ENV_REGEX_NS}/cube",
+        spawn=sim_utils.CuboidCfg(
+            size=(0.2, 0.2, 0.2),
+            rigid_props=sim_utils.RigidBodyPropertiesCfg(max_depenetration_velocity=1.0, disable_gravity=True),
+            mass_props=sim_utils.MassPropertiesCfg(mass=1.0),
+            physics_material=sim_utils.RigidBodyMaterialCfg(),
+            visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.5, 0.0, 0.0)),
+        ),
+        init_state=RigidObjectCfg.InitialStateCfg(pos=(0.0, 0.0, 5)),
+    )
+
+    # lights
+    light = AssetBaseCfg(
+        prim_path="/World/light",
+        spawn=sim_utils.DistantLightCfg(color=(0.75, 0.75, 0.75), intensity=3000.0),
+    )
+
+
 ##
 # Environment settings
 ##
@@ -218,7 +250,7 @@ class CubeEnvCfg(BaseEnvCfg):
    """Configuration for the locomotion velocity-tracking environment."""

    # Scene settings
-    scene: MySceneCfg = MySceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5, replicate_physics=True)
+    scene: MySceneCfg = MySceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5)
    # Basic settings
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
@@ -247,13 +279,15 @@ def main():

    # simulate physics
    count = 0
+    obs, _ = env.reset()
    while simulation_app.is_running():
        with torch.inference_mode():
            # reset
            if count % 300 == 0:
-                env.reset()
                count = 0
-
+                obs, _ = env.reset()
+                print("-" * 80)
+                print("[INFO]: Resetting environment...")
            # step env
            obs, _ = env.step(target_position)
            # print mean squared position error between target and current position
@@ -262,6 +296,9 @@ def main():
            # update counter
            count += 1

+    # close the environment
+    env.close()
+

 if __name__ == "__main__":
    try:

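The ``CubeActionTerm`` docstring states that raw actions are processed once per environment step and applied once per simulation step. With the settings used in the cartpole and quadruped scripts (``sim.dt = 0.005`` and ``decimation = 4``), this means the physics runs at 200 Hz while actions are updated at 50 Hz. The toy classes below (``ToyActionTerm`` and ``ToyEnv``, both hypothetical and independent of Orbit) only count calls to make that schedule explicit.

```python
from __future__ import annotations


class ToyActionTerm:
    """Hypothetical action term that only records how often each hook is called."""

    def __init__(self):
        self.process_calls = 0
        self.apply_calls = 0
        self.processed_actions: list[float] = []

    def process_actions(self, actions: list[float]) -> None:
        # once per environment step: store (here: copy) the raw actions
        self.process_calls += 1
        self.processed_actions = list(actions)

    def apply_actions(self) -> None:
        # once per simulation step: a real term would write targets to the asset here
        self.apply_calls += 1


class ToyEnv:
    """Hypothetical environment loop with decimation."""

    def __init__(self, term: ToyActionTerm, sim_dt: float, decimation: int):
        self.term = term
        self.sim_dt = sim_dt
        self.decimation = decimation

    @property
    def step_dt(self) -> float:
        # duration of one environment step
        return self.sim_dt * self.decimation

    def step(self, actions: list[float]) -> None:
        self.term.process_actions(actions)
        for _ in range(self.decimation):
            self.term.apply_actions()
            # a real environment would advance the physics by sim_dt here


term = ToyActionTerm()
env = ToyEnv(term, sim_dt=0.005, decimation=4)
for _ in range(10):
    env.step([0.0, 0.0, 1.0])
```

Ten environment steps therefore trigger ten calls to ``process_actions`` and forty calls to ``apply_actions``.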
--- a/source/standalone/tutorials/04_envs/running_quadruped.py
+++ b/source/standalone/tutorials/04_envs/running_quadruped.py
@@ -4,11 +4,17 @@
 # SPDX-License-Identifier: BSD-3-Clause

 """
-This script demonstrates the environment concept that combines a scene with an action,
-observation and randomization manager for a quadruped robot.
+This script demonstrates the environment for a quadruped robot with height-scan sensor.
+
+In this example, we use a locomotion policy to control the robot. The robot is commanded to
+move forward at a constant velocity. The height-scan sensor is used to detect the height of
+the terrain.
+
+.. code-block:: bash
+
+    # Run the script
+    ./orbit.sh -p source/standalone/tutorials/04_envs/running_quadruped.py --num_envs 32

-A locomotion policy is loaded and used to control the robot. This shows how to use the
-environment with a policy.
 """

 from __future__ import annotations
@@ -21,7 +27,7 @@ import argparse
 from omni.isaac.orbit.app import AppLauncher

 # add argparse arguments
-parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
+parser = argparse.ArgumentParser(description="This script demonstrates a quadrupedal locomotion environment.")
 parser.add_argument("--num_envs", type=int, default=64, help="Number of environments to spawn.")

 # append AppLauncher cli args
@@ -34,6 +40,7 @@ app_launcher = AppLauncher(args_cli)
 simulation_app = app_launcher.app

 """Rest everything follows."""
+
 import os
 import torch
 import traceback
@@ -62,6 +69,16 @@ from omni.isaac.orbit.utils.noise import AdditiveUniformNoiseCfg as Unoise
 from omni.isaac.orbit.terrains.config.rough import ROUGH_TERRAINS_CFG  # isort: skip


+##
+# Custom observation terms
+##
+
+
+def constant_commands(env: BaseEnv) -> torch.Tensor:
+    """The generated command from the command generator."""
+    return torch.tensor([[1, 0, 0]], device=env.device).repeat(env.num_envs, 1)
+
+
 ##
 # Scene definition
 ##
@@ -112,11 +129,6 @@ class MySceneCfg(InteractiveSceneCfg):
 ##


-def constant_commands(env: BaseEnv) -> torch.Tensor:
-    """The generated command from the command generator."""
-    return torch.tensor([[1, 0, 0]], device=env.device).repeat(env.num_envs, 1)
-
-
 @configclass
 class ActionsCfg:
    """Action specifications for the MDP."""
@@ -162,21 +174,7 @@ class ObservationsCfg:
 class RandomizationCfg:
    """Configuration for randomization."""

-    reset_base = RandTerm(
-        func=mdp.reset_root_state_uniform,
-        mode="reset",
-        params={
-            "pose_range": {"x": (-0.5, 0.5), "y": (-0.5, 0.5), "yaw": (-3.14, 3.14)},
-            "velocity_range": {
-                "x": (-0.5, 0.5),
-                "y": (-0.5, 0.5),
-                "z": (-0.5, 0.5),
-                "roll": (-0.5, 0.5),
-                "pitch": (-0.5, 0.5),
-                "yaw": (-0.5, 0.5),
-            },
-        },
-    )
+    reset_scene = RandTerm(func=mdp.reset_scene_to_default, mode="reset")


 ##
@@ -198,22 +196,21 @@ class QuadrupedEnvCfg(BaseEnvCfg):
    def __post_init__(self):
        """Post initialization."""
        # general settings
-        self.decimation = 4
-        self.episode_length_s = 20.0
+        self.decimation = 4  # env decimation -> 50 Hz control
        # simulation settings
-        self.sim.dt = 0.005
+        self.sim.dt = 0.005  # simulation timestep -> 200 Hz physics
+        self.sim.physics_material = self.scene.terrain.physics_material
        # update sensor update periods
        # we tick all the sensors based on the smallest update period (physics update period)
        if self.scene.height_scanner is not None:
-            self.scene.height_scanner.update_period = self.decimation * self.sim.dt
+            self.scene.height_scanner.update_period = self.decimation * self.sim.dt  # 50 Hz


 def main():
    """Main function."""
-
    # setup base environment
-    env = BaseEnv(cfg=QuadrupedEnvCfg())
-    obs, _ = env.reset()
+    env_cfg = QuadrupedEnvCfg()
+    env = BaseEnv(cfg=env_cfg)

    # load level policy
    policy_path = os.path.join(ISAAC_ORBIT_NUCLEUS_DIR, "Policies", "ANYmal-C", "policy.pt")
@@ -221,27 +218,29 @@ def main():
    if not check_file_path(policy_path):
        raise FileNotFoundError(f"Policy file '{policy_path}' does not exist.")
    # jit load the policy
-    locomotion_policy = torch.jit.load(policy_path)
-    locomotion_policy.to(env.device)
-    locomotion_policy.eval()
+    policy = torch.jit.load(policy_path).to(env.device).eval()

    # simulate physics
    count = 0
+    obs, _ = env.reset()
    while simulation_app.is_running():
        with torch.inference_mode():
            # reset
            if count % 1000 == 0:
                obs, _ = env.reset()
                count = 0
-                print("[INFO]: Resetting robots state...")
-
+                print("-" * 80)
+                print("[INFO]: Resetting environment...")
            # infer action
-            action = locomotion_policy(obs["policy"])
+            action = policy(obs["policy"])
            # step env
            obs, _ = env.step(action)
            # update counter
            count += 1

+    # close the environment
+    env.close()
+

 if __name__ == "__main__":
    try:

--- a/source/standalone/workflows/sb3/train.py
+++ b/source/standalone/workflows/sb3/train.py
@@ -45,6 +45,7 @@ else:
 # launch omniverse app
 app_launcher = AppLauncher(args_cli, experience=app_experience)
 simulation_app = app_launcher.app
+
 """Rest everything follows."""


@@ -93,7 +94,7 @@ def main():
    n_timesteps = agent_cfg.pop("n_timesteps")

    # create isaac environment
-    env = gym.make(args_cli.task, cfg=env_cfg)
+    env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
    # wrap for video recording
    if args_cli.video:
        video_kwargs = {