Commit e49048f9 authored by Mayank Mittal, committed by GitHub

Adapts tutorial on base and RL environment (#283)

# Description

This MR adapts the environment tutorials. It reorganizes the tutorials
and also modifies their content to make them more complete.

## Type of change

- This change requires a documentation update

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./orbit.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent eb75c536
......@@ -62,10 +62,10 @@ For more information about the framework, please refer to the `paper <https://ar
:maxdepth: 1
:caption: Tutorials (Environments)
source/tutorials/03_envs/00_gym_env
source/tutorials/03_envs/01_create_base_env
source/tutorials/03_envs/02_create_rl_env
source/tutorials/03_envs/03_wrappers
source/tutorials/03_envs/base_env
source/tutorials/03_envs/rl_env
source/tutorials/03_envs/gym_registry
source/tutorials/03_envs/rl_training
.. toctree::
......
......@@ -23,7 +23,7 @@ the rail. The attached pole has 1 DOF that allows it to rotate freely.
.. TODO: Add isaac sim screenshot and replace GIF with a webdb
In :ref:`creating-base-env` participants will learn to control the
In :ref:`tutorial-create-base-env` participants will learn to control the
pole to stabilize the cart, but this tutorial focuses on merely constructing
the :class:`ArticulationCfg` that defines the cartpole.
......
.. _how-to-env-wrappers:
Using environment wrappers
==========================
......
......@@ -48,3 +48,12 @@ The `control` section focuses on how to implement controllers within Orbit.
Markers <04_controllers/ik_controller>
Please refer to the individual guides in each section for detailed instructions and examples.
Gym
---
.. toctree::
:maxdepth: 1
Gym Wrappers <05_gym/wrappers>
Launching Isaac Sim from AppLauncher
==============================
====================================
.. currentmodule:: omni.isaac.orbit
......
Running an RL environment
=========================
In this tutorial, we will learn how to run existing learning environments provided in the ``omni.isaac.orbit_tasks``
extension. All the environments included in Orbit follow the :class:`gymnasium.Env` interface, which means that they can be used
with any reinforcement learning framework that supports OpenAI Gym. However, since the environments are implemented
in a vectorized fashion, they can only be used with frameworks that support vectorized environments.
Many common frameworks come with their own desired definitions of a vectorized environment and require the returned data
to follow their supported data types and data structures. For example, ``stable-baselines3`` uses ``numpy`` arrays, while
``rsl-rl``, ``rl-games``, or ``skrl`` use ``torch.Tensor``. We provide wrappers for these different frameworks, which can be found
in the ``omni.isaac.orbit_tasks.utils.wrappers`` module.
The Code
~~~~~~~~
The tutorial corresponds to the ``zero_agent.py`` script in the ``orbit/source/standalone/environments`` directory.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:emphasize-lines: 34-35,41-44,49-55
:linenos:
The Code Explained
~~~~~~~~~~~~~~~~~~
Using gym registry for environments
-----------------------------------
All environments are registered using the ``gym`` registry, which means that you can create an instance of
an environment by calling ``gym.make``. The environments are registered in the ``__init__.py`` file of the
``omni.isaac.orbit_tasks`` extension with the following syntax:
.. code-block:: python
# Cartpole environment
gym.register(
id="Isaac-Cartpole-v0",
entry_point="omni.isaac.orbit_tasks.classic.cartpole:CartpoleEnv",
disable_env_checker=True,
kwargs={"env_cfg_entry_point": "omni.isaac.orbit_tasks.classic.cartpole:cartpole_cfg.yaml"},
)
The ``env_cfg_entry_point`` argument is used to load the default configuration for the environment. The default
configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_cfg.load_cfg_from_registry` function.
The configuration entry point can correspond to either a YAML file or a Python configuration
class. The default configuration can be overridden by passing a custom configuration instance to the ``gym.make``
function as shown later in the tutorial.
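To make the two kinds of entry point concrete, here is a minimal sketch of how a ``module:name`` string could be resolved. The helper ``resolve_cfg_entry_point`` is hypothetical and is not Orbit's ``load_cfg_from_registry``. It assumes that a name ending in ``.yaml`` refers to a file inside the module's directory and that anything else refers to a Python attribute.

```python
import importlib
import os


def resolve_cfg_entry_point(entry_point: str):
    """Resolve a ``module:name`` string (hypothetical helper, not the Orbit API).

    Returns ("yaml", path) when ``name`` ends in ``.yaml`` or ``.yml``; the path
    is built relative to the module's directory and is not opened here.
    Returns ("python", obj) otherwise, where ``obj`` is the imported attribute.
    """
    module_name, _, attr_name = entry_point.partition(":")
    if not attr_name:
        raise ValueError(f"expected 'module:name', got {entry_point!r}")
    module = importlib.import_module(module_name)
    if attr_name.endswith((".yaml", ".yml")):
        module_dir = os.path.dirname(module.__file__)
        return "yaml", os.path.join(module_dir, attr_name)
    return "python", getattr(module, attr_name)


# Standard-library modules stand in for Orbit packages so the sketch runs anywhere.
kind, cfg_cls = resolve_cfg_entry_point("collections:OrderedDict")
yaml_kind, yaml_path = resolve_cfg_entry_point("json:cartpole_cfg.yaml")
```

Keeping the resolution in one place means that the registration call only needs to store a string, and the configuration is imported or read when the environment is actually created.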
To inform the ``gym`` registry with all the environments provided by the ``omni.isaac.orbit_tasks`` extension,
we must import the module at the start of the script.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:lines: 33-35
:linenos:
:lineno-start: 33
.. note::
As a convention, we name all the environments in ``omni.isaac.orbit_tasks`` extension with the prefix ``Isaac-``.
For more complicated environments, we follow the pattern: ``Isaac-<TaskName>-<RobotName>-v<N>``,
where `N` is used to specify different observations or action spaces within the same task definition. For example,
for legged locomotion with ANYmal C, the environment is called ``Isaac-Velocity-Anymal-C-v0``.
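The naming convention can be checked mechanically. The following sketch parses an environment ID with a regular expression. The function ``parse_task_id`` and the pattern are illustrative assumptions and are not part of Orbit.

```python
import re

# Pattern for the naming convention described above. The robot part is optional
# and may itself contain hyphens (for example "Anymal-C").
TASK_ID_PATTERN = re.compile(
    r"^Isaac-(?P<task>[^-]+)(?:-(?P<robot>.+))?-v(?P<version>\d+)$"
)


def parse_task_id(task_id: str) -> dict:
    """Split an environment ID into task name, robot name and version."""
    match = TASK_ID_PATTERN.match(task_id)
    if match is None:
        raise ValueError(f"{task_id!r} does not follow the Isaac- convention")
    return {
        "task": match["task"],
        "robot": match["robot"],  # None when the ID names no robot
        "version": int(match["version"]),
    }


locomotion = parse_task_id("Isaac-Velocity-Anymal-C-v0")
cartpole = parse_task_id("Isaac-Cartpole-v0")
```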
In this tutorial, the task name is read from the command line. The task name is used to load the default configuration
as well as to create the environment instance. In addition, other parsed command line arguments such as the
number of environments, the simulation device, and whether to render, are used to override the default configuration.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:lines: 42-45
:linenos:
:lineno-start: 42
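The override pattern described above can be sketched with the standard library alone. The ``--task``, ``--num_envs`` and ``--cpu`` flags are the ones mentioned in this tutorial; the ``DemoEnvCfg`` class and its default values are hypothetical stand-ins for a real environment configuration.

```python
import argparse
from dataclasses import dataclass


@dataclass
class DemoEnvCfg:
    """Hypothetical stand-in for a default environment configuration."""

    num_envs: int = 1024  # assumed default, for illustration only
    use_gpu: bool = True  # assumed default, for illustration only


def parse_and_override(argv: list) -> tuple:
    """Parse command line arguments and apply them on top of the defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--task", type=str, required=True)
    parser.add_argument("--num_envs", type=int, default=None)
    parser.add_argument("--cpu", action="store_true")
    args = parser.parse_args(argv)

    cfg = DemoEnvCfg()  # start from the default configuration
    if args.num_envs is not None:
        cfg.num_envs = args.num_envs  # override only when the flag was given
    if args.cpu:
        cfg.use_gpu = False
    return args.task, cfg


task_name, env_cfg = parse_and_override(["--task", "Isaac-Cartpole-v0", "--num_envs", "32"])
```

Using ``None`` as the default for ``--num_envs`` lets the script tell apart "not given" from an explicit value, so the default configuration is left untouched unless the user asks for a change.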
Running the environment
-----------------------
Once the environment is created, the rest of the execution follows the standard resetting and stepping.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:lines: 45-55
:linenos:
:lineno-start: 45
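For readers without a simulator at hand, the reset-and-step pattern can be illustrated with a stub. ``StubVecEnv`` below is a hypothetical placeholder, not an Orbit class. It only mirrors the return shapes of the ``gymnasium`` interface, where ``reset`` returns ``(obs, info)`` and ``step`` returns a five-element tuple.

```python
class StubVecEnv:
    """Hypothetical stand-in for a vectorized environment (not an Orbit class)."""

    def __init__(self, num_envs: int, obs_dim: int = 4, action_dim: int = 1):
        self.num_envs = num_envs
        self.obs_dim = obs_dim
        self.action_dim = action_dim
        self.step_count = 0

    def _zeros(self, dim: int) -> list:
        # One row per environment instance.
        return [[0.0] * dim for _ in range(self.num_envs)]

    def reset(self):
        self.step_count = 0
        return self._zeros(self.obs_dim), {}

    def step(self, actions):
        if len(actions) != self.num_envs:
            raise ValueError("expected one action row per environment")
        self.step_count += 1
        rewards = [0.0] * self.num_envs
        terminated = [False] * self.num_envs
        truncated = [False] * self.num_envs
        return self._zeros(self.obs_dim), rewards, terminated, truncated, {}

    def close(self):
        pass


def run_zero_agent(env: StubVecEnv, num_steps: int) -> int:
    """Reset once, then step with all-zero actions; returns the steps taken."""
    obs, _ = env.reset()
    for _ in range(num_steps):
        zero_actions = [[0.0] * env.action_dim for _ in range(env.num_envs)]
        obs, rewards, terminated, truncated, info = env.step(zero_actions)
    env.close()
    return env.step_count


steps_taken = run_zero_agent(StubVecEnv(num_envs=32), num_steps=10)
```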
Similar to previous tutorials, to ensure a safe exit when running the script, we need to add checks
for whether the simulation is stopped or not.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:lines: 57-59
:linenos:
:lineno-start: 57
The Code Execution
~~~~~~~~~~~~~~~~~~
Now that we have gone through the code, let's run the script and see the result:
.. code-block:: bash
./orbit.sh -p source/standalone/environments/zero_agent.py --task Isaac-Cartpole-v0 --num_envs 32
This should open a stage with a ground plane, lights and 32 cartpoles spawned in a grid. The poles
will fall down since no actions are applied to the cartpoles. To stop the simulation,
you can either close the window, or press the ``STOP`` button in the UI, or press ``Ctrl+C``
in the terminal.
.. note::
When running environments with GPU pipeline, the states in the scene are not synced with the USD
interface. Therefore values in the UI may appear wrong when simulation is running. Although objects
may be updating in the Viewport, attribute values in the UI will not update along with them.
To enable USD synchronization, please use the CPU pipeline with ``--cpu`` and disable flatcache by setting
``use_flatcache`` to False in the environment configuration.
.. _creating-base-env:
Creating a Base Environment
===========================
In Orbit, there are two types of environments: :class:`BaseEnv` and
:class:`RLTaskEnv`. Base environments contain a robot, its action and
observation spaces as well as randomizations (and handling of resets) to be applied to the
environment. Typically, a :class:`BaseEnv` is utilized if one wants
to evaluate an existing control algorithm, mechanical design or do traditional
robot control but doesn't plan on doing RL. This workflow is commonly used in
other simulators such as Gazebo, Mujoco, etc. :class:`BaseEnv` doesn't
contain rewards and terminations, which are common in RL settings. If
interested in doing RL in Orbit, this tutorial is still a good starting point
as :class:`RLTaskEnv` inherits from :class:`BaseEnv` and there's a lot of shared functionality.
In this tutorial, we will look at the base class :class:`BaseEnv` and its
corresponding configuration class :class:`BaseEnvCfg` and
discuss the different configuration classes that need to be implemented to
create a new environment. We will use the Cartpole environment with simple PD
control as an example to illustrate the different steps in developing a
new :class:`BaseEnv`.
The Code
~~~~~~~~
The tutorial corresponds to the ``cartpole_base_env`` script in the ``orbit/source/standalone/tutorials`` directory.
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
All environments in Orbit inherit from the base class :class:`BaseEnv`.
The base class :class:`BaseEnv` wraps around many intricacies of the
simulation and provides a simple interface for the user to implement their own
environment. At the core, the base class provides the following
functionality:
* :meth:`__init__` method to create the environment instances and initialize
different components
* :meth:`reset` and :meth:`step` methods that are used to interact with
the environment
* :meth:`close` method to close the environment
* :meth:`load_managers` method to load the managers that handle actions,
observations and any randomizations
The base class :class:`BaseEnv` is defined in the file ``base_env.py``:
.. dropdown:: :fa:`eye,mr-1` Code for ``base_env.py``
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/base_env.py
:language: python
:lines: 45-323
To customize a :class:`BaseEnv` one needs to implement a :class:`BaseEnvCfg`
which configures the action space, observation space and randomizations
associated with the environment. These are utilized by their associated :class:`ManagerBase`
classes to interact with the environment.
The base class :class:`BaseEnvCfg` is defined in the file ``base_env_cfg.py``:
.. dropdown:: :fa:`eye,mr-1` Code for ``base_env_cfg.py``
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/base_env_cfg.py
:language: python
:lines: 56-91
The Code Explained
~~~~~~~~~~~~~~~~~~
Designing the scene
-------------------
The first step in creating a new environment is to configure the scene by
implementing a :class:`InteractiveSceneCfg`. This will then be used to construct
a :class:`InteractiveScene` which handles spawning of the objects in the scene.
In this tutorial, we will be using the configuration from ``cartpole_scene.py``.
See :ref:`tutorial-interactive-scene` for a tutorial on how to create it.
The scene used here consists of a ground plane, the cartpole and some lights.
.. dropdown:: :fa:`eye,mr-1` Code for :class:`CartpoleSceneCfg` class in ``cartpole_scene.py``
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_scene.py
:language: python
Defining actions
----------------
The action space of the Cartpole environment in this example is the force control
over the sliding Cart portion of the cartpole,
moving it horizontally along the rail to balance the pole vertically.
The :class:`ActionTerm` developed in this tutorial implements PD control. In
:meth:`process_actions`, the PD controller calculates the control input
given the current joint position and velocity of the `cart_to_pole` joint.
This method and :meth:`apply_actions` are called by the action
manager at each step of the environment to determine the next action and then
apply it.
.. dropdown:: :fa:`eye,mr-1` Code for :class:`ActionsCfg` class in ``cartpole_base_env.py``
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:start-after: # Cartpole Action Configuration
:end-before: # Cartpole Observation Configuration
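As a rough illustration of the PD law mentioned above, the sketch below computes one effort value per environment from a target position, the current joint position and the current joint velocity. The gain values and the names ``PDGains`` and ``pd_effort`` are assumptions made for this example and are not taken from ``cartpole_base_env.py``.

```python
from dataclasses import dataclass


@dataclass
class PDGains:
    """Proportional and derivative gains (values assumed for illustration)."""

    p_gain: float = 5.0
    d_gain: float = 0.5


def pd_effort(target_pos, joint_pos, joint_vel, gains: PDGains) -> list:
    """Return the PD control effort for each environment.

    effort = p_gain * (target - position) - d_gain * velocity
    """
    return [
        gains.p_gain * (target - pos) - gains.d_gain * vel
        for target, pos, vel in zip(target_pos, joint_pos, joint_vel)
    ]


# Two environments: both should be driven back towards a target of 0.0.
efforts = pd_effort([0.0, 0.0], [0.1, -0.2], [1.0, 0.0], PDGains())
```

The proportional term pushes the joint towards the target, while the derivative term damps the motion. Raising ``p_gain`` makes the response stiffer, and raising ``d_gain`` reduces overshoot.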
Defining observations
----------------------
The observation space of the environment defines the observed state at
each time step.
The returned observations will be a dictionary with the keys corresponding to the group names
and the values corresponding to the observation tensors of shape ``(num_envs, obs_dims)``.
This allows the user to define multiple
observation groups that can then be used for different learning paradigms (e.g. for asymmetric actor-critic, one group could be for the RL
policy while the other is for the RL-critic).
While not prescriptive in the base class, it is recommended that the user always define the ``policy`` group which is used as the default
observation group for the environment. This is essential because various wrappers read this group name to unwrap the observations dictionary
for their respective frameworks.
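The dictionary layout described above can be pictured as follows. This is a sketch with plain nested lists in place of tensors; the ``critic`` group, the observation sizes and the helper ``unwrap_policy_obs`` are hypothetical.

```python
def unwrap_policy_obs(obs_dict: dict) -> list:
    """Return the observations of the default ``policy`` group."""
    if "policy" not in obs_dict:
        raise KeyError("observation dictionary has no 'policy' group")
    return obs_dict["policy"]


num_envs = 3
# Two groups with different observation sizes; the values are placeholders.
observations = {
    "policy": [[0.0] * 4 for _ in range(num_envs)],  # shape (3, 4)
    "critic": [[0.0] * 6 for _ in range(num_envs)],  # shape (3, 6)
}
policy_obs = unwrap_policy_obs(observations)
```

A wrapper for a framework that expects a single array per step would perform this kind of lookup, which is why a missing ``policy`` group surfaces as an error there.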
In the cartpole environment, the observation is computed by the :class:`ObservationManager` class. This class is responsible for computing
the observations for the environment by reading data from the various buffers and sensors. More details on the observation manager can be
found in the `MDP managers <../api/orbit/omni.isaac.orbit.managers.html#observation-manager>`_ section.
.. dropdown:: :fa:`eye,mr-1` Code for :class:`ObservationsCfg` class in ``cartpole_base_env.py``
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:start-after: # Cartpole Observation Configuration
:end-before: # Cartpole Randomization Configuration
Defining randomizations
-----------------------
In robotics, randomness is often used to more closely emulate the real world.
In Orbit, :class:`RandomizationManager` is used to manage randomness and define
which environment terms will be randomized via its :class:`RandomizationCfg`.
In addition, it handles reset calls, so even if you don't want to randomize anything,
you still need to define a :class:`RandomizationCfg` to handle the reset calls.
In this example, the initial slider to cart joint position and velocity are randomized
to be within (-1.0, 1.0) meters and (-0.1, 0.1) meters per second respectively. Also, the pole joint's position
and velocity are randomized slightly to make the problem more challenging.
When developing your own environments, feel free to add more :class:`RandTerm` as needed or use the
ones prepackaged with Orbit.
Randomization terms have a `mode` associated with them as denoted by the mode argument of
:class:`RandTerm`. The various `mode`s are `"interval", "reset", "startup"`.
Randomization Modes
####################
* `"interval"` `mode` executes randomization at a given fixed interval.
* `"reset"` `mode` executes randomization on every call to an environment's :meth:`reset`.
* `"startup"` `mode` executes randomization only once at environment startup.
In this example, the randomization terms use the `"reset"` mode, which means the
randomization is applied upon each call to :meth:`reset`.
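The three modes can be pictured as a filter over the configured terms. The sketch below is not Orbit's :class:`RandomizationManager`; the classes ``DemoRandTerm`` and ``DemoRandomizer`` are hypothetical, and counting the interval in simulation steps is an assumption made to keep the example short.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DemoRandTerm:
    """Hypothetical randomization term: a callable plus the mode it runs in."""

    func: Callable[[], None]
    mode: str  # "startup", "reset" or "interval"
    interval_steps: Optional[int] = None  # only used by "interval" terms


class DemoRandomizer:
    """Applies every term whose mode matches the requested one."""

    def __init__(self, terms):
        self.terms = terms

    def apply(self, mode: str, step_count: int = 0) -> None:
        for term in self.terms:
            if term.mode != mode:
                continue
            # Interval terms only fire on multiples of their interval.
            if mode == "interval" and step_count % term.interval_steps != 0:
                continue
            term.func()


log = []
randomizer = DemoRandomizer([
    DemoRandTerm(lambda: log.append("startup"), mode="startup"),
    DemoRandTerm(lambda: log.append("reset"), mode="reset"),
    DemoRandTerm(lambda: log.append("interval"), mode="interval", interval_steps=3),
])

randomizer.apply("startup")  # once, when the environment is created
for episode in range(2):
    randomizer.apply("reset")  # on every reset
    for step in range(1, 7):
        randomizer.apply("interval", step_count=step)  # fires at steps 3 and 6
```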
.. dropdown:: :fa:`eye,mr-1` Code for :class:`RandomizationCfg` class in ``cartpole_base_env.py``
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:start-after: # Cartpole Randomization Configuration
:end-before: # Cartpole Environment Configuration
Tying it all together
---------------------
In this section we will integrate the scene, observation, action and randomization
configurations built in the previous sections to fully configure the Cartpole
:class:`BaseEnv`.
.. dropdown:: :fa:`eye,mr-1` Code for :class:`CartpoleEnvCfg` class in ``cartpole_base_env.py``
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:start-after: # Cartpole Environment Configuration
:end-before: # Main
.. note:: To modify any configuration of the :class:`BaseEnvCfg` you can use :meth:`__post_init__`
as is done in this example.
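The mechanism behind this note is the standard ``__post_init__`` hook of Python dataclasses, which runs right after the generated ``__init__``. The sketch below uses plain ``dataclasses`` and hypothetical class and field names (``DemoSimCfg``, ``DemoBaseEnvCfg``, ``DemoCartpoleEnvCfg``, ``decimation``, ``dt``); Orbit's own configuration classes may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class DemoSimCfg:
    dt: float = 0.01  # assumed default time step


@dataclass
class DemoBaseEnvCfg:
    decimation: int = 1  # assumed default
    sim: DemoSimCfg = field(default_factory=DemoSimCfg)


@dataclass
class DemoCartpoleEnvCfg(DemoBaseEnvCfg):
    def __post_init__(self):
        # Runs after the fields are initialized, so nested settings
        # can be adjusted without redefining the whole configuration.
        self.decimation = 4
        self.sim.dt = 0.005


cfg = DemoCartpoleEnvCfg()
```

Because each instance gets its own ``DemoSimCfg`` through ``default_factory``, the change made in ``__post_init__`` does not leak into other configuration instances.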
The main method
---------------
Lastly, we define the main method which will handle resetting and stepping of the environment.
At each iteration, we send the target position (the `action`) to the environment and
receive back the observation, which is then printed to the console.
.. dropdown:: :fa:`eye,mr-1` Code for the main method in ``cartpole_base_env.py``
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:start-after: # Main
The Code Execution
~~~~~~~~~~~~~~~~~~
Now that we have gone through the code, let's run the environment.
As an example, to run the Cartpole base environment script, you can use the following command.
.. code-block:: bash
./orbit.sh -p source/standalone/tutorials/04_envs/cartpole_base_env.py
This should open a stage with a ground plane, lights, and a cartpole.
The simulation should be playing with the cartpole attempting to balance itself
such that the pole is vertical. Feel free to modify the P and D gains
to improve the cart's ability to balance the pole vertically.
To stop the simulation,
you can either close the window, or press the ``STOP`` button in the UI, or press ``Ctrl+C`` in the terminal
where you started the simulation.
Creating an RL Environment
==========================
In Orbit, we provide a set of environments that are ready to use. However, you may want to create your own
RL environment for your application. This tutorial will show you how to create a new RL environment from scratch.
As a practice, we maintain all the environments that are *officially* provided in the ``omni.isaac.orbit_tasks``
extension. It is recommended to add your environment to the extension ``omni.isaac.contrib_tasks``. This way, you can
easily update your environment when the API changes and you can also contribute your environment to the community.
In this tutorial, we will look at the configuration class :class:`RLTaskEnvCfg` that is used to
configure your learning agent and discuss the different classes you need to create to configure
your RL task. We will use the Cartpole balancing task environment as an example to illustrate the
different components.
The Code
~~~~~~~~
All RL environments in Orbit inherit from the base class :class:`RLTaskEnv`. The base class follows the ``gym.Env``
interface and provides the basic functionality for an environment. Similar to `IsaacGym <https://sites.google.com/view/isaacgym-nvidia>`_,
all environments designed in Orbit are *vectorized* implementations. This means that multiple environment
instances are packed into the simulation and the user can interact with the environment by passing in a batch of actions.
.. note::
While the environment itself is implemented as a vectorized environment, we do not
inherit from :class:`gym.vector.VectorEnv`. This is mainly because the class adds
various methods (for wait and asynchronous updates) which are not required.
Additionally, each RL library typically has its own definition for a vectorized
environment. Thus, to reduce complexity, we directly use the :class:`gym.Env` over
here and leave it up to library-defined wrappers to take care of wrapping this
environment for their agents.
The base class :class:`RLTaskEnv` wraps around many intricacies of the simulation and
provides a simple interface for the user to implement their own environment. At the core, the
base class provides the following functionality:
* :meth:`__init__` method to create the environment instances and initialize different components
* :meth:`reset` and :meth:`step` methods that are used to interact with the environment
* :meth:`render` method to render the environment
* :meth:`close` method to close the environment
All environments are registered using the :func:`gym.register` method. This method takes in the name of the
environment, the class that implements the environment and the configuration file for the environment.
The name of the environment is used to create the environment using the :func:`gym.make` method.
The base class :class:`RLTaskEnv` is defined in the file ``rl_task_env.py``:
.. dropdown:: :fa:`eye,mr-1` Code for ``rl_task_env.py``
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/rl_task_env.py
:language: python
:linenos:
Similar to other components of Orbit, instead of directly modifying the base
class :class:`RLTaskEnv`, users can simply implement a configuration
:class:`RLTaskEnvCfg` which will then be used to construct a
:class:`RLTaskEnv` instance.
This tutorial will continue along with the Cartpole example, this time creating a `RLTaskEnvCfg`
to define the task of balancing the pole from a reinforcement learning perspective.
.. dropdown:: :fa:`eye,mr-1` Code for ``cartpole_env_cfg.py``
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:linenos:
The Code Explained
~~~~~~~~~~~~~~~~~~
Designing the scene
-------------------
The first step in creating a new environment is to design the scene in which the agent will operate.
The scene used in this tutorial is the same one used in the tutorial :ref:`creating-base-env`, so we won't
go over it in detail again here.
Also see :ref:`tutorial-interactive-scene` for even more details on scene creation.
Designing the Action and Observation Spaces
-------------------------------------------
Again, the :class:`ActionTerm` and :class:`ObservationTerm` used here are the same as those
used in :ref:`creating-base-env`, so you can reference that tutorial for more details.
Designing the Rewards
------------------------
The :class:`RewardsCfg` configures the :class:`RewardManager` to dictate how the agent receives rewards from the
environment. In this example we define a few reward terms to guide our agent to a robust policy to balance the pole.
To define a reward term, you need to provide the function that computes the reward
as well as the weighting associated with it.
There are a few reward functions pre-defined in ``source/extensions/omni.isaac.orbit/omni/isaac/orbit/envs/mdp/rewards.py``
that will be used in the Cartpole environment, but when creating your own environments, feel free to add more to
your task config as you see fit.
The various reward terms used in this environment will be explained in the following sections. Feel free to skip over this
if you are already familiar with the cartpole example.
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:start-after: # Rewards configuration
:end-before: # Terminations configuration
* **Alive Reward Term**: Agent receives a reward at each step that it is
not terminated which is weighted by a factor of 1.0. This term is used to encourage the agent to
stay alive for as long as possible.
* **Terminating Reward Term**: Agent receives a reward at each step that it is in terminated state
which is weighted by a factor of -2.0. This term is similarly used to penalize the agent for terminating.
* **Pole Angle Reward Term**: Agent receives a reward based upon the L2 norm of the current pole angle
compared to the target pole angle which is weighted by a factor of -1.0. This term is used to encourage
the agent to keep the pole angle close to the desired angle.
* **Cart Velocity Reward Term**: Agent receives a reward based upon the L1 norm of the current cart velocity
which is weighted by a factor of -0.01. This term is used to discourage the agent from moving
the cart quickly.
* **Pole Velocity Reward Term**: Agent receives a reward based upon the L1 norm of the current pole velocity
which is weighted by a factor of -0.005. This term is used to discourage the agent from swinging
the pole quickly.
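To see how the weights listed above combine, the following sketch sums the five terms for a single environment. It is a simplification rather than the :class:`RewardManager` implementation: the L2 term is assumed to be a squared error, the target angle is assumed to be zero, and any time-step scaling the manager may apply is ignored.

```python
# Weights as listed above.
REWARD_WEIGHTS = {
    "alive": 1.0,
    "terminating": -2.0,
    "pole_pos": -1.0,
    "cart_vel": -0.01,
    "pole_vel": -0.005,
}


def compute_reward(terminated, pole_angle, cart_vel, pole_vel, target_angle=0.0):
    """Weighted sum of the reward terms for one environment."""
    terms = {
        "alive": 0.0 if terminated else 1.0,
        "terminating": 1.0 if terminated else 0.0,
        "pole_pos": (pole_angle - target_angle) ** 2,  # assumed squared error
        "cart_vel": abs(cart_vel),  # L1 norm of a scalar
        "pole_vel": abs(pole_vel),  # L1 norm of a scalar
    }
    return sum(REWARD_WEIGHTS[name] * value for name, value in terms.items())


upright_and_still = compute_reward(False, 0.0, 0.0, 0.0)
tilted_and_moving = compute_reward(False, 0.5, 1.0, 2.0)
just_terminated = compute_reward(True, 0.0, 0.0, 0.0)
```

With these weights, the alive bonus dominates while the pole is near the target, and the velocity penalties act only as small shaping terms.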
Designing the Termination Criteria
----------------------------------
In RL tasks, it is important to define when an episode is terminated. This is because the agent needs to know when
to reset the environment and start a new episode.
The :class:`TerminationsCfg` configures what constitutes an episode as
terminated. In this example, we want the task to terminate when either of the following conditions is met:
* **Episode Length** The episode length is greater than the defined max_episode_length
* **Cart out of bounds** The cart goes outside of the bounds [-3, 3]
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:start-after: # Terminations configuration
:end-before: # Curriculum configuration
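The two criteria can be written as a small predicate. This is a sketch for a single environment with a hypothetical function name, not the termination manager's code. The comparison follows the wording above (strictly greater than the maximum length, and outside the closed interval [-3, 3]).

```python
def check_terminations(episode_length, cart_pos, max_episode_length, bounds=(-3.0, 3.0)):
    """Evaluate both termination criteria for one environment."""
    time_out = episode_length > max_episode_length
    cart_out_of_bounds = not (bounds[0] <= cart_pos <= bounds[1])
    return {
        "time_out": time_out,
        "cart_out_of_bounds": cart_out_of_bounds,
        "done": time_out or cart_out_of_bounds,  # either criterion ends the episode
    }


running = check_terminations(episode_length=10, cart_pos=0.0, max_episode_length=500)
timed_out = check_terminations(episode_length=501, cart_pos=0.0, max_episode_length=500)
off_rail = check_terminations(episode_length=10, cart_pos=3.5, max_episode_length=500)
```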
As with :class:`RewardTermCfg`, you can define additional :class:`DoneTermCfg` for any additional criteria
for which you want to terminate the episode and add them to the :class:`TerminationsCfg`.
Curriculum and Commands
-----------------------
Curriculum
^^^^^^^^^^
When training an agent, it is often useful to start with a simple task and gradually increase the difficulty
as the agent learns. This is the idea behind curriculum learning. In Orbit, we provide a :class:`CurriculumManager`
that can be used to define a curriculum for your environment. In this tutorial we won't implement one for simplicity,
but you can see an example of a curriculum definition in the included Lift task for inspiration.
.. TODO: add tutorial that explains how to use the curriculum manager and reference it here.
We use a simple pass-through curriculum in this example to define a curriculum manager that does not modify the
environment.
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:start-after: # Curriculum configuration
:end-before: ##
Commands
^^^^^^^^
Additionally, you can also define commands that are sent to the environment at the start of each episode. This is
useful for resetting the environment to a specific state or for providing additional information to the agent. In this
example, we don't use any commands, but you can see an example of a command definition in the included Lift task for
inspiration.
.. TODO: add tutorial that explains how to use the command manager and reference it here.
We use the :class:`NullCommandGeneratorCfg` in this example to define a command generator that does not generate
any commands. It is provided to the user as a convenience class to avoid having to define a new command generator.
There are other command generators that are provided in
``source/extensions/omni.isaac.orbit/omni/isaac/orbit/command_generators``.
.. dropdown:: :fa:`eye,mr-1` Code for ``null_command_generator.py``
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit/omni/isaac/orbit/command_generators/null_command_generator.py
:language: python
:linenos:
Combining Everything
--------------------
Now we want to construct a :class:`RLTaskEnvCfg` that ties all of the
task configurations together. This object will be used to construct the
:class:`RLTaskEnv` to be used for RL training.
Again, this is similar to the :class:`BaseEnvCfg` defined in :ref:`creating-base-env`,
only with the added RL components explained in the above sections:
* Curriculum
* Rewards
* Terminations
* Commands
.. TODO: Explain __post_init__
Registering the environment
---------------------------
Before you can run your environment, you need to register your environment with the OpenAI Gym interface.
To register an environment, call the :meth:`gym.register` method in the :mod:`__init__.py` file of your environment package
(for instance, in ``omni.isaac.contrib_tasks.__init__.py``). This has the following components:
* **Name of the environment:** This should ideally be in the format :const:`Isaac-\<EnvironmentName\>-\<Robot\>-\<Version\>`.
However, this is not a strict requirement and you can use any name you want.
* **Entry point:** This is the import path of the environment class. This is used to instantiate the environment.
* **Config entry point:** This is the import path of the environment configuration file. This is used to instantiate the environment configuration.
The configuration file can be either a YAML file or a Python dataclass.
As examples of this in the ``omni.isaac.orbit_tasks`` package, we have the following:
.. dropdown:: :fa:`eye,mr-1` Registering an environment with a YAML configuration file
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/__init__.py
:language: python
:lines: 52-56
:linenos:
:lineno-start: 52
.. dropdown:: :fa:`eye,mr-1` Registering an environment with a Python dataclass configuration file
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/__init__.py
:language: python
:lines: 84-88
:linenos:
:lineno-start: 84
The Code Execution
~~~~~~~~~~~~~~~~~~
Now that we have gone through the code, let's run the environment. All environments registered in the ``omni.isaac.orbit_tasks``
and ``omni.isaac.contrib_tasks`` packages are automatically available in the included standalone environments and workflows scripts.
As an example, to run the Cartpole RL environment, you can use the following command.
.. code-block:: bash
./orbit.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32
This should open a stage with a ground plane, lights, and 32 Cartpole agents initialized at different
random configurations. The simulation should be playing with each Cartpole moving randomly.
To stop the simulation, you can either close the window, or press the ``STOP`` button in the UI, or press ``Ctrl+C`` in the terminal
where you started the simulation.
.. _tutorial-create-base-env:
Creating a Base Environment
===========================
.. currentmodule:: omni.isaac.orbit
Environments bring together different aspects of the simulation such as
the scene, observation and action spaces, randomizations, etc. to create a
coherent interface for various applications. In Orbit, environments are
implemented as :class:`envs.BaseEnv` and :class:`envs.RLTaskEnv` classes.
The two classes are very similar, but :class:`envs.RLTaskEnv` is useful for
reinforcement learning tasks and contains rewards, terminations, curriculum
and command generation. The :class:`envs.BaseEnv` class is useful for
traditional robot control and doesn't contain rewards and terminations.
In this tutorial, we will look at the base class :class:`envs.BaseEnv` and its
corresponding configuration class :class:`envs.BaseEnvCfg`. We will use the
cartpole environment from earlier to illustrate the different components
in creating a new :class:`envs.BaseEnv` environment.
The Code
~~~~~~~~
The tutorial corresponds to the ``cartpole_base_env`` script in the ``orbit/source/standalone/tutorials/04_envs``
directory.
.. dropdown:: Code for cartpole_base_env.py
:icon: code
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:emphasize-lines: 51-55, 58-75, 78-111, 114-133, 138-142, 147, 151, 156-157, 163-164
:linenos:
The Code Explained
~~~~~~~~~~~~~~~~~~
The base class :class:`envs.BaseEnv` wraps around many intricacies of the simulation interaction
and provides a simple interface for the user to run the simulation and interact with it. It
is composed of the following components:
* :class:`scene.InteractiveScene` - The scene that is used for the simulation.
* :class:`managers.ActionManager` - The manager that handles actions.
* :class:`managers.ObservationManager` - The manager that handles observations.
* :class:`managers.RandomizationManager` - The manager that handles randomizations.
By configuring these components, the user can create different variations of the same environment
with minimal effort. In this tutorial, we will go through the different components of the
:class:`envs.BaseEnv` class and how to configure them to create a new environment.
Designing the scene
-------------------
The first step in creating a new environment is to configure its scene. For the cartpole
environment, we will be using the scene from the previous tutorial. Thus, we omit the
scene configuration here. For more details on how to configure a scene, see
:ref:`tutorial-interactive-scene`.
Defining actions
----------------
In the previous tutorial, we directly input the action to the cartpole using
the :meth:`assets.Articulation.set_joint_effort_target` method. In this tutorial, we will
use the :class:`managers.ActionManager` to handle the actions.
The action manager can comprise multiple :class:`managers.ActionTerm` instances. Each action term
is responsible for applying *control* over a specific aspect of the environment. For instance,
for a robotic arm, we can have two action terms -- one for controlling the joints of the arm,
and the other for controlling the gripper. This composition allows the user to define
different control schemes for different aspects of the environment.
In the cartpole environment, we want to control the force applied to the cart to balance the pole.
Thus, we will create an action term that controls the force applied to the cart.
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:pyobject: ActionsCfg
:linenos:
:lineno-start: 51
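The composition of action terms described above can be pictured with a small sketch that does not depend on Orbit.
The classes ``ToyActionTerm`` and ``ToyActionManager`` are hypothetical names introduced only for illustration:
each term owns a slice of the flat action vector and applies its own scaling. The real
:class:`managers.ActionManager` works on batched tensors and has a richer interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyActionTerm:
    """Hypothetical action term: owns `dim` entries of the flat action vector."""

    name: str
    dim: int
    scale: float = 1.0
    processed: list[float] = field(default_factory=list)

    def process(self, raw: list[float]) -> None:
        # Scale the raw actions before they would be sent to the simulation.
        self.processed = [self.scale * value for value in raw]


class ToyActionManager:
    """Hypothetical manager: splits a flat action vector across its terms."""

    def __init__(self, terms: list[ToyActionTerm]):
        self.terms = terms
        self.total_dim = sum(term.dim for term in terms)

    def process(self, action: list[float]) -> None:
        if len(action) != self.total_dim:
            raise ValueError(f"Expected {self.total_dim} values, got {len(action)}.")
        start = 0
        for term in self.terms:
            # Hand each term the slice of the action vector that belongs to it.
            term.process(action[start : start + term.dim])
            start += term.dim


# A robotic arm with separate terms for the arm joints and the gripper.
arm = ToyActionTerm(name="arm_joints", dim=3, scale=0.5)
gripper = ToyActionTerm(name="gripper", dim=1, scale=100.0)
manager = ToyActionManager([arm, gripper])
manager.process([1.0, 2.0, 3.0, 0.5])
print(arm.processed, gripper.processed)
```

For the cartpole, the same idea reduces to a single term with one entry: the effort applied to the cart joint.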
Defining observations
---------------------
While the scene defines the state of the environment, the observations define the states
that are observable by the agent. These observations are used by the agent to make decisions
on what actions to take. In Orbit, the observations are computed by the
:class:`managers.ObservationManager` class.
Similar to the action manager, the observation manager can comprise multiple observation terms.
These are further grouped into observation groups which are used to define different observation
spaces for the environment. For instance, for hierarchical control, we may want to define
two observation groups -- one for the low level controller and the other for the high level
controller. It is assumed that all the observation terms in a group have the same dimensions.
For this tutorial, we will only define one observation group named ``"policy"``. While not completely
prescriptive, this group is a necessary requirement for various wrappers in Orbit.
We define a group by inheriting from the :class:`managers.ObservationGroupCfg` class. This class
collects different observation terms and helps define common properties for the group, such
as enabling noise corruption or concatenating the observations into a single tensor.
The individual terms are defined by inheriting from the :class:`managers.ObservationTermCfg` class.
This class takes in the :attr:`managers.ObservationTermCfg.func` that specifies the function or
callable class that computes the observation for that term. It includes other parameters for
defining the noise model, clipping, scaling, etc. However, we leave these parameters to their
default values for this tutorial.
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:pyobject: ObservationsCfg
:linenos:
:lineno-start: 58
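To make the relationship between terms and groups concrete, the following is a minimal sketch in plain Python.
The function ``compute_group``, the state dictionary and its field names are assumptions made for this
illustration and are not part of Orbit. Each term is a function of the environment state, and a group either
concatenates its terms into one flat vector or returns them separately.

```python
from __future__ import annotations

from typing import Callable

# A toy "environment state" for a single cartpole (hypothetical field names).
state = {"joint_pos": [0.1, -0.2], "joint_vel": [0.0, 0.5]}


def joint_pos(env_state: dict) -> list[float]:
    return list(env_state["joint_pos"])


def joint_vel(env_state: dict) -> list[float]:
    return list(env_state["joint_vel"])


def compute_group(
    terms: dict[str, Callable[[dict], list[float]]],
    env_state: dict,
    concatenate: bool = True,
):
    """Evaluate every term of a group; optionally join them into one flat vector."""
    values = {name: func(env_state) for name, func in terms.items()}
    if concatenate:
        return [x for term_values in values.values() for x in term_values]
    return values


# One group named "policy" that contains two observation terms.
groups = {"policy": {"joint_pos_rel": joint_pos, "joint_vel_rel": joint_vel}}
observations = {name: compute_group(terms, state) for name, terms in groups.items()}
print(observations)
```

The result is a dictionary keyed by group name, which mirrors how the observations of the ``"policy"`` group
are looked up later in the simulation loop.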
Defining randomizations
-----------------------
At this point, we have defined the scene, actions and observations for the cartpole environment.
The general idea for all these components is to define the configuration classes and then
pass them to the corresponding managers. The randomization manager is no different.
The :class:`managers.RandomizationManager` class is responsible for randomizing everything related
to the simulation state. This includes randomizing (or resetting) the scene, randomizing physical
properties (such as mass, friction, etc.), and visual properties (such as colors, textures, etc.).
Each of these is specified through the :class:`managers.RandomizationTermCfg` class, which
takes in the :attr:`managers.RandomizationTermCfg.func` that specifies the function or callable
class that performs the randomization. Additionally, it expects the **mode** of randomization.
The mode specifies when the randomization term should be applied. It is possible to specify
your own mode, but Orbit provides three modes out of the box:
* ``"startup"`` - Randomize only once at environment startup.
* ``"reset"`` - Randomize on every call to an environment's reset.
* ``"interval"`` - Randomize at a given fixed interval.
For this example, we randomize the pole's mass on startup. This is done only once since this operation
is expensive and we don't want to do it on every reset. We also randomize the initial joint state of
the cart and the pole at every reset.
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:pyobject: RandomizationCfg
:linenos:
:lineno-start: 78
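The role of the mode can be illustrated with a short sketch that is independent of Orbit. ``ToyRandomizationTerm``,
``ToyRandomizationManager`` and the ``push_cart`` term are hypothetical, and the interval is counted in steps
here purely for simplicity. The sketch only shows when each kind of term would be triggered.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyRandomizationTerm:
    """Hypothetical term: a function plus the mode that decides when it runs."""

    func: Callable[[list], None]
    mode: str  # "startup", "reset" or "interval"
    interval_steps: int = 1  # only used by "interval" terms


class ToyRandomizationManager:
    def __init__(self, terms: dict[str, ToyRandomizationTerm]):
        self.terms = terms

    def apply(self, mode: str, log: list, step: int = 0) -> None:
        """Run all terms registered for `mode` (interval terms only when due)."""
        for term in self.terms.values():
            if term.mode != mode:
                continue
            if mode == "interval" and step % term.interval_steps != 0:
                continue
            term.func(log)


log: list[str] = []
manager = ToyRandomizationManager(
    {
        "add_pole_mass": ToyRandomizationTerm(lambda out: out.append("mass"), mode="startup"),
        "reset_joints": ToyRandomizationTerm(lambda out: out.append("joints"), mode="reset"),
        "push_cart": ToyRandomizationTerm(lambda out: out.append("push"), mode="interval", interval_steps=2),
    }
)

manager.apply("startup", log)  # once, when the environment is created
for episode in range(2):
    manager.apply("reset", log)  # at every reset
    for step in range(1, 5):
        manager.apply("interval", log, step=step)  # every second step in this sketch
print(log)
```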
Tying it all together
---------------------
Having defined the scene and manager configurations, we can now define the environment configuration
through the :class:`envs.BaseEnvCfg` class. This class takes in the scene, action, observation and
randomization configurations.
In addition to these, it also takes in the :attr:`envs.BaseEnvCfg.sim` which defines the simulation
parameters such as the timestep, gravity, etc. This is initialized to the default values, but can
be modified as needed. We recommend doing so by defining the :meth:`__post_init__` method in the
:class:`envs.BaseEnvCfg` class, which is called after the configuration is initialized.
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:pyobject: CartpoleEnvCfg
:linenos:
:lineno-start: 114
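The ``__post_init__`` pattern can be tried out with standard dataclasses. The classes below (``ToySimCfg``,
``ToyBaseEnvCfg``, ``ToyCartpoleEnvCfg``), their fields and the chosen values are illustrative assumptions, not
the Orbit configuration classes. The point is only that defaults are created first and then adjusted in one
place after initialization.

```python
from dataclasses import dataclass, field


@dataclass
class ToySimCfg:
    """Hypothetical stand-in for the simulation parameters."""

    dt: float = 1.0 / 60.0
    gravity: tuple = (0.0, 0.0, -9.81)


@dataclass
class ToyBaseEnvCfg:
    """Hypothetical stand-in for an environment configuration with defaults."""

    sim: ToySimCfg = field(default_factory=ToySimCfg)
    decimation: int = 1


@dataclass
class ToyCartpoleEnvCfg(ToyBaseEnvCfg):
    def __post_init__(self):
        # Runs right after the generated __init__, so the defaults already exist
        # and can be overridden here without redefining the fields.
        self.decimation = 4
        self.sim.dt = 0.005


cfg = ToyCartpoleEnvCfg()
print(cfg.sim.dt, cfg.decimation)
```

Because ``sim`` uses a ``default_factory``, every configuration instance gets its own ``ToySimCfg`` object, so
changing ``cfg.sim.dt`` does not leak into other configurations.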
Running the simulation
----------------------
Lastly, we revisit the simulation execution loop. This is now much simpler since we have
abstracted away most of the details into the environment configuration. We only need to
call the :meth:`envs.BaseEnv.reset` method to reset the environment and :meth:`envs.BaseEnv.step`
method to step the environment. Both these functions return the observation and an info dictionary
which may contain additional information provided by the environment. These can be used by an
agent for decision-making.
The :class:`envs.BaseEnv` class does not have any notion of terminations since that concept is
specific to episodic tasks. Thus, the user is responsible for defining the termination condition
for the environment. In this tutorial, we reset the simulation at regular intervals.
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_base_env.py
:language: python
:pyobject: main
:linenos:
:lineno-start: 136
An important thing to note above is that the entire simulation loop is wrapped inside the
:meth:`torch.inference_mode` context manager. This is because the environment uses PyTorch
operations under the hood, and we do not want PyTorch's autograd engine to track them:
computing gradients for simulation operations is unnecessary and would only slow down the
simulation.
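The overall shape of such a loop, with a reset at regular intervals and random actions in between, can be
sketched without Orbit or PyTorch. ``ToyEnv`` and ``run`` are hypothetical names, and the sketch omits the
``torch.inference_mode`` context because it performs no tensor operations. It only mirrors the calling pattern
in which ``reset`` and ``step`` both return an observation and an info dictionary.

```python
import random


class ToyEnv:
    """Hypothetical environment exposing the reset/step pattern described above."""

    def __init__(self, num_envs: int):
        self.num_envs = num_envs
        self.reset_count = 0

    def reset(self):
        self.reset_count += 1
        obs = {"policy": [[0.0] for _ in range(self.num_envs)]}
        return obs, {}

    def step(self, actions):
        # Pretend the observation is simply the last applied action.
        obs = {"policy": [[a] for a in actions]}
        return obs, {}


def run(env: ToyEnv, num_steps: int, reset_every: int) -> int:
    """Step with random actions, reset at a fixed interval, return the reset count."""
    rng = random.Random(0)
    obs, _ = env.reset()
    for count in range(1, num_steps + 1):
        if count % reset_every == 0:
            obs, _ = env.reset()
        # One random action per environment instance.
        actions = [rng.uniform(-1.0, 1.0) for _ in range(env.num_envs)]
        obs, _ = env.step(actions)
    return env.reset_count


print(run(ToyEnv(num_envs=4), num_steps=900, reset_every=300))
```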
The Code Execution
~~~~~~~~~~~~~~~~~~
To run the base environment made in this tutorial, you can use the following command:
.. code-block:: bash
./orbit.sh -p source/standalone/tutorials/04_envs/cartpole_base_env.py --num_envs 32
This should open a stage with a ground plane, light source, and cartpoles. The simulation should be
playing with random actions on the cartpole. Additionally, it opens a UI window on the bottom
right corner of the screen named ``"Orbit"``. This window contains different UI elements that
can be used for debugging and visualization.
To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal where you
started the simulation.
In this tutorial, we learned about the different managers that help define a base environment. We
include more examples of defining the base environment in the ``orbit/source/standalone/tutorials/04_envs``
directory. For completeness, they can be run using the following commands:
.. code-block:: bash
# Floating cube environment with custom action term for PD control
./orbit.sh -p source/standalone/tutorials/04_envs/cube_base_env.py --num_envs 32
# Quadrupedal locomotion environment with a policy that interacts with the environment
./orbit.sh -p source/standalone/tutorials/04_envs/quadruped_base_env.py --num_envs 32
In the following tutorial, we will look at the :class:`envs.RLTaskEnv` class and how to use it
to create a Markov Decision Process (MDP).
Registering an Environment
==========================
.. currentmodule:: omni.isaac.orbit
In the previous tutorial, we learned how to create a custom cartpole environment. We manually
created an instance of the environment by importing the environment class and its configuration
class.
.. dropdown:: Environment creation in the previous tutorial
:icon: code
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_rl_env.py
:language: python
:lines: 39-50
While straightforward, this approach does not scale to a large suite of environments.
In this tutorial, we will show how to use the :meth:`gymnasium.register` method to register
environments with the ``gymnasium`` registry. This allows us to create the environment through
the :meth:`gymnasium.make` function.
.. dropdown:: Environment creation in this tutorial
:icon: code
.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
:language: python
:lines: 40-50
The Code
~~~~~~~~
The tutorial corresponds to the ``random_agent.py`` script in the ``orbit/source/standalone/environments`` directory.
.. dropdown:: Code for random_agent.py
:icon: code
.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
:language: python
:emphasize-lines: 40-42, 47-50
:linenos:
The Code Explained
~~~~~~~~~~~~~~~~~~
The :class:`envs.RLTaskEnv` class inherits from the :class:`gymnasium.Env` class to follow
a standard interface. However, unlike the traditional Gym environments, the :class:`envs.RLTaskEnv`
implements a *vectorized* environment. This means that multiple environment instances
are running simultaneously in the same process, and all the data is returned in a batched
fashion.
Using the gym registry
----------------------
To register an environment, we use the :meth:`gymnasium.register` method. This method takes
in the environment name, the entry point to the environment class, and the entry point to the
environment configuration class. For the cartpole environment, the following shows the registration
call in the ``omni.isaac.orbit_tasks.classic.cartpole`` sub-package:
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/__init__.py
:language: python
:lines: 10-
:emphasize-lines: 11, 12, 15
The ``id`` argument is the name of the environment. As a convention, we name all the environments
with the prefix ``Isaac-`` to make it easier to search for them in the registry. The name of the
environment is typically followed by the name of the task, and then the name of the robot.
For instance, for legged locomotion with ANYmal C on flat terrain, the environment is called
``Isaac-Velocity-Flat-Anymal-C-v0``. The version number ``v<N>`` is typically used to specify different
variations of the same environment. Otherwise, the names of the environments can become too long
and difficult to read.
The ``entry_point`` argument is the entry point to the environment class. The entry point is a string
of the form ``<module>:<class>``. In the case of the cartpole environment, the entry point is
``omni.isaac.orbit.envs:RLTaskEnv``. The entry point is used to import the environment class
when creating the environment instance.
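How a string of the form ``<module>:<class>`` can be turned into a class is shown in the sketch below. The helper
``load_entry_point`` is a hypothetical function written for this illustration and is not the loader used
by ``gymnasium``. A standard-library class stands in for ``omni.isaac.orbit.envs:RLTaskEnv`` so that the
snippet runs without Orbit.

```python
import importlib


def load_entry_point(entry_point: str):
    """Resolve a ``<module>:<attribute>`` string to the object it names."""
    module_name, _, attr_name = entry_point.partition(":")
    if not module_name or not attr_name:
        raise ValueError(f"Expected '<module>:<class>', got {entry_point!r}.")
    # Import the module lazily, only when the environment is actually requested.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Standard-library stand-in for an environment class.
cls = load_entry_point("collections:OrderedDict")
print(cls.__name__)
```

Deferring the import in this way is what allows a registry to list many environments without importing all
of their classes up front.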
The ``env_cfg_entry_point`` argument specifies the default configuration for the environment. The default
configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_env_cfg` function.
It is then passed to the :meth:`gymnasium.make` function to create the environment instance.
The configuration entry point can be either a YAML file or a Python configuration class.
.. note::
The ``gymnasium`` registry is a global registry. Hence, it is important to ensure that the
environment names are unique. Otherwise, a later registration replaces the earlier one; recent
``gymnasium`` releases only log a warning when this happens.
Creating the environment
------------------------
To inform the ``gym`` registry of all the environments provided by the ``omni.isaac.orbit_tasks``
extension, we must import the module at the start of the script. This will execute the ``__init__.py``
file which iterates over all the sub-packages and registers their respective environments.
.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
:language: python
:lines: 40-41
:linenos:
:lineno-start: 40
In this tutorial, the task name is read from the command line. The task name is used to parse
the default configuration as well as to create the environment instance. In addition, other
parsed command line arguments such as the number of environments, the simulation device,
and whether to render, are used to override the default configuration.
.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
:language: python
:lines: 47-50
:linenos:
:lineno-start: 47
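The idea of overriding a default configuration with command line arguments can be sketched as follows.
``ToyEnvCfg``, ``parse_cfg`` and the default values are hypothetical and do not reproduce the signature of
:meth:`omni.isaac.orbit_tasks.utils.parse_env_cfg`. Only values that were explicitly passed on the command
line replace the defaults.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ToyEnvCfg:
    """Hypothetical default configuration attached to a registered task."""

    num_envs: int = 4096
    device: str = "cuda:0"


def parse_cfg(default: ToyEnvCfg, argv: list) -> ToyEnvCfg:
    """Return a copy of `default` with command line overrides applied."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_envs", type=int, default=None)
    parser.add_argument("--cpu", action="store_true")
    args = parser.parse_args(argv)

    cfg = ToyEnvCfg(num_envs=default.num_envs, device=default.device)
    # Only override values that were explicitly given on the command line.
    if args.num_envs is not None:
        cfg.num_envs = args.num_envs
    if args.cpu:
        cfg.device = "cpu"
    return cfg


print(parse_cfg(ToyEnvCfg(), ["--num_envs", "32", "--cpu"]))
```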
Once the environment is created, the rest of the execution follows the standard resetting and stepping.
The Code Execution
~~~~~~~~~~~~~~~~~~
Now that we have gone through the code, let's run the script and see the result:
.. code-block:: bash
./orbit.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32
This should open a stage similar to the one in the previous :ref:`tutorial-create-rl-env` tutorial.
To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal.
In addition, you can also change the simulation device from GPU to CPU by adding the ``--cpu`` flag:
.. code-block:: bash
./orbit.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32 --cpu
With the ``--cpu`` flag, the simulation will run on the CPU. This is useful for debugging the simulation.
However, the simulation will run much slower than on the GPU.
.. _tutorial-create-rl-env:
Creating an RL Environment
==========================
.. currentmodule:: omni.isaac.orbit
Having learnt how to create a base environment in :ref:`tutorial-create-base-env`, we will now look at how to create a
task environment for reinforcement learning.
The base environment is designed as a sense-act environment where the agent can send commands to the environment
and receive observations from the environment. This minimal interface is sufficient for many applications such as
traditional motion planning and controls. However, many applications require a task-specification which often
serves as the learning objective for the agent. For instance, in a navigation task, the agent may be required to
reach a goal location. To this end, we use the :class:`envs.RLTaskEnv` class which extends the base environment
to include a task specification.
Similar to other components in Orbit, instead of directly modifying the base class :class:`RLTaskEnv`, we
encourage users to simply implement a configuration :class:`RLTaskEnvCfg` for their task environment.
This practice allows us to separate the task specification from the environment implementation, making it easier
to reuse components of the same environment for different tasks.
In this tutorial, we will configure the cartpole environment using the :class:`RLTaskEnvCfg` to create a task
for balancing the pole upright. We will learn how to specify the task using reward terms, termination criteria,
curriculum and commands.
The Code
~~~~~~~~
For this tutorial, we use the cartpole environment defined in ``omni.isaac.orbit_tasks.classic.cartpole`` module.
.. dropdown:: Code for cartpole_env_cfg.py
:icon: code
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:emphasize-lines: 63-68, 124-149, 152-162, 165-169, 187-192
:linenos:
The script for running the environment ``cartpole_rl_env.py`` is present in the
``orbit/source/standalone/tutorials/04_envs`` directory. The script is similar to the
``cartpole_base_env.py`` script in the previous tutorial, except that it uses the
:class:`envs.RLTaskEnv` instead of the :class:`envs.BaseEnv`.
.. dropdown:: Code for cartpole_rl_env.py
:icon: code
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_rl_env.py
:language: python
:emphasize-lines: 46-50, 64-65
:linenos:
The Code Explained
~~~~~~~~~~~~~~~~~~
We already went through parts of the above in the :ref:`tutorial-create-base-env` tutorial to learn
about how to specify the scene, observations, actions and randomizations. Thus, in this tutorial, we
will focus only on the RL components of the environment.
In Orbit, we provide various implementations of different terms in the :mod:`envs.mdp` module. We will use
some of these terms in this tutorial, but users are free to define their own terms as well. These
are usually placed in their task-specific sub-package
(for instance, in :mod:`omni.isaac.orbit_tasks.classic.cartpole.mdp`).
Defining rewards
----------------
The :class:`managers.RewardManager` is used to compute the reward terms for the agent. Similar to the other
managers, its terms are configured using the :class:`managers.RewardTermCfg` class. The
:class:`managers.RewardTermCfg` class specifies the function or callable class that computes the reward
as well as the weighting associated with it. It also takes in a dictionary of arguments, ``"params"``,
that are passed to the reward function when it is called.
For the cartpole task, we will use the following reward terms:
* **Alive Reward**: Encourage the agent to stay alive for as long as possible.
* **Terminating Reward**: Penalize the agent for terminating.
* **Pole Angle Reward**: Encourage the agent to keep the pole at the desired upright position.
* **Cart Velocity Reward**: Encourage the agent to keep the cart velocity as small as possible.
* **Pole Velocity Reward**: Encourage the agent to keep the pole velocity as small as possible.
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:pyobject: RewardsCfg
:linenos:
:lineno-start: 124
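The interplay of function, weight and ``params`` can be illustrated with a minimal sketch. ``ToyRewardTerm``,
``compute_reward`` and the two term functions are hypothetical and operate on a single environment. The actual
:class:`managers.RewardManager` works on batched tensors and may apply additional scaling that is ignored here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyRewardTerm:
    """Hypothetical reward term: function, weight and keyword arguments."""

    func: Callable[..., float]
    weight: float
    params: dict = field(default_factory=dict)


def is_alive(state: dict) -> float:
    return 0.0 if state["terminated"] else 1.0


def joint_pos_l2(state: dict, joint: str, target: float) -> float:
    # Squared deviation of the named joint from its target position.
    return (state[joint] - target) ** 2


def compute_reward(terms: dict[str, ToyRewardTerm], state: dict) -> float:
    """Weighted sum of all terms; `params` are forwarded as keyword arguments."""
    return sum(term.weight * term.func(state, **term.params) for term in terms.values())


terms = {
    "alive": ToyRewardTerm(func=is_alive, weight=1.0),
    "pole_pos": ToyRewardTerm(
        func=joint_pos_l2, weight=-1.0, params={"joint": "pole_angle", "target": 0.0}
    ),
}
state = {"terminated": False, "pole_angle": 0.5}
print(compute_reward(terms, state))
```

A negative weight turns a cost, such as the squared pole angle, into a penalty, while a positive weight rewards
the quantity returned by the term.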
Defining termination criteria
-----------------------------
Most learning tasks happen over a finite number of steps that we call an episode. For instance, in the cartpole
task, we want the agent to balance the pole for as long as possible. However, if the agent reaches an unstable
or unsafe state, we want to terminate the episode. On the other hand, if the agent is able to balance the pole
for a long time, we want to terminate the episode and start a new one so that the agent can learn to balance the
pole from a different starting configuration.
The :class:`managers.TerminationsCfg` configures what causes an episode to terminate. In this example,
we want the task to terminate when either of the following conditions is met:
* **Episode Length**: The episode length is greater than the defined ``max_episode_length``.
* **Cart out of bounds**: The cart goes outside of the bounds [-3, 3].
The flag :attr:`managers.TerminationsCfg.time_out` specifies whether the term is a time-out (truncation) term
or a termination term. These are used to indicate the two types of terminations as described in `Gymnasium's documentation
<https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/>`_.
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:pyobject: TerminationsCfg
:linenos:
:lineno-start: 152
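The distinction between the two kinds of terms can be sketched for a single environment as follows.
``ToyTerminationTerm`` and ``compute_dones`` are hypothetical names, and the state dictionary is an assumption
of this illustration. Terms flagged with ``time_out`` contribute to the truncation signal, all others to the
termination signal.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTerminationTerm:
    """Hypothetical termination term with a time-out flag."""

    func: Callable[[dict], bool]
    time_out: bool = False  # True marks a truncation rather than a failure


def compute_dones(terms: dict[str, ToyTerminationTerm], state: dict) -> tuple[bool, bool]:
    """Return (terminated, truncated) following the Gymnasium distinction."""
    terminated = any(t.func(state) for t in terms.values() if not t.time_out)
    truncated = any(t.func(state) for t in terms.values() if t.time_out)
    return terminated, truncated


terms = {
    "time_out": ToyTerminationTerm(lambda s: s["step"] >= s["max_episode_length"], time_out=True),
    "cart_out_of_bounds": ToyTerminationTerm(lambda s: abs(s["cart_pos"]) > 3.0),
}

print(compute_dones(terms, {"step": 10, "max_episode_length": 500, "cart_pos": 3.5}))
print(compute_dones(terms, {"step": 500, "max_episode_length": 500, "cart_pos": 0.0}))
```

Keeping the two signals apart matters for learning algorithms that bootstrap the value of the final state
after a time-out but not after a failure.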
Defining commands
-----------------
For various goal-conditioned tasks, it is useful to specify the goals or commands for the agent. These are
handled through the :class:`managers.CommandManager`. The command manager handles resampling and updating the
commands at each step. It can also be used to provide the commands as an observation to the agent.
For this simple task, we do not use any commands. This is specified by using a command term with the
:class:`envs.mdp.NullCommandCfg` configuration. However, you can see an example of command definitions in the
locomotion or manipulation tasks.
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:pyobject: CommandsCfg
:linenos:
:lineno-start: 63
Defining curriculum
-------------------
Oftentimes, when training a learning agent, it helps to start with a simple task and gradually increase the
task's difficulty as the agent training progresses. This is the idea behind curriculum learning. In Orbit,
we provide a :class:`managers.CurriculumManager` class that can be used to define a curriculum for your environment.
In this tutorial we don't implement a curriculum for simplicity, but you can see an example of a
curriculum definition in the other locomotion or manipulation tasks.
We use a simple pass-through curriculum to define a curriculum manager that does not modify the environment.
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:pyobject: CurriculumCfg
:linenos:
:lineno-start: 165
Tying it all together
---------------------
With all the above components defined, we can now create the :class:`RLTaskEnvCfg` configuration for the
cartpole environment. This is similar to the :class:`BaseEnvCfg` defined in :ref:`tutorial-create-base-env`,
only with the added RL components explained in the above sections.
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/cartpole_env_cfg.py
:language: python
:pyobject: CartpoleEnvCfg
:linenos:
:lineno-start: 177
Running the simulation loop
---------------------------
Coming back to the ``cartpole_rl_env.py`` script, the simulation loop is similar to the previous tutorial.
The only difference is that we create an instance of :class:`envs.RLTaskEnv` instead of the
:class:`envs.BaseEnv`. Consequently, now the :meth:`envs.RLTaskEnv.step` method returns additional signals
such as the reward and termination status. The information dictionary also maintains logging of quantities
such as the reward contribution from individual terms, the termination status of each term, the episode length etc.
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_rl_env.py
:language: python
:pyobject: main
:linenos:
:lineno-start: 44
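The additional signals and the per-instance resets can be pictured with a toy vectorized task written in plain
Python. ``ToyVecCounterEnv`` is hypothetical, and the Gymnasium-style five-tuple it returns (observation, reward,
terminated, truncated, info) is an assumption of this sketch rather than a statement about the exact return
values of :meth:`envs.RLTaskEnv.step`. All outputs are batched, with one entry per environment instance.

```python
class ToyVecCounterEnv:
    """Hypothetical vectorized task: each instance counts steps and can time out."""

    def __init__(self, num_envs: int, max_episode_length: int):
        self.num_envs = num_envs
        self.max_episode_length = max_episode_length
        self.episode_length = [0] * num_envs

    def step(self, actions: list):
        obs, reward, terminated, truncated = [], [], [], []
        for i, action in enumerate(actions):
            self.episode_length[i] += 1
            failed = abs(action) > 3.0  # stands in for "cart out of bounds"
            timed_out = self.episode_length[i] >= self.max_episode_length
            reward.append(0.0 if failed else 1.0)
            terminated.append(failed)
            truncated.append(timed_out and not failed)
            if failed or timed_out:
                self.episode_length[i] = 0  # only this instance is reset
            obs.append(self.episode_length[i])
        return obs, reward, terminated, truncated, {}


env = ToyVecCounterEnv(num_envs=2, max_episode_length=3)
print(env.step([0.0, 5.0]))
```

In this sketch the second instance fails and is reset immediately, while the first one keeps running, which
is the behavior that makes a single batched ``step`` call sufficient for many parallel episodes.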
The Code Execution
~~~~~~~~~~~~~~~~~~
Similar to the previous tutorial, we can run the environment by executing the ``cartpole_rl_env.py`` script.
.. code-block:: bash
./orbit.sh -p source/standalone/tutorials/04_envs/cartpole_rl_env.py --num_envs 32
This should open a simulation similar to the one in the previous tutorial. However, this time, the environment
returns more signals that specify the reward and termination status. Additionally, the individual
environments reset themselves when they terminate based on the termination criteria specified in the
configuration.
To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal
where you started the simulation.
In this tutorial, we learnt how to create a task environment for reinforcement learning. We do this
by extending the base environment to include the rewards, terminations, commands and curriculum terms.
We also learnt how to use the :class:`envs.RLTaskEnv` class to run the environment and receive various
signals from it.
While it is possible to manually create an instance of :class:`envs.RLTaskEnv` class for a desired task,
this is not scalable as it requires specialized scripts for each task. Thus, we exploit the
:meth:`gymnasium.make` function to create the environment with the gym interface. We will learn how to do this
in the next tutorial.
Training with an RL Agent
=========================
.. currentmodule:: omni.isaac.orbit
In the previous tutorials, we covered how to define an RL task environment, register
it into the ``gym`` registry, and interact with it using a random agent. We now move
on to the next step: training an RL agent to solve the task.
Although the :class:`envs.RLTaskEnv` conforms to the :class:`gymnasium.Env` interface,
it is not exactly a ``gym`` environment. The inputs and outputs of the environment are
not numpy arrays, but rather torch tensors with the first dimension being the
number of environment instances.
Additionally, most RL libraries expect their own variation of an environment interface.
For example, `Stable-Baselines3`_ expects the environment to conform to its
`VecEnv API`_ which expects a list of numpy arrays instead of a single tensor. Similarly,
`RSL-RL`_ and `RL-Games`_ expect a different interface. Since there is no one-size-fits-all
solution, we do not base the :class:`envs.RLTaskEnv` on any particular learning library.
Instead, we implement wrappers to convert the environment into the expected interface.
These are specified in the :mod:`omni.isaac.orbit_tasks.utils.wrappers` module.
In this tutorial, we will use `Stable-Baselines3`_ to train an RL agent to solve the
cartpole balancing task.
.. caution::
Wrapping the environment with the respective learning framework's wrapper should happen in the end,
i.e. after all other wrappers have been applied. This is because the learning framework's wrapper
modifies the interpretation of environment's APIs which may no longer be compatible with :class:`gymnasium.Env`.
The Code
--------
For this tutorial, we use the training script from `Stable-Baselines3`_ workflow in the
``orbit/source/standalone/workflows/sb3`` directory.
.. dropdown:: Code for train.py
:icon: code
.. literalinclude:: ../../../../source/standalone/workflows/sb3/train.py
:language: python
:emphasize-lines: 61, 69, 74-76, 96-110, 125-126, 114-123
:linenos:
The Code Explained
------------------
.. currentmodule:: omni.isaac.orbit_tasks.utils
Most of the code above is boilerplate to create logging directories, save the parsed configurations,
and set up different Stable-Baselines3 components. For this tutorial, the important part is creating
the environment and wrapping it with the Stable-Baselines3 wrapper.
There are three wrappers used in the code above:
1. :class:`gymnasium.wrappers.RecordVideo`: This wrapper records a video of the environment
and saves it to the specified directory. This is useful for visualizing the agent's behavior
during training.
2. :class:`wrappers.sb3.Sb3VecEnvWrapper`: This wrapper converts the environment
into a Stable-Baselines3 compatible environment.
3. `stable_baselines3.common.vec_env.VecNormalize`_: This wrapper normalizes the
environment's observations and rewards.
Each of these wrappers wraps around the previous wrapper by following ``env = wrapper(env, *args, **kwargs)``
repeatedly. The final environment is then used to train the agent. For more information on how these
wrappers work, please refer to the :ref:`how-to-env-wrappers` documentation.
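The chaining pattern ``env = wrapper(env, *args, **kwargs)`` can be demonstrated with a few lines of plain
Python. The classes ``ToyEnv``, ``Wrapper``, ``ClipAction`` and ``ScaleReward`` are hypothetical and unrelated
to the Gymnasium or Stable-Baselines3 wrappers named above. They only show that the outermost wrapper is called
first and delegates inwards.

```python
class ToyEnv:
    """Hypothetical environment returning one reward per step."""

    def step(self, action: float):
        return {"reward": 2.0 * action}


class Wrapper:
    """Base wrapper: forwards calls to the wrapped environment."""

    def __init__(self, env):
        self.env = env

    def step(self, action: float):
        return self.env.step(action)


class ClipAction(Wrapper):
    def __init__(self, env, low: float, high: float):
        super().__init__(env)
        self.low, self.high = low, high

    def step(self, action: float):
        # Modify the input before it reaches the wrapped environment.
        return self.env.step(min(max(action, self.low), self.high))


class ScaleReward(Wrapper):
    def __init__(self, env, scale: float):
        super().__init__(env)
        self.scale = scale

    def step(self, action: float):
        # Modify the output after the wrapped environment has stepped.
        result = self.env.step(action)
        result["reward"] *= self.scale
        return result


env = ToyEnv()
env = ClipAction(env, low=-1.0, high=1.0)
env = ScaleReward(env, scale=0.5)
print(env.step(3.0))
```

Since each wrapper only sees the interface of the object it wraps, a wrapper that changes this interface, as
the learning framework wrappers do, has to be applied last.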
The Code Execution
------------------
We train a PPO agent from Stable-Baselines3 to solve the cartpole balancing task.
Training the agent
~~~~~~~~~~~~~~~~~~
There are three main ways to train the agent. Each of them has its own advantages and disadvantages.
It is up to you to decide which one you prefer based on your use case.
Headless execution
""""""""""""""""""
If the ``--headless`` flag is set, the simulation is not rendered during training. This is useful
when training on a remote server or when you do not want to see the simulation. Typically, it speeds
up the training process since only the physics simulation step is performed.
.. code-block:: bash
./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 --headless
Headless execution with off-screen render
"""""""""""""""""""""""""""""""""""""""""
Since the above command does not render the simulation, it is not possible to visualize the agent's
behavior during training. To visualize the agent's behavior, we pass the ``--offscreen_render`` flag, which
enables off-screen rendering. Additionally, we pass the flag ``--video`` which records a video of the
agent's behavior during training.
.. code-block:: bash
./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 --headless --offscreen_render --video
The videos are saved to the ``logs/sb3/Isaac-Cartpole-v0/<run-dir>/videos`` directory. You can open these videos
using any video player.
Interactive execution
"""""""""""""""""""""
.. currentmodule:: omni.isaac.orbit
While the above two methods are useful for training the agent, they don't allow you to interact with the
simulation to see what is happening. In this case, you can omit the ``--headless`` flag and run the
training script as follows:
.. code-block:: bash
./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64
This will open the Isaac Sim window and you can see the agent training in the environment. However, this
will slow down the training process since the simulation is rendered on the screen. As a workaround, you
can switch between different render modes in the ``"Orbit"`` window that is docked on the bottom-right
corner of the screen. To learn more about these render modes, please check the
:class:`sim.SimulationContext.RenderMode` class.
Viewing the logs
~~~~~~~~~~~~~~~~
On a separate terminal, you can monitor the training progress by executing the following command:
.. code:: bash
# execute from the root directory of the repository
./orbit.sh -p -m tensorboard.main --logdir logs/sb3/Isaac-Cartpole-v0
Playing the trained agent
~~~~~~~~~~~~~~~~~~~~~~~~~
Once the training is complete, you can visualize the trained agent by executing the following command:
.. code:: bash
# execute from the root directory of the repository
./orbit.sh -p source/standalone/workflows/sb3/play.py --task Isaac-Cartpole-v0 --num_envs 32
By default, the above command will load the latest checkpoint from the ``logs/sb3/Isaac-Cartpole-v0``
directory. You can also select a specific checkpoint by passing the ``--checkpoint`` flag.
.. _Stable-Baselines3: https://stable-baselines3.readthedocs.io/en/master/
.. _VecEnv API: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecenv-api-vs-gym-api
.. _`stable_baselines3.common.vec_env.VecNormalize`: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize
.. _RL-Games: https://github.com/Denys88/rl_games
.. _RSL-RL: https://github.com/leggedrobotics/rsl_rl
......@@ -74,10 +74,10 @@ class SimulationContext(_SimulationContext):
events) are updated. There are three main components that can be updated when the simulation is rendered:
1. **UI elements and other extensions**: These are UI elements (such as buttons, sliders, etc.) and other
extensions that are running in the background that need to be updated when the simulation is running.
extensions that are running in the background that need to be updated when the simulation is running.
2. **Cameras**: These are typically based on Hydra textures and are used to render the scene from different
viewpoints. They can be attached to a viewport or be used independently to render the scene.
3. **`Viewports`_**: These are windows where you can see the rendered scene.
3. **`Viewports`**: These are windows where you can see the rendered scene.
Updating each of the above components has a different overhead. For example, updating the viewports is
computationally expensive compared to updating the UI elements. Therefore, it is useful to be able to
......
......@@ -60,7 +60,6 @@ class CartpoleSceneCfg(InteractiveSceneCfg):
##
# Actions configuration
@configclass
class CommandsCfg:
"""Command terms for the MDP."""
......@@ -76,7 +75,6 @@ class ActionsCfg:
joint_effort = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=100.0)
# Observations configuration
@configclass
class ObservationsCfg:
"""Observation specifications for the MDP."""
......@@ -97,7 +95,6 @@ class ObservationsCfg:
policy: PolicyCfg = PolicyCfg()
# Randomization configuration
@configclass
class RandomizationCfg:
"""Configuration for randomization."""
......@@ -124,7 +121,6 @@ class RandomizationCfg:
)
# Rewards configuration
@configclass
class RewardsCfg:
"""Reward terms for the MDP."""
......@@ -153,7 +149,6 @@ class RewardsCfg:
)
# Terminations configuration
@configclass
class TerminationsCfg:
"""Termination terms for the MDP."""
......@@ -167,7 +162,6 @@ class TerminationsCfg:
)
# Curriculum configuration
@configclass
class CurriculumCfg:
"""Configuration for the curriculum."""
......
......@@ -4,7 +4,8 @@
# SPDX-License-Identifier: BSD-3-Clause
"""
This script demonstrates how to use the environment concept that combines a scene with an action, observation and randomization manager.
This script demonstrates how to create a simple environment with a cartpole. It combines the concepts of
scene, action, observation and randomization managers to create an environment.
"""
from __future__ import annotations
......@@ -17,8 +18,8 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to spawn.")
parser = argparse.ArgumentParser(description="This script demonstrates a simple cartpole environment.")
parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")
# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
......@@ -37,94 +38,26 @@ import traceback
import carb
import omni.isaac.orbit.envs.mdp as mdp
from omni.isaac.orbit.assets import RigidObject
from omni.isaac.orbit.envs import BaseEnv, BaseEnvCfg
from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
from omni.isaac.orbit.managers import RandomizationTermCfg as RandTerm
from omni.isaac.orbit.managers import SceneEntityCfg
from omni.isaac.orbit.managers.action_manager import ActionTerm, ActionTermCfg
from omni.isaac.orbit.utils import configclass
from omni.isaac.orbit_tasks.classic.cartpole import CartpoleSceneCfg
# Cartpole Action Configuration
class CartpoleActionTerm(ActionTerm):
_asset: RigidObject
"""The articulation asset on which the action term is applied."""
def __init__(self, cfg, env: BaseEnv):
super().__init__(cfg, env)
self._raw_actions = torch.zeros(env.num_envs, 1, device=self.device)
self._processed_actions = torch.zeros(env.num_envs, 1, device=self.device)
# gains of controller
self.p_gain = 1500.0
self.d_gain = 10.0
# extract the joint id of the slider_to_cart joint
joint_ids, _ = self._asset.find_joints(["slider_to_cart", "cart_to_pole"])
self.slider_to_cart_joint_id = joint_ids[0]
self.cart_to_pole_joint_id = joint_ids[1]
"""
Properties.
"""
@property
def action_dim(self) -> int:
return self._raw_actions.shape[1]
@property
def raw_actions(self) -> torch.Tensor:
return self._raw_actions
@property
def processed_actions(self) -> torch.Tensor:
return self._processed_actions
"""
Operations
"""
def process_actions(self, actions: torch.Tensor):
# store the raw actions
self._raw_actions[:] = actions
joint_pos = (
self._asset.data.joint_pos[:, self.cart_to_pole_joint_id]
- self._asset.data.default_joint_pos[:, self.cart_to_pole_joint_id]
)
joint_vel = (
self._asset.data.joint_vel[:, self.cart_to_pole_joint_id]
- self._asset.data.default_joint_vel[:, self.cart_to_pole_joint_id]
)
self._processed_actions[:] = self.p_gain * (actions - joint_pos) - self.d_gain * joint_vel
def apply_actions(self):
# set slider joint target
self._asset.set_joint_effort_target(self.processed_actions, joint_ids=[self.slider_to_cart_joint_id])
@configclass
class CartpoleActionTermCfg(ActionTermCfg):
class_type: CartpoleActionTerm = CartpoleActionTerm
from omni.isaac.orbit_tasks.classic.cartpole.cartpole_env_cfg import CartpoleSceneCfg
@configclass
class ActionsCfg:
"""Action specifications for the MDP."""
"""Action specifications for the environment."""
joint_pos = CartpoleActionTermCfg(asset_name="robot")
joint_efforts = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=5.0)
# Cartpole Observation Configuration
@configclass
class ObservationsCfg:
"""Observation specifications for the MDP."""
"""Observation specifications for the environment."""
@configclass
class PolicyCfg(ObsGroup):
......@@ -142,14 +75,21 @@ class ObservationsCfg:
policy: PolicyCfg = PolicyCfg()
# Cartpole Randomization Configuration
@configclass
class RandomizationCfg:
"""Configuration for randomization."""
# reset
# on startup
add_pole_mass = RandTerm(
func=mdp.add_body_mass,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", body_names=["pole"]),
"mass_range": (0.1, 0.5),
},
)
# on reset
reset_cart_position = RandTerm(
func=mdp.reset_joints_by_offset,
mode="reset",
......@@ -171,57 +111,57 @@ class RandomizationCfg:
)
# Cartpole Environment Configuration
@configclass
class CartpoleEnvCfg(BaseEnvCfg):
"""Configuration for the locomotion velocity-tracking environment."""
"""Configuration for the cartpole environment."""
# Scene settings
scene: CartpoleSceneCfg = CartpoleSceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5, replicate_physics=False)
scene = CartpoleSceneCfg(num_envs=1024, env_spacing=2.5)
# Basic settings
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
randomization: RandomizationCfg = RandomizationCfg()
observations = ObservationsCfg()
actions = ActionsCfg()
randomization = RandomizationCfg()
def __post_init__(self):
"""Post initialization."""
# general settings
self.decimation = 4
self.episode_length_s = 20.0
# viewer settings
self.viewer.eye = [4.5, 0.0, 6.0]
self.viewer.lookat = [0.0, 0.0, 2.0]
# step settings
self.decimation = 4 # env step every 4 sim steps: 200Hz / 4 = 50Hz
# simulation settings
self.sim.dt = 0.005
self.sim.disable_contact_processing = True
# Main
self.sim.dt = 0.005 # sim step every 5ms: 200Hz
def main():
"""Main function."""
# parse the arguments
env_cfg = CartpoleEnvCfg()
env_cfg.scene.num_envs = args_cli.num_envs
# setup base environment
env = BaseEnv(cfg=CartpoleEnvCfg())
obs = env.reset()
target_position = torch.zeros(env.num_envs, 1, device=env.device)
env = BaseEnv(cfg=env_cfg)
# simulate physics
count = 0
while simulation_app.is_running():
# reset
if count % 300 == 0:
env.reset()
count = 0
# step env
obs, _ = env.step(target_position)
# print current orientation of pole
print(obs["policy"][0][1].item())
# update counter
count += 1
with torch.inference_mode():
# reset
if count % 300 == 0:
count = 0
env.reset()
print("-" * 80)
print("[INFO]: Resetting environment...")
# sample random actions
joint_efforts = torch.randn_like(env.action_manager.action)
# step the environment
obs, _ = env.step(joint_efforts)
# print current orientation of pole
print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
# update counter
count += 1
# close the environment
env.close()
if __name__ == "__main__":
......
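The timing comments in the configuration above (``sim.dt = 0.005`` and ``decimation = 4``) can be checked with a few lines of arithmetic. This is a standalone sketch using plain Python variables, not the Orbit configuration objects; the variable names are chosen for this example only.

```python
# values taken from the cartpole environment configuration in the diff
sim_dt = 0.005  # physics time-step in seconds
decimation = 4  # number of physics steps per environment step

# physics rate: one simulation step every 5 ms
physics_rate_hz = 1.0 / sim_dt

# the environment (control) step spans `decimation` physics steps
env_step_dt = decimation * sim_dt
control_rate_hz = 1.0 / env_step_dt

# the main loop resets every 300 environment steps
steps_between_resets = 300
seconds_between_resets = steps_between_resets * env_step_dt

print(f"physics: {physics_rate_hz:.0f} Hz, control: {control_rate_hz:.0f} Hz")
print(f"simulated time between resets: {seconds_between_resets:.1f} s")
```

With these values the physics runs at 200 Hz, actions are applied at 50 Hz, and the periodic reset in ``main()`` corresponds to about 6 seconds of simulated time.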
# Copyright (c) 2022-2023, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""
This script demonstrates how to call the RL environment for the cartpole balancing task.
"""
from __future__ import annotations
"""Launch Isaac Sim Simulator first."""
import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="This script demonstrates the RL environment for cartpole balancing.")
parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")
# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()
# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app
"""Rest everything follows."""
import torch
import traceback
import carb
from omni.isaac.orbit.envs import RLTaskEnv
from omni.isaac.orbit_tasks.classic.cartpole.cartpole_env_cfg import CartpoleEnvCfg
def main():
"""Main function."""
# parse the arguments
env_cfg = CartpoleEnvCfg()
env_cfg.scene.num_envs = args_cli.num_envs
# setup RL environment
env = RLTaskEnv(cfg=env_cfg)
# simulate physics
count = 0
while simulation_app.is_running():
with torch.inference_mode():
# reset
if count % 300 == 0:
count = 0
env.reset()
print("-" * 80)
print("[INFO]: Resetting environment...")
# sample random actions
joint_efforts = torch.randn_like(env.action_manager.action)
# step the environment
obs, rew, terminated, truncated, info = env.step(joint_efforts)
# print current orientation of pole
print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
# update counter
count += 1
# close the environment
env.close()
if __name__ == "__main__":
try:
# run the main execution
main()
except Exception as err:
carb.log_error(err)
carb.log_error(traceback.format_exc())
raise
finally:
# close sim app
simulation_app.close()
......@@ -4,8 +4,17 @@
# SPDX-License-Identifier: BSD-3-Clause
"""
This script demonstrates the base environment concept that combines a scene with an action,
observation and randomization manager for a floating cube.
This script creates a simple environment with a floating cube. The cube is controlled by a PD
controller to track an arbitrary target position.
While going through this tutorial, we recommend paying attention to how a custom action term
is defined. The action term is responsible for processing the raw actions and applying them to the
scene entities. The rest of the environment is similar to the previous tutorials.
.. code-block:: bash
# Run the script
./orbit.sh -p source/standalone/tutorials/04_envs/floating_cube.py --num_envs 32
"""
from __future__ import annotations
......@@ -18,7 +27,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
parser = argparse.ArgumentParser(description="This script demonstrates a base environment with a floating cube.")
parser.add_argument("--num_envs", type=int, default=64, help="Number of environments to spawn.")
# append AppLauncher cli args
......@@ -31,6 +40,7 @@ app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app
"""Rest everything follows."""
import torch
import traceback
......@@ -40,59 +50,41 @@ import omni.isaac.orbit.envs.mdp as mdp
import omni.isaac.orbit.sim as sim_utils
from omni.isaac.orbit.assets import AssetBaseCfg, RigidObject, RigidObjectCfg
from omni.isaac.orbit.envs import BaseEnv, BaseEnvCfg
from omni.isaac.orbit.managers import ActionTerm, ActionTermCfg
from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
from omni.isaac.orbit.managers import RandomizationTermCfg as RandTerm
from omni.isaac.orbit.managers import SceneEntityCfg
from omni.isaac.orbit.managers.action_manager import ActionTerm, ActionTermCfg
from omni.isaac.orbit.scene import InteractiveSceneCfg
from omni.isaac.orbit.terrains import TerrainImporterCfg
from omni.isaac.orbit.utils import configclass
##
# Scene definition
# Custom action term
##
@configclass
class MySceneCfg(InteractiveSceneCfg):
"""Example scene configuration."""
# add terrain
terrain = TerrainImporterCfg(prim_path="/World/ground", terrain_type="plane", debug_vis=False)
# add cube
cube: RigidObjectCfg = RigidObjectCfg(
prim_path="{ENV_REGEX_NS}/cube",
spawn=sim_utils.CuboidCfg(
size=(0.2, 0.2, 0.2),
rigid_props=sim_utils.RigidBodyPropertiesCfg(max_depenetration_velocity=1.0),
mass_props=sim_utils.MassPropertiesCfg(mass=1.0),
physics_material=sim_utils.RigidBodyMaterialCfg(),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.5, 0.0, 0.0)),
),
init_state=RigidObjectCfg.InitialStateCfg(pos=(0.0, 0.0, 5)),
)
# lights
light = AssetBaseCfg(
prim_path="/World/light",
spawn=sim_utils.DistantLightCfg(color=(0.75, 0.75, 0.75), intensity=3000.0),
)
class CubeActionTerm(ActionTerm):
"""Simple action term that implements a PD controller to track a target position.
##
# Action Term
##
The action term is applied to the cube asset. It involves two steps:
1. **Process the raw actions**: Typically, this includes any transformations of the raw actions
that are required to map them to the desired space. This is called once per environment step.
2. **Apply the processed actions**: This step applies the processed actions to the asset.
It is called once per simulation step.
class CubeActionTerm(ActionTerm):
"""Simple action term that implements a PD controller to track a target position."""
In this case, the action term simply applies the raw actions to the cube asset. The raw actions
are the desired target positions of the cube in the environment frame. The pre-processing step
simply copies the raw actions to the processed actions as no additional processing is required.
The processed actions are then applied to the cube asset by implementing a PD controller to
track the target position.
"""
_asset: RigidObject
"""The articulation asset on which the action term is applied."""
def __init__(self, cfg: ActionTermCfg, env: BaseEnv):
def __init__(self, cfg: CubeActionTermCfg, env: BaseEnv):
# call super constructor
super().__init__(cfg, env)
# create buffers
......@@ -100,8 +92,8 @@ class CubeActionTerm(ActionTerm):
self._processed_actions = torch.zeros(env.num_envs, 3, device=self.device)
self._vel_command = torch.zeros(self.num_envs, 6, device=self.device)
# gains of controller
self.p_gain = 5.0
self.d_gain = 0.5
self.p_gain = cfg.p_gain
self.d_gain = cfg.d_gain
"""
Properties.
......@@ -113,7 +105,6 @@ class CubeActionTerm(ActionTerm):
@property
def raw_actions(self) -> torch.Tensor:
# desired: (x, y, z)
return self._raw_actions
@property
......@@ -144,10 +135,16 @@ class CubeActionTermCfg(ActionTermCfg):
"""Configuration for the cube action term."""
class_type: type = CubeActionTerm
"""The class corresponding to the action term."""
p_gain: float = 5.0
"""Proportional gain of the PD controller."""
d_gain: float = 0.5
"""Derivative gain of the PD controller."""
##
# Observation Term
# Custom observation term
##
......@@ -158,6 +155,41 @@ def base_position(env: BaseEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
return asset.data.root_pos_w - env.scene.env_origins
##
# Scene definition
##
@configclass
class MySceneCfg(InteractiveSceneCfg):
"""Example scene configuration.
The scene comprises a ground plane, light source and floating cubes (gravity disabled).
"""
# add terrain
terrain = TerrainImporterCfg(prim_path="/World/ground", terrain_type="plane", debug_vis=False)
# add cube
cube: RigidObjectCfg = RigidObjectCfg(
prim_path="{ENV_REGEX_NS}/cube",
spawn=sim_utils.CuboidCfg(
size=(0.2, 0.2, 0.2),
rigid_props=sim_utils.RigidBodyPropertiesCfg(max_depenetration_velocity=1.0, disable_gravity=True),
mass_props=sim_utils.MassPropertiesCfg(mass=1.0),
physics_material=sim_utils.RigidBodyMaterialCfg(),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.5, 0.0, 0.0)),
),
init_state=RigidObjectCfg.InitialStateCfg(pos=(0.0, 0.0, 5)),
)
# lights
light = AssetBaseCfg(
prim_path="/World/light",
spawn=sim_utils.DistantLightCfg(color=(0.75, 0.75, 0.75), intensity=3000.0),
)
##
# Environment settings
##
......@@ -218,7 +250,7 @@ class CubeEnvCfg(BaseEnvCfg):
"""Configuration for the locomotion velocity-tracking environment."""
# Scene settings
scene: MySceneCfg = MySceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5, replicate_physics=True)
scene: MySceneCfg = MySceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5)
# Basic settings
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
......@@ -247,13 +279,15 @@ def main():
# simulate physics
count = 0
obs, _ = env.reset()
while simulation_app.is_running():
with torch.inference_mode():
# reset
if count % 300 == 0:
env.reset()
count = 0
obs, _ = env.reset()
print("-" * 80)
print("[INFO]: Resetting environment...")
# step env
obs, _ = env.step(target_position)
# print mean squared position error between target and current position
......@@ -262,6 +296,9 @@ def main():
# update counter
count += 1
# close the environment
env.close()
if __name__ == "__main__":
try:
......
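The two-step structure described in the ``CubeActionTerm`` docstring (process once per environment step, apply once per simulation step) can be sketched without the simulator. The example below is a simplified, hypothetical stand-in: it uses a 1D point mass of 1 kg with gravity disabled and treats the PD output as a force, which is an assumption, since the body of the original ``apply_actions`` is elided in the diff. The gains, time-step and decimation match the values shown above.

```python
from dataclasses import dataclass


@dataclass
class PDActionSketch:
    """Hypothetical 1D analogue of the cube action term (not the Orbit class)."""

    p_gain: float = 5.0
    d_gain: float = 0.5
    processed_action: float = 0.0

    def process_actions(self, raw_action: float) -> None:
        # called once per environment step: no transformation needed, so copy
        self.processed_action = raw_action

    def apply_actions(self, pos: float, vel: float) -> float:
        # called once per simulation step: PD law on the position error
        return self.p_gain * (self.processed_action - pos) - self.d_gain * vel


def rollout(target: float, env_steps: int = 2000, decimation: int = 4, sim_dt: float = 0.005) -> float:
    """Track `target` from rest at the origin and return the final position."""
    term = PDActionSketch()
    mass = 1.0  # assumed unit mass, matching the 1 kg cube in the scene
    pos, vel = 0.0, 0.0
    for _ in range(env_steps):
        # environment rate: latch the new target
        term.process_actions(target)
        for _ in range(decimation):
            # simulation rate: recompute the command from the current state
            force = term.apply_actions(pos, vel)
            vel += (force / mass) * sim_dt  # semi-implicit Euler integration
            pos += vel * sim_dt
    return pos


if __name__ == "__main__":
    print(f"final position: {rollout(target=1.0):.4f}")
```

Recomputing the command inside the inner loop is what lets the controller react to the state at the physics rate even though the target only changes at the slower environment rate.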
......@@ -4,11 +4,17 @@
# SPDX-License-Identifier: BSD-3-Clause
"""
This script demonstrates the environment concept that combines a scene with an action,
observation and randomization manager for a quadruped robot.
This script demonstrates the environment for a quadruped robot with height-scan sensor.
In this example, we use a locomotion policy to control the robot. The robot is commanded to
move forward at a constant velocity. The height-scan sensor is used to detect the height of
the terrain.
.. code-block:: bash
# Run the script
./orbit.sh -p source/standalone/tutorials/04_envs/quadruped_base_env.py --num_envs 32
A locomotion policy is loaded and used to control the robot. This shows how to use the
environment with a policy.
"""
from __future__ import annotations
......@@ -21,7 +27,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
parser = argparse.ArgumentParser(description="This script demonstrates a quadrupedal locomotion environment.")
parser.add_argument("--num_envs", type=int, default=64, help="Number of environments to spawn.")
# append AppLauncher cli args
......@@ -34,6 +40,7 @@ app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app
"""Rest everything follows."""
import os
import torch
import traceback
......@@ -62,6 +69,16 @@ from omni.isaac.orbit.utils.noise import AdditiveUniformNoiseCfg as Unoise
from omni.isaac.orbit.terrains.config.rough import ROUGH_TERRAINS_CFG # isort: skip
##
# Custom observation terms
##
def constant_commands(env: BaseEnv) -> torch.Tensor:
"""The generated command from the command generator."""
return torch.tensor([[1, 0, 0]], device=env.device).repeat(env.num_envs, 1)
##
# Scene definition
##
......@@ -112,11 +129,6 @@ class MySceneCfg(InteractiveSceneCfg):
##
def constant_commands(env: BaseEnv) -> torch.Tensor:
"""The generated command from the command generator."""
return torch.tensor([[1, 0, 0]], device=env.device).repeat(env.num_envs, 1)
@configclass
class ActionsCfg:
"""Action specifications for the MDP."""
......@@ -162,21 +174,7 @@ class ObservationsCfg:
class RandomizationCfg:
"""Configuration for randomization."""
reset_base = RandTerm(
func=mdp.reset_root_state_uniform,
mode="reset",
params={
"pose_range": {"x": (-0.5, 0.5), "y": (-0.5, 0.5), "yaw": (-3.14, 3.14)},
"velocity_range": {
"x": (-0.5, 0.5),
"y": (-0.5, 0.5),
"z": (-0.5, 0.5),
"roll": (-0.5, 0.5),
"pitch": (-0.5, 0.5),
"yaw": (-0.5, 0.5),
},
},
)
reset_scene = RandTerm(func=mdp.reset_scene_to_default, mode="reset")
##
......@@ -198,22 +196,21 @@ class QuadrupedEnvCfg(BaseEnvCfg):
def __post_init__(self):
"""Post initialization."""
# general settings
self.decimation = 4
self.episode_length_s = 20.0
self.decimation = 4 # env decimation -> 50 Hz control
# simulation settings
self.sim.dt = 0.005
self.sim.dt = 0.005 # simulation timestep -> 200 Hz physics
self.sim.physics_material = self.scene.terrain.physics_material
# update sensor update periods
# we tick all the sensors based on the smallest update period (physics update period)
if self.scene.height_scanner is not None:
self.scene.height_scanner.update_period = self.decimation * self.sim.dt
self.scene.height_scanner.update_period = self.decimation * self.sim.dt # 50 Hz
def main():
"""Main function."""
# setup base environment
env = BaseEnv(cfg=QuadrupedEnvCfg())
obs, _ = env.reset()
env_cfg = QuadrupedEnvCfg()
env = BaseEnv(cfg=env_cfg)
# load level policy
policy_path = os.path.join(ISAAC_ORBIT_NUCLEUS_DIR, "Policies", "ANYmal-C", "policy.pt")
......@@ -221,27 +218,29 @@ def main():
if not check_file_path(policy_path):
raise FileNotFoundError(f"Policy file '{policy_path}' does not exist.")
# jit load the policy
locomotion_policy = torch.jit.load(policy_path)
locomotion_policy.to(env.device)
locomotion_policy.eval()
policy = torch.jit.load(policy_path).to(env.device).eval()
# simulate physics
count = 0
obs, _ = env.reset()
while simulation_app.is_running():
with torch.inference_mode():
# reset
if count % 1000 == 0:
obs, _ = env.reset()
count = 0
print("[INFO]: Resetting robots state...")
print("-" * 80)
print("[INFO]: Resetting environment...")
# infer action
action = locomotion_policy(obs["policy"])
action = policy(obs["policy"])
# step env
obs, _ = env.step(action)
# update counter
count += 1
# close the environment
env.close()
if __name__ == "__main__":
try:
......
......@@ -45,6 +45,7 @@ else:
# launch omniverse app
app_launcher = AppLauncher(args_cli, experience=app_experience)
simulation_app = app_launcher.app
"""Rest everything follows."""
......@@ -93,7 +94,7 @@ def main():
n_timesteps = agent_cfg.pop("n_timesteps")
# create isaac environment
env = gym.make(args_cli.task, cfg=env_cfg)
env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
# wrap for video recording
if args_cli.video:
video_kwargs = {
......