Commit e49048f9 authored by Mayank Mittal, committed by GitHub

Adapts tutorial on base and RL environment (#283)

# Description

This MR adapts the environment tutorials. It reorganizes the tutorials
and also modifies their content to make them more complete.

## Type of change

- This change requires a documentation update

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./orbit.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent eb75c536
......@@ -62,10 +62,10 @@ For more information about the framework, please refer to the `paper <https://ar
:maxdepth: 1
:caption: Tutorials (Environments)
source/tutorials/03_envs/00_gym_env
source/tutorials/03_envs/01_create_base_env
source/tutorials/03_envs/02_create_rl_env
source/tutorials/03_envs/03_wrappers
source/tutorials/03_envs/base_env
source/tutorials/03_envs/rl_env
source/tutorials/03_envs/gym_registry
source/tutorials/03_envs/rl_training
.. toctree::
......
......@@ -23,7 +23,7 @@ the rail. The attached pole has 1 DOF that allows it to rotate freely.
.. TODO: Add isaac sim screenshot and replace GIF with a webdb
In :ref:`creating-base-env` participants will learn to control the
In :ref:`tutorial-create-base-env` participants will learn to control the
pole to stabilize the cart, but this tutorial focuses on merely constructing
the :class:`ArticulationCfg` that defines the cartpole.
......
.. _how-to-env-wrappers:
Using environment wrappers
==========================
......
......@@ -48,3 +48,12 @@ The `control` section focuses on how to implement controllers within Orbit.
Markers <04_controllers/ik_controller>
Please refer to the individual guides in each section for detailed instructions and examples.
Gym
---
.. toctree::
:maxdepth: 1
Gym Wrappers <05_gym/wrappers>
Launching Isaac Sim from AppLauncher
==============================
====================================
.. currentmodule:: omni.isaac.orbit
......
Running an RL environment
=========================
In this tutorial, we will learn how to run existing learning environments provided in the ``omni.isaac.orbit_tasks``
extension. All the environments included in Orbit follow the :class:`gymnasium.Env` interface, which means that they can be used
with any reinforcement learning framework that supports OpenAI Gym. However, since the environments are implemented
in a vectorized fashion, they can only be used with frameworks that support vectorized environments.
Many common frameworks come with their own desired definitions of a vectorized environment and require the returned data
to follow their supported data types and data structures. For example, ``stable-baselines3`` uses ``numpy`` arrays, while
``rsl-rl``, ``rl-games``, or ``skrl`` use ``torch.Tensor``. We provide wrappers for these different frameworks, which can be found
in the ``omni.isaac.orbit_tasks.utils.wrappers`` module.
The Code
~~~~~~~~
The tutorial corresponds to the ``zero_agent.py`` script in the ``orbit/source/standalone/environments`` directory.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:emphasize-lines: 34-35,41-44,49-55
:linenos:
The Code Explained
~~~~~~~~~~~~~~~~~~
Using gym registry for environments
-----------------------------------
All environments are registered using the ``gym`` registry, which means that you can create an instance of
an environment by calling ``gym.make``. The environments are registered in the ``__init__.py`` file of the
``omni.isaac.orbit_tasks`` extension with the following syntax:
.. code-block:: python
# Cartpole environment
gym.register(
id="Isaac-Cartpole-v0",
entry_point="omni.isaac.orbit_tasks.classic.cartpole:CartpoleEnv",
disable_env_checker=True,
kwargs={"env_cfg_entry_point": "omni.isaac.orbit_tasks.classic.cartpole:cartpole_cfg.yaml"},
)
The ``env_cfg_entry_point`` argument is used to load the default configuration for the environment. The default
configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_cfg.load_cfg_from_registry` function.
The configuration entry point can correspond to either a YAML file or a Python configuration
class. The default configuration can be overridden by passing a custom configuration instance to the ``gym.make``
function as shown later in the tutorial.
To inform the ``gym`` registry about all the environments provided by the ``omni.isaac.orbit_tasks`` extension,
we must import the module at the start of the script.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:lines: 33-35
:linenos:
:lineno-start: 33
.. note::
As a convention, we name all the environments in ``omni.isaac.orbit_tasks`` extension with the prefix ``Isaac-``.
For more complicated environments, we follow the pattern: ``Isaac-<TaskName>-<RobotName>-v<N>``,
where ``N`` is used to specify different observation or action spaces within the same task definition. For example,
for legged locomotion with ANYmal C, the environment is called ``Isaac-Velocity-Anymal-C-v0``.
In this tutorial, the task name is read from the command line. The task name is used to load the default configuration
as well as to create the environment instance. In addition, other parsed command line arguments such as the
number of environments, the simulation device, and whether to render, are used to override the default configuration.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:lines: 42-45
:linenos:
:lineno-start: 42
Running the environment
-----------------------
Once the environment is created, the rest of the execution follows the standard resetting and stepping.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:lines: 45-55
:linenos:
:lineno-start: 45
Similar to previous tutorials, to ensure a safe exit when running the script, we need to add checks
for whether the simulation is stopped or not.
.. literalinclude:: ../../../../source/standalone/environments/zero_agent.py
:language: python
:lines: 57-59
:linenos:
:lineno-start: 57
The Code Execution
~~~~~~~~~~~~~~~~~~
Now that we have gone through the code, let's run the script and see the result:
.. code-block:: bash
./orbit.sh -p source/standalone/environments/zero_agent.py --task Isaac-Cartpole-v0 --num_envs 32
This should open a stage with a ground plane, lights and 32 cartpoles spawned in a grid. The poles
fall down since no actions are applied to them. To stop the simulation,
you can either close the window, or press the ``STOP`` button in the UI, or press ``Ctrl+C``
in the terminal.
.. note::
When running environments with the GPU pipeline, the states in the scene are not synced with the USD
interface. Therefore, values in the UI may appear wrong while the simulation is running. Although objects
may be updating in the Viewport, attribute values in the UI will not update along with them.
To enable USD synchronization, please use the CPU pipeline with ``--cpu`` and disable flatcache by setting
``use_flatcache`` to False in the environment configuration.
Registering an Environment
==========================
.. currentmodule:: omni.isaac.orbit
In the previous tutorial, we learned how to create a custom cartpole environment. We manually
created an instance of the environment by importing the environment class and its configuration
class.
.. dropdown:: Environment creation in the previous tutorial
:icon: code
.. literalinclude:: ../../../../source/standalone/tutorials/04_envs/cartpole_rl_env.py
:language: python
:lines: 39-50
While straightforward, this approach is not scalable as we have a large suite of environments.
In this tutorial, we will show how to use the :meth:`gymnasium.register` method to register
environments with the ``gymnasium`` registry. This allows us to create the environment through
the :meth:`gymnasium.make` function.
.. dropdown:: Environment creation in this tutorial
:icon: code
.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
:language: python
:lines: 40-50
The Code
~~~~~~~~
The tutorial corresponds to the ``random_agent.py`` script in the ``orbit/source/standalone/environments`` directory.
.. dropdown:: Code for random_agent.py
:icon: code
.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
:language: python
:emphasize-lines: 40-42, 47-50
:linenos:
The Code Explained
~~~~~~~~~~~~~~~~~~
The :class:`envs.RLTaskEnv` class inherits from the :class:`gymnasium.Env` class to follow
a standard interface. However, unlike the traditional Gym environments, the :class:`envs.RLTaskEnv`
implements a *vectorized* environment. This means that multiple environment instances
are running simultaneously in the same process, and all the data is returned in a batched
fashion.
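To make the idea of batched data concrete, here is a minimal sketch of a vectorized environment. It is a toy, not the :class:`envs.RLTaskEnv` implementation: the class ``ToyVectorizedEnv`` and its dynamics are hypothetical, and plain Python lists stand in for the torch tensors used by Orbit.

```python
from __future__ import annotations


class ToyVectorizedEnv:
    """Hypothetical toy: every instance adds its action to a scalar state."""

    def __init__(self, num_envs: int, max_steps: int = 3):
        self.num_envs = num_envs
        self.max_steps = max_steps
        self.states = [0.0] * num_envs
        self.steps = [0] * num_envs

    def reset(self) -> list[float]:
        # reset all instances at once and return one observation per instance
        self.states = [0.0] * self.num_envs
        self.steps = [0] * self.num_envs
        return list(self.states)

    def step(self, actions: list[float]):
        # the first dimension of every input and output is the number of instances
        if len(actions) != self.num_envs:
            raise ValueError("Expected one action per environment instance.")
        rewards, truncated = [], []
        for i, action in enumerate(actions):
            self.states[i] += action
            self.steps[i] += 1
            rewards.append(-abs(self.states[i]))
            truncated.append(self.steps[i] >= self.max_steps)
        return list(self.states), rewards, truncated


env = ToyVectorizedEnv(num_envs=4)
obs = env.reset()
obs, rew, truncated = env.step([1.0, -1.0, 0.5, 0.0])
print(len(obs), len(rew), len(truncated))
```

Every call handles all instances together, so the caller never loops over individual environments.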
Using the gym registry
----------------------
To register an environment, we use the :meth:`gymnasium.register` method. This method takes
in the environment name, the entry point to the environment class, and the entry point to the
environment configuration class. For the cartpole environment, the following shows the registration
call in the ``omni.isaac.orbit_tasks.classic.cartpole`` sub-package:
.. literalinclude:: ../../../../source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/classic/cartpole/__init__.py
:language: python
:lines: 10-
:emphasize-lines: 11, 12, 15
The ``id`` argument is the name of the environment. As a convention, we name all the environments
with the prefix ``Isaac-`` to make it easier to search for them in the registry. The name of the
environment is typically followed by the name of the task, and then the name of the robot.
For instance, for legged locomotion with ANYmal C on flat terrain, the environment is called
``Isaac-Velocity-Flat-Anymal-C-v0``. The version number ``v<N>`` is typically used to specify different
variations of the same environment. Otherwise, the names of the environments can become too long
and difficult to read.
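As a small illustration of the naming convention, the sketch below splits an environment name into its descriptive part and its version number. The helper ``parse_env_id`` and the regular expression are assumptions made for this example; they are not part of Orbit or ``gymnasium``.

```python
from __future__ import annotations

import re

# Assumed pattern: "Isaac-" prefix, a free-form task/robot part, and a "-v<N>" suffix.
ENV_ID_PATTERN = re.compile(r"^Isaac-(?P<name>.+)-v(?P<version>\d+)$")


def parse_env_id(env_id: str) -> tuple[str, int]:
    """Return the task/robot part and the version of an ``Isaac-`` environment name."""
    match = ENV_ID_PATTERN.match(env_id)
    if match is None:
        raise ValueError(f"Name does not follow the 'Isaac-<...>-v<N>' convention: {env_id!r}")
    return match.group("name"), int(match.group("version"))


name, version = parse_env_id("Isaac-Velocity-Flat-Anymal-C-v0")
print(name, version)
```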
The ``entry_point`` argument is the entry point to the environment class. The entry point is a string
of the form ``<module>:<class>``. In the case of the cartpole environment, the entry point is
``omni.isaac.orbit.envs:RLTaskEnv``. The entry point is used to import the environment class
when creating the environment instance.
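For intuition, a ``<module>:<class>`` string can be resolved with the standard library alone. The sketch below is not the ``gymnasium`` implementation: the helper ``resolve_entry_point`` is hypothetical, and a standard-library class stands in for the environment class.

```python
import importlib


def resolve_entry_point(entry_point: str):
    """Import and return the attribute named by a ``<module>:<attr>`` string."""
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected '<module>:<class>', got {entry_point!r}")
    # import the module lazily, only when the class is actually needed
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# A standard-library class stands in for an environment class here.
cls = resolve_entry_point("collections:OrderedDict")
print(cls.__name__)
```

Storing the entry point as a string means the module is imported only when the environment is created, not when it is registered.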
The ``env_cfg_entry_point`` argument specifies the default configuration for the environment. The default
configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_env_cfg` function.
It is then passed to the :meth:`gymnasium.make` function to create the environment instance.
The configuration entry point can be either a YAML file or a Python configuration class.
.. note::
The ``gymnasium`` registry is a global registry. Hence, it is important to ensure that the
environment names are unique. Otherwise, registering a duplicate name conflicts with the existing
entry (older ``gym`` releases raise an error, while ``gymnasium`` logs a warning and overrides the entry).
Creating the environment
------------------------
To inform the ``gym`` registry about all the environments provided by the ``omni.isaac.orbit_tasks``
extension, we must import the module at the start of the script. This will execute the ``__init__.py``
file which iterates over all the sub-packages and registers their respective environments.
.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
:language: python
:lines: 40-41
:linenos:
:lineno-start: 40
In this tutorial, the task name is read from the command line. The task name is used to parse
the default configuration as well as to create the environment instance. In addition, other
parsed command line arguments such as the number of environments, the simulation device,
and whether to render, are used to override the default configuration.
.. literalinclude:: ../../../../source/standalone/environments/random_agent.py
:language: python
:lines: 47-50
:linenos:
:lineno-start: 47
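The override pattern described above can be sketched with the standard library. This is a simplified stand-in, not the code in ``random_agent.py``: the ``ToyEnvCfg`` dataclass, its default values, and the ``apply_cli_overrides`` helper are hypothetical; only the ``--num_envs`` and ``--cpu`` flag names come from this tutorial.

```python
import argparse
from dataclasses import dataclass


@dataclass
class ToyEnvCfg:
    """Hypothetical stand-in for a default environment configuration."""

    num_envs: int = 1024
    device: str = "cuda:0"


def apply_cli_overrides(cfg: ToyEnvCfg, args: argparse.Namespace) -> ToyEnvCfg:
    """Override defaults only for the arguments the user actually passed."""
    if args.num_envs is not None:
        cfg.num_envs = args.num_envs
    if args.cpu:
        cfg.device = "cpu"
    return cfg


parser = argparse.ArgumentParser()
parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to spawn.")
parser.add_argument("--cpu", action="store_true", help="Run the simulation on the CPU.")

# parse an explicit argument list so the sketch does not depend on sys.argv
args = parser.parse_args(["--num_envs", "32", "--cpu"])
cfg = apply_cli_overrides(ToyEnvCfg(), args)
print(cfg)
```

Using ``None`` as the parser default keeps the configuration's own default whenever the flag is not given.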
Once the environment is created, the rest of the execution follows the standard resetting and stepping.
The Code Execution
~~~~~~~~~~~~~~~~~~
Now that we have gone through the code, let's run the script and see the result:
.. code-block:: bash
./orbit.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32
This should open a stage similar to the one in the previous :ref:`tutorial-create-rl-env` tutorial.
To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal.
You can also change the simulation device from GPU to CPU by adding the ``--cpu`` flag:
.. code-block:: bash
./orbit.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32 --cpu
With the ``--cpu`` flag, the simulation will run on the CPU. This is useful for debugging the simulation.
However, the simulation will run much slower than on the GPU.
Training with an RL Agent
=========================
.. currentmodule:: omni.isaac.orbit
In the previous tutorials, we covered how to define an RL task environment, register
it into the ``gym`` registry, and interact with it using a random agent. We now move
on to the next step: training an RL agent to solve the task.
Although the :class:`envs.RLTaskEnv` conforms to the :class:`gymnasium.Env` interface,
it is not exactly a ``gym`` environment. The inputs and outputs of the environment are
not numpy arrays, but rather based on torch tensors with the first dimension being the
number of environment instances.
Additionally, most RL libraries expect their own variation of an environment interface.
For example, `Stable-Baselines3`_ expects the environment to conform to its
`VecEnv API`_ which expects a list of numpy arrays instead of a single tensor. Similarly,
`RSL-RL`_ and `RL-Games`_ expect a different interface. Since there is no one-size-fits-all
solution, we do not base the :class:`envs.RLTaskEnv` on any particular learning library.
Instead, we implement wrappers to convert the environment into the expected interface.
These are specified in the :mod:`omni.isaac.orbit_tasks.utils.wrappers` module.
In this tutorial, we will use `Stable-Baselines3`_ to train an RL agent to solve the
cartpole balancing task.
.. caution::
Wrapping the environment with the respective learning framework's wrapper should happen at the end,
i.e. after all other wrappers have been applied. This is because the learning framework's wrapper
modifies the interpretation of the environment's APIs, which may no longer be compatible with :class:`gymnasium.Env`.
The Code
--------
For this tutorial, we use the training script from `Stable-Baselines3`_ workflow in the
``orbit/source/standalone/workflows/sb3`` directory.
.. dropdown:: Code for train.py
:icon: code
.. literalinclude:: ../../../../source/standalone/workflows/sb3/train.py
:language: python
:emphasize-lines: 61, 69, 74-76, 96-110, 125-126, 114-123
:linenos:
The Code Explained
------------------
.. currentmodule:: omni.isaac.orbit_tasks.utils
Most of the code above is boilerplate to create logging directories, save the parsed configurations,
and set up different Stable-Baselines3 components. For this tutorial, the important part is creating
the environment and wrapping it with the Stable-Baselines3 wrapper.
There are three wrappers used in the code above:
1. :class:`gymnasium.wrappers.RecordVideo`: This wrapper records a video of the environment
and saves it to the specified directory. This is useful for visualizing the agent's behavior
during training.
2. :class:`wrappers.sb3.Sb3VecEnvWrapper`: This wrapper converts the environment
into a Stable-Baselines3 compatible environment.
3. `stable_baselines3.common.vec_env.VecNormalize`_: This wrapper normalizes the
environment's observations and rewards.
Each of these wrappers wraps around the previous wrapper by following ``env = wrapper(env, *args, **kwargs)``
repeatedly. The final environment is then used to train the agent. For more information on how these
wrappers work, please refer to the :ref:`how-to-env-wrappers` documentation.
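The chaining pattern ``env = wrapper(env, *args, **kwargs)`` can be sketched with toy classes. None of the classes below exist in Orbit, ``gymnasium``, or Stable-Baselines3; they are hypothetical and only show why the wrapper that changes the interface goes last.

```python
class ToyEnv:
    """Hypothetical environment that returns a batched list of observations."""

    def step(self, actions):
        return [2.0 * a for a in actions]


class Wrapper:
    """Base wrapper that forwards calls to the wrapped environment."""

    def __init__(self, env):
        self.env = env

    def step(self, actions):
        return self.env.step(actions)


class RecordSteps(Wrapper):
    """Keeps the interface unchanged and stores every observation batch."""

    def __init__(self, env, history):
        super().__init__(env)
        self.history = history

    def step(self, actions):
        obs = self.env.step(actions)
        self.history.append(obs)
        return obs


class ToFrameworkFormat(Wrapper):
    """Changes the return type (list to tuple), so it must be applied last."""

    def step(self, actions):
        return tuple(self.env.step(actions))


history = []
env = ToyEnv()
env = RecordSteps(env, history)  # expects the original list-based interface
env = ToFrameworkFormat(env)  # outermost: converts to the "framework" format
result = env.step([1.0, 2.0])
print(result, history)
```

If ``ToFrameworkFormat`` were applied first, ``RecordSteps`` would receive tuples instead of the lists it was written for, which mirrors the caution about ordering given earlier in this tutorial.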
The Code Execution
------------------
We train a PPO agent from Stable-Baselines3 to solve the cartpole balancing task.
Training the agent
~~~~~~~~~~~~~~~~~~
There are three main ways to train the agent. Each of them has its own advantages and disadvantages.
It is up to you to decide which one you prefer based on your use case.
Headless execution
""""""""""""""""""
If the ``--headless`` flag is set, the simulation is not rendered during training. This is useful
when training on a remote server or when you do not want to see the simulation. Typically, it speeds
up the training process since only the physics simulation step is performed.
.. code-block:: bash
./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 --headless
Headless execution with off-screen render
"""""""""""""""""""""""""""""""""""""""""
Since the above command does not render the simulation, it is not possible to visualize the agent's
behavior during training. To visualize the agent's behavior, we pass the ``--offscreen_render`` flag, which
enables off-screen rendering. Additionally, we pass the flag ``--video`` which records a video of the
agent's behavior during training.
.. code-block:: bash
./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64 --headless --offscreen_render --video
The videos are saved to the ``logs/sb3/Isaac-Cartpole-v0/<run-dir>/videos`` directory. You can open these videos
using any video player.
Interactive execution
"""""""""""""""""""""
.. currentmodule:: omni.isaac.orbit
While the above two methods are useful for training the agent, they don't allow you to interact with the
simulation to see what is happening. In this case, you can omit the ``--headless`` flag and run the
training script as follows:
.. code-block:: bash
./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --num_envs 64
This will open the Isaac Sim window and you can see the agent training in the environment. However, this
will slow down the training process since the simulation is rendered on the screen. As a workaround, you
can switch between different render modes in the ``"Orbit"`` window that is docked on the bottom-right
corner of the screen. To learn more about these render modes, please check the
:class:`sim.SimulationContext.RenderMode` class.
Viewing the logs
~~~~~~~~~~~~~~~~
In a separate terminal, you can monitor the training progress by executing the following command:
.. code:: bash
# execute from the root directory of the repository
./orbit.sh -p -m tensorboard.main --logdir logs/sb3/Isaac-Cartpole-v0
Playing the trained agent
~~~~~~~~~~~~~~~~~~~~~~~~~
Once the training is complete, you can visualize the trained agent by executing the following command:
.. code:: bash
# execute from the root directory of the repository
./orbit.sh -p source/standalone/workflows/sb3/play.py --task Isaac-Cartpole-v0 --num_envs 32
By default, the above command will load the latest checkpoint from the ``logs/sb3/Isaac-Cartpole-v0``
directory. You can also select a specific checkpoint by passing the ``--checkpoint`` flag.
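One plausible way to pick the latest checkpoint is to take the most recently modified file under the log directory. The sketch below is only an illustration of that idea and not necessarily how ``play.py`` does it: the helper ``find_latest_checkpoint`` is hypothetical, and the ``*.zip`` file pattern is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def find_latest_checkpoint(log_root: str | Path, pattern: str = "*.zip") -> Path | None:
    """Return the most recently modified file matching ``pattern`` under ``log_root``.

    Returns ``None`` when no matching file exists.
    """
    # search all run directories below the log root
    candidates = list(Path(log_root).rglob(pattern))
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


# Example call (path taken from this tutorial):
#   find_latest_checkpoint("logs/sb3/Isaac-Cartpole-v0")
```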
.. _Stable-Baselines3: https://stable-baselines3.readthedocs.io/en/master/
.. _VecEnv API: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecenv-api-vs-gym-api
.. _`stable_baselines3.common.vec_env.VecNormalize`: https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html#vecnormalize
.. _RL-Games: https://github.com/Denys88/rl_games
.. _RSL-RL: https://github.com/leggedrobotics/rsl_rl
......@@ -74,10 +74,10 @@ class SimulationContext(_SimulationContext):
events) are updated. There are three main components that can be updated when the simulation is rendered:
1. **UI elements and other extensions**: These are UI elements (such as buttons, sliders, etc.) and other
extensions that are running in the background that need to be updated when the simulation is running.
extensions that are running in the background that need to be updated when the simulation is running.
2. **Cameras**: These are typically based on Hydra textures and are used to render the scene from different
viewpoints. They can be attached to a viewport or be used independently to render the scene.
3. **`Viewports`_**: These are windows where you can see the rendered scene.
3. **`Viewports`**: These are windows where you can see the rendered scene.
Updating each of the above components has a different overhead. For example, updating the viewports is
computationally expensive compared to updating the UI elements. Therefore, it is useful to be able to
......
......@@ -60,7 +60,6 @@ class CartpoleSceneCfg(InteractiveSceneCfg):
##
# Actions configuration
@configclass
class CommandsCfg:
"""Command terms for the MDP."""
......@@ -76,7 +75,6 @@ class ActionsCfg:
joint_effort = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=100.0)
# Observations configuration
@configclass
class ObservationsCfg:
"""Observation specifications for the MDP."""
......@@ -97,7 +95,6 @@ class ObservationsCfg:
policy: PolicyCfg = PolicyCfg()
# Randomization configuration
@configclass
class RandomizationCfg:
"""Configuration for randomization."""
......@@ -124,7 +121,6 @@ class RandomizationCfg:
)
# Rewards configuration
@configclass
class RewardsCfg:
"""Reward terms for the MDP."""
......@@ -153,7 +149,6 @@ class RewardsCfg:
)
# Terminations configuration
@configclass
class TerminationsCfg:
"""Termination terms for the MDP."""
......@@ -167,7 +162,6 @@ class TerminationsCfg:
)
# Curriculum configuration
@configclass
class CurriculumCfg:
"""Configuration for the curriculum."""
......
......@@ -4,7 +4,8 @@
# SPDX-License-Identifier: BSD-3-Clause
"""
This script demonstrates how to use the environment concept that combines a scene with an action, observation and randomization manager.
This script demonstrates how to create a simple environment with a cartpole. It combines the concepts of
scene, action, observation and randomization managers to create an environment.
"""
from __future__ import annotations
......@@ -17,8 +18,8 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to spawn.")
parser = argparse.ArgumentParser(description="This script demonstrates a simple cartpole environment.")
parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")
# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
......@@ -37,94 +38,26 @@ import traceback
import carb
import omni.isaac.orbit.envs.mdp as mdp
from omni.isaac.orbit.assets import RigidObject
from omni.isaac.orbit.envs import BaseEnv, BaseEnvCfg
from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
from omni.isaac.orbit.managers import RandomizationTermCfg as RandTerm
from omni.isaac.orbit.managers import SceneEntityCfg
from omni.isaac.orbit.managers.action_manager import ActionTerm, ActionTermCfg
from omni.isaac.orbit.utils import configclass
from omni.isaac.orbit_tasks.classic.cartpole import CartpoleSceneCfg
# Cartpole Action Configuration
class CartpoleActionTerm(ActionTerm):
_asset: RigidObject
"""The articulation asset on which the action term is applied."""
def __init__(self, cfg, env: BaseEnv):
super().__init__(cfg, env)
self._raw_actions = torch.zeros(env.num_envs, 1, device=self.device)
self._processed_actions = torch.zeros(env.num_envs, 1, device=self.device)
# gains of controller
self.p_gain = 1500.0
self.d_gain = 10.0
# extract the joint id of the slider_to_cart joint
joint_ids, _ = self._asset.find_joints(["slider_to_cart", "cart_to_pole"])
self.slider_to_cart_joint_id = joint_ids[0]
self.cart_to_pole_joint_id = joint_ids[1]
"""
Properties.
"""
@property
def action_dim(self) -> int:
return self._raw_actions.shape[1]
@property
def raw_actions(self) -> torch.Tensor:
return self._raw_actions
@property
def processed_actions(self) -> torch.Tensor:
return self._processed_actions
"""
Operations
"""
def process_actions(self, actions: torch.Tensor):
# store the raw actions
self._raw_actions[:] = actions
joint_pos = (
self._asset.data.joint_pos[:, self.cart_to_pole_joint_id]
- self._asset.data.default_joint_pos[:, self.cart_to_pole_joint_id]
)
joint_vel = (
self._asset.data.joint_vel[:, self.cart_to_pole_joint_id]
- self._asset.data.default_joint_vel[:, self.cart_to_pole_joint_id]
)
self._processed_actions[:] = self.p_gain * (actions - joint_pos) - self.d_gain * joint_vel
def apply_actions(self):
# set slider joint target
self._asset.set_joint_effort_target(self.processed_actions, joint_ids=[self.slider_to_cart_joint_id])
@configclass
class CartpoleActionTermCfg(ActionTermCfg):
class_type: CartpoleActionTerm = CartpoleActionTerm
from omni.isaac.orbit_tasks.classic.cartpole.cartpole_env_cfg import CartpoleSceneCfg
@configclass
class ActionsCfg:
"""Action specifications for the MDP."""
"""Action specifications for the environment."""
joint_pos = CartpoleActionTermCfg(asset_name="robot")
joint_efforts = mdp.JointEffortActionCfg(asset_name="robot", joint_names=["slider_to_cart"], scale=5.0)
# Cartpole Observation Configuration
@configclass
class ObservationsCfg:
"""Observation specifications for the MDP."""
"""Observation specifications for the environment."""
@configclass
class PolicyCfg(ObsGroup):
......@@ -142,14 +75,21 @@ class ObservationsCfg:
policy: PolicyCfg = PolicyCfg()
# Cartpole Randomization Configuration
@configclass
class RandomizationCfg:
"""Configuration for randomization."""
# reset
# on startup
add_pole_mass = RandTerm(
func=mdp.add_body_mass,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", body_names=["pole"]),
"mass_range": (0.1, 0.5),
},
)
# on reset
reset_cart_position = RandTerm(
func=mdp.reset_joints_by_offset,
mode="reset",
......@@ -171,57 +111,57 @@ class RandomizationCfg:
)
# Cartpole Environment Configuration
@configclass
class CartpoleEnvCfg(BaseEnvCfg):
"""Configuration for the locomotion velocity-tracking environment."""
"""Configuration for the cartpole environment."""
# Scene settings
scene: CartpoleSceneCfg = CartpoleSceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5, replicate_physics=False)
scene = CartpoleSceneCfg(num_envs=1024, env_spacing=2.5)
# Basic settings
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
randomization: RandomizationCfg = RandomizationCfg()
observations = ObservationsCfg()
actions = ActionsCfg()
randomization = RandomizationCfg()
def __post_init__(self):
"""Post initialization."""
# general settings
self.decimation = 4
self.episode_length_s = 20.0
# viewer settings
self.viewer.eye = [4.5, 0.0, 6.0]
self.viewer.lookat = [0.0, 0.0, 2.0]
# step settings
self.decimation = 4 # env step every 4 sim steps: 200Hz / 4 = 50Hz
# simulation settings
self.sim.dt = 0.005
self.sim.disable_contact_processing = True
# Main
self.sim.dt = 0.005 # sim step every 5ms: 200Hz
def main():
"""Main function."""
# parse the arguments
env_cfg = CartpoleEnvCfg()
env_cfg.scene.num_envs = args_cli.num_envs
# setup base environment
env = BaseEnv(cfg=CartpoleEnvCfg())
obs = env.reset()
target_position = torch.zeros(env.num_envs, 1, device=env.device)
env = BaseEnv(cfg=env_cfg)
# simulate physics
count = 0
while simulation_app.is_running():
# reset
if count % 300 == 0:
env.reset()
count = 0
# step env
obs, _ = env.step(target_position)
# print current orientation of pole
print(obs["policy"][0][1].item())
# update counter
count += 1
with torch.inference_mode():
# reset
if count % 300 == 0:
count = 0
env.reset()
print("-" * 80)
print("[INFO]: Resetting environment...")
# sample random actions
joint_efforts = torch.randn_like(env.action_manager.action)
# step the environment
obs, _ = env.step(joint_efforts)
# print current orientation of pole
print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
# update counter
count += 1
# close the environment
env.close()
if __name__ == "__main__":
......
# Copyright (c) 2022-2023, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""
This script demonstrates how to call the RL environment for the cartpole balancing task.
"""
from __future__ import annotations
"""Launch Isaac Sim Simulator first."""
import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="This script demonstrates the RL environment for cartpole balancing.")
parser.add_argument("--num_envs", type=int, default=16, help="Number of environments to spawn.")
# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()
# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app
"""Rest everything follows."""
import torch
import traceback
import carb
from omni.isaac.orbit.envs import RLTaskEnv
from omni.isaac.orbit_tasks.classic.cartpole.cartpole_env_cfg import CartpoleEnvCfg
def main():
"""Main function."""
# parse the arguments
env_cfg = CartpoleEnvCfg()
env_cfg.scene.num_envs = args_cli.num_envs
# setup RL environment
env = RLTaskEnv(cfg=env_cfg)
# simulate physics
count = 0
while simulation_app.is_running():
with torch.inference_mode():
# reset
if count % 300 == 0:
count = 0
env.reset()
print("-" * 80)
print("[INFO]: Resetting environment...")
# sample random actions
joint_efforts = torch.randn_like(env.action_manager.action)
# step the environment
obs, rew, terminated, truncated, info = env.step(joint_efforts)
# print current orientation of pole
print("[Env 0]: Pole joint: ", obs["policy"][0][1].item())
# update counter
count += 1
# close the environment
env.close()
if __name__ == "__main__":
try:
# run the main execution
main()
except Exception as err:
carb.log_error(err)
carb.log_error(traceback.format_exc())
raise
finally:
# close sim app
simulation_app.close()
......@@ -4,8 +4,17 @@
# SPDX-License-Identifier: BSD-3-Clause
"""
This script demonstrates the base environment concept that combines a scene with an action,
observation and randomization manager for a floating cube.
This script creates a simple environment with a floating cube. The cube is controlled by a PD
controller to track an arbitrary target position.
While going through this tutorial, we recommend paying attention to how a custom action term
is defined. The action term is responsible for processing the raw actions and applying them to the
scene entities. The rest of the environment is similar to the previous tutorials.
.. code-block:: bash
# Run the script
./orbit.sh -p source/standalone/tutorials/04_envs/floating_cube.py --num_envs 32
"""
from __future__ import annotations
......@@ -18,7 +27,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
parser = argparse.ArgumentParser(description="This script demonstrates base environment with a floating cube.")
parser.add_argument("--num_envs", type=int, default=64, help="Number of environments to spawn.")
# append AppLauncher cli args
......@@ -31,6 +40,7 @@ app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app
"""Rest everything follows."""
import torch
import traceback
......@@ -40,59 +50,41 @@ import omni.isaac.orbit.envs.mdp as mdp
import omni.isaac.orbit.sim as sim_utils
from omni.isaac.orbit.assets import AssetBaseCfg, RigidObject, RigidObjectCfg
from omni.isaac.orbit.envs import BaseEnv, BaseEnvCfg
from omni.isaac.orbit.managers import ActionTerm, ActionTermCfg
from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
from omni.isaac.orbit.managers import RandomizationTermCfg as RandTerm
from omni.isaac.orbit.managers import SceneEntityCfg
from omni.isaac.orbit.managers.action_manager import ActionTerm, ActionTermCfg
from omni.isaac.orbit.scene import InteractiveSceneCfg
from omni.isaac.orbit.terrains import TerrainImporterCfg
from omni.isaac.orbit.utils import configclass
##
# Scene definition
# Custom action term
##
@configclass
class MySceneCfg(InteractiveSceneCfg):
"""Example scene configuration."""
# add terrain
terrain = TerrainImporterCfg(prim_path="/World/ground", terrain_type="plane", debug_vis=False)
# add cube
cube: RigidObjectCfg = RigidObjectCfg(
prim_path="{ENV_REGEX_NS}/cube",
spawn=sim_utils.CuboidCfg(
size=(0.2, 0.2, 0.2),
rigid_props=sim_utils.RigidBodyPropertiesCfg(max_depenetration_velocity=1.0),
mass_props=sim_utils.MassPropertiesCfg(mass=1.0),
physics_material=sim_utils.RigidBodyMaterialCfg(),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.5, 0.0, 0.0)),
),
init_state=RigidObjectCfg.InitialStateCfg(pos=(0.0, 0.0, 5)),
)
# lights
light = AssetBaseCfg(
prim_path="/World/light",
spawn=sim_utils.DistantLightCfg(color=(0.75, 0.75, 0.75), intensity=3000.0),
)
class CubeActionTerm(ActionTerm):
"""Simple action term that implements a PD controller to track a target position.
##
# Action Term
##
The action term is applied to the cube asset. It involves two steps:
1. **Process the raw actions**: Typically, this includes any transformations of the raw actions
that are required to map them to the desired space. This is called once per environment step.
2. **Apply the processed actions**: This step applies the processed actions to the asset.
It is called once per simulation step.
class CubeActionTerm(ActionTerm):
"""Simple action term that implements a PD controller to track a target position."""
In this case, the action term simply applies the raw actions to the cube asset. The raw actions
are the desired target positions of the cube in the environment frame. The pre-processing step
simply copies the raw actions to the processed actions as no additional processing is required.
The processed actions are then applied to the cube asset by implementing a PD controller to
track the target position.
"""
_asset: RigidObject
"""The rigid object asset on which the action term is applied."""
def __init__(self, cfg: ActionTermCfg, env: BaseEnv):
def __init__(self, cfg: CubeActionTermCfg, env: BaseEnv):
# call super constructor
super().__init__(cfg, env)
# create buffers
......@@ -100,8 +92,8 @@ class CubeActionTerm(ActionTerm):
self._processed_actions = torch.zeros(env.num_envs, 3, device=self.device)
self._vel_command = torch.zeros(self.num_envs, 6, device=self.device)
# gains of controller
self.p_gain = 5.0
self.d_gain = 0.5
self.p_gain = cfg.p_gain
self.d_gain = cfg.d_gain
"""
Properties.
......@@ -113,7 +105,6 @@ class CubeActionTerm(ActionTerm):
@property
def raw_actions(self) -> torch.Tensor:
# desired: (x, y, z)
return self._raw_actions
@property
......@@ -144,10 +135,16 @@ class CubeActionTermCfg(ActionTermCfg):
"""Configuration for the cube action term."""
class_type: type = CubeActionTerm
"""The class corresponding to the action term."""
p_gain: float = 5.0
"""Proportional gain of the PD controller."""
d_gain: float = 0.5
"""Derivative gain of the PD controller."""
##
# Observation Term
# Custom observation term
##
......@@ -158,6 +155,41 @@ def base_position(env: BaseEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
return asset.data.root_pos_w - env.scene.env_origins
##
# Scene definition
##
@configclass
class MySceneCfg(InteractiveSceneCfg):
"""Example scene configuration.
The scene comprises a ground plane, a light source and floating cubes (gravity disabled).
"""
# add terrain
terrain = TerrainImporterCfg(prim_path="/World/ground", terrain_type="plane", debug_vis=False)
# add cube
cube: RigidObjectCfg = RigidObjectCfg(
prim_path="{ENV_REGEX_NS}/cube",
spawn=sim_utils.CuboidCfg(
size=(0.2, 0.2, 0.2),
rigid_props=sim_utils.RigidBodyPropertiesCfg(max_depenetration_velocity=1.0, disable_gravity=True),
mass_props=sim_utils.MassPropertiesCfg(mass=1.0),
physics_material=sim_utils.RigidBodyMaterialCfg(),
visual_material=sim_utils.PreviewSurfaceCfg(diffuse_color=(0.5, 0.0, 0.0)),
),
init_state=RigidObjectCfg.InitialStateCfg(pos=(0.0, 0.0, 5)),
)
# lights
light = AssetBaseCfg(
prim_path="/World/light",
spawn=sim_utils.DistantLightCfg(color=(0.75, 0.75, 0.75), intensity=3000.0),
)
##
# Environment settings
##
......@@ -218,7 +250,7 @@ class CubeEnvCfg(BaseEnvCfg):
"""Configuration for the floating cube environment."""
# Scene settings
scene: MySceneCfg = MySceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5, replicate_physics=True)
scene: MySceneCfg = MySceneCfg(num_envs=args_cli.num_envs, env_spacing=2.5)
# Basic settings
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
......@@ -247,13 +279,15 @@ def main():
# simulate physics
count = 0
obs, _ = env.reset()
while simulation_app.is_running():
with torch.inference_mode():
# reset
if count % 300 == 0:
env.reset()
count = 0
obs, _ = env.reset()
print("-" * 80)
print("[INFO]: Resetting environment...")
# step env
obs, _ = env.step(target_position)
# print mean squared position error between target and current position
......@@ -262,6 +296,9 @@ def main():
# update counter
count += 1
# close the environment
env.close()
if __name__ == "__main__":
try:
......
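
Note on the two-step action term described in the `CubeActionTerm` docstring: the body of `apply_actions` is elided in this diff, so the following is only a dependency-free sketch of how such a term could behave. It assumes that the PD output is written to the cube as a velocity command (suggested by the `_vel_command` buffer), that the derivative part damps the current velocity, and that processing happens once per environment step while applying happens once per simulation step. `PDTargetTracker`, `track`, and the `decimation` and `sim_dt` values are hypothetical; only the gain defaults come from `CubeActionTermCfg`.

```python
from dataclasses import dataclass


@dataclass
class PDTargetTracker:
    """Hypothetical stand-in for the cube action term (not the Orbit class)."""

    p_gain: float = 5.0  # default taken from CubeActionTermCfg in the diff
    d_gain: float = 0.5  # default taken from CubeActionTermCfg in the diff

    def __post_init__(self):
        self.processed = [0.0, 0.0, 0.0]

    def process_actions(self, raw_actions):
        # once per environment step: the raw target position is copied unchanged
        self.processed = list(raw_actions)

    def apply_actions(self, pos, vel):
        # once per simulation step: proportional term on the position error,
        # derivative term damping the current velocity
        return [
            self.p_gain * (t - p) - self.d_gain * v
            for t, p, v in zip(self.processed, pos, vel)
        ]


def track(target, decimation=2, sim_dt=0.01, env_steps=300):
    """Integrate a velocity-controlled point starting at the cube's initial position."""
    term = PDTargetTracker()
    pos, vel = [0.0, 0.0, 5.0], [0.0, 0.0, 0.0]
    for _ in range(env_steps):
        term.process_actions(target)
        for _ in range(decimation):
            # assumption: the command is applied directly as the new velocity
            vel = term.apply_actions(pos, vel)
            pos = [p + sim_dt * v for p, v in zip(pos, vel)]
    return pos


target = [1.0, -1.0, 4.0]
final_pos = track(target)
error = max(abs(t - p) for t, p in zip(target, final_pos))
print(f"max abs tracking error after 300 env steps: {error:.2e}")
```

Moving the gains into the config, as this change does, means a sketch like this could be re-run with different `p_gain` and `d_gain` values without touching the term itself.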
......@@ -4,11 +4,17 @@
# SPDX-License-Identifier: BSD-3-Clause
"""
This script demonstrates the environment concept that combines a scene with an action,
observation and randomization manager for a quadruped robot.
This script demonstrates the environment for a quadruped robot with a height-scan sensor.
In this example, we use a locomotion policy to control the robot. The robot is commanded to
move forward at a constant velocity. The height-scan sensor is used to detect the height of
the terrain.
.. code-block:: bash
# Run the script
./orbit.sh -p source/standalone/tutorials/04_envs/quadruped_base_env.py --num_envs 32
A locomotion policy is loaded and used to control the robot. This shows how to use the
environment with a policy.
"""
from __future__ import annotations
......@@ -21,7 +27,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="This script demonstrates how to use the concept of an Environment.")
parser = argparse.ArgumentParser(description="This script demonstrates a quadrupedal locomotion environment.")
parser.add_argument("--num_envs", type=int, default=64, help="Number of environments to spawn.")
# append AppLauncher cli args
......@@ -34,6 +40,7 @@ app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app
"""Rest everything follows."""
import os
import torch
import traceback
......@@ -62,6 +69,16 @@ from omni.isaac.orbit.utils.noise import AdditiveUniformNoiseCfg as Unoise
from omni.isaac.orbit.terrains.config.rough import ROUGH_TERRAINS_CFG # isort: skip
##
# Custom observation terms
##
def constant_commands(env: BaseEnv) -> torch.Tensor:
"""The generated command from the command generator."""
return torch.tensor([[1, 0, 0]], device=env.device).repeat(env.num_envs, 1)
##
# Scene definition
##
......@@ -112,11 +129,6 @@ class MySceneCfg(InteractiveSceneCfg):
##
def constant_commands(env: BaseEnv) -> torch.Tensor:
"""The generated command from the command generator."""
return torch.tensor([[1, 0, 0]], device=env.device).repeat(env.num_envs, 1)
@configclass
class ActionsCfg:
"""Action specifications for the MDP."""
......@@ -162,21 +174,7 @@ class ObservationsCfg:
class RandomizationCfg:
"""Configuration for randomization."""
reset_base = RandTerm(
func=mdp.reset_root_state_uniform,
mode="reset",
params={
"pose_range": {"x": (-0.5, 0.5), "y": (-0.5, 0.5), "yaw": (-3.14, 3.14)},
"velocity_range": {
"x": (-0.5, 0.5),
"y": (-0.5, 0.5),
"z": (-0.5, 0.5),
"roll": (-0.5, 0.5),
"pitch": (-0.5, 0.5),
"yaw": (-0.5, 0.5),
},
},
)
reset_scene = RandTerm(func=mdp.reset_scene_to_default, mode="reset")
##
......@@ -198,22 +196,21 @@ class QuadrupedEnvCfg(BaseEnvCfg):
def __post_init__(self):
"""Post initialization."""
# general settings
self.decimation = 4
self.episode_length_s = 20.0
self.decimation = 4 # env decimation -> 50 Hz control
# simulation settings
self.sim.dt = 0.005
self.sim.dt = 0.005 # simulation timestep -> 200 Hz physics
self.sim.physics_material = self.scene.terrain.physics_material
# update sensor update periods
# we tick all the sensors based on the smallest update period (physics update period)
if self.scene.height_scanner is not None:
self.scene.height_scanner.update_period = self.decimation * self.sim.dt
self.scene.height_scanner.update_period = self.decimation * self.sim.dt # 50 Hz
def main():
"""Main function."""
# setup base environment
env = BaseEnv(cfg=QuadrupedEnvCfg())
obs, _ = env.reset()
env_cfg = QuadrupedEnvCfg()
env = BaseEnv(cfg=env_cfg)
# load level policy
policy_path = os.path.join(ISAAC_ORBIT_NUCLEUS_DIR, "Policies", "ANYmal-C", "policy.pt")
......@@ -221,27 +218,29 @@ def main():
if not check_file_path(policy_path):
raise FileNotFoundError(f"Policy file '{policy_path}' does not exist.")
# jit load the policy
locomotion_policy = torch.jit.load(policy_path)
locomotion_policy.to(env.device)
locomotion_policy.eval()
policy = torch.jit.load(policy_path).to(env.device).eval()
# simulate physics
count = 0
obs, _ = env.reset()
while simulation_app.is_running():
with torch.inference_mode():
# reset
if count % 1000 == 0:
obs, _ = env.reset()
count = 0
print("[INFO]: Resetting robots state...")
print("-" * 80)
print("[INFO]: Resetting environment...")
# infer action
action = locomotion_policy(obs["policy"])
action = policy(obs["policy"])
# step env
obs, _ = env.step(action)
# update counter
count += 1
# close the environment
env.close()
if __name__ == "__main__":
try:
......
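
Note on the timing comments added in `QuadrupedEnvCfg.__post_init__`: the rates quoted there follow from the two settings shown in the diff. The short calculation below reproduces them, assuming that one call to `env.step` advances the simulation by `decimation` physics steps. The variable names are illustrative only.

```python
sim_dt = 0.005      # self.sim.dt in QuadrupedEnvCfg
decimation = 4      # self.decimation in QuadrupedEnvCfg
reset_every = 1000  # from `count % 1000 == 0` in main()

# physics runs once per simulation timestep
physics_hz = 1.0 / sim_dt
# the policy acts once every `decimation` physics steps
control_dt = decimation * sim_dt
control_hz = 1.0 / control_dt
# the height scanner is ticked at the control rate
height_scanner_period = control_dt
# simulated time between the periodic resets in main()
seconds_between_resets = reset_every * control_dt

print(f"physics: {physics_hz:.0f} Hz, control: {control_hz:.0f} Hz")
print(f"height scanner period: {height_scanner_period:.3f} s")
print(f"reset interval: {seconds_between_resets:.1f} s")
```

Under that assumption, the reset interval in `main()` matches the 20-second `episode_length_s` value that appears in this hunk.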
......@@ -45,6 +45,7 @@ else:
# launch omniverse app
app_launcher = AppLauncher(args_cli, experience=app_experience)
simulation_app = app_launcher.app
"""Rest everything follows."""
......@@ -93,7 +94,7 @@ def main():
n_timesteps = agent_cfg.pop("n_timesteps")
# create isaac environment
env = gym.make(args_cli.task, cfg=env_cfg)
env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
# wrap for video recording
if args_cli.video:
video_kwargs = {
......
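
Note on the `render_mode` change: the environment is now created with `render_mode="rgb_array"` only when video recording is requested. A video-recording wrapper captures frames through the environment's render call, so it needs a mode that returns image arrays. The sketch below isolates the selection logic. The `--video` flag definition and the `select_render_mode` helper are assumptions for illustration, since the diff only shows `args_cli.video` being read.

```python
import argparse

parser = argparse.ArgumentParser(description="Render-mode selection sketch.")
# assumption: the training script defines a boolean --video flag like this one
parser.add_argument("--video", action="store_true", default=False, help="Record videos during training.")


def select_render_mode(args: argparse.Namespace):
    """Return the render mode passed to gym.make in the diff above."""
    return "rgb_array" if args.video else None


with_video = select_render_mode(parser.parse_args(["--video"]))
without_video = select_render_mode(parser.parse_args([]))
print("with --video:", with_video)
print("without --video:", without_video)
```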