Commit 3ef7e678 (unverified) authored by Mayank Mittal, committed by GitHub

Cleans up the `omni.isaac.lab.envs` submodule (#548)

# Description

Earlier, it was unclear where the configuration classes and
corresponding classes belonged inside the `omni.isaac.lab.envs` module.
This MR reorganizes the module to ensure parity between the class and
its respective configuration class. The MR also fixes docstrings with
the hope of making things cleaner.

## Type of change

- Bug fix (non-breaking change which fixes an issue)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- This change requires a documentation update

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have run all the tests with `./isaaclab.sh --test` and they pass
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent 59493b89
......@@ -38,7 +38,7 @@ repos:
- id: pyupgrade
args: ["--py310-plus"]
# FIXME: This is a hack because Pytorch does not like: torch.Tensor | dict aliasing
exclude: "source/extensions/omni.isaac.lab/omni/isaac/lab/envs/types.py"
exclude: "source/extensions/omni.isaac.lab/omni/isaac/lab/envs/common.py"
- repo: https://github.com/codespell-project/codespell
rev: v2.2.6
hooks:
......
......@@ -69,7 +69,7 @@ Table of Contents
:maxdepth: 2
:caption: Features
source/features/workflows
source/features/task_workflows
source/features/multi_gpu
source/features/tiled_rendering
source/features/environments
......
......@@ -16,11 +16,11 @@
ManagerBasedEnv
ManagerBasedEnvCfg
ViewerCfg
ManagerBasedRLEnv
ManagerBasedRLEnvCfg
DirectRLEnv
DirectRLEnvCfg
ViewerCfg
Manager Based Environment
-------------------------
......@@ -32,10 +32,6 @@ Manager Based Environment
:members:
:exclude-members: __init__, class_type
.. autoclass:: ViewerCfg
:members:
:exclude-members: __init__
Manager Based RL Environment
----------------------------
......@@ -63,3 +59,10 @@ Direct RL Environment
:inherited-members:
:show-inheritance:
:exclude-members: __init__, class_type
Common
------
.. autoclass:: ViewerCfg
:members:
:exclude-members: __init__
.. _feature-workflows:
Task Design Workflows
=====================
.. currentmodule:: omni.isaac.lab
Environments define the interface between the agent and the simulation. In the simplest case, the environment provides
the agent with the current observations and executes the actions provided by the agent. In a Markov Decision Process
(MDP) formulation, the environment can also provide additional information such as the current reward, done flag, and
information about the current episode.
While the environment interface is simple to understand, its implementation can vary significantly depending on the
complexity of the task. In the context of reinforcement learning (RL), the environment implementation can be broken down
into several components, such as the reward function, observation function, termination function, and reset function.
Each of these components can be implemented in different ways depending on the complexity of the task and the desired
level of modularity.
We provide two different workflows for designing environments with the framework:
* **Manager-based**: The environment is decomposed into individual components (or managers) that handle different
aspects of the environment (such as computing observations, applying actions, and applying randomization). The
user defines configuration classes for each component and the environment is responsible for coordinating the
managers and calling their functions.
* **Direct**: The user defines a single class that implements the entire environment directly without the need for
separate managers. This class is responsible for computing observations, applying actions, and computing rewards.
Both workflows have their own advantages and disadvantages. The manager-based workflow is more modular and allows
different components of the environment to be swapped out easily. This is useful when prototyping the environment
and experimenting with different configurations. On the other hand, the direct workflow can be more efficient and allows
for more fine-grained control over the environment logic. This is useful when optimizing the environment for performance
or when implementing complex logic that is difficult to decompose into separate components.
Manager-Based Environments
--------------------------
A majority of environment implementations follow a similar structure. The environment processes the input actions,
steps through the simulation, computes observations and reward signals, applies randomization, and resets the terminated
environments. Motivated by this, the environment can be decomposed into individual components that handle each of these tasks.
For example, the observation manager is responsible for computing the observations, the reward manager is responsible for
computing the rewards, and the termination manager is responsible for computing the termination signal. This approach
is known as the manager-based environment design in the framework.
Manager-based environments promote modular implementations of tasks by decomposing the task into individual
components that are managed by separate classes. Each component of the task, such as rewards, observations,
and terminations, can be specified as an individual configuration class that is then passed to the corresponding
manager class. The manager is then responsible for parsing the configuration and processing the contents specified
in it.
The coordination between the different managers is orchestrated by the class :class:`envs.ManagerBasedRLEnv`.
It takes in a task configuration class instance (:class:`envs.ManagerBasedRLEnvCfg`) that contains the configurations
for each of the components of the task. Based on the configurations, the scene is set up and the task is initialized.
Afterwards, while stepping through the environment, all the managers are called sequentially to perform the necessary
operations.
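To make the sequential coordination concrete, the following is a minimal plain-Python sketch, not the implementation of :class:`envs.ManagerBasedRLEnv`. The ``ToyManager`` and ``ToyManagerBasedEnv`` classes and the manager order used in ``step`` are hypothetical; they only illustrate that each manager owns one concern while the environment calls the managers one after another.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyManager:
    """Hypothetical stand-in for one manager: it owns a single concern."""

    name: str
    func: Callable[[dict], object]

    def compute(self, state: dict) -> object:
        return self.func(state)


@dataclass
class ToyManagerBasedEnv:
    """Hypothetical coordinator that calls its managers one after another."""

    managers: list[ToyManager]
    state: dict = field(default_factory=dict)
    call_log: list[str] = field(default_factory=list)

    def step(self, action: float) -> dict:
        self.state["action"] = action
        results = {}
        # Managers are invoked sequentially in a fixed order; each one reads
        # the shared state and contributes only its own result.
        for manager in self.managers:
            results[manager.name] = manager.compute(self.state)
            self.call_log.append(manager.name)
        return results


env = ToyManagerBasedEnv(
    managers=[
        ToyManager("action", lambda s: 2.0 * s["action"]),  # scale the raw action
        ToyManager("termination", lambda s: abs(s["action"]) > 1.0),
        ToyManager("reward", lambda s: 1.0 - abs(s["action"])),
        ToyManager("observation", lambda s: [s["action"]]),
    ]
)
out = env.step(0.25)
print(out)
```

Replacing or reordering an entry in ``managers`` changes the behavior without touching ``ToyManagerBasedEnv`` itself, which is the modularity the manager-based design aims for.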
For a new task, we expect users to mainly define the task configuration class and reuse the existing
:class:`envs.ManagerBasedRLEnv` class for the task implementation. The task configuration class should inherit from
the base class :class:`envs.ManagerBasedRLEnvCfg` and contain variables assigned to the configuration classes
for each component (such as the ``ObservationCfg`` and ``RewardCfg``).
.. dropdown:: Example for defining the reward function for the Cartpole task using the manager-style
:icon: plus
The following class is a part of the Cartpole environment configuration class. The :class:`RewardsCfg` class
defines individual terms that compose the reward function. Each reward term is defined by its function
implementation, weight and additional parameters to be passed to the function. Users can define multiple
reward terms and their weights to be used in the reward function.
.. literalinclude:: ../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/classic/cartpole/cartpole_env_cfg.py
:language: python
:pyobject: RewardsCfg
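The same idea can be expressed without any framework code. This is a minimal sketch under stated assumptions: ``RewTermSketch``, ``RewardsCfgSketch``, ``compute_reward`` and the two term functions are hypothetical stand-ins for ``RewTerm``, ``RewardsCfg`` and the ``mdp`` functions, and combining the terms as a sum of ``weight * func(state, **params)`` is an assumption made for illustration.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Callable


@dataclass
class RewTermSketch:
    """Hypothetical reward term: a function, its weight and extra keyword arguments."""

    func: Callable[..., float]
    weight: float
    params: dict[str, Any] = field(default_factory=dict)


def is_alive(state: dict) -> float:
    """Hypothetical term: 1.0 while the episode is running, 0.0 once it terminates."""
    return 0.0 if state["terminated"] else 1.0


def joint_pos_target_l2(state: dict, target: float) -> float:
    """Hypothetical term: squared distance of the pole angle from a target."""
    return (state["pole_pos"] - target) ** 2


@dataclass
class RewardsCfgSketch:
    """Each field is one term; replacing a field swaps only that component."""

    alive: RewTermSketch = field(default_factory=lambda: RewTermSketch(is_alive, 1.0))
    pole_pos: RewTermSketch = field(
        default_factory=lambda: RewTermSketch(joint_pos_target_l2, -1.0, {"target": 0.0})
    )


def compute_reward(cfg: RewardsCfgSketch, state: dict) -> float:
    """Sum weight * func(state, **params) over every term declared in the config."""
    total = 0.0
    for f in fields(cfg):
        term = getattr(cfg, f.name)
        total += term.weight * term.func(state, **term.params)
    return total


state = {"terminated": False, "pole_pos": 0.5}
default_reward = compute_reward(RewardsCfgSketch(), state)  # 1.0 - 0.25 = 0.75

# Swap one component (a stronger pole penalty) while leaving the rest untouched.
stricter = RewardsCfgSketch(pole_pos=RewTermSketch(joint_pos_target_l2, -2.0, {"target": 0.0}))
stricter_reward = compute_reward(stricter, state)  # 1.0 - 0.5 = 0.5
```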
Through this approach, it is possible to easily vary the implementation of the task by switching some components
while leaving the rest of the code intact. This flexibility is desirable when prototyping the environment and
experimenting with different configurations. It also makes it easy to collaborate with others on implementing an
environment, since contributors may choose to use different combinations of configurations for their own task
specifications.
.. seealso::
We provide a more detailed tutorial for setting up an environment using the manager-based workflow at
:ref:`tutorial-create-manager-rl-env`.
Direct Environments
-------------------
The direct-style environment aligns more closely with traditional implementations of environments,
where a single script directly implements the reward function, observation function, resets, and all the other components
of the environment. This approach does not require the manager classes. Instead, users have complete freedom
to implement their task through the APIs of the base class :class:`envs.DirectRLEnv`. For users migrating from the `IsaacGymEnvs`_
and `OmniIsaacGymEnvs`_ frameworks, this workflow may be more familiar.
When defining an environment with the direct-style implementation, we expect the user to define a single class that
implements the entire environment. The task class should inherit from the base :class:`envs.DirectRLEnv` class and should
have a corresponding configuration class that inherits from :class:`envs.DirectRLEnvCfg`. The task class is responsible
for setting up the scene, processing the actions, and computing the rewards, observations, resets, and termination signals.
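For comparison, here is a minimal plain-Python sketch of the direct style, where one class holds the action processing, termination, reward and reset logic. Only the name ``_get_rewards`` appears in the Cartpole example; ``ToyDirectEnv``, its other methods, its one-dimensional dynamics and the order of calls in ``step`` are hypothetical and do not describe the :class:`envs.DirectRLEnv` API.

```python
class ToyDirectEnv:
    """Hypothetical direct-style task: all task logic lives in this one class."""

    def __init__(self, num_envs: int, max_pos: float = 1.0):
        self.num_envs = num_envs
        self.max_pos = max_pos
        self.pos = [0.0] * num_envs  # one scalar state per sub-environment

    def _apply_action(self, actions: list[float]) -> None:
        # Toy dynamics: each action directly displaces its environment's state.
        self.pos = [p + a for p, a in zip(self.pos, actions)]

    def _get_dones(self) -> list[bool]:
        # An environment terminates once its state leaves [-max_pos, max_pos].
        return [abs(p) > self.max_pos for p in self.pos]

    def _get_rewards(self, dones: list[bool]) -> list[float]:
        # Penalize termination, otherwise reward staying close to zero.
        return [-1.0 if d else 1.0 - p * p for p, d in zip(self.pos, dones)]

    def _reset_idx(self, env_ids: list[int]) -> None:
        for i in env_ids:
            self.pos[i] = 0.0

    def step(self, actions: list[float]) -> tuple[list[float], list[float], list[bool]]:
        self._apply_action(actions)
        dones = self._get_dones()
        rewards = self._get_rewards(dones)  # computed before terminated envs are reset
        self._reset_idx([i for i, d in enumerate(dones) if d])
        return list(self.pos), rewards, dones


env = ToyDirectEnv(num_envs=2)
obs, rewards, dones = env.step([0.5, 1.5])
```

Because every piece of logic is a plain method on one class, cached state such as ``self.pos`` can be reused across methods without going through any manager.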
.. dropdown:: Example for defining the reward function for the Cartpole task using the direct-style
:icon: plus
The following function is a part of the Cartpole environment class and is responsible for computing the rewards.
.. literalinclude:: ../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/direct/cartpole/cartpole_env.py
:language: python
:pyobject: CartpoleEnv._get_rewards
:dedent: 4
It calls the :func:`compute_rewards` function, which is compiled with TorchScript (``torch.jit.script``) for performance.
.. literalinclude:: ../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/direct/cartpole/cartpole_env.py
:language: python
:pyobject: compute_rewards
This approach provides more transparency in the implementations of the environments, as logic is defined within the task
class instead of abstracted with the use of managers. This may be beneficial when implementing complex logic that is
difficult to decompose into separate components. Additionally, the direct-style implementation may bring performance
benefits for the environment, as it allows implementing large chunks of logic with optimized frameworks such as
`PyTorch JIT`_ or `Warp`_. This may be valuable when scaling up training significantly, which requires optimizing
individual operations in the environment.
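As a reference for what the Cartpole reward computes, the following plain-Python sketch evaluates the five reward terms (alive bonus, termination penalty, pole position, cart velocity and pole angular velocity) one environment at a time, using the weights from the manager-based ``RewardsCfg`` example (1.0, -2.0, -1.0, -0.01, -0.005). It is an illustration only: the function name ``cartpole_rewards`` is hypothetical, plain lists stand in for tensors, and the cart position argument is omitted because none of the terms in the ``compute_rewards`` version reproduced in this document uses it.

```python
def cartpole_rewards(
    rew_scale_alive: float,
    rew_scale_terminated: float,
    rew_scale_pole_pos: float,
    rew_scale_cart_vel: float,
    rew_scale_pole_vel: float,
    pole_pos: list[float],
    pole_vel: list[float],
    cart_vel: list[float],
    reset_terminated: list[bool],
) -> list[float]:
    """Per-environment reward; every input list has one entry per environment."""
    rewards = []
    for p, pv, cv, done in zip(pole_pos, pole_vel, cart_vel, reset_terminated):
        term = 1.0 if done else 0.0
        rewards.append(
            rew_scale_alive * (1.0 - term)  # running reward while alive
            + rew_scale_terminated * term  # failure penalty
            + rew_scale_pole_pos * p * p  # keep the pole upright
            + rew_scale_cart_vel * abs(cv)  # damp cart velocity
            + rew_scale_pole_vel * abs(pv)  # damp pole angular velocity
        )
    return rewards


r = cartpole_rewards(
    1.0, -2.0, -1.0, -0.01, -0.005,
    pole_pos=[0.5, 0.0, 0.0],
    pole_vel=[0.0, 0.0, -2.0],
    cart_vel=[0.0, 0.0, 1.0],
    reset_terminated=[False, True, False],
)
```

Each environment's reward depends only on that environment's entries, which is the behavior a batched implementation has to preserve.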
.. seealso::
We provide a more detailed tutorial for setting up an RL environment using the direct workflow at
:ref:`tutorial-create-direct-rl-env`.
.. _IsaacGymEnvs: https://github.com/isaac-sim/IsaacGymEnvs
.. _OmniIsaacGymEnvs: https://github.com/isaac-sim/OmniIsaacGymEnvs
.. _PyTorch JIT: https://pytorch.org/docs/stable/jit.html
.. _Warp: https://github.com/NVIDIA/warp
.. _feature-workflows:
Task Design Workflows
=====================
.. currentmodule:: omni.isaac.lab
Reinforcement learning environments can be implemented using two different workflows: Manager-based and Direct.
This page outlines the two workflows, explaining their benefits and use cases.
In addition, multi-GPU and multi-node reinforcement learning support is explained, along with the tiled rendering API,
which can be used for efficient vectorized rendering across environments.
Manager-Based Environments
--------------------------
Manager-based environments promote modular implementations of reinforcement learning tasks
through the use of Managers. Each component of the task, such as rewards, observations, and terminations,
can be specified as individual configuration classes that are then passed to the corresponding
manager classes. Each manager is responsible for parsing the configurations and processing
the contents specified in each config class. The manager implementations are taken care of by
the base class :class:`envs.ManagerBasedRLEnv`.
With this approach, it is simple to switch implementations of some components in the task
while leaving the rest of the code intact. This is desirable when collaborating with others
on implementing a reinforcement learning environment, where contributors may choose to use
different combinations of configurations for the reinforcement learning components of the task.
A class definition of a manager-based environment consists of defining a task configuration class that
inherits from :class:`envs.ManagerBasedRLEnvCfg`. This class should contain variables assigned to various
configuration classes for each of the components of the RL task, such as the ``ObservationCfg``
or ``RewardCfg``. The entry point of the environment becomes the base class :class:`envs.ManagerBasedRLEnv`,
which will process the main task config and iterate through the individual configuration classes that are defined
in the task config class.
An example of implementing the reward function for the Cartpole task using the manager-based implementation is as follows:

.. code-block:: python

   import omni.isaac.lab_tasks.manager_based.classic.cartpole.mdp as mdp
   from omni.isaac.lab.managers import RewardTermCfg as RewTerm
   from omni.isaac.lab.managers import SceneEntityCfg
   from omni.isaac.lab.utils import configclass


   @configclass
   class RewardsCfg:
       """Reward terms for the MDP."""

       # (1) Constant running reward
       alive = RewTerm(func=mdp.is_alive, weight=1.0)
       # (2) Failure penalty
       terminating = RewTerm(func=mdp.is_terminated, weight=-2.0)
       # (3) Primary task: keep pole upright
       pole_pos = RewTerm(
           func=mdp.joint_pos_target_l2,
           weight=-1.0,
           params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"]), "target": 0.0},
       )
       # (4) Shaping tasks: lower cart velocity
       cart_vel = RewTerm(
           func=mdp.joint_vel_l1,
           weight=-0.01,
           params={"asset_cfg": SceneEntityCfg("robot", joint_names=["slider_to_cart"])},
       )
       # (5) Shaping tasks: lower pole angular velocity
       pole_vel = RewTerm(
           func=mdp.joint_vel_l1,
           weight=-0.005,
           params={"asset_cfg": SceneEntityCfg("robot", joint_names=["cart_to_pole"])},
       )

.. seealso::
We provide a more detailed tutorial for setting up an RL environment using the manager-based workflow at
`Creating a manager-based RL Environment <../tutorials/03_envs/create_rl_env.html>`_.
Direct Environments
-------------------
The direct-style environment more closely aligns with traditional implementations of reinforcement learning environments,
where a single script implements the reward function, observation function, resets, and all other components
of the environment. This approach does not use the Manager classes. Instead, users are free
to implement the APIs from the base class :class:`envs.DirectRLEnv`. For users migrating from the IsaacGymEnvs
or OmniIsaacGymEnvs frameworks, this workflow will feel closer to those previous frameworks.
When defining an environment following the direct-style implementation, a task configuration class inheriting from
:class:`envs.DirectRLEnvCfg` is used for defining task environment configuration variables, such as the number
of observations and actions. Adding configuration classes for the managers is not required, and such classes will not be processed
by the base class. In addition to the configuration class, the logic of the task should be defined in a new
task class that inherits from the base class :class:`envs.DirectRLEnv`. This class will then implement the main
task logic, including setting up the scene, processing the actions, and computing resets, rewards, and observations.
This approach may bring more performance benefits for the environment, as it allows implementing large chunks
of logic with optimized frameworks such as `PyTorch Jit <https://pytorch.org/docs/stable/jit.html>`_ or
`Warp <https://github.com/NVIDIA/warp>`_. This may be important when scaling up training for large and complex
environments. Additionally, data may be cached in class variables and reused in multiple APIs for the class.
This method provides more transparency in the implementations of the environments, as logic is defined
within the task class instead of being abstracted through the use of Managers.
An example of implementing the reward function for the Cartpole task using the Direct-style implementation is as follows:

.. code-block:: python

   import torch


   # method of the direct-style task class; self is the environment instance
   def _get_rewards(self) -> torch.Tensor:
       total_reward = compute_rewards(
           self.cfg.rew_scale_alive,
           self.cfg.rew_scale_terminated,
           self.cfg.rew_scale_pole_pos,
           self.cfg.rew_scale_cart_vel,
           self.cfg.rew_scale_pole_vel,
           self.joint_pos[:, self._pole_dof_idx[0]],
           self.joint_vel[:, self._pole_dof_idx[0]],
           self.joint_pos[:, self._cart_dof_idx[0]],
           self.joint_vel[:, self._cart_dof_idx[0]],
           self.reset_terminated,
       )
       return total_reward


   @torch.jit.script
   def compute_rewards(
       rew_scale_alive: float,
       rew_scale_terminated: float,
       rew_scale_pole_pos: float,
       rew_scale_cart_vel: float,
       rew_scale_pole_vel: float,
       pole_pos: torch.Tensor,
       pole_vel: torch.Tensor,
       cart_pos: torch.Tensor,
       cart_vel: torch.Tensor,
       reset_terminated: torch.Tensor,
   ):
       rew_alive = rew_scale_alive * (1.0 - reset_terminated.float())
       rew_termination = rew_scale_terminated * reset_terminated.float()
       # the joint tensors have shape (num_envs,); unsqueeze them so that the sum
       # reduces per environment instead of across all environments
       rew_pole_pos = rew_scale_pole_pos * torch.sum(torch.square(pole_pos).unsqueeze(dim=1), dim=-1)
       rew_cart_vel = rew_scale_cart_vel * torch.sum(torch.abs(cart_vel).unsqueeze(dim=1), dim=-1)
       rew_pole_vel = rew_scale_pole_vel * torch.sum(torch.abs(pole_vel).unsqueeze(dim=1), dim=-1)
       total_reward = rew_alive + rew_termination + rew_pole_pos + rew_cart_vel + rew_pole_vel
       return total_reward

.. seealso::
We provide a more detailed tutorial for setting up an RL environment using the direct workflow at
`Creating a Direct Workflow RL Environment <../tutorials/03_envs/create_direct_rl_env.html>`_.
......@@ -77,7 +77,7 @@ The camera's pose and image resolution can be configured through the
.. dropdown:: Default parameters of the ViewerCfg class:
:icon: code
.. literalinclude:: ../../../source/extensions/omni.isaac.lab/omni/isaac/lab/envs/base_env_cfg.py
.. literalinclude:: ../../../source/extensions/omni.isaac.lab/omni/isaac/lab/envs/common.py
:language: python
:pyobject: ViewerCfg
......
.. _tutorial-create-oige-rl-env:
.. _tutorial-create-direct-rl-env:
Creating a Direct Workflow RL Environment
......@@ -103,7 +103,7 @@ Defining Rewards
Reward function should be defined in the ``_get_rewards(self)`` API, which returns the reward
buffer as a return value. Within this function, the task is free to implement the logic of
the reward function. In this example, we implement a Pytorch jitted function that computes
the reward function. In this example, we implement a Pytorch JIT function that computes
the various components of the reward function.
.. code-block:: python
......
.. _tutorial-create-base-env:
.. _tutorial-create-manager-base-env:
Creating a Manager-Based Base Environment
......
.. _tutorial-create-rl-env:
.. _tutorial-create-manager-rl-env:
Creating a Manager-Based RL Environment
......@@ -6,7 +6,7 @@ Creating a Manager-Based RL Environment
.. currentmodule:: omni.isaac.lab
Having learnt how to create a base environment in :ref:`tutorial-create-base-env`, we will now look at how to create a manager-based
Having learnt how to create a base environment in :ref:`tutorial-create-manager-base-env`, we will now look at how to create a manager-based
task environment for reinforcement learning.
The base environment is designed as a sense-act environment where the agent can send commands to the environment
......@@ -56,7 +56,7 @@ The script for running the environment ``run_cartpole_rl_env.py`` is present in
The Code Explained
~~~~~~~~~~~~~~~~~~
We already went through parts of the above in the :ref:`tutorial-create-base-env` tutorial to learn
We already went through parts of the above in the :ref:`tutorial-create-manager-base-env` tutorial to learn
about how to specify the scene, observations, actions and events. Thus, in this tutorial, we
will focus only on the RL components of the environment.
......@@ -144,7 +144,7 @@ Tying it all together
---------------------
With all the above components defined, we can now create the :class:`ManagerBasedRLEnvCfg` configuration for the
cartpole environment. This is similar to the :class:`ManagerBasedEnvCfg` defined in :ref:`tutorial-create-base-env`,
cartpole environment. This is similar to the :class:`ManagerBasedEnvCfg` defined in :ref:`tutorial-create-manager-base-env`,
only with the added RL components explained in the above sections.
.. literalinclude:: ../../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/classic/cartpole/cartpole_env_cfg.py
......
......@@ -10,8 +10,8 @@ different aspects of the framework to create a simulation environment for agent
:maxdepth: 1
:titlesonly:
create_base_env
create_rl_env
create_manager_base_env
create_manager_rl_env
create_direct_rl_env
register_rl_env_gym
run_rl_training
......@@ -63,7 +63,7 @@ in the environment name, the entry point to the environment class, and the entry
environment configuration class.
.. note::
The ``gymnasium`` registry is a global registry. Hence, it is important to ensure that the
The :mod:`gymnasium` registry is a global registry. Hence, it is important to ensure that the
environment names are unique. Otherwise, the registry will throw an error when registering
the environment.
......@@ -76,7 +76,7 @@ call for the cartpole environment in the ``omni.isaac.lab_tasks.manager_based.cl
.. literalinclude:: ../../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/manager_based/classic/cartpole/__init__.py
:language: python
:lines: 10-
:emphasize-lines: 11, 12, 15
:emphasize-lines: 4, 11, 12, 15
The ``id`` argument is the name of the environment. As a convention, we name all the environments
with the prefix ``Isaac-`` to make it easier to search for them in the registry. The name of the
......@@ -96,54 +96,22 @@ configuration is loaded using the :meth:`omni.isaac.lab_tasks.utils.parse_env_cf
It is then passed to the :meth:`gymnasium.make` function to create the environment instance.
The configuration entry point can be either a YAML file or a Python configuration class.
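Entry points in these registrations are strings of the form ``<module>:<class>`` (for example, ``omni.isaac.lab_tasks.direct.cartpole:CartpoleEnv``). The sketch below shows one way such a string can be turned into the object it names with :mod:`importlib`. It is not gymnasium's own loader, and ``load_entry_point`` is a hypothetical helper; standard-library targets are used so the example runs without Isaac Lab installed.

```python
import importlib


def load_entry_point(entry_point: str):
    """Resolve a "<module>:<attribute>" string to the object it names."""
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected '<module>:<attribute>', got {entry_point!r}")
    module = importlib.import_module(module_name)  # import the module part
    return getattr(module, attr_name)  # then look up the attribute part


# A task registration would point at something like
# "omni.isaac.lab_tasks.direct.cartpole:CartpoleEnv" instead.
cls = load_entry_point("collections:OrderedDict")
print(cls.__name__)  # OrderedDict
```

Deferring the import to a string keeps registration cheap: the module holding the environment class is only imported when an instance is actually created.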
Direct Environemtns
Direct Environments
^^^^^^^^^^^^^^^^^^^
For direct-based environments, the following shows the registration call for the cartpole environment
in the ``omni.isaac.lab_tasks.direct.cartpole`` sub-package:
For direct-based environments, the environment registration follows a similar pattern. Instead of
registering the environment's entry point as the :class:`~omni.isaac.lab.envs.ManagerBasedRLEnv` class,
we register the environment's entry point as the implementation class of the environment.
Additionally, we add the suffix ``-Direct`` to the environment name to differentiate it from the
manager-based environments.
.. code-block:: python
As an example, the following shows the registration call for the cartpole environment in the
``omni.isaac.lab_tasks.direct.cartpole`` sub-package:
   import gymnasium as gym

   from . import agents
   from .cartpole_env import CartpoleEnv, CartpoleEnvCfg

   ##
   # Register Gym environments.
   ##

   gym.register(
       id="Isaac-Cartpole-Direct-v0",
       entry_point="omni.isaac.lab_tasks.direct.cartpole:CartpoleEnv",
       disable_env_checker=True,
       kwargs={
           "env_cfg_entry_point": CartpoleEnvCfg,
           "rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
           "rsl_rl_cfg_entry_point": agents.rsl_rl_ppo_cfg.CartpolePPORunnerCfg,
           "skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
           "sb3_cfg_entry_point": f"{agents.__name__}:sb3_ppo_cfg.yaml",
       },
   )
The ``id`` argument is the name of the environment. As a convention, we name all the environments
with the prefix ``Isaac-`` to make it easier to search for them in the registry.
For direct environments, we also add the suffix ``-Direct``. The name of the
environment is typically followed by the name of the task, and then the name of the robot.
For instance, for legged locomotion with ANYmal C on flat terrain, the environment is called
``Isaac-Velocity-Flat-Anymal-C-Direct-v0``. The version number ``v<N>`` is typically used to specify different
variations of the same environment. Otherwise, the names of the environments can become too long
and difficult to read.
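The naming convention above can be captured in a small helper. This is a hypothetical illustration, not a utility provided by the framework: ``make_env_id`` simply joins the ``Isaac-`` prefix, the task and robot names, the optional ``-Direct`` suffix and the ``v<N>`` version.

```python
def make_env_id(task: str, robot: str = "", direct: bool = False, version: int = 0) -> str:
    """Build a name following Isaac-<task>[-<robot>][-Direct]-v<N>."""
    parts = ["Isaac", task]
    if robot:
        parts.append(robot)
    if direct:
        parts.append("Direct")  # distinguishes direct from manager-based variants
    return "-".join(parts) + f"-v{version}"


print(make_env_id("Velocity-Flat", "Anymal-C", direct=True))  # Isaac-Velocity-Flat-Anymal-C-Direct-v0
print(make_env_id("Cartpole", direct=True))  # Isaac-Cartpole-Direct-v0
```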
The ``entry_point`` argument is the entry point to the environment class. The entry point is a string
of the form ``<module>:<class>``. In the case of the cartpole environment, the entry point is
``omni.isaac.lab_tasks.direct.cartpole:CartpoleEnv``. The entry point is used to import the environment class
when creating the environment instance.
The ``env_cfg_entry_point`` argument specifies the default configuration for the environment. The default
configuration is loaded using the :meth:`omni.isaac.lab_tasks.utils.parse_env_cfg` function.
It is then passed to the :meth:`gymnasium.make` function to create the environment instance.
The configuration entry point can be either a YAML file or a Python configuration class.
.. literalinclude:: ../../../../source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/direct/cartpole/__init__.py
:language: python
:lines: 10-31
:emphasize-lines: 5, 12, 13, 16
Creating the environment
......@@ -181,7 +149,7 @@ Now that we have gone through the code, let's run the script and see the result:
./isaaclab.sh -p source/standalone/environments/random_agent.py --task Isaac-Cartpole-v0 --num_envs 32
This should open a stage with everything similar to the previous :ref:`tutorial-create-rl-env` tutorial.
This should open a stage with everything similar to the :ref:`tutorial-create-manager-rl-env` tutorial.
To stop the simulation, you can either close the window, or press ``Ctrl+C`` in the terminal.
You can also change the simulation device from GPU to CPU by adding the ``--cpu`` flag:
......
[package]
# Note: Semantic Versioning is used: https://semver.org/
version = "0.18.0"
version = "0.18.1"
# Description
title = "Isaac Lab framework for Robot Learning"
......
Changelog
---------
0.18.1 (2024-06-25)
~~~~~~~~~~~~~~~~~~~
Changed
^^^^^^^
* Ensured that the parity between each class and its configuration class is explicitly visible in the :mod:`omni.isaac.lab.envs`
module. This makes it easier to follow where definitions are located and how they are related. This should not be
a breaking change as the classes are still accessible through the same module.
0.18.0 (2024-06-13)
~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~~~~~~~
Fixed
^^^^^
......@@ -23,8 +34,8 @@ Changed
Fixed
^^^^^
* Fixed the orientation reset logic in :func:`omni.isaac.lab.envs.mdp.events.reset_root_state_uniform` to make it relative to the default orientation.
Earlier, the position was sampled relative to the default and the orientation not.
* Fixed the orientation reset logic in :func:`omni.isaac.lab.envs.mdp.events.reset_root_state_uniform` to make it relative to
the default orientation. Earlier, the position was sampled relative to the default but the orientation was not.
0.17.12 (2024-06-13)
......
......@@ -11,28 +11,35 @@ observations and executes the actions provided by the agent. However, the
environment can also provide additional information such as the current
reward, done flag, and information about the current episode.
Based on these, there are two types of environments:
* :class:`ManagerBasedEnv`: The manager-based workflow base environment which
only provides the agent with the
current observations and executes the actions provided by the agent.
* :class:`ManagerBasedRLEnv`: The manager-based workflow RL task environment which
besides the functionality of
the base environment also provides additional Markov Decision Process (MDP)
related information such as the current reward, done flag, and information.
There are two types of environment design workflows:
* **Manager-based**: The environment is decomposed into individual components (or managers)
for different aspects (such as computing observations, applying actions, and applying
randomization). The users mainly configure the managers and the environment coordinates the
managers and calls their functions.
* **Direct**: The user implements all the necessary functionality directly in a single class
without the need for additional managers.
In addition, RL task environments can use the direct workflow implementation:
Based on these workflows, there are the following environment classes:
* :class:`ManagerBasedEnv`: The manager-based workflow base environment which only provides the
agent with the current observations and executes the actions provided by the agent.
* :class:`ManagerBasedRLEnv`: The manager-based workflow RL task environment which, besides the
functionality of the base environment, also provides additional Markov Decision Process (MDP)
related information such as the current reward, done flag, and episode information.
* :class:`DirectRLEnv`: The direct workflow RL task environment which provides implementations for
implementing scene setup, computing dones, performing resets, and computing reward and observation.
* :class:`DirectRLEnv`: The direct workflow RL task environment which provides the interface
for implementing scene setup, computing dones, performing resets, and computing
rewards and observations.
For more information about the workflow design patterns, see the `Task Design Workflows`_ section.
.. _`Task Design Workflows`: https://isaac-sim.github.io/IsaacLab/source/features/task_workflows.html
"""
from . import mdp, ui
from .base_env_cfg import ManagerBasedEnvCfg, ViewerCfg
from .common import VecEnvObs, VecEnvStepReturn, ViewerCfg
from .direct_rl_env import DirectRLEnv
from .direct_rl_env_cfg import DirectRLEnvCfg
from .manager_based_env import ManagerBasedEnv
from .manager_based_env_cfg import ManagerBasedEnvCfg
from .manager_based_rl_env import ManagerBasedRLEnv
from .rl_env_cfg import DirectRLEnvCfg, ManagerBasedRLEnvCfg
from .types import VecEnvObs, VecEnvStepReturn
from .manager_based_rl_env_cfg import ManagerBasedRLEnvCfg
......@@ -6,7 +6,61 @@
from __future__ import annotations
import torch
from typing import Dict
from typing import Dict, Literal
from omni.isaac.lab.utils import configclass
##
# Configuration.
##
@configclass
class ViewerCfg:
    """Configuration of the scene viewport camera."""

    eye: tuple[float, float, float] = (7.5, 7.5, 7.5)
    """Initial camera position (in m). Default is (7.5, 7.5, 7.5)."""

    lookat: tuple[float, float, float] = (0.0, 0.0, 0.0)
    """Initial camera target position (in m). Default is (0.0, 0.0, 0.0)."""

    cam_prim_path: str = "/OmniverseKit_Persp"
    """The camera prim path to record images from. Default is "/OmniverseKit_Persp",
    which is the default camera in the viewport.
    """

    resolution: tuple[int, int] = (1280, 720)
    """The resolution (width, height) of the camera specified using :attr:`cam_prim_path`.
    Default is (1280, 720).
    """

    origin_type: Literal["world", "env", "asset_root"] = "world"
    """The frame in which the camera position (eye) and target (lookat) are defined. Default is "world".

    Available options are:

    * ``"world"``: The origin of the world.
    * ``"env"``: The origin of the environment defined by :attr:`env_index`.
    * ``"asset_root"``: The center of the asset defined by :attr:`asset_name` in environment :attr:`env_index`.
    """

    env_index: int = 0
    """The environment index for frame origin. Default is 0.

    This quantity is only effective if :attr:`origin_type` is set to "env" or "asset_root".
    """

    asset_name: str | None = None
    """The asset name in the interactive scene for the frame origin. Default is None.

    This quantity is only effective if :attr:`origin_type` is set to "asset_root".
    """
##
# Types.
##
VecEnvObs = Dict[str, torch.Tensor | Dict[str, torch.Tensor]]
"""Observation returned by the environment.
......@@ -31,7 +85,7 @@ Note:
"""
VecEnvStepReturn = tuple[VecEnvObs, torch.Tensor, torch.Tensor, torch.Tensor, Dict]
VecEnvStepReturn = tuple[VecEnvObs, torch.Tensor, torch.Tensor, torch.Tensor, dict]
"""The environment signals processed at the end of each step.
The tuple contains batched information for each sub-environment. The information is stored in the following order:
......
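The ``origin_type``, ``env_index`` and ``asset_name`` fields of ``ViewerCfg`` select the frame in which ``eye`` and ``lookat`` are expressed. The following plain-Python sketch shows one way to map them to world coordinates, assuming the frame origin only translates both points (no rotation). ``ViewerCfgSketch`` and ``world_camera_pose`` are hypothetical, and the viewport camera controller in the framework may resolve the frame differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

Vec3 = tuple[float, float, float]


@dataclass
class ViewerCfgSketch:
    """Hypothetical, trimmed copy of the ViewerCfg fields used for framing."""

    eye: Vec3 = (7.5, 7.5, 7.5)
    lookat: Vec3 = (0.0, 0.0, 0.0)
    origin_type: Literal["world", "env", "asset_root"] = "world"
    env_index: int = 0
    asset_name: str | None = None


def world_camera_pose(
    cfg: ViewerCfgSketch,
    env_origins: list[Vec3],
    asset_positions: dict[str, list[Vec3]],
) -> tuple[Vec3, Vec3]:
    """Offset eye and lookat by the selected frame origin (translation only, an assumption)."""
    if cfg.origin_type == "world":
        origin = (0.0, 0.0, 0.0)
    elif cfg.origin_type == "env":
        origin = env_origins[cfg.env_index]
    elif cfg.origin_type == "asset_root":
        if cfg.asset_name is None:
            raise ValueError("asset_name is required when origin_type is 'asset_root'")
        origin = asset_positions[cfg.asset_name][cfg.env_index]
    else:
        raise ValueError(f"Unknown origin_type: {cfg.origin_type!r}")
    eye = tuple(o + e for o, e in zip(origin, cfg.eye))
    lookat = tuple(o + t for o, t in zip(origin, cfg.lookat))
    return eye, lookat


# Frame the camera on the second environment, whose origin is 10 m along x.
eye, lookat = world_camera_pose(
    ViewerCfgSketch(origin_type="env", env_index=1),
    env_origins=[(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    asset_positions={},
)

# Frame the camera on a (hypothetical) asset named "robot" in environment 0.
eye2, lookat2 = world_camera_pose(
    ViewerCfgSketch(origin_type="asset_root", asset_name="robot"),
    env_origins=[(0.0, 0.0, 0.0)],
    asset_positions={"robot": [(1.0, 2.0, 0.5)]},
)
```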
......@@ -21,21 +21,21 @@ import omni.isaac.core.utils.torch as torch_utils
import omni.kit.app
from omni.isaac.version import get_version
from omni.isaac.lab.envs.types import VecEnvObs, VecEnvStepReturn
from omni.isaac.lab.managers import EventManager
from omni.isaac.lab.scene import InteractiveScene
from omni.isaac.lab.sim import SimulationContext
from omni.isaac.lab.utils.noise import NoiseModel
from omni.isaac.lab.utils.timer import Timer
from .rl_env_cfg import DirectRLEnvCfg
from .common import VecEnvObs, VecEnvStepReturn
from .direct_rl_env_cfg import DirectRLEnvCfg
from .ui import ViewportCameraController
class DirectRLEnv(gym.Env):
"""The superclass for the direct workflow reinforcement learning-based environments.
"""The superclass for the direct workflow to design environments.
This class implements the core functionality for reinforcement learning-based
This class implements the core functionality for reinforcement learning (RL)
environments. It is designed to be used with any RL library. The class is designed
to be used with vectorized environments, i.e., the environment is expected to be run
in parallel with multiple sub-environments.
......@@ -70,6 +70,8 @@ class DirectRLEnv(gym.Env):
Args:
cfg: The configuration object for the environment.
render_mode: The render mode for the environment. Defaults to None, which
is similar to ``"human"``.
Raises:
RuntimeError: If a simulation context already exists. The environment must always create one
......@@ -165,10 +167,11 @@ class DirectRLEnv(gym.Env):
self.reset_time_outs = torch.zeros_like(self.reset_terminated)
self.reset_buf = torch.zeros(self.num_envs, dtype=torch.bool, device=self.sim.device)
self.actions = torch.zeros(self.num_envs, self.cfg.num_actions, device=self.sim.device)
# setup the action and observation spaces for Gym
self._configure_gym_env_spaces()
# -- noise cfg for adding action and observation noise
# setup noise cfg for adding action and observation noise
if self.cfg.action_noise_model:
self._action_noise_model: NoiseModel = self.cfg.action_noise_model.class_type(
self.num_envs, self.cfg.action_noise_model, self.device
......@@ -177,6 +180,7 @@ class DirectRLEnv(gym.Env):
self._observation_noise_model: NoiseModel = self.cfg.observation_noise_model.class_type(
self.num_envs, self.cfg.observation_noise_model, self.device
)
# perform events at the start of the simulation
if self.cfg.events:
if "startup" in self.event_manager.available_modes:
......@@ -260,19 +264,27 @@ class DirectRLEnv(gym.Env):
def step(self, action: torch.Tensor) -> VecEnvStepReturn:
"""Execute one time-step of the environment's dynamics.
The environment steps forward at a fixed time-step, while the physics simulation is
decimated at a lower time-step. This is to ensure that the simulation is stable. These two
time-steps can be configured independently using the :attr:`DirectRLEnvCfg.decimation` (number of
simulation steps per environment step) and the :attr:`DirectRLEnvCfg.physics_dt` (physics time-step).
Based on these parameters, the environment time-step is computed as the product of the two.
The environment steps forward at a fixed time-step, while the physics simulation is decimated at a
lower time-step. This is to ensure that the simulation is stable. These two time-steps can be configured
independently using the :attr:`DirectRLEnvCfg.decimation` (number of simulation steps per environment step)
and the :attr:`DirectRLEnvCfg.sim.dt` (physics time-step). Based on these parameters, the environment
time-step is computed as the product of the two.
This function performs the following steps:
1. Pre-process the actions before stepping through the physics.
2. Apply the actions to the simulator and step through the physics in a decimated manner.
3. Compute the reward and done signals.
4. Reset environments that have terminated or reached the maximum episode length.
5. Apply interval events if they are enabled.
6. Compute observations.
Args:
action: The actions to apply on the environment. Shape is (num_envs, action_dim).
Returns:
A tuple containing the observations and extras.
A tuple containing the observations, rewards, resets (terminated and truncated) and extras.
"""
# add action noise
if self.cfg.action_noise_model:
action = self._action_noise_model.apply(action.clone())
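The decimation scheme in the ``step()`` docstring can be illustrated with a toy loop that only counts physics steps. This is a minimal sketch: ``run_env_steps`` is a hypothetical helper, and ``physics_dt`` stands for the physics time-step set in the simulation configuration.

```python
def run_env_steps(num_env_steps: int, decimation: int, physics_dt: float) -> tuple[int, float]:
    """Return (physics steps taken, simulated time) after num_env_steps environment steps."""
    physics_steps = 0
    sim_time = 0.0
    for _ in range(num_env_steps):
        # 1. the action is pre-processed once per environment step (not modeled here)
        for _ in range(decimation):
            # 2. the same action is applied and physics advances by physics_dt
            physics_steps += 1
            sim_time += physics_dt
        # 3.-6. rewards, dones, resets, events and observations (not modeled here)
    return physics_steps, sim_time
```

With ``decimation=4`` and ``physics_dt=0.005``, five environment steps take 20 physics steps and about 0.1 s of simulated time. That corresponds to an environment time-step of 4 × 0.005 = 0.02 s.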
......@@ -317,6 +329,7 @@ class DirectRLEnv(gym.Env):
self.obs_buf = self._get_observations()
# add observation noise
# note: we apply no noise to the state space (since it is used for critic networks)
if self.cfg.observation_noise_model:
self.obs_buf["policy"] = self._observation_noise_model.apply(self.obs_buf["policy"])
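The noise handling above perturbs only the ``"policy"`` group and leaves the critic state untouched. The action noise is applied to a copy (``action.clone()``) rather than to the caller's tensor. Below is a minimal sketch of the same idea, using Python lists in place of tensors. Additive Gaussian noise is an assumption here, and ``add_policy_noise`` is hypothetical; real behavior depends on the configured ``NoiseModelCfg``.

```python
import random


def add_policy_noise(obs: dict[str, list[float]], std: float, rng: random.Random) -> dict[str, list[float]]:
    """Return a new observation dict with Gaussian noise added to the "policy" group only."""
    noisy = dict(obs)  # shallow copy: other groups (e.g. "critic") are passed through as-is
    # build a new list so the caller's "policy" data is not modified
    noisy["policy"] = [x + rng.gauss(0.0, std) for x in obs["policy"]]
    return noisy
```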
......@@ -424,6 +437,10 @@ class DirectRLEnv(gym.Env):
# update closing status
self._is_closed = True
"""
Operations - Debug Visualization.
"""
def set_debug_vis(self, debug_vis: bool) -> bool:
"""Toggles the environment debug visualization.
......@@ -477,6 +494,7 @@ class DirectRLEnv(gym.Env):
self.observation_space = gym.vector.utils.batch_space(self.single_observation_space["policy"], self.num_envs)
self.action_space = gym.vector.utils.batch_space(self.single_action_space, self.num_envs)
# optional state space for asymmetric actor-critic architectures
if self.num_states > 0:
self.single_observation_space["critic"] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(self.num_states,))
self.state_space = gym.vector.utils.batch_space(self.single_observation_space["critic"], self.num_envs)
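The space setup above batches each per-instance space across ``num_envs`` with ``gym.vector.utils.batch_space``. It adds a ``"critic"`` entry only when ``num_states > 0``. The sketch below tracks shapes only. It assumes one-dimensional ``Box`` spaces whose batched form gains a leading ``num_envs`` axis. ``batched_space_shapes`` is a hypothetical helper, not part of Isaac Lab or Gymnasium.

```python
def batched_space_shapes(
    num_envs: int, num_observations: int, num_actions: int, num_states: int = 0
) -> dict[str, tuple[int, int]]:
    """Shapes of the batched observation, action and optional critic state spaces."""
    shapes = {
        "policy": (num_envs, num_observations),  # per-instance (num_observations,) gains a leading axis
        "action": (num_envs, num_actions),
    }
    if num_states > 0:
        # the state space only exists for asymmetric actor-critic setups
        shapes["critic"] = (num_envs, num_states)
    return shapes
```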
......@@ -488,49 +506,103 @@ class DirectRLEnv(gym.Env):
env_ids: List of environment ids which must be reset
"""
self.scene.reset(env_ids)
# apply events such as randomizations for environments that need a reset
# apply events such as randomization for environments that need a reset
if self.cfg.events:
if "reset" in self.event_manager.available_modes:
self.event_manager.apply(env_ids=env_ids, mode="reset")
# reset noise models
if self.cfg.action_noise_model:
self._action_noise_model.reset(env_ids)
if self.cfg.observation_noise_model:
self._observation_noise_model.reset(env_ids)
# reset the episode length buffer
self.episode_length_buf[env_ids] = 0
# this can be done through configs as well
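Resetting happens per environment. Only the ids passed to ``_reset_idx`` have their scene state, noise models and episode counters reset. Below is a minimal sketch of that bookkeeping. It assumes the ids are the environments whose termination or time-out flag is set, as in the ``step()`` docstring. ``reset_done_envs`` is a hypothetical helper, and lists stand in for tensors.

```python
def reset_done_envs(terminated: list[bool], time_outs: list[bool], episode_length_buf: list[int]) -> list[int]:
    """Zero the episode counters of finished environments and return their ids."""
    reset_buf = [t or o for t, o in zip(terminated, time_outs)]
    env_ids = [i for i, done in enumerate(reset_buf) if done]
    for i in env_ids:
        # scene reset, reset-mode events and noise-model resets would also happen here
        episode_length_buf[i] = 0
    return env_ids
```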
"""
Implementation-specific functions.
"""
def _setup_scene(self):
pass
"""Setup the scene for the environment.
def _set_debug_vis_impl(self, debug_vis: bool):
"""Set debug visualization into visualization objects.
This function is responsible for creating the scene objects and setting up the scene for the environment.
The scene creation can happen through :class:`omni.isaac.lab.scene.InteractiveSceneCfg` or through
directly creating the scene objects and registering them with the scene manager.
This function is responsible for creating the visualization objects if they don't exist
and input ``debug_vis`` is True. If the visualization objects exist, the function should
set their visibility into the stage.
We leave the implementation of this function to the derived classes. If the environment does not require
any explicit scene setup, the function can be left empty.
"""
raise NotImplementedError(f"Debug visualization is not implemented for {self.__class__.__name__}.")
pass
@abstractmethod
def _pre_physics_step(self, actions: torch.Tensor):
return NotImplementedError
"""Pre-process actions before stepping through the physics.
This function is responsible for pre-processing the actions before stepping through the physics.
It is called before the physics stepping (which is decimated).
Args:
actions: The actions to apply on the environment. Shape is (num_envs, action_dim).
"""
raise NotImplementedError(f"Please implement the '_pre_physics_step' method for {self.__class__.__name__}.")
@abstractmethod
def _apply_action(self):
return NotImplementedError
"""Apply actions to the simulator.
This function is responsible for applying the actions to the simulator. It is called at each
physics time-step.
"""
raise NotImplementedError(f"Please implement the '_apply_action' method for {self.__class__.__name__}.")
@abstractmethod
def _get_observations(self) -> VecEnvObs:
return NotImplementedError
"""Compute and return the observations for the environment.
Returns:
The observations for the environment.
"""
raise NotImplementedError(f"Please implement the '_get_observations' method for {self.__class__.__name__}.")
def _get_states(self) -> VecEnvObs | None:
return None
"""Compute and return the states for the environment.
The state-space is used for asymmetric actor-critic architectures. It is configured
using the :attr:`DirectRLEnvCfg.num_states` parameter.
Returns:
The states for the environment. If the environment does not have a state-space, the function
returns None.
"""
return None # noqa: R501
@abstractmethod
def _get_rewards(self) -> torch.Tensor:
return NotImplementedError
"""Compute and return the rewards for the environment.
Returns:
The rewards for the environment. Shape is (num_envs,).
"""
raise NotImplementedError(f"Please implement the '_get_rewards' method for {self.__class__.__name__}.")
@abstractmethod
def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
return NotImplementedError
"""Compute and return the done flags for the environment.
Returns:
A tuple containing the done flags for termination and time-out.
Shape of individual tensors is (num_envs,).
"""
raise NotImplementedError(f"Please implement the '_get_dones' method for {self.__class__.__name__}.")
def _set_debug_vis_impl(self, debug_vis: bool):
"""Set debug visualization into visualization objects.
This function is responsible for creating the visualization objects if they don't exist
and input ``debug_vis`` is True. If the visualization objects exist, the function should
set their visibility into the stage.
"""
raise NotImplementedError(f"Debug visualization is not implemented for {self.__class__.__name__}.")
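The hooks above form a template: the base class drives the stepping loop, and a task only fills in the abstract methods. Below is a simplified, self-contained sketch of that pattern. ``DirectEnvSketch`` and ``CounterEnv`` are hypothetical stand-ins written for illustration. They do not import Isaac Lab, and they omit resets, events and noise.

```python
from abc import ABC, abstractmethod


class DirectEnvSketch(ABC):
    """Hypothetical stand-in for the DirectRLEnv hook structure (not the real class)."""

    def __init__(self, num_envs: int, decimation: int):
        self.num_envs = num_envs
        self.decimation = decimation

    def step(self, actions):
        self._pre_physics_step(actions)  # once per environment step
        for _ in range(self.decimation):
            self._apply_action()  # once per physics step
        terminated, time_outs = self._get_dones()
        rewards = self._get_rewards()
        obs = self._get_observations()
        return obs, rewards, terminated, time_outs, {}

    @abstractmethod
    def _pre_physics_step(self, actions): ...

    @abstractmethod
    def _apply_action(self): ...

    @abstractmethod
    def _get_observations(self): ...

    @abstractmethod
    def _get_rewards(self): ...

    @abstractmethod
    def _get_dones(self): ...


class CounterEnv(DirectEnvSketch):
    """Toy task: each environment's state is a counter that accumulates the action."""

    def __init__(self, num_envs: int, decimation: int):
        super().__init__(num_envs, decimation)
        self.state = [0.0] * num_envs
        self.actions = [0.0] * num_envs
        self.apply_calls = 0

    def _pre_physics_step(self, actions):
        self.actions = [float(a) for a in actions]  # clipping or scaling would go here

    def _apply_action(self):
        self.apply_calls += 1
        self.state = [s + a for s, a in zip(self.state, self.actions)]

    def _get_observations(self):
        return {"policy": list(self.state)}

    def _get_rewards(self):
        return [-abs(s) for s in self.state]

    def _get_dones(self):
        terminated = [abs(s) > 10.0 for s in self.state]
        return terminated, [False] * self.num_envs
```

Consider a ``CounterEnv`` with ``decimation=4``. Calling ``step([1.0, 3.0])`` runs ``_apply_action`` four times, so the second counter reaches 12 and that environment is flagged as terminated.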
......@@ -8,88 +8,23 @@ from dataclasses import MISSING
from omni.isaac.lab.scene import InteractiveSceneCfg
from omni.isaac.lab.sim import SimulationCfg
from omni.isaac.lab.utils import configclass
from omni.isaac.lab.utils.noise.noise_cfg import NoiseModelCfg
from omni.isaac.lab.utils.noise import NoiseModelCfg
from .base_env_cfg import ManagerBasedEnvCfg, ViewerCfg
from .ui import BaseEnvWindow, ManagerBasedRLEnvWindow
from .common import ViewerCfg
from .ui import BaseEnvWindow
@configclass
class ManagerBasedRLEnvCfg(ManagerBasedEnvCfg):
"""Configuration for a reinforcement learning environment with the manager-based workflow."""
class DirectRLEnvCfg:
"""Configuration for an RL environment defined with the direct workflow.
# ui settings
ui_window_class_type: type | None = ManagerBasedRLEnvWindow
# general settings
is_finite_horizon: bool = False
"""Whether the learning task is treated as a finite or infinite horizon problem for the agent.
Defaults to False, which means the task is treated as an infinite horizon problem.
This flag handles the subtleties of finite and infinite horizon tasks:
* **Finite horizon**: no penalty or bootstrapping value is required by the agent for
running out of time. However, the environment still needs to terminate the episode after the
time limit is reached.
* **Infinite horizon**: the agent needs to bootstrap the value of the state at the end of the episode.
This is done by sending a time-limit (or truncated) done signal to the agent, which triggers this
bootstrapping calculation.
If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal
is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out
(or truncated) done signal is sent to the agent.
Note:
The base :class:`ManagerBasedRLEnv` class does not use this flag directly. It is used by the environment
wrappers to determine what type of done signal to send to the corresponding learning agent.
Please refer to the :class:`omni.isaac.lab.envs.direct_rl_env.DirectRLEnv` class for more details.
"""
episode_length_s: float = MISSING
"""Duration of an episode (in seconds).
Based on the decimation rate and physics time step, the episode length is calculated as:
.. code-block:: python
episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))
For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds,
then the episode length in steps is 100.
"""
# environment settings
rewards: object = MISSING
"""Reward settings.
Please refer to the :class:`omni.isaac.lab.managers.RewardManager` class for more details.
"""
terminations: object = MISSING
"""Termination settings.
Please refer to the :class:`omni.isaac.lab.managers.TerminationManager` class for more details.
"""
curriculum: object = MISSING
"""Curriculum settings.
Please refer to the :class:`omni.isaac.lab.managers.CurriculumManager` class for more details.
"""
commands: object = MISSING
"""Command settings.
Please refer to the :class:`omni.isaac.lab.managers.CommandManager` class for more details.
"""
@configclass
class DirectRLEnvCfg(ManagerBasedEnvCfg):
"""Configuration for a reinforcement learning environment with the direct workflow."""
# simulation settings
viewer: ViewerCfg = ViewerCfg()
"""Viewer configuration. Default is ViewerCfg()."""
sim: SimulationCfg = SimulationCfg()
"""Physics simulation configuration. Default is SimulationCfg()."""
......@@ -113,14 +48,6 @@ class DirectRLEnvCfg(ManagerBasedEnvCfg):
This means that the control action is updated every 10 simulation steps.
"""
# environment settings
scene: InteractiveSceneCfg = MISSING
"""Scene settings.
Please refer to the :class:`omni.isaac.lab.scene.InteractiveSceneCfg` class for more details.
"""
# general settings
is_finite_horizon: bool = False
"""Whether the learning task is treated as a finite or infinite horizon problem for the agent.
Defaults to False, which means the task is treated as an infinite horizon problem.
......@@ -156,29 +83,39 @@ class DirectRLEnvCfg(ManagerBasedEnvCfg):
then the episode length in steps is 100.
"""
# environment settings
scene: InteractiveSceneCfg = MISSING
"""Scene settings.
Please refer to the :class:`omni.isaac.lab.scene.InteractiveSceneCfg` class for more details.
"""
events: object = None
"""Event settings. Defaults to None, in which case no events are applied through the event manager.
Please refer to the :class:`omni.isaac.lab.managers.EventManager` class for more details.
"""
num_observations: int = MISSING
"""The size of the observation for each environment."""
"""The dimension of the observation space from each environment instance."""
num_states: int = 0
"""The size of the state-space for each environment. Default is 0.
"""The dimension of the state-space from each environment instance. Default is 0, which means no state-space is defined.
This is used for asymmetric actor-critic and defines the observation space for the critic.
This is useful for asymmetric actor-critic and defines the observation space for the critic.
"""
num_actions: int = MISSING
"""The size of the action space for each environment."""
observation_noise_model: NoiseModelCfg | None = None
"""The noise model to apply to the computed observations from the environment. Default is None, which means no noise is added.
events: object = None
"""Settings for specifying domain randomization terms during training.
Please refer to the :class:`omni.isaac.lab.managers.EventManager` class for more details.
Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details.
"""
num_actions: int = MISSING
"""The dimension of the action space for each environment."""
action_noise_model: NoiseModelCfg | None = None
"""Settings for adding noise to the action buffer.
Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details.
"""
"""The noise model applied to the actions provided to the environment. Default is None, which means no noise is added.
observation_noise_model: NoiseModelCfg | None = None
"""Settings for adding noise to the observation buffer.
Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details.
"""
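The ``is_finite_horizon`` docstring states that environment wrappers, not the environment itself, act on this flag. Below is a minimal sketch of how a wrapper might do so. It assumes time-outs are reported in an ``extras["time_outs"]`` entry only for infinite-horizon tasks. The helper name ``to_agent_signals`` and the ``"time_outs"`` key are hypothetical.

```python
def to_agent_signals(
    terminated: list[bool], time_outs: list[bool], is_finite_horizon: bool
) -> tuple[list[bool], dict]:
    """Combine done flags and decide whether the agent sees time-outs separately."""
    # the episode ends in both cases: the environment still resets on time-out
    dones = [t or o for t, o in zip(terminated, time_outs)]
    extras = {}
    if not is_finite_horizon:
        # infinite horizon: expose time-outs so the agent can bootstrap the value
        extras["time_outs"] = list(time_outs)
    return dones, extras
```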
......@@ -12,13 +12,13 @@ from typing import Any
import carb
import omni.isaac.core.utils.torch as torch_utils
from omni.isaac.lab.envs.types import VecEnvObs
from omni.isaac.lab.managers import ActionManager, EventManager, ObservationManager
from omni.isaac.lab.scene import InteractiveScene
from omni.isaac.lab.sim import SimulationContext
from omni.isaac.lab.utils.timer import Timer
from .base_env_cfg import ManagerBasedEnvCfg
from .common import VecEnvObs
from .manager_based_env_cfg import ManagerBasedEnvCfg
from .ui import ViewportCameraController
......@@ -251,7 +251,7 @@ class ManagerBasedEnv:
The environment steps forward at a fixed time-step, while the physics simulation is
decimated at a lower time-step. This is to ensure that the simulation is stable. These two
time-steps can be configured independently using the :attr:`ManagerBasedEnvCfg.decimation` (number of
simulation steps per environment step) and the :attr:`ManagerBasedEnvCfg.physics_dt` (physics time-step).
simulation steps per environment step) and the :attr:`ManagerBasedEnvCfg.sim.dt` (physics time-step).
Based on these parameters, the environment time-step is computed as the product of the two.
Args:
......
......@@ -10,7 +10,6 @@ configuring the environment instances, viewer settings, and simulation parameter
"""
from dataclasses import MISSING
from typing import Literal
import omni.isaac.lab.envs.mdp as mdp
from omni.isaac.lab.managers import EventTermCfg as EventTerm
......@@ -18,52 +17,10 @@ from omni.isaac.lab.scene import InteractiveSceneCfg
from omni.isaac.lab.sim import SimulationCfg
from omni.isaac.lab.utils import configclass
from .common import ViewerCfg
from .ui import BaseEnvWindow
@configclass
class ViewerCfg:
"""Configuration of the scene viewport camera."""
eye: tuple[float, float, float] = (7.5, 7.5, 7.5)
"""Initial camera position (in m). Default is (7.5, 7.5, 7.5)."""
lookat: tuple[float, float, float] = (0.0, 0.0, 0.0)
"""Initial camera target position (in m). Default is (0.0, 0.0, 0.0)."""
cam_prim_path: str = "/OmniverseKit_Persp"
"""The camera prim path to record images from. Default is "/OmniverseKit_Persp",
which is the default camera in the viewport.
"""
resolution: tuple[int, int] = (1280, 720)
"""The resolution (width, height) of the camera specified using :attr:`cam_prim_path`.
Default is (1280, 720).
"""
origin_type: Literal["world", "env", "asset_root"] = "world"
"""The frame in which the camera position (eye) and target (lookat) are defined. Default is "world".
Available options are:
* ``"world"``: The origin of the world.
* ``"env"``: The origin of the environment defined by :attr:`env_index`.
* ``"asset_root"``: The center of the asset defined by :attr:`asset_name` in environment :attr:`env_index`.
"""
env_index: int = 0
"""The environment index for frame origin. Default is 0.
This quantity is only effective if :attr:`origin_type` is set to "env" or "asset_root".
"""
asset_name: str | None = None
"""The asset name in the interactive scene for the frame origin. Default is None.
This quantity is only effective if :attr:`origin_type` is set to "asset_root".
"""
@configclass
class DefaultEventManagerCfg:
"""Configuration of the default event manager.
......@@ -82,6 +39,7 @@ class ManagerBasedEnvCfg:
# simulation settings
viewer: ViewerCfg = ViewerCfg()
"""Viewer configuration. Default is ViewerCfg()."""
sim: SimulationCfg = SimulationCfg()
"""Physics simulation configuration. Default is SimulationCfg()."""
......
......@@ -17,9 +17,9 @@ from omni.isaac.version import get_version
from omni.isaac.lab.managers import CommandManager, CurriculumManager, RewardManager, TerminationManager
from .common import VecEnvStepReturn
from .manager_based_env import ManagerBasedEnv
from .rl_env_cfg import ManagerBasedRLEnvCfg
from .types import VecEnvStepReturn
from .manager_based_rl_env_cfg import ManagerBasedRLEnvCfg
class ManagerBasedRLEnv(ManagerBasedEnv, gym.Env):
......@@ -84,9 +84,11 @@ class ManagerBasedRLEnv(ManagerBasedEnv, gym.Env):
# setup the action and observation spaces for Gym
self._configure_gym_env_spaces()
# perform events at the start of the simulation
if "startup" in self.event_manager.available_modes:
self.event_manager.apply(mode="startup")
# print the environment information
print("[INFO]: Completed setting up the environment...")
......
# Copyright (c) 2022-2024, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from dataclasses import MISSING
from omni.isaac.lab.utils import configclass
from .manager_based_env_cfg import ManagerBasedEnvCfg
from .ui import ManagerBasedRLEnvWindow
@configclass
class ManagerBasedRLEnvCfg(ManagerBasedEnvCfg):
"""Configuration for a reinforcement learning environment with the manager-based workflow."""
# ui settings
ui_window_class_type: type | None = ManagerBasedRLEnvWindow
# general settings
is_finite_horizon: bool = False
"""Whether the learning task is treated as a finite or infinite horizon problem for the agent.
Defaults to False, which means the task is treated as an infinite horizon problem.
This flag handles the subtleties of finite and infinite horizon tasks:
* **Finite horizon**: no penalty or bootstrapping value is required by the agent for
running out of time. However, the environment still needs to terminate the episode after the
time limit is reached.
* **Infinite horizon**: the agent needs to bootstrap the value of the state at the end of the episode.
This is done by sending a time-limit (or truncated) done signal to the agent, which triggers this
bootstrapping calculation.
If True, then the environment is treated as a finite horizon problem and no time-out (or truncated) done signal
is sent to the agent. If False, then the environment is treated as an infinite horizon problem and a time-out
(or truncated) done signal is sent to the agent.
Note:
The base :class:`ManagerBasedRLEnv` class does not use this flag directly. It is used by the environment
wrappers to determine what type of done signal to send to the corresponding learning agent.
"""
episode_length_s: float = MISSING
"""Duration of an episode (in seconds).
Based on the decimation rate and physics time step, the episode length is calculated as:
.. code-block:: python
episode_length_steps = ceil(episode_length_s / (decimation_rate * physics_time_step))
For example, if the decimation rate is 10, the physics time step is 0.01, and the episode length is 10 seconds,
then the episode length in steps is 100.
"""
# environment settings
rewards: object = MISSING
"""Reward settings.
Please refer to the :class:`omni.isaac.lab.managers.RewardManager` class for more details.
"""
terminations: object = MISSING
"""Termination settings.
Please refer to the :class:`omni.isaac.lab.managers.TerminationManager` class for more details.
"""
curriculum: object = MISSING
"""Curriculum settings.
Please refer to the :class:`omni.isaac.lab.managers.CurriculumManager` class for more details.
"""
commands: object = MISSING
"""Command settings.
Please refer to the :class:`omni.isaac.lab.managers.CommandManager` class for more details.
"""
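The episode-length formula in the ``episode_length_s`` docstring can be checked with a few lines of Python. This is a minimal sketch. ``episode_length_steps`` is a hypothetical helper, and it renames the docstring's ``decimation_rate`` and ``physics_time_step`` to ``decimation`` and ``physics_dt``.

```python
import math


def episode_length_steps(episode_length_s: float, decimation: int, physics_dt: float) -> int:
    """Number of environment steps in an episode, rounded up to a whole step."""
    step_dt = decimation * physics_dt  # duration of one environment step (s)
    return math.ceil(episode_length_s / step_dt)
```

``episode_length_steps(10.0, 10, 0.01)`` gives 100, matching the docstring example. Because of ``ceil``, a duration that is not an exact multiple of the environment step rounds up to the next whole step.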
......@@ -46,7 +46,7 @@ Added
Changed
^^^^^^^
* Set default device for RSL RL and SB3 configs to "cuda:0".
* Changed the default device for RSL RL and SB3 configs to "cuda:0".
0.7.3 (2024-05-21)
~~~~~~~~~~~~~~~~~~
......@@ -54,7 +54,7 @@ Changed
Added
^^^^^
* Introduce ``--max_iterations`` argument to training scripts for specifying number of training iterations.
* Introduced ``--max_iterations`` argument to training scripts for specifying number of training iterations.
0.7.2 (2024-05-13)
~~~~~~~~~~~~~~~~~~
......@@ -62,7 +62,8 @@ Added
Added
^^^^^
* Add Shadow Hand environments: ``Isaac-Shadow-Hand-Direct-v0``, ``Isaac-Shadow-Hand-OpenAI-FF-Direct-v0``, ``Isaac-Shadow-Hand-OpenAI-LSTM-Direct-v0``.
* Added Shadow Hand environments: ``Isaac-Shadow-Hand-Direct-v0``, ``Isaac-Shadow-Hand-OpenAI-FF-Direct-v0``,
and ``Isaac-Shadow-Hand-OpenAI-LSTM-Direct-v0``.
0.7.1 (2024-05-09)
......@@ -80,7 +81,9 @@ Added
Changed
^^^^^^^
* Renamed all references of ``BaseEnv``, ``RLTaskEnv``, and ``OIGEEnv`` to :class:`omni.isaac.lab.envs.ManagerBasedEnv`, :class:`omni.isaac.lab.envs.ManagerBasedRLEnv`, and :class:`omni.isaac.lab.envs.DirectRLEnv`.
* Renamed all references of ``BaseEnv``, ``RLTaskEnv``, and ``OIGEEnv`` to
:class:`omni.isaac.lab.envs.ManagerBasedEnv`, :class:`omni.isaac.lab.envs.ManagerBasedRLEnv`,
and :class:`omni.isaac.lab.envs.DirectRLEnv` respectively.
* Split environments into ``manager_based`` and ``direct`` folders.
Added
......