Commit cd2c4f1d authored by Mayank Mittal, committed by GitHub

Upgrades environments from Gym 0.21 to Gymnasium 0.29 (#234)

# Description

Currently, we downgrade many libraries to stay compatible with Gym
0.21.0. This causes issues when installing new Python packages, as
highlighted in #204, and it becomes a more significant problem with
Python 3.10 in Isaac Sim 2023.1.

This MR upgrades the repository to use the Gymnasium Environment class.
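
One visible consequence of the Gymnasium `Env` interface is that `step` returns five values (observations, rewards, terminated, truncated, info) instead of the four returned under Gym 0.21. The sketch below illustrates that change of layout in plain Python. The helper `to_gymnasium_step` and its `time_out_key` parameter are hypothetical names used only for illustration; they are not part of Orbit or Gymnasium, and plain lists stand in for batched tensors.

```python
# Hypothetical helper (not part of Orbit or Gymnasium): converts an old
# Gym 0.21-style step result into the Gymnasium five-value layout.
def to_gymnasium_step(obs, reward, done, info, time_out_key="time_outs"):
    """Split a per-environment `done` list into `terminated` and `truncated`."""
    info = dict(info)  # copy so the caller's dictionary is not mutated
    time_outs = info.pop(time_out_key, [False] * len(done))
    # an episode that ended with the time-out flag set is reported as truncated
    truncated = [bool(d and t) for d, t in zip(done, time_outs)]
    # every other ended episode is reported as having reached a terminal state
    # (assumption: the old format cannot express "both at once", so this sketch cannot either)
    terminated = [bool(d and not t) for d, t in zip(done, time_outs)]
    return obs, reward, terminated, truncated, info


# three sub-environments: the first terminated, the second timed out, the third is still running
obs, reward, terminated, truncated, info = to_gymnasium_step(
    obs=[0.1, 0.2, 0.3],
    reward=[1.0, 0.0, 0.5],
    done=[True, True, False],
    info={"time_outs": [False, True, False]},
)
```

In this MR the two signals are computed directly by the termination manager rather than reconstructed after the fact, so a helper like this would only matter for code that still produces the old four-value format.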

## Type of Change

- Bug fix (non-breaking change which fixes an issue)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./orbit.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------
Signed-off-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Co-authored-by: David Hoeller <dhoeller@ethz.ch>
parent e5b43e96
......@@ -4,7 +4,7 @@ omni.isaac.orbit_tasks.isaac_env
We use OpenAI Gym registry to register the environment and their default configuration file.
The default configuration file is passed to the argument "kwargs" in the Gym specification registry.
The string is parsed into respective configuration container which needs to be passed to the environment
class. This is done using the function :meth:`load_default_env_cfg` in the sub-module
class. This is done using the function :meth:`load_cfg_from_registry` in the sub-module
:mod:`omni.isaac.orbit.utils.parse_cfg`.
......@@ -17,12 +17,12 @@ class. This is done using the function :meth:`load_default_env_cfg` in the sub-m
.. code-block:: python
import gym
import gymnasium as gym
import omni.isaac.orbit_tasks
from omni.isaac.orbit_tasks.utils.parse_cfg import load_default_env_cfg
from omni.isaac.orbit_tasks.utils.parse_cfg import load_cfg_from_registry
task_name = "Isaac-Cartpole-v0"
cfg = load_default_env_cfg(task_name)
cfg = load_cfg_from_registry(task_name, "env_cfg_entry_point")
env = gym.make(task_name, cfg=cfg)
......
Known issues
============
Installation errors due to gym==0.21.0
--------------------------------------
When installing the gym package, you may encounter the following error:
.. code-block::
error in gym setup command: 'extras_require' must be a dictionary whose values are strings or lists of
strings containing valid project/version requirement specifiers.
----------------------------------------
ERROR: Could not find a version that satisfies the requirement gym==0.21.0 (from omni-isaac-orbit-envs[all])
(from versions: 0.0.2, 0.0.3, 0.0.4, 0.0.5, 0.0.6, 0.0.7, 0.1.0, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6,
...
0.15.7, 0.16.0, 0.17.0, 0.17.1, 0.17.2, 0.17.3, 0.18.0, 0.18.3, 0.19.0, 0.20.0, 0.21.0, 0.22.0, 0.23.0,
0.23.1, 0.24.0, 0.24.1, 0.25.0, 0.25.1, 0.25.2, 0.26.0, 0.26.1, 0.26.2)
ERROR: No matching distribution found for gym==0.21.0
This issue arises since the ``setuptools`` package from version 67.0 onwards does not support malformed version strings.
Since the OpenAI Gym package that is no longer being maintained (`issue link <https://github.com/openai/gym/issues/3200>`_),
the current workaround is to install the ``setuptools`` package version 66.0.0. You can do this by running the following
command:
.. code-block:: bash
./orbit.sh -p -m pip install -U setuptools==66
Regression in Isaac Sim 2022.2.1
--------------------------------
......
......@@ -157,7 +157,7 @@ utilities to manage extensions:
optional arguments:
-h, --help Display the help content.
-i, --install Install the extensions inside Isaac Orbit.
-i, --install Install the extensions inside Orbit.
-e, --extra Install extra dependencies such as the learning frameworks.
-f, --format Run pre-commit to format the code and check lints.
-p, --python Run the python executable (python.sh) provided by Isaac Sim.
......
......@@ -141,7 +141,7 @@ format.
.. code:: bash
# install python module (for robomimic)
./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[robomimic]'
./orbit.sh -e robomimic
# split data
./orbit.sh -p source/standalone//workflows/robomimic/tools/split_train_val.py logs/robomimic/Isaac-Lift-Franka-v0/hdf_dataset.hdf5 --ratio 0.2
......@@ -171,7 +171,7 @@ from the environments into the respective libraries function argument and return
.. code:: bash
# install python module (for stable-baselines3)
./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[sb3]'
./orbit.sh -e sb3
# run script for training
# note: we enable cpu flag since SB3 doesn't optimize for GPU anyway
./orbit.sh -p source/standalone/workflows/sb3/train.py --task Isaac-Cartpole-v0 --headless --cpu
......@@ -184,7 +184,7 @@ from the environments into the respective libraries function argument and return
.. code:: bash
# install python module (for skrl)
./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[skrl]'
./orbit.sh -e skrl
# run script for training
./orbit.sh -p source/standalone/workflows/skrl/train.py --task Isaac-Reach-Franka-v0 --headless
# run script for playing with 32 environments
......@@ -196,7 +196,7 @@ from the environments into the respective libraries function argument and return
.. code:: bash
# install python module (for rl-games)
./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[rl_games]'
./orbit.sh -e rl_games
# run script for training
./orbit.sh -p source/standalone/workflows/rl_games/train.py --task Isaac-Ant-v0 --headless
# run script for playing with 32 environments
......@@ -208,7 +208,7 @@ from the environments into the respective libraries function argument and return
.. code:: bash
# install python module (for rsl-rl)
./orbit.sh -p -m pip install -e 'source/extensions/omni.isaac.orbit_tasks[rsl_rl]'
./orbit.sh -e rsl_rl
# run script for training
./orbit.sh -p source/standalone/workflows/rsl_rl/train.py --task Isaac-Reach-Franka-v0 --headless
# run script for playing with 32 environments
......
......@@ -39,11 +39,12 @@ an environment by calling ``gym.make``. The environments are registered in the `
gym.register(
id="Isaac-Cartpole-v0",
entry_point="omni.isaac.orbit_tasks.classic.cartpole:CartpoleEnv",
kwargs={"cfg_entry_point": "omni.isaac.orbit_tasks.classic.cartpole:cartpole_cfg.yaml"},
disable_env_checker=True,
kwargs={"env_cfg_entry_point": "omni.isaac.orbit_tasks.classic.cartpole:cartpole_cfg.yaml"},
)
The ``cfg_entry_point`` argument is used to load the default configuration for the environment. The default
configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_cfg.load_default_env_cfg` function.
The ``env_cfg_entry_point`` argument is used to load the default configuration for the environment. The default
configuration is loaded using the :meth:`omni.isaac.orbit_tasks.utils.parse_cfg.load_cfg_from_registry` function.
The configuration entry point can correspond to both a YAML file or a python configuration
class. The default configuration can be overridden by passing a custom configuration instance to the ``gym.make``
function as shown later in the tutorial.
......
......@@ -26,13 +26,13 @@ For example, here is how you would wrap an environment to enforce that reset is
"""Rest everything follows."""
import gym
import gymnasium as gym
import omni.isaac.orbit_tasks # noqa: F401
from omni.isaac.orbit_tasks.utils import load_default_env_cfg
from omni.isaac.orbit_tasks.utils import load_cfg_from_registry
# create base environment
cfg = load_default_env_cfg("Isaac-Reach-Franka-v0")
cfg = load_cfg_from_registry("Isaac-Reach-Franka-v0", "env_cfg_entry_point")
env = gym.make("Isaac-Reach-Franka-v0", cfg=cfg)
# wrap environment to enforce that reset is called before step
env = gym.wrappers.OrderEnforcing(env)
......@@ -105,7 +105,7 @@ for 200 steps, and saves it in the ``videos`` folder at a step interval of 1500
"""Rest everything follows."""
import gym
import gymnasium as gym
# adjust camera resolution and pose
env_cfg.viewer.resolution = (640, 480)
......
......@@ -185,7 +185,7 @@ print_help () {
echo -e "\nusage: $(basename "$0") [-h] [-i] [-e] [-f] [-p] [-s] [-o] [-v] [-d] [-c] -- Utility to manage extensions in Orbit."
echo -e "\noptional arguments:"
echo -e "\t-h, --help Display the help content."
echo -e "\t-i, --install Install the extensions inside Isaac Orbit."
echo -e "\t-i, --install Install the extensions inside Orbit."
echo -e "\t-e, --extra Install extra dependencies such as the learning frameworks."
echo -e "\t-f, --format Run pre-commit to format the code and check lints."
echo -e "\t-p, --python Run the python executable (python.sh) provided by Isaac Sim."
......@@ -220,9 +220,6 @@ while [[ $# -gt 0 ]]; do
# this does not check dependencies between extensions
export -f extract_python_exe
export -f install_orbit_extension
# downgrade setuptools to avoid issues with OpenAI Gym
# Check the `Known Issues` section in the documentation
$(extract_python_exe) -m pip install --upgrade setuptools==66
# source directory
find -L "${ORBIT_PATH}/source/extensions" -mindepth 1 -maxdepth 1 -type d -exec bash -c 'install_orbit_extension "{}"' \;
# unset local variables
......@@ -235,8 +232,17 @@ while [[ $# -gt 0 ]]; do
# install the python packages for supported reinforcement learning frameworks
echo "[INFO] Installing extra requirements such as learning frameworks..."
python_exe=$(extract_python_exe)
# check if specified which rl-framework to install
if [ -z "$2" ]; then
echo "[INFO] Installing all rl-frameworks..."
framework_name="all"
else
echo "[INFO] Installing rl-framework: $2"
framework_name=$2
shift # past argument
fi
# install the rl-frameworks specified
${python_exe} -m pip install -e ${ORBIT_PATH}/source/extensions/omni.isaac.orbit_tasks[all]
${python_exe} -m pip install -e ${ORBIT_PATH}/source/extensions/omni.isaac.orbit_tasks["${framework_name}"]
shift # past argument
;;
-c|--conda)
......
......@@ -27,7 +27,7 @@ extra_standard_library = [
"tensordict",
"bpy",
"matplotlib",
"gym",
"gymnasium",
"scipy",
"hid",
"yaml",
......
......@@ -18,9 +18,12 @@ itself. However, its various instances should be included in directories within
The environments should then be registered in the `omni/isaac/contrib_tasks/__init__.py`:
```python
import gymnasium as gym
gym.register(
id="Isaac-Contrib-<my-awesome-env>-v0",
entry_point="omni.isaac.contrib_tasks.<your-env-package>:<your-env-class>",
disable_env_checker=True,
kwargs={"cfg_entry_point": "omni.isaac.contrib_tasks.<your-env-package-cfg>:<your-env-class-cfg>"},
)
```
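
The configuration entry point registered above is a string of the form `<module>:<attribute>`. As a rough illustration of how such a string can be turned into a Python object, here is a minimal sketch using only the standard library. The function `resolve_entry_point` is a hypothetical name and is not the implementation of `load_cfg_from_registry`; it also assumes the attribute is a Python object and does not handle entry points that name a YAML file.

```python
import importlib


def resolve_entry_point(entry_point: str):
    """Import ``module`` and return ``attribute`` for a "module:attribute" string."""
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected '<module>:<attribute>', got: {entry_point!r}")
    # import the module part, then look up the attribute part on it
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# stand-in target from the standard library, used only so the sketch runs anywhere
ordered_dict_cls = resolve_entry_point("collections:OrderedDict")
```

A configuration class resolved this way would typically be instantiated and then passed to `gym.make` through the `cfg` keyword argument, as in the usage shown elsewhere in this change.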
......@@ -9,7 +9,7 @@
We use OpenAI Gym registry to register the environment and their default configuration file.
The default configuration file is passed to the argument "kwargs" in the Gym specification registry.
The string is parsed into respective configuration container which needs to be passed to the environment
class. This is done using the function :meth:`load_default_env_cfg` in the sub-module
class. This is done using the function :meth:`load_cfg_from_registry` in the sub-module
:mod:`omni.isaac.orbit.utils.parse_cfg`.
Note:
......@@ -18,18 +18,18 @@ Note:
the kwarg argument :obj:`cfg` while creating the environment.
Usage:
>>> import gym
>>> import gymnasium as gym
>>> import omni.isaac.contrib_tasks
>>> from omni.isaac.orbit_tasks.utils.parse_cfg import load_default_env_cfg
>>> from omni.isaac.orbit_tasks.utils.parse_cfg import load_cfg_from_registry
>>>
>>> task_name = "Isaac-Contrib-<my-registered-env-name>-v0"
>>> cfg = load_default_env_cfg(task_name)
>>> cfg = load_cfg_from_registry(task_name, "env_cfg_entry_point")
>>> env = gym.make(task_name, cfg=cfg)
"""
from __future__ import annotations
import gym # noqa: F401
import gymnasium as gym # noqa: F401
import os
import toml
......
......@@ -28,6 +28,10 @@ setup(
include_package_data=True,
python_requires=">=3.7",
packages=["omni.isaac.contrib_tasks"],
classifiers=["Natural Language :: English", "Programming Language :: Python :: 3.7"],
classifiers=[
"Natural Language :: English",
"Programming Language :: Python :: 3.10",
"Isaac Sim :: 2023.1.0-hotfix.1",
],
zip_safe=False,
)
[package]
# Note: Semantic Versioning is used: https://semver.org/
version = "0.9.37"
version = "0.9.38"
# Description
title = "ORBIT framework for Robot Learning"
......
Changelog
---------
0.9.38 (2023-11-07)
~~~~~~~~~~~~~~~~~~~
Changed
^^^^^^^
* Upgraded the :class:`omni.isaac.orbit.envs.RLTaskEnv` class to support Gym 0.29.0 environment definition.
Added
^^^^^
* Added computation of ``time_outs`` and ``terminated`` signals inside the termination manager. These follow the
definition mentioned in `Gym 0.29.0 <https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/>`_.
* Added proper handling of observation and action spaces in the :class:`omni.isaac.orbit.envs.RLTaskEnv` class.
These now follow closely to how Gym VecEnv handles the spaces.
0.9.37 (2023-11-06)
~~~~~~~~~~~~~~~~~~~
......
......@@ -5,12 +5,14 @@
from __future__ import annotations
import gym
import gymnasium as gym
import math
import numpy as np
import torch
from typing import Any, ClassVar, Dict, Sequence, Tuple, Union
from omni.isaac.version import get_version
from omni.isaac.orbit.command_generators import CommandGeneratorBase
from omni.isaac.orbit.managers import CurriculumManager, RewardManager, TerminationManager
......@@ -41,10 +43,16 @@ Note:
"""
VecEnvStepReturn = Tuple[VecEnvObs, torch.Tensor, torch.Tensor, Dict]
VecEnvStepReturn = Tuple[VecEnvObs, torch.Tensor, torch.Tensor, torch.Tensor, Dict]
"""The environment signals processed at the end of each step.
It contains the observation, reward, termination signal and additional information for each sub-environment.
The tuple contains batched information for each sub-environment. The information is stored in the following order:
1. **Observations**: The observations from the environment.
2. **Rewards**: The rewards from the environment.
3. **Terminated Dones**: Whether the environment reached a terminal state, such as task success or robot falling etc.
4. **Timeout Dones**: Whether the environment reached a timeout state, such as end of max episode length.
5. **Extras**: A dictionary containing additional information from the environment.
"""
......@@ -72,40 +80,43 @@ class RLTaskEnv(BaseEnv, gym.Env):
is_vector_env: ClassVar[bool] = True
"""Whether the environment is a vectorized environment."""
metadata: ClassVar[dict[str, Any]] = {"render.modes": ["human", "rgb_array"]}
metadata: ClassVar[dict[str, Any]] = {
"render_modes": [None, "human", "rgb_array"],
"isaac_sim_version": get_version(),
}
"""Metadata for the environment."""
cfg: RLTaskEnvCfg
"""Configuration for the environment."""
def __init__(self, cfg: RLTaskEnvCfg, **kwargs):
def __init__(self, cfg: RLTaskEnvCfg, render_mode: str | None = None, **kwargs):
"""Initialize the environment.
Args:
cfg: The configuration for the environment.
render_mode: The render mode for the environment. Defaults to None, which
is similar to ``"human"``.
"""
# initialize the base class to setup the scene.
super().__init__(cfg=cfg)
# store the render mode
self.render_mode = render_mode
# initialize data and constants
# -- counter for curriculum
self.common_step_counter = 0
# -- init buffers
self.reset_buf = torch.ones(self.num_envs, device=self.device, dtype=torch.long)
self.reward_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.float)
self.episode_length_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.long)
# -- allocate dictionary to store metrics
self.extras = {}
# print the environment information
print("[INFO]: Completed setting up the environment...")
# setup the action and observation spaces for Gym
# -- observation space
self.observation_space = gym.spaces.Dict()
for group_name, group_dim in self.observation_manager.group_obs_dim.items():
self.observation_space[group_name] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=group_dim)
# -- action space (unbounded since we don't impose any limits)
action_dim = sum(self.action_manager.action_term_dim)
self.action_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(action_dim,))
self._configure_gym_env_spaces()
# perform randomization at the start of the simulation
if "startup" in self.randomization_manager.available_modes:
self.randomization_manager.randomize(mode="startup")
# print the environment information
print("[INFO]: Completed setting up the environment...")
"""
Properties.
......@@ -147,44 +158,54 @@ class RLTaskEnv(BaseEnv, gym.Env):
Operations - MDP
"""
def reset(self) -> VecEnvObs:
"""Resets all the environments and returns observations.
def reset(self, seed: int | None = None, options: dict[str, Any] | None = None) -> tuple[VecEnvObs, dict]:
"""Resets all the environments and returns observations and extras.
Note:
This function (if called) must **only** be called before the first call to :meth:`step`, i.e.
after the environment is created. After that, the :meth:`step` function handles the reset
of terminated sub-environments.
Args:
seed: The seed to use for randomization. Defaults to None, in which case the seed is not set.
options: Additional information to specify how the environment is reset. Defaults to None.
Note:
This is not used in the current implementation. It is mostly there for compatibility with
Gymnasium environment definition.
Returns:
Observations from the environment.
A tuple containing the observations and extras.
"""
# set the seed
if seed is not None:
gym.Env.reset(self, seed=seed)
self.seed(seed)
# reset state of scene
indices = torch.arange(self.num_envs, dtype=torch.int64, device=self.device)
self._reset_idx(indices)
# return observations
return self.observation_manager.compute()
return self.observation_manager.compute(), self.extras
def step(self, action: torch.Tensor) -> VecEnvStepReturn:
"""Apply actions on the environment and reset terminated environments.
"""Run one timestep of the environment's dynamics and reset terminated environments.
This function deals with various timeline events (play, pause and stop) for clean execution.
When the simulation is stopped all the physics handles expire and we cannot perform any read or
write operations. The timeline event is only detected after every `sim.step()` call. Hence, at
every call we need to check the status of the simulator. The logic is as follows:
The environment dynamics may comprise of many steps of the physics engine. The number of steps
is controlled by the :attr:`RLTaskEnvCfg.decimation` parameter in the configuration. This means
that the agent control can happen at a slower rate than the physics simulation. This is useful
for real-time control of the robot, where the control loop may be slower than the frequency of
the actual dynamics.
1. If the simulation is stopped, the environment is closed and the simulator is shutdown.
2. If the simulation is paused, we step the simulator until it is playing.
3. If the simulation is playing, we set the actions and step the simulator.
The function also handles resetting of the terminated environments, at the end of the physics
stepping and computation of the reward and terminated signals. This is because it is not
possible to reset the sub-environments individually due to the vectorized implementation
of sub-environments in the simulator.
Args:
action: Actions to apply on the simulator.
action: The actions to apply on the environment. Shape is ``(num_envs, action_dim)``.
Returns:
VecEnvStepReturn: A tuple containing:
- (VecEnvObs) observations from the environment
- (torch.Tensor) reward from the environment
- (torch.Tensor) whether the current episode is completed or not
- (dict) misc information
A tuple containing the observations, rewards, resets (terminated and truncated) and extras.
"""
# process actions
self.action_manager.process_action(action)
......@@ -206,13 +227,14 @@ class RLTaskEnv(BaseEnv, gym.Env):
# -- update env counters (used for curriculum generation)
self.episode_length_buf += 1 # step in current episode (per env)
self.common_step_counter += 1 # total step (common for all envs)
# compute MDP signals
# -- check terminations
self.reset_buf = self.termination_manager.compute().to(torch.long)
self.reset_buf = self.termination_manager.compute()
self.reset_terminated = self.termination_manager.terminated
self.reset_time_outs = self.termination_manager.time_outs
# -- reward computation
self.reward_buf = self.reward_manager.compute(dt=self.step_dt)
# -- reset envs that terminated and log the episode information
# -- reset envs that terminated/timed-out and log the episode information
reset_env_ids = self.reset_buf.nonzero(as_tuple=False).squeeze(-1)
if len(reset_env_ids) > 0:
self._reset_idx(reset_env_ids)
......@@ -221,11 +243,14 @@ class RLTaskEnv(BaseEnv, gym.Env):
# -- step interval randomization
if "interval" in self.randomization_manager.available_modes:
self.randomization_manager.randomize(mode="interval", dt=self.step_dt)
# -- compute observations
# note: done after reset to get the correct observations for reset envs
self.obs_buf = self.observation_manager.compute()
# return observations, rewards, resets and extras
return self.observation_manager.compute(), self.reward_buf, self.reset_buf, self.extras
return self.obs_buf, self.reward_buf, self.reset_terminated, self.reset_time_outs, self.extras
def render(self, mode: str = "human") -> np.ndarray | None:
def render(self) -> np.ndarray | None:
"""Run rendering without stepping through the physics.
By convention, if mode is:
......@@ -234,9 +259,6 @@ class RLTaskEnv(BaseEnv, gym.Env):
- **rgb_array**: Return an numpy.ndarray with shape (x, y, 3), representing RGB values for an
x-by-y pixel image, suitable for turning into a video.
Args:
mode: The mode to render with. Defaults to "human".
Returns:
The rendered image as a numpy array if mode is "rgb_array".
......@@ -249,15 +271,15 @@ class RLTaskEnv(BaseEnv, gym.Env):
# run a rendering step of the simulator
self.sim.render()
# decide the rendering mode
if mode == "human":
if self.render_mode == "human" or self.render_mode is None:
return None
elif mode == "rgb_array":
elif self.render_mode == "rgb_array":
# check that if any render could have happened
if self.sim.render_mode.value < self.sim.RenderMode.PARTIAL_RENDERING.value:
raise RuntimeError(
f"Cannot render '{mode}' when the simulation render mode is '{self.sim.render_mode.name}'."
f" Please set the simulation render mode to '{self.sim.RenderMode.PARTIAL_RENDERING.name}' or "
f" '{self.sim.RenderMode.FULL_RENDERING.name}'."
f"Cannot render '{self.render_mode}' when the simulation render mode is"
f" '{self.sim.render_mode.name}'. Please set the simulation render mode to:"
f"'{self.sim.RenderMode.PARTIAL_RENDERING.name}' or '{self.sim.RenderMode.FULL_RENDERING.name}'."
)
# create the annotator if it does not exist
if not hasattr(self, "_rgb_annotator"):
......@@ -282,7 +304,7 @@ class RLTaskEnv(BaseEnv, gym.Env):
return rgb_data[:, :, :3]
else:
raise NotImplementedError(
f"Render mode '{mode}' is not supported. Please use: {self.metadata['render.modes']}."
f"Render mode '{self.render_mode}' is not supported. Please use: {self.metadata['render_modes']}."
)
def close(self):
......@@ -296,9 +318,37 @@ class RLTaskEnv(BaseEnv, gym.Env):
super().close()
"""
Implementation specifics.
Helper functions.
"""
def _configure_gym_env_spaces(self):
"""Configure the action and observation spaces for the Gym environment."""
# observation space (unbounded since we don't impose any limits)
self.single_observation_space = gym.spaces.Dict()
for group_name, group_term_names in self.observation_manager.active_terms.items():
# extract quantities about the group
has_concatenated_obs = self.observation_manager.group_obs_concatenate[group_name]
group_dim = self.observation_manager.group_obs_dim[group_name]
group_term_dim = self.observation_manager.group_obs_term_dim[group_name]
# check if group is concatenated or not
# if not concatenated, then we need to add each term separately as a dictionary
if has_concatenated_obs:
self.single_observation_space[group_name] = gym.spaces.Box(low=-np.inf, high=np.inf, shape=group_dim)
else:
self.single_observation_space[group_name] = gym.spaces.Dict(
{
term_name: gym.spaces.Box(low=-np.inf, high=np.inf, shape=term_dim)
for term_name, term_dim in zip(group_term_names, group_term_dim)
}
)
# action space (unbounded since we don't impose any limits)
action_dim = sum(self.action_manager.action_term_dim)
self.single_action_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(action_dim,))
# batch the spaces for vectorized environments
self.observation_space = gym.vector.utils.batch_space(self.single_observation_space, self.num_envs)
self.action_space = gym.vector.utils.batch_space(self.single_action_space, self.num_envs)
def _reset_idx(self, env_ids: Sequence[int]):
"""Reset environments based on specified indices.
......@@ -341,6 +391,3 @@ class RLTaskEnv(BaseEnv, gym.Env):
# reset the episode length buffer
self.episode_length_buf[env_ids] = 0
# -- add information to extra if timeout occurred due to episode length
# Note: this is used by algorithms like PPO where time-outs are handled differently
self.extras["time_outs"] = self.termination_manager.time_outs
......@@ -91,6 +91,11 @@ class ObservationManager(ManagerBase):
"""Shape of observation tensor for each term in each group."""
return self._group_obs_term_dim
@property
def group_obs_concatenate(self) -> dict[str, bool]:
"""Whether the observation terms are concatenated in each group."""
return self._group_obs_concatenate
"""
Operations.
"""
......
......@@ -26,8 +26,20 @@ class TerminationManager(ManagerBase):
argument and returns a boolean tensor of shape ``(num_envs,)``. The termination manager
computes the termination signal as the union (logical or) of all the termination terms.
Following the `Gymnasium API <https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/>`_,
the termination signal is computed as the logical OR of the following signals:
* **Time-out**: This signal is set to true if the environment has ended after an externally defined condition
(that is outside the scope of a MDP). For example, the environment may be terminated if the episode has
timed out (i.e. reached max episode length).
* **Terminated**: This signal is set to true if the environment has reached a terminal state defined by the
environment. This state may correspond to task success, task failure, robot falling, etc.
These signals can be individually accessed using the :attr:`time_outs` and :attr:`terminated` properties.
The termination terms are parsed from a config class containing the manager's settings and each term's
parameters. Each termination term should instantiate the :class:`TerminationTermCfg` class.
parameters. Each termination term should instantiate the :class:`TerminationTermCfg` class. The term's
configuration :attr:`TerminationTermCfg.time_out` decides whether the term is a timeout or a termination term.
"""
_env: RLTaskEnv
......@@ -46,8 +58,8 @@ class TerminationManager(ManagerBase):
for term_name in self._term_names:
self._episode_dones[term_name] = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
# create buffer for managing termination per environment
self._done_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
self._time_out_buf = torch.zeros_like(self._done_buf)
self._truncated_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.bool)
self._terminated_buf = torch.zeros_like(self._truncated_buf)
def __str__(self) -> str:
"""Returns: A string representation for termination manager."""
......@@ -79,12 +91,26 @@ class TerminationManager(ManagerBase):
@property
def dones(self) -> torch.Tensor:
"""The net termination signal. Shape is ``(num_envs,)``."""
return self._done_buf
return self._truncated_buf | self._terminated_buf
@property
def time_outs(self) -> torch.Tensor:
"""The timeout signal. Shape is ``(num_envs,)``."""
return self._time_out_buf
"""The timeout signal (reaching max episode length). Shape is ``(num_envs,)``.
This signal is set to true if the environment has ended after an externally defined condition
(that is outside the scope of a MDP). For example, the environment may be terminated if the episode has
timed out (i.e. reached max episode length).
"""
return self._truncated_buf
@property
def terminated(self) -> torch.Tensor:
"""The terminated signal (reaching a terminal state). Shape is ``(num_envs,)``.
This signal is set to true if the environment has reached a terminal state defined by the environment.
This state may correspond to task success, task failure, robot falling, etc.
"""
return self._terminated_buf
"""
Operations.
......@@ -122,20 +148,20 @@ class TerminationManager(ManagerBase):
The combined termination signal of shape ``(num_envs,)``.
"""
# reset computation
self._done_buf[:] = False
self._time_out_buf[:] = False
self._truncated_buf[:] = False
self._terminated_buf[:] = False
# iterate over all the termination terms
for name, term_cfg in zip(self._term_names, self._term_cfgs):
value = term_cfg.func(self._env, **term_cfg.params)
# update total termination
self._done_buf |= value
# store timeout signal separately
if term_cfg.time_out:
self._time_out_buf |= value
self._truncated_buf |= value
else:
self._terminated_buf |= value
# add to episode dones
self._episode_dones[name] |= value
# return termination signal
return self._done_buf
# return combined termination signal
return self._truncated_buf | self._terminated_buf
"""
Operations - Term settings.
......
......@@ -292,13 +292,13 @@ class SimulationContext(_SimulationContext):
# hide the viewport and disable updates
self._viewport_context.updates_enabled = False # pyright: ignore [reportOptionalMemberAccess]
self._viewport_window.visible = False # pyright: ignore [reportOptionalMemberAccess]
# reset the throttle counter
self._render_throttle_counter = 0
elif mode == self.RenderMode.NO_RENDERING:
# hide the viewport and disable updates
if self._viewport_context is not None:
self._viewport_context.updates_enabled = False # pyright: ignore [reportOptionalMemberAccess]
self._viewport_window.visible = False # pyright: ignore [reportOptionalMemberAccess]
# reset the throttle counter
self._render_throttle_counter = 0
else:
raise ValueError(f"Unsupported render mode: {mode}! Please check `RenderMode` for details.")
# update render mode
......@@ -403,14 +403,21 @@ class SimulationContext(_SimulationContext):
self._render_throttle_counter += 1
if self._render_throttle_counter % self._render_throttle_period == 0:
self._render_throttle_counter = 0
# here we don't render viewport so don't need to flush flatcache
super().render()
# here we don't render viewport so don't need to flush fabric data
# note: we don't call super().render() anymore because they do flush the fabric data
self.set_setting("/app/player/playSimulations", False)
self._app.update()
self.set_setting("/app/player/playSimulations", True)
else:
# manually flush the flatcache data to update Hydra textures
# manually flush the fabric data to update Hydra textures
if self._fabric_iface is not None:
self._fabric_iface.update(0.0, 0.0)
# render the simulation
super().render()
# note: we don't call super().render() anymore because they do above operation inside
# and we don't want to do it twice. We may remove it once we drop support for Isaac Sim 2022.2.
self.set_setting("/app/player/playSimulations", False)
self._app.update()
self.set_setting("/app/player/playSimulations", True)
"""
Operations - Override (extension)
......
......@@ -25,18 +25,17 @@ INSTALL_REQUIRES = [
# devices
"hidapi",
# gym
"gym==0.21.0",
"importlib-metadata~=4.13.0",
"setuptools<=66", # setuptools 67.0 breaks gym
"gymnasium==0.29.0",
# procedural-generation
"trimesh",
"pyglet==1.5.27", # pyglet 2.0 requires python 3.8
"pyglet==1.5.27; python_version < '3.8'", # pyglet 2.0 requires python 3.8
"pyglet; python_version >= '3.8'",
]
# Installation operation
setup(
name="omni-isaac-orbit",
author="NVIDIA, ETH Zurich, and University of Toronto",
author="ORBIT Project Developers",
maintainer="Mayank Mittal",
maintainer_email="mittalma@ethz.ch",
url=EXTENSION_TOML_DATA["package"]["repository"],
......@@ -48,6 +47,10 @@ setup(
python_requires=">=3.7",
install_requires=INSTALL_REQUIRES,
packages=["omni.isaac.orbit"],
classifiers=["Natural Language :: English", "Programming Language :: Python :: 3.7"],
classifiers=[
"Natural Language :: English",
"Programming Language :: Python :: 3.10",
"Isaac Sim :: 2023.1.0-hotfix.1",
],
zip_safe=False,
)
......@@ -6,6 +6,7 @@
from __future__ import annotations
import torch
import torch.utils.benchmark as benchmark
import unittest
......@@ -124,6 +125,30 @@ class TestTorchOperations(unittest.TestCase):
my_slice = my_tensor[torch.tensor([0, 1]), ...]
self.assertNotEqual(my_slice.untyped_storage().data_ptr(), my_tensor.untyped_storage().data_ptr())
def test_logical_or(self):
"""Test bitwise or operation."""
size = (400, 300, 5)
my_tensor_1 = torch.rand(size, device="cuda:0") > 0.5
my_tensor_2 = torch.rand(size, device="cuda:0") < 0.5
# check the speed of logical or
timer_logical_or = benchmark.Timer(
stmt="torch.logical_or(my_tensor_1, my_tensor_2)",
globals={"my_tensor_1": my_tensor_1, "my_tensor_2": my_tensor_2},
)
timer_bitwise_or = benchmark.Timer(
stmt="my_tensor_1 | my_tensor_2", globals={"my_tensor_1": my_tensor_1, "my_tensor_2": my_tensor_2}
)
print("Time for logical or:", timer_logical_or.timeit(number=1000))
print("Time for bitwise or:", timer_bitwise_or.timeit(number=1000))
# check that logical or works as expected
output_logical_or = torch.logical_or(my_tensor_1, my_tensor_2)
output_bitwise_or = my_tensor_1 | my_tensor_2
self.assertTrue(torch.allclose(output_logical_or, output_bitwise_or))
if __name__ == "__main__":
unittest.main()
[package]
# Note: Semantic Versioning is used: https://semver.org/
version = "0.5.0"
version = "0.5.1"
# Description
title = "ORBIT Environments"
......
Changelog
---------
0.5.1 (2023-11-04)
~~~~~~~~~~~~~~~~~~
Fixed
^^^^^
* Fixed the wrappers to different learning frameworks to use the new :class:`omni.isaac.orbit_tasks.RLTaskEnv` class.
The :class:`RLTaskEnv` class inherits from the :class:`gymnasium.Env` class (Gymnasium 0.29.0).
* Fixed the registration of tasks in the Gym registry based on the Gymnasium 0.29.0 API.
Changed
^^^^^^^
* Removed the inheritance of all the RL-framework specific wrappers from the :class:`gymnasium.Wrapper` class.
This is because the wrappers don't comply with the new Gymnasium 0.29.0 API. The wrappers now only inherit
from their respective RL-framework specific base classes.
0.5.0 (2023-10-30)
~~~~~~~~~~~~~~~~~~
......
......@@ -17,28 +17,31 @@ This looks like as follows:
omni/isaac/orbit_tasks/locomotion/
├── __init__.py
└── velocity
├── a1
│ └── flat_terrain_cfg.py
├── anymal_c
│ └── flat_terrain_cfg.py
├── config
│ └── anymal_c
│ ├── agent # <- this is where we store the learning agent configurations
│ ├── __init__.py # <- this is where we register the environment and configurations to gym registry
│ ├── flat_env_cfg.py
│ └── rough_env_cfg.py
├── __init__.py
├── velocity_cfg.py
└── velocity_env.py
└── velocity_env_cfg.py # <- this is the base task configuration
```
The environments are then registered in the `omni/isaac/orbit_tasks/__init__.py`:
The environments are then registered in the `omni/isaac/orbit_tasks/locomotion/velocity/config/anymal_c/__init__.py`:
```python
gym.register(
id="Isaac-Velocity-Anymal-C-v0",
entry_point="omni.isaac.orbit_tasks.locomotion.velocity:LocomotionEnv",
kwargs={"cfg_entry_point": "omni.isaac.orbit_tasks.locomotion.velocity.anymal_c.flat_terrain_cfg:FlatTerrainCfg"},
id="Isaac-Velocity-Rough-Anymal-C-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={"env_cfg_entry_point": f"{__name__}.rough_env_cfg:AnymalCRoughEnvCfg"},
)
gym.register(
id="Isaac-Velocity-A1-v0",
entry_point="omni.isaac.orbit_tasks.locomotion.velocity:LocomotionEnv",
kwargs={"cfg_entry_point": "omni.isaac.orbit_tasks.locomotion.velocity.a1.flat_terrain_cfg:FlatTerrainCfg"},
id="Isaac-Velocity-Flat-Anymal-C-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={"env_cfg_entry_point": f"{__name__}.flat_env_cfg:AnymalCFlatEnvCfg"},
)
```
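The `env_cfg_entry_point` values above are strings of the form `module.path:ClassName`. As a rough illustration of how such a string can be resolved to a configuration class, here is a minimal sketch that uses only the standard library. The helper name `resolve_entry_point` is hypothetical and not part of Orbit; the actual parsing is done by `load_cfg_from_registry`, whose implementation may differ.

```python
import importlib


def resolve_entry_point(entry_point: str):
    """Resolve a ``"package.module:attribute"`` string to the object it names.

    Hypothetical helper for illustration only; it is not part of Orbit.
    """
    module_name, sep, attr_name = entry_point.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected 'module:attribute', got: {entry_point!r}")
    # import the module that holds the configuration class
    module = importlib.import_module(module_name)
    # look up the class (or any other attribute) inside the module
    return getattr(module, attr_name)


# stand-in for a configuration class, using a standard-library type
cfg_class = resolve_entry_point("collections:OrderedDict")
cfg = cfg_class()
```

Keeping the entry point as a string means the configuration module is only imported when the task is actually created, not when it is registered.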
......
......@@ -9,7 +9,7 @@
We use OpenAI Gym registry to register the environment and their default configuration file.
The default configuration file is passed to the argument "kwargs" in the Gym specification registry.
The string is parsed into respective configuration container which needs to be passed to the environment
class. This is done using the function :meth:`load_default_env_cfg` in the sub-module
class. This is done using the function :meth:`load_cfg_from_registry` in the sub-module
:mod:`omni.isaac.orbit.utils.parse_cfg`.
Note:
......@@ -18,12 +18,12 @@ Note:
the kwarg argument :obj:`cfg` while creating the environment.
Usage:
>>> import gym
>>> import gymnasium as gym
>>> import omni.isaac.orbit_tasks
>>> from omni.isaac.orbit_tasks.utils.parse_cfg import load_default_env_cfg
>>> from omni.isaac.orbit_tasks.utils.parse_cfg import load_cfg_from_registry
>>>
>>> task_name = "Isaac-Cartpole-v0"
>>> cfg = load_default_env_cfg(task_name)
>>> cfg = load_cfg_from_registry(task_name, "env_cfg_entry_point")
>>> env = gym.make(task_name, cfg=cfg)
"""
......
......@@ -7,7 +7,7 @@
Ant locomotion environment (similar to OpenAI Gym Ant-v2).
"""
import gym
import gymnasium as gym
from . import agents
......
......@@ -5,7 +5,7 @@
from __future__ import annotations
import gym.spaces
import gymnasium as gym
import math
import torch
......
......@@ -7,7 +7,7 @@
Cartpole balancing environment.
"""
import gym
import gymnasium as gym
from . import agents
......
......@@ -5,7 +5,7 @@
from __future__ import annotations
import gym.spaces
import gymnasium as gym
import math
import torch
......
......@@ -7,7 +7,7 @@
Humanoid locomotion environment (similar to OpenAI Gym Humanoid-v2).
"""
import gym
import gymnasium as gym
from . import agents
......
......@@ -5,7 +5,7 @@
from __future__ import annotations
import gym.spaces
import gymnasium as gym
import math
import torch
......
......@@ -3,7 +3,7 @@
#
# SPDX-License-Identifier: BSD-3-Clause
import gym
import gymnasium as gym
from . import agents, flat_env_cfg, rough_env_cfg
......@@ -14,6 +14,7 @@ from . import agents, flat_env_cfg, rough_env_cfg
gym.register(
id="Isaac-Velocity-Flat-Anymal-B-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": flat_env_cfg.AnymalBFlatEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalBFlatPPORunnerCfg,
......@@ -23,6 +24,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Flat-Anymal-B-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": flat_env_cfg.AnymalBFlatEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalBFlatPPORunnerCfg,
......@@ -32,6 +34,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Rough-Anymal-B-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": rough_env_cfg.AnymalBRoughEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalBRoughPPORunnerCfg,
......@@ -41,6 +44,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Rough-Anymal-B-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": rough_env_cfg.AnymalBRoughEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalBRoughPPORunnerCfg,
......
......@@ -3,7 +3,7 @@
#
# SPDX-License-Identifier: BSD-3-Clause
import gym
import gymnasium as gym
from . import agents, flat_env_cfg, rough_env_cfg
......@@ -14,6 +14,7 @@ from . import agents, flat_env_cfg, rough_env_cfg
gym.register(
id="Isaac-Velocity-Flat-Anymal-C-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": flat_env_cfg.AnymalCFlatEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalCFlatPPORunnerCfg,
......@@ -24,6 +25,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Flat-Anymal-C-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": flat_env_cfg.AnymalCFlatEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalCFlatPPORunnerCfg,
......@@ -33,6 +35,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Rough-Anymal-C-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": rough_env_cfg.AnymalCRoughEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalCRoughPPORunnerCfg,
......@@ -42,6 +45,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Rough-Anymal-C-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": rough_env_cfg.AnymalCRoughEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalCRoughPPORunnerCfg,
......
......@@ -3,7 +3,7 @@
#
# SPDX-License-Identifier: BSD-3-Clause
import gym
import gymnasium as gym
from . import agents, flat_env_cfg, rough_env_cfg
......@@ -14,6 +14,7 @@ from . import agents, flat_env_cfg, rough_env_cfg
gym.register(
id="Isaac-Velocity-Flat-Anymal-D-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": flat_env_cfg.AnymalDFlatEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalDFlatPPORunnerCfg,
......@@ -23,6 +24,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Flat-Anymal-D-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": flat_env_cfg.AnymalDFlatEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalDFlatPPORunnerCfg,
......@@ -32,6 +34,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Rough-Anymal-D-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": rough_env_cfg.AnymalDRoughEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalDRoughPPORunnerCfg,
......@@ -41,6 +44,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Rough-Anymal-D-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": rough_env_cfg.AnymalDRoughEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.AnymalDRoughPPORunnerCfg,
......
......@@ -3,7 +3,7 @@
#
# SPDX-License-Identifier: BSD-3-Clause
import gym
import gymnasium as gym
from . import agents, flat_env_cfg, rough_env_cfg
......@@ -14,6 +14,7 @@ from . import agents, flat_env_cfg, rough_env_cfg
gym.register(
id="Isaac-Velocity-Flat-Unitree-A1-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": flat_env_cfg.UnitreeA1FlatEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.UnitreeA1FlatPPORunnerCfg,
......@@ -23,6 +24,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Flat-Unitree-A1-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": flat_env_cfg.UnitreeA1FlatEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.UnitreeA1FlatPPORunnerCfg,
......@@ -32,6 +34,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Rough-Unitree-A1-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": rough_env_cfg.UnitreeA1RoughEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.UnitreeA1RoughPPORunnerCfg,
......@@ -41,6 +44,7 @@ gym.register(
gym.register(
id="Isaac-Velocity-Rough-Unitree-A1-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": rough_env_cfg.UnitreeA1RoughEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.UnitreeA1RoughPPORunnerCfg,
......
......@@ -65,7 +65,7 @@ class MySceneCfg(InteractiveSceneCfg):
offset=RayCasterCfg.OffsetCfg(pos=(0.0, 0.0, 20.0)),
attach_yaw_only=True,
pattern_cfg=patterns.GridPatternCfg(resolution=0.1, size=[1.6, 1.0]),
debug_vis=True,
debug_vis=False,
mesh_prim_paths=["/World/ground"],
)
contact_forces = ContactSensorCfg(prim_path="{ENV_REGEX_NS}/Robot/.*", history_length=3, track_air_time=True)
......
......@@ -7,7 +7,7 @@
Environment for lifting an object with fixed-base robot.
"""
import gym
import gymnasium as gym
from . import agents
......@@ -18,6 +18,7 @@ from . import agents
gym.register(
id="Isaac-Lift-Franka-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.lift_env_cfg:LiftEnvCfg",
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
......
......@@ -5,7 +5,7 @@
from __future__ import annotations
import gym.spaces
import gymnasium as gym
import math
import torch
......
......@@ -5,7 +5,7 @@
"""Environment for end-effector pose tracking task for fixed-arm robots."""
import gym
import gymnasium as gym
from . import agents
......@@ -16,6 +16,7 @@ from . import agents
gym.register(
id="Isaac-Reach-Franka-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.reach_env_cfg:ReachEnvCfg",
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
......
......@@ -5,7 +5,7 @@
from __future__ import annotations
import gym.spaces
import gymnasium as gym
import math
import torch
......
......@@ -7,7 +7,7 @@
from __future__ import annotations
import gym
import gymnasium as gym
import importlib
import inspect
import os
......@@ -52,7 +52,7 @@ def load_cfg_from_registry(task_name: str, entry_point_key: str) -> dict | Any:
ValueError: If the entry point key is not available in the gym registry for the task.
"""
# obtain the configuration entry point
cfg_entry_point = gym.spec(task_name)._kwargs.pop(entry_point_key)
cfg_entry_point = gym.spec(task_name).kwargs.pop(entry_point_key)
# check if entry point exists
if cfg_entry_point is None:
raise ValueError(
......
......@@ -33,7 +33,7 @@ for RL-Games :class:`Runner` class:
from __future__ import annotations
import gym
import gymnasium as gym
import torch
from rl_games.common import env_configurations
......@@ -49,10 +49,10 @@ Vectorized environment wrapper.
"""
class RlGamesVecEnvWrapper(gym.Wrapper):
"""Wraps around Isaac Orbit environment for RL-Games.
class RlGamesVecEnvWrapper(IVecEnv):
"""Wraps around Orbit environment for RL-Games.
This class wraps around the Isaac Orbit environment. Since RL-Games works directly on
This class wraps around the Orbit environment. Since RL-Games works directly on
GPU buffers, the wrapper handles moving of buffers from the simulation environment
to the same device as the learning agent. Additionally, it performs clipping of
observations and actions.
......@@ -69,6 +69,13 @@ class RlGamesVecEnvWrapper(gym.Wrapper):
checks if these attributes exist. If they don't then the wrapper defaults to zero as number
of privileged observations.
.. caution::
This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow
the :class:`gym.Wrapper` interface. Any subsequent wrappers will need to be modified to work with this
wrapper.
Reference:
https://github.com/Denys88/rl_games/blob/master/rl_games/common/ivecenv.py
https://github.com/NVIDIA-Omniverse/IsaacGymEnvs
......@@ -85,30 +92,77 @@ class RlGamesVecEnvWrapper(gym.Wrapper):
Raises:
ValueError: The environment is not inherited from :class:`RLTaskEnv`.
ValueError: If specified, the privileged observations (critic) are not of type :obj:`gym.spaces.Box`.
"""
# check that input is valid
if not isinstance(env.unwrapped, RLTaskEnv):
raise ValueError(f"The environment must be inherited from RLTaskEnv. Environment type: {type(env)}")
# initialize gym wrapper
gym.Wrapper.__init__(self, env)
# initialize rl-games vec-env
IVecEnv.__init__(self)
# initialize the wrapper
self.env = env
# store provided arguments
self._rl_device = rl_device
self._clip_obs = clip_obs
self._clip_actions = clip_actions
self._sim_device = env.unwrapped.device
# information about spaces for the wrapper
self.observation_space = self.env.observation_space
self.action_space = self.env.action_space
# note: rl-games only wants single observation and action spaces
self.rlg_observation_space = self.unwrapped.single_observation_space["policy"]
self.rlg_action_space = self.unwrapped.single_action_space
# information for privileged observations
self.state_space = getattr(self.env, "state_space", None)
self.num_states = getattr(self.env, "num_states", 0)
# print information about wrapper
print("[INFO]: RL-Games Environment Wrapper:")
print(f"\t\t Observations clipping: {clip_obs}")
print(f"\t\t Actions clipping : {clip_actions}")
print(f"\t\t Agent device : {rl_device}")
print(f"\t\t Asymmetric-learning : {self.num_states != 0}")
self.rlg_state_space = self.unwrapped.single_observation_space.get("critic")
if self.rlg_state_space is not None:
if not isinstance(self.rlg_state_space, gym.spaces.Box):
raise ValueError(f"Privileged observations must be of type Box. Type: {type(self.rlg_state_space)}")
self.rlg_num_states = self.rlg_state_space.shape[0]
else:
self.rlg_num_states = 0
def __str__(self):
"""Returns the wrapper name and the :attr:`env` representation string."""
return (
f"<{type(self).__name__}{self.env}>"
f"\n\tObservations clipping: {self._clip_obs}"
f"\n\tActions clipping : {self._clip_actions}"
f"\n\tAgent device : {self._rl_device}"
f"\n\tAsymmetric-learning : {self.rlg_num_states != 0}"
)
def __repr__(self):
"""Returns the string representation of the wrapper."""
return str(self)
"""
Properties -- Gym.Wrapper
"""
@property
def render_mode(self) -> str | None:
"""Returns the :attr:`Env` :attr:`render_mode`."""
return self.env.render_mode
@property
def observation_space(self) -> gym.Space:
"""Returns the :attr:`Env` :attr:`observation_space`."""
return self.env.observation_space
@property
def action_space(self) -> gym.Space:
"""Returns the :attr:`Env` :attr:`action_space`."""
return self.env.action_space
@classmethod
def class_name(cls) -> str:
"""Returns the class name of the wrapper."""
return cls.__name__
@property
def unwrapped(self) -> RLTaskEnv:
"""Returns the base environment of the wrapper.
This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
"""
return self.env.unwrapped
"""
Properties
......@@ -120,40 +174,46 @@ class RlGamesVecEnvWrapper(gym.Wrapper):
def get_env_info(self) -> dict:
"""Returns the Gym spaces for the environment."""
# fill the env info dict
env_info = {"observation_space": self.observation_space, "action_space": self.action_space}
# add information about privileged observations space
if self.num_states > 0:
env_info["state_space"] = self.state_space
return env_info
return {
"observation_space": self.rlg_observation_space,
"action_space": self.rlg_action_space,
"state_space": self.rlg_state_space,
}
"""
Operations - MDP
"""
def seed(self, seed: int = -1) -> int: # noqa: D102
return self.unwrapped.seed(seed)
def reset(self): # noqa: D102
obs_dict = self.env.reset()
obs_dict, _ = self.env.reset()
# process observations and states
return self._process_obs(obs_dict)
def step(self, actions): # noqa: D102
# move actions to sim-device
actions = actions.detach().clone().to(device=self._sim_device)
# clip the actions
actions = torch.clamp(actions.clone(), -self._clip_actions, self._clip_actions)
actions = torch.clamp(actions, -self._clip_actions, self._clip_actions)
# perform environment step
obs_dict, rew, dones, extras = self.env.step(actions)
obs_dict, rew, terminated, truncated, extras = self.env.step(actions)
# process observations and states
obs_and_states = self._process_obs(obs_dict)
# move buffers to rl-device
# note: we perform clone to prevent issues when rl-device and sim-device are the same.
rew = rew.to(self._rl_device)
dones = dones.to(self._rl_device)
rew = rew.to(device=self._rl_device)
dones = (terminated | truncated).to(device=self._rl_device)
extras = {
k: v.to(device=self._rl_device, non_blocking=True) if hasattr(v, "to") else v for k, v in extras.items()
}
return obs_and_states, rew, dones, extras
def close(self): # noqa: D102
return self.env.close()
"""
Helper functions
"""
......@@ -163,34 +223,29 @@ class RlGamesVecEnvWrapper(gym.Wrapper):
Note:
States typically refers to privileged observations for the critic function. It is typically used in
asymmetric actor-critic algorithms [1].
asymmetric actor-critic algorithms.
Args:
obs: The current observations from environment.
obs_dict: The current observations from environment.
Returns:
If environment provides states, then a dictionary
containing the observations and states is returned. Otherwise just the observations tensor
is returned.
Reference:
1. Pinto, Lerrel, et al. "Asymmetric actor critic for image-based robot learning."
arXiv preprint arXiv:1710.06542 (2017).
If environment provides states, then a dictionary containing the observations and states is returned.
Otherwise just the observations tensor is returned.
"""
# process policy obs
obs = obs_dict["policy"]
# clip the observations
obs = torch.clamp(obs, -self._clip_obs, self._clip_obs)
# move the buffer to rl-device
obs = obs.to(self._rl_device).clone()
obs = obs.to(device=self._rl_device).clone()
# check if asymmetric actor-critic or not
if self.num_states > 0:
if self.rlg_num_states > 0:
# acquire states from the environment if it exists
try:
states = obs_dict["critic"]
except KeyError:
raise NotImplementedError("Environment does not define key `critic` for privileged observations.")
raise NotImplementedError("Environment does not define key 'critic' for privileged observations.")
# clip the states
states = torch.clamp(states, -self._clip_obs, self._clip_obs)
# move buffers to rl-device
......
......@@ -17,22 +17,28 @@ The following example shows how to wrap an environment for RSL-RL:
from __future__ import annotations
import gym
import gym.spaces
import gymnasium as gym
import torch
from rsl_rl.env import VecEnv
from omni.isaac.orbit.envs import RLTaskEnv
class RslRlVecEnvWrapper(gym.Wrapper):
"""Wraps around Isaac Orbit environment for RSL-RL library
class RslRlVecEnvWrapper(VecEnv):
"""Wraps around Orbit environment for RSL-RL library
To use asymmetric actor-critic, the environment instance must have the attribute :attr:`num_privileged_obs` (int).
This is used by the learning agent to allocate buffers in the trajectory memory. Additionally, the returned
observations should have the key "critic" which corresponds to the privileged observations. Since this is
optional for some environments, the wrapper checks if these attributes exist. If they don't then the wrapper
defaults to zero as number of privileged observations.
.. caution::
To use asymmetric actor-critic, the environment instance must have the attributes :attr:`num_states` (int)
and :attr:`state_space` (:obj:`gym.spaces.Box`). These are used by the learning agent to allocate buffers in
the trajectory memory. Additionally, the method :meth:`_get_observations()` should have the key "critic"
which corresponds to the privileged observations. Since this is optional for some environments, the wrapper
checks if these attributes exist. If they don't then the wrapper defaults to zero as number of privileged
observations.
This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow
the :class:`gym.Wrapper` interface. Any subsequent wrappers will need to be modified to work with this
wrapper.
Reference:
https://github.com/leggedrobotics/rsl_rl/blob/master/rsl_rl/env/vec_env.py
......@@ -41,6 +47,9 @@ class RslRlVecEnvWrapper(gym.Wrapper):
def __init__(self, env: RLTaskEnv):
"""Initializes the wrapper.
Note:
The wrapper calls :meth:`reset` at the start since the RSL-RL runner does not call reset.
Args:
env: The environment to wrap around.
......@@ -51,28 +60,74 @@ class RslRlVecEnvWrapper(gym.Wrapper):
if not isinstance(env.unwrapped, RLTaskEnv):
raise ValueError(f"The environment must be inherited from RLTaskEnv. Environment type: {type(env)}")
# initialize the wrapper
gym.Wrapper.__init__(self, env)
self.env = env
# store information required by wrapper
orbit_env: RLTaskEnv = self.env.unwrapped
self.num_envs = orbit_env.num_envs
self.num_actions = orbit_env.action_manager.total_action_dim
self.num_obs = orbit_env.observation_manager.group_obs_dim["policy"][0]
self.num_envs = self.unwrapped.num_envs
self.device = self.unwrapped.device
self.max_episode_length = self.unwrapped.max_episode_length
self.num_actions = self.unwrapped.action_manager.total_action_dim
self.num_obs = self.unwrapped.observation_manager.group_obs_dim["policy"][0]
# -- privileged observations
if "critic" in self.unwrapped.observation_manager.group_obs_dim:
self.num_privileged_obs = self.unwrapped.observation_manager.group_obs_dim["critic"][0]
else:
self.num_privileged_obs = 0
# reset at the start since the RSL-RL runner does not call reset
self.env.reset()
def __str__(self):
"""Returns the wrapper name and the :attr:`env` representation string."""
return f"<{type(self).__name__}{self.env}>"
def __repr__(self):
"""Returns the string representation of the wrapper."""
return str(self)
"""
Properties -- Gym.Wrapper
"""
@property
def render_mode(self) -> str | None:
"""Returns the :attr:`Env` :attr:`render_mode`."""
return self.env.render_mode
@property
def observation_space(self) -> gym.Space:
"""Returns the :attr:`Env` :attr:`observation_space`."""
return self.env.observation_space
@property
def action_space(self) -> gym.Space:
"""Returns the :attr:`Env` :attr:`action_space`."""
return self.env.action_space
@classmethod
def class_name(cls) -> str:
"""Returns the class name of the wrapper."""
return cls.__name__
@property
def unwrapped(self) -> RLTaskEnv:
"""Returns the base environment of the wrapper.
This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
"""
return self.env.unwrapped
"""
Properties
"""
def get_observations(self) -> torch.Tensor:
def get_observations(self) -> tuple[torch.Tensor, dict]:
"""Returns the current observations of the environment."""
obs_dict = self.env.unwrapped.observation_manager.compute()
obs_dict = self.unwrapped.observation_manager.compute()
return obs_dict["policy"], {"observations": obs_dict}
@property
def episode_length_buf(self) -> torch.Tensor:
"""The episode length buffer."""
return self.env.unwrapped.episode_length_buf
return self.unwrapped.episode_length_buf
@episode_length_buf.setter
def episode_length_buf(self, value: torch.Tensor):
......@@ -80,22 +135,34 @@ class RslRlVecEnvWrapper(gym.Wrapper):
Note: This is needed to perform random initialization of episode lengths in RSL-RL.
"""
self.env.unwrapped.episode_length_buf = value
self.unwrapped.episode_length_buf = value
"""
Operations - MDP
"""
def reset(self) -> tuple[torch.Tensor, dict]:
def seed(self, seed: int = -1) -> int: # noqa: D102
return self.unwrapped.seed(seed)
def reset(self) -> tuple[torch.Tensor, dict]: # noqa: D102
# reset the environment
obs_dict = self.env.reset()
obs_dict, _ = self.env.reset()
# return observations
return obs_dict["policy"], {"observations": obs_dict}
def step(self, actions: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor, torch.Tensor, dict]:
# record step information
obs_dict, rew, dones, extras = self.env.step(actions)
# return step information
obs_dict, rew, terminated, truncated, extras = self.env.step(actions)
# compute dones for compatibility with RSL-RL
dones = (terminated | truncated).to(dtype=torch.long)
# move extra observations to the extras dict
obs = obs_dict["policy"]
extras["observations"] = obs_dict
# move time out information to the extras dict
extras["time_outs"] = truncated
# return the step information
return obs, rew, dones, extras
def close(self): # noqa: D102
return self.env.close()
......@@ -17,7 +17,6 @@ The following example shows how to wrap an environment for Stable-Baselines3:
from __future__ import annotations
import gym
import numpy as np
import torch
from typing import Any
......@@ -65,8 +64,8 @@ Vectorized environment wrapper.
"""
class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
"""Wraps around Isaac Orbit environment for Stable Baselines3.
class Sb3VecEnvWrapper(VecEnv):
"""Wraps around Orbit environment for Stable Baselines3.
Isaac Sim internally implements a vectorized environment. However, since it is
still considered a single environment instance, Stable Baselines tries to wrap
......@@ -74,10 +73,15 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
is not inheriting from their :class:`VecEnv`. Thus, this class thinly wraps
over the environment from :class:`RLTaskEnv`.
Note:
While Stable-Baselines3 supports Gym 0.26+ API, their vectorized environment
still uses the old API (i.e. it is closer to Gym 0.21). Thus, we implement
the old API for the vectorized environment.
We also add monitoring functionality that computes the un-discounted episode
return and length. This information is added to the info dicts under key `episode`.
In contrast to Isaac Orbit environment, stable-baselines expect the following:
In contrast to the Orbit environment, Stable-Baselines3 expects the following:
1. numpy datatype for MDP signals
2. a list of info dicts for each sub-environment (instead of a dict)
......@@ -85,16 +89,24 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
to the one after reset. The "real" final observation is passed using the info dicts
under the key ``terminal_observation``.
Warning:
.. warning::
By the nature of physics stepping in Isaac Sim, it is not possible to forward the
simulation buffers without performing a physics step. Thus, reset is performed only
at the start of :meth:`step()` function before the actual physics step is taken.
Thus, the returned observations for terminated environments is still the final
observation and not the ones after the reset.
simulation buffers without performing a physics step. Thus, reset is performed
inside the :meth:`step()` function after the actual physics step is taken.
Thus, the observations returned for terminated environments are the ones after the reset.
.. caution::
This class must be the last wrapper in the wrapper chain. This is because the wrapper does not follow
the :class:`gym.Wrapper` interface. Any subsequent wrappers will need to be modified to work with this
wrapper.
Reference:
https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html
https://stable-baselines3.readthedocs.io/en/master/common/monitor.html
1. https://stable-baselines3.readthedocs.io/en/master/guide/vec_envs.html
2. https://stable-baselines3.readthedocs.io/en/master/common/monitor.html
"""
def __init__(self, env: RLTaskEnv):
......@@ -110,12 +122,43 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
if not isinstance(env.unwrapped, RLTaskEnv):
raise ValueError(f"The environment must be inherited from RLTaskEnv. Environment type: {type(env)}")
# initialize the wrapper
gym.Wrapper.__init__(self, env)
self.env = env
# collect common information
self.num_envs = self.unwrapped.num_envs
self.sim_device = self.unwrapped.device
self.render_mode = self.unwrapped.render_mode
# initialize vec-env
VecEnv.__init__(self, self.env.num_envs, self.env.observation_space, self.env.action_space)
observation_space = self.unwrapped.single_observation_space["policy"]
action_space = self.unwrapped.single_action_space
VecEnv.__init__(self, self.num_envs, observation_space, action_space)
# add buffer for logging episodic information
self._ep_rew_buf = torch.zeros(self.env.num_envs, dtype=torch.float, device=self.env.device)
self._ep_len_buf = torch.zeros(self.env.num_envs, dtype=torch.float, device=self.env.device)
self._ep_rew_buf = torch.zeros(self.num_envs, device=self.sim_device)
self._ep_len_buf = torch.zeros(self.num_envs, device=self.sim_device)
def __str__(self):
"""Returns the wrapper name and the :attr:`env` representation string."""
return f"<{type(self).__name__}{self.env}>"
def __repr__(self):
"""Returns the string representation of the wrapper."""
return str(self)
"""
Properties -- Gym.Wrapper
"""
@classmethod
def class_name(cls) -> str:
"""Returns the class name of the wrapper."""
return cls.__name__
@property
def unwrapped(self) -> RLTaskEnv:
"""Returns the base environment of the wrapper.
This will be the bare :class:`gymnasium.Env` environment, underneath all layers of wrappers.
"""
return self.env.unwrapped
"""
Properties
......@@ -133,31 +176,43 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
Operations - MDP
"""
def seed(self, seed: int | None = None) -> list[int | None]: # noqa: D102
return [self.unwrapped.seed(seed)] * self.unwrapped.num_envs
def reset(self) -> VecEnvObs: # noqa: D102
obs_dict = self.env.reset()
obs_dict, _ = self.env.reset()
# convert data types to numpy depending on backend
return self._process_obs(obs_dict)
def step(self, actions: np.ndarray) -> VecEnvStepReturn: # noqa: D102
def step_async(self, actions): # noqa: D102
# convert input to numpy array
actions = np.asarray(actions)
if not isinstance(actions, torch.Tensor):
actions = np.asarray(actions)
actions = torch.from_numpy(actions).to(device=self.sim_device, dtype=torch.float32)
else:
actions = actions.to(device=self.sim_device, dtype=torch.float32)
# convert to tensor
actions = torch.from_numpy(actions).to(device=self.env.device)
# record step information
obs_dict, rew, dones, extras = self.env.step(actions)
self._async_actions = actions
def step_wait(self) -> VecEnvStepReturn: # noqa: D102
# record step information
obs_dict, rew, terminated, truncated, extras = self.env.step(self._async_actions)
# update episode un-discounted return and length
self._ep_rew_buf += rew
self._ep_len_buf += 1
# compute reset ids
dones = terminated | truncated
reset_ids = (dones > 0).nonzero(as_tuple=False)
# convert data types to numpy depending on backend
# Note: RLTaskEnv uses torch backend (by default).
obs = self._process_obs(obs_dict)
rew = rew.cpu().numpy()
dones = dones.cpu().numpy()
rew = rew.detach().cpu().numpy()
terminated = terminated.detach().cpu().numpy()
truncated = truncated.detach().cpu().numpy()
dones = dones.detach().cpu().numpy()
# convert extra information to list of dicts
infos = self._process_extras(obs, dones, extras, reset_ids)
infos = self._process_extras(obs, terminated, truncated, extras, reset_ids)
# reset info for terminated environments
self._ep_rew_buf[reset_ids] = 0
......@@ -165,59 +220,70 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
return obs, rew, dones, infos
"""
Unused methods.
"""
def step_async(self, actions): # noqa: D102
self._async_actions = actions
def close(self): # noqa: D102
self.env.close()
def step_wait(self): # noqa: D102
return self.step(self._async_actions)
def get_attr(self, attr_name, indices): # noqa: D102
raise NotImplementedError
def get_attr(self, attr_name, indices=None): # noqa: D102
# resolve indices
if indices is None:
indices = slice(None)
num_indices = self.num_envs
else:
num_indices = len(indices)
# obtain attribute value
attr_val = getattr(self.env, attr_name)
# return the value
if not isinstance(attr_val, torch.Tensor):
return [attr_val] * num_indices
else:
return attr_val[indices].detach().cpu().numpy()
def set_attr(self, attr_name, value, indices=None): # noqa: D102
raise NotImplementedError
raise NotImplementedError("Setting attributes is not supported.")
def env_method(self, method_name: str, *method_args, indices=None, **method_kwargs): # noqa: D102
raise NotImplementedError
if method_name == "render":
# gymnasium does not support changing render mode at runtime
return self.env.render()
else:
# this isn't properly implemented but it is not necessary.
# mostly done for completeness.
env_method = getattr(self.env, method_name)
return env_method(*method_args, indices=indices, **method_kwargs)
def env_is_wrapped(self, wrapper_class, indices=None): # noqa: D102
raise NotImplementedError
raise NotImplementedError("Checking if environment is wrapped is not supported.")
def get_images(self): # noqa: D102
raise NotImplementedError
raise NotImplementedError("Getting images is not supported.")
"""
Helper functions.
"""
def _process_obs(self, obs_dict) -> np.ndarray:
def _process_obs(self, obs_dict: torch.Tensor | dict[str, torch.Tensor]) -> np.ndarray | dict[str, np.ndarray]:
"""Convert observations into NumPy data type."""
# Sb3 doesn't support asymmetric observation spaces, so we only use "policy"
obs = obs_dict["policy"]
# Note: RLTaskEnv uses torch backend (by default).
if self.env.sim.backend == "torch":
if isinstance(obs, dict):
for key, value in obs.items():
obs[key] = value.detach().cpu().numpy()
else:
obs = obs.detach().cpu().numpy()
elif self.env.sim.backend == "numpy":
pass
if isinstance(obs, dict):
for key, value in obs.items():
obs[key] = value.detach().cpu().numpy()
elif isinstance(obs, torch.Tensor):
obs = obs.detach().cpu().numpy()
else:
raise NotImplementedError(f"Unsupported backend for simulation: {self.env.sim.backend}")
raise NotImplementedError(f"Unsupported data type: {type(obs)}")
return obs
def _process_extras(self, obs, dones, extras, reset_ids) -> list[dict[str, Any]]:
def _process_extras(
self, obs: np.ndarray, terminated: np.ndarray, truncated: np.ndarray, extras: dict, reset_ids: np.ndarray
) -> list[dict[str, Any]]:
"""Convert miscellaneous information into dictionary for each sub-environment."""
# create empty list of dictionaries to fill
infos: list[dict[str, Any]] = [dict.fromkeys(extras.keys()) for _ in range(self.env.num_envs)]
infos: list[dict[str, Any]] = [dict.fromkeys(extras.keys()) for _ in range(self.num_envs)]
# fill-in information for each sub-environment
# Note: This loop becomes slow when the number of environments is large.
for idx in range(self.env.num_envs):
for idx in range(self.num_envs):
# fill-in episode monitoring info
if idx in reset_ids:
infos[idx]["episode"] = dict()
......@@ -225,14 +291,13 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
infos[idx]["episode"]["l"] = float(self._ep_len_buf[idx])
else:
infos[idx]["episode"] = None
# fill-in bootstrap information
infos[idx]["TimeLimit.truncated"] = truncated[idx] and not terminated[idx]
# fill-in information from extras
for key, value in extras.items():
# 1. remap the key for time-outs for what SB3 expects
# 2. remap extra episodes information safely
# 3. for others just store their values
if key == "time_outs":
infos[idx]["TimeLimit.truncated"] = bool(value[idx])
elif key == "episode":
# 1. remap extra episodes information safely
# 2. for others just store their values
if key == "log":
# only log this data for episodes that are terminated
if infos[idx]["episode"] is not None:
for sub_key, sub_value in value.items():
......@@ -240,7 +305,7 @@ class Sb3VecEnvWrapper(gym.Wrapper, VecEnv):
else:
infos[idx][key] = value[idx]
# add information about terminal observation separately
if dones[idx] == 1:
if idx in reset_ids:
# extract terminal observations
if isinstance(obs, dict):
terminal_obs = dict.fromkeys(obs.keys())
......
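Note on the SB3 wrapper change above: Gymnasium's `step()` returns separate `terminated` and `truncated` flags, whereas the wrapper hands SB3 a single `dones` array plus a `TimeLimit.truncated` entry in each info dictionary. A minimal sketch of that mapping, using plain Python lists instead of tensors (the function name `to_sb3_done_signals` is hypothetical and not part of the wrapper):

```python
def to_sb3_done_signals(terminated, truncated):
    """Merge Gymnasium-style flags into SB3-style ``dones`` and per-env infos.

    Hypothetical helper for illustration; the wrapper does this on tensors/arrays.
    """
    # an environment is done if it either terminated or was truncated
    dones = [bool(term or trunc) for term, trunc in zip(terminated, truncated)]
    # flag only pure time-outs, mirroring ``truncated[idx] and not terminated[idx]``
    infos = [
        {"TimeLimit.truncated": bool(trunc and not term)}
        for term, trunc in zip(terminated, truncated)
    ]
    return dones, infos


dones, infos = to_sb3_done_signals(
    terminated=[False, True, False, True],
    truncated=[False, False, True, True],
)
```

In this sketch an environment that both terminated and timed out in the same step is reported as done but not as a time-limit truncation, which matches the expression used in `_process_extras`.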
......@@ -93,9 +93,9 @@ Vectorized environment wrapper.
def SkrlVecEnvWrapper(env: RLTaskEnv):
"""Wraps around Isaac Orbit environment for skrl.
"""Wraps around Orbit environment for skrl.
This function wraps around the Isaac Orbit environment. Since the :class:`RLTaskEnv` environment
This function wraps around the Orbit environment. Since the :class:`RLTaskEnv` environment
wrapping functionality is defined within the skrl library itself, this implementation
is maintained for compatibility with the structure of the extension that contains it.
Internally it calls the :func:`wrap_env` from the skrl library API.
......
......@@ -22,18 +22,21 @@ INSTALL_REQUIRES = [
"numpy",
"torch",
"torchvision>=0.14.1", # ensure compatibility with torch 1.13.1
"protobuf==3.20.2",
"protobuf>=3.20.2",
# data collection
"h5py",
# basic logger
"tensorboard",
# video recording
"moviepy",
]
# Extra dependencies for RL agents
EXTRAS_REQUIRE = {
"sb3": ["stable-baselines3>=1.5,<=1.8", "tensorboard"],
"sb3": ["stable-baselines3>=2.0"],
"skrl": ["skrl>=0.10.0"],
"rl_games": ["rl-games==1.5.2"],
# TODO: Uncomment when rsl_rl is updated to public.
# "rsl_rl": ["rsl_rl@git+https://github.com/leggedrobotics/rsl_rl.git"],
"rl_games": ["rl-games==1.6.1"],
"rsl_rl": ["rsl_rl@git+https://github.com/leggedrobotics/rsl_rl.git"],
"robomimic": ["robomimic@git+https://github.com/ARISE-Initiative/robomimic.git"],
}
# cumulation of all extra-requires
......@@ -43,7 +46,7 @@ EXTRAS_REQUIRE["all"] = list(itertools.chain.from_iterable(EXTRAS_REQUIRE.values
# Installation operation
setup(
name="omni-isaac-orbit_tasks",
author="NVIDIA, ETH Zurich, and University of Toronto",
author="ORBIT Project Developers",
maintainer="Mayank Mittal",
maintainer_email="mittalma@ethz.ch",
url=EXTENSION_TOML_DATA["package"]["repository"],
......@@ -55,6 +58,10 @@ setup(
install_requires=INSTALL_REQUIRES,
extras_require=EXTRAS_REQUIRE,
packages=["omni.isaac.orbit_tasks"],
classifiers=["Natural Language :: English", "Programming Language :: Python :: 3.7"],
classifiers=[
"Natural Language :: English",
"Programming Language :: Python :: 3.10",
"Isaac Sim :: 2023.1.0-hotfix.1",
],
zip_safe=False,
)
......@@ -20,8 +20,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gym.envs
import gymnasium as gym
import torch
import traceback
import unittest
......@@ -42,7 +41,7 @@ class TestEnvironments(unittest.TestCase):
def setUpClass(cls):
# acquire all Isaac environments names
cls.registered_tasks = list()
for task_spec in gym.envs.registry.all():
for task_spec in gym.registry.values():
if "Isaac" in task_spec.id:
cls.registered_tasks.append(task_spec.id)
# sort environments by name
......@@ -70,19 +69,20 @@ class TestEnvironments(unittest.TestCase):
env: RLTaskEnv = gym.make(task_name, cfg=env_cfg)
# reset environment
obs = env.reset()
obs, _ = env.reset()
# check signal
self.assertTrue(self._check_valid_tensor(obs))
# simulate environment for 1000 steps
for _ in range(1000):
# sample actions from -1 to 1
actions = 2 * torch.rand((env.num_envs, env.action_space.shape[0]), device=env.device) - 1
# apply actions
transition = env.step(actions)
# check signals
for data in transition:
self.assertTrue(self._check_valid_tensor(data), msg=f"Invalid data: {data}")
with torch.inference_mode():
for _ in range(1000):
# sample actions from -1 to 1
actions = 2 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1
# apply actions
transition = env.step(actions)
# check signals
for data in transition:
self.assertTrue(self._check_valid_tensor(data), msg=f"Invalid data: {data}")
# close the environment
print(f">>> Closing environment: {task_name}")
......@@ -108,9 +108,9 @@ class TestEnvironments(unittest.TestCase):
valid_tensor = True
for value in data.values():
if isinstance(value, dict):
return TestEnvironments._check_valid_tensor(value)
valid_tensor &= TestEnvironments._check_valid_tensor(value)
elif isinstance(value, torch.Tensor):
valid_tensor = valid_tensor and not torch.any(torch.isnan(value))
valid_tensor &= not torch.any(torch.isnan(value))
return valid_tensor
else:
raise ValueError(f"Input data of invalid type: {type(data)}.")
......
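Note on the `_check_valid_tensor` change above: with the early `return`, the check stopped at the first nested dictionary and ignored every entry after it. Accumulating with `&=` keeps checking the remaining entries. A minimal sketch of the pattern, using floats and `math.isnan` in place of tensors (the names `check_valid` and `sample` are illustrative only):

```python
import math


def check_valid(data):
    """Return True if no float in a (possibly nested) dict is NaN."""
    if isinstance(data, float):
        return not math.isnan(data)
    if isinstance(data, dict):
        valid = True
        for value in data.values():
            # accumulate instead of returning, so later entries are still checked
            valid &= check_valid(value)
        return valid
    raise ValueError(f"Input data of invalid type: {type(data)}.")


# the NaN sits *after* a nested dictionary, which an early return would never reach
sample = {"policy": {"joint_pos": 0.1}, "critic": float("nan")}
print(check_valid(sample))
```

With an early return on the nested `"policy"` dictionary, the NaN under `"critic"` would go unnoticed.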
......@@ -19,7 +19,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import os
import torch
import traceback
......@@ -42,7 +42,7 @@ class TestRecordVideoWrapper(unittest.TestCase):
def setUpClass(cls):
# acquire all Isaac environments names
cls.registered_tasks = list()
for task_spec in gym.envs.registry.all():
for task_spec in gym.registry.values():
if "Isaac" in task_spec.id:
cls.registered_tasks.append(task_spec.id)
# sort environments by name
......@@ -73,25 +73,24 @@ class TestRecordVideoWrapper(unittest.TestCase):
env_cfg.sim.shutdown_app_on_stop = False
# create environment
env: RLTaskEnv = gym.make(task_name, cfg=env_cfg)
env: RLTaskEnv = gym.make(task_name, cfg=env_cfg, render_mode="rgb_array")
# directory to save videos
videos_dir = os.path.join(self.videos_dir, task_name)
# wrap environment to record videos
env = gym.wrappers.RecordVideo(
env, videos_dir, step_trigger=self.step_trigger, video_length=self.video_length
env, videos_dir, step_trigger=self.step_trigger, video_length=self.video_length, disable_logger=True
)
# reset environment
env.reset()
# simulate environment
for _ in range(500):
# compute zero actions
actions = 2 * torch.rand((env.num_envs, env.action_space.shape[0]), device=env.device) - 1
# apply actions
_ = env.step(actions)
# render environment
env.render(mode="human")
with torch.inference_mode():
for _ in range(500):
# sample actions from -1 to 1
actions = 2 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1
# apply actions
_ = env.step(actions)
# close the simulator
env.close()
......
# Copyright (c) 2022-2023, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations
"""Launch Isaac Sim Simulator first."""
import os
from omni.isaac.orbit.app import AppLauncher
# launch the simulator
app_experience = f"{os.environ['EXP_PATH']}/omni.isaac.sim.python.gym.headless.kit"
app_launcher = AppLauncher(headless=True, experience=app_experience)
simulation_app = app_launcher.app
"""Rest everything follows."""
import gymnasium as gym
import torch
import traceback
import unittest
import carb
import omni.usd
from omni.isaac.orbit.envs import RLTaskEnvCfg
import omni.isaac.orbit_tasks # noqa: F401
from omni.isaac.orbit_tasks.utils.parse_cfg import parse_env_cfg
from omni.isaac.orbit_tasks.utils.wrappers.rl_games import RlGamesVecEnvWrapper
class TestRlGamesVecEnvWrapper(unittest.TestCase):
"""Test that RL-Games VecEnv wrapper works as expected."""
@classmethod
def setUpClass(cls):
# acquire all Isaac environments names
cls.registered_tasks = list()
for task_spec in gym.registry.values():
if "Isaac" in task_spec.id:
cls.registered_tasks.append(task_spec.id)
# sort environments by name
cls.registered_tasks.sort()
# only pick the first three environments to test
cls.registered_tasks = cls.registered_tasks[:3]
# print all existing task names
print(">>> All registered environments:", cls.registered_tasks)
def setUp(self) -> None:
# common parameters
self.num_envs = 512
self.use_gpu = True
def test_random_actions(self):
"""Run random actions and check environments return valid signals."""
for task_name in self.registered_tasks:
print(f">>> Running test for environment: {task_name}")
# create a new stage
omni.usd.get_context().new_stage()
# parse configuration
env_cfg: RLTaskEnvCfg = parse_env_cfg(task_name, use_gpu=self.use_gpu, num_envs=self.num_envs)
# note: we don't want to shutdown the app on stop during the tests since we reload the stage
env_cfg.sim.shutdown_app_on_stop = False
# create environment
env = gym.make(task_name, cfg=env_cfg)
# wrap environment
env = RlGamesVecEnvWrapper(env, "cuda:0", 100, 100)
# reset environment
obs = env.reset()
# check signal
self.assertTrue(self._check_valid_tensor(obs))
# simulate environment for 100 steps
with torch.inference_mode():
for _ in range(100):
# sample actions from -1 to 1
actions = 2 * torch.rand(env.action_space.shape, device=env.device) - 1
# apply actions
transition = env.step(actions)
# check signals
for data in transition:
self.assertTrue(self._check_valid_tensor(data), msg=f"Invalid data: {data}")
# close the environment
print(f">>> Closing environment: {task_name}")
env.close()
"""
Helper functions.
"""
@staticmethod
def _check_valid_tensor(data: torch.Tensor | dict) -> bool:
"""Checks if given data does not have corrupted values.
Args:
data: Data buffer.
Returns:
True if the data is valid.
"""
if isinstance(data, torch.Tensor):
return not torch.any(torch.isnan(data))
elif isinstance(data, dict):
valid_tensor = True
for value in data.values():
if isinstance(value, dict):
valid_tensor &= TestRlGamesVecEnvWrapper._check_valid_tensor(value)
elif isinstance(value, torch.Tensor):
valid_tensor &= not torch.any(torch.isnan(value))
return valid_tensor
else:
raise ValueError(f"Input data of invalid type: {type(data)}.")
if __name__ == "__main__":
try:
unittest.main()
except Exception as err:
carb.log_error(err)
carb.log_error(traceback.format_exc())
raise
finally:
# close sim app
simulation_app.close()
# Copyright (c) 2022-2023, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations
"""Launch Isaac Sim Simulator first."""
import os
from omni.isaac.orbit.app import AppLauncher
# launch the simulator
app_experience = f"{os.environ['EXP_PATH']}/omni.isaac.sim.python.gym.headless.kit"
app_launcher = AppLauncher(headless=True, experience=app_experience)
simulation_app = app_launcher.app
"""Rest everything follows."""
import gymnasium as gym
import torch
import traceback
import unittest
import carb
import omni.usd
from omni.isaac.orbit.envs import RLTaskEnvCfg
import omni.isaac.orbit_tasks # noqa: F401
from omni.isaac.orbit_tasks.utils.parse_cfg import parse_env_cfg
from omni.isaac.orbit_tasks.utils.wrappers.rsl_rl import RslRlVecEnvWrapper
class TestRslRlVecEnvWrapper(unittest.TestCase):
"""Test that RSL-RL VecEnv wrapper works as expected."""
@classmethod
def setUpClass(cls):
# acquire all Isaac environments names
cls.registered_tasks = list()
for task_spec in gym.registry.values():
if "Isaac" in task_spec.id:
cls.registered_tasks.append(task_spec.id)
# sort environments by name
cls.registered_tasks.sort()
# only pick the first three environments to test
cls.registered_tasks = cls.registered_tasks[:3]
# print all existing task names
print(">>> All registered environments:", cls.registered_tasks)
def setUp(self) -> None:
# common parameters
self.num_envs = 512
self.use_gpu = True
def test_random_actions(self):
"""Run random actions and check environments return valid signals."""
for task_name in self.registered_tasks:
print(f">>> Running test for environment: {task_name}")
# create a new stage
omni.usd.get_context().new_stage()
# parse configuration
env_cfg: RLTaskEnvCfg = parse_env_cfg(task_name, use_gpu=self.use_gpu, num_envs=self.num_envs)
# note: we don't want to shutdown the app on stop during the tests since we reload the stage
env_cfg.sim.shutdown_app_on_stop = False
# create environment
env = gym.make(task_name, cfg=env_cfg)
# wrap environment
env = RslRlVecEnvWrapper(env)
# reset environment
obs, extras = env.reset()
# check signal
self.assertTrue(self._check_valid_tensor(obs))
self.assertTrue(self._check_valid_tensor(extras))
# simulate environment for 1000 steps
with torch.inference_mode():
for _ in range(1000):
# sample actions from -1 to 1
actions = 2 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1
# apply actions
transition = env.step(actions)
# check signals
for data in transition:
self.assertTrue(self._check_valid_tensor(data), msg=f"Invalid data: {data}")
# close the environment
print(f">>> Closing environment: {task_name}")
env.close()
"""
Helper functions.
"""
@staticmethod
def _check_valid_tensor(data: torch.Tensor | dict) -> bool:
"""Checks if given data does not have corrupted values.
Args:
data: Data buffer.
Returns:
True if the data is valid.
"""
if isinstance(data, torch.Tensor):
return not torch.any(torch.isnan(data))
elif isinstance(data, dict):
valid_tensor = True
for value in data.values():
if isinstance(value, dict):
valid_tensor &= TestRslRlVecEnvWrapper._check_valid_tensor(value)
elif isinstance(value, torch.Tensor):
valid_tensor &= not torch.any(torch.isnan(value))
return valid_tensor
else:
raise ValueError(f"Input data of invalid type: {type(data)}.")
if __name__ == "__main__":
try:
unittest.main()
except Exception as err:
carb.log_error(err)
carb.log_error(traceback.format_exc())
raise
finally:
# close sim app
simulation_app.close()
# Copyright (c) 2022-2023, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations
"""Launch Isaac Sim Simulator first."""
import os
from omni.isaac.orbit.app import AppLauncher
# launch the simulator
app_experience = f"{os.environ['EXP_PATH']}/omni.isaac.sim.python.gym.headless.kit"
app_launcher = AppLauncher(headless=True, experience=app_experience)
simulation_app = app_launcher.app
"""Rest everything follows."""
import gymnasium as gym
import numpy as np
import torch
import traceback
import unittest
import carb
import omni.usd
from omni.isaac.orbit.envs import RLTaskEnvCfg
import omni.isaac.orbit_tasks # noqa: F401
from omni.isaac.orbit_tasks.utils.parse_cfg import parse_env_cfg
from omni.isaac.orbit_tasks.utils.wrappers.sb3 import Sb3VecEnvWrapper
class TestStableBaselines3VecEnvWrapper(unittest.TestCase):
"""Test that Stable-Baselines3 VecEnv wrapper works as expected."""
@classmethod
def setUpClass(cls):
# acquire all Isaac environments names
cls.registered_tasks = list()
for task_spec in gym.registry.values():
if "Isaac" in task_spec.id:
cls.registered_tasks.append(task_spec.id)
# sort environments by name
cls.registered_tasks.sort()
# only pick the first three environments to test
cls.registered_tasks = cls.registered_tasks[:3]
# print all existing task names
print(">>> All registered environments:", cls.registered_tasks)
def setUp(self) -> None:
# common parameters
self.num_envs = 512
self.use_gpu = True
def test_random_actions(self):
"""Run random actions and check environments return valid signals."""
for task_name in self.registered_tasks:
print(f">>> Running test for environment: {task_name}")
# create a new stage
omni.usd.get_context().new_stage()
# parse configuration
env_cfg: RLTaskEnvCfg = parse_env_cfg(task_name, use_gpu=self.use_gpu, num_envs=self.num_envs)
# note: we don't want to shutdown the app on stop during the tests since we reload the stage
env_cfg.sim.shutdown_app_on_stop = False
# create environment
env = gym.make(task_name, cfg=env_cfg)
# wrap environment
env = Sb3VecEnvWrapper(env)
# reset environment
obs = env.reset()
# check signal
self.assertTrue(self._check_valid_array(obs))
# simulate environment for 1000 steps
with torch.inference_mode():
for _ in range(1000):
# sample actions from -1 to 1
actions = 2 * np.random.rand(env.num_envs, *env.action_space.shape) - 1
# apply actions
transition = env.step(actions)
# check signals
for data in transition:
self.assertTrue(self._check_valid_array(data), msg=f"Invalid data: {data}")
# close the environment
print(f">>> Closing environment: {task_name}")
env.close()
"""
Helper functions.
"""
@staticmethod
def _check_valid_array(data: np.ndarray | dict | list) -> bool:
"""Checks if given data does not have corrupted values.
Args:
data: Data buffer.
Returns:
True if the data is valid.
"""
if isinstance(data, np.ndarray):
return not np.any(np.isnan(data))
elif isinstance(data, dict):
valid_array = True
for value in data.values():
if isinstance(value, dict):
valid_array &= TestStableBaselines3VecEnvWrapper._check_valid_array(value)
elif isinstance(value, np.ndarray):
valid_array &= not np.any(np.isnan(value))
return valid_array
elif isinstance(data, list):
valid_array = True
for value in data:
valid_array &= TestStableBaselines3VecEnvWrapper._check_valid_array(value)
return valid_array
else:
raise ValueError(f"Input data of invalid type: {type(data)}.")
if __name__ == "__main__":
try:
unittest.main()
except Exception as err:
carb.log_error(err)
carb.log_error(traceback.format_exc())
raise
finally:
# close sim app
simulation_app.close()
......@@ -27,7 +27,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
from prettytable import PrettyTable
import omni.isaac.contrib_tasks # noqa: F401
......@@ -47,10 +47,10 @@ def main():
# count of environments
index = 0
# acquire all Isaac environments names
for task_spec in gym.envs.registry.all():
for task_spec in gym.registry.values():
if "Isaac" in task_spec.id:
# add details to table
table.add_row([index + 1, task_spec.id, task_spec.entry_point, task_spec._kwargs["env_cfg_entry_point"]])
table.add_row([index + 1, task_spec.id, task_spec.entry_point, task_spec.kwargs["env_cfg_entry_point"]])
# increment count
index += 1
......@@ -61,6 +61,8 @@ if __name__ == "__main__":
try:
# run the main function
main()
except Exception as e:
raise e
finally:
# close the app
simulation_app.close()
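Note on the registry change above: in Gymnasium the environment registry is exposed as a plain dictionary that maps environment ids to their specs, which is why the loop changes from `gym.envs.registry.all()` to `gym.registry.values()` and from `_kwargs` to `kwargs`. The filter used in these scripts can be sketched without the simulator by using a stand-in registry (the `FakeSpec` class and the task ids below are made up for illustration):

```python
from dataclasses import dataclass, field


@dataclass
class FakeSpec:
    """Stand-in for a Gymnasium environment spec (illustration only)."""

    id: str
    entry_point: str
    kwargs: dict = field(default_factory=dict)


# stand-in for ``gym.registry``: a dict keyed by environment id
registry = {
    spec.id: spec
    for spec in [
        FakeSpec("Isaac-Example-B-v0", "pkg:EnvB", {"env_cfg_entry_point": "pkg:EnvBCfg"}),
        FakeSpec("CartPole-v1", "other:CartPole"),
        FakeSpec("Isaac-Example-A-v0", "pkg:EnvA", {"env_cfg_entry_point": "pkg:EnvACfg"}),
    ]
}

# same filter-and-sort logic as in the scripts: keep ids containing "Isaac"
isaac_tasks = sorted(spec.id for spec in registry.values() if "Isaac" in spec.id)
print(isaac_tasks)
```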
......@@ -15,7 +15,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="Random agent for Isaac Orbit environments.")
parser = argparse.ArgumentParser(description="Random agent for Orbit environments.")
parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
parser.add_argument("--task", type=str, default=None, help="Name of the task.")
......@@ -31,7 +31,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import torch
import traceback
......@@ -43,12 +43,15 @@ from omni.isaac.orbit_tasks.utils import parse_env_cfg
def main():
"""Random actions agent with Isaac Orbit environment."""
"""Random actions agent with Orbit environment."""
# parse configuration
env_cfg = parse_env_cfg(args_cli.task, use_gpu=not args_cli.cpu, num_envs=args_cli.num_envs)
# create environment
env = gym.make(args_cli.task, cfg=env_cfg)
# print info (this is a vectorized environment)
print(f"[INFO]: Gym observation space: {env.observation_space}")
print(f"[INFO]: Gym action space: {env.action_space}")
# reset environment
env.reset()
# simulate environment
......@@ -56,9 +59,9 @@ def main():
# run everything in inference mode
with torch.inference_mode():
# sample actions from -1 to 1
actions = 2 * torch.rand((env.num_envs, env.action_space.shape[0]), device=env.device) - 1
actions = 2 * torch.rand(env.action_space.shape, device=env.unwrapped.device) - 1
# apply actions
_, _, _, _ = env.step(actions)
env.step(actions)
# close the simulator
env.close()
......
......@@ -36,7 +36,7 @@ simulation_app = app_launcher.app
"""Rest everything else."""
import gym
import gymnasium as gym
import torch
import traceback
from enum import Enum
......
......@@ -15,7 +15,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="Keyboard teleoperation for Isaac Orbit environments.")
parser = argparse.ArgumentParser(description="Keyboard teleoperation for Orbit environments.")
parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to simulate.")
parser.add_argument("--device", type=str, default="keyboard", help="Device for interacting with environment")
......@@ -33,7 +33,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import torch
import traceback
......
......@@ -15,7 +15,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="Zero agent for Isaac Orbit environments.")
parser = argparse.ArgumentParser(description="Zero agent for Orbit environments.")
parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
parser.add_argument("--task", type=str, default=None, help="Name of the task.")
......@@ -30,7 +30,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import torch
import traceback
......@@ -42,12 +42,15 @@ from omni.isaac.orbit_tasks.utils import parse_env_cfg
def main():
"""Zero actions agent with Isaac Orbit environment."""
"""Zero actions agent with Orbit environment."""
# parse configuration
env_cfg = parse_env_cfg(args_cli.task, use_gpu=not args_cli.cpu, num_envs=args_cli.num_envs)
# create environment
env = gym.make(args_cli.task, cfg=env_cfg)
# print info (this is a vectorized environment)
print(f"[INFO]: Gym observation space: {env.observation_space}")
print(f"[INFO]: Gym action space: {env.action_space}")
# reset environment
env.reset()
# simulate environment
......@@ -55,9 +58,9 @@ def main():
# run everything in inference mode
with torch.inference_mode():
# compute zero actions
actions = torch.zeros((env.num_envs, env.action_space.shape[0]), device=env.device)
actions = torch.zeros(env.action_space.shape, device=env.unwrapped.device)
# apply actions
_, _, _, _ = env.step(actions)
env.step(actions)
# close the simulator
env.close()
......
......@@ -37,7 +37,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import math
import os
import torch
......
......@@ -41,7 +41,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import math
import os
import traceback
......@@ -96,13 +96,14 @@ def main():
clip_actions = agent_cfg["params"]["env"].get("clip_actions", math.inf)
# create isaac environment
env = gym.make(args_cli.task, cfg=env_cfg)
env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
# wrap for video recording
if args_cli.video:
video_kwargs = {
"video_folder": os.path.join(log_dir, "videos"),
"step_trigger": lambda step: step % args_cli.video_interval == 0,
"video_length": args_cli.video_length,
"disable_logger": True,
}
print("[INFO] Recording videos during training.")
print_dict(video_kwargs, nesting=4)
......
......@@ -3,7 +3,7 @@
#
# SPDX-License-Identifier: BSD-3-Clause
"""Script to collect demonstrations with Isaac Orbit environments."""
"""Script to collect demonstrations with Orbit environments."""
from __future__ import annotations
......@@ -15,7 +15,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="Collect demonstrations for Isaac Orbit environments.")
parser = argparse.ArgumentParser(description="Collect demonstrations for Orbit environments.")
parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
parser.add_argument("--num_envs", type=int, default=1, help="Number of environments to simulate.")
parser.add_argument("--task", type=str, default=None, help="Name of the task.")
......@@ -35,7 +35,7 @@ simulation_app = app_launcher.app
import contextlib
import gym
import gymnasium as gym
import os
import torch
import traceback
......
......@@ -15,7 +15,7 @@ import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="Play policy trained using robomimic for Isaac Orbit environments.")
parser = argparse.ArgumentParser(description="Play policy trained using robomimic for Orbit environments.")
parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
parser.add_argument("--task", type=str, default=None, help="Name of the task.")
parser.add_argument("--checkpoint", type=str, default=None, help="Pytorch model checkpoint to load.")
......@@ -31,7 +31,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import torch
import traceback
......@@ -46,7 +46,7 @@ from omni.isaac.orbit_tasks.utils import parse_env_cfg
def main():
"""Run a trained policy from robomimic with Isaac Orbit environment."""
"""Run a trained policy from robomimic with Orbit environment."""
# parse configuration
env_cfg = parse_env_cfg(args_cli.task, use_gpu=not args_cli.cpu, num_envs=1)
# modify configuration
......
......@@ -54,7 +54,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import argparse
import gym
import gymnasium as gym
import json
import numpy as np
import os
......
......@@ -36,7 +36,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import os
import torch
import traceback
......
......@@ -47,7 +47,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import os
import torch
import traceback
......@@ -88,13 +88,14 @@ def main():
log_dir = os.path.join(log_root_path, log_dir)
# create isaac environment
env = gym.make(args_cli.task, cfg=env_cfg)
env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
# wrap for video recording
if args_cli.video:
video_kwargs = {
"video_folder": os.path.join(log_dir, "videos"),
"step_trigger": lambda step: step % args_cli.video_interval == 0,
"video_length": args_cli.video_length,
"disable_logger": True,
}
print("[INFO] Recording videos during training.")
print_dict(video_kwargs, nesting=4)
......
......@@ -33,7 +33,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import torch
import traceback
......
......@@ -43,7 +43,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import os
import traceback
from datetime import datetime
......@@ -95,6 +95,7 @@ def main():
"video_folder": os.path.join(log_dir, "videos"),
"step_trigger": lambda step: step % args_cli.video_interval == 0,
"video_length": args_cli.video_length,
"disable_logger": True,
}
print("[INFO] Recording videos during training.")
print_dict(video_kwargs, nesting=4)
......
......@@ -38,7 +38,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import torch
import traceback
......
......@@ -48,7 +48,7 @@ simulation_app = app_launcher.app
"""Rest everything follows."""
import gym
import gymnasium as gym
import traceback
from datetime import datetime
......@@ -97,13 +97,14 @@ def main():
dump_pickle(os.path.join(log_dir, "params", "agent.pkl"), experiment_cfg)
# create isaac environment
env = gym.make(args_cli.task, cfg=env_cfg)
env = gym.make(args_cli.task, cfg=env_cfg, render_mode="rgb_array" if args_cli.video else None)
# wrap for video recording
if args_cli.video:
video_kwargs = {
"video_folder": os.path.join(log_dir, "videos"),
"step_trigger": lambda step: step % args_cli.video_interval == 0,
"video_length": args_cli.video_length,
"disable_logger": True,
}
print("[INFO] Recording videos during training.")
print_dict(video_kwargs, nesting=4)
......
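A note on the recurring `video_kwargs` change in the hunks above: the dictionary is presumably forwarded to Gymnasium's `RecordVideo` wrapper, which takes its frames from `env.render()`. That would explain why `gym.make` now receives `render_mode="rgb_array"` whenever `--video` is set. The sketch below is not part of the commit. It only illustrates how the `step_trigger` lambda selects the steps at which a recording starts. The values of `video_interval`, `video_length`, and `log_dir` are hypothetical stand-ins for the `args_cli` fields, and `recording_starts` is a helper invented for illustration.

```python
import os

# Hypothetical values; the real scripts read these from argparse (args_cli).
video_interval = 2000  # assumed: request a new recording every 2000 steps
video_length = 200     # assumed: each clip covers 200 steps
log_dir = os.path.join("logs", "example_run")  # assumed path, illustration only

# Same keys as the dictionary built in the training scripts above.
video_kwargs = {
    "video_folder": os.path.join(log_dir, "videos"),
    "step_trigger": lambda step: step % video_interval == 0,
    "video_length": video_length,
    "disable_logger": True,
}


def recording_starts(trigger, num_steps):
    """Return the step indices at which the trigger requests a new recording.

    Illustrative helper only; it is not part of Orbit or Gymnasium.
    """
    return [step for step in range(num_steps) if trigger(step)]


starts = recording_starts(video_kwargs["step_trigger"], 5000)
print(starts)  # [0, 2000, 4000]
```

With these assumed values, a clip would start at steps 0, 2000, and 4000, each lasting `video_length` steps. Keeping `video_length` below `video_interval` avoids requesting a new clip while the previous one is still being recorded.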