Commit d494e00e authored by Kelly Guo's avatar Kelly Guo Committed by Kelly Guo

Fixes extension renaming and adds documentation for RL training (#207)

# Description

This change fixes some lingering extension renaming for the deprecated
omni.isaac.ui extension.
Additionally, new documentation for RL training is added to provide a
guideline and some troubleshooting tips for RL training with Isaac Lab.
parent ee91a42d
......@@ -12,3 +12,4 @@ learning frameworks.
rl_existing_scripts
rl_frameworks
performance_benchmarks
training_guide
Debugging and Training Guide
============================
In this tutorial, we'll guide developers working with Isaac Lab to understand the
impact of various parameters on training time, GPU utilization, and memory usage.
This is especially helpful for addressing Out of Memory (OOM) errors that commonly
occur during reinforcement learning (RL) training. We will touch on common errors seen
during RL training in Isaac Lab and provide some guidance on troubleshooting steps.
Training with Parallel Environments
-----------------------------------
The key RL paradigm of Isaac Lab is to train with many environments in parallel.
Here, we define an environment as an instance of a robot or multiple robots interacting with other robots or objects in simulation.
By creating multiple environments in parallel, we generate multiple copies of the environment such that the robots in each environment can explore the world independently of other environments.
The number of environments thus becomes an important hyperparameter for training.
In general, the more environments we have running in parallel,
the more data we can collect during rollout, which in turn, provides more data
for RL training and allows for faster training since the RL agent can learn from parallel experiences.
However, the number of environments can also be bounded by other factors.
Memory can often be a hard constraint on the number of environments we can run in parallel.
When more environments are added to the world, the simulation also requires more memory to represent and simulate each object in the scene.
The number of environments we can simulate in parallel thus often depend on the amount of memory resources available on the machine.
In addition, different forms of simulation can also consume various amounts of memory.
For example, objects with high fidelity visual and collision meshes will consume more memory than simple primitive shapes.
Deformable simulation will also likely require more memory to simulate than rigid bodies.
Training with rendering often consumes much higher memory than running with only physics simulation. This is especially true when rendering at relatively large resolutions. Additionally, when training RL policies with image observations, we often also require more memory to hold the rollout trajectories of image buffers and larger networks for the policies. Both of these components will also impact the amount of memory available for the simulation.
To reduce memory consumption, one method is to simplify collision meshes of the assets where possible to keep only bare minimum collision shapes required for correct simulation of contacts.
Additionally, we recommend only running with the viewport when debugging with a small number of environments.
When training with larger number of environments in parallel, it is recommended to run in headless mode to avoid any rendering overhead.
If the RL pipeline requires rendering in the loop, make sure to reduce the number of environments, taking into consideration for the dimensions of the image buffers and the size of the policy networks. When hitting out of memory errors, the simplest solution may be to reduce the number of environments.
Hyperparameter Tuning
---------------------
Although in many cases, simulating more environments in parallel can yield faster training and better results, there are also cases where diminishing returns are observed when the number of environments reaches certain thresholds.
This threshold will vary depending on the complexity of the environment, task, policy setup, and RL algorithm.
When more environments are simulated in parallel, each simulation step requires more time to simulate, which will impact the overall training time.
When the number of environments is small, this increase in per-step simulation time is often insignificant compared to the increase in training performance from more experiences collected.
However, when the number of environments reaches a point, the benefits from having even more experiences for the RL algorithm may start to saturate, and the amount of increased simulation time can outweigh the benefits in training performance.
In contrast to diminishing returns on number of environments that are too large, training with low number of environments can also be challenging.
This is often due to the RL policies not getting enough experiences to learn from.
To address this issue, it may be helpful to increase the batch size or the horizon length to accommodate for the smaller amount of data collected from lower number of parallel environments.
When the number of environments is constrained by available resources, running with parallel GPUs or training across multiple nodes can also help alleviate issues due to limited rollouts.
Debugging NaNs during Training
------------------------------
One common error seen during RL training is the appearance of NaNs in the observation buffers, which often get propagated into the policy networks and cause crashes in the downstream training pipeline.
In most cases, the appearance of NaNs occur when the simulation becomes unstable.
This could be due to drastic actions being applied to the robots that exceed the limits of the simulation, or resets of the assets into invalid states.
Some helpful tips to reduce the occurrence of NaNs include proper tuning of the physics parameters for the assets to ensure that joint, velocity, and force limits are within reasonable ranges and the gains are correctly tuned for the robot.
It is also a good idea to check that actions applied to the robots are reasonable and will not impose large forces or impulses on the objects.
Reducing the timestep of the physics simulation can also help improve accuracy and stability of the simulation, in addition to increasing the solver iterations.
Understanding Training Outputs
------------------------------
Each RL library produces its own output data during training.
Some libraries are more verbose and generate logs that contain more detailed information on the training process, while others are more compact.
In this section, we will explain the common outputs from the RL libraries.
RL-Games
^^^^^^^^
For each iteration, RL-Games prints statistics of the data collection, inference, and training performance.
.. code:: bash
fps step: 112918 fps step and policy inference: 104337 fps total: 78179 epoch: 1/150 frames: 0
``fps step`` refers to the environment step FPS, which includes the applying actions, computing observations, rewards, dones, and resets, as well as stepping simulation.
``step and policy inference`` measure everything in ``fps step`` along with the time it takes for the policy inference to compute the actions.
``fps total`` measure the above and the time it takes for the training iteration.
At specified intervals, it will also log the current best reward and the path of the intermmediate checkpoints saved to file.
.. code:: bash
=> saving checkpoint 'IsaacLab/logs/rl_games/cartpole_direct/2024-12-28_20-23-06/nn/last_cartpole_direct_ep_150_rew_294.18793.pth'
saving next best rewards: [294.18793]
RSL RL
^^^^^^
For each iteration, RSL RL provides the following output:
.. code:: bash
Learning iteration 0/150
Computation: 50355 steps/s (collection: 1.106s, learning 0.195s)
Value function loss: 22.0539
Surrogate loss: -0.0086
Mean action noise std: 1.00
Mean reward: -5.49
Mean episode length: 15.79
--------------------------------------------------------------------------------
Total timesteps: 65536
Iteration time: 1.30s
Total time: 1.30s
ETA: 195.2s
This output encapsulates the total FPS for data collection, inference, and learning, along with the per-step breakdown for collection and learning time per step.
In addition, statistics for the training losses are provided, along with the current average reward and episode length.
In the bottom section, it logs the total number of steps completed so far, the total ieration time for the current ieration, the total overall training time, and the estimated training time to complete the full number of iterations.
SKRL
^^^^
SKRL provides a very simplistic output showing the training progress as a percentage of the total number of timesteps (divided by the number of environments). It also includes the total elapsed time so far and the estimated time to complete training.
.. code:: bash
0%| | 2/4800 [00:00<10:02, 7.96it/s]
Stable-Baselines3
^^^^^^^^^^^^^^^^^
Stable-Baselines3 provides a detailed output, outlining the rollout statistics, timing, and policy data.
.. code:: bash
------------------------------------------
| rollout/ | |
| ep_len_mean | 30.8 |
| ep_rew_mean | 2.87 |
| time/ | |
| fps | 8824 |
| iterations | 2 |
| time_elapsed | 14 |
| total_timesteps | 131072 |
| train/ | |
| approx_kl | 0.0079056695 |
| clip_fraction | 0.0842 |
| clip_range | 0.2 |
| entropy_loss | -1.42 |
| explained_variance | 0.0344 |
| learning_rate | 0.0003 |
| loss | 10.4 |
| n_updates | 20 |
| policy_gradient_loss | -0.0119 |
| std | 1 |
| value_loss | 17 |
------------------------------------------
Under the ``rollout/`` section, average episode length and reward are logged for the iteration. Under ``time/``, data for the total FPS, number of iterations, total time elapsed, and the total number of timesteps are provided. Finally, under ``train/``, statistics of the training parameters are logged, such as KL, losses, learning rates, and more.
......@@ -59,6 +59,9 @@ For SpaceMouse, these are as follows:
Move arm along z-axis: Push or pull the SpaceMouse
Rotate arm: Twist the SpaceMouse
The next section describes how teleoperation devices can be used for data collection for imitation learning.
Imitation Learning
~~~~~~~~~~~~~~~~~~
......
......@@ -67,7 +67,7 @@ compatibility issues with some Linux distributions. If you encounter any issues,
env_isaaclab\Scripts\activate
- Next, install a CUDA-enabled PyTorch 2.4.0 build based on the CUDA version available on your system. This step is optional for Linux, but required for Windows to ensure a CUDA-compatible version of PyTorch is installed.
- Next, install a CUDA-enabled PyTorch 2.5.1 build based on the CUDA version available on your system. This step is optional for Linux, but required for Windows to ensure a CUDA-compatible version of PyTorch is installed.
.. tab-set::
......@@ -75,13 +75,13 @@ compatibility issues with some Linux distributions. If you encounter any issues,
.. code-block:: bash
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu118
.. tab-item:: CUDA 12
.. code-block:: bash
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu121
- Before installing Isaac Sim, ensure the latest pip version is installed. To update pip, run
......@@ -117,7 +117,7 @@ Verifying the Isaac Sim installation
.. code:: bash
# experience files can be absolute path, or relative path searched in isaacsim/apps or omni/apps
isaacsim omni.isaac.sim.python.kit
isaacsim isaacsim.exp.full.kit
.. attention::
......
......@@ -45,15 +45,8 @@ from isaaclab_assets import H1_CFG # isort:skip
from isaaclab_assets import G1_CFG # isort:skip
def main():
"""Main function."""
# Load kit helper
sim_cfg = sim_utils.SimulationCfg(dt=0.005, device=args_cli.device)
sim = SimulationContext(sim_cfg)
# Set main camera
sim.set_camera_view(eye=[3.0, 0.0, 2.25], target=[0.0, 0.0, 1.0])
# Spawn things into stage
def design_scene(sim: sim_utils.SimulationContext) -> tuple[list, torch.Tensor]:
"""Designs the scene."""
# Ground-plane
cfg = sim_utils.GroundPlaneCfg()
cfg.func("/World/defaultGroundPlane", cfg)
......@@ -74,12 +67,11 @@ def main():
g1 = Articulation(G1_CFG.replace(prim_path="/World/G1"))
robots = [cassie, h1, g1]
# Play the simulator
sim.reset()
return robots, origins
# Now we are ready!
print("[INFO]: Setup complete...")
def run_simulator(sim: sim_utils.SimulationContext, robots: list[Articulation], origins: torch.Tensor):
"""Runs the simulation loop."""
# Define simulation stepping
sim_dt = sim.get_physics_dt()
sim_time = 0.0
......@@ -116,6 +108,27 @@ def main():
robot.update(sim_dt)
def main():
"""Main function."""
# Load kit helper
sim_cfg = sim_utils.SimulationCfg(dt=0.005, device=args_cli.device)
sim = SimulationContext(sim_cfg)
# Set main camera
sim.set_camera_view(eye=[3.0, 0.0, 2.25], target=[0.0, 0.0, 1.0])
# design scene
robots, origins = design_scene(sim)
# Play the simulator
sim.reset()
# Now we are ready!
print("[INFO]: Setup complete...")
# Run the simulator
run_simulator(sim, robots, origins)
if __name__ == "__main__":
# run the main function
main()
......
......@@ -8,7 +8,7 @@ Utility to convert a MJCF into USD format.
MuJoCo XML Format (MJCF) is an XML file format used in MuJoCo to describe all elements of a robot. For more information, see: http://www.mujoco.org/book/XMLreference.html
This script uses the MJCF importer extension from Isaac Sim (``omni.isaac.mjcf_importer``) to convert a MJCF asset into USD format. It is designed as a convenience script for command-line use. For more information on the MJCF importer, see the documentation for the extension:
This script uses the MJCF importer extension from Isaac Sim (``isaacsim.asset.importer.mjcf``) to convert a MJCF asset into USD format. It is designed as a convenience script for command-line use. For more information on the MJCF importer, see the documentation for the extension:
https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/ext_omni_isaac_mjcf.html
......
......@@ -9,7 +9,7 @@ Utility to convert a URDF into USD format.
Unified Robot Description Format (URDF) is an XML file format used in ROS to describe all elements of
a robot. For more information, see: http://wiki.ros.org/urdf
This script uses the URDF importer extension from Isaac Sim (``omni.isaac.urdf_importer``) to convert a
This script uses the URDF importer extension from Isaac Sim (``isaacsim.asset.importer.urdf``) to convert a
URDF asset into USD format. It is designed as a convenience script for command-line use. For more
information on the URDF importer, see the documentation for the extension:
https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/ext_omni_isaac_urdf.html
......
......@@ -14,10 +14,10 @@ Currently, the sub-package provides the following classes:
.. note::
For some simple use-cases, it may be sufficient to use the debug drawing utilities from Isaac Sim.
The debug drawing API is available in the `omni.isaac.debug_drawing`_ module. It allows drawing of
The debug drawing API is available in the `isaacsim.util.debug_drawing`_ module. It allows drawing of
points and splines efficiently on the UI.
.. _omni.isaac.debug_drawing: https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/ext_omni_isaac_debug_drawing.html
.. _isaacsim.util.debug_drawing: https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/ext_omni_isaac_debug_drawing.html
"""
......
......@@ -14,7 +14,7 @@ import omni.log
from .ui_widget_wrapper import UIWidgetWrapper
if TYPE_CHECKING:
import omni.isaac.ui
import isaacsim.gui.components
import omni.ui
......@@ -155,7 +155,9 @@ class ImagePlot(UIWidgetWrapper):
with omni.ui.HStack():
# Write the leftmost label for what this plot is
omni.ui.Label(
self._label, width=omni.isaac.ui.ui_utils.LABEL_WIDTH, alignment=omni.ui.Alignment.LEFT_TOP
self._label,
width=isaacsim.gui.components.ui_utils.LABEL_WIDTH,
alignment=omni.ui.Alignment.LEFT_TOP,
)
with omni.ui.Frame(width=self._aspect_ratio * self._widget_height, height=self._widget_height):
self._base_plot = omni.ui.ImageWithProvider(self._byte_provider)
......@@ -192,7 +194,7 @@ class ImagePlot(UIWidgetWrapper):
def _change_mode(value):
self._curr_mode = value
omni.isaac.ui.ui_utils.dropdown_builder(
isaacsim.gui.components.ui_utils.dropdown_builder(
label="Mode",
type="dropdown",
items=["Original", "Normalization", "Colorization"],
......
......@@ -15,7 +15,7 @@ from isaacsim.core.api.simulation_context import SimulationContext
from .ui_widget_wrapper import UIWidgetWrapper
if TYPE_CHECKING:
import omni.isaac.ui
import isaacsim.gui.components
import omni.ui
......@@ -397,7 +397,7 @@ class LiveLinePlot(UIWidgetWrapper):
max_legend = max([len(legend) for legend in self._legends])
CHAR_WIDTH = 8
with omni.ui.VGrid(
row_height=omni.isaac.ui.ui_utils.LABEL_HEIGHT,
row_height=isaacsim.gui.components.ui_utils.LABEL_HEIGHT,
column_width=max_legend * CHAR_WIDTH + 6,
):
for idx in range(len(self._y_data)):
......@@ -442,7 +442,7 @@ class LiveLinePlot(UIWidgetWrapper):
with omni.ui.HStack():
omni.ui.Label(
"Limits",
width=omni.isaac.ui.ui_utils.LABEL_WIDTH,
width=isaacsim.gui.components.ui_utils.LABEL_WIDTH,
alignment=omni.ui.Alignment.LEFT_CENTER,
)
......@@ -460,10 +460,10 @@ class LiveLinePlot(UIWidgetWrapper):
omni.ui.Button(
"Re-Scale",
width=omni.isaac.ui.ui_utils.BUTTON_WIDTH,
width=isaacsim.gui.components.ui_utils.BUTTON_WIDTH,
clicked_fn=self._rescale_btn_pressed,
alignment=omni.ui.Alignment.LEFT_CENTER,
style=omni.isaac.ui.ui_utils.get_style(),
style=isaacsim.gui.components.ui_utils.get_style(),
)
omni.ui.CheckBox(model=self._autoscale_model, tooltip="", width=4)
......@@ -498,7 +498,7 @@ class LiveLinePlot(UIWidgetWrapper):
self.clear()
self._filter_mode = value
omni.isaac.ui.ui_utils.dropdown_builder(
isaacsim.gui.components.ui_utils.dropdown_builder(
label="Filter",
type="dropdown",
items=["None", "Lowpass", "Integrate", "Derivative"],
......@@ -512,10 +512,10 @@ class LiveLinePlot(UIWidgetWrapper):
# Button
omni.ui.Button(
"Play/Pause",
width=omni.isaac.ui.ui_utils.BUTTON_WIDTH,
width=isaacsim.gui.components.ui_utils.BUTTON_WIDTH,
clicked_fn=_toggle_paused,
alignment=omni.ui.Alignment.LEFT_CENTER,
style=omni.isaac.ui.ui_utils.get_style(),
style=isaacsim.gui.components.ui_utils.get_style(),
)
def _create_ui_widget(self):
......
......@@ -3,7 +3,7 @@
#
# SPDX-License-Identifier: BSD-3-Clause
# This file has been adapted from _isaac_sim/exts/omni.isaac.ui/omni/isaac/ui/element_wrappers/base_ui_element_wrappers.py
# This file has been adapted from _isaac_sim/exts/isaacsim.gui.components/isaacsim/gui/components/element_wrappers/base_ui_element_wrappers.py
from __future__ import annotations
......
......@@ -32,7 +32,7 @@ from isaaclab.envs.ui import ManagerBasedRLEnvWindow
from isaaclab.scene import InteractiveSceneCfg
from isaaclab.utils import configclass
enable_extension("omni.isaac.ui")
enable_extension("isaacsim.gui.components")
@configclass
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment