Fixes extension renaming and adds documentation for RL training (#207)

# Description

This change fixes some lingering extension renaming for the deprecated omni.isaac.ui extension. Additionally, new documentation for RL training is added to provide a guideline and some troubleshooting tips for RL training with Isaac Lab.

Commit d494e00e authored Dec 31, 2024 by Kelly Guo Committed by Kelly Guo Jan 30, 2025
12 changed files
--- a/docs/source/overview/reinforcement-learning/index.rst
+++ b/docs/source/overview/reinforcement-learning/index.rst
@@ -12,3 +12,4 @@ learning frameworks.
  rl_existing_scripts
  rl_frameworks
  performance_benchmarks
+  training_guide
--- a/docs/source/overview/reinforcement-learning/training_guide.rst
+++ b/docs/source/overview/reinforcement-learning/training_guide.rst
+Debugging and Training Guide
+============================
+
+In this tutorial, we'll guide developers working with Isaac Lab to understand the
+impact of various parameters on training time, GPU utilization, and memory usage.
+This is especially helpful for addressing Out of Memory (OOM) errors that commonly
+occur during reinforcement learning (RL) training. We will touch on common errors seen
+during RL training in Isaac Lab and provide some guidance on troubleshooting steps.
+
+
+Training with Parallel Environments
+-----------------------------------
+
+The key RL paradigm of Isaac Lab is to train with many environments in parallel.
+Here, we define an environment as an instance of a robot or multiple robots interacting with other robots or objects in simulation.
+By creating multiple environments in parallel, we generate multiple copies of the environment such that the robots in each environment can explore the world independently of other environments.
+The number of environments thus becomes an important hyperparameter for training.
+In general, the more environments we have running in parallel,
+the more data we can collect during rollout, which, in turn, provides more data
+for RL training and allows for faster training since the RL agent can learn from parallel experiences.
+
+However, the number of environments can also be bounded by other factors.
+Memory can often be a hard constraint on the number of environments we can run in parallel.
+When more environments are added to the world, the simulation also requires more memory to represent and simulate each object in the scene.
+The number of environments we can simulate in parallel thus often depends on the amount of memory available on the machine.
+In addition, different forms of simulation can consume different amounts of memory.
+For example, objects with high fidelity visual and collision meshes will consume more memory than simple primitive shapes.
+Deformable simulation will also likely require more memory to simulate than rigid bodies.
+
+Training with rendering often consumes much more memory than running with only physics simulation. This is especially true when rendering at relatively large resolutions. Additionally, when training RL policies with image observations, we often require more memory to hold the rollout trajectories of image buffers and the larger networks for the policies. Both of these components reduce the amount of memory available for the simulation.
+
+To reduce memory consumption, one method is to simplify the collision meshes of the assets where possible, keeping only the bare minimum collision shapes required for correct simulation of contacts.
+Additionally, we recommend only running with the viewport when debugging with a small number of environments.
+When training with a larger number of environments in parallel, it is recommended to run in headless mode to avoid any rendering overhead.
+If the RL pipeline requires rendering in the loop, make sure to reduce the number of environments, taking into consideration the dimensions of the image buffers and the size of the policy networks. When hitting out-of-memory errors, the simplest solution may be to reduce the number of environments.
+
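The effect of image observations on memory can be estimated with simple arithmetic. The following is a minimal sketch, not part of Isaac Lab: the function names, the 128x128 RGB resolution, the 16-step horizon, and the 2 GiB budget are all hypothetical values chosen for illustration, and the estimate covers only the rollout image buffer, not the simulation or the policy network.

```python
def rollout_image_bytes(num_envs, horizon, height, width, channels=3, bytes_per_value=1):
    """Bytes needed to hold one rollout of image observations (hypothetical helper)."""
    return num_envs * horizon * height * width * channels * bytes_per_value


def max_envs_for_budget(budget_bytes, horizon, height, width, channels=3, bytes_per_value=1):
    """Largest number of environments whose image rollout fits in the given budget."""
    per_env = rollout_image_bytes(1, horizon, height, width, channels, bytes_per_value)
    return budget_bytes // per_env


GIB = 1024**3

# Assumed setup: 1024 environments, 16-step horizon, 128x128 RGB images stored as uint8.
uint8_gib = rollout_image_bytes(1024, 16, 128, 128) / GIB
# The same images stored as float32 need four times as much memory.
float32_gib = rollout_image_bytes(1024, 16, 128, 128, bytes_per_value=4) / GIB
print(uint8_gib, float32_gib)  # 0.75 3.0

# Environments that fit if 2 GiB is reserved for float32 image rollouts.
print(max_envs_for_budget(2 * GIB, 16, 128, 128, bytes_per_value=4))  # 682
```

An estimate like this gives a starting point for choosing the number of environments before launching a run with rendering in the loop.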
+
+Hyperparameter Tuning
+---------------------
+
+Although in many cases, simulating more environments in parallel can yield faster training and better results, there are also cases where diminishing returns are observed when the number of environments reaches certain thresholds.
+This threshold will vary depending on the complexity of the environment, task, policy setup, and RL algorithm.
+When more environments are simulated in parallel, each simulation step requires more time to simulate, which will impact the overall training time.
+When the number of environments is small, this increase in per-step simulation time is often insignificant compared to the increase in training performance from more experiences collected.
+However, once the number of environments reaches a certain point, the benefits of having even more experiences for the RL algorithm may start to saturate, and the increase in simulation time can outweigh the benefits in training performance.
+
+In contrast to the diminishing returns seen when the number of environments is too large, training with a low number of environments can also be challenging.
+This is often due to the RL policies not getting enough experiences to learn from.
+To address this issue, it may be helpful to increase the batch size or the horizon length to compensate for the smaller amount of data collected from a lower number of parallel environments.
+When the number of environments is constrained by available resources, running with parallel GPUs or training across multiple nodes can also help alleviate issues due to limited rollouts.
+
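The trade-off between the number of environments and the horizon length can be made concrete. This is a minimal sketch under the assumption that one training iteration collects `num_envs * horizon_length` samples; the helper names and the 4096-environment baseline are hypothetical and do not come from any of the RL libraries.

```python
import math


def samples_per_iteration(num_envs, horizon_length):
    """Samples collected in one rollout, assuming every environment steps each time."""
    return num_envs * horizon_length


def horizon_for_target(target_samples, num_envs):
    """Smallest horizon length that collects at least target_samples per iteration."""
    return math.ceil(target_samples / num_envs)


# Assumed baseline: 4096 environments with a 16-step horizon.
baseline = samples_per_iteration(4096, 16)
print(baseline)  # 65536

# With only 512 environments, a longer horizon keeps the same amount of data.
print(horizon_for_target(baseline, 512))  # 128
```

A longer horizon keeps the amount of data per update constant, but each update then happens less often in simulated time, so the value still needs tuning per task.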
+
+Debugging NaNs during Training
+------------------------------
+
+One common error seen during RL training is the appearance of NaNs in the observation buffers, which often get propagated into the policy networks and cause crashes in the downstream training pipeline.
+In most cases, NaNs appear when the simulation becomes unstable.
+This could be due to drastic actions being applied to the robots that exceed the limits of the simulation, or resets of the assets into invalid states.
+Some helpful tips to reduce the occurrence of NaNs include proper tuning of the physics parameters for the assets to ensure that joint, velocity, and force limits are within reasonable ranges and the gains are correctly tuned for the robot.
+It is also a good idea to check that actions applied to the robots are reasonable and will not impose large forces or impulses on the objects.
+Reducing the timestep of the physics simulation can also help improve the accuracy and stability of the simulation, as can increasing the number of solver iterations.
+
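To locate where NaNs first enter the pipeline, it can help to check the observation buffer before it reaches the policy. The sketch below uses plain Python lists so that it runs without dependencies; `find_non_finite_envs` is a hypothetical helper, not an Isaac Lab function. On tensors, `torch.isfinite` offers the same element-wise check.

```python
import math


def find_non_finite_envs(observations):
    """Return indices of environments whose observation row contains NaN or inf."""
    return [
        env_id
        for env_id, row in enumerate(observations)
        if not all(math.isfinite(value) for value in row)
    ]


# Assumed toy buffer: one row of observations per environment.
obs = [
    [0.1, -0.3],
    [float("nan"), 0.2],
    [0.0, float("inf")],
]
bad_envs = find_non_finite_envs(obs)
print(bad_envs)  # [1, 2]
```

Knowing which environments produced invalid values makes it easier to inspect their most recent actions and reset states.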
+
+Understanding Training Outputs
+------------------------------
+
+Each RL library produces its own output data during training.
+Some libraries are more verbose and generate logs that contain more detailed information on the training process, while others are more compact.
+In this section, we will explain the common outputs from the RL libraries.
+
+
+RL-Games
+^^^^^^^^
+
+For each iteration, RL-Games prints statistics of the data collection, inference, and training performance.
+
+.. code:: bash
+
+  fps step: 112918 fps step and policy inference: 104337 fps total: 78179 epoch: 1/150 frames: 0
+
+``fps step`` refers to the environment step FPS, which includes applying actions, computing observations, rewards, dones, and resets, as well as stepping the simulation.
+
+``fps step and policy inference`` measures everything in ``fps step`` along with the time it takes for the policy inference to compute the actions.
+
+``fps total`` measures the above and the time it takes for the training iteration.
+
+At specified intervals, RL-Games also logs the current best reward and the path of the intermediate checkpoints saved to file.
+
+.. code:: bash
+
+  => saving checkpoint 'IsaacLab/logs/rl_games/cartpole_direct/2024-12-28_20-23-06/nn/last_cartpole_direct_ep_150_rew_294.18793.pth'
+  saving next best rewards:  [294.18793]
+
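Because each FPS value in the RL-Games summary line includes the work of the previous one, the three numbers can be turned into an approximate time breakdown. This is a rough sketch that assumes all three values count the same frames; `parse_fps` and `time_shares` are hypothetical helpers, not part of RL-Games.

```python
import re

LOG_LINE = "fps step: 112918 fps step and policy inference: 104337 fps total: 78179 epoch: 1/150 frames: 0"


def parse_fps(line):
    """Extract the three FPS values from an RL-Games style summary line."""
    pattern = r"fps step: (\d+) fps step and policy inference: (\d+) fps total: (\d+)"
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("line does not look like an FPS summary")
    step_fps, step_inference_fps, total_fps = (int(group) for group in match.groups())
    return step_fps, step_inference_fps, total_fps


def time_shares(step_fps, step_inference_fps, total_fps):
    """Split the time per frame into environment step, inference, and training shares."""
    t_step = 1.0 / step_fps
    t_step_inference = 1.0 / step_inference_fps
    t_total = 1.0 / total_fps
    return {
        "env_step": t_step / t_total,
        "inference": (t_step_inference - t_step) / t_total,
        "training": (t_total - t_step_inference) / t_total,
    }


shares = time_shares(*parse_fps(LOG_LINE))
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

A large training share suggests tuning the learning setup, while a large environment step share points at the simulation itself.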
+
+RSL RL
+^^^^^^
+
+For each iteration, RSL RL provides the following output:
+
+.. code:: bash
+
+                          Learning iteration 0/150
+
+                       Computation: 50355 steps/s (collection: 1.106s, learning 0.195s)
+               Value function loss: 22.0539
+                    Surrogate loss: -0.0086
+             Mean action noise std: 1.00
+                       Mean reward: -5.49
+               Mean episode length: 15.79
+  --------------------------------------------------------------------------------
+                   Total timesteps: 65536
+                    Iteration time: 1.30s
+                        Total time: 1.30s
+                               ETA: 195.2s
+
+
+This output reports the overall throughput in steps per second for data collection, inference, and learning, along with a breakdown of the time spent on collection and on learning in the iteration.
+In addition, statistics for the training losses are provided, along with the current average reward and episode length.
+
+In the bottom section, it logs the total number of steps completed so far, the iteration time for the current iteration, the total overall training time, and the estimated time to complete the full number of iterations.
+
+
+SKRL
+^^^^
+
+SKRL provides a minimal output showing the training progress as a percentage of the total number of timesteps (divided by the number of environments). It also includes the total elapsed time so far and the estimated time to complete training.
+
+.. code:: bash
+
+    0%|                                          | 2/4800 [00:00<10:02,  7.96it/s]
+
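The fields in the progress bar above can be cross-checked with a small calculation. This sketch assumes the bar follows the common `done/total [elapsed<remaining, rate]` layout, where the remaining time is the number of outstanding iterations divided by the rate; the helper names are hypothetical.

```python
def remaining_seconds(done, total, rate):
    """Estimated seconds left, given the iterations completed and the iterations per second."""
    return (total - done) / rate


def format_mm_ss(seconds):
    """Format a duration as MM:SS, dropping fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


# Values taken from the sample progress bar: 2/4800 iterations at 7.96 it/s.
eta = format_mm_ss(remaining_seconds(2, 4800, 7.96))
print(eta)  # 10:02
```

The result matches the `10:02` shown in the sample output, which supports reading the second time field as the estimated remaining time.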
+
+Stable-Baselines3
+^^^^^^^^^^^^^^^^^
+
+Stable-Baselines3 provides a detailed output, outlining the rollout statistics, timing, and policy data.
+
+.. code:: bash
+
+  ------------------------------------------
+  | rollout/                |              |
+  |    ep_len_mean          | 30.8         |
+  |    ep_rew_mean          | 2.87         |
+  | time/                   |              |
+  |    fps                  | 8824         |
+  |    iterations           | 2            |
+  |    time_elapsed         | 14           |
+  |    total_timesteps      | 131072       |
+  | train/                  |              |
+  |    approx_kl            | 0.0079056695 |
+  |    clip_fraction        | 0.0842       |
+  |    clip_range           | 0.2          |
+  |    entropy_loss         | -1.42        |
+  |    explained_variance   | 0.0344       |
+  |    learning_rate        | 0.0003       |
+  |    loss                 | 10.4         |
+  |    n_updates            | 20           |
+  |    policy_gradient_loss | -0.0119      |
+  |    std                  | 1            |
+  |    value_loss           | 17           |
+  ------------------------------------------
+
+Under the ``rollout/`` section, average episode length and reward are logged for the iteration. Under ``time/``, data for the total FPS, number of iterations, total time elapsed, and the total number of timesteps are provided. Finally, under ``train/``, statistics of the training parameters are logged, such as KL, losses, learning rates, and more.
--- a/docs/source/overview/teleop_imitation.rst
+++ b/docs/source/overview/teleop_imitation.rst
@@ -59,6 +59,9 @@ For SpaceMouse, these are as follows:
      Move arm along z-axis: Push or pull the SpaceMouse
      Rotate arm: Twist the SpaceMouse

+The next section describes how teleoperation devices can be used for data collection for imitation learning.
+
+
 Imitation Learning
 ~~~~~~~~~~~~~~~~~~


--- a/docs/source/setup/installation/pip_installation.rst
+++ b/docs/source/setup/installation/pip_installation.rst
@@ -67,7 +67,7 @@ compatibility issues with some Linux distributions. If you encounter any issues,
                  env_isaaclab\Scripts\activate


--  Next, install a CUDA-enabled PyTorch 2.4.0 build based on the CUDA version available on your system. This step is optional for Linux, but required for Windows to ensure a CUDA-compatible version of PyTorch is installed.
+-  Next, install a CUDA-enabled PyTorch 2.5.1 build based on the CUDA version available on your system. This step is optional for Linux, but required for Windows to ensure a CUDA-compatible version of PyTorch is installed.

   .. tab-set::

@@ -75,13 +75,13 @@ compatibility issues with some Linux distributions. If you encounter any issues,

         .. code-block:: bash

-            pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
+            pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu118

      .. tab-item:: CUDA 12

         .. code-block:: bash

-            pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
+            pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu121

 -  Before installing Isaac Sim, ensure the latest pip version is installed. To update pip, run

@@ -117,7 +117,7 @@ Verifying the Isaac Sim installation
   .. code:: bash

      # experience files can be absolute path, or relative path searched in isaacsim/apps or omni/apps
-      isaacsim omni.isaac.sim.python.kit
+      isaacsim isaacsim.exp.full.kit


 .. attention::

--- a/scripts/demos/bipeds.py
+++ b/scripts/demos/bipeds.py
@@ -45,15 +45,8 @@ from isaaclab_assets import H1_CFG  # isort:skip
 from isaaclab_assets import G1_CFG  # isort:skip


-def main():
-    """Main function."""
-    # Load kit helper
-    sim_cfg = sim_utils.SimulationCfg(dt=0.005, device=args_cli.device)
-    sim = SimulationContext(sim_cfg)
-    # Set main camera
-    sim.set_camera_view(eye=[3.0, 0.0, 2.25], target=[0.0, 0.0, 1.0])
-
-    # Spawn things into stage
+def design_scene(sim: sim_utils.SimulationContext) -> tuple[list, torch.Tensor]:
+    """Designs the scene."""
    # Ground-plane
    cfg = sim_utils.GroundPlaneCfg()
    cfg.func("/World/defaultGroundPlane", cfg)
@@ -74,12 +67,11 @@ def main():
    g1 = Articulation(G1_CFG.replace(prim_path="/World/G1"))
    robots = [cassie, h1, g1]

-    # Play the simulator
-    sim.reset()
+    return robots, origins

-    # Now we are ready!
-    print("[INFO]: Setup complete...")

+def run_simulator(sim: sim_utils.SimulationContext, robots: list[Articulation], origins: torch.Tensor):
+    """Runs the simulation loop."""
    # Define simulation stepping
    sim_dt = sim.get_physics_dt()
    sim_time = 0.0
@@ -116,6 +108,27 @@ def main():
            robot.update(sim_dt)


+def main():
+    """Main function."""
+    # Load kit helper
+    sim_cfg = sim_utils.SimulationCfg(dt=0.005, device=args_cli.device)
+    sim = SimulationContext(sim_cfg)
+    # Set main camera
+    sim.set_camera_view(eye=[3.0, 0.0, 2.25], target=[0.0, 0.0, 1.0])
+
+    # design scene
+    robots, origins = design_scene(sim)
+
+    # Play the simulator
+    sim.reset()
+
+    # Now we are ready!
+    print("[INFO]: Setup complete...")
+
+    # Run the simulator
+    run_simulator(sim, robots, origins)
+
+
 if __name__ == "__main__":
    # run the main function
    main()

--- a/scripts/tools/convert_mjcf.py
+++ b/scripts/tools/convert_mjcf.py
@@ -8,7 +8,7 @@ Utility to convert a MJCF into USD format.

 MuJoCo XML Format (MJCF) is an XML file format used in MuJoCo to describe all elements of a robot. For more information, see: http://www.mujoco.org/book/XMLreference.html

-This script uses the MJCF importer extension from Isaac Sim (``omni.isaac.mjcf_importer``) to convert a MJCF asset into USD format. It is designed as a convenience script for command-line use. For more information on the MJCF importer, see the documentation for the extension:
+This script uses the MJCF importer extension from Isaac Sim (``isaacsim.asset.importer.mjcf``) to convert a MJCF asset into USD format. It is designed as a convenience script for command-line use. For more information on the MJCF importer, see the documentation for the extension:
 https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/ext_omni_isaac_mjcf.html



--- a/scripts/tools/convert_urdf.py
+++ b/scripts/tools/convert_urdf.py
@@ -9,7 +9,7 @@ Utility to convert a URDF into USD format.
 Unified Robot Description Format (URDF) is an XML file format used in ROS to describe all elements of
 a robot. For more information, see: http://wiki.ros.org/urdf

-This script uses the URDF importer extension from Isaac Sim (``omni.isaac.urdf_importer``) to convert a
+This script uses the URDF importer extension from Isaac Sim (``isaacsim.asset.importer.urdf``) to convert a
 URDF asset into USD format. It is designed as a convenience script for command-line use. For more
 information on the URDF importer, see the documentation for the extension:
 https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/ext_omni_isaac_urdf.html

--- a/source/isaaclab/isaaclab/markers/__init__.py
+++ b/source/isaaclab/isaaclab/markers/__init__.py
@@ -14,10 +14,10 @@ Currently, the sub-package provides the following classes:
 .. note::

    For some simple use-cases, it may be sufficient to use the debug drawing utilities from Isaac Sim.
-    The debug drawing API is available in the `omni.isaac.debug_drawing`_ module. It allows drawing of
+    The debug drawing API is available in the `isaacsim.util.debug_drawing`_ module. It allows drawing of
    points and splines efficiently on the UI.

-    .. _omni.isaac.debug_drawing: https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/ext_omni_isaac_debug_drawing.html
+    .. _isaacsim.util.debug_drawing: https://docs.omniverse.nvidia.com/app_isaacsim/app_isaacsim/ext_omni_isaac_debug_drawing.html

 """


--- a/source/isaaclab/isaaclab/ui/widgets/image_plot.py
+++ b/source/isaaclab/isaaclab/ui/widgets/image_plot.py
@@ -14,7 +14,7 @@ import omni.log
 from .ui_widget_wrapper import UIWidgetWrapper

 if TYPE_CHECKING:
-    import omni.isaac.ui
+    import isaacsim.gui.components
    import omni.ui


@@ -155,7 +155,9 @@ class ImagePlot(UIWidgetWrapper):
            with omni.ui.HStack():
                # Write the leftmost label for what this plot is
                omni.ui.Label(
-                    self._label, width=omni.isaac.ui.ui_utils.LABEL_WIDTH, alignment=omni.ui.Alignment.LEFT_TOP
+                    self._label,
+                    width=isaacsim.gui.components.ui_utils.LABEL_WIDTH,
+                    alignment=omni.ui.Alignment.LEFT_TOP,
                )
                with omni.ui.Frame(width=self._aspect_ratio * self._widget_height, height=self._widget_height):
                    self._base_plot = omni.ui.ImageWithProvider(self._byte_provider)
@@ -192,7 +194,7 @@ class ImagePlot(UIWidgetWrapper):
                def _change_mode(value):
                    self._curr_mode = value

-                omni.isaac.ui.ui_utils.dropdown_builder(
+                isaacsim.gui.components.ui_utils.dropdown_builder(
                    label="Mode",
                    type="dropdown",
                    items=["Original", "Normalization", "Colorization"],

--- a/source/isaaclab/isaaclab/ui/widgets/line_plot.py
+++ b/source/isaaclab/isaaclab/ui/widgets/line_plot.py
@@ -15,7 +15,7 @@ from isaacsim.core.api.simulation_context import SimulationContext
 from .ui_widget_wrapper import UIWidgetWrapper

 if TYPE_CHECKING:
-    import omni.isaac.ui
+    import isaacsim.gui.components
    import omni.ui


@@ -397,7 +397,7 @@ class LiveLinePlot(UIWidgetWrapper):
            max_legend = max([len(legend) for legend in self._legends])
            CHAR_WIDTH = 8
            with omni.ui.VGrid(
-                row_height=omni.isaac.ui.ui_utils.LABEL_HEIGHT,
+                row_height=isaacsim.gui.components.ui_utils.LABEL_HEIGHT,
                column_width=max_legend * CHAR_WIDTH + 6,
            ):
                for idx in range(len(self._y_data)):
@@ -442,7 +442,7 @@ class LiveLinePlot(UIWidgetWrapper):
            with omni.ui.HStack():
                omni.ui.Label(
                    "Limits",
-                    width=omni.isaac.ui.ui_utils.LABEL_WIDTH,
+                    width=isaacsim.gui.components.ui_utils.LABEL_WIDTH,
                    alignment=omni.ui.Alignment.LEFT_CENTER,
                )

@@ -460,10 +460,10 @@ class LiveLinePlot(UIWidgetWrapper):

                omni.ui.Button(
                    "Re-Scale",
-                    width=omni.isaac.ui.ui_utils.BUTTON_WIDTH,
+                    width=isaacsim.gui.components.ui_utils.BUTTON_WIDTH,
                    clicked_fn=self._rescale_btn_pressed,
                    alignment=omni.ui.Alignment.LEFT_CENTER,
-                    style=omni.isaac.ui.ui_utils.get_style(),
+                    style=isaacsim.gui.components.ui_utils.get_style(),
                )

                omni.ui.CheckBox(model=self._autoscale_model, tooltip="", width=4)
@@ -498,7 +498,7 @@ class LiveLinePlot(UIWidgetWrapper):
                    self.clear()
                    self._filter_mode = value

-                omni.isaac.ui.ui_utils.dropdown_builder(
+                isaacsim.gui.components.ui_utils.dropdown_builder(
                    label="Filter",
                    type="dropdown",
                    items=["None", "Lowpass", "Integrate", "Derivative"],
@@ -512,10 +512,10 @@ class LiveLinePlot(UIWidgetWrapper):
                # Button
                omni.ui.Button(
                    "Play/Pause",
-                    width=omni.isaac.ui.ui_utils.BUTTON_WIDTH,
+                    width=isaacsim.gui.components.ui_utils.BUTTON_WIDTH,
                    clicked_fn=_toggle_paused,
                    alignment=omni.ui.Alignment.LEFT_CENTER,
-                    style=omni.isaac.ui.ui_utils.get_style(),
+                    style=isaacsim.gui.components.ui_utils.get_style(),
                )

    def _create_ui_widget(self):

--- a/source/isaaclab/isaaclab/ui/widgets/ui_widget_wrapper.py
+++ b/source/isaaclab/isaaclab/ui/widgets/ui_widget_wrapper.py
@@ -3,7 +3,7 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause

-# This file has been adapted from _isaac_sim/exts/omni.isaac.ui/omni/isaac/ui/element_wrappers/base_ui_element_wrappers.py
+# This file has been adapted from _isaac_sim/exts/isaacsim.gui.components/isaacsim/gui/components/element_wrappers/base_ui_element_wrappers.py

 from __future__ import annotations


--- a/source/isaaclab/test/envs/test_manager_based_rl_env_ui.py
+++ b/source/isaaclab/test/envs/test_manager_based_rl_env_ui.py
@@ -32,7 +32,7 @@ from isaaclab.envs.ui import ManagerBasedRLEnvWindow
 from isaaclab.scene import InteractiveSceneCfg
 from isaaclab.utils import configclass

-enable_extension("omni.isaac.ui")
+enable_extension("isaacsim.gui.components")


 @configclass