Adds walkthrough section in documentation with jetbot tutorial (#2368)

# Description

The intent is to create an in-depth walkthrough for setting up a project, adding a robot, and training it in the direct workflow. The goal is to reference our tutorials and other documentation appropriately, and to build off of the walkthrough for other workflows in the future.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Signed-off-by: Michael Gussert <michael@gussert.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>

Unverified Commit d2a41266 authored Jun 07, 2025 by Michael Gussert Committed by GitHub Jun 07, 2025
17 changed files
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -45,7 +45,7 @@ repos:
      - id: codespell
        additional_dependencies:
        - tomli
-        exclude: "CONTRIBUTORS.md"
+        exclude: "CONTRIBUTORS.md|docs/source/setup/walkthrough/concepts_env_design.rst"
  # FIXME: Figure out why this is getting stuck under VPN.
  # - repo: https://github.com/RobertCraigie/pyright-python
  #   rev: v1.1.315

--- a/docs/index.rst
+++ b/docs/index.rst
@@ -74,19 +74,32 @@ Table of Contents

 .. toctree::
   :maxdepth: 2
-   :caption: Getting Started
+   :caption: Isaac Lab

   source/setup/ecosystem
-   source/setup/quickstart
   source/setup/installation/index
   source/setup/installation/cloud_installation
+   source/refs/reference_architecture/index
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Getting Started
+   :titlesonly:
+
+   source/setup/quickstart
+   source/setup/walkthrough/index
+   source/tutorials/index
+   source/how-to/index
+   source/overview/developer-guide/index
+

 .. toctree::
   :maxdepth: 3
   :caption: Overview
   :titlesonly:

-   source/overview/developer-guide/index
+
   source/overview/core-concepts/index
   source/overview/environments
   source/overview/reinforcement-learning/index
@@ -109,8 +122,6 @@ Table of Contents
   :caption: Resources
   :titlesonly:

-   source/tutorials/index
-   source/how-to/index
   source/deployment/index
   source/policy_deployment/index

@@ -133,7 +144,7 @@ Table of Contents
   :maxdepth: 1
   :caption: References

-   source/refs/reference_architecture/index
+
   source/refs/additional_resources
   source/refs/contributing
   source/refs/troubleshooting

--- a/docs/source/_static/overview/overview_sensors_contact_visualization.jpg
+++ b/docs/source/_static/overview/overview_sensors_contact_visualization.jpg
--- a/docs/source/_static/setup/walkthrough_1_1_result.jpg
+++ b/docs/source/_static/setup/walkthrough_1_1_result.jpg
--- a/docs/source/_static/setup/walkthrough_1_2_arrows.jpg
+++ b/docs/source/_static/setup/walkthrough_1_2_arrows.jpg
--- a/docs/source/_static/setup/walkthrough_project_setup.svg
+++ b/docs/source/_static/setup/walkthrough_project_setup.svg
--- a/docs/source/_static/setup/walkthrough_sim_stage_scene.svg
+++ b/docs/source/_static/setup/walkthrough_sim_stage_scene.svg
--- a/docs/source/_static/setup/walkthrough_stage_context.svg
+++ b/docs/source/_static/setup/walkthrough_stage_context.svg
--- a/docs/source/_static/setup/walkthrough_training_vectors.svg
+++ b/docs/source/_static/setup/walkthrough_training_vectors.svg
--- a/docs/source/overview/core-concepts/sensors/contact_sensor.rst
+++ b/docs/source/overview/core-concepts/sensors/contact_sensor.rst
@@ -73,7 +73,7 @@ Here, we print both the net contact force and the filtered force matrix for each
  Received contact force of:  tensor([[[1.3529e-05, 0.0000e+00, 1.0069e+02]]], device='cuda:0')


-.. figure:: ../../_static/overview/overview_sensors_contact_visualization.jpg
+.. figure:: ../../../_static/overview/sensors/contact_visualization.jpg
    :align: center
    :figwidth: 100%
    :alt: The contact sensor visualization

--- a/docs/source/setup/walkthrough/api_env_design.rst
+++ b/docs/source/setup/walkthrough/api_env_design.rst
+.. _walkthrough_api_env_design:
+
+Classes and Configs
+====================================
+
+To begin, navigate to the task directory ``source/isaac_lab_tutorial/isaac_lab_tutorial/tasks/direct/isaac_lab_tutorial`` and take a look
+at the contents of ``isaac_lab_tutorial_env_cfg.py``. You should see something that looks like the following:
+
+.. code-block:: python
+
+  from isaaclab_assets.robots.cartpole import CARTPOLE_CFG
+
+  from isaaclab.assets import ArticulationCfg
+  from isaaclab.envs import DirectRLEnvCfg
+  from isaaclab.scene import InteractiveSceneCfg
+  from isaaclab.sim import SimulationCfg
+  from isaaclab.utils import configclass
+
+
+  @configclass
+  class IsaacLabTutorialEnvCfg(DirectRLEnvCfg):
+
+      # Some useful fields
+      .
+      .
+      .
+
+      # simulation
+      sim: SimulationCfg = SimulationCfg(dt=1 / 120, render_interval=2)
+
+      # robot(s)
+      robot_cfg: ArticulationCfg = CARTPOLE_CFG.replace(prim_path="/World/envs/env_.*/Robot")
+
+      # scene
+      scene: InteractiveSceneCfg = InteractiveSceneCfg(num_envs=4096, env_spacing=4.0, replicate_physics=True)
+
+      # Some more useful fields
+      .
+      .
+      .
+
+This is the default configuration for a simple cartpole environment that comes with the template and defines the ``self`` scope
+for anything you do within the corresponding environment.
+
+.. currentmodule:: isaaclab.envs
+
+The first thing to note is the presence of the ``@configclass`` decorator. This defines a class as a configuration class, which holds
+a special place in Isaac Lab. Configuration classes are part of how Isaac Lab determines what to "care" about when it comes to cloning
+the environment to scale up training. Isaac Lab provides different base configuration classes depending on your goals, and in this
+case we are using the :class:`DirectRLEnvCfg` class because we are interested in performing reinforcement learning in the direct workflow.
+
+.. currentmodule:: isaaclab.sim
+
+The second thing to note is the content of the configuration class. As the author, you can specify any fields you desire but, generally speaking, there are three things you
+will always define here: The **sim**, the **scene**, and the **robot**. Notice that these fields are also configuration classes! Configuration classes
+are compositional in this way as a solution for cloning arbitrarily complex environments.
+
+The **sim** is an instance of :class:`SimulationCfg`, and this is the config that controls the nature of the simulated reality we are building. This field is a member
+of the base class, ``DirectRLEnvCfg``, but has a default sim configuration, so it's *technically* optional. The ``SimulationCfg`` dictates
+how finely to step through time (dt), the direction of gravity, and even how physics should be simulated. In this case we only specify the time step and the render interval. The
+former means that each step through time simulates :math:`1/120` of a second, and the latter is the number of physics steps taken per rendered frame (a value of 2 means
+a frame is rendered every other physics step).
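To make the two numbers concrete, here is a small arithmetic sketch in plain Python (no Isaac Lab imports) using the template's ``dt=1 / 120`` and ``render_interval=2``. It treats ``render_interval`` as the number of physics steps per rendered frame; the helper ``sim_rates`` is hypothetical and exists only for this illustration.

```python
def sim_rates(dt: float, render_interval: int) -> tuple[float, float]:
    """Hypothetical helper: return (physics steps per second, rendered frames per second)."""
    physics_hz = 1.0 / dt                      # 1 / (1/120) -> 120 physics steps per simulated second
    render_hz = physics_hz / render_interval   # one frame every `render_interval` physics steps
    return physics_hz, render_hz


physics_hz, render_hz = sim_rates(dt=1 / 120, render_interval=2)
print(f"{physics_hz:.0f} physics steps/s, {render_hz:.0f} rendered frames/s")
# prints: 120 physics steps/s, 60 rendered frames/s
```

Raising ``render_interval`` leaves the physics rate unchanged and only reduces how often a frame is drawn.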
+
+.. currentmodule:: isaaclab.scene
+
+The **scene** is an instance of :class:`InteractiveSceneCfg`. The scene describes what goes "on the stage" and manages those simulation entities to be cloned across environments.
+The scene is also a member of the base class ``DirectRLEnvCfg``, but unlike the sim it has no default and must be defined in every ``DirectRLEnvCfg``.  The ``InteractiveSceneCfg``
+describes how many copies of the scene we want to create for training purposes, as well as how far apart they should be spaced on the stage.
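The two ``InteractiveSceneCfg`` values in the template, ``num_envs=4096`` and ``env_spacing=4.0``, can be pictured with a simple square grid of environment origins. The sketch below is a hypothetical layout (``grid_env_origins`` is not an Isaac Lab API), and Isaac Lab's own placement of the origins may order or center them differently.

```python
import math


def grid_env_origins(num_envs: int, env_spacing: float) -> list[tuple[float, float]]:
    """Hypothetical square-grid layout of environment origins (x, y), in meters.

    Illustration only: Isaac Lab computes the origins internally, and its exact
    ordering and centering may differ from this sketch.
    """
    num_cols = math.ceil(math.sqrt(num_envs))  # environments per row
    origins = []
    for i in range(num_envs):
        row, col = divmod(i, num_cols)
        origins.append((row * env_spacing, col * env_spacing))
    return origins


origins = grid_env_origins(num_envs=4096, env_spacing=4.0)
print(len(origins), origins[0], origins[1], origins[-1])
# prints: 4096 (0.0, 0.0) (0.0, 4.0) (252.0, 252.0)
```

With these values the sketch produces a 64 x 64 grid whose last origin sits 252 m from the first along each axis.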
+
+.. currentmodule:: isaaclab.assets
+
+Finally we have the **robot** definition, which is an instance of :class:`ArticulationCfg`. Unlike the sim and the scene, the robot is not a field of the base class:
+an environment could have any number of articulations, so an ``ArticulationCfg`` is not strictly required in order to define a ``DirectRLEnv``. Instead, the usual workflow is to
+take an existing robot configuration and replace its ``prim_path`` attribute with a regex path to the robot. In this case, ``CARTPOLE_CFG`` is a configuration defined in
+``isaaclab_assets.robots.cartpole``, and by replacing the prim path with ``/World/envs/env_.*/Robot`` we are implicitly saying that every copy of the scene will have a robot named ``Robot``.
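To see what the regex prim path selects, the sketch below checks a few example paths against the same pattern with Python's ``re.fullmatch``. This only illustrates the pattern itself; Isaac Lab resolves these expressions against the stage internally, and the example prim paths here are hypothetical.

```python
import re

# The prim path expression used for the robot in the template's config.
ROBOT_PRIM_PATTERN = "/World/envs/env_.*/Robot"

# Hypothetical prim paths, as they might appear on the stage after cloning.
paths = [f"/World/envs/env_{i}/Robot" for i in range(3)] + [
    "/World/ground",
    "/World/envs/env_0/Table",
]

# Keep only the paths that the whole pattern matches.
matches = [p for p in paths if re.fullmatch(ROBOT_PRIM_PATTERN, p)]
print(matches)
# prints: ['/World/envs/env_0/Robot', '/World/envs/env_1/Robot', '/World/envs/env_2/Robot']
```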
+
+
+The Environment
+-----------------
+
+Next, let's take a look at the contents of the other Python file in our task directory, ``isaac_lab_tutorial_env.py``:
+
+.. code-block:: python
+
+  #imports
+  .
+  .
+  .
+  from .isaac_lab_tutorial_env_cfg import IsaacLabTutorialEnvCfg
+
+  class IsaacLabTutorialEnv(DirectRLEnv):
+      cfg: IsaacLabTutorialEnvCfg
+
+      def __init__(self, cfg: IsaacLabTutorialEnvCfg, render_mode: str | None = None, **kwargs):
+          super().__init__(cfg, render_mode, **kwargs)
+          . . .
+
+      def _setup_scene(self):
+          self.robot = Articulation(self.cfg.robot_cfg)
+          # add ground plane
+          spawn_ground_plane(prim_path="/World/ground", cfg=GroundPlaneCfg())
+          # add articulation to scene
+          self.scene.articulations["robot"] = self.robot
+          # clone and replicate
+          self.scene.clone_environments(copy_from_source=False)
+          # add lights
+          light_cfg = sim_utils.DomeLightCfg(intensity=2000.0, color=(0.75, 0.75, 0.75))
+          light_cfg.func("/World/Light", light_cfg)
+
+      def _pre_physics_step(self, actions: torch.Tensor) -> None:
+          . . .
+
+      def _apply_action(self) -> None:
+          . . .
+
+      def _get_observations(self) -> dict:
+          . . .
+
+      def _get_rewards(self) -> torch.Tensor:
+          total_reward = compute_rewards(...)
+          return total_reward
+
+      def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
+          . . .
+
+      def _reset_idx(self, env_ids: Sequence[int] | None):
+          . . .
+
+  @torch.jit.script
+  def compute_rewards(...):
+      . . .
+      return total_reward
+
+
+.. currentmodule:: isaaclab.envs
+
+Some of the code has been omitted for clarity, in order to aid in discussion. This is where the actual "meat" of the
+direct workflow exists and where most of our modifications will take place as we tweak the template to suit our needs.
+Currently, all of the member functions of ``IsaacLabTutorialEnv`` implement or override methods of :class:`DirectRLEnv`. This
+common interface is how Isaac Lab and its supported RL frameworks interact with the environment.
+
+When the environment is initialized, it receives its own config as an argument, which is then immediately passed to ``super().__init__`` in order
+to initialize the ``DirectRLEnv``. This super call also calls ``_setup_scene``, which actually constructs the scene and clones
+it appropriately. Note how the robot is created and registered to the scene in ``_setup_scene``. First, the robot articulation
+is created by using the ``robot_cfg`` we defined in ``IsaacLabTutorialEnvCfg``: it doesn't exist before this point! When the
+articulation is created, the robot exists on the stage at ``/World/envs/env_0/Robot``. The call to ``scene.clone_environments`` then
+copies ``env_0`` appropriately. At this point the robot exists as many copies on the stage, so all that's left is to notify the ``scene``
+object of the existence of this articulation so that it is tracked. The articulations of the scene are kept as a dictionary, so ``scene.articulations["robot"] = self.robot``
+creates a new ``robot`` element of the ``articulations`` dictionary and sets the value to be ``self.robot``.
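The order of operations in ``_setup_scene`` (spawn once in ``env_0``, clone, then register) can be mimicked with a few toy classes. ``ToyStage``, ``ToyArticulation``, and ``ToyScene`` are hypothetical stand-ins written only for this illustration; they are not Isaac Lab APIs and track nothing but prim paths.

```python
class ToyStage:
    """Hypothetical stand-in for the USD stage: just a set of prim paths."""

    def __init__(self):
        self.prims: set[str] = set()


class ToyArticulation:
    """Hypothetical stand-in for an articulation: creating it spawns one prim."""

    def __init__(self, stage: ToyStage, prim_path: str):
        self.prim_path = prim_path
        stage.prims.add(prim_path)


class ToyScene:
    """Hypothetical stand-in for the scene: clones env_0 and tracks entities by name."""

    def __init__(self, stage: ToyStage, num_envs: int):
        self.stage = stage
        self.num_envs = num_envs
        self.articulations: dict[str, ToyArticulation] = {}

    def clone_environments(self) -> None:
        # copy every prim under env_0 into env_1 ... env_{num_envs - 1}
        source_prims = sorted(p for p in self.stage.prims if p.startswith("/World/envs/env_0/"))
        for prim in source_prims:
            for i in range(1, self.num_envs):
                self.stage.prims.add(prim.replace("/env_0/", f"/env_{i}/", 1))


stage = ToyStage()
scene = ToyScene(stage, num_envs=3)
robot = ToyArticulation(stage, "/World/envs/env_0/Robot")  # 1. the robot exists only in env_0
scene.clone_environments()                                  # 2. env_0 is copied to env_1 and env_2
scene.articulations["robot"] = robot                        # 3. the scene now tracks the robot
print(sorted(stage.prims))
# prints: ['/World/envs/env_0/Robot', '/World/envs/env_1/Robot', '/World/envs/env_2/Robot']
```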
+
+Notice also that, apart from ``_pre_physics_step`` (which receives the actions) and ``_reset_idx``, the remaining functions take no additional arguments. This is because the
+environment only manages the application of actions to the agent being simulated, and then updating the sim. This is what the ``_pre_physics_step`` and ``_apply_action`` steps are for: we set the drive commands
+to the robot so that when the simulation steps forward, the actions are applied and the joints are driven to new targets. This process is broken into steps like this
+in order to ensure systematic control over how the environment is executed, and is especially important in the manager-based workflow. A similar relationship exists between the
+``_get_dones`` function and ``_reset_idx``. The former, ``_get_dones``, determines whether each of the environments is in a terminal state, and returns two tensors of boolean
+values: one marking the environments that terminated by entering a terminal state, and one marking those that timed out. The latter, ``_reset_idx``, takes a
+list of environment indices (integers) and then actually resets those environments. It is important that things like updating drive targets or resetting environments
+do not happen **during** the physics or rendering steps, and breaking up the interface in this way helps prevent that.
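A rough sketch of how the two boolean outputs of ``_get_dones`` feed ``_reset_idx``: an environment needs a reset if it either terminated or timed out, and only those indices are passed on. The step logic of ``DirectRLEnv`` does something along these lines with torch tensors; here plain Python lists stand in for the tensors, and the ``fell_over`` and ``episode_steps`` inputs are hypothetical.

```python
def get_dones(fell_over: list[bool], episode_steps: list[int], max_steps: int):
    """Hypothetical `_get_dones`: return (terminated, time_outs), one flag per environment."""
    terminated = list(fell_over)                                  # entered a terminal state
    time_outs = [steps >= max_steps for steps in episode_steps]   # ran out of time
    return terminated, time_outs


def reset_idx(env_ids: list[int]) -> None:
    """Hypothetical `_reset_idx`: reset only the listed environments."""
    print(f"resetting environments {env_ids}")


terminated, time_outs = get_dones(
    fell_over=[False, True, False, False],
    episode_steps=[10, 42, 500, 3],
    max_steps=500,
)
# an environment is reset if it terminated OR timed out
env_ids = [i for i, (term, tout) in enumerate(zip(terminated, time_outs)) if term or tout]
if env_ids:
    reset_idx(env_ids)  # prints: resetting environments [1, 2]
```

Keeping the two flags separate matters to RL libraries, which may treat a time-out differently from a true terminal state when bootstrapping value estimates.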
--- a/docs/source/setup/walkthrough/concepts_env_design.rst
+++ b/docs/source/setup/walkthrough/concepts_env_design.rst
+.. _walkthrough_concepts_env_design:
+
+Environment Design Background
+==============================
+
+Now that we have our project installed, we can start designing the environment. In the traditional description
+of a reinforcement learning (RL) problem, the environment is responsible for using the actions produced by the agent to
+update the state of the "world", and finally compute and return the observations and the reward signal. However, there are
+some additional concepts that are unique to Isaac Sim and Lab regarding the mechanics of the simulation itself.
+The traditional description of a reinforcement learning problem presumes a "world", but we get no such luxury; we must define
+the world ourselves, and success depends on understanding how to construct that world and how it will fit into the simulation.
+
+App, Sim, World, Stage, and Scene
+----------------------------------
+
+.. figure:: ../../_static/setup/walkthrough_sim_stage_scene.svg
+    :align: center
+    :figwidth: 100%
+    :alt: How the sim is organized.
+
+The **World** is defined by the origin of a Cartesian coordinate system and the units that define it. How big or how small? How
+near or how far?  The answers to questions like these can only be defined *relative* to some contextual reference frame, and that
+reference frame is what defines the world.
+
+"Above" the world in structure is the **Sim**\ ulation and the **App**\ lication.  The **Application** is "the thing responsible for
+everything else": It governs all resource management as well as launching and destroying the simulation when we are done with it.
+When we :ref:`launched training with the template<walkthrough_project_setup>`, the window that appears with the viewport of cartpoles
+training is the Application window. The application is not defined by the GUI, however: even when running in headless mode, every
+simulation has an application that governs it.
+
+The **Simulation** controls the "rules" of the world.  It defines the laws of physics, such as how time and gravity should work, and how frequently to perform
+rendering. If the application holds the sim, then the sim holds the world. The simulation governs a single step through time by dividing it into many different
+sub-steps, each devoted to a specific aspect of advancing the world to its next state. Many of the APIs in Isaac Lab are written to specifically hook into
+these various steps and you will often see functions named like ``_pre_XYZ_step`` and ``_post_XYZ_step`` where ``XYZ_step`` is the name of one of these sub-steps of
+the simulation, such as the ``physics_step`` or the ``render_step``.
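The naming convention is easiest to see with a toy example in which a single sub-step is wrapped by its pre and post hooks. The ``ToySim`` class and its ``_physics_step`` method are hypothetical; in Isaac Lab, the sub-steps and the hooks you override are defined by the environment base classes.

```python
class ToySim:
    """Hypothetical simulator showing the `_pre_XYZ_step` / `_post_XYZ_step` pattern."""

    def __init__(self):
        self.log: list[str] = []

    def _pre_physics_step(self) -> None:
        self.log.append("pre_physics")    # e.g. stage actions before physics runs

    def _physics_step(self) -> None:
        self.log.append("physics")        # the sub-step itself

    def _post_physics_step(self) -> None:
        self.log.append("post_physics")   # e.g. read back the new state

    def step(self) -> None:
        # each hook runs immediately before / after the sub-step it is named after
        self._pre_physics_step()
        self._physics_step()
        self._post_physics_step()


sim = ToySim()
sim.step()
print(sim.log)  # ['pre_physics', 'physics', 'post_physics']
```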
+
+"Below" the world in structure is the **Stage** and the **Scene**.  If the world provides spatial context to the sim, then
+the **Stage** provides the *compositional context* for the world. Suppose we want to simulate a table set for a meal in a room:
+the room is the "world" in this case, and we choose the origin of the world to be one of the corners of the room. The position of the
+table in the room is defined as a vector from the origin of the world to some point on the table that we choose to be the origin of a *new* coordinate
+system, fixed to the table.  It's not useful to us, *the agent*\ , to talk about the location of the food and the utensils on the table with respect to the
+corner of the room: instead it is preferable to use the coordinates defined with respect to the table. However, the simulation needs to know
+these global coordinates in order to properly simulate the next time step, so we must define how these two coordinate systems are *composed* together.
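The composition of the two frames is a rigid transform: a point expressed in the table frame is rotated by the table's orientation and then offset by the table's origin in the room frame. The sketch below assumes the table is rotated only about the vertical axis; the function name and all numbers are made up for illustration.

```python
import math


def table_to_room(p_table, table_origin, table_yaw):
    """Express a point given in the table frame in the room ("world") frame.

    Assumes the table is only rotated about the vertical z axis:
    p_room = table_origin + R_z(table_yaw) @ p_table
    """
    x, y, z = p_table
    c, s = math.cos(table_yaw), math.sin(table_yaw)
    return (
        table_origin[0] + c * x - s * y,
        table_origin[1] + s * x + c * y,
        table_origin[2] + z,
    )


# made-up layout: the table's origin is 3 m and 2 m from the room corner, 0.75 m up,
# and the table is turned 90 degrees; a fork lies 0.2 m along the table's own x axis
fork_room = table_to_room(p_table=(0.2, 0.0, 0.0), table_origin=(3.0, 2.0, 0.75), table_yaw=math.pi / 2)
print([round(v, 3) for v in fork_room])  # [3.0, 2.2, 0.75]
```

The agent can reason about the fork at ``(0.2, 0, 0)`` relative to the table, while the simulation works with the composed room coordinates.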
+
+This is what the stage accomplishes: everything in the simulation is a `USD primitive <https://openusd.org/release/glossary.html#usdglossary-prim>`_ and the
+stage represents the relationships between these primitives as a tree, with the context being defined by the relative path in the tree. Every prim on the stage
+has a name and therefore a path in this tree, such as ``/room/table/food`` or ``/room/table/utensils``. Relationships are defined by the "parents" and "children"
+of a given node in this tree: the ``table`` is a child of the ``room`` but a parent of ``food``. Compositional properties of the parent are applied to all of its
+children, but child prims have the ability to override parent properties if necessary, as is often the case for materials.
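The parent/child behavior described above (children inherit a parent's properties unless they override them) can be modeled with a dictionary keyed by prim path. This is a toy model only: it does not use the USD (``pxr``) API, and real USD composition has many more rules than the nearest-ancestor lookup shown here.

```python
# Toy stage: prim path -> properties that prim sets itself (not the USD API).
stage = {
    "/room": {"material": "wood"},
    "/room/table": {},
    "/room/table/food": {"material": "ceramic"},  # child overrides the parent's material
    "/room/table/utensils": {},
}


def resolve(path: str, prop: str):
    """Walk from the prim up through its parents until some prim defines `prop`."""
    while path:
        if prop in stage.get(path, {}):
            return stage[path][prop]
        path = path.rsplit("/", 1)[0]  # "/room/table/food" -> "/room/table" -> "/room" -> ""
    return None


print(resolve("/room/table/utensils", "material"))  # wood (inherited from /room)
print(resolve("/room/table/food", "material"))      # ceramic (overridden)
```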
+
+.. figure:: ../../_static/setup/walkthrough_stage_context.svg
+    :align: center
+    :figwidth: 100%
+    :alt: How the stage organizes context
+
+Armed with this vocabulary, we can finally talk about the **Scene**, one of the most critical elements to understand about Isaac Lab. Deep learning, in
+all its forms, is rooted in the analysis of data.  This is true even in robot learning, where data is acquired through the sensors of the robot being trained.
+The time required to set up the robot, collect data, and reset the robot to collect more is a fundamental bottleneck in teaching robots to do *anything*, with any method.
+Isaac Sim gives us access to robots without the need for literal physical robots, and Isaac Lab gives us access to *vectorization*: the ability to simulate many copies
+of a training procedure efficiently, thus multiplying the rate of data generation and accelerating training. The scene governs those primitives on the stage
+that matter to this vectorization process, known as **simulation entities**.
+
+Suppose the reason why you want to simulate a table set for a meal is because you would like to train a robot to place the table settings for you! The robot, the table,
+and all the things on it can be registered to the scene of an environment.  We can then specify how many copies we want and the scene will automatically
+construct and run those copies on the stage. These copies are placed at new coordinates on the stage, defining a new reference frame from which observations
+and rewards can be computed. Every copy of the scene exists on the stage and is being simulated by the same world.  This is much more efficient
+than running unique simulations for each copy, but it does open up the possibility of unwanted interactions between copies of the scene, so it's important
+to keep this in mind while debugging.
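Because all copies share one stage, a quantity read in the world frame includes the offset of the copy it belongs to. A minimal sketch, assuming each copy's origin is known: subtracting it gives positions relative to that copy's own environment. Isaac Lab exposes these origins as ``scene.env_origins``; the positions and origins below are made-up values, with plain tuples in place of tensors.

```python
# made-up origins of three scene copies and the robot's world-frame position in each
env_origins = [(0.0, 0.0, 0.0), (0.0, 4.0, 0.0), (4.0, 0.0, 0.0)]
robot_pos_w = [(0.5, 0.25, 0.0), (0.5, 4.25, 0.0), (4.5, 0.25, 0.0)]


def to_env_frame(pos_w, origin):
    """Subtract a copy's origin to get a position relative to that copy."""
    return tuple(p - o for p, o in zip(pos_w, origin))


robot_pos_local = [to_env_frame(p, o) for p, o in zip(robot_pos_w, env_origins)]
print(robot_pos_local)  # [(0.5, 0.25, 0.0), (0.5, 0.25, 0.0), (0.5, 0.25, 0.0)]
```

All three robots are at the same spot within their own copy, which is what the agent should observe, even though their world positions differ.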
+
+Now that we have a grasp on the mechanics, we can take a look at the code generated for our template project!
--- a/docs/source/setup/walkthrough/index.rst
+++ b/docs/source/setup/walkthrough/index.rst
+.. _walkthrough:
+
+Walkthrough
+========================
+
+So you finished installing Isaac Sim and Isaac Lab, and you verified that everything is working as expected...
+
+Now what?
+
+The following walkthrough will guide you through setting up an Isaac Lab extension project, adding a new robot to Isaac Lab, designing an environment, and training a policy for that robot.
+For this walkthrough, we will be starting with the Jetbot, a simple two-wheeled differential-drive robot with a camera mounted on top, but the intent is for these guides to be general enough that you can use them to add your own robots and environments to Isaac Lab!
+
+The end result of this walkthrough can be found in our tutorial project repository `here <https://github.com/isaac-sim/IsaacLabTutorial/tree/main>`_. Each branch of this repository
+represents a different stage of modifying the default template project to achieve our goals.
+
+.. toctree::
+  :maxdepth: 1
+  :titlesonly:
+
+  project_setup
+  concepts_env_design
+  api_env_design
+  technical_env_design
+  training_jetbot_gt
+  training_jetbot_reward_exploration
--- a/docs/source/setup/walkthrough/project_setup.rst
+++ b/docs/source/setup/walkthrough/project_setup.rst
+.. _walkthrough_project_setup:
+
+
+Isaac Lab Project Setup
+========================
+
+The best way to create a new project is to use the :ref:`Template Generator<template-generator>`. Generating the template
+for this tutorial series is done by calling the ``isaaclab.sh`` script from the root directory of the Isaac Lab repository:
+
+.. code-block:: bash
+
+    ./isaaclab.sh --new
+
+Be sure to select ``External`` and ``Direct | single agent``.  For the frameworks, select ``skrl`` and both ``PPO`` and ``AMP`` on the following menu.  You can
+select other frameworks if you like, but this tutorial will detail ``skrl`` specifically. The configuration process for other frameworks is similar. You
+can get a copy of this code directly by checking out the `initial branch of the tutorial repository <https://github.com/isaac-sim/IsaacLabTutorial/tree/initial>`_!
+
+
+This will create an extension project with the specified name at the chosen path.  For this tutorial, we chose the name ``isaac_lab_tutorial``.
+
+.. note::
+
+    The template generator expects the project name to respect "snake_case": all lowercase with underscores separating words. However, we have renamed the
+    sample project to "IsaacLabTutorial" to more closely match the naming conventions of GitHub and our other projects. If you are following along with the example
+    repository, note this minor difference, as some superficial path names may change. If you are following along by building the project yourself, then you can ignore this note.
+
+Next, we must install the project as a python module.  Navigate to the directory that was just created
+(it will contain the ``source`` and ``scripts`` directories for the project) and then run the following to install the module.
+
+.. code-block:: bash
+
+    python -m pip install -e source/isaac_lab_tutorial
+
+To verify that things have been set up properly, run
+
+.. code-block:: bash
+
+    python scripts/list_envs.py
+
+from the root directory of your new project. This should generate a table that looks something like the following:
+
+.. code-block:: text
+
+    +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
+    |                                                                                                          Available Environments in Isaac Lab                                                                                                          |
+    +--------+---------------------------------------+-----------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
+    | S. No. | Task Name                             | Entry Point                                                                                   | Config                                                                                               |
+    +--------+---------------------------------------+-----------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
+    |   1    | Template-Isaac-Lab-Tutorial-Direct-v0 | isaac_lab_tutorial.tasks.direct.isaac_lab_tutorial.isaac_lab_tutorial_env:IsaacLabTutorialEnv | isaac_lab_tutorial.tasks.direct.isaac_lab_tutorial.isaac_lab_tutorial_env_cfg:IsaacLabTutorialEnvCfg |
+    +--------+---------------------------------------+-----------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
+
+We can now use the task name to run the environment.
+
+.. code-block:: bash
+
+    python scripts/skrl/train.py --task=Template-Isaac-Lab-Tutorial-Direct-v0
+
+By default, this should start a cartpole training environment.
+
+Let the training finish and then run the following command to see the trained policy in action!
+
+.. code-block:: bash
+
+    python scripts/skrl/play.py --task=Template-Isaac-Lab-Tutorial-Direct-v0
+
+Notice that you did not need to specify the path for the checkpoint file! This is because Isaac Lab handles much of the minute details
+like checkpoint saving, loading, and logging. In this case, the ``train.py`` script will create two directories: **logs** and **output**, which are
+used as the default output directories for tasks run by this project.
+
+
+Project Structure
+------------------------------
+
+There are four nested structures you need to be aware of when working in the direct workflow with an Isaac Lab template
+project: the **Project**, the **Extension**, the **Modules**, and the **Task**.
+
+.. figure:: ../../_static/setup/walkthrough_project_setup.svg
+    :align: center
+    :figwidth: 100%
+    :alt: The structure of the isaac lab template project.
+
+The **Project** is the root directory of the generated template.  It contains the source and scripts directories, as well as
+a ``README.md`` file. When we created the template, we named the project *IsaacLabTutorial* and this defined the root directory
+of a git repository.   If you examine the project root with hidden files visible you will see a number of files defining
+the behavior of the project with respect to git. The ``scripts`` directory contains the ``train.py`` and ``play.py`` scripts for the
+various RL libraries you chose when generating the template, while the source directory contains the python packages for the project.
+
+The **Extension** is the name of the Python package we installed via pip. By default, the template generates a project
+with a single extension of the same name. A project can have multiple extensions (as evidenced by the Isaac Lab repository itself), and so they are kept in a common ``source``
+directory. Traditional Python packages are defined by the presence of a ``pyproject.toml`` file that describes the package
+metadata, but packages using Isaac Lab must also be Isaac Sim extensions and so require a ``config`` directory and an accompanying
+``extension.toml`` file that describes the metadata needed by the Isaac Sim extension manager. Finally, because the template
+is intended to be installed via pip, it needs a ``setup.py`` file to complete the setup procedure using the ``extension.toml``
+config.
+
+The **Modules** are what actually get loaded by Isaac Lab to run training (the meat of the code). By default, the template
+generates an extension with a single module that is named the same as the project. The structure of the various sub-modules
+in the extension is what determines the ``entry_point`` for an environment in Isaac Lab. This is why our template project needed
+to be installed before we could call ``train.py``: the path to the necessary components to run the task needed to be exposed
+to python for Isaac Lab to find them.
+
+Finally, the **Task** is the heart of the direct workflow. By default, the template generates a single task with the same name
+as the project. The environment and configuration files are stored here, as well as placeholder, RL-library-dependent configurations in ``agents``.
+Critically, note the contents of the ``__init__.py``! Specifically, the ``gym.register`` function needs to be called at least once
+before an environment and task can be used with the Isaac Lab ``train.py`` and ``play.py`` scripts.
+This function should be included in one of the module ``__init__.py`` files so that it is called when the package is imported. The path to
+this init file is what defines the entry point for the task!
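An entry point string has the form ``module.path:AttributeName``. The sketch below shows, in simplified form, how such a string can be resolved with ``importlib``; Gymnasium performs a similar import when ``gym.make`` builds a registered environment, but this is not its actual code. A standard-library target is used so the example runs without the tutorial package installed.

```python
import importlib


def load_entry_point(entry_point: str):
    """Resolve a "package.module:AttributeName" string to the object it names."""
    module_name, attr_name = entry_point.split(":")
    module = importlib.import_module(module_name)  # import the module part
    return getattr(module, attr_name)              # then look up the attribute part


# The template's task would use a string such as
# "isaac_lab_tutorial.tasks.direct.isaac_lab_tutorial.isaac_lab_tutorial_env:IsaacLabTutorialEnv";
# a standard-library target is used here so the sketch runs on its own.
cls = load_entry_point("collections:OrderedDict")
print(cls)  # <class 'collections.OrderedDict'>
```

This is also why the package must be installed first: if the module part cannot be imported, the entry point cannot be resolved.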
+
+For the template, ``gym.register`` is called within ``isaac_lab_tutorial/source/isaac_lab_tutorial/isaac_lab_tutorial/tasks/direct/isaac_lab_tutorial/__init__.py``.
+The repeated name is a consequence of needing default names for the template, but now we can see the structure of the project:
+
+**Project**/source/**Extension**/**Module**/tasks/direct/**Task**/__init__.py
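Mapping this directory structure to an importable module is mechanical: relative to the installed extension directory, the folders leading to a task's ``__init__.py`` become the dotted module name. The helper ``task_module_from_init`` below is a hypothetical illustration of that mapping using ``pathlib``; its output matches the module prefix of the entry point in the ``list_envs.py`` table.

```python
from pathlib import PurePosixPath


def task_module_from_init(init_path: str, extension_dir: str) -> str:
    """Hypothetical helper: turn a task's __init__.py path into its dotted module name.

    `extension_dir` is the directory that was pip-installed (the one holding setup.py),
    so the importable package path starts just below it.
    """
    rel = PurePosixPath(init_path).relative_to(extension_dir)
    return ".".join(rel.parent.parts)  # drop "__init__.py" and join the package folders


init_file = (
    "isaac_lab_tutorial/source/isaac_lab_tutorial/"
    "isaac_lab_tutorial/tasks/direct/isaac_lab_tutorial/__init__.py"
)
print(task_module_from_init(init_file, "isaac_lab_tutorial/source/isaac_lab_tutorial"))
# prints: isaac_lab_tutorial.tasks.direct.isaac_lab_tutorial
```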
--- a/docs/source/setup/walkthrough/technical_env_design.rst
+++ b/docs/source/setup/walkthrough/technical_env_design.rst
--- a/docs/source/setup/walkthrough/training_jetbot_gt.rst
+++ b/docs/source/setup/walkthrough/training_jetbot_gt.rst
--- a/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst
+++ b/docs/source/setup/walkthrough/training_jetbot_reward_exploration.rst
+.. _walkthrough_training_jetbot_reward_exploration:
+
+Exploring the RL problem
+=========================
+
+The command to the Jetbot is a unit vector specifying the desired drive direction, and we must make the agent aware of this somehow
+so it can adjust its actions accordingly. There are many possible ways to do this, with the "zeroth order" approach being to simply change the observation space to include
+this command. To start, edit ``IsaacLabTutorialEnvCfg`` to **set the observation space to 9**: the world velocity vector contains the linear and angular velocities
+of the robot, which is 6 dimensions, and appending the 3-dimensional command to this vector gives 9 dimensions for the observation space in total.
+
+Next, we just need to do that appending when we get the observations. We also need to calculate our forward vectors for later use. The forward vector for the Jetbot is
+the x axis, so we rotate ``[1,0,0]`` by ``root_link_quat_w`` to get the forward vector in the world frame. Replace the ``_get_observations`` method with the following:
+
+.. code-block:: python
+
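+    def _get_observations(self) -> dict:
+        # assumes ``import isaaclab.utils.math as math_utils`` at the top of the file
+        # world-frame linear and angular velocity of the root (6 values per robot)
+        self.velocity = self.robot.data.root_com_vel_w
+        # rotate the body-frame forward vector [1, 0, 0] into the world frame
+        self.forwards = math_utils.quat_apply(self.robot.data.root_link_quat_w, self.robot.data.FORWARD_VEC_B)
+        # append the 3D command: 6 + 3 = 9 observation dimensions
+        obs = torch.hstack((self.velocity, self.commands))
+        observations = {"policy": obs}
+        return observations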
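+
+``math_utils.quat_apply`` performs the rotation for us. To illustrate what that rotation computes, here is a standalone sketch that rotates
+vectors by unit quaternions, assuming the (w, x, y, z) ordering Isaac Lab uses. ``quat_rotate`` is a hypothetical helper written for this
+example, not part of the library:
+
+.. code-block:: python
+
+    import math
+
+    import torch
+
+
+    def quat_rotate(quat: torch.Tensor, vec: torch.Tensor) -> torch.Tensor:
+        """Rotate vectors ``vec`` (N, 3) by unit quaternions ``quat`` (N, 4) in (w, x, y, z) order."""
+        w = quat[:, :1]  # (N, 1) scalar part
+        xyz = quat[:, 1:]  # (N, 3) vector part
+        # v' = v + w * t + xyz x t, with t = 2 * (xyz x v)
+        t = 2.0 * torch.cross(xyz, vec, dim=-1)
+        return vec + w * t + torch.cross(xyz, t, dim=-1)
+
+
+    # a robot yawed 90 degrees about z: its body x axis should point along world +y
+    yaw = math.pi / 2
+    quat = torch.tensor([[math.cos(yaw / 2), 0.0, 0.0, math.sin(yaw / 2)]])
+    forward_b = torch.tensor([[1.0, 0.0, 0.0]])
+    rotated = quat_rotate(quat, forward_b)
+    print(rotated)  # approximately [[0., 1., 0.]]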
+
+So now, what should the reward be?
+
+When the robot is behaving as desired, it will be driving at full speed in the direction of the command. If we reward both
+"driving forward" and "alignment to the command", then maximizing that combined signal should result in driving to the command... right?
+
+Let's give it a try! Replace the ``_get_rewards`` method with the following:
+
+.. code-block:: python
+
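+    def _get_rewards(self) -> torch.Tensor:
+        # forward speed: x component of the body-frame linear velocity, shape (num_envs, 1)
+        forward_reward = self.robot.data.root_com_lin_vel_b[:, 0].reshape(-1, 1)
+        # alignment: dot product of the forward and command unit vectors, in [-1, 1]
+        alignment_reward = torch.sum(self.forwards * self.commands, dim=-1, keepdim=True)
+        total_reward = forward_reward + alignment_reward
+        return total_reward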
+
+The ``forward_reward`` is the x component of the robot's center of mass linear velocity in the body frame. Because the x direction is
+the forward direction for the asset, this is equivalent to the inner product between the world-frame forward vector and
+the world-frame linear velocity. The alignment term is the inner product between the forward vector and the command vector: because both are unit vectors,
+this term is 1 when they point in the same direction and -1 when they point in opposite directions. We add the two terms to get the combined reward, and
+we can finally run training! Let's see what happens!
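+
+Before training, a quick sanity check of the claimed equivalence: the following standalone sketch uses a hypothetical yaw-only pose and
+body-frame velocity (a rotation matrix stands in for the robot's orientation) and compares the body-frame x velocity with the world-frame
+inner product:
+
+.. code-block:: python
+
+    import math
+
+    import torch
+
+    yaw = 0.7  # hypothetical heading of the robot, in radians
+    # body -> world rotation about z, assuming the robot is level (yaw only)
+    R = torch.tensor([
+        [math.cos(yaw), -math.sin(yaw), 0.0],
+        [math.sin(yaw), math.cos(yaw), 0.0],
+        [0.0, 0.0, 1.0],
+    ])
+    lin_vel_b = torch.tensor([0.8, 0.1, 0.0])  # hypothetical body-frame velocity
+    lin_vel_w = R @ lin_vel_b  # the same velocity expressed in the world frame
+    forward_w = R @ torch.tensor([1.0, 0.0, 0.0])  # body x axis expressed in the world frame
+
+    forward_reward_body = lin_vel_b[0]
+    forward_reward_world = torch.dot(forward_w, lin_vel_w)
+    print(forward_reward_body.item(), forward_reward_world.item())  # both approximately 0.8
+
+Both quantities agree because rotations preserve inner products: the forward vector and the velocity are rotated by the same matrix.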
+
+.. code-block:: bash
+
+    python scripts/skrl/train.py --task=Template-Isaac-Lab-Tutorial-Direct-v0
+
+
+.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_naive_webp.webp
+    :align: center
+    :figwidth: 100%
+    :alt: Naive results
+
+Surely we can do better!
+
+Reward and Observation Tuning
+-------------------------------
+
+When tuning an environment for training, a good rule of thumb is to keep the observation space as small as possible. This
+reduces the number of parameters in the model (the literal interpretation of Occam's razor) and thus improves training time. In this case we
+need to encode both our alignment to the command and our forward speed. One way to do this is to exploit the dot and cross products
+from linear algebra! Because the new observation has only 3 dimensions, change the observation space in ``IsaacLabTutorialEnvCfg`` to 3, then replace the contents of ``_get_observations`` with the following:
+
+.. code-block:: python
+
+    def _get_observations(self) -> dict:
+        self.velocity = self.robot.data.root_com_vel_w
+        self.forwards = math_utils.quat_apply(self.robot.data.root_link_quat_w, self.robot.data.FORWARD_VEC_B)
+
+        # how much the robot faces toward (+) or away from (-) the command
+        dot = torch.sum(self.forwards * self.commands, dim=-1, keepdim=True)
+        # z component of forward x command: which way to turn (+ left, - right)
+        cross = torch.cross(self.forwards, self.commands, dim=-1)[:, -1].reshape(-1, 1)
+        # forward speed in the body frame
+        forward_speed = self.robot.data.root_com_lin_vel_b[:, 0].reshape(-1, 1)
+        # 3 observations per robot
+        obs = torch.hstack((dot, cross, forward_speed))
+
+        observations = {"policy": obs}
+        return observations
+
+The dot (inner) product tells us how aligned two vectors are as a single scalar quantity. If they are aligned and point in the same direction, the inner
+product is large and positive; if they are aligned but point in opposite directions, it is large and negative. If two vectors are
+perpendicular, the inner product is zero. This means that the inner product between the forward vector and the command vector can tell us
+how much we are facing toward or away from the command, but not which direction we need to turn to improve alignment.
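+
+A small sketch with hypothetical unit vectors shows this: a robot facing +x gets the same dot product for a command to its left as for one
+to its right, so the dot product alone cannot say which way to turn:
+
+.. code-block:: python
+
+    import torch
+
+    forward = torch.tensor([[1.0, 0.0, 0.0]]).repeat(4, 1)  # four robots, all facing +x
+    commands = torch.tensor([
+        [1.0, 0.0, 0.0],  # command straight ahead
+        [0.0, 1.0, 0.0],  # command to the left (+y)
+        [0.0, -1.0, 0.0],  # command to the right (-y)
+        [-1.0, 0.0, 0.0],  # command directly behind
+    ])
+    dot = torch.sum(forward * commands, dim=-1, keepdim=True)
+    print(dot.squeeze(-1))  # values 1, 0, 0 and -1: left and right are indistinguishable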
+
+The cross product also tells us how aligned two vectors are, but it expresses this relationship as a vector. The cross product of
+two vectors lies along the axis perpendicular to the plane containing them, and which way it points along this axis is
+determined by the handedness (chirality) of the coordinate system. In our case, we can exploit the fact that we are operating in 2D to only
+examine the z component of :math:`\vec{v}_{\text{forward}} \times \vec{v}_{\text{command}}`. This component is zero if the vectors are collinear, positive if the
+command vector is to the left of forward, and negative if it is to the right.
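+
+Continuing with a hypothetical robot facing +x, this sketch shows the sign of the z component of the cross product for a command ahead,
+to the left, and to the right. For unit vectors in the xy plane, this z component is the sine of the signed angle from the forward vector
+to the command, while the dot product is its cosine:
+
+.. code-block:: python
+
+    import torch
+
+    forward = torch.tensor([[1.0, 0.0, 0.0]]).repeat(3, 1)  # three robots, all facing +x
+    commands = torch.tensor([
+        [1.0, 0.0, 0.0],  # ahead: collinear with forward
+        [0.0, 1.0, 0.0],  # to the left (+y)
+        [0.0, -1.0, 0.0],  # to the right (-y)
+    ])
+    # keep only the z component, as in _get_observations
+    cross_z = torch.cross(forward, commands, dim=-1)[:, -1]
+    print(cross_z)  # 0 ahead, +1 to the left, -1 to the right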
+
+Finally, the x component of the body-frame center of mass linear velocity tells us our forward speed, with positive being forward and negative being backward. We stack these together
+"horizontally" (along dim 1) to generate the observations for each Jetbot. This alone improves performance!
+
+
+.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_improved_webp.webp
+    :align: center
+    :figwidth: 100%
+    :alt: Improved results
+
+It seems to qualitatively train better, and the Jetbots are somewhat inching forward... Surely we can do better still!
+
+Another rule of thumb for training is to reduce and simplify the reward function as much as possible. Terms in the reward behave similarly to
+the logical "OR" operation. In our case, we are rewarding driving forward and being aligned to the command by adding them together, so our agent
+can be rewarded for driving forward OR being aligned to the command. To force the agent to learn to drive in the direction of the command, we should only
+reward the agent for driving forward AND being aligned. Logical AND suggests multiplication, and therefore the following reward function:
+
+.. code-block:: python
+
+    def _get_rewards(self) -> torch.Tensor:
+        forward_reward = self.robot.data.root_com_lin_vel_b[:, 0].reshape(-1, 1)
+        alignment_reward = torch.sum(self.forwards * self.commands, dim=-1, keepdim=True)
+        # multiply ("AND") instead of add ("OR"): the reward is only large when both terms are
+        total_reward = forward_reward * alignment_reward
+        return total_reward
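+
+To make the OR/AND distinction concrete, here is a sketch with three hypothetical robots comparing the additive reward with the
+multiplicative one (the speeds and alignments are made-up values, not simulation output):
+
+.. code-block:: python
+
+    import torch
+
+    # robot 0: drives forward but perpendicular to the command
+    # robot 1: faces the command but does not move
+    # robot 2: drives forward along the command
+    forward_speed = torch.tensor([[1.0], [0.0], [1.0]])
+    alignment = torch.tensor([[0.0], [1.0], [1.0]])
+
+    additive = forward_speed + alignment  # "OR": every robot earns something
+    multiplicative = forward_speed * alignment  # "AND": only robot 2 earns a reward
+    print(additive.squeeze(-1), multiplicative.squeeze(-1))
+
+Under the sum, the robot that only drives forward and the robot that only turns toward the command are rewarded just as much as each other;
+under the product, neither earns anything on its own.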
+
+Now we only get rewarded for driving forward if our alignment term is nonzero. Let's see what kind of result this produces!
+
+.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_tuned_webp.webp
+    :align: center
+    :figwidth: 100%
+    :alt: Tuned results
+
+It definitely trains faster, but the Jetbots have learned to drive in reverse if the command is pointed behind them. This may be desirable in our
+case, but it shows just how dependent the policy behavior is on the reward function. In this case, there are **degenerate solutions** to our
+reward function: the reward is maximized for driving forward while aligned to the command, but if the Jetbot drives in reverse, the forward
+term is negative, and if it is driving in reverse towards the command, the alignment term is **also negative**, meaning that the reward is positive!
+When you design your own environments, you will run into degenerate solutions like this, and a significant amount of reward engineering is devoted to
+suppressing or supporting these behaviors by modifying the reward function.
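+
+The degenerate case can be reproduced with a pair of hypothetical values: a Jetbot facing directly away from the command and driving
+backward earns the same product reward as one driving forward along the command:
+
+.. code-block:: python
+
+    import torch
+
+    # facing away from the command (alignment -1) and driving backward (negative body-frame x speed)
+    reverse_reward = torch.tensor([[-1.0]]) * torch.tensor([[-1.0]])
+    # facing the command (alignment 1) and driving forward
+    forward_reward = torch.tensor([[1.0]]) * torch.tensor([[1.0]])
+    print(reverse_reward.item(), forward_reward.item())  # 1.0 1.0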
+
+Let's say we don't want this behavior. The alignment term has a range of ``[-1, 1]``, but we would much prefer it to be mapped
+only to positive values. We don't want to *eliminate* the sign information in the alignment term; rather, we would like negative values to map to small positive values, so that if we
+are misaligned, we get little reward, and driving in reverse always makes the total reward negative. The exponential function accomplishes this!
+
+.. code-block:: python
+
+    def _get_rewards(self) -> torch.Tensor:
+        forward_reward = self.robot.data.root_com_lin_vel_b[:, 0].reshape(-1, 1)
+        alignment_reward = torch.sum(self.forwards * self.commands, dim=-1, keepdim=True)
+        # exp maps the alignment from [-1, 1] to roughly [0.37, 2.72], which is always positive
+        total_reward = forward_reward * torch.exp(alignment_reward)
+        return total_reward
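+
+A short sketch of why this helps, using hypothetical values: ``torch.exp`` maps the alignment range ``[-1, 1]`` to strictly positive
+scale factors, so the sign of the total reward now follows the sign of the forward speed, and the degenerate reverse-driving case from
+before becomes a penalty:
+
+.. code-block:: python
+
+    import torch
+
+    alignment = torch.tensor([-1.0, 0.0, 1.0])
+    scale = torch.exp(alignment)
+    print(scale)  # approximately 0.368, 1.0 and 2.718
+
+    # the degenerate case: driving backward while facing directly away from the command
+    forward_speed = torch.tensor(-1.0)
+    reward = forward_speed * torch.exp(torch.tensor(-1.0))
+    print(reward.item())  # approximately -0.368, so reversing is now penalized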
+
+Now when we train, the Jetbots will turn to always drive towards the command in the forward direction!
+
+.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_directed_webp.webp
+    :align: center
+    :figwidth: 100%
+    :alt: Directed results