Commit d2a41266 authored by Michael Gussert, committed by GitHub

Adds walkthrough section in documentation with jetbot tutorial (#2368)

# Description

The intent is to create an in-depth walkthrough for setting up a
project, adding a robot, and training it in the direct workflow. The
goal is to reference our tutorials and other documentation
appropriately, and to build off of the walkthrough for other workflows in
the future.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------
Signed-off-by: Michael Gussert <michael@gussert.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
parent 9f1aa4cd
@@ -45,7 +45,7 @@ repos:
- id: codespell
additional_dependencies:
- tomli
exclude: "CONTRIBUTORS.md"
exclude: "CONTRIBUTORS.md|docs/source/setup/walkthrough/concepts_env_design.rst"
# FIXME: Figure out why this is getting stuck under VPN.
# - repo: https://github.com/RobertCraigie/pyright-python
# rev: v1.1.315
@@ -74,19 +74,32 @@ Table of Contents
.. toctree::
:maxdepth: 2
:caption: Getting Started
:caption: Isaac Lab
source/setup/ecosystem
source/setup/quickstart
source/setup/installation/index
source/setup/installation/cloud_installation
source/refs/reference_architecture/index
.. toctree::
:maxdepth: 2
:caption: Getting Started
:titlesonly:
source/setup/quickstart
source/setup/walkthrough/index
source/tutorials/index
source/how-to/index
source/overview/developer-guide/index
.. toctree::
:maxdepth: 3
:caption: Overview
:titlesonly:
source/overview/developer-guide/index
source/overview/core-concepts/index
source/overview/environments
source/overview/reinforcement-learning/index
@@ -109,8 +122,6 @@ Table of Contents
:caption: Resources
:titlesonly:
source/tutorials/index
source/how-to/index
source/deployment/index
source/policy_deployment/index
@@ -133,7 +144,7 @@ Table of Contents
:maxdepth: 1
:caption: References
source/refs/reference_architecture/index
source/refs/additional_resources
source/refs/contributing
source/refs/troubleshooting
@@ -73,7 +73,7 @@ Here, we print both the net contact force and the filtered force matrix for each
Received contact force of: tensor([[[1.3529e-05, 0.0000e+00, 1.0069e+02]]], device='cuda:0')
.. figure:: ../../_static/overview/overview_sensors_contact_visualization.jpg
.. figure:: ../../../_static/overview/sensors/contact_visualization.jpg
:align: center
:figwidth: 100%
:alt: The contact sensor visualization
.. _walkthrough_api_env_design:
Classes and Configs
====================================
To begin, navigate to the task: ``source/isaac_lab_tutorial/isaac_lab_tutorial/tasks/direct/isaac_lab_tutorial``, and take a look
at the contents of ``isaac_lab_tutorial_env_cfg.py``. You should see something that looks like the following:

.. code-block:: python

    from isaaclab_assets.robots.cartpole import CARTPOLE_CFG

    from isaaclab.assets import ArticulationCfg
    from isaaclab.envs import DirectRLEnvCfg
    from isaaclab.scene import InteractiveSceneCfg
    from isaaclab.sim import SimulationCfg
    from isaaclab.utils import configclass


    @configclass
    class IsaacLabTutorialEnvCfg(DirectRLEnvCfg):
        # Some useful fields
        # ...

        # simulation
        sim: SimulationCfg = SimulationCfg(dt=1 / 120, render_interval=2)

        # robot(s)
        robot_cfg: ArticulationCfg = CARTPOLE_CFG.replace(prim_path="/World/envs/env_.*/Robot")

        # scene
        scene: InteractiveSceneCfg = InteractiveSceneCfg(num_envs=4096, env_spacing=4.0, replicate_physics=True)

        # Some more useful fields
        # ...

This is the default configuration for a simple cartpole environment that comes with the template, and it defines what is available
through ``self.cfg`` to anything you do within the corresponding environment.
.. currentmodule:: isaaclab.envs
The first thing to note is the presence of the ``@configclass`` decorator. This defines a class as a configuration class, which holds
a special place in Isaac Lab. Configuration classes are part of how Isaac Lab determines what to "care" about when it comes to cloning
the environment to scale up training. Isaac Lab provides different base configuration classes depending on your goals, and in this
case we are using the :class:`DirectRLEnvCfg` class because we are interested in performing reinforcement learning in the direct workflow.
.. currentmodule:: isaaclab.sim
The second thing to note is the content of the configuration class. As the author, you can specify any fields you desire but, generally speaking, there are three things you
will always define here: The **sim**, the **scene**, and the **robot**. Notice that these fields are also configuration classes! Configuration classes
are compositional in this way as a solution for cloning arbitrarily complex environments.
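
Isaac Lab's ``configclass`` is its own decorator, but the compositional idea can be approximated with Python's standard ``dataclasses`` module.
The sketch below is only an analogy under that assumption. The class names and fields are hypothetical stand-ins, not real Isaac Lab configuration classes.
It shows a config nested inside another config, plus a ``replace``-style copy similar in spirit to the ``CARTPOLE_CFG.replace(...)`` call above.

.. code-block:: python

    from dataclasses import dataclass, field, replace


    # Hypothetical stand-ins for Isaac Lab config classes (not the real API).
    @dataclass
    class SimCfgSketch:
        dt: float = 1 / 120
        render_interval: int = 2


    @dataclass
    class RobotCfgSketch:
        prim_path: str = "/World/Robot"


    @dataclass
    class EnvCfgSketch:
        # Configs compose: the environment config holds other config objects.
        sim: SimCfgSketch = field(default_factory=SimCfgSketch)
        robot_cfg: RobotCfgSketch = field(default_factory=RobotCfgSketch)


    BASE_ROBOT = RobotCfgSketch()
    # replace() returns a modified copy and leaves the original untouched.
    env_cfg = EnvCfgSketch(robot_cfg=replace(BASE_ROBOT, prim_path="/World/envs/env_.*/Robot"))
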
The **sim** is an instance of :class:`SimulationCfg`, and this is the config that controls the nature of the simulated reality we are building. This field is a member
of the base class, ``DirectRLEnvCfg``, but has a default sim configuration, so it's *technically* optional. The ``SimulationCfg`` dictates
how finely to step through time (dt), the direction of gravity, and even how physics should be simulated. In this case we only specify the time step and the render interval, with the
former indicating that each step through time should simulate :math:`1/120` of a second, and the latter being how many physics steps we take before we render a frame (a value of 2 means
we render every other physics step).
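
A quick way to sanity-check these two settings is to work out the rates they imply. This sketch assumes, as described above, that ``render_interval`` counts
physics steps between rendered frames, and it measures everything in simulated (not wall-clock) time.

.. code-block:: python

    from fractions import Fraction

    # Values taken from the SimulationCfg in the config above.
    dt = Fraction(1, 120)  # simulated seconds per physics step
    render_interval = 2  # physics steps between rendered frames

    physics_rate_hz = 1 / dt  # physics steps per simulated second
    render_rate_hz = physics_rate_hz / render_interval  # frames per simulated second
    frame_period_s = dt * render_interval  # simulated seconds between frames

    print(physics_rate_hz, render_rate_hz, frame_period_s)  # prints: 120 60 1/60
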
.. currentmodule:: isaaclab.scene
The **scene** is an instance of :class:`InteractiveSceneCfg`. The scene describes what goes "on the stage" and manages those simulation entities to be cloned across environments.
The scene is also a member of the base class ``DirectRLEnvCfg``, but unlike the sim it has no default and must be defined in every ``DirectRLEnvCfg``. The ``InteractiveSceneCfg``
describes how many copies of the scene we want to create for training purposes, as well as how far apart they should be spaced on the stage.
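
To get a feel for ``num_envs`` and ``env_spacing``, the sketch below lays environment origins out on a square grid. The exact layout Isaac Lab uses is not
described here, so treat the grid scheme and the helper ``grid_origins`` as a hypothetical illustration. The point is only that more copies, or wider spacing,
means a larger stage.

.. code-block:: python

    import math


    def grid_origins(num_envs: int, env_spacing: float) -> list[tuple[float, float]]:
        """Hypothetical layout: env origins placed row by row on a square grid centered at (0, 0)."""
        cols = math.ceil(math.sqrt(num_envs))
        rows = math.ceil(num_envs / cols)
        origins = []
        for i in range(num_envs):
            r, c = divmod(i, cols)
            x = (r - (rows - 1) / 2) * env_spacing
            y = (c - (cols - 1) / 2) * env_spacing
            origins.append((x, y))
        return origins


    origins = grid_origins(num_envs=4096, env_spacing=4.0)
    xs = [x for x, _ in origins]
    print(len(origins), max(xs) - min(xs))  # prints: 4096 252.0 (stage units per side)
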
.. currentmodule:: isaaclab.assets
Finally we have the **robot** definition, which is an instance of :class:`ArticulationCfg`. An environment could have multiple articulations, and so the presence of
an ``ArticulationCfg`` is not strictly required in order to define a ``DirectRLEnv``. Instead, the usual workflow is to define a regex path to the robot, and replace
the ``prim_path`` attribute in the base configuration. In this case, ``CARTPOLE_CFG`` is a configuration defined in ``isaaclab_assets.robots.cartpole`` and by replacing
the prim path with ``/World/envs/env_.*/Robot`` we are implicitly saying that every copy of the scene will have a robot named ``Robot``.
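
The ``.*`` in ``env_.*`` is a regular-expression wildcard. The sketch below uses Python's ``re`` module to show which stage paths such a pattern would match,
assuming whole-path matching. The example paths are made up, and ``matching_prims`` is a hypothetical helper; Isaac Lab resolves these paths internally.

.. code-block:: python

    import re

    # Pattern from the config above: one robot per cloned environment.
    ROBOT_PATTERN = "/World/envs/env_.*/Robot"

    # Hypothetical prim paths that might exist on the stage after cloning.
    stage_paths = [
        "/World/envs/env_0/Robot",
        "/World/envs/env_1/Robot",
        "/World/envs/env_1/Robot/chassis",  # a child prim, not the robot root
        "/World/ground",
    ]


    def matching_prims(pattern: str, paths: list[str]) -> list[str]:
        """Return the paths that match the whole pattern (hypothetical helper)."""
        return [p for p in paths if re.fullmatch(pattern, p)]


    print(matching_prims(ROBOT_PATTERN, stage_paths))
    # prints: ['/World/envs/env_0/Robot', '/World/envs/env_1/Robot']
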
The Environment
-----------------
Next, let's take a look at the contents of the other Python file in our task directory: ``isaac_lab_tutorial_env.py``.

.. code-block:: python

    # imports
    # ...

    from .isaac_lab_tutorial_env_cfg import IsaacLabTutorialEnvCfg


    class IsaacLabTutorialEnv(DirectRLEnv):
        cfg: IsaacLabTutorialEnvCfg

        def __init__(self, cfg: IsaacLabTutorialEnvCfg, render_mode: str | None = None, **kwargs):
            super().__init__(cfg, render_mode, **kwargs)
            ...

        def _setup_scene(self):
            self.robot = Articulation(self.cfg.robot_cfg)
            # add ground plane
            spawn_ground_plane(prim_path="/World/ground", cfg=GroundPlaneCfg())
            # add articulation to scene
            self.scene.articulations["robot"] = self.robot
            # clone and replicate
            self.scene.clone_environments(copy_from_source=False)
            # add lights
            light_cfg = sim_utils.DomeLightCfg(intensity=2000.0, color=(0.75, 0.75, 0.75))
            light_cfg.func("/World/Light", light_cfg)

        def _pre_physics_step(self, actions: torch.Tensor) -> None:
            ...

        def _apply_action(self) -> None:
            ...

        def _get_observations(self) -> dict:
            ...

        def _get_rewards(self) -> torch.Tensor:
            total_reward = compute_rewards(...)
            return total_reward

        def _get_dones(self) -> tuple[torch.Tensor, torch.Tensor]:
            ...

        def _reset_idx(self, env_ids: Sequence[int] | None):
            ...


    @torch.jit.script
    def compute_rewards(...):
        ...
        return total_reward

.. currentmodule:: isaaclab.envs
Some of the code has been omitted for clarity, in order to aid in discussion. This is where the actual "meat" of the
direct workflow exists and where most of our modifications will take place as we tweak the template to suit our needs.
Currently, all of the member functions of ``IsaacLabTutorialEnv`` override methods declared by :class:`DirectRLEnv`. This
shared interface is how Isaac Lab and its supported RL frameworks interact with the environment.
When the environment is initialized, it receives its own config as an argument, which is then immediately passed to ``super().__init__`` in order
to initialize the ``DirectRLEnv``. This super call also calls ``_setup_scene``, which actually constructs the scene and clones
it appropriately. Of note is how the robot is created and registered to the scene in ``_setup_scene``. First, the robot articulation
is created by using the ``robot_cfg`` we defined in ``IsaacLabTutorialEnvCfg``: it doesn't exist before this point! When the
articulation is created, the robot exists on the stage at ``/World/envs/env_0/Robot``. Next, the ``scene`` object is notified of the
existence of this articulation so that it can be tracked. The articulations of the scene are kept as a dictionary, so ``scene.articulations["robot"] = self.robot``
creates a new ``robot`` element of the ``articulations`` dictionary and sets the value to be ``self.robot``. Finally, the call to ``scene.clone_environments``
copies ``env_0`` appropriately, after which the robot exists as many copies on the stage.
Notice also that, apart from ``_pre_physics_step`` (which receives the actions) and ``_reset_idx``, the remaining functions take no additional arguments. This is because the
environment only manages the application of actions to the agent being simulated, and then updating the sim. This is what the ``_pre_physics_step`` and ``_apply_action`` steps are for: we set the drive commands
to the robot so that when the simulation steps forward, the actions are applied and the joints are driven to new targets. This process is broken into steps like this
in order to ensure systematic control over how the environment is executed, and is especially important in the manager workflow. A similar relationship exists between the
``_get_dones`` function and ``_reset_idx``. The former, ``_get_dones``, determines whether each of the environments is in a terminal state, and populates tensors of boolean
values to indicate which environments terminated due to entering a terminal state vs. timing out (the two returned tensors of the function). The latter, ``_reset_idx``, takes a
list of environment index values (integers) and then actually resets those environments. It is important that things like updating drive targets or resetting environments
do not happen **during** the physics or rendering steps, and breaking up the interface in this way helps prevent that.
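
The ordering described above can be summarized with a toy environment that simply records which hook runs when. This is a simplified, hypothetical sketch, not the
``DirectRLEnv`` implementation. The class, the ``physics_steps_per_action`` parameter, and the termination rule are all made up for illustration. Rewards and
observations are omitted, and real environments work on batched tensors rather than Python lists.

.. code-block:: python

    class ToyDirectEnv:
        """Hypothetical stand-in that mimics the hook ordering of a direct-workflow env."""

        def __init__(self, num_envs: int, max_episode_length: int, physics_steps_per_action: int = 2):
            self.num_envs = num_envs
            self.max_episode_length = max_episode_length
            self.physics_steps_per_action = physics_steps_per_action
            self.episode_length = [0] * num_envs
            self.calls = []

        def _pre_physics_step(self, actions):
            self.calls.append("pre_physics_step")
            self.actions = list(actions)  # cache the actions once per env step

        def _apply_action(self):
            self.calls.append("apply_action")  # would write drive targets to the robot

        def _physics_step(self):
            self.calls.append("physics_step")  # stands in for the simulator advancing

        def _get_dones(self):
            # Made-up rule: env 0 "falls over" immediately; the others can only time out.
            terminated = [i == 0 for i in range(self.num_envs)]
            time_out = [n >= self.max_episode_length for n in self.episode_length]
            return terminated, time_out

        def _reset_idx(self, env_ids):
            self.calls.append(f"reset_idx{env_ids}")
            for i in env_ids:
                self.episode_length[i] = 0

        def step(self, actions):
            self._pre_physics_step(actions)
            for _ in range(self.physics_steps_per_action):
                self._apply_action()
                self._physics_step()
            self.episode_length = [n + 1 for n in self.episode_length]
            terminated, time_out = self._get_dones()
            # Resets happen only after all physics sub-steps have finished.
            env_ids = [i for i in range(self.num_envs) if terminated[i] or time_out[i]]
            if env_ids:
                self._reset_idx(env_ids)
            return terminated, time_out

For example, ``ToyDirectEnv(num_envs=3, max_episode_length=5).step([0.0, 0.0, 0.0])`` applies the actions on both physics sub-steps and only then resets environment 0.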
.. _walkthrough_concepts_env_design:
Environment Design Background
==============================
Now that we have our project installed, we can start designing the environment. In the traditional description
of a reinforcement learning (RL) problem, the environment is responsible for using the actions produced by the agent to
update the state of the "world", and finally compute and return the observations and the reward signal. However, there are
some additional concepts that are unique to Isaac Sim and Isaac Lab regarding the mechanics of the simulation itself.
The traditional description of a reinforcement learning problem presumes a "world", but we get no such luxury; we must define
the world ourselves, and success depends on understanding how to construct that world and how it will fit into the simulation.
App, Sim, World, Stage, and Scene
----------------------------------

.. figure:: ../../_static/setup/walkthrough_sim_stage_scene.svg
    :align: center
    :figwidth: 100%
    :alt: How the sim is organized.

The **World** is defined by the origin of a Cartesian coordinate system and the units that define it. How big or how small? How
near or how far? The answers to questions like these can only be defined *relative* to some contextual reference frame, and that
reference frame is what defines the world.
"Above" the world in structure is the **Sim**\ ulation and the **App**\ lication. The **Application** is "the thing responsible for
everything else": it governs all resource management as well as launching and destroying the simulation when we are done with it.
When we :ref:`launched training with the template<walkthrough_project_setup>`, the window that appears with the viewport of cartpoles
training is the Application window. The application is not defined by the GUI, however; even when running in headless mode, all
simulations have an application that governs them.
The **Simulation** controls the "rules" of the world. It defines the laws of physics, such as how time and gravity should work, and how frequently to perform
rendering. If the application holds the sim, then the sim holds the world. The simulation governs a single step through time by dividing it into many different
sub-steps, each devoted to a specific aspect of updating the state of the world. Many of the APIs in Isaac Lab are written to specifically hook into
these various steps and you will often see functions named like ``_pre_XYZ_step`` and ``_post_XYZ_step`` where ``XYZ_step`` is the name of one of these sub-steps of
the simulation, such as the ``physics_step`` or the ``render_step``.
"Below" the world in structure is the **Stage** and the **Scene**. If the world provides spatial context to the sim, then
the **Stage** provides the *compositional context* for the world. Suppose we want to simulate a table set for a meal in a room:
the room is the "world" in this case, and we choose the origin of the world to be one of the corners of the room. The position of the
table in the room is defined as a vector from the origin of the world to some point on the table that we choose to be the origin of a *new* coordinate
system, fixed to the table. It's not useful to us, *the agent*\ , to talk about the location of the food and the utensils on the table with respect to the
corner of the room: instead it is preferable to use the coordinates defined with respect to the table. However, the simulation needs to know
these global coordinates in order to properly simulate the next time step, so we must define how these two coordinate systems are *composed* together.
This is what the stage accomplishes: everything in the simulation is a `USD primitive <https://openusd.org/release/glossary.html#usdglossary-prim>`_ and the
stage represents the relationships between these primitives as a tree, with the context being defined by the relative path in the tree. Every prim on the stage
has a name and therefore a path in this tree, such as ``/room/table/food`` or ``/room/table/utensils``. Relationships are defined by the "parents" and "children"
of a given node in this tree: the ``table`` is a child of the ``room`` but a parent of ``food``. Compositional properties of the parent are applied to all of its
children, but child prims have the ability to override parent properties if necessary, as is often the case for materials.
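
Composition along the tree can be made concrete with a small sketch. It assumes each prim stores only a translation and a rotation about the vertical axis relative
to its parent (real USD transforms are more general), and the poses in ``LOCAL_POSES`` are invented for the table example. Walking from the root down to a prim and
composing each local pose recovers the global coordinates the simulation needs.

.. code-block:: python

    import math

    # Hypothetical local poses: prim path -> (x, y, z, yaw in radians), each
    # expressed in the frame of the parent prim. Values are made up.
    LOCAL_POSES = {
        "/room": (0.0, 0.0, 0.0, 0.0),
        "/room/table": (2.0, 3.0, 0.0, math.pi / 2),
        "/room/table/food": (0.5, 0.0, 0.8, 0.0),
        "/room/table/utensils": (0.5, 0.3, 0.8, 0.0),
    }


    def parent_path(path):
        """Return the parent prim path, or None for a root prim."""
        head = path.rsplit("/", 1)[0]
        return head if head else None


    def world_position(path):
        """Compose local poses from the root down to `path` (translation + yaw only)."""
        chain = []
        while path is not None:
            chain.append(LOCAL_POSES[path])
            path = parent_path(path)
        x = y = z = yaw = 0.0
        for lx, ly, lz, lyaw in reversed(chain):
            # Rotate the child's offset into the accumulated parent frame, then translate.
            x += math.cos(yaw) * lx - math.sin(yaw) * ly
            y += math.sin(yaw) * lx + math.cos(yaw) * ly
            z += lz
            yaw += lyaw
        return (x, y, z)


    print(world_position("/room/table/food"))  # roughly (2.0, 3.5, 0.8): the table is turned 90 degrees
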

.. figure:: ../../_static/setup/walkthrough_stage_context.svg
    :align: center
    :figwidth: 100%
    :alt: How the stage organizes context

Armed with this vocabulary, we can finally talk about the **Scene**, one of the most critical elements to understand about Isaac Lab. Deep learning, in
all its forms, is rooted in the analysis of data. This is true even in robot learning, where data is acquired through the sensors of the robot being trained.
The time required to set up the robot, collect data, and reset the robot to collect more is a fundamental bottleneck in teaching robots to do *anything*, with any method.
Isaac Sim gives us access to robots without the need for literal physical robots, but Isaac Lab gives us access to *vectorization*: the ability to simulate many copies
of a training procedure efficiently, thus multiplying the rate of data generation and accelerating training proportionally. The scene governs those primitives on the stage
that matter to this vectorization process, known as **simulation entities**.
Suppose the reason why you want to simulate a table set for a meal is because you would like to train a robot to place the table settings for you! The robot, the table,
and all the things on it can be registered to the scene of an environment. We can then specify how many copies we want and the scene will automatically
construct and run those copies on the stage. These copies are placed at new coordinates on the stage, defining a new reference frame from which observations
and rewards can be computed. Every copy of the scene exists on the stage and is being simulated by the same world. This is much more efficient
than running unique simulations for each copy, but it does open up the possibility of unwanted interactions between copies of the scene, so it's important
to keep this in mind while debugging.
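
Because every copy shares one stage, positions reported in the world frame must be shifted into each copy's own frame before observations or rewards are computed.
Here is a minimal sketch, assuming the copies differ only by a translation (no rotation) and using made-up origins and positions.

.. code-block:: python

    # Hypothetical origins of three scene copies on the stage (world frame).
    env_origins = [(0.0, 0.0, 0.0), (0.0, 4.0, 0.0), (4.0, 0.0, 0.0)]

    # Hypothetical world-frame positions of each copy's robot.
    robot_pos_w = [(0.5, 0.0, 0.1), (0.5, 4.0, 0.1), (4.5, 0.0, 0.1)]


    def to_env_frame(pos_w, origin):
        """Express a world-frame position relative to its environment's origin."""
        return tuple(p - o for p, o in zip(pos_w, origin))


    robot_pos_env = [to_env_frame(p, o) for p, o in zip(robot_pos_w, env_origins)]
    print(robot_pos_env)  # every robot sits at the same spot within its own copy
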
Now that we have a grasp on the mechanics, we can take a look at the code generated for our template project!
.. _walkthrough:
Walkthrough
========================
So you finished installing Isaac Sim and Isaac Lab, and you verified that everything is working as expected...
Now what?
The following walkthrough will guide you through setting up an Isaac Lab extension project, adding a new robot to Isaac Lab, designing an environment, and training a policy for that robot.
For this walkthrough, we will be starting with the Jetbot, a simple two-wheeled differential-drive robot with a camera mounted on top, but the intent is for these guides to be general enough that you can use them to add your own robots and environments to Isaac Lab!
The end result of this walkthrough can be found in our tutorial project repository `here <https://github.com/isaac-sim/IsaacLabTutorial/tree/main>`_. Each branch of this repository
represents a different stage of modifying the default template project to achieve our goals.

.. toctree::
    :maxdepth: 1
    :titlesonly:

    project_setup
    concepts_env_design
    api_env_design
    technical_env_design
    training_jetbot_gt
    training_jetbot_reward_exploration

.. _walkthrough_project_setup:
Isaac Lab Project Setup
========================
The best way to create a new project is to use the :ref:`Template Generator<template-generator>`. Generating the template
for this tutorial series is done by calling the ``isaaclab.sh`` script from the root directory of the Isaac Lab repository:

.. code-block:: bash

    ./isaaclab.sh --new

Be sure to select ``External`` and ``Direct | single agent``. For the frameworks, select ``skrl`` and both ``PPO`` and ``AMP`` on the following menu. You can
select other frameworks if you like, but this tutorial will detail ``skrl`` specifically. The configuration process for other frameworks is similar. You
can get a copy of this code directly by checking out the `initial branch of the tutorial repository <https://github.com/isaac-sim/IsaacLabTutorial/tree/initial>`_!
This will create an extension project with the specified name at the chosen path. For this tutorial, we chose the name ``isaac_lab_tutorial``.

.. note::

    The template generator expects the project name to respect "snake_case": all lowercase with underscores separating words. However, we have renamed the
    sample project to "IsaacLabTutorial" to more closely match the naming conventions of GitHub and our other projects. If you are following along with the example
    repository, note this minor difference, as some superficial path names may change. If you are following along by building the project yourself, then you can ignore this note.

Next, we must install the project as a python module. Navigate to the directory that was just created
(it will contain the ``source`` and ``scripts`` directories for the project) and then run the following to install the module.

.. code-block:: bash

    python -m pip install -e source/isaac_lab_tutorial

To verify that things have been set up properly, run

.. code-block:: bash

    python scripts/list_envs.py

from the root directory of your new project. This should generate a table that looks something like the following
.. code-block:: bash
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Available Environments in Isaac Lab |
+--------+---------------------------------------+-----------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| S. No. | Task Name | Entry Point | Config |
+--------+---------------------------------------+-----------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
| 1 | Template-Isaac-Lab-Tutorial-Direct-v0 | isaac_lab_tutorial.tasks.direct.isaac_lab_tutorial.isaac_lab_tutorial_env:IsaacLabTutorialEnv | isaac_lab_tutorial.tasks.direct.isaac_lab_tutorial.isaac_lab_tutorial_env_cfg:IsaacLabTutorialEnvCfg |
+--------+---------------------------------------+-----------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------+
We can now use the task name to run the environment.

.. code-block:: bash

    python scripts/skrl/train.py --task=Template-Isaac-Lab-Tutorial-Direct-v0

and by default, this should start a cartpole training environment.
Let the training finish and then run the following command to see the trained policy in action!

.. code-block:: bash

    python scripts/skrl/play.py --task=Template-Isaac-Lab-Tutorial-Direct-v0

Notice that you did not need to specify the path for the checkpoint file! This is because Isaac Lab handles many of the minute details
like checkpoint saving, loading, and logging. In this case, the ``train.py`` script will create two directories: **logs** and **output**, which are
used as the default output directories for tasks run by this project.
Project Structure
------------------------------
There are four nested structures you need to be aware of when working in the direct workflow with an Isaac Lab template
project: the **Project**, the **Extension**, the **Modules**, and the **Task**.

.. figure:: ../../_static/setup/walkthrough_project_setup.svg
    :align: center
    :figwidth: 100%
    :alt: The structure of the isaac lab template project.

The **Project** is the root directory of the generated template. It contains the source and scripts directories, as well as
a ``README.md`` file. When we created the template, we named the project *IsaacLabTutorial* and this defined the root directory
of a git repository. If you examine the project root with hidden files visible you will see a number of files defining
the behavior of the project with respect to git. The ``scripts`` directory contains the ``train.py`` and ``play.py`` scripts for the
various RL libraries you chose when generating the template, while the source directory contains the python packages for the project.
The **Extension** is the name of the Python package we installed via pip. By default, the template generates a project
with a single extension of the same name. A project can have multiple extensions (as evidenced by the Isaac Lab repository itself!),
and so they are kept in a common ``source`` directory. Traditional Python packages are defined by the presence of a ``pyproject.toml`` file that describes the package
metadata, but packages using Isaac Lab must also be Isaac Sim extensions and so require a ``config`` directory and an accompanying
``extension.toml`` file that describes the metadata needed by the Isaac Sim extension manager. Finally, because the template
is intended to be installed via pip, it needs a ``setup.py`` file to complete the setup procedure using the ``extension.toml``
config.
The **Modules** are what actually get loaded by Isaac Lab to run training (the meat of the code). By default, the template
generates an extension with a single module that is named the same as the project. The structure of the various sub-modules
in the extension is what determines the ``entry_point`` for an environment in Isaac Lab. This is why our template project needed
to be installed before we could call ``train.py``: the path to the necessary components to run the task needed to be exposed
to python for Isaac Lab to find them.
Finally, the **Task** is the heart of the direct workflow. By default, the template generates a single task with the same name
as the project. The environment and configuration files are stored here, as well as placeholder, RL-library-dependent configurations in ``agents``.
Critically, note the contents of the ``__init__.py``! Specifically, the ``gym.register`` function needs to be called at least once
before an environment and task can be used with the Isaac Lab ``train.py`` and ``play.py`` scripts.
This function should be included in one of the module ``__init__.py`` files so it is called when the package is imported. The path to
this init file is what defines the entry point for the task!
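
Registration boils down to mapping a task name to a ``"package.module:ClassName"`` string that is only imported when the task is created. The sketch below imitates
that pattern with a tiny registry. ``register``, ``make``, and ``load_entry_point`` here are hypothetical stand-ins, not Gymnasium's functions. The entry point is
resolved with ``importlib``, and a standard-library class stands in for the environment so the example runs anywhere.

.. code-block:: python

    from __future__ import annotations

    import importlib

    # Hypothetical registry: task id -> (entry point string, keyword arguments)
    _REGISTRY: dict[str, tuple[str, dict]] = {}


    def register(task_id: str, entry_point: str, kwargs: dict | None = None) -> None:
        """Record a task without importing its module yet."""
        _REGISTRY[task_id] = (entry_point, kwargs or {})


    def load_entry_point(entry_point: str):
        """Resolve a "package.module:Attribute" string to the attribute it names."""
        module_name, attr_name = entry_point.split(":")
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)


    def make(task_id: str, **overrides):
        """Look up a registered task, import its class, and instantiate it."""
        entry_point, kwargs = _REGISTRY[task_id]
        cls = load_entry_point(entry_point)
        return cls(**{**kwargs, **overrides})


    # Stand-in task: a stdlib class instead of an Isaac Lab environment.
    register(task_id="Toy-Counter-v0", entry_point="collections:Counter", kwargs={"a": 1})
    print(make("Toy-Counter-v0", b=2))
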
For the template, ``gym.register`` is called within ``isaac_lab_tutorial/source/isaac_lab_tutorial/isaac_lab_tutorial/tasks/direct/isaac_lab_tutorial/__init__.py``.
The repeated name is a consequence of needing default names for the template, but now we can see the structure of the project.
**Project**/source/**Extension**/**Module**/tasks/direct/**Task**/__init__.py
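
The same pattern can be read programmatically: the directories after ``source/<Extension>/`` form the dotted module path that prefixes the entry points listed by
``list_envs.py``. A short sketch using ``pathlib``, where the helper ``task_module_from_init`` is hypothetical and the path is the one given above:

.. code-block:: python

    from pathlib import PurePosixPath

    INIT_FILE = (
        "isaac_lab_tutorial/source/isaac_lab_tutorial/"
        "isaac_lab_tutorial/tasks/direct/isaac_lab_tutorial/__init__.py"
    )


    def task_module_from_init(init_file: str) -> str:
        """Map Project/source/Extension/<module parts>/__init__.py to a dotted module path."""
        parts = PurePosixPath(init_file).parts
        source_idx = parts.index("source")
        # Skip the Project, "source", and the Extension directory; drop the trailing __init__.py.
        module_parts = parts[source_idx + 2 : -1]
        return ".".join(module_parts)


    print(task_module_from_init(INIT_FILE))  # prints: isaac_lab_tutorial.tasks.direct.isaac_lab_tutorial
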
.. _walkthrough_training_jetbot_reward_exploration:
Exploring the RL problem
=========================
The command to the Jetbot is a unit vector specifying the desired drive direction, and we must make the agent aware of this somehow
so it can adjust its actions accordingly. There are many possible ways to do this, with the "zeroth order" approach being to simply change the observation space to include
this command. To start, **edit** ``IsaacLabTutorialEnvCfg`` **to set the observation space to 9**: the world velocity vector contains the linear and angular velocities
of the robot, which is 6 dimensions, and if we append the command to this vector, that's 9 dimensions for the observation space in total.
Next, we just need to do that appending when we get the observations. We also need to calculate our forward vectors for later use. The forward vector for the Jetbot is
the x axis, so we apply the ``root_link_quat_w`` to ``[1,0,0]`` to get the forward vector in the world frame. Replace the ``_get_observations`` method with the following:

.. code-block:: python

    def _get_observations(self) -> dict:
        # assumes `import isaaclab.utils.math as math_utils` at the top of the file
        self.velocity = self.robot.data.root_com_vel_w
        self.forwards = math_utils.quat_apply(self.robot.data.root_link_quat_w, self.robot.data.FORWARD_VEC_B)
        obs = torch.hstack((self.velocity, self.commands))
        observations = {"policy": obs}
        return observations

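
To see why applying the root quaternion to ``[1, 0, 0]`` yields the forward direction, here is a plain-Python sketch of rotating a vector by a unit quaternion.
It assumes the ``(w, x, y, z)`` component order, and ``yaw_quat`` and ``rotate_by_quat`` are illustrative helpers. In the environment you would keep using
``math_utils.quat_apply`` on batched tensors.

.. code-block:: python

    import math


    def yaw_quat(yaw: float) -> tuple[float, float, float, float]:
        """Unit quaternion (w, x, y, z) for a rotation of `yaw` radians about the z axis."""
        return (math.cos(yaw / 2), 0.0, 0.0, math.sin(yaw / 2))


    def rotate_by_quat(q, v):
        """Rotate 3D vector v by unit quaternion q = (w, x, y, z)."""
        w, x, y, z = q
        # t = 2 * (q_vec x v)
        tx = 2.0 * (y * v[2] - z * v[1])
        ty = 2.0 * (z * v[0] - x * v[2])
        tz = 2.0 * (x * v[1] - y * v[0])
        # v' = v + w * t + q_vec x t
        return (
            v[0] + w * tx + (y * tz - z * ty),
            v[1] + w * ty + (z * tx - x * tz),
            v[2] + w * tz + (x * ty - y * tx),
        )


    FORWARD_VEC_B = (1.0, 0.0, 0.0)  # body-frame forward axis, as described above

    # A robot yawed 90 degrees to the left faces along the world +y axis.
    forward_w = rotate_by_quat(yaw_quat(math.pi / 2), FORWARD_VEC_B)
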
So now what should the reward be?
When the robot is behaving as desired, it will be driving at full speed in the direction of the command. If we reward both
"driving forward" and "alignment to the command", then maximizing that combined signal should result in driving to the command... right?
Let's give it a try! Replace the ``_get_rewards`` method with the following:

.. code-block:: python

    def _get_rewards(self) -> torch.Tensor:
        forward_reward = self.robot.data.root_com_lin_vel_b[:, 0].reshape(-1, 1)
        alignment_reward = torch.sum(self.forwards * self.commands, dim=-1, keepdim=True)
        total_reward = forward_reward + alignment_reward
        return total_reward

The ``forward_reward`` is the x component of the linear center of mass velocity of the robot in the body frame. We know that
the x direction is the forward direction for the asset, so this should be equivalent to the inner product between the forward vector and
the linear velocity in the world frame. The alignment term is the inner product between the forward vector and the command vector: when they are
pointing in the same direction this term will be 1, but in the opposite direction it will be -1. We add them together to get the combined reward and
we can finally run training! Let's see what happens!

.. code-block:: bash

    python scripts/skrl/train.py --task=Template-Isaac-Lab-Tutorial-Direct-v0


.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_naive_webp.webp
    :align: center
    :figwidth: 100%
    :alt: Naive results

Surely we can do better!
Reward and Observation Tuning
-------------------------------
When tuning an environment for training, as a rule of thumb, you want to keep the observation space as small as possible. This is to
reduce the number of parameters in the model (the literal interpretation of Occam's razor) and thus improve training time. In this case we
need to somehow encode our alignment to the command and our forward speed. One way to do this is to exploit the dot and cross products
from linear algebra! Replace the contents of ``_get_observations`` with the following (this observation has only three components, so also set the
observation space in ``IsaacLabTutorialEnvCfg`` to 3):

.. code-block:: python

    def _get_observations(self) -> dict:
        self.velocity = self.robot.data.root_com_vel_w
        self.forwards = math_utils.quat_apply(self.robot.data.root_link_quat_w, self.robot.data.FORWARD_VEC_B)

        dot = torch.sum(self.forwards * self.commands, dim=-1, keepdim=True)
        cross = torch.cross(self.forwards, self.commands, dim=-1)[:, -1].reshape(-1, 1)
        forward_speed = self.robot.data.root_com_lin_vel_b[:, 0].reshape(-1, 1)
        obs = torch.hstack((dot, cross, forward_speed))

        observations = {"policy": obs}
        return observations

The dot or inner product tells us how aligned two vectors are as a single scalar quantity. If they are very aligned and pointed in the same direction, then the inner
product will be large and positive, but if they are aligned and in opposite directions, it will be large and negative. If two vectors are
perpendicular, the inner product is zero. This means that the inner product between the forward vector and the command vector can tell us
how much we are facing towards or away from the command, but not which direction we need to turn to improve alignment.
The cross product also tells us how aligned two vectors are, but it expresses this relationship as a vector. The cross product between any
two vectors defines an axis that is perpendicular to the plane containing the two argument vectors, where the direction of the result vector along this axis is
determined by the chirality (dimension ordering, or handedness) of the coordinate system. In our case, we can exploit the fact that we are operating in 2D to only
examine the z component of the result of :math:`\vec{forward} \times \vec{command}`. This component will be zero if the vectors are collinear, positive if the
command vector is to the left of forward, and negative if it's to the right.
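
The dot and cross terms can be checked by hand for a single robot. The sketch below is a pure-Python stand-in for the batched tensor code above. It assumes a planar
robot whose forward vector is ``[1, 0, 0]``, and the helper names and test commands are hypothetical.

.. code-block:: python

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))


    def cross_z(a, b):
        """z component of a x b, the only nonzero one when a and b lie in the xy-plane."""
        return a[0] * b[1] - a[1] * b[0]


    def observation(forward, command, forward_speed):
        """Mirror of the tuned observation: [alignment, turn direction, forward speed]."""
        return [dot(forward, command), cross_z(forward, command), forward_speed]


    forward = (1.0, 0.0, 0.0)
    commands = {
        "ahead": (1.0, 0.0, 0.0),
        "left": (0.0, 1.0, 0.0),
        "right": (0.0, -1.0, 0.0),
        "behind": (-1.0, 0.0, 0.0),
    }
    for name, command in commands.items():
        print(name, observation(forward, command, forward_speed=0.5))

A positive second entry means "turn left" and a negative one means "turn right", which the dot product alone cannot express.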
Finally, the x component of the center of mass linear velocity tells us our forward speed, with positive being forward and negative being backwards. We stack these together
"horizontally" (along dim 1) to generate the observations for each Jetbot. This alone improves performance!

.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_improved_webp.webp
    :align: center
    :figwidth: 100%
    :alt: Improved results

It seems to qualitatively train better, and the Jetbots are somewhat inching forward... Surely we can do better still!
Another rule of thumb for training is to reduce and simplify the reward function as much as possible. Terms in the reward behave similarly to
the logical "OR" operation. In our case, we are rewarding driving forward and being aligned to the command by adding them together, so our agent
can be rewarded for driving forward OR being aligned to the command. To force the agent to learn to drive in the direction of the command, we should only
reward the agent driving forward AND being aligned. Logical AND suggests multiplication and therefore the following reward function:

.. code-block:: python

    def _get_rewards(self) -> torch.Tensor:
        forward_reward = self.robot.data.root_com_lin_vel_b[:, 0].reshape(-1, 1)
        alignment_reward = torch.sum(self.forwards * self.commands, dim=-1, keepdim=True)
        total_reward = forward_reward * alignment_reward
        return total_reward

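
Comparing the two reward forms on a few hand-picked cases makes the OR/AND distinction concrete. The speeds and alignments below are hypothetical. Speed is
measured along the robot's forward axis, and alignment is the dot product between the forward and command vectors.

.. code-block:: python

    def rewards(forward_speed: float, alignment: float) -> tuple[float, float]:
        """Return (additive, multiplicative) rewards for one robot."""
        additive = forward_speed + alignment  # rewards speed OR alignment
        multiplicative = forward_speed * alignment  # rewards speed AND alignment
        return additive, multiplicative


    # (description, forward speed, alignment) for a few hypothetical situations
    cases = [
        ("drives forward, perpendicular to the command", 1.0, 0.0),
        ("aligned with the command, standing still", 0.0, 1.0),
        ("drives forward toward the command", 1.0, 1.0),
    ]

    for name, speed, alignment in cases:
        additive, multiplicative = rewards(speed, alignment)
        print(f"{name}: additive={additive}, multiplicative={multiplicative}")

Only the last case earns a multiplicative reward, while the additive form pays out for each behavior on its own.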
Now we will only get rewarded for driving forward if our alignment reward is nonzero. Let's see what kind of result this produces!

.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_tuned_webp.webp
    :align: center
    :figwidth: 100%
    :alt: Tuned results

It definitely trains faster, but the Jetbots have learned to drive in reverse if the command is pointed behind them. This may be desirable in our
case, but it shows just how dependent the policy behavior is on the reward function. In this case, there are **degenerate solutions** to our
reward function: the reward is maximized for driving forward and aligned to the command, but if the Jetbot drives in reverse, then the forward
term is negative, and if it's driving in reverse towards the command, then the alignment term is **also negative**, meaning that the reward is positive!
When you design your own environments, you will run into degenerate solutions like this and a significant amount of reward engineering is devoted to
suppressing or supporting these behaviors by modifying the reward function.
Let's say, in our case, we don't want this behavior. The alignment term has a domain of ``[-1, 1]``, but we would much prefer it to be mapped
only to positive values. We don't want to *eliminate* the sign on the alignment term; rather, we would like large negative values to be mapped to small positive values, so that if we
are misaligned, we get little reward. The exponential function accomplishes this! It maps ``[-1, 1]`` to roughly ``[0.37, 2.72]``, so the alignment factor is always
positive and the sign of the total reward follows the forward speed alone.

.. code-block:: python

    def _get_rewards(self) -> torch.Tensor:
        forward_reward = self.robot.data.root_com_lin_vel_b[:, 0].reshape(-1, 1)
        alignment_reward = torch.sum(self.forwards * self.commands, dim=-1, keepdim=True)
        total_reward = forward_reward * torch.exp(alignment_reward)
        return total_reward

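
The degenerate case and the effect of the exponential can be checked numerically. This is a scalar sketch with hypothetical speeds and alignments that mirrors the
product reward and the exponential reward shown above.

.. code-block:: python

    import math


    def product_reward(speed: float, alignment: float) -> float:
        return speed * alignment


    def exp_reward(speed: float, alignment: float) -> float:
        # exp() maps alignment in [-1, 1] to roughly [0.37, 2.72], so it never flips the sign of speed
        return speed * math.exp(alignment)


    # Hypothetical case: command directly behind (alignment -1), robot reversing toward it (speed -1).
    reverse = (-1.0, -1.0)
    # Hypothetical case: robot turned around, driving forward toward the command.
    forward = (1.0, 1.0)

    for name, (speed, alignment) in [("reverse", reverse), ("forward", forward)]:
        print(name, product_reward(speed, alignment), round(exp_reward(speed, alignment), 3))

Under the product form, reversing is rewarded just as much as driving forward; under the exponential form, it becomes a penalty.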
Now when we train, the Jetbots will turn to always drive towards the command in the forward direction!

.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/walkthrough_directed_webp.webp
    :align: center
    :figwidth: 100%
    :alt: Directed results
