Commit e6cf93ef authored by pulkitg01, committed by Kelly Guo

Adds tutorial for training & validating HOVER policy using Isaac Lab (#320)

# Description


This MR adds a tutorial for training and validating the HOVER policy
(already publicly released at https://github.com/NVlabs/HOVER) in Isaac
Lab.


## Type of change


- Added a tutorial to the documentation outlining the steps for training
and evaluation.

## Screenshots


![image](https://github.com/user-attachments/assets/44fff773-2484-499d-b3e9-8b6d54efc387)

![image](https://github.com/user-attachments/assets/478e6908-31e1-4873-b338-4182847992a7)

![image](https://github.com/user-attachments/assets/77056f8b-b4c5-4f97-be90-bb9bd75eb4c0)

![image](https://github.com/user-attachments/assets/97432a3c-6959-4b18-a2ac-cc9a39043d54)



## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

parent f4c64b60
......@@ -87,6 +87,7 @@ Guidelines for modifications:
* Oyindamola Omotuyi
* Özhan Özen
* Peter Du
* Pulkit Goyal
* Qian Wan
* Qinxi Yu
* Rafael Wiltz
......
......@@ -111,6 +111,7 @@ Table of Contents
source/tutorials/index
source/how-to/index
source/deployment/index
source/policy_deployment/index
.. toctree::
:maxdepth: 1
......
Training & Deploying HOVER Policy
=================================

This tutorial shows how to train and deploy HOVER, a whole-body control (WBC) policy for humanoid robots, in the Isaac Lab simulation environment.
It uses the `HOVER`_ repository, which provides an Isaac Lab extension for training neural whole-body control policies for humanoids, as described in the `HOVER Paper`_ and the `OMNIH2O Paper`_.
For video demonstrations and more details about the project, visit the `HOVER Project Website`_ and the `OMNIH2O Project Website`_.

.. figure:: ../../_static/policy_deployment/00_hover/hover_training_robots.png
    :align: center
    :figwidth: 100%
    :alt: visualization of training the policy

Installation
------------

.. note::

    This tutorial is for Linux only.

1. Install Isaac Lab following the instructions in the `Isaac Lab Installation Guide`_.

2. Define the following environment variable to specify the path to your Isaac Lab installation:

   .. code-block:: bash

       # Set the ISAACLAB_PATH environment variable to point to your Isaac Lab installation directory
       export ISAACLAB_PATH=<your_isaac_lab_path>

3. Clone the `HOVER`_ repository and its submodules in your workspace.

   .. code-block:: bash

       git clone --recurse-submodules https://github.com/NVlabs/HOVER.git

4. Install the dependencies.

   .. code-block:: bash

       cd HOVER
       ./install_deps.sh
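
The commands in this tutorial reference the launcher as ``${ISAACLAB_PATH:?}/isaaclab.sh``; the ``:?`` form is standard shell parameter expansion that aborts with an error when the variable is unset or empty. Before starting a long training run, it can help to confirm that the variable also points at a directory that contains the launcher. The following is a minimal sketch of such a check. The function name ``check_isaaclab_path`` is hypothetical and is not part of Isaac Lab or HOVER.

```shell
# Hypothetical helper (not part of Isaac Lab or HOVER): verify that
# ISAACLAB_PATH is set and contains an executable isaaclab.sh launcher.
check_isaaclab_path() {
    if [ -z "${ISAACLAB_PATH:-}" ]; then
        echo "ISAACLAB_PATH is not set" >&2
        return 1
    fi
    if [ ! -x "${ISAACLAB_PATH}/isaaclab.sh" ]; then
        echo "No executable isaaclab.sh found in ${ISAACLAB_PATH}" >&2
        return 1
    fi
    echo "Using Isaac Lab at ${ISAACLAB_PATH}"
}
```

Calling ``check_isaaclab_path`` after the ``export`` in step 2 prints the resolved path on success and returns a non-zero status otherwise.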

Training the Policy
-------------------

Dataset
~~~~~~~

Refer to the `HOVER Dataset`_ repository for the steps to obtain and process data for training the policy.

Training the teacher policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Execute the following command from the ``HOVER`` directory to train the teacher policy.

.. code-block:: bash

    ${ISAACLAB_PATH:?}/isaaclab.sh -p scripts/rsl_rl/train_teacher_policy.py \
        --num_envs 1024 \
        --reference_motion_path neural_wbc/data/data/motions/stable_punch.pkl \
        --headless

The teacher policy is trained for 10000000 iterations, or until the user interrupts the training.
The resulting checkpoint is stored in ``neural_wbc/data/data/policy/h1:teacher/`` and the filename is ``model_<iteration_number>.pt``.
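
Because checkpoints follow the ``model_<iteration_number>.pt`` naming pattern, the most recent one can be picked by comparing the numeric part of the filename. The sketch below assumes only that naming pattern; ``latest_checkpoint`` is a hypothetical helper, not a script shipped with HOVER.

```shell
# Hypothetical helper: print the model_<iteration_number>.pt file with the
# highest iteration number found in the given directory.
latest_checkpoint() {
    dir="$1"
    best=-1
    best_file=""
    for f in "$dir"/model_*.pt; do
        [ -e "$f" ] || continue            # glob matched nothing
        name=${f##*/}                      # strip the directory part
        iter=${name#model_}                # strip the "model_" prefix
        iter=${iter%.pt}                   # strip the ".pt" suffix
        case "$iter" in
            ''|*[!0-9]*) continue ;;       # skip non-numeric suffixes
        esac
        if [ "$iter" -gt "$best" ]; then
            best=$iter
            best_file=$name
        fi
    done
    [ -n "$best_file" ] || return 1        # no checkpoint found
    echo "$best_file"
}
```

For example, ``latest_checkpoint neural_wbc/data/data/policy/h1:teacher`` prints the filename to pass as the checkpoint argument in later commands, and returns a non-zero status if the directory holds no matching file.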

Training the student policy
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Execute the following command from the ``HOVER`` directory to train the student policy using the teacher policy checkpoint.

.. code-block:: bash

    ${ISAACLAB_PATH:?}/isaaclab.sh -p scripts/rsl_rl/train_student_policy.py \
        --num_envs 1024 \
        --reference_motion_path neural_wbc/data/data/motions/stable_punch.pkl \
        --teacher_policy.resume_path neural_wbc/data/data/policy/h1:teacher \
        --teacher_policy.checkpoint model_<iteration_number>.pt \
        --headless

This assumes that you have already trained the teacher policy, because the repository does not provide one.

Refer to these sections of the HOVER repository for more details about training configurations:

- `General Remarks for Training`_
- `Generalist vs Specialist Policy`_

Testing the trained policy
--------------------------

Play teacher policy
~~~~~~~~~~~~~~~~~~~

Execute the following command from the ``HOVER`` directory to play the trained teacher policy checkpoint.

.. code-block:: bash

    ${ISAACLAB_PATH:?}/isaaclab.sh -p scripts/rsl_rl/play.py \
        --num_envs 10 \
        --reference_motion_path neural_wbc/data/data/motions/stable_punch.pkl \
        --teacher_policy.resume_path neural_wbc/data/data/policy/h1:teacher \
        --teacher_policy.checkpoint model_<iteration_number>.pt

Play student policy
~~~~~~~~~~~~~~~~~~~

Execute the following command from the ``HOVER`` directory to play the trained student policy checkpoint.

.. code-block:: bash

    ${ISAACLAB_PATH:?}/isaaclab.sh -p scripts/rsl_rl/play.py \
        --num_envs 10 \
        --reference_motion_path neural_wbc/data/data/motions/stable_punch.pkl \
        --student_player \
        --student_path neural_wbc/data/data/policy/h1:student \
        --student_checkpoint model_<iteration_number>.pt

Evaluate the trained policy
---------------------------

Evaluate the trained policy checkpoint in the Isaac Lab environment.
The evaluation iterates through all the reference motions included in the dataset specified by the ``--reference_motion_path`` option and exits when all motions are evaluated. Randomization is turned off during evaluation.

Refer to the `HOVER Evaluation`_ repository for more details about the evaluation pipeline and the metrics used.

The evaluation script, ``scripts/rsl_rl/eval.py``, uses the same arguments as the play script, ``scripts/rsl_rl/play.py``. You can use it for both teacher and student policies.

.. code-block:: bash

    ${ISAACLAB_PATH:?}/isaaclab.sh -p scripts/rsl_rl/eval.py \
        --num_envs 10 \
        --teacher_policy.resume_path neural_wbc/data/data/policy/h1:teacher \
        --teacher_policy.checkpoint model_<iteration_number>.pt
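
The command above evaluates a teacher policy. Since the evaluation script is described as taking the same arguments as the play script, a student policy would presumably be selected with the ``--student_player``, ``--student_path`` and ``--student_checkpoint`` flags shown in the play section. The sketch below only assembles those arguments and does not launch anything. ``policy_args`` is a hypothetical helper, and applying the student flags to ``eval.py`` is an inference from the text rather than something verified here.

```shell
# Hypothetical helper: print the policy-selection arguments, one per line,
# using the flags shown in the play commands of this tutorial.
policy_args() {
    kind="$1"; dir="$2"; ckpt="$3"
    case "$kind" in
        teacher)
            printf '%s\n' --teacher_policy.resume_path "$dir" \
                          --teacher_policy.checkpoint "$ckpt" ;;
        student)
            printf '%s\n' --student_player \
                          --student_path "$dir" \
                          --student_checkpoint "$ckpt" ;;
        *)
            echo "unknown policy kind: $kind" >&2
            return 1 ;;
    esac
}

# Assumed usage (paths without spaces, so word splitting is safe):
# ${ISAACLAB_PATH:?}/isaaclab.sh -p scripts/rsl_rl/eval.py --num_envs 10 \
#     $(policy_args student neural_wbc/data/data/policy/h1:student "$CKPT")
```

Here ``CKPT`` stands for the checkpoint filename you want to evaluate.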

Validation of the policy
------------------------

The policy trained in Isaac Lab can be validated in another simulation environment or on the real robot.

.. figure:: ../../_static/policy_deployment/00_hover/hover_stable_wave.png
    :align: center
    :width: 100%

    Stable Wave - Mujoco (left) & Real Robot (right)

Sim-to-Sim Validation
~~~~~~~~~~~~~~~~~~~~~

Use the provided `Mujoco Environment`_ to conduct sim-to-sim validation of the trained policy. To run the sim-to-sim evaluation:

.. code-block:: bash

    ${ISAACLAB_PATH:?}/isaaclab.sh -p neural_wbc/inference_env/scripts/eval.py \
        --num_envs 1 \
        --headless \
        --student_path neural_wbc/data/data/policy/h1:student/ \
        --student_checkpoint model_<iteration_number>.pt

Note that the ``mujoco_wrapper`` supports only one environment at a time. For reference, it takes up to 5 hours to evaluate 8k reference motions. The ``inference_env`` is designed for maximum versatility.
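
To budget time for a smaller motion set, the reference figure can be scaled linearly. Taking "8k" as 8000 motions (an assumption), 5 hours is 18000 seconds, or at most about 2.25 seconds per motion. The sketch below applies that rate. ``estimate_eval_hours`` is a hypothetical helper, and the result is a rough upper bound that ignores hardware differences and varying motion lengths.

```shell
# Hypothetical helper: upper-bound estimate of sim-to-sim evaluation time,
# scaled linearly from the "5 hours for 8k motions" reference figure.
# Assumes "8k" means 8000 motions.
estimate_eval_hours() {
    awk -v n="$1" 'BEGIN { printf "%.2f\n", n * 5 / 8000 }'
}
```

For example, ``estimate_eval_hours 1600`` scales the reference figure down to a 1600-motion dataset.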

Sim-to-Real Deployment
~~~~~~~~~~~~~~~~~~~~~~

For sim-to-real deployment, we provide a `Hardware Environment`_ for the `Unitree H1 Robot`_.
Detailed steps for setting up a sim-to-real deployment workflow are explained in the `README of Sim2Real deployment`_.

To deploy the trained policy on the H1 robot:

.. code-block:: bash

    ${ISAACLAB_PATH:?}/isaaclab.sh -p neural_wbc/inference_env/scripts/s2r_player.py \
        --student_path neural_wbc/data/data/policy/h1:student/ \
        --student_checkpoint model_<iteration_number>.pt \
        --reference_motion_path neural_wbc/data/data/motions/<motion_name>.pkl \
        --robot unitree_h1 \
        --max_iterations 5000 \
        --num_envs 1 \
        --headless

.. note::

    The sim-to-real deployment wrapper currently only supports the Unitree H1 robot. It can be extended to other robots by implementing the corresponding hardware wrapper interface.

.. _Isaac Lab Installation Guide: https://isaac-sim.github.io/IsaacLab/v2.0.0/source/setup/installation/index.html
.. _HOVER: https://github.com/NVlabs/HOVER
.. _HOVER Dataset: https://github.com/NVlabs/HOVER/?tab=readme-ov-file#data-processing
.. _HOVER Evaluation: https://github.com/NVlabs/HOVER/?tab=readme-ov-file#evaluation
.. _General Remarks for Training: https://github.com/NVlabs/HOVER/?tab=readme-ov-file#general-remarks-for-training
.. _Generalist vs Specialist Policy: https://github.com/NVlabs/HOVER/?tab=readme-ov-file#generalist-vs-specialist-policy
.. _HOVER Paper: https://arxiv.org/abs/2410.21229
.. _HOVER Project Website: https://hover-versatile-humanoid.github.io/
.. _OMNIH2O Paper: https://arxiv.org/abs/2406.08858
.. _OMNIH2O Project Website: https://omni.human2humanoid.com/
.. _README of Sim2Real deployment: https://github.com/NVlabs/HOVER/blob/main/neural_wbc/hw_wrappers/README.md
.. _Hardware Environment: https://github.com/NVlabs/HOVER/blob/main/neural_wbc/hw_wrappers/README.md
.. _Mujoco Environment: https://github.com/NVlabs/HOVER/tree/main/neural_wbc/mujoco_wrapper
.. _Unitree H1 Robot: https://unitree.com/h1
Deploying a Policy Trained in Isaac Lab
=======================================

Welcome to the Policy Deployment Guide! This section provides examples of training policies in Isaac Lab and deploying them to both simulation and real robots.

Below, you’ll find detailed examples of training and deploying various policies, along with essential configuration details.

.. toctree::
    :maxdepth: 1

    00_hover/hover_policy