Commit d71f9b7b authored by shauryadNv, committed by Kelly Guo

Adds stack environment, scripts for Cosmos, and visual robustness evaluation (#395)

Changes:
1. Adds a new Franka cube stacking visuomotor environment as per Cosmos
requirements: higher resolution and multi-modality support.
2. Adds scripts for data pre-processing and post-processing before and
after Cosmos augmentation respectively.
3. Adds evaluation of trained visuomotor policies for robustness to
visual changes using domain randomization.
4. Makes task termination checks more strict for the Franka cube
stacking task.
5. Adds new documentation for the Cosmos imitation learning pipeline.

- New feature (non-breaking change which adds functionality)
- This change requires a documentation update

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------
Signed-off-by: rwiltz <165190220+rwiltz@users.noreply.github.com>
Signed-off-by: Kelly Guo <kellyguo123@hotmail.com>
Signed-off-by: Ashwin Varghese Kuruttukulam <123109010+ashwinvkNV@users.noreply.github.com>
Signed-off-by: Kelly Guo <kellyg@nvidia.com>
Signed-off-by: Michael Gussert <michael@gussert.com>
Signed-off-by: samibouziri <79418773+samibouziri@users.noreply.github.com>
Signed-off-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Signed-off-by: Kyle Morgenstein <34984693+KyleM73@users.noreply.github.com>
Signed-off-by: Hongyu Li <lihongyu0807@icloud.com>
Signed-off-by: Toni-SM <toni.semu@gmail.com>
Signed-off-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
Signed-off-by: Pascal Roth <57946385+pascal-roth@users.noreply.github.com>
Signed-off-by: Victor Khaustov <3192677+vi3itor@users.noreply.github.com>
Signed-off-by: AlvinC <alvincny529@gmail.com>
Signed-off-by: Tyler Lum <tylergwlum@gmail.com>
Signed-off-by: Miguel Alonso Jr. <76960110+miguelalonsojr@users.noreply.github.com>
Signed-off-by: renaudponcelet <renaud.poncelet@gmail.com>
Co-authored-by: jaczhangnv <jaczhang@nvidia.com>
Co-authored-by: rwiltz <165190220+rwiltz@users.noreply.github.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Co-authored-by: Yanzi Zhu <yanziz@nvidia.com>
Co-authored-by: nv-mhaselton <mhaselton@nvidia.com>
Co-authored-by: lotusl-code <lotusl@nvidia.com>
Co-authored-by: cosmith-nvidia <141183495+cosmith-nvidia@users.noreply.github.com>
Co-authored-by: Michael Gussert <michael@gussert.com>
Co-authored-by: CY Chen <cyc@nvidia.com>
Co-authored-by: oahmednv <oahmed@Nvidia.com>
Co-authored-by: Ashwin Varghese Kuruttukulam <123109010+ashwinvkNV@users.noreply.github.com>
Co-authored-by: Rafael Wiltz <rwiltz@nvidia.com>
Co-authored-by: Peter Du <peterd@nvidia.com>
Co-authored-by: matthewtrepte <mtrepte@nvidia.com>
Co-authored-by: chengronglai <chengrongl@nvidia.com>
Co-authored-by: pulkitg01 <pulkitg@nvidia.com>
Co-authored-by: Connor Smith <cosmith@nvidia.com>
Co-authored-by: Ashwin Varghese Kuruttukulam <ashwinvk@nvidia.com>
Co-authored-by: Kelly Guo <kellyguo123@hotmail.com>
Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Co-authored-by: samibouziri <79418773+samibouziri@users.noreply.github.com>
Co-authored-by: James Smith <142246516+jsmith-bdai@users.noreply.github.com>
Co-authored-by: Shundo Kishi <syundo0730@gmail.com>
Co-authored-by: Sheikh Dawood <sabdulajees@nvidia.com>
Co-authored-by: Toni-SM <aserranomuno@nvidia.com>
Co-authored-by: Gonglitian <70052908+Gonglitian@users.noreply.github.com>
Co-authored-by: James Tigue <166445701+jtigue-bdai@users.noreply.github.com>
Co-authored-by: Mayank Mittal <mittalma@leggedrobotics.com>
Co-authored-by: Kyle Morgenstein <34984693+KyleM73@users.noreply.github.com>
Co-authored-by: Johnson Sun <20457146+j3soon@users.noreply.github.com>
Co-authored-by: Pascal Roth <57946385+pascal-roth@users.noreply.github.com>
Co-authored-by: Hongyu Li <lihongyu0807@icloud.com>
Co-authored-by: Jean-Francois-Lafleche <57650687+Jean-Francois-Lafleche@users.noreply.github.com>
Co-authored-by: Wei Jinqi <changshanshi@outlook.com>
Co-authored-by: Louis LE LAY <le.lay.louis@gmail.com>
Co-authored-by: Harsh Patel <hapatel@theaiinstitute.com>
Co-authored-by: Kousheek Chakraborty <kousheekc@gmail.com>
Co-authored-by: Victor Khaustov <3192677+vi3itor@users.noreply.github.com>
Co-authored-by: AlvinC <alvincny529@gmail.com>
Co-authored-by: Felipe Mohr <50018670+felipemohr@users.noreply.github.com>
Co-authored-by: AdAstra7 <87345760+likecanyon@users.noreply.github.com>
Co-authored-by: gao <ziqi.gao@iff-extern.fraunhofer.de>
Co-authored-by: Tyler Lum <tylergwlum@gmail.com>
Co-authored-by: -T.K.- <t_k_233@outlook.com>
Co-authored-by: Clemens Schwarke <96480707+ClemensSchwarke@users.noreply.github.com>
Co-authored-by: Miguel Alonso Jr. <76960110+miguelalonsojr@users.noreply.github.com>
Co-authored-by: Miguel Alonso Jr. <miguel.alonso@nfinite.app>
Co-authored-by: renaudponcelet <renaud.poncelet@gmail.com>
parent 0224a373
@@ -111,6 +111,7 @@ Guidelines for modifications:
* Ryley McCarroll
* Shafeef Omar
* Shaoshu Su
* Shaurya Dewan
* Shundo Kishi
* Stefan Van de Mosselaer
* Stephan Pleines
@@ -104,6 +104,7 @@ Table of Contents
source/overview/environments
source/overview/reinforcement-learning/index
source/overview/teleop_imitation
source/overview/augmented_imitation
source/overview/showroom
source/overview/simple_agents
......
.. _augmented-imitation-learning:

Augmented Imitation Learning
============================

This section describes how to combine Isaac Lab's imitation learning capabilities with the visual augmentation capabilities of `Cosmos <https://www.nvidia.com/en-us/ai/cosmos/>`_ models to generate demonstrations at scale and train visuomotor policies that are robust to visual variations.

Generating Demonstrations
~~~~~~~~~~~~~~~~~~~~~~~~~

We use the Isaac Lab Mimic feature, which automatically generates additional demonstrations from a handful of annotated demonstrations.

.. note::

   This section assumes you already have an annotated dataset of collected demonstrations. If you don't, you can follow the instructions in :ref:`teleoperation-imitation-learning` to collect and annotate your own demonstrations.

The following example uses Isaac Lab Mimic with the ``Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-Mimic-v0`` environment to generate additional demonstrations. These can be used to train a visuomotor policy directly, or they can first be augmented with visual variations using Cosmos.

.. note::

   The ``Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-Mimic-v0`` environment is similar to the standard visuomotor environment (``Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Mimic-v0``), but with the addition of segmentation masks, depth maps, and normal maps in the generated dataset. These additional modalities are required to get the best results from the visual augmentation done using Cosmos.

.. code:: bash

   ./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py \
   --device cuda --enable_cameras --headless --num_envs 10 --generation_num_trials 1000 \
   --input_file ./datasets/annotated_dataset.hdf5 --output_file ./datasets/mimic_dataset_1k.hdf5 \
   --task Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-Mimic-v0 \
   --rendering_mode performance

The number of demonstrations can be increased or decreased; 1000 demonstrations have been shown to provide good training results for this task.

Additionally, the number of environments set with the ``--num_envs`` parameter can be adjusted to speed up data generation.
The suggested number of 10 can be executed on a moderate laptop GPU.
On a more powerful desktop machine, use a larger number of environments for a significant speedup of this step.
Cosmos Augmentation
~~~~~~~~~~~~~~~~~~~

HDF5 to MP4 Conversion
^^^^^^^^^^^^^^^^^^^^^^

The ``hdf5_to_mp4.py`` script converts camera frames stored in HDF5 demonstration files to MP4 videos. It supports multiple camera modalities, including RGB, segmentation, depth, and normal maps. This conversion is necessary because Cosmos works with video files rather than HDF5 data.

.. rubric:: Required Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--input_file``
     - Path to the input HDF5 file.
   * - ``--output_dir``
     - Directory to save the output MP4 files.

.. rubric:: Optional Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--input_keys``
     - List of input keys to process from the HDF5 file. (default: ["table_cam", "wrist_cam", "table_cam_segmentation", "table_cam_normals", "table_cam_shaded_segmentation", "table_cam_depth"])
   * - ``--video_height``
     - Height of the output video in pixels. (default: 704)
   * - ``--video_width``
     - Width of the output video in pixels. (default: 1280)
   * - ``--framerate``
     - Frames per second for the output video. (default: 30)

.. note::

   The default input keys cover all camera modalities, following the naming convention of the ``Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-Mimic-v0`` environment. The additional modality ``table_cam_shaded_segmentation`` is not among the modalities generated by the simulation and stored in the HDF5 data file. Instead, this script generates it automatically by combining the segmentation and normal maps into a pseudo-textured segmentation video that gives better control over the Cosmos augmentation.

.. note::

   We recommend using the default values given above for the output video height, width, and framerate for the best results with Cosmos augmentation.

Example usage for the cube stacking task:

.. code:: bash

   python scripts/tools/hdf5_to_mp4.py \
   --input_file datasets/mimic_generated_dataset.hdf5 \
   --output_dir datasets/mimic_generated_dataset_mp4
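
The documented arguments and defaults can be mirrored in a small ``argparse`` sketch. This is not the actual parser from ``hdf5_to_mp4.py``: the function name ``build_parser`` and the use of ``nargs="+"`` for ``--input_keys`` are assumptions made for illustration, while the flag names and default values are taken from the tables above.

```python
import argparse

# Default keys as listed in the Optional Arguments table above.
DEFAULT_INPUT_KEYS = [
    "table_cam",
    "wrist_cam",
    "table_cam_segmentation",
    "table_cam_normals",
    "table_cam_shaded_segmentation",
    "table_cam_depth",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented hdf5_to_mp4.py options."""
    parser = argparse.ArgumentParser(description="Convert HDF5 camera frames to MP4 (sketch).")
    parser.add_argument("--input_file", type=str, required=True, help="Path to the input HDF5 file.")
    parser.add_argument("--output_dir", type=str, required=True, help="Directory to save the output MP4 files.")
    # Assumption: keys are passed as a space-separated list.
    parser.add_argument("--input_keys", type=str, nargs="+", default=DEFAULT_INPUT_KEYS)
    parser.add_argument("--video_height", type=int, default=704)
    parser.add_argument("--video_width", type=int, default=1280)
    parser.add_argument("--framerate", type=int, default=30)
    return parser


# Parse the same arguments as in the example usage above.
args = build_parser().parse_args(
    ["--input_file", "datasets/mimic_generated_dataset.hdf5", "--output_dir", "datasets/mimic_generated_dataset_mp4"]
)
print(args.video_height, args.video_width, args.framerate)
```

Leaving the optional arguments unset, as in the example usage, keeps the recommended 704 x 1280 resolution at 30 frames per second.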
Running Cosmos for Visual Augmentation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

After converting the demonstrations to MP4 format, you can use a `Cosmos <https://github.com/NVIDIA/Cosmos?tab=readme-ov-file>`_ model to visually augment the videos. Follow the Cosmos documentation for details on the augmentation process. Visual augmentation can include changes to lighting, textures, backgrounds, and other visual elements while preserving the essential task-relevant features.

We use the RGB, depth, and shaded segmentation videos from the previous step as input to the Cosmos model, as shown below:

.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/cosmos_inputs.gif
   :width: 100%
   :align: center
   :alt: RGB, depth and segmentation control inputs to Cosmos

We provide an example augmentation output from `Cosmos Transfer1 <https://github.com/nvidia-cosmos/cosmos-transfer1>`_ below:

.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/cosmos_output.gif
   :width: 100%
   :align: center
   :alt: Cosmos Transfer1 augmentation output

We recommend using the `Cosmos Transfer1 <https://github.com/nvidia-cosmos/cosmos-transfer1>`_ model for visual augmentation, as we found it to produce the best results: a highly diverse dataset with a wide range of visual variations. We further recommend the following settings for the Transfer1 model on this task:

.. rubric:: Hyperparameters

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``negative_prompt``
     - "The video captures a game playing, with bad crappy graphics and cartoonish frames. It represents a recording of old outdated games. The images are very pixelated and of poor CG quality. There are many subtitles in the footage. Overall, the video is unrealistic and appears cg. Plane background."
   * - ``positive_prompt``
     - "realistic, photorealistic, high fidelity, varied lighting, varied background"
   * - ``sigma_max``
     - 50
   * - ``control_weight``
     - "0.3,0.3,0.6,0.7"
   * - ``hint_key``
     - "blur,canny,depth,segmentation"
   * - ``control_input_preset_strength``
     - "low"

Another crucial factor in getting good augmentations is the set of prompts used to control the Cosmos generation. We provide a script, ``cosmos_prompt_gen.py``, to construct prompts from a set of carefully chosen templates that handle various aspects of the augmentation process.

.. rubric:: Required Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--templates_path``
     - Path to the file containing templates for the prompts.

.. rubric:: Optional Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--num_prompts``
     - Number of prompts to generate. (default: 1)
   * - ``--output_path``
     - Path to the output file to write generated prompts. (default: prompts.txt)

.. code:: bash

   python scripts/tools/cosmos/cosmos_prompt_gen.py \
   --templates_path scripts/tools/cosmos/transfer1_templates.json \
   --num_prompts 10 --output_path prompts.txt
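
To illustrate the general idea of building prompts from templates, here is a minimal sketch. It does not reproduce ``cosmos_prompt_gen.py``: the template structure (one list of candidate phrases per aspect), the ``generate_prompts`` helper, and most of the phrases are hypothetical, and the real ``transfer1_templates.json`` may be organized differently.

```python
import random

# Hypothetical template structure: one list of candidate phrases per aspect.
# The real transfer1_templates.json may be organized differently.
TEMPLATES = {
    "table": [
        "The table is of old dark wood with faded polish.",
        "The table is a clean white laminate surface.",
    ],
    "background": [
        "The background consists of a suburban home.",
        "The background is a brightly lit workshop.",
    ],
    "cubes": [
        "The cube colors are left unchanged: the bottom cube is blue, the middle is red and the top is green.",
    ],
}


def generate_prompts(templates: dict[str, list[str]], num_prompts: int, seed: int = 0) -> list[str]:
    """Build each prompt by sampling one phrase per aspect and joining them."""
    rng = random.Random(seed)
    prompts = []
    for _ in range(num_prompts):
        parts = [rng.choice(options) for options in templates.values()]
        prompts.append(" ".join(parts))
    return prompts


prompts = generate_prompts(TEMPLATES, num_prompts=3)
for prompt in prompts:
    print(prompt)
```

Giving the ``cubes`` aspect a single phrase means every generated prompt states which visual properties must be retained, while the other aspects vary.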
If you want to create your own prompts, we suggest the following guidelines:

1. Keep the prompts as detailed as possible. It is best to include some instruction on how the generation should handle each visible object or region of interest. For instance, the prompts that we provide cover explicit details for the table, lighting, background, robot arm, cubes, and the general setting.
2. Keep the augmentation instructions as realistic and coherent as possible. The more unrealistic or unconventional the prompt is, the worse the model does at retaining key features of the input control video(s).
3. Keep the augmentation instructions in sync across aspects: the augmentations for all objects and regions of interest should be coherent and conventional with respect to each other. For example, it is better to have a prompt such as "The table is of old dark wood with faded polish and food stains and the background consists of a suburban home" instead of something like "The table is of old dark wood with faded polish and food stains and the background consists of a spaceship hurtling through space".
4. Include details on key aspects of the input control video(s) that should be retained or left unchanged. In our prompts, we state explicitly that the cube colors should be left unchanged such that the bottom cube is blue, the middle is red, and the top is green. Note that we not only mention what should be left unchanged but also describe the form that aspect currently has.

MP4 to HDF5 Conversion
^^^^^^^^^^^^^^^^^^^^^^

The ``mp4_to_hdf5.py`` script converts the visually augmented MP4 videos back to HDF5 format for training. This step ensures that the augmented visual data is in the correct format for training visuomotor policies in Isaac Lab, and it pairs the videos with the corresponding demonstration data from the original dataset.

.. rubric:: Required Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--input_file``
     - Path to the input HDF5 file containing the original demonstrations.
   * - ``--videos_dir``
     - Directory containing the visually augmented MP4 videos.
   * - ``--output_file``
     - Path to save the new HDF5 file with augmented videos.

.. note::

   The input HDF5 file is used to preserve the non-visual data (such as robot states and actions) while replacing the visual data with the augmented versions.

.. important::

   The visually augmented MP4 files must follow the naming convention ``demo_{demo_id}_*.mp4``, where:

   - ``demo_id`` matches the demonstration ID from the original MP4 file
   - ``*`` signifies that the rest of the file name can be chosen freely

   This naming convention is required for the script to correctly pair the augmented videos with their corresponding demonstrations.
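
The naming convention can be checked before running the conversion. The following is a minimal sketch, not the matching logic of ``mp4_to_hdf5.py``; the helper ``parse_demo_id`` is hypothetical, and it assumes that ``demo_id`` is a non-negative integer.

```python
import re
from typing import Optional

# Matches "demo_{demo_id}_*.mp4"; assumes demo_id is a non-negative integer.
DEMO_PATTERN = re.compile(r"^demo_(\d+)_.*\.mp4$")


def parse_demo_id(filename: str) -> Optional[int]:
    """Return the demo ID encoded in the file name, or None if it does not match."""
    match = DEMO_PATTERN.match(filename)
    return int(match.group(1)) if match else None


print(parse_demo_id("demo_12_augmented_night.mp4"))  # 12
print(parse_demo_id("augmented_12.mp4"))  # None
```

Files for which the helper returns ``None`` would need to be renamed before they can be paired with a demonstration.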
Example usage for the cube stacking task:

.. code:: bash

   python scripts/tools/mp4_to_hdf5.py \
   --input_file datasets/mimic_generated_dataset.hdf5 \
   --videos_dir datasets/cosmos_dataset_mp4 \
   --output_file datasets/cosmos_dataset_1k.hdf5

Pre-generated Dataset
^^^^^^^^^^^^^^^^^^^^^

We provide a pre-generated dataset in HDF5 format containing visually augmented demonstrations for the cube stacking task. This dataset can be used if you do not wish to run Cosmos locally to generate your own augmented data. The dataset is available on `Hugging Face <https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-Manipulation-Augmented>`_ and contains both the original and the augmented demonstrations, as separate dataset files, which can be used for training visuomotor policies.

Merging Datasets
^^^^^^^^^^^^^^^^

The ``merge_hdf5_datasets.py`` script combines multiple HDF5 datasets into a single file. This is useful when you want to combine the original demonstrations with the augmented ones to create a larger, more diverse training dataset.

.. rubric:: Required Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--input_files``
     - A list of paths to HDF5 files to merge.

.. rubric:: Optional Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--output_file``
     - File path to merged output. (default: merged_dataset.hdf5)

.. tip::

   Merging datasets can help improve policy robustness by exposing the model to both original and augmented visual conditions during training.

Example usage for the cube stacking task:

.. code:: bash

   python scripts/tools/merge_hdf5_datasets.py \
   --input_files datasets/mimic_generated_dataset.hdf5 datasets/cosmos_dataset.hdf5 \
   --output_file datasets/mimic_cosmos_dataset.hdf5
Model Training and Evaluation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Robomimic Setup
^^^^^^^^^^^^^^^

As an example, we will train a BC agent implemented in `Robomimic <https://robomimic.github.io/>`__. Any other framework or training method could be used.

To install the robomimic framework, use the following commands:

.. code:: bash

   # install the dependencies
   sudo apt install cmake build-essential

   # install python module (for robomimic)
   ./isaaclab.sh -i robomimic

Training an agent
^^^^^^^^^^^^^^^^^

Using the generated data, we can now train a visuomotor BC agent for ``Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0``:

.. code:: bash

   ./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py \
   --task Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0 --algo bc \
   --dataset ./datasets/mimic_cosmos_dataset.hdf5

.. note::

   By default, the trained models and logs will be saved to ``IsaacLab/logs/robomimic``.

Evaluation
^^^^^^^^^^

The ``robust_eval.py`` script evaluates trained visuomotor policies in simulation. This evaluation helps assess how well the policy generalizes to different visual variations and whether the visually augmented data has improved the policy's robustness.

Below is an explanation of the different settings used for evaluation:

.. rubric:: Evaluation Settings

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``Vanilla``
     - Exact same setting as that used during Mimic data generation.
   * - ``Light Intensity``
     - Light intensity/brightness is varied, all other aspects remain the same.
   * - ``Light Color``
     - Light color is varied, all other aspects remain the same.
   * - ``Light Texture (Background)``
     - Light texture/background is varied, all other aspects remain the same.
   * - ``Table Texture``
     - Table's visual texture is varied, all other aspects remain the same.
   * - ``Robot Arm Texture``
     - Robot arm's visual texture is varied, all other aspects remain the same.

.. rubric:: Required Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--task``
     - Name of the environment.
   * - ``--input_dir``
     - Directory containing the model checkpoints to evaluate.

.. rubric:: Optional Arguments

.. list-table::
   :widths: 30 70
   :header-rows: 0

   * - ``--horizon``
     - Step horizon of each rollout. (default: 400)
   * - ``--num_rollouts``
     - Number of rollouts per model per setting. (default: 15)
   * - ``--num_seeds``
     - Number of random seeds to evaluate. (default: 3)
   * - ``--seeds``
     - List of specific seeds to use instead of random ones.
   * - ``--log_dir``
     - Directory to write results to. (default: /tmp/policy_evaluation_results)
   * - ``--log_file``
     - Name of the output file. (default: results)
   * - ``--norm_factor_min``
     - Minimum value of the action space normalization factor.
   * - ``--norm_factor_max``
     - Maximum value of the action space normalization factor.
   * - ``--disable_fabric``
     - Whether to disable fabric and use USD I/O operations.
   * - ``--enable_pinocchio``
     - Whether to enable Pinocchio for IK controllers.
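
When both ``--norm_factor_min`` and ``--norm_factor_max`` are provided, the evaluation script in this commit rescales the policy output from the range [-1, 1] to [min, max] before applying it. The sketch below restates that formula for a single scalar; the ``unnormalize`` helper and the example factors are hypothetical and chosen only for illustration.

```python
def unnormalize(action: float, norm_min: float, norm_max: float) -> float:
    """Map an action from [-1, 1] to [norm_min, norm_max]."""
    return (action + 1.0) * (norm_max - norm_min) / 2.0 + norm_min


# Hypothetical factors chosen only for illustration.
print(unnormalize(-1.0, -0.5, 1.5))  # -0.5
print(unnormalize(0.0, -0.5, 1.5))  # 0.5
print(unnormalize(1.0, -0.5, 1.5))  # 1.5
```

The endpoints -1 and 1 map to the minimum and maximum factors, and 0 maps to their midpoint.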
.. note::

   The evaluation results will help you understand if the visual augmentation has improved the policy's performance and robustness. Compare these results with evaluations on the original dataset to measure the impact of augmentation.

Example usage for the cube stacking task:

.. code:: bash

   ./isaaclab.sh -p scripts/imitation_learning/robomimic/robust_eval.py \
   --task Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0 \
   --input_dir logs/robomimic/Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0/bc_rnn_image_franka_stack_mimic_cosmos_table_only/*/models \
   --log_dir robust_results/bc_rnn_image_franka_stack_mimic_cosmos_table_only \
   --log_file result \
   --enable_cameras \
   --seeds 0 \
   --num_rollouts 15 \
   --rendering_mode performance

We use the above script to compare models trained with 1000 Mimic-generated demonstrations, 2000 Mimic-generated demonstrations, and 2000 Cosmos-Mimic-generated demonstrations (1000 original Mimic + 1000 Cosmos-augmented). We use the same seeds (0, 1000, and 5000) for all three models and report the success rates, averaged across the best checkpoints for each seed, below:

.. rubric:: Model Comparison

.. list-table::
   :widths: 25 25 25 25
   :header-rows: 0

   * - **Evaluation Setting**
     - **Mimic 1k Baseline**
     - **Mimic 2k Baseline**
     - **Cosmos-Mimic 2k**
   * - ``Vanilla``
     - 62%
     - 96.6%
     - 86.6%
   * - ``Light Intensity``
     - 11.1%
     - 20%
     - 62.2%
   * - ``Light Color``
     - 24.6%
     - 30%
     - 77.7%
   * - ``Light Texture (Background)``
     - 16.6%
     - 20%
     - 68.8%
   * - ``Table Texture``
     - 0%
     - 0%
     - 20%
   * - ``Robot Arm Texture``
     - 0%
     - 0%
     - 4.4%
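
One way to condense the table is to average each model's success rate over the five perturbed settings, leaving out ``Vanilla``. This unweighted mean is our own summary for illustration and is not a metric produced by ``robust_eval.py``; the numbers are copied from the table above.

```python
# Success rates (%) copied from the Model Comparison table above.
SETTINGS = [
    "Vanilla",
    "Light Intensity",
    "Light Color",
    "Light Texture (Background)",
    "Table Texture",
    "Robot Arm Texture",
]
RESULTS = {
    "Mimic 1k Baseline": [62.0, 11.1, 24.6, 16.6, 0.0, 0.0],
    "Mimic 2k Baseline": [96.6, 20.0, 30.0, 20.0, 0.0, 0.0],
    "Cosmos-Mimic 2k": [86.6, 62.2, 77.7, 68.8, 20.0, 4.4],
}


def mean_perturbed(rates: list[float]) -> float:
    """Unweighted mean over all settings except Vanilla."""
    perturbed = [rate for name, rate in zip(SETTINGS, rates) if name != "Vanilla"]
    return sum(perturbed) / len(perturbed)


for model, rates in RESULTS.items():
    print(f"{model}: {mean_perturbed(rates):.2f}%")
```

Under this summary the three models average roughly 10.5%, 14.0%, and 46.6% on the perturbed settings, while the Mimic 2k baseline remains the strongest in the ``Vanilla`` setting.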

The checkpoints of the models trained above are available `here <https://huggingface.co/datasets/nvidia/PhysicalAI-Robotics-Manipulation-Augmented/tree/main/robomimic_bc_rnn_visuomotor_models>`_ if you wish to use the models directly.
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

"""Script to evaluate a trained policy from robomimic across multiple evaluation settings.

This script loads a trained robomimic policy and evaluates it in an Isaac Lab environment
across multiple evaluation settings (lighting, textures, etc.) and seeds. It saves the results
to a specified output directory.

Args:
    task: Name of the environment.
    input_dir: Directory containing the model checkpoints to evaluate.
    horizon: Step horizon of each rollout.
    num_rollouts: Number of rollouts per model per setting.
    num_seeds: Number of random seeds to evaluate.
    seeds: Optional list of specific seeds to use instead of random ones.
    log_dir: Directory to write results to.
    log_file: Name of the output file.
    output_vis_file: File path to export recorded episodes.
    norm_factor_min: If provided, minimum value of the action space normalization factor.
    norm_factor_max: If provided, maximum value of the action space normalization factor.
    disable_fabric: Whether to disable fabric and use USD I/O operations.
    enable_pinocchio: Whether to enable Pinocchio for IK controllers.
"""

"""Launch Isaac Sim Simulator first."""

import argparse

from isaaclab.app import AppLauncher

# add argparse arguments
parser = argparse.ArgumentParser(description="Evaluate robomimic policy for Isaac Lab environment.")
parser.add_argument(
    "--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations."
)
parser.add_argument("--task", type=str, default=None, help="Name of the task.")
parser.add_argument("--input_dir", type=str, default=None, help="Directory containing models to evaluate.")
parser.add_argument("--horizon", type=int, default=400, help="Step horizon of each rollout.")
parser.add_argument("--num_rollouts", type=int, default=15, help="Number of rollouts for each setting.")
parser.add_argument("--num_seeds", type=int, default=3, help="Number of random seeds to evaluate.")
parser.add_argument("--seeds", nargs="+", type=int, default=None, help="List of specific seeds to use.")
parser.add_argument(
    "--log_dir", type=str, default="/tmp/policy_evaluation_results", help="Directory to write results to."
)
parser.add_argument("--log_file", type=str, default="results", help="Name of output file.")
parser.add_argument(
    "--output_vis_file", type=str, default="visuals.hdf5", help="File path to export recorded episodes."
)
parser.add_argument(
    "--norm_factor_min", type=float, default=None, help="Optional: minimum value of the normalization factor."
)
parser.add_argument(
    "--norm_factor_max", type=float, default=None, help="Optional: maximum value of the normalization factor."
)
parser.add_argument("--enable_pinocchio", default=False, action="store_true", help="Enable Pinocchio.")

# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()

if args_cli.enable_pinocchio:
    # Import pinocchio before AppLauncher to force the use of the version installed by IsaacLab and not the one installed by Isaac Sim
    # pinocchio is required by the Pink IK controllers and the GR1T2 retargeter
    import pinocchio  # noqa: F401

# launch omniverse app
app_launcher = AppLauncher(args_cli)
simulation_app = app_launcher.app

"""Rest everything follows."""

import copy
import gymnasium as gym
import os
import pathlib
import random
import torch

import robomimic.utils.file_utils as FileUtils
import robomimic.utils.torch_utils as TorchUtils

from isaaclab_tasks.utils import parse_env_cfg


def rollout(policy, env: gym.Env, success_term, horizon: int, device: torch.device) -> tuple[bool, dict]:
    """Perform a single rollout of the policy in the environment.

    Normalization factors, if any, are read from the module-level ``args_cli``.

    Args:
        policy: The robomimic policy to evaluate.
        env: The environment to evaluate in.
        success_term: The termination term used to check for task success.
        horizon: The step horizon of each rollout.
        device: The device to run the policy on.

    Returns:
        terminated: Whether the rollout terminated successfully.
        traj: The trajectory of the rollout.
    """
    policy.start_episode()
    obs_dict, _ = env.reset()
    traj = dict(actions=[], obs=[], next_obs=[])

    for _ in range(horizon):
        # Prepare policy observations
        obs = copy.deepcopy(obs_dict["policy"])
        for ob in obs:
            obs[ob] = torch.squeeze(obs[ob])

        # Check if the environment provides image observations
        if hasattr(env.cfg, "image_obs_list"):
            # Process image observations for robomimic inference
            for image_name in env.cfg.image_obs_list:
                if image_name in obs_dict["policy"].keys():
                    # Convert from hwc uint8 to chw normalized float
                    image = torch.squeeze(obs_dict["policy"][image_name])
                    image = image.permute(2, 0, 1).clone().float()
                    image = image / 255.0
                    image = image.clip(0.0, 1.0)
                    obs[image_name] = image

        traj["obs"].append(obs)

        # Compute actions
        actions = policy(obs)

        # Unnormalize actions if normalization factors are provided
        if args_cli.norm_factor_min is not None and args_cli.norm_factor_max is not None:
            actions = (
                (actions + 1) * (args_cli.norm_factor_max - args_cli.norm_factor_min)
            ) / 2 + args_cli.norm_factor_min

        actions = torch.from_numpy(actions).to(device=device).view(1, env.action_space.shape[1])

        # Apply actions
        obs_dict, _, terminated, truncated, _ = env.step(actions)
        obs = obs_dict["policy"]

        # Record trajectory
        traj["actions"].append(actions.tolist())
        traj["next_obs"].append(obs)

        if bool(success_term.func(env, **success_term.params)[0]):
            return True, traj
        elif terminated or truncated:
            return False, traj

    return False, traj


def evaluate_model(
    model_path: str,
    env: gym.Env,
    device: torch.device,
    success_term,
    num_rollouts: int,
    horizon: int,
    seed: int,
    output_file: str,
) -> float:
    """Evaluate a single model checkpoint across multiple rollouts.

    Args:
        model_path: Path to the model checkpoint.
        env: The environment to evaluate in.
        device: The device to run the policy on.
        success_term: The termination term used to check for task success.
        num_rollouts: Number of rollouts to perform.
        horizon: Step horizon of each rollout.
        seed: Random seed to use.
        output_file: File to write results to.

    Returns:
        float: Success rate of the model
    """
    # Set seed
    torch.manual_seed(seed)
    env.seed(seed)
    random.seed(seed)

    # Load policy
    policy, _ = FileUtils.policy_from_checkpoint(ckpt_path=model_path, device=device, verbose=False)

    # Run policy
    results = []
    for trial in range(num_rollouts):
        print(f"[Model: {os.path.basename(model_path)}] Starting trial {trial}")
        terminated, _ = rollout(policy, env, success_term, horizon, device)
        results.append(terminated)
        with open(output_file, "a") as file:
            file.write(f"[Model: {os.path.basename(model_path)}] Trial {trial}: {terminated}\n")
        print(f"[Model: {os.path.basename(model_path)}] Trial {trial}: {terminated}")

    # Calculate and log results
    success_rate = results.count(True) / len(results)
    with open(output_file, "a") as file:
        file.write(
            f"[Model: {os.path.basename(model_path)}] Successful trials: {results.count(True)}, out of"
            f" {len(results)} trials\n"
        )
        file.write(f"[Model: {os.path.basename(model_path)}] Success rate: {success_rate}\n")
        file.write(f"[Model: {os.path.basename(model_path)}] Results: {results}\n")
        file.write("-" * 80 + "\n\n")

    print(
        f"\n[Model: {os.path.basename(model_path)}] Successful trials: {results.count(True)}, out of"
        f" {len(results)} trials"
    )
    print(f"[Model: {os.path.basename(model_path)}] Success rate: {success_rate}\n")
    print(f"[Model: {os.path.basename(model_path)}] Results: {results}\n")

    return success_rate

def main() -> None:
    """Run evaluation of trained policies from robomimic with Isaac Lab environment."""
    # Parse configuration
    env_cfg = parse_env_cfg(args_cli.task, device=args_cli.device, num_envs=1, use_fabric=not args_cli.disable_fabric)
    # Set observations to dictionary mode for Robomimic
    env_cfg.observations.policy.concatenate_terms = False
    # Set termination conditions
    env_cfg.terminations.time_out = None
    # Disable recorder
    env_cfg.recorders = None
    # Extract success checking function
    success_term = env_cfg.terminations.success
    env_cfg.terminations.success = None
    # Set evaluation settings
    env_cfg.eval_mode = True
    # Create environment
    env = gym.make(args_cli.task, cfg=env_cfg)
    # Acquire device
    device = TorchUtils.get_torch_device(try_to_use_cuda=True)
    # Get model checkpoints
    model_checkpoints = [f.name for f in os.scandir(args_cli.input_dir) if f.is_file()]
    # Set up seeds
    seeds = random.sample(range(0, 10000), args_cli.num_seeds) if args_cli.seeds is None else args_cli.seeds
    # Define evaluation settings
    settings = ["vanilla", "light_intensity", "light_color", "light_texture", "table_texture", "robot_texture", "all"]
    # Create log directory if it doesn't exist
    os.makedirs(args_cli.log_dir, exist_ok=True)
    # Evaluate each seed
    for seed in seeds:
        output_path = os.path.join(args_cli.log_dir, f"{args_cli.log_file}_seed_{seed}")
        path = pathlib.Path(output_path)
        path.parent.mkdir(parents=True, exist_ok=True)
        # Initialize results summary
        results_summary = dict()
        results_summary["overall"] = {}
        for setting in settings:
            results_summary[setting] = {}
        # Truncate any previous log, then open in append mode: evaluate_model() appends to
        # the same file through its own handle, and a "w" handle here would overwrite those lines
        open(output_path, "w").close()
        with open(output_path, "a") as file:
            # Evaluate each setting
            for setting in settings:
                env.cfg.eval_type = setting
                file.write(f"Evaluation setting: {setting}\n")
                file.write("=" * 80 + "\n\n")
                # Flush so the header lands before the trial lines appended by evaluate_model()
                file.flush()
                print(f"Evaluation setting: {setting}")
                print("=" * 80)
                # Evaluate each model
                for model in model_checkpoints:
                    # Skip early checkpoints
                    model_epoch = int(model.split(".")[0].split("_")[-1])
                    if model_epoch <= 100:
                        continue
                    model_path = os.path.join(args_cli.input_dir, model)
                    success_rate = evaluate_model(
                        model_path=model_path,
                        env=env,
                        device=device,
                        success_term=success_term,
                        num_rollouts=args_cli.num_rollouts,
                        horizon=args_cli.horizon,
                        seed=seed,
                        output_file=output_path,
                    )
                    # Store results
                    results_summary[setting][model] = success_rate
                    if model not in results_summary["overall"].keys():
                        results_summary["overall"][model] = 0.0
                    results_summary["overall"][model] += success_rate
                    env.reset()
                file.write("=" * 80 + "\n\n")
                env.reset()
            # Calculate overall success rates
            for model in results_summary["overall"].keys():
                results_summary["overall"][model] /= len(settings)
            # Write final summary
            file.write("\nResults Summary (success rate):\n")
            for setting in results_summary.keys():
                file.write(f"\nSetting: {setting}\n")
                for model in results_summary[setting].keys():
                    file.write(f"{model}: {results_summary[setting][model]}\n")
                # Skip the "best model" line when no checkpoint was evaluated (max() fails on an empty dict)
                if results_summary[setting]:
                    max_key = max(results_summary[setting], key=results_summary[setting].get)
                    file.write(
                        f"\nBest model for setting {setting} is {max_key} with success rate"
                        f" {results_summary[setting][max_key]}\n"
                    )
    env.close()
if __name__ == "__main__":
    # run the main function
    main()
    # close sim app
    simulation_app.close()
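The summary step at the end of `main()` averages each checkpoint's success rate over all evaluation settings and then reports the best checkpoint. The following is a minimal standalone sketch of that aggregation, not the script itself; the helper name `summarize` and the checkpoint file names are hypothetical and used only for illustration.

```python
def summarize(per_setting: dict) -> dict:
    """Average each model's success rate over all evaluation settings."""
    totals = {}
    for rates in per_setting.values():
        for model, rate in rates.items():
            totals[model] = totals.get(model, 0.0) + rate
    num_settings = len(per_setting)
    return {model: total / num_settings for model, total in totals.items()}


# Hypothetical results for two settings and two checkpoints.
rates = {
    "vanilla": {"model_epoch_200.pth": 1.0, "model_epoch_300.pth": 0.5},
    "all": {"model_epoch_200.pth": 0.5, "model_epoch_300.pth": 0.5},
}
overall = summarize(rates)
best = max(overall, key=overall.get)
print(overall, best)
```

As in the script, the average divides by the number of settings, so it assumes every checkpoint was evaluated under every setting.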
# Copyright (c) 2024-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""
Script to construct prompts to control the Cosmos model's generation.
Required arguments:
--templates_path Path to the file containing templates for the prompts.
Optional arguments:
--num_prompts Number of prompts to generate (default: 1).
--output_path Path to the output file to write generated prompts (default: prompts.txt).
"""
import argparse
import json
import random
def parse_args():
"""Parse command line arguments."""
parser = argparse.ArgumentParser(description="Generate prompts for controlling Cosmos model's generation.")
parser.add_argument(
"--templates_path", type=str, required=True, help="Path to the JSON file containing prompt templates"
)
parser.add_argument("--num_prompts", type=int, default=1, help="Number of prompts to generate (default: 1)")
parser.add_argument(
"--output_path", type=str, default="prompts.txt", help="Path to the output file to write generated prompts"
)
args = parser.parse_args()
return args
def generate_prompt(templates_path: str):
    """Generate a random prompt for controlling the Cosmos model's visual augmentation.

    The prompt describes the scene and desired visual variations, which the model
    uses to guide the augmentation process while preserving the core robotic actions.

    Args:
        templates_path (str): Path to the JSON file containing prompt templates.

    Returns:
        str: Generated prompt string that specifies visual aspects to modify in the video.
    """
    try:
        with open(templates_path) as f:
            templates = json.load(f)
    except FileNotFoundError as err:
        raise FileNotFoundError(f"Prompt templates file not found: {templates_path}") from err
    except json.JSONDecodeError as err:
        raise ValueError(f"Invalid JSON in prompt templates file: {templates_path}") from err
    prompt_parts = []
    for section_name, section_options in templates.items():
        # Skip sections that are not lists or that have no options
        if not isinstance(section_options, list):
            continue
        if len(section_options) == 0:
            continue
        selected_option = random.choice(section_options)
        prompt_parts.append(selected_option)
    return " ".join(prompt_parts)
def main():
    """Generate the requested number of prompts and write them to the output file, one per line."""
    # Parse command line arguments
    args = parse_args()
    prompts = [generate_prompt(args.templates_path) for _ in range(args.num_prompts)]
    try:
        with open(args.output_path, "w") as f:
            for prompt in prompts:
                f.write(prompt + "\n")
    except Exception as e:
        print(f"Failed to write to {args.output_path}: {e}")
if __name__ == "__main__":
    main()
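To illustrate how `generate_prompt` assembles a prompt, here is a minimal sketch that works on an in-memory dictionary instead of a JSON file. The helper `build_prompt`, the seeded `random.Random` instance, and the miniature template set are assumptions made for this example; they are not part of the script.

```python
import random


def build_prompt(templates: dict, rng: random.Random) -> str:
    """Pick one option from every non-empty list section and join them with spaces."""
    parts = []
    for options in templates.values():
        # Mirror generate_prompt: ignore sections that are not lists or are empty.
        if not isinstance(options, list) or len(options) == 0:
            continue
        parts.append(rng.choice(options))
    return " ".join(parts)


# Hypothetical miniature template set (not the shipped JSON file).
templates = {
    "env": ["A robotic arm is stacking cubes in a lab."],
    "light": ["The lighting is soft.", "The lighting is harsh."],
    "empty_section": [],
    "invalid_section": "not a list",
}
prompt = build_prompt(templates, random.Random(0))
print(prompt)
```

Sections are visited in the order they appear in the file, so the generated prompt follows the same order as the template sections.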
{
"env": [
"A robotic arm is picking up and stacking cubes inside a foggy industrial scrapyard at dawn, surrounded by piles of old robotic parts and twisted metal. The background includes large magnetic cranes, rusted conveyor belts, and flickering yellow floodlights struggling to penetrate the fog.",
"A robotic arm is picking up and stacking cubes inside a luxury penthouse showroom during sunset. The background includes minimalist designer furniture, a panoramic view of a glowing city skyline, and hovering autonomous drones offering refreshments.",
"A robotic arm is picking up and stacking cubes within an ancient temple-themed robotics exhibit in a museum. The background includes stone columns with hieroglyphic-style etchings, interactive display panels, and a few museum visitors observing silently from behind glass barriers.",
"A robotic arm is picking up and stacking cubes inside a futuristic daycare facility for children. The background includes robotic toys, soft padded walls, holographic storybooks floating in mid-air, and tiny humanoid robots assisting toddlers.",
"A robotic arm is picking up and stacking cubes inside a deep underwater laboratory where pressure-resistant glass panels reveal a shimmering ocean outside. The background includes jellyfish drifting outside the windows, robotic submarines gliding by, and walls lined with wet-surface equipment panels.",
"A robotic arm is picking up and stacking cubes inside a post-apocalyptic lab, partially collapsed and exposed to the open sky. The background includes ruined machinery, exposed rebar, and a distant city skyline covered in ash and fog.",
"A robotic arm is picking up and stacking cubes in a biotech greenhouse surrounded by lush plant life. The background includes rows of bio-engineered plants, misting systems, and hovering inspection drones checking crop health.",
"A robotic arm is picking up and stacking cubes inside a dark, volcanic research outpost. The background includes robotic arms encased in heat-resistant suits, seismic monitors, and distant lava fountains occasionally illuminating the space.",
"A robotic arm is picking up and stacking cubes inside an icy arctic base, with frost-covered walls and equipment glinting under bright artificial white lights. The background includes heavy-duty heaters, control consoles wrapped in thermal insulation, and a large window looking out onto a frozen tundra with polar winds swirling snow outside.",
"A robotic arm is picking up and stacking cubes inside a zero-gravity chamber on a rotating space habitat. The background includes floating lab instruments, panoramic windows showing stars and Earth in rotation, and astronauts monitoring data.",
"A robotic arm is picking up and stacking cubes inside a mystical tech-art installation blending robotics with generative art. The background includes sculptural robotics, shifting light patterns on the walls, and visitors interacting with the exhibit using gestures.",
"A robotic arm is picking up and stacking cubes in a Martian colony dome, under a terraformed red sky filtering through thick glass. The background includes pressure-locked entry hatches, Martian rovers parked outside, and domed hydroponic farms stretching into the distance.",
"A robotic arm is picking up and stacking cubes inside a high-security military robotics testing bunker, with matte green steel walls and strict order. The background includes surveillance cameras, camouflage netting over equipment racks, and military personnel observing from a secure glass-walled control room.",
"A robotic arm is picking up and stacking cubes inside a retro-futuristic robotics lab from the 1980s with checkered floors and analog computer panels. The background includes CRT monitors with green code, rotary dials, printed schematics on the walls, and operators in lab coats typing on clunky terminals.",
"A robotic arm is picking up and stacking cubes inside a sunken ancient ruin repurposed for modern robotics experiments. The background includes carved pillars, vines creeping through gaps in stone, and scattered crates of modern equipment sitting on ancient floors.",
"A robotic arm is picking up and stacking cubes on a luxury interstellar yacht cruising through deep space. The background includes elegant furnishings, ambient synth music systems, and holographic butlers attending to other passengers.",
"A robotic arm is picking up and stacking cubes in a rebellious underground cybernetic hacker hideout. The background includes graffiti-covered walls, tangled wires, makeshift workbenches, and anonymous figures hunched over terminals with scrolling code.",
"A robotic arm is picking up and stacking cubes inside a dense jungle outpost where technology is being tested in extreme organic environments. The background includes humid control panels, vines creeping onto the robotics table, and occasional wildlife observed from a distance by researchers in camo gear.",
"A robotic arm is picking up and stacking cubes in a minimalist Zen tech temple. The background includes bonsai trees on floating platforms, robotic monks sweeping floors silently, and smooth stone pathways winding through digital meditation alcoves."
],
"robot": [
"The robot arm is matte dark green with yellow diagonal hazard stripes along the upper arm; the joints are rugged and chipped, and the hydraulics are exposed with faded red tubing.",
"The robot arm is worn orange with black caution tape markings near the wrist; the elbow joint is dented and the pistons have visible scarring from long use.",
"The robot arm is steel gray with smooth curved panels and subtle blue stripes running down the length; the joints are sealed tight and the hydraulics have a glossy black casing.",
"The robot arm is bright yellow with alternating black bands around each segment; the joints show minor wear, and the hydraulics gleam with fresh lubrication.",
"The robot arm is navy blue with white serial numbers stenciled along the arm; the joints are well-maintained and the hydraulic shafts are matte silver with no visible dirt.",
"The robot arm is deep red with a matte finish and faint white grid lines across the panels; the joints are squared off and the hydraulic units look compact and embedded.",
"The robot arm is dirty white with dark gray speckled patches from wear; the joints are squeaky with exposed rivets, and the hydraulics are rusted at the base.",
"The robot arm is olive green with chipped paint and a black triangle warning icon near the shoulder; the joints are bulky and the hydraulics leak slightly around the seals.",
"The robot arm is bright teal with a glossy surface and silver stripes on the outer edges; the joints rotate smoothly and the pistons reflect a pale cyan hue.",
"The robot arm is orange-red with carbon fiber textures and white racing-style stripes down the forearm; the joints have minimal play and the hydraulics are tightly sealed in synthetic tubing.",
"The robot arm is flat black with uneven camouflage blotches in dark gray; the joints are reinforced and the hydraulic tubes are dusty and loose-fitting.",
"The robot arm is dull maroon with vertical black grooves etched into the panels; the joints show corrosion on the bolts and the pistons are thick and slow-moving.",
"The robot arm is powder blue with repeating geometric patterns printed in light gray; the joints are square and the hydraulic systems are internal and silent.",
"The robot arm is brushed silver with high-gloss finish and blue LED strips along the seams; the joints are shiny and tight, and the hydraulics hiss softly with every movement.",
"The robot arm is lime green with paint faded from sun exposure and white warning labels near each joint; the hydraulics are scraped and the fittings show heat marks.",
"The robot arm is dusty gray with chevron-style black stripes pointing toward the claw; the joints have uneven wear, and the pistons are dented and slightly bent.",
"The robot arm is cobalt blue with glossy texture and stylized angular black patterns across each segment; the joints are clean and the hydraulics show new flexible tubing.",
"The robot arm is industrial brown with visible welded seams and red caution tape wrapped loosely around the middle section; the joints are clunky and the hydraulics are slow and loud.",
"The robot arm is flat tan with dark green splotches and faint stencil text across the forearm; the joints have dried mud stains and the pistons are partially covered in grime.",
"The robot arm is light orange with chrome hexagon detailing and black number codes on the side; the joints are smooth and the hydraulic actuators shine under the lab lights."
],
"table": [
"The robot arm is mounted on a table that is dull gray metal with scratches and scuff marks across the surface; faint rust rings are visible where older machinery used to be mounted.",
"The robot arm is mounted on a table that is smooth black plastic with a matte finish and faint fingerprint smudges near the edges; corners are slightly worn from regular use.",
"The robot arm is mounted on a table that is light oak wood with a natural grain pattern and a glossy varnish that reflects overhead lights softly; small burn marks dot one corner.",
"The robot arm is mounted on a table that is rough concrete with uneven texture and visible air bubbles; some grease stains and faded yellow paint markings suggest heavy usage.",
"The robot arm is mounted on a table that is brushed aluminum with a clean silver tone and very fine linear grooves; surface reflects light evenly, giving a soft glow.",
"The robot arm is mounted on a table that is pale green composite with chipped corners and scratches revealing darker material beneath; tape residue is stuck along the edges.",
"The robot arm is mounted on a table that is dark brown with a slightly cracked synthetic coating; patches of discoloration suggest exposure to heat or chemicals over time.",
"The robot arm is mounted on a table that is polished steel with mirror-like reflections; every small movement of the robot is mirrored faintly across the surface.",
"The robot arm is mounted on a table that is white with a slightly textured ceramic top, speckled with tiny black dots; the surface is clean but the edges are chipped.",
"The robot arm is mounted on a table that is glossy black glass with a deep shine and minimal dust; any lights above are clearly reflected, and fingerprints are visible under certain angles.",
"The robot arm is mounted on a table that is matte red plastic with wide surface scuffs and paint transfer from other objects; faint gridlines are etched into one side.",
"The robot arm is mounted on a table that is dark navy laminate with a low-sheen surface and subtle wood grain texture; the edge banding is slightly peeling off.",
"The robot arm is mounted on a table that is yellow-painted steel with diagonal black warning stripes running along one side; the paint is scratched and faded in high-contact areas.",
"The robot arm is mounted on a table that is translucent pale blue polymer with internal striations and slight glow under overhead lights; small bubbles are frozen inside the material.",
"The robot arm is mounted on a table that is cold concrete with embedded metal panels bolted into place; the surface has oil stains, welding marks, and tiny debris scattered around.",
"The robot arm is mounted on a table that is shiny chrome with heavy smudging and streaks; the table reflects distorted shapes of everything around it, including the arm itself.",
"The robot arm is mounted on a table that is matte forest green with shallow dents and drag marks from prior mechanical operations; a small sticker label is half-torn in one corner.",
"The robot arm is mounted on a table that is textured black rubber with slight give under pressure; scratches from the robot's base and clamp marks are clearly visible.",
"The robot arm is mounted on a table that is medium gray ceramic tile with visible grout lines and chips along the edges; some tiles have tiny cracks or stains.",
"The robot arm is mounted on a table that is old dark wood with faded polish and visible circular stains from spilled liquids; a few deep grooves are carved into the surface near the center."
],
"cubes": [
"The arm is connected to the base mounted on the table. The bottom cube is deep blue, the second cube is bright red, and the top cube is vivid green, maintaining their correct order after stacking."
],
"light": [
"The lighting is soft and diffused from large windows, allowing daylight to fill the room, creating gentle shadows that elongate throughout the space, with a natural warmth due to the sunlight streaming in.",
"Bright fluorescent tubes overhead cast a harsh, even light across the scene, creating sharp, well-defined shadows under the arm and cubes, with a sterile, clinical feel due to the cold white light.",
"Warm tungsten lights in the ceiling cast a golden glow over the table, creating long, soft shadows and a cozy, welcoming atmosphere. The light contrasts with cool blue tones from the robot arm.",
"The lighting comes from several intense spotlights mounted above, each casting focused beams of light that create stark, dramatic shadows around the cubes and the robotic arm, producing a high-contrast look.",
"A single adjustable desk lamp with a soft white bulb casts a directional pool of light over the cubes, causing deep, hard shadows and a quiet, intimate feel in the dimly lit room.",
"The space is illuminated with bright daylight filtering in through a skylight above, casting diffused, soft shadows and giving the scene a clean and natural look, with a cool tint from the daylight.",
"Soft, ambient lighting from hidden LEDs embedded in the ceiling creates a halo effect around the robotic arm, while subtle, elongated shadows stretch across the table surface, giving a sleek modern vibe.",
"Neon strip lights line the walls, casting a cool blue and purple glow across the scene. The robot and table are bathed in this colored light, producing sharp-edged shadows with a futuristic feel.",
"Bright artificial lights overhead illuminate the scene in a harsh white, with scattered, uneven shadows across the table and robot arm. There's a slight yellow hue to the light, giving it an industrial ambiance.",
"Soft morning sunlight spills through a large open window, casting long shadows across the floor and the robot arm. The warm, golden light creates a peaceful, natural atmosphere with a slight coolness in the shadows.",
"Dim ambient lighting with occasional flashes of bright blue light from overhead digital screens creates a high-tech, slightly eerie atmosphere. The shadows are soft, stretching in an almost surreal manner.",
"Lighting from tall lamps outside the room filters in through large glass doors, casting angled shadows across the table and robot arm. The ambient light creates a relaxing, slightly diffused atmosphere.",
"Artificial overhead lighting casts a harsh, stark white light with little warmth, producing sharply defined, almost clinical shadows on the robot arm and cubes. The space feels cold and industrial.",
"Soft moonlight from a large window at night creates a cool, ethereal glow on the table and arm. The shadows are long and faint, and the lighting provides a calm and serene atmosphere.",
"Bright overhead LED panels illuminate the scene with clean, white light, casting neutral shadows that give the environment a modern, sleek feel with minimal distortion or softness in the shadows.",
"A floodlight positioned outside casts bright, almost blinding natural light through an open door, creating high-contrast, sharp-edged shadows across the table and robot arm, adding dramatic tension to the scene.",
"Dim lighting from vintage tungsten bulbs hanging from the ceiling gives the room a warm, nostalgic glow, casting elongated, soft shadows that provide a cozy atmosphere around the robotic arm.",
"Bright fluorescent lights directly above produce a harsh, clinical light that creates sharp, defined shadows on the table and robotic arm, enhancing the industrial feel of the scene.",
"Neon pink and purple lights flicker softly from the walls, illuminating the robot arm with an intense glow that produces sharp, angular shadows across the cubes. The atmosphere feels futuristic and edgy.",
"Sunlight pouring in from a large, open window bathes the table and robotic arm in a warm golden light. The shadows are soft, and the scene feels natural and inviting with a slight contrast between light and shadow."
]
}
# Copyright (c) 2024-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""
Script to convert HDF5 demonstration files to MP4 videos.
This script converts camera frames stored in HDF5 demonstration files to MP4 videos.
It supports multiple camera modalities including RGB, segmentation, and normal maps.
The output videos are saved in the specified directory with appropriate naming.
required arguments:
--input_file Path to the input HDF5 file.
--output_dir Directory to save the output MP4 files.
optional arguments:
--input_keys List of input keys to process from the HDF5 file. (default: ["table_cam", "wrist_cam", "table_cam_segmentation", "table_cam_normals", "table_cam_shaded_segmentation", "table_cam_depth"])
--video_height Height of the output video in pixels. (default: 704)
--video_width Width of the output video in pixels. (default: 1280)
--framerate Frames per second for the output video. (default: 30)
"""
# Standard library imports
import argparse
import os
# Third-party imports
import cv2
import h5py
import numpy as np
# Constants
DEFAULT_VIDEO_HEIGHT = 704
DEFAULT_VIDEO_WIDTH = 1280
DEFAULT_INPUT_KEYS = [
"table_cam",
"wrist_cam",
"table_cam_segmentation",
"table_cam_normals",
"table_cam_shaded_segmentation",
"table_cam_depth",
]
DEFAULT_FRAMERATE = 30
LIGHT_SOURCE = np.array([0.0, 0.0, 1.0])
MIN_DEPTH = 0.0
MAX_DEPTH = 1.5
def parse_args():
"""Parse command line arguments."""
parser = argparse.ArgumentParser(description="Convert HDF5 demonstration files to MP4 videos.")
parser.add_argument(
"--input_file",
type=str,
required=True,
help="Path to the input HDF5 file containing demonstration data.",
)
parser.add_argument(
"--output_dir",
type=str,
required=True,
help="Directory path where the output MP4 files will be saved.",
)
parser.add_argument(
"--input_keys",
type=str,
nargs="+",
default=DEFAULT_INPUT_KEYS,
help="List of input keys to process.",
)
parser.add_argument(
"--video_height",
type=int,
default=DEFAULT_VIDEO_HEIGHT,
help="Height of the output video in pixels.",
)
parser.add_argument(
"--video_width",
type=int,
default=DEFAULT_VIDEO_WIDTH,
help="Width of the output video in pixels.",
)
parser.add_argument(
"--framerate",
type=int,
default=DEFAULT_FRAMERATE,
help="Frames per second for the output video.",
)
args = parser.parse_args()
return args
def write_demo_to_mp4(
    hdf5_file,
    demo_id,
    frames_path,
    input_key,
    output_dir,
    video_height,
    video_width,
    framerate=DEFAULT_FRAMERATE,
):
    """Convert frames from an HDF5 file to an MP4 video.

    Args:
        hdf5_file (str): Path to the HDF5 file containing the frames.
        demo_id (int): ID of the demonstration to convert.
        frames_path (str): Path to the frames data in the HDF5 file.
        input_key (str): Name of the input key to convert.
        output_dir (str): Directory to save the output MP4 file.
        video_height (int): Height of the output video in pixels.
        video_width (int): Width of the output video in pixels.
        framerate (int, optional): Frames per second for the output video. Defaults to 30.
    """
    with h5py.File(hdf5_file, "r") as f:
        # Get frames based on input key type
        if "shaded_segmentation" in input_key:
            temp_key = input_key.replace("shaded_segmentation", "segmentation")
            frames = f[f"data/demo_{demo_id}/obs/{temp_key}"]
        else:
            frames = f[frames_path + "/" + input_key]
        # Setup video writer
        output_path = os.path.join(output_dir, f"demo_{demo_id}_{input_key}.mp4")
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        if "depth" in input_key:
            video = cv2.VideoWriter(output_path, fourcc, framerate, (video_width, video_height), isColor=False)
        else:
            video = cv2.VideoWriter(output_path, fourcc, framerate, (video_width, video_height))
        # Process and write frames
        for ix, frame in enumerate(frames):
            # Convert normal maps to uint8 if needed
            if "normals" in input_key:
                frame = (frame * 255.0).astype(np.uint8)
            # Process shaded segmentation frames
            elif "shaded_segmentation" in input_key:
                seg = frame[..., :-1]
                normals_key = input_key.replace("shaded_segmentation", "normals")
                normals = f[f"data/demo_{demo_id}/obs/{normals_key}"][ix]
                shade = 0.5 + (normals * LIGHT_SOURCE[None, None, :]).sum(axis=-1) * 0.5
                shaded_seg = (shade[..., None] * seg).astype(np.uint8)
                frame = np.concatenate((shaded_seg, frame[..., -1:]), axis=-1)
            # Convert RGB to BGR; depth frames are normalized and inverted instead
            if "depth" not in input_key:
                frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
            else:
                frame = (frame[..., 0] - MIN_DEPTH) / (MAX_DEPTH - MIN_DEPTH)
                frame = np.where(frame < 0.01, 1.0, frame)
                frame = 1.0 - frame
                frame = (frame * 255.0).astype(np.uint8)
            # Resize to video resolution
            frame = cv2.resize(frame, (video_width, video_height), interpolation=cv2.INTER_CUBIC)
            video.write(frame)
        video.release()
def get_num_demos(hdf5_file):
"""Get the number of demonstrations in the HDF5 file.
Args:
hdf5_file (str): Path to the HDF5 file.
Returns:
int: Number of demonstrations found in the file.
"""
with h5py.File(hdf5_file, "r") as f:
return len(f["data"].keys())
def main():
"""Main function to convert all demonstrations to MP4 videos."""
# Parse command line arguments
args = parse_args()
# Create output directory if it doesn't exist
os.makedirs(args.output_dir, exist_ok=True)
# Get number of demonstrations from the file
num_demos = get_num_demos(args.input_file)
print(f"Found {num_demos} demonstrations in {args.input_file}")
# Convert each demonstration
for i in range(num_demos):
frames_path = f"data/demo_{str(i)}/obs"
for input_key in args.input_keys:
write_demo_to_mp4(
args.input_file,
i,
frames_path,
input_key,
args.output_dir,
args.video_height,
args.video_width,
args.framerate,
)
if __name__ == "__main__":
main()
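The depth and shaded-segmentation branches of `write_demo_to_mp4` are easier to follow in isolation. The sketch below reproduces just that arithmetic with NumPy on tiny arrays; the helper names `depth_to_uint8` and `shade_segmentation` are hypothetical, and the constants are copied from the script above.

```python
import numpy as np

# Constants copied from the conversion script.
MIN_DEPTH = 0.0
MAX_DEPTH = 1.5
LIGHT_SOURCE = np.array([0.0, 0.0, 1.0])


def depth_to_uint8(depth: np.ndarray) -> np.ndarray:
    """Normalize depth to [0, 1], map near-zero readings to 'far', then invert and scale."""
    norm = (depth - MIN_DEPTH) / (MAX_DEPTH - MIN_DEPTH)
    norm = np.where(norm < 0.01, 1.0, norm)
    return ((1.0 - norm) * 255.0).astype(np.uint8)


def shade_segmentation(seg_rgb: np.ndarray, normals: np.ndarray) -> np.ndarray:
    """Scale segmentation colors by a Lambert-like term in [0, 1] derived from the normals."""
    shade = 0.5 + (normals * LIGHT_SOURCE[None, None, :]).sum(axis=-1) * 0.5
    return (shade[..., None] * seg_rgb).astype(np.uint8)


# One row of three depth readings: missing (0.0), mid-range, and maximum depth.
depth_img = depth_to_uint8(np.array([[0.0, 0.75, 1.5]]))

# A single pixel whose normal is perpendicular to the light, so it is shaded to half brightness.
seg = np.full((1, 1, 3), 200, dtype=np.uint8)
normals = np.array([[[1.0, 0.0, 0.0]]])
shaded = shade_segmentation(seg, normals)
print(depth_img, shaded)
```

Closer surfaces come out brighter in the depth video, while readings near zero (treated here as missing) and readings at `MAX_DEPTH` both map to black.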
# Copyright (c) 2024-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""
Script to create a new dataset by combining existing HDF5 demonstrations with visually augmented MP4 videos.
This script takes an existing HDF5 dataset containing demonstrations and a directory of MP4 videos
that are visually augmented versions of the original demonstration videos (e.g., with different lighting,
color schemes, or visual effects). It creates a new HDF5 dataset that preserves all the original
demonstration data (actions, robot state, etc.) but replaces the video frames with the augmented versions.
required arguments:
--input_file Path to the input HDF5 file containing original demonstrations.
--output_file Path to save the new HDF5 file with augmented videos.
--videos_dir Directory containing the visually augmented MP4 videos.
"""
# Standard library imports
import argparse
import glob
import os
# Third-party imports
import cv2
import h5py
import numpy as np
def parse_args():
"""Parse command line arguments."""
parser = argparse.ArgumentParser(description="Create a new dataset with visually augmented videos.")
parser.add_argument(
"--input_file",
type=str,
required=True,
help="Path to the input HDF5 file containing original demonstrations.",
)
parser.add_argument(
"--videos_dir",
type=str,
required=True,
help="Directory containing the visually augmented MP4 videos.",
)
parser.add_argument(
"--output_file",
type=str,
required=True,
help="Path to save the new HDF5 file with augmented videos.",
)
args = parser.parse_args()
return args
def get_frames_from_mp4(video_path, target_height=None, target_width=None):
    """Extract frames from an MP4 video file.

    Args:
        video_path (str): Path to the MP4 video file.
        target_height (int, optional): Target height for resizing frames. If None, no resizing is done.
        target_width (int, optional): Target width for resizing frames. If None, no resizing is done.

    Returns:
        np.ndarray: Array of frames from the video in RGB format.
    """
    # Open the video file
    video = cv2.VideoCapture(video_path)
    # Get video properties
    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    # Read all frames into a numpy array
    frames = []
    for _ in range(frame_count):
        ret, frame = video.read()
        if not ret:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        if target_height is not None and target_width is not None:
            frame = cv2.resize(frame, (target_width, target_height), interpolation=cv2.INTER_LINEAR)
        frames.append(frame)
    # Convert to numpy array
    frames = np.array(frames).astype(np.uint8)
    # Release the video object
    video.release()
    return frames
def process_video_and_demo(f_in, f_out, video_path, orig_demo_id, new_demo_id):
"""Process a single video and create a new demo with augmented video frames.
Args:
f_in (h5py.File): Input HDF5 file.
f_out (h5py.File): Output HDF5 file.
video_path (str): Path to the augmented video file.
orig_demo_id (int): ID of the original demo to copy.
new_demo_id (int): ID for the new demo.
"""
# Get original demo data
actions = f_in[f"data/demo_{str(orig_demo_id)}/actions"]
eef_pos = f_in[f"data/demo_{str(orig_demo_id)}/obs/eef_pos"]
eef_quat = f_in[f"data/demo_{str(orig_demo_id)}/obs/eef_quat"]
gripper_pos = f_in[f"data/demo_{str(orig_demo_id)}/obs/gripper_pos"]
wrist_cam = f_in[f"data/demo_{str(orig_demo_id)}/obs/wrist_cam"]
# Get original video resolution
orig_video = f_in[f"data/demo_{str(orig_demo_id)}/obs/table_cam"]
target_height, target_width = orig_video.shape[1:3]
# Extract frames from video with original resolution
frames = get_frames_from_mp4(video_path, target_height, target_width)
# Create new datasets
f_out.create_dataset(f"data/demo_{str(new_demo_id)}/actions", data=actions, compression="gzip")
f_out.create_dataset(f"data/demo_{str(new_demo_id)}/obs/eef_pos", data=eef_pos, compression="gzip")
f_out.create_dataset(f"data/demo_{str(new_demo_id)}/obs/eef_quat", data=eef_quat, compression="gzip")
f_out.create_dataset(f"data/demo_{str(new_demo_id)}/obs/gripper_pos", data=gripper_pos, compression="gzip")
f_out.create_dataset(
f"data/demo_{str(new_demo_id)}/obs/table_cam", data=frames.astype(np.uint8), compression="gzip"
)
f_out.create_dataset(f"data/demo_{str(new_demo_id)}/obs/wrist_cam", data=wrist_cam, compression="gzip")
# Copy attributes
f_out[f"data/demo_{str(new_demo_id)}"].attrs["num_samples"] = f_in[f"data/demo_{str(orig_demo_id)}"].attrs[
"num_samples"
]
def main():
    """Main function to create a new dataset with augmented videos."""
    # Parse command line arguments
    args = parse_args()

    # Get list of MP4 videos
    search_path = os.path.join(args.videos_dir, "*.mp4")
    video_paths = glob.glob(search_path)
    video_paths.sort()
    print(f"Found {len(video_paths)} MP4 videos in {args.videos_dir}")

    # Create output directory if it doesn't exist. Using the absolute path keeps
    # os.makedirs from failing when the output file has no directory component.
    os.makedirs(os.path.dirname(os.path.abspath(args.output_file)), exist_ok=True)

    with h5py.File(args.input_file, "r") as f_in, h5py.File(args.output_file, "w") as f_out:
        # Copy all data from input to output
        f_in.copy("data", f_out)

        # Get the largest demo ID to start new demos from
        demo_ids = [int(key.split("_")[1]) for key in f_in["data"].keys()]
        next_demo_id = max(demo_ids) + 1  # noqa: SIM113
        print(f"Starting new demos from ID: {next_demo_id}")

        # Process each video and create new demo
        for video_path in video_paths:
            # Extract original demo ID from video filename
            video_filename = os.path.basename(video_path)
            orig_demo_id = int(video_filename.split("_")[1])
            process_video_and_demo(f_in, f_out, video_path, orig_demo_id, next_demo_id)
            next_demo_id += 1

    print(f"Augmented data saved to {args.output_file}")


if __name__ == "__main__":
    main()
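The ID bookkeeping in `main()` can be checked without any HDF5 or video files. The sketch below is a minimal illustration, assuming group keys of the form `demo_<id>` and video names of the form `demo_<id>_<suffix>.mp4`. The helper `plan_augmented_demos` is a hypothetical name and is not part of the script.

```python
import os


def plan_augmented_demos(existing_keys, video_paths):
    """Map each video to (video path, original demo ID, new demo ID)."""
    # Keys look like "demo_0", "demo_1", ...; new IDs start after the largest one.
    next_demo_id = max(int(key.split("_")[1]) for key in existing_keys) + 1
    plan = []
    for video_path in sorted(video_paths):
        # The original demo ID is the second "_"-separated token of the file name.
        orig_demo_id = int(os.path.basename(video_path).split("_")[1])
        plan.append((video_path, orig_demo_id, next_demo_id))
        next_demo_id += 1
    return plan


plan = plan_augmented_demos(
    ["demo_0", "demo_1"],
    ["videos/demo_1_table_cam.mp4", "videos/demo_0_table_cam.mp4"],
)
for video_path, orig_id, new_id in plan:
    print(f"{video_path}: demo_{orig_id} -> demo_{new_id}")
```

Sorting is lexicographic, so `demo_10_...` sorts before `demo_2_...`. The source demo is still resolved correctly because its ID is parsed from the file name, but the new IDs are then not assigned in numeric order of the originals.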
# Copyright (c) 2024-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""Test cases for Cosmos prompt generation script."""
import json
import os
import tempfile
import unittest
from scripts.tools.cosmos.cosmos_prompt_gen import generate_prompt, main
class TestCosmosPromptGen(unittest.TestCase):
"""Test cases for Cosmos prompt generation functionality."""
@classmethod
def setUpClass(cls):
"""Set up test fixtures that are shared across all test methods."""
# Create temporary templates file
cls.temp_templates_file = tempfile.NamedTemporaryFile(suffix=".json", delete=False)
# Create test templates
test_templates = {
"lighting": ["with bright lighting", "with dim lighting", "with natural lighting"],
"color": ["in warm colors", "in cool colors", "in vibrant colors"],
"style": ["in a realistic style", "in an artistic style", "in a minimalist style"],
"empty_section": [], # Test empty section
"invalid_section": "not a list", # Test invalid section
}
# Write templates to file
with open(cls.temp_templates_file.name, "w") as f:
json.dump(test_templates, f)
def setUp(self):
"""Set up test fixtures that are created for each test method."""
self.temp_output_file = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
def tearDown(self):
"""Clean up test fixtures after each test method."""
# Remove the temporary output file
os.remove(self.temp_output_file.name)
@classmethod
def tearDownClass(cls):
"""Clean up test fixtures that are shared across all test methods."""
# Remove the temporary templates file
os.remove(cls.temp_templates_file.name)
def test_generate_prompt_valid_templates(self):
"""Test generating a prompt with valid templates."""
prompt = generate_prompt(self.temp_templates_file.name)
# Check that prompt is a string
self.assertIsInstance(prompt, str)
# Check that prompt contains at least one word
self.assertTrue(len(prompt.split()) > 0)
# Check that the prompt mentions at least one section keyword
# (every valid test template above contains its own section name)
valid_sections = ["lighting", "color", "style"]
found_sections = [section for section in valid_sections if section in prompt.lower()]
self.assertTrue(len(found_sections) > 0)
def test_generate_prompt_invalid_file(self):
"""Test generating a prompt with invalid file path."""
with self.assertRaises(FileNotFoundError):
generate_prompt("nonexistent_file.json")
def test_generate_prompt_invalid_json(self):
"""Test generating a prompt with invalid JSON file."""
# Create a temporary file with invalid JSON
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as temp_file:
temp_file.write(b"invalid json content")
temp_file.flush()
try:
with self.assertRaises(ValueError):
generate_prompt(temp_file.name)
finally:
os.remove(temp_file.name)
def test_main_function_single_prompt(self):
"""Test main function with single prompt generation."""
# Mock command line arguments
import sys
original_argv = sys.argv
sys.argv = [
"cosmos_prompt_gen.py",
"--templates_path",
self.temp_templates_file.name,
"--num_prompts",
"1",
"--output_path",
self.temp_output_file.name,
]
try:
main()
# Check if output file was created
self.assertTrue(os.path.exists(self.temp_output_file.name))
# Check content of output file
with open(self.temp_output_file.name) as f:
content = f.read().strip()
self.assertTrue(len(content) > 0)
self.assertEqual(len(content.split("\n")), 1)
finally:
# Restore original argv
sys.argv = original_argv
def test_main_function_multiple_prompts(self):
"""Test main function with multiple prompt generation."""
# Mock command line arguments
import sys
original_argv = sys.argv
sys.argv = [
"cosmos_prompt_gen.py",
"--templates_path",
self.temp_templates_file.name,
"--num_prompts",
"3",
"--output_path",
self.temp_output_file.name,
]
try:
main()
# Check if output file was created
self.assertTrue(os.path.exists(self.temp_output_file.name))
# Check content of output file
with open(self.temp_output_file.name) as f:
content = f.read().strip()
self.assertTrue(len(content) > 0)
self.assertEqual(len(content.split("\n")), 3)
# Check that each line is a valid prompt
for line in content.split("\n"):
self.assertTrue(len(line) > 0)
finally:
# Restore original argv
sys.argv = original_argv
def test_main_function_default_output(self):
"""Test main function with default output path."""
# Mock command line arguments
import sys
original_argv = sys.argv
sys.argv = ["cosmos_prompt_gen.py", "--templates_path", self.temp_templates_file.name, "--num_prompts", "1"]
try:
main()
# Check if default output file was created
self.assertTrue(os.path.exists("prompts.txt"))
# Clean up default output file
os.remove("prompts.txt")
finally:
# Restore original argv
sys.argv = original_argv
if __name__ == "__main__":
unittest.main()
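The tests above swap `sys.argv` by hand and restore it in a `finally` block. A possible alternative, sketched here under the assumption that the script reads its arguments through `argparse` at call time, is `unittest.mock.patch.object`, which restores the original value automatically. `parse_num_prompts` is a hypothetical stand-in and not a function from `cosmos_prompt_gen.py`.

```python
import argparse
import sys
import unittest
from unittest import mock


def parse_num_prompts():
    """Hypothetical stand-in for a script's argument parsing."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--num_prompts", type=int, default=1)
    return parser.parse_args().num_prompts


class TestArgvPatching(unittest.TestCase):
    def test_argv_is_restored(self):
        original_argv = list(sys.argv)
        with mock.patch.object(sys, "argv", ["prog.py", "--num_prompts", "3"]):
            self.assertEqual(parse_num_prompts(), 3)
        # Leaving the `with` block restores sys.argv, even if the body raised.
        self.assertEqual(sys.argv, original_argv)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestArgvPatching)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

This removes the repeated `try`/`finally` boilerplate, at the cost of one more import in the test module.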
# Copyright (c) 2024-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""Test cases for HDF5 to MP4 conversion script."""
import h5py
import numpy as np
import os
import tempfile
import unittest
from scripts.tools.hdf5_to_mp4 import get_num_demos, main, write_demo_to_mp4
class TestHDF5ToMP4(unittest.TestCase):
"""Test cases for HDF5 to MP4 conversion functionality."""
@classmethod
def setUpClass(cls):
"""Set up test fixtures that are shared across all test methods."""
# Create temporary HDF5 file with test data
cls.temp_hdf5_file = tempfile.NamedTemporaryFile(suffix=".h5", delete=False)
with h5py.File(cls.temp_hdf5_file.name, "w") as h5f:
# Create test data structure
for demo_id in range(2): # Create 2 demos
demo_group = h5f.create_group(f"data/demo_{demo_id}/obs")
# Create RGB frames (2 frames per demo)
rgb_data = np.random.randint(0, 256, (2, 704, 1280, 3), dtype=np.uint8)  # upper bound is exclusive
demo_group.create_dataset("table_cam", data=rgb_data)
# Create segmentation frames
seg_data = np.random.randint(0, 256, (2, 704, 1280, 4), dtype=np.uint8)  # upper bound is exclusive
demo_group.create_dataset("table_cam_segmentation", data=seg_data)
# Create normal maps
normals_data = np.random.rand(2, 704, 1280, 3).astype(np.float32)
demo_group.create_dataset("table_cam_normals", data=normals_data)
# Create depth maps
depth_data = np.random.rand(2, 704, 1280, 1).astype(np.float32)
demo_group.create_dataset("table_cam_depth", data=depth_data)
def setUp(self):
"""Set up test fixtures that are created for each test method."""
self.temp_output_dir = tempfile.mkdtemp()
def tearDown(self):
"""Clean up test fixtures after each test method."""
# Remove all files in the output directory
for file in os.listdir(self.temp_output_dir):
os.remove(os.path.join(self.temp_output_dir, file))
# Remove the output directory
os.rmdir(self.temp_output_dir)
@classmethod
def tearDownClass(cls):
"""Clean up test fixtures that are shared across all test methods."""
# Remove the temporary HDF5 file
os.remove(cls.temp_hdf5_file.name)
def test_get_num_demos(self):
"""Test the get_num_demos function."""
num_demos = get_num_demos(self.temp_hdf5_file.name)
self.assertEqual(num_demos, 2)
def test_write_demo_to_mp4_rgb(self):
"""Test writing RGB frames to MP4."""
write_demo_to_mp4(self.temp_hdf5_file.name, 0, "data/demo_0/obs", "table_cam", self.temp_output_dir, 704, 1280)
output_file = os.path.join(self.temp_output_dir, "demo_0_table_cam.mp4")
self.assertTrue(os.path.exists(output_file))
self.assertGreater(os.path.getsize(output_file), 0)
def test_write_demo_to_mp4_segmentation(self):
"""Test writing segmentation frames to MP4."""
write_demo_to_mp4(
self.temp_hdf5_file.name, 0, "data/demo_0/obs", "table_cam_segmentation", self.temp_output_dir, 704, 1280
)
output_file = os.path.join(self.temp_output_dir, "demo_0_table_cam_segmentation.mp4")
self.assertTrue(os.path.exists(output_file))
self.assertGreater(os.path.getsize(output_file), 0)
def test_write_demo_to_mp4_normals(self):
"""Test writing normal maps to MP4."""
write_demo_to_mp4(
self.temp_hdf5_file.name, 0, "data/demo_0/obs", "table_cam_normals", self.temp_output_dir, 704, 1280
)
output_file = os.path.join(self.temp_output_dir, "demo_0_table_cam_normals.mp4")
self.assertTrue(os.path.exists(output_file))
self.assertGreater(os.path.getsize(output_file), 0)
def test_write_demo_to_mp4_shaded_segmentation(self):
"""Test writing shaded_segmentation frames to MP4."""
write_demo_to_mp4(
self.temp_hdf5_file.name,
0,
"data/demo_0/obs",
"table_cam_shaded_segmentation",
self.temp_output_dir,
704,
1280,
)
output_file = os.path.join(self.temp_output_dir, "demo_0_table_cam_shaded_segmentation.mp4")
self.assertTrue(os.path.exists(output_file))
self.assertGreater(os.path.getsize(output_file), 0)
def test_write_demo_to_mp4_depth(self):
"""Test writing depth maps to MP4."""
write_demo_to_mp4(
self.temp_hdf5_file.name, 0, "data/demo_0/obs", "table_cam_depth", self.temp_output_dir, 704, 1280
)
output_file = os.path.join(self.temp_output_dir, "demo_0_table_cam_depth.mp4")
self.assertTrue(os.path.exists(output_file))
self.assertGreater(os.path.getsize(output_file), 0)
def test_write_demo_to_mp4_invalid_demo(self):
"""Test writing with invalid demo ID."""
with self.assertRaises(KeyError):
write_demo_to_mp4(
self.temp_hdf5_file.name,
999, # Invalid demo ID
"data/demo_999/obs",
"table_cam",
self.temp_output_dir,
704,
1280,
)
def test_write_demo_to_mp4_invalid_key(self):
"""Test writing with invalid input key."""
with self.assertRaises(KeyError):
write_demo_to_mp4(
self.temp_hdf5_file.name, 0, "data/demo_0/obs", "invalid_key", self.temp_output_dir, 704, 1280
)
def test_main_function(self):
"""Test the main function."""
# Mock command line arguments
import sys
original_argv = sys.argv
sys.argv = [
"hdf5_to_mp4.py",
"--input_file",
self.temp_hdf5_file.name,
"--output_dir",
self.temp_output_dir,
"--input_keys",
"table_cam",
"table_cam_segmentation",
"--video_height",
"704",
"--video_width",
"1280",
"--framerate",
"30",
]
try:
main()
# Check if output files were created
expected_files = [
"demo_0_table_cam.mp4",
"demo_0_table_cam_segmentation.mp4",
"demo_1_table_cam.mp4",
"demo_1_table_cam_segmentation.mp4",
]
for file in expected_files:
output_file = os.path.join(self.temp_output_dir, file)
self.assertTrue(os.path.exists(output_file))
self.assertGreater(os.path.getsize(output_file), 0)
finally:
# Restore original argv
sys.argv = original_argv
if __name__ == "__main__":
unittest.main()
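The fixture stores depth maps as `float32`, while MP4 frames are normally 8-bit, so some conversion has to happen before encoding. The conversion used by `hdf5_to_mp4.py` is not shown in this part of the diff. The sketch below only illustrates one common approach, min-max scaling, using plain Python lists. `depth_to_uint8` is a hypothetical helper.

```python
def depth_to_uint8(depth_rows):
    """Min-max scale a 2D depth map (rows of floats) to integers in [0, 255]."""
    flat = [value for row in depth_rows for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant depth map carries no contrast; map everything to 0.
        return [[0 for _ in row] for row in depth_rows]
    return [[round(255 * (value - lo) / span) for value in row] for row in depth_rows]


frame = depth_to_uint8([[0.0, 1.0], [2.0, 5.0]])
print(frame)
```

Per-frame scaling like this discards absolute distance, so it suits visualization better than metric use.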
# Copyright (c) 2024-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""Test cases for MP4 to HDF5 conversion script."""
import h5py
import numpy as np
import os
import tempfile
import unittest
import cv2
from scripts.tools.mp4_to_hdf5 import get_frames_from_mp4, main, process_video_and_demo
class TestMP4ToHDF5(unittest.TestCase):
"""Test cases for MP4 to HDF5 conversion functionality."""
@classmethod
def setUpClass(cls):
"""Set up test fixtures that are shared across all test methods."""
# Create temporary HDF5 file with test data
cls.temp_hdf5_file = tempfile.NamedTemporaryFile(suffix=".h5", delete=False)
with h5py.File(cls.temp_hdf5_file.name, "w") as h5f:
# Create test data structure for 2 demos
for demo_id in range(2):
demo_group = h5f.create_group(f"data/demo_{demo_id}")
obs_group = demo_group.create_group("obs")
# Create actions data
actions_data = np.random.rand(10, 7).astype(np.float32)
demo_group.create_dataset("actions", data=actions_data)
# Create robot state data
eef_pos_data = np.random.rand(10, 3).astype(np.float32)
eef_quat_data = np.random.rand(10, 4).astype(np.float32)
gripper_pos_data = np.random.rand(10, 1).astype(np.float32)
obs_group.create_dataset("eef_pos", data=eef_pos_data)
obs_group.create_dataset("eef_quat", data=eef_quat_data)
obs_group.create_dataset("gripper_pos", data=gripper_pos_data)
# Create camera data
table_cam_data = np.random.randint(0, 256, (10, 704, 1280, 3), dtype=np.uint8)  # upper bound is exclusive
wrist_cam_data = np.random.randint(0, 256, (10, 704, 1280, 3), dtype=np.uint8)
obs_group.create_dataset("table_cam", data=table_cam_data)
obs_group.create_dataset("wrist_cam", data=wrist_cam_data)
# Set attributes
demo_group.attrs["num_samples"] = 10
# Create temporary MP4 files
cls.temp_videos_dir = tempfile.mkdtemp()
cls.video_paths = []
for demo_id in range(2):
video_path = os.path.join(cls.temp_videos_dir, f"demo_{demo_id}_table_cam.mp4")
cls.video_paths.append(video_path)
# Create a test video
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
video = cv2.VideoWriter(video_path, fourcc, 30, (1280, 704))
# Write some random frames
for _ in range(10):
frame = np.random.randint(0, 256, (704, 1280, 3), dtype=np.uint8)  # upper bound is exclusive
video.write(frame)
video.release()
def setUp(self):
"""Set up test fixtures that are created for each test method."""
self.temp_output_file = tempfile.NamedTemporaryFile(suffix=".h5", delete=False)
def tearDown(self):
"""Clean up test fixtures after each test method."""
# Remove the temporary output file
os.remove(self.temp_output_file.name)
@classmethod
def tearDownClass(cls):
"""Clean up test fixtures that are shared across all test methods."""
# Remove the temporary HDF5 file
os.remove(cls.temp_hdf5_file.name)
# Remove temporary videos and directory
for video_path in cls.video_paths:
os.remove(video_path)
os.rmdir(cls.temp_videos_dir)
def test_get_frames_from_mp4(self):
"""Test extracting frames from MP4 video."""
frames = get_frames_from_mp4(self.video_paths[0])
# Check frame properties
self.assertEqual(frames.shape[0], 10) # Number of frames
self.assertEqual(frames.shape[1:], (704, 1280, 3)) # Frame dimensions
self.assertEqual(frames.dtype, np.uint8) # Data type
def test_get_frames_from_mp4_resize(self):
"""Test extracting frames with resizing."""
target_height, target_width = 352, 640
frames = get_frames_from_mp4(self.video_paths[0], target_height, target_width)
# Check resized frame properties
self.assertEqual(frames.shape[0], 10) # Number of frames
self.assertEqual(frames.shape[1:], (target_height, target_width, 3)) # Resized dimensions
self.assertEqual(frames.dtype, np.uint8) # Data type
def test_process_video_and_demo(self):
"""Test processing a single video and creating a new demo."""
with h5py.File(self.temp_hdf5_file.name, "r") as f_in, h5py.File(self.temp_output_file.name, "w") as f_out:
process_video_and_demo(f_in, f_out, self.video_paths[0], 0, 2)
# Check if new demo was created with correct data
self.assertIn("data/demo_2", f_out)
self.assertIn("data/demo_2/actions", f_out)
self.assertIn("data/demo_2/obs/eef_pos", f_out)
self.assertIn("data/demo_2/obs/eef_quat", f_out)
self.assertIn("data/demo_2/obs/gripper_pos", f_out)
self.assertIn("data/demo_2/obs/table_cam", f_out)
self.assertIn("data/demo_2/obs/wrist_cam", f_out)
# Check data shapes
self.assertEqual(f_out["data/demo_2/actions"].shape, (10, 7))
self.assertEqual(f_out["data/demo_2/obs/eef_pos"].shape, (10, 3))
self.assertEqual(f_out["data/demo_2/obs/eef_quat"].shape, (10, 4))
self.assertEqual(f_out["data/demo_2/obs/gripper_pos"].shape, (10, 1))
self.assertEqual(f_out["data/demo_2/obs/table_cam"].shape, (10, 704, 1280, 3))
self.assertEqual(f_out["data/demo_2/obs/wrist_cam"].shape, (10, 704, 1280, 3))
# Check attributes
self.assertEqual(f_out["data/demo_2"].attrs["num_samples"], 10)
def test_main_function(self):
"""Test the main function."""
# Mock command line arguments
import sys
original_argv = sys.argv
sys.argv = [
"mp4_to_hdf5.py",
"--input_file",
self.temp_hdf5_file.name,
"--videos_dir",
self.temp_videos_dir,
"--output_file",
self.temp_output_file.name,
]
try:
main()
# Check if output file was created with correct data
with h5py.File(self.temp_output_file.name, "r") as f:
# Check if original demos were copied
self.assertIn("data/demo_0", f)
self.assertIn("data/demo_1", f)
# Check if new demos were created
self.assertIn("data/demo_2", f)
self.assertIn("data/demo_3", f)
# Check data in new demos
for demo_id in [2, 3]:
self.assertIn(f"data/demo_{demo_id}/actions", f)
self.assertIn(f"data/demo_{demo_id}/obs/eef_pos", f)
self.assertIn(f"data/demo_{demo_id}/obs/eef_quat", f)
self.assertIn(f"data/demo_{demo_id}/obs/gripper_pos", f)
self.assertIn(f"data/demo_{demo_id}/obs/table_cam", f)
self.assertIn(f"data/demo_{demo_id}/obs/wrist_cam", f)
finally:
# Restore original argv
sys.argv = original_argv
if __name__ == "__main__":
unittest.main()
......@@ -281,8 +281,8 @@ Changed
:meth:`~isaaclab.utils.math.quat_apply` and :meth:`~isaaclab.utils.math.quat_apply_inverse` for speed.
-0.40.9 (2025-05-19)
-~~~~~~~~~~~~~~~~~~~
+0.40.10 (2025-05-19)
+~~~~~~~~~~~~~~~~~~~~
Fixed
^^^^^
......@@ -291,7 +291,7 @@ Fixed
of assets and sensors.used from the experience files and the double definition is removed.
-0.40.8 (2025-01-30)
+0.40.9 (2025-01-30)
~~~~~~~~~~~~~~~~~~~
Added
......@@ -301,7 +301,7 @@ Added
in the simulation.
-0.40.7 (2025-05-16)
+0.40.8 (2025-05-16)
~~~~~~~~~~~~~~~~~~~
Added
......@@ -316,7 +316,7 @@ Changed
resampling call.
-0.40.6 (2025-05-16)
+0.40.7 (2025-05-16)
~~~~~~~~~~~~~~~~~~~
Fixed
......@@ -325,7 +325,7 @@ Fixed
* Fixed penetration issue for negative border height in :class:`~isaaclab.terrains.terrain_generator.TerrainGeneratorCfg`.
-0.40.5 (2025-05-16)
+0.40.6 (2025-05-20)
~~~~~~~~~~~~~~~~~~~
Changed
......@@ -340,7 +340,7 @@ Added
* Added :meth:`~isaaclab.utils.math.rigid_body_twist_transform`
-0.40.4 (2025-05-15)
+0.40.5 (2025-05-15)
~~~~~~~~~~~~~~~~~~~
Fixed
......@@ -354,13 +354,22 @@ Fixed
unused USD camera parameters.
-0.40.3 (2025-05-14)
+0.40.4 (2025-05-14)
~~~~~~~~~~~~~~~~~~~
* Added a new attribute :attr:`articulation_root_prim_path` to the :class:`~isaaclab.assets.ArticulationCfg` class
to allow explicitly specifying the prim path of the articulation root.
0.40.3 (2025-05-14)
~~~~~~~~~~~~~~~~~~~
Changed
^^^^^^^
* Updated :func:`isaaclab.envs.mdp.image` to normalize normal maps by remapping values from ``[-1, 1]`` to ``[0, 1]``.
0.40.2 (2025-05-14)
~~~~~~~~~~~~~~~~~~~
......
......@@ -352,7 +352,7 @@ def image(
     if (data_type == "distance_to_camera") and convert_perspective_to_orthogonal:
         images = math_utils.orthogonalize_perspective_depth(images, sensor.data.intrinsic_matrices)
-    # rgb/depth image normalization
+    # rgb/depth/normals image normalization
     if normalize:
         if data_type == "rgb":
             images = images.float() / 255.0
......@@ -360,6 +360,8 @@ def image(
             images -= mean_tensor
         elif "distance_to" in data_type or "depth" in data_type:
             images[images == float("inf")] = 0
+        elif "normals" in data_type:
+            images = (images + 1.0) * 0.5
     return images.clone()
......
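The new `normals` branch applies the affine map `(x + 1.0) * 0.5`, which sends components in `[-1, 1]` to `[0, 1]`. The sketch below reproduces that arithmetic on a single normal vector in plain Python. The helper names are illustrative only, and the inverse map is an addition that does not appear in the diff.

```python
def normalize_normal(normal):
    """Map each component from [-1, 1] to [0, 1], as in the new branch."""
    return tuple((component + 1.0) * 0.5 for component in normal)


def denormalize_normal(value):
    """Inverse map from [0, 1] back to [-1, 1] (not part of the diff)."""
    return tuple(component * 2.0 - 1.0 for component in value)


up = (0.0, 0.0, 1.0)
encoded = normalize_normal(up)
print(encoded)
print(denormalize_normal(encoded))
```

A zero component lands on 0.5, so flat, camera-facing regions render as mid-range values when the result is viewed as an image.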
[package]
# Semantic Versioning is used: https://semver.org/
-version = "1.0.8"
+version = "1.0.9"
# Description
category = "isaaclab"
......
Changelog
---------
1.0.9 (2025-05-20)
~~~~~~~~~~~~~~~~~~
Added
^^^^^
* Added ``Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-Mimic-v0`` environment for Cosmos vision stacking.
1.0.8 (2025-05-01)
~~~~~~~~~~~~~~~~~~
......
......@@ -12,6 +12,7 @@ from .franka_stack_ik_abs_mimic_env_cfg import FrankaCubeStackIKAbsMimicEnvCfg
from .franka_stack_ik_rel_blueprint_mimic_env_cfg import FrankaCubeStackIKRelBlueprintMimicEnvCfg
from .franka_stack_ik_rel_mimic_env import FrankaCubeStackIKRelMimicEnv
from .franka_stack_ik_rel_mimic_env_cfg import FrankaCubeStackIKRelMimicEnvCfg
from .franka_stack_ik_rel_visuomotor_cosmos_mimic_env_cfg import FrankaCubeStackIKRelVisuomotorCosmosMimicEnvCfg
from .franka_stack_ik_rel_visuomotor_mimic_env_cfg import FrankaCubeStackIKRelVisuomotorMimicEnvCfg
##
......@@ -53,3 +54,14 @@ gym.register(
},
disable_env_checker=True,
)
gym.register(
    id="Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-Mimic-v0",
    entry_point="isaaclab_mimic.envs:FrankaCubeStackIKRelMimicEnv",
    kwargs={
        "env_cfg_entry_point": (
            franka_stack_ik_rel_visuomotor_cosmos_mimic_env_cfg.FrankaCubeStackIKRelVisuomotorCosmosMimicEnvCfg
        ),
    },
    disable_env_checker=True,
)
# Copyright (c) 2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: Apache-2.0
from isaaclab.envs.mimic_env_cfg import MimicEnvCfg, SubTaskConfig
from isaaclab.utils import configclass
from isaaclab_tasks.manager_based.manipulation.stack.config.franka.stack_ik_rel_visuomotor_cosmos_env_cfg import (
FrankaCubeStackVisuomotorCosmosEnvCfg,
)
@configclass
class FrankaCubeStackIKRelVisuomotorCosmosMimicEnvCfg(FrankaCubeStackVisuomotorCosmosEnvCfg, MimicEnvCfg):
"""
Isaac Lab Mimic environment config class for Franka Cube Stack IK Rel Visuomotor Cosmos env.
"""
def __post_init__(self):
# post init of parents
super().__post_init__()
# Override the existing values
self.datagen_config.name = "isaac_lab_franka_stack_ik_rel_visuomotor_cosmos_D0"
self.datagen_config.generation_guarantee = True
self.datagen_config.generation_keep_failed = True
self.datagen_config.generation_num_trials = 10
self.datagen_config.generation_select_src_per_subtask = True
self.datagen_config.generation_transform_first_robot_pose = False
self.datagen_config.generation_interpolate_from_last_target_pose = True
self.datagen_config.generation_relative = True
self.datagen_config.max_num_failures = 25
self.datagen_config.seed = 1
# The following are the subtask configurations for the stack task.
subtask_configs = []
subtask_configs.append(
SubTaskConfig(
# Each subtask involves manipulation with respect to a single object frame.
object_ref="cube_2",
# This key corresponds to the binary indicator in "datagen_info" that signals
# when this subtask is finished (e.g., on a 0 to 1 edge).
subtask_term_signal="grasp_1",
# Specifies time offsets for data generation when splitting a trajectory into
# subtask segments. Random offsets are added to the termination boundary.
subtask_term_offset_range=(10, 20),
# Selection strategy for the source subtask segment during data generation
selection_strategy="nearest_neighbor_object",
# Optional parameters for the selection strategy function
selection_strategy_kwargs={"nn_k": 3},
# Amount of action noise to apply during this subtask
action_noise=0.03,
# Number of interpolation steps to bridge to this subtask segment
num_interpolation_steps=5,
# Additional fixed steps for the robot to reach the necessary pose
num_fixed_steps=0,
# If True, apply action noise during the interpolation phase and execution
apply_noise_during_interpolation=False,
)
)
subtask_configs.append(
SubTaskConfig(
# Each subtask involves manipulation with respect to a single object frame.
object_ref="cube_1",
# Corresponding key for the binary indicator in "datagen_info" for completion
subtask_term_signal="stack_1",
# Time offsets for data generation when splitting a trajectory
subtask_term_offset_range=(10, 20),
# Selection strategy for source subtask segment
selection_strategy="nearest_neighbor_object",
# Optional parameters for the selection strategy function
selection_strategy_kwargs={"nn_k": 3},
# Amount of action noise to apply during this subtask
action_noise=0.03,
# Number of interpolation steps to bridge to this subtask segment
num_interpolation_steps=5,
# Additional fixed steps for the robot to reach the necessary pose
num_fixed_steps=0,
# If True, apply action noise during the interpolation phase and execution
apply_noise_during_interpolation=False,
)
)
subtask_configs.append(
SubTaskConfig(
# Each subtask involves manipulation with respect to a single object frame.
object_ref="cube_3",
# Corresponding key for the binary indicator in "datagen_info" for completion
subtask_term_signal="grasp_2",
# Time offsets for data generation when splitting a trajectory
subtask_term_offset_range=(10, 20),
# Selection strategy for source subtask segment
selection_strategy="nearest_neighbor_object",
# Optional parameters for the selection strategy function
selection_strategy_kwargs={"nn_k": 3},
# Amount of action noise to apply during this subtask
action_noise=0.03,
# Number of interpolation steps to bridge to this subtask segment
num_interpolation_steps=5,
# Additional fixed steps for the robot to reach the necessary pose
num_fixed_steps=0,
# If True, apply action noise during the interpolation phase and execution
apply_noise_during_interpolation=False,
)
)
subtask_configs.append(
SubTaskConfig(
# Each subtask involves manipulation with respect to a single object frame.
object_ref="cube_2",
# End of final subtask does not need to be detected
subtask_term_signal=None,
# No time offsets for the final subtask
subtask_term_offset_range=(0, 0),
# Selection strategy for source subtask segment
selection_strategy="nearest_neighbor_object",
# Optional parameters for the selection strategy function
selection_strategy_kwargs={"nn_k": 3},
# Amount of action noise to apply during this subtask
action_noise=0.03,
# Number of interpolation steps to bridge to this subtask segment
num_interpolation_steps=5,
# Additional fixed steps for the robot to reach the necessary pose
num_fixed_steps=0,
# If True, apply action noise during the interpolation phase and execution
apply_noise_during_interpolation=False,
)
)
self.subtask_configs["franka"] = subtask_configs
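The four subtask entries above differ only in `object_ref`, `subtask_term_signal` and the offset range. As a reading aid, the sketch below builds the same sequence from a small table. `SubTaskSpec` is a hypothetical stand-in dataclass limited to the fields used in this config. It is not the real `SubTaskConfig`, and the config in this commit does not use this pattern.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class SubTaskSpec:
    """Hypothetical stand-in for SubTaskConfig, limited to the fields used above."""

    object_ref: str
    subtask_term_signal: Optional[str]
    subtask_term_offset_range: Tuple[int, int]
    selection_strategy: str = "nearest_neighbor_object"
    selection_strategy_kwargs: dict = field(default_factory=lambda: {"nn_k": 3})
    action_noise: float = 0.03
    num_interpolation_steps: int = 5
    num_fixed_steps: int = 0
    apply_noise_during_interpolation: bool = False


# (object_ref, termination signal) per subtask, in execution order.
STACK_SUBTASKS = [
    ("cube_2", "grasp_1"),
    ("cube_1", "stack_1"),
    ("cube_3", "grasp_2"),
    ("cube_2", None),
]

subtask_specs = [
    SubTaskSpec(
        object_ref=object_ref,
        subtask_term_signal=signal,
        # The final subtask has no termination signal and therefore no offset.
        subtask_term_offset_range=(10, 20) if signal is not None else (0, 0),
    )
    for object_ref, signal in STACK_SUBTASKS
]

for spec in subtask_specs:
    print(spec.object_ref, spec.subtask_term_signal, spec.subtask_term_offset_range)
```

The table makes the grasp, stack, grasp, stack ordering visible at a glance. The explicit form used in the config is easier to tune per subtask.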
[package]
# Note: Semantic Versioning is used: https://semver.org/
-version = "0.10.39"
+version = "0.10.40"
# Description
title = "Isaac Lab Environments"
......
Changelog
---------
-0.10.39 (2025-06-26)
+0.10.40 (2025-06-26)
~~~~~~~~~~~~~~~~~~~~
Fixed
......@@ -10,7 +10,7 @@ Fixed
* Relaxed upper range pin for protobuf python dependency for more permissive installation.
-0.10.38 (2025-05-22)
+0.10.39 (2025-05-22)
~~~~~~~~~~~~~~~~~~~~
Fixed
......@@ -19,7 +19,7 @@ Fixed
* Fixed redundant body_names assignment in rough_env_cfg.py for H1 robot.
-0.10.37 (2025-06-16)
+0.10.38 (2025-06-16)
~~~~~~~~~~~~~~~~~~~~
Changed
......@@ -28,7 +28,7 @@ Changed
* Show available RL library configs on error message when an entry point key is not available for a given task.
-0.10.36 (2025-05-15)
+0.10.37 (2025-05-15)
~~~~~~~~~~~~~~~~~~~~
Added
......@@ -38,7 +38,7 @@ Added
implements assembly tasks to insert pegs into their corresponding sockets.
-0.10.35 (2025-05-21)
+0.10.36 (2025-05-21)
~~~~~~~~~~~~~~~~~~~~
Added
......@@ -48,6 +48,21 @@ Added
can be pushed to a visualization dashboard to track improvements or regressions.
0.10.35 (2025-05-21)
~~~~~~~~~~~~~~~~~~~~
Added
^^^^^
* Added ``Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-v0`` stacking environment with multi-modality camera inputs at higher resolution.
Changed
^^^^^^^
* Updated the ``Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0`` stacking environment to support visual domain randomization events during model evaluation.
* Made the task termination condition for the stacking task stricter.
0.10.34 (2025-05-22)
~~~~~~~~~~~~~~~~~~~~
......
......@@ -11,6 +11,7 @@ from . import (
stack_ik_rel_blueprint_env_cfg,
stack_ik_rel_env_cfg,
stack_ik_rel_instance_randomize_env_cfg,
stack_ik_rel_visuomotor_cosmos_env_cfg,
stack_ik_rel_visuomotor_env_cfg,
stack_joint_pos_env_cfg,
stack_joint_pos_instance_randomize_env_cfg,
......@@ -67,6 +68,16 @@ gym.register(
disable_env_checker=True,
)
gym.register(
    id="Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    kwargs={
        "env_cfg_entry_point": stack_ik_rel_visuomotor_cosmos_env_cfg.FrankaCubeStackVisuomotorCosmosEnvCfg,
        "robomimic_bc_cfg_entry_point": os.path.join(agents.__path__[0], "robomimic/bc_rnn_image_cosmos.json"),
    },
    disable_env_checker=True,
)
gym.register(
id="Isaac-Stack-Cube-Franka-IK-Abs-v0",
entry_point="isaaclab.envs:ManagerBasedRLEnv",
......
{
"algo_name": "bc",
"experiment": {
"name": "bc_rnn_image_franka_stack_cosmos",
"validate": false,
"logging": {
"terminal_output_to_txt": true,
"log_tb": true
},
"save": {
"enabled": true,
"every_n_seconds": null,
"every_n_epochs": 20,
"epochs": [],
"on_best_validation": false,
"on_best_rollout_return": false,
"on_best_rollout_success_rate": true
},
"epoch_every_n_steps": 500,
"env": null,
"additional_envs": null,
"render": false,
"render_video": false,
"rollout": {
"enabled": false
}
},
"train": {
"data": null,
"num_data_workers": 4,
"hdf5_cache_mode": "low_dim",
"hdf5_use_swmr": true,
"hdf5_load_next_obs": false,
"hdf5_normalize_obs": false,
"hdf5_filter_key": null,
"hdf5_validation_filter_key": null,
"seq_length": 10,
"pad_seq_length": true,
"frame_stack": 1,
"pad_frame_stack": true,
"dataset_keys": [
"actions",
"rewards",
"dones"
],
"goal_mode": null,
"cuda": true,
"batch_size": 16,
"num_epochs": 600,
"seed": 101
},
"algo": {
"optim_params": {
"policy": {
"optimizer_type": "adam",
"learning_rate": {
"initial": 0.0001,
"decay_factor": 0.1,
"epoch_schedule": [],
"scheduler_type": "multistep"
},
"regularization": {
"L2": 0.0
}
}
},
"loss": {
"l2_weight": 1.0,
"l1_weight": 0.0,
"cos_weight": 0.0
},
"actor_layer_dims": [],
"gaussian": {
"enabled": false,
"fixed_std": false,
"init_std": 0.1,
"min_std": 0.01,
"std_activation": "softplus",
"low_noise_eval": true
},
"gmm": {
"enabled": true,
"num_modes": 5,
"min_std": 0.0001,
"std_activation": "softplus",
"low_noise_eval": true
},
"vae": {
"enabled": false,
"latent_dim": 14,
"latent_clip": null,
"kl_weight": 1.0,
"decoder": {
"is_conditioned": true,
"reconstruction_sum_across_elements": false
},
"prior": {
"learn": false,
"is_conditioned": false,
"use_gmm": false,
"gmm_num_modes": 10,
"gmm_learn_weights": false,
"use_categorical": false,
"categorical_dim": 10,
"categorical_gumbel_softmax_hard": false,
"categorical_init_temp": 1.0,
"categorical_temp_anneal_step": 0.001,
"categorical_min_temp": 0.3
},
"encoder_layer_dims": [
300,
400
],
"decoder_layer_dims": [
300,
400
],
"prior_layer_dims": [
300,
400
]
},
"rnn": {
"enabled": true,
"horizon": 10,
"hidden_dim": 1000,
"rnn_type": "LSTM",
"num_layers": 2,
"open_loop": false,
"kwargs": {
"bidirectional": false
}
},
"transformer": {
"enabled": false,
"context_length": 10,
"embed_dim": 512,
"num_layers": 6,
"num_heads": 8,
"emb_dropout": 0.1,
"attn_dropout": 0.1,
"block_output_dropout": 0.1,
"sinusoidal_embedding": false,
"activation": "gelu",
"supervise_all_steps": false,
"nn_parameter_for_timesteps": true
}
},
"observation": {
"modalities": {
"obs": {
"low_dim": [
"eef_pos",
"eef_quat",
"gripper_pos"
],
"rgb": [
"table_cam"
],
"depth": [],
"scan": []
},
"goal": {
"low_dim": [],
"rgb": [],
"depth": [],
"scan": []
}
},
"encoder": {
"low_dim": {
"core_class": null,
"core_kwargs": {},
"obs_randomizer_class": null,
"obs_randomizer_kwargs": {}
},
"rgb": {
"core_class": "VisualCore",
"core_kwargs": {
"feature_dimension": 64,
"flatten": true,
"backbone_class": "ResNet18Conv",
"backbone_kwargs": {
"pretrained": false,
"input_coord_conv": false
},
"pool_class": "SpatialSoftmax",
"pool_kwargs": {
"num_kp": 32,
"learnable_temperature": false,
"temperature": 1.0,
"noise_std": 0.0,
"output_variance": false
}
},
"obs_randomizer_class": "CropRandomizer",
"obs_randomizer_kwargs": {
"crop_height": 180,
"crop_width": 180,
"num_crops": 1,
"pos_enc": false
}
},
"depth": {
"core_class": "VisualCore",
"core_kwargs": {},
"obs_randomizer_class": null,
"obs_randomizer_kwargs": {}
},
"scan": {
"core_class": "ScanCore",
"core_kwargs": {},
"obs_randomizer_class": null,
"obs_randomizer_kwargs": {}
}
}
}
}
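The training budget implied by this config follows from a few of its fields. The sketch below works through that arithmetic on a hand-copied excerpt of the JSON above. It assumes the usual robomimic reading of these keys: one gradient step per batch, `epoch_every_n_steps` steps per epoch, and a periodic checkpoint every `every_n_epochs` epochs.

```python
import json

# Excerpt of the config above; only the fields needed for the arithmetic.
config_excerpt = json.loads(
    """
    {
      "experiment": {"epoch_every_n_steps": 500, "save": {"every_n_epochs": 20}},
      "train": {"batch_size": 16, "num_epochs": 600, "seq_length": 10}
    }
    """
)

steps_per_epoch = config_excerpt["experiment"]["epoch_every_n_steps"]
save_every = config_excerpt["experiment"]["save"]["every_n_epochs"]
num_epochs = config_excerpt["train"]["num_epochs"]
batch_size = config_excerpt["train"]["batch_size"]

total_steps = steps_per_epoch * num_epochs
sequences_seen = total_steps * batch_size
periodic_checkpoints = num_epochs // save_every

print(f"gradient steps: {total_steps}")
print(f"sequences sampled: {sequences_seen}")
print(f"periodic checkpoints: {periodic_checkpoints}")
```

Because `rollout.enabled` is `false` in this config, `on_best_rollout_success_rate` presumably has no effect during training, which would leave the periodic checkpoints as the only saved models.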
# Copyright (c) 2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
import isaaclab.sim as sim_utils
from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.sensors import CameraCfg
from isaaclab.utils import configclass
from isaaclab_tasks.manager_based.manipulation.stack import mdp
from . import stack_ik_rel_visuomotor_env_cfg
@configclass
class ObservationsCfg:
"""Observation specifications for the MDP."""
@configclass
class PolicyCfg(ObsGroup):
"""Observations for policy group with state values."""
actions = ObsTerm(func=mdp.last_action)
joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
object = ObsTerm(func=mdp.object_obs)
cube_positions = ObsTerm(func=mdp.cube_positions_in_world_frame)
cube_orientations = ObsTerm(func=mdp.cube_orientations_in_world_frame)
eef_pos = ObsTerm(func=mdp.ee_frame_pos)
eef_quat = ObsTerm(func=mdp.ee_frame_quat)
gripper_pos = ObsTerm(func=mdp.gripper_pos)
table_cam = ObsTerm(
func=mdp.image, params={"sensor_cfg": SceneEntityCfg("table_cam"), "data_type": "rgb", "normalize": False}
)
wrist_cam = ObsTerm(
func=mdp.image, params={"sensor_cfg": SceneEntityCfg("wrist_cam"), "data_type": "rgb", "normalize": False}
)
table_cam_segmentation = ObsTerm(
func=mdp.image,
params={"sensor_cfg": SceneEntityCfg("table_cam"), "data_type": "semantic_segmentation", "normalize": True},
)
table_cam_normals = ObsTerm(
func=mdp.image,
params={"sensor_cfg": SceneEntityCfg("table_cam"), "data_type": "normals", "normalize": True},
)
table_cam_depth = ObsTerm(
func=mdp.image,
params={
"sensor_cfg": SceneEntityCfg("table_cam"),
"data_type": "distance_to_image_plane",
"normalize": True,
},
)
def __post_init__(self):
self.enable_corruption = False
self.concatenate_terms = False
@configclass
class SubtaskCfg(ObsGroup):
"""Observations for subtask group."""
grasp_1 = ObsTerm(
func=mdp.object_grasped,
params={
"robot_cfg": SceneEntityCfg("robot"),
"ee_frame_cfg": SceneEntityCfg("ee_frame"),
"object_cfg": SceneEntityCfg("cube_2"),
},
)
stack_1 = ObsTerm(
func=mdp.object_stacked,
params={
"robot_cfg": SceneEntityCfg("robot"),
"upper_object_cfg": SceneEntityCfg("cube_2"),
"lower_object_cfg": SceneEntityCfg("cube_1"),
},
)
grasp_2 = ObsTerm(
func=mdp.object_grasped,
params={
"robot_cfg": SceneEntityCfg("robot"),
"ee_frame_cfg": SceneEntityCfg("ee_frame"),
"object_cfg": SceneEntityCfg("cube_3"),
},
)
def __post_init__(self):
self.enable_corruption = False
self.concatenate_terms = False
# observation groups
policy: PolicyCfg = PolicyCfg()
subtask_terms: SubtaskCfg = SubtaskCfg()
@configclass
class FrankaCubeStackVisuomotorCosmosEnvCfg(stack_ik_rel_visuomotor_env_cfg.FrankaCubeStackVisuomotorEnvCfg):
observations: ObservationsCfg = ObservationsCfg()
def __post_init__(self):
# post init of parent
super().__post_init__()
SEMANTIC_MAPPING = {
"class:cube_1": (120, 230, 255, 255),
"class:cube_2": (255, 36, 66, 255),
"class:cube_3": (55, 255, 139, 255),
"class:table": (255, 237, 218, 255),
"class:ground": (100, 100, 100, 255),
"class:robot": (204, 110, 248, 255),
"class:UNLABELLED": (150, 150, 150, 255),
"class:BACKGROUND": (200, 200, 200, 255),
}
# Set cameras
# Set wrist camera
self.scene.wrist_cam = CameraCfg(
prim_path="{ENV_REGEX_NS}/Robot/panda_hand/wrist_cam",
update_period=0.0,
height=200,
width=200,
data_types=["rgb", "distance_to_image_plane"],
spawn=sim_utils.PinholeCameraCfg(
focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 2)
),
offset=CameraCfg.OffsetCfg(
pos=(0.13, 0.0, -0.15), rot=(-0.70614, 0.03701, 0.03701, -0.70614), convention="ros"
),
)
# Set table view camera
self.scene.table_cam = CameraCfg(
prim_path="{ENV_REGEX_NS}/table_cam",
update_period=0.0,
height=200,
width=200,
data_types=["rgb", "semantic_segmentation", "normals", "distance_to_image_plane"],
colorize_semantic_segmentation=True,
semantic_segmentation_mapping=SEMANTIC_MAPPING,
spawn=sim_utils.PinholeCameraCfg(
focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 2)
),
offset=CameraCfg.OffsetCfg(
pos=(1.0, 0.0, 0.4), rot=(0.35355, -0.61237, -0.61237, 0.35355), convention="ros"
),
)
# Set settings for camera rendering
self.rerender_on_reset = True
self.sim.render.antialiasing_mode = "OFF" # disable dlss
# List of image observations in policy observations
self.image_obs_list = ["table_cam", "wrist_cam"]
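Both cameras above use `focal_length=24.0` and `horizontal_aperture=20.955` at 200x200 pixels. The sketch below applies the standard pinhole relations to those numbers to estimate the horizontal field of view and pixel-space intrinsics. The helper name `pinhole_intrinsics` is hypothetical, and square pixels with a centered principal point are assumptions.

```python
import math


def pinhole_intrinsics(focal_length, horizontal_aperture, width, height):
    """Return (hfov_degrees, fx, fy, cx, cy) for an ideal pinhole camera.

    focal_length and horizontal_aperture must share the same unit.
    Assumes square pixels and a principal point at the image center.
    """
    hfov = 2.0 * math.atan(horizontal_aperture / (2.0 * focal_length))
    fx = width * focal_length / horizontal_aperture
    fy = fx  # square-pixel assumption
    cx, cy = width / 2.0, height / 2.0
    return math.degrees(hfov), fx, fy, cx, cy


hfov_deg, fx, fy, cx, cy = pinhole_intrinsics(24.0, 20.955, 200, 200)
print(f"hfov={hfov_deg:.2f} deg, fx={fx:.2f} px, center=({cx}, {cy})")
```

Under these assumptions the horizontal field of view comes out at roughly 47 degrees, with a focal length of about 229 pixels.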
@@ -11,13 +11,17 @@
import isaaclab.sim as sim_utils
from isaaclab.controllers.differential_ik_cfg import DifferentialIKControllerCfg
from isaaclab.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
from isaaclab.managers import EventTermCfg as EventTerm
from isaaclab.managers import ObservationGroupCfg as ObsGroup
from isaaclab.managers import ObservationTermCfg as ObsTerm
from isaaclab.managers import SceneEntityCfg
from isaaclab.sensors import CameraCfg
from isaaclab.utils import configclass
from isaaclab.utils.assets import ISAAC_NUCLEUS_DIR, NVIDIA_NUCLEUS_DIR
from isaaclab_tasks.manager_based.manipulation.stack import mdp
from isaaclab_tasks.manager_based.manipulation.stack.mdp import franka_stack_events
from ... import mdp
from . import stack_joint_pos_env_cfg
##
@@ -26,6 +30,84 @@ from . import stack_joint_pos_env_cfg
from isaaclab_assets.robots.franka import FRANKA_PANDA_HIGH_PD_CFG # isort: skip
@configclass
class EventCfg(stack_joint_pos_env_cfg.EventCfg):
"""Configuration for events."""
randomize_light = EventTerm(
func=franka_stack_events.randomize_scene_lighting_domelight,
mode="reset",
params={
"intensity_range": (1500.0, 10000.0),
"color_variation": 0.4,
"textures": [
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Cloudy/abandoned_parking_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Cloudy/evening_road_01_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Cloudy/lakeside_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Indoor/autoshop_01_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Indoor/carpentry_shop_01_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Indoor/hospital_room_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Indoor/hotel_room_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Indoor/old_bus_depot_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Indoor/small_empty_house_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Indoor/surgery_4k.hdr",
f"{NVIDIA_NUCLEUS_DIR}/Assets/Skies/Studio/photo_studio_01_4k.hdr",
],
"default_intensity": 3000.0,
"default_color": (0.75, 0.75, 0.75),
"default_texture": "",
},
)
randomize_table_visual_material = EventTerm(
func=franka_stack_events.randomize_visual_texture_material,
mode="reset",
params={
"asset_cfg": SceneEntityCfg("table"),
"textures": [
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Ash/Ash_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Bamboo_Planks/Bamboo_Planks_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Birch/Birch_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Cherry/Cherry_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Mahogany_Planks/Mahogany_Planks_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Oak/Oak_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Plywood/Plywood_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Timber/Timber_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Timber_Cladding/Timber_Cladding_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Wood/Walnut_Planks/Walnut_Planks_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Stone/Marble/Marble_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Steel_Stainless/Steel_Stainless_BaseColor.png",
],
"default_texture": (
f"{ISAAC_NUCLEUS_DIR}/Props/Mounts/SeattleLabTable/Materials/Textures/DemoTable_TableBase_BaseColor.png"
),
},
)
randomize_robot_arm_visual_texture = EventTerm(
func=franka_stack_events.randomize_visual_texture_material,
mode="reset",
params={
"asset_cfg": SceneEntityCfg("robot"),
"textures": [
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Aluminum_Cast/Aluminum_Cast_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Aluminum_Polished/Aluminum_Polished_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Brass/Brass_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Bronze/Bronze_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Brushed_Antique_Copper/Brushed_Antique_Copper_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Cast_Metal_Silver_Vein/Cast_Metal_Silver_Vein_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Copper/Copper_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Gold/Gold_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Iron/Iron_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/RustedMetal/RustedMetal_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Silver/Silver_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Steel_Carbon/Steel_Carbon_BaseColor.png",
f"{NVIDIA_NUCLEUS_DIR}/Materials/Base/Metals/Steel_Stainless/Steel_Stainless_BaseColor.png",
],
},
)
@configclass
class ObservationsCfg:
"""Observation specifications for the MDP."""
@@ -96,13 +178,21 @@ class ObservationsCfg:
class FrankaCubeStackVisuomotorEnvCfg(stack_joint_pos_env_cfg.FrankaCubeStackEnvCfg):
observations: ObservationsCfg = ObservationsCfg()
# Evaluation settings
eval_mode = False
eval_type = None
def __post_init__(self):
# post init of parent
super().__post_init__()
# Set events
self.events = EventCfg()
# Set Franka as robot
# We switch to a stiffer PD controller here so that IK tracking is more accurate.
self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
self.scene.robot.spawn.semantic_tags = [("class", "robot")]
# Set actions for the specific robot type (franka)
self.actions.arm_action = DifferentialInverseKinematicsActionCfg(
......
@@ -11,6 +11,8 @@ import random
import torch
from typing import TYPE_CHECKING
from isaacsim.core.utils.extensions import enable_extension
import isaaclab.utils.math as math_utils
from isaaclab.assets import Articulation, AssetBase
from isaaclab.managers import SceneEntityCfg
@@ -57,21 +59,75 @@ def randomize_joint_by_gaussian_offset(
asset.write_joint_state_to_sim(joint_pos, joint_vel, env_ids=env_ids)
def sample_random_color(base=(0.75, 0.75, 0.75), variation=0.1):
"""
Generates a randomized color that stays close to the base color while preserving overall brightness.
The random offsets are re-centered so that they sum to zero, which keeps the sum of the R, G, and B
components unchanged unless a channel is clamped to the [0, 1] range.
Parameters:
base (tuple): The base RGB color with each component between 0 and 1.
variation (float): Maximum deviation to sample for each channel before balancing.
Returns:
tuple: A new RGB color with balanced random variation.
"""
# Generate random offsets for each channel in the range [-variation, variation]
offsets = [random.uniform(-variation, variation) for _ in range(3)]
# Compute the average offset
avg_offset = sum(offsets) / 3
# Adjust offsets so their sum is zero (maintaining brightness)
balanced_offsets = [offset - avg_offset for offset in offsets]
# Apply the balanced offsets to the base color and clamp each channel between 0 and 1
new_color = tuple(max(0, min(1, base_component + offset)) for base_component, offset in zip(base, balanced_offsets))
return new_color
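The zero-sum balancing described in the docstring can be checked directly. The sketch below restates `sample_random_color` in a self-contained form and verifies that the channel sum is preserved when no clamping occurs. The base color and variation used in the check are illustrative choices, not values from this commit.

```python
import random


def sample_random_color(base=(0.75, 0.75, 0.75), variation=0.1):
    # Same logic as the function above, restated so the example is self-contained
    offsets = [random.uniform(-variation, variation) for _ in range(3)]
    avg_offset = sum(offsets) / 3
    balanced = [offset - avg_offset for offset in offsets]
    return tuple(max(0, min(1, b + o)) for b, o in zip(base, balanced))


random.seed(0)
base = (0.5, 0.5, 0.5)
# Each balanced offset is bounded by (4/3) * variation, so with variation=0.1
# every channel stays inside (0, 1) and no clamping is triggered.
color = sample_random_color(base, variation=0.1)
print(color, sum(color))
```

With the settings used in `EventCfg` (`default_color` of 0.75 per channel and `color_variation` of 0.4), a balanced offset can reach about 0.53, so a channel may be clamped at 1.0 and the brightness is then only approximately preserved.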
def randomize_scene_lighting_domelight(
env: ManagerBasedEnv,
env_ids: torch.Tensor,
intensity_range: tuple[float, float],
color_variation: float,
textures: list[str],
default_intensity: float = 3000.0,
default_color: tuple[float, float, float] = (0.75, 0.75, 0.75),
default_texture: str = "",
asset_cfg: SceneEntityCfg = SceneEntityCfg("light"),
):
asset: AssetBase = env.scene[asset_cfg.name]
light_prim = asset.prims[0]
- # Sample new light intensity
- new_intensity = random.uniform(intensity_range[0], intensity_range[1])
- # Set light intensity to light prim
intensity_attr = light_prim.GetAttribute("inputs:intensity")
- intensity_attr.Set(new_intensity)
+ intensity_attr.Set(default_intensity)
color_attr = light_prim.GetAttribute("inputs:color")
color_attr.Set(default_color)
texture_file_attr = light_prim.GetAttribute("inputs:texture:file")
texture_file_attr.Set(default_texture)
if not hasattr(env.cfg, "eval_mode") or not env.cfg.eval_mode:
return
if env.cfg.eval_type in ["light_intensity", "all"]:
# Sample new light intensity
new_intensity = random.uniform(intensity_range[0], intensity_range[1])
# Set light intensity to light prim
intensity_attr.Set(new_intensity)
if env.cfg.eval_type in ["light_color", "all"]:
# Sample new light color
new_color = sample_random_color(base=default_color, variation=color_variation)
# Set light color to light prim
color_attr.Set(new_color)
if env.cfg.eval_type in ["light_texture", "all"]:
# Sample new light texture (background)
new_texture = random.sample(textures, 1)[0]
# Set light texture to light prim
texture_file_attr.Set(new_texture)
def sample_object_poses(
@@ -184,3 +240,75 @@ def randomize_rigid_objects_in_focus(
)
env.rigid_objects_in_focus.append(selected_ids)
def randomize_visual_texture_material(
env: ManagerBasedEnv,
env_ids: torch.Tensor,
asset_cfg: SceneEntityCfg,
textures: list[str],
default_texture: str = "",
texture_rotation: tuple[float, float] = (0.0, 0.0),
):
"""Randomize the visual texture of bodies on an asset using Replicator API.
This function randomizes the visual texture of the bodies of the asset using the Replicator API.
The function samples random textures from the given texture paths and applies them to the bodies
of the asset. The textures are projected onto the bodies and rotated by the given angles.
.. note::
The function assumes that the asset follows the prim naming convention as:
"{asset_prim_path}/{body_name}/visuals" where the body name is the name of the body to
which the texture is applied. This is the default prim ordering when importing assets
from the asset converters in Isaac Lab.
.. note::
When randomizing the texture of individual assets, please make sure to set
:attr:`isaaclab.scene.InteractiveSceneCfg.replicate_physics` to False. This ensures that the physics
parser parses the individual asset properties separately.
"""
if hasattr(env.cfg, "eval_mode") and (
not env.cfg.eval_mode or env.cfg.eval_type not in [f"{asset_cfg.name}_texture", "all"]
):
return
# textures = [default_texture]
# enable replicator extension if not already enabled
enable_extension("omni.replicator.core")
# we import the module here since we may not always need the replicator
import omni.replicator.core as rep
# check to make sure replicate_physics is set to False, else raise error
# note: We add an explicit check here since texture randomization can happen outside of 'prestartup' mode
# and the event manager doesn't check in that case.
if env.cfg.scene.replicate_physics:
raise RuntimeError(
"Unable to randomize visual texture material with scene replication enabled."
" For stable USD-level randomization, please disable scene replication"
" by setting 'replicate_physics' to False in 'InteractiveSceneCfg'."
)
# convert from radians to degrees
texture_rotation = tuple(math.degrees(angle) for angle in texture_rotation)
# obtain the asset entity
asset = env.scene[asset_cfg.name]
# join all bodies in the asset
body_names = asset_cfg.body_names
if isinstance(body_names, str):
body_names_regex = body_names
elif isinstance(body_names, list):
body_names_regex = "|".join(body_names)
else:
body_names_regex = ".*"
if not hasattr(asset, "cfg"):
prims_group = rep.get.prims(path_pattern=f"{asset.prim_paths[0]}/visuals")
else:
prims_group = rep.get.prims(path_pattern=f"{asset.cfg.prim_path}/{body_names_regex}/visuals")
with prims_group:
rep.randomizer.texture(
textures=textures, project_uvw=True, texture_rotate=rep.distribution.uniform(*texture_rotation)
)
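The two event functions above gate their randomization on `env.cfg.eval_mode` and `env.cfg.eval_type`. The sketch below summarizes that gating for a config that defines both attributes. The helper names are hypothetical and only mirror the conditions visible in the diff.

```python
LIGHT_EVAL_TYPES = ("light_intensity", "light_color", "light_texture")


def active_light_randomizations(eval_mode, eval_type):
    """Return which dome-light properties would be re-sampled on reset."""
    if not eval_mode:
        return set()  # defaults are restored and nothing is randomized
    return {name for name in LIGHT_EVAL_TYPES if eval_type in (name, "all")}


def texture_randomization_active(asset_name, eval_mode, eval_type):
    """Return True if the texture of `asset_name` would be randomized."""
    if not eval_mode:
        return False
    return eval_type in (f"{asset_name}_texture", "all")


print(active_light_randomizations(True, "light_color"))
print(texture_randomization_active("table", True, "table_texture"))
print(texture_randomization_active("robot", True, "table_texture"))
```

One asymmetry is not modeled here: if the config has no `eval_mode` attribute at all, the lighting function returns after restoring defaults, while the texture function skips its early return and proceeds to randomize.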
@@ -27,7 +27,7 @@ def cubes_stacked(
cube_1_cfg: SceneEntityCfg = SceneEntityCfg("cube_1"),
cube_2_cfg: SceneEntityCfg = SceneEntityCfg("cube_2"),
cube_3_cfg: SceneEntityCfg = SceneEntityCfg("cube_3"),
- xy_threshold: float = 0.05,
+ xy_threshold: float = 0.04,
height_threshold: float = 0.005,
height_diff: float = 0.0468,
gripper_open_val: torch.tensor = torch.tensor([0.04]),
@@ -53,7 +53,9 @@ def cubes_stacked(
# Check cube positions
stacked = torch.logical_and(xy_dist_c12 < xy_threshold, xy_dist_c23 < xy_threshold)
stacked = torch.logical_and(h_dist_c12 - height_diff < height_threshold, stacked)
stacked = torch.logical_and(pos_diff_c12[:, 2] < 0.0, stacked)
stacked = torch.logical_and(h_dist_c23 - height_diff < height_threshold, stacked)
stacked = torch.logical_and(pos_diff_c23[:, 2] < 0.0, stacked)
# Check gripper positions
stacked = torch.logical_and(
......
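The stricter termination check can be illustrated for a single pair of cubes without tensors. This is a sketch under stated assumptions: the position difference is taken as lower cube minus upper cube (so a negative z component means the upper cube really is on top), the xy distance is the Euclidean norm of the horizontal offset, and `pair_stacked` is a hypothetical scalar helper, not a function from the commit.

```python
import math


def pair_stacked(lower_pos, upper_pos, xy_threshold=0.04,
                 height_threshold=0.005, height_diff=0.0468):
    """Scalar version of the per-pair conditions in cubes_stacked (assumed semantics)."""
    dx, dy, dz = (lo - up for lo, up in zip(lower_pos, upper_pos))
    xy_dist = math.hypot(dx, dy)
    h_dist = abs(dz)
    aligned = xy_dist < xy_threshold
    right_height = (h_dist - height_diff) < height_threshold
    upper_is_above = dz < 0.0  # ordering check added in this commit
    return aligned and right_height and upper_is_above


lower = (0.0, 0.0, 0.02)
upper = (0.01, 0.0, 0.0668)
print(pair_stacked(lower, upper))          # correct order
print(pair_stacked(upper, lower))          # swapped order is rejected
print(pair_stacked(lower, (0.045, 0.0, 0.0668)))  # offset passes 0.05 but not 0.04
```

The ordering check matters because the height test alone uses an absolute distance, so a stack built in the wrong order would otherwise satisfy it.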
@@ -69,6 +69,7 @@ def test_environments(task_name, num_envs, device):
"Isaac-Stack-Cube-Instance-Randomize-Franka-IK-Rel-v0",
"Isaac-Stack-Cube-Instance-Randomize-Franka-v0",
"Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-v0",
"Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-v0",
]:
return
# skip automate environments as they require cuda installation
......
@@ -126,24 +126,28 @@ def pytest_sessionstart(session):
"""Intercept pytest startup to execute tests in the correct order."""
# Get the workspace root directory (one level up from tools)
workspace_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
- source_dir = os.path.join(workspace_root, "source")
+ source_dirs = [
+ os.path.join(workspace_root, "scripts"),
+ os.path.join(workspace_root, "source"),
+ ]
- if not os.path.exists(source_dir):
- print(f"Error: source directory not found at {source_dir}")
- pytest.exit("Source directory not found", returncode=1)
- # Get all test files in the source directory
+ # Get all test files in the source directories
test_files = []
- for root, _, files in os.walk(source_dir):
- for file in files:
- if file.startswith("test_") and file.endswith(".py"):
- # Skip if the file is in TESTS_TO_SKIP
- if file in test_settings.TESTS_TO_SKIP:
- print(f"Skipping {file} as it's in the skip list")
- continue
- full_path = os.path.join(root, file)
- test_files.append(full_path)
+ for source_dir in source_dirs:
+ if not os.path.exists(source_dir):
+ print(f"Error: source directory not found at {source_dir}")
+ pytest.exit("Source directory not found", returncode=1)
+ for root, _, files in os.walk(source_dir):
+ for file in files:
+ if file.startswith("test_") and file.endswith(".py"):
+ # Skip if the file is in TESTS_TO_SKIP
+ if file in test_settings.TESTS_TO_SKIP:
+ print(f"Skipping {file} as it's in the skip list")
+ continue
+ full_path = os.path.join(root, file)
+ test_files.append(full_path)
if not test_files:
print("No test files found in source directory")
......
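The multi-directory discovery introduced in this hunk can be tried in isolation. The following is a minimal standalone sketch, assuming a hypothetical helper `collect_test_files` that raises an exception instead of calling `pytest.exit` and takes the skip list as an argument rather than reading `test_settings.TESTS_TO_SKIP`.

```python
import os


def collect_test_files(source_dirs, tests_to_skip=()):
    """Collect test_*.py files under each directory, honoring a skip list."""
    test_files = []
    for source_dir in source_dirs:
        if not os.path.exists(source_dir):
            raise FileNotFoundError(f"source directory not found at {source_dir}")
        for root, _, files in os.walk(source_dir):
            for file in sorted(files):
                if not (file.startswith("test_") and file.endswith(".py")):
                    continue
                if file in tests_to_skip:
                    continue  # mirrors the TESTS_TO_SKIP check
                test_files.append(os.path.join(root, file))
    return test_files
```

A call such as `collect_test_files(["scripts", "source"], tests_to_skip={"test_slow.py"})` would then return the full paths of matching files from both trees, in the order the directories are listed.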