Commit 21009741 authored by shauryadNv's avatar shauryadNv Committed by Kelly Guo

Fixes docs for Mimic Cosmos pipeline (#493)

# Description

Fixes to Mimic-Cosmos documentation as per QA feedback.

## Type of change

- Bug fix (non-breaking change which fixes an issue)
- This change requires a documentation update

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent 2afe12cf
...@@ -24,7 +24,7 @@ In the following example, we will show you how to use Isaac Lab Mimic to generat ...@@ -24,7 +24,7 @@ In the following example, we will show you how to use Isaac Lab Mimic to generat
--device cpu --enable_cameras --headless --num_envs 10 --generation_num_trials 1000 \ --device cpu --enable_cameras --headless --num_envs 10 --generation_num_trials 1000 \
--input_file ./datasets/annotated_dataset.hdf5 --output_file ./datasets/mimic_dataset_1k.hdf5 \ --input_file ./datasets/annotated_dataset.hdf5 --output_file ./datasets/mimic_dataset_1k.hdf5 \
--task Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-Mimic-v0 \ --task Isaac-Stack-Cube-Franka-IK-Rel-Visuomotor-Cosmos-Mimic-v0 \
--rendering_mode performance --rendering_mode balanced
The number of demonstrations can be increased or decreased, 1000 demonstrations have been shown to provide good training results for this task. The number of demonstrations can be increased or decreased, 1000 demonstrations have been shown to provide good training results for this task.
...@@ -92,14 +92,14 @@ We use the RGB, depth and shaded segmentation videos from the previous step as i ...@@ -92,14 +92,14 @@ We use the RGB, depth and shaded segmentation videos from the previous step as i
:align: center :align: center
:alt: RGB, depth and segmentation control inputs to Cosmos :alt: RGB, depth and segmentation control inputs to Cosmos
We provide an example augmentation output from `Cosmos Transfer1 <https://github.com/nvidia-cosmos/cosmos-transfer1>`_ below: We provide an example augmentation output from `Cosmos Transfer1 <https://github.com/nvidia-cosmos/cosmos-transfer1/tree/1dca0bd1b3060d112bfec23c287572beabff9758>`_ below:
.. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/cosmos_output.gif .. figure:: https://download.isaacsim.omniverse.nvidia.com/isaaclab/images/cosmos_output.gif
:width: 100% :width: 100%
:align: center :align: center
:alt: Cosmos Transfer1 augmentation output :alt: Cosmos Transfer1 augmentation output
We recommend using the `Cosmos Transfer1 <https://github.com/nvidia-cosmos/cosmos-transfer1>`_ model for visual augmentation as we found it to produce the best results in the form of a highly diverse dataset with a wide range of visual variations. You can refer to `this example <https://github.com/nvidia-cosmos/cosmos-transfer1/blob/main/examples/inference_cosmos_transfer1_7b.md#example-2-multimodal-control>`_ for reference on how to use Transfer1 for this usecase. We further recommend the following settings to be used with the Transfer1 model for this task: We recommend using the `Cosmos Transfer1 <https://github.com/nvidia-cosmos/cosmos-transfer1/tree/1dca0bd1b3060d112bfec23c287572beabff9758>`_ model for visual augmentation as we found it to produce the best results in the form of a highly diverse dataset with a wide range of visual variations. You can refer to `this example <https://github.com/nvidia-cosmos/cosmos-transfer1/blob/1dca0bd1b3060d112bfec23c287572beabff9758/examples/inference_cosmos_transfer1_7b.md#example-2-multimodal-control>`_ for reference on how to use Transfer1 for this usecase. We further recommend the following settings to be used with the Transfer1 model for this task:
.. rubric:: Hyperparameters .. rubric:: Hyperparameters
...@@ -109,16 +109,12 @@ We recommend using the `Cosmos Transfer1 <https://github.com/nvidia-cosmos/cosmo ...@@ -109,16 +109,12 @@ We recommend using the `Cosmos Transfer1 <https://github.com/nvidia-cosmos/cosmo
* - ``negative_prompt`` * - ``negative_prompt``
- "The video captures a game playing, with bad crappy graphics and cartoonish frames. It represents a recording of old outdated games. The images are very pixelated and of poor CG quality. There are many subtitles in the footage. Overall, the video is unrealistic and appears cg. Plane background." - "The video captures a game playing, with bad crappy graphics and cartoonish frames. It represents a recording of old outdated games. The images are very pixelated and of poor CG quality. There are many subtitles in the footage. Overall, the video is unrealistic and appears cg. Plane background."
* - ``positive_prompt``
- "realistic, photorealistic, high fidelity, varied lighting, varied background"
* - ``sigma_max`` * - ``sigma_max``
- 50 - 50
* - ``control_weight`` * - ``control_weight``
- "0.3,0.3,0.6,0.7" - "0.3,0.3,0.6,0.7"
* - ``hint_key`` * - ``hint_key``
- "blur,canny,depth,segmentation" - "blur,canny,depth,segmentation"
* - ``control_input_preset_strength``
- "low"
Another crucial aspect to get good augmentations is the set of prompts used to control the Cosmos generation. We provide a script, ``cosmos_prompt_gen.py``, to construct prompts from a set of carefully chosen templates that handle various aspects of the augmentation process. Another crucial aspect to get good augmentations is the set of prompts used to control the Cosmos generation. We provide a script, ``cosmos_prompt_gen.py``, to construct prompts from a set of carefully chosen templates that handle various aspects of the augmentation process.
...@@ -158,6 +154,47 @@ In case you want to create your own prompts, we suggest you refer to the followi ...@@ -158,6 +154,47 @@ In case you want to create your own prompts, we suggest you refer to the followi
4. It is vital to include details on key aspects of the input control video(s) that should be retained or left unchanged. In our prompts, we very clearly mention that the cube colors should be left unchanged such that the bottom cube is blue, the middle is red and the top is green. Note that we not only mention what should be left unchanged but also give details on what form that aspect currently has. 4. It is vital to include details on key aspects of the input control video(s) that should be retained or left unchanged. In our prompts, we very clearly mention that the cube colors should be left unchanged such that the bottom cube is blue, the middle is red and the top is green. Note that we not only mention what should be left unchanged but also give details on what form that aspect currently has.
Example command to use the Cosmos Transfer1 model for this usecase:
.. code:: bash
export CUDA_VISIBLE_DEVICES="${CUDA_VISIBLE_DEVICES:=0}"
export CHECKPOINT_DIR="${CHECKPOINT_DIR:=./checkpoints}"
export NUM_GPU="${NUM_GPU:=1}"
PYTHONPATH=$(pwd) torchrun --nproc_per_node=$NUM_GPU --nnodes=1 --node_rank=0 cosmos_transfer1/diffusion/inference/transfer.py \
--checkpoint_dir $CHECKPOINT_DIR \
--video_save_folder outputs/mimic_cosmos \
--controlnet_specs ./controlnet_specs/demo_0.json \
--offload_text_encoder_model \
--offload_guardrail_models \
--num_gpus $NUM_GPU
Example ``./controlnet_specs/demo_0.json`` json file to use with the above command:
.. code:: json
{
"prompt": "A robotic arm is picking up and stacking cubes inside a foggy industrial scrapyard at dawn, surrounded by piles of old robotic parts and twisted metal. The background includes large magnetic cranes, rusted conveyor belts, and flickering yellow floodlights struggling to penetrate the fog. The robot arm is bright teal with a glossy surface and silver stripes on the outer edges; the joints rotate smoothly and the pistons reflect a pale cyan hue. The robot arm is mounted on a table that is light oak wood with a natural grain pattern and a glossy varnish that reflects overhead lights softly; small burn marks dot one corner. The arm is connected to the base mounted on the table. The bottom cube is deep blue, the second cube is bright red, and the top cube is vivid green, maintaining their correct order after stacking. Sunlight pouring in from a large, open window bathes the table and robotic arm in a warm golden light. The shadows are soft, and the scene feels natural and inviting with a slight contrast between light and shadow.",
"negative_prompt": "The video captures a game playing, with bad crappy graphics and cartoonish frames. It represents a recording of old outdated games. The images are very pixelated and of poor CG quality. There are many subtitles in the footage. Overall, the video is unrealistic and appears cg. Plane background.",
"input_video_path" : "mimic_generated_dataset_mp4/demo_0_table_cam.mp4",
"sigma_max": 50,
"vis": {
"input_control": "mimic_generated_dataset_mp4/demo_0_table_cam.mp4",
"control_weight": 0.3
},
"edge": {
"control_weight": 0.3
},
"depth": {
"input_control": "mimic_dataset_1k_mp4/demo_0_table_cam_depth.mp4",
"control_weight": 0.6
},
"seg": {
"input_control": "mimic_dataset_1k_mp4/demo_0_table_cam_shaded_segmentation.mp4",
"control_weight": 0.7
}
}
MP4 to HDF5 Conversion MP4 to HDF5 Conversion
^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
...@@ -346,7 +383,7 @@ Example usage for the cube stacking task: ...@@ -346,7 +383,7 @@ Example usage for the cube stacking task:
--enable_cameras \ --enable_cameras \
--seeds 0 \ --seeds 0 \
--num_rollouts 15 \ --num_rollouts 15 \
--rendering_mode performance --rendering_mode balanced
We use the above script to compare models trained with 1000 Mimic-generated demonstrations, 2000 Mimic-generated demonstrations and 2000 Cosmos-Mimic-generated demonstrations (1000 original mimic + 1000 Cosmos augmented) respectively. We use the same seeds (0, 1000 and 5000) for all three models and provide the metrics (averaged across best checkpoints for each seed) below: We use the above script to compare models trained with 1000 Mimic-generated demonstrations, 2000 Mimic-generated demonstrations and 2000 Cosmos-Mimic-generated demonstrations (1000 original mimic + 1000 Cosmos augmented) respectively. We use the same seeds (0, 1000 and 5000) for all three models and provide the metrics (averaged across best checkpoints for each seed) below:
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment