Updates Mimic APIs/configs/docs for future dexmimic compatibility (#216)

# Description

Doc and config changes from @karsten-nvidia:

- Add additional details on custom environments to mimic docs.
- Update comments in mimic configuration to make it easier to tell apart what's important.
- Some minor cleanups in existing docs.
- Add "common pitfalls" section to docs to guide users how to get successful data generation/training

Mimic API and config changes to support forward dexmimic compatibility:

- Use dictionaries of subtasks in mimic env config; keys are eef_names
- Mimic Env APIs now use dictionary of eef_names to enable multi-eef support in future
- Data generation code updated accordingly to use new Mimic env APIs

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Screenshots

Please attach before and after screenshots of the change if applicable.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Signed-off-by: peterd-NV <peterd@nvidia.com>
Signed-off-by: Kelly Guo <kellyg@nvidia.com>
Signed-off-by: Kelly Guo <kellyguo123@hotmail.com>
Signed-off-by: Ashwin Varghese Kuruttukulam <123109010+ashwinvkNV@users.noreply.github.com>
Co-authored-by: CY Chen <cyc@nvidia.com>
Co-authored-by: oahmednv <oahmed@Nvidia.com>
Co-authored-by: Toni-SM <aserranomuno@nvidia.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Co-authored-by: Kelly Guo <kellyguo123@hotmail.com>
Co-authored-by: rwiltz <165190220+rwiltz@users.noreply.github.com>
Co-authored-by: nv-cupright <92540563+nv-cupright@users.noreply.github.com>
Co-authored-by: Alexander Poddubny <143108850+nv-apoddubny@users.noreply.github.com>
Co-authored-by: chengronglai <chengrongl@nvidia.com>
Co-authored-by: David Hoeller <dhoeller@nvidia.com>
Co-authored-by: matthewtrepte <mtrepte@nvidia.com>
Co-authored-by: Ashwin Varghese Kuruttukulam <123109010+ashwinvkNV@users.noreply.github.com>
Co-authored-by: Karsten Patzwaldt <kpatzwaldt@nvidia.com>

Commit 8a58a23c authored Jan 14, 2025 by peterd-NV Committed by Kelly Guo Jan 30, 2025
11 changed files
--- a/docs/source/api/lab_mimic/isaaclab_mimic.datagen.rst
+++ b/docs/source/api/lab_mimic/isaaclab_mimic.datagen.rst
@@ -39,13 +39,6 @@ Datagen Info Pool
  :members:
  :inherited-members:
-Selection Strategy
------------------
-.. autoclass:: SelectionStrategy
-  :members:
-  :inherited-members:
 Random Strategy
 ---------------

--- a/docs/source/overview/teleop_imitation.rst
+++ b/docs/source/overview/teleop_imitation.rst
@@ -68,120 +68,213 @@ Imitation Learning
 Using the teleoperation devices, it is also possible to collect data for
 learning from demonstrations (LfD). For this, we provide scripts to collect data into the open HDF5 format.
-.. note::
+Collecting demonstrations
+^^^^^^^^^^^^^^^^^^^^^^^^^
-  This tutorial assumes you have a ``datasets`` directory under the ``IsaacLab`` repo. Create this directory by running ``cd IsaacLab`` and ``mkdir datasets``.
-1. Collect demonstrations with teleoperation for the environment
+To collect demonstrations with teleoperation for the environment ``Isaac-Stack-Cube-Franka-IK-Rel-v0``, use the following commands:
-   ``Isaac-Stack-Cube-Franka-IK-Rel-v0``:
-   .. code:: bash
+.. code:: bash
-      # step a: collect data with a selected teleoperation device. Replace <teleop_device> with your preferred input device.
+   # step a: create folder for datasets
+   mkdir -p datasets
+   # step b: collect data with a selected teleoperation device. Replace <teleop_device> with your preferred input device.
   # Available options: spacemouse, keyboard
   ./isaaclab.sh -p scripts/tools/record_demos.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --teleop_device <teleop_device> --dataset_file ./datasets/dataset.hdf5 --num_demos 10
-      # step b: replay the collected dataset
+   # step c: replay the collected dataset
   ./isaaclab.sh -p scripts/tools/replay_demos.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --dataset_file ./datasets/dataset.hdf5
-   .. note::
+.. note::
   The order of the stacked cubes should be blue (bottom), red (middle), green (top).
-   About 10 successful demonstrations are required in order for the following steps to succeed.
+About 10 successful demonstrations are required in order for the following steps to succeed.
+Here are some tips to perform demonstrations that lead to successful policy training:
+* Keep demonstrations short. Shorter demonstrations mean fewer decisions for the policy, making training easier.
+* Take a direct path. Do not follow along arbitrary axis, but move straight toward the goal.
+* Do not pause. Perform smooth, continuous motions instead. It is not obvious for a policy why and when to pause, hence continuous motions are easier to learn.
+If, while performing a demonstration, a mistake is made, or the current demonstration should not be recorded for some other reason, press the ``R`` key to discard the current demonstration, and reset to a new starting position.
-   Here are some tips to perform demonstrations that lead to successful policy training:
+.. note::
+   Non-determinism may be observed during replay as physics in IsaacLab are not deterministically reproducible when using ``env.reset``.
-   * Keep demonstrations short. Shorter demonstrations mean fewer decisions for the policy, making training easier.
-   * Take a direct path. Do not follow along arbitrary axis, but move straight toward the goal.
-   * Do not pause. Perform smooth, continuous motions instead. It is not obvious for a policy why and when to pause, hence continuous motions are easier to learn.
-   If, while performing a demonstration, a mistake is made, or the current demonstration should not be recorded for some other reason, press the ``R`` key to discard the current demonstration, and reset to a new starting position.
+Generating additional demonstrations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-2. Generate additional demonstrations using Isaac Lab Mimic
+Additional demonstrations can be generated using Isaac Lab Mimic.
-   Isaac Lab Mimic is a feature in Isaac Lab that allows to generate additional demonstrations automatically, allowing a policy to learn successfully even from just a handful of manual demonstrations.
+Isaac Lab Mimic is a feature in Isaac Lab that allows generation of additional demonstrations automatically, allowing a policy to learn successfully even from just a handful of manual demonstrations.
-   In order to use Isaac Lab Mimic with the recorded dataset, first annotate the subtasks in the recording:
+In order to use Isaac Lab Mimic with the recorded dataset, first annotate the subtasks in the recording:
-   .. code:: bash
+.. code:: bash
   ./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/annotate_demos.py --input_file ./datasets/dataset.hdf5 --output_file ./datasets/annotated_dataset.hdf5 --task Isaac-Stack-Cube-Franka-IK-Rel-Mimic-v0 --auto
-   Then, use Isaac Lab Mimic to generate some additional demonstrations:
+Then, use Isaac Lab Mimic to generate some additional demonstrations:
-   .. code:: bash
+.. code:: bash
   ./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py --input_file ./datasets/annotated_dataset.hdf5 --output_file ./datasets/generated_dataset_small.hdf5 --num_envs 10 --generation_num_trials 10
-   .. note::
+.. note::
   The output_file of the ``annotate_demos.py`` script is the input_file to the ``generate_dataset.py`` script
-   .. note::
+.. note::
   Isaac Lab is designed to work with manipulators with grippers. The gripper commands in the demonstrations are extracted separately and temporally replayed during the generation of additional demonstrations.
-   Inspect the output of generated data (filename: ``generated_dataset_small.hdf5``), and if satisfactory, generate the full dataset:
+Inspect the output of generated data (filename: ``generated_dataset_small.hdf5``), and if satisfactory, generate the full dataset:
-   .. code:: bash
+.. code:: bash
   ./isaaclab.sh -p scripts/imitation_learning/isaaclab_mimic/generate_dataset.py --input_file ./datasets/annotated_dataset.hdf5 --output_file ./datasets/generated_dataset.hdf5 --num_envs 10 --generation_num_trials 1000 --headless
-   The number of demonstrations can be increased or decreased, 1000 demonstrations have been shown to provide good training results for this task.
+The number of demonstrations can be increased or decreased, 1000 demonstrations have been shown to provide good training results for this task.
+Additionally, the number of environments in the ``--num_envs`` parameter can be adjusted to speed up data generation. The suggested number of 10 can be executed even on a laptop GPU. On a more powerful desktop machine, set it to 100 or higher for significant speedup of this step.
-   Additionally, the number of environments in the ``--num_envs`` parameter can be adjusted to speed up data generation. The suggested number of 10 can be executed even on a laptop GPU. On a more powerful desktop machine, set it to 100 or higher for significant speedup of this step.
+Robomimic setup
+^^^^^^^^^^^^^^^
-3. Setup robomimic for training a policy
+As an example, we will train a BC agent implemented in `Robomimic <https://robomimic.github.io/>`__ to train a policy. Any other framework or training method could be used.
-   As an example, we will train a BC agent implemented in `Robomimic <https://robomimic.github.io/>`__ to train a policy. Any other framework or training method could be used.
+To install the robomimic framework, use the following commands:
-   .. code:: bash
+.. code:: bash
   # install the dependencies
   sudo apt install cmake build-essential
   # install python module (for robomimic)
   ./isaaclab.sh -i robomimic
-4. Train a BC agent for ``Isaac-Stack-Cube-Franka-IK-Rel-v0`` using the Mimic generated data:
+Training an agent
+^^^^^^^^^^^^^^^^^
+We can now train a BC agent for ``Isaac-Stack-Cube-Franka-IK-Rel-v0`` using the Mimic generated data:
-   .. code:: bash
+.. code:: bash
   ./isaaclab.sh -p scripts/imitation_learning/robomimic/train.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --algo bc --dataset ./datasets/generated_dataset.hdf5
-   By default, the training script will save a model checkpoint every 100 epochs. The trained models and logs will be saved to logs/robomimic/Isaac-Stack-Cube-Franka-IK-Rel-v0/bc
+By default, the training script will save a model checkpoint every 100 epochs. The trained models and logs will be saved to logs/robomimic/Isaac-Stack-Cube-Franka-IK-Rel-v0/bc
+Visualizing results
+^^^^^^^^^^^^^^^^^^^
+By running inference with the trained model, we can visualize the results of the policy in the same environment:
+.. code:: bash
+   ./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --num_rollouts 50 --checkpoint /PATH/TO/desired_model_checkpoint.pth
+Common Pitfalls when Generating Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+**Demonstrations are too long:**
+* Longer time horizon is harder to learn for a policy
+* Start close to the first object and minimize motions
+**Demonstrations are not smooth:**
+* Irregular motion is hard for policy to decipher
+* Better teleop devices result in better data (e.g. SpaceMouse is better than Keyboard)
-5. Play the learned model to visualize results:
+**Pauses in demonstrations:**
-   .. code:: bash
+* Pauses are difficult to learn
+* Keep the human motions smooth and fluid
+**Excessive number of subtasks:**
+* Minimize the number of defined subtasks for completing a given task
+* Fewer subtasks result in less stitching of trajectories, yielding a higher data generation success rate
+**Lack of action noise:**
+* Action noise makes policies more robust
+**Recording cropped too tight:**
+* If recording stops on the frame the success term triggers, it may not re-trigger during replay
+* Allow for some buffer at the end of recording
+**Non-deterministic replay:**
+* Physics in IsaacLab are not deterministically reproducible when using ``env.reset`` so demonstrations may fail on replay
+* Collect more human demos than needed, use the ones that succeed during annotation
+* All data in Isaac Lab Mimic generated HDF5 file represent a successful demo and can be used for training (even if non-determinism causes failure when replayed)
-      ./isaaclab.sh -p scripts/imitation_learning/robomimic/play.py --task Isaac-Stack-Cube-Franka-IK-Rel-v0 --checkpoint /PATH/TO/desired_model_checkpoint.pth
 Creating Your Own Isaac Lab Mimic Compatible Environments
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-In order to use Isaac Lab Mimic to generate additional demonstrations automatically with an existing Isaac Lab environment, the environment
+How it works
-needs to be made "Mimic compatible" by implementing additional functions which are used during data generation.
+^^^^^^^^^^^^
+Isaac Lab Mimic works by splitting the input demonstrations into subtasks. Subtasks are user-defined segments in the demonstrations that are common to all demonstrations. Examples of subtasks are "grasp an object", "move end effector to some pre-defined position", "release object", etc. Note that most subtasks are defined with respect to some object that the robot interacts with.
+Subtasks need to be defined, and then annotated for each input demonstration. Annotation can either happen algorithmically by defining heuristics for subtask detection, as was done in the example above, or it can be done manually.
+With subtasks defined and annotated, Isaac Lab Mimic utilizes a small number of helper methods to then transform the subtask segments, and generate new demonstrations by stitching them together to match the new task at hand.
+For each candidate demonstration generated this way, Isaac Lab Mimic uses a boolean success criterion to determine whether the demonstration succeeded in performing the task, and if so, adds it to the output dataset. The success rate of candidate demonstrations can be as high as 70% in simple cases and as low as <1%, depending on the difficulty of the task and the complexity of the robot itself.
+Configuration and subtask definition
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Subtasks, among other configuration settings for Isaac Lab Mimic, are defined in a Mimic compatible environment configuration class that is created by extending the existing environment config with additional Mimic required parameters.
-Mimic compatible environments are derived from the :class:`~isaaclab.envs.ManagerBasedRLMimicEnv` base class and must implement the following functions:
+All Mimic required config parameters are specified in the :class:`~isaaclab.envs.MimicEnvCfg` class.
+The config class :class:`~isaaclab_mimic.envs.FrankaCubeStackIKRelMimicEnvCfg` serves as an example of creating a Mimic compatible environment config class for the Franka stacking task that was used in the examples above.
+The ``DataGenConfig`` member contains various parameters that influence how data is generated. It is initially sufficient to just set the ``name`` parameter, and revise the rest later.
+Subtasks are a list of ``SubTaskConfig`` objects, of which the most important members are:
+* ``object_ref`` is the object that is being interacted with. This will be used to adjust motions relative to this object during data generation. Can be ``None`` if the current subtask does not involve any object.
+* ``subtask_term_signal`` is the ID of the signal indicating whether the subtask is active or not.
+Subtask annotation
+^^^^^^^^^^^^^^^^^^
+Once the subtasks are defined, they need to be annotated in the source data. There are two methods to annotate source demonstrations for subtask boundaries: Manual annotation or using heuristics.
+It is often easiest to perform manual annotations, since the number of input demonstrations is usually very small. To perform manual annotations, use the ``annotate_demos.py`` script without the ``--auto`` flag. Then press ``B`` to pause, ``N`` to continue, and ``S`` to annotate a subtask boundary.
+For more accurate boundaries, or to speed up repeated processing of a given task for experiments, heuristics can be implemented to perform the same task. Heuristics are observations in the environment. An example of how to add subtask terms can be found in ``source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/stack/stack_env_cfg.py``, where they are added as an observation group called ``SubtaskCfg``. This example uses prebuilt heuristics, but custom heuristics are easily implemented.
+Helpers for demonstration generation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Helpers needed for Isaac Lab Mimic are defined in the environment. All tasks that are to be used with Isaac Lab Mimic are derived from the :class:`~isaaclab.envs.ManagerBasedRLMimicEnv` base class, and must implement the following functions:
 * ``get_robot_eef_pose``: Returns the current robot end effector pose in the same frame as used by the robot end effector controller.
-* ``target_eef_pose_to_action``: Takes a target pose for the end effector controller and returns an action which achieves the target pose.
+* ``target_eef_pose_to_action``: Takes a target pose and a gripper action for the end effector controller and returns an action which achieves the target pose.
-* ``action_to_target_eef_pos``: Takes an action and returns a target pose for the end effector controller.
+* ``action_to_target_eef_pose``: Takes an action and returns a target pose for the end effector controller.
-* ``action_to_gripper_action``: Takes an action and returns the gripper actuation part of the action.
+* ``actions_to_gripper_actions``: Takes a sequence of actions and returns the gripper actuation part of the actions.
 * ``get_object_poses``: Returns the pose of each object in the scene that is used for data generation.
-* ``get_subtask_term_signals``: Returns a dictionary of binary flags for each subtask in a task. The flag of 1 is set when the subtask has been completed and 0 otherwise.
+* ``get_subtask_term_signals``: Returns a dictionary of binary flags for each subtask in a task. A flag is true once the subtask has been completed and false otherwise.
 The class :class:`~isaaclab_mimic.envs.FrankaCubeStackIKRelMimicEnv` shows an example of creating a Mimic compatible environment from an existing Isaac Lab environment.
-A Mimic compatible environment config class must also be created by extending the existing environment config with additional Mimic required parameters.
+Registering the environment
-All Mimic required config parameters are specified in the :class:`~isaaclab.envs.MimicEnvCfg` class.
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The config class :class:`~isaaclab_mimic.envs.FrankaCubeStackIKRelMimicEnvCfg` shows an example of creating a Mimic compatible environment config class for the Franka stacking task.
+Once both Mimic compatible environment and environment config classes have been created, a new Mimic compatible environment can be registered using ``gym.register``. For the Franka stacking task in the examples above, the Mimic environment is registered as ``Isaac-Stack-Cube-Franka-IK-Rel-Mimic-v0``.
-Once both Mimic compatible environment and environment config classes have been created, a new Mimic compatible environment can be registered using ``gym.register`` and used
+The registered environment is now ready to be used with Isaac Lab Mimic.
-with Isaac Lab Mimic data generation. For the Franka stacking task in the examples above, the Mimic environment is registered as ``Isaac-Stack-Cube-Franka-IK-Rel-Mimic-v0``.
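
The documentation changes above describe subtasks as lists of ``SubTaskConfig`` objects, and the commit message states that these lists are now stored in a dictionary keyed by end-effector name. A minimal, framework-free sketch of that layout might look like the following. The `SubTaskConfig` dataclass here is a simplified stand-in that carries only the two members named in the docs (`object_ref` and `subtask_term_signal`). The end-effector name `"franka"`, the object names, the signal names, and the use of `None` for the last subtask's signal are illustrative assumptions, not values taken from the real `FrankaCubeStackIKRelMimicEnvCfg`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SubTaskConfig:
    """Simplified stand-in for the Mimic subtask config (illustration only)."""

    # Object the subtask is defined relative to; None if no object is involved.
    object_ref: str | None
    # ID of the signal tied to this subtask; None for the final subtask (assumption).
    subtask_term_signal: str | None


# Hypothetical config: one entry per end effector, each holding an ordered subtask list.
subtask_configs: dict[str, list[SubTaskConfig]] = {
    "franka": [
        SubTaskConfig(object_ref="cube_2", subtask_term_signal="grasp_1"),
        SubTaskConfig(object_ref="cube_1", subtask_term_signal="stack_1"),
        SubTaskConfig(object_ref="cube_3", subtask_term_signal="grasp_2"),
        SubTaskConfig(object_ref="cube_2", subtask_term_signal=None),
    ],
}


def term_signals_per_eef(configs: dict[str, list[SubTaskConfig]]) -> dict[str, list[str]]:
    """Collect the signal IDs that need annotating, grouped by end effector."""
    return {
        eef_name: [cfg.subtask_term_signal for cfg in subtasks if cfg.subtask_term_signal is not None]
        for eef_name, subtasks in configs.items()
    }


signals = term_signals_per_eef(subtask_configs)
```

Keying by end-effector name means a second arm could be added as another dictionary entry without changing the code that iterates over `configs.items()`.
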
--- a/scripts/imitation_learning/isaaclab_mimic/annotate_demos.py
+++ b/scripts/imitation_learning/isaaclab_mimic/annotate_demos.py
@@ -55,6 +55,7 @@ import isaaclab_mimic.envs  # noqa: F401
 # Only enables inputs if this script is NOT headless mode
 if not args_cli.headless and not os.environ.get("HEADLESS", 0):
    from isaaclab.devices import Se3Keyboard
+from isaaclab.envs import ManagerBasedRLMimicEnv
 from isaaclab.envs.mdp.recorders.recorders_cfg import ActionStateRecorderManagerCfg
 from isaaclab.managers import RecorderTerm, RecorderTermCfg
 from isaaclab.utils import configclass
@@ -88,11 +89,16 @@ class PreStepDatagenInfoRecorder(RecorderTerm):
    """Recorder term that records the datagen info data in each step."""
    def record_pre_step(self):
+        eef_pose_dict = {}
+        for eef_name in self._env.cfg.subtask_configs.keys():
+            eef_pose_dict[eef_name] = self._env.get_robot_eef_pose(eef_name)
        datagen_info = {
-            "object_pose": self._env.scene.get_state(is_relative=True)["rigid_object"],
+            "object_pose": self._env.get_object_poses(),
-            "target_eef_pose": self._env.action_to_target_eef_pos(self._env.action_manager.action),
+            "eef_pose": eef_pose_dict,
+            "target_eef_pose": self._env.action_to_target_eef_pose(self._env.action_manager.action),
        }
-        return "obs", datagen_info
+        return "obs/datagen_info", datagen_info
 @configclass
@@ -106,7 +112,7 @@ class PreStepSubtaskTermsObservationsRecorder(RecorderTerm):
    """Recorder term that records the subtask completion observations in each step."""
    def record_pre_step(self):
-        return "obs/subtask_term_signals", self._env.obs_buf["subtask_terms"]
+        return "obs/datagen_info/subtask_term_signals", self._env.get_subtask_term_signals()
 @configclass
@@ -164,12 +170,26 @@ def main():
    # Set up recorder terms for mimic annotations
    env_cfg.env_name = args_cli.task
    env_cfg.recorders: MimicRecorderManagerCfg = MimicRecorderManagerCfg()
+    if not args_cli.auto:
+        # disable subtask term signals recorder term if in manual mode
+        env_cfg.recorders.record_pre_step_subtask_term_signals = None
    env_cfg.recorders.dataset_export_dir_path = output_dir
    env_cfg.recorders.dataset_filename = output_file_name
    # create environment from loaded config
    env = gym.make(args_cli.task, cfg=env_cfg)
+    if not isinstance(env.unwrapped, ManagerBasedRLMimicEnv):
+        raise ValueError("The environment should be derived from ManagerBasedRLMimicEnv")
+    if args_cli.auto:
+        # check if the mimic API env.unwrapped.get_subtask_term_signals() is implemented
+        if env.unwrapped.get_subtask_term_signals.__func__ is ManagerBasedRLMimicEnv.get_subtask_term_signals:
+            raise NotImplementedError(
+                "The environment does not implement the get_subtask_term_signals method required "
+                "to run automatic annotations."
+            )
    # reset environment
    env.reset()
@@ -219,13 +239,12 @@ def main():
                            f"                          to number of subtasks {len(args_cli.signals)}"
                        )
                    annotated_episode = env.unwrapped.recorder_manager.get_episode(0)
-                    del annotated_episode.data["obs"]["subtask_term_signals"]
                    for subtask_index in range(len(args_cli.signals)):
                        # subtask termination signal is false until subtask is complete, and true afterwards
                        subtask_signals = torch.ones(len(actions), dtype=torch.bool)
                        subtask_signals[: subtask_indices[subtask_index]] = False
                        annotated_episode.add(
-                            f"obs/subtask_term_signals/{args_cli.signals[subtask_index]}", subtask_signals
+                            f"obs/datagen_info/subtask_term_signals/{args_cli.signals[subtask_index]}", subtask_signals
                        )
                # set success to the recorded episode data and export to file
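
The manual annotation path in the hunk above turns each marked boundary into a per-step boolean signal that is false before the boundary and true from it onward, then stores it under `obs/datagen_info/subtask_term_signals/<name>`. The sketch below reproduces that logic with plain Python lists so it runs without `torch`. The function name `build_subtask_term_signals` and the example signal names and indices are hypothetical.

```python
from __future__ import annotations


def build_subtask_term_signals(
    num_steps: int, signal_names: list[str], boundary_indices: list[int]
) -> dict[str, list[bool]]:
    """Return one boolean sequence per signal: False before its boundary step, True from it on."""
    if len(signal_names) != len(boundary_indices):
        raise ValueError("Each subtask signal needs exactly one boundary index")
    signals = {}
    for name, boundary in zip(signal_names, boundary_indices):
        # List-based equivalent of torch.ones(num_steps, dtype=torch.bool)
        # followed by subtask_signals[:boundary] = False.
        signals[f"obs/datagen_info/subtask_term_signals/{name}"] = [
            step >= boundary for step in range(num_steps)
        ]
    return signals


# Hypothetical 6-step episode with boundaries marked at steps 2 and 4.
example = build_subtask_term_signals(6, ["grasp_1", "stack_1"], [2, 4])
```
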

--- a/scripts/imitation_learning/isaaclab_mimic/consolidated_demo.py
+++ b/scripts/imitation_learning/isaaclab_mimic/consolidated_demo.py
@@ -85,6 +85,7 @@ from isaaclab_mimic.datagen.data_generator import DataGenerator
 from isaaclab_mimic.datagen.datagen_info_pool import DataGenInfoPool
 from isaaclab.devices import Se3Keyboard, Se3SpaceMouse
+from isaaclab.envs import ManagerBasedRLMimicEnv
 from isaaclab.envs.mdp.recorders.recorders_cfg import ActionStateRecorderManagerCfg
 from isaaclab.managers import DatasetExportMode, RecorderTerm, RecorderTermCfg
 from isaaclab.utils import configclass
@@ -104,11 +105,16 @@ class PreStepDatagenInfoRecorder(RecorderTerm):
    """Recorder term that records the datagen info data in each step."""
    def record_pre_step(self):
+        eef_pose_dict = {}
+        for eef_name in self._env.cfg.subtask_configs.keys():
+            eef_pose_dict[eef_name] = self._env.get_robot_eef_pose(eef_name)
        datagen_info = {
-            "object_pose": self._env.scene.get_state(is_relative=True)["rigid_object"],
+            "object_pose": self._env.get_object_poses(),
-            "target_eef_pose": self._env.action_to_target_eef_pos(self._env.action_manager.action),
+            "eef_pose": eef_pose_dict,
+            "target_eef_pose": self._env.action_to_target_eef_pose(self._env.action_manager.action),
        }
-        return "obs", datagen_info
+        return "obs/datagen_info", datagen_info
 @configclass
@@ -122,7 +128,7 @@ class PreStepSubtaskTermsObservationsRecorder(RecorderTerm):
    """Recorder term that records the subtask completion observations in each step."""
    def record_pre_step(self):
-        return "obs/subtask_term_signals", self._env.obs_buf["subtask_terms"]
+        return "obs/datagen_info/subtask_term_signals", self._env.get_subtask_term_signals()
 @configclass
@@ -397,6 +403,15 @@ def main():
    # create environment
    env = gym.make(env_name, cfg=env_cfg)
+    if not isinstance(env.unwrapped, ManagerBasedRLMimicEnv):
+        raise ValueError("The environment should be derived from ManagerBasedRLMimicEnv")
+    # check if the mimic API env.unwrapped.get_subtask_term_signals() is implemented
+    if env.unwrapped.get_subtask_term_signals.__func__ is ManagerBasedRLMimicEnv.get_subtask_term_signals:
+        raise NotImplementedError(
+            "The environment does not implement the get_subtask_term_signals method required to run this script."
+        )
    # set seed for generation
    random.seed(env.unwrapped.cfg.datagen_config.seed)
    np.random.seed(env.unwrapped.cfg.datagen_config.seed)
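
The guard added in this hunk (and in `annotate_demos.py`) relies on a standard Python idiom: a bound method's `__func__` attribute is the underlying function object, so comparing it with the attribute looked up on the base class shows whether a subclass supplied its own implementation. A self-contained sketch follows. `BaseEnv`, `PartialEnv`, and `FullEnv` are hypothetical stand-ins for `ManagerBasedRLMimicEnv` and its subclasses, and the returned signal dictionary is made up.

```python
class BaseEnv:
    """Stand-in for the base class whose method is only a placeholder."""

    def get_subtask_term_signals(self):
        raise NotImplementedError


class PartialEnv(BaseEnv):
    """Inherits the base placeholder without overriding it."""


class FullEnv(BaseEnv):
    """Provides its own implementation of the method."""

    def get_subtask_term_signals(self):
        return {"grasp_1": True}


def overrides_term_signals(env: BaseEnv) -> bool:
    """True if the method bound to env is not the BaseEnv placeholder."""
    return env.get_subtask_term_signals.__func__ is not BaseEnv.get_subtask_term_signals


partial_ok = overrides_term_signals(PartialEnv())
full_ok = overrides_term_signals(FullEnv())
```

Checking identity up front lets the script fail with a clear message before any data is recorded, instead of hitting `NotImplementedError` in the middle of an episode.
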

--- a/source/isaaclab/isaaclab/envs/manager_based_rl_mimic_env.py
+++ b/source/isaaclab/isaaclab/envs/manager_based_rl_mimic_env.py
@@ -3,6 +3,9 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause
+import torch
+from collections.abc import Sequence
 import isaaclab.utils.math as PoseUtils
 from isaaclab.envs import ManagerBasedRLEnv
@@ -30,79 +33,98 @@ class ManagerBasedRLMimicEnv(ManagerBasedRLEnv):
        - Dataset Versatility: The synthetic data retains a quality that compares favorably with additional human demos.
    """
-    def get_robot_eef_pose(self, env_ind=0):
+    def get_robot_eef_pose(self, eef_name: str, env_ids: Sequence[int] | None = None) -> torch.Tensor:
        """
        Get current robot end effector pose. Should be the same frame as used by the robot end-effector controller.
+        Args:
+            eef_name: Name of the end effector.
+            env_ids: Environment indices to get the pose for. If None, all envs are considered.
        Returns:
-            pose (torch.Tensor): 4x4 eef pose matrix
+            A torch.Tensor eef pose matrix. Shape is (len(env_ids), 4, 4)
        """
        raise NotImplementedError
-    def target_eef_pose_to_action(self, target_eef_pose, relative=True, env_ind=0):
+    def target_eef_pose_to_action(
+        self, target_eef_pose_dict: dict, gripper_action_dict: dict, noise: float | None = None, env_id: int = 0
+    ) -> torch.Tensor:
        """
-        Takes a target pose for the end effector controller and returns an action
+        Takes a target pose and gripper action for the end effector controller and returns an action
-        to try and achieve that target pose.
+        (usually a normalized delta pose action) to try and achieve that target pose.
+        Noise is added to the target pose action if specified.
        Args:
-            target_eef_pose (torch.Tensor): 4x4 target eef pose
+            target_eef_pose_dict: Dictionary of 4x4 target eef pose for each end-effector.
-            relative (bool): if True, use relative pose actions, else absolute pose actions
+            gripper_action_dict: Dictionary of gripper actions for each end-effector.
+            noise: Noise to add to the action. If None, no noise is added.
+            env_id: Environment index to compute the action for.
        Returns:
-            action (torch.Tensor): action compatible with env.step (minus gripper actuation)
+            An action torch.Tensor that's compatible with env.step().
        """
        raise NotImplementedError
-    def action_to_target_eef_pos(self, action, relative=True, env_ind=0):
+    def action_to_target_eef_pose(self, action: torch.Tensor) -> dict[str, torch.Tensor]:
        """
        Converts action (compatible with env.step) to a target pose for the end effector controller.
        Inverse of @target_eef_pose_to_action. Usually used to infer a sequence of target controller poses
        from a demonstration trajectory using the recorded actions.
        Args:
-            action (torch.Tensor): environment action
+            action: Environment action. Shape is (num_envs, action_dim).
-            relative (bool): if True, use relative pose actions, else absolute pose actions
        Returns:
-            target_eef_pose (torch.Tensor): 4x4 target eef pose that @action corresponds to
+            A dictionary of eef pose torch.Tensor that @action corresponds to.
        """
        raise NotImplementedError
-    def action_to_gripper_action(self, action):
+    def actions_to_gripper_actions(self, actions: torch.Tensor) -> dict[str, torch.Tensor]:
        """
-        Extracts the gripper actuation part of an action (compatible with env.step).
+        Extracts the gripper actuation part from a sequence of env actions (compatible with env.step).
        Args:
-            action (torch.Tensor): environment action
+            actions: environment actions. The shape is (num_envs, num steps in a demo, action_dim).
        Returns:
-            gripper_action (torch.Tensor): subset of environment action for gripper actuation
+            A dictionary of torch.Tensor gripper actions, keyed by eef_name.
        """
        raise NotImplementedError
-    def get_object_poses(self, env_ind=0):
+    def get_object_poses(self, env_ids: Sequence[int] | None = None):
        """
        Gets the pose of each object relevant to Isaac Lab Mimic data generation in the current scene.
+        Args:
+            env_ids: Environment indices to get the pose for. If None, all envs are considered.
        Returns:
-            object_poses (dict): dictionary that maps object name (str) to object pose matrix (4x4 torch.Tensor)
+            A dictionary that maps object names to object pose matrices. Each is a torch.Tensor of shape (len(env_ids), 4, 4).
        """
+        if env_ids is None:
+            env_ids = slice(None)
        rigid_object_states = self.scene.get_state(is_relative=True)["rigid_object"]
        object_pose_matrix = dict()
        for obj_name, obj_state in rigid_object_states.items():
            object_pose_matrix[obj_name] = PoseUtils.make_pose(
-                obj_state["root_pose"][env_ind, :3], PoseUtils.matrix_from_quat(obj_state["root_pose"][env_ind, 3:7])
+                obj_state["root_pose"][env_ids, :3], PoseUtils.matrix_from_quat(obj_state["root_pose"][env_ids, 3:7])
            )
        return object_pose_matrix
-    def get_subtask_term_signals(self, env_ind=0):
+    def get_subtask_term_signals(self, env_ids: Sequence[int] | None = None) -> dict[str, torch.Tensor]:
        """
-        Gets a dictionary of binary flags for each subtask in a task. The flag is 1
+        Gets a dictionary of termination signal flags for each subtask in a task. The flag is 1
-        when the subtask has been completed and 0 otherwise. Isaac Lab Mimic only uses this
+        when the subtask has been completed and 0 otherwise. The implementation of this method is
-        when parsing source demonstrations at the start of data generation, and it only
+        required if intending to enable automatic subtask term signal annotation when running the
-        uses the first 0 -> 1 transition in this signal to detect the end of a subtask.
+        dataset annotation tool. This method can be kept unimplemented if intending to use manual
+        subtask term signal annotation.
+        Args:
+            env_ids: Environment indices to get the termination signals for. If None, all envs are considered.
        Returns:
-            subtask_term_signals (dict): dictionary that maps subtask name to termination flag (0 or 1)
+            A dictionary of termination signal flags (False or True) for each subtask.
        """
        raise NotImplementedError

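Reviewer note: with the new `env_ids` convention, the per-environment getters always return a batch. A caller that wants one environment passes a one-element list and indexes `[0]`, which is what `data_generator.py` does in this PR. Below is a minimal pure-Python sketch of that convention, using plain lists in place of tensors. `select_envs` and the sample data are hypothetical illustrations and are not part of Isaac Lab.

```python
from __future__ import annotations

from collections.abc import Sequence


def select_envs(per_env_values: list, env_ids: Sequence[int] | None = None) -> list:
    """Return the entries for env_ids, or all entries when env_ids is None."""
    if env_ids is None:
        # Mirrors `env_ids = slice(None)` in the diff: select every environment.
        return per_env_values[slice(None)]
    return [per_env_values[i] for i in env_ids]


# Hypothetical stand-in for a (num_envs, ...) pose tensor with three environments.
poses = ["pose_env0", "pose_env1", "pose_env2"]

all_poses = select_envs(poses)          # batch over every environment
single_batch = select_envs(poses, [1])  # batch of length one
single_pose = single_batch[0]           # callers index out the single entry with [0]
```

Keeping the batch dimension even for a single environment means the same getter serves both the per-env generation loop and the fully batched case.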
--- a/source/isaaclab/isaaclab/envs/mimic_env_cfg.py
+++ b/source/isaaclab/isaaclab/envs/mimic_env_cfg.py
@@ -17,43 +17,135 @@ from isaaclab.utils import configclass
 class DataGenConfig:
    """Configuration settings for data generation processes within the Isaac Lab Mimic environment."""
-    name: str = "demo"  # The name of the datageneration, default is "demo"
+    # The name of the data generation, default is "demo"
-    source_dataset_path: str = None  # Path to the source dataset for mimic generation
+    name: str = "demo"
-    generation_path: str = None  # Path where the generated data will be saved
-    generation_guarantee: bool = False  # Whether to guarantee generation of data (e.g., retry until successful)
+    # If set to True, generation will be retried until
-    generation_keep_failed: bool = True  # Whether to keep failed generation trials
+    # generation_num_trials successful demos have been generated.
-    generation_num_trials: int = 10  # Number of trial to be generated
+    # If set to False, generation will stop after generation_num_trials,
-    generation_select_src_per_subtask: bool = False  # Whether to select source data per subtask
+    # independent of whether they were all successful or not.
-    generation_transform_first_robot_pose: bool = False  # Whether to transform the first robot pose during generation
+    generation_guarantee: bool = True
-    generation_interpolate_from_last_target_pose: bool = True  # Whether to interpolate from last target pose
-    task_name: str = None  # Name of the task being configured
+    ##############################################################
-    max_num_failures: int = 50  # Maximum number of failures allowed before stopping generation
+    # Debugging parameters, which can help diagnose low success
-    num_demo_to_render: int = 50  # Number of demonstrations to render
+    # rates.
-    num_fail_demo_to_render: int = 50  # Number of failed demonstrations to render
-    seed: int = 1  # Seed for randomization to ensure reproducibility
+    # Whether to keep failed generation trials. Keeping failed
+    # demonstrations is useful for visualizing and debugging low
+    # success rates.
+    generation_keep_failed: bool = False
+    # Maximum number of failures allowed before stopping generation
+    max_num_failures: int = 50
+    # Seed for randomization to ensure reproducibility
+    seed: int = 1
+    ##############################################################
+    # The following values can be changed on the command line, and
+    # only serve as defaults.
+    # Path to the source dataset for mimic generation
+    source_dataset_path: str = None
+    # Path where the generated data will be saved
+    generation_path: str = None
+    # Number of trials to be generated
+    generation_num_trials: int = 10
+    # Name of the task being configured
+    task_name: str = None
+    ##############################################################
+    # Advanced configuration, does not usually need to be changed
+    # Whether to select source data per subtask
+    # Note: this requires subtasks to be properly temporally
+    #       constrained, and may require additional subtasks to allow
+    #       for time synchronization.
+    generation_select_src_per_subtask: bool = False
+    # Whether to transform the first robot pose during generation
+    generation_transform_first_robot_pose: bool = False
+    # Whether to interpolate from last target pose
+    generation_interpolate_from_last_target_pose: bool = True
 @configclass
 class SubTaskConfig:
-    """Configuration settings specific to the management of individual subtasks."""
+    """
+    Configuration settings specific to the management of individual
+    subtasks.
+    """
+    ##############################################################
+    # Mandatory options that should be defined for every subtask
+    # Reference to the object involved in this subtask, None if no
+    # object is involved (this is rarely the case).
+    object_ref: str = None
+    # Signal for subtask termination
+    subtask_term_signal: str = None
+    ##############################################################
+    # Advanced options for tuning the generation results
+    # Strategy on how to select a subtask segment. Can be either
+    # 'random', 'nearest_neighbor_object' or
+    # 'nearest_neighbor_robot_distance'. Details can be found in
+    # source/isaaclab_mimic/isaaclab_mimic/datagen/selection_strategy.py
+    #
+    # Note: for 'nearest_neighbor_object' and
+    #       'nearest_neighbor_robot_distance', the subtask needs to have
+    #       'object_ref' set to a value other than 'None' above. At the
+    #       same time, if 'object_ref' is not 'None', then either of
+    #       those strategies will usually yield higher success rates
+    #       than the default 'random' strategy.
+    selection_strategy: str = "random"
+    # Additional arguments to the selected strategy. See details on
+    # each strategy in
+    # source/isaaclab_mimic/isaaclab_mimic/datagen/selection_strategy.py
+    # Arguments will be passed through to the `select_source_demo`
+    # method.
+    selection_strategy_kwargs: dict = {}
-    object_ref: str = None  # Reference to the object involved in this subtask
+    # Range for start offset of the first subtask
-    subtask_term_signal: str = None  # Signal for subtask termination
+    first_subtask_start_offset_range: tuple = (0, 0)
-    subtask_term_offset_range: tuple = (0, 0)  # Range for offsetting subtask termination
-    selection_strategy: str = None  # Strategy for selecting subtask
+    # Range for offsetting subtask termination
-    selection_strategy_kwargs: dict = {}  # Keyword arguments for the selection strategy
+    subtask_term_offset_range: tuple = (0, 0)
-    action_noise: float = 0.03  # Amplitude of action noise applied
-    num_interpolation_steps: int = 5  # Number of steps for interpolation between waypoints
+    # Amplitude of action noise applied
-    num_fixed_steps: int = 0  # Number of fixed steps for the subtask
+    action_noise: float = 0.03
-    apply_noise_during_interpolation: bool = False  # Whether to apply noise during interpolation
+    # Number of steps for interpolation between waypoints
+    num_interpolation_steps: int = 5
+    # Number of fixed steps for the subtask
+    num_fixed_steps: int = 0
+    # Whether to apply noise during interpolation
+    apply_noise_during_interpolation: bool = False
 @configclass
 class MimicEnvCfg:
-    """Configuration class for the Mimic environment integration.
+    """
+    Configuration class for the Mimic environment integration.
-    This class consolidates various configuration aspects for the Isaac Lab Mimic data generation pipeline.
+    This class consolidates various configuration aspects for the
+    Isaac Lab Mimic data generation pipeline.
    """
-    datagen_config: DataGenConfig = DataGenConfig()  # Configuration for the data generation
+    # Overall configuration for the data generation
-    subtask_configs: list[SubTaskConfig] = []  # List of configurations for each subtask
+    datagen_config: DataGenConfig = DataGenConfig()
+    # Dictionary of lists of subtask configurations for each end-effector.
+    # Keys are end-effector names.
+    # Currently, only a single end-effector is supported by Isaac Lab Mimic
+    # so `subtask_configs` must always be of size 1.
+    subtask_configs: dict[str, list[SubTaskConfig]] = {}
--- a/source/isaaclab_mimic/isaaclab_mimic/datagen/data_generator.py
+++ b/source/isaaclab_mimic/isaaclab_mimic/datagen/data_generator.py
@@ -47,9 +47,14 @@ class DataGenerator:
        assert isinstance(self.env_cfg, MimicEnvCfg)
        self.dataset_path = dataset_path
+        if len(self.env_cfg.subtask_configs) != 1:
+            raise ValueError("Data generation currently supports only one end-effector.")
+        (self.eef_name,) = self.env_cfg.subtask_configs.keys()
+        (self.subtask_configs,) = self.env_cfg.subtask_configs.values()
        # sanity check on task spec offset ranges - final subtask should not have any offset randomization
-        assert self.env_cfg.subtask_configs[-1].subtask_term_offset_range[0] == 0
+        assert self.subtask_configs[-1].subtask_term_offset_range[0] == 0
-        assert self.env_cfg.subtask_configs[-1].subtask_term_offset_range[1] == 0
+        assert self.subtask_configs[-1].subtask_term_offset_range[1] == 0
        self.demo_keys = demo_keys
@@ -88,8 +93,8 @@ class DataGenerator:
        # add them to subtask end indices, and then set them as the start indices of next subtask too
        for i in range(src_subtask_indices.shape[1] - 1):
            end_offsets = np.random.randint(
-                low=self.env_cfg.subtask_configs[i].subtask_term_offset_range[0],
+                low=self.subtask_configs[i].subtask_term_offset_range[0],
-                high=self.env_cfg.subtask_configs[i].subtask_term_offset_range[1] + 1,
+                high=self.subtask_configs[i].subtask_term_offset_range[1] + 1,
                size=src_subtask_indices.shape[0],
            )
            src_subtask_indices[:, i, 1] = src_subtask_indices[:, i, 1] + end_offsets
@@ -235,6 +240,8 @@ class DataGenerator:
                src_demo_inds (list): list of selected source demonstration indices for each subtask
                src_demo_labels (np.array): same as @src_demo_inds, but repeated to have a label for each timestep of the trajectory
        """
+        eef_names = list(self.env_cfg.subtask_configs.keys())
+        eef_name = eef_names[0]
        # reset the env to create a new task demo instance
        env_id_tensor = torch.tensor([env_id], dtype=torch.int64, device=self.env.device)
@@ -257,17 +264,17 @@ class DataGenerator:
        )  # like @generated_src_demo_inds, but padded to align with size of @generated_actions
        prev_src_demo_datagen_info_pool_size = 0
-        for subtask_ind in range(len(self.env_cfg.subtask_configs)):
+        for subtask_ind in range(len(self.subtask_configs)):
            # some things only happen on first subtask
            is_first_subtask = subtask_ind == 0
            # name of object for this subtask
-            subtask_object_name = self.env_cfg.subtask_configs[subtask_ind].object_ref
+            subtask_object_name = self.subtask_configs[subtask_ind].object_ref
            # corresponding current object pose
            cur_object_pose = (
-                self.env.get_object_poses(env_ind=env_id)[subtask_object_name]
+                self.env.get_object_poses(env_ids=[env_id])[subtask_object_name][0]
                if (subtask_object_name is not None)
                else None
            )
@@ -288,13 +295,13 @@ class DataGenerator:
                # Run source demo selection or use selected demo from previous iteration
                if need_source_demo_selection:
                    selected_src_demo_ind = self.select_source_demo(
-                        eef_pose=self.env.get_robot_eef_pose(env_ind=env_id),
+                        eef_pose=self.env.get_robot_eef_pose(eef_name, env_ids=[env_id])[0],
                        object_pose=cur_object_pose,
                        subtask_ind=subtask_ind,
                        src_subtask_inds=all_subtask_inds[:, subtask_ind],
                        subtask_object_name=subtask_object_name,
-                        selection_strategy_name=self.env_cfg.subtask_configs[subtask_ind].selection_strategy,
+                        selection_strategy_name=self.subtask_configs[subtask_ind].selection_strategy,
-                        selection_strategy_kwargs=self.env_cfg.subtask_configs[subtask_ind].selection_strategy_kwargs,
+                        selection_strategy_kwargs=self.subtask_configs[subtask_ind].selection_strategy_kwargs,
                    )
                assert selected_src_demo_ind is not None
@@ -356,17 +363,19 @@ class DataGenerator:
            else:
                # Interpolation segment will start from current robot eef pose.
                init_sequence = WaypointSequence.from_poses(
-                    poses=self.env.get_robot_eef_pose(env_ind=env_id)[None],
+                    eef_names=eef_names,
+                    poses=self.env.get_robot_eef_pose(eef_name, env_ids=[env_id])[0][None],
                    gripper_actions=src_subtask_gripper_actions[0:1],
-                    action_noise=self.env_cfg.subtask_configs[subtask_ind].action_noise,
+                    action_noise=self.subtask_configs[subtask_ind].action_noise,
                )
            traj_to_execute.add_waypoint_sequence(init_sequence)
            # Construct trajectory for the transformed segment.
            transformed_seq = WaypointSequence.from_poses(
+                eef_names=eef_names,
                poses=transformed_eef_poses,
                gripper_actions=src_subtask_gripper_actions,
-                action_noise=self.env_cfg.subtask_configs[subtask_ind].action_noise,
+                action_noise=self.subtask_configs[subtask_ind].action_noise,
            )
            transformed_traj = WaypointTrajectory()
            transformed_traj.add_waypoint_sequence(transformed_seq)
@@ -375,11 +384,12 @@ class DataGenerator:
            # Interpolation will happen from the initial pose (@init_sequence) to the first element of @transformed_seq.
            traj_to_execute.merge(
                transformed_traj,
-                num_steps_interp=self.env_cfg.subtask_configs[subtask_ind].num_interpolation_steps,
+                eef_names=eef_names,
-                num_steps_fixed=self.env_cfg.subtask_configs[subtask_ind].num_fixed_steps,
+                num_steps_interp=self.subtask_configs[subtask_ind].num_interpolation_steps,
+                num_steps_fixed=self.subtask_configs[subtask_ind].num_fixed_steps,
                action_noise=(
-                    float(self.env_cfg.subtask_configs[subtask_ind].apply_noise_during_interpolation)
+                    float(self.subtask_configs[subtask_ind].apply_noise_during_interpolation)
-                    * self.env_cfg.subtask_configs[subtask_ind].action_noise
+                    * self.subtask_configs[subtask_ind].action_noise
                ),
            )

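Reviewer note: `DataGenerator.__init__` relies on single-element tuple unpacking to pull the only end-effector out of the new `subtask_configs` dictionary. The sketch below isolates that pattern so its failure mode is easy to see. `unpack_single_eef`, the `"franka"` key, and the subtask labels are hypothetical names used only for illustration.

```python
from __future__ import annotations


def unpack_single_eef(subtask_configs: dict[str, list]) -> tuple[str, list]:
    """Return the only (eef_name, subtask list) pair, or raise if there is not exactly one."""
    if len(subtask_configs) != 1:
        raise ValueError("Data generation currently supports only one end-effector.")
    # Single-element unpacking: each assignment fails unless exactly one item exists.
    (eef_name,) = subtask_configs.keys()
    (subtasks,) = subtask_configs.values()
    return eef_name, subtasks


# Hypothetical config with one end-effector and two subtasks.
cfg = {"franka": ["grasp", "stack"]}
eef_name, subtasks = unpack_single_eef(cfg)
```

The explicit length check gives a readable error message. Without it, the unpacking itself would still fail for zero or multiple keys, but with a generic "too many values to unpack" style message.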
--- a/source/isaaclab_mimic/isaaclab_mimic/datagen/datagen_info_pool.py
+++ b/source/isaaclab_mimic/isaaclab_mimic/datagen/datagen_info_pool.py
@@ -36,9 +36,13 @@ class DataGenInfoPool:
        self._asyncio_lock = asyncio_lock
-        self.subtask_term_signals = [subtask_config.subtask_term_signal for subtask_config in env_cfg.subtask_configs]
+        if len(env_cfg.subtask_configs) != 1:
+            raise ValueError("Data generation currently supports only one end-effector.")
+        (subtask_configs,) = env_cfg.subtask_configs.values()
+        self.subtask_term_signals = [subtask_config.subtask_term_signal for subtask_config in subtask_configs]
        self.subtask_term_offset_ranges = [
-            subtask_config.subtask_term_offset_range for subtask_config in env_cfg.subtask_configs
+            subtask_config.subtask_term_offset_range for subtask_config in subtask_configs
        ]
    @property
@@ -82,20 +86,24 @@ class DataGenInfoPool:
            episode (EpisodeData): episode to add
        """
        ep_grp = episode.data
+        eef_name = list(self.env.cfg.subtask_configs.keys())[0]
        # extract datagen info
+        if "datagen_info" in ep_grp["obs"]:
+            eef_pose = ep_grp["obs"]["datagen_info"]["eef_pose"][eef_name]
+            object_poses_dict = ep_grp["obs"]["datagen_info"]["object_pose"]
+            target_eef_pose = ep_grp["obs"]["datagen_info"]["target_eef_pose"][eef_name]
+            subtask_term_signals_dict = ep_grp["obs"]["datagen_info"]["subtask_term_signals"]
+        else:
            # Extract eef poses
            eef_pos = ep_grp["obs"]["eef_pos"]
-        # format (w, x, y, z)
+            eef_quat = ep_grp["obs"]["eef_quat"]  # format (w, x, y, z)
-        eef_quat = ep_grp["obs"]["eef_quat"]
            eef_rot_matrices = PoseUtils.matrix_from_quat(eef_quat)  # shape (N, 3, 3)
            # Create pose matrices for all environments
            eef_pose = PoseUtils.make_pose(eef_pos, eef_rot_matrices)  # shape (N, 4, 4)
+            # Object poses
            object_poses_dict = dict()
-        # TODO: change object_pose key in the dataset to object_state since it is not just the pose
            for object_name, value in ep_grp["obs"]["object_pose"].items():
                # object_pose
                value = value["root_pose"]
@@ -104,19 +112,23 @@ class DataGenInfoPool:
                # Convert to rotation matrices
                object_rot_matrices = PoseUtils.matrix_from_quat(value[:, 3:7])  # shape (N, 3, 3)
                object_rot_positions = value[:, 0:3]  # shape (N, 3)
                object_poses_dict[object_name] = PoseUtils.make_pose(object_rot_positions, object_rot_matrices)
+            # Target eef pose
+            target_eef_pose = ep_grp["obs"]["target_eef_pose"]
+            # Subtask termination signals
+            subtask_term_signals_dict = ep_grp["obs"]["subtask_term_signals"]
        # Extract gripper actions
-        gripper_actions = self.env.action_to_gripper_action(ep_grp["actions"])
+        gripper_actions = self.env.actions_to_gripper_actions(ep_grp["actions"])[eef_name]
        ep_datagen_info_obj = DatagenInfo(
            eef_pose=eef_pose,
            object_poses=object_poses_dict,
-            subtask_term_signals=ep_grp["obs"]["subtask_term_signals"],
+            subtask_term_signals=subtask_term_signals_dict,
-            target_eef_pose=ep_grp["obs"]["target_eef_pose"],
+            target_eef_pose=target_eef_pose,
            gripper_action=gripper_actions,
        )
        self._datagen_infos.append(ep_datagen_info_obj)

--- a/source/isaaclab_mimic/isaaclab_mimic/datagen/waypoint.py
+++ b/source/isaaclab_mimic/isaaclab_mimic/datagen/waypoint.py
@@ -18,7 +18,7 @@ class Waypoint:
    Represents a single desired 6-DoF waypoint, along with corresponding gripper actuation for this point.
    """
-    def __init__(self, pose, gripper_action, noise=None):
+    def __init__(self, eef_names, pose, gripper_action, noise=None):
        """
        Args:
            pose (torch.Tensor): 4x4 pose target for robot controller
@@ -26,10 +26,10 @@ class Waypoint:
            noise (float or None): action noise amplitude to apply during execution at this timestep
                (for arm actions, not gripper actions)
        """
+        self.eef_names = eef_names
        self.pose = pose
        self.gripper_action = gripper_action
        self.noise = noise
-        assert len(self.gripper_action.shape) == 1
    def __str__(self):
        """String representation of the waypoint."""
@@ -54,7 +54,7 @@ class WaypointSequence:
            self.sequence = deepcopy(sequence)
    @classmethod
-    def from_poses(cls, poses, gripper_actions, action_noise):
+    def from_poses(cls, eef_names, poses, gripper_actions, action_noise):
        """
        Instantiate a WaypointSequence object given a sequence of poses,
        gripper actions, and action noise.
@@ -79,6 +79,7 @@ class WaypointSequence:
        # make WaypointSequence instance
        sequence = [
            Waypoint(
+                eef_names=eef_names,
                pose=poses[t],
                gripper_action=gripper_actions[t],
                noise=action_noise[t, 0],
@@ -201,6 +202,7 @@ class WaypointTrajectory:
    def add_waypoint_sequence_for_target_pose(
        self,
+        eef_names,
        pose,
        gripper_action,
        num_steps,
@@ -252,6 +254,7 @@ class WaypointTrajectory:
        # add waypoint sequence for this set of poses
        sequence = WaypointSequence.from_poses(
+            eef_names=eef_names,
            poses=poses,
            gripper_actions=gripper_actions,
            action_noise=action_noise,
@@ -278,6 +281,7 @@ class WaypointTrajectory:
    def merge(
        self,
        other,
+        eef_names,
        num_steps_interp=None,
        num_steps_fixed=None,
        action_noise=0.0,
@@ -311,6 +315,7 @@ class WaypointTrajectory:
            if need_interp:
                # interpolation segment
                self.add_waypoint_sequence_for_target_pose(
+                    eef_names=eef_names,
                    pose=target_for_interpolation.pose,
                    gripper_action=target_for_interpolation.gripper_action,
                    num_steps=num_steps_interp,
@@ -324,6 +329,7 @@ class WaypointTrajectory:
                # account for the fact that we pop'd the first element of @other in anticipation of an interpolation segment
                num_steps_fixed_to_use = num_steps_fixed if need_interp else (num_steps_fixed + 1)
                self.add_waypoint_sequence_for_target_pose(
+                    eef_names=eef_names,
                    pose=target_for_interpolation.pose,
                    gripper_action=target_for_interpolation.gripper_action,
                    num_steps=num_steps_fixed_to_use,
@@ -382,17 +388,15 @@ class WaypointTrajectory:
                obs = env.obs_buf
                state = env.scene.get_state(is_relative=True)
-                # convert target pose to arm action
+                # convert target pose and gripper action to env action
-                action_pose = env.target_eef_pose_to_action(target_eef_pose=waypoint.pose, env_ind=env_id)
+                target_eef_pose_dict = {waypoint.eef_names[0]: waypoint.pose}
+                gripper_action_dict = {waypoint.eef_names[0]: waypoint.gripper_action}
-                # maybe add noise to action using torch.randn
+                play_action = env.target_eef_pose_to_action(
-                if waypoint.noise is not None:
+                    target_eef_pose_dict=target_eef_pose_dict,
-                    noise = waypoint.noise * torch.randn_like(action_pose)
+                    gripper_action_dict=gripper_action_dict,
-                    action_pose += noise
+                    noise=waypoint.noise,
-                    action_pose = torch.clamp(action_pose, -1.0, 1.0)
+                    env_id=env_id,
+                )
-                # add in gripper action
-                play_action = torch.cat([action_pose, waypoint.gripper_action], dim=0)
                # step environment
                if not isinstance(play_action, torch.Tensor):

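Reviewer note: the noise, clamp, and gripper concatenation steps removed from `waypoint.py` above now live inside the environment's `target_eef_pose_to_action`. The following is a minimal pure-Python sketch of that ordering, using lists instead of tensors and `random.Random.gauss` in place of `torch.randn_like`. `compose_action` and its arguments are hypothetical and are not the Isaac Lab API.

```python
from __future__ import annotations

import random


def compose_action(
    pose_action: list[float],
    gripper_action: list[float],
    noise: float | None = None,
    rng: random.Random | None = None,
) -> list[float]:
    """Optionally perturb and clamp the pose part, then append the gripper part."""
    rng = rng or random.Random()
    if noise is not None:
        # As in the diff, clamping to [-1, 1] happens only when noise is requested.
        pose_action = [min(1.0, max(-1.0, v + noise * rng.gauss(0.0, 1.0))) for v in pose_action]
    # The gripper action is never perturbed.
    return list(pose_action) + list(gripper_action)


# Hypothetical 6-DoF delta pose plus a 1-D gripper command.
clean = compose_action([0.1, -0.2, 0.0, 0.0, 0.0, 0.3], [1.0])
noisy = compose_action([0.1, -0.2, 0.0, 0.0, 0.0, 0.3], [1.0], noise=0.03, rng=random.Random(0))
```

One consequence of this ordering is that an out-of-range pose action is left untouched when `noise` is `None`, since the clamp sits inside the noise branch.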
--- a/source/isaaclab_mimic/isaaclab_mimic/envs/franka_stack_ik_rel_mimic_env.py
+++ b/source/isaaclab_mimic/isaaclab_mimic/envs/franka_stack_ik_rel_mimic_env.py
@@ -4,6 +4,7 @@
 # SPDX-License-Identifier: Apache-2.0
 import torch
+from collections.abc import Sequence
 import isaaclab.utils.math as PoseUtils
 from isaaclab.envs import ManagerBasedRLMimicEnv
@@ -14,79 +15,92 @@ class FrankaCubeStackIKRelMimicEnv(ManagerBasedRLMimicEnv):
    Isaac Lab Mimic environment wrapper class for Franka Cube Stack IK Rel env.
    """
-    def get_robot_eef_pose(self, env_ind=0):
+    def get_robot_eef_pose(self, eef_name: str, env_ids: Sequence[int] | None = None) -> torch.Tensor:
        """
        Get current robot end effector pose. Should be the same frame as used by the robot end-effector controller.
+        Args:
+            eef_name: Name of the end effector.
+            env_ids: Environment indices to get the pose for. If None, all envs are considered.
        Returns:
-            pose (torch.Tensor): 4x4 eef pose matrix
+            A torch.Tensor eef pose matrix. Shape is (len(env_ids), 4, 4)
        """
+        if env_ids is None:
+            env_ids = slice(None)
        # Retrieve end effector pose from the observation buffer
-        eef_pos = self.obs_buf["policy"]["eef_pos"][env_ind]
+        eef_pos = self.obs_buf["policy"]["eef_pos"][env_ids]
-        eef_quat = self.obs_buf["policy"]["eef_quat"][env_ind]
+        eef_quat = self.obs_buf["policy"]["eef_quat"][env_ids]
        # Quaternion format is w,x,y,z
        return PoseUtils.make_pose(eef_pos, PoseUtils.matrix_from_quat(eef_quat))
-    def target_eef_pose_to_action(self, target_eef_pose, relative=True, env_ind=0):
+    def target_eef_pose_to_action(
+        self, target_eef_pose_dict: dict, gripper_action_dict: dict, noise: float | None = None, env_id: int = 0
+    ) -> torch.Tensor:
        """
-        Takes a target pose for the end effector controller and returns an action
+        Takes a target pose and gripper action for the end effector controller and returns an action
        (usually a normalized delta pose action) to try and achieve that target pose.
+        Noise is added to the target pose action if specified.
        Args:
-            target_eef_pose (torch.Tensor): 4x4 target eef pose
+            target_eef_pose_dict: Dictionary of 4x4 target eef poses, keyed by end-effector name.
-            relative (bool): if True, use relative pose actions, else absolute pose actions
+            gripper_action_dict: Dictionary of gripper actions for each end-effector.
+            noise: Noise to add to the action. If None, no noise is added.
+            env_id: Environment index to get the action for.
        Returns:
-            action (torch.Tensor): action compatible with env.step (minus gripper actuation)
+            An action torch.Tensor that's compatible with env.step().
        """
+        eef_name = list(self.cfg.subtask_configs.keys())[0]
        # target position and rotation
+        (target_eef_pose,) = target_eef_pose_dict.values()
        target_pos, target_rot = PoseUtils.unmake_pose(target_eef_pose)
        # current position and rotation
-        curr_pose = self.get_robot_eef_pose(env_ind=env_ind)
+        curr_pose = self.get_robot_eef_pose(eef_name, env_ids=[env_id])[0]
        curr_pos, curr_rot = PoseUtils.unmake_pose(curr_pose)
-        if relative:
        # normalized delta position action
        delta_position = target_pos - curr_pos
-            # delta_position = np.clip(delta_position / max_dpos, -1., 1.)
        # normalized delta rotation action
        delta_rot_mat = target_rot.matmul(curr_rot.transpose(-1, -2))
        delta_quat = PoseUtils.quat_from_matrix(delta_rot_mat)
        delta_rotation = PoseUtils.axis_angle_from_quat(delta_quat)
-            # delta_rotation = np.clip(delta_rotation / max_drot, -1., 1.)
+        # get gripper action for single eef
-            return torch.cat([delta_position, delta_rotation], dim=0)
+        (gripper_action,) = gripper_action_dict.values()
-        else:
-            raise NotImplementedError("Absolute pose actions are not implemented.")
+        # add noise to action
-            return
+        pose_action = torch.cat([delta_position, delta_rotation], dim=0)
+        if noise is not None:
+            noise = noise * torch.randn_like(pose_action)
+            pose_action += noise
+            pose_action = torch.clamp(pose_action, -1.0, 1.0)
+        return torch.cat([pose_action, gripper_action], dim=0)
-    def action_to_target_eef_pos(self, action, relative=True, env_ind=0):
+    def action_to_target_eef_pose(self, action: torch.Tensor) -> dict[str, torch.Tensor]:
        """
        Converts action (compatible with env.step) to a target pose for the end effector controller.
        Inverse of @target_eef_pose_to_action. Usually used to infer a sequence of target controller poses
        from a demonstration trajectory using the recorded actions.
        Args:
-            action (torch.Tensor): environment action
+            action: Environment action. Shape is (num_envs, action_dim)
-            relative (bool): if True, use relative pose actions, else absolute pose actions
        Returns:
-            target_eef_pose (torch.Tensor): 4x4 target eef pose that @action corresponds to
+            A dictionary of eef pose torch.Tensor that @action corresponds to
        """
+        eef_name = list(self.cfg.subtask_configs.keys())[0]
-        target_poses = []
+        delta_position = action[:, :3]
+        delta_rotation = action[:, 3:6]
-        for env_ind in range(self.scene.num_envs):
-            delta_position = action[env_ind][:3]
-            delta_rotation = action[env_ind][3:6]
        # current position and rotation
-            curr_pose = self.get_robot_eef_pose(env_ind=env_ind)
+        curr_pose = self.get_robot_eef_pose(eef_name, env_ids=None)
        curr_pos, curr_rot = PoseUtils.unmake_pose(curr_pose)
        # get pose target
@@ -94,51 +108,54 @@ class FrankaCubeStackIKRelMimicEnv(ManagerBasedRLMimicEnv):
        # Convert delta_rotation to axis angle form
        delta_rotation_angle = torch.linalg.norm(delta_rotation, dim=-1, keepdim=True)
-            # make sure that axis is a unit vector
-            # Check for invalid division
-            if torch.isclose(delta_rotation_angle, torch.tensor([0.0], device=delta_rotation_angle.device)):
-                # Quaternion format is wxyz
-                delta_quat = torch.tensor([1.0, 0.0, 0.0, 0.0], device=delta_rotation_angle.device)
-            else:
        delta_rotation_axis = delta_rotation / delta_rotation_angle
-                delta_quat = PoseUtils.quat_from_angle_axis(delta_rotation_angle, delta_rotation_axis).squeeze(0)
-            delta_rot_mat = PoseUtils.matrix_from_quat(delta_quat)
+        # Handle invalid division for the case when delta_rotation_angle is close to zero
+        is_close_to_zero_angle = torch.isclose(delta_rotation_angle, torch.zeros_like(delta_rotation_angle)).squeeze(1)
+        delta_rotation_axis[is_close_to_zero_angle] = torch.zeros_like(delta_rotation_axis)[is_close_to_zero_angle]
+        delta_quat = PoseUtils.quat_from_angle_axis(delta_rotation_angle.squeeze(1), delta_rotation_axis).squeeze(0)
+        delta_rot_mat = PoseUtils.matrix_from_quat(delta_quat)
        target_rot = torch.matmul(delta_rot_mat, curr_rot)
-            target_pose = PoseUtils.make_pose(target_pos, target_rot).clone()
+        target_poses = PoseUtils.make_pose(target_pos, target_rot).clone()
-            target_poses.append(target_pose)
+        return {eef_name: target_poses}
-        return target_poses
-    def action_to_gripper_action(self, action):
+    def actions_to_gripper_actions(self, actions: torch.Tensor) -> dict[str, torch.Tensor]:
        """
-        Extracts the gripper actuation part of an action (compatible with env.step).
+        Extracts the gripper actuation part from a sequence of env actions (compatible with env.step).
        Args:
-            action (torch.Tensor): environment action of shape N x action_dim. Where N is number of steps in a demo
+            actions: Environment actions. Shape is (num_envs, num steps in a demo, action_dim).
        Returns:
-            gripper_action (torch.Tensor): subset of environment action for gripper actuation of shape N x gripper_action_dim
+            A dictionary of torch.Tensor gripper actions, keyed by eef_name.
        """
        # last dimension is gripper action
-        return action[:, -1:]
+        return {list(self.cfg.subtask_configs.keys())[0]: actions[:, -1:]}
-    def get_subtask_term_signals(self, env_ind=0):
+    def get_subtask_term_signals(self, env_ids: Sequence[int] | None = None) -> dict[str, torch.Tensor]:
        """
-        Gets a dictionary of binary flags for each subtask in a task. The flag is 1
+        Gets a dictionary of termination signal flags for each subtask in a task. The flag is 1
-        when the subtask has been completed and 0 otherwise. Isaac Lab Mimic only uses this
+        when the subtask has been completed and 0 otherwise. The implementation of this method is
-        when parsing source demonstrations at the start of data generation, and it only
+        required if intending to enable automatic subtask term signal annotation when running the
-        uses the first 0 -> 1 transition in this signal to detect the end of a subtask.
+        dataset annotation tool. This method can be kept unimplemented if intending to use manual
+        subtask term signal annotation.
+        Args:
+            env_ids: Environment indices to get the termination signals for. If None, all envs are considered.
        Returns:
-            subtask_term_signals (dict): dictionary that maps subtask name to termination flag (0 or 1)
+            A dictionary of termination signal flags (False or True) for each subtask.
        """
-        signals = dict()
+        if env_ids is None:
+            env_ids = slice(None)
+        signals = dict()
        subtask_terms = self.obs_buf["subtask_terms"]
-        signals["grasp_1"] = subtask_terms["grasp_1"][env_ind]
+        signals["grasp_1"] = subtask_terms["grasp_1"][env_ids]
-        signals["grasp_2"] = subtask_terms["grasp_2"][env_ind]
+        signals["grasp_2"] = subtask_terms["grasp_2"][env_ids]
-        signals["stack_1"] = subtask_terms["stack_1"][env_ind]
+        signals["stack_1"] = subtask_terms["stack_1"][env_ids]
        # final subtask is placing cubeC on cubeA (motion relative to cubeA) - but final subtask signal is not needed
        return signals
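
The zero-angle handling in `action_to_target_eef_pose` above may be easier to follow in scalar form. The following is a minimal plain-Python sketch, not the `PoseUtils` implementation: it assumes the standard half-angle formula and the wxyz quaternion order mentioned in the removed code, and the function name `quat_from_axis_angle_vec` and the `eps` threshold are hypothetical. When the rotation angle is (close to) zero the axis is undefined, so the sketch returns the identity quaternion instead of dividing by zero.

```python
import math


def quat_from_axis_angle_vec(rot_vec, eps=1e-8):
    """Convert an axis-angle vector (axis * angle) into a wxyz quaternion."""
    angle = math.sqrt(sum(c * c for c in rot_vec))
    if angle < eps:
        # Zero rotation: the axis is undefined, so return the identity quaternion.
        return (1.0, 0.0, 0.0, 0.0)
    # Normalize to get the unit rotation axis.
    axis = [c / angle for c in rot_vec]
    half = 0.5 * angle
    s = math.sin(half)
    return (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)


# A zero delta rotation maps to the identity quaternion.
identity = quat_from_axis_angle_vec([0.0, 0.0, 0.0])
# A quarter turn about the z axis.
quarter_turn_z = quat_from_axis_angle_vec([0.0, 0.0, math.pi / 2])
```

The batched code in the diff reaches the same result without branching: it zeroes the axis rows whose angle is close to zero, so the vector part of the quaternion vanishes and the scalar part becomes cos(0) = 1.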
--- a/source/isaaclab_mimic/isaaclab_mimic/envs/franka_stack_ik_rel_mimic_env_cfg.py
+++ b/source/isaaclab_mimic/isaaclab_mimic/envs/franka_stack_ik_rel_mimic_env_cfg.py
@@ -31,12 +31,11 @@ class FrankaCubeStackIKRelMimicEnvCfg(FrankaCubeStackEnvCfg, MimicEnvCfg):
        self.datagen_config.generation_transform_first_robot_pose = False
        self.datagen_config.generation_interpolate_from_last_target_pose = True
        self.datagen_config.max_num_failures = 25
-        self.datagen_config.num_demo_to_render = 10
-        self.datagen_config.num_fail_demo_to_render = 25
        self.datagen_config.seed = 1
        # The following are the subtask configurations for the stack task.
-        self.subtask_configs.append(
+        subtask_configs = []
+        subtask_configs.append(
            SubTaskConfig(
                # Each subtask involves manipulation with respect to a single object frame.
                object_ref="cube_2",
@@ -60,7 +59,7 @@ class FrankaCubeStackIKRelMimicEnvCfg(FrankaCubeStackEnvCfg, MimicEnvCfg):
                apply_noise_during_interpolation=False,
            )
        )
-        self.subtask_configs.append(
+        subtask_configs.append(
            SubTaskConfig(
                # Each subtask involves manipulation with respect to a single object frame.
                object_ref="cube_1",
@@ -82,7 +81,7 @@ class FrankaCubeStackIKRelMimicEnvCfg(FrankaCubeStackEnvCfg, MimicEnvCfg):
                apply_noise_during_interpolation=False,
            )
        )
-        self.subtask_configs.append(
+        subtask_configs.append(
            SubTaskConfig(
                # Each subtask involves manipulation with respect to a single object frame.
                object_ref="cube_3",
@@ -104,7 +103,7 @@ class FrankaCubeStackIKRelMimicEnvCfg(FrankaCubeStackEnvCfg, MimicEnvCfg):
                apply_noise_during_interpolation=False,
            )
        )
-        self.subtask_configs.append(
+        subtask_configs.append(
            SubTaskConfig(
                # Each subtask involves manipulation with respect to a single object frame.
                object_ref="cube_2",
@@ -126,3 +125,4 @@ class FrankaCubeStackIKRelMimicEnvCfg(FrankaCubeStackEnvCfg, MimicEnvCfg):
                apply_noise_during_interpolation=False,
            )
        )
+        self.subtask_configs["franka"] = subtask_configs
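
As a rough illustration of the new layout, the sketch below groups subtask configs in a dictionary keyed by eef name and shows the single-eef lookup pattern the env methods rely on. `SubTaskConfigSketch` and `MimicEnvCfgSketch` are hypothetical stand-ins that model only one field; they are not the real `SubTaskConfig` or `MimicEnvCfg` classes.

```python
from dataclasses import dataclass, field


@dataclass
class SubTaskConfigSketch:
    """Hypothetical stand-in for SubTaskConfig, reduced to a single field."""

    object_ref: str


@dataclass
class MimicEnvCfgSketch:
    """Hypothetical stand-in for MimicEnvCfg: subtasks grouped per end effector."""

    subtask_configs: dict[str, list[SubTaskConfigSketch]] = field(default_factory=dict)


cfg = MimicEnvCfgSketch()

# Build the list locally, then register it under the eef name (as in the diff).
subtask_configs = [SubTaskConfigSketch(ref) for ref in ("cube_2", "cube_1", "cube_3", "cube_2")]
cfg.subtask_configs["franka"] = subtask_configs

# Single-eef lookup, mirroring list(self.cfg.subtask_configs.keys())[0] in the env.
eef_name = list(cfg.subtask_configs.keys())[0]

# Single-element unpacking raises ValueError if a second eef is ever added.
(subtasks,) = cfg.subtask_configs.values()
```

Taking the first key works while there is exactly one end effector; the tuple-unpacking form has the side benefit of failing loudly if that assumption stops holding.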