Resets cuda device after each app.update call (#2283)

# Description

Calling app.update may change the cuda device that was previously set by Isaac Lab. This change forces the cuda device to be set back to the desired device after each app.update call made in SimulationContext in reset, step, and render.

This fixes NCCL errors on distributed setups for certain environments (especially when rendering is enabled), where previously it would generate errors that different ranks were running on the same device.

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Unverified Commit 09590912 authored Apr 10, 2025 by Kelly Guo Committed by GitHub Apr 10, 2025
Showing with 23 additions and 1 deletion

extension.toml source/isaaclab/config/extension.toml +1 -1

CHANGELOG.rst source/isaaclab/docs/CHANGELOG.rst +11 -0

simulation_context.py source/isaaclab/isaaclab/sim/simulation_context.py +11 -0

--- a/source/isaaclab/config/extension.toml
+++ b/source/isaaclab/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.36.5"
+version = "0.36.6"

 # Description
 title = "Isaac Lab framework for Robot Learning"

--- a/source/isaaclab/docs/CHANGELOG.rst
+++ b/source/isaaclab/docs/CHANGELOG.rst
 Changelog
 ---------

+0.36.6 (2025-04-09)
+~~~~~~~~~~~~~~~~~~~
+
+Changed
+^^^^^^^
+
+* Added call to set cuda device after each ``app.update()`` call in :class:`~isaaclab.sim.SimulationContext`.
+  This is now required for multi-GPU workflows because some underlying logic in ``app.update()`` is modifying
+  the cuda device, which results in NCCL errors on distributed setups.
+
+
 0.36.5 (2025-04-01)
 ~~~~~~~~~~~~~~~~~~~


--- a/source/isaaclab/isaaclab/sim/simulation_context.py
+++ b/source/isaaclab/isaaclab/sim/simulation_context.py
@@ -452,6 +452,9 @@ class SimulationContext(_SimulationContext):

    def reset(self, soft: bool = False):
        super().reset(soft=soft)
+        # app.update() may be changing the cuda device in reset, so we force it back to our desired device here
+        if "cuda" in self.device:
+            torch.cuda.set_device(self.device)
        # enable kinematic rendering with fabric
        if self.physics_sim_view:
            self.physics_sim_view._backend.initialize_kinematic_bodies()
@@ -488,6 +491,10 @@ class SimulationContext(_SimulationContext):
        # step the simulation
        super().step(render=render)

+        # app.update() may be changing the cuda device in step, so we force it back to our desired device here
+        if "cuda" in self.device:
+            torch.cuda.set_device(self.device)
+
    def render(self, mode: RenderMode | None = None):
        """Refreshes the rendering components including UI elements and view-ports depending on the render mode.

@@ -527,6 +534,10 @@ class SimulationContext(_SimulationContext):
            self._app.update()
            self.set_setting("/app/player/playSimulations", True)

+        # app.update() may be changing the cuda device, so we force it back to our desired device here
+        if "cuda" in self.device:
+            torch.cuda.set_device(self.device)
+
    """
    Operations - Override (extension)
    """
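
The same guard appears in all three hunks of `simulation_context.py`: check whether the configured device is a CUDA device and, if so, select it again. The following is a minimal, self-contained sketch of that pattern and is not part of the commit. The helper name `restore_cuda_device`, the `current` dictionary, and the two `fake_*` functions are hypothetical stand-ins so that the example runs without a GPU or PyTorch installed; in Isaac Lab the call is made inline with `torch.cuda.set_device(self.device)`.

```python
from typing import Callable


def restore_cuda_device(device: str, set_device: Callable[[str], None]) -> bool:
    """Re-select ``device`` if it names a CUDA device.

    Returns True when a reset was issued, False for non-CUDA devices such as "cpu".
    """
    if "cuda" in device:
        set_device(device)
        return True
    return False


# Stand-in for the process-wide "current CUDA device" (hypothetical, illustration only).
current = {"device": "cuda:1"}


def fake_app_update() -> None:
    # Simulates the side effect described in the commit: the update switches devices.
    current["device"] = "cuda:0"


def fake_set_device(device: str) -> None:
    # Stand-in for torch.cuda.set_device.
    current["device"] = device


fake_app_update()
drifted = current["device"]    # device after the simulated update
was_reset = restore_cuda_device("cuda:1", fake_set_device)
restored = current["device"]   # device after the guard has run
```

With PyTorch available, passing `torch.cuda.set_device` as the second argument would give the behavior added by this commit. Skipping the call for non-CUDA devices mirrors the `if "cuda" in self.device` check in the diff, so CPU-only runs are unaffected.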