Updates multi-node training commands to also support Spark (#3978)

# Description Removes rendezvous backend for multi-node training since it doesn't seem to be necessary and prevents multi-node setup on the DGX Spark. ## Type of change - Documentation update ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there  --------- Signed-off-by: Kelly Guo <kellyg@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

Updates multi-node training commands to also support Spark (#3978)
# Description Removes rendezvous backend for multi-node training since it doesn't seem to be necessary and prevents multi-node setup on the DGX Spark. ## Type of change - Documentation update ## Checklist - [x] I have read and understood the [contribution guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html) - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [ ] I have added tests that prove my fix is effective or that my feature works - [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there  --------- Signed-off-by: Kelly Guo <kellyg@nvidia.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
ec38e601 · Kelly Guo · GitHub · a736a00b · ec38e601 · ec38e601
Unverified Commit ec38e601 authored Nov 12, 2025 by Kelly Guo Committed by GitHub Nov 12, 2025
Hide whitespace changes
Inline Side-by-side

Showing with 6 additions and 8 deletions

multi_gpu.rst docs/source/features/multi_gpu.rst +6 -6

index.rst docs/source/setup/installation/index.rst +0 -2

No files found.
--- a/docs/source/features/multi_gpu.rst
+++ b/docs/source/features/multi_gpu.rst
@@ -141,14 +141,14 @@ For the master node, use the following command, where ``--nproc_per_node`` repre
        .. code-block:: shell
-            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master> --master_port=5555 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
    .. tab-item:: rsl_rl
        :sync: rsl_rl
        .. code-block:: shell
-            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master> --master_port=5555 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
    .. tab-item:: skrl
        :sync: skrl
@@ -160,7 +160,7 @@ For the master node, use the following command, where ``--nproc_per_node`` repre
                .. code-block:: shell
-                    python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+                    python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr=<ip_of_master> --master_port=5555 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
            .. tab-item:: JAX
                :sync: jax
@@ -181,14 +181,14 @@ For non-master nodes, use the following command, replacing ``--node_rank`` with
        .. code-block:: shell
-            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master> --master_port=5555 scripts/reinforcement_learning/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
    .. tab-item:: rsl_rl
        :sync: rsl_rl
        .. code-block:: shell
-            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+            python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master> --master_port=5555 scripts/reinforcement_learning/rsl_rl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
    .. tab-item:: skrl
        :sync: skrl
@@ -200,7 +200,7 @@ For non-master nodes, use the following command, replacing ``--node_rank`` with
                .. code-block:: shell
-                    python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
+                    python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr=<ip_of_master> --master_port=5555 scripts/reinforcement_learning/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
            .. tab-item:: JAX
                :sync: jax

--- a/docs/source/setup/installation/index.rst
+++ b/docs/source/setup/installation/index.rst
@@ -89,8 +89,6 @@ Other notable limitations with respect to Isaac Lab include...
 #. Livestream and Hub Workstation Cache are not supported on the DGX spark.
-#. Multi-node training may require direct connections between Spark machines or additional network configurations.
 #. :ref:`Running Cosmos Transfer1 <running-cosmos>` is not currently supported on the DGX Spark.
 Troubleshooting