Unverified Commit 35345c63 authored by Toni-SM, committed by GitHub

Adds distributed multi-GPU learning support for skrl (#574)

This PR updates `skrl` to version `>=1.2.0` and includes support for
distributed multi-GPU learning in the Isaac Lab docs.
See [skrl-v1.2.0](https://github.com/Toni-SM/skrl/releases/tag/1.2.0)
for more details.
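
For context, `torch.distributed.run` exports per-process environment variables such as `LOCAL_RANK`, `RANK` and `WORLD_SIZE`. The following is a minimal sketch of how a training script could map its local rank to a CUDA device string, similar in spirit to the `cuda:{app_launcher.local_rank}` assignment in this PR. The helper name `resolve_device` and the single-GPU fallback `cuda:0` are assumptions for illustration and are not part of Isaac Lab or skrl.

```python
import os
from typing import Mapping, Optional


def resolve_device(distributed: bool, env: Optional[Mapping[str, str]] = None) -> str:
    """Return the CUDA device string for the current process (hypothetical helper).

    When ``distributed`` is False, the sketch assumes a single-GPU run on ``cuda:0``.
    When it is True, the device index is read from ``LOCAL_RANK``, which the
    PyTorch launcher sets for every spawned process.
    """
    env = os.environ if env is None else env
    if not distributed:
        return "cuda:0"  # assumption: default device for non-distributed runs
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"


if __name__ == "__main__":
    # Simulate the second process on a node launched with --nproc_per_node=2
    print(resolve_device(True, {"LOCAL_RANK": "1"}))
```

Passing the environment as an argument keeps the sketch testable without actually launching several processes.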

## Type of change

- New feature (non-breaking change which adds functionality)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have run all the tests with `./isaaclab.sh --test` and they pass
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent 1a7c86b9
...@@ -10,7 +10,7 @@ Multi-GPU Training ...@@ -10,7 +10,7 @@ Multi-GPU Training
------------------ ------------------
For complex reinforcement learning environments, it may be desirable to scale up training across multiple GPUs. For complex reinforcement learning environments, it may be desirable to scale up training across multiple GPUs.
This is possible in Isaac Lab with the ``rl_games`` RL library through the use of the This is possible in Isaac Lab with the ``rl_games`` and ``skrl`` RL libraries through the use of the
`PyTorch distributed <https://pytorch.org/docs/stable/distributed.html>`_ framework. `PyTorch distributed <https://pytorch.org/docs/stable/distributed.html>`_ framework.
In this workflow, ``torch.distributed`` is used to launch multiple processes of training, where the number of In this workflow, ``torch.distributed`` is used to launch multiple processes of training, where the number of
processes must be equal to or less than the number of GPUs available. Each process runs on processes must be equal to or less than the number of GPUs available. Each process runs on
...@@ -23,12 +23,23 @@ at the end of the epoch. ...@@ -23,12 +23,23 @@ at the end of the epoch.
:align: center :align: center
:alt: Multi-GPU training paradigm :alt: Multi-GPU training paradigm
|
To train with multiple GPUs, use the following command, where ``--nproc_per_node`` represents the number of available GPUs: To train with multiple GPUs, use the following command, where ``--nproc_per_node`` represents the number of available GPUs:
.. code-block:: shell .. tabs::
python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed .. group-tab:: rl_games
.. code-block:: shell
python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
.. group-tab:: skrl
.. code-block:: shell
python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 source/standalone/workflows/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
Due to limitations of NCCL on Windows, this feature is currently supported on Linux only. Due to limitations of NCCL on Windows, this feature is currently supported on Linux only.
...@@ -41,17 +52,37 @@ To scale up training beyond multiple GPUs on a single machine, it is also possib ...@@ -41,17 +52,37 @@ To scale up training beyond multiple GPUs on a single machine, it is also possib
To train across multiple nodes/machines, it is required to launch an individual process on each node. To train across multiple nodes/machines, it is required to launch an individual process on each node.
For the master node, use the following command, where ``--nproc_per_node`` represents the number of available GPUs, and ``--nnodes`` represents the number of nodes: For the master node, use the following command, where ``--nproc_per_node`` represents the number of available GPUs, and ``--nnodes`` represents the number of nodes:
.. code-block:: shell .. tabs::
.. group-tab:: rl_games
.. code-block:: shell
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed .. group-tab:: skrl
.. code-block:: shell
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 source/standalone/workflows/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
Note that the port (``5555``) can be replaced with any other available port. Note that the port (``5555``) can be replaced with any other available port.
For non-master nodes, use the following command, replacing ``--node_rank`` with the index of each machine: For non-master nodes, use the following command, replacing ``--node_rank`` with the index of each machine:
.. code-block:: shell .. tabs::
.. group-tab:: rl_games
.. code-block:: shell
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
.. group-tab:: skrl
.. code-block:: shell
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 source/standalone/workflows/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
For more details on multi-node training with PyTorch, please visit the `PyTorch documentation <https://pytorch.org/tutorials/intermediate/ddp_series_multinode.html>`_. As mentioned in the PyTorch documentation, "multinode training is bottlenecked by inter-node communication latencies". When this latency is high, it is possible multi-node training will perform worse than running on a single node instance. For more details on multi-node training with PyTorch, please visit the `PyTorch documentation <https://pytorch.org/tutorials/intermediate/ddp_series_multinode.html>`_. As mentioned in the PyTorch documentation, "multinode training is bottlenecked by inter-node communication latencies". When this latency is high, it is possible multi-node training will perform worse than running on a single node instance.
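
The master and non-master commands above differ only in ``--node_rank`` and in the host part of ``--rdzv_endpoint``. As a rough sketch, the command for each node could be generated as follows. The function ``build_launch_command`` and its parameter names are hypothetical and only illustrate the pattern; the flags themselves are the ones used in the commands above.

```python
import shlex


def build_launch_command(
    node_rank: int,
    nnodes: int,
    nproc_per_node: int,
    master_addr: str,
    port: int = 5555,
    library: str = "skrl",
    task: str = "Isaac-Cartpole-v0",
    rdzv_id: int = 123,
) -> str:
    """Build the torch.distributed.run command line for one node (hypothetical helper)."""
    # The master node (rank 0) rendezvous on localhost, the others on the master's address
    host = "localhost" if node_rank == 0 else master_addr
    args = [
        "python",
        "-m",
        "torch.distributed.run",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--rdzv_id={rdzv_id}",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={host}:{port}",
        f"source/standalone/workflows/{library}/train.py",
        f"--task={task}",
        "--headless",
        "--distributed",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Print the commands for a two-node setup with two GPUs per node
    for rank in range(2):
        print(build_launch_command(rank, nnodes=2, nproc_per_node=2, master_addr="ip_of_master_machine"))
```

Every node must use the same ``--rdzv_id`` and port, so keeping them as shared defaults avoids mismatches between machines.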
......
[package] [package]
# Note: Semantic Versioning is used: https://semver.org/ # Note: Semantic Versioning is used: https://semver.org/
version = "0.7.7" version = "0.7.8"
# Description # Description
title = "Isaac Lab Environments" title = "Isaac Lab Environments"
...@@ -22,7 +22,7 @@ requirements = [ ...@@ -22,7 +22,7 @@ requirements = [
"stable-baselines3>=2.1", "stable-baselines3>=2.1",
"rl-games==1.6.1", "rl-games==1.6.1",
"rsl-rl@git+https://github.com/leggedrobotics/rsl_rl.git", "rsl-rl@git+https://github.com/leggedrobotics/rsl_rl.git",
"skrl>=1.1.0" "skrl>=1.2.0"
] ]
modules = [ modules = [
......
Changelog Changelog
--------- ---------
0.7.8 (2024-06-26)
~~~~~~~~~~~~~~~~~~
Changed
^^^^^^^
* Updated the skrl RL library integration to the latest release (>= 1.2.0)
0.7.7 (2024-06-14) 0.7.7 (2024-06-14)
~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~
......
...@@ -19,23 +19,16 @@ Or, equivalently, by directly calling the skrl library API as follows: ...@@ -19,23 +19,16 @@ Or, equivalently, by directly calling the skrl library API as follows:
from skrl.envs.torch.wrappers import wrap_env from skrl.envs.torch.wrappers import wrap_env
env = wrap_env(env, wrapper="isaac-orbit") env = wrap_env(env, wrapper="isaaclab")
""" """
# needed to import for type hinting: Agent | list[Agent] # needed to import for type hinting: Agent | list[Agent]
from __future__ import annotations from __future__ import annotations
import copy from skrl.envs.wrappers.torch import wrap_env
import torch
import tqdm
from skrl.agents.torch import Agent
from skrl.envs.wrappers.torch import Wrapper, wrap_env
from skrl.resources.preprocessors.torch import RunningStandardScaler # noqa: F401 from skrl.resources.preprocessors.torch import RunningStandardScaler # noqa: F401
from skrl.resources.schedulers.torch import KLAdaptiveLR # noqa: F401 from skrl.resources.schedulers.torch import KLAdaptiveLR # noqa: F401
from skrl.trainers.torch import Trainer
from skrl.trainers.torch.sequential import SEQUENTIAL_TRAINER_DEFAULT_CONFIG
from skrl.utils.model_instantiators.torch import Shape # noqa: F401 from skrl.utils.model_instantiators.torch import Shape # noqa: F401
from omni.isaac.lab.envs import DirectRLEnv, ManagerBasedRLEnv from omni.isaac.lab.envs import DirectRLEnv, ManagerBasedRLEnv
...@@ -114,173 +107,4 @@ def SkrlVecEnvWrapper(env: ManagerBasedRLEnv): ...@@ -114,173 +107,4 @@ def SkrlVecEnvWrapper(env: ManagerBasedRLEnv):
f"The environment must be inherited from ManagerBasedRLEnv or DirectRLEnv. Environment type: {type(env)}" f"The environment must be inherited from ManagerBasedRLEnv or DirectRLEnv. Environment type: {type(env)}"
) )
# wrap and return the environment # wrap and return the environment
return wrap_env(env, wrapper="isaac-orbit") return wrap_env(env, wrapper="isaaclab")
"""
Custom trainer for skrl.
"""
class SkrlSequentialLogTrainer(Trainer):
"""Sequential trainer with logging of episode information.
This trainer inherits from the :class:`skrl.trainers.base_class.Trainer` class. It is used to
train agents in a sequential manner (i.e., one after the other in each interaction with the
environment). It is most suitable for on-policy RL agents such as PPO, A2C, etc.
It modifies the :class:`skrl.trainers.torch.sequential.SequentialTrainer` class with the following
differences:
* It also log episode information to the agent's logger.
* It does not close the environment at the end of the training.
Reference:
https://skrl.readthedocs.io/en/latest/api/trainers.html#base-class
"""
def __init__(
self,
env: Wrapper,
agents: Agent | list[Agent],
agents_scope: list[int] | None = None,
cfg: dict | None = None,
):
"""Initializes the trainer.
Args:
env: Environment to train on.
agents: Agents to train.
agents_scope: Number of environments for each agent to
train on. Defaults to None.
cfg: Configuration dictionary. Defaults to None.
"""
# update the config
_cfg = copy.deepcopy(SEQUENTIAL_TRAINER_DEFAULT_CONFIG)
_cfg.update(cfg if cfg is not None else {})
# store agents scope
agents_scope = agents_scope if agents_scope is not None else []
# initialize the base class
super().__init__(env=env, agents=agents, agents_scope=agents_scope, cfg=_cfg)
# init agents
if self.env.num_agents > 1:
for agent in self.agents:
agent.init(trainer_cfg=self.cfg)
else:
self.agents.init(trainer_cfg=self.cfg)
def train(self):
"""Train the agents sequentially.
This method executes the training loop for the agents. It performs the following steps:
* Pre-interaction: Perform any pre-interaction operations.
* Compute actions: Compute the actions for the agents.
* Step the environments: Step the environments with the computed actions.
* Record the environments' transitions: Record the transitions from the environments.
* Log custom environment data: Log custom environment data.
* Post-interaction: Perform any post-interaction operations.
* Reset the environments: Reset the environments if they are terminated or truncated.
"""
# init agent
self.agents.init(trainer_cfg=self.cfg)
self.agents.set_running_mode("train")
# reset env
states, infos = self.env.reset()
# training loop
for timestep in tqdm.tqdm(range(self.timesteps), disable=self.disable_progressbar):
# pre-interaction
self.agents.pre_interaction(timestep=timestep, timesteps=self.timesteps)
# compute actions
with torch.no_grad():
actions = self.agents.act(states, timestep=timestep, timesteps=self.timesteps)[0]
# step the environments
next_states, rewards, terminated, truncated, infos = self.env.step(actions)
# note: here we do not call render scene since it is done in the env.step() method
# record the environments' transitions
with torch.no_grad():
self.agents.record_transition(
states=states,
actions=actions,
rewards=rewards,
next_states=next_states,
terminated=terminated,
truncated=truncated,
infos=infos,
timestep=timestep,
timesteps=self.timesteps,
)
# log custom environment data
if "log" in infos:
for k, v in infos["log"].items():
if isinstance(v, torch.Tensor) and v.numel() == 1:
self.agents.track_data(f"EpisodeInfo / {k}", v.item())
# post-interaction
self.agents.post_interaction(timestep=timestep, timesteps=self.timesteps)
# reset the environments
# note: here we do not call reset scene since it is done in the env.step() method
# update states
states.copy_(next_states)
def eval(self) -> None:
"""Evaluate the agents sequentially.
This method executes the following steps in loop:
* Compute actions: Compute the actions for the agents.
* Step the environments: Step the environments with the computed actions.
* Record the environments' transitions: Record the transitions from the environments.
* Log custom environment data: Log custom environment data.
"""
# set running mode
if self.num_agents > 1:
for agent in self.agents:
agent.set_running_mode("eval")
else:
self.agents.set_running_mode("eval")
# single agent
if self.num_agents == 1:
self.single_agent_eval()
return
# reset env
states, infos = self.env.reset()
# evaluation loop
for timestep in tqdm.tqdm(range(self.initial_timestep, self.timesteps), disable=self.disable_progressbar):
# compute actions
with torch.no_grad():
actions = torch.vstack([
agent.act(states[scope[0] : scope[1]], timestep=timestep, timesteps=self.timesteps)[0]
for agent, scope in zip(self.agents, self.agents_scope)
])
# step the environments
next_states, rewards, terminated, truncated, infos = self.env.step(actions)
with torch.no_grad():
# write data to TensorBoard
for agent, scope in zip(self.agents, self.agents_scope):
# track data
agent.record_transition(
states=states[scope[0] : scope[1]],
actions=actions[scope[0] : scope[1]],
rewards=rewards[scope[0] : scope[1]],
next_states=next_states[scope[0] : scope[1]],
terminated=terminated[scope[0] : scope[1]],
truncated=truncated[scope[0] : scope[1]],
infos=infos,
timestep=timestep,
timesteps=self.timesteps,
)
# log custom environment data
if "log" in infos:
for k, v in infos["log"].items():
if isinstance(v, torch.Tensor) and v.numel() == 1:
agent.track_data(k, v.item())
# perform post-interaction
super(type(agent), agent).post_interaction(timestep=timestep, timesteps=self.timesteps)
# reset environments
# note: here we do not call reset scene since it is done in the env.step() method
states.copy_(next_states)
...@@ -39,7 +39,7 @@ PYTORCH_INDEX_URL = ["https://download.pytorch.org/whl/cu118"] ...@@ -39,7 +39,7 @@ PYTORCH_INDEX_URL = ["https://download.pytorch.org/whl/cu118"]
# Extra dependencies for RL agents # Extra dependencies for RL agents
EXTRAS_REQUIRE = { EXTRAS_REQUIRE = {
"sb3": ["stable-baselines3>=2.1"], "sb3": ["stable-baselines3>=2.1"],
"skrl": ["skrl>=1.1.0"], "skrl": ["skrl>=1.2.0"],
"rl-games": ["rl-games==1.6.1", "gym"], # rl-games still needs gym :( "rl-games": ["rl-games==1.6.1", "gym"], # rl-games still needs gym :(
"rsl-rl": ["rsl-rl@git+https://github.com/leggedrobotics/rsl_rl.git"], "rsl-rl": ["rsl-rl@git+https://github.com/leggedrobotics/rsl_rl.git"],
"robomimic": [], "robomimic": [],
......
...@@ -60,7 +60,7 @@ def main(): ...@@ -60,7 +60,7 @@ def main():
# create isaac environment # create isaac environment
env = gym.make(args_cli.task, cfg=env_cfg) env = gym.make(args_cli.task, cfg=env_cfg)
# wrap around environment for skrl # wrap around environment for skrl
env = SkrlVecEnvWrapper(env) # same as: `wrap_env(env, wrapper="isaac-orbit")` env = SkrlVecEnvWrapper(env) # same as: `wrap_env(env, wrapper="isaaclab")`
# instantiate models using skrl model instantiator utility # instantiate models using skrl model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html # https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
......
...@@ -29,7 +29,11 @@ parser.add_argument( ...@@ -29,7 +29,11 @@ parser.add_argument(
parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.") parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
parser.add_argument("--task", type=str, default=None, help="Name of the task.") parser.add_argument("--task", type=str, default=None, help="Name of the task.")
parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment") parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment")
parser.add_argument(
"--distributed", action="store_true", default=False, help="Run training with multiple GPUs or nodes."
)
parser.add_argument("--max_iterations", type=int, default=None, help="RL Policy training iterations.") parser.add_argument("--max_iterations", type=int, default=None, help="RL Policy training iterations.")
# append AppLauncher cli args # append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser) AppLauncher.add_app_launcher_args(parser)
# parse the arguments # parse the arguments
...@@ -50,6 +54,7 @@ from datetime import datetime ...@@ -50,6 +54,7 @@ from datetime import datetime
from skrl.agents.torch.ppo import PPO, PPO_DEFAULT_CONFIG from skrl.agents.torch.ppo import PPO, PPO_DEFAULT_CONFIG
from skrl.memories.torch import RandomMemory from skrl.memories.torch import RandomMemory
from skrl.trainers.torch import SequentialTrainer
from skrl.utils import set_seed from skrl.utils import set_seed
from skrl.utils.model_instantiators.torch import deterministic_model, gaussian_model, shared_model from skrl.utils.model_instantiators.torch import deterministic_model, gaussian_model, shared_model
...@@ -58,7 +63,7 @@ from omni.isaac.lab.utils.io import dump_pickle, dump_yaml ...@@ -58,7 +63,7 @@ from omni.isaac.lab.utils.io import dump_pickle, dump_yaml
import omni.isaac.lab_tasks # noqa: F401 import omni.isaac.lab_tasks # noqa: F401
from omni.isaac.lab_tasks.utils import load_cfg_from_registry, parse_env_cfg from omni.isaac.lab_tasks.utils import load_cfg_from_registry, parse_env_cfg
from omni.isaac.lab_tasks.utils.wrappers.skrl import SkrlSequentialLogTrainer, SkrlVecEnvWrapper, process_skrl_cfg from omni.isaac.lab_tasks.utils.wrappers.skrl import SkrlVecEnvWrapper, process_skrl_cfg
def main(): def main():
...@@ -86,6 +91,11 @@ def main(): ...@@ -86,6 +91,11 @@ def main():
# update log_dir # update log_dir
log_dir = os.path.join(log_root_path, log_dir) log_dir = os.path.join(log_root_path, log_dir)
# multi-gpu training config
if args_cli.distributed:
# update env config device
env_cfg.sim.device = f"cuda:{app_launcher.local_rank}"
# max iterations for training # max iterations for training
if args_cli.max_iterations: if args_cli.max_iterations:
experiment_cfg["trainer"]["timesteps"] = args_cli.max_iterations * experiment_cfg["agent"]["rollouts"] experiment_cfg["trainer"]["timesteps"] = args_cli.max_iterations * experiment_cfg["agent"]["rollouts"]
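
The ``--max_iterations`` override in the hunk above converts training iterations into environment timesteps by multiplying by the agent's rollout length. A small standalone sketch of that arithmetic, using a hypothetical helper ``apply_max_iterations`` and made-up configuration values:

```python
import copy
from typing import Optional


def apply_max_iterations(experiment_cfg: dict, max_iterations: Optional[int]) -> dict:
    """Return a copy of the config with trainer timesteps derived from iterations.

    Hypothetical helper: one iteration is assumed to correspond to one full
    rollout, so timesteps = max_iterations * rollouts.
    """
    cfg = copy.deepcopy(experiment_cfg)
    if max_iterations:
        cfg["trainer"]["timesteps"] = max_iterations * cfg["agent"]["rollouts"]
    return cfg


if __name__ == "__main__":
    # Illustrative values only, not taken from any Isaac Lab task config
    example_cfg = {"trainer": {"timesteps": 1000}, "agent": {"rollouts": 16}}
    print(apply_max_iterations(example_cfg, 100)["trainer"]["timesteps"])
```

Unlike the script, which updates ``experiment_cfg`` in place, the sketch returns a copy so the original configuration stays unchanged.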
...@@ -110,7 +120,7 @@ def main(): ...@@ -110,7 +120,7 @@ def main():
print_dict(video_kwargs, nesting=4) print_dict(video_kwargs, nesting=4)
env = gym.wrappers.RecordVideo(env, **video_kwargs) env = gym.wrappers.RecordVideo(env, **video_kwargs)
# wrap around environment for skrl # wrap around environment for skrl
env = SkrlVecEnvWrapper(env) # same as: `wrap_env(env, wrapper="isaac-orbit")` env = SkrlVecEnvWrapper(env) # same as: `wrap_env(env, wrapper="isaaclab")`
# set seed for the experiment (override from command line) # set seed for the experiment (override from command line)
set_seed(args_cli_seed if args_cli_seed is not None else experiment_cfg["seed"]) set_seed(args_cli_seed if args_cli_seed is not None else experiment_cfg["seed"])
...@@ -173,7 +183,8 @@ def main(): ...@@ -173,7 +183,8 @@ def main():
# configure and instantiate a custom RL trainer for logging episode events # configure and instantiate a custom RL trainer for logging episode events
# https://skrl.readthedocs.io/en/latest/api/trainers.html # https://skrl.readthedocs.io/en/latest/api/trainers.html
trainer_cfg = experiment_cfg["trainer"] trainer_cfg = experiment_cfg["trainer"]
trainer = SkrlSequentialLogTrainer(cfg=trainer_cfg, env=env, agents=agent) trainer_cfg["close_environment_at_exit"] = False
trainer = SequentialTrainer(cfg=trainer_cfg, env=env, agents=agent)
# train the agent # train the agent
trainer.train() trainer.train()
......