Commit 35345c63 authored by Toni-SM, committed by GitHub

Adds distributed multi-GPU learning support for skrl (#574)

This PR updates `skrl` to version `>=1.2.0` and includes support for
distributed multi-GPU learning in the Isaac Lab docs.
See [skrl-v1.2.0](https://github.com/Toni-SM/skrl/releases/tag/1.2.0)
for more details.

## Type of change

- New feature (non-breaking change which adds functionality)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have run all the tests with `./isaaclab.sh --test` and they pass
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent 1a7c86b9
......@@ -10,7 +10,7 @@ Multi-GPU Training
------------------
For complex reinforcement learning environments, it may be desirable to scale up training across multiple GPUs.
This is possible in Isaac Lab with the ``rl_games`` RL library through the use of the
This is possible in Isaac Lab with the ``rl_games`` and ``skrl`` RL libraries through the use of the
`PyTorch distributed <https://pytorch.org/docs/stable/distributed.html>`_ framework.
In this workflow, ``torch.distributed`` is used to launch multiple processes of training, where the number of
processes must be equal to or less than the number of GPUs available. Each process runs on
......@@ -23,12 +23,23 @@ at the end of the epoch.
:align: center
:alt: Multi-GPU training paradigm
|
To train with multiple GPUs, use the following command, where ``--nproc_per_node`` represents the number of available GPUs:
.. code-block:: shell
.. tabs::
python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
.. group-tab:: rl_games
.. code-block:: shell
python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
.. group-tab:: skrl
.. code-block:: shell
python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 source/standalone/workflows/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
Due to limitations of NCCL on Windows, this feature is currently supported on Linux only.
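As a rough illustration of the one-process-per-GPU workflow described above, the following minimal sketch shows how a process started by `torch.distributed.run` could select its device. The launcher exports `LOCAL_RANK` (along with `RANK` and `WORLD_SIZE`) to every process it starts. The helper name `resolve_device` is hypothetical and is not part of Isaac Lab or skrl.

```python
import os


def resolve_device(env=None):
    """Return the CUDA device string for the current process.

    Hypothetical helper: reads the LOCAL_RANK variable exported by
    torch.distributed.run and falls back to GPU 0 when it is absent
    (for example, in a plain single-process run).
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", "0"))
    return f"cuda:{local_rank}"


if __name__ == "__main__":
    # With --nproc_per_node=2, one process resolves cuda:0 and the other cuda:1.
    print(resolve_device())
```

Passing the environment as an argument keeps the helper easy to check without launching several processes.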
......@@ -41,17 +52,37 @@ To scale up training beyond multiple GPUs on a single machine, it is also possib
To train across multiple nodes/machines, it is required to launch an individual process on each node.
For the master node, use the following command, where ``--nproc_per_node`` represents the number of available GPUs, and ``--nnodes`` represents the number of nodes:
.. code-block:: shell
.. tabs::
.. group-tab:: rl_games
.. code-block:: shell
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
.. group-tab:: skrl
.. code-block:: shell
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=0 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=localhost:5555 source/standalone/workflows/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
Note that the port (``5555``) can be replaced with any other available port.
For non-master nodes, use the following command, replacing ``--node_rank`` with the index of each machine:
.. code-block:: shell
.. tabs::
.. group-tab:: rl_games
.. code-block:: shell
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
.. group-tab:: skrl
.. code-block:: shell
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-v0 --headless --distributed
python -m torch.distributed.run --nproc_per_node=2 --nnodes=2 --node_rank=1 --rdzv_id=123 --rdzv_backend=c10d --rdzv_endpoint=ip_of_master_machine:5555 source/standalone/workflows/skrl/train.py --task=Isaac-Cartpole-v0 --headless --distributed
For more details on multi-node training with PyTorch, please visit the `PyTorch documentation <https://pytorch.org/tutorials/intermediate/ddp_series_multinode.html>`_. As mentioned in the PyTorch documentation, "multinode training is bottlenecked by inter-node communication latencies". When this latency is high, it is possible multi-node training will perform worse than running on a single node instance.
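The master and non-master commands above differ only in `--node_rank` and in the host part of `--rdzv_endpoint`. The sketch below assembles both variants from one function to make that difference explicit. `build_launch_command` and its parameter names are hypothetical; the flags themselves are copied from the commands shown in the docs.

```python
def build_launch_command(node_rank, master_addr, script,
                         nnodes=2, nproc_per_node=2, port=5555, rdzv_id=123):
    """Assemble the torch.distributed.run command for one node (hypothetical helper).

    The master node (rank 0) uses localhost as the rendezvous host;
    every other node points at the master machine's address.
    """
    host = "localhost" if node_rank == 0 else master_addr
    return [
        "python", "-m", "torch.distributed.run",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--rdzv_id={rdzv_id}",
        "--rdzv_backend=c10d",
        f"--rdzv_endpoint={host}:{port}",
        script,
        "--task=Isaac-Cartpole-v0",
        "--headless",
        "--distributed",
    ]


if __name__ == "__main__":
    script = "source/standalone/workflows/skrl/train.py"
    # "ip_of_master_machine" is the same placeholder used in the docs.
    for rank in range(2):
        print(" ".join(build_launch_command(rank, "ip_of_master_machine", script)))
```

The resulting list can be joined into a shell command or passed to `subprocess.run` on each machine. The port and rendezvous ID must match across all nodes.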
......
[package]
# Note: Semantic Versioning is used: https://semver.org/
version = "0.7.7"
version = "0.7.8"
# Description
title = "Isaac Lab Environments"
......@@ -22,7 +22,7 @@ requirements = [
"stable-baselines3>=2.1",
"rl-games==1.6.1",
"rsl-rl@git+https://github.com/leggedrobotics/rsl_rl.git",
"skrl>=1.1.0"
"skrl>=1.2.0"
]
modules = [
......
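Because the requirement moves from `skrl>=1.1.0` to `skrl>=1.2.0`, an existing environment may still hold an older release. The sketch below is one way to check this at runtime. The helpers `parse_version` and `meets_minimum` are hypothetical, and the parsing assumes plain numeric versions such as `1.2.0` (pre-release suffixes are not handled).

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release support)."""
    return tuple(int(part) for part in text.split(".")[:3])


def meets_minimum(installed, minimum="1.2.0"):
    """Return True when the installed version satisfies the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        print(meets_minimum(metadata.version("skrl")))
    except metadata.PackageNotFoundError:
        print("skrl is not installed")
```

Comparing integer tuples rather than strings avoids ordering `1.10.0` before `1.2.0`.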
Changelog
---------
0.7.8 (2024-06-26)
~~~~~~~~~~~~~~~~~~
Changed
^^^^^^^
* Updated the skrl RL library integration to the latest release (>= 1.2.0)
0.7.7 (2024-06-14)
~~~~~~~~~~~~~~~~~~
......
......@@ -19,23 +19,16 @@ Or, equivalently, by directly calling the skrl library API as follows:
from skrl.envs.torch.wrappers import wrap_env
env = wrap_env(env, wrapper="isaac-orbit")
env = wrap_env(env, wrapper="isaaclab")
"""
# needed to import for type hinting: Agent | list[Agent]
from __future__ import annotations
import copy
import torch
import tqdm
from skrl.agents.torch import Agent
from skrl.envs.wrappers.torch import Wrapper, wrap_env
from skrl.envs.wrappers.torch import wrap_env
from skrl.resources.preprocessors.torch import RunningStandardScaler # noqa: F401
from skrl.resources.schedulers.torch import KLAdaptiveLR # noqa: F401
from skrl.trainers.torch import Trainer
from skrl.trainers.torch.sequential import SEQUENTIAL_TRAINER_DEFAULT_CONFIG
from skrl.utils.model_instantiators.torch import Shape # noqa: F401
from omni.isaac.lab.envs import DirectRLEnv, ManagerBasedRLEnv
......@@ -114,173 +107,4 @@ def SkrlVecEnvWrapper(env: ManagerBasedRLEnv):
f"The environment must be inherited from ManagerBasedRLEnv or DirectRLEnv. Environment type: {type(env)}"
)
# wrap and return the environment
return wrap_env(env, wrapper="isaac-orbit")
"""
Custom trainer for skrl.
"""
class SkrlSequentialLogTrainer(Trainer):
"""Sequential trainer with logging of episode information.
This trainer inherits from the :class:`skrl.trainers.base_class.Trainer` class. It is used to
train agents in a sequential manner (i.e., one after the other in each interaction with the
environment). It is most suitable for on-policy RL agents such as PPO, A2C, etc.
It modifies the :class:`skrl.trainers.torch.sequential.SequentialTrainer` class with the following
differences:
* It also log episode information to the agent's logger.
* It does not close the environment at the end of the training.
Reference:
https://skrl.readthedocs.io/en/latest/api/trainers.html#base-class
"""
def __init__(
self,
env: Wrapper,
agents: Agent | list[Agent],
agents_scope: list[int] | None = None,
cfg: dict | None = None,
):
"""Initializes the trainer.
Args:
env: Environment to train on.
agents: Agents to train.
agents_scope: Number of environments for each agent to
train on. Defaults to None.
cfg: Configuration dictionary. Defaults to None.
"""
# update the config
_cfg = copy.deepcopy(SEQUENTIAL_TRAINER_DEFAULT_CONFIG)
_cfg.update(cfg if cfg is not None else {})
# store agents scope
agents_scope = agents_scope if agents_scope is not None else []
# initialize the base class
super().__init__(env=env, agents=agents, agents_scope=agents_scope, cfg=_cfg)
# init agents
if self.env.num_agents > 1:
for agent in self.agents:
agent.init(trainer_cfg=self.cfg)
else:
self.agents.init(trainer_cfg=self.cfg)
def train(self):
"""Train the agents sequentially.
This method executes the training loop for the agents. It performs the following steps:
* Pre-interaction: Perform any pre-interaction operations.
* Compute actions: Compute the actions for the agents.
* Step the environments: Step the environments with the computed actions.
* Record the environments' transitions: Record the transitions from the environments.
* Log custom environment data: Log custom environment data.
* Post-interaction: Perform any post-interaction operations.
* Reset the environments: Reset the environments if they are terminated or truncated.
"""
# init agent
self.agents.init(trainer_cfg=self.cfg)
self.agents.set_running_mode("train")
# reset env
states, infos = self.env.reset()
# training loop
for timestep in tqdm.tqdm(range(self.timesteps), disable=self.disable_progressbar):
# pre-interaction
self.agents.pre_interaction(timestep=timestep, timesteps=self.timesteps)
# compute actions
with torch.no_grad():
actions = self.agents.act(states, timestep=timestep, timesteps=self.timesteps)[0]
# step the environments
next_states, rewards, terminated, truncated, infos = self.env.step(actions)
# note: here we do not call render scene since it is done in the env.step() method
# record the environments' transitions
with torch.no_grad():
self.agents.record_transition(
states=states,
actions=actions,
rewards=rewards,
next_states=next_states,
terminated=terminated,
truncated=truncated,
infos=infos,
timestep=timestep,
timesteps=self.timesteps,
)
# log custom environment data
if "log" in infos:
for k, v in infos["log"].items():
if isinstance(v, torch.Tensor) and v.numel() == 1:
self.agents.track_data(f"EpisodeInfo / {k}", v.item())
# post-interaction
self.agents.post_interaction(timestep=timestep, timesteps=self.timesteps)
# reset the environments
# note: here we do not call reset scene since it is done in the env.step() method
# update states
states.copy_(next_states)
def eval(self) -> None:
"""Evaluate the agents sequentially.
This method executes the following steps in loop:
* Compute actions: Compute the actions for the agents.
* Step the environments: Step the environments with the computed actions.
* Record the environments' transitions: Record the transitions from the environments.
* Log custom environment data: Log custom environment data.
"""
# set running mode
if self.num_agents > 1:
for agent in self.agents:
agent.set_running_mode("eval")
else:
self.agents.set_running_mode("eval")
# single agent
if self.num_agents == 1:
self.single_agent_eval()
return
# reset env
states, infos = self.env.reset()
# evaluation loop
for timestep in tqdm.tqdm(range(self.initial_timestep, self.timesteps), disable=self.disable_progressbar):
# compute actions
with torch.no_grad():
actions = torch.vstack([
agent.act(states[scope[0] : scope[1]], timestep=timestep, timesteps=self.timesteps)[0]
for agent, scope in zip(self.agents, self.agents_scope)
])
# step the environments
next_states, rewards, terminated, truncated, infos = self.env.step(actions)
with torch.no_grad():
# write data to TensorBoard
for agent, scope in zip(self.agents, self.agents_scope):
# track data
agent.record_transition(
states=states[scope[0] : scope[1]],
actions=actions[scope[0] : scope[1]],
rewards=rewards[scope[0] : scope[1]],
next_states=next_states[scope[0] : scope[1]],
terminated=terminated[scope[0] : scope[1]],
truncated=truncated[scope[0] : scope[1]],
infos=infos,
timestep=timestep,
timesteps=self.timesteps,
)
# log custom environment data
if "log" in infos:
for k, v in infos["log"].items():
if isinstance(v, torch.Tensor) and v.numel() == 1:
agent.track_data(k, v.item())
# perform post-interaction
super(type(agent), agent).post_interaction(timestep=timestep, timesteps=self.timesteps)
# reset environments
# note: here we do not call reset scene since it is done in the env.step() method
states.copy_(next_states)
return wrap_env(env, wrapper="isaaclab")
......@@ -39,7 +39,7 @@ PYTORCH_INDEX_URL = ["https://download.pytorch.org/whl/cu118"]
# Extra dependencies for RL agents
EXTRAS_REQUIRE = {
"sb3": ["stable-baselines3>=2.1"],
"skrl": ["skrl>=1.1.0"],
"skrl": ["skrl>=1.2.0"],
"rl-games": ["rl-games==1.6.1", "gym"], # rl-games still needs gym :(
"rsl-rl": ["rsl-rl@git+https://github.com/leggedrobotics/rsl_rl.git"],
"robomimic": [],
......
......@@ -60,7 +60,7 @@ def main():
# create isaac environment
env = gym.make(args_cli.task, cfg=env_cfg)
# wrap around environment for skrl
env = SkrlVecEnvWrapper(env) # same as: `wrap_env(env, wrapper="isaac-orbit")`
env = SkrlVecEnvWrapper(env) # same as: `wrap_env(env, wrapper="isaaclab")`
# instantiate models using skrl model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
......
......@@ -29,7 +29,11 @@ parser.add_argument(
parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
parser.add_argument("--task", type=str, default=None, help="Name of the task.")
parser.add_argument("--seed", type=int, default=None, help="Seed used for the environment")
parser.add_argument(
"--distributed", action="store_true", default=False, help="Run training with multiple GPUs or nodes."
)
parser.add_argument("--max_iterations", type=int, default=None, help="RL Policy training iterations.")
# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
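The new `--distributed` option is a plain boolean switch. The following standalone sketch reproduces only the two arguments relevant here, to show how the flag behaves when it is present or absent. The `build_parser` wrapper is hypothetical; the real script also appends the `AppLauncher` arguments, which are omitted.

```python
import argparse


def build_parser():
    """Build a reduced parser with only the arguments discussed here."""
    parser = argparse.ArgumentParser(description="Sketch of the skrl training CLI.")
    parser.add_argument(
        "--distributed", action="store_true", default=False,
        help="Run training with multiple GPUs or nodes.",
    )
    parser.add_argument(
        "--max_iterations", type=int, default=None,
        help="RL Policy training iterations.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--distributed", "--max_iterations", "100"])
    print(args.distributed, args.max_iterations)
```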
......@@ -50,6 +54,7 @@ from datetime import datetime
from skrl.agents.torch.ppo import PPO, PPO_DEFAULT_CONFIG
from skrl.memories.torch import RandomMemory
from skrl.trainers.torch import SequentialTrainer
from skrl.utils import set_seed
from skrl.utils.model_instantiators.torch import deterministic_model, gaussian_model, shared_model
......@@ -58,7 +63,7 @@ from omni.isaac.lab.utils.io import dump_pickle, dump_yaml
import omni.isaac.lab_tasks # noqa: F401
from omni.isaac.lab_tasks.utils import load_cfg_from_registry, parse_env_cfg
from omni.isaac.lab_tasks.utils.wrappers.skrl import SkrlSequentialLogTrainer, SkrlVecEnvWrapper, process_skrl_cfg
from omni.isaac.lab_tasks.utils.wrappers.skrl import SkrlVecEnvWrapper, process_skrl_cfg
def main():
......@@ -86,6 +91,11 @@ def main():
# update log_dir
log_dir = os.path.join(log_root_path, log_dir)
# multi-gpu training config
if args_cli.distributed:
# update env config device
env_cfg.sim.device = f"cuda:{app_launcher.local_rank}"
# max iterations for training
if args_cli.max_iterations:
experiment_cfg["trainer"]["timesteps"] = args_cli.max_iterations * experiment_cfg["agent"]["rollouts"]
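The two overrides in this hunk can be read as a small pure function: pin the simulation device to the process's local rank when running distributed, and convert a number of training iterations into trainer timesteps by multiplying by the agent's rollout length. The sketch below uses plain dictionaries as stand-ins for the real config objects, and `apply_overrides` is a hypothetical name.

```python
def apply_overrides(experiment_cfg, sim_cfg, distributed, local_rank, max_iterations=None):
    """Apply the CLI overrides to dictionary stand-ins for the configs.

    Assumption: sim_cfg mimics env_cfg.sim, and local_rank mimics
    app_launcher.local_rank from the training script.
    """
    if distributed:
        # each process simulates on its own GPU
        sim_cfg["device"] = f"cuda:{local_rank}"
    if max_iterations:
        # one iteration collects `rollouts` timesteps before updating
        experiment_cfg["trainer"]["timesteps"] = max_iterations * experiment_cfg["agent"]["rollouts"]
    return experiment_cfg, sim_cfg


if __name__ == "__main__":
    cfg = {"trainer": {"timesteps": 1600}, "agent": {"rollouts": 16}}
    sim = {"device": "cuda:0"}
    print(apply_overrides(cfg, sim, distributed=True, local_rank=1, max_iterations=50))
```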
......@@ -110,7 +120,7 @@ def main():
print_dict(video_kwargs, nesting=4)
env = gym.wrappers.RecordVideo(env, **video_kwargs)
# wrap around environment for skrl
env = SkrlVecEnvWrapper(env) # same as: `wrap_env(env, wrapper="isaac-orbit")`
env = SkrlVecEnvWrapper(env) # same as: `wrap_env(env, wrapper="isaaclab")`
# set seed for the experiment (override from command line)
set_seed(args_cli_seed if args_cli_seed is not None else experiment_cfg["seed"])
......@@ -173,7 +183,8 @@ def main():
# configure and instantiate a custom RL trainer for logging episode events
# https://skrl.readthedocs.io/en/latest/api/trainers.html
trainer_cfg = experiment_cfg["trainer"]
trainer = SkrlSequentialLogTrainer(cfg=trainer_cfg, env=env, agents=agent)
trainer_cfg["close_environment_at_exit"] = False
trainer = SequentialTrainer(cfg=trainer_cfg, env=env, agents=agent)
# train the agent
trainer.train()
......