Commit c5fe1b5f (unverified), authored by Toni-SM, committed by GitHub

Adds Gymnasium spaces showcase tasks (#2109)

# Description

This PR adds a set of Direct-workflow tasks that showcase how to define
and use the various Gymnasium observation and action spaces
supported in Isaac Lab.

## Type of change

- New feature (non-breaking change which adds functionality)
- This change requires a documentation update

## Screenshots


![image](https://github.com/user-attachments/assets/36b526ac-0eb7-45fa-81fa-3d0a09c1c1c5)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
parent 1b03bf2f
......@@ -82,3 +82,31 @@ a {
padding-top: 0.0rem !important;
padding-bottom: 0.0rem !important;
}
/* showcase task tables */
.showcase-table {
min-width: 75%;
}
.showcase-table td {
border-color: gray;
border-style: solid;
border-width: 1px;
}
.showcase-table p {
margin: 0;
padding: 0;
}
.showcase-table .rot90 {
transform: rotate(-90deg);
margin: 0;
padding: 0;
}
.showcase-table .center {
text-align: center;
vertical-align: middle;
}
......@@ -335,6 +335,121 @@ Others
.. |quadcopter| image:: ../_static/tasks/others/quadcopter.jpg
.. |humanoid_amp| image:: ../_static/tasks/others/humanoid_amp.jpg
Spaces showcase
~~~~~~~~~~~~~~~
The |cartpole_showcase| folder contains showcase tasks (based on the *Cartpole* and *Cartpole-Camera* Direct tasks)
for the definition/use of the various Gymnasium observation and action spaces supported in Isaac Lab.
.. |cartpole_showcase| replace:: `cartpole_showcase <https://github.com/isaac-sim/IsaacLab/tree/main/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole_showcase>`__
.. note::
Currently, only Isaac Lab's Direct workflow supports the definition of observation and action spaces other than ``Box``.
See Direct workflow's :py:obj:`~isaaclab.envs.DirectRLEnvCfg.observation_space` / :py:obj:`~isaaclab.envs.DirectRLEnvCfg.action_space`
documentation for more details.
The following tables summarize the different pairs of showcased spaces for the *Cartpole* and *Cartpole-Camera* tasks.
In the task names, replace ``<OBSERVATION>`` and ``<ACTION>`` with the observation and action spaces to explore during training and evaluation.
.. raw:: html
<table class="showcase-table">
<caption>
<p>Showcase spaces for the <strong>Cartpole</strong> task</p>
<p><code>Isaac-Cartpole-Showcase-&lt;OBSERVATION&gt;-&lt;ACTION&gt;-Direct-v0</code></p>
</caption>
<tbody>
<tr>
<td colspan="2" rowspan="2"></td>
<td colspan="5" class="center">action space</td>
</tr>
<tr>
<td><strong>&nbsp;Box</strong></td>
<td><strong>&nbsp;Discrete</strong></td>
<td><strong>&nbsp;MultiDiscrete</strong></td>
</tr>
<tr>
<td rowspan="5" class="rot90 center"><p>observation</p><p>space</p></td>
<td><strong>&nbsp;Box</strong></td>
<td class="center">x</td>
<td class="center">x</td>
<td class="center">x</td>
</tr>
<tr>
<td><strong>&nbsp;Discrete</strong></td>
<td class="center">x</td>
<td class="center">x</td>
<td class="center">x</td>
</tr>
<tr>
<td><strong>&nbsp;MultiDiscrete</strong></td>
<td class="center">x</td>
<td class="center">x</td>
<td class="center">x</td>
</tr>
<tr>
<td><strong>&nbsp;Dict</strong></td>
<td class="center">x</td>
<td class="center">x</td>
<td class="center">x</td>
</tr>
<tr>
<td><strong>&nbsp;Tuple</strong></td>
<td class="center">x</td>
<td class="center">x</td>
<td class="center">x</td>
</tr>
</tbody>
</table>
<br>
<table class="showcase-table">
<caption>
<p>Showcase spaces for the <strong>Cartpole-Camera</strong> task</p>
<p><code>Isaac-Cartpole-Camera-Showcase-&lt;OBSERVATION&gt;-&lt;ACTION&gt;-Direct-v0</code></p>
</caption>
<tbody>
<tr>
<td colspan="2" rowspan="2"></td>
<td colspan="5" class="center">action space</td>
</tr>
<tr>
<td><strong>&nbsp;Box</strong></td>
<td><strong>&nbsp;Discrete</strong></td>
<td><strong>&nbsp;MultiDiscrete</strong></td>
</tr>
<tr>
<td rowspan="5" class="rot90 center"><p>observation</p><p>space</p></td>
<td><strong>&nbsp;Box</strong></td>
<td class="center">x</td>
<td class="center">x</td>
<td class="center">x</td>
</tr>
<tr>
<td><strong>&nbsp;Discrete</strong></td>
<td class="center">-</td>
<td class="center">-</td>
<td class="center">-</td>
</tr>
<tr>
<td><strong>&nbsp;MultiDiscrete</strong></td>
<td class="center">-</td>
<td class="center">-</td>
<td class="center">-</td>
</tr>
<tr>
<td><strong>&nbsp;Dict</strong></td>
<td class="center">x</td>
<td class="center">x</td>
<td class="center">x</td>
</tr>
<tr>
<td><strong>&nbsp;Tuple</strong></td>
<td class="center">x</td>
<td class="center">x</td>
<td class="center">x</td>
</tr>
</tbody></table>
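The two tables can also be read as a small enumeration. The sketch below is illustrative only: the helper name `showcase_task_ids` and the constant names are assumptions, not part of Isaac Lab. It expands the `<OBSERVATION>` and `<ACTION>` placeholders into the task names marked with an `x`, leaving out the `Discrete` and `MultiDiscrete` observation rows for Cartpole-Camera.

```python
from itertools import product

# Action spaces shown as columns in both tables.
ACTIONS = ("Box", "Discrete", "MultiDiscrete")
# Observation spaces marked with "x" for the Cartpole task.
CARTPOLE_OBSERVATIONS = ("Box", "Discrete", "MultiDiscrete", "Dict", "Tuple")
# Observation spaces marked with "x" for the Cartpole-Camera task.
CAMERA_OBSERVATIONS = ("Box", "Dict", "Tuple")


def showcase_task_ids(prefix, observations, actions=ACTIONS):
    """Expand the <OBSERVATION>/<ACTION> placeholders into full task names."""
    return [
        f"{prefix}-{obs}-{act}-Direct-v0"
        for obs, act in product(observations, actions)
    ]


cartpole_ids = showcase_task_ids("Isaac-Cartpole-Showcase", CARTPOLE_OBSERVATIONS)
camera_ids = showcase_task_ids("Isaac-Cartpole-Camera-Showcase", CAMERA_OBSERVATIONS)
```

Under these assumptions the expansion gives 15 Cartpole and 9 Cartpole-Camera task names, which matches the number of entries this PR adds to the comprehensive environment list.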
Multi-agent
------------
......@@ -404,6 +519,42 @@ Comprehensive List of Environments
-
- Direct
- **rl_games** (PPO), **skrl** (IPPO, PPO, MAPPO)
* - Isaac-Cartpole-Camera-Showcase-Box-Box-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Camera-Showcase-Box-Discrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Camera-Showcase-Box-MultiDiscrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Camera-Showcase-Dict-Box-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Camera-Showcase-Dict-Discrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Camera-Showcase-Dict-MultiDiscrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Camera-Showcase-Tuple-Box-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Camera-Showcase-Tuple-Discrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Camera-Showcase-Tuple-MultiDiscrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Depth-Camera-Direct-v0
-
- Direct
......@@ -432,6 +583,66 @@ Comprehensive List of Environments
-
- Manager Based
- **rl_games** (PPO)
* - Isaac-Cartpole-Showcase-Box-Box-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Box-Discrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Box-MultiDiscrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Dict-Box-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Dict-Discrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Dict-MultiDiscrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Discrete-Box-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Discrete-Discrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Discrete-MultiDiscrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-MultiDiscrete-Box-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-MultiDiscrete-Discrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-MultiDiscrete-MultiDiscrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Tuple-Box-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Tuple-Discrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-Showcase-Tuple-MultiDiscrete-Direct-v0
-
- Direct
- **skrl** (PPO)
* - Isaac-Cartpole-v0
-
- Manager Based
......
......@@ -69,7 +69,7 @@ import skrl
from packaging import version
# check for minimum supported skrl version
-SKRL_VERSION = "1.4.1"
+SKRL_VERSION = "1.4.2"
if version.parse(skrl.__version__) < version.parse(SKRL_VERSION):
    skrl.logger.error(
        f"Unsupported skrl version: {skrl.__version__}. "
......
......@@ -71,7 +71,7 @@ import skrl
from packaging import version
# check for minimum supported skrl version
-SKRL_VERSION = "1.4.1"
+SKRL_VERSION = "1.4.2"
if version.parse(skrl.__version__) < version.parse(SKRL_VERSION):
    skrl.logger.error(
        f"Unsupported skrl version: {skrl.__version__}. "
......
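The effect of raising the minimum from 1.4.1 to 1.4.2 can be sketched without skrl installed. This is a simplified stand-in that assumes plain dotted numeric versions; the scripts themselves use `version.parse` from `packaging`, which also copes with pre-release and post-release tags. The helper names `parse_version` and `is_supported` are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn a plain dotted version such as "1.4.2" into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def is_supported(installed: str, minimum: str = "1.4.2") -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


# 1.4.1 was accepted before this change and is rejected after it.
previous_minimum_ok = is_supported("1.4.1")
new_minimum_ok = is_supported("1.4.2")
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits: `"1.10.0"` sorts before `"1.4.2"` as a string but after it as a version.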
......@@ -42,7 +42,7 @@ PYTORCH_INDEX_URL = ["https://download.pytorch.org/whl/cu118"]
# Extra dependencies for RL agents
EXTRAS_REQUIRE = {
    "sb3": ["stable-baselines3>=2.1"],
-   "skrl": ["skrl>=1.4.1"],
+   "skrl": ["skrl>=1.4.2"],
    "rl-games": ["rl-games==1.6.1", "gym"],  # rl-games still needs gym :(
    "rsl-rl": ["rsl-rl-lib>=2.1.1"],
}
......
......@@ -43,7 +43,7 @@ class TestSKRLVecEnvWrapper(unittest.TestCase):
cls.registered_tasks.append(task_spec.id)
# sort environments by name
cls.registered_tasks.sort()
-cls.registered_tasks = cls.registered_tasks[:5]
+cls.registered_tasks = cls.registered_tasks[:3]
# this flag is necessary to prevent a bug where the simulation gets stuck randomly when running the
# test on many environments.
......
[package]
# Note: Semantic Versioning is used: https://semver.org/
-version = "0.10.25"
+version = "0.10.26"
# Description
title = "Isaac Lab Environments"
......
Changelog
---------
0.10.26 (2025-03-18)
~~~~~~~~~~~~~~~~~~~~
Added
^^^^^
* Added Gymnasium spaces showcase tasks (``Isaac-Cartpole-Showcase-*-Direct-v0`` and ``Isaac-Cartpole-Camera-Showcase-*-Direct-v0``).
0.10.25 (2025-03-10)
~~~~~~~~~~~~~~~~~~~~
......
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""Cartpole environment showcase for the supported Gymnasium spaces."""
from .cartpole import * # noqa
from .cartpole_camera import * # noqa
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""
Cartpole balancing environment.
"""
import gymnasium as gym
from . import agents
###########################
# Register Gym environments
###########################
###
# Observation space as Box
###
gym.register(
    id="Isaac-Cartpole-Showcase-Box-Box-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:BoxBoxEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_box_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-Box-Discrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:BoxDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_discrete_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-Box-MultiDiscrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:BoxMultiDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_multidiscrete_ppo_cfg.yaml",
    },
)
###
# Observation space as Discrete
###
gym.register(
    id="Isaac-Cartpole-Showcase-Discrete-Box-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DiscreteBoxEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_discrete_box_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-Discrete-Discrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DiscreteDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_discrete_discrete_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-Discrete-MultiDiscrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DiscreteMultiDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_discrete_multidiscrete_ppo_cfg.yaml",
    },
)
###
# Observation space as MultiDiscrete
###
gym.register(
    id="Isaac-Cartpole-Showcase-MultiDiscrete-Box-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:MultiDiscreteBoxEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_multidiscrete_box_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-MultiDiscrete-Discrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:MultiDiscreteDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_multidiscrete_discrete_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-MultiDiscrete-MultiDiscrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:MultiDiscreteMultiDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_multidiscrete_multidiscrete_ppo_cfg.yaml",
    },
)
###
# Observation space as Dict
###
gym.register(
    id="Isaac-Cartpole-Showcase-Dict-Box-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DictBoxEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_box_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-Dict-Discrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DictDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_discrete_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-Dict-MultiDiscrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:DictMultiDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_multidiscrete_ppo_cfg.yaml",
    },
)
###
# Observation space as Tuple
###
gym.register(
    id="Isaac-Cartpole-Showcase-Tuple-Box-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:TupleBoxEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_box_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-Tuple-Discrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:TupleDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_discrete_ppo_cfg.yaml",
    },
)
gym.register(
    id="Isaac-Cartpole-Showcase-Tuple-MultiDiscrete-Direct-v0",
    entry_point=f"{__name__}.cartpole_env:CartpoleShowcaseEnv",
    disable_env_checker=True,
    kwargs={
        "env_cfg_entry_point": f"{__name__}.cartpole_env_cfg:TupleMultiDiscreteEnvCfg",
        "skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_multidiscrete_ppo_cfg.yaml",
    },
)
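Each `skrl_cfg_entry_point` in these registrations is a string of the form `<package>:<file>.yaml`. One plausible way to turn such a string into a file path is sketched below. It is an assumption about the mechanism rather than Isaac Lab's actual loader, `resolve_cfg_path` is a hypothetical name, and the demonstration uses a standard-library package in place of the `agents` package so that it runs anywhere.

```python
import importlib
import os


def resolve_cfg_path(entry_point: str) -> str:
    """Resolve a "package.module:file_name" string to a file path.

    Hypothetical helper: the file is assumed to sit next to the module's
    own source file (for a package, next to its __init__.py).
    """
    module_name, file_name = entry_point.split(":")
    module = importlib.import_module(module_name)
    return os.path.join(os.path.dirname(module.__file__), file_name)


# Demonstration with the standard-library `json` package.
path = resolve_cfg_path("json:decoder.py")
```

Keeping the configuration reference as a string means the YAML file is only looked up when a task is actually created, not when the registration module is imported.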
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#                      obs
#                       │
#                ┏━━━━━━▼━━━━━┓
#                ┃    net     ┃
#                ┡━━━━━━━━━━━━┩
#                │ linear(32) │
#                │    elu     │
#                │ linear(32) │
#                │    elu     │
#                └──────┬─────┘
# shared                │
# ......................│.......................
# non-shared            │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: RunningStandardScaler
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct_box_box"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
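As a rough worked example of what these numbers imply, the sketch below assumes the usual PPO schedule: one policy update every `rollouts` trainer timesteps, and one optimizer step per mini-batch per learning epoch. It also assumes that `kl_threshold: 0.0` leaves KL-based early stopping switched off, so every update runs all of its epochs. The variable names are illustrative.

```python
# Values copied from the `agent` and `trainer` sections of the configuration.
rollouts = 32
learning_epochs = 8
mini_batches = 8
timesteps = 4800

# One update per `rollouts` timesteps (assumed schedule).
policy_updates = timesteps // rollouts
# Each update visits every mini-batch once per epoch.
gradient_steps_per_update = learning_epochs * mini_batches
total_gradient_steps = policy_updates * gradient_steps_per_update

print(policy_updates, gradient_steps_per_update, total_gradient_steps)
# prints: 150 64 9600
```

The same arithmetic applies to the other showcase configurations, which share these `rollouts`, `learning_epochs`, `mini_batches`, and `timesteps` values.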
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#                      obs
#                       │
#                ┏━━━━━━▼━━━━━┓
#                ┃    net     ┃
#                ┡━━━━━━━━━━━━┩
#                │ linear(32) │
#                │    elu     │
#                │ linear(32) │
#                │    elu     │
#                └──────┬─────┘
# shared                │
# ......................│.......................
# non-shared            │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
  separate: False
  policy:  # see categorical_model parameters
    class: CategoricalMixin
    unnormalized_log_prob: True
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: RunningStandardScaler
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct_box_discrete"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#                      obs
#                       │
#                ┏━━━━━━▼━━━━━┓
#                ┃    net     ┃
#                ┡━━━━━━━━━━━━┩
#                │ linear(32) │
#                │    elu     │
#                │ linear(32) │
#                │    elu     │
#                └──────┬─────┘
# shared                │
# ......................│.......................
# non-shared            │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
  separate: False
  policy:  # see multicategorical_model parameters
    class: MultiCategoricalMixin
    unnormalized_log_prob: True
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: OBSERVATIONS
        layers: [32, 32]
        activations: elu
    output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: RunningStandardScaler
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct_box_multidiscrete"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs["joint-positions"]  obs["joint-velocities"]
#           │                        │
#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
#    ┃  net_pos   ┃           ┃  net_vel   ┃
#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
#    │ linear(16) │           │ linear(16) │
#    │    elu     │           │    elu     │
#    │ linear(16) │           │ linear(16) │
#    │    elu     │           │    elu     │
#    └──────┬─────┘           └─────┬──────┘
#           │                       │
#           └─────────▶(+)◀─────────┘
#                       │
#                 ┏━━━━━▼━━━━━┓
#                 ┃    net    ┃
#                 ┡━━━━━━━━━━━┩
#                 │ identity  │
# shared          └─────┬─────┘
# ......................│.......................
# non-shared            │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net_pos
        input: OBSERVATIONS["joint-positions"]
        layers: [16, 16]
        activations: elu
      - name: net_vel
        input: OBSERVATIONS["joint-velocities"]
        layers: [16, 16]
        activations: elu
      - name: net
        input: net_pos + net_vel
        layers: []
        activations: []
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net_pos
        input: OBSERVATIONS["joint-positions"]
        layers: [16, 16]
        activations: elu
      - name: net_vel
        input: OBSERVATIONS["joint-velocities"]
        layers: [16, 16]
        activations: elu
      - name: net
        input: net_pos + net_vel
        layers: []
        activations: []
    output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: RunningStandardScaler
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct_dict_box"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs["joint-positions"]  obs["joint-velocities"]
#           │                        │
#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
#    ┃  net_pos   ┃           ┃  net_vel   ┃
#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
#    │ linear(16) │           │ linear(16) │
#    │    elu     │           │    elu     │
#    │ linear(16) │           │ linear(16) │
#    │    elu     │           │    elu     │
#    └──────┬─────┘           └─────┬──────┘
#           │                       │
#           └─────────▶(+)◀─────────┘
#                       │
#                 ┏━━━━━▼━━━━━┓
#                 ┃    net    ┃
#                 ┡━━━━━━━━━━━┩
#                 │ identity  │
# shared          └─────┬─────┘
# ......................│.......................
# non-shared            │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
  separate: False
  policy:  # see categorical_model parameters
    class: CategoricalMixin
    unnormalized_log_prob: True
    network:
      - name: net_pos
        input: OBSERVATIONS["joint-positions"]
        layers: [16, 16]
        activations: elu
      - name: net_vel
        input: OBSERVATIONS["joint-velocities"]
        layers: [16, 16]
        activations: elu
      - name: net
        input: net_pos + net_vel
        layers: []
        activations: []
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net_pos
        input: OBSERVATIONS["joint-positions"]
        layers: [16, 16]
        activations: elu
      - name: net_vel
        input: OBSERVATIONS["joint-velocities"]
        layers: [16, 16]
        activations: elu
      - name: net
        input: net_pos + net_vel
        layers: []
        activations: []
    output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: RunningStandardScaler
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct_dict_discrete"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs["joint-positions"]  obs["joint-velocities"]
#           │                        │
#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
#    ┃  net_pos   ┃           ┃  net_vel   ┃
#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
#    │ linear(16) │           │ linear(16) │
#    │    elu     │           │    elu     │
#    │ linear(16) │           │ linear(16) │
#    │    elu     │           │    elu     │
#    └──────┬─────┘           └─────┬──────┘
#           │                       │
#           └─────────▶(+)◀─────────┘
#                       │
#                 ┏━━━━━▼━━━━━┓
#                 ┃    net    ┃
#                 ┡━━━━━━━━━━━┩
#                 │ identity  │
# shared          └─────┬─────┘
# ......................│.......................
# non-shared            │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
  separate: False
  policy:  # see multicategorical_model parameters
    class: MultiCategoricalMixin
    unnormalized_log_prob: True
    network:
      - name: net_pos
        input: OBSERVATIONS["joint-positions"]
        layers: [16, 16]
        activations: elu
      - name: net_vel
        input: OBSERVATIONS["joint-velocities"]
        layers: [16, 16]
        activations: elu
      - name: net
        input: net_pos + net_vel
        layers: []
        activations: []
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net_pos
        input: OBSERVATIONS["joint-positions"]
        layers: [16, 16]
        activations: elu
      - name: net_vel
        input: OBSERVATIONS["joint-velocities"]
        layers: [16, 16]
        activations: elu
      - name: net
        input: net_pos + net_vel
        layers: []
        activations: []
    output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: RunningStandardScaler
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct_dict_multidiscrete"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs
# │
# one_hot(obs)
# │
# ┏━━━━━━▼━━━━━┓
# ┃ net ┃
# ┡━━━━━━━━━━━━┩
# │ linear(32) │
# │ elu │
# │ linear(32) │
# │ elu │
# └──────┬─────┘
# shared │
# ......................│.......................
# non-shared │
# ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
# ┃ policy|value output ┃
# ┡━━━━━━━━━━━━━━━━━━━━━━━┩
# │ linear(num_actions|1) │
# └───────────┬───────────┘
# ▼
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net
        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
        layers: [32, 32]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net
        input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
        layers: [32, 32]
        activations: elu
    output: ONE

# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)

# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: null  # pre-processor should not be used with Discrete/MultiDiscrete observations
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct_discrete_box"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto

# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
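
The `one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)` input used by this configuration expands integer observations into a binary vector before the first linear layer. The following is a minimal pure-Python sketch of that idea, not skrl's implementation: the helper names `one_hot` and `one_hot_multidiscrete` are hypothetical, and the sizes (16 categories, four binary components) are inferred from the showcase environment's discretization rather than stated in the config.

```python
def one_hot(index: int, n: int) -> list[float]:
    """Encode a Discrete(n) sample as a length-n vector with a single 1."""
    if not 0 <= index < n:
        raise ValueError(f"index {index} is outside Discrete({n})")
    return [1.0 if i == index else 0.0 for i in range(n)]


def one_hot_multidiscrete(indices: list[int], nvec: list[int]) -> list[float]:
    """Encode a MultiDiscrete(nvec) sample by concatenating per-component one-hots."""
    encoded: list[float] = []
    for index, n in zip(indices, nvec):
        encoded.extend(one_hot(index, n))
    return encoded


# Discrete(16) observation (assumed size): index 5 becomes a 16-dim vector
discrete_vec = one_hot(5, 16)
# MultiDiscrete([2, 2, 2, 2]) observation (assumed size): four flags become an 8-dim vector
multi_vec = one_hot_multidiscrete([0, 1, 1, 0], [2, 2, 2, 2])
print(len(discrete_vec), len(multi_vec))
```

Because the encoded vector is already binary, running statistics add nothing, which is consistent with `state_preprocessor: null` in these configurations.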
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs
# │
# one_hot(obs)
# │
# ┏━━━━━━▼━━━━━┓
# ┃ net ┃
# ┡━━━━━━━━━━━━┩
# │ linear(32) │
# │ elu │
# │ linear(32) │
# │ elu │
# └──────┬─────┘
# shared │
# ......................│.......................
# non-shared │
# ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
# ┃ policy|value output ┃
# ┡━━━━━━━━━━━━━━━━━━━━━━━┩
# │ linear(num_actions|1) │
# └───────────┬───────────┘
# ▼
models:
separate: False
policy: # see categorical_model parameters
class: CategoricalMixin
unnormalized_log_prob: True
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 32
learning_epochs: 8
mini_batches: 8
discount_factor: 0.99
lambda: 0.95
learning_rate: 5.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null # pre-processor should not be used with Discrete/MultiDiscrete observations
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 2.0
kl_threshold: 0.0
rewards_shaper_scale: 0.1
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_direct_discrete_discrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 4800
environment_info: log
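
The `KLAdaptiveLR` scheduler selected above, with `kl_threshold: 0.008`, adapts the learning rate to the measured KL divergence between the old and new policy. The sketch below illustrates a rule of that kind under stated assumptions: the function name `kl_adaptive_lr`, the factors (`kl_factor=2.0`, `lr_factor=1.5`) and the learning-rate bounds are illustrative values, not taken from this PR; check the skrl documentation for the scheduler's actual defaults.

```python
def kl_adaptive_lr(
    lr: float,
    kl: float,
    kl_threshold: float = 0.008,  # from the config above
    kl_factor: float = 2.0,       # assumption: width of the tolerated KL band
    lr_factor: float = 1.5,       # assumption: multiplicative step size
    min_lr: float = 1e-6,         # assumption: lower bound
    max_lr: float = 1e-2,         # assumption: upper bound
) -> float:
    """Return the next learning rate given the latest KL divergence."""
    if kl > kl_threshold * kl_factor:
        # policy moved too far: slow down
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        # policy barely moved: speed up
        return min(lr * lr_factor, max_lr)
    # KL inside the tolerated band: keep the current rate
    return lr


lr = 5.0e-04
for kl in (0.02, 0.001, 0.008):
    print(kl, kl_adaptive_lr(lr, kl))
```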
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs
# │
# one_hot(obs)
# │
# ┏━━━━━━▼━━━━━┓
# ┃ net ┃
# ┡━━━━━━━━━━━━┩
# │ linear(32) │
# │ elu │
# │ linear(32) │
# │ elu │
# └──────┬─────┘
# shared │
# ......................│.......................
# non-shared │
# ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
# ┃ policy|value output ┃
# ┡━━━━━━━━━━━━━━━━━━━━━━━┩
# │ linear(num_actions|1) │
# └───────────┬───────────┘
# ▼
models:
separate: False
policy: # see multicategorical_model parameters
class: MultiCategoricalMixin
unnormalized_log_prob: True
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 32
learning_epochs: 8
mini_batches: 8
discount_factor: 0.99
lambda: 0.95
learning_rate: 5.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null # pre-processor should not be used with Discrete/MultiDiscrete observations
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 2.0
kl_threshold: 0.0
rewards_shaper_scale: 0.1
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_direct_discrete_multidiscrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 4800
environment_info: log
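
The `rollouts`, `mini_batches`, `learning_epochs` and `timesteps` values in this configuration jointly determine how much data each PPO update sees. The arithmetic can be sketched as follows; the function name is hypothetical, the number of parallel environments (`num_envs=4096`) is an assumption that does not appear in the config, and the even mini-batch split is a simplification.

```python
def ppo_update_sizes(
    rollouts: int, mini_batches: int, learning_epochs: int, timesteps: int, num_envs: int
) -> dict[str, int]:
    """Derive batch sizes and update counts from the PPO configuration fields."""
    samples = rollouts * num_envs  # transitions stored before each update
    return {
        "samples_per_update": samples,
        "mini_batch_size": samples // mini_batches,
        "gradient_steps_per_update": learning_epochs * mini_batches,
        "policy_updates": timesteps // rollouts,
    }


# values from the config above; num_envs is an assumed value
sizes = ppo_update_sizes(rollouts=32, mini_batches=8, learning_epochs=8, timesteps=4800, num_envs=4096)
for name, value in sizes.items():
    print(f"{name}: {value}")
```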
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs
# │
# one_hot(obs)
# │
# ┏━━━━━━▼━━━━━┓
# ┃ net ┃
# ┡━━━━━━━━━━━━┩
# │ linear(32) │
# │ elu │
# │ linear(32) │
# │ elu │
# └──────┬─────┘
# shared │
# ......................│.......................
# non-shared │
# ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
# ┃ policy|value output ┃
# ┡━━━━━━━━━━━━━━━━━━━━━━━┩
# │ linear(num_actions|1) │
# └───────────┬───────────┘
# ▼
models:
separate: False
policy: # see gaussian_model parameters
class: GaussianMixin
clip_actions: False
clip_log_std: True
min_log_std: -20.0
max_log_std: 2.0
initial_log_std: 0.0
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 32
learning_epochs: 8
mini_batches: 8
discount_factor: 0.99
lambda: 0.95
learning_rate: 5.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null # pre-processor should not be used with Discrete/MultiDiscrete observations
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 2.0
kl_threshold: 0.0
rewards_shaper_scale: 0.1
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_direct_multidiscrete_box"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 4800
environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs
# │
# one_hot(obs)
# │
# ┏━━━━━━▼━━━━━┓
# ┃ net ┃
# ┡━━━━━━━━━━━━┩
# │ linear(32) │
# │ elu │
# │ linear(32) │
# │ elu │
# └──────┬─────┘
# shared │
# ......................│.......................
# non-shared │
# ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
# ┃ policy|value output ┃
# ┡━━━━━━━━━━━━━━━━━━━━━━━┩
# │ linear(num_actions|1) │
# └───────────┬───────────┘
# ▼
models:
separate: False
policy: # see categorical_model parameters
class: CategoricalMixin
unnormalized_log_prob: True
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 32
learning_epochs: 8
mini_batches: 8
discount_factor: 0.99
lambda: 0.95
learning_rate: 5.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null # pre-processor should not be used with Discrete/MultiDiscrete observations
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 2.0
kl_threshold: 0.0
rewards_shaper_scale: 0.1
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_direct_multidiscrete_discrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 4800
environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs
# │
# one_hot(obs)
# │
# ┏━━━━━━▼━━━━━┓
# ┃ net ┃
# ┡━━━━━━━━━━━━┩
# │ linear(32) │
# │ elu │
# │ linear(32) │
# │ elu │
# └──────┬─────┘
# shared │
# ......................│.......................
# non-shared │
# ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
# ┃ policy|value output ┃
# ┡━━━━━━━━━━━━━━━━━━━━━━━┩
# │ linear(num_actions|1) │
# └───────────┬───────────┘
# ▼
models:
separate: False
policy: # see multicategorical_model parameters
class: MultiCategoricalMixin
unnormalized_log_prob: True
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: net
input: one_hot_encoding(OBSERVATION_SPACE, OBSERVATIONS)
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 32
learning_epochs: 8
mini_batches: 8
discount_factor: 0.99
lambda: 0.95
learning_rate: 5.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null # pre-processor should not be used with Discrete/MultiDiscrete observations
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 2.0
kl_threshold: 0.0
rewards_shaper_scale: 0.1
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_direct_multidiscrete_multidiscrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 4800
environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#        obs[0]                   obs[1]
#           │                        │
#    ┏━━━━━━▼━━━━━┓           ┏━━━━━━▼━━━━━┓
#    ┃  net_pos   ┃           ┃  net_vel   ┃
#    ┡━━━━━━━━━━━━┩           ┡━━━━━━━━━━━━┩
#    │ linear(16) │           │ linear(16) │
#    │    elu     │           │    elu     │
#    │ linear(16) │           │ linear(16) │
#    │    elu     │           │    elu     │
#    └──────┬─────┘           └─────┬──────┘
#           │                       │
#           └─────────▶(*)◀─────────┘
#                       │
#                 ┏━━━━━▼━━━━━┓
#                 ┃    net    ┃
#                 ┡━━━━━━━━━━━┩
#                 │ identity  │
# shared          └─────┬─────┘
# ......................│.......................
# non-shared            │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: net_pos
        input: OBSERVATIONS[0]
        layers: [16, 16]
        activations: elu
      - name: net_vel
        input: OBSERVATIONS[1]
        layers: [16, 16]
        activations: elu
      - name: net
        input: net_pos * net_vel
        layers: []
        activations: []
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: net_pos
        input: OBSERVATIONS[0]
        layers: [16, 16]
        activations: elu
      - name: net_vel
        input: OBSERVATIONS[1]
        layers: [16, 16]
        activations: elu
      - name: net
        input: net_pos * net_vel
        layers: []
        activations: []
    output: ONE

# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)

# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 32
  learning_epochs: 8
  mini_batches: 8
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 5.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: RunningStandardScaler
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 2.0
  kl_threshold: 0.0
  rewards_shaper_scale: 0.1
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_direct_tuple_box"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto

# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 4800
  environment_info: log
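
In this Tuple configuration the `net` container takes `net_pos * net_vel` as input and has no layers of its own, so it only merges the two branch outputs. The snippet below sketches that merge in plain Python, assuming `*` denotes an element-wise product of two equally sized feature vectors, as the `(*)` node in the diagram suggests. The function name and the sample values are hypothetical.

```python
def merge_branches(pos_features: list[float], vel_features: list[float]) -> list[float]:
    """Element-wise product of the two branch outputs (identity container afterwards)."""
    if len(pos_features) != len(vel_features):
        raise ValueError("both branches must produce vectors of the same length")
    return [p * v for p, v in zip(pos_features, vel_features)]


# both branches end in linear(16), so the merged vector keeps 16 features
pos_out = [0.5] * 16
vel_out = [2.0] * 16
merged = merge_branches(pos_out, vel_out)
print(len(merged), merged[0])
```

An element-wise product requires both branches to end with the same width, which is why `net_pos` and `net_vel` share `layers: [16, 16]`.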
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs[0] obs[1]
# │ │
# ┏━━━━━━▼━━━━━┓ ┏━━━━━━▼━━━━━┓
# ┃ net_pos ┃ ┃ net_vel ┃
# ┡━━━━━━━━━━━━┩ ┡━━━━━━━━━━━━┩
# │ linear(16) │ │ linear(16) │
# │ elu │ │ elu │
# │ linear(16) │ │ linear(16) │
# │ elu │ │ elu │
# └──────┬─────┘ └─────┬──────┘
# │ │
# └─────────▶(*)◀─────────┘
# │
# ┏━━━━━▼━━━━━┓
# ┃ net ┃
# ┡━━━━━━━━━━━┩
# │ identity │
# shared └─────┬─────┘
# ......................│.......................
# non-shared │
# ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
# ┃ policy|value output ┃
# ┡━━━━━━━━━━━━━━━━━━━━━━━┩
# │ linear(num_actions|1) │
# └───────────┬───────────┘
# ▼
models:
separate: False
policy: # see categorical_model parameters
class: CategoricalMixin
unnormalized_log_prob: True
network:
- name: net_pos
input: OBSERVATIONS[0]
layers: [16, 16]
activations: elu
- name: net_vel
input: OBSERVATIONS[1]
layers: [16, 16]
activations: elu
- name: net
input: net_pos * net_vel
layers: []
activations: []
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: net_pos
input: OBSERVATIONS[0]
layers: [16, 16]
activations: elu
- name: net_vel
input: OBSERVATIONS[1]
layers: [16, 16]
activations: elu
- name: net
input: net_pos * net_vel
layers: []
activations: []
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 32
learning_epochs: 8
mini_batches: 8
discount_factor: 0.99
lambda: 0.95
learning_rate: 5.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: RunningStandardScaler
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 2.0
kl_threshold: 0.0
rewards_shaper_scale: 0.1
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_direct_tuple_discrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 4800
environment_info: log
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
# obs[0] obs[1]
# │ │
# ┏━━━━━━▼━━━━━┓ ┏━━━━━━▼━━━━━┓
# ┃ net_pos ┃ ┃ net_vel ┃
# ┡━━━━━━━━━━━━┩ ┡━━━━━━━━━━━━┩
# │ linear(16) │ │ linear(16) │
# │ elu │ │ elu │
# │ linear(16) │ │ linear(16) │
# │ elu │ │ elu │
# └──────┬─────┘ └─────┬──────┘
# │ │
# └─────────▶(*)◀─────────┘
# │
# ┏━━━━━▼━━━━━┓
# ┃ net ┃
# ┡━━━━━━━━━━━┩
# │ identity │
# shared └─────┬─────┘
# ......................│.......................
# non-shared │
# ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
# ┃ policy|value output ┃
# ┡━━━━━━━━━━━━━━━━━━━━━━━┩
# │ linear(num_actions|1) │
# └───────────┬───────────┘
# ▼
models:
separate: False
policy: # see multicategorical_model parameters
class: MultiCategoricalMixin
unnormalized_log_prob: True
network:
- name: net_pos
input: OBSERVATIONS[0]
layers: [16, 16]
activations: elu
- name: net_vel
input: OBSERVATIONS[1]
layers: [16, 16]
activations: elu
- name: net
input: net_pos * net_vel
layers: []
activations: []
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: net_pos
input: OBSERVATIONS[0]
layers: [16, 16]
activations: elu
- name: net_vel
input: OBSERVATIONS[1]
layers: [16, 16]
activations: elu
- name: net
input: net_pos * net_vel
layers: []
activations: []
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 32
learning_epochs: 8
mini_batches: 8
discount_factor: 0.99
lambda: 0.95
learning_rate: 5.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: RunningStandardScaler
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 2.0
kl_threshold: 0.0
rewards_shaper_scale: 0.1
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_direct_tuple_multidiscrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 4800
environment_info: log
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations

import gymnasium as gym
import torch

from isaaclab_tasks.direct.cartpole.cartpole_env import CartpoleEnv, CartpoleEnvCfg


class CartpoleShowcaseEnv(CartpoleEnv):
    cfg: CartpoleEnvCfg

    def _pre_physics_step(self, actions: torch.Tensor) -> None:
        self.actions = actions.clone()

    def _apply_action(self) -> None:
        # fundamental spaces
        # - Box
        if isinstance(self.single_action_space, gym.spaces.Box):
            target = self.cfg.action_scale * self.actions
        # - Discrete
        elif isinstance(self.single_action_space, gym.spaces.Discrete):
            target = torch.zeros((self.num_envs, 1), dtype=torch.float32, device=self.device)
            target = torch.where(self.actions == 1, -self.cfg.action_scale, target)
            target = torch.where(self.actions == 2, self.cfg.action_scale, target)
        # - MultiDiscrete
        elif isinstance(self.single_action_space, gym.spaces.MultiDiscrete):
            # value
            target = torch.zeros((self.num_envs, 1), dtype=torch.float32, device=self.device)
            target = torch.where(self.actions[:, [0]] == 1, self.cfg.action_scale / 2.0, target)
            target = torch.where(self.actions[:, [0]] == 2, self.cfg.action_scale, target)
            # direction
            target = torch.where(self.actions[:, [1]] == 0, -target, target)
        else:
            raise NotImplementedError(f"Action space {type(self.single_action_space)} not implemented")

        # set target
        self.cartpole.set_joint_effort_target(target, joint_ids=self._cart_dof_idx)

    def _get_observations(self) -> dict:
        # fundamental spaces
        # - Box
        if isinstance(self.single_observation_space["policy"], gym.spaces.Box):
            obs = torch.cat(
                (
                    self.joint_pos[:, self._pole_dof_idx[0]].unsqueeze(dim=1),
                    self.joint_vel[:, self._pole_dof_idx[0]].unsqueeze(dim=1),
                    self.joint_pos[:, self._cart_dof_idx[0]].unsqueeze(dim=1),
                    self.joint_vel[:, self._cart_dof_idx[0]].unsqueeze(dim=1),
                ),
                dim=-1,
            )
        # - Discrete
        elif isinstance(self.single_observation_space["policy"], gym.spaces.Discrete):
            data = (
                torch.cat(
                    (
                        self.joint_pos[:, self._pole_dof_idx[0]].unsqueeze(dim=1),
                        self.joint_pos[:, self._cart_dof_idx[0]].unsqueeze(dim=1),
                        self.joint_vel[:, self._pole_dof_idx[0]].unsqueeze(dim=1),
                        self.joint_vel[:, self._cart_dof_idx[0]].unsqueeze(dim=1),
                    ),
                    dim=-1,
                )
                >= 0
            )
            obs = torch.zeros((self.num_envs,), dtype=torch.int32, device=self.device)
            obs = torch.where(discretization_indices(data, [False, False, False, True]), 1, obs)
            obs = torch.where(discretization_indices(data, [False, False, True, False]), 2, obs)
            obs = torch.where(discretization_indices(data, [False, False, True, True]), 3, obs)
            obs = torch.where(discretization_indices(data, [False, True, False, False]), 4, obs)
            obs = torch.where(discretization_indices(data, [False, True, False, True]), 5, obs)
            obs = torch.where(discretization_indices(data, [False, True, True, False]), 6, obs)
            obs = torch.where(discretization_indices(data, [False, True, True, True]), 7, obs)
            obs = torch.where(discretization_indices(data, [True, False, False, False]), 8, obs)
            obs = torch.where(discretization_indices(data, [True, False, False, True]), 9, obs)
            obs = torch.where(discretization_indices(data, [True, False, True, False]), 10, obs)
            obs = torch.where(discretization_indices(data, [True, False, True, True]), 11, obs)
            obs = torch.where(discretization_indices(data, [True, True, False, False]), 12, obs)
            obs = torch.where(discretization_indices(data, [True, True, False, True]), 13, obs)
            obs = torch.where(discretization_indices(data, [True, True, True, False]), 14, obs)
            obs = torch.where(discretization_indices(data, [True, True, True, True]), 15, obs)
        # - MultiDiscrete
        elif isinstance(self.single_observation_space["policy"], gym.spaces.MultiDiscrete):
            zeros = torch.zeros((self.num_envs,), dtype=torch.int32, device=self.device)
            ones = torch.ones_like(zeros)
            obs = torch.cat(
                (
                    torch.where(
                        discretization_indices(self.joint_pos[:, self._pole_dof_idx[0]].unsqueeze(dim=1) >= 0, [True]),
                        ones,
                        zeros,
                    ).unsqueeze(dim=1),
                    torch.where(
                        discretization_indices(self.joint_pos[:, self._cart_dof_idx[0]].unsqueeze(dim=1) >= 0, [True]),
                        ones,
                        zeros,
                    ).unsqueeze(dim=1),
                    torch.where(
                        discretization_indices(self.joint_vel[:, self._pole_dof_idx[0]].unsqueeze(dim=1) >= 0, [True]),
                        ones,
                        zeros,
                    ).unsqueeze(dim=1),
                    torch.where(
                        discretization_indices(self.joint_vel[:, self._cart_dof_idx[0]].unsqueeze(dim=1) >= 0, [True]),
                        ones,
                        zeros,
                    ).unsqueeze(dim=1),
                ),
                dim=-1,
            )
        # composite spaces
        # - Tuple
        elif isinstance(self.single_observation_space["policy"], gym.spaces.Tuple):
            obs = (self.joint_pos, self.joint_vel)
        # - Dict
        elif isinstance(self.single_observation_space["policy"], gym.spaces.Dict):
            obs = {"joint-positions": self.joint_pos, "joint-velocities": self.joint_vel}
        else:
            raise NotImplementedError(
                f"Observation space {type(self.single_observation_space['policy'])} not implemented"
            )

        observations = {"policy": obs}
        return observations


def discretization_indices(x: torch.Tensor, condition: list[bool]) -> torch.Tensor:
    return torch.prod(x == torch.tensor(condition, device=x.device), axis=-1).to(torch.bool)
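
The Discrete branch of `_get_observations` maps the signs of four joint quantities (pole position, cart position, pole velocity, cart velocity) to an index from 0 to 15. Reading the sixteen `torch.where` conditions as bit patterns, the index is the binary number formed by the four sign flags, most significant bit first. The sketch below reproduces that mapping for a single environment in plain Python. The helper names are hypothetical, and the code illustrates the mapping rather than replacing the batched tensor code.

```python
def sign_flags(values: list[float]) -> list[int]:
    """1 where the value is non-negative, 0 otherwise (MultiDiscrete-style observation)."""
    return [1 if value >= 0 else 0 for value in values]


def flags_to_index(flags: list[int]) -> int:
    """Interpret the flags as a binary number, first flag most significant (Discrete-style observation)."""
    index = 0
    for flag in flags:
        index = index * 2 + flag
    return index


# order used by the Discrete branch: pole_pos, cart_pos, pole_vel, cart_vel
state = [0.1, -0.3, -0.5, 2.0]
flags = sign_flags(state)
index = flags_to_index(flags)
print(flags, index)
```

The same four flags, kept separate instead of combined, correspond to the observation built in the MultiDiscrete branch.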
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""
Cartpole balancing environment with camera.
"""
import gymnasium as gym
from . import agents
###########################
# Register Gym environments
###########################
###
# Observation space as Box
###
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Box-Box-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:BoxBoxEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_box_ppo_cfg.yaml",
},
)
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Box-Discrete-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:BoxDiscreteEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_discrete_ppo_cfg.yaml",
},
)
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Box-MultiDiscrete-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:BoxMultiDiscreteEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_box_multidiscrete_ppo_cfg.yaml",
},
)
###
# Observation space as Dict
###
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Dict-Box-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:DictBoxEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_box_ppo_cfg.yaml",
},
)
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Dict-Discrete-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:DictDiscreteEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_discrete_ppo_cfg.yaml",
},
)
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Dict-MultiDiscrete-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:DictMultiDiscreteEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_dict_multidiscrete_ppo_cfg.yaml",
},
)
###
# Observation space as Tuple
###
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Tuple-Box-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:TupleBoxEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_box_ppo_cfg.yaml",
},
)
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Tuple-Discrete-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:TupleDiscreteEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_discrete_ppo_cfg.yaml",
},
)
gym.register(
id="Isaac-Cartpole-Camera-Showcase-Tuple-MultiDiscrete-Direct-v0",
entry_point=f"{__name__}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": f"{__name__}.cartpole_camera_env_cfg:TupleMultiDiscreteEnvCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_tuple_multidiscrete_ppo_cfg.yaml",
},
)
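
The nine registrations above follow one naming pattern over three observation spaces and three action spaces. The sketch below generates the same identifiers and entry-point strings with a loop, which may help when checking that a name matches its configuration class and YAML file. It does not call `gym.register`, and the module path `cartpole_camera_showcase` is a placeholder for the value of `__name__` in the original file.

```python
from itertools import product

OBSERVATION_SPACES = ("Box", "Dict", "Tuple")
ACTION_SPACES = ("Box", "Discrete", "MultiDiscrete")


def showcase_entries(module: str = "cartpole_camera_showcase") -> list[dict[str, str]]:
    """Build the id and entry-point strings for every observation/action space pair."""
    entries = []
    for obs, act in product(OBSERVATION_SPACES, ACTION_SPACES):
        entries.append(
            {
                "id": f"Isaac-Cartpole-Camera-Showcase-{obs}-{act}-Direct-v0",
                "entry_point": f"{module}.cartpole_camera_env:CartpoleCameraShowcaseEnv",
                "env_cfg_entry_point": f"{module}.cartpole_camera_env_cfg:{obs}{act}EnvCfg",
                "skrl_cfg_entry_point": f"{module}.agents:skrl_{obs.lower()}_{act.lower()}_ppo_cfg.yaml",
            }
        )
    return entries


entries = showcase_entries()
for entry in entries:
    print(entry["id"])
```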
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#                      obs
#                       │
#            ┏━━━━━━━━━━▼━━━━━━━━━━┓
#            ┃ features_extractor  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━┩
#            │  conv2d(32, 8, 4)   │
#            │        relu         │
#            │  conv2d(64, 4, 2)   │
#            │        relu         │
#            │  conv2d(64, 3, 1)   │
#            │        relu         │
#            │       flatten       │
#            └──────────┬──────────┘
#                       │
#                ┏━━━━━━▼━━━━━━┓
#                ┃     net     ┃
#                ┡━━━━━━━━━━━━━┩
#                │ linear(512) │
#                │     elu     │
#                └──────┬──────┘
# shared                │
# ......................│.......................
# non-shared            │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
  separate: False
  policy:  # see gaussian_model parameters
    class: GaussianMixin
    clip_actions: False
    clip_log_std: True
    min_log_std: -20.0
    max_log_std: 2.0
    initial_log_std: 0.0
    network:
      - name: features_extractor
        input: permute(OBSERVATIONS, (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
        layers:
          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
          - flatten
        activations: relu
      - name: net
        input: features_extractor
        layers: [512]
        activations: elu
    output: ACTIONS
  value:  # see deterministic_model parameters
    class: DeterministicMixin
    clip_actions: False
    network:
      - name: features_extractor
        input: permute(OBSERVATIONS, (0, 3, 1, 2))  # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
        layers:
          - conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
          - conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
          - conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
          - flatten
        activations: relu
      - name: net
        input: features_extractor
        layers: [512]
        activations: elu
    output: ONE

# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
  class: RandomMemory
  memory_size: -1  # automatically determined (same as agent:rollouts)

# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
  class: PPO
  rollouts: 64
  learning_epochs: 4
  mini_batches: 32
  discount_factor: 0.99
  lambda: 0.95
  learning_rate: 1.0e-04
  learning_rate_scheduler: KLAdaptiveLR
  learning_rate_scheduler_kwargs:
    kl_threshold: 0.008
  state_preprocessor: null
  state_preprocessor_kwargs: null
  value_preprocessor: RunningStandardScaler
  value_preprocessor_kwargs: null
  random_timesteps: 0
  learning_starts: 0
  grad_norm_clip: 1.0
  ratio_clip: 0.2
  value_clip: 0.2
  clip_predicted_values: True
  entropy_loss_scale: 0.0
  value_loss_scale: 1.0
  kl_threshold: 0.0
  rewards_shaper_scale: 1.0
  time_limit_bootstrap: False
  mixed_precision: False
  # logging and checkpoint
  experiment:
    directory: "cartpole_camera_direct_box_box"
    experiment_name: ""
    write_interval: auto
    checkpoint_interval: auto

# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
  class: SequentialTrainer
  timesteps: 32000
  environment_info: log
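
The `features_extractor` in this configuration stacks three unpadded convolutions before `flatten`, so the input width of the following `linear(512)` layer depends on the camera resolution. The sketch below applies the standard convolution output-size formula to the kernel and stride values from the config. The 100x100 input resolution is an assumption for illustration, since the image size is not stated in this file, and the function names are hypothetical.

```python
# (out_channels, kernel_size, stride) taken from the config; padding is 0 everywhere
CONV_LAYERS = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]


def conv2d_output_size(size: int, kernel_size: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1


def flattened_features(height: int, width: int) -> int:
    """Number of features produced by the conv stack followed by flatten."""
    channels = 0
    for channels, kernel_size, stride in CONV_LAYERS:
        height = conv2d_output_size(height, kernel_size, stride)
        width = conv2d_output_size(width, kernel_size, stride)
    return channels * height * width


# assumed 100x100 camera image: 100 -> 24 -> 11 -> 9 per side
print(flattened_features(100, 100))
```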
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#                      obs
#                       │
#            ┏━━━━━━━━━━▼━━━━━━━━━━┓
#            ┃ features_extractor  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━┩
#            │  conv2d(32, 8, 4)   │
#            │        relu         │
#            │  conv2d(64, 4, 2)   │
#            │        relu         │
#            │  conv2d(64, 3, 1)   │
#            │        relu         │
#            │       flatten       │
#            └──────────┬──────────┘
#                       │
#                ┏━━━━━━▼━━━━━━┓
#                ┃     net     ┃
#                ┡━━━━━━━━━━━━━┩
#                │ linear(512) │
#                │     elu     │
#                └──────┬──────┘
#           shared      │
# ......................│.......................
#         non-shared    │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
separate: False
policy: # see categorical_model parameters
class: CategoricalMixin
unnormalized_log_prob: True
network:
- name: features_extractor
input: permute(OBSERVATIONS, (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
activations: relu
- name: net
input: features_extractor
layers: [512]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: features_extractor
input: permute(OBSERVATIONS, (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
activations: relu
- name: net
input: features_extractor
layers: [512]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 64
learning_epochs: 4
mini_batches: 32
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_camera_direct_box_discrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 32000
environment_info: log
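The `agent` and `trainer` sections above fix `rollouts`, `mini_batches`, `learning_epochs`, and `timesteps`. The following sketch works out what those numbers imply per policy update. It assumes the usual PPO bookkeeping (one update every `rollouts` environment steps, with the collected samples split evenly into `mini_batches`); the number of parallel environments is a hypothetical value, since it is set elsewhere.

```python
def ppo_schedule(num_envs: int, rollouts: int = 64, mini_batches: int = 32,
                 learning_epochs: int = 4, timesteps: int = 32000) -> dict:
    """Derive per-update quantities from the PPO settings in the config."""
    samples_per_update = rollouts * num_envs
    return {
        "samples_per_update": samples_per_update,
        "mini_batch_size": samples_per_update // mini_batches,
        "gradient_steps_per_update": learning_epochs * mini_batches,
        "policy_updates": timesteps // rollouts,
    }


# num_envs=512 is an assumed value for illustration
for name, value in ppo_schedule(num_envs=512).items():
    print(f"{name}: {value}")
```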
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#                      obs
#                       │
#            ┏━━━━━━━━━━▼━━━━━━━━━━┓
#            ┃ features_extractor  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━┩
#            │  conv2d(32, 8, 4)   │
#            │        relu         │
#            │  conv2d(64, 4, 2)   │
#            │        relu         │
#            │  conv2d(64, 3, 1)   │
#            │        relu         │
#            │       flatten       │
#            └──────────┬──────────┘
#                       │
#                ┏━━━━━━▼━━━━━━┓
#                ┃     net     ┃
#                ┡━━━━━━━━━━━━━┩
#                │ linear(512) │
#                │     elu     │
#                └──────┬──────┘
#           shared      │
# ......................│.......................
#         non-shared    │
#           ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#           ┃  policy|value output  ┃
#           ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#           │ linear(num_actions|1) │
#           └───────────┬───────────┘
#                       ▼
models:
separate: False
policy: # see multicategorical_model parameters
class: MultiCategoricalMixin
unnormalized_log_prob: True
network:
- name: features_extractor
input: permute(OBSERVATIONS, (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
activations: relu
- name: net
input: features_extractor
layers: [512]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: features_extractor
input: permute(OBSERVATIONS, (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
activations: relu
- name: net
input: features_extractor
layers: [512]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 64
learning_epochs: 4
mini_batches: 32
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_camera_direct_box_multidiscrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 32000
environment_info: log
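The configurations above pair `learning_rate: 1.0e-04` with a `KLAdaptiveLR` scheduler and `kl_threshold: 0.008`. The sketch below illustrates the general idea of a KL-adaptive rule: shrink the learning rate when the measured KL divergence is well above the threshold, and grow it when well below. The factors (`kl_factor`, `lr_factor`) and the bounds (`min_lr`, `max_lr`) are assumed values for illustration, not taken from this file; check the skrl documentation for the actual defaults.

```python
def kl_adaptive_lr(lr: float, kl: float, kl_threshold: float = 0.008,
                   kl_factor: float = 2.0, lr_factor: float = 1.5,
                   min_lr: float = 1e-6, max_lr: float = 1e-2) -> float:
    """Return the next learning rate given the latest KL divergence."""
    if kl > kl_threshold * kl_factor:
        # policy changed too much: take smaller steps
        return max(lr / lr_factor, min_lr)
    if kl < kl_threshold / kl_factor:
        # policy barely changed: take larger steps
        return min(lr * lr_factor, max_lr)
    return lr


lr = 1.0e-4
for kl in (0.02, 0.001, 0.008):
    print(kl, kl_adaptive_lr(lr, kl))
```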
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#         obs["camera"]       obs["joint-velocities"]
#               │                        │
#    ┏━━━━━━━━━━▼━━━━━━━━━━┓             │
#    ┃ features_extractor  ┃             │
#    ┡━━━━━━━━━━━━━━━━━━━━━┩             │
#    │  conv2d(32, 8, 4)   │             │
#    │        relu         │             │
#    │  conv2d(64, 4, 2)   │             │
#    │        relu         │             │
#    │  conv2d(64, 3, 1)   │             │
#    │        relu         │             │
#    │       flatten       │             │
#    │     linear(512)     │             │
#    │        tanh         │             │
#    │     linear(16)      │             │
#    │        tanh         │             │
#    └──────────┬──────────┘             │
#               │                        │
#               └─▶(concatenate)◀────────┘
#                        │
#                 ┏━━━━━━▼━━━━━┓
#                 ┃    net     ┃
#                 ┡━━━━━━━━━━━━┩
#                 │ linear(32) │
#                 │    elu     │
#                 │ linear(32) │
#                 │    elu     │
#                 └──────┬─────┘
#            shared      │
# .......................│.......................
#          non-shared    │
#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#            ┃  policy|value output  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#            │ linear(num_actions|1) │
#            └───────────┬───────────┘
#                        ▼
models:
separate: False
policy: # see gaussian_model parameters
class: GaussianMixin
clip_actions: False
clip_log_std: True
min_log_std: -20.0
max_log_std: 2.0
initial_log_std: 0.0
network:
- name: features_extractor
input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: net
input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: features_extractor
input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: net
input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 64
learning_epochs: 4
mini_batches: 32
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_camera_direct_dict_box"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 32000
environment_info: log
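The `discount_factor: 0.99` and `lambda: 0.95` entries above are the two parameters of generalized advantage estimation (GAE), which PPO commonly uses to compute advantages. The following is a minimal pure-Python sketch of the textbook recursion over a single rollout. It is meant to show how the two values interact, not to mirror skrl's internal implementation.

```python
def gae(rewards, values, next_value, dones, discount_factor=0.99, lam=0.95):
    """Compute GAE advantages and value targets for one rollout.

    rewards, values and dones are lists of equal length; next_value is the
    value estimate for the state reached after the last step.
    """
    advantages = [0.0] * len(rewards)
    advantage = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        # one-step temporal-difference error
        delta = rewards[t] + discount_factor * not_done * v_next - values[t]
        # exponentially weighted sum of TD errors
        advantage = delta + discount_factor * lam * not_done * advantage
        advantages[t] = advantage
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


advantages, returns = gae([1.0, 1.0], [0.0, 0.0], 0.0, [False, True])
print(advantages)
```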
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#         obs["camera"]       obs["joint-velocities"]
#               │                        │
#    ┏━━━━━━━━━━▼━━━━━━━━━━┓             │
#    ┃ features_extractor  ┃             │
#    ┡━━━━━━━━━━━━━━━━━━━━━┩             │
#    │  conv2d(32, 8, 4)   │             │
#    │        relu         │             │
#    │  conv2d(64, 4, 2)   │             │
#    │        relu         │             │
#    │  conv2d(64, 3, 1)   │             │
#    │        relu         │             │
#    │       flatten       │             │
#    │     linear(512)     │             │
#    │        tanh         │             │
#    │     linear(16)      │             │
#    │        tanh         │             │
#    └──────────┬──────────┘             │
#               │                        │
#               └─▶(concatenate)◀────────┘
#                        │
#                 ┏━━━━━━▼━━━━━┓
#                 ┃    net     ┃
#                 ┡━━━━━━━━━━━━┩
#                 │ linear(32) │
#                 │    elu     │
#                 │ linear(32) │
#                 │    elu     │
#                 └──────┬─────┘
#            shared      │
# .......................│.......................
#          non-shared    │
#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#            ┃  policy|value output  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#            │ linear(num_actions|1) │
#            └───────────┬───────────┘
#                        ▼
models:
separate: False
policy: # see categorical_model parameters
class: CategoricalMixin
unnormalized_log_prob: True
network:
- name: features_extractor
input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: net
input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: features_extractor
input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: net
input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 64
learning_epochs: 4
mini_batches: 32
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_camera_direct_dict_discrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 32000
environment_info: log
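The `ratio_clip: 0.2`, `value_clip: 0.2`, and `clip_predicted_values: True` settings above control the two clipping steps of PPO. The sketch below shows both on scalar inputs, following the standard PPO formulation; it is an illustration under that assumption rather than a copy of the library code.

```python
def clipped_surrogate_loss(ratio: float, advantage: float, ratio_clip: float = 0.2) -> float:
    """PPO policy loss for one sample (negated clipped surrogate objective)."""
    clipped_ratio = min(max(ratio, 1.0 - ratio_clip), 1.0 + ratio_clip)
    return -min(ratio * advantage, clipped_ratio * advantage)


def clip_predicted_value(predicted: float, old_value: float, value_clip: float = 0.2) -> float:
    """Keep the new value prediction within value_clip of the old one."""
    return old_value + min(max(predicted - old_value, -value_clip), value_clip)


# A large probability ratio with a positive advantage is capped at 1 + ratio_clip
print(clipped_surrogate_loss(ratio=1.5, advantage=1.0))
# A value prediction that moved by 0.5 is pulled back to a change of 0.2
print(clip_predicted_value(predicted=1.0, old_value=0.5))
```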
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#         obs["camera"]       obs["joint-velocities"]
#               │                        │
#    ┏━━━━━━━━━━▼━━━━━━━━━━┓             │
#    ┃ features_extractor  ┃             │
#    ┡━━━━━━━━━━━━━━━━━━━━━┩             │
#    │  conv2d(32, 8, 4)   │             │
#    │        relu         │             │
#    │  conv2d(64, 4, 2)   │             │
#    │        relu         │             │
#    │  conv2d(64, 3, 1)   │             │
#    │        relu         │             │
#    │       flatten       │             │
#    │     linear(512)     │             │
#    │        tanh         │             │
#    │     linear(16)      │             │
#    │        tanh         │             │
#    └──────────┬──────────┘             │
#               │                        │
#               └─▶(concatenate)◀────────┘
#                        │
#                 ┏━━━━━━▼━━━━━┓
#                 ┃    net     ┃
#                 ┡━━━━━━━━━━━━┩
#                 │ linear(32) │
#                 │    elu     │
#                 │ linear(32) │
#                 │    elu     │
#                 └──────┬─────┘
#            shared      │
# .......................│.......................
#          non-shared    │
#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#            ┃  policy|value output  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#            │ linear(num_actions|1) │
#            └───────────┬───────────┘
#                        ▼
models:
separate: False
policy: # see multicategorical_model parameters
class: MultiCategoricalMixin
unnormalized_log_prob: True
network:
- name: features_extractor
input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: net
input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: features_extractor
input: permute(OBSERVATIONS["camera"], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: net
input: concatenate([features_extractor, OBSERVATIONS["joint-velocities"]])
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 64
learning_epochs: 4
mini_batches: 32
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_camera_direct_dict_multidiscrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 32000
environment_info: log
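In the Dict-observation configurations above, `activations` is given as a list (`[relu, relu, relu, null, tanh, tanh]`) rather than a single name. The sketch below shows one plausible reading, assumed here from how the list lines up with the comment diagram: each entry is paired positionally with a layer, and `null` means no activation after that layer. The helper name `expand_layers` is hypothetical.

```python
def expand_layers(layers, activations):
    """Interleave layers with their activations; None means 'no activation'."""
    if isinstance(activations, str):
        # a single name applies to every layer
        activations = [activations] * len(layers)
    sequence = []
    for layer, activation in zip(layers, activations):
        sequence.append(layer)
        if activation is not None:
            sequence.append(activation)
    return sequence


layers = ["conv2d(32, 8, 4)", "conv2d(64, 4, 2)", "conv2d(64, 3, 1)",
          "flatten", "linear(512)", "linear(16)"]
activations = ["relu", "relu", "relu", None, "tanh", "tanh"]

for step in expand_layers(layers, activations):
    print(step)
```

The printed sequence matches the `features_extractor` box in the diagram: three conv/relu pairs, `flatten` with no activation, then two linear/tanh pairs.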
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#            obs[0]                  obs[1]
#               │                       │
#    ┏━━━━━━━━━━▼━━━━━━━━━━┓    ┏━━━━━━━▼━━━━━━━━┓
#    ┃ features_extractor  ┃    ┃ proprioception ┃
#    ┡━━━━━━━━━━━━━━━━━━━━━┩    ┡━━━━━━━━━━━━━━━━┩
#    │  conv2d(32, 8, 4)   │    │   linear(16)   │
#    │        relu         │    │      elu       │
#    │  conv2d(64, 4, 2)   │    │   linear(8)    │
#    │        relu         │    │      elu       │
#    │  conv2d(64, 3, 1)   │    └───────┬────────┘
#    │        relu         │            │
#    │       flatten       │            │
#    │     linear(512)     │            │
#    │        tanh         │            │
#    │     linear(16)      │            │
#    │        tanh         │            │
#    └──────────┬──────────┘            │
#               │                       │
#               └─▶(concatenate)◀───────┘
#                        │
#                 ┏━━━━━━▼━━━━━┓
#                 ┃    net     ┃
#                 ┡━━━━━━━━━━━━┩
#                 │ linear(32) │
#                 │    elu     │
#                 │ linear(32) │
#                 │    elu     │
#                 └──────┬─────┘
#            shared      │
# .......................│.......................
#          non-shared    │
#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#            ┃  policy|value output  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#            │ linear(num_actions|1) │
#            └───────────┬───────────┘
#                        ▼
models:
separate: False
policy: # see gaussian_model parameters
class: GaussianMixin
clip_actions: False
clip_log_std: True
min_log_std: -20.0
max_log_std: 2.0
initial_log_std: 0.0
network:
- name: features_extractor
input: permute(OBSERVATIONS[0], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: proprioception
input: OBSERVATIONS[1]
layers: [16, 8]
activations: elu
- name: net
input: concatenate([features_extractor, proprioception])
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: features_extractor
input: permute(OBSERVATIONS[0], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: proprioception
input: OBSERVATIONS[1]
layers: [16, 8]
activations: elu
- name: net
input: concatenate([features_extractor, proprioception])
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 64
learning_epochs: 4
mini_batches: 32
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_camera_direct_tuple_box"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 32000
environment_info: log
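Two shape manipulations appear in the Tuple-observation configuration above: `permute(OBSERVATIONS[0], (0, 3, 1, 2))` reorders the camera batch from NHWC to NCHW, and `concatenate([features_extractor, proprioception])` joins the two branch outputs. The sketch below tracks only the shapes, in plain Python. The batch size and image resolution are assumed values for illustration.

```python
def permute_shape(shape: tuple, dims: tuple) -> tuple:
    """Shape after a permute: output axis i takes input axis dims[i]."""
    return tuple(shape[d] for d in dims)


def concatenated_width(*widths: int) -> int:
    """Feature width after concatenating along the last axis."""
    return sum(widths)


# Hypothetical batch of 4 RGB images of 100 x 100 pixels, channels last (NHWC)
nhwc = (4, 100, 100, 3)
nchw = permute_shape(nhwc, (0, 3, 1, 2))
print(nchw)  # (4, 3, 100, 100)

# features_extractor ends in linear(16); proprioception ends in linear(8)
print(concatenated_width(16, 8))  # 24 inputs to the first linear(32) of net
```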
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#            obs[0]                  obs[1]
#               │                       │
#    ┏━━━━━━━━━━▼━━━━━━━━━━┓    ┏━━━━━━━▼━━━━━━━━┓
#    ┃ features_extractor  ┃    ┃ proprioception ┃
#    ┡━━━━━━━━━━━━━━━━━━━━━┩    ┡━━━━━━━━━━━━━━━━┩
#    │  conv2d(32, 8, 4)   │    │   linear(16)   │
#    │        relu         │    │      elu       │
#    │  conv2d(64, 4, 2)   │    │   linear(8)    │
#    │        relu         │    │      elu       │
#    │  conv2d(64, 3, 1)   │    └───────┬────────┘
#    │        relu         │            │
#    │       flatten       │            │
#    │     linear(512)     │            │
#    │        tanh         │            │
#    │     linear(16)      │            │
#    │        tanh         │            │
#    └──────────┬──────────┘            │
#               │                       │
#               └─▶(concatenate)◀───────┘
#                        │
#                 ┏━━━━━━▼━━━━━┓
#                 ┃    net     ┃
#                 ┡━━━━━━━━━━━━┩
#                 │ linear(32) │
#                 │    elu     │
#                 │ linear(32) │
#                 │    elu     │
#                 └──────┬─────┘
#            shared      │
# .......................│.......................
#          non-shared    │
#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#            ┃  policy|value output  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#            │ linear(num_actions|1) │
#            └───────────┬───────────┘
#                        ▼
models:
separate: False
policy: # see categorical_model parameters
class: CategoricalMixin
unnormalized_log_prob: True
network:
- name: features_extractor
input: permute(OBSERVATIONS[0], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: proprioception
input: OBSERVATIONS[1]
layers: [16, 8]
activations: elu
- name: net
input: concatenate([features_extractor, proprioception])
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: features_extractor
input: permute(OBSERVATIONS[0], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: proprioception
input: OBSERVATIONS[1]
layers: [16, 8]
activations: elu
- name: net
input: concatenate([features_extractor, proprioception])
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 64
learning_epochs: 4
mini_batches: 32
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_camera_direct_tuple_discrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 32000
environment_info: log
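Every configuration above sets `value_preprocessor: RunningStandardScaler` while leaving `state_preprocessor` as `null`. The sketch below illustrates the general technique of running standardization using Welford's online update. It is a simplified stand-in written for this note (the class name `RunningScaler` is hypothetical) and makes no claim to match skrl's implementation details such as clipping or initial statistics.

```python
import math


class RunningScaler:
    """Track a running mean and variance, and standardize values with them."""

    def __init__(self, epsilon: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.epsilon = epsilon

    def update(self, batch) -> None:
        for x in batch:  # Welford's online algorithm
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / self.count if self.count > 0 else 1.0

    def scale(self, x: float) -> float:
        return (x - self.mean) / (math.sqrt(self.variance) + self.epsilon)


scaler = RunningScaler()
scaler.update([1.0, 2.0, 3.0])
print(scaler.mean, scaler.variance, scaler.scale(3.0))
```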
seed: 42
# Models are instantiated using skrl's model instantiator utility
# https://skrl.readthedocs.io/en/latest/api/utils/model_instantiators.html
#
#            obs[0]                  obs[1]
#               │                       │
#    ┏━━━━━━━━━━▼━━━━━━━━━━┓    ┏━━━━━━━▼━━━━━━━━┓
#    ┃ features_extractor  ┃    ┃ proprioception ┃
#    ┡━━━━━━━━━━━━━━━━━━━━━┩    ┡━━━━━━━━━━━━━━━━┩
#    │  conv2d(32, 8, 4)   │    │   linear(16)   │
#    │        relu         │    │      elu       │
#    │  conv2d(64, 4, 2)   │    │   linear(8)    │
#    │        relu         │    │      elu       │
#    │  conv2d(64, 3, 1)   │    └───────┬────────┘
#    │        relu         │            │
#    │       flatten       │            │
#    │     linear(512)     │            │
#    │        tanh         │            │
#    │     linear(16)      │            │
#    │        tanh         │            │
#    └──────────┬──────────┘            │
#               │                       │
#               └─▶(concatenate)◀───────┘
#                        │
#                 ┏━━━━━━▼━━━━━┓
#                 ┃    net     ┃
#                 ┡━━━━━━━━━━━━┩
#                 │ linear(32) │
#                 │    elu     │
#                 │ linear(32) │
#                 │    elu     │
#                 └──────┬─────┘
#            shared      │
# .......................│.......................
#          non-shared    │
#            ┏━━━━━━━━━━━▼━━━━━━━━━━━┓
#            ┃  policy|value output  ┃
#            ┡━━━━━━━━━━━━━━━━━━━━━━━┩
#            │ linear(num_actions|1) │
#            └───────────┬───────────┘
#                        ▼
models:
separate: False
policy: # see multicategorical_model parameters
class: MultiCategoricalMixin
unnormalized_log_prob: True
network:
- name: features_extractor
input: permute(OBSERVATIONS[0], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: proprioception
input: OBSERVATIONS[1]
layers: [16, 8]
activations: elu
- name: net
input: concatenate([features_extractor, proprioception])
layers: [32, 32]
activations: elu
output: ACTIONS
value: # see deterministic_model parameters
class: DeterministicMixin
clip_actions: False
network:
- name: features_extractor
input: permute(OBSERVATIONS[0], (0, 3, 1, 2)) # PyTorch NHWC -> NCHW. Warning: don't permute for JAX since it expects NHWC
layers:
- conv2d: {out_channels: 32, kernel_size: 8, stride: 4, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 4, stride: 2, padding: 0}
- conv2d: {out_channels: 64, kernel_size: 3, stride: 1, padding: 0}
- flatten
- linear: 512
- linear: 16
activations: [relu, relu, relu, null, tanh, tanh]
- name: proprioception
input: OBSERVATIONS[1]
layers: [16, 8]
activations: elu
- name: net
input: concatenate([features_extractor, proprioception])
layers: [32, 32]
activations: elu
output: ONE
# Rollout memory
# https://skrl.readthedocs.io/en/latest/api/memories/random.html
memory:
class: RandomMemory
memory_size: -1 # automatically determined (same as agent:rollouts)
# PPO agent configuration (field names are from PPO_DEFAULT_CONFIG)
# https://skrl.readthedocs.io/en/latest/api/agents/ppo.html
agent:
class: PPO
rollouts: 64
learning_epochs: 4
mini_batches: 32
discount_factor: 0.99
lambda: 0.95
learning_rate: 1.0e-04
learning_rate_scheduler: KLAdaptiveLR
learning_rate_scheduler_kwargs:
kl_threshold: 0.008
state_preprocessor: null
state_preprocessor_kwargs: null
value_preprocessor: RunningStandardScaler
value_preprocessor_kwargs: null
random_timesteps: 0
learning_starts: 0
grad_norm_clip: 1.0
ratio_clip: 0.2
value_clip: 0.2
clip_predicted_values: True
entropy_loss_scale: 0.0
value_loss_scale: 1.0
kl_threshold: 0.0
rewards_shaper_scale: 1.0
time_limit_bootstrap: False
mixed_precision: False
# logging and checkpoint
experiment:
directory: "cartpole_camera_direct_tuple_multidiscrete"
experiment_name: ""
write_interval: auto
checkpoint_interval: auto
# Sequential trainer
# https://skrl.readthedocs.io/en/latest/api/trainers/sequential.html
trainer:
class: SequentialTrainer
timesteps: 32000
environment_info: log
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

from __future__ import annotations

import gymnasium as gym
import torch

from isaaclab_tasks.direct.cartpole.cartpole_camera_env import CartpoleCameraEnv, CartpoleRGBCameraEnvCfg


class CartpoleCameraShowcaseEnv(CartpoleCameraEnv):
    cfg: CartpoleRGBCameraEnvCfg

    def _pre_physics_step(self, actions: torch.Tensor) -> None:
        self.actions = actions.clone()

    def _apply_action(self) -> None:
        # fundamental spaces
        # - Box
        if isinstance(self.single_action_space, gym.spaces.Box):
            target = self.cfg.action_scale * self.actions
        # - Discrete
        elif isinstance(self.single_action_space, gym.spaces.Discrete):
            target = torch.zeros((self.num_envs, 1), dtype=torch.float32, device=self.device)
            target = torch.where(self.actions == 1, -self.cfg.action_scale, target)
            target = torch.where(self.actions == 2, self.cfg.action_scale, target)
        # - MultiDiscrete
        elif isinstance(self.single_action_space, gym.spaces.MultiDiscrete):
            # value
            target = torch.zeros((self.num_envs, 1), dtype=torch.float32, device=self.device)
            target = torch.where(self.actions[:, [0]] == 1, self.cfg.action_scale / 2.0, target)
            target = torch.where(self.actions[:, [0]] == 2, self.cfg.action_scale, target)
            # direction
            target = torch.where(self.actions[:, [1]] == 0, -target, target)
        else:
            raise NotImplementedError(f"Action space {type(self.single_action_space)} not implemented")

        # set target
        self._cartpole.set_joint_effort_target(target, joint_ids=self._cart_dof_idx)

    def _get_observations(self) -> dict:
        # get camera data
        data_type = "rgb" if "rgb" in self.cfg.tiled_camera.data_types else "depth"
        if "rgb" in self.cfg.tiled_camera.data_types:
            camera_data = self._tiled_camera.data.output[data_type] / 255.0
            # normalize the camera data for better training results
            mean_tensor = torch.mean(camera_data, dim=(1, 2), keepdim=True)
            camera_data -= mean_tensor
        elif "depth" in self.cfg.tiled_camera.data_types:
            camera_data = self._tiled_camera.data.output[data_type]
            camera_data[camera_data == float("inf")] = 0

        # fundamental spaces
        # - Box
        if isinstance(self.single_observation_space["policy"], gym.spaces.Box):
            obs = camera_data
        # composite spaces
        # - Tuple
        elif isinstance(self.single_observation_space["policy"], gym.spaces.Tuple):
            obs = (camera_data, self.joint_vel)
        # - Dict
        elif isinstance(self.single_observation_space["policy"], gym.spaces.Dict):
            obs = {"joint-velocities": self.joint_vel, "camera": camera_data}
        else:
            raise NotImplementedError(
                f"Observation space {type(self.single_observation_space['policy'])} not implemented"
            )

        observations = {"policy": obs}
        return observations
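
The `_apply_action` method above maps discrete action indices to joint efforts with chained `torch.where` calls. To make the mapping easier to read, the sketch below restates it for a single environment in plain Python. The function names are hypothetical and the `action_scale` value of 100.0 is an assumption for illustration; the environment reads it from `self.cfg.action_scale`.

```python
def discrete_to_effort(action: int, action_scale: float) -> float:
    """Discrete(3): 0 -> no effort, 1 -> negative effort, 2 -> positive effort."""
    return {0: 0.0, 1: -action_scale, 2: action_scale}[action]


def multidiscrete_to_effort(action: tuple, action_scale: float) -> float:
    """MultiDiscrete: first index picks the magnitude, second the direction."""
    magnitude_index, direction = action
    magnitude = {0: 0.0, 1: action_scale / 2.0, 2: action_scale}[magnitude_index]
    # direction 0 flips the sign, any other value keeps it positive
    return -magnitude if direction == 0 else magnitude


ACTION_SCALE = 100.0  # assumed value for illustration

print([discrete_to_effort(a, ACTION_SCALE) for a in (0, 1, 2)])
print(multidiscrete_to_effort((1, 0), ACTION_SCALE))  # half effort, negative direction
print(multidiscrete_to_effort((2, 1), ACTION_SCALE))  # full effort, positive direction
```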