Commit 81618f21 (unverified), authored by ooctipus, committed by GitHub

Disallows string values written in sb3_ppo_cfg.yaml from being evaluated in process_sb3_cfg (#3110)

# Description

This PR adds stricter interpretation rules for values specified in
sb3_ppo_cfg.yaml, disallowing `eval` on any dict entry, which may contain
arbitrary code that makes the program vulnerable.
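
To make the risk concrete, here is a minimal, harmless sketch of what `eval` does with a string that is not a plain `dict(...)` expression. The variable name `untrusted_value` and its payload are hypothetical and stand in for text read from a config file.

```python
import os

# Hypothetical value standing in for a string read from a YAML config file.
# A legitimate file would hold something like "dict(net_arch=[32, 32])".
untrusted_value = "__import__('os').getpid()"

# eval() executes whatever expression the string holds; here it imports `os`
# and calls a function, which a config value should never be able to do.
result = eval(untrusted_value)

print(result == os.getpid())
```

The payload here only reads the process id, but the same mechanism would run any expression placed in the file.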

`eval` is now removed. Only strings that start with `nn.` are interpreted,
and they can only be used to look up a module from `torch.nn`. That seems
to cover all current usage in Isaac Lab. I can accommodate more cases if
they come up, but this appears sufficient for now.
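
The lookup rule can be sketched as follows. This is an illustration only, not the code in this PR: the function name `resolve_nn_strings` and the `fake_nn` namespace are hypothetical, and `fake_nn` stands in for `torch.nn` so the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace
from typing import Any


def resolve_nn_strings(cfg: dict[str, Any], namespace: Any) -> dict[str, Any]:
    """Replace "nn.<Name>" strings in place with attributes of `namespace`."""
    for key, value in cfg.items():
        if isinstance(value, dict):
            # Recurse into nested mappings such as policy_kwargs.
            resolve_nn_strings(value, namespace)
        elif isinstance(value, str) and value.startswith("nn."):
            # getattr only looks up a name; it never executes the string.
            cfg[key] = getattr(namespace, value[len("nn."):])
    return cfg


class ELU:
    """Placeholder class standing in for torch.nn.ELU."""


# Assumption: a stand-in namespace instead of the real torch.nn module.
fake_nn = SimpleNamespace(ELU=ELU)

cfg = {
    "policy_kwargs": {"activation_fn": "nn.ELU", "net_arch": [32, 32]},
    "device": "cuda:0",
}
resolve_nn_strings(cfg, fake_nn)
print(cfg["policy_kwargs"]["activation_fn"] is ELU)
```

A string such as `nn.os.system` fails with `AttributeError` because `getattr` treats everything after the prefix as a single attribute name. Strings without the `nn.` prefix are left untouched.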

Fixes # (issue)

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------
Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
parent aec72bdc
 [package]
 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.2.3"
+version = "0.2.4"
 # Description
 title = "Isaac Lab RL"
......
Changelog
---------
0.2.4 (2025-08-07)
~~~~~~~~~~~~~~~~~~
Fixed
^^^^^
* Disallowed string values in ``sb3_ppo_cfg.yaml`` from being passed to ``eval()`` in
:meth:`~isaaclab_rl.sb3.process_sb3_cfg`. This change prevents accidental or malicious
code execution when loading configuration files, improving overall security and reliability.
0.2.3 (2025-06-29)
~~~~~~~~~~~~~~~~~~
......
@@ -53,14 +53,15 @@ def process_sb3_cfg(cfg: dict, num_envs: int) -> dict:
         https://github.com/DLR-RM/rl-baselines3-zoo/blob/0e5eb145faefa33e7d79c7f8c179788574b20da5/utils/exp_manager.py#L358
     """
-    def update_dict(hyperparams: dict[str, Any]) -> dict[str, Any]:
+    def update_dict(hyperparams: dict[str, Any], depth: int) -> dict[str, Any]:
         for key, value in hyperparams.items():
             if isinstance(value, dict):
-                update_dict(value)
-            else:
-                if key in ["policy_kwargs", "replay_buffer_class", "replay_buffer_kwargs"]:
-                    hyperparams[key] = eval(value)
-                elif key in ["learning_rate", "clip_range", "clip_range_vf"]:
+                update_dict(value, depth + 1)
+            if isinstance(value, str):
+                if value.startswith("nn."):
+                    hyperparams[key] = getattr(nn, value[3:])
+            if depth == 0:
+                if key in ["learning_rate", "clip_range", "clip_range_vf"]:
                     if isinstance(value, str):
                         _, initial_value = value.split("_")
                         initial_value = float(initial_value)
......
@@ -81,7 +82,7 @@ def process_sb3_cfg(cfg: dict, num_envs: int) -> dict:
         return hyperparams
     # parse agent configuration and convert to classes
-    return update_dict(cfg)
+    return update_dict(cfg, depth=0)
"""
......
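
The new `depth` argument limits schedule parsing to top-level keys. The following is a minimal sketch of that gating only, not the function from this PR. The name `parse_top_level_schedules` is hypothetical, the `lin_` prefix in the sample values is an assumption about the string format, and storing the parsed float directly is a placeholder for whatever the elided part of the real function does with it.

```python
from typing import Any

SCHEDULE_KEYS = ("learning_rate", "clip_range", "clip_range_vf")


def parse_top_level_schedules(cfg: dict[str, Any], depth: int = 0) -> dict[str, Any]:
    """Parse "<name>_<value>" strings, but only for keys at depth 0."""
    for key, value in cfg.items():
        if isinstance(value, dict):
            # Nested mappings are visited with an increased depth.
            parse_top_level_schedules(value, depth + 1)
        if depth == 0 and key in SCHEDULE_KEYS and isinstance(value, str):
            # Split "<name>_<initial value>" and keep the numeric part.
            _, initial_value = value.split("_")
            # Placeholder: the real code builds something from this float.
            cfg[key] = float(initial_value)
    return cfg


cfg = {
    "learning_rate": "lin_3e-4",
    # Same key name, but nested: it is deliberately left alone.
    "policy_kwargs": {"learning_rate": "lin_1e-3"},
}
parse_top_level_schedules(cfg)
print(cfg)
```

Without the depth check, a nested key that happens to share a name with a top-level hyperparameter would be rewritten as well, which matters now that `policy_kwargs` is a real mapping and is traversed recursively.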
@@ -16,11 +16,10 @@ n_epochs: 20
 ent_coef: 0.01
 learning_rate: !!float 3e-4
 clip_range: !!float 0.2
-policy_kwargs: "dict(
-  activation_fn=nn.ELU,
-  net_arch=[32, 32],
-  squash_output=False,
-  )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch: [32, 32]
+  squash_output: False
 vf_coef: 1.0
 max_grad_norm: 1.0
 device: "cuda:0"
@@ -21,9 +21,10 @@ learning_rate: !!float 3e-5
 use_sde: True
 clip_range: 0.4
 device: "cuda:0"
-policy_kwargs: "dict(
-  log_std_init=-1,
-  ortho_init=False,
-  activation_fn=nn.ReLU,
-  net_arch=dict(pi=[256, 256], vf=[256, 256])
-  )"
+policy_kwargs:
+  log_std_init: -1
+  ortho_init: False
+  activation_fn: 'nn.ReLU'
+  net_arch:
+    pi: [256, 256]
+    vf: [256, 256]
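
For configs outside this repository that still use the quoted `dict(...)` form, the conversion to the nested mapping shown above can be done without `eval`. The sketch below is a hypothetical helper, not part of this PR: `legacy_to_mapping` and `_convert` are illustrative names, and the helper only handles the constructs that appear in these files (`dict(...)` calls with keyword arguments, `nn.<Name>` attributes, lists, and literals).

```python
import ast
from typing import Any


def legacy_to_mapping(source: str) -> Any:
    """Convert a legacy "dict(...)" expression string into plain data."""
    return _convert(ast.parse(source.strip(), mode="eval").body)


def _convert(node: ast.AST) -> Any:
    # dict(key=value, ...) becomes a plain mapping.
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "dict"
        and not node.args
        and all(kw.arg is not None for kw in node.keywords)
    ):
        return {kw.arg: _convert(kw.value) for kw in node.keywords}
    # nn.ELU and similar stay as strings, matching the new YAML format.
    if (
        isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == "nn"
    ):
        return f"nn.{node.attr}"
    if isinstance(node, (ast.List, ast.Tuple)):
        return [_convert(item) for item in node.elts]
    # Numbers, booleans, strings; anything else raises ValueError.
    return ast.literal_eval(node)


legacy = """dict(
    log_std_init=-1,
    ortho_init=False,
    activation_fn=nn.ReLU,
    net_arch=dict(pi=[256, 256], vf=[256, 256])
)"""
converted = legacy_to_mapping(legacy)
print(converted)
```

The string is parsed into a syntax tree and never executed, so an expression such as a function call other than `dict(...)` is rejected with `ValueError` instead of being run.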
@@ -16,11 +16,10 @@ n_epochs: 20
 ent_coef: 0.01
 learning_rate: !!float 3e-4
 clip_range: !!float 0.2
-policy_kwargs: "dict(
-  activation_fn=nn.ELU,
-  net_arch=[32, 32],
-  squash_output=False,
-  )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch: [32, 32]
+  squash_output: False
 vf_coef: 1.0
 max_grad_norm: 1.0
 device: "cuda:0"
@@ -19,9 +19,9 @@ n_epochs: 5
 gae_lambda: 0.95
 max_grad_norm: 1.0
 vf_coef: 0.5
-policy_kwargs: "dict(
-  activation_fn=nn.ELU,
-  net_arch=[400, 200, 100],
-  optimizer_kwargs=dict(eps=1e-8),
-  ortho_init=False,
-  )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch: [400, 200, 100]
+  optimizer_kwargs:
+    eps: !!float 1e-8
+  ortho_init: False
@@ -15,12 +15,12 @@ n_epochs: 5
 ent_coef: 0.005
 learning_rate: !!float 1e-3
 clip_range: !!float 0.2
-policy_kwargs: "dict(
-  activation_fn=nn.ELU,
-  net_arch=[512, 256, 128],
-  optimizer_kwargs=dict(eps=1e-8),
-  ortho_init=False,
-  )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch: [512, 256, 128]
+  optimizer_kwargs:
+    eps: !!float 1e-8
+  ortho_init: False
 vf_coef: 1.0
 max_grad_norm: 1.0
 normalize_input: True
......
@@ -19,10 +19,11 @@ ent_coef: 0.00
 vf_coef: 0.0001
 learning_rate: !!float 3e-4
 clip_range: 0.2
-policy_kwargs: "dict(
-  activation_fn=nn.ELU,
-  net_arch=dict(pi=[256, 128, 64], vf=[256, 128, 64])
-  )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch:
+    pi: [256, 128, 64]
+    vf: [256, 128, 64]
 target_kl: 0.01
 max_grad_norm: 1.0
......