Disallows string values written in sb3_ppo_cfg.yaml from being evaluated in process_sb3_cfg (#3110)

# Description

This PR adds stricter interpretation rules for values specified in `sb3_ppo_cfg.yaml`. Configuration values are no longer passed to `eval`, because an evaluated string may contain arbitrary code and leaves the program vulnerable. With `eval` removed, only strings that start with `nn.` get special handling: the rest of the string is looked up as an attribute of `torch.nn`. This appears to cover all current usage in Isaac Lab; more accommodations can be added if further cases come up.

Fixes # (issue)

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Screenshots

Please attach before and after screenshots of the change if applicable.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
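
The prefix rule described above can be sketched in isolation. This is a minimal illustration rather than the code from `sb3.py`: the helper name `resolve_nn_strings`, the `namespace` parameter, and the stand-in `ELU` class are assumptions made so the snippet runs without PyTorch installed. In real use the namespace would presumably be `torch.nn`.

```python
from types import SimpleNamespace
from typing import Any


def resolve_nn_strings(hyperparams: dict[str, Any], namespace: Any) -> dict[str, Any]:
    """Replace each "nn.<Name>" string with getattr(namespace, "<Name>"), in place."""
    for key, value in hyperparams.items():
        if isinstance(value, dict):
            # Recurse into nested mappings such as policy_kwargs.
            resolve_nn_strings(value, namespace)
        elif isinstance(value, str) and value.startswith("nn."):
            # Attribute lookup only: the text after "nn." is never executed.
            hyperparams[key] = getattr(namespace, value[3:])
    return hyperparams


class ELU:
    """Hypothetical stand-in for torch.nn.ELU."""


# Hypothetical stand-in for the torch.nn module.
fake_nn = SimpleNamespace(ELU=ELU)

cfg = {
    "policy_kwargs": {"activation_fn": "nn.ELU", "net_arch": [32, 32]},
    "device": "cuda:0",
    # No "nn." prefix, so this stays an inert string.
    "note": "__import__('os').getcwd()",
}
resolved = resolve_nn_strings(cfg, fake_nn)
```

A string such as `"nn.ELU()"` is not called under this scheme; `getattr` looks for an attribute literally named `ELU()` and raises `AttributeError`.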

Commit 81618f21, authored Aug 21, 2025 by ooctipus, committed by GitHub Aug 21, 2025
9 changed files
--- a/source/isaaclab_rl/config/extension.toml
+++ b/source/isaaclab_rl/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.2.3"
+version = "0.2.4"

 # Description
 title = "Isaac Lab RL"

--- a/source/isaaclab_rl/docs/CHANGELOG.rst
+++ b/source/isaaclab_rl/docs/CHANGELOG.rst
 Changelog
 ---------

+0.2.4 (2025-08-07)
+~~~~~~~~~~~~~~~~~~
+
+Fixed
+^^^^^
+
+* Disallowed string values in ``sb3_ppo_cfg.yaml`` from being passed to ``eval()`` in
+  :meth:`~isaaclab_rl.sb3.process_sb3_cfg`. This change prevents accidental or malicious
+  code execution when loading configuration files, improving overall security and reliability.
+
+
 0.2.3 (2025-06-29)
 ~~~~~~~~~~~~~~~~~~


--- a/source/isaaclab_rl/isaaclab_rl/sb3.py
+++ b/source/isaaclab_rl/isaaclab_rl/sb3.py
@@ -53,14 +53,15 @@ def process_sb3_cfg(cfg: dict, num_envs: int) -> dict:
        https://github.com/DLR-RM/rl-baselines3-zoo/blob/0e5eb145faefa33e7d79c7f8c179788574b20da5/utils/exp_manager.py#L358
    """

-    def update_dict(hyperparams: dict[str, Any]) -> dict[str, Any]:
+    def update_dict(hyperparams: dict[str, Any], depth: int) -> dict[str, Any]:
        for key, value in hyperparams.items():
            if isinstance(value, dict):
-                update_dict(value)
-            else:
-                if key in ["policy_kwargs", "replay_buffer_class", "replay_buffer_kwargs"]:
-                    hyperparams[key] = eval(value)
-                elif key in ["learning_rate", "clip_range", "clip_range_vf"]:
+                update_dict(value, depth + 1)
+            if isinstance(value, str):
+                if value.startswith("nn."):
+                    hyperparams[key] = getattr(nn, value[3:])
+            if depth == 0:
+                if key in ["learning_rate", "clip_range", "clip_range_vf"]:
                    if isinstance(value, str):
                        _, initial_value = value.split("_")
                        initial_value = float(initial_value)
@@ -81,7 +82,7 @@ def process_sb3_cfg(cfg: dict, num_envs: int) -> dict:
        return hyperparams

    # parse agent configuration and convert to classes
-    return update_dict(cfg)
+    return update_dict(cfg, depth=0)


 """

--- a/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole/agents/sb3_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/direct/cartpole/agents/sb3_ppo_cfg.yaml
@@ -16,11 +16,10 @@ n_epochs: 20
 ent_coef: 0.01
 learning_rate: !!float 3e-4
 clip_range: !!float 0.2
-policy_kwargs: "dict(
-                  activation_fn=nn.ELU,
-                  net_arch=[32, 32],
-                  squash_output=False,
-                )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch: [32, 32]
+  squash_output: False
 vf_coef: 1.0
 max_grad_norm: 1.0
 device: "cuda:0"
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/ant/agents/sb3_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/ant/agents/sb3_ppo_cfg.yaml
@@ -21,9 +21,10 @@ learning_rate: !!float 3e-5
 use_sde: True
 clip_range: 0.4
 device: "cuda:0"
-policy_kwargs: "dict(
-                  log_std_init=-1,
-                  ortho_init=False,
-                  activation_fn=nn.ReLU,
-                  net_arch=dict(pi=[256, 256], vf=[256, 256])
-                )"
+policy_kwargs:
+  log_std_init: -1
+  ortho_init: False
+  activation_fn: 'nn.ReLU'
+  net_arch:
+    pi: [256, 256]
+    vf: [256, 256]
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/cartpole/agents/sb3_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/cartpole/agents/sb3_ppo_cfg.yaml
@@ -16,11 +16,10 @@ n_epochs: 20
 ent_coef: 0.01
 learning_rate: !!float 3e-4
 clip_range: !!float 0.2
-policy_kwargs: "dict(
-                  activation_fn=nn.ELU,
-                  net_arch=[32, 32],
-                  squash_output=False,
-                )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch: [32, 32]
+  squash_output: False
 vf_coef: 1.0
 max_grad_norm: 1.0
 device: "cuda:0"
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/classic/humanoid/agents/sb3_ppo_cfg.yaml
@@ -19,9 +19,9 @@ n_epochs: 5
 gae_lambda: 0.95
 max_grad_norm: 1.0
 vf_coef: 0.5
-policy_kwargs: "dict(
-  activation_fn=nn.ELU,
-  net_arch=[400, 200, 100],
-  optimizer_kwargs=dict(eps=1e-8),
-  ortho_init=False,
-  )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch: [400, 200, 100]
+  optimizer_kwargs:
+    eps: !!float 1e-8
+  ortho_init: False
--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/agents/sb3_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/a1/agents/sb3_ppo_cfg.yaml
@@ -15,12 +15,12 @@ n_epochs: 5
 ent_coef: 0.005
 learning_rate: !!float 1e-3
 clip_range: !!float 0.2
-policy_kwargs: "dict(
-                  activation_fn=nn.ELU,
-                  net_arch=[512, 256, 128],
-                  optimizer_kwargs=dict(eps=1e-8),
-                  ortho_init=False,
-                )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch: [512, 256, 128]
+  optimizer_kwargs:
+    eps: !!float 1e-8
+  ortho_init: False
 vf_coef: 1.0
 max_grad_norm: 1.0
 normalize_input: True

--- a/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/lift/config/franka/agents/sb3_ppo_cfg.yaml
+++ b/source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/lift/config/franka/agents/sb3_ppo_cfg.yaml
@@ -19,10 +19,11 @@ ent_coef: 0.00
 vf_coef: 0.0001
 learning_rate: !!float 3e-4
 clip_range: 0.2
-policy_kwargs: "dict(
-                  activation_fn=nn.ELU,
-                  net_arch=dict(pi=[256, 128, 64], vf=[256, 128, 64])
-                )"
+policy_kwargs:
+  activation_fn: 'nn.ELU'
+  net_arch:
+    pi: [256, 128, 64]
+    vf: [256, 128, 64]
 target_kl: 0.01
 max_grad_norm: 1.0
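
Judging from the `sb3.py` diff above, a config that still uses the old quoted `dict(...)` form is no longer evaluated: the string does not start with `nn.`, so it would be passed through unchanged. The sketch below is one possible way to spot such leftovers in an already loaded config. The helper `find_legacy_eval_strings` is hypothetical and not part of this commit, and the dictionaries are hand-written stand-ins for what a YAML loader would return.

```python
from typing import Any


def find_legacy_eval_strings(cfg: dict[str, Any], prefix: str = "") -> list[str]:
    """Return dotted key paths whose value is still a quoted "dict(...)" string."""
    hits: list[str] = []
    for key, value in cfg.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Walk nested mappings so keys like policy_kwargs.net_arch are covered.
            hits.extend(find_legacy_eval_strings(value, path))
        elif isinstance(value, str) and value.strip().startswith("dict("):
            hits.append(path)
    return hits


# Old style: the whole policy_kwargs entry is one string meant for eval().
old_style = {"policy_kwargs": "dict(activation_fn=nn.ELU, net_arch=[32, 32])"}

# New style: a nested mapping, as in the updated ant config.
new_style = {
    "policy_kwargs": {
        "log_std_init": -1,
        "ortho_init": False,
        "activation_fn": "nn.ReLU",
        "net_arch": {"pi": [256, 256], "vf": [256, 256]},
    }
}

legacy_hits = find_legacy_eval_strings(old_style)
clean_hits = find_legacy_eval_strings(new_style)
```

Running a check like this on custom agent configs before training is one way to catch files that were not migrated along with the nine changed here.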