Unverified Commit 4737968a authored by matthewtrepte, committed by GitHub

Adds training benchmark unit tests with input config (#2503)

# Description

Adds unit tests that train and evaluate environments across all
workflows, driven by an input config.

The config determines which environments to train, how long to train
them, and the KPI thresholds each run must meet.
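
A minimal sketch of how a config entry might be resolved for one workflow and task, mirroring the lookup order of the `get_env_config` helper in this PR: exact `workflow:task` key, then exact task key, then pattern match. The inline `CONFIGS` dict, the `label` field, and the name `resolve_config` are made up for illustration; the real config is loaded from YAML.

```python
import re

# Hypothetical, trimmed-down config used only for this sketch.
CONFIGS = {
    "fast": {
        "rl_games:Isaac-Ant-v0": {"max_iterations": 200},
    },
    "full": {
        "Isaac-Ant-v0": {"max_iterations": 1000},
        "Isaac-Velocity-Flat-*": {"max_iterations": 1000, "label": "flat"},
    },
}


def resolve_config(configs, mode, workflow, task):
    """Return the config entry for a job, or None if the mode has no match."""
    entries = configs[mode]
    extended_task = f"{workflow}:{task}"
    # 1) exact match on "workflow:task", 2) exact match on the task alone
    for key in (extended_task, task):
        if key in entries:
            return entries[key]
    # 3) otherwise treat each key as a regular expression anchored at the start
    for candidate in (extended_task, task):
        for key, entry in entries.items():
            if re.match(key, candidate):
                return entry
    return None


print(resolve_config(CONFIGS, "fast", "rl_games", "Isaac-Ant-v0"))
print(resolve_config(CONFIGS, "fast", "skrl", "Isaac-Ant-v0"))
print(resolve_config(CONFIGS, "full", "rsl_rl", "Isaac-Velocity-Flat-Anymal-C-v0"))
```

Because keys go through `re.match`, a trailing `-*` means "zero or more hyphens" rather than a shell-style wildcard. It still behaves like a prefix match, since `re.match` anchors only at the start of the string.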

Each training and evaluation run is a single pytest test case, which
fails if any threshold is not met.
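
The pass/fail rule can be sketched roughly as follows, assuming metrics are plain numbers keyed by name. The helper `check_thresholds` and the measured values are hypothetical; only the threshold numbers are taken from the `rl_games:Isaac-Ant-v0` entry of the `fast` mode.

```python
def check_thresholds(metrics, lower_thresholds, upper_thresholds):
    """Return (passed, failures) for measured metrics against both bounds."""
    failures = []
    # A metric fails a lower threshold when it falls below the bound
    for name, bound in lower_thresholds.items():
        value = metrics.get(name)
        if value is not None and value < bound:
            failures.append(f"{name}={value} is below {bound}")
    # A metric fails an upper threshold when it exceeds the bound
    for name, bound in upper_thresholds.items():
        value = metrics.get(name)
        if value is not None and value > bound:
            failures.append(f"{name}={value} is above {bound}")
    return not failures, failures


# Measured values below are invented for illustration.
passed, failures = check_thresholds(
    metrics={"reward": 52.3, "episode_length": 870.0, "duration": 412.0},
    lower_thresholds={"reward": 45, "episode_length": 900},
    upper_thresholds={"duration": 500},
)
print(passed, failures)
```

In this made-up run the reward and duration are within bounds, but the episode length misses its lower threshold, so the job as a whole would be reported as failed.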

A KPI JSON file can be generated to summarize the training runs. It also
serves as the KPI payload that OSMO CI uploads to a Grafana dashboard
every week, using the `full` mode in the config. The dashboard can be
used to track improvements and regressions.
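
As a rough illustration of the kind of summary such a payload can carry, the sketch below counts passed and failed jobs per workflow from per-job results keyed as `workflow:task`. The function `summarize` and the sample payloads are assumptions made for this example and are simpler than the aggregation in this PR.

```python
import json
from collections import Counter


def summarize(kpi_payloads):
    """Count total, passed, and failed jobs per workflow."""
    summary = {}
    for job_id, payload in kpi_payloads.items():
        # Job ids are assumed to look like "workflow:task"
        workflow = job_id.split(":", 1)[0]
        counts = summary.setdefault(workflow, Counter())
        counts["total"] += 1
        counts["passed" if payload["success"] else "failed"] += 1
    return {workflow: dict(counts) for workflow, counts in summary.items()}


# Invented per-job results for illustration only.
sample_payloads = {
    "rl_games:Isaac-Ant-v0": {"success": True, "reward": 52.3},
    "rl_games:Isaac-Quadcopter-Direct-v0": {"success": False, "reward": 80.0},
    "skrl:Isaac-Shadow-Hand-Over-Direct-v0": {"success": True, "reward": 41.0},
}
print(json.dumps(summarize(sample_payloads), indent=2))
```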

Dashboard -
[LINK](https://grafana.nvidia.com/d/cejwy2w46gtmob/24b94e54-6b8b-50a4-969a-121dc573a34c?orgId=105)

## Type of change

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- This change requires a documentation update

## Screenshots

Please attach before and after screenshots of the change if applicable.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Co-authored-by: Kelly Guo <kellyguo123@hotmail.com>
parent 17f12fdf
[package]
# Note: Semantic Versioning is used: https://semver.org/
version = "0.10.31"
version = "0.10.32"
# Description
title = "Isaac Lab Environments"
......
Changelog
---------

0.10.32 (2025-05-21)
~~~~~~~~~~~~~~~~~~~~

Added
^^^^^

* Added unit tests for benchmarking environments with configurable settings. Output KPI payloads
  can be pushed to a visualization dashboard to track improvements or regressions.


0.10.31 (2025-04-02)
~~~~~~~~~~~~~~~~~~~~
......
# mode for very simple functional testing without checking thresholds
fast_test:
  rl_games:Isaac-Ant-v0:
    max_iterations: 10
    lower_thresholds:
      reward: -99999
      episode_length: -99999
    upper_thresholds:
      duration: 99999
# mode for capturing KPIs across all environments without checking thresholds
full_test:
  Isaac-*:
    max_iterations: 500
    lower_thresholds:
      reward: -99999
      episode_length: -99999
    upper_thresholds:
      duration: 99999
# mode for PR tests (default mode)
fast:
  rl_games:Isaac-Ant-v0:
    max_iterations: 200
    lower_thresholds:
      reward: 45
      episode_length: 900
    upper_thresholds:
      duration: 500
  skrl:Isaac-Cartpole-RGB-Camera-Direct-v0:
    max_iterations: 50
    lower_thresholds:
      reward: 190
      episode_length: 230
    upper_thresholds:
      duration: 450
  rsl_rl:Isaac-Humanoid-v0:
    max_iterations: 200
    lower_thresholds:
      reward: 50
      episode_length: 750
    upper_thresholds:
      duration: 500
  rl_games:Isaac-Quadcopter-Direct-v0:
    max_iterations: 200
    lower_thresholds:
      reward: 100
      episode_length: 400
    upper_thresholds:
      duration: 250
  skrl:Isaac-Shadow-Hand-Over-Direct-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 30
      episode_length: 250
    upper_thresholds:
      duration: 600
  rsl_rl:Isaac-Velocity-Rough-Anymal-C-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 7
      episode_length: 900
    upper_thresholds:
      duration: 1800
  rl_games:Isaac-Repose-Cube-Allegro-Direct-v0:
    max_iterations: 500
    lower_thresholds:
      reward: 200
      episode_length: 150
    upper_thresholds:
      duration: 1500
# mode for weekly CI
full:
  Isaac-Ant-Direct-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 7000
      episode_length: 700
    upper_thresholds:
      duration: 500
  Isaac-Ant-v0:
    max_iterations: 1000
    lower_thresholds:
      reward: 100
      episode_length: 700
    upper_thresholds:
      duration: 800
  Isaac-Cart-Double-Pendulum-Direct-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 400
      episode_length: 150
    upper_thresholds:
      duration: 500
  Isaac-Cartpole-Depth-Camera-Direct-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 200
      episode_length: 150
    upper_thresholds:
      duration: 3000
  Isaac-Cartpole-Depth-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 1
      episode_length: 150
    upper_thresholds:
      duration: 3000
  Isaac-Cartpole-Direct-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 200
      episode_length: 150
    upper_thresholds:
      duration: 500
  Isaac-Cartpole-RGB-Camera-Direct-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 200
      episode_length: 150
    upper_thresholds:
      duration: 3000
  Isaac-Cartpole-RGB-ResNet18-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 1
      episode_length: 100
    upper_thresholds:
      duration: 4000
  Isaac-Cartpole-RGB-TheiaTiny-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 1
      episode_length: 150
    upper_thresholds:
      duration: 4000
  Isaac-Cartpole-RGB-v0:
    max_iterations: 300
    lower_thresholds:
      reward: -2
      episode_length: 150
    upper_thresholds:
      duration: 4000
  Isaac-Cartpole-v0:
    max_iterations: 1000
    lower_thresholds:
      reward: 3
      episode_length: 150
    upper_thresholds:
      duration: 1500
  Isaac-Factory-GearMesh-Direct-v0:
    max_iterations: 100
    lower_thresholds:
      reward: 200
      episode_length: 250
    upper_thresholds:
      duration: 6000
  Isaac-Factory-NutThread-Direct-v0:
    max_iterations: 100
    lower_thresholds:
      reward: 400
      episode_length: 400
    upper_thresholds:
      duration: 5000
  Isaac-Factory-PegInsert-Direct-v0:
    max_iterations: 100
    lower_thresholds:
      reward: 125
      episode_length: 130
    upper_thresholds:
      duration: 4000
  Isaac-Franka-Cabinet-Direct-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 2000
      episode_length: 400
    upper_thresholds:
      duration: 1000
  Isaac-Humanoid-Direct-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 2000
      episode_length: 600
    upper_thresholds:
      duration: 1000
  Isaac-Humanoid-v0:
    max_iterations: 1000
    lower_thresholds:
      reward: 100
      episode_length: 600
    upper_thresholds:
      duration: 2500
  Isaac-Lift-Cube-Franka-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 90
      episode_length: 100
    upper_thresholds:
      duration: 1000
  Isaac-Navigation-Flat-Anymal-C-v0:
    max_iterations: 300
    lower_thresholds:
      reward: 0.5
      episode_length: 20
    upper_thresholds:
      duration: 2000
  Isaac-Open-Drawer-Franka-v0:
    max_iterations: 200
    lower_thresholds:
      reward: 60
      episode_length: 150
    upper_thresholds:
      duration: 3000
  Isaac-Quadcopter-Direct-v0:
    max_iterations: 500
    lower_thresholds:
      reward: 90
      episode_length: 300
    upper_thresholds:
      duration: 500
  Isaac-Reach-Franka-*:
    max_iterations: 1000
    lower_thresholds:
      reward: 0.25
      episode_length: 150
    upper_thresholds:
      duration: 1500
  Isaac-Reach-UR10-v0:
    max_iterations: 1000
    lower_thresholds:
      reward: 0.25
      episode_length: 150
    upper_thresholds:
      duration: 1500
  Isaac-Repose-Cube-Allegro-Direct-v0:
    max_iterations: 500
    lower_thresholds:
      reward: 200
      episode_length: 150
    upper_thresholds:
      duration: 1500
  Isaac-Repose-Cube-Allegro-*:
    max_iterations: 500
    lower_thresholds:
      reward: 15
      episode_length: 300
    upper_thresholds:
      duration: 1500
  Isaac-Repose-Cube-Shadow-Direct-v0:
    max_iterations: 3000
    lower_thresholds:
      reward: 1000
      episode_length: 300
    upper_thresholds:
      duration: 10000
  Isaac-Repose-Cube-Shadow-OpenAI-FF-Direct-v0:
    max_iterations: 3000
    lower_thresholds:
      reward: 1000
      episode_length: 50
    upper_thresholds:
      duration: 15000
  Isaac-Repose-Cube-Shadow-OpenAI-LSTM-Direct-v0:
    max_iterations: 3000
    lower_thresholds:
      reward: 1000
      episode_length: 100
    upper_thresholds:
      duration: 30000
  Isaac-Repose-Cube-Shadow-Vision-Direct-v0:
    max_iterations: 3000
    lower_thresholds:
      reward: 1000
      episode_length: 400
    upper_thresholds:
      duration: 40000
  Isaac-Shadow-Hand-Over-Direct-v0:
    max_iterations: 3000
    lower_thresholds:
      reward: 1000
      episode_length: 150
    upper_thresholds:
      duration: 10000
  Isaac-Velocity-Flat-*:
    max_iterations: 1000
    lower_thresholds:
      reward: 15
      episode_length: 700
    upper_thresholds:
      duration: 3000
  Isaac-Velocity-Flat-Spot-v0:
    max_iterations: 1000
    lower_thresholds:
      reward: 150
      episode_length: 700
    upper_thresholds:
      duration: 6000
  Isaac-Velocity-Rough-*:
    max_iterations: 1000
    lower_thresholds:
      reward: 7
      episode_length: 700
    upper_thresholds:
      duration: 6000
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

import json
import pytest

import test_utils as utils

# Global variable for storing KPI data
GLOBAL_KPI_STORE = {}


def pytest_addoption(parser):
    parser.addoption(
        "--workflows",
        action="store",
        nargs="+",
        default=["rl_games", "rsl_rl", "sb3", "skrl"],
        help="List of workflows. Must be equal to or a subset of the default list.",
    )
    parser.addoption(
        "--config_path",
        action="store",
        default="configs.yaml",
        help="Path to config file for environment training and evaluation.",
    )
    parser.addoption(
        "--mode",
        action="store",
        default="fast",
        help="Coverage mode defined in the config file.",
    )
    parser.addoption("--num_gpus", action="store", type=int, default=1, help="Number of GPUs for distributed training.")
    parser.addoption(
        "--save_kpi_payload",
        action="store_true",
        help="To collect output metrics into a KPI payload that can be uploaded to a dashboard.",
    )
    parser.addoption(
        "--tag",
        action="store",
        default="",
        help="Optional tag to add to the KPI payload for filtering on the Grafana dashboard.",
    )


@pytest.fixture
def workflows(request):
    return request.config.getoption("--workflows")


@pytest.fixture
def config_path(request):
    return request.config.getoption("--config_path")


@pytest.fixture
def mode(request):
    return request.config.getoption("--mode")


@pytest.fixture
def num_gpus(request):
    return request.config.getoption("--num_gpus")


@pytest.fixture
def save_kpi_payload(request):
    return request.config.getoption("--save_kpi_payload")


@pytest.fixture
def tag(request):
    return request.config.getoption("--tag")


# Fixture for storing KPI data in a global variable
@pytest.fixture(scope="session")
def kpi_store():
    return GLOBAL_KPI_STORE  # Using global variable for storing KPI data


# This hook dynamically generates test cases based on the --workflows option.
# For any test that includes a 'workflow' fixture, this will parametrize it
# with all values passed via the command line option --workflows.
def pytest_generate_tests(metafunc):
    if "workflow" in metafunc.fixturenames:
        workflows = metafunc.config.getoption("workflows")
        metafunc.parametrize("workflow", workflows)


# The pytest session finish hook
def pytest_sessionfinish(session, exitstatus):
    # Access global variable instead of fixture
    tag = session.config.getoption("--tag")
    utils.process_kpi_data(GLOBAL_KPI_STORE, tag=tag)
    print(json.dumps(GLOBAL_KPI_STORE, indent=2))
    save_kpi_payload = session.config.getoption("--save_kpi_payload")
    if save_kpi_payload:
        print("Saving KPI data...")
        utils.output_payloads(GLOBAL_KPI_STORE)
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""Launch Isaac Sim Simulator first."""

from isaaclab.app import AppLauncher

# Launch omniverse app
app_launcher = AppLauncher(headless=True, enable_cameras=True)
simulation_app = app_launcher.app

import gymnasium as gym
import os
import subprocess
import sys
import time

import carb
import pytest
import test_utils as utils

from isaaclab.utils.pretrained_checkpoint import WORKFLOW_EXPERIMENT_NAME_VARIABLE, WORKFLOW_TRAINER


def setup_environment():
    """Setup environment for testing."""
    # Acquire all Isaac environments names
    registered_task_specs = []
    for task_spec in gym.registry.values():
        if "Isaac" in task_spec.id and not task_spec.id.endswith("Play-v0"):
            registered_task_specs.append(task_spec)
    # Sort environments by name
    registered_task_specs.sort(key=lambda x: x.id)
    # This flag is necessary to prevent a bug where the simulation gets stuck randomly when running the
    # test on many environments.
    carb_settings_iface = carb.settings.get_settings()
    carb_settings_iface.set_bool("/physics/cooking/ujitsoCollisionCooking", False)
    return registered_task_specs


def train_job(workflow, task, env_config, num_gpus):
    """Train a single job for a given workflow, task, and configuration, and return the duration."""
    cmd = [sys.executable]
    # Multi-GPU runs go through the torch.distributed.run launcher, whose flags
    # (note the spelling: nproc_per_node) must come before the training script
    if num_gpus > 1:
        cmd.extend(["-m", "torch.distributed.run", "--nnodes=1", f"--nproc_per_node={num_gpus}"])
    cmd.extend([
        WORKFLOW_TRAINER[workflow],
        "--task",
        task,
        "--enable_cameras",
        "--headless",
    ])
    # Add max iterations if specified
    max_iterations = env_config.get("max_iterations")
    if max_iterations is not None:
        cmd.extend(["--max_iterations", str(max_iterations)])
    # Tell the training script itself to run in distributed mode
    if num_gpus > 1:
        cmd.append("--distributed")
    # Add experiment name variable
    cmd.append(f"{WORKFLOW_EXPERIMENT_NAME_VARIABLE[workflow]}={task}")
    print("Running : " + " ".join(cmd))
    start_time = time.time()
    subprocess.run(cmd)
    duration = time.time() - start_time
    return duration


@pytest.mark.parametrize("task_spec", setup_environment())
def test_train_environments(workflow, task_spec, config_path, mode, num_gpus, kpi_store):
    """Train environments provided in the config file, save KPIs, and evaluate against thresholds"""
    # Skip if workflow not supported for this task
    if workflow + "_cfg_entry_point" not in task_spec.kwargs:
        pytest.skip(f"Workflow {workflow} not supported for task {task_spec.id}")
    # Load environment config
    task = task_spec.id
    if config_path.startswith("/"):
        full_config_path = config_path
    else:
        full_config_path = os.path.join(os.path.dirname(__file__), config_path)
    env_configs = utils.get_env_configs(full_config_path)
    env_config = utils.get_env_config(env_configs, mode, workflow, task)
    # Skip if config not found
    if not env_config:
        pytest.skip(f"No config found for task {task} in {mode} mode")
    job_name = f"{workflow}:{task}"
    print(f">>> Training: {job_name}")
    # Train and capture duration
    duration = train_job(workflow, task, env_config, num_gpus)
    print(f">>> Evaluating trained: {job_name}")
    # Check if training logs were output and all thresholds passed
    kpi_payload = utils.evaluate_job(workflow, task, env_config, duration)
    success_flag = kpi_payload["success"]
    print(f">>> Trained {job_name} success flag: {success_flag}.")
    print("-" * 80)
    # Save KPI
    kpi_store[job_name] = kpi_payload
    # Verify job was successful
    if not kpi_payload["success"]:
        pytest.fail(f"Job {job_name} failed to meet success criteria")
# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

import glob
import json
import math
import numpy as np
import os
import re
import yaml
from datetime import datetime

import carb
from tensorboard.backend.event_processing import event_accumulator


def get_env_configs(configs_path):
    """Get environment configurations from yaml filepath."""
    with open(configs_path) as env_configs_file:
        env_configs = yaml.safe_load(env_configs_file)
    return env_configs


def get_env_config(env_configs, mode, workflow, task):
    """Get the environment configuration."""
    if mode not in env_configs:
        raise ValueError(f"Mode {mode} is not supported in the config file.")
    extended_task = f"{workflow}:{task}"
    # return a direct match with extended task name
    if extended_task in env_configs[mode]:
        return env_configs[mode][extended_task]
    # else, return a direct match with task name
    if task in env_configs[mode]:
        return env_configs[mode][task]
    # else, return a regex match with extended task name
    for env_config_key in env_configs[mode].keys():
        if re.match(env_config_key, extended_task):
            return env_configs[mode][env_config_key]
    # else, return a regex match with task name
    for env_config_key in env_configs[mode].keys():
        if re.match(env_config_key, task):
            return env_configs[mode][env_config_key]
    # if no match is found, return None
    return None


def evaluate_job(workflow, task, env_config, duration):
    """Evaluate the job."""
    log_data = _retrieve_logs(workflow, task)
    kpi_payload = {"success": True, "msg": ""}
    # handle case where no log files are found
    if not log_data:
        kpi_payload["success"] = False
        kpi_payload["msg"] = "error: training did not finish!"
        return kpi_payload
    # use .get() for both sections so a config without lower thresholds does not raise KeyError
    lower_thresholds = env_config.get("lower_thresholds", {})
    thresholds = {**lower_thresholds, **env_config.get("upper_thresholds", {})}
    # evaluate all thresholds from the config
    for threshold_name, threshold_val in thresholds.items():
        uses_lower_threshold = threshold_name in lower_thresholds
        if threshold_name == "duration":
            val = duration
        else:
            val = _extract_log_val(threshold_name, log_data, uses_lower_threshold, workflow)
        # skip non-numeric values
        if val is None or not isinstance(val, (int, float)) or (isinstance(val, float) and math.isnan(val)):
            continue
        val = round(val, 4)
        if uses_lower_threshold:
            # print(f"{threshold_name}: {val} > {round(threshold_val, 4)}")
            if val < threshold_val:
                kpi_payload["success"] = False
        else:
            # print(f"{threshold_name}: {val} < {round(threshold_val, 4)}")
            if val > threshold_val:
                kpi_payload["success"] = False
        kpi_payload[threshold_name] = val
        if threshold_name == "reward":
            normalized_reward = val / threshold_val
            kpi_payload[f"{threshold_name}_normalized"] = normalized_reward
        kpi_payload[f"{threshold_name}_threshold"] = threshold_val
    # add max iterations to the payload
    max_iterations = env_config.get("max_iterations")
    if max_iterations is not None:
        kpi_payload["max_iterations"] = max_iterations
    return kpi_payload


def process_kpi_data(kpi_payloads, tag=""):
    """Combine and augment the KPI payloads."""
    # accumulate workflow outcomes
    totals = {}
    successes = {}
    failures_did_not_finish = {}
    failures_did_not_pass_thresholds = {}
    for job_id, kpi_payload in kpi_payloads.items():
        workflow = job_id.split(":")[0]
        if workflow not in totals:
            totals[workflow] = 0
            successes[workflow] = 0
            failures_did_not_finish[workflow] = 0
            failures_did_not_pass_thresholds[workflow] = 0
        totals[workflow] += 1
        if kpi_payload["success"]:
            successes[workflow] += 1
        else:
            if kpi_payload["msg"] == "error: training did not finish!":
                failures_did_not_finish[workflow] += 1
            else:
                failures_did_not_pass_thresholds[workflow] += 1
    kpi_payloads["overall"] = {
        "totals": totals,
        "successes": successes,
        "failures_did_not_finish": failures_did_not_finish,
        "failures_did_not_pass_thresholds": failures_did_not_pass_thresholds,
        "timestamp": datetime.now().isoformat(),
        "tag": tag,
    }
    return kpi_payloads


def output_payloads(payloads):
    """Output the KPI payloads to a json file."""
    # resolve the output file path under the repository's logs directory
    repo_path = os.path.join(carb.tokens.get_tokens_interface().resolve("${app}"), "..")
    output_path = os.path.join(repo_path, "logs/kpi.json")
    # create directory if it doesn't exist
    if not os.path.exists(os.path.dirname(output_path)):
        os.makedirs(os.path.dirname(output_path))
    # save file
    with open(output_path, "w") as payload_file:
        json.dump(payloads, payload_file, indent=4)


def _retrieve_logs(workflow, task):
    """Retrieve training logs."""
    # first grab all log files
    repo_path = os.path.join(carb.tokens.get_tokens_interface().resolve("${app}"), "..")
    if workflow == "rl_games":
        log_files_path = os.path.join(repo_path, f"logs/{workflow}/{task}/*/summaries/*")
    else:
        log_files_path = os.path.join(repo_path, f"logs/{workflow}/{task}/*/*.tfevents.*")
    log_files = glob.glob(log_files_path)
    # handle case where no log files are found
    if not log_files:
        return None
    # find most recent
    latest_log_file = max(log_files, key=os.path.getctime)
    # parse tf file into a dictionary
    log_data = _parse_tf_logs(latest_log_file)
    return log_data


def _parse_tf_logs(log):
    """Parse the tensorflow filepath into a dictionary."""
    log_data = {}
    ea = event_accumulator.EventAccumulator(log)
    ea.Reload()
    tags = ea.Tags()["scalars"]
    for tag in tags:
        log_data[tag] = []
        for event in ea.Scalars(tag):
            log_data[tag].append((event.step, event.value))
    return log_data


def _extract_log_val(name, log_data, uses_lower_threshold, workflow):
    """Extract the value from the log data."""
    try:
        if name == "reward":
            reward_tags = {
                "rl_games": "rewards/iter",
                "rsl_rl": "Train/mean_reward",
                "sb3": None,  # TODO: complete when sb3 is fixed
                "skrl": "Reward / Total reward (mean)",
            }
            tag = reward_tags.get(workflow)
            if tag:
                return _extract_reward(log_data, tag)
        elif name == "episode_length":
            episode_tags = {
                "rl_games": "episode_lengths/iter",
                "rsl_rl": "Train/mean_episode_length",
                "sb3": None,  # TODO: complete when sb3 is fixed
                "skrl": "Episode / Total timesteps (mean)",
            }
            tag = episode_tags.get(workflow)
            if tag:
                return _extract_feature(log_data, tag, uses_lower_threshold)
        elif name == "training_time":
            return {"rl_games": log_data["rewards/time"][-1][0], "rsl_rl": None, "sb3": None, "skrl": None}.get(
                workflow
            )
    except Exception:
        return None
    raise ValueError(f"Env Config name {name} is not supported.")


def _extract_feature(log_data, feature, uses_lower_threshold):
    """Extract the feature from the log data."""
    log_data = np.array(log_data[feature])[:, 1]
    if uses_lower_threshold:
        return max(log_data)
    else:
        return min(log_data)


def _extract_reward(log_data, feature, k=8):
    """Extract the averaged max reward from the log data."""
    log_data = np.array(log_data[feature])[:, 1]
    # find avg of k max values
    k = min(len(log_data), k)
    averaged_reward = np.mean(np.partition(log_data, -k)[-k:])
    return averaged_reward
......@@ -30,6 +30,7 @@ PER_TEST_TIMEOUTS = {
"test_skrl_wrapper.py": 200,
"test_operational_space.py": 300,
"test_terrain_importer.py": 200,
"test_environments_training.py": 5000,
}
"""A dictionary of tests and their timeouts in seconds.
......