Adds training benchmark unit tests with input config (#2503)

# Description

Adds unit tests that train and evaluate environments across all workflows, driven by an input config. The config determines which environments to train, how long to train them, and the KPI thresholds each training must meet. Every training + evaluation run is one pytest, which fails if any of the thresholds aren't met.

A KPI JSON file can be created to summarize the trainings. This JSON also serves as the KPI payload that will be uploaded to a Grafana dashboard every week by OSMO CI, using the `full` mode in the config. The dashboard can be used to track improvements and regressions.

Dashboard - [LINK](https://grafana.nvidia.com/d/cejwy2w46gtmob/24b94e54-6b8b-50a4-969a-121dc573a34c?orgId=105)

## Type of change

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- This change requires a documentation update

## Screenshots

Please attach before and after screenshots of the change if applicable.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Co-authored-by: Kelly Guo <kellyguo123@hotmail.com>

Unverified Commit 4737968a authored May 23, 2025 by matthewtrepte Committed by GitHub May 23, 2025
7 changed files
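The pass/fail rule from the description (a run fails if any threshold is missed) can be sketched in isolation. This is a minimal illustration, not the code from this commit: `check_thresholds` is a hypothetical helper, and the measured metrics in the example are made up. Only the threshold values are taken from the `fast` entry for `rl_games:Isaac-Ant-v0` in `configs.yaml`.

```python
def check_thresholds(metrics, lower_thresholds, upper_thresholds):
    """Return (success, failures) for measured metrics against configured bounds."""
    failures = []
    for name, bound in lower_thresholds.items():
        value = metrics.get(name)
        # Metrics that were not measured are skipped rather than counted as failures.
        if value is not None and value < bound:
            failures.append(f"{name}={value} is below the lower threshold {bound}")
    for name, bound in upper_thresholds.items():
        value = metrics.get(name)
        if value is not None and value > bound:
            failures.append(f"{name}={value} is above the upper threshold {bound}")
    return not failures, failures


# Thresholds from the `fast` entry for rl_games:Isaac-Ant-v0; the metrics are invented.
lower = {"reward": 45, "episode_length": 900}
upper = {"duration": 500}
ok, failures = check_thresholds(
    {"reward": 52.3, "episode_length": 870.0, "duration": 412.0}, lower, upper
)
print(ok, failures)
```

In this example the run fails only because the episode length falls short of its lower bound. Reward and duration are both within limits.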
--- a/source/isaaclab_tasks/config/extension.toml
+++ b/source/isaaclab_tasks/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.10.31"
+version = "0.10.32"

 # Description
 title = "Isaac Lab Environments"

--- a/source/isaaclab_tasks/docs/CHANGELOG.rst
+++ b/source/isaaclab_tasks/docs/CHANGELOG.rst
 Changelog
 ---------

+0.10.32 (2025-05-21)
+~~~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added unit tests for benchmarking environments with configurable settings. Output KPI payloads
+  can be pushed to a visualization dashboard to track improvements or regressions.
+
+
 0.10.31 (2025-04-02)
 ~~~~~~~~~~~~~~~~~~~~


--- a/source/isaaclab_tasks/test/benchmarking/configs.yaml
+++ b/source/isaaclab_tasks/test/benchmarking/configs.yaml
+# mode for very simple functional testing without checking thresholds
+fast_test:
+  rl_games:Isaac-Ant-v0:
+    max_iterations: 10
+    lower_thresholds:
+      reward: -99999
+      episode_length: -99999
+    upper_thresholds:
+      duration: 99999
+
+# mode for capturing KPIs across all environments without checking thresholds
+full_test:
+  Isaac-*:
+    max_iterations: 500
+    lower_thresholds:
+      reward: -99999
+      episode_length: -99999
+    upper_thresholds:
+      duration: 99999
+
+# mode for PR tests (default mode)
+fast:
+  rl_games:Isaac-Ant-v0:
+    max_iterations: 200
+    lower_thresholds:
+      reward: 45
+      episode_length: 900
+    upper_thresholds:
+      duration: 500
+  skrl:Isaac-Cartpole-RGB-Camera-Direct-v0:
+    max_iterations: 50
+    lower_thresholds:
+      reward: 190
+      episode_length: 230
+    upper_thresholds:
+      duration: 450
+  rsl_rl:Isaac-Humanoid-v0:
+    max_iterations: 200
+    lower_thresholds:
+      reward: 50
+      episode_length: 750
+    upper_thresholds:
+      duration: 500
+  rl_games:Isaac-Quadcopter-Direct-v0:
+    max_iterations: 200
+    lower_thresholds:
+      reward: 100
+      episode_length: 400
+    upper_thresholds:
+      duration: 250
+  skrl:Isaac-Shadow-Hand-Over-Direct-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 30
+      episode_length: 250
+    upper_thresholds:
+      duration: 600
+  rsl_rl:Isaac-Velocity-Rough-Anymal-C-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 7
+      episode_length: 900
+    upper_thresholds:
+      duration: 1800
+  rl_games:Isaac-Repose-Cube-Allegro-Direct-v0:
+    max_iterations: 500
+    lower_thresholds:
+      reward: 200
+      episode_length: 150
+    upper_thresholds:
+      duration: 1500
+
+
+# mode for weekly CI
+full:
+  Isaac-Ant-Direct-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 7000
+      episode_length: 700
+    upper_thresholds:
+      duration: 500
+  Isaac-Ant-v0:
+    max_iterations: 1000
+    lower_thresholds:
+      reward: 100
+      episode_length: 700
+    upper_thresholds:
+      duration: 800
+  Isaac-Cart-Double-Pendulum-Direct-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 400
+      episode_length: 150
+    upper_thresholds:
+      duration: 500
+  Isaac-Cartpole-Depth-Camera-Direct-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 200
+      episode_length: 150
+    upper_thresholds:
+      duration: 3000
+  Isaac-Cartpole-Depth-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 1
+      episode_length: 150
+    upper_thresholds:
+      duration: 3000
+  Isaac-Cartpole-Direct-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 200
+      episode_length: 150
+    upper_thresholds:
+      duration: 500
+  Isaac-Cartpole-RGB-Camera-Direct-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 200
+      episode_length: 150
+    upper_thresholds:
+      duration: 3000
+  Isaac-Cartpole-RGB-ResNet18-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 1
+      episode_length: 100
+    upper_thresholds:
+      duration: 4000
+  Isaac-Cartpole-RGB-TheiaTiny-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 1
+      episode_length: 150
+    upper_thresholds:
+      duration: 4000
+  Isaac-Cartpole-RGB-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: -2
+      episode_length: 150
+    upper_thresholds:
+      duration: 4000
+  Isaac-Cartpole-v0:
+    max_iterations: 1000
+    lower_thresholds:
+      reward: 3
+      episode_length: 150
+    upper_thresholds:
+      duration: 1500
+  Isaac-Factory-GearMesh-Direct-v0:
+    max_iterations: 100
+    lower_thresholds:
+      reward: 200
+      episode_length: 250
+    upper_thresholds:
+      duration: 6000
+  Isaac-Factory-NutThread-Direct-v0:
+    max_iterations: 100
+    lower_thresholds:
+      reward: 400
+      episode_length: 400
+    upper_thresholds:
+      duration: 5000
+  Isaac-Factory-PegInsert-Direct-v0:
+    max_iterations: 100
+    lower_thresholds:
+      reward: 125
+      episode_length: 130
+    upper_thresholds:
+      duration: 4000
+  Isaac-Franka-Cabinet-Direct-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 2000
+      episode_length: 400
+    upper_thresholds:
+      duration: 1000
+  Isaac-Humanoid-Direct-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 2000
+      episode_length: 600
+    upper_thresholds:
+      duration: 1000
+  Isaac-Humanoid-v0:
+    max_iterations: 1000
+    lower_thresholds:
+      reward: 100
+      episode_length: 600
+    upper_thresholds:
+      duration: 2500
+  Isaac-Lift-Cube-Franka-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 90
+      episode_length: 100
+    upper_thresholds:
+      duration: 1000
+  Isaac-Navigation-Flat-Anymal-C-v0:
+    max_iterations: 300
+    lower_thresholds:
+      reward: 0.5
+      episode_length: 20
+    upper_thresholds:
+      duration: 2000
+  Isaac-Open-Drawer-Franka-v0:
+    max_iterations: 200
+    lower_thresholds:
+      reward: 60
+      episode_length: 150
+    upper_thresholds:
+      duration: 3000
+  Isaac-Quadcopter-Direct-v0:
+    max_iterations: 500
+    lower_thresholds:
+      reward: 90
+      episode_length: 300
+    upper_thresholds:
+      duration: 500
+  Isaac-Reach-Franka-*:
+    max_iterations: 1000
+    lower_thresholds:
+      reward: 0.25
+      episode_length: 150
+    upper_thresholds:
+      duration: 1500
+  Isaac-Reach-UR10-v0:
+    max_iterations: 1000
+    lower_thresholds:
+      reward: 0.25
+      episode_length: 150
+    upper_thresholds:
+      duration: 1500
+  Isaac-Repose-Cube-Allegro-Direct-v0:
+    max_iterations: 500
+    lower_thresholds:
+      reward: 200
+      episode_length: 150
+    upper_thresholds:
+      duration: 1500
+  Isaac-Repose-Cube-Allegro-*:
+    max_iterations: 500
+    lower_thresholds:
+      reward: 15
+      episode_length: 300
+    upper_thresholds:
+      duration: 1500
+  Isaac-Repose-Cube-Shadow-Direct-v0:
+    max_iterations: 3000
+    lower_thresholds:
+      reward: 1000
+      episode_length: 300
+    upper_thresholds:
+      duration: 10000
+  Isaac-Repose-Cube-Shadow-OpenAI-FF-Direct-v0:
+    max_iterations: 3000
+    lower_thresholds:
+      reward: 1000
+      episode_length: 50
+    upper_thresholds:
+      duration: 15000
+  Isaac-Repose-Cube-Shadow-OpenAI-LSTM-Direct-v0:
+    max_iterations: 3000
+    lower_thresholds:
+      reward: 1000
+      episode_length: 100
+    upper_thresholds:
+      duration: 30000
+  Isaac-Repose-Cube-Shadow-Vision-Direct-v0:
+    max_iterations: 3000
+    lower_thresholds:
+      reward: 1000
+      episode_length: 400
+    upper_thresholds:
+      duration: 40000
+  Isaac-Shadow-Hand-Over-Direct-v0:
+    max_iterations: 3000
+    lower_thresholds:
+      reward: 1000
+      episode_length: 150
+    upper_thresholds:
+      duration: 10000
+  Isaac-Velocity-Flat-*:
+    max_iterations: 1000
+    lower_thresholds:
+      reward: 15
+      episode_length: 700
+    upper_thresholds:
+      duration: 3000
+  Isaac-Velocity-Flat-Spot-v0:
+    max_iterations: 1000
+    lower_thresholds:
+      reward: 150
+      episode_length: 700
+    upper_thresholds:
+      duration: 6000
+  Isaac-Velocity-Rough-*:
+    max_iterations: 1000
+    lower_thresholds:
+      reward: 7
+      episode_length: 700
+    upper_thresholds:
+      duration: 6000
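Several keys in this config overlap (for example `Isaac-Velocity-Flat-*` and `Isaac-Velocity-Flat-Spot-v0`), so the lookup order matters. The sketch below restates the precedence used by `get_env_config` in `test_utils.py` from this commit: exact keys are tried before pattern keys, and the workflow-qualified name is tried before the bare task name. `lookup_config` is a hypothetical name, and the config entries are trimmed to one illustrative field each.

```python
import re


def lookup_config(mode_configs, workflow, task):
    """Resolve a job's config: exact keys first, then keys treated as regular expressions."""
    extended_task = f"{workflow}:{task}"
    # Pass 1: exact matches, workflow-qualified name before bare task name.
    for candidate in (extended_task, task):
        if candidate in mode_configs:
            return mode_configs[candidate]
    # Pass 2: every key is tried as a regex anchored at the start of the name.
    for candidate in (extended_task, task):
        for key, config in mode_configs.items():
            if re.match(key, candidate):
                return config
    return None


mode_configs = {
    "Isaac-Velocity-Flat-*": {"reward": 15},
    "Isaac-Velocity-Flat-Spot-v0": {"reward": 150},
    "rl_games:Isaac-Ant-v0": {"reward": 45},
}

spot = lookup_config(mode_configs, "rsl_rl", "Isaac-Velocity-Flat-Spot-v0")
anymal = lookup_config(mode_configs, "rsl_rl", "Isaac-Velocity-Flat-Anymal-C-v0")
ant_skrl = lookup_config(mode_configs, "skrl", "Isaac-Ant-v0")
print(spot, anymal, ant_skrl)
```

Because keys are passed to `re.match`, a trailing `-*` is a regular expression ("zero or more hyphens"), not a shell-style wildcard. Combined with `re.match` anchoring only at the start of the string, the key behaves as a prefix match, which is why `Isaac-Velocity-Flat-*` covers the Anymal task here while the exact Spot key still takes priority.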
--- a/source/isaaclab_tasks/test/benchmarking/conftest.py
+++ b/source/isaaclab_tasks/test/benchmarking/conftest.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+import json
+
+import pytest
+import test_utils as utils
+
+# Global variable for storing KPI data
+GLOBAL_KPI_STORE = {}
+
+
+def pytest_addoption(parser):
+    parser.addoption(
+        "--workflows",
+        action="store",
+        nargs="+",
+        default=["rl_games", "rsl_rl", "sb3", "skrl"],
+        help="List of workflows. Must be equal to or a subset of the default list.",
+    )
+    parser.addoption(
+        "--config_path",
+        action="store",
+        default="configs.yaml",
+        help="Path to config file for environment training and evaluation.",
+    )
+    parser.addoption(
+        "--mode",
+        action="store",
+        default="fast",
+        help="Coverage mode defined in the config file.",
+    )
+    parser.addoption("--num_gpus", action="store", type=int, default=1, help="Number of GPUs for distributed training.")
+    parser.addoption(
+        "--save_kpi_payload",
+        action="store_true",
+        help="Collect output metrics into a KPI payload that can be uploaded to a dashboard.",
+    )
+    parser.addoption(
+        "--tag",
+        action="store",
+        default="",
+        help="Optional tag to add to the KPI payload for filtering on the Grafana dashboard.",
+    )
+
+
+@pytest.fixture
+def workflows(request):
+    return request.config.getoption("--workflows")
+
+
+@pytest.fixture
+def config_path(request):
+    return request.config.getoption("--config_path")
+
+
+@pytest.fixture
+def mode(request):
+    return request.config.getoption("--mode")
+
+
+@pytest.fixture
+def num_gpus(request):
+    return request.config.getoption("--num_gpus")
+
+
+@pytest.fixture
+def save_kpi_payload(request):
+    return request.config.getoption("--save_kpi_payload")
+
+
+@pytest.fixture
+def tag(request):
+    return request.config.getoption("--tag")
+
+
+# Fixture for storing KPI data in a global variable
+@pytest.fixture(scope="session")
+def kpi_store():
+    return GLOBAL_KPI_STORE  # Using global variable for storing KPI data
+
+
+# This hook dynamically generates test cases based on the --workflows option.
+# For any test that includes a 'workflow' fixture, this will parametrize it
+# with all values passed via the command line option --workflows.
+def pytest_generate_tests(metafunc):
+    if "workflow" in metafunc.fixturenames:
+        workflows = metafunc.config.getoption("workflows")
+        metafunc.parametrize("workflow", workflows)
+
+
+# The pytest session finish hook
+def pytest_sessionfinish(session, exitstatus):
+    # Access global variable instead of fixture
+    tag = session.config.getoption("--tag")
+    utils.process_kpi_data(GLOBAL_KPI_STORE, tag=tag)
+    print(json.dumps(GLOBAL_KPI_STORE, indent=2))
+    save_kpi_payload = session.config.getoption("--save_kpi_payload")
+    if save_kpi_payload:
+        print("Saving KPI data...")
+        utils.output_payloads(GLOBAL_KPI_STORE)
--- a/source/isaaclab_tasks/test/benchmarking/test_environments_training.py
+++ b/source/isaaclab_tasks/test/benchmarking/test_environments_training.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+"""Launch Isaac Sim Simulator first."""
+
+from isaaclab.app import AppLauncher
+
+# Launch omniverse app
+app_launcher = AppLauncher(headless=True, enable_cameras=True)
+simulation_app = app_launcher.app
+
+import gymnasium as gym
+import os
+import subprocess
+import sys
+import time
+
+import carb
+import pytest
+import test_utils as utils
+
+from isaaclab.utils.pretrained_checkpoint import WORKFLOW_EXPERIMENT_NAME_VARIABLE, WORKFLOW_TRAINER
+
+
+def setup_environment():
+    """Set up the environment for testing."""
+    # Acquire all Isaac environment names
+    registered_task_specs = []
+    for task_spec in gym.registry.values():
+        if "Isaac" in task_spec.id and not task_spec.id.endswith("Play-v0"):
+            registered_task_specs.append(task_spec)
+
+    # Sort environments by name
+    registered_task_specs.sort(key=lambda x: x.id)
+
+    # This flag is necessary to prevent a bug where the simulation gets stuck randomly when running the
+    # test on many environments.
+    carb_settings_iface = carb.settings.get_settings()
+    carb_settings_iface.set_bool("/physics/cooking/ujitsoCollisionCooking", False)
+
+    return registered_task_specs
+
+
+def train_job(workflow, task, env_config, num_gpus):
+    """Train a single job for a given workflow, task, and configuration, and return the duration."""
+    cmd = [
+        sys.executable,
+        WORKFLOW_TRAINER[workflow],
+        "--task",
+        task,
+        "--enable_cameras",
+        "--headless",
+    ]
+
+    # Add max iterations if specified
+    max_iterations = env_config.get("max_iterations")
+    if max_iterations is not None:
+        cmd.extend(["--max_iterations", str(max_iterations)])
+
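+    if num_gpus > 1:
+        # Multi-GPU runs go through the torch.distributed launcher, which owns --nproc_per_node
+        cmd[1:1] = ["-m", "torch.distributed.run", "--nnodes=1", f"--nproc_per_node={num_gpus}"]
+        cmd.append("--distributed")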
+
+    # Add experiment name variable
+    cmd.append(f"{WORKFLOW_EXPERIMENT_NAME_VARIABLE[workflow]}={task}")
+
+    print("Running : " + " ".join(cmd))
+
+    start_time = time.time()
+    subprocess.run(cmd)
+    duration = time.time() - start_time
+
+    return duration
+
+
+@pytest.mark.parametrize("task_spec", setup_environment())
+def test_train_environments(workflow, task_spec, config_path, mode, num_gpus, kpi_store):
+    """Train environments provided in the config file, save KPIs, and evaluate against thresholds"""
+    # Skip if workflow not supported for this task
+    if workflow + "_cfg_entry_point" not in task_spec.kwargs:
+        pytest.skip(f"Workflow {workflow} not supported for task {task_spec.id}")
+
+    # Load environment config
+    task = task_spec.id
+    if os.path.isabs(config_path):
+        full_config_path = config_path
+    else:
+        full_config_path = os.path.join(os.path.dirname(__file__), config_path)
+    env_configs = utils.get_env_configs(full_config_path)
+    env_config = utils.get_env_config(env_configs, mode, workflow, task)
+
+    # Skip if config not found
+    if not env_config:
+        pytest.skip(f"No config found for task {task} in {mode} mode")
+
+    job_name = f"{workflow}:{task}"
+    print(f">>> Training: {job_name}")
+
+    # Train and capture duration
+    duration = train_job(workflow, task, env_config, num_gpus)
+
+    print(f">>> Evaluating trained: {job_name}")
+    # Check if training logs were output and all thresholds passed
+    kpi_payload = utils.evaluate_job(workflow, task, env_config, duration)
+
+    success_flag = kpi_payload["success"]
+    print(f">>> Trained {job_name} success flag: {success_flag}.")
+    print("-" * 80)
+
+    # Save KPI
+    kpi_store[job_name] = kpi_payload
+
+    # Verify job was successful
+    if not kpi_payload["success"]:
+        pytest.fail(f"Job {job_name} failed to meet success criteria")
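The test above is parametrized twice: once over `workflow` (by `pytest_generate_tests` in `conftest.py`) and once over `task_spec`, so pytest generates the full cross product and unsupported combinations are skipped at run time. The sketch below illustrates that expansion without pytest. `expand_jobs` is a hypothetical helper, and the task names and entry-point values are invented placeholders, not real registry contents.

```python
from itertools import product


def expand_jobs(workflows, task_kwargs):
    """Split the workflow x task cross product into runnable and skipped job names."""
    jobs, skipped = [], []
    for workflow, (task, kwargs) in product(workflows, sorted(task_kwargs.items())):
        job_name = f"{workflow}:{task}"
        # A task supports a workflow only if it registers that workflow's entry point.
        if f"{workflow}_cfg_entry_point" in kwargs:
            jobs.append(job_name)
        else:
            skipped.append(job_name)
    return jobs, skipped


# Hypothetical registry: task id -> registration kwargs.
task_kwargs = {
    "Isaac-Example-A-v0": {"rl_games_cfg_entry_point": "<entry point>"},
    "Isaac-Example-B-v0": {
        "rl_games_cfg_entry_point": "<entry point>",
        "skrl_cfg_entry_point": "<entry point>",
    },
}

jobs, skipped = expand_jobs(["rl_games", "skrl"], task_kwargs)
print(jobs)
print(skipped)
```

The `workflow:task` job name is the same key the test uses when it writes into `kpi_store`, which is what lets the session-finish hook group results by workflow.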
--- a/source/isaaclab_tasks/test/benchmarking/test_utils.py
+++ b/source/isaaclab_tasks/test/benchmarking/test_utils.py
+# Copyright (c) 2022-2025, The Isaac Lab Project Developers.
+# All rights reserved.
+#
+# SPDX-License-Identifier: BSD-3-Clause
+
+import glob
+import json
+import math
+import numpy as np
+import os
+import re
+import yaml
+from datetime import datetime
+
+import carb
+from tensorboard.backend.event_processing import event_accumulator
+
+
+def get_env_configs(configs_path):
+    """Get environment configurations from yaml filepath."""
+    with open(configs_path) as env_configs_file:
+        env_configs = yaml.safe_load(env_configs_file)
+    return env_configs
+
+
+def get_env_config(env_configs, mode, workflow, task):
+    """Get the environment configuration."""
+    if mode not in env_configs:
+        raise ValueError(f"Mode {mode} is not supported in the config file.")
+
+    extended_task = f"{workflow}:{task}"
+    # return a direct match with extended task name
+    if extended_task in env_configs[mode]:
+        return env_configs[mode][extended_task]
+
+    # else, return a direct match with task name
+    if task in env_configs[mode]:
+        return env_configs[mode][task]
+
+    # else, return a regex match with extended task name
+    for env_config_key in env_configs[mode].keys():
+        if re.match(env_config_key, extended_task):
+            return env_configs[mode][env_config_key]
+
+    # else, return a regex match with task name
+    for env_config_key in env_configs[mode].keys():
+        if re.match(env_config_key, task):
+            return env_configs[mode][env_config_key]
+
+    # if no match is found, return None
+    return None
+
+
+def evaluate_job(workflow, task, env_config, duration):
+    """Evaluate the job."""
+    log_data = _retrieve_logs(workflow, task)
+
+    kpi_payload = {"success": True, "msg": ""}
+
+    # handle case where no log files are found
+    if not log_data:
+        kpi_payload["success"] = False
+        kpi_payload["msg"] = "error: training did not finish!"
+        return kpi_payload
+
+    thresholds = {**env_config.get("lower_thresholds", {}), **env_config.get("upper_thresholds", {})}
+
+    # evaluate all thresholds from the config
+    for threshold_name, threshold_val in thresholds.items():
+        uses_lower_threshold = threshold_name in env_config.get("lower_thresholds", {})
+        if threshold_name == "duration":
+            val = duration
+        else:
+            val = _extract_log_val(threshold_name, log_data, uses_lower_threshold, workflow)
+        # skip non-numeric values
+        if val is None or not isinstance(val, (int, float)) or (isinstance(val, float) and math.isnan(val)):
+            continue
+        val = round(val, 4)
+        if uses_lower_threshold:
+            # print(f"{threshold_name}: {val} > {round(threshold_val, 4)}")
+            if val < threshold_val:
+                kpi_payload["success"] = False
+        else:
+            # print(f"{threshold_name}: {val} < {round(threshold_val, 4)}")
+            if val > threshold_val:
+                kpi_payload["success"] = False
+        kpi_payload[threshold_name] = val
+        if threshold_name == "reward":
+            normalized_reward = val / threshold_val
+            kpi_payload[f"{threshold_name}_normalized"] = normalized_reward
+        kpi_payload[f"{threshold_name}_threshold"] = threshold_val
+
+    # add max iterations to the payload
+    max_iterations = env_config.get("max_iterations")
+    if max_iterations is not None:
+        kpi_payload["max_iterations"] = max_iterations
+
+    return kpi_payload
+
+
+def process_kpi_data(kpi_payloads, tag=""):
+    """Combine and augment the KPI payloads."""
+    # accumulate workflow outcomes
+    totals = {}
+    successes = {}
+    failures_did_not_finish = {}
+    failures_did_not_pass_thresholds = {}
+    for job_id, kpi_payload in kpi_payloads.items():
+        workflow = job_id.split(":")[0]
+        if workflow not in totals:
+            totals[workflow] = 0
+            successes[workflow] = 0
+            failures_did_not_finish[workflow] = 0
+            failures_did_not_pass_thresholds[workflow] = 0
+        totals[workflow] += 1
+        if kpi_payload["success"]:
+            successes[workflow] += 1
+        else:
+            if kpi_payload["msg"] == "error: training did not finish!":
+                failures_did_not_finish[workflow] += 1
+            else:
+                failures_did_not_pass_thresholds[workflow] += 1
+
+    kpi_payloads["overall"] = {
+        "totals": totals,
+        "successes": successes,
+        "failures_did_not_finish": failures_did_not_finish,
+        "failures_did_not_pass_thresholds": failures_did_not_pass_thresholds,
+        "timestamp": datetime.now().isoformat(),
+        "tag": tag,
+    }
+
+    return kpi_payloads
+
+
+def output_payloads(payloads):
+    """Output the KPI payloads to a json file."""
+    # resolve the output path relative to the repository root
+    repo_path = os.path.join(carb.tokens.get_tokens_interface().resolve("${app}"), "..")
+    output_path = os.path.join(repo_path, "logs/kpi.json")
+    # create the output directory if it doesn't exist
+    os.makedirs(os.path.dirname(output_path), exist_ok=True)
+    # save file
+    with open(output_path, "w") as payload_file:
+        json.dump(payloads, payload_file, indent=4)
+
+
+def _retrieve_logs(workflow, task):
+    """Retrieve training logs."""
+    # first grab all log files
+    repo_path = os.path.join(carb.tokens.get_tokens_interface().resolve("${app}"), "..")
+    if workflow == "rl_games":
+        log_files_path = os.path.join(repo_path, f"logs/{workflow}/{task}/*/summaries/*")
+    else:
+        log_files_path = os.path.join(repo_path, f"logs/{workflow}/{task}/*/*.tfevents.*")
+    log_files = glob.glob(log_files_path)
+    # handle case where no log files are found
+    if not log_files:
+        return None
+    # find most recent
+    latest_log_file = max(log_files, key=os.path.getctime)
+    # parse tf file into a dictionary
+    log_data = _parse_tf_logs(latest_log_file)
+    return log_data
+
+
+def _parse_tf_logs(log):
+    """Parse a TensorBoard event file into a dictionary of (step, value) lists keyed by scalar tag."""
+    log_data = {}
+    ea = event_accumulator.EventAccumulator(log)
+    ea.Reload()
+    tags = ea.Tags()["scalars"]
+    for tag in tags:
+        log_data[tag] = []
+        for event in ea.Scalars(tag):
+            log_data[tag].append((event.step, event.value))
+    return log_data
+
+
+def _extract_log_val(name, log_data, uses_lower_threshold, workflow):
+    """Extract the value from the log data."""
+    try:
+        if name == "reward":
+            reward_tags = {
+                "rl_games": "rewards/iter",
+                "rsl_rl": "Train/mean_reward",
+                "sb3": None,  # TODO: complete when sb3 is fixed
+                "skrl": "Reward / Total reward (mean)",
+            }
+            tag = reward_tags.get(workflow)
+            if tag:
+                return _extract_reward(log_data, tag)
+            # no tag registered for this workflow yet: return None so the caller skips the threshold
+            return None
+
+        elif name == "episode_length":
+            episode_tags = {
+                "rl_games": "episode_lengths/iter",
+                "rsl_rl": "Train/mean_episode_length",
+                "sb3": None,  # TODO: complete when sb3 is fixed
+                "skrl": "Episode / Total timesteps (mean)",
+            }
+            tag = episode_tags.get(workflow)
+            if tag:
+                return _extract_feature(log_data, tag, uses_lower_threshold)
+            # no tag registered for this workflow yet: return None so the caller skips the threshold
+            return None
+
+        elif name == "training_time":
+            return {"rl_games": log_data["rewards/time"][-1][0], "rsl_rl": None, "sb3": None, "skrl": None}.get(
+                workflow
+            )
+    except Exception:
+        return None
+
+    raise ValueError(f"Env Config name {name} is not supported.")
+
+
+def _extract_feature(log_data, feature, uses_lower_threshold):
+    """Extract the feature from the log data."""
+    log_data = np.array(log_data[feature])[:, 1]
+
+    if uses_lower_threshold:
+        return max(log_data)
+    else:
+        return min(log_data)
+
+
+def _extract_reward(log_data, feature, k=8):
+    """Extract the averaged max reward from the log data."""
+    log_data = np.array(log_data[feature])[:, 1]
+
+    # find avg of k max values
+    k = min(len(log_data), k)
+    averaged_reward = np.mean(np.partition(log_data, -k)[-k:])
+
+    return averaged_reward
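`_extract_reward` scores a run by the mean of its `k` largest logged rewards rather than by the final or single best value, which makes the KPI less sensitive to one lucky iteration. The same calculation can be sketched with the standard library only. `mean_of_top_k` is a hypothetical name, the reward values are made up, and a non-empty list is assumed.

```python
import heapq
from statistics import fmean


def mean_of_top_k(values, k=8):
    """Average the k largest values, using all of them when fewer than k exist."""
    k = min(len(values), k)
    return fmean(heapq.nlargest(k, values))


rewards = [1.0, 9.0, 3.0, 7.0, 5.0]
print(mean_of_top_k(rewards, k=2))  # 8.0, the mean of 9.0 and 7.0
print(mean_of_top_k(rewards, k=8))  # 5.0, since k is clamped to the 5 available values
```

The commit's version uses `np.partition`, which finds the top `k` values without fully sorting the array. `heapq.nlargest` plays the same role here.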
--- a/tools/test_settings.py
+++ b/tools/test_settings.py
@@ -30,6 +30,7 @@ PER_TEST_TIMEOUTS = {
    "test_skrl_wrapper.py": 200,
    "test_operational_space.py": 300,
    "test_terrain_importer.py": 200,
+    "test_environments_training.py": 5000,
 }
 """A dictionary of tests and their timeouts in seconds.