Commit 8b4e26aa authored by Özhan Özen, committed by GitHub

Fixes Ray initialization to correctly direct subprocess output (#3533)

# Description

When running Ray directly from `tuner.py`, Ray is not correctly
initialized within `invoke_tuning_run()`. The two resulting problems
are discussed in #3532. To solve them, this PR:

1. Removes `ray_init()` from `util.get_gpu_node_resources()`. Ray must
now be initialized before calling `util.get_gpu_node_resources()`;
otherwise the function raises a `RuntimeError`. This change reverses
#3350, which was merged to add the missing initialization when using
`tuner.py`, but it is safer to initialize Ray explicitly, with the
correct arguments, outside `util.get_gpu_node_resources()`.
2. Moves Ray initialization within `invoke_tuning_run()` to before the
call to `util.get_gpu_node_resources()`, so Ray is explicitly
initialized first and the call does not raise an exception.
3. Adds a warning when `ray_init()` is called while Ray is already
initialized.
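
The three changes above amount to an "initialize explicitly, then query" pattern. The following is a minimal sketch of that pattern, not the PR's code: `FakeRuntime`, `init_runtime`, and `get_node_resources` are hypothetical names, and the stand-in class replaces the real Ray module so the example runs without a cluster.

```python
class FakeRuntime:
    """Hypothetical stand-in for the Ray module (assumption, not Ray's API)."""

    def __init__(self) -> None:
        self._initialized = False

    def is_initialized(self) -> bool:
        return self._initialized

    def init(self, address: str = "auto", log_to_driver: bool = True) -> None:
        # The real call would connect to a cluster; here we only flip a flag.
        self._initialized = True

    def nodes(self) -> list[dict]:
        # Illustrative node record; the values are made up for the sketch.
        return [{"Alive": True, "Resources": {"CPU": 8.0, "GPU": 1.0}}]


runtime = FakeRuntime()


def init_runtime(address: str = "auto", log_to_driver: bool = True) -> bool:
    """Initialize once; warn and return False if already initialized."""
    if runtime.is_initialized():
        print("[WARNING]: Attempting to initialize Ray but it is already initialized!")
        return False
    runtime.init(address=address, log_to_driver=log_to_driver)
    return True


def get_node_resources() -> list[dict]:
    """Query resources, refusing to initialize the runtime implicitly."""
    if not runtime.is_initialized():
        raise RuntimeError("Ray must be initialized before calling get_node_resources().")
    return [node["Resources"] for node in runtime.nodes() if node["Alive"]]
```

In this sketch the caller owns initialization: calling `get_node_resources()` before `init_runtime()` fails loudly instead of silently initializing with default arguments, and a second `init_runtime()` call only prints a warning.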

Fixes #3532

## Type of change

- Bug fix (non-breaking change which fixes an issue)


## Screenshots

Change 1:
<img width="901" height="62" alt="Screenshot 2025-09-23 at 16 52 55"
src="https://github.com/user-attachments/assets/59fdb69d-fc29-41c4-980f-1af450ef5036"
/>

Change 2:
<img width="520" height="339" alt="Screenshot 2025-09-23 at 16 52 33"
src="https://github.com/user-attachments/assets/04f51cd6-9e76-485b-b162-ce4662aec417"
/>

Change 3:
<img width="784" height="60" alt="Screenshot 2025-09-23 at 16 55 21"
src="https://github.com/user-attachments/assets/6187b513-24ce-48cb-bac9-50cd665c185a"
/>


## Checklist

- [x] I have read and understood the [contribution
guidelines](https://isaac-sim.github.io/IsaacLab/main/source/refs/contributing.html)
- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------
Co-authored-by: garylvov <67614381+garylvov@users.noreply.github.com>
parent ed680898
@@ -217,17 +217,17 @@ def invoke_tuning_run(cfg: dict, args: argparse.Namespace) -> None:
     print("[WARNING]: Not saving checkpoints, just running experiment...")
     print("[INFO]: Model parameters and metrics will be preserved.")
     print("[WARNING]: For homogeneous cluster resources only...")
+    # Initialize Ray
+    util.ray_init(
+        ray_address=args.ray_address,
+        log_to_driver=True,
+    )
     # Get available resources
     resources = util.get_gpu_node_resources()
     print(f"[INFO]: Available resources {resources}")
-    if not ray.is_initialized():
-        ray.init(
-            address=args.ray_address,
-            log_to_driver=True,
-            num_gpus=len(resources),
-        )
     print(f"[INFO]: Using config {cfg}")
     # Configure the search algorithm and the repeater
......
@@ -320,6 +320,8 @@ def ray_init(ray_address: str = "auto", runtime_env: dict[str, Any] | None = Non
             f" runtime_env={runtime_env}"
         )
         ray.init(address=ray_address, runtime_env=runtime_env, log_to_driver=log_to_driver)
+    else:
+        print("[WARNING]: Attempting to initialize Ray but it is already initialized!")
 
 
 def get_gpu_node_resources(
@@ -343,7 +345,7 @@ def get_gpu_node_resources(
         or simply the resource for a single node if requested.
     """
     if not ray.is_initialized():
-        ray_init()
+        raise RuntimeError("Ray must be initialized before calling get_gpu_node_resources().")
     nodes = ray.nodes()
     node_resources = []
     total_cpus = 0
......
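
The last hunk is cut off just after `nodes = ray.nodes()` and a CPU counter, so the rest of the function body is not shown here. As a rough illustration only, the sketch below totals CPU and GPU counts over a list of node records. It assumes each record is a dict with `"Alive"` and `"Resources"` keys, as `ray.nodes()` is expected to return; `summarize_nodes` is a hypothetical helper and is not the body of `get_gpu_node_resources()`.

```python
from typing import Any


def summarize_nodes(nodes: list[dict[str, Any]]) -> dict[str, float]:
    """Sum CPU and GPU counts over alive nodes (hypothetical helper)."""
    totals = {"CPU": 0.0, "GPU": 0.0}
    for node in nodes:
        # Skip nodes that are reported as dead or lack the flag entirely.
        if not node.get("Alive", False):
            continue
        resources = node.get("Resources", {})
        totals["CPU"] += resources.get("CPU", 0.0)
        totals["GPU"] += resources.get("GPU", 0.0)
    return totals
```

Because the helper takes plain dicts, it can be exercised with hand-written records, for example `summarize_nodes([{"Alive": True, "Resources": {"CPU": 4.0}}])`, without a running cluster.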