Commit 317be41c authored by yijieg, committed by GitHub

Updates AutoMate with more documentation and options (#2674)

# Description

Fixes the issues reported by QA:
- Change the number of trajectories for the disassembly task; the job now
outputs this number while running.
- Explain in the environment documentation that the disassembly task does
not involve policy training or evaluation.
- Add a `--wandb` flag to record learning curves for assembly tasks.
- Add a `--max_iterations` flag to set the number of training epochs.
- Add the Windows command line to run_w_id.py and
run_disassembly_w_id.py.

## Type of change

- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- This change requires a documentation update

## Checklist

- [ x ] I have run the [`pre-commit` checks](https://pre-commit.com/)
with `./isaaclab.sh --format`
- [ x ] I have made corresponding changes to the documentation
- [ x ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [ x ] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent ba2a7dc4
@@ -216,11 +216,11 @@ We provide environments for both disassembly and assembly.
 For addition instructions and Windows installation, please refer to the `CUDA installation page <https://developer.nvidia.com/cuda-12-8-0-download-archive>`_.
-* |disassembly-link|: The plug starts inserted in the socket. A low-level controller lifts th plug out and moves it to a random position. These trajectories serve as demonstrations for the reverse process, i.e., learning to assemble. To run disassembly for a specific task: ``./isaaclab.sh -p source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_disassembly_w_id.py --assembly_id=ASSEMBLY_ID``
+* |disassembly-link|: The plug starts inserted in the socket. A low-level controller lifts the plug out and moves it to a random position. This process is purely scripted and does not involve any learned policy. Therefore, it does not require policy training or evaluation. The resulting trajectories serve as demonstrations for the reverse process, i.e., learning to assemble. To run disassembly for a specific task: ``python source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_disassembly_w_id.py --assembly_id=ASSEMBLY_ID --disassembly_dir=DISASSEMBLY_DIR``. All generated trajectories are saved to a local directory ``DISASSEMBLY_DIR``.
 * |assembly-link|: The goal is to insert the plug into the socket. You can use this environment to train a policy via reinforcement learning or evaluate a pre-trained checkpoint.
-  * To train an assembly policy: ``./isaaclab.sh -p source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py --assembly_id=ASSEMBLY_ID --train``
-  * To evaluate an assembly policy: ``./isaaclab.sh -p source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py --assembly_id=ASSEMBLY_ID --checkpoint=CHECKPOINT --log_eval``
+  * To train an assembly policy, we run the command ``python source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py --assembly_id=ASSEMBLY_ID --train``. We can customize the training process using the optional flags: ``--headless`` to run without opening the GUI windows, ``--max_iterations=MAX_ITERATIONS`` to set the number of training iterations, ``--num_envs=NUM_ENVS`` to set the number of parallel environments during training, ``--seed=SEED`` to assign the random seed, ``--wandb`` to enable logging to WandB (requires a WandB account). The policy checkpoints will be saved automatically during training in the directory ``logs/rl_games/Assembly/test``.
+  * To evaluate an assembly policy, we run the command ``python source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py --assembly_id=ASSEMBLY_ID --checkpoint=CHECKPOINT --log_eval``. The evaluation results are stored in ``evaluation_{ASSEMBLY_ID}.h5``.
 .. table::
     :widths: 33 37 30
......
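As a rough illustration of how the documented training flags combine, the sketch below assembles the argument list for `run_w_id.py`. The helper name `build_train_command` is hypothetical and not part of the repository; it uses only the flags listed in the documentation hunk above.

```python
import shlex

# Script path as given in the documentation hunk.
SCRIPT = "source/isaaclab_tasks/isaaclab_tasks/direct/automate/run_w_id.py"


def build_train_command(assembly_id, headless=False, max_iterations=None,
                        num_envs=None, seed=None, wandb=False):
    """Return the training command as an argument list (hypothetical helper)."""
    cmd = ["python", SCRIPT, f"--assembly_id={assembly_id}", "--train"]
    if headless:
        cmd.append("--headless")
    if max_iterations is not None:
        cmd.append(f"--max_iterations={max_iterations}")
    if num_envs is not None:
        cmd.append(f"--num_envs={num_envs}")
    if seed is not None:
        cmd.append(f"--seed={seed}")
    if wandb:
        cmd.append("--wandb")
    return cmd


# Print a shell-quoted command line for a headless run with a fixed seed.
print(shlex.join(build_train_command("00015", headless=True, seed=0)))
```

Omitted options fall back to the defaults defined by the script itself, so the sketch only appends a flag when a value is supplied.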
@@ -71,7 +71,8 @@ class AssemblyEnv(DirectRLEnv):
         if self.cfg_task.sample_from != "rand":
             self._init_eval_loading()
-        wandb.init(project="automate", name=self.cfg_task.assembly_id + "_" + datetime.now().strftime("%m/%d/%Y"))
+        if self.cfg_task.wandb:
+            wandb.init(project="automate", name=self.cfg_task.assembly_id + "_" + datetime.now().strftime("%m/%d/%Y"))
     def _init_eval_loading(self):
         eval_held_asset_pose, eval_fixed_asset_pose, eval_success = automate_log.load_log_from_hdf5(
@@ -553,7 +554,8 @@ class AssemblyEnv(DirectRLEnv):
         rew_buf = self._update_rew_buf(curr_successes)
         self.ep_succeeded = torch.logical_or(self.ep_succeeded, curr_successes)
-        wandb.log(self.extras)
+        if self.cfg_task.wandb:
+            wandb.log(self.extras)
         # Only log episode success rates at the end of an episode.
         if torch.any(self.reset_buf):
@@ -577,11 +579,12 @@ class AssemblyEnv(DirectRLEnv):
             )
         self.extras["curr_max_disp"] = self.curr_max_disp
-        wandb.log({
-            "success": torch.mean(self.ep_succeeded.float()),
-            "reward": torch.mean(rew_buf),
-            "sbc_rwd_scale": sbc_rwd_scale,
-        })
+        if self.cfg_task.wandb:
+            wandb.log({
+                "success": torch.mean(self.ep_succeeded.float()),
+                "reward": torch.mean(rew_buf),
+                "sbc_rwd_scale": sbc_rwd_scale,
+            })
         if self.cfg_task.if_logging_eval:
             self.success_log = torch.cat([self.success_log, self.ep_succeeded.reshape((self.num_envs, 1))], dim=0)
......
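The three call sites above repeat the same `if self.cfg_task.wandb:` check. One possible way to centralise it is a small helper that forwards to a logging callable only when logging is enabled. This is a sketch, not code from the commit: `log_if_enabled` is a hypothetical name, and a plain list stands in for `wandb.log` so the example runs without WandB installed.

```python
from typing import Any, Callable, Dict


def log_if_enabled(enabled: bool, payload: Dict[str, Any],
                   log_fn: Callable[[Dict[str, Any]], None]) -> bool:
    """Forward payload to log_fn only when logging is enabled.

    Returns True if the payload was logged, False otherwise.
    """
    if not enabled:
        return False
    log_fn(payload)
    return True


# Stand-in for wandb.log: collect payloads in a list.
recorded = []
log_if_enabled(False, {"success": 0.5}, recorded.append)   # skipped
log_if_enabled(True, {"success": 0.75}, recorded.append)   # recorded
print(recorded)
```

In the environment, the `enabled` argument would correspond to `self.cfg_task.wandb` and `log_fn` to `wandb.log`.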
@@ -138,6 +138,7 @@ class AssemblyTask:
     if_logging_eval: bool = False
     num_eval_trials: int = 100
     eval_filename: str = "evaluation_00015.h5"
+    wandb: bool = False
     # Fine-tuning
     sample_from: str = "rand"  # gp, gmm, idv, rand
......
@@ -115,10 +115,10 @@ class Hole8mm(FixedAssetCfg):
 class Extraction(DisassemblyTask):
     name = "extraction"
-    assembly_id = "00731"
+    assembly_id = "00015"
     assembly_dir = f"{ASSET_DIR}/{assembly_id}/"
     disassembly_dir = "disassembly_dir"
-    num_log_traj = 100
+    num_log_traj = 1000
     fixed_asset_cfg = Hole8mm()
     held_asset_cfg = Peg8mm()
......
@@ -7,6 +7,7 @@ import argparse
 import os
 import re
 import subprocess
+import sys
 def update_task_param(task_cfg, assembly_id, disassembly_dir):
@@ -61,9 +62,12 @@ def main():
         args.disassembly_dir,
     )
-    bash_command = (
-        "./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --task=Isaac-AutoMate-Disassembly-Direct-v0"
-    )
+    if sys.platform.startswith("win"):
+        bash_command = "isaaclab.bat -p"
+    elif sys.platform.startswith("linux"):
+        bash_command = "./isaaclab.sh -p"
+    bash_command += " scripts/reinforcement_learning/rl_games/train.py --task=Isaac-AutoMate-Disassembly-Direct-v0"
     bash_command += f" --num_envs={str(args.num_envs)}"
     bash_command += f" --seed={str(args.seed)}"
......
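Note that the new branch assigns `bash_command` only for Windows and Linux. On any other `sys.platform` value (for example `darwin`), and unless the variable is initialised elsewhere in the file, the following `+=` would fail. Below is a hedged sketch of one way to make that case explicit; `launcher_prefix` is a hypothetical helper, not part of the commit.

```python
import sys


def launcher_prefix(platform: str = sys.platform) -> str:
    """Map a sys.platform string to the Isaac Lab launcher prefix."""
    if platform.startswith("win"):
        return "isaaclab.bat -p"
    if platform.startswith("linux"):
        return "./isaaclab.sh -p"
    # Fail loudly instead of leaving the command undefined.
    raise RuntimeError(f"Unsupported platform: {platform!r}")


print(launcher_prefix("win32"))
print(launcher_prefix("linux"))
```

Taking the platform string as a parameter keeps the selection logic testable on a single machine.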
@@ -6,9 +6,10 @@
 import argparse
 import re
 import subprocess
+import sys
-def update_task_param(task_cfg, assembly_id, if_sbc, if_log_eval):
+def update_task_param(task_cfg, assembly_id, if_sbc, if_log_eval, if_wandb):
     # Read the file lines.
     with open(task_cfg) as f:
         lines = f.readlines()
@@ -20,6 +21,7 @@ def update_task_param(task_cfg, assembly_id, if_sbc, if_log_eval):
     if_sbc_pattern = re.compile(r"^(.*if_sbc\s*:\s*bool\s*=\s*).*$")
     if_log_eval_pattern = re.compile(r"^(.*if_logging_eval\s*:\s*bool\s*=\s*).*$")
     eval_file_pattern = re.compile(r"^(.*eval_filename\s*:\s*str\s*=\s*).*$")
+    if_wandb_pattern = re.compile(r"^(.*wandb\s*:\s*bool\s*=\s*).*$")
     for line in lines:
         if "assembly_id =" in line:
@@ -30,6 +32,8 @@ def update_task_param(task_cfg, assembly_id, if_sbc, if_log_eval):
             line = if_log_eval_pattern.sub(rf"\1{str(if_log_eval)}", line)
         elif "eval_filename: str = " in line:
             line = eval_file_pattern.sub(r"\1'{}'".format(f"evaluation_{assembly_id}.h5"), line)
+        elif "wandb: bool =" in line:
+            line = if_wandb_pattern.sub(rf"\1{str(if_wandb)}", line)
         updated_lines.append(line)
@@ -47,28 +51,30 @@ def main():
         default="source/isaaclab_tasks/isaaclab_tasks/direct/automate/assembly_tasks_cfg.py",
     )
     parser.add_argument("--assembly_id", type=str, help="New assembly ID to set.")
+    parser.add_argument("--wandb", action="store_true", help="Use wandb to record learning curves")
     parser.add_argument("--checkpoint", type=str, help="Checkpoint path.")
     parser.add_argument("--num_envs", type=int, default=128, help="Number of parallel environment.")
     parser.add_argument("--seed", type=int, default=-1, help="Random seed.")
     parser.add_argument("--train", action="store_true", help="Run training mode.")
     parser.add_argument("--log_eval", action="store_true", help="Log evaluation results.")
     parser.add_argument("--headless", action="store_true", help="Run in headless mode.")
+    parser.add_argument("--max_iterations", type=int, default=1500, help="Number of iteration for policy learning.")
     args = parser.parse_args()
-    update_task_param(args.cfg_path, args.assembly_id, args.train, args.log_eval)
+    update_task_param(args.cfg_path, args.assembly_id, args.train, args.log_eval, args.wandb)
     bash_command = None
+    if sys.platform.startswith("win"):
+        bash_command = "isaaclab.bat -p"
+    elif sys.platform.startswith("linux"):
+        bash_command = "./isaaclab.sh -p"
     if args.train:
-        bash_command = (
-            "./isaaclab.sh -p scripts/reinforcement_learning/rl_games/train.py --task=Isaac-AutoMate-Assembly-Direct-v0"
-        )
-        bash_command += f" --seed={str(args.seed)}"
+        bash_command += " scripts/reinforcement_learning/rl_games/train.py --task=Isaac-AutoMate-Assembly-Direct-v0"
+        bash_command += f" --seed={str(args.seed)} --max_iterations={str(args.max_iterations)}"
     else:
         if not args.checkpoint:
             raise ValueError("No checkpoint provided for evaluation.")
-        bash_command = (
-            "./isaaclab.sh -p scripts/reinforcement_learning/rl_games/play.py --task=Isaac-AutoMate-Assembly-Direct-v0"
-        )
+        bash_command += " scripts/reinforcement_learning/rl_games/play.py --task=Isaac-AutoMate-Assembly-Direct-v0"
     bash_command += f" --num_envs={str(args.num_envs)}"
......
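To see what the new `if_wandb_pattern` substitution does to a config line, the snippet below applies the same regular expression to a sample line. The sample text and the `set_wandb_flag` wrapper are illustrative assumptions; only the pattern, the substring check, and the replacement template come from the diff.

```python
import re

# Pattern copied from the diff: group 1 captures everything up to and
# including the "=" and any whitespace after it.
if_wandb_pattern = re.compile(r"^(.*wandb\s*:\s*bool\s*=\s*).*$")


def set_wandb_flag(line: str, if_wandb: bool) -> str:
    """Rewrite the value of a `wandb: bool = ...` line (illustrative wrapper)."""
    if "wandb: bool =" in line:
        return if_wandb_pattern.sub(rf"\1{str(if_wandb)}", line)
    return line


sample = "    wandb: bool = False\n"
print(repr(set_wandb_flag(sample, True)))
```

The leading indentation and the trailing newline survive because group 1 keeps the prefix and `.*$` stops before the final newline, so only the value after `=` is replaced.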