Unverified Commit e7506fea authored by Mayank Mittal's avatar Mayank Mittal Committed by GitHub

Adds Franka cabinet opening environment from IsaacGymEnvs (#44)

# Description

This MR adds the cabinet environment from IsaacGymEnvs. The training is
checked for RSL-RL and RL-Games

## Type of change

- New feature (non-breaking change which adds functionality)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./orbit.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

---------
Co-authored-by: 's avatarzrene <rene.zurbruegg@gmail.com>
parent 2dc138a5
......@@ -45,6 +45,12 @@ Manipulation
Environments based on fixed-arm manipulation tasks.
For many of these tasks, we include configurations with different arm action spaces. For example,
for the reach environment:
* |lift-cube-link|: Franka arm with joint position control
* |lift-cube-ik-abs-link|: Franka arm with absolute IK control
* |lift-cube-ik-rel-link|: Franka arm with relative IK control
.. table::
:widths: 33 37 30
......@@ -58,15 +64,20 @@ Environments based on fixed-arm manipulation tasks.
+----------------+---------------------+-----------------------------------------------------------------------------+
| |lift-cube| | |lift-cube-link| | Pick a cube and bring it to a sampled target position with the Franka robot |
+----------------+---------------------+-----------------------------------------------------------------------------+
| |cabi-franka| | |cabi-franka-link| | Grasp the handle of a cabinet's drawer and open it with the Franka robot |
+----------------+---------------------+-----------------------------------------------------------------------------+
.. |reach-franka| image:: ../_static/tasks/manipulation/franka_reach.jpg
.. |reach-ur10| image:: ../_static/tasks/manipulation/ur10_reach.jpg
.. |lift-cube| image:: ../_static/tasks/manipulation/franka_lift.jpg
.. |cabi-franka| image:: ../_static/tasks/manipulation/franka_open_drawer.jpg
.. |reach-franka-link| replace:: `Isaac-Reach-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/env_cfg.py>`__
.. |reach-ur10-link| replace:: `Isaac-Reach-UR10-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/env_cfg.py>`__
.. |reach-franka-link| replace:: `Isaac-Reach-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/franka/joint_pos_env_cfg.py>`__
.. |reach-ur10-link| replace:: `Isaac-Reach-UR10-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/reach/config/ur_10/joint_pos_env_cfg.py>`__
.. |lift-cube-link| replace:: `Isaac-Lift-Cube-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/config/franka/joint_pos_env_cfg.py>`__
.. |lift-cube-ik-abs-link| replace:: `Isaac-Lift-Cube-Franka-IK-Abs-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/config/franka/ik_abs_env_cfg.py>`__
.. |lift-cube-ik-rel-link| replace:: `Isaac-Lift-Cube-Franka-IK-Rel-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/lift/config/franka/ik_rel_env_cfg.py>`__
.. |cabi-franka-link| replace:: `Isaac-Open-Drawer-Franka-v0 <https://github.com/NVIDIA-Omniverse/orbit/blob/main/source/extensions/omni.isaac.orbit_tasks/omni/isaac/orbit_tasks/manipulation/cabinet/config/franka/joint_pos_env_cfg.py>`__
Locomotion
......
......@@ -65,6 +65,9 @@ class ViewportCameraController:
# at each rendering step.
if self.cfg.asset_name is None:
raise ValueError(f"No asset name provided for viewer with origin type: '{self.cfg.origin_type}'.")
else:
# set the camera origin to the center of the world
self.update_view_to_world()
# subscribe to post update event so that camera view can be updated at each rendering step
app_interface = omni.kit.app.get_app_interface()
......
......@@ -103,12 +103,13 @@ class CommandTermCfg:
class CurriculumTermCfg(ManagerTermBaseCfg):
"""Configuration for a curriculum term."""
func: Callable[..., float | dict[str, float]] = MISSING
func: Callable[..., float | dict[str, float] | None] = MISSING
"""The name of the function to be called.
This function should take the environment object, environment indices
and any other parameters as input and return the curriculum state for
logging purposes.
logging purposes. If the function returns None, the curriculum state
is not logged.
"""
......
[package]
# Note: Semantic Versioning is used: https://semver.org/
version = "0.5.7"
version = "0.6.0"
# Description
title = "ORBIT Environments"
......
Changelog
---------
0.6.0 (2024-03-10)
~~~~~~~~~~~~~~~~~~
Added
^^^^^
* Added a new environment ``Isaac-Open-Drawer-Franka-v0`` for the Franka arm to open a drawer. It is
based on the IsaacGymEnvs cabinet environment.
Fixed
^^^^^
* Fixed logging of extra information for RL-Games wrapper. It expected the extra information to be under the
key ``"episode"``, but Orbit used the key ``"log"``. The wrapper now remaps the key to ``"episode"``.
0.5.7 (2024-02-28)
~~~~~~~~~~~~~~~~~~
......
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""Manipulation environments to open drawers in a cabinet."""
# Reference: https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml#L32
seed: 42
# 512×1500×64
n_timesteps: 49152000
policy: 'MlpPolicy'
n_steps: 64
batch_size: 4096
gae_lambda: 0.95
gamma: 0.99
n_epochs: 20
ent_coef: 0.0
learning_rate: !!float 5e-4
clip_range: 0.2
policy_kwargs: "dict(
log_std_init=-2,
ortho_init=False,
activation_fn=nn.ELU,
net_arch=[256, 128, 64]
)"
target_kl: 0.008
max_grad_norm: 1.0
# Uses VecNormalize class to normalize obs
normalize_input: True
# Uses VecNormalize class to normalize rew
normalize_value: True
clip_obs: 5.0
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (c) 2022-2023, The ORBIT Project Developers.
# All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations
from dataclasses import MISSING
import omni.isaac.orbit.sim as sim_utils
from omni.isaac.orbit.actuators.actuator_cfg import ImplicitActuatorCfg
from omni.isaac.orbit.assets import ArticulationCfg, AssetBaseCfg
from omni.isaac.orbit.envs import RLTaskEnvCfg
from omni.isaac.orbit.managers import ObservationGroupCfg as ObsGroup
from omni.isaac.orbit.managers import ObservationTermCfg as ObsTerm
from omni.isaac.orbit.managers import RandomizationTermCfg as RandTerm
from omni.isaac.orbit.managers import RewardTermCfg as RewTerm
from omni.isaac.orbit.managers import SceneEntityCfg
from omni.isaac.orbit.managers import TerminationTermCfg as DoneTerm
from omni.isaac.orbit.scene import InteractiveSceneCfg
from omni.isaac.orbit.sensors import FrameTransformerCfg
from omni.isaac.orbit.sensors.frame_transformer import OffsetCfg
from omni.isaac.orbit.utils import configclass
from omni.isaac.orbit.utils.assets import ISAAC_NUCLEUS_DIR
from . import mdp
##
# Pre-defined configs
##
from omni.isaac.orbit.markers.config import FRAME_MARKER_CFG # isort: skip
FRAME_MARKER_SMALL_CFG = FRAME_MARKER_CFG.copy()
FRAME_MARKER_SMALL_CFG.markers["frame"].scale = (0.10, 0.10, 0.10)
##
# Scene definition
##
@configclass
class CabinetSceneCfg(InteractiveSceneCfg):
"""Configuration for the cabinet scene with a robot and a cabinet.
This is the abstract base implementation, the exact scene is defined in the derived classes
which need to set the robot and end-effector frames
"""
# robots, Will be populated by agent env cfg
robot: ArticulationCfg = MISSING
# End-effector, Will be populated by agent env cfg
ee_frame: FrameTransformerCfg = MISSING
cabinet = ArticulationCfg(
prim_path="{ENV_REGEX_NS}/Cabinet",
spawn=sim_utils.UsdFileCfg(
usd_path=f"{ISAAC_NUCLEUS_DIR}/Props/Sektion_Cabinet/sektion_cabinet_instanceable.usd",
activate_contact_sensors=False,
),
init_state=ArticulationCfg.InitialStateCfg(
pos=(0.8, 0, 0.4),
rot=(0.0, 0.0, 0.0, 1.0),
joint_pos={
"door_left_joint": 0.0,
"door_right_joint": 0.0,
"drawer_bottom_joint": 0.0,
"drawer_top_joint": 0.0,
},
),
actuators={
"drawers": ImplicitActuatorCfg(
joint_names_expr=["drawer_top_joint", "drawer_bottom_joint"],
effort_limit=87.0,
velocity_limit=100.0,
stiffness=10.0,
damping=1.0,
),
"doors": ImplicitActuatorCfg(
joint_names_expr=["door_left_joint", "door_right_joint"],
effort_limit=87.0,
velocity_limit=100.0,
stiffness=10.0,
damping=2.5,
),
},
)
# Frame definitions for the cabinet.
cabinet_frame = FrameTransformerCfg(
prim_path="{ENV_REGEX_NS}/Cabinet/sektion",
debug_vis=True,
visualizer_cfg=FRAME_MARKER_SMALL_CFG.replace(prim_path="/Visuals/CabinetFrameTransformer"),
target_frames=[
FrameTransformerCfg.FrameCfg(
prim_path="{ENV_REGEX_NS}/Cabinet/drawer_handle_top",
name="drawer_handle_top",
offset=OffsetCfg(
pos=(0.305, 0.0, 0.01),
rot=(0.5, 0.5, -0.5, -0.5), # align with end-effector frame
),
),
],
)
# plane
plane = AssetBaseCfg(
prim_path="/World/GroundPlane",
init_state=AssetBaseCfg.InitialStateCfg(),
spawn=sim_utils.GroundPlaneCfg(),
collision_group=-1,
)
# lights
light = AssetBaseCfg(
prim_path="/World/light",
spawn=sim_utils.DomeLightCfg(color=(0.75, 0.75, 0.75), intensity=3000.0),
)
##
# MDP settings
##
@configclass
class CommandsCfg:
"""Command terms for the MDP."""
null_command = mdp.NullCommandCfg()
@configclass
class ActionsCfg:
"""Action specifications for the MDP."""
body_joint_pos: mdp.JointPositionActionCfg = MISSING
finger_joint_pos: mdp.BinaryJointPositionActionCfg = MISSING
@configclass
class ObservationsCfg:
"""Observation specifications for the MDP."""
@configclass
class PolicyCfg(ObsGroup):
"""Observations for policy group."""
joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
cabinet_joint_pos = ObsTerm(
func=mdp.joint_pos_rel,
params={"asset_cfg": SceneEntityCfg("cabinet", joint_names=["drawer_top_joint"])},
)
cabinet_joint_vel = ObsTerm(
func=mdp.joint_vel_rel,
params={"asset_cfg": SceneEntityCfg("cabinet", joint_names=["drawer_top_joint"])},
)
rel_ee_drawer_distance = ObsTerm(func=mdp.rel_ee_drawer_distance)
actions = ObsTerm(func=mdp.last_action)
def __post_init__(self):
self.enable_corruption = True
self.concatenate_terms = True
# observation groups
policy: PolicyCfg = PolicyCfg()
@configclass
class RandomizationCfg:
"""Configuration for randomization."""
robot_physics_material = RandTerm(
func=mdp.randomize_rigid_body_material,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("robot", body_names=".*"),
"static_friction_range": (0.8, 1.25),
"dynamic_friction_range": (0.8, 1.25),
"restitution_range": (0.0, 0.0),
"num_buckets": 16,
},
)
cabinet_physics_material = RandTerm(
func=mdp.randomize_rigid_body_material,
mode="startup",
params={
"asset_cfg": SceneEntityCfg("cabinet", body_names="drawer_handle_top"),
"static_friction_range": (1.0, 1.25),
"dynamic_friction_range": (1.25, 1.5),
"restitution_range": (0.0, 0.0),
"num_buckets": 16,
},
)
reset_all = RandTerm(func=mdp.reset_scene_to_default, mode="reset")
reset_robot_joints = RandTerm(
func=mdp.reset_joints_by_offset,
mode="reset",
params={
"position_range": (-0.1, 0.1),
"velocity_range": (0.0, 0.0),
},
)
@configclass
class RewardsCfg:
"""Reward terms for the MDP."""
# 1. Approach the handle
approach_ee_handle = RewTerm(func=mdp.approach_ee_handle, weight=2.0, params={"threshold": 0.2})
align_ee_handle = RewTerm(func=mdp.align_ee_handle, weight=0.5)
# 2. Grasp the handle
approach_gripper_handle = RewTerm(func=mdp.approach_gripper_handle, weight=5.0, params={"offset": MISSING})
align_grasp_around_handle = RewTerm(func=mdp.align_grasp_around_handle, weight=0.125)
grasp_handle = RewTerm(
func=mdp.grasp_handle,
weight=0.5,
params={
"threshold": 0.03,
"open_joint_pos": MISSING,
"asset_cfg": SceneEntityCfg("robot", joint_names=MISSING),
},
)
# 3. Open the drawer
open_drawer_bonus = RewTerm(
func=mdp.open_drawer_bonus,
weight=7.5,
params={"asset_cfg": SceneEntityCfg("cabinet", joint_names=["drawer_top_joint"])},
)
multi_stage_open_drawer = RewTerm(
func=mdp.multi_stage_open_drawer,
weight=1.0,
params={"asset_cfg": SceneEntityCfg("cabinet", joint_names=["drawer_top_joint"])},
)
# 4. Penalize actions for cosmetic reasons
action_rate_l2 = RewTerm(func=mdp.action_rate_l2, weight=-1e-2)
joint_vel = RewTerm(func=mdp.joint_vel_l2, weight=-0.0001)
@configclass
class TerminationsCfg:
"""Termination terms for the MDP."""
time_out = DoneTerm(func=mdp.time_out, time_out=True)
##
# Environment configuration
##
@configclass
class CabinetEnvCfg(RLTaskEnvCfg):
"""Configuration for the cabinet environment."""
# Scene settings
scene: CabinetSceneCfg = CabinetSceneCfg(num_envs=4096, env_spacing=2.0)
# Basic settings
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
commands: CommandsCfg = CommandsCfg()
# MDP settings
rewards: RewardsCfg = RewardsCfg()
terminations: TerminationsCfg = TerminationsCfg()
randomization: RandomizationCfg = RandomizationCfg()
def __post_init__(self):
"""Post initialization."""
# general settings
self.decimation = 1
self.episode_length_s = 8.0
self.viewer.eye = (-2.0, 2.0, 2.0)
self.viewer.lookat = (0.8, 0.0, 0.5)
# simulation settings
self.sim.dt = 1 / 60 # 60Hz
self.sim.physx.bounce_threshold_velocity = 0.2
self.sim.physx.bounce_threshold_velocity = 0.01
self.sim.physx.friction_correlation_distance = 0.00625
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""Configurations for the cabinet environments."""
# We leave this file empty since we don't want to expose any configs in this package directly.
# We still need this file to import the "config" module in the parent package.
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
import gymnasium as gym
from . import agents, ik_abs_env_cfg, ik_rel_env_cfg, joint_pos_env_cfg
##
# Register Gym environments.
##
##
# Joint Position Control
##
gym.register(
id="Isaac-Open-Drawer-Franka-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": joint_pos_env_cfg.FrankaCabinetEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
},
disable_env_checker=True,
)
gym.register(
id="Isaac-Open-Drawer-Franka-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": joint_pos_env_cfg.FrankaCabinetEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
},
disable_env_checker=True,
)
##
# Inverse Kinematics - Absolute Pose Control
##
gym.register(
id="Isaac-Open-Drawer-Franka-IK-Abs-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": ik_abs_env_cfg.FrankaCabinetEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
},
disable_env_checker=True,
)
gym.register(
id="Isaac-Open-Drawer-Franka-IK-Abs-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": ik_abs_env_cfg.FrankaCabinetEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
},
disable_env_checker=True,
)
##
# Inverse Kinematics - Relative Pose Control
##
gym.register(
id="Isaac-Open-Drawer-Franka-IK-Rel-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": ik_rel_env_cfg.FrankaCabinetEnvCfg,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
},
disable_env_checker=True,
)
gym.register(
id="Isaac-Open-Drawer-Franka-IK-Rel-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": ik_rel_env_cfg.FrankaCabinetEnvCfg_PLAY,
"rsl_rl_cfg_entry_point": agents.rsl_rl_cfg.CabinetPPORunnerCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
},
disable_env_checker=True,
)
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from . import rsl_rl_cfg # noqa: F401, F403
......@@ -40,36 +40,37 @@ params:
load_path: ''
config:
name: cabinet
name: franka_open_drawer
env_name: rlgpu
device: 'cuda:0'
device_name: 'cuda:0'
multi_gpu: False
ppo: True
mixed_precision: False
normalize_input: True
normalize_value: True
num_actors: -1
normalize_input: False
normalize_value: False
num_actors: -1 # configured from the script (based on num_envs)
reward_shaper:
scale_value: 1
normalize_advantage: True
normalize_advantage: False
gamma: 0.99
tau: 0.95
learning_rate: 5e-4
lr_schedule: adaptive
kl_threshold: 0.008
score_to_win: 100000000
max_epochs: 300 #300
save_best_after: 100
save_frequency: 100
score_to_win: 200
max_epochs: 400
save_best_after: 50
save_frequency: 50
print_stats: True
grad_norm: 1.0
entropy_coef: 0.0
entropy_coef: 0.001
truncate_grads: True
e_clip: 0.2
horizon_length: 32 #64
horizon_length: 96
minibatch_size: 4096
mini_epochs: 8
mini_epochs: 5
critic_coef: 4
clip_value: True
seq_len: 4
seq_length: 4
bounds_loss_coef: 0.0001
......@@ -3,7 +3,7 @@
#
# SPDX-License-Identifier: BSD-3-Clause
# TODO: be moved once CabinetEnv exists
from omni.isaac.orbit.utils import configclass
from omni.isaac.orbit_tasks.utils.wrappers.rsl_rl import (
RslRlOnPolicyRunnerCfg,
......@@ -11,34 +11,31 @@ from omni.isaac.orbit_tasks.utils.wrappers.rsl_rl import (
RslRlPpoAlgorithmCfg,
)
CABINET_RSL_RL_PPO_CFG = RslRlOnPolicyRunnerCfg(
num_steps_per_env=64,
max_iterations=200,
save_interval=50,
experiment_name="cabinet",
run_name="",
resume=False,
load_run=-1,
load_checkpoint=-1,
empirical_normalization=False,
policy=RslRlPpoActorCriticCfg(
@configclass
class CabinetPPORunnerCfg(RslRlOnPolicyRunnerCfg):
num_steps_per_env = 96
max_iterations = 400
save_interval = 50
experiment_name = "franka_open_drawer"
empirical_normalization = False
policy = RslRlPpoActorCriticCfg(
init_noise_std=1.0,
actor_hidden_dims=[256, 128, 64],
critic_hidden_dims=[256, 128, 64],
activation="elu",
),
algorithm=RslRlPpoAlgorithmCfg(
)
algorithm = RslRlPpoAlgorithmCfg(
value_loss_coef=1.0,
use_clipped_value_loss=True,
clip_param=0.2,
entropy_coef=0.00,
num_learning_epochs=8,
num_mini_batches=8,
entropy_coef=1e-3,
num_learning_epochs=5,
num_mini_batches=4,
learning_rate=5.0e-4,
schedule="adaptive",
gamma=0.99,
lam=0.95,
desired_kl=0.008,
desired_kl=0.02,
max_grad_norm=1.0,
),
)
)
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from omni.isaac.orbit.controllers.differential_ik_cfg import DifferentialIKControllerCfg
from omni.isaac.orbit.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
from omni.isaac.orbit.utils import configclass
from . import joint_pos_env_cfg
##
# Pre-defined configs
##
from omni.isaac.orbit_assets.franka import FRANKA_PANDA_HIGH_PD_CFG # isort: skip
@configclass
class FrankaCabinetEnvCfg(joint_pos_env_cfg.FrankaCabinetEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# Set Franka as robot
# We switch here to a stiffer PD controller for IK tracking to be better.
self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
# Set actions for the specific robot type (franka)
self.actions.body_joint_pos = DifferentialInverseKinematicsActionCfg(
asset_name="robot",
joint_names=["panda_joint.*"],
body_name="panda_hand",
controller=DifferentialIKControllerCfg(command_type="pose", use_relative_mode=False, ik_method="dls"),
body_offset=DifferentialInverseKinematicsActionCfg.OffsetCfg(pos=[0.0, 0.0, 0.107]),
)
@configclass
class FrankaCabinetEnvCfg_PLAY(FrankaCabinetEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# make a smaller scene for play
self.scene.num_envs = 50
self.scene.env_spacing = 2.5
# disable randomization for play
self.observations.policy.enable_corruption = False
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from omni.isaac.orbit.controllers.differential_ik_cfg import DifferentialIKControllerCfg
from omni.isaac.orbit.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
from omni.isaac.orbit.utils import configclass
from . import joint_pos_env_cfg
##
# Pre-defined configs
##
from omni.isaac.orbit_assets.franka import FRANKA_PANDA_HIGH_PD_CFG # isort: skip
@configclass
class FrankaCabinetEnvCfg(joint_pos_env_cfg.FrankaCabinetEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# Set Franka as robot
# We switch here to a stiffer PD controller for IK tracking to be better.
self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
# Set actions for the specific robot type (franka)
self.actions.body_joint_pos = DifferentialInverseKinematicsActionCfg(
asset_name="robot",
joint_names=["panda_joint.*"],
body_name="panda_hand",
controller=DifferentialIKControllerCfg(command_type="pose", use_relative_mode=True, ik_method="dls"),
scale=0.5,
body_offset=DifferentialInverseKinematicsActionCfg.OffsetCfg(pos=[0.0, 0.0, 0.107]),
)
@configclass
class FrankaCabinetEnvCfg_PLAY(FrankaCabinetEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# make a smaller scene for play
self.scene.num_envs = 50
self.scene.env_spacing = 2.5
# disable randomization for play
self.observations.policy.enable_corruption = False
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from omni.isaac.orbit.sensors import FrameTransformerCfg
from omni.isaac.orbit.sensors.frame_transformer.frame_transformer_cfg import OffsetCfg
from omni.isaac.orbit.utils import configclass
from omni.isaac.orbit_tasks.manipulation.cabinet import mdp
from omni.isaac.orbit_tasks.manipulation.cabinet.cabinet_env_cfg import CabinetEnvCfg
##
# Pre-defined configs
##
from omni.isaac.orbit_assets.franka import FRANKA_PANDA_CFG # isort: skip
from omni.isaac.orbit_tasks.manipulation.cabinet.cabinet_env_cfg import FRAME_MARKER_SMALL_CFG # isort: skip
@configclass
class FrankaCabinetEnvCfg(CabinetEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# Set franka as robot
self.scene.robot = FRANKA_PANDA_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
# Set Actions for the specific robot type (franka)
self.actions.body_joint_pos = mdp.JointPositionActionCfg(
asset_name="robot",
joint_names=["panda_joint.*"],
scale=1.0,
use_default_offset=True,
)
self.actions.finger_joint_pos = mdp.BinaryJointPositionActionCfg(
asset_name="robot",
joint_names=["panda_finger.*"],
open_command_expr={"panda_finger_.*": 0.04},
close_command_expr={"panda_finger_.*": 0.0},
)
# Listens to the required transforms
# IMPORTANT: The order of the frames in the list is important. The first frame is the tool center point (TCP)
# the other frames are the fingers
self.scene.ee_frame = FrameTransformerCfg(
prim_path="{ENV_REGEX_NS}/Robot/panda_link0",
debug_vis=False,
visualizer_cfg=FRAME_MARKER_SMALL_CFG.replace(prim_path="/Visuals/EndEffectorFrameTransformer"),
target_frames=[
FrameTransformerCfg.FrameCfg(
prim_path="{ENV_REGEX_NS}/Robot/panda_hand",
name="ee_tcp",
offset=OffsetCfg(
pos=(0.0, 0.0, 0.1034),
),
),
FrameTransformerCfg.FrameCfg(
prim_path="{ENV_REGEX_NS}/Robot/panda_leftfinger",
name="tool_leftfinger",
offset=OffsetCfg(
pos=(0.0, 0.0, 0.046),
),
),
FrameTransformerCfg.FrameCfg(
prim_path="{ENV_REGEX_NS}/Robot/panda_rightfinger",
name="tool_rightfinger",
offset=OffsetCfg(
pos=(0.0, 0.0, 0.046),
),
),
],
)
# override rewards
self.rewards.approach_gripper_handle.params["offset"] = 0.04
self.rewards.grasp_handle.params["open_joint_pos"] = 0.04
self.rewards.grasp_handle.params["asset_cfg"].joint_names = ["panda_finger_.*"]
@configclass
class FrankaCabinetEnvCfg_PLAY(FrankaCabinetEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# make a smaller scene for play
self.scene.num_envs = 50
self.scene.env_spacing = 2.5
# disable randomization for play
self.observations.policy.enable_corruption = False
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""This sub-module contains the functions that are specific to the cabinet environments."""
from omni.isaac.orbit.envs.mdp import * # noqa: F401, F403
from .observations import * # noqa: F401, F403
from .rewards import * # noqa: F401, F403
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations
import torch
from typing import TYPE_CHECKING
from omni.isaac.orbit.assets import ArticulationData
from omni.isaac.orbit.sensors import FrameTransformerData
if TYPE_CHECKING:
from omni.isaac.orbit.envs import RLTaskEnv
def rel_ee_object_distance(env: RLTaskEnv) -> torch.Tensor:
"""The distance between the end-effector and the object."""
ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
object_data: ArticulationData = env.scene["object"].data
return object_data.root_pos_w - ee_tf_data.target_pos_w[..., 0, :]
def rel_ee_drawer_distance(env: RLTaskEnv) -> torch.Tensor:
"""The distance between the end-effector and the object."""
ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
cabinet_tf_data: FrameTransformerData = env.scene["cabinet_frame"].data
return cabinet_tf_data.target_pos_w[..., 0, :] - ee_tf_data.target_pos_w[..., 0, :]
def fingertips_pos(env: RLTaskEnv) -> torch.Tensor:
"""The position of the fingertips relative to the environment origins."""
ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
fingertips_pos = ee_tf_data.target_pos_w[..., 1:, :] - env.scene.env_origins.unsqueeze(1)
return fingertips_pos.view(env.num_envs, -1)
def ee_pos(env: RLTaskEnv) -> torch.Tensor:
"""The position of the end-effector relative to the environment origins."""
ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
ee_pos = ee_tf_data.target_pos_w[..., 0, :] - env.scene.env_origins
return ee_pos
def ee_quat(env: RLTaskEnv) -> torch.Tensor:
"""The orientation of the end-effector in the environment frame."""
ee_tf_data: FrameTransformerData = env.scene["ee_frame"].data
ee_quat = ee_tf_data.target_quat_w[..., 0, :]
# make first element of quaternion positive
ee_quat[ee_quat[:, 0] < 0] *= -1
return ee_quat
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from __future__ import annotations
import torch
from typing import TYPE_CHECKING
from omni.isaac.orbit.managers import SceneEntityCfg
from omni.isaac.orbit.utils.math import matrix_from_quat
if TYPE_CHECKING:
from omni.isaac.orbit.envs import RLTaskEnv
def approach_ee_handle(env: RLTaskEnv, threshold: float) -> torch.Tensor:
r"""Reward the robot for reaching the drawer handle using inverse-square law.
It uses a piecewise function to reward the robot for reaching the handle.
.. math::
reward = \begin{cases}
2 * (1 / (1 + distance^2))^2 & \text{if } distance \leq threshold \\
(1 / (1 + distance^2))^2 & \text{otherwise}
\end{cases}
"""
ee_tcp_pos = env.scene["ee_frame"].data.target_pos_w[..., 0, :]
handle_pos = env.scene["cabinet_frame"].data.target_pos_w[..., 0, :]
# Compute the distance of the end-effector to the handle
distance = torch.norm(handle_pos - ee_tcp_pos, dim=-1, p=2)
# Reward the robot for reaching the handle
reward = 1.0 / (1.0 + distance**2)
reward = torch.pow(reward, 2)
return torch.where(distance <= threshold, 2 * reward, reward)
def align_ee_handle(env: RLTaskEnv) -> torch.Tensor:
"""Reward for aligning the end-effector with the handle.
The reward is based on the alignment of the gripper with the handle. It is computed as follows:
.. math::
reward = 0.5 * (align_z^2 + align_x^2)
where :math:`align_z` is the dot product of the z direction of the gripper and the -x direction of the handle
and :math:`align_x` is the dot product of the x direction of the gripper and the -y direction of the handle.
"""
ee_tcp_quat = env.scene["ee_frame"].data.target_quat_w[..., 0, :]
handle_quat = env.scene["cabinet_frame"].data.target_quat_w[..., 0, :]
ee_tcp_rot_mat = matrix_from_quat(ee_tcp_quat)
handle_mat = matrix_from_quat(handle_quat)
# get current x and y direction of the handle
handle_x, handle_y = handle_mat[..., 0], handle_mat[..., 1]
# get current x and z direction of the gripper
ee_tcp_x, ee_tcp_z = ee_tcp_rot_mat[..., 0], ee_tcp_rot_mat[..., 2]
# make sure gripper aligns with the handle
# in this case, the z direction of the gripper should be close to the -x direction of the handle
# and the x direction of the gripper should be close to the -y direction of the handle
# dot product of z and x should be large
align_z = torch.bmm(ee_tcp_z.unsqueeze(1), -handle_x.unsqueeze(-1)).squeeze(-1).squeeze(-1)
align_x = torch.bmm(ee_tcp_x.unsqueeze(1), -handle_y.unsqueeze(-1)).squeeze(-1).squeeze(-1)
return 0.5 * (torch.sign(align_z) * align_z**2 + torch.sign(align_x) * align_x**2)
def align_grasp_around_handle(env: RLTaskEnv) -> torch.Tensor:
"""Bonus for correct hand orientation around the handle.
The correct hand orientation is when the left finger is above the handle and the right finger is below the handle.
"""
# Target object position: (num_envs, 3)
handle_pos = env.scene["cabinet_frame"].data.target_pos_w[..., 0, :]
# Fingertips position: (num_envs, n_fingertips, 3)
ee_fingertips_w = env.scene["ee_frame"].data.target_pos_w[..., 1:, :]
lfinger_pos = ee_fingertips_w[..., 0, :]
rfinger_pos = ee_fingertips_w[..., 1, :]
# Check if hand is in a graspable pose
is_graspable = (rfinger_pos[:, 2] < handle_pos[:, 2]) & (lfinger_pos[:, 2] > handle_pos[:, 2])
# bonus if left finger is above the drawer handle and right below
return is_graspable
def approach_gripper_handle(env: RLTaskEnv, offset: float = 0.04) -> torch.Tensor:
"""Reward the robot's gripper reaching the drawer handle with the right pose.
This function returns the distance of fingertips to the handle when the fingers are in a grasping orientation
(i.e., the left finger is above the handle and the right finger is below the handle). Otherwise, it returns zero.
"""
# Target object position: (num_envs, 3)
handle_pos = env.scene["cabinet_frame"].data.target_pos_w[..., 0, :]
# Fingertips position: (num_envs, n_fingertips, 3)
ee_fingertips_w = env.scene["ee_frame"].data.target_pos_w[..., 1:, :]
lfinger_pos = ee_fingertips_w[..., 0, :]
rfinger_pos = ee_fingertips_w[..., 1, :]
# Compute the distance of each finger from the handle
lfinger_dist = torch.abs(lfinger_pos[:, 2] - handle_pos[:, 2])
rfinger_dist = torch.abs(rfinger_pos[:, 2] - handle_pos[:, 2])
# Check if hand is in a graspable pose
is_graspable = (rfinger_pos[:, 2] < handle_pos[:, 2]) & (lfinger_pos[:, 2] > handle_pos[:, 2])
return is_graspable * ((offset - lfinger_dist) + (offset - rfinger_dist))
def grasp_handle(env: RLTaskEnv, threshold: float, open_joint_pos: float, asset_cfg: SceneEntityCfg) -> torch.Tensor:
"""Reward for closing the fingers when being close to the handle.
The :attr:`threshold` is the distance from the handle at which the fingers should be closed.
The :attr:`open_joint_pos` is the joint position when the fingers are open.
Note:
It is assumed that zero joint position corresponds to the fingers being closed.
"""
ee_tcp_pos = env.scene["ee_frame"].data.target_pos_w[..., 0, :]
handle_pos = env.scene["cabinet_frame"].data.target_pos_w[..., 0, :]
gripper_joint_pos = env.scene[asset_cfg.name].data.joint_pos[:, asset_cfg.joint_ids]
distance = torch.norm(handle_pos - ee_tcp_pos, dim=-1, p=2)
is_close = distance <= threshold
return is_close * torch.sum(open_joint_pos - gripper_joint_pos, dim=-1)
def open_drawer_bonus(env: RLTaskEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
"""Bonus for opening the drawer given by the joint position of the drawer.
The bonus is given when the drawer is open. If the grasp is around the handle, the bonus is doubled.
"""
drawer_pos = env.scene[asset_cfg.name].data.joint_pos[:, asset_cfg.joint_ids[0]]
is_graspable = align_grasp_around_handle(env).float()
return (is_graspable + 1.0) * drawer_pos
def multi_stage_open_drawer(env: RLTaskEnv, asset_cfg: SceneEntityCfg) -> torch.Tensor:
"""Multi-stage bonus for opening the drawer.
Depending on the drawer's position, the reward is given in three stages: easy, medium, and hard.
This helps the agent to learn to open the drawer in a controlled manner.
"""
drawer_pos = env.scene[asset_cfg.name].data.joint_pos[:, asset_cfg.joint_ids[0]]
is_graspable = align_grasp_around_handle(env).float()
open_easy = (drawer_pos > 0.01) * 0.5
open_medium = (drawer_pos > 0.2) * is_graspable
open_hard = (drawer_pos > 0.3) * is_graspable
return open_easy + open_medium + open_hard
......@@ -198,7 +198,7 @@ class LiftEnvCfg(RLTaskEnvCfg):
"""Configuration for the lifting environment."""
# Scene settings
scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5, replicate_physics=False)
scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5)
# Basic settings
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
......
......@@ -5,18 +5,22 @@
import gymnasium as gym
from . import agents, env_cfg
from . import agents, ik_abs_env_cfg, ik_rel_env_cfg, joint_pos_env_cfg
##
# Register Gym environments.
##
##
# Joint Position Control
##
gym.register(
id="Isaac-Reach-Franka-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": env_cfg.FrankaReachEnvCfg,
"env_cfg_entry_point": joint_pos_env_cfg.FrankaReachEnvCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
"rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
......@@ -28,9 +32,66 @@ gym.register(
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": env_cfg.FrankaReachEnvCfg_PLAY,
"env_cfg_entry_point": joint_pos_env_cfg.FrankaReachEnvCfg_PLAY,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
"rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
},
)
##
# Inverse Kinematics - Absolute Pose Control
##
gym.register(
id="Isaac-Reach-Franka-IK-Abs-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": ik_abs_env_cfg.FrankaReachEnvCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
"rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
},
disable_env_checker=True,
)
gym.register(
id="Isaac-Reach-Franka-IK-Abs-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": ik_abs_env_cfg.FrankaReachEnvCfg_PLAY,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
"rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
},
disable_env_checker=True,
)
##
# Inverse Kinematics - Relative Pose Control
##
gym.register(
id="Isaac-Reach-Franka-IK-Rel-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": ik_rel_env_cfg.FrankaReachEnvCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
"rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
},
disable_env_checker=True,
)
gym.register(
id="Isaac-Reach-Franka-IK-Rel-Play-v0",
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
kwargs={
"env_cfg_entry_point": ik_rel_env_cfg.FrankaReachEnvCfg_PLAY,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
"rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_cfg:FrankaReachPPORunnerCfg",
"skrl_cfg_entry_point": f"{agents.__name__}:skrl_ppo_cfg.yaml",
},
disable_env_checker=True,
)
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from omni.isaac.orbit.controllers.differential_ik_cfg import DifferentialIKControllerCfg
from omni.isaac.orbit.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
from omni.isaac.orbit.utils import configclass
from . import joint_pos_env_cfg
##
# Pre-defined configs
##
from omni.isaac.orbit_assets.franka import FRANKA_PANDA_HIGH_PD_CFG # isort: skip
@configclass
class FrankaReachEnvCfg(joint_pos_env_cfg.FrankaReachEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# Set Franka as robot
# We switch here to a stiffer PD controller for IK tracking to be better.
self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
# Set actions for the specific robot type (franka)
self.actions.body_joint_pos = DifferentialInverseKinematicsActionCfg(
asset_name="robot",
joint_names=["panda_joint.*"],
body_name="panda_hand",
controller=DifferentialIKControllerCfg(command_type="pose", use_relative_mode=False, ik_method="dls"),
body_offset=DifferentialInverseKinematicsActionCfg.OffsetCfg(pos=[0.0, 0.0, 0.107]),
)
@configclass
class FrankaReachEnvCfg_PLAY(FrankaReachEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# make a smaller scene for play
self.scene.num_envs = 50
self.scene.env_spacing = 2.5
# disable randomization for play
self.observations.policy.enable_corruption = False
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
from omni.isaac.orbit.controllers.differential_ik_cfg import DifferentialIKControllerCfg
from omni.isaac.orbit.envs.mdp.actions.actions_cfg import DifferentialInverseKinematicsActionCfg
from omni.isaac.orbit.utils import configclass
from . import joint_pos_env_cfg
##
# Pre-defined configs
##
from omni.isaac.orbit_assets.franka import FRANKA_PANDA_HIGH_PD_CFG # isort: skip
@configclass
class FrankaReachEnvCfg(joint_pos_env_cfg.FrankaReachEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# Set Franka as robot
# We switch here to a stiffer PD controller for IK tracking to be better.
self.scene.robot = FRANKA_PANDA_HIGH_PD_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
# Set actions for the specific robot type (franka)
self.actions.body_joint_pos = DifferentialInverseKinematicsActionCfg(
asset_name="robot",
joint_names=["panda_joint.*"],
body_name="panda_hand",
controller=DifferentialIKControllerCfg(command_type="pose", use_relative_mode=True, ik_method="dls"),
scale=0.5,
body_offset=DifferentialInverseKinematicsActionCfg.OffsetCfg(pos=[0.0, 0.0, 0.107]),
)
@configclass
class FrankaReachEnvCfg_PLAY(FrankaReachEnvCfg):
def __post_init__(self):
# post init of parent
super().__post_init__()
# make a smaller scene for play
self.scene.num_envs = 50
self.scene.env_spacing = 2.5
# disable randomization for play
self.observations.policy.enable_corruption = False
......@@ -5,7 +5,7 @@
import gymnasium as gym
from . import agents, env_cfg
from . import agents, joint_pos_env_cfg
##
# Register Gym environments.
......@@ -16,7 +16,7 @@ gym.register(
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": env_cfg.UR10ReachEnvCfg,
"env_cfg_entry_point": joint_pos_env_cfg.UR10ReachEnvCfg,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
"rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:UR10ReachPPORunnerCfg",
},
......@@ -27,7 +27,7 @@ gym.register(
entry_point="omni.isaac.orbit.envs:RLTaskEnv",
disable_env_checker=True,
kwargs={
"env_cfg_entry_point": env_cfg.UR10ReachEnvCfg_PLAY,
"env_cfg_entry_point": joint_pos_env_cfg.UR10ReachEnvCfg_PLAY,
"rl_games_cfg_entry_point": f"{agents.__name__}:rl_games_ppo_cfg.yaml",
"rsl_rl_cfg_entry_point": f"{agents.__name__}.rsl_rl_ppo_cfg:UR10ReachPPORunnerCfg",
},
......
......@@ -179,7 +179,7 @@ class ReachEnvCfg(RLTaskEnvCfg):
"""Configuration for the reach end-effector pose tracking environment."""
# Scene settings
scene: ReachSceneCfg = ReachSceneCfg(num_envs=4096, env_spacing=2.5, replicate_physics=False)
scene: ReachSceneCfg = ReachSceneCfg(num_envs=4096, env_spacing=2.5)
# Basic settings
observations: ObservationsCfg = ObservationsCfg()
actions: ActionsCfg = ActionsCfg()
......
......@@ -253,6 +253,9 @@ class RlGamesVecEnvWrapper(IVecEnv):
extras = {
k: v.to(device=self._rl_device, non_blocking=True) if hasattr(v, "to") else v for k, v in extras.items()
}
# remap extras from "log" to "episode"
if "log" in extras:
extras["episode"] = extras.pop("log")
return obs_and_states, rew, dones, extras
......
......@@ -110,9 +110,7 @@ def infer_state_machine(
sm_state[tid] = PickSmState.APPROACH_ABOVE_OBJECT
sm_wait_time[tid] = 0.0
elif state == PickSmState.APPROACH_ABOVE_OBJECT:
des_ee_pose[tid] = object_pose[tid]
# TODO: This is causing issues.
# des_ee_pose[tid] = wp.transform_multiply(des_ee_pose[tid], offset[tid])
des_ee_pose[tid] = wp.transform_multiply(offset[tid], object_pose[tid])
gripper_state[tid] = GripperState.OPEN
# TODO: error between current and desired ee pose below threshold
# wait for a while
......@@ -181,12 +179,16 @@ class PickAndLiftSm:
self.sm_dt = torch.full((self.num_envs,), self.dt, device=self.device)
self.sm_state = torch.full((self.num_envs,), 0, dtype=torch.int32, device=self.device)
self.sm_wait_time = torch.zeros((self.num_envs,), device=self.device)
# desired state
self.des_ee_pose = torch.zeros((self.num_envs, 7), device=self.device)
self.des_gripper_state = torch.full((self.num_envs,), 0.0, device=self.device)
# approach above object offset
self.offset = torch.zeros((self.num_envs, 7), device=self.device)
self.offset[:, 2] = 0.1
self.offset[:, -1] = 1.0 # warp expects quaternion as (x, y, z, w)
# convert to warp
self.sm_dt_wp = wp.from_torch(self.sm_dt, wp.float32)
self.sm_state_wp = wp.from_torch(self.sm_state, wp.int32)
......@@ -204,12 +206,17 @@ class PickAndLiftSm:
def compute(self, ee_pose: torch.Tensor, object_pose: torch.Tensor, des_object_pose: torch.Tensor):
"""Compute the desired state of the robot's end-effector and the gripper."""
# convert all transformations from (w, x, y, z) to (x, y, z, w)
ee_pose = ee_pose[:, [0, 1, 2, 4, 5, 6, 3]]
object_pose = object_pose[:, [0, 1, 2, 4, 5, 6, 3]]
des_object_pose = des_object_pose[:, [0, 1, 2, 4, 5, 6, 3]]
# convert to warp
ee_pose_wp = wp.from_torch(ee_pose.contiguous(), wp.transform)
object_pose_wp = wp.from_torch(object_pose.contiguous(), wp.transform)
des_object_pose_wp = wp.from_torch(des_object_pose.contiguous(), wp.transform)
# run state machine
# run state machine
wp.launch(
kernel=infer_state_machine,
dim=self.num_envs,
......@@ -227,8 +234,10 @@ class PickAndLiftSm:
device=self.device,
)
# convert transformations back to (w, x, y, z)
des_ee_pose = self.des_ee_pose[:, [0, 1, 2, 6, 3, 4, 5]]
# convert to torch
return torch.cat([self.des_ee_pose, self.des_gripper_state.unsqueeze(-1)], dim=-1)
return torch.cat([des_ee_pose, self.des_gripper_state.unsqueeze(-1)], dim=-1)
def main():
......
# Copyright (c) 2022-2024, The ORBIT Project Developers.
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
"""
Script to run an environment with a cabinet opening state machine.
The state machine is implemented in the kernel function `infer_state_machine`.
It uses the `warp` library to run the state machine in parallel on the GPU.
.. code-block:: bash
./orbit.sh -p source/standalone/environments/state_machine/lift_cube_sm.py --num_envs 32
"""
"""Launch Omniverse Toolkit first."""
import argparse
from omni.isaac.orbit.app import AppLauncher
# add argparse arguments
parser = argparse.ArgumentParser(description="Pick and lift state machine for cabinet environments.")
parser.add_argument("--cpu", action="store_true", default=False, help="Use CPU pipeline.")
parser.add_argument(
"--disable_fabric", action="store_true", default=False, help="Disable fabric and use USD I/O operations."
)
parser.add_argument("--num_envs", type=int, default=None, help="Number of environments to simulate.")
# append AppLauncher cli args
AppLauncher.add_app_launcher_args(parser)
# parse the arguments
args_cli = parser.parse_args()
# launch omniverse app
app_launcher = AppLauncher(headless=args_cli.headless)
simulation_app = app_launcher.app
"""Rest everything else."""
import gymnasium as gym
import torch
import traceback
from collections.abc import Sequence
import carb
import warp as wp
from omni.isaac.orbit.sensors import FrameTransformer
import omni.isaac.orbit_tasks # noqa: F401
from omni.isaac.orbit_tasks.manipulation.cabinet.cabinet_env_cfg import CabinetEnvCfg
from omni.isaac.orbit_tasks.utils.parse_cfg import parse_env_cfg
# initialize warp
wp.init()
class GripperState:
"""States for the gripper."""
OPEN = wp.constant(1.0)
CLOSE = wp.constant(-1.0)
class OpenDrawerSmState:
"""States for the cabinet drawer opening state machine."""
REST = wp.constant(0)
APPROACH_INFRONT_HANDLE = wp.constant(1)
APPROACH_HANDLE = wp.constant(2)
GRASP_HANDLE = wp.constant(3)
OPEN_DRAWER = wp.constant(4)
RELEASE_HANDLE = wp.constant(5)
class OpenDrawerSmWaitTime:
"""Additional wait times (in s) for states for before switching."""
REST = wp.constant(0.5)
APPROACH_INFRONT_HANDLE = wp.constant(1.25)
APPROACH_HANDLE = wp.constant(1.0)
GRASP_HANDLE = wp.constant(1.0)
OPEN_DRAWER = wp.constant(3.0)
RELEASE_HANDLE = wp.constant(0.2)
@wp.kernel
def infer_state_machine(
dt: wp.array(dtype=float),
sm_state: wp.array(dtype=int),
sm_wait_time: wp.array(dtype=float),
ee_pose: wp.array(dtype=wp.transform),
handle_pose: wp.array(dtype=wp.transform),
des_ee_pose: wp.array(dtype=wp.transform),
gripper_state: wp.array(dtype=float),
handle_approach_offset: wp.array(dtype=wp.transform),
handle_grasp_offset: wp.array(dtype=wp.transform),
drawer_opening_rate: wp.array(dtype=wp.transform),
):
# retrieve thread id
tid = wp.tid()
# retrieve state machine state
state = sm_state[tid]
# decide next state
if state == OpenDrawerSmState.REST:
des_ee_pose[tid] = ee_pose[tid]
gripper_state[tid] = GripperState.OPEN
# wait for a while
if sm_wait_time[tid] >= OpenDrawerSmWaitTime.REST:
# move to next state and reset wait time
sm_state[tid] = OpenDrawerSmState.APPROACH_INFRONT_HANDLE
sm_wait_time[tid] = 0.0
elif state == OpenDrawerSmState.APPROACH_INFRONT_HANDLE:
des_ee_pose[tid] = wp.transform_multiply(handle_approach_offset[tid], handle_pose[tid])
gripper_state[tid] = GripperState.OPEN
# TODO: error between current and desired ee pose below threshold
# wait for a while
if sm_wait_time[tid] >= OpenDrawerSmWaitTime.APPROACH_INFRONT_HANDLE:
# move to next state and reset wait time
sm_state[tid] = OpenDrawerSmState.APPROACH_HANDLE
sm_wait_time[tid] = 0.0
elif state == OpenDrawerSmState.APPROACH_HANDLE:
des_ee_pose[tid] = handle_pose[tid]
gripper_state[tid] = GripperState.OPEN
# TODO: error between current and desired ee pose below threshold
# wait for a while
if sm_wait_time[tid] >= OpenDrawerSmWaitTime.APPROACH_HANDLE:
# move to next state and reset wait time
sm_state[tid] = OpenDrawerSmState.GRASP_HANDLE
sm_wait_time[tid] = 0.0
elif state == OpenDrawerSmState.GRASP_HANDLE:
des_ee_pose[tid] = wp.transform_multiply(handle_grasp_offset[tid], handle_pose[tid])
gripper_state[tid] = GripperState.CLOSE
# wait for a while
if sm_wait_time[tid] >= OpenDrawerSmWaitTime.GRASP_HANDLE:
# move to next state and reset wait time
sm_state[tid] = OpenDrawerSmState.OPEN_DRAWER
sm_wait_time[tid] = 0.0
elif state == OpenDrawerSmState.OPEN_DRAWER:
des_ee_pose[tid] = wp.transform_multiply(drawer_opening_rate[tid], handle_pose[tid])
gripper_state[tid] = GripperState.CLOSE
# wait for a while
if sm_wait_time[tid] >= OpenDrawerSmWaitTime.OPEN_DRAWER:
# move to next state and reset wait time
sm_state[tid] = OpenDrawerSmState.RELEASE_HANDLE
sm_wait_time[tid] = 0.0
elif state == OpenDrawerSmState.RELEASE_HANDLE:
des_ee_pose[tid] = ee_pose[tid]
gripper_state[tid] = GripperState.CLOSE
# wait for a while
if sm_wait_time[tid] >= OpenDrawerSmWaitTime.RELEASE_HANDLE:
# move to next state and reset wait time
sm_state[tid] = OpenDrawerSmState.RELEASE_HANDLE
sm_wait_time[tid] = 0.0
# increment wait time
sm_wait_time[tid] = sm_wait_time[tid] + dt[tid]
class OpenDrawerSm:
"""A simple state machine in a robot's task space to open a drawer in the cabinet.
The state machine is implemented as a warp kernel. It takes in the current state of
the robot's end-effector and the object, and outputs the desired state of the robot's
end-effector and the gripper. The state machine is implemented as a finite state
machine with the following states:
1. REST: The robot is at rest.
2. APPROACH_HANDLE: The robot moves towards the handle of the drawer.
3. GRASP_HANDLE: The robot grasps the handle of the drawer.
4. OPEN_DRAWER: The robot opens the drawer.
5. RELEASE_HANDLE: The robot releases the handle of the drawer. This is the final state.
"""
def __init__(self, dt: float, num_envs: int, device: torch.device | str = "cpu"):
"""Initialize the state machine.
Args:
dt: The environment time step.
num_envs: The number of environments to simulate.
device: The device to run the state machine on.
"""
# save parameters
self.dt = float(dt)
self.num_envs = num_envs
self.device = device
# initialize state machine
self.sm_dt = torch.full((self.num_envs,), self.dt, device=self.device)
self.sm_state = torch.full((self.num_envs,), 0, dtype=torch.int32, device=self.device)
self.sm_wait_time = torch.zeros((self.num_envs,), device=self.device)
# desired state
self.des_ee_pose = torch.zeros((self.num_envs, 7), device=self.device)
self.des_gripper_state = torch.full((self.num_envs,), 0.0, device=self.device)
# approach infront of the handle
self.handle_approach_offset = torch.zeros((self.num_envs, 7), device=self.device)
self.handle_approach_offset[:, 0] = -0.1
self.handle_approach_offset[:, -1] = 1.0 # warp expects quaternion as (x, y, z, w)
# handle grasp offset
self.handle_grasp_offset = torch.zeros((self.num_envs, 7), device=self.device)
self.handle_grasp_offset[:, 0] = 0.025
self.handle_grasp_offset[:, -1] = 1.0 # warp expects quaternion as (x, y, z, w)
# drawer opening rate
self.drawer_opening_rate = torch.zeros((self.num_envs, 7), device=self.device)
self.drawer_opening_rate[:, 0] = -0.015
self.drawer_opening_rate[:, -1] = 1.0 # warp expects quaternion as (x, y, z, w)
# convert to warp
self.sm_dt_wp = wp.from_torch(self.sm_dt, wp.float32)
self.sm_state_wp = wp.from_torch(self.sm_state, wp.int32)
self.sm_wait_time_wp = wp.from_torch(self.sm_wait_time, wp.float32)
self.des_ee_pose_wp = wp.from_torch(self.des_ee_pose, wp.transform)
self.des_gripper_state_wp = wp.from_torch(self.des_gripper_state, wp.float32)
self.handle_approach_offset_wp = wp.from_torch(self.handle_approach_offset, wp.transform)
self.handle_grasp_offset_wp = wp.from_torch(self.handle_grasp_offset, wp.transform)
self.drawer_opening_rate_wp = wp.from_torch(self.drawer_opening_rate, wp.transform)
def reset_idx(self, env_ids: Sequence[int] | None = None):
"""Reset the state machine."""
if env_ids is None:
env_ids = slice(None)
# reset state machine
self.sm_state[env_ids] = 0
self.sm_wait_time[env_ids] = 0.0
def compute(self, ee_pose: torch.Tensor, handle_pose: torch.Tensor):
"""Compute the desired state of the robot's end-effector and the gripper."""
# convert all transformations from (w, x, y, z) to (x, y, z, w)
ee_pose = ee_pose[:, [0, 1, 2, 4, 5, 6, 3]]
handle_pose = handle_pose[:, [0, 1, 2, 4, 5, 6, 3]]
# convert to warp
ee_pose_wp = wp.from_torch(ee_pose.contiguous(), wp.transform)
handle_pose_wp = wp.from_torch(handle_pose.contiguous(), wp.transform)
# run state machine
wp.launch(
kernel=infer_state_machine,
dim=self.num_envs,
inputs=[
self.sm_dt_wp,
self.sm_state_wp,
self.sm_wait_time_wp,
ee_pose_wp,
handle_pose_wp,
self.des_ee_pose_wp,
self.des_gripper_state_wp,
self.handle_approach_offset_wp,
self.handle_grasp_offset_wp,
self.drawer_opening_rate_wp,
],
device=self.device,
)
# convert transformations back to (w, x, y, z)
des_ee_pose = self.des_ee_pose[:, [0, 1, 2, 6, 3, 4, 5]]
# convert to torch
return torch.cat([des_ee_pose, self.des_gripper_state.unsqueeze(-1)], dim=-1)
def main():
# parse configuration
env_cfg: CabinetEnvCfg = parse_env_cfg(
"Isaac-Open-Drawer-Franka-IK-Abs-v0",
use_gpu=not args_cli.cpu,
num_envs=args_cli.num_envs,
use_fabric=not args_cli.disable_fabric,
)
# create environment
env = gym.make("Isaac-Open-Drawer-Franka-IK-Abs-v0", cfg=env_cfg)
# reset environment at start
env.reset()
# create action buffers (position + quaternion)
actions = torch.zeros(env.unwrapped.action_space.shape, device=env.unwrapped.device)
actions[:, 3] = 1.0
# desired object orientation (we only do position control of object)
desired_orientation = torch.zeros((env.unwrapped.num_envs, 4), device=env.unwrapped.device)
desired_orientation[:, 1] = 1.0
# create state machine
open_sm = OpenDrawerSm(env_cfg.sim.dt * env_cfg.decimation, env.unwrapped.num_envs, env.unwrapped.device)
while simulation_app.is_running():
# run everything in inference mode
with torch.inference_mode():
# step environment
dones = env.step(actions)[-2]
# observations
# -- end-effector frame
ee_frame_tf: FrameTransformer = env.unwrapped.scene["ee_frame"]
tcp_rest_position = ee_frame_tf.data.target_pos_w[..., 0, :].clone() - env.unwrapped.scene.env_origins
tcp_rest_orientation = ee_frame_tf.data.target_quat_w[..., 0, :].clone()
# -- handle frame
cabinet_frame_tf: FrameTransformer = env.unwrapped.scene["cabinet_frame"]
cabinet_position = cabinet_frame_tf.data.target_pos_w[..., 0, :].clone() - env.unwrapped.scene.env_origins
cabinet_orientation = cabinet_frame_tf.data.target_quat_w[..., 0, :].clone()
# advance state machine
actions = open_sm.compute(
torch.cat([tcp_rest_position, tcp_rest_orientation], dim=-1),
torch.cat([cabinet_position, cabinet_orientation], dim=-1),
)
# reset state machine
if dones.any():
open_sm.reset_idx(dones.nonzero(as_tuple=False).squeeze(-1))
# close the environment
env.close()
if __name__ == "__main__":
try:
# run the main execution
main()
except Exception as err:
carb.log_error(err)
carb.log_error(traceback.format_exc())
raise
finally:
# close sim app
simulation_app.close()
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment