Commit 02b0d76c authored by Kelly Guo's avatar Kelly Guo Committed by David Hoeller

Updates tiled rendering API with full RTX rendering and additional annotators (#97)

This change updates the current tiled rendering APIs to use the full RTX
tiled rendering feature, allowing for higher quality RGB renders and
support of additional annotators, including semantic segmentation,
instance segmentation, normals, and motion vectors.

This change also aligns output dimensions across TiledCamera, Camera,
and RayCasterCamera classes. All single-channel outputs will now have
dimension (H, W, C). Camera class now outputs RGB data with shape (H, W,
3).

<!-- As you go through the list, delete the ones that are not
applicable. -->

- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- This change requires a documentation update

Fixes issue https://github.com/isaac-sim/IsaacLab/issues/775

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my
feature works
- [x] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there

<!--
As you go through the checklist above, you can mark something as done by
putting an x character in it

For example,
- [x] I have done this task
- [ ] I have not done this task
-->

---------
Co-authored-by: 's avatarAlexander <143108850+nv-apoddubny@users.noreply.github.com>
Co-authored-by: 's avatarToni-SM <aserranomuno@nvidia.com>
parent 52422b6b
...@@ -9,9 +9,9 @@ Tiled Rendering ...@@ -9,9 +9,9 @@ Tiled Rendering
.. note:: .. note::
This feature is only available from Isaac Sim version 4.0.0 onwards. This feature is only available from Isaac Sim version 4.2.0 onwards.
Tiled rendering requires heavy memory resources. We recommend running at most 256 cameras in the scene. Tiled rendering in combination with image processing networks require heavy memory resources, especially at larger resolutions. We recommend running at 256 cameras in the scene on RTX 4090 GPUs or similar.
Tiled rendering APIs provide a vectorized interface for collecting data from camera sensors. Tiled rendering APIs provide a vectorized interface for collecting data from camera sensors.
This is useful for reinforcement learning environments requiring vision in the loop. This is useful for reinforcement learning environments requiring vision in the loop.
...@@ -20,7 +20,7 @@ one single large image instead of multiple smaller images that would have been p ...@@ -20,7 +20,7 @@ one single large image instead of multiple smaller images that would have been p
by each individual camera. This reduces the amount of time required for rendering and by each individual camera. This reduces the amount of time required for rendering and
provides a more efficient API for working with vision data. provides a more efficient API for working with vision data.
Isaac Lab provides tiled rendering APIs for RGB and depth data through the :class:`~sensors.TiledCamera` Isaac Lab provides tiled rendering APIs for RGB, depth, along with other annotators through the :class:`~sensors.TiledCamera`
class. Configurations for the tiled rendering APIs can be defined through the :class:`~sensors.TiledCameraCfg` class. Configurations for the tiled rendering APIs can be defined through the :class:`~sensors.TiledCameraCfg`
class, specifying parameters such as the regex expression for all camera paths, the transform class, specifying parameters such as the regex expression for all camera paths, the transform
for the cameras, the desired data type, the type of cameras to add to the scene, and the camera for the cameras, the desired data type, the type of cameras to add to the scene, and the camera
...@@ -59,6 +59,80 @@ environment. For example: ...@@ -59,6 +59,80 @@ environment. For example:
python source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-RGB-Camera-Direct-v0 --headless --enable_cameras python source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-RGB-Camera-Direct-v0 --headless --enable_cameras
Annotators and Data Types
^^^^^^^^^^^^^^^^^^^^^^^^^
Both :class:`~sensors.TiledCamera` and :class:`~sensors.Camera` classes provide APIs for retrieving various types annotator data from replicator:
* ``"rgb"``: A 3-channel rendered color image.
* ``"rgba"``: A 4-channel rendered color image with alpha channel.
* ``"distance_to_camera"``: An image containing the distance to camera optical center.
* ``"distance_to_image_plane"``: An image containing distances of 3D points from camera plane along camera's z-axis.
* ``"depth"``: The same as ``"distance_to_image_plane"``.
* ``"normals"``: An image containing the local surface normal vectors at each pixel.
* ``"motion_vectors"``: An image containing the motion vector data at each pixel.
* ``"semantic_segmentation"``: The semantic segmentation data.
* ``"instance_segmentation_fast"``: The instance segmentation data.
* ``"instance_id_segmentation_fast"``: The instance id segmentation data.
RGB and RGBA
""""""""""""
``rgb`` data type returns a 3-channel RGB colored image of type ``torch.uint8``, with dimension (B, H, W, 3).
``rgba`` data type returns a 4-channel RGBA colored image of type ``torch.uint8``, with dimension (B, H, W, 4).
To convert the ``torch.uint8`` data to ``torch.float32``, divide the buffer by 255.0 to obtain a ``torch.float32`` buffer containing data from 0 to 1.
Depth and Distances
"""""""""""""""""""
``distance_to_camera`` returns a single-channel depth image with distance to the camera optical center. The dimension for this annotator is (B, H, W, 1) and has type ``torch.float32``.
``distance_to_image_plane`` returns a single-channel depth image with distances of 3D points from the camera plane along the camera's Z-axis. The dimension for this annotator is (B, H, W, 1) and has type ``torch.float32``.
``depth`` is provided as an alias for ``distance_to_image_plane`` and will return the same data as the ``distance_to_image_plane`` annotator, with dimension (B, H, W, 1) and type ``torch.float32``.
Normals
"""""""
``normals`` returns an image containing the local surface normal vectors at each pixel. The buffer has dimension (B, H, W, 3), containing the (x, y, z) information for each vector, and has data type ``torch.float32``.
Motion Vectors
""""""""""""""
``motion_vectors`` returns the per-pixel motion vectors in image space, with a 2D array of motion vectors representing the relative motion of a pixel in the camera’s viewport between frames. The buffer has dimension (B, H, W, 2), representing x - the motion distance in the horizontal axis (image width) with movement to the left of the image being positive and movement to the right being negative and y - motion distance in the vertical axis (image height) with movement towards the top of the image being positive and movement to the bottom being negative. The data type is ``torch.float32``.
Semantic Segmentation
"""""""""""""""""""""
``semantic_segmentation`` outputs semantic segmentation of each entity in the camera’s viewport that has semantic labels. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['semantic_segmentation']`` containing ID to labels information.
If ``colorize_semantic_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to semantic labels.
If ``colorize_semantic_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the semantic ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from semantic ID to semantic labels.
Instance ID Segmentation
""""""""""""""""""""""""
``instance_id_segmentation_fast`` outputs instance ID segmentation of each entity in the camera’s viewport. The instance ID is unique for each prim in the scene with different paths. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_id_segmentation_fast']`` containing ID to labels information.
The main difference between ``instance_id_segmentation_fast`` and ``instance_segmentation_fast`` are that instance segmentation annotator goes down the hierarchy to the lowest level prim which has semantic labels, where instance ID segmentation always goes down to the leaf prim.
If ``colorize_instance_id_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that entity.
If ``colorize_instance_id_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from instance ID to USD prim path of that entity.
Instance Segmentation
"""""""""""""""""""""
``instance_segmentation_fast`` outputs instance segmentation of each entity in the camera’s viewport. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_segmentation_fast']`` containing ID to labels and ID to semantic information.
If ``colorize_instance_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from color to semantic labels of that semantic entity.
If ``colorize_instance_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from instance ID to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from instance ID to semantic labels of that semantic entity.
Recording during training Recording during training
------------------------- -------------------------
......
...@@ -43,6 +43,9 @@ exts."omni.kit.window.viewport".blockingGetViewportDrawable = false ...@@ -43,6 +43,9 @@ exts."omni.kit.window.viewport".blockingGetViewportDrawable = false
# Fix PlayButtonGroup error # Fix PlayButtonGroup error
exts."omni.kit.widget.toolbar".PlayButton.enabled = false exts."omni.kit.widget.toolbar".PlayButton.enabled = false
# disable replicator orchestrator for better runtime perf
exts."omni.replicator.core".Orchestrator.enabled = false
[settings.app.settings] [settings.app.settings]
persistent = true persistent = true
dev_build = false dev_build = false
......
...@@ -37,18 +37,24 @@ app.version = "4.1.0" ...@@ -37,18 +37,24 @@ app.version = "4.1.0"
# set the default ros bridge to disable on startup # set the default ros bridge to disable on startup
isaac.startup.ros_bridge_extension = "" isaac.startup.ros_bridge_extension = ""
# Increase available descriptors to support more simultaneous cameras # Flags for better rendering performance
rtx.descriptorSets=30000 rtx.translucency.enabled = false
rtx.reflections.enabled = false
rtx.indirectDiffuse.enabled = false
rtx.transient.dlssg.enabled = false
rtx.directLighting.sampledLighting.enabled = true
rtx.directLighting.sampledLighting.samplesPerPixel = 1
rtx.sceneDb.ambientLightIntensity = 1.0
# rtx.shadows.enabled = false
# Enable new denoiser to reduce motion blur artifacts # Avoids replicator warning
rtx.newDenoiser.enabled=true rtx.pathtracing.maxSamplesPerLaunch = 1000000
# Disable present thread to improve performance # Disable present thread to improve performance
exts."omni.renderer.core".present.enabled=false exts."omni.renderer.core".present.enabled=false
# Disabling these settings reduces renderer VRAM usage and improves rendering performance, but at some quality cost # Disabling these settings reduces renderer VRAM usage and improves rendering performance, but at some quality cost
rtx.raytracing.cached.enabled = false rtx.raytracing.cached.enabled = false
rtx.raytracing.lightcache.spatialCache.enabled = false
rtx.ambientOcclusion.enabled = false rtx.ambientOcclusion.enabled = false
rtx-transient.dlssg.enabled = false rtx-transient.dlssg.enabled = false
...@@ -61,6 +67,8 @@ renderer.multiGpu.maxGpuCount=1 ...@@ -61,6 +67,8 @@ renderer.multiGpu.maxGpuCount=1
# Force synchronous rendering to improve training results # Force synchronous rendering to improve training results
omni.replicator.asyncRendering = false omni.replicator.asyncRendering = false
# Avoids frame offset issue
app.updateOrder.checkForHydraRenderComplete = 1000
app.renderer.waitIdle=true app.renderer.waitIdle=true
app.hydraEngine.waitIdle=true app.hydraEngine.waitIdle=true
...@@ -69,6 +77,9 @@ app.audio.enabled = false ...@@ -69,6 +77,9 @@ app.audio.enabled = false
# Enable Vulkan - avoids torch+cu12 error on windows # Enable Vulkan - avoids torch+cu12 error on windows
app.vulkan = true app.vulkan = true
# disable replicator orchestrator for better runtime perf
exts."omni.replicator.core".Orchestrator.enabled = false
[settings.exts."omni.kit.registry.nucleus"] [settings.exts."omni.kit.registry.nucleus"]
registries = [ registries = [
{ name = "kit/default", url = "https://ovextensionsprod.blob.core.windows.net/exts/kit/prod/shared" }, { name = "kit/default", url = "https://ovextensionsprod.blob.core.windows.net/exts/kit/prod/shared" },
......
...@@ -216,6 +216,9 @@ app.audio.enabled = false ...@@ -216,6 +216,9 @@ app.audio.enabled = false
# Enable Vulkan - avoids torch+cu12 error on windows # Enable Vulkan - avoids torch+cu12 error on windows
app.vulkan = true app.vulkan = true
# disable replicator orchestrator for better runtime perf
exts."omni.replicator.core".Orchestrator.enabled = false
# Basic Kit App # Basic Kit App
################################ ################################
app.versionFile = "${exe-path}/VERSION" app.versionFile = "${exe-path}/VERSION"
......
...@@ -37,18 +37,24 @@ app.version = "4.1.0" ...@@ -37,18 +37,24 @@ app.version = "4.1.0"
# set the default ros bridge to disable on startup # set the default ros bridge to disable on startup
isaac.startup.ros_bridge_extension = "" isaac.startup.ros_bridge_extension = ""
# Increase available descriptors to support more simultaneous cameras # Flags for better rendering performance
rtx.descriptorSets=30000 rtx.translucency.enabled = false
rtx.reflections.enabled = false
rtx.indirectDiffuse.enabled = false
rtx.transient.dlssg.enabled = false
rtx.directLighting.sampledLighting.enabled = true
rtx.directLighting.sampledLighting.samplesPerPixel = 1
rtx.sceneDb.ambientLightIntensity = 1.0
# rtx.shadows.enabled = false
# Enable new denoiser to reduce motion blur artifacts # Avoids replicator warning
rtx.newDenoiser.enabled=true rtx.pathtracing.maxSamplesPerLaunch = 1000000
# Disable present thread to improve performance # Disable present thread to improve performance
exts."omni.renderer.core".present.enabled=false exts."omni.renderer.core".present.enabled=false
# Disabling these settings reduces renderer VRAM usage and improves rendering performance, but at some quality cost # Disabling these settings reduces renderer VRAM usage and improves rendering performance, but at some quality cost
rtx.raytracing.cached.enabled = false rtx.raytracing.cached.enabled = false
rtx.raytracing.lightcache.spatialCache.enabled = false
rtx.ambientOcclusion.enabled = false rtx.ambientOcclusion.enabled = false
rtx-transient.dlssg.enabled = false rtx-transient.dlssg.enabled = false
...@@ -61,11 +67,16 @@ renderer.multiGpu.maxGpuCount=1 ...@@ -61,11 +67,16 @@ renderer.multiGpu.maxGpuCount=1
# Force synchronous rendering to improve training results # Force synchronous rendering to improve training results
omni.replicator.asyncRendering = false omni.replicator.asyncRendering = false
# Avoids frame offset issue
app.updateOrder.checkForHydraRenderComplete = 1000
app.renderer.waitIdle=true app.renderer.waitIdle=true
app.hydraEngine.waitIdle=true app.hydraEngine.waitIdle=true
app.audio.enabled = false app.audio.enabled = false
# disable replicator orchestrator for better runtime perf
exts."omni.replicator.core".Orchestrator.enabled = false
[settings.physics] [settings.physics]
updateToUsd = false updateToUsd = false
updateParticlesToUsd = false updateParticlesToUsd = false
......
[package] [package]
# Note: Semantic Versioning is used: https://semver.org/ # Note: Semantic Versioning is used: https://semver.org/
version = "0.23.10" version = "0.24.10"
# Description # Description
title = "Isaac Lab framework for Robot Learning" title = "Isaac Lab framework for Robot Learning"
......
Changelog Changelog
--------- ---------
0.23.10 (2024-09-10) 0.24.10 (2024-09-10)
~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~
Added Added
...@@ -10,7 +10,7 @@ Added ...@@ -10,7 +10,7 @@ Added
* Added config class, support, and tests for MJCF conversion via standalone python scripts. * Added config class, support, and tests for MJCF conversion via standalone python scripts.
0.23.9 (2024-09-09) 0.24.9 (2024-09-09)
~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~
Added Added
...@@ -22,7 +22,7 @@ Added ...@@ -22,7 +22,7 @@ Added
file or the command line argument. This ensures that the simulation results are reproducible across different runs. file or the command line argument. This ensures that the simulation results are reproducible across different runs.
0.23.8 (2024-09-08) 0.24.8 (2024-09-08)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Changed Changed
...@@ -32,7 +32,7 @@ Changed ...@@ -32,7 +32,7 @@ Changed
for faster processing of high dimensional input tensors. for faster processing of high dimensional input tensors.
0.23.7 (2024-09-06) 0.24.7 (2024-09-06)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Added Added
...@@ -43,7 +43,7 @@ Added ...@@ -43,7 +43,7 @@ Added
instance variables instead. instance variables instead.
0.23.6 (2024-09-05) 0.24.6 (2024-09-05)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Fixed Fixed
...@@ -53,7 +53,7 @@ Fixed ...@@ -53,7 +53,7 @@ Fixed
more-intuitive to control the y-axis motion based on the right-hand rule. more-intuitive to control the y-axis motion based on the right-hand rule.
0.23.5 (2024-08-29) 0.24.5 (2024-08-29)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Added Added
...@@ -63,7 +63,7 @@ Added ...@@ -63,7 +63,7 @@ Added
consistent with all other cameras (equal to type "depth"). consistent with all other cameras (equal to type "depth").
0.23.4 (2024-09-02) 0.24.4 (2024-09-02)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Fixed Fixed
...@@ -74,7 +74,7 @@ Fixed ...@@ -74,7 +74,7 @@ Fixed
* Added test to check :attr:`omni.isaac.lab.sensors.RayCasterCamera.set_intrinsic_matrices` * Added test to check :attr:`omni.isaac.lab.sensors.RayCasterCamera.set_intrinsic_matrices`
0.23.3 (2024-08-29) 0.24.3 (2024-08-29)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Fixed Fixed
...@@ -85,7 +85,7 @@ Fixed ...@@ -85,7 +85,7 @@ Fixed
which required initialization of the class to call the class-methods. which required initialization of the class to call the class-methods.
0.23.2 (2024-08-28) 0.24.2 (2024-08-28)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Added Added
...@@ -106,7 +106,7 @@ Fixed ...@@ -106,7 +106,7 @@ Fixed
the behavior equal to the USD Camera. the behavior equal to the USD Camera.
0.23.1 (2024-08-21) 0.24.1 (2024-08-21)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
Changed Changed
...@@ -115,6 +115,23 @@ Changed ...@@ -115,6 +115,23 @@ Changed
* Disabled default viewport in certain headless scenarios for better performance. * Disabled default viewport in certain headless scenarios for better performance.
0.24.0 (2024-08-17)
~~~~~~~~~~~~~~~~~~~
Added
^^^^^
* Added additional annotators for :class:`omni.isaac.lab.sensors.camera.TiledCamera` class.
Changed
^^^^^^^
* Updated :class:`omni.isaac.lab.sensors.TiledCamera` to latest RTX tiled rendering API.
* Single channel outputs for :class:`omni.isaac.lab.sensors.TiledCamera`, :class:`omni.isaac.lab.sensors.Camera` and :class:`omni.isaac.lab.sensors.RayCasterCamera` now has shape (H, W, 1).
* Data type for RGB output for :class:`omni.isaac.lab.sensors.TiledCamera` changed from ``torch.float`` to ``torch.uint8``.
* Dimension of RGB output for :class:`omni.isaac.lab.sensors.Camera` changed from (H, W, 4) to (H, W, 3). Use type ``rgba`` to retrieve the previous dimension.
0.23.1 (2024-08-17) 0.23.1 (2024-08-17)
~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~
......
...@@ -39,9 +39,11 @@ class Camera(SensorBase): ...@@ -39,9 +39,11 @@ class Camera(SensorBase):
Summarizing from the `replicator extension`_, the following sensor types are supported: Summarizing from the `replicator extension`_, the following sensor types are supported:
- ``"rgb"``: A rendered color image. - ``"rgb"``: A 3-channel rendered color image.
- ``"rgba"``: A 4-channel rendered color image with alpha channel.
- ``"distance_to_camera"``: An image containing the distance to camera optical center. - ``"distance_to_camera"``: An image containing the distance to camera optical center.
- ``"distance_to_image_plane"``: An image containing distances of 3D points from camera plane along camera's z-axis. - ``"distance_to_image_plane"``: An image containing distances of 3D points from camera plane along camera's z-axis.
- ``"depth"``: The same as ``"distance_to_image_plane"``.
- ``"normals"``: An image containing the local surface normal vectors at each pixel. - ``"normals"``: An image containing the local surface normal vectors at each pixel.
- ``"motion_vectors"``: An image containing the motion vector data at each pixel. - ``"motion_vectors"``: An image containing the motion vector data at each pixel.
- ``"semantic_segmentation"``: The semantic segmentation data. - ``"semantic_segmentation"``: The semantic segmentation data.
...@@ -458,8 +460,14 @@ class Camera(SensorBase): ...@@ -458,8 +460,14 @@ class Camera(SensorBase):
else: else:
device_name = "cpu" device_name = "cpu"
# create annotator node # Map special cases to their corresponding annotator names
rep_annotator = rep.AnnotatorRegistry.get_annotator(name, init_params, device=device_name) special_cases = {"rgba": "rgb", "depth": "distance_to_image_plane"}
# Get the annotator name, falling back to the original name if not a special case
annotator_name = special_cases.get(name, name)
# Create the annotator node
rep_annotator = rep.AnnotatorRegistry.get_annotator(annotator_name, init_params, device=device_name)
# attach annotator to render product
rep_annotator.attach(render_prod_path) rep_annotator.attach(render_prod_path)
# add to registry # add to registry
self._rep_registry[name].append(rep_annotator) self._rep_registry[name].append(rep_annotator)
...@@ -632,17 +640,27 @@ class Camera(SensorBase): ...@@ -632,17 +640,27 @@ class Camera(SensorBase):
if self.cfg.colorize_semantic_segmentation: if self.cfg.colorize_semantic_segmentation:
data = data.view(torch.uint8).reshape(height, width, -1) data = data.view(torch.uint8).reshape(height, width, -1)
else: else:
data = data.view(height, width) data = data.view(height, width, 1)
elif name == "instance_segmentation_fast": elif name == "instance_segmentation_fast":
if self.cfg.colorize_instance_segmentation: if self.cfg.colorize_instance_segmentation:
data = data.view(torch.uint8).reshape(height, width, -1) data = data.view(torch.uint8).reshape(height, width, -1)
else: else:
data = data.view(height, width) data = data.view(height, width, 1)
elif name == "instance_id_segmentation_fast": elif name == "instance_id_segmentation_fast":
if self.cfg.colorize_instance_id_segmentation: if self.cfg.colorize_instance_id_segmentation:
data = data.view(torch.uint8).reshape(height, width, -1) data = data.view(torch.uint8).reshape(height, width, -1)
else: else:
data = data.view(height, width) data = data.view(height, width, 1)
# make sure buffer dimensions are consistent as (H, W, C)
elif name == "distance_to_camera" or name == "distance_to_image_plane" or name == "depth":
data = data.view(height, width, 1)
# we only return the RGB channels from the RGBA output if rgb is required
# normals return (x, y, z) in first 3 channels, 4th channel is unused
elif name == "rgb" or name == "normals":
data = data[..., :3]
# motion vectors return (x, y) in first 2 channels, 3rd and 4th channels are unused
elif name == "motion_vectors":
data = data[..., :2]
# return the data and info # return the data and info
return data, info return data, info
......
...@@ -3,67 +3,18 @@ ...@@ -3,67 +3,18 @@
# #
# SPDX-License-Identifier: BSD-3-Clause # SPDX-License-Identifier: BSD-3-Clause
from dataclasses import MISSING
from typing import Literal
from omni.isaac.lab.sim import FisheyeCameraCfg, PinholeCameraCfg
from omni.isaac.lab.utils import configclass from omni.isaac.lab.utils import configclass
from ..sensor_base_cfg import SensorBaseCfg from .camera_cfg import CameraCfg
from .tiled_camera import TiledCamera from .tiled_camera import TiledCamera
@configclass @configclass
class TiledCameraCfg(SensorBaseCfg): class TiledCameraCfg(CameraCfg):
"""Configuration for a tiled rendering-based camera sensor.""" """Configuration for a tiled rendering-based camera sensor."""
@configclass
class OffsetCfg:
"""The offset pose of the sensor's frame from the sensor's parent frame."""
pos: tuple[float, float, float] = (0.0, 0.0, 0.0)
"""Translation w.r.t. the parent frame. Defaults to (0.0, 0.0, 0.0)."""
rot: tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)
"""Quaternion rotation (w, x, y, z) w.r.t. the parent frame. Defaults to (1.0, 0.0, 0.0, 0.0)."""
convention: Literal["opengl", "ros", "world"] = "ros"
"""The convention in which the frame offset is applied. Defaults to "ros".
- ``"opengl"`` - forward axis: ``-Z`` - up axis: ``+Y`` - Offset is applied in the OpenGL (Usd.Camera) convention.
- ``"ros"`` - forward axis: ``+Z`` - up axis: ``-Y`` - Offset is applied in the ROS convention.
- ``"world"`` - forward axis: ``+X`` - up axis: ``+Z`` - Offset is applied in the World Frame convention.
"""
class_type: type = TiledCamera class_type: type = TiledCamera
offset: OffsetCfg = OffsetCfg()
"""The offset pose of the sensor's frame from the sensor's parent frame. Defaults to identity.
Note:
The parent frame is the frame the sensor attaches to. For example, the parent frame of a
camera at path ``/World/envs/env_0/Robot/Camera`` is ``/World/envs/env_0/Robot``.
"""
spawn: PinholeCameraCfg | FisheyeCameraCfg | None = MISSING
"""Spawn configuration for the asset.
If None, then the prim is not spawned by the asset. Instead, it is assumed that the
asset is already present in the scene.
"""
data_types: list[str] = ["rgb"]
"""List of sensor names/types to enable for the camera. Defaults to ["rgb"].
Please refer to the :class:`TiledCamera` class for a list of available data types.
"""
width: int = MISSING
"""Width of the image in pixels."""
height: int = MISSING
"""Height of the image in pixels."""
return_latest_camera_pose: bool = False return_latest_camera_pose: bool = False
"""Whether to return the latest camera pose when fetching the camera's data. Defaults to False. """Whether to return the latest camera pose when fetching the camera's data. Defaults to False.
......
...@@ -293,10 +293,12 @@ class RayCasterCamera(RayCaster): ...@@ -293,10 +293,12 @@ class RayCasterCamera(RayCaster):
)[:, :, 0] )[:, :, 0]
# apply the maximum distance after the transformation # apply the maximum distance after the transformation
distance_to_image_plane = torch.clip(distance_to_image_plane, max=self.cfg.max_distance) distance_to_image_plane = torch.clip(distance_to_image_plane, max=self.cfg.max_distance)
self._data.output["distance_to_image_plane"][env_ids] = distance_to_image_plane.view(-1, *self.image_shape) self._data.output["distance_to_image_plane"][env_ids] = distance_to_image_plane.view(
-1, *self.image_shape, 1
)
if "distance_to_camera" in self.cfg.data_types: if "distance_to_camera" in self.cfg.data_types:
self._data.output["distance_to_camera"][env_ids] = torch.clip( self._data.output["distance_to_camera"][env_ids] = torch.clip(
ray_depth.view(-1, *self.image_shape), max=self.cfg.max_distance ray_depth.view(-1, *self.image_shape, 1), max=self.cfg.max_distance
) )
if "normals" in self.cfg.data_types: if "normals" in self.cfg.data_types:
self._data.output["normals"][env_ids] = ray_normal.view(-1, *self.image_shape, 3) self._data.output["normals"][env_ids] = ray_normal.view(-1, *self.image_shape, 3)
...@@ -343,7 +345,7 @@ class RayCasterCamera(RayCaster): ...@@ -343,7 +345,7 @@ class RayCasterCamera(RayCaster):
self._data.info = [{name: None for name in self.cfg.data_types}] * self._view.count self._data.info = [{name: None for name in self.cfg.data_types}] * self._view.count
for name in self.cfg.data_types: for name in self.cfg.data_types:
if name in ["distance_to_image_plane", "distance_to_camera"]: if name in ["distance_to_image_plane", "distance_to_camera"]:
shape = (self.cfg.pattern_cfg.height, self.cfg.pattern_cfg.width) shape = (self.cfg.pattern_cfg.height, self.cfg.pattern_cfg.width, 1)
elif name in ["normals"]: elif name in ["normals"]:
shape = (self.cfg.pattern_cfg.height, self.cfg.pattern_cfg.width, 3) shape = (self.cfg.pattern_cfg.height, self.cfg.pattern_cfg.width, 3)
else: else:
......
...@@ -5,6 +5,8 @@ ...@@ -5,6 +5,8 @@
"""Custom kernels for warp.""" """Custom kernels for warp."""
from typing import Any
import warp as wp import warp as wp
...@@ -75,13 +77,12 @@ def raycast_mesh_kernel( ...@@ -75,13 +77,12 @@ def raycast_mesh_kernel(
@wp.kernel @wp.kernel
def reshape_tiled_image( def reshape_tiled_image(
tiled_image_buffer: wp.array(dtype=float), tiled_image_buffer: Any,
batched_image: wp.array(dtype=float, ndim=4), batched_image: Any,
image_height: int, image_height: int,
image_width: int, image_width: int,
num_channels: int, num_channels: int,
num_tiles_x: int, num_tiles_x: int,
offset: int,
): ):
"""Reshapes a tiled image into a batch of images. """Reshapes a tiled image into a batch of images.
...@@ -96,7 +97,6 @@ def reshape_tiled_image( ...@@ -96,7 +97,6 @@ def reshape_tiled_image(
image_height: The height of the image. image_height: The height of the image.
num_channels: The number of channels in the image. num_channels: The number of channels in the image.
num_tiles_x: The number of tiles in x-direction. num_tiles_x: The number of tiles in x-direction.
offset: The offset in the image buffer. This is used when multiple image types are concatenated in the buffer.
""" """
# get the thread id # get the thread id
camera_id, height_id, width_id = wp.tid() camera_id, height_id, width_id = wp.tid()
...@@ -106,12 +106,28 @@ def reshape_tiled_image( ...@@ -106,12 +106,28 @@ def reshape_tiled_image(
tile_y_id = camera_id // num_tiles_x tile_y_id = camera_id // num_tiles_x
# compute the start index of the pixel in the tiled image buffer # compute the start index of the pixel in the tiled image buffer
pixel_start = ( pixel_start = (
offset num_channels * num_tiles_x * image_width * (image_height * tile_y_id + height_id)
+ num_channels * num_tiles_x * image_width * (image_height * tile_y_id + height_id)
+ num_channels * tile_x_id * image_width + num_channels * tile_x_id * image_width
+ num_channels * width_id + num_channels * width_id
) )
# copy the pixel values into the batched image # copy the pixel values into the batched image
for i in range(num_channels): for i in range(num_channels):
batched_image[camera_id, height_id, width_id, i] = tiled_image_buffer[pixel_start + i] batched_image[camera_id, height_id, width_id, i] = batched_image.dtype(tiled_image_buffer[pixel_start + i])
# uint32 -> int32 conversion is required for non-colored segmentation annotators
wp.overload(
reshape_tiled_image,
{"tiled_image_buffer": wp.array(dtype=wp.uint32), "batched_image": wp.array(dtype=wp.uint32, ndim=4)},
)
# uint8 is used for 4 channel annotators
wp.overload(
reshape_tiled_image,
{"tiled_image_buffer": wp.array(dtype=wp.uint8), "batched_image": wp.array(dtype=wp.uint8, ndim=4)},
)
# float32 is used for single channel annotators
wp.overload(
reshape_tiled_image,
{"tiled_image_buffer": wp.array(dtype=wp.float32), "batched_image": wp.array(dtype=wp.float32, ndim=4)},
)
...@@ -122,7 +122,7 @@ class TestCamera(unittest.TestCase): ...@@ -122,7 +122,7 @@ class TestCamera(unittest.TestCase):
camera.update(self.dt) camera.update(self.dt)
# check image data # check image data
for im_data in camera.data.output.to_dict().values(): for im_data in camera.data.output.to_dict().values():
self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width)) self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width, 1))
def test_camera_init_offset(self): def test_camera_init_offset(self):
"""Test camera initialization with offset using different conventions.""" """Test camera initialization with offset using different conventions."""
...@@ -229,7 +229,7 @@ class TestCamera(unittest.TestCase): ...@@ -229,7 +229,7 @@ class TestCamera(unittest.TestCase):
# check image data # check image data
for cam in [cam_1, cam_2]: for cam in [cam_1, cam_2]:
for im_data in cam.data.output.to_dict().values(): for im_data in cam.data.output.to_dict().values():
self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width)) self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width, 1))
def test_multi_camera_with_different_resolution(self): def test_multi_camera_with_different_resolution(self):
"""Test multi-camera initialization with cameras having different image resolutions.""" """Test multi-camera initialization with cameras having different image resolutions."""
...@@ -399,8 +399,12 @@ class TestCamera(unittest.TestCase): ...@@ -399,8 +399,12 @@ class TestCamera(unittest.TestCase):
camera_cfg = copy.deepcopy(self.camera_cfg) camera_cfg = copy.deepcopy(self.camera_cfg)
camera_cfg.data_types = [ camera_cfg.data_types = [
"rgb", "rgb",
"rgba",
"depth",
"distance_to_camera",
"distance_to_image_plane", "distance_to_image_plane",
"normals", "normals",
"motion_vectors",
"semantic_segmentation", "semantic_segmentation",
"instance_segmentation_fast", "instance_segmentation_fast",
"instance_id_segmentation_fast", "instance_id_segmentation_fast",
...@@ -422,22 +426,32 @@ class TestCamera(unittest.TestCase): ...@@ -422,22 +426,32 @@ class TestCamera(unittest.TestCase):
camera.update(self.dt) camera.update(self.dt)
# expected sizes # expected sizes
hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 4) hw_1c_shape = (1, camera_cfg.height, camera_cfg.width, 1)
hw_1c_shape = (1, camera_cfg.height, camera_cfg.width) hw_2c_shape = (1, camera_cfg.height, camera_cfg.width, 2)
hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 3)
hw_4c_shape = (1, camera_cfg.height, camera_cfg.width, 4)
# access image data and compare shapes # access image data and compare shapes
output = camera.data.output output = camera.data.output
self.assertEqual(output["rgb"].shape, hw_3c_shape) self.assertEqual(output["rgb"].shape, hw_3c_shape)
self.assertEqual(output["rgba"].shape, hw_4c_shape)
self.assertEqual(output["depth"].shape, hw_1c_shape)
self.assertEqual(output["distance_to_camera"].shape, hw_1c_shape)
self.assertEqual(output["distance_to_image_plane"].shape, hw_1c_shape) self.assertEqual(output["distance_to_image_plane"].shape, hw_1c_shape)
self.assertEqual(output["normals"].shape, hw_3c_shape) self.assertEqual(output["normals"].shape, hw_3c_shape)
self.assertEqual(output["semantic_segmentation"].shape, hw_3c_shape) self.assertEqual(output["motion_vectors"].shape, hw_2c_shape)
self.assertEqual(output["instance_segmentation_fast"].shape, hw_3c_shape) self.assertEqual(output["semantic_segmentation"].shape, hw_4c_shape)
self.assertEqual(output["instance_id_segmentation_fast"].shape, hw_3c_shape) self.assertEqual(output["instance_segmentation_fast"].shape, hw_4c_shape)
self.assertEqual(output["instance_id_segmentation_fast"].shape, hw_4c_shape)
# access image data and compare dtype # access image data and compare dtype
output = camera.data.output output = camera.data.output
self.assertEqual(output["rgb"].dtype, torch.uint8) self.assertEqual(output["rgb"].dtype, torch.uint8)
self.assertEqual(output["rgba"].dtype, torch.uint8)
self.assertEqual(output["depth"].dtype, torch.float)
self.assertEqual(output["distance_to_camera"].dtype, torch.float)
self.assertEqual(output["distance_to_image_plane"].dtype, torch.float) self.assertEqual(output["distance_to_image_plane"].dtype, torch.float)
self.assertEqual(output["normals"].dtype, torch.float) self.assertEqual(output["normals"].dtype, torch.float)
self.assertEqual(output["motion_vectors"].dtype, torch.float)
self.assertEqual(output["semantic_segmentation"].dtype, torch.uint8) self.assertEqual(output["semantic_segmentation"].dtype, torch.uint8)
self.assertEqual(output["instance_segmentation_fast"].dtype, torch.uint8) self.assertEqual(output["instance_segmentation_fast"].dtype, torch.uint8)
self.assertEqual(output["instance_id_segmentation_fast"].dtype, torch.uint8) self.assertEqual(output["instance_id_segmentation_fast"].dtype, torch.uint8)
...@@ -448,8 +462,12 @@ class TestCamera(unittest.TestCase): ...@@ -448,8 +462,12 @@ class TestCamera(unittest.TestCase):
camera_cfg = copy.deepcopy(self.camera_cfg) camera_cfg = copy.deepcopy(self.camera_cfg)
camera_cfg.data_types = [ camera_cfg.data_types = [
"rgb", "rgb",
"rgba",
"depth",
"distance_to_camera",
"distance_to_image_plane", "distance_to_image_plane",
"normals", "normals",
"motion_vectors",
"semantic_segmentation", "semantic_segmentation",
"instance_segmentation_fast", "instance_segmentation_fast",
"instance_id_segmentation_fast", "instance_id_segmentation_fast",
...@@ -470,13 +488,19 @@ class TestCamera(unittest.TestCase): ...@@ -470,13 +488,19 @@ class TestCamera(unittest.TestCase):
camera.update(self.dt) camera.update(self.dt)
# expected sizes # expected sizes
hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 4) hw_1c_shape = (1, camera_cfg.height, camera_cfg.width, 1)
hw_1c_shape = (1, camera_cfg.height, camera_cfg.width) hw_2c_shape = (1, camera_cfg.height, camera_cfg.width, 2)
hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 3)
hw_4c_shape = (1, camera_cfg.height, camera_cfg.width, 4)
# access image data and compare shapes # access image data and compare shapes
output = camera.data.output output = camera.data.output
self.assertEqual(output["rgb"].shape, hw_3c_shape) self.assertEqual(output["rgb"].shape, hw_3c_shape)
self.assertEqual(output["rgba"].shape, hw_4c_shape)
self.assertEqual(output["depth"].shape, hw_1c_shape)
self.assertEqual(output["distance_to_camera"].shape, hw_1c_shape)
self.assertEqual(output["distance_to_image_plane"].shape, hw_1c_shape) self.assertEqual(output["distance_to_image_plane"].shape, hw_1c_shape)
self.assertEqual(output["normals"].shape, hw_3c_shape) self.assertEqual(output["normals"].shape, hw_3c_shape)
self.assertEqual(output["motion_vectors"].shape, hw_2c_shape)
self.assertEqual(output["semantic_segmentation"].shape, hw_1c_shape) self.assertEqual(output["semantic_segmentation"].shape, hw_1c_shape)
self.assertEqual(output["instance_segmentation_fast"].shape, hw_1c_shape) self.assertEqual(output["instance_segmentation_fast"].shape, hw_1c_shape)
self.assertEqual(output["instance_id_segmentation_fast"].shape, hw_1c_shape) self.assertEqual(output["instance_id_segmentation_fast"].shape, hw_1c_shape)
...@@ -484,12 +508,100 @@ class TestCamera(unittest.TestCase): ...@@ -484,12 +508,100 @@ class TestCamera(unittest.TestCase):
# access image data and compare dtype # access image data and compare dtype
output = camera.data.output output = camera.data.output
self.assertEqual(output["rgb"].dtype, torch.uint8) self.assertEqual(output["rgb"].dtype, torch.uint8)
self.assertEqual(output["rgba"].dtype, torch.uint8)
self.assertEqual(output["depth"].dtype, torch.float)
self.assertEqual(output["distance_to_camera"].dtype, torch.float)
self.assertEqual(output["distance_to_image_plane"].dtype, torch.float) self.assertEqual(output["distance_to_image_plane"].dtype, torch.float)
self.assertEqual(output["normals"].dtype, torch.float) self.assertEqual(output["normals"].dtype, torch.float)
self.assertEqual(output["motion_vectors"].dtype, torch.float)
self.assertEqual(output["semantic_segmentation"].dtype, torch.int32) self.assertEqual(output["semantic_segmentation"].dtype, torch.int32)
self.assertEqual(output["instance_segmentation_fast"].dtype, torch.int32) self.assertEqual(output["instance_segmentation_fast"].dtype, torch.int32)
self.assertEqual(output["instance_id_segmentation_fast"].dtype, torch.int32) self.assertEqual(output["instance_id_segmentation_fast"].dtype, torch.int32)
def test_camera_resolution_rgb_only(self):
"""Test camera resolution is correctly set for RGB only."""
# Add all types
camera_cfg = copy.deepcopy(self.camera_cfg)
camera_cfg.data_types = [
"rgb",
]
# Create camera
camera = Camera(camera_cfg)
# Play sim
self.sim.reset()
# Simulate for a few steps
# note: This is a workaround to ensure that the textures are loaded.
# Check "Known Issues" section in the documentation for more details.
for _ in range(5):
self.sim.step()
camera.update(self.dt)
# expected sizes
hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 3)
# access image data and compare shapes
output = camera.data.output
self.assertEqual(output["rgb"].shape, hw_3c_shape)
# access image data and compare dtype
self.assertEqual(output["rgb"].dtype, torch.uint8)
def test_camera_resolution_rgba_only(self):
"""Test camera resolution is correctly set for RGBA only."""
# Add all types
camera_cfg = copy.deepcopy(self.camera_cfg)
camera_cfg.data_types = [
"rgba",
]
# Create camera
camera = Camera(camera_cfg)
# Play sim
self.sim.reset()
# Simulate for a few steps
# note: This is a workaround to ensure that the textures are loaded.
# Check "Known Issues" section in the documentation for more details.
for _ in range(5):
self.sim.step()
camera.update(self.dt)
# expected sizes
hw_4c_shape = (1, camera_cfg.height, camera_cfg.width, 4)
# access image data and compare shapes
output = camera.data.output
self.assertEqual(output["rgba"].shape, hw_4c_shape)
# access image data and compare dtype
self.assertEqual(output["rgba"].dtype, torch.uint8)
def test_camera_resolution_depth_only(self):
"""Test camera resolution is correctly set for depth only."""
# Add all types
camera_cfg = copy.deepcopy(self.camera_cfg)
camera_cfg.data_types = [
"depth",
]
# Create camera
camera = Camera(camera_cfg)
# Play sim
self.sim.reset()
# Simulate for a few steps
# note: This is a workaround to ensure that the textures are loaded.
# Check "Known Issues" section in the documentation for more details.
for _ in range(5):
self.sim.step()
camera.update(self.dt)
# expected sizes
hw_1c_shape = (1, camera_cfg.height, camera_cfg.width, 1)
# access image data and compare shapes
output = camera.data.output
self.assertEqual(output["depth"].shape, hw_1c_shape)
# access image data and compare dtype
self.assertEqual(output["depth"].dtype, torch.float)
def test_throughput(self): def test_throughput(self):
"""Checks that the single camera gets created properly with a rig.""" """Checks that the single camera gets created properly with a rig."""
# Create directory temp dir to dump the results # Create directory temp dir to dump the results
...@@ -540,7 +652,7 @@ class TestCamera(unittest.TestCase): ...@@ -540,7 +652,7 @@ class TestCamera(unittest.TestCase):
print("----------------------------------------") print("----------------------------------------")
# Check image data # Check image data
for im_data in camera.data.output.values(): for im_data in camera.data.output.values():
self.assertEqual(im_data.shape, (1, camera_cfg.height, camera_cfg.width)) self.assertEqual(im_data.shape, (1, camera_cfg.height, camera_cfg.width, 1))
""" """
Helper functions. Helper functions.
......
...@@ -131,7 +131,7 @@ class TestWarpCamera(unittest.TestCase): ...@@ -131,7 +131,7 @@ class TestWarpCamera(unittest.TestCase):
# check image data # check image data
for im_data in camera.data.output.to_dict().values(): for im_data in camera.data.output.to_dict().values():
self.assertEqual( self.assertEqual(
im_data.shape, (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width) im_data.shape, (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width, 1)
) )
def test_camera_resolution(self): def test_camera_resolution(self):
...@@ -148,7 +148,9 @@ class TestWarpCamera(unittest.TestCase): ...@@ -148,7 +148,9 @@ class TestWarpCamera(unittest.TestCase):
camera.update(self.dt) camera.update(self.dt)
# access image data and compare shapes # access image data and compare shapes
for im_data in camera.data.output.to_dict().values(): for im_data in camera.data.output.to_dict().values():
self.assertTrue(im_data.shape == (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width)) self.assertTrue(
im_data.shape == (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width, 1)
)
def test_camera_init_offset(self): def test_camera_init_offset(self):
"""Test camera initialization with offset using different conventions.""" """Test camera initialization with offset using different conventions."""
...@@ -289,7 +291,7 @@ class TestWarpCamera(unittest.TestCase): ...@@ -289,7 +291,7 @@ class TestWarpCamera(unittest.TestCase):
for cam in [cam_1, cam_2]: for cam in [cam_1, cam_2]:
for im_data in cam.data.output.to_dict().values(): for im_data in cam.data.output.to_dict().values():
self.assertEqual( self.assertEqual(
im_data.shape, (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width) im_data.shape, (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width, 1)
) )
def test_camera_set_world_poses(self): def test_camera_set_world_poses(self):
...@@ -402,7 +404,7 @@ class TestWarpCamera(unittest.TestCase): ...@@ -402,7 +404,7 @@ class TestWarpCamera(unittest.TestCase):
print("----------------------------------------") print("----------------------------------------")
# Check image data # Check image data
for im_data in camera.data.output.values(): for im_data in camera.data.output.values():
self.assertEqual(im_data.shape, (1, camera_cfg.pattern_cfg.height, camera_cfg.pattern_cfg.width)) self.assertEqual(im_data.shape, (1, camera_cfg.pattern_cfg.height, camera_cfg.pattern_cfg.width, 1))
def test_output_equal_to_usdcamera(self): def test_output_equal_to_usdcamera(self):
camera_pattern_cfg = patterns.PinholeCameraPatternCfg( camera_pattern_cfg = patterns.PinholeCameraPatternCfg(
......
...@@ -19,7 +19,6 @@ from omni.isaac.lab.envs import DirectRLEnv, DirectRLEnvCfg, ViewerCfg ...@@ -19,7 +19,6 @@ from omni.isaac.lab.envs import DirectRLEnv, DirectRLEnvCfg, ViewerCfg
from omni.isaac.lab.scene import InteractiveSceneCfg from omni.isaac.lab.scene import InteractiveSceneCfg
from omni.isaac.lab.sensors import TiledCamera, TiledCameraCfg, save_images_to_file from omni.isaac.lab.sensors import TiledCamera, TiledCameraCfg, save_images_to_file
from omni.isaac.lab.sim import SimulationCfg from omni.isaac.lab.sim import SimulationCfg
from omni.isaac.lab.sim.spawners.from_files import GroundPlaneCfg, spawn_ground_plane
from omni.isaac.lab.utils import configclass from omni.isaac.lab.utils import configclass
from omni.isaac.lab.utils.math import sample_uniform from omni.isaac.lab.utils.math import sample_uniform
...@@ -45,7 +44,7 @@ class CartpoleRGBCameraEnvCfg(DirectRLEnvCfg): ...@@ -45,7 +44,7 @@ class CartpoleRGBCameraEnvCfg(DirectRLEnvCfg):
# camera # camera
tiled_camera: TiledCameraCfg = TiledCameraCfg( tiled_camera: TiledCameraCfg = TiledCameraCfg(
prim_path="/World/envs/env_.*/Camera", prim_path="/World/envs/env_.*/Camera",
offset=TiledCameraCfg.OffsetCfg(pos=(-7.0, 0.0, 3.0), rot=(0.9945, 0.0, 0.1045, 0.0), convention="world"), offset=TiledCameraCfg.OffsetCfg(pos=(-5.0, 0.0, 2.0), rot=(1.0, 0.0, 0.0, 0.0), convention="world"),
data_types=["rgb"], data_types=["rgb"],
spawn=sim_utils.PinholeCameraCfg( spawn=sim_utils.PinholeCameraCfg(
focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 20.0) focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 20.0)
...@@ -60,7 +59,7 @@ class CartpoleRGBCameraEnvCfg(DirectRLEnvCfg): ...@@ -60,7 +59,7 @@ class CartpoleRGBCameraEnvCfg(DirectRLEnvCfg):
viewer = ViewerCfg(eye=(20.0, 20.0, 20.0)) viewer = ViewerCfg(eye=(20.0, 20.0, 20.0))
# scene # scene
scene: InteractiveSceneCfg = InteractiveSceneCfg(num_envs=256, env_spacing=20.0, replicate_physics=True) scene: InteractiveSceneCfg = InteractiveSceneCfg(num_envs=1024, env_spacing=20.0, replicate_physics=True)
# reset # reset
max_cart_pos = 3.0 # the cart is reset if it exceeds that position [m] max_cart_pos = 3.0 # the cart is reset if it exceeds that position [m]
...@@ -79,8 +78,8 @@ class CartpoleDepthCameraEnvCfg(CartpoleRGBCameraEnvCfg): ...@@ -79,8 +78,8 @@ class CartpoleDepthCameraEnvCfg(CartpoleRGBCameraEnvCfg):
# camera # camera
tiled_camera: TiledCameraCfg = TiledCameraCfg( tiled_camera: TiledCameraCfg = TiledCameraCfg(
prim_path="/World/envs/env_.*/Camera", prim_path="/World/envs/env_.*/Camera",
offset=TiledCameraCfg.OffsetCfg(pos=(-7.0, 0.0, 3.0), rot=(0.9945, 0.0, 0.1045, 0.0), convention="world"), offset=TiledCameraCfg.OffsetCfg(pos=(-5.0, 0.0, 2.0), rot=(1.0, 0.0, 0.0, 0.0), convention="world"),
data_types=["distance_to_camera"], data_types=["depth"],
spawn=sim_utils.PinholeCameraCfg( spawn=sim_utils.PinholeCameraCfg(
focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 20.0) focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 20.0)
), ),
...@@ -152,8 +151,7 @@ class CartpoleCameraEnv(DirectRLEnv): ...@@ -152,8 +151,7 @@ class CartpoleCameraEnv(DirectRLEnv):
"""Setup the scene with the cartpole and camera.""" """Setup the scene with the cartpole and camera."""
self._cartpole = Articulation(self.cfg.robot_cfg) self._cartpole = Articulation(self.cfg.robot_cfg)
self._tiled_camera = TiledCamera(self.cfg.tiled_camera) self._tiled_camera = TiledCamera(self.cfg.tiled_camera)
# add ground plane
spawn_ground_plane(prim_path="/World/ground", cfg=GroundPlaneCfg(size=(500, 500)))
# clone, filter, and replicate # clone, filter, and replicate
self.scene.clone_environments(copy_from_source=False) self.scene.clone_environments(copy_from_source=False)
self.scene.filter_collisions(global_prim_paths=[]) self.scene.filter_collisions(global_prim_paths=[])
...@@ -172,8 +170,16 @@ class CartpoleCameraEnv(DirectRLEnv): ...@@ -172,8 +170,16 @@ class CartpoleCameraEnv(DirectRLEnv):
self._cartpole.set_joint_effort_target(self.actions, joint_ids=self._cart_dof_idx) self._cartpole.set_joint_effort_target(self.actions, joint_ids=self._cart_dof_idx)
def _get_observations(self) -> dict: def _get_observations(self) -> dict:
data_type = "rgb" if "rgb" in self.cfg.tiled_camera.data_types else "distance_to_camera" data_type = "rgb" if "rgb" in self.cfg.tiled_camera.data_types else "depth"
observations = {"policy": self._tiled_camera.data.output[data_type].clone()} if "rgb" in self.cfg.tiled_camera.data_types:
camera_data = self._tiled_camera.data.output[data_type] / 255.0
# normalize the camera data for better training results
mean_tensor = torch.mean(camera_data, dim=(1, 2), keepdim=True)
camera_data -= mean_tensor
elif "depth" in self.cfg.tiled_camera.data_types:
camera_data = self._tiled_camera.data.output[data_type]
camera_data[camera_data == float("inf")] = 0
observations = {"policy": camera_data.clone()}
if self.cfg.write_image_to_file: if self.cfg.write_image_to_file:
save_images_to_file(observations["policy"], f"cartpole_{data_type}.png") save_images_to_file(observations["policy"], f"cartpole_{data_type}.png")
......
...@@ -96,7 +96,7 @@ class SensorsSceneCfg(InteractiveSceneCfg): ...@@ -96,7 +96,7 @@ class SensorsSceneCfg(InteractiveSceneCfg):
update_period=0.1, update_period=0.1,
height=480, height=480,
width=640, width=640,
data_types=["rgb", "distance_to_camera"], data_types=["rgb", "distance_to_image_plane"],
spawn=None, # the camera is already spawned in the scene spawn=None, # the camera is already spawned in the scene
offset=TiledCameraCfg.OffsetCfg(pos=(0.510, 0.0, 0.015), rot=(0.5, -0.5, 0.5, -0.5), convention="ros"), offset=TiledCameraCfg.OffsetCfg(pos=(0.510, 0.0, 0.015), rot=(0.5, -0.5, 0.5, -0.5), convention="ros"),
) )
...@@ -221,7 +221,7 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene): ...@@ -221,7 +221,7 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):
print("-------------------------------") print("-------------------------------")
print(scene["tiled_camera"]) print(scene["tiled_camera"])
print("Received shape of rgb image: ", scene["tiled_camera"].data.output["rgb"].shape) print("Received shape of rgb image: ", scene["tiled_camera"].data.output["rgb"].shape)
print("Received shape of depth image: ", scene["tiled_camera"].data.output["distance_to_camera"].shape) print("Received shape of depth image: ", scene["tiled_camera"].data.output["distance_to_image_plane"].shape)
print("-------------------------------") print("-------------------------------")
print(scene["raycast_camera"]) print(scene["raycast_camera"])
print("Received shape of depth: ", scene["raycast_camera"].data.output["distance_to_image_plane"].shape) print("Received shape of depth: ", scene["raycast_camera"].data.output["distance_to_image_plane"].shape)
...@@ -242,7 +242,7 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene): ...@@ -242,7 +242,7 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):
# compare generated Depth images across different cameras # compare generated Depth images across different cameras
depth_images = [ depth_images = [
scene["camera"].data.output["distance_to_image_plane"][0], scene["camera"].data.output["distance_to_image_plane"][0],
scene["tiled_camera"].data.output["distance_to_camera"][0, ..., 0], scene["tiled_camera"].data.output["distance_to_image_plane"][0, ..., 0],
scene["raycast_camera"].data.output["distance_to_image_plane"][0], scene["raycast_camera"].data.output["distance_to_image_plane"][0],
] ]
save_images_grid( save_images_grid(
......
...@@ -11,6 +11,8 @@ PER_TEST_TIMEOUTS = { ...@@ -11,6 +11,8 @@ PER_TEST_TIMEOUTS = {
"test_environments.py": 1200, # This test runs through all the environments for 100 steps each "test_environments.py": 1200, # This test runs through all the environments for 100 steps each
"test_environment_determinism.py": 200, # This test runs through many the environments for 100 steps each "test_environment_determinism.py": 200, # This test runs through many the environments for 100 steps each
"test_env_rendering_logic.py": 300, "test_env_rendering_logic.py": 300,
"test_camera.py": 500,
"test_tiled_camera.py": 300,
"test_rsl_rl_wrapper.py": 200, "test_rsl_rl_wrapper.py": 200,
"test_sb3_wrapper.py": 200, "test_sb3_wrapper.py": 200,
"test_skrl_wrapper.py": 200, "test_skrl_wrapper.py": 200,
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment