Updates tiled rendering API with full RTX rendering and additional annotators (#97)

This change updates the current tiled rendering APIs to use the full RTX tiled rendering feature, allowing for higher quality RGB renders and support of additional annotators, including semantic segmentation, instance segmentation, normals, and motion vectors. This change also aligns output dimensions across TiledCamera, Camera, and RayCasterCamera classes. All single-channel outputs will now have dimension (H, W, C). Camera class now outputs RGB data with shape (H, W, 3).  - New feature (non-breaking change which adds functionality) - Breaking change (fix or feature that would cause existing functionality to not work as expected) - This change requires a documentation update Fixes issue https://github.com/isaac-sim/IsaacLab/issues/775 - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there  --------- Co-authored-by: Alexander <143108850+nv-apoddubny@users.noreply.github.com> Co-authored-by: Toni-SM <aserranomuno@nvidia.com>

Updates tiled rendering API with full RTX rendering and additional annotators (#97)
This change updates the current tiled rendering APIs to use the full RTX tiled rendering feature, allowing for higher quality RGB renders and support of additional annotators, including semantic segmentation, instance segmentation, normals, and motion vectors. This change also aligns output dimensions across TiledCamera, Camera, and RayCasterCamera classes. All single-channel outputs will now have dimension (H, W, C). Camera class now outputs RGB data with shape (H, W, 3).  - New feature (non-breaking change which adds functionality) - Breaking change (fix or feature that would cause existing functionality to not work as expected) - This change requires a documentation update Fixes issue https://github.com/isaac-sim/IsaacLab/issues/775 - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format` - [x] I have made corresponding changes to the documentation - [x] My changes generate no new warnings - [x] I have added tests that prove my fix is effective or that my feature works - [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file - [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there  --------- Co-authored-by: Alexander <143108850+nv-apoddubny@users.noreply.github.com> Co-authored-by: Toni-SM <aserranomuno@nvidia.com>
02b0d76c · Kelly Guo · David Hoeller · 52422b6b · 02b0d76c · 02b0d76c
Commit 02b0d76c authored Aug 21, 2024 by Kelly Guo Committed by David Hoeller Sep 20, 2024
18 changed files
--- a/docs/source/features/tiled_rendering.rst
+++ b/docs/source/features/tiled_rendering.rst
@@ -9,9 +9,9 @@ Tiled Rendering

 .. note::

-    This feature is only available from Isaac Sim version 4.0.0 onwards.
+    This feature is only available from Isaac Sim version 4.2.0 onwards.

-    Tiled rendering requires heavy memory resources. We recommend running at most 256 cameras in the scene.
+    Tiled rendering in combination with image processing networks require heavy memory resources, especially at larger resolutions. We recommend running at 256 cameras in the scene on RTX 4090 GPUs or similar.

 Tiled rendering APIs provide a vectorized interface for collecting data from camera sensors.
 This is useful for reinforcement learning environments requiring vision in the loop.
@@ -20,7 +20,7 @@ one single large image instead of multiple smaller images that would have been p
 by each individual camera. This reduces the amount of time required for rendering and
 provides a more efficient API for working with vision data.

-Isaac Lab provides tiled rendering APIs for RGB and depth data through the :class:`~sensors.TiledCamera`
+Isaac Lab provides tiled rendering APIs for RGB, depth, along with other annotators through the :class:`~sensors.TiledCamera`
 class. Configurations for the tiled rendering APIs can be defined through the :class:`~sensors.TiledCameraCfg`
 class, specifying parameters such as the regex expression for all camera paths, the transform
 for the cameras, the desired data type, the type of cameras to add to the scene, and the camera
@@ -59,6 +59,80 @@ environment. For example:
    python source/standalone/workflows/rl_games/train.py --task=Isaac-Cartpole-RGB-Camera-Direct-v0 --headless --enable_cameras


+Annotators and Data Types
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Both :class:`~sensors.TiledCamera` and :class:`~sensors.Camera` classes provide APIs for retrieving various types annotator data from replicator:
+
+* ``"rgb"``: A 3-channel rendered color image.
+* ``"rgba"``: A 4-channel rendered color image with alpha channel.
+* ``"distance_to_camera"``: An image containing the distance to camera optical center.
+* ``"distance_to_image_plane"``: An image containing distances of 3D points from camera plane along camera's z-axis.
+* ``"depth"``: The same as ``"distance_to_image_plane"``.
+* ``"normals"``: An image containing the local surface normal vectors at each pixel.
+* ``"motion_vectors"``: An image containing the motion vector data at each pixel.
+* ``"semantic_segmentation"``: The semantic segmentation data.
+* ``"instance_segmentation_fast"``: The instance segmentation data.
+* ``"instance_id_segmentation_fast"``: The instance id segmentation data.
+
+RGB and RGBA
+""""""""""""
+
+``rgb`` data type returns a 3-channel RGB colored image of type ``torch.uint8``, with dimension (B, H, W, 3).
+
+``rgba`` data type returns a 4-channel RGBA colored image of type ``torch.uint8``, with dimension (B, H, W, 4).
+
+To convert the ``torch.uint8`` data to ``torch.float32``, divide the buffer by 255.0 to obtain a ``torch.float32`` buffer containing data from 0 to 1.
+
+Depth and Distances
+"""""""""""""""""""
+
+``distance_to_camera`` returns a single-channel depth image with distance to the camera optical center. The dimension for this annotator is (B, H, W, 1) and has type ``torch.float32``.
+
+``distance_to_image_plane`` returns a single-channel depth image with distances of 3D points from the camera plane along the camera's Z-axis. The dimension for this annotator is (B, H, W, 1) and has type ``torch.float32``.
+
+``depth`` is provided as an alias for ``distance_to_image_plane`` and will return the same data as the ``distance_to_image_plane`` annotator, with dimension (B, H, W, 1) and type ``torch.float32``.
+
+Normals
+"""""""
+
+``normals`` returns an image containing the local surface normal vectors at each pixel. The buffer has dimension (B, H, W, 3), containing the (x, y, z) information for each vector, and has data type ``torch.float32``.
+
+Motion Vectors
+""""""""""""""
+
+``motion_vectors`` returns the per-pixel motion vectors in image space, with a 2D array of motion vectors representing the relative motion of a pixel in the camera’s viewport between frames. The buffer has dimension (B, H, W, 2), representing x - the motion distance in the horizontal axis (image width) with movement to the left of the image being positive and movement to the right being negative and y - motion distance in the vertical axis (image height) with movement towards the top of the image being positive and movement to the bottom being negative. The data type is ``torch.float32``.
+
+Semantic Segmentation
+"""""""""""""""""""""
+
+``semantic_segmentation`` outputs semantic segmentation of each entity in the camera’s viewport that has semantic labels. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['semantic_segmentation']`` containing ID to labels information.
+
+If ``colorize_semantic_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to semantic labels.
+
+If ``colorize_semantic_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the semantic ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from semantic ID to semantic labels.
+
+Instance ID Segmentation
+""""""""""""""""""""""""
+
+``instance_id_segmentation_fast`` outputs instance ID segmentation of each entity in the camera’s viewport. The instance ID is unique for each prim in the scene with different paths. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_id_segmentation_fast']`` containing ID to labels information.
+
+The main difference between ``instance_id_segmentation_fast`` and ``instance_segmentation_fast`` are that instance segmentation annotator goes down the hierarchy to the lowest level prim which has semantic labels, where instance ID segmentation always goes down to the leaf prim.
+
+If ``colorize_instance_id_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that entity.
+
+If ``colorize_instance_id_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from instance ID to USD prim path of that entity.
+
+Instance Segmentation
+"""""""""""""""""""""
+
+``instance_segmentation_fast`` outputs instance segmentation of each entity in the camera’s viewport. In addition to the image buffer, an ``info`` dictionary can be retrieved with ``tiled_camera.data.info['instance_segmentation_fast']`` containing ID to labels and ID to semantic information.
+
+If ``colorize_instance_segmentation=True`` in the camera config, a 4-channel RGBA image will be returned with dimension (B, H, W, 4) and type ``torch.uint8``. The info ``idToLabels`` dictionary will be the mapping from color to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from color to semantic labels of that semantic entity.
+
+If ``colorize_instance_segmentation=False``, a buffer of dimension (B, H, W, 1) of type ``torch.int32`` will be returned, containing the instance ID of each pixel. The info ``idToLabels`` dictionary will be the mapping from instance ID to USD prim path of that semantic entity. The info ``idToSemantics`` dictionary will be the mapping from instance ID to semantic labels of that semantic entity.
+
+
 Recording during training
 -------------------------


--- a/source/apps/isaaclab.python.headless.kit
+++ b/source/apps/isaaclab.python.headless.kit
@@ -43,6 +43,9 @@ exts."omni.kit.window.viewport".blockingGetViewportDrawable = false
 # Fix PlayButtonGroup error
 exts."omni.kit.widget.toolbar".PlayButton.enabled = false

+# disable replicator orchestrator for better runtime perf
+exts."omni.replicator.core".Orchestrator.enabled = false
+
 [settings.app.settings]
 persistent = true
 dev_build = false

--- a/source/apps/isaaclab.python.headless.rendering.kit
+++ b/source/apps/isaaclab.python.headless.rendering.kit
@@ -37,18 +37,24 @@ app.version = "4.1.0"
 # set the default ros bridge to disable on startup
 isaac.startup.ros_bridge_extension = ""

-# Increase available descriptors to support more simultaneous cameras
-rtx.descriptorSets=30000
+# Flags for better rendering performance
+rtx.translucency.enabled = false
+rtx.reflections.enabled = false
+rtx.indirectDiffuse.enabled = false
+rtx.transient.dlssg.enabled = false
+rtx.directLighting.sampledLighting.enabled = true
+rtx.directLighting.sampledLighting.samplesPerPixel = 1
+rtx.sceneDb.ambientLightIntensity = 1.0
+# rtx.shadows.enabled = false

-# Enable new denoiser to reduce motion blur artifacts
-rtx.newDenoiser.enabled=true
+# Avoids replicator warning
+rtx.pathtracing.maxSamplesPerLaunch = 1000000

 # Disable present thread to improve performance
 exts."omni.renderer.core".present.enabled=false

 # Disabling these settings reduces renderer VRAM usage and improves rendering performance, but at some quality cost
 rtx.raytracing.cached.enabled = false
-rtx.raytracing.lightcache.spatialCache.enabled = false
 rtx.ambientOcclusion.enabled = false
 rtx-transient.dlssg.enabled = false

@@ -61,6 +67,8 @@ renderer.multiGpu.maxGpuCount=1
 # Force synchronous rendering to improve training results
 omni.replicator.asyncRendering = false

+# Avoids frame offset issue
+app.updateOrder.checkForHydraRenderComplete = 1000
 app.renderer.waitIdle=true
 app.hydraEngine.waitIdle=true

@@ -69,6 +77,9 @@ app.audio.enabled = false
 # Enable Vulkan - avoids torch+cu12 error on windows
 app.vulkan = true

+# disable replicator orchestrator for better runtime perf
+exts."omni.replicator.core".Orchestrator.enabled = false
+
 [settings.exts."omni.kit.registry.nucleus"]
 registries = [
    { name = "kit/default", url = "https://ovextensionsprod.blob.core.windows.net/exts/kit/prod/shared" },

--- a/source/apps/isaaclab.python.kit
+++ b/source/apps/isaaclab.python.kit
@@ -216,6 +216,9 @@ app.audio.enabled = false
 # Enable Vulkan - avoids torch+cu12 error on windows
 app.vulkan = true

+# disable replicator orchestrator for better runtime perf
+exts."omni.replicator.core".Orchestrator.enabled = false
+
 # Basic Kit App
 ################################
 app.versionFile = "${exe-path}/VERSION"

--- a/source/apps/isaaclab.python.rendering.kit
+++ b/source/apps/isaaclab.python.rendering.kit
@@ -37,18 +37,24 @@ app.version = "4.1.0"
 # set the default ros bridge to disable on startup
 isaac.startup.ros_bridge_extension = ""

-# Increase available descriptors to support more simultaneous cameras
-rtx.descriptorSets=30000
+# Flags for better rendering performance
+rtx.translucency.enabled = false
+rtx.reflections.enabled = false
+rtx.indirectDiffuse.enabled = false
+rtx.transient.dlssg.enabled = false
+rtx.directLighting.sampledLighting.enabled = true
+rtx.directLighting.sampledLighting.samplesPerPixel = 1
+rtx.sceneDb.ambientLightIntensity = 1.0
+# rtx.shadows.enabled = false

-# Enable new denoiser to reduce motion blur artifacts
-rtx.newDenoiser.enabled=true
+# Avoids replicator warning
+rtx.pathtracing.maxSamplesPerLaunch = 1000000

 # Disable present thread to improve performance
 exts."omni.renderer.core".present.enabled=false

 # Disabling these settings reduces renderer VRAM usage and improves rendering performance, but at some quality cost
 rtx.raytracing.cached.enabled = false
-rtx.raytracing.lightcache.spatialCache.enabled = false
 rtx.ambientOcclusion.enabled = false
 rtx-transient.dlssg.enabled = false

@@ -61,11 +67,16 @@ renderer.multiGpu.maxGpuCount=1
 # Force synchronous rendering to improve training results
 omni.replicator.asyncRendering = false

+# Avoids frame offset issue
+app.updateOrder.checkForHydraRenderComplete = 1000
 app.renderer.waitIdle=true
 app.hydraEngine.waitIdle=true

 app.audio.enabled = false

+# disable replicator orchestrator for better runtime perf
+exts."omni.replicator.core".Orchestrator.enabled = false
+
 [settings.physics]
 updateToUsd = false
 updateParticlesToUsd = false

--- a/source/extensions/omni.isaac.lab/config/extension.toml
+++ b/source/extensions/omni.isaac.lab/config/extension.toml
 [package]

 # Note: Semantic Versioning is used: https://semver.org/
-version = "0.23.10"
+version = "0.24.10"

 # Description
 title = "Isaac Lab framework for Robot Learning"

--- a/source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
+++ b/source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
 Changelog
 ---------

-0.23.10 (2024-09-10)
+0.24.10 (2024-09-10)
 ~~~~~~~~~~~~~~~~~~~~

 Added
@@ -10,7 +10,7 @@ Added
 * Added config class, support, and tests for MJCF conversion via standalone python scripts.


-0.23.9 (2024-09-09)
+0.24.9 (2024-09-09)
 ~~~~~~~~~~~~~~~~~~~~

 Added
@@ -22,7 +22,7 @@ Added
  file or the command line argument. This ensures that the simulation results are reproducible across different runs.


-0.23.8 (2024-09-08)
+0.24.8 (2024-09-08)
 ~~~~~~~~~~~~~~~~~~~

 Changed
@@ -32,7 +32,7 @@ Changed
  for faster processing of high dimensional input tensors.


-0.23.7 (2024-09-06)
+0.24.7 (2024-09-06)
 ~~~~~~~~~~~~~~~~~~~

 Added
@@ -43,7 +43,7 @@ Added
  instance variables instead.


-0.23.6 (2024-09-05)
+0.24.6 (2024-09-05)
 ~~~~~~~~~~~~~~~~~~~

 Fixed
@@ -53,7 +53,7 @@ Fixed
  more-intuitive to control the y-axis motion based on the right-hand rule.


-0.23.5 (2024-08-29)
+0.24.5 (2024-08-29)
 ~~~~~~~~~~~~~~~~~~~

 Added
@@ -63,7 +63,7 @@ Added
  consistent with all other cameras (equal to type "depth").


-0.23.4 (2024-09-02)
+0.24.4 (2024-09-02)
 ~~~~~~~~~~~~~~~~~~~

 Fixed
@@ -74,7 +74,7 @@ Fixed
 * Added test to check :attr:`omni.isaac.lab.sensors.RayCasterCamera.set_intrinsic_matrices`


-0.23.3 (2024-08-29)
+0.24.3 (2024-08-29)
 ~~~~~~~~~~~~~~~~~~~

 Fixed
@@ -85,7 +85,7 @@ Fixed
  which required initialization of the class to call the class-methods.


-0.23.2 (2024-08-28)
+0.24.2 (2024-08-28)
 ~~~~~~~~~~~~~~~~~~~

 Added
@@ -106,7 +106,7 @@ Fixed
  the behavior equal to the USD Camera.


-0.23.1 (2024-08-21)
+0.24.1 (2024-08-21)
 ~~~~~~~~~~~~~~~~~~~

 Changed
@@ -115,6 +115,23 @@ Changed
 * Disabled default viewport in certain headless scenarios for better performance.


+0.24.0 (2024-08-17)
+~~~~~~~~~~~~~~~~~~~
+
+Added
+^^^^^
+
+* Added additional annotators for :class:`omni.isaac.lab.sensors.camera.TiledCamera` class.
+
+Changed
+^^^^^^^
+
+* Updated :class:`omni.isaac.lab.sensors.TiledCamera` to latest RTX tiled rendering API.
+* Single channel outputs for :class:`omni.isaac.lab.sensors.TiledCamera`, :class:`omni.isaac.lab.sensors.Camera` and :class:`omni.isaac.lab.sensors.RayCasterCamera` now has shape (H, W, 1).
+* Data type for RGB output for :class:`omni.isaac.lab.sensors.TiledCamera` changed from ``torch.float`` to ``torch.uint8``.
+* Dimension of RGB output for :class:`omni.isaac.lab.sensors.Camera` changed from (H, W, 4) to (H, W, 3). Use type ``rgba`` to retrieve the previous dimension.
+
+
 0.23.1 (2024-08-17)
 ~~~~~~~~~~~~~~~~~~~


--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/camera/camera.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/camera/camera.py
@@ -39,9 +39,11 @@ class Camera(SensorBase):

    Summarizing from the `replicator extension`_, the following sensor types are supported:

-    - ``"rgb"``: A rendered color image.
+    - ``"rgb"``: A 3-channel rendered color image.
+    - ``"rgba"``: A 4-channel rendered color image with alpha channel.
    - ``"distance_to_camera"``: An image containing the distance to camera optical center.
    - ``"distance_to_image_plane"``: An image containing distances of 3D points from camera plane along camera's z-axis.
+    - ``"depth"``: The same as ``"distance_to_image_plane"``.
    - ``"normals"``: An image containing the local surface normal vectors at each pixel.
    - ``"motion_vectors"``: An image containing the motion vector data at each pixel.
    - ``"semantic_segmentation"``: The semantic segmentation data.
@@ -458,8 +460,14 @@ class Camera(SensorBase):
                else:
                    device_name = "cpu"

-                # create annotator node
-                rep_annotator = rep.AnnotatorRegistry.get_annotator(name, init_params, device=device_name)
+                # Map special cases to their corresponding annotator names
+                special_cases = {"rgba": "rgb", "depth": "distance_to_image_plane"}
+                # Get the annotator name, falling back to the original name if not a special case
+                annotator_name = special_cases.get(name, name)
+                # Create the annotator node
+                rep_annotator = rep.AnnotatorRegistry.get_annotator(annotator_name, init_params, device=device_name)
+
+                # attach annotator to render product
                rep_annotator.attach(render_prod_path)
                # add to registry
                self._rep_registry[name].append(rep_annotator)
@@ -632,17 +640,27 @@ class Camera(SensorBase):
            if self.cfg.colorize_semantic_segmentation:
                data = data.view(torch.uint8).reshape(height, width, -1)
            else:
-                data = data.view(height, width)
+                data = data.view(height, width, 1)
        elif name == "instance_segmentation_fast":
            if self.cfg.colorize_instance_segmentation:
                data = data.view(torch.uint8).reshape(height, width, -1)
            else:
-                data = data.view(height, width)
+                data = data.view(height, width, 1)
        elif name == "instance_id_segmentation_fast":
            if self.cfg.colorize_instance_id_segmentation:
                data = data.view(torch.uint8).reshape(height, width, -1)
            else:
-                data = data.view(height, width)
+                data = data.view(height, width, 1)
+        # make sure buffer dimensions are consistent as (H, W, C)
+        elif name == "distance_to_camera" or name == "distance_to_image_plane" or name == "depth":
+            data = data.view(height, width, 1)
+        # we only return the RGB channels from the RGBA output if rgb is required
+        # normals return (x, y, z) in first 3 channels, 4th channel is unused
+        elif name == "rgb" or name == "normals":
+            data = data[..., :3]
+        # motion vectors return (x, y) in first 2 channels, 3rd and 4th channels are unused
+        elif name == "motion_vectors":
+            data = data[..., :2]

        # return the data and info
        return data, info

--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/camera/tiled_camera.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/camera/tiled_camera.py
@@ -16,7 +16,8 @@ import carb
 import omni.usd
 import warp as wp
 from omni.isaac.core.prims import XFormPrimView
-from pxr import UsdGeom
+from omni.isaac.version import get_version
+from pxr import Usd, UsdGeom

 from omni.isaac.lab.utils.warp.kernels import reshape_tiled_image

@@ -28,27 +29,44 @@ if TYPE_CHECKING:


 class TiledCamera(Camera):
-    r"""The tiled rendering based camera sensor for acquiring RGB and depth data.
+    r"""The tiled rendering based camera sensor for acquiring the same data as the Camera class.

-    This class inherits from the :class:`Camera` class but uses the tiled-rendering API from Replicator to acquire
+    This class inherits from the :class:`Camera` class but uses the tiled-rendering API to acquire
    the visual data. Tiled-rendering concatenates the rendered images from multiple cameras into a single image.
    This allows for rendering multiple cameras in parallel and is useful for rendering large scenes with multiple
    cameras efficiently.

    The following sensor types are supported:

-    - ``"rgb"``: A rendered color image.
+    - ``"rgb"``: A 3-channel rendered color image.
+    - ``"rgba"``: A 4-channel rendered color image with alpha channel.
    - ``"distance_to_camera"``: An image containing the distance to camera optical center.
-    - ``"depth"``: An alias for ``"distance_to_camera"``.
+    - ``"distance_to_image_plane"``: An image containing distances of 3D points from camera plane along camera's z-axis.
+    - ``"depth"``: The same as ``"distance_to_image_plane"``.
+    - ``"normals"``: An image containing the local surface normal vectors at each pixel.
+    - ``"motion_vectors"``: An image containing the motion vector data at each pixel.
+    - ``"semantic_segmentation"``: The semantic segmentation data.
+    - ``"instance_segmentation_fast"``: The instance segmentation data.
+    - ``"instance_id_segmentation_fast"``: The instance id segmentation data.

-    .. attention::
-        Please note that the fidelity of RGB images may be lower than the standard camera sensor due to the
-        tiled rendering process. Various ray tracing effects such as reflections, refractions, and shadows may not be
-        accurately captured in the RGB images. We are currently working on improving the fidelity of the RGB images.
+    .. note::
+        Currently the following sensor types are not supported in a "view" format:
+
+        - ``"instance_segmentation"``: The instance segmentation data. Please use the fast counterparts instead.
+        - ``"instance_id_segmentation"``: The instance id segmentation data. Please use the fast counterparts instead.
+        - ``"bounding_box_2d_tight"``: The tight 2D bounding box data (only contains non-occluded regions).
+        - ``"bounding_box_2d_tight_fast"``: The tight 2D bounding box data (only contains non-occluded regions).
+        - ``"bounding_box_2d_loose"``: The loose 2D bounding box data (contains occluded regions).
+        - ``"bounding_box_2d_loose_fast"``: The loose 2D bounding box data (contains occluded regions).
+        - ``"bounding_box_3d"``: The 3D view space bounding box data.
+        - ``"bounding_box_3d_fast"``: The 3D view space bounding box data.
+
+    .. _replicator extension: https://docs.omniverse.nvidia.com/extensions/latest/ext_replicator/annotators_details.html#annotator-output
+    .. _USDGeom Camera: https://graphics.pixar.com/usd/docs/api/class_usd_geom_camera.html

    .. versionadded:: v1.0.0

-        This feature is available starting from Isaac Sim 4.0. Before this version, the tiled rendering APIs
+        This feature is available starting from Isaac Sim 4.2. Before this version, the tiled rendering APIs
        were not available.

    """
@@ -56,13 +74,6 @@ class TiledCamera(Camera):
    cfg: TiledCameraCfg
    """The configuration parameters."""

-    SUPPORTED_TYPES: set[str] = {"rgb", "distance_to_camera", "depth"}
-    """The set of sensor types that are supported.
-
-    .. note::
-        The ``"depth"`` type is an alias for ``"distance_to_camera"``.
-    """
-
    def __init__(self, cfg: TiledCameraCfg):
        """Initializes the tiled camera sensor.

@@ -71,8 +82,15 @@ class TiledCamera(Camera):

        Raises:
            RuntimeError: If no camera prim is found at the given path.
+            RuntimeError: If Isaac Sim version < 4.2
            ValueError: If the provided data types are not supported by the camera.
        """
+        isaac_sim_version = float(".".join(get_version()[2:4]))
+        if isaac_sim_version < 4.2:
+            raise RuntimeError(
+                f"TiledCamera is only available from Isaac Sim 4.2.0. Current version is {isaac_sim_version}. Please"
+                " update to Isaac Sim 4.2.0"
+            )
        super().__init__(cfg)

    def __del__(self):
@@ -80,7 +98,8 @@ class TiledCamera(Camera):
        # unsubscribe from callbacks
        SensorBase.__del__(self)
        # detach from the replicator registry
-        self._annotator.detach(self.render_product_paths)
+        for annotator in self._annotators.values():
+            annotator.detach(self.render_product_paths)

    def __str__(self) -> str:
        """Returns: A string containing information about the instance."""
@@ -88,6 +107,10 @@ class TiledCamera(Camera):
        return (
            f"Tiled Camera @ '{self.cfg.prim_path}': \n"
            f"\tdata types   : {self.data.output.sorted_keys} \n"
+            f"\tsemantic filter : {self.cfg.semantic_filter}\n"
+            f"\tcolorize semantic segm.   : {self.cfg.colorize_semantic_segmentation}\n"
+            f"\tcolorize instance segm.   : {self.cfg.colorize_instance_segmentation}\n"
+            f"\tcolorize instance id segm.: {self.cfg.colorize_instance_id_segmentation}\n"
            f"\tupdate period (s): {self.cfg.update_period}\n"
            f"\tshape        : {self.image_shape}\n"
            f"\tnumber of sensors : {self._view.count}"
@@ -161,34 +184,56 @@ class TiledCamera(Camera):
            sensor_prim = UsdGeom.Camera(cam_prim)
            self._sensor_prims.append(sensor_prim)

-        # start the orchestrator (if not already started)
-        rep.orchestrator._orchestrator._is_started = True
-        # check the data_types and remove "depth" if "distance_to_camera" is requested too
-        if "depth" in self.cfg.data_types and "distance_to_camera" in self.cfg.data_types:
-            carb.log_warn(
-                "Both 'depth' and 'distance_to_camera' are requested which are the same. 'depth' will be ignored."
-            )
-            self.cfg.data_types.remove("depth")
-        # NOTE: internally, "distance_to_camera" is named "depth", name is adjusted here
-        data_type = self.cfg.data_types.copy()
-        if "distance_to_camera" in data_type:
-            data_type.remove("distance_to_camera")
-            data_type.append("depth")
-        # Create a tiled sensor from the camera prims
-        rep_sensor = rep.create.tiled_sensor(
-            cameras=self._view.prim_paths,
-            camera_resolution=[self.image_shape[1], self.image_shape[0]],
-            tiled_resolution=self._tiled_image_shape(),
-            output_types=data_type,
-        )
-        # Get render product
-        render_prod_path = rep.create.render_product(camera=rep_sensor, resolution=self._tiled_image_shape())
-        if not isinstance(render_prod_path, str):
-            render_prod_path = render_prod_path.path
-        self._render_product_paths = [render_prod_path]
-        # Attach the annotator
-        self._annotator = rep.AnnotatorRegistry.get_annotator("RtxSensorGpu", device=self.device, do_array_copy=False)
-        self._annotator.attach(self._render_product_paths)
+        # get full resolution for all tiles
+        full_resolution = self._tiled_image_shape()
+
+        # Set carb settings for tiled rendering
+        carb_settings = carb.settings.get_settings()
+        carb_settings.set("/rtx/viewTile/height", self.cfg.height)
+        carb_settings.set("/rtx/viewTile/width", self.cfg.width)
+        carb_settings.set("/rtx/viewTile/count", self._view.count)
+
+        # Create render product
+        rp = rep.create.render_product(self._view.prim_paths[0], full_resolution)
+
+        # Attach all cameras to render product
+        rp_prim = stage.GetPrimAtPath(rp.path)
+        with Usd.EditContext(stage, stage.GetSessionLayer()):
+            rp_prim.GetRelationship("camera").SetTargets(self._view.prim_paths)
+        self._render_product_paths = [rp.path]
+
+        # Define the annotators based on requested data types
+        self._annotators = dict()
+        for annotator_type in self.cfg.data_types:
+            if annotator_type == "rgba" or annotator_type == "rgb":
+                annotator = rep.AnnotatorRegistry.get_annotator("rgb", device=self.device, do_array_copy=False)
+                self._annotators["rgba"] = annotator
+            elif annotator_type == "depth" or annotator_type == "distance_to_image_plane":
+                # keep depth for backwards compatibility
+                annotator = rep.AnnotatorRegistry.get_annotator(
+                    "distance_to_image_plane", device=self.device, do_array_copy=False
+                )
+                self._annotators["distance_to_image_plane"] = annotator
+            # note: we are verbose here to make it easier to understand the code.
+            #   if colorize is true, the data is mapped to colors and a uint8 4 channel image is returned.
+            #   if colorize is false, the data is returned as a uint32 image with ids as values.
+            else:
+                init_params = None
+                if annotator_type == "semantic_segmentation":
+                    init_params = {"colorize": self.cfg.colorize_semantic_segmentation}
+                elif annotator_type == "instance_segmentation_fast":
+                    init_params = {"colorize": self.cfg.colorize_instance_segmentation}
+                elif annotator_type == "instance_id_segmentation_fast":
+                    init_params = {"colorize": self.cfg.colorize_instance_id_segmentation}
+
+                annotator = rep.AnnotatorRegistry.get_annotator(
+                    annotator_type, init_params, device=self.device, do_array_copy=False
+                )
+                self._annotators[annotator_type] = annotator
+
+        # Attach the annotator to the render product
+        for annotator in self._annotators.values():
+            annotator.attach(self._render_product_paths)

        # Create internal buffers
        self._create_buffers()
@@ -198,38 +243,74 @@ class TiledCamera(Camera):
        self._frame[env_ids] += 1

        # Extract the flattened image buffer
-        tiled_data_buffer = self._annotator.get_data()
-        if isinstance(tiled_data_buffer, np.ndarray):
-            tiled_data_buffer = wp.array(tiled_data_buffer, device=self.device)
-        else:
-            tiled_data_buffer = tiled_data_buffer.to(device=self.device)
-
-        # The offset is needed when the buffer contains rgb and depth (the buffer has RGB data first and then depth)
-        offset = self._data.output["rgb"].numel() if "rgb" in self.cfg.data_types else 0
-        for data_type in self.cfg.data_types:
+        for data_type, annotator in self._annotators.items():
+            # check whether returned data is a dict (used for segmentation)
+            output = annotator.get_data()
+            if isinstance(output, dict):
+                tiled_data_buffer = output["data"]
+                self._data.info[data_type] = output["info"]
+            else:
+                tiled_data_buffer = output
+
+            # convert data buffer to warp array
+            if isinstance(tiled_data_buffer, np.ndarray):
+                tiled_data_buffer = wp.array(tiled_data_buffer, device=self.device, dtype=wp.uint8)
+            else:
+                tiled_data_buffer = tiled_data_buffer.to(device=self.device)
+
+            # process data for different segmentation types
+            # Note: Replicator returns raw buffers of dtype uint32 for segmentation types
+            #   so we need to convert them to uint8 4 channel images for colorized types
+            if (
+                (data_type == "semantic_segmentation" and self.cfg.colorize_semantic_segmentation)
+                or (data_type == "instance_segmentation_fast" and self.cfg.colorize_instance_segmentation)
+                or (data_type == "instance_id_segmentation_fast" and self.cfg.colorize_instance_id_segmentation)
+            ):
+                tiled_data_buffer = wp.array(
+                    ptr=tiled_data_buffer.ptr, shape=(*tiled_data_buffer.shape, 4), dtype=wp.uint8, device=self.device
+                )
+
            wp.launch(
                kernel=reshape_tiled_image,
                dim=(self._view.count, self.cfg.height, self.cfg.width),
                inputs=[
-                    tiled_data_buffer,
+                    tiled_data_buffer.flatten(),
                    wp.from_torch(self._data.output[data_type]),  # zero-copy alias
                    *list(self._data.output[data_type].shape[1:]),  # height, width, num_channels
                    self._tiling_grid_shape()[0],  # num_tiles_x
-                    offset if data_type == "distance_to_camera" or data_type == "depth" else 0,
                ],
                device=self.device,
            )

+            # alias rgb as first 3 channels of rgba
+            if data_type == "rgba" and "rgb" in self.cfg.data_types:
+                self._data.output["rgb"] = self._data.output["rgba"][..., :3]
+            # alias depth as distance_to_image_plane
+            elif data_type == "distance_to_image_plane" and "depth" in self.cfg.data_types:
+                self._data.output["depth"] = self._data.output["distance_to_image_plane"]
+
    """
    Private Helpers
    """

    def _check_supported_data_types(self, cfg: TiledCameraCfg):
-        """Checks if the data types are supported by the camera."""
-        if not set(cfg.data_types).issubset(TiledCamera.SUPPORTED_TYPES):
+        """Checks if the data types are supported by the ray-caster camera."""
+        # check if there is any intersection in unsupported types
+        # reason: these use np structured data types which we can't yet convert to torch tensor
+        common_elements = set(cfg.data_types) & Camera.UNSUPPORTED_TYPES
+        if common_elements:
+            # provide alternative fast counterparts
+            fast_common_elements = []
+            for item in common_elements:
+                if "instance_segmentation" in item or "instance_id_segmentation" in item:
+                    fast_common_elements.append(item + "_fast")
+            # raise error
            raise ValueError(
-                f"The TiledCamera class only supports the following types {TiledCamera.SUPPORTED_TYPES} but the"
-                f" following where provided: {cfg.data_types}"
+                f"TiledCamera class does not support the following sensor types: {common_elements}."
+                "\n\tThis is because these sensor types output numpy structured data types which"
+                "can't be converted to torch tensors easily."
+                "\n\tHint: If you need to work with these sensor types, we recommend using their fast counterparts."
+                f"\n\t\tFast counterparts: {fast_common_elements}"
            )

    def _create_buffers(self):
@@ -245,16 +326,62 @@ class TiledCamera(Camera):
        self._data.image_shape = self.image_shape
        # -- output data
        data_dict = dict()
+        if "rgba" in self.cfg.data_types or "rgb" in self.cfg.data_types:
+            data_dict["rgba"] = torch.zeros(
+                (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8
+            ).contiguous()
        if "rgb" in self.cfg.data_types:
-            data_dict["rgb"] = torch.zeros(
-                (self._view.count, self.cfg.height, self.cfg.width, 3), device=self.device
+            # RGB is the first 3 channels of RGBA
+            data_dict["rgb"] = data_dict["rgba"][..., :3]
+        if "depth" in self.cfg.data_types or "distance_to_image_plane" in self.cfg.data_types:
+            data_dict["distance_to_image_plane"] = torch.zeros(
+                (self._view.count, self.cfg.height, self.cfg.width, 1), device=self.device, dtype=torch.float32
+            ).contiguous()
+        if "depth" in self.cfg.data_types:
+            # assume distance_to_image_plane if depth is included in data types
+            data_dict["depth"] = data_dict["distance_to_image_plane"]
+        if "distance_to_camera" in self.cfg.data_types:
+            data_dict["distance_to_camera"] = torch.zeros(
+                (self._view.count, self.cfg.height, self.cfg.width, 1), device=self.device, dtype=torch.float32
            ).contiguous()
-        for data_type in ["distance_to_camera", "depth"]:
-            if data_type in self.cfg.data_types:
-                data_dict[data_type] = torch.zeros(
-                    (self._view.count, self.cfg.height, self.cfg.width, 1), device=self.device
+        if "normals" in self.cfg.data_types:
+            data_dict["normals"] = torch.zeros(
+                (self._view.count, self.cfg.height, self.cfg.width, 3), device=self.device, dtype=torch.float32
+            ).contiguous()
+        if "motion_vectors" in self.cfg.data_types:
+            data_dict["motion_vectors"] = torch.zeros(
+                (self._view.count, self.cfg.height, self.cfg.width, 2), device=self.device, dtype=torch.float32
+            ).contiguous()
+        if "semantic_segmentation" in self.cfg.data_types:
+            if self.cfg.colorize_semantic_segmentation:
+                data_dict["semantic_segmentation"] = torch.zeros(
+                    (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8
+                ).contiguous()
+            else:
+                data_dict["semantic_segmentation"] = torch.zeros(
+                    (self._view.count, self.cfg.height, self.cfg.width, 1), device=self.device, dtype=torch.int32
                ).contiguous()
+        if "instance_segmentation_fast" in self.cfg.data_types:
+            if self.cfg.colorize_instance_segmentation:
+                data_dict["instance_segmentation_fast"] = torch.zeros(
+                    (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8
+                ).contiguous()
+            else:
+                data_dict["instance_segmentation_fast"] = torch.zeros(
+                    (self._view.count, self.cfg.height, self.cfg.width, 1), device=self.device, dtype=torch.int32
+                ).contiguous()
+        if "instance_id_segmentation_fast" in self.cfg.data_types:
+            if self.cfg.colorize_instance_id_segmentation:
+                data_dict["instance_id_segmentation_fast"] = torch.zeros(
+                    (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8
+                ).contiguous()
+            else:
+                data_dict["instance_id_segmentation_fast"] = torch.zeros(
+                    (self._view.count, self.cfg.height, self.cfg.width, 1), device=self.device, dtype=torch.int32
+                ).contiguous()
+
        self._data.output = TensorDict(data_dict, batch_size=self._view.count, device=self.device)
+        self._data.info = dict()

    def _tiled_image_shape(self) -> tuple[int, int]:
        """Returns a tuple containing the dimension of the tiled image."""

--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/camera/tiled_camera_cfg.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/camera/tiled_camera_cfg.py
@@ -3,67 +3,18 @@
 #
 # SPDX-License-Identifier: BSD-3-Clause

-from dataclasses import MISSING
-from typing import Literal
-
-from omni.isaac.lab.sim import FisheyeCameraCfg, PinholeCameraCfg
 from omni.isaac.lab.utils import configclass

-from ..sensor_base_cfg import SensorBaseCfg
+from .camera_cfg import CameraCfg
 from .tiled_camera import TiledCamera


 @configclass
-class TiledCameraCfg(SensorBaseCfg):
+class TiledCameraCfg(CameraCfg):
    """Configuration for a tiled rendering-based camera sensor."""

-    @configclass
-    class OffsetCfg:
-        """The offset pose of the sensor's frame from the sensor's parent frame."""
-
-        pos: tuple[float, float, float] = (0.0, 0.0, 0.0)
-        """Translation w.r.t. the parent frame. Defaults to (0.0, 0.0, 0.0)."""
-
-        rot: tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)
-        """Quaternion rotation (w, x, y, z) w.r.t. the parent frame. Defaults to (1.0, 0.0, 0.0, 0.0)."""
-
-        convention: Literal["opengl", "ros", "world"] = "ros"
-        """The convention in which the frame offset is applied. Defaults to "ros".
-
-        - ``"opengl"`` - forward axis: ``-Z`` - up axis: ``+Y`` - Offset is applied in the OpenGL (Usd.Camera) convention.
-        - ``"ros"``    - forward axis: ``+Z`` - up axis: ``-Y`` - Offset is applied in the ROS convention.
-        - ``"world"``  - forward axis: ``+X`` - up axis: ``+Z`` - Offset is applied in the World Frame convention.
-        """
-
    class_type: type = TiledCamera

-    offset: OffsetCfg = OffsetCfg()
-    """The offset pose of the sensor's frame from the sensor's parent frame. Defaults to identity.
-
-    Note:
-        The parent frame is the frame the sensor attaches to. For example, the parent frame of a
-        camera at path ``/World/envs/env_0/Robot/Camera`` is ``/World/envs/env_0/Robot``.
-    """
-
-    spawn: PinholeCameraCfg | FisheyeCameraCfg | None = MISSING
-    """Spawn configuration for the asset.
-
-    If None, then the prim is not spawned by the asset. Instead, it is assumed that the
-    asset is already present in the scene.
-    """
-
-    data_types: list[str] = ["rgb"]
-    """List of sensor names/types to enable for the camera. Defaults to ["rgb"].
-
-    Please refer to the :class:`TiledCamera` class for a list of available data types.
-    """
-
-    width: int = MISSING
-    """Width of the image in pixels."""
-
-    height: int = MISSING
-    """Height of the image in pixels."""
-
    return_latest_camera_pose: bool = False
    """Whether to return the latest camera pose when fetching the camera's data. Defaults to False.


--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/ray_caster/ray_caster_camera.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/sensors/ray_caster/ray_caster_camera.py
@@ -293,10 +293,12 @@ class RayCasterCamera(RayCaster):
            )[:, :, 0]
            # apply the maximum distance after the transformation
            distance_to_image_plane = torch.clip(distance_to_image_plane, max=self.cfg.max_distance)
-            self._data.output["distance_to_image_plane"][env_ids] = distance_to_image_plane.view(-1, *self.image_shape)
+            self._data.output["distance_to_image_plane"][env_ids] = distance_to_image_plane.view(
+                -1, *self.image_shape, 1
+            )
        if "distance_to_camera" in self.cfg.data_types:
            self._data.output["distance_to_camera"][env_ids] = torch.clip(
-                ray_depth.view(-1, *self.image_shape), max=self.cfg.max_distance
+                ray_depth.view(-1, *self.image_shape, 1), max=self.cfg.max_distance
            )
        if "normals" in self.cfg.data_types:
            self._data.output["normals"][env_ids] = ray_normal.view(-1, *self.image_shape, 3)
@@ -343,7 +345,7 @@ class RayCasterCamera(RayCaster):
        self._data.info = [{name: None for name in self.cfg.data_types}] * self._view.count
        for name in self.cfg.data_types:
            if name in ["distance_to_image_plane", "distance_to_camera"]:
-                shape = (self.cfg.pattern_cfg.height, self.cfg.pattern_cfg.width)
+                shape = (self.cfg.pattern_cfg.height, self.cfg.pattern_cfg.width, 1)
            elif name in ["normals"]:
                shape = (self.cfg.pattern_cfg.height, self.cfg.pattern_cfg.width, 3)
            else:

--- a/source/extensions/omni.isaac.lab/omni/isaac/lab/utils/warp/kernels.py
+++ b/source/extensions/omni.isaac.lab/omni/isaac/lab/utils/warp/kernels.py
@@ -5,6 +5,8 @@

 """Custom kernels for warp."""

+from typing import Any
+
 import warp as wp


@@ -75,13 +77,12 @@ def raycast_mesh_kernel(

 @wp.kernel
 def reshape_tiled_image(
-    tiled_image_buffer: wp.array(dtype=float),
-    batched_image: wp.array(dtype=float, ndim=4),
+    tiled_image_buffer: Any,
+    batched_image: Any,
    image_height: int,
    image_width: int,
    num_channels: int,
    num_tiles_x: int,
-    offset: int,
 ):
    """Reshapes a tiled image into a batch of images.

@@ -96,7 +97,6 @@ def reshape_tiled_image(
        image_height: The height of the image.
        num_channels: The number of channels in the image.
        num_tiles_x: The number of tiles in x-direction.
-        offset: The offset in the image buffer. This is used when multiple image types are concatenated in the buffer.
    """
    # get the thread id
    camera_id, height_id, width_id = wp.tid()
@@ -106,12 +106,28 @@ def reshape_tiled_image(
    tile_y_id = camera_id // num_tiles_x
    # compute the start index of the pixel in the tiled image buffer
    pixel_start = (
-        offset
-        + num_channels * num_tiles_x * image_width * (image_height * tile_y_id + height_id)
+        num_channels * num_tiles_x * image_width * (image_height * tile_y_id + height_id)
        + num_channels * tile_x_id * image_width
        + num_channels * width_id
    )

    # copy the pixel values into the batched image
    for i in range(num_channels):
-        batched_image[camera_id, height_id, width_id, i] = tiled_image_buffer[pixel_start + i]
+        batched_image[camera_id, height_id, width_id, i] = batched_image.dtype(tiled_image_buffer[pixel_start + i])
+
+
+# uint32 -> int32 conversion is required for non-colored segmentation annotators
+wp.overload(
+    reshape_tiled_image,
+    {"tiled_image_buffer": wp.array(dtype=wp.uint32), "batched_image": wp.array(dtype=wp.uint32, ndim=4)},
+)
+# uint8 is used for 4 channel annotators
+wp.overload(
+    reshape_tiled_image,
+    {"tiled_image_buffer": wp.array(dtype=wp.uint8), "batched_image": wp.array(dtype=wp.uint8, ndim=4)},
+)
+# float32 is used for single channel annotators
+wp.overload(
+    reshape_tiled_image,
+    {"tiled_image_buffer": wp.array(dtype=wp.float32), "batched_image": wp.array(dtype=wp.float32, ndim=4)},
+)
--- a/source/extensions/omni.isaac.lab/test/sensors/test_camera.py
+++ b/source/extensions/omni.isaac.lab/test/sensors/test_camera.py
@@ -122,7 +122,7 @@ class TestCamera(unittest.TestCase):
            camera.update(self.dt)
            # check image data
            for im_data in camera.data.output.to_dict().values():
-                self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width))
+                self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width, 1))

    def test_camera_init_offset(self):
        """Test camera initialization with offset using different conventions."""
@@ -229,7 +229,7 @@ class TestCamera(unittest.TestCase):
            # check image data
            for cam in [cam_1, cam_2]:
                for im_data in cam.data.output.to_dict().values():
-                    self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width))
+                    self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width, 1))

    def test_multi_camera_with_different_resolution(self):
        """Test multi-camera initialization with cameras having different image resolutions."""
@@ -399,8 +399,12 @@ class TestCamera(unittest.TestCase):
        camera_cfg = copy.deepcopy(self.camera_cfg)
        camera_cfg.data_types = [
            "rgb",
+            "rgba",
+            "depth",
+            "distance_to_camera",
            "distance_to_image_plane",
            "normals",
+            "motion_vectors",
            "semantic_segmentation",
            "instance_segmentation_fast",
            "instance_id_segmentation_fast",
@@ -422,22 +426,32 @@ class TestCamera(unittest.TestCase):
        camera.update(self.dt)

        # expected sizes
-        hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 4)
-        hw_1c_shape = (1, camera_cfg.height, camera_cfg.width)
+        hw_1c_shape = (1, camera_cfg.height, camera_cfg.width, 1)
+        hw_2c_shape = (1, camera_cfg.height, camera_cfg.width, 2)
+        hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 3)
+        hw_4c_shape = (1, camera_cfg.height, camera_cfg.width, 4)
        # access image data and compare shapes
        output = camera.data.output
        self.assertEqual(output["rgb"].shape, hw_3c_shape)
+        self.assertEqual(output["rgba"].shape, hw_4c_shape)
+        self.assertEqual(output["depth"].shape, hw_1c_shape)
+        self.assertEqual(output["distance_to_camera"].shape, hw_1c_shape)
        self.assertEqual(output["distance_to_image_plane"].shape, hw_1c_shape)
        self.assertEqual(output["normals"].shape, hw_3c_shape)
-        self.assertEqual(output["semantic_segmentation"].shape, hw_3c_shape)
-        self.assertEqual(output["instance_segmentation_fast"].shape, hw_3c_shape)
-        self.assertEqual(output["instance_id_segmentation_fast"].shape, hw_3c_shape)
+        self.assertEqual(output["motion_vectors"].shape, hw_2c_shape)
+        self.assertEqual(output["semantic_segmentation"].shape, hw_4c_shape)
+        self.assertEqual(output["instance_segmentation_fast"].shape, hw_4c_shape)
+        self.assertEqual(output["instance_id_segmentation_fast"].shape, hw_4c_shape)

        # access image data and compare dtype
        output = camera.data.output
        self.assertEqual(output["rgb"].dtype, torch.uint8)
+        self.assertEqual(output["rgba"].dtype, torch.uint8)
+        self.assertEqual(output["depth"].dtype, torch.float)
+        self.assertEqual(output["distance_to_camera"].dtype, torch.float)
        self.assertEqual(output["distance_to_image_plane"].dtype, torch.float)
        self.assertEqual(output["normals"].dtype, torch.float)
+        self.assertEqual(output["motion_vectors"].dtype, torch.float)
        self.assertEqual(output["semantic_segmentation"].dtype, torch.uint8)
        self.assertEqual(output["instance_segmentation_fast"].dtype, torch.uint8)
        self.assertEqual(output["instance_id_segmentation_fast"].dtype, torch.uint8)
@@ -448,8 +462,12 @@ class TestCamera(unittest.TestCase):
        camera_cfg = copy.deepcopy(self.camera_cfg)
        camera_cfg.data_types = [
            "rgb",
+            "rgba",
+            "depth",
+            "distance_to_camera",
            "distance_to_image_plane",
            "normals",
+            "motion_vectors",
            "semantic_segmentation",
            "instance_segmentation_fast",
            "instance_id_segmentation_fast",
@@ -470,13 +488,19 @@ class TestCamera(unittest.TestCase):
        camera.update(self.dt)

        # expected sizes
-        hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 4)
-        hw_1c_shape = (1, camera_cfg.height, camera_cfg.width)
+        hw_1c_shape = (1, camera_cfg.height, camera_cfg.width, 1)
+        hw_2c_shape = (1, camera_cfg.height, camera_cfg.width, 2)
+        hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 3)
+        hw_4c_shape = (1, camera_cfg.height, camera_cfg.width, 4)
        # access image data and compare shapes
        output = camera.data.output
        self.assertEqual(output["rgb"].shape, hw_3c_shape)
+        self.assertEqual(output["rgba"].shape, hw_4c_shape)
+        self.assertEqual(output["depth"].shape, hw_1c_shape)
+        self.assertEqual(output["distance_to_camera"].shape, hw_1c_shape)
        self.assertEqual(output["distance_to_image_plane"].shape, hw_1c_shape)
        self.assertEqual(output["normals"].shape, hw_3c_shape)
+        self.assertEqual(output["motion_vectors"].shape, hw_2c_shape)
        self.assertEqual(output["semantic_segmentation"].shape, hw_1c_shape)
        self.assertEqual(output["instance_segmentation_fast"].shape, hw_1c_shape)
        self.assertEqual(output["instance_id_segmentation_fast"].shape, hw_1c_shape)
@@ -484,12 +508,100 @@ class TestCamera(unittest.TestCase):
        # access image data and compare dtype
        output = camera.data.output
        self.assertEqual(output["rgb"].dtype, torch.uint8)
+        self.assertEqual(output["rgba"].dtype, torch.uint8)
+        self.assertEqual(output["depth"].dtype, torch.float)
+        self.assertEqual(output["distance_to_camera"].dtype, torch.float)
        self.assertEqual(output["distance_to_image_plane"].dtype, torch.float)
        self.assertEqual(output["normals"].dtype, torch.float)
+        self.assertEqual(output["motion_vectors"].dtype, torch.float)
        self.assertEqual(output["semantic_segmentation"].dtype, torch.int32)
        self.assertEqual(output["instance_segmentation_fast"].dtype, torch.int32)
        self.assertEqual(output["instance_id_segmentation_fast"].dtype, torch.int32)

+    def test_camera_resolution_rgb_only(self):
+        """Test camera resolution is correctly set for RGB only."""
+        # Add all types
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = [
+            "rgb",
+        ]
+        # Create camera
+        camera = Camera(camera_cfg)
+
+        # Play sim
+        self.sim.reset()
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+        camera.update(self.dt)
+
+        # expected sizes
+        hw_3c_shape = (1, camera_cfg.height, camera_cfg.width, 3)
+        # access image data and compare shapes
+        output = camera.data.output
+        self.assertEqual(output["rgb"].shape, hw_3c_shape)
+        # access image data and compare dtype
+        self.assertEqual(output["rgb"].dtype, torch.uint8)
+
+    def test_camera_resolution_rgba_only(self):
+        """Test camera resolution is correctly set for RGBA only."""
+        # Add all types
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = [
+            "rgba",
+        ]
+        # Create camera
+        camera = Camera(camera_cfg)
+
+        # Play sim
+        self.sim.reset()
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+        camera.update(self.dt)
+
+        # expected sizes
+        hw_4c_shape = (1, camera_cfg.height, camera_cfg.width, 4)
+        # access image data and compare shapes
+        output = camera.data.output
+        self.assertEqual(output["rgba"].shape, hw_4c_shape)
+        # access image data and compare dtype
+        self.assertEqual(output["rgba"].dtype, torch.uint8)
+
+    def test_camera_resolution_depth_only(self):
+        """Test camera resolution is correctly set for depth only."""
+        # Add all types
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = [
+            "depth",
+        ]
+        # Create camera
+        camera = Camera(camera_cfg)
+
+        # Play sim
+        self.sim.reset()
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+        camera.update(self.dt)
+
+        # expected sizes
+        hw_1c_shape = (1, camera_cfg.height, camera_cfg.width, 1)
+        # access image data and compare shapes
+        output = camera.data.output
+        self.assertEqual(output["depth"].shape, hw_1c_shape)
+        # access image data and compare dtype
+        self.assertEqual(output["depth"].dtype, torch.float)
+
    def test_throughput(self):
        """Checks that the single camera gets created properly with a rig."""
        # Create directory temp dir to dump the results
@@ -540,7 +652,7 @@ class TestCamera(unittest.TestCase):
            print("----------------------------------------")
            # Check image data
            for im_data in camera.data.output.values():
-                self.assertEqual(im_data.shape, (1, camera_cfg.height, camera_cfg.width))
+                self.assertEqual(im_data.shape, (1, camera_cfg.height, camera_cfg.width, 1))

    """
    Helper functions.

--- a/source/extensions/omni.isaac.lab/test/sensors/test_ray_caster_camera.py
+++ b/source/extensions/omni.isaac.lab/test/sensors/test_ray_caster_camera.py
@@ -131,7 +131,7 @@ class TestWarpCamera(unittest.TestCase):
            # check image data
            for im_data in camera.data.output.to_dict().values():
                self.assertEqual(
-                    im_data.shape, (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width)
+                    im_data.shape, (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width, 1)
                )

    def test_camera_resolution(self):
@@ -148,7 +148,9 @@ class TestWarpCamera(unittest.TestCase):
        camera.update(self.dt)
        # access image data and compare shapes
        for im_data in camera.data.output.to_dict().values():
-            self.assertTrue(im_data.shape == (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width))
+            self.assertTrue(
+                im_data.shape == (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width, 1)
+            )

    def test_camera_init_offset(self):
        """Test camera initialization with offset using different conventions."""
@@ -289,7 +291,7 @@ class TestWarpCamera(unittest.TestCase):
            for cam in [cam_1, cam_2]:
                for im_data in cam.data.output.to_dict().values():
                    self.assertEqual(
-                        im_data.shape, (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width)
+                        im_data.shape, (1, self.camera_cfg.pattern_cfg.height, self.camera_cfg.pattern_cfg.width, 1)
                    )

    def test_camera_set_world_poses(self):
@@ -402,7 +404,7 @@ class TestWarpCamera(unittest.TestCase):
            print("----------------------------------------")
            # Check image data
            for im_data in camera.data.output.values():
-                self.assertEqual(im_data.shape, (1, camera_cfg.pattern_cfg.height, camera_cfg.pattern_cfg.width))
+                self.assertEqual(im_data.shape, (1, camera_cfg.pattern_cfg.height, camera_cfg.pattern_cfg.width, 1))

    def test_output_equal_to_usdcamera(self):
        camera_pattern_cfg = patterns.PinholeCameraPatternCfg(

--- a/source/extensions/omni.isaac.lab/test/sensors/test_tiled_camera.py
+++ b/source/extensions/omni.isaac.lab/test/sensors/test_tiled_camera.py
@@ -114,9 +114,10 @@ class TestTiledCamera(unittest.TestCase):
            for im_type, im_data in camera.data.output.to_dict().items():
                if im_type == "rgb":
                    self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width, 3))
-                else:
+                    self.assertGreater((im_data / 255.0).mean().item(), 0.0)
+                elif im_type == "depth":
                    self.assertEqual(im_data.shape, (1, self.camera_cfg.height, self.camera_cfg.width, 1))
-                self.assertGreater(im_data.mean().item(), 0.0)
+                    self.assertGreater(im_data.mean().item(), 0.0)
        del camera

    def test_multi_camera_init(self):
@@ -163,10 +164,12 @@ class TestTiledCamera(unittest.TestCase):
            for im_type, im_data in camera.data.output.to_dict().items():
                if im_type == "rgb":
                    self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 3))
-                else:
+                    self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                    self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+                elif im_type == "depth":
                    self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 1))
-                self.assertGreater(im_data[0].mean().item(), 0.0)
-                self.assertGreater(im_data[1].mean().item(), 0.0)
+                    self.assertGreater(im_data[0].mean().item(), 0.0)
+                    self.assertGreater(im_data[1].mean().item(), 0.0)
        del camera

    def test_rgb_only_camera(self):
@@ -189,7 +192,7 @@ class TestTiledCamera(unittest.TestCase):
        # Check if camera prim is set correctly and that it is a camera prim
        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
-        self.assertListEqual(list(camera.data.output.keys()), ["rgb"])
+        self.assertListEqual(sorted(camera.data.output.keys()), sorted(["rgb", "rgba"]))
        # Simulate for a few steps
        # note: This is a workaround to ensure that the textures are loaded.
        #   Check "Known Issues" section in the documentation for more details.
@@ -211,10 +214,12 @@ class TestTiledCamera(unittest.TestCase):
            # update camera
            camera.update(self.dt)
            # check image data
-            for _, im_data in camera.data.output.to_dict().items():
-                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 3))
-                self.assertGreater(im_data[0].mean().item(), 0.0)
-                self.assertGreater(im_data[1].mean().item(), 0.0)
+            im_data = camera.data.output["rgb"]
+            self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 3))
+            self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+            self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+        # Check data type of image
+        self.assertEqual(camera.data.output["rgb"].dtype, torch.uint8)
        del camera

    def test_data_types(self):
@@ -250,6 +255,108 @@ class TestTiledCamera(unittest.TestCase):
        prim_utils.create_prim("/World/Origin_00", "Xform")
        prim_utils.create_prim("/World/Origin_01", "Xform")

+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["distance_to_camera"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(sorted(camera.data.output.keys()), sorted(["depth", "distance_to_image_plane"]))
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            im_data = camera.data.output["depth"]
+            self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 1))
+            self.assertGreater(im_data[0].mean().item(), 0.0)
+            self.assertGreater(im_data[1].mean().item(), 0.0)
+        # Check data type of image
+        self.assertEqual(camera.data.output["depth"].dtype, torch.float)
+        del camera
+
+    def test_rgba_only_camera(self):
+        """Test initialization with only RGBA."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["rgba"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["rgba"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 4))
+                self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+        # Check data type of image
+        self.assertEqual(camera.data.output["rgba"].dtype, torch.uint8)
+        del camera
+
+    def test_distance_to_camera_only_camera(self):
+        """Test initialization with only distance_to_camera."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
        # Create camera
        camera_cfg = copy.deepcopy(self.camera_cfg)
        camera_cfg.data_types = ["distance_to_camera"]
@@ -291,6 +398,572 @@ class TestTiledCamera(unittest.TestCase):
                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 1))
                self.assertGreater(im_data[0].mean().item(), 0.0)
                self.assertGreater(im_data[1].mean().item(), 0.0)
+        # Check data type of image
+        self.assertEqual(camera.data.output["distance_to_camera"].dtype, torch.float)
+        del camera
+
+    def test_distance_to_image_plane_only_camera(self):
+        """Test initialization with only distance_to_image_plane."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["distance_to_image_plane"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["distance_to_image_plane"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 1))
+                self.assertGreater(im_data[0].mean().item(), 0.0)
+                self.assertGreater(im_data[1].mean().item(), 0.0)
+        # Check data type of image
+        self.assertEqual(camera.data.output["distance_to_image_plane"].dtype, torch.float)
+        del camera
+
+    def test_normals_only_camera(self):
+        """Test initialization with only normals."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["normals"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["normals"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 3))
+                self.assertGreater(im_data[0].mean().item(), 0.0)
+                self.assertGreater(im_data[1].mean().item(), 0.0)
+        # Check data type of image
+        self.assertEqual(camera.data.output["normals"].dtype, torch.float)
+        del camera
+
+    def test_motion_vectors_only_camera(self):
+        """Test initialization with only motion_vectors."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["motion_vectors"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["motion_vectors"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 2))
+                self.assertGreater(im_data[0].mean().item(), 0.0)
+                self.assertGreater(im_data[1].mean().item(), 0.0)
+        # Check data type of image
+        self.assertEqual(camera.data.output["motion_vectors"].dtype, torch.float)
+        del camera
+
+    def test_semantic_segmentation_colorize_only_camera(self):
+        """Test initialization with only semantic_segmentation."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["semantic_segmentation"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["semantic_segmentation"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 4))
+                self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+        # Check data type of image and info
+        self.assertEqual(camera.data.output["semantic_segmentation"].dtype, torch.uint8)
+        self.assertEqual(type(camera.data.info["semantic_segmentation"]), dict)
+        del camera
+
+    def test_instance_segmentation_fast_colorize_only_camera(self):
+        """Test initialization with only instance_segmentation_fast."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["instance_segmentation_fast"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["instance_segmentation_fast"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 4))
+                self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+        # Check data type of image and info
+        self.assertEqual(camera.data.output["instance_segmentation_fast"].dtype, torch.uint8)
+        self.assertEqual(type(camera.data.info["instance_segmentation_fast"]), dict)
+        del camera
+
+    def test_instance_id_segmentation_fast_colorize_only_camera(self):
+        """Test initialization with only instance_id_segmentation_fast."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["instance_id_segmentation_fast"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["instance_id_segmentation_fast"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 4))
+                self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+        # Check data type of image and info
+        self.assertEqual(camera.data.output["instance_id_segmentation_fast"].dtype, torch.uint8)
+        self.assertEqual(type(camera.data.info["instance_id_segmentation_fast"]), dict)
+        del camera
+
+    def test_semantic_segmentation_non_colorize_only_camera(self):
+        """Test initialization with only semantic_segmentation."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["semantic_segmentation"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera_cfg.colorize_semantic_segmentation = False
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["semantic_segmentation"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 1))
+                self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+        # Check data type of image and info
+        self.assertEqual(camera.data.output["semantic_segmentation"].dtype, torch.int32)
+        self.assertEqual(type(camera.data.info["semantic_segmentation"]), dict)
+
+        del camera
+
+    def test_instance_segmentation_fast_non_colorize_only_camera(self):
+        """Test initialization with only instance_segmentation_fast."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["instance_segmentation_fast"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera_cfg.colorize_instance_segmentation = False
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["instance_segmentation_fast"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 1))
+                self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+        # Check data type of image and info
+        self.assertEqual(camera.data.output["instance_segmentation_fast"].dtype, torch.int32)
+        self.assertEqual(type(camera.data.info["instance_segmentation_fast"]), dict)
+        del camera
+
+    def test_instance_id_segmentation_fast_non_colorize_only_camera(self):
+        """Test initialization with only instance_id_segmentation_fast."""
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = ["instance_id_segmentation_fast"]
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera_cfg.colorize_instance_id_segmentation = False
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(list(camera.data.output.keys()), ["instance_id_segmentation_fast"])
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for _, im_data in camera.data.output.to_dict().items():
+                self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 1))
+                self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+        # Check data type of image and info
+        self.assertEqual(camera.data.output["instance_id_segmentation_fast"].dtype, torch.int32)
+        self.assertEqual(type(camera.data.info["instance_id_segmentation_fast"]), dict)
+        del camera
+
+    def test_all_annotators_camera(self):
+        """Test initialization with all supported annotators."""
+        all_annotator_types = [
+            "rgb",
+            "rgba",
+            "depth",
+            "distance_to_camera",
+            "distance_to_image_plane",
+            "normals",
+            "motion_vectors",
+            "semantic_segmentation",
+            "instance_segmentation_fast",
+            "instance_id_segmentation_fast",
+        ]
+
+        prim_utils.create_prim("/World/Origin_00", "Xform")
+        prim_utils.create_prim("/World/Origin_01", "Xform")
+
+        # Create camera
+        camera_cfg = copy.deepcopy(self.camera_cfg)
+        camera_cfg.data_types = all_annotator_types
+        camera_cfg.prim_path = "/World/Origin_.*/CameraSensor"
+        camera = TiledCamera(camera_cfg)
+        # Check simulation parameter is set correctly
+        self.assertTrue(self.sim.has_rtx_sensors())
+        # Play sim
+        self.sim.reset()
+        # Check if camera is initialized
+        self.assertTrue(camera.is_initialized)
+        # Check if camera prim is set correctly and that it is a camera prim
+        self.assertEqual(camera._sensor_prims[1].GetPath().pathString, "/World/Origin_01/CameraSensor")
+        self.assertIsInstance(camera._sensor_prims[0], UsdGeom.Camera)
+        self.assertListEqual(sorted(camera.data.output.keys()), sorted(all_annotator_types))
+
+        # Simulate for a few steps
+        # note: This is a workaround to ensure that the textures are loaded.
+        #   Check "Known Issues" section in the documentation for more details.
+        for _ in range(5):
+            self.sim.step()
+
+        # Check buffers that exists and have correct shapes
+        self.assertEqual(camera.data.pos_w.shape, (2, 3))
+        self.assertEqual(camera.data.quat_w_ros.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_world.shape, (2, 4))
+        self.assertEqual(camera.data.quat_w_opengl.shape, (2, 4))
+        self.assertEqual(camera.data.intrinsic_matrices.shape, (2, 3, 3))
+        self.assertEqual(camera.data.image_shape, (self.camera_cfg.height, self.camera_cfg.width))
+
+        # Simulate physics
+        for _ in range(10):
+            # perform rendering
+            self.sim.step()
+            # update camera
+            camera.update(self.dt)
+            # check image data
+            for data_type, im_data in camera.data.output.to_dict().items():
+                if data_type in ["rgb", "normals"]:
+                    self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 3))
+                elif data_type in [
+                    "rgba",
+                    "semantic_segmentation",
+                    "instance_segmentation_fast",
+                    "instance_id_segmentation_fast",
+                ]:
+                    self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 4))
+                    self.assertGreater((im_data[0] / 255.0).mean().item(), 0.0)
+                    self.assertGreater((im_data[1] / 255.0).mean().item(), 0.0)
+                elif data_type in ["motion_vectors"]:
+                    self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 2))
+                    self.assertGreater(im_data[0].mean().item(), 0.0)
+                    self.assertGreater(im_data[1].mean().item(), 0.0)
+                elif data_type in ["depth", "distance_to_camera", "distance_to_image_plane"]:
+                    self.assertEqual(im_data.shape, (2, self.camera_cfg.height, self.camera_cfg.width, 1))
+                    self.assertGreater(im_data[0].mean().item(), 0.0)
+                    self.assertGreater(im_data[1].mean().item(), 0.0)
+
+        # access image data and compare dtype
+        output = camera.data.output
+        info = camera.data.info
+        self.assertEqual(output["rgb"].dtype, torch.uint8)
+        self.assertEqual(output["rgba"].dtype, torch.uint8)
+        self.assertEqual(output["depth"].dtype, torch.float)
+        self.assertEqual(output["distance_to_camera"].dtype, torch.float)
+        self.assertEqual(output["distance_to_image_plane"].dtype, torch.float)
+        self.assertEqual(output["normals"].dtype, torch.float)
+        self.assertEqual(output["motion_vectors"].dtype, torch.float)
+        self.assertEqual(output["semantic_segmentation"].dtype, torch.uint8)
+        self.assertEqual(output["instance_segmentation_fast"].dtype, torch.uint8)
+        self.assertEqual(output["instance_id_segmentation_fast"].dtype, torch.uint8)
+        self.assertEqual(type(info["semantic_segmentation"]), dict)
+        self.assertEqual(type(info["instance_segmentation_fast"]), dict)
+        self.assertEqual(type(info["instance_id_segmentation_fast"]), dict)
+
        del camera

    def test_throughput(self):
@@ -322,9 +995,10 @@ class TestTiledCamera(unittest.TestCase):
            for im_type, im_data in camera.data.output.to_dict().items():
                if im_type == "rgb":
                    self.assertEqual(im_data.shape, (1, camera_cfg.height, camera_cfg.width, 3))
-                else:
+                    self.assertGreater((im_data / 255.0).mean().item(), 0.0)
+                elif im_type == "depth":
                    self.assertEqual(im_data.shape, (1, camera_cfg.height, camera_cfg.width, 1))
-                self.assertGreater(im_data.mean().item(), 0.0)
+                    self.assertGreater(im_data.mean().item(), 0.0)
        del camera

    def test_output_equal_to_usd_camera_intrinsics(self):

--- a/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/direct/cartpole/cartpole_camera_env.py
+++ b/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/direct/cartpole/cartpole_camera_env.py
@@ -19,7 +19,6 @@ from omni.isaac.lab.envs import DirectRLEnv, DirectRLEnvCfg, ViewerCfg
 from omni.isaac.lab.scene import InteractiveSceneCfg
 from omni.isaac.lab.sensors import TiledCamera, TiledCameraCfg, save_images_to_file
 from omni.isaac.lab.sim import SimulationCfg
-from omni.isaac.lab.sim.spawners.from_files import GroundPlaneCfg, spawn_ground_plane
 from omni.isaac.lab.utils import configclass
 from omni.isaac.lab.utils.math import sample_uniform

@@ -45,7 +44,7 @@ class CartpoleRGBCameraEnvCfg(DirectRLEnvCfg):
    # camera
    tiled_camera: TiledCameraCfg = TiledCameraCfg(
        prim_path="/World/envs/env_.*/Camera",
-        offset=TiledCameraCfg.OffsetCfg(pos=(-7.0, 0.0, 3.0), rot=(0.9945, 0.0, 0.1045, 0.0), convention="world"),
+        offset=TiledCameraCfg.OffsetCfg(pos=(-5.0, 0.0, 2.0), rot=(1.0, 0.0, 0.0, 0.0), convention="world"),
        data_types=["rgb"],
        spawn=sim_utils.PinholeCameraCfg(
            focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 20.0)
@@ -60,7 +59,7 @@ class CartpoleRGBCameraEnvCfg(DirectRLEnvCfg):
    viewer = ViewerCfg(eye=(20.0, 20.0, 20.0))

    # scene
-    scene: InteractiveSceneCfg = InteractiveSceneCfg(num_envs=256, env_spacing=20.0, replicate_physics=True)
+    scene: InteractiveSceneCfg = InteractiveSceneCfg(num_envs=1024, env_spacing=20.0, replicate_physics=True)

    # reset
    max_cart_pos = 3.0  # the cart is reset if it exceeds that position [m]
@@ -79,8 +78,8 @@ class CartpoleDepthCameraEnvCfg(CartpoleRGBCameraEnvCfg):
    # camera
    tiled_camera: TiledCameraCfg = TiledCameraCfg(
        prim_path="/World/envs/env_.*/Camera",
-        offset=TiledCameraCfg.OffsetCfg(pos=(-7.0, 0.0, 3.0), rot=(0.9945, 0.0, 0.1045, 0.0), convention="world"),
-        data_types=["distance_to_camera"],
+        offset=TiledCameraCfg.OffsetCfg(pos=(-5.0, 0.0, 2.0), rot=(1.0, 0.0, 0.0, 0.0), convention="world"),
+        data_types=["depth"],
        spawn=sim_utils.PinholeCameraCfg(
            focal_length=24.0, focus_distance=400.0, horizontal_aperture=20.955, clipping_range=(0.1, 20.0)
        ),
@@ -152,8 +151,7 @@ class CartpoleCameraEnv(DirectRLEnv):
        """Setup the scene with the cartpole and camera."""
        self._cartpole = Articulation(self.cfg.robot_cfg)
        self._tiled_camera = TiledCamera(self.cfg.tiled_camera)
-        # add ground plane
-        spawn_ground_plane(prim_path="/World/ground", cfg=GroundPlaneCfg(size=(500, 500)))
+
        # clone, filter, and replicate
        self.scene.clone_environments(copy_from_source=False)
        self.scene.filter_collisions(global_prim_paths=[])
@@ -172,8 +170,16 @@ class CartpoleCameraEnv(DirectRLEnv):
        self._cartpole.set_joint_effort_target(self.actions, joint_ids=self._cart_dof_idx)

    def _get_observations(self) -> dict:
-        data_type = "rgb" if "rgb" in self.cfg.tiled_camera.data_types else "distance_to_camera"
-        observations = {"policy": self._tiled_camera.data.output[data_type].clone()}
+        data_type = "rgb" if "rgb" in self.cfg.tiled_camera.data_types else "depth"
+        if "rgb" in self.cfg.tiled_camera.data_types:
+            camera_data = self._tiled_camera.data.output[data_type] / 255.0
+            # normalize the camera data for better training results
+            mean_tensor = torch.mean(camera_data, dim=(1, 2), keepdim=True)
+            camera_data -= mean_tensor
+        elif "depth" in self.cfg.tiled_camera.data_types:
+            camera_data = self._tiled_camera.data.output[data_type]
+            camera_data[camera_data == float("inf")] = 0
+        observations = {"policy": camera_data.clone()}

        if self.cfg.write_image_to_file:
            save_images_to_file(observations["policy"], f"cartpole_{data_type}.png")

--- a/source/standalone/demos/cameras.py
+++ b/source/standalone/demos/cameras.py
@@ -96,7 +96,7 @@ class SensorsSceneCfg(InteractiveSceneCfg):
        update_period=0.1,
        height=480,
        width=640,
-        data_types=["rgb", "distance_to_camera"],
+        data_types=["rgb", "distance_to_image_plane"],
        spawn=None,  # the camera is already spawned in the scene
        offset=TiledCameraCfg.OffsetCfg(pos=(0.510, 0.0, 0.015), rot=(0.5, -0.5, 0.5, -0.5), convention="ros"),
    )
@@ -221,7 +221,7 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):
        print("-------------------------------")
        print(scene["tiled_camera"])
        print("Received shape of rgb   image: ", scene["tiled_camera"].data.output["rgb"].shape)
-        print("Received shape of depth image: ", scene["tiled_camera"].data.output["distance_to_camera"].shape)
+        print("Received shape of depth image: ", scene["tiled_camera"].data.output["distance_to_image_plane"].shape)
        print("-------------------------------")
        print(scene["raycast_camera"])
        print("Received shape of depth: ", scene["raycast_camera"].data.output["distance_to_image_plane"].shape)
@@ -242,7 +242,7 @@ def run_simulator(sim: sim_utils.SimulationContext, scene: InteractiveScene):
            # compare generated Depth images across different cameras
            depth_images = [
                scene["camera"].data.output["distance_to_image_plane"][0],
-                scene["tiled_camera"].data.output["distance_to_camera"][0, ..., 0],
+                scene["tiled_camera"].data.output["distance_to_image_plane"][0, ..., 0],
                scene["raycast_camera"].data.output["distance_to_image_plane"][0],
            ]
            save_images_grid(

--- a/tools/per_test_timeouts.py
+++ b/tools/per_test_timeouts.py
@@ -11,6 +11,8 @@ PER_TEST_TIMEOUTS = {
    "test_environments.py": 1200,  # This test runs through all the environments for 100 steps each
    "test_environment_determinism.py": 200,  # This test runs through many the environments for 100 steps each
    "test_env_rendering_logic.py": 300,
+    "test_camera.py": 500,
+    "test_tiled_camera.py": 300,
    "test_rsl_rl_wrapper.py": 200,
    "test_sb3_wrapper.py": 200,
    "test_skrl_wrapper.py": 200,