Commit f8d80cb1 authored by glvov-bdai, committed by GitHub

Add Camera Benchmark Tool and Allow Correct Unprojection of distance_to_camera depth image (#976)

parent e6f4ed10
......@@ -41,6 +41,7 @@ Guidelines for modifications:
* Calvin Yu
* Chenyu Yang
* David Yang
* Gary Lvov
* HoJin Jeon
* Jean Tampon
* Jia Lin Yuan
......
.. _how-to-estimate-how-cameras-can-run:
Find How Many/What Cameras You Should Train With
================================================
.. currentmodule:: omni.isaac.lab
Isaac Lab currently provides several camera types: USD Cameras (standard), Tiled Cameras,
and Ray Caster cameras. These camera types differ in functionality and performance. The ``benchmark_cameras.py``
script can be used to understand the differences between camera types, as well as to characterize their relative performance
at different parameters such as camera quantity, image dimensions, and data types.
This utility is provided so that one can easily find the camera type/parameters that are the most performant
while meeting the requirements of the user's scenario. This utility also helps estimate
the maximum number of cameras one can realistically run, assuming that one wants to maximize the number
of environments while minimizing step time.
This utility can inject cameras into an existing task from the gym registry,
which can be useful for benchmarking cameras in a specific scenario. Also,
if you install ``pynvml``, you can let this utility automatically find the maximum
number of cameras that can run in your task environment up to a
certain specified system resource utilization threshold (without training; taking zero actions
at each timestep).
This guide accompanies the ``benchmark_cameras.py`` script in the ``IsaacLab/source/standalone/tutorials/04_sensors``
directory.
.. dropdown:: Code for benchmark_cameras.py
:icon: code
.. literalinclude:: ../../../source/standalone/tutorials/04_sensors/benchmark_cameras.py
:language: python
:linenos:
Possible Parameters
-------------------
First, run
.. code-block:: bash
./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py -h
to see all possible parameters you can vary with this utility.
See the command line parameters related to ``autotune`` for more information about
automatically determining maximum camera count.
Compare Performance in Task Environments and Automatically Determine Task Max Camera Count
------------------------------------------------------------------------------------------
Currently, tiled cameras are the most performant camera that can handle multiple dynamic objects.
For example, to see how your system could handle 100 tiled cameras in
the cartpole environment, with 2 cameras per environment (so 50 environments total)
only in RGB mode, run
.. code-block:: bash
./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb
If you have ``pynvml`` installed (``./isaaclab.sh -p -m pip install pynvml``), you can also
find the maximum number of cameras that you could run in the specified environment up to
a certain performance threshold (specified by max CPU utilization percent, max RAM utilization percent,
max GPU compute percent, and max GPU memory percent). For example, to find the maximum number of cameras
you can run with cartpole, you could run:
.. code-block:: bash
./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
--task Isaac-Cartpole-v0 --num_tiled_cameras 100 \
--task_num_cameras_per_env 2 \
--tiled_camera_data_types rgb --autotune \
--autotune_max_percentage_util 100 80 50 50
Autotune may lead to the program crashing, which means that it tried to run too many cameras at once.
However, the max percentage utilization parameter is meant to prevent this from happening.
The output of the benchmark doesn't include the overhead of training the network, so consider
decreasing the maximum utilization percentages to account for this overhead. The final output camera
count is for all cameras, so to get the total number of environments, divide the output camera count
by the number of cameras per environment.
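As a minimal sketch of the division described above (the helper name ``environments_from_camera_count`` is hypothetical and is not part of ``benchmark_cameras.py``), the total camera count can be converted into an environment count like this:

```python
def environments_from_camera_count(total_cameras: int, cameras_per_env: int) -> int:
    """Return how many environments a total camera count corresponds to.

    Hypothetical helper for illustration only; it is not provided by the script.
    """
    if cameras_per_env <= 0:
        raise ValueError("cameras_per_env must be a positive integer")
    # Integer division: a partially equipped environment does not count.
    return total_cameras // cameras_per_env


# 100 tiled cameras with 2 cameras per environment, as in the cartpole example
num_envs = environments_from_camera_count(100, 2)
print(num_envs)
```

With the cartpole command shown earlier (100 cameras, 2 per environment), this gives 50 environments.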
Compare Camera Type and Performance (Without a Specified Task)
--------------------------------------------------------------
This tool can also assess performance without a task environment.
For example, to view 100 random objects with 2 standard cameras, one could run
.. code-block:: bash
./isaaclab.sh -p source/standalone/tutorials/04_sensors/benchmark_cameras.py \
--height 100 --width 100 --num_standard_cameras 2 \
--standard_camera_data_types instance_segmentation_fast normals --num_objects 100 \
--experiment_length 100
If your system cannot handle this due to performance reasons, then the process will be killed.
It's recommended to monitor CPU/RAM utilization and GPU utilization while running this script, to get
an idea of how many resources rendering the desired camera requires. In Ubuntu, you can use tools like ``htop`` and ``nvtop``
to live monitor resources while running this script, and in Windows, you can use the Task Manager.
If your system has a hard time handling the desired cameras, you can try the following:
- Switch to headless mode (supply ``--headless``)
- Ensure you are using the GPU pipeline, not the CPU pipeline
- If you aren't using Tiled Cameras, switch to Tiled Cameras
- Decrease camera resolution
- Decrease the number of data types for each camera
- Decrease the number of cameras
- Decrease the number of objects in the scene
If your system is able to handle the number of cameras, then the time statistics will be printed to the terminal.
After the simulation stops, it can be closed with CTRL+C.
......@@ -46,6 +46,17 @@ This guide explains how to save the camera output in Isaac Lab.
save_camera_output
Estimate How Many Cameras Can Run On Your Machine
-------------------------------------------------
This guide demonstrates how to estimate the number of cameras one can run on their machine under the desired parameters.
.. toctree::
:maxdepth: 1
estimate_how_many_cameras_can_run
Drawing Markers
---------------
......
......@@ -2,6 +2,18 @@ Changelog
---------
0.24.14 (2024-09-20)
~~~~~~~~~~~~~~~~~~~~
Added
^^^^^
* Added :meth:`convert_perspective_depth_to_orthogonal_depth`. :meth:`unproject_depth` assumes
that the input depth image is orthogonal. The new :meth:`convert_perspective_depth_to_orthogonal_depth`
can be used to convert a perspective depth image into an orthogonal depth image, so that the point cloud
can be unprojected correctly with :meth:`unproject_depth`.
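The conversion can be illustrated for a single pixel with a pure-Python sketch. It assumes a pinhole camera model and applies the same per-pixel formula as the new function, ``d / sqrt(1 + ((u - cx) / fx)^2 + ((v - cy) / fy)^2)``. The helper names below are hypothetical and are not part of the library; the intrinsics and depth value are borrowed from the unit test added in this commit.

```python
import math


def perspective_to_orthogonal(d, u, v, fx, fy, cx, cy):
    """Convert a distance-to-camera value at pixel (u, v) into distance-to-image-plane."""
    x = (u - cx) / fx  # normalized image coordinate along the width
    y = (v - cy) / fy  # normalized image coordinate along the height
    return d / math.sqrt(1.0 + x * x + y * y)


def unproject_pixel(z, u, v, fx, fy, cx, cy):
    """Unproject pixel (u, v) with orthogonal depth z into a 3D point (pinhole model)."""
    return ((u - cx) / fx * z, (v - cy) / fy * z, z)


# Values taken from the unit test in this commit
fx = fy = 500.0
cx = cy = 5.0
d = 3000.0  # perspective depth at pixel (u=1, v=1)

z = perspective_to_orthogonal(d, 1, 1, fx, fy, cx, cy)
point = unproject_pixel(z, 1, 1, fx, fy, cx, cy)

# The Euclidean norm of the unprojected point recovers the perspective depth.
range_check = math.sqrt(sum(c * c for c in point))
print(z, range_check)
```

Here ``z`` comes out close to the 2999.8079 expected by the unit test, and the norm of the unprojected point returns the original 3000.0. Unprojecting the perspective depth directly would instead place the point slightly too far from the camera, with the error growing toward the image edges.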
0.24.13 (2024-09-08)
~~~~~~~~~~~~~~~~~~~~
......
......@@ -988,7 +988,12 @@ Projection operations.
@torch.jit.script
def unproject_depth(depth: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tensor:
r"""Unproject depth image into a pointcloud.
r"""Unproject depth image into a pointcloud. This method assumes that depth
is provided orthogonally relative to the image plane, as opposed to absolutely relative to the camera's
principal point (perspective depth). To unproject a perspective depth image, use
:meth:`convert_perspective_depth_to_orthogonal_depth` to convert
to an orthogonal depth image prior to calling this method. Otherwise, the
created point cloud will be distorted, especially around the edges.
This function converts depth images into points given the calibration matrix of the camera.
......@@ -1059,6 +1064,105 @@ def unproject_depth(depth: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tens
return points_xyz
@torch.jit.script
def convert_perspective_depth_to_orthogonal_depth(
perspective_depth: torch.Tensor, intrinsics: torch.Tensor
) -> torch.Tensor:
r"""Provided depth image(s) where depth is provided as the distance to the principal
point of the camera (perspective depth), this function converts it so that depth
is provided as the distance to the camera's image plane (orthogonal depth).
This is helpful because `unproject_depth` assumes that depth is expressed in
the orthogonal depth format.
If `perspective_depth` is a batch of depth images and `intrinsics` is a single intrinsic matrix,
the same calibration matrix is applied to all depth images in the batch.
The function assumes that the width and height are both greater than 1.
Args:
perspective_depth: The depth measurement obtained with the distance_to_camera replicator.
Shape is (H, W) or (H, W, 1) or (N, H, W) or (N, H, W, 1).
intrinsics: A tensor providing camera's calibration matrix. Shape is (3, 3) or (N, 3, 3).
Returns:
The depth image as if obtained by the distance_to_image_plane replicator. Shape
matches the shape of the input ``perspective_depth``.
Raises:
ValueError: When depth is not of shape (H, W) or (H, W, 1) or (N, H, W) or (N, H, W, 1).
ValueError: When intrinsics is not of shape (3, 3) or (N, 3, 3).
"""
# Clone inputs to avoid in-place modifications
perspective_depth_batch = perspective_depth.clone()
intrinsics_batch = intrinsics.clone()
# Check if inputs are batched
is_batched = perspective_depth_batch.dim() == 4 or (
perspective_depth_batch.dim() == 3 and perspective_depth_batch.shape[-1] != 1
)
# Track whether the last dimension was singleton
add_last_dim = False
if perspective_depth_batch.dim() == 4 and perspective_depth_batch.shape[-1] == 1:
add_last_dim = True
perspective_depth_batch = perspective_depth_batch.squeeze(dim=3) # (N, H, W, 1) -> (N, H, W)
if perspective_depth_batch.dim() == 3 and perspective_depth_batch.shape[-1] == 1:
add_last_dim = True
perspective_depth_batch = perspective_depth_batch.squeeze(dim=2) # (H, W, 1) -> (H, W)
if perspective_depth_batch.dim() == 2:
perspective_depth_batch = perspective_depth_batch[None] # (H, W) -> (1, H, W)
if intrinsics_batch.dim() == 2:
intrinsics_batch = intrinsics_batch[None] # (3, 3) -> (1, 3, 3)
if is_batched and intrinsics_batch.shape[0] == 1:
intrinsics_batch = intrinsics_batch.expand(perspective_depth_batch.shape[0], -1, -1) # (1, 3, 3) -> (N, 3, 3)
# Validate input shapes
if perspective_depth_batch.dim() != 3:
raise ValueError(f"Expected perspective_depth to have 2, 3, or 4 dimensions; got {perspective_depth.shape}.")
if intrinsics_batch.dim() != 3:
raise ValueError(f"Expected intrinsics to have shape (3, 3) or (N, 3, 3); got {intrinsics.shape}.")
# Image dimensions
im_height, im_width = perspective_depth_batch.shape[1:]
# Get the intrinsics parameters
fx = intrinsics_batch[:, 0, 0].view(-1, 1, 1)
fy = intrinsics_batch[:, 1, 1].view(-1, 1, 1)
cx = intrinsics_batch[:, 0, 2].view(-1, 1, 1)
cy = intrinsics_batch[:, 1, 2].view(-1, 1, 1)
# Create meshgrid of pixel coordinates
u_grid = torch.arange(im_width, device=perspective_depth.device, dtype=perspective_depth.dtype)
v_grid = torch.arange(im_height, device=perspective_depth.device, dtype=perspective_depth.dtype)
u_grid, v_grid = torch.meshgrid(u_grid, v_grid, indexing="xy")
# Expand the grids for batch processing
u_grid = u_grid.unsqueeze(0).expand(perspective_depth_batch.shape[0], -1, -1)
v_grid = v_grid.unsqueeze(0).expand(perspective_depth_batch.shape[0], -1, -1)
# Compute the squared terms for efficiency
x_term = ((u_grid - cx) / fx) ** 2
y_term = ((v_grid - cy) / fy) ** 2
# Calculate the orthogonal (normal) depth
normal_depth = perspective_depth_batch / torch.sqrt(1 + x_term + y_term)
# Restore the last dimension if it was present in the input
if add_last_dim:
normal_depth = normal_depth.unsqueeze(-1)
# Return to original shape if input was not batched
if not is_batched:
normal_depth = normal_depth.squeeze(0)
return normal_depth
@torch.jit.script
def project_points(points: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tensor:
r"""Projects 3D points into 2D image plane.
......
......@@ -376,6 +376,24 @@ class TestMathUtilities(unittest.TestCase):
iter_old_quat_rotate_inverse(q_rand, v_rand),
)
def test_depth_perspective_conversion(self):
# Create a sample perspective depth image (N, H, W)
perspective_depth = torch.tensor([[[10.0, 0.0, 100.0], [0.0, 3000.0, 0.0], [100.0, 0.0, 100.0]]])
# Create sample intrinsic matrix (3, 3)
intrinsics = torch.tensor([[500.0, 0.0, 5.0], [0.0, 500.0, 5.0], [0.0, 0.0, 1.0]])
# Convert perspective depth to orthogonal depth
orthogonal_depth = math_utils.convert_perspective_depth_to_orthogonal_depth(perspective_depth, intrinsics)
# Manually compute expected orthogonal depth based on the formula for comparison
expected_orthogonal_depth = torch.tensor(
[[[9.9990, 0.0000, 99.9932], [0.0000, 2999.8079, 0.0000], [99.9932, 0.0000, 99.9964]]]
)
# Assert that the output is close to the expected result
torch.testing.assert_close(orthogonal_depth, expected_orthogonal_depth)
if __name__ == "__main__":
run_tests()