Updates CloudXR Teleoperation doc to include Kubernetes setup guide (#450)

# Description

Adds a user guide for deploying CloudXR Teleoperation on a Kubernetes cluster.

## Type of change

- This change is a documentation update

## Screenshots

n/a. Proofread in a local build.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

Commit 2ebd5edb authored Jun 01, 2025 by yanziz-nvidia Committed by Kelly Guo Jul 12, 2025
3 changed files
--- a/docs/source/deployment/cloudxr_teleoperation_cluster.rst
+++ b/docs/source/deployment/cloudxr_teleoperation_cluster.rst
+.. _cloudxr-teleoperation-cluster:
+
+Deploying CloudXR Teleoperation on Kubernetes
+=============================================
+
+.. currentmodule:: isaaclab
+
+This section explains how to deploy CloudXR Teleoperation for Isaac Lab on a Kubernetes (K8s) cluster.
+
+.. _k8s-system-requirements:
+
+System Requirements
+-------------------
+
+* **Minimum requirement**: Kubernetes cluster with a node that has at least 1 NVIDIA RTX 6000 Ada Generation / L40 GPU or equivalent
+* **Recommended requirement**: Kubernetes cluster with a node that has at least 2 RTX 6000 Ada Generation / L40 GPUs or equivalent
+
+Software Dependencies
+---------------------
+
+* ``kubectl`` on your host computer
+
+  * If you use MicroK8s, you already have ``microk8s kubectl``
+  * Otherwise follow the `official kubectl installation guide <https://kubernetes.io/docs/tasks/tools/#kubectl>`_
+
+* ``helm`` on your host computer
+
+  * If you use MicroK8s, you already have ``microk8s helm``
+  * Otherwise follow the `official Helm installation guide <https://helm.sh/docs/intro/install/>`_
+
+* Access to the NGC public registry from your Kubernetes cluster, in particular to these container images:
+
+  * ``https://catalog.ngc.nvidia.com/orgs/nvidia/containers/isaac-lab``
+  * ``https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cloudxr-runtime``
+
+* NVIDIA GPU Operator or equivalent installed in your Kubernetes cluster to expose NVIDIA GPUs
+* NVIDIA Container Toolkit installed on the nodes of your Kubernetes cluster
+
+Preparation
+-----------
+
+On your host computer, you should have already configured ``kubectl`` to access your Kubernetes cluster. To validate, run the following command and verify it returns your nodes correctly:
+
+.. code:: bash
+
+   kubectl get node
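+
+If the GPU Operator (or an equivalent device plugin) is already installed, you can also check that each node advertises its GPUs. The following is a minimal sketch; it assumes GPUs are exposed under the ``nvidia.com/gpu`` resource name, which this guide does not state explicitly:
+
+.. code:: bash
+
+   # Print the allocatable GPU count per node; <none> means no GPUs are advertised
+   kubectl get nodes \
+     -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'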
+
+If you are installing on your own Kubernetes cluster rather than using the setup described in the :ref:`k8s-appendix`, your role in the cluster needs at least the following RBAC permissions:
+
+.. code:: yaml
+
+   rules:
+   - apiGroups: [""]
+     resources: ["configmaps"]
+     verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
+   - apiGroups: ["apps"]
+     resources: ["deployments", "replicasets"]
+     verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
+   - apiGroups: [""]
+     resources: ["pods"]
+     verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
+   - apiGroups: [""]
+     resources: ["services"]
+     verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
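+
+One way to check whether your current credentials cover these rules is ``kubectl auth can-i``. The loop below is a minimal sketch that assumes the chart will be installed into the ``default`` namespace; it prints ``yes`` or ``no`` for each resource and verb:
+
+.. code:: bash
+
+   # Resources in the "apps" API group are written as <resource>.apps
+   for resource in configmaps pods services deployments.apps replicasets.apps; do
+     for verb in get list watch create update patch delete; do
+       printf '%-20s %-8s ' "$resource" "$verb"
+       kubectl auth can-i "$verb" "$resource" --namespace default
+     done
+   done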
+
+.. _k8s-installation:
+
+Installation
+------------
+
+.. note::
+
+   The following steps were verified on a MicroK8s cluster with the GPU Operator installed (see the configuration in the :ref:`k8s-appendix`). If you encounter issues on your own K8s cluster, compare its configuration against that setup.
+
+#. Download the Helm chart from NGC (generate your NGC API key by following the `public guide <https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key>`_):
+
+   .. code:: bash
+
+      helm fetch https://helm.ngc.nvidia.com/nvidia/charts/isaac-lab-teleop-2.2.0.tgz \
+        --username='$oauthtoken' \
+        --password=<your-ngc-api-key>
+
+#. Install and run the CloudXR Teleoperation for Isaac Lab pod in the default namespace, consuming all host GPUs:
+
+   .. code:: bash
+
+      helm upgrade --install hello-isaac-teleop isaac-lab-teleop-2.2.0.tgz \
+        --set fullnameOverride=hello-isaac-teleop \
+        --set hostNetwork="true"
+
+   .. note::
+
+      You can remove the need for host network by creating an external LoadBalancer VIP (e.g., with MetalLB), and setting the environment variable ``NV_CXR_ENDPOINT_IP`` when deploying the Helm chart:
+
+      .. code:: yaml
+
+         # local_values.yml file example:
+         fullnameOverride: hello-isaac-teleop
+         streamer:
+           extraEnvs:
+             - name: NV_CXR_ENDPOINT_IP
+               value: "<your external LoadBalancer VIP>"
+             - name: ACCEPT_EULA
+               value: "Y"
+
+      .. code:: bash
+
+         # command
+         helm upgrade --install --values local_values.yml \
+           hello-isaac-teleop isaac-lab-teleop-2.2.0.tgz
+
+#. Verify that the deployment has completed:
+
+   .. code:: bash
+
+      kubectl wait --for=condition=available --timeout=300s \
+        deployment/hello-isaac-teleop
+
+   After the pod is running, it might take approximately 5-8 minutes to complete loading assets and start streaming.
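+
+While waiting, you can inspect the release and follow the container logs. This is a minimal sketch using standard ``helm`` and ``kubectl`` commands; the release and deployment names match the ``fullnameOverride`` used above, and the log contents depend on the chart version:
+
+.. code:: bash
+
+   # Show the status of the Helm release
+   helm status hello-isaac-teleop
+
+   # List pods in the current namespace, including node placement
+   kubectl get pods -o wide
+
+   # Follow logs from all containers of the deployment
+   kubectl logs deployment/hello-isaac-teleop --all-containers --follow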
+
+Uninstallation
+--------------
+
+To uninstall, run:
+
+.. code:: bash
+
+   helm uninstall hello-isaac-teleop
+
+.. _k8s-appendix:
+
+Appendix: Setting Up a Local K8s Cluster with MicroK8s
+------------------------------------------------------
+
+Your local workstation should have the NVIDIA Container Toolkit and its dependencies installed. Otherwise, the following setup will not work.
+
+Cleaning Up Existing Installations (Optional)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   # Clean up the system to ensure we start fresh
+   sudo snap remove microk8s
+   sudo snap remove helm
+   sudo apt-get remove docker-ce docker-ce-cli containerd.io
+   # If you have snap docker installed, remove it as well
+   sudo snap remove docker
+
+Installing MicroK8s
+~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   sudo snap install microk8s --classic
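+
+Before continuing, it can help to wait until the cluster reports ready. The following is a minimal sketch using standard MicroK8s commands; the group change is only needed if you want to run ``microk8s`` without ``sudo``:
+
+.. code:: bash
+
+   # Optional: allow the current user to run microk8s without sudo
+   # (log out and back in for the group change to take effect)
+   sudo usermod -a -G microk8s "$USER"
+
+   # Block until the MicroK8s services are up
+   microk8s status --wait-ready
+
+   # Confirm that the node is registered
+   microk8s kubectl get node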
+
+Installing NVIDIA GPU Operator
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. code:: bash
+
+   microk8s helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
+   microk8s helm repo update
+   microk8s helm install gpu-operator \
+     -n gpu-operator \
+     --create-namespace nvidia/gpu-operator \
+     --set toolkit.env[0].name=CONTAINERD_CONFIG \
+     --set toolkit.env[0].value=/var/snap/microk8s/current/args/containerd-template.toml \
+     --set toolkit.env[1].name=CONTAINERD_SOCKET \
+     --set toolkit.env[1].value=/var/snap/microk8s/common/run/containerd.sock \
+     --set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
+     --set toolkit.env[2].value=nvidia \
+     --set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
+     --set-string toolkit.env[3].value=true
+
+.. note::
+
+   Configuring the GPU Operator to use volume mounts for ``DEVICE_LIST_STRATEGY`` on the device plugin while disabling ``ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED`` on the toolkit is currently unsupported, because there is no way to ensure that the assigned GPU resource is shared consistently between containers of the same pod.
+
+Verifying Installation
+~~~~~~~~~~~~~~~~~~~~~~
+
+Run the following command to verify that the GPU Operator pods started correctly:
+
+.. code:: bash
+
+   microk8s kubectl get pods -n gpu-operator
+
+You should see output similar to:
+
+.. code:: text
+
+   NAME                                                          READY   STATUS      RESTARTS   AGE
+   gpu-operator-node-feature-discovery-gc-76dc6664b8-npkdg       1/1     Running     0          77m
+   gpu-operator-node-feature-discovery-master-7d6b448f6d-76fqj   1/1     Running     0          77m
+   gpu-operator-node-feature-discovery-worker-8wr4n              1/1     Running     0          77m
+   gpu-operator-86656466d6-wjqf4                                 1/1     Running     0          77m
+   nvidia-container-toolkit-daemonset-qffh6                      1/1     Running     0          77m
+   nvidia-dcgm-exporter-vcxsf                                    1/1     Running     0          77m
+   nvidia-cuda-validator-x9qn4                                   0/1     Completed   0          76m
+   nvidia-device-plugin-daemonset-t4j4k                          1/1     Running     0          77m
+   gpu-feature-discovery-8dms9                                   1/1     Running     0          77m
+   nvidia-operator-validator-gjs9m                               1/1     Running     0          77m
+
+Once every pod reports ``Running`` or ``Completed``, you can proceed to the :ref:`k8s-installation` section.
--- a/docs/source/deployment/index.rst
+++ b/docs/source/deployment/index.rst
@@ -19,4 +19,5 @@ container.

  docker
  cluster
+  cloudxr_teleoperation_cluster
  run_docker_example
--- a/docs/source/how-to/cloudxr_teleoperation.rst
+++ b/docs/source/how-to/cloudxr_teleoperation.rst
@@ -910,6 +910,10 @@ Known Issues
  This error message can be caused by shader assets authored with older versions of USD, and can
  typically be ignored.

+Kubernetes Deployment
+---------------------
+
+For information on deploying CloudXR Teleoperation for Isaac Lab on a Kubernetes cluster, see :ref:`cloudxr-teleoperation-cluster`.

 ..
  References