Commit 2ebd5edb authored by yanziz-nvidia's avatar yanziz-nvidia Committed by Kelly Guo

Updates CloudXR Teleoperation doc to include Kubernetes setup guide (#450)

# Description

Adds user guide on CloudXR Teleoperation on Kubernetes cluster.

## Type of change

- This change is a documentation update

## Screenshots

n/a

Did a proof read in a local build.

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [x] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [ ] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
parent ceb02e2b
.. _cloudxr-teleoperation-cluster:
Deploying CloudXR Teleoperation on Kubernetes
=============================================
.. currentmodule:: isaaclab
This section explains how to deploy CloudXR Teleoperation for Isaac Lab on a Kubernetes (K8s) cluster.
.. _k8s-system-requirements:
System Requirements
-------------------
* **Minimum requirement**: Kubernetes cluster with a node that has at least 1 NVIDIA RTX 6000 Ada Generation / L40 GPU or equivalent
* **Recommended requirement**: Kubernetes cluster with a node that has at least 2 RTX 6000 Ada Generation / L40 GPUs or equivalent
Software Dependencies
---------------------
* ``kubectl`` on your host computer
* If you use MicroK8s, you already have ``microk8s kubectl``
* Otherwise follow the `official kubectl installation guide <https://kubernetes.io/docs/tasks/tools/#kubectl>`_
* ``helm`` on your host computer
* If you use MicroK8s, you already have ``microk8s helm``
* Otherwise follow the `official Helm installation guide <https://helm.sh/docs/intro/install/>`_
* Access to NGC public registry from your Kubernetes cluster, in particular these container images:
* ``https://catalog.ngc.nvidia.com/orgs/nvidia/containers/isaac-lab``
* ``https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cloudxr-runtime``
* NVIDIA GPU Operator or equivalent installed in your Kubernetes cluster to expose NVIDIA GPUs
* NVIDIA Container Toolkit installed on the nodes of your Kubernetes cluster
Preparation
-----------
On your host computer, you should have already configured ``kubectl`` to access your Kubernetes cluster. To validate, run the following command and verify it returns your nodes correctly:
.. code:: bash
kubectl get node
If you are installing this to your own Kubernetes cluster instead of using the setup described in the :ref:`k8s-appendix`, your role in the K8s cluster should have at least the following RBAC permissions:
.. code:: yaml
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
resources: ["deployments", "replicasets"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["services"]
verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
.. _k8s-installation:
Installation
------------
.. note::
The following steps are verified on a MicroK8s cluster with GPU Operator installed (see configurations in the :ref:`k8s-appendix`). You can configure your own K8s cluster accordingly if you encounter issues.
#. Download the Helm chart from NGC (get your NGC API key based on the `public guide <https://docs.nvidia.com/ngc/ngc-overview/index.html#generating-api-key>`_):
.. code:: bash
helm fetch https://helm.ngc.nvidia.com/nvidia/charts/isaac-lab-teleop-2.2.0.tgz \
--username='$oauthtoken' \
--password=<your-ngc-api-key>
#. Install and run the CloudXR Teleoperation for Isaac Lab pod in the default namespace, consuming all host GPUs:
.. code:: bash
helm upgrade --install hello-isaac-teleop isaac-lab-teleop-2.2.0.tgz \
--set fullnameOverride=hello-isaac-teleop \
--set hostNetwork="true"
.. note::
You can remove the need for host network by creating an external LoadBalancer VIP (e.g., with MetalLB), and setting the environment variable ``NV_CXR_ENDPOINT_IP`` when deploying the Helm chart:
.. code:: yaml
# local_values.yml file example:
fullnameOverride: hello-isaac-teleop
streamer:
extraEnvs:
- name: NV_CXR_ENDPOINT_IP
value: "<your external LoadBalancer VIP>"
- name: ACCEPT_EULA
value: "Y"
.. code:: bash
# command
helm upgrade --install --values local_values.yml \
hello-isaac-teleop isaac-lab-teleop-2.2.0.tgz
#. Verify the deployment is completed:
.. code:: bash
kubectl wait --for=condition=available --timeout=300s \
deployment/hello-isaac-teleop
After the pod is running, it might take approximately 5-8 minutes to complete loading assets and start streaming.
Uninstallation
--------------
You can uninstall by simply running:
.. code:: bash
helm uninstall hello-isaac-teleop
.. _k8s-appendix:
Appendix: Setting Up a Local K8s Cluster with MicroK8s
------------------------------------------------------
Your local workstation should have the NVIDIA Container Toolkit and its dependencies installed. Otherwise, the following setup will not work.
Cleaning Up Existing Installations (Optional)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
# Clean up the system to ensure we start fresh
sudo snap remove microk8s
sudo snap remove helm
sudo apt-get remove docker-ce docker-ce-cli containerd.io
# If you have snap docker installed, remove it as well
sudo snap remove docker
Installing MicroK8s
~~~~~~~~~~~~~~~~~~~
.. code:: bash
sudo snap install microk8s --classic
Installing NVIDIA GPU Operator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: bash
microk8s helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
microk8s helm repo update
microk8s helm install gpu-operator \
-n gpu-operator \
--create-namespace nvidia/gpu-operator \
--set toolkit.env[0].name=CONTAINERD_CONFIG \
--set toolkit.env[0].value=/var/snap/microk8s/current/args/containerd-template.toml \
--set toolkit.env[1].name=CONTAINERD_SOCKET \
--set toolkit.env[1].value=/var/snap/microk8s/common/run/containerd.sock \
--set toolkit.env[2].name=CONTAINERD_RUNTIME_CLASS \
--set toolkit.env[2].value=nvidia \
--set toolkit.env[3].name=CONTAINERD_SET_AS_DEFAULT \
--set-string toolkit.env[3].value=true
.. note::
If you have configured the GPU operator to use volume mounts for ``DEVICE_LIST_STRATEGY`` on the device plugin and disabled ``ACCEPT_NVIDIA_VISIBLE_DEVICES_ENVVAR_WHEN_UNPRIVILEGED`` on the toolkit, this configuration is currently unsupported, as there is no method to ensure the assigned GPU resource is consistently shared between containers of the same pod.
Verifying Installation
~~~~~~~~~~~~~~~~~~~~~~
Run the following command to verify that all pods are running correctly:
.. code:: bash
microk8s kubectl get pods -n gpu-operator
You should see output similar to:
.. code:: text
NAMESPACE NAME READY STATUS RESTARTS AGE
gpu-operator gpu-operator-node-feature-discovery-gc-76dc6664b8-npkdg 1/1 Running 0 77m
gpu-operator gpu-operator-node-feature-discovery-master-7d6b448f6d-76fqj 1/1 Running 0 77m
gpu-operator gpu-operator-node-feature-discovery-worker-8wr4n 1/1 Running 0 77m
gpu-operator gpu-operator-86656466d6-wjqf4 1/1 Running 0 77m
gpu-operator nvidia-container-toolkit-daemonset-qffh6 1/1 Running 0 77m
gpu-operator nvidia-dcgm-exporter-vcxsf 1/1 Running 0 77m
gpu-operator nvidia-cuda-validator-x9qn4 0/1 Completed 0 76m
gpu-operator nvidia-device-plugin-daemonset-t4j4k 1/1 Running 0 77m
gpu-operator gpu-feature-discovery-8dms9 1/1 Running 0 77m
gpu-operator nvidia-operator-validator-gjs9m 1/1 Running 0 77m
Once all pods are running, you can proceed to the :ref:`k8s-installation` section.
...@@ -19,4 +19,5 @@ container. ...@@ -19,4 +19,5 @@ container.
docker docker
cluster cluster
cloudxr_teleoperation_cluster
run_docker_example run_docker_example
...@@ -910,6 +910,10 @@ Known Issues ...@@ -910,6 +910,10 @@ Known Issues
This error message can be caused by shader assets authored with older versions of USD, and can This error message can be caused by shader assets authored with older versions of USD, and can
typically be ignored. typically be ignored.
Kubernetes Deployment
---------------------
For information on deploying XR Teleop for Isaac Lab on a Kubernetes cluster, see :ref:`cloudxr-teleoperation-cluster`.
.. ..
References References
......
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment