Unverified commit c530af44, authored by Pascal Roth, committed by GitHub

Fixes `orbit_assets` copy and bugfixes in docker and singularity (#338)

# Description

This PR fixes three issues related to cluster deployment:

- Remove the explicit copy of the `orbit_assets` directory in the
`container.sh` file
- Bind the `logs` directory directly in the `run_singularity.sh` file so
that the logs are still available on the host if the run crashes
- Create an additional file in the Docker image so that it exists as a
bind target when the Singularity container is run
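
The second fix relies on the bind source already existing on the host before the container starts. A minimal sketch of that preparation step follows; `prepare_bind_dir` is a hypothetical helper that is not part of this PR, and the temporary directory only stands in for `$CLUSTER_ORBIT_DIR`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the PR): make sure a host directory exists
# and is non-empty so that it can serve as a bind source for the container.
prepare_bind_dir() {
    local dir="$1"
    mkdir -p "$dir"        # succeeds even if the directory already exists
    touch "$dir/.keep"     # placeholder file, mirroring the PR's logs/.keep
}

# Assumption: a throwaway directory stands in for $CLUSTER_ORBIT_DIR.
cluster_dir="$(mktemp -d)"
prepare_bind_dir "$cluster_dir/logs"

# Bind specification in the source:destination:options form used by the
# -B flags in run_singularity.sh.
bind_arg="$cluster_dir/logs:/workspace/orbit/logs:rw"
echo "-B $bind_arg"
```

Because the container then writes straight into the host directory, no copy step after `singularity exec` is needed for the logs, which is why a crashed run no longer loses them.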

## Type of change

- Bug fix (non-breaking change which fixes an issue)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./orbit.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
parent 62aeea19
@@ -48,7 +48,7 @@ RUN ln -sf ${ISAACSIM_PATH} ${ORBIT_PATH}/_isaac_sim
 RUN mkdir -p ${ISAACSIM_PATH}/kit/cache && \
 mkdir -p ${DOCKER_USER_HOME}/.cache/ov && \
 mkdir -p ${DOCKER_USER_HOME}/.cache/pip && \
-mkdir -p ${DOCKER_USER_HOME}/.cache/nvidia/GLCache&& \
+mkdir -p ${DOCKER_USER_HOME}/.cache/nvidia/GLCache && \
 mkdir -p ${DOCKER_USER_HOME}/.nv/ComputeCache && \
 mkdir -p ${DOCKER_USER_HOME}/.nvidia-omniverse/logs && \
 mkdir -p ${DOCKER_USER_HOME}/.local/share/ov/data && \
@@ -60,7 +60,9 @@ RUN touch /bin/nvidia-smi && \
 touch /bin/nvidia-persistenced && \
 touch /bin/nvidia-cuda-mps-control && \
 touch /bin/nvidia-cuda-mps-server && \
-touch /etc/localtime
+touch /etc/localtime && \
+mkdir -p /var/run/nvidia-persistenced && \
+touch /var/run/nvidia-persistenced/socket
 # installing Orbit dependencies
 RUN ${ORBIT_PATH}/orbit.sh --install --extra
......
@@ -36,6 +36,16 @@ SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
 # load variables to set the orbit path on the cluster
 source $SCRIPT_DIR/../.env
+# make sure that all directories exists in cache directory
+setup_directories
+# copy all cache files
+cp -r $CLUSTER_ISAAC_SIM_CACHE_DIR $TMPDIR
+# copy orbit source code
+mkdir -p "$CLUSTER_ORBIT_DIR/logs"
+touch "$CLUSTER_ORBIT_DIR/logs/.keep"
+cp -r $CLUSTER_ORBIT_DIR $TMPDIR
 # copy singulary image to the compute node
 folder="$TMPDIR/isaac-sim.sif"
@@ -46,14 +56,6 @@ else
 tar -xf $CLUSTER_SIF_PATH/orbit.tar -C $TMPDIR
 fi
-# make sure that all directories exists in cache directory
-setup_directories
-# copy all cache files
-cp -r $CLUSTER_ISAAC_SIM_CACHE_DIR $TMPDIR
-# copy orbit source code
-cp -r $CLUSTER_ORBIT_DIR $TMPDIR
 # execute command in singularity container
 singularity exec \
 -B $TMPDIR/docker-isaac-sim/cache/kit:${DOCKER_ISAACSIM_PATH}/kit/cache:rw \
@@ -65,12 +67,10 @@ singularity exec \
 -B $TMPDIR/docker-isaac-sim/data:${DOCKER_USER_HOME}/.local/share/ov/data:rw \
 -B $TMPDIR/docker-isaac-sim/documents:${DOCKER_USER_HOME}/Documents:rw \
 -B $TMPDIR/orbit:/workspace/orbit:rw \
+-B $CLUSTER_ORBIT_DIR/logs:/workspace/orbit/logs:rw \
 --nv --writable --containall $TMPDIR/orbit.sif \
 bash -c "cd /workspace/orbit && /isaac-sim/python.sh ${CLUSTER_PYTHON_EXECUTABLE} $@"
-# copy orbit logs back to host
-cp -r $TMPDIR/orbit/logs $CLUSTER_ORBIT_DIR
 # copy resulting cache files back to host
 cp -r $TMPDIR/docker-isaac-sim $CLUSTER_ISAAC_SIM_CACHE_DIR/..
......
@@ -140,10 +140,6 @@ case $mode in
 echo "[INFO] Syncing orbit code..."
 source $SCRIPT_DIR/.env
 rsync -rh --exclude="*.git*" --filter=':- .dockerignore' /$SCRIPT_DIR/.. $CLUSTER_LOGIN:$CLUSTER_ORBIT_DIR
-# Explicitly also sync orbit_assets as long as it is still used
-if [ -f /$SCRIPT_DIR/../source/extensions/omni.isaac.orbit_assets ]; then
-rsync -rh --exclude="*.git*" /$SCRIPT_DIR/../source/extensions/omni.isaac.orbit_assets $CLUSTER_LOGIN:$CLUSTER_ORBIT_DIR/source/extensions
-fi
 # execute job script
 echo "[INFO] Executing job script..."
 ssh $CLUSTER_LOGIN "cd $CLUSTER_ORBIT_DIR && sbatch $CLUSTER_ORBIT_DIR/docker/cluster/submit_job.sh" "$CLUSTER_ORBIT_DIR" "${@:2}"
......