    Fixes distributed training hanging issue (#3273) · 3edc06c0
    Kelly Guo authored
    # Description
    
    We have been hunting down a strange issue in distributed training setups
    with rendering enabled, where the process would often hang midway
    through training and cause NCCL timeouts. We discovered a workaround:
    setting `app.execution.debug.forceSerial = true`, which forces serialized
    scheduling of OmniGraph within the same thread. This appears to have
    resolved the hanging issue and did not cause performance regressions.
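
For anyone who wants to try the setting without editing an app file, the sketch below renders it as a command-line override. It assumes the Kit convention of passing settings as `--/path/to/key=value` arguments; the helper name `kit_override_args` is hypothetical and is not part of this change, which does not state where the setting was applied.

```python
def kit_override_args(settings: dict) -> list[str]:
    """Render dotted setting names as Kit-style ``--/path/to/key=value`` overrides.

    Hypothetical helper for illustration; not part of Isaac Lab.
    """
    args = []
    for key, value in settings.items():
        # "app.execution.debug.forceSerial" -> "app/execution/debug/forceSerial"
        path = key.strip("/").replace(".", "/")
        # Booleans are written in lowercase, matching the `= true` form above.
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        else:
            rendered = str(value)
        args.append(f"--/{path}={rendered}")
    return args


overrides = kit_override_args({"app.execution.debug.forceSerial": True})
print(" ".join(overrides))  # --/app/execution/debug/forceSerial=true
```

The resulting string can be appended to whatever launch command is in use, which makes it easy to compare runs with and without serialized scheduling.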
    
    ## Type of change
    
    - Bug fix (non-breaking change which fixes an issue)
    
    ## Checklist
    
    - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
    `./isaaclab.sh --format`
    - [x] I have made corresponding changes to the documentation
    - [x] My changes generate no new warnings
    - [ ] I have added tests that prove my fix is effective or that my
    feature works
    - [ ] I have updated the changelog and the corresponding version in the
    extension's `config/extension.toml` file
    - [ ] I have added my name to the `CONTRIBUTORS.md` or my name already
    exists there
    
Name
.github
.vscode
apps
docker
docs
scripts
source
tools
.dockerignore
.flake8
.gitattributes
.gitignore
.pre-commit-config.yaml
CITATION.cff
CONTRIBUTING.md
CONTRIBUTORS.md
LICENSE
LICENSE-mimic
README.md
SECURITY.md
VERSION
environment.yml
isaaclab.bat
isaaclab.sh
pyproject.toml
pytest.ini