    Adds PBT algorithm to rl games (#3399) · 40c8d16d
    ooctipus authored
    # Description
    
    This PR introduces the Population Based Training algorithm originally
    implemented in
    
    Petrenko, Aleksei, et al. "Dexpbt: Scaling up dexterous manipulation for
    hand-arm systems with population based training." arXiv preprint
    arXiv:2305.12127 (2023).
    
    PBT offers an alternative way to scale training when increasing the
    number of environments yields only marginal gains.
    It borrows the idea of natural selection and exploits the stochasticity
    of RL training: the top-performing agents are always kept, while weak
    agents are replaced with copies of top performers. This helps recover
    from catastrophic failures and improves exploration.
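
The selection step described above can be sketched as follows. This is a minimal, framework-agnostic illustration of one PBT "exploit and explore" round. It assumes a generic truncation-selection variant with a single scalar objective per policy. `Member`, `pbt_step`, `replace_fraction` and `mutation_scale` are hypothetical names, not the rl-games or Isaac Lab API, and DexPBT's exact replacement rule may differ.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class Member:
    """One policy in the population (hypothetical container, not the rl-games API)."""
    policy_idx: int
    hparams: dict[str, float]
    weights: dict[str, float] = field(default_factory=dict)
    objective: float = 0.0


def pbt_step(population, replace_fraction=0.25, mutation_scale=1.2, rng=None):
    """Replace the worst members with mutated copies of the best members (in place)."""
    if len(population) < 2:
        return population  # nothing to select between
    rng = rng or random.Random()
    # Rank by objective, best first; the original list order is left unchanged.
    ranked = sorted(population, key=lambda m: m.objective, reverse=True)
    n = max(1, int(len(ranked) * replace_fraction))
    top, bottom = ranked[:n], ranked[-n:]
    for weak in bottom:
        donor = rng.choice(top)
        # Exploit: the weak member restarts from the donor's weights.
        weak.weights = copy.deepcopy(donor.weights)
        # Explore: perturb each of the donor's hyperparameters up or down.
        weak.hparams = {
            k: v * (mutation_scale if rng.random() < 0.5 else 1.0 / mutation_scale)
            for k, v in donor.hparams.items()
        }
        # The weak member's objective is left as-is; it is re-measured after more training.
    return population
```

Top performers keep training untouched, so the best agent is never lost, while the perturbed copies give the population new hyperparameter settings to explore.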
    
    Training view: underperformers are rescued by the best performers, later
    surpass them, and become the best performers themselves.
    <img width="1078" height="509" alt="Screenshot from 2025-09-09 00-55-11"
    src="https://github.com/user-attachments/assets/34434bf1-5cb6-4956-a344-49c9969d4861"
    />
    
    
    Note:
    PBT is still in beta and has the following limitations:
    
    1. In theory it can work with any RL algorithm, but the current
    implementation only works with rl-games.
    2. The API could be further simplified so that `num_policies` and
    `policy_idx` no longer need to be passed explicitly, which would allow
    a dynamic max_population. This is left for future work.
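
To illustrate why explicit `num_policies` and `policy_idx` pin the population size, here is a hypothetical sketch of workers exchanging objectives through a shared directory. The file layout (`policy_<idx>.json`) and all function names are assumptions made for illustration, not how this PR implements the exchange.

```python
import json
from pathlib import Path


def write_objective(workspace: Path, policy_idx: int, objective: float) -> None:
    """Publish this policy's latest objective so peers can read it (hypothetical layout)."""
    path = workspace / f"policy_{policy_idx}.json"
    path.write_text(json.dumps({"policy_idx": policy_idx, "objective": objective}))


def read_population_static(workspace: Path, num_policies: int) -> dict[int, float]:
    """Fixed-size population: every worker must know num_policies up front."""
    result = {}
    for idx in range(num_policies):
        path = workspace / f"policy_{idx}.json"
        if path.exists():  # a peer may not have reported yet
            result[idx] = json.loads(path.read_text())["objective"]
    return result


def read_population_dynamic(workspace: Path) -> dict[int, float]:
    """Dynamic population: discover whichever policies have reported, no num_policies needed."""
    result = {}
    for path in sorted(workspace.glob("policy_*.json")):
        data = json.loads(path.read_text())
        result[data["policy_idx"]] = data["objective"]
    return result
```

If a fifth worker joins a population launched with `num_policies=4`, the static reader never sees it, while the directory-scanning reader picks it up automatically.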
    
    
    ## Checklist
    
    - [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
    `./isaaclab.sh --format`
    - [x] I have made corresponding changes to the documentation
    - [x] My changes generate no new warnings
    - [ ] I have added tests that prove my fix is effective or that my
    feature works
    - [x] I have updated the changelog and the corresponding version in the
    extension's `config/extension.toml` file
    - [x] I have added my name to the `CONTRIBUTORS.md` or my name already
    exists there
    