Adds Ray Workflow: Multiple Run Support, Distributed Hyperparameter Tuning, and Consistent Setup Across Local/Cloud (#1301)
    glvov-bdai authored · 286e1eea
    
    This PR adds Ray support, which builds on the existing Hydra support to
    enable, among other things:
    
    - Launching several training runs, in parallel or consecutively, with
    minimal interaction
    - Using the same training setup everywhere (on cloud and local) with
    minimal overhead
    - Tuning hyperparameters
    - Tuning hyperparameters in parallel on multiple GPUs and/or multiple
    GPU Nodes
    - Simultaneously tuning model hyperparameters for different
    environments/agents
    - Resource isolation
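
The list above says the Ray workflow builds on the existing Hydra support to launch several runs. One way to picture this is that each run is the same entry point invoked with a different set of Hydra-style `key=value` overrides. The sketch below uses only the standard library to expand a search grid into one command per run. The script name `train.py`, the grid keys, and the helper names are hypothetical and are not taken from this PR. Dispatching the commands to Ray is deliberately left out.

```python
from itertools import product


def expand_overrides(grid):
    """Expand {key: [values]} into one list of `key=value` overrides per run."""
    keys = sorted(grid)  # fixed key order keeps the output deterministic
    return [
        [f"{key}={value}" for key, value in zip(keys, combo)]
        for combo in product(*(grid[key] for key in keys))
    ]


def build_commands(script, grid):
    """Prefix each override list with the interpreter and the training script."""
    return [["python", script, *overrides] for overrides in expand_overrides(grid)]


# Hypothetical search space; these keys are placeholders, not names from the PR.
grid = {
    "batch_size": [4096, 8192],
    "learning_rate": [1e-4, 3e-4],
}

# "train.py" is an assumed entry point used only for illustration.
commands = build_commands("train.py", grid)
for command in commands:
    print(" ".join(command))
```

Each resulting command is independent of the others, which is what lets a scheduler run them in parallel or one after another without changing the training setup itself.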
Name
cluster_configs
hyperparameter_tuning
grok_cluster_with_kubectl.py
launch.py
mlflow_to_local_tensorboard.py
submit_job.py
tuner.py
util.py
wrap_resources.py