glvov-bdai authored
Adds Ray Workflow: Multiple Run Support, Distributed Hyperparameter Tuning, and Consistent Setup Across Local/Cloud (#1301)

This PR adds Ray support, which builds on the existing Hydra support to enable, among other things:

- Several training runs at once, in parallel or consecutively, with minimal interaction
- Using the same training setup everywhere (on cloud and local) with minimal overhead
- Tuning hyperparameters
- Tuning hyperparameters in parallel on multiple GPUs and/or multiple GPU nodes
- Simultaneously tuning model hyperparameters for different environments/agents
- Resource isolation
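
As a rough illustration of how a hyperparameter sweep can map onto Hydra-style `key=value` overrides, here is a minimal standard-library sketch. It is not the code from this PR and does not use Ray. The `expand_grid` helper and the parameter paths (`agent.params.learning_rate`, `agent.params.gamma`) are hypothetical names chosen for the example.

```python
from itertools import product


def expand_grid(search_space):
    """Return one list of Hydra-style `key=value` overrides per grid point."""
    keys = list(search_space)
    runs = []
    # Cartesian product over the candidate values of every parameter.
    for values in product(*(search_space[k] for k in keys)):
        runs.append([f"{k}={v}" for k, v in zip(keys, values)])
    return runs


# Hypothetical parameter paths; a real config defines its own keys.
search_space = {
    "agent.params.learning_rate": [1e-4, 3e-4],
    "agent.params.gamma": [0.99, 0.995],
}

runs = expand_grid(search_space)
for overrides in runs:
    # Each line is the override string for one independent training run.
    print(" ".join(overrides))
```

Each override list describes one independent run, so a scheduler such as Ray could, in principle, assign every list to its own worker with its own GPU allocation.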