tree: 4cedba2181cabed82e17791a247926b75c054ac5 [path history] [tgz]
  1. tpu/
  2. __init__.py
  3. BUILD
  4. cluster_resolver.py
  5. cluster_resolver_test.py
  6. gce_cluster_resolver.py
  7. gce_cluster_resolver_test.py
  8. kubernetes_cluster_resolver.py
  9. kubernetes_cluster_resolver_test.py
  10. README.md
  11. README_Slurm.md
  12. sagemaker_cluster_resolver.py
  13. sagemaker_cluster_resolver_test.py
  14. slurm_cluster_resolver.py
  15. slurm_cluster_resolver_test.py
  16. tfconfig_cluster_resolver.py
  17. tfconfig_cluster_resolver_test.py
  18. tpu_cluster_resolver.py
tensorflow/python/distribute/cluster_resolver/README.md

Cluster Resolvers

Cluster Resolvers are a new way of specifying cluster information for distributed execution. Built on top of existing ClusterSpec framework, Cluster Resolvers allow users to simply specify a configuration and a cluster management service and a ClusterResolver will automatically fetch the relevant information from the service and populate ClusterSpecs.

ClusterResolvers are designed to work well with ManagedTrainingSession and ClusterSpec propagation so that distributed training sessions remain robust in the face of node and network failures.