Running Distributed TensorFlow on DC/OS - Kevin Klues, Mesosphere, Inc. Oct. 31, 2017

from The Linux Foundation

Running distributed TensorFlow is challenging, especially if you want to train large models on your own infrastructure. In this talk, Kevin Klues and Sam Pringle will present an open source TensorFlow framework for distributed training on DC/OS. This framework addresses several challenges associated with distributed TensorFlow, and they hope it will make life much easier for anyone doing machine learning with large models or datasets. Kevin will introduce TensorFlow on Mesos and DC/OS, and Sam will give a live demo of the framework.

About Kevin Klues

Kevin Klues is a Tech Lead Manager at Mesosphere running the DC/OS ClusterOps team. Since joining Mesosphere, Kevin has been involved in the design and implementation of a number of Mesos's core subsystems, including GPU isolation, Pods, the Mesos CLI, and Attach/Exec support. He now leads a team of 10 engineers working on everything from the DC/OS CLI to the installation, logging, backup/restore, and the gathering and reporting of metrics and diagnostics for a running DC/OS cluster. When not working, Kevin can usually be found on a snowboard or otherwise up in the mountains.