Toward Scalable Distributed Machine Learning on Data-parallel Clusters
The rise of big data has driven demand for machine learning (ML) models trained on huge volumes of input data, and distributed ML is therefore becoming prevalent in both academia and industry. In distributed ML, training jobs run on clusters of tens to hundreds of machines using distributed data-parallel computing. The key goal of scalable distributed ML is a data-parallel computing framework with (1) efficient parameter synchronization among parallel tasks; (2) an efficient parameter server (PS) with low inter-machine communication cost; (3) scalable resource scheduling ...
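To make the parameter-server model concrete, the following is a minimal single-process sketch of synchronous data-parallel training: workers pull the current parameters, compute gradients on their data shards, and the PS averages and applies them. All names (`ParameterServer`, `worker_grad`) and the toy objective are illustrative assumptions, not details from the record.

```python
class ParameterServer:
    """Toy PS holding a shared parameter vector; sketch only."""

    def __init__(self, dim):
        self.params = [0.0] * dim

    def pull(self):
        # Workers fetch the latest parameters before each step.
        return list(self.params)

    def push(self, grads, lr=0.1):
        # Apply one SGD step with the aggregated gradient.
        for i, g in enumerate(grads):
            self.params[i] -= lr * g


def worker_grad(params, shard):
    # Toy gradient: pulls the parameter toward this shard's mean
    # (gradient of a squared-error objective).
    target = sum(shard) / len(shard)
    return [p - target for p in params]


ps = ParameterServer(dim=1)
shards = [[1.0], [3.0]]  # input data partitioned across two workers
for _ in range(100):
    grads = [worker_grad(ps.pull(), s) for s in shards]
    # Synchronous barrier: average all worker gradients before updating.
    avg = [sum(g) / len(grads) for g in zip(*grads)]
    ps.push(avg)
print(round(ps.params[0], 2))  # converges toward the global mean 2.0
```

In a real cluster, `pull` and `push` become network RPCs, and their cost dominates; that communication overhead is exactly what goals (1) and (2) above target.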