Paper Link
Authors
Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su
Summary
- Proposed a new general-purpose framework called Parameter Server for distributed machine learning
- Partitioned cluster nodes into a server group and a worker group, and used a scheduler to coordinate their tasks and communication (a toy sketch of this interaction follows the summary)
- Used various consistency models and asynchronous communication to provide flexibility, scalability, and fault tolerance
- The proposed framework is general-purpose, meaning it can be applied to arbitrary ML algorithms
- Exploited the characteristics of ML parameters (typically vectors and matrices manipulated with linear algebra operations), enabling efficient communication and reduced bandwidth
- Since the cluster nodes are usually commodity hardware, they used an optimized data replication architecture to replicate data across server nodes for fault tolerance and durability
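To make the server/worker split above concrete, here is a minimal toy sketch of one distributed-SGD round in this style. It is an illustration under assumed names, not the paper's actual C++ interface: `ServerGroup`, `push`, `pull`, and `worker_iteration` are hypothetical stand-ins for the real key-value API.

```python
# Toy sketch of the worker/server interaction (hypothetical names, not the
# paper's actual API): a worker pulls current weights, computes a gradient
# on its local data shard, and pushes the update back to the server group.
import numpy as np


class ServerGroup:
    """Stand-in for the server group: holds the globally shared parameters."""

    def __init__(self, num_params):
        self.weights = np.zeros(num_params)

    def pull(self, keys):
        # Workers pull only the parameter entries (keys) they need.
        return self.weights[keys]

    def push(self, keys, grads, lr=0.1):
        # A real system would aggregate pushes from many workers and
        # replicate server state before applying updates.
        self.weights[keys] -= lr * grads


def worker_iteration(servers, X, y, keys):
    """One SGD step on a worker's local shard (least-squares loss)."""
    w = servers.pull(keys)                 # fetch the relevant weights
    grad = X.T @ (X @ w - y) / len(y)      # compute the gradient locally
    servers.push(keys, grad)               # send the update back


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    servers = ServerGroup(num_params=5)
    keys = np.arange(5)
    for _ in range(200):                   # a single worker shown for brevity
        X = rng.normal(size=(32, 5))       # this worker's local data batch
        y = X @ np.ones(5)                 # true weights are all ones
        worker_iteration(servers, X, y, keys)
    print(servers.weights)                 # should approach a vector of ones
```

In the actual system the server group is sharded across machines by key range, many workers push and pull concurrently, and pushed updates are aggregated and replicated before being applied.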
Notes about scalable Machine Learning
- What opportunities does the Parameter Server provide?
- Users do not need complicated distributed programming tools to build distributed ML applications
- How?
- Exploit the iterative-convergent nature of ML algorithms
- Explore relaxed consistency models for controlled asynchronous parallel execution of ML programs to improve overall system efficiency (see the bounded-delay sketch after this list)
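The relaxed-consistency point above can be made concrete with a bounded-delay sketch: a worker may run at most `max_delay` iterations ahead of the slowest one. This is an illustrative sketch only; the paper expresses the same idea through dependencies between tasks, and the names `BoundedDelayTracker`, `wait_until_allowed`, and `finish_iteration` are invented here.

```python
# Hedged sketch of bounded-delay consistency (illustrative names only):
# a worker may begin iteration t only after every worker has finished
# iteration t - 1 - max_delay.
import threading


class BoundedDelayTracker:
    """Tracks per-worker progress and enforces a maximum staleness."""

    def __init__(self, num_workers, max_delay):
        self.clock = [0] * num_workers   # latest finished iteration per worker
        self.max_delay = max_delay
        self.cond = threading.Condition()

    def wait_until_allowed(self, next_iter):
        # max_delay = 0 recovers sequential (BSP-style) execution;
        # a very large max_delay recovers fully eventual consistency.
        with self.cond:
            while min(self.clock) < next_iter - 1 - self.max_delay:
                self.cond.wait()

    def finish_iteration(self, worker_id, it):
        with self.cond:
            self.clock[worker_id] = it
            self.cond.notify_all()
```

In this sketch, each worker would call `wait_until_allowed(t)` before pulling weights for iteration `t` and `finish_iteration(worker_id, t)` after pushing its update.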
- How to deal with Big data in ML computation?
- Other ways: Distributed Shared Memory (DSM), which lets programmers treat the entire cluster as a single memory pool, so each node can read/write any model parameter through the programming interface
- Benefit: facilitates implementation without worrying about low-level communication
- This paper's approach: push/pull gradients and parameters between worker and server nodes; arguably less convenient, since users must explicitly decide which parts of the model to communicate, but it enables more aggressive efficiency gains (a contrast sketch follows this list)
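To contrast the two styles listed above, here is a rough side-by-side sketch; both classes and their methods are hypothetical and do not correspond to any real DSM or Parameter Server API.

```python
# Illustrative contrast (hypothetical classes): implicit DSM-style access
# vs. the explicit push/pull style used by the Parameter Server.
import numpy as np


class DSMStyleModel:
    """DSM flavour: parameters look like ordinary shared memory; the runtime
    decides what actually moves over the network."""

    def __init__(self, weights):
        self.weights = weights

    def update(self, index, delta):
        self.weights[index] += delta       # communication is implicit/hidden


class PushPullStyleModel:
    """Parameter-server flavour: the user names the exact key range to
    communicate, which lets the system batch and compress messages."""

    def __init__(self, weights):
        self.weights = weights

    def pull(self, keys):
        return self.weights[keys].copy()   # only the requested slice moves

    def push(self, keys, deltas):
        self.weights[keys] += deltas       # only the touched slice moves back
```

The explicit style asks more of the user, but because the system sees exactly which key ranges move, it can batch, cache, and compress those messages, which is where the bandwidth savings noted in the summary come from.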
- Scalable ML categories
- General-purpose, programmable libraries or frameworks: user-programmable and extensible to handle arbitrary ML applications, e.g. GraphLab, Parameter Server
- Special purpose solvers for specific ML applications
- non-programmable
- restricted to predefined ML applications
- e.g.
- CCD++ for Matrix Factorization;
- Vowpal Wabbit for regression/classification via Stochastic optimization;
- Yahoo LDA and Google plda for topic modeling
- Comparison to Hadoop/Spark
- Hadoop and Spark do not outperform ML-focused frameworks such as Parameter Server or GraphLab on ML workloads
- Hadoop and Spark enforce strict consistency rather than supporting flexible consistency models, but they do ensure program portability, reliability, and fault tolerance