Paper Link
Authors
Mu Li, David G. Andersen, Jun Woo Park, Alexander J. Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J. Shekita, Bor-Yiing Su
Summary
- Proposed a new general-purpose framework called Parameter Server for distributed machine learning
- Partitioned cluster nodes into a server group and a worker group, and used a scheduler to coordinate their tasks and communication (a toy sketch of this interaction follows the summary)
- Used various consistency models and asynchronous communication to provide flexibility, scalability, and fault tolerance
- The proposed framework is general-purpose, meaning it can be applied to arbitrary ML algorithms
- Exploited the characteristics of ML parameters (typically vectors and matrices manipulated with linear algebra operations), enabling efficient communication and reduced bandwidth
- Since the cluster nodes are usually commodity hardware, they used an optimized data replication architecture to replicate data across server nodes for fault tolerance and durability
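To make the server/worker split above concrete, here is a minimal toy sketch of one distributed-SGD round in this style. It is an illustration under assumed names, not the paper's actual C++ interface: `ServerGroup`, `push`, `pull`, and `worker_iteration` are hypothetical stand-ins for the real key-value API.

```python
# Toy sketch of the worker/server interaction (hypothetical names, not the
# paper's actual API): a worker pulls current weights, computes a gradient
# on its local data shard, and pushes the update back to the server group.
import numpy as np


class ServerGroup:
    """Stand-in for the server group: holds the globally shared parameters."""

    def __init__(self, num_params):
        self.weights = np.zeros(num_params)

    def pull(self, keys):
        # Workers pull only the parameter entries (keys) they need.
        return self.weights[keys]

    def push(self, keys, grads, lr=0.1):
        # A real system would aggregate pushes from many workers and
        # replicate server state before applying updates.
        self.weights[keys] -= lr * grads


def worker_iteration(servers, X, y, keys):
    """One SGD step on a worker's local shard (least-squares loss)."""
    w = servers.pull(keys)                 # fetch the relevant weights
    grad = X.T @ (X @ w - y) / len(y)      # compute the gradient locally
    servers.push(keys, grad)               # send the update back


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    servers = ServerGroup(num_params=5)
    keys = np.arange(5)
    for _ in range(200):                   # a single worker shown for brevity
        X = rng.normal(size=(32, 5))       # this worker's local data batch
        y = X @ np.ones(5)                 # true weights are all ones
        worker_iteration(servers, X, y, keys)
    print(servers.weights)                 # should approach a vector of ones
```

In the actual system the server group is sharded across machines by key range, many workers push and pull concurrently, and pushed updates are aggregated and replicated before being applied.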
Notes about scalable Machine Learning
- What opportunities does the Parameter Server provide?
- Users do not need complicated distributed programming tools to build distributed ML applications
- How?
- Exploit the iterative-convergent nature of ML algorithms
- Explore relaxed consistency models for controlled asynchronous parallel execution of ML programs to improve overall system efficiency (see the bounded-delay sketch after this list)
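The relaxed-consistency point above can be made concrete with a bounded-delay sketch: a worker may run at most `max_delay` iterations ahead of the slowest one. This is an illustrative sketch only; the paper expresses the same idea through dependencies between tasks, and the names `BoundedDelayTracker`, `wait_until_allowed`, and `finish_iteration` are invented here.

```python
# Hedged sketch of bounded-delay consistency (illustrative names only):
# a worker may begin iteration t only after every worker has finished
# iteration t - 1 - max_delay.
import threading


class BoundedDelayTracker:
    """Tracks per-worker progress and enforces a maximum staleness."""

    def __init__(self, num_workers, max_delay):
        self.clock = [0] * num_workers   # latest finished iteration per worker
        self.max_delay = max_delay
        self.cond = threading.Condition()

    def wait_until_allowed(self, next_iter):
        # max_delay = 0 recovers sequential (BSP-style) execution;
        # a very large max_delay recovers fully eventual consistency.
        with self.cond:
            while min(self.clock) < next_iter - 1 - self.max_delay:
                self.cond.wait()

    def finish_iteration(self, worker_id, it):
        with self.cond:
            self.clock[worker_id] = it
            self.cond.notify_all()
```

In this sketch, each worker would call `wait_until_allowed(t)` before pulling weights for iteration `t` and `finish_iteration(worker_id, t)` after pushing its update.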
- How to deal with Big data in ML computation?
- Other ways: Distributed Shared Memory (DSM), which lets programmers treat the entire cluster as a single memory pool, so each node can read/write any model parameter through the programming interface
- Benefit: facilitates implementation without worrying about low-level communication
- This paper's approach: push/pull gradients and parameters between worker and server nodes; arguably less convenient, since users must explicitly decide which parts of the model to communicate, but it enables more aggressive efficiency gains (a contrast sketch follows this list)
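To contrast the two styles listed above, here is a rough side-by-side sketch; both classes and their methods are hypothetical and do not correspond to any real DSM or Parameter Server API.

```python
# Illustrative contrast (hypothetical classes): implicit DSM-style access
# vs. the explicit push/pull style used by the Parameter Server.
import numpy as np


class DSMStyleModel:
    """DSM flavour: parameters look like ordinary shared memory; the runtime
    decides what actually moves over the network."""

    def __init__(self, weights):
        self.weights = weights

    def update(self, index, delta):
        self.weights[index] += delta       # communication is implicit/hidden


class PushPullStyleModel:
    """Parameter-server flavour: the user names the exact key range to
    communicate, which lets the system batch and compress messages."""

    def __init__(self, weights):
        self.weights = weights

    def pull(self, keys):
        return self.weights[keys].copy()   # only the requested slice moves

    def push(self, keys, deltas):
        self.weights[keys] += deltas       # only the touched slice moves back
```

The explicit style asks more of the user, but because the system sees exactly which key ranges move, it can batch, cache, and compress those messages, which is where the bandwidth savings noted in the summary come from.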
- Scalable ML categories
- General-purpose, programmable libraries or frameworks: user-programmable and extensible to handle arbitrary ML applications, e.g. GraphLab, Parameter Server
- Special purpose solvers for specific ML applications
- non-programmable
- restricted to predefined ML applications
- e.g.
- CCD++ for Matrix Factorization;
- Vowpal Wabbit for regression/classification via Stochastic optimization;
- Yahoo LDA and Google plda for topic modeling
- Comparison to Hadoop/Spark
- Hadoop and Spark do not outperform ML-focused frameworks such as Parameter Server or GraphLab on ML workloads
- Hadoop and Spark enforce strict consistency rather than supporting flexible consistency models, but they do ensure program portability, reliability, and fault tolerance