Efficient and Portable ALS Matrix Factorization for Recommender Systems

J. Chen, J. Fang, Weifeng Liu, T. Tang, X. Chen, C. Yang

9 Citations (Scopus)

Abstract

Alternating least squares (ALS) has proven to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core CPUs and many-core GPUs/MICs. Existing implementations are limited in either speed or portability (constrained to certain platforms). In this paper, we present an efficient and portable ALS solver for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs, and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to map it efficiently to the underlying hardware. The experimental results show that our implementation performs 5.5× faster on a 16-core CPU and 21.2× faster on a K20c GPU than the baseline implementation. Our implementation also outperforms cuMF on various datasets.

Original language: English
Title of host publication: 2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW): 31st IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPS)
Place of Publication: Lake Buena Vista, FL, USA
Publication date: 30 Jun 2017
Edition: IEEE
Pages: 409-418
ISBN (Print): 9780769561493
DOIs
Publication status: Published - 30 Jun 2017
Series: IEEE International Symposium on Parallel and Distributed Processing Workshops
ISSN: 2164-7062
