Efficient and Portable ALS Matrix Factorization for Recommender Systems

J. Chen, J. Fang, Weifeng Liu, T. Tang, X. Chen, C. Yang

9 Citations (Scopus)

Abstract

Alternating least squares (ALS) has proven to be an effective solver for matrix factorization in recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core CPUs and many-core GPUs/MICs. Existing implementations are limited in either speed or portability (constrained to certain platforms). In this paper, we present an efficient and portable ALS solver for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it is not aware of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs, and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to map it efficiently to the underlying hardware. The experimental results show that our implementation performs 5.5× faster on a 16-core CPU and 21.2× faster on a K20c GPU than the baseline implementation. Our implementation also outperforms cuMF on various datasets.
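
For readers unfamiliar with ALS, the following is a minimal serial sketch of the textbook update, not the paper's OpenCL implementation. It assumes a plain L2 regularizer `lam` and a latent dimension `k`; the function names `solve`, `update_factors`, and `als` are hypothetical and chosen only for illustration. Each row's factor vector comes from its own small k × k linear system, and those per-row solves are independent of one another, which is the property parallel ALS solvers can exploit.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def update_factors(observed_by_row, other, k, lam):
    """One ALS half-step: with `other` held fixed, solve
    (sum_i y_i y_i^T + lam * I) x = sum_i r_i y_i for every row."""
    new = []
    for observed in observed_by_row:  # observed: list of (index, rating)
        A = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for idx, r in observed:
            y = other[idx]
            for i in range(k):
                b[i] += r * y[i]
                for j in range(k):
                    A[i][j] += y[i] * y[j]
        new.append(solve(A, b))
    return new

def als(ratings, n_users, n_items, k=2, lam=0.1, iters=10, seed=0):
    """Factorize sparse ratings given as (user, item, rating) triples."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(k)] for _ in range(n_users)]
    Y = [[rng.random() for _ in range(k)] for _ in range(n_items)]
    by_user = [[] for _ in range(n_users)]
    by_item = [[] for _ in range(n_items)]
    for u, i, r in ratings:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iters):
        X = update_factors(by_user, Y, k, lam)  # fix items, update users
        Y = update_factors(by_item, X, k, lam)  # fix users, update items
    return X, Y
```

Calling `als(ratings, n_users, n_items)` returns the user and item factor lists; the predicted rating for user `u` and item `i` is the dot product of `X[u]` and `Y[i]`. Because each half-step exactly minimizes the regularized squared error with the other factor fixed, that objective does not increase from one iteration to the next.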

Original language: English
Title: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): 31st IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPS)
Place of publication: Lake Buena Vista, FL, USA
Publication date: 30 Jun 2017
Edition: IEEE
Pages: 409-418
ISBN (Print): 9780769561493
DOI
Status: Published - 30 Jun 2017
Name: IEEE International Symposium on Parallel and Distributed Processing Workshops
ISSN: 2164-7062
