TY - GEN
T1 - Efficient and Portable ALS Matrix Factorization for Recommender Systems
AU - Chen, J.
AU - Fang, J.
AU - Liu, Weifeng
AU - Tang, T.
AU - Chen, X.
AU - Yang, C.
PY - 2017/6/30
Y1 - 2017/6/30
N2 - Alternating least squares (ALS) has been proven to be an effective solver of matrix factorization for recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core CPUs and many-core GPUs/MICs. Existing implementations are limited in either speed or portability (constrained to certain platforms). In this paper, we present an efficient and portable ALS solver for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs, and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to efficiently map it to the underlying hardware. The experimental results show that our implementation performs 5.5× faster on a 16-core CPU and 21.2× faster on a K20c GPU than the baseline implementation. Our implementation also outperforms cuMF on various datasets.
AB - Alternating least squares (ALS) has been proven to be an effective solver of matrix factorization for recommender systems. To speed up factorization, various parallel ALS solvers have been proposed to leverage modern multi-core CPUs and many-core GPUs/MICs. Existing implementations are limited in either speed or portability (constrained to certain platforms). In this paper, we present an efficient and portable ALS solver for recommender systems. On the one hand, we diagnose the baseline implementation and observe that it lacks awareness of the hierarchical thread organization on modern hardware. To achieve high performance, we apply the thread batching technique and three architecture-specific optimizations. On the other hand, we implement the ALS solver in OpenCL so that it can run on various platforms (CPUs, GPUs, and MICs). Based on the architectural specifics, we select a suitable code variant for each platform to efficiently map it to the underlying hardware. The experimental results show that our implementation performs 5.5× faster on a 16-core CPU and 21.2× faster on a K20c GPU than the baseline implementation. Our implementation also outperforms cuMF on various datasets.
U2 - 10.1109/IPDPSW.2017.91
DO - 10.1109/IPDPSW.2017.91
M3 - Article in proceedings
SN - 9780769561493
T3 - IEEE International Symposium on Parallel and Distributed Processing Workshops
SP - 409
EP - 418
BT - 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
CY - Lake Buena Vista, FL, USA
ER -