TY - GEN
T1 - Grassmann averages for scalable robust PCA
AU - Hauberg, Søren
AU - Feragen, Aasa
AU - Black, Michael J.
PY - 2014/9/24
Y1 - 2014/9/24
N2 - As the collection of large datasets becomes increasingly automated, the occurrence of outliers will increase: 'big data' implies 'big outliers'. While principal component analysis (PCA) is often used to reduce the size of data, and scalable solutions exist, it is well known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA do not scale beyond small- to medium-sized datasets. To address this, we introduce the Grassmann Average (GA), which expresses dimensionality reduction as an average of the subspaces spanned by the data. Because averages can be efficiently computed, we immediately gain scalability. GA is inherently more robust than PCA, but we show that they coincide for Gaussian data. We exploit the fact that averages can be made robust to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. Robustness can be with respect to vectors (subspaces) or elements of vectors; we focus on the latter and use a trimmed average. The resulting Trimmed Grassmann Average (TGA) is particularly appropriate for computer vision because it is robust to pixel outliers. The algorithm has low computational complexity and minimal memory requirements, making it scalable to 'big noisy data.' We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie.
U2 - 10.1109/CVPR.2014.481
DO - 10.1109/CVPR.2014.481
M3 - Article in proceedings
T3 - IEEE Conference on Computer Vision and Pattern Recognition. Proceedings
SP - 3810
EP - 3817
BT - Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition
PB - IEEE
T2 - IEEE Conference on Computer Vision and Pattern Recognition (2014)
Y2 - 23 June 2014 through 28 June 2014
ER -