Scalable robust principal component analysis using Grassmann averages

Søren Hauberg; Aasa Feragen; Raffi Enficiaud; Michael J. Black

doi:10.1109/TPAMI.2015.2511743

Department of Computer Science

16 citations (Scopus)

Abstract

In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie, a task beyond any current method. Source code is available online.
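The abstract's central idea is that each zero-mean observation spans a line through the origin, and that an (optionally robust) average of these lines estimates the leading principal component. This can be illustrated with a short NumPy sketch. It assumes a sign-flip-and-average fixed-point iteration for the Grassmann average and a per-coordinate trimmed mean (10% cut from each end) for the trimmed variant. The function names `grassmann_average` and `trimmed_mean`, the trimming level, and the synthetic data are hypothetical illustrations, not the authors' released source code.

```python
import numpy as np


def trimmed_mean(Y, trim):
    """Column-wise mean of Y after dropping the `trim` fraction of smallest
    and largest values, independently in each column (hypothetical helper)."""
    n = Y.shape[0]
    k = int(trim * n)
    if 2 * k >= n:
        raise ValueError("trim must leave at least one row per column")
    Ys = np.sort(Y, axis=0)
    return Ys[k:n - k].mean(axis=0)


def grassmann_average(X, trim=0.0, max_iter=100, seed=None):
    """Sketch: estimate one unit direction from the zero-mean rows of X (N x D).

    Each row x_n spans the line {t * x_n}. Every row is flipped to the side
    of the current estimate q (a line has no sign), the flipped rows are
    averaged, and the result is renormalised. With trim=0 this is the
    ||x_n||-weighted mean of the flipped unit vectors; with trim>0 a
    per-coordinate trimmed mean discards large single entries.
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(X.shape[1])
    q /= np.linalg.norm(q)
    for _ in range(max_iter):
        signs = np.sign(X @ q)
        signs[signs == 0] = 1.0            # break ties arbitrarily
        Y = signs[:, None] * X             # rows flipped towards q
        m = trimmed_mean(Y, trim) if trim > 0 else Y.mean(axis=0)
        q_new = m / np.linalg.norm(m)
        converged = np.allclose(q_new, q)
        q = q_new
        if converged:                      # estimate stopped changing
            break
    return q


# Synthetic Gaussian data whose leading principal direction is e_0.
rng = np.random.default_rng(0)
N, D = 2000, 5
X = rng.standard_normal((N, D)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
X -= X.mean(axis=0)                        # zero-mean, as the abstract assumes

pc_clean = np.linalg.svd(X, full_matrices=False)[2][0]  # ordinary PCA
q_ga = grassmann_average(X, seed=1)

# Corrupt coordinate 1 in 5% of the rows with large "pixel outliers"
# (the corrupted data are not re-centred).
Xc = X.copy()
bad = rng.choice(N, size=N // 20, replace=False)
Xc[bad, 1] = 200.0 * rng.choice([-1.0, 1.0], size=bad.size)

pc_dirty = np.linalg.svd(Xc, full_matrices=False)[2][0]
q_tga = grassmann_average(Xc, trim=0.1, seed=1)

print("clean data, |<GA, PCA>|:      ", abs(q_ga @ pc_clean))
print("corrupted data, |<PCA, e_0>|: ", abs(pc_dirty[0]))
print("corrupted data, |<TGA, e_0>|: ", abs(q_tga[0]))
```

In this sketch the plain average costs one matrix-vector product per iteration, which is linear in N and D. The sort inside the hypothetical `trimmed_mean` makes the trimmed variant O(N log N) per coordinate, so a selection-based trimmed mean (for example built on `np.partition`) would be needed to match the linear complexity the abstract reports.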

Original language	English
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	38
Issue number	11
Pages (from-to)	2298-2311
Number of pages	14
ISSN	0162-8828
DOI	https://doi.org/10.1109/TPAMI.2015.2511743
Status	Published - Nov 2016

Keywords

Gaussian processes
data handling
image processing
principal component analysis
GA
Gaussian data
Grassmann Average
Grassmann manifold
RGA
Robust Grassmann Average
TGA
Trimmed Grassmann average
average subspace
computer vision
data size
data verification
grassmann averages
one dimensional subspace
pixel outliers
robust PCA
scalable robust principal component analysis
simple algorithm
zero mean dataset
Approximation methods
Complexity theory
Computer vision
Estimation
Manifolds
Principal component analysis
Robustness
Dimensionality reduction
robust principal component analysis
subspace estimation

Citation formats

@article{1c084208cc52452fa33e14688d262ce7,

title = "Scalable robust principal component analysis using Grassmann averages",

abstract = "In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie, a task beyond any current method. Source code is available online.",

keywords = "Gaussian processes, data handling, image processing, principal component analysis, GA, Gaussian data, Grassmann Average, Grassmann manifold, RGA, Robust Grassmann Average, TGA, Trimmed Grassmann average, average subspace, computer vision, data size, data verification, grassmann averages, one dimensional subspace, pixel outliers, robust PCA, scalable robust principal component analysis, simple algorithm, zero mean dataset, Approximation methods, Complexity theory, Computer vision, Estimation, Manifolds, Principal component analysis, Robustness, Dimensionality reduction, robust principal component analysis, subspace estimation",

author = "S{\o}ren Hauberg and Aasa Feragen and Raffi Enficiaud and Black, {Michael J.}",

year = "2016",

month = nov,

doi = "10.1109/TPAMI.2015.2511743",

language = "English",

volume = "38",

pages = "2298--2311",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "Institute of Electrical and Electronics Engineers",

number = "11",

}

TY  - JOUR
T1  - Scalable robust principal component analysis using Grassmann averages
AU  - Hauberg, Søren
AU  - Feragen, Aasa
AU  - Enficiaud, Raffi
AU  - Black, Michael J.
PY  - 2016/11
Y1  - 2016/11
N2  - In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie, a task beyond any current method. Source code is available online.
AB  - In large datasets, manual data verification is impossible, and we must expect the number of outliers to increase with data size. While principal component analysis (PCA) can reduce data size, and scalable solutions exist, it is well known that outliers can arbitrarily corrupt the results. Unfortunately, state-of-the-art approaches for robust PCA are not scalable. We note that in a zero-mean dataset, each observation spans a one-dimensional subspace, giving a point on the Grassmann manifold. We show that the average subspace corresponds to the leading principal component for Gaussian data. We provide a simple algorithm for computing this Grassmann Average (GA), and show that the subspace estimate is less sensitive to outliers than PCA for general distributions. Because averages can be efficiently computed, we immediately gain scalability. We exploit robust averaging to formulate the Robust Grassmann Average (RGA) as a form of robust PCA. The resulting Trimmed Grassmann Average (TGA) is appropriate for computer vision because it is robust to pixel outliers. The algorithm has linear computational complexity and minimal memory requirements. We demonstrate TGA for background modeling, video restoration, and shadow removal. We show scalability by performing robust PCA on the entire Star Wars IV movie, a task beyond any current method. Source code is available online.
KW  - Gaussian processes
KW  - data handling
KW  - image processing
KW  - principal component analysis
KW  - GA
KW  - Gaussian data
KW  - Grassmann Average
KW  - Grassmann manifold
KW  - RGA
KW  - Robust Grassmann Average
KW  - TGA
KW  - Trimmed Grassmann average
KW  - average subspace
KW  - computer vision
KW  - data size
KW  - data verification
KW  - grassmann averages
KW  - one dimensional subspace
KW  - pixel outliers
KW  - robust PCA
KW  - scalable robust principal component analysis
KW  - simple algorithm
KW  - zero mean dataset
KW  - Approximation methods
KW  - Complexity theory
KW  - Computer vision
KW  - Estimation
KW  - Manifolds
KW  - Principal component analysis
KW  - Robustness
KW  - Dimensionality reduction
KW  - robust principal component analysis
KW  - subspace estimation
U2  - 10.1109/TPAMI.2015.2511743
DO  - 10.1109/TPAMI.2015.2511743
M3  - Journal article
C2  - 26731634
SN  - 0162-8828
VL  - 38
SP  - 2298
EP  - 2311
JO  - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF  - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS  - 11
ER  - 