
Commit 10c9760

IALM implementation (#7)
* Implemented IALM
* Updated README to reflect changes
1 parent 6faa1ed commit 10c9760


8 files changed: +175 additions, -46 deletions


.github/workflows/ci.yml

Lines changed: 2 additions & 2 deletions
@@ -29,7 +29,7 @@ jobs:
     runs-on: ubuntu-24.04
     strategy:
       matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
@@ -46,7 +46,7 @@ jobs:
     runs-on: ubuntu-24.04
     strategy:
      matrix:
-        python-version: ["3.9", "3.10", "3.11", "3.12", "3.13"]
+        python-version: ["3.10", "3.11", "3.12", "3.13"]
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+.vscode/
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]

README.md

Lines changed: 30 additions & 10 deletions
@@ -1,20 +1,40 @@
-# Robust principle component analysis for Python
+# Robust principal component analysis for Python
 
-Work in progress. Coming soon here and on PyPI.
+This package provides algorithms to solve the Robust Principal Component Analysis (RPCA) problem, as presented by Candès et al.[[1]](#candes2011).
+Currently, a single algorithm is implemented: it solves the Principal Component Pursuit (PCP) convex relaxation of RPCA from the same paper, using the Inexact Augmented Lagrange Multiplier (IALM) method from Lin et al.[[2]](#lin2011)[[3]](#lin2013).
+
+## Example
+```python
+import numpy
+from pyrpca import rpca_pcp_ialm
+
+# given an m x n data matrix
+data = ...
+
+# decide on the sparsity factor.
+# this parameter is also commonly known as `lambda`.
+sparsity_factor = 1.0 / numpy.sqrt(max(data.shape))
+
+# run the IALM algorithm.
+low_rank, sparse = rpca_pcp_ialm(data, sparsity_factor)
+```
 
 ## Installing
 ```shell
 pip install pyrpca
 ```
 
-## Example
-TODO
+## Feature requests and contributing
+Pull requests and feature requests are welcome. The current version is minimal and suits my personal needs, but feel free to make suggestions. Even if the repository looks inactive, I will still respond :)
+
+## References
+1. <a name="candes2011"></a> [Emmanuel J. Candès, Xiaodong Li, Yi Ma, John Wright. Robust principal component analysis? Association for Computing Machinery 2011.](https://doi.org/10.1145/1970392.1970395) (preprint on [arXiv](https://doi.org/10.48550/arXiv.0912.3599))
+2. <a name="lin2011"></a> [Zhouchen Lin, Risheng Liu, Zhixun Su. Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation. arXiv 2011.](https://doi.org/10.48550/arXiv.1109.0367)
+3. <a name="lin2013"></a> [Zhouchen Lin, Minming Chen, Yi Ma. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices, V3. arXiv 2013.](https://doi.org/10.48550/arXiv.1009.5055)
 
-## Acknowledgements
-Appreciation is due to various other Python implementations of RPCA; below is a non-exhaustive list.
-The code in this project has been loosely inspired by these works.
+## Acknowledgements
+Appreciation is due to various other Python implementations of RPCA that served as inspiration for this project. Below is a non-exhaustive list:
 
-- https://github.com/2020leon/rpca
-- https://github.com/dganguli/robust-pca
-- https://github.com/weilinear/PyRPCA
+- https://github.com/2020leon/rpca
+- https://github.com/dganguli/robust-pca
+- https://github.com/weilinear/PyRPCA
 - https://github.com/loiccoyle/RPCA
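
For reference, the Principal Component Pursuit program that the new README describes (as formulated in Candès et al. [1]) splits the observed matrix $D$ into a low-rank component $A$ and a sparse component $E$ by solving

$$\min_{A,\,E} \; \lVert A \rVert_* + \lambda \lVert E \rVert_1 \quad \text{subject to} \quad A + E = D,$$

where $\lVert A \rVert_*$ is the nuclear norm, $\lVert E \rVert_1$ is the entrywise $\ell_1$ norm, and $\lambda$ corresponds to the `sparsity_factor` argument in the example above; the default $\lambda = 1/\sqrt{\max(m, n)}$ matches the choice recommended in that paper.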

pyproject.toml

Lines changed: 10 additions & 22 deletions
@@ -1,50 +1,38 @@
 [build-system]
-requires = [
-    "setuptools >= 78",
-    "setuptools-git-versioning >=2.1, <3",
-]
+requires = ["setuptools >= 78", "setuptools-git-versioning >=2.1, <3"]
 build-backend = "setuptools.build_meta"
 
 [tool.setuptools-git-versioning]
 enabled = true
 
 [project]
 name = "pyrpca"
-description = "Robust principle component analysis for Python."
+description = "Robust principal component analysis for Python."
 dynamic = ["version"]
 readme = "README.md"
-requires-python = ">=3.9,<4"
+requires-python = ">=3.10,<4"
 classifiers = [
     "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)",
 ]
-dependencies = [
-    "numpy >=2.0.2, <3",
-]
-keywords = [
-    "rpca",
-    "robust pca",
-    "robust principal component analysis",
-]
-authors = [
-    {name = "Aart Stuurman", email = "[email protected]"},
-]
-maintainers = [
-    {name = "Aart Stuurman", email = "[email protected]"},
-]
-license = {file = "LICENSE"}
+dependencies = ["numpy >= 2.0.2, < 3", "scipy >= 1.15.2"]
+keywords = ["rpca", "robust pca", "robust principal component analysis"]
+authors = [{ name = "Aart Stuurman", email = "[email protected]" }]
+maintainers = [{ name = "Aart Stuurman", email = "[email protected]" }]
+license = { file = "LICENSE" }
 
 [project.optional-dependencies]
 dev = [
     "ruff == 0.11.2",
     "mypy == 1.15.0",
     "pytest == 8.3.5",
+    "scipy-stubs >= 1.15.2",
 ]
 
 [project.urls]
 homepage = "https://github.com/surgura/PyRPCA"
 
 [tool.setuptools]
-package-dir = {"pyrpca" = "pyrpca"}
+package-dir = { "pyrpca" = "pyrpca" }
 
 [tool.mypy]
 strict = true

pyrpca/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from .pcp_ialm import rpca_pcp_ialm
+
+__all__ = ["rpca_pcp_ialm"]

pyrpca/pcp_ialm.py

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
+import numpy.typing as npt
+from typing import Tuple
+import numpy as np
+from numpy.linalg import norm
+from scipy.linalg import svd
+
+
+def rpca_pcp_ialm(
+    observations: npt.ArrayLike,
+    sparsity_factor: float,
+    max_iter: int = 1000,
+    mu: float | None = None,
+    mu_upper_bound: float | None = None,
+    rho: float = 1.5,
+    tol: float = 1e-7,
+    verbose: bool = True,
+) -> Tuple[npt.ArrayLike, npt.ArrayLike]:
+    """
+    Solve the Principal Component Pursuit (PCP) convex relaxation of Robust PCA using the Inexact Augmented Lagrange Multiplier (IALM) method.
+
+    See README for algorithmic details and references.
+
+    Mu is updated every loop by multiplying it by `rho` until reaching `mu_upper_bound`.
+
+    Parameters:
+        observations: The m x n input matrix to decompose ('D' in the IALM paper).
+        sparsity_factor: Weight on the sparse term in the objective ('lambda' in the IALM paper).
+        max_iter: Maximum number of iterations to perform.
+        mu: Initial value for the penalty parameter. If None, defaults to 1.25 / spectral norm of observations.
+        mu_upper_bound: Maximum allowed value for `mu`. If None, defaults to `mu * 1e7`.
+        rho: Multiplicative factor to increase `mu` in each iteration.
+        tol: Tolerance for stopping criterion (relative Frobenius norm of the residual).
+        verbose: If True, print status and debug information during optimization.
+
+    Returns:
+        low_rank_component: The recovered low-rank matrix ('A' in the IALM paper).
+        sparse_component: The recovered sparse matrix ('E' in the IALM paper).
+    """
+    if mu is None:
+        mu = float(1.25 / norm(observations, ord=2))
+    if mu_upper_bound is None:
+        mu_upper_bound = mu * 1e7
+
+    norm_fro_obs = norm(observations, ord="fro")
+
+    dual = observations / np.maximum(
+        norm(observations, ord=2), norm(observations, ord=np.inf) / sparsity_factor
+    )
+    sparse = np.zeros_like(observations)
+
+    i = 0
+    while True:
+        # compute next iteration of A (low-rank component) via singular value thresholding
+        u, s, v = svd(observations - sparse + 1.0 / mu * dual, full_matrices=False)
+        s_thresholded = np.maximum(s - 1.0 / mu, 0)
+        low_rank = (u * s_thresholded) @ v
+
+        # compute next iteration of E (sparse component) via entrywise soft thresholding
+        residual_for_sparse = observations - low_rank + 1.0 / mu * dual
+        sparse = np.sign(residual_for_sparse) * np.maximum(
+            np.abs(residual_for_sparse) - sparsity_factor / mu, 0
+        )
+
+        # calculate relative reconstruction error
+        residual = observations - low_rank - sparse
+        err = norm(residual, ord="fro") / norm_fro_obs
+
+        i += 1
+
+        if verbose:
+            print(f"iter {i:<4} | err {err:<25} | mu {mu:<25}")
+
+        if err < tol:
+            if verbose:
+                print("Finished optimization. Error smaller than tolerance.")
+            break
+        if i == max_iter:
+            if verbose:
+                print("Finished optimization. Max iterations reached.")
+            break
+
+        # update dual and mu
+        dual = dual + mu * residual
+        mu = min(mu * rho, mu_upper_bound)
+    return low_rank, sparse
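
The two updates inside the loop are the standard proximal operators used by IALM: singular value thresholding for the nuclear-norm term and entrywise soft thresholding for the $\ell_1$ term. A minimal standalone sketch of these operators (illustrative only; the helper names below are not part of the package):

```python
import numpy as np
from numpy.linalg import svd


def singular_value_threshold(matrix: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of the nuclear norm: shrink singular values by tau."""
    u, s, vh = svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vh


def soft_threshold(matrix: np.ndarray, tau: float) -> np.ndarray:
    """Proximal operator of the entrywise l1 norm: shrink entries toward zero by tau."""
    return np.sign(matrix) * np.maximum(np.abs(matrix) - tau, 0.0)


# In each IALM iteration, the low-rank update corresponds to
# singular_value_threshold(D - E + Y / mu, 1 / mu), and the sparse update to
# soft_threshold(D - A + Y / mu, lambda / mu), mirroring the loop in pcp_ialm.py.
```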

tests/test_dummy.py

Lines changed: 0 additions & 12 deletions
This file was deleted.

tests/test_pcp_ialm.py

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+import numpy as np
+from pyrpca import rpca_pcp_ialm
+from numpy.linalg import norm
+from scipy.sparse import random as sparse_random
+
+
+def test_rpca_separates_low_rank_and_sparse():
+    np.random.seed(0)
+    m, n, rank = 500, 400, 5
+
+    # create low-rank matrix A
+    u = np.random.randn(m, rank)
+    v = np.random.randn(rank, n)
+    low_rank = u @ v
+
+    # create sparse matrix E
+    sparse = sparse_random(
+        m, n, density=0.1, format="csr", data_rvs=np.random.randn
+    ).toarray()
+
+    # create observation matrix
+    observations = low_rank + sparse
+
+    # Run RPCA
+    low_rank_recovered, sparse_recovered = rpca_pcp_ialm(
+        observations,
+        sparsity_factor=1.0 / np.sqrt(max(observations.shape)),
+    )
+
+    # check that the reconstruction is close
+    reconstruction_error = norm(
+        observations - (low_rank_recovered + sparse_recovered), ord="fro"
+    ) / norm(observations, ord="fro")
+    assert reconstruction_error < 1e-6, (
+        f"Reconstruction error too high: {reconstruction_error}"
+    )
+
+    # check that recovered matrices are low rank and sparse
+    approx_rank = np.linalg.matrix_rank(low_rank_recovered, tol=1e-3)
+    sparsity = np.count_nonzero(sparse_recovered) / sparse_recovered.size
+
+    assert approx_rank <= rank + 2, f"Recovered A not low rank: {approx_rank}"
+    assert sparsity < 0.2, f"Recovered E not sparse: sparsity={sparsity}"
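
The assertions in this test double as quick diagnostics one might run after applying the algorithm to real data; a minimal sketch (the random array below is a placeholder, substitute your own observation matrix):

```python
import numpy as np
from numpy.linalg import norm

from pyrpca import rpca_pcp_ialm

# placeholder data; replace with your own m x n observation matrix
observations = np.random.default_rng(0).standard_normal((200, 100))

low_rank, sparse = rpca_pcp_ialm(
    observations,
    sparsity_factor=1.0 / np.sqrt(max(observations.shape)),
    verbose=False,
)

# relative reconstruction error of the decomposition
rel_err = norm(observations - (low_rank + sparse), ord="fro") / norm(observations, ord="fro")
# effective rank of the low-rank part and fill ratio of the sparse part
effective_rank = np.linalg.matrix_rank(low_rank, tol=1e-3)
fill_ratio = np.count_nonzero(sparse) / sparse.size
print(f"error={rel_err:.2e}, rank={effective_rank}, sparsity={fill_ratio:.2%}")
```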
