Skip to content

SCS fails for l2 regularized logistic regression after certain problem size  #252

Open
@caglarari

Description

@caglarari

Hi,

I am using SCS 3.2.3 via cvx on a windows 10 machine. I was testing SCS on simple ML problems gradually increasing the problem size to see how it performs.

I observed that after certain size (500 features, 80 000 samples), SCS starts to fail to solve l2 regularized logistic regression problems. In addition, after certain size (500 features, 40 000 samples), the final objective value SCS reports & the objective value computed using the optimum parameters returned by cvx starts to become different.

Do you know what's the issue?

Thanks,
Regards,
Caglar Ari

SCS Output:
Calling SCS 3.2.3: 480502 variables, 320001 equality constraints


       SCS v3.2.3 - Splitting Conic Solver
(c) Brendan O'Donoghue, Stanford University, 2012

problem: variables n: 320001, constraints m: 480502
cones: q: soc vars: 502, qsize: 1
e: exp vars: 0, dual exp vars: 480000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
max_iters: 10000, normalize: 1, rho_x: 1.00e-06
acceleration_lookback: 10, acceleration_interval: 10
lin-sys: sparse-direct-amd-qdldl
nnz(A): 40480002, nnz(P): 0

iter | pri res | dua res | gap | obj | scale | time (s)

 0| 7.66e+03  7.01e+00  3.25e+06 -1.63e+06  1.00e-01  3.48e+01 

250| 5.05e-03 5.70e-03 1.20e-01 -1.74e+00 2.26e+02 1.12e+02
500| 1.20e-03 2.55e-03 2.26e-02 -6.75e-01 2.26e+02 1.55e+02
750| 1.54e-02 5.85e-01 1.58e-02 1.34e-03 2.26e+02 2.01e+02
1000| 1.68e-03 1.35e-03 2.80e-02 -1.21e-01 2.26e+02 2.56e+02
1250| 1.56e-03 1.36e-03 2.83e-02 -2.85e-02 2.26e+02 3.11e+02
1500| 1.52e-03 1.41e-03 2.94e-02 6.00e-02 2.26e+02 3.65e+02
1750| 1.55e-03 1.51e-03 3.14e-02 1.52e-01 2.26e+02 4.18e+02
2000| 1.64e-03 1.68e-03 3.48e-02 2.56e-01 2.26e+02 4.74e+02
2250| 1.83e-03 1.95e-03 4.03e-02 3.86e-01 2.26e+02 5.22e+02
2500| 2.17e-03 2.39e-03 4.96e-02 5.66e-01 2.26e+02 5.68e+02
2750| 2.82e-03 3.22e-03 6.67e-02 8.56e-01 2.26e+02 6.16e+02
3000| 4.36e-03 5.16e-03 1.07e-01 1.48e+00 2.26e+02 6.61e+02
3250| 1.14e-02 1.40e-02 2.89e-01 4.16e+00 2.26e+02 7.05e+02
3425| 2.62e+09 1.78e+08 1.23e+17 3.23e+17 2.26e+02 7.34e+02

status: infeasible
timings: total: 7.34e+02s = setup: 3.43e+01s + solve: 7.00e+02s
lin-sys: 4.73e+02s, cones: 1.50e+02s, accel: 4.11e+00s

objective = inf

Here is the matlab code I used
rand('state',0)
randn('state',0)

sizes = [500 80000];
p = sizes(1,1); % number of features
q = sizes(1,2); % number of samples

%Generate some data
w_true = randn(p,1);
X_tmp = 3*randn(p, q);
X_tmp(p,:)=1; %set the last feature to 1
%
ips = -w_true'X_tmp;
ps = (exp(ips)./(1 + exp(ips)))';
labels = 2
(rand(q,1) < ps) - 1;

X_pos = X_tmp(:,labels==1);
X_neg = X_tmp(:,labels==-1);

X = [X_pos -X_neg]; % include labels with data
clear X_tmp ips ps labels;

Xcvx=[X_pos';X_neg']; kN1=size(X_pos,2); kN2=size(X_neg,2);
bcvx=[ones(kN1,1);zeros(kN2,1)];
w_init=w_true;
lam=1.0;
fobj_ref=(-bcvx'*full(Xcvx)*w_init+sum(log_sum_exp([zeros(1,q);w_init'full(Xcvx)'])))/q+ lamsum_square(w_init);

cvx_use_solver = 'scs';
lam=1.0;
%Use SCS via cvx for l2 reg logistic
cvx_begin
cvx_precision high
cvx_solver(cvx_use_solver)
variable w(p)
minimize((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'full(Xcvx)'])))/q+ lamsum_square(w))
cvx_end
%
fopt_cvx=((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'full(Xcvx)'])))/q)+ lamsum_square(w);
fprintf(1,"Cvx objective value: %f\n", fopt_cvx)

%Use truncated Newton for comparison
Xh=[X_pos';X_neg']; kN1=size(X_pos,2); kN2=size(X_neg,2);
Xh=full(Xh);
bh=[ones(kN1,1);-ones(kN2,1)];
lambda=1.0;
method='pcg';
ptol=1e-4;
pmaxi=1000;
tic
[w1,status1,history1] = l2_logreg(Xh,bh,lambda,method,ptol,pmaxi);
toc

%Compute the objective used in the truncated Newton code
[mx,nx] = size(Xh);
A = sparse(1:mx,1:mx,bh)Xh;
Aw = A
w1; expAw = exp(Aw); expmAw = exp(-Aw);
fobj_tnt=sum(log(1+expmAw))/mx+sum(lam.*w1.*w1);

%Compare the objective values
fprintf(1,"fobj_cvx: %f\n",fopt_cvx)
fprintf(1,"fobj_tnt: %f\n",fobj_tnt)

function [w,status,history] = l2_logreg(X,b,lambda,method,ptol,pmaxi)
%
% l2-regularized Logistic Regression Problem Solver
%
% L2_LOGREG solves problems of the following form:
%
% minimize (1/m) sum_i log(1+exp(-b_i(x_iw))) + sum_j lambda_iw_j^2,
%
% where variable is w and problem data are x_i, b_i and lambda_i.
%
% INPUT
%
% X : mxn matrix; input data. each row corresponds to each feature
% b : m vector; class label
% lambda : positive n-vector; regularization parameter
%
% method : string; search direction method type
% 'cg' : conjugate gradients method, 'pcg'
% 'pcg' : preconditioned conjugate gradients method
% 'exact': exact method (default value)
% ptol : scalar; pcg relative tolerance. if empty, use adaptive rule.
% pmaxi : scalar: pcg maximum iteration. if empty, use default value (500).
%
% OUTPUT
%
% w : n vector; classifier
% status : scalar; +1: success, -1: maxiter exceeded
% history :
% row 1) phi
% row 2) norm(gradient of phi)
% row 3) cumulative cg iterations
%
% USAGE EXAMPLE
%
% [w,status] = l2_logreg(X,b,lambda,'pcg');
%

% Written by Kwangmoo Koh [email protected]

%------------------------------------------------------------
% INITIALIZE
%------------------------------------------------------------

% NEWTON PARAMETERS
MAX_TNT_ITER = 200; % maximum (truncated) Newton iteration
ABSTOL = 1e-8; % terminates when the norm of gradient < ABSTOL

% LINE SEARCH PARAMETERS
ALPHA = 0.01; % minimum fraction of decrease in norm(gradient)
BETA = 0.5; % stepsize decrease factor
MAX_LS_ITER = 100; % maximum backtracking line search iteration

[m,n] = size(X); % problem size: m examples, n features

if(isempty(pmaxi)) pcgmaxi = 500; else pcgmaxi = pmaxi; end
if(isempty(ptol )) pcgtol = 1e-4; else pcgtol = ptol; end

% INITIALIZE
pobj = Inf; s = inf; pitr = 0 ; pflg = 0 ; prelres = 0; pcgiter = 0;
history = [];

w = zeros(n,1); dw = zeros(n,1);

A = sparse(1:m,1:m,b)*X;
%if (strcmp(method,'cg')||strcmp(method,'pcg')) A2 = A.^2; end
A2 = A.^2;

disp(sprintf('%s %15s %10s %10s %6s %10s %6s',...
'iter','primal obj','stepsize','norg(g)','p_flg','p_res','p_itr'));

%------------------------------------------------------------
% MAIN LOOP
%------------------------------------------------------------

for ntiter = 0:MAX_TNT_ITER
Aw = A*w; expAw = exp(Aw); expmAw = exp(-Aw);
g = -1/m./(1+expAw); h = 1/m./(2+expAw+expmAw);
gradphi = A'g+2lambda.*w;
normg = norm(gradphi);

phi = sum(log(1+expmAw))/m+sum(lambda.*w.*w);
disp(sprintf('%4d %15.6e %10.2e %10.2e %6d %10.2e %6d',...
            ntiter,phi,s,normg,pflg,prelres,pitr));
history = [history [phi; normg; pcgiter]];

%------------------------------------------------------------
%   STOPPING CRITERION
%------------------------------------------------------------
if (norm(gradphi) < ABSTOL) 
    status = 1;
    disp('Absolute tolerance reached.');
    disp(sprintf('%d/%d',sum(abs((A2'*h)./(2*lambda))<0.5),n));
    return;
end
%------------------------------------------------------------
%       CALCULATE NEWTON STEP
%------------------------------------------------------------
switch lower(method)
    case 'pcg'
    if (isempty(ptol)) pcgtol = min(0.1,norm(gradphi)); end
    [dw, pflg, prelres, pitr, presvec] = ...
        pcg(@AXfunc,-gradphi,pcgtol,pcgmaxi,@Mfunc,[],[],...
            A,h,2*lambda,1./(A2'*h+2*lambda));
    if (pitr == 0) pitr = pcgmaxi; end

    case 'cg'
    if (isempty(ptol)) pcgtol = min(0.1,norm(gradphi)); end
    [dw, pflg, prelres, pitr, presvec] = ...
        pcg(@AXfunc,-gradphi,pcgtol,pcgmaxi,[],[],[],...
            A,h,2*lambda,[]);
    if (pitr == 0) pitr = pcgmaxi; end

    otherwise % exact method
    hessphi = A'*sparse(1:m,1:m,h)*A+2*sparse(1:n,1:n,lambda);
    dw = -hessphi\gradphi;
end
pcgiter = pcgiter+pitr;
%------------------------------------------------------------
%   BACKTRACKING LINE SEARCH
%------------------------------------------------------------
s = 1;
for lsiter = 1:MAX_LS_ITER
    new_w = w+s*dw;
    newgradphi = A'*(-1/m./(1+exp(A*new_w)))+2*lambda.*new_w;
    if (norm(newgradphi)<=(1-ALPHA*s)*normg) break; end
    s = BETA*s;
end
if (lsiter == MAX_LS_ITER) break; end
w = new_w;

end
status = -1;
end
%------------------------------------------------------------
% CALL BACK FUNCTIONS FOR PCG
%------------------------------------------------------------
function y = AXfunc(x,A,h,d,p)
y = A'(h.(A*x))+d.*x;
end

function y = Mfunc(x,A,h,d,p)
y = x.*p;
end

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions