SCS fails for l2 regularized logistic regression after certain problem size 

Hi,

I am using SCS 3.2.3 via cvx on a windows 10 machine. I was testing SCS on simple ML problems gradually increasing the problem size to see how it performs.

I observed that after certain size (500 features, 80 000 samples), SCS starts to fail to solve l2 regularized logistic regression problems. In addition, after certain size (500 features, 40 000 samples), the final objective value SCS reports & the objective value computed using the optimum parameters returned by cvx starts to become different.

Do you know what's the issue?

Thanks,
Regards,
Caglar Ari

SCS Output:
Calling SCS 3.2.3: 480502 variables, 320001 equality constraints
------------------------------------------------------------
------------------------------------------------------------------
	       SCS v3.2.3 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
------------------------------------------------------------------
problem:  variables n: 320001, constraints m: 480502
cones: 	  q: soc vars: 502, qsize: 1
	  e: exp vars: 0, dual exp vars: 480000
settings: eps_abs: 1.0e-04, eps_rel: 1.0e-04, eps_infeas: 1.0e-07
	  alpha: 1.50, scale: 1.00e-01, adaptive_scale: 1
	  max_iters: 10000, normalize: 1, rho_x: 1.00e-06
	  acceleration_lookback: 10, acceleration_interval: 10
lin-sys:  sparse-direct-amd-qdldl
	  nnz(A): 40480002, nnz(P): 0
------------------------------------------------------------------
 iter | pri res | dua res |   gap   |   obj   |  scale  | time (s)
------------------------------------------------------------------
     0| 7.66e+03  7.01e+00  3.25e+06 -1.63e+06  1.00e-01  3.48e+01 
   250| 5.05e-03  5.70e-03  1.20e-01 -1.74e+00  2.26e+02  1.12e+02 
   500| 1.20e-03  2.55e-03  2.26e-02 -6.75e-01  2.26e+02  1.55e+02 
   750| 1.54e-02  5.85e-01  1.58e-02  1.34e-03  2.26e+02  2.01e+02 
  1000| 1.68e-03  1.35e-03  2.80e-02 -1.21e-01  2.26e+02  2.56e+02 
  1250| 1.56e-03  1.36e-03  2.83e-02 -2.85e-02  2.26e+02  3.11e+02 
  1500| 1.52e-03  1.41e-03  2.94e-02  6.00e-02  2.26e+02  3.65e+02 
  1750| 1.55e-03  1.51e-03  3.14e-02  1.52e-01  2.26e+02  4.18e+02 
  2000| 1.64e-03  1.68e-03  3.48e-02  2.56e-01  2.26e+02  4.74e+02 
  2250| 1.83e-03  1.95e-03  4.03e-02  3.86e-01  2.26e+02  5.22e+02 
  2500| 2.17e-03  2.39e-03  4.96e-02  5.66e-01  2.26e+02  5.68e+02 
  2750| 2.82e-03  3.22e-03  6.67e-02  8.56e-01  2.26e+02  6.16e+02 
  3000| 4.36e-03  5.16e-03  1.07e-01  1.48e+00  2.26e+02  6.61e+02 
  3250| 1.14e-02  1.40e-02  2.89e-01  4.16e+00  2.26e+02  7.05e+02 
  3425| 2.62e+09  1.78e+08  1.23e+17  3.23e+17  2.26e+02  7.34e+02 
------------------------------------------------------------------
status:  infeasible
timings: total: 7.34e+02s = setup: 3.43e+01s + solve: 7.00e+02s
	 lin-sys: 4.73e+02s, cones: 1.50e+02s, accel: 4.11e+00s
------------------------------------------------------------------
objective = inf
------------------------------------------------------------------

Here is the matlab code I used
rand('state',0)
randn('state',0)

sizes = [500 80000];
p = sizes(1,1); % number of features
q = sizes(1,2); % number of samples

%Generate some data
w_true = randn(p,1);    
X_tmp = 3*randn(p, q);
X_tmp(p,:)=1; %set the last feature to 1
%
ips = -w_true'*X_tmp;
ps = (exp(ips)./(1 + exp(ips)))';
labels = 2*(rand(q,1) < ps) - 1;
    
X_pos = X_tmp(:,labels==1);
X_neg = X_tmp(:,labels==-1);
    
X = [X_pos -X_neg]; % include labels with data
clear X_tmp ips ps labels;

Xcvx=[X_pos';X_neg']; kN1=size(X_pos,2); kN2=size(X_neg,2);
bcvx=[ones(kN1,1);zeros(kN2,1)];
w_init=w_true;
lam=1.0;
fobj_ref=(-bcvx'*full(Xcvx)*w_init+sum(log_sum_exp([zeros(1,q);w_init'*full(Xcvx)'])))/q+ lam*sum_square(w_init);

cvx_use_solver = 'scs';
lam=1.0;
%Use SCS via cvx for l2 reg logistic
cvx_begin
    cvx_precision high
    cvx_solver(cvx_use_solver)
    variable w(p)
    minimize((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'*full(Xcvx)'])))/q+ lam*sum_square(w))
cvx_end
%
fopt_cvx=((-bcvx'*full(Xcvx)*w+sum(log_sum_exp([zeros(1,q);w'*full(Xcvx)'])))/q)+ lam*sum_square(w);
fprintf(1,"Cvx objective value: %f\n", fopt_cvx)

%Use truncated Newton for comparison
Xh=[X_pos';X_neg']; kN1=size(X_pos,2); kN2=size(X_neg,2);
Xh=full(Xh);
bh=[ones(kN1,1);-ones(kN2,1)];
lambda=1.0; 
method='pcg';
ptol=1e-4;
pmaxi=1000;
tic
[w1,status1,history1] = l2_logreg(Xh,bh,lambda,method,ptol,pmaxi);
toc

%Compute the objective used in the truncated Newton code
[mx,nx]   = size(Xh); 
A = sparse(1:mx,1:mx,bh)*Xh;
Aw = A*w1; expAw = exp(Aw); expmAw = exp(-Aw);
fobj_tnt=sum(log(1+expmAw))/mx+sum(lam.*w1.*w1);

%Compare the objective values
fprintf(1,"fobj_cvx: %f\n",fopt_cvx)
fprintf(1,"fobj_tnt: %f\n",fobj_tnt)

function [w,status,history] = l2_logreg(X,b,lambda,method,ptol,pmaxi)
%
% l2-regularized Logistic Regression Problem Solver
%
% L2_LOGREG solves problems of the following form:
%
% minimize (1/m) sum_i log(1+exp(-b_i(x_i*w))) + sum_j lambda_i*w_j^2,
% 
% where variable is w and problem data are x_i, b_i and lambda_i.
%
% INPUT
%
%  X       : mxn matrix; input data. each row corresponds to each feature
%  b       : m vector; class label
%  lambda  : positive  n-vector; regularization parameter
%
%  method  : string; search direction method type
%               'cg'   : conjugate gradients method, 'pcg'
%               'pcg'  : preconditioned conjugate gradients method
%               'exact': exact method (default value)
%  ptol    : scalar; pcg relative tolerance. if empty, use adaptive rule.
%  pmaxi   : scalar: pcg maximum iteration. if empty, use default value (500).
%
% OUTPUT
%
%  w       : n vector; classifier
%  status  : scalar; +1: success, -1: maxiter exceeded
%  history :
%            row 1) phi
%            row 2) norm(gradient of phi)
%            row 3) cumulative cg iterations
%
% USAGE EXAMPLE
%
%  [w,status] = l2_logreg(X,b,lambda,'pcg');
%

% Written by Kwangmoo Koh <deneb1@stanford.edu>

%------------------------------------------------------------
%       INITIALIZE
%------------------------------------------------------------

% NEWTON PARAMETERS
MAX_TNT_ITER    = 200;      % maximum (truncated) Newton iteration
ABSTOL          = 1e-8;     % terminates when the norm of gradient < ABSTOL

% LINE SEARCH PARAMETERS
ALPHA           = 0.01;     % minimum fraction of decrease in norm(gradient)
BETA            = 0.5;      % stepsize decrease factor
MAX_LS_ITER     = 100;      % maximum backtracking line search iteration

[m,n]   = size(X);          % problem size: m examples, n features

if(isempty(pmaxi)) pcgmaxi = 500; else pcgmaxi = pmaxi; end
if(isempty(ptol )) pcgtol = 1e-4; else pcgtol  = ptol;  end

% INITIALIZE
pobj  = Inf; s = inf; pitr  = 0 ; pflg  = 0 ; prelres = 0; pcgiter = 0;
history = [];

w = zeros(n,1); dw =  zeros(n,1);

A = sparse(1:m,1:m,b)*X;
%if (strcmp(method,'cg')||strcmp(method,'pcg')) A2 = A.^2; end
A2 = A.^2;


disp(sprintf('%s %15s %10s %10s %6s %10s %6s',...
    'iter','primal obj','stepsize','norg(g)','p_flg','p_res','p_itr'));

%------------------------------------------------------------
%               MAIN LOOP
%------------------------------------------------------------

for ntiter = 0:MAX_TNT_ITER
    Aw = A*w; expAw = exp(Aw); expmAw = exp(-Aw);
    g = -1/m./(1+expAw); h = 1/m./(2+expAw+expmAw);
    gradphi = A'*g+2*lambda.*w;
    normg = norm(gradphi);

    phi = sum(log(1+expmAw))/m+sum(lambda.*w.*w);
    disp(sprintf('%4d %15.6e %10.2e %10.2e %6d %10.2e %6d',...
                ntiter,phi,s,normg,pflg,prelres,pitr));
    history = [history [phi; normg; pcgiter]];

    %------------------------------------------------------------
    %   STOPPING CRITERION
    %------------------------------------------------------------
    if (norm(gradphi) < ABSTOL) 
        status = 1;
        disp('Absolute tolerance reached.');
        disp(sprintf('%d/%d',sum(abs((A2'*h)./(2*lambda))<0.5),n));
        return;
    end
    %------------------------------------------------------------
    %       CALCULATE NEWTON STEP
    %------------------------------------------------------------
    switch lower(method)
        case 'pcg'
        if (isempty(ptol)) pcgtol = min(0.1,norm(gradphi)); end
        [dw, pflg, prelres, pitr, presvec] = ...
            pcg(@AXfunc,-gradphi,pcgtol,pcgmaxi,@Mfunc,[],[],...
                A,h,2*lambda,1./(A2'*h+2*lambda));
        if (pitr == 0) pitr = pcgmaxi; end

        case 'cg'
        if (isempty(ptol)) pcgtol = min(0.1,norm(gradphi)); end
        [dw, pflg, prelres, pitr, presvec] = ...
            pcg(@AXfunc,-gradphi,pcgtol,pcgmaxi,[],[],[],...
                A,h,2*lambda,[]);
        if (pitr == 0) pitr = pcgmaxi; end

        otherwise % exact method
        hessphi = A'*sparse(1:m,1:m,h)*A+2*sparse(1:n,1:n,lambda);
        dw = -hessphi\gradphi;
    end
    pcgiter = pcgiter+pitr;
    %------------------------------------------------------------
    %   BACKTRACKING LINE SEARCH
    %------------------------------------------------------------
    s = 1;
    for lsiter = 1:MAX_LS_ITER
        new_w = w+s*dw;
        newgradphi = A'*(-1/m./(1+exp(A*new_w)))+2*lambda.*new_w;
        if (norm(newgradphi)<=(1-ALPHA*s)*normg) break; end
        s = BETA*s;
    end
    if (lsiter == MAX_LS_ITER) break; end
    w = new_w;
end
status = -1;
end
%------------------------------------------------------------
%   CALL BACK FUNCTIONS FOR PCG
%------------------------------------------------------------
function y = AXfunc(x,A,h,d,p)
    y = A'*(h.*(A*x))+d.*x;
end

function y = Mfunc(x,A,h,d,p)
    y = x.*p;
end





Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

SCS fails for l2 regularized logistic regression after certain problem size #252

SCS Output:
Calling SCS 3.2.3: 480502 variables, 320001 equality constraints

iter | pri res | dua res | gap | obj | scale | time (s)

status: infeasible
timings: total: 7.34e+02s = setup: 3.43e+01s + solve: 7.00e+02s
lin-sys: 4.73e+02s, cones: 1.50e+02s, accel: 4.11e+00s

objective = inf

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

SCS fails for l2 regularized logistic regression after certain problem size #252

Description

SCS Output: Calling SCS 3.2.3: 480502 variables, 320001 equality constraints

iter | pri res | dua res | gap | obj | scale | time (s)

status: infeasible timings: total: 7.34e+02s = setup: 3.43e+01s + solve: 7.00e+02s lin-sys: 4.73e+02s, cones: 1.50e+02s, accel: 4.11e+00s

objective = inf

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

SCS Output:
Calling SCS 3.2.3: 480502 variables, 320001 equality constraints

status: infeasible
timings: total: 7.34e+02s = setup: 3.43e+01s + solve: 7.00e+02s
lin-sys: 4.73e+02s, cones: 1.50e+02s, accel: 4.11e+00s