
Coordinate descent optimizer (unregularized case) should find optimal weights for the given data


Given matrix X:

[10.0, 20.0, 30.0]
[40.0, 50.0, 60.0]
[70.0, 80.0, 90.0]
[20.0, 30.0, 10.0]

Given labels vector y:

[20.0, 30.0, 20.0, 40.0]

Put it together:

[10.0, 20.0, 30.0] [20.0]
[40.0, 50.0, 60.0] [30.0]
[70.0, 80.0, 90.0] [20.0]
[20.0, 30.0, 10.0] [40.0]

Given λ = 0.0 (unregularized case)

Formula for the coordinate descent update with respect to the j-th column (the new value of w_j is the sum of this term over all rows i):

xij * (yi - xi,-j * w-j),

where

  • xij - the j-th component of the i-th row (e.g., if i = 0 then xi = [10.0, 20.0, 30.0], and if j = 0 then xij is 10.0)
  • yi - the i-th label (e.g., if i = 0 then yi = 20.0)
  • xi,-j - the i-th row with the j-th coordinate excluded (e.g., if i = 0 then xi = [10.0, 20.0, 30.0], and if additionally j = 0 then xi,-j = [20.0, 30.0])
  • w-j - the coefficients (weights) vector with the j-th term excluded
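
To make the update concrete, here is a minimal NumPy sketch of a single pass over all columns. The function name coordinate_step and the absence of any normalization term are assumptions made purely to mirror the worked example below, not the actual implementation of the optimizer:

```python
import numpy as np

def coordinate_step(X, y, w):
    """One full pass of the update above: every w_j is recomputed from the
    current weights as the column-wise sum of xij * (yi - xi,-j . w-j)."""
    new_w = np.empty_like(w)
    n_features = X.shape[1]
    for j in range(n_features):
        mask = np.arange(n_features) != j       # exclude the j-th column
        residual = y - X[:, mask] @ w[mask]     # yi - xi,-j . w-j for every row i
        new_w[j] = np.sum(X[:, j] * residual)   # sum the per-row terms column-wise
    return new_w
```

Note that every w_j in a pass is computed from the weights of the previous pass, exactly as in the iterations below.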

Initial weights:

 w = [0.0, 0.0, 0.0]

iteration 1:

j = 0:                         j = 1:                         j = 2:
10 * (20 - (20 * 0 + 30 * 0))  20 * (20 - (10 * 0 + 30 * 0))  30 * (20 - (10 * 0 + 20 * 0))
40 * (30 - (50 * 0 + 60 * 0))  50 * (30 - (40 * 0 + 60 * 0))  60 * (30 - (40 * 0 + 50 * 0))
70 * (20 - (80 * 0 + 90 * 0))  80 * (20 - (70 * 0 + 90 * 0))  90 * (20 - (70 * 0 + 80 * 0))
20 * (40 - (30 * 0 + 10 * 0))  30 * (40 - (20 * 0 + 10 * 0))  10 * (40 - (20 * 0 + 30 * 0))

summing up all above (column-wise), we get:

3600,  4700,  4600

so the weights at the first iteration are:

w = [3600, 4700, 4600]
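
The first iteration can be reproduced with the illustrative sketch above:

```python
X = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0],
              [70.0, 80.0, 90.0],
              [20.0, 30.0, 10.0]])
y = np.array([20.0, 30.0, 20.0, 40.0])
w = np.zeros(3)

w = coordinate_step(X, y, w)
print(w)  # [3600. 4700. 4600.]
```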

iteration 2:

j = 0:                               j = 1:                               j = 2:
10 * (20 - (20 * 4700 + 30 * 4600))  20 * (20 - (10 * 3600 + 30 * 4600))  30 * (20 - (10 * 3600 + 20 * 4700))
40 * (30 - (50 * 4700 + 60 * 4600))  50 * (30 - (40 * 3600 + 60 * 4600))  60 * (30 - (40 * 3600 + 50 * 4700))
70 * (20 - (80 * 4700 + 90 * 4600))  80 * (20 - (70 * 3600 + 90 * 4600))  90 * (20 - (70 * 3600 + 80 * 4700))
20 * (40 - (30 * 4700 + 10 * 4600))  30 * (40 - (20 * 3600 + 10 * 4600))  10 * (40 - (20 * 3600 + 30 * 4700))

summing up all above (column-wise):

-81796400 -81295300 -85285400

so the weights at the second iteration are:

w = [-81796400, -81295300, -85285400]
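
Running the same sketch once more with w = [3600, 4700, 4600] should give the column-wise sums above (in double precision these values come out exact):

```python
w = coordinate_step(X, y, w)
print(w)  # [-81796400. -81295300. -85285400.]
```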

But we cannot expect to get exactly the same vector as above because of the limited precision of floating-point arithmetic. In our case we will never get exactly -81295300 (the second element of the vector w), since a 32-bit floating-point number has only 24 bits of mantissa precision. 81295300 in binary is 100110110000111011111000100; storing it exactly requires 25 bits of mantissa precision, so the trailing bits 100 (4 in decimal) will be cut off. Thus we should allow some delta when comparing the results.
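
The rounding can be observed directly, for example with NumPy's float32 (a small illustrative check, not part of the test itself):

```python
import numpy as np

exact = 81295300
rounded = np.float32(exact)   # nearest single-precision value
print(bin(exact))             # 0b100110110000111011111000100 (needs 25 bits of precision)
print(int(rounded))           # 81295296 -- the trailing bits are lost
print(exact - int(rounded))   # 4, hence the comparison delta
```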