
Coordinate descent optimizer (unregularized case) should find optimal weights for the given data


Given matrix X:

[10.0, 20.0, 30.0]
[40.0, 50.0, 60.0]
[70.0, 80.0, 90.0]
[20.0, 30.0, 10.0]

Given labels vector y:

[20.0, 30.0, 20.0, 40.0]

Put it together:

[10.0, 20.0, 30.0] [20.0]
[40.0, 50.0, 60.0] [30.0]
[70.0, 80.0, 90.0] [20.0]
[20.0, 30.0, 10.0] [40.0]

Given λ = 0.0 (unregularized case)

Formula for the coordinate descent update with respect to the j-th column (the new value of w_j is the sum of this term over all rows i):

xij * (yi - xi,-j * w-j),

where

  • xij - the j-th component of the i-th row (e.g., if i = 0 then xi = [10.0, 20.0, 30.0], and if j = 0 then xij is 10.0)
  • yi - the i-th label (e.g., if i = 0 then yi = 20.0)
  • xi,-j - the i-th row with the j-th coordinate excluded (e.g., if i = 0 then xi = [10.0, 20.0, 30.0], and if additionally j = 0 then xi,-j = [20.0, 30.0])
  • w-j - the coefficients (weights) vector with the j-th term excluded
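
To make the update concrete, here is a minimal NumPy sketch of a single pass over all columns. The function name coordinate_step and the absence of any normalization term are assumptions made purely to mirror the worked example below, not the actual implementation of the optimizer:

```python
import numpy as np

def coordinate_step(X, y, w):
    """One full pass of the update above: every w_j is recomputed from the
    current weights as the column-wise sum of xij * (yi - xi,-j . w-j)."""
    new_w = np.empty_like(w)
    n_features = X.shape[1]
    for j in range(n_features):
        mask = np.arange(n_features) != j       # exclude the j-th column
        residual = y - X[:, mask] @ w[mask]     # yi - xi,-j . w-j for every row i
        new_w[j] = np.sum(X[:, j] * residual)   # sum the per-row terms column-wise
    return new_w
```

Note that every w_j in a pass is computed from the weights of the previous pass, exactly as in the iterations below.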

Initial weights:

 w = [0.0, 0.0, 0.0]

iteration 1:

j = 0:                         j = 1:                         j = 2:
10 * (20 - (20 * 0 + 30 * 0))  20 * (20 - (10 * 0 + 30 * 0))  30 * (20 - (10 * 0 + 20 * 0))
40 * (30 - (50 * 0 + 60 * 0))  50 * (30 - (40 * 0 + 60 * 0))  60 * (30 - (40 * 0 + 50 * 0))
70 * (20 - (80 * 0 + 90 * 0))  80 * (20 - (70 * 0 + 90 * 0))  90 * (20 - (70 * 0 + 80 * 0))
20 * (40 - (30 * 0 + 10 * 0))  30 * (40 - (20 * 0 + 10 * 0))  10 * (40 - (20 * 0 + 30 * 0))

summing up all above (column-wise), we get:

3600,  4700,  4600

so the weights at the first iteration are:

w = [3600, 4700, 4600]
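
The first iteration can be reproduced with the illustrative sketch above:

```python
X = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0],
              [70.0, 80.0, 90.0],
              [20.0, 30.0, 10.0]])
y = np.array([20.0, 30.0, 20.0, 40.0])
w = np.zeros(3)

w = coordinate_step(X, y, w)
print(w)  # [3600. 4700. 4600.]
```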

iteration 2:

j = 0:                               j = 1:                               j = 2:
10 * (20 - (20 * 4700 + 30 * 4600))  20 * (20 - (10 * 3600 + 30 * 4600))  30 * (20 - (10 * 3600 + 20 * 4700))
40 * (30 - (50 * 4700 + 60 * 4600))  50 * (30 - (40 * 3600 + 60 * 4600))  60 * (30 - (40 * 3600 + 50 * 4700))
70 * (20 - (80 * 4700 + 90 * 4600))  80 * (20 - (70 * 3600 + 90 * 4600))  90 * (20 - (70 * 3600 + 80 * 4700))
20 * (40 - (30 * 4700 + 10 * 4600))  30 * (40 - (20 * 3600 + 10 * 4600))  10 * (40 - (20 * 3600 + 30 * 4700))

summing up all above (column-wise):

-81796400 -81295300 -85285400

so the weights at the second iteration are:

w = [-81796400, -81295300, -85285400]
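
Running the same sketch once more with w = [3600, 4700, 4600] should give the column-wise sums above (in double precision these values come out exact):

```python
w = coordinate_step(X, y, w)
print(w)  # [-81796400. -81295300. -85285400.]
```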

But we cannot expect to get exactly the same vector as above because of the limited precision of floating-point arithmetic. In our case we will never get exactly -81295300 (the second element of the vector w), since a 32-bit floating-point number has only 24 bits of mantissa precision. 81295300 in binary is 100110110000111011111000100; storing it exactly requires 25 bits of mantissa precision, so the trailing bits 100 (4 in decimal) will be cut off. Thus we should allow some delta when comparing the results.
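
The rounding can be observed directly, for example with NumPy's float32 (a small illustrative check, not part of the test itself):

```python
import numpy as np

exact = 81295300
rounded = np.float32(exact)   # nearest single-precision value
print(bin(exact))             # 0b100110110000111011111000100 (needs 25 bits of precision)
print(int(rounded))           # 81295296 -- the trailing bits are lost
print(exact - int(rounded))   # 4, hence the comparison delta
```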