Coordinate descent optimizer (unregularized case) should find optimal weights for the given data
Given the following feature matrix X:

[10.0, 20.0, 30.0]
[40.0, 50.0, 60.0]
[70.0, 80.0, 90.0]
[20.0, 30.0, 10.0]

and the following labels vector y:

[20.0, 30.0, 20.0, 40.0]

i.e., row by row, each feature vector paired with its label:

[10.0, 20.0, 30.0] [20.0]
[40.0, 50.0, 60.0] [30.0]
[70.0, 80.0, 90.0] [20.0]
[20.0, 30.0, 10.0] [40.0]
Formula for the coordinate descent update with respect to the j-th column:

x_ij * (y_i - x_i,-j * w_-j),

where

- x_ij - the j-th component of the i-th row (e.g., if i = 0 and j = 0, then x_ij = 10.0)
- y_i - the i-th label (e.g., if i = 0, then y_i = 20.0)
- x_i,-j - the i-th feature vector with the j-th coordinate excluded (e.g., if i = 0, then x_i = [10.0, 20.0, 30.0], and if additionally j = 0, then x_i,-j = [20.0, 30.0])
- w_-j - the coefficients (weights) vector with the j-th term excluded

The new value of w_j is obtained by summing this term over all rows i, as the sketch below shows.
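A minimal Python sketch of this update term follows; the names `x`, `y`, and `update_term` are illustrative only and do not belong to the library's (Dart) API:

```python
x = [[10.0, 20.0, 30.0],
     [40.0, 50.0, 60.0],
     [70.0, 80.0, 90.0],
     [20.0, 30.0, 10.0]]
y = [20.0, 30.0, 20.0, 40.0]

def update_term(i, j, w):
    # x_ij * (y_i - x_i,-j * w_-j): the dot product skips the j-th coordinate
    dot = sum(x[i][k] * w[k] for k in range(len(w)) if k != j)
    return x[i][j] * (y[i] - dot)
```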
The first iteration starts with the initial weights w = [0.0, 0.0, 0.0]:
j = 0: j = 1: j = 2:
10 * (20 - (20 * 0 + 30 * 0)) 20 * (20 - (10 * 0 + 30 * 0)) 30 * (20 - (10 * 0 + 20 * 0))
40 * (30 - (50 * 0 + 60 * 0)) 50 * (30 - (40 * 0 + 60 * 0)) 60 * (30 - (40 * 0 + 50 * 0))
70 * (20 - (80 * 0 + 90 * 0)) 80 * (20 - (70 * 0 + 90 * 0)) 90 * (20 - (70 * 0 + 80 * 0))
20 * (40 - (30 * 0 + 10 * 0)) 30 * (40 - (20 * 0 + 10 * 0)) 10 * (40 - (20 * 0 + 30 * 0))
summing up all above (column-wise), we get:
3600, 4700, 4600
so the weights at the first iteration are:
w = [3600, 4700, 4600]
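These column-wise sums can be reproduced with the hypothetical `update_term` sketch above. Note that every column update here uses the weights from the previous iteration, not the freshly computed ones:

```python
w = [0.0, 0.0, 0.0]
# All three columns are updated from the same (old) w, since the
# reassignment happens only after the whole comprehension is evaluated.
w = [sum(update_term(i, j, w) for i in range(4)) for j in range(3)]
print(w)  # [3600.0, 4700.0, 4600.0]
```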
j = 0: j = 1: j = 2:
10 * (20 - (20 * 4700 + 30 * 4600)) 20 * (20 - (10 * 3600 + 30 * 4600)) 30 * (20 - (10 * 3600 + 20 * 4700))
40 * (30 - (50 * 4700 + 60 * 4600)) 50 * (30 - (40 * 3600 + 60 * 4600)) 60 * (30 - (40 * 3600 + 50 * 4700))
70 * (20 - (80 * 4700 + 90 * 4600)) 80 * (20 - (70 * 3600 + 90 * 4600)) 90 * (20 - (70 * 3600 + 80 * 4700))
20 * (40 - (30 * 4700 + 10 * 4600)) 30 * (40 - (20 * 3600 + 10 * 4600)) 10 * (40 - (20 * 3600 + 30 * 4700))
summing up all above (column-wise), we get:
-81796400 -81295300 -85285400
so the weights at the second iteration are:
w = [-81796400, -81295300, -85285400]
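Applying the same step once more reproduces the second iteration. Python floats are 64-bit, so in this sketch the integer values come out exactly:

```python
w = [sum(update_term(i, j, w) for i in range(4)) for j in range(3)]
print(w)  # [-81796400.0, -81295300.0, -85285400.0]
```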
But we cannot get exactly the same vector as above due to the limited precision of floating-point arithmetic. In our case we will never get exactly -81295300 (the second element of the vector w), since a 32-bit floating-point number has only 24 bits of mantissa precision. 81295300 in binary is 100110110000111011111000100; its two trailing zeros carry no precision, so storing it exactly requires 25 bits of mantissa, one more than is available. The trailing binary digits 100 (4 in decimal) will therefore be cut off, and the stored value becomes 81295296. Thus we should allow some delta for the comparison.
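The precision loss is easy to demonstrate in Python by round-tripping the value through IEEE-754 single precision; the tolerance-based check at the end is a sketch of what the test comparison should do (the value of `delta` is our choice, not prescribed by the library):

```python
import struct

exact = -81295300.0
# Round-trip through a 32-bit float: pack as single precision, unpack back.
stored = struct.unpack('f', struct.pack('f', exact))[0]
print(stored)  # -81295296.0: the trailing bits are lost
delta = 8.0    # one unit in the last place of a float32 at this magnitude
print(abs(stored - exact) <= delta)  # True: compare with a tolerance, not ==
```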