Inconsistent values of sum(abs, _) between GPU and CPU (NaNs for zero input only on GPU) #1529

piever · 2024-10-03T07:51:30Z

Bug description

I've experienced the following inconsistency between GPU and CPU gradient computation for sum(abs, _).

julia> using Zygote, CUDA

julia> rl, cplx = [0.0f0], [0.0f0 + 0.0f0im]
(Float32[0.0], ComplexF32[0.0f0 + 0.0f0im])

julia> l1(x) = sum(abs, x)
l1 (generic function with 1 method)

julia> Zygote.gradient(l1, rl)
(Float32[0.0],)

julia> Zygote.gradient(l1, cplx)
(ComplexF32[0.0f0 + 0.0f0im],)

julia> Zygote.gradient(l1, cu(rl))
(Float32[1.0],)

julia> Zygote.gradient(l1, cu(cplx))
(ComplexF32[NaN32 + NaN32*im],)

The last one is particularly problematic, as it leads to NaN values in the gradient that may be hard to understand in a more complex model.

Slack discussion

On Slack, @mcabbott explained to me the most likely cause for this:

on GPU, in the backward pass Zygote converts sum(abs, x) to sum(abs.(x)) and the broadcasting part is differentiated via ForwardDiff
ForwardDiff is responsible for the different values of the gradient

julia> abs(ForwardDiff.Dual(0,1))
Dual{Nothing}(0,1)

julia> abs(ForwardDiff.Dual(0,1) + 0im)
Dual{Nothing}(0.0,NaN)

even though DiffRules has a rule for abs (used for real inputs), for complex inputs the computation passes through hypot and the DiffRule method for the derivative of hypot in (0, 0) gives NaN

Not sure what the best fix is here. If DiffRules is open to it, maybe the easiest is to fix their hypot derivative rule?

Version info

I'm on julia 1.10.5, on a fresh environment with

(jl_FHvUua) pkg> st
Status `/tmp/jl_FHvUua/Project.toml`
  [052768ef] CUDA v5.5.2
  [e88e6eb3] Zygote v0.6.71

The text was updated successfully, but these errors were encountered:

mcabbott · 2024-10-05T12:20:03Z

Any chance that JuliaDiff/ForwardDiff.jl#669 solves this?

piever · 2024-10-07T15:18:52Z

Somehow it doesn't... Unless I messed something up, I checked out JuliaDiff/ForwardDiff.jl#669 (manually changing version ForwardDiff version number to 0.10) and still get

julia> Zygote.gradient(l1, cu(cplx))
(ComplexF32[NaN32 + NaN32*im],)

which is weird, because indeed hypot differentiates just fine:

julia> f(x) = hypot(x, 0, 0)
f (generic function with 1 method)

julia> ForwardDiff.derivative(f, 0.0)
1.0

julia> ForwardDiff.derivative(f, -0.0)
-1.0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent values of sum(abs, _) between GPU and CPU (NaNs for zero input only on GPU) #1529

Inconsistent values of sum(abs, _) between GPU and CPU (NaNs for zero input only on GPU) #1529

piever commented Oct 3, 2024

mcabbott commented Oct 5, 2024

piever commented Oct 7, 2024

Inconsistent values of sum(abs, _) between GPU and CPU (NaNs for zero input only on GPU) #1529

Inconsistent values of sum(abs, _) between GPU and CPU (NaNs for zero input only on GPU) #1529

Comments

piever commented Oct 3, 2024

Bug description

Slack discussion

Version info

mcabbott commented Oct 5, 2024

piever commented Oct 7, 2024