Test with Yota, too #105

mcabbott · 2022-08-19T05:51:03Z

This does not close #96; in fact, it surely makes the tests slower. But perhaps it's good to get something besides Zygote running?

mcabbott

Paging @dfdx about these errors.

test/destructure.jl

mcabbott · 2022-08-19T20:28:24Z

docs/src/index.md

+Unfortunately this example doesn't actually run right now. This is the error:
+```
+julia> loss, (∇function, ∇model, ∇image) = Yota.grad(model, image) do m, x
+         sum(m(x))
+       end;
+┌ Error: Failed to compile rrule for #233(Chain(Conv((3, 3), 64 => 64, pad=1, bias=false), BatchNorm(64, relu), Conv((3, 3), 64 => 64, pad=1, bias=false), BatchNorm(64)),), extract details via:
+│ 	(f, args) = Yota.RRULE_VIA_AD_STATE[]
+└ @ Yota ~/.julia/packages/Yota/GIFMf/src/cr_api.jl:160
+ERROR: No deriative rule found for op %3 = getfield(%1, :x)::Array{Float32, 4} , try defining it using 


Maybe this should stay WIP for a bit.

Thanks for pinging me! I'll be able to check out these errors during the weekend.

dfdx · 2022-08-21T23:17:49Z

Some of the broken tests are already fixed on main; others need some adjustment (e.g. ZeroTangent() vs. NoTangent()), but I think I'll be able to fix them in a couple of days.

One thing I seem to be missing is why destructure/Restructure needs to be differentiable. I'd expect a training loop to look like this:

model = MyModel()
state = Optimisers.setup(Optimisers.Adam(), model)
input = ...
loss = ...
for i = 1:N
    grad = gradient(loss, model, input)[1]                # differentiable part (gradient w.r.t. model)
    state, model = Optimisers.update(state, model, grad)  # at every step; not necessarily differentiable
end

state points to the trainable parameters of MyModel() and lets us update them, but never steps into gradient calculation. Yet, you test things like Yota_gradient(x -> only(sum(re8(x)[3]))^2, v8)[1], so my picture of the world is obviously incomplete.

mcabbott · 2022-08-22T00:13:30Z

Sounds good. I have no idea whether the tests have ZeroTangent() and NoTangent() the wrong way around; feel free to adjust the tests to whatever is produced.

I broke Flux at some point because it turned out that half the SciML universe rested on the gradient of destructure, and there were exactly zero tests. It was so rudimentary that I assumed it was only for saving to CSV or similar. But I think it gets used as an interface between things that don't like nested structures (like calling some package for LBFGS) and models that do, or for obtaining a Hessian with respect to the parameters of some Flux model.
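To illustrate why that gradient matters, here is a minimal Base-only sketch of the flatten/rebuild idea, assuming a "model" is just a NamedTuple of arrays. `flatten_params` and `rebuild` are hypothetical names, not the Optimisers.jl API; the real `destructure` handles arbitrary Functors-style nesting. An external optimiser such as LBFGS only ever sees the flat vector, so a loss written as `flat -> loss(rebuild(template, flat))` has to be differentiated through `rebuild`.

```julia
# Hypothetical helpers illustrating the destructure/Restructure idea; NOT the Optimisers.jl API.
# Assumes the "model" is a NamedTuple of arrays with no deeper nesting.

# Concatenate every parameter array (in column-major order) into one vector.
function flatten_params(model::NamedTuple)
    return reduce(vcat, (vec(v) for v in values(model)); init = Float64[])
end

# Rebuild a NamedTuple shaped like `template` from a flat vector.
function rebuild(template::NamedTuple, flat::AbstractVector)
    pieces = Any[]
    offset = 0
    for v in values(template)
        n = length(v)
        push!(pieces, reshape(flat[offset+1:offset+n], size(v)))
        offset += n
    end
    return NamedTuple{keys(template)}(Tuple(pieces))
end

model = (weight = [1.0 2.0; 3.0 4.0], bias = [5.0, 6.0])
flat = flatten_params(model)         # [1.0, 3.0, 2.0, 4.0, 5.0, 6.0]
restored = rebuild(model, flat)      # same structure and values as `model`
scaled = rebuild(model, 2 .* flat)   # e.g. after an optimiser has updated `flat`
```

The round trip is lossless, which is why `destructure` can serve as the bridge between flat-vector tools and nested models.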

dfdx · 2022-08-24T22:38:58Z

I have a question regarding tests like this:

@test Yota_gradient(m -> destructure(m)[1][2], m2)[1] == ([0,1,0], [0,0,0])

Currently, Yota returns Tangent{Tuple{Vector{Float64}, Vector{Float64}}}([0.0, 1.0, 0.0], [0.0, 0.0, 0.0]), so one can do, e.g.:

g = Yota_gradient(m -> destructure(m)[1][2], m2)[1]
g + [0, 0, 0]     # =>     [0.0, 1.0, 0.0]

Is this what Optimisers.jl expects, or should I rather return a plain tuple as in the test?

mcabbott · 2022-08-24T23:35:37Z

I think a Tangent is fine, this term is the gradient with respect to a Tuple.

The test should be changed to allow for this, or perhaps the Yota_gradient function should convert, since its job is to make tests look the same.

dfdx · 2022-08-26T22:50:06Z

I fixed the most hardcore issues in the tests, but after several days of investigation I can't solve two remaining problems:

1. ZeroTangent vs NoTangent. Honestly, I still don't have a clear understanding of the difference. For example, function arguments may be generalized to callable structs, so it makes sense to return ZeroTangent() for them, yet most examples in ChainRules return NoTangent(). Another case is the function ChainRules.var"#fieldtype_pullback#422", which, applied to ZeroTangent(), returns (NoTangent(), NoTangent(), NoTangent()). So I wasn't able to adjust Yota's behavior to match the tests (which reflect the behavior of Zygote, right?), but I'm open to suggestions.
2. The gradients seem to be packed and unpacked differently. For example, to account for the Tangent{Tuple} vs Tuple case above, I tried to modify Yota_gradient to this:

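using ChainRulesCore: Tangent, backing

unpack(x::Tangent) = backing(x)   # unwrap e.g. Tangent{Tuple} into a plain Tuple
unpack(x) = x
function Yota_gradient(f, xs...)
  g = Base.tail(Yota.grad(f, xs...)[2])   # drop the gradient w.r.t. f itself
  return map(unpack, g)
end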

It helped with some tests but broke others. Structurally, the results seem to be correct, but I don't quite understand what needs to be adjusted: Yota, Yota_gradient, or the tests themselves.

I'm going to proceed with testing Yota on Flux models + Optimisers, which should uncover more inconsistencies, but if you want to make another pass on these tests, please try Yota 0.8 and share your thoughts!

mcabbott

Had a quick go with 0.8, and I still see many errors? But I'll update a few things in the meantime.

mcabbott · 2022-08-27T00:29:49Z

test/runtests.jl

@@ -13,6 +13,8 @@ struct TwoThirds a; b; c; end
 Functors.@functor TwoThirds (a, c)
 Optimisers.trainable(x::TwoThirds) = (a = x.a,)

+Yota_gradient(f, xs...) = Base.tail(Yota.grad(f, xs...)[2])


Maybe this is a better rough translation function, much like the suggestion above:

Suggested change

- Yota_gradient(f, xs...) = Base.tail(Yota.grad(f, xs...)[2])
+ Yota_gradient(f, xs...) = map(y2z, Base.tail(Yota.grad(f, xs...)[2]));
+ y2z(::AbstractZero) = nothing # we don't care about different flavours
+ y2z(t::Tangent) = map(y2z, ChainRulesCore.backing(canonicalize(t)))
+ y2z(x) = x

The only goal is to have as few changes as possible between tests using Zygote and the same with Yota. I don't think we care at all about the different kinds of special Zero.

Well, we care internally that all should be accepted. But when testing what's returned, we are happy if we get any one of them.


dfdx · 2022-08-27T18:27:32Z

I can successfully run tests in this PR on Julia nightly with this rule added:

using ChainRulesCore
import ChainRulesCore: rrule

function rrule(::typeof(getfield), s, f::Symbol)
  y = getproperty(s, f)
  function getfield_pullback(dy)
      dy = unthunk(dy)
      T = typeof(s)
      nt = NamedTuple{(f,)}((dy,))
      # tangents for: getfield itself, the struct s, the field name f
      return NoTangent(), Tangent{T}(; nt...), ZeroTangent()
  end
  return y, getfield_pullback
end

Yota contains the same rule for getproperty, which usually is enough but doesn't work in this particular case. If the code above is an acceptable solution, I can add this rule to Yota or create a PR to ChainRules.

mcabbott · 2022-08-27T18:41:43Z

It's possible that this package and Functors.jl should think more about whether to call getfield vs getproperty. The weird @functor macro https://github.com/FluxML/Functors.jl/blob/master/src/functor.jl#L11-L20 goes by fieldnames(T) and getproperty I think.

But looking at the errors on CI, maybe it's from somewhere deeper inside, involving Core.Box because everything is type-unstable?

 Error During Test at /home/runner/work/Optimisers.jl/Optimisers.jl/test/destructure.jl:205
  Test threw exception
  Expression: (Yota_gradient((x->(sum(abs2, (re9(x)).c[1]);)), 1:7))[1] == [0, 0, 0, 8, 10, 12, 14]
  No deriative rule found for op %7 = getfield(%3, :contents)::Optimisers.Restructure{NamedTuple{(:a, :b, :c), Tuple{Vector{Float64}, Matrix{Float32}, Vector{Array}}}, NamedTuple{(:a, :b, :c), Tuple{Int64, Int64, Vector{Int64}}}} , try defining it using 
  	ChainRulesCore.rrule(::typeof(getfield), ::Core.Box, ::Symbol) = ...
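The getfield(%3, :contents) in this error is consistent with ordinary Julia closure lowering: a captured local variable that is reassigned after the closure is created gets stored in a Core.Box, whose single field is `contents`. A minimal Base-only sketch (the function name is hypothetical, and this does not establish that the test file triggers boxing for exactly this reason):

```julia
# Hypothetical demo: a captured variable that is reassigned after capture is boxed.
function make_boxed_closure()
    a = 1
    f = () -> a   # `a` is captured here...
    a = 2         # ...and reassigned afterwards, so lowering stores it in a Core.Box
    return f
end

boxed = make_boxed_closure()
boxed()                                   # 2: the closure sees the updated value
fieldtype(typeof(boxed), :a)              # Core.Box
getfield(getfield(boxed, :a), :contents)  # 2: the same getfield(::Core.Box, :contents) as in the error
```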

That said, having a rule for getfield sounds fine to me. I think it should probably call ProjectTo, since this will sometimes turn e.g. Tangent{Complex} back into a number. And perhaps care is needed about whether closing over a symbol & a type works well, or needs Val to help out?

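using ChainRulesCore, InteractiveUtils   # InteractiveUtils provides @code_warntype outside the REPL
import ChainRulesCore: rrule

function rrule(::typeof(getfield), x::T, f::Symbol) where T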
  y = getproperty(x, f)
  proj = ProjectTo(x)
  # valT = Val(T)  # perhaps more stable inside closure?
  function getfield_pullback(dy)
      nt = NamedTuple{(f,)}((unthunk(dy),))
      # not really sure whether this ought to unthunk or not, maybe ProjectTo will anyway, in which case best to be explicit?
      return NoTangent(), proj(Tangent{T}(; nt...)), ZeroTangent()
  end
  return y, getfield_pullback
end

# These print lots in red:
@code_warntype rrule(getfield, (x=1, y=2.0), :x)
@code_warntype rrule(getfield, (x=1, y=2.0), :x)[2](3)

# But these are OK
@code_warntype (nt -> rrule(getfield, nt, :x))((x=1, y=2.0))
@code_warntype (nt -> rrule(getfield, nt, :x)[2](3.0))((x=1, y=2.0))
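On the Val question: one standard way to keep a field name visible to inference is to carry it as a type parameter, so that getfield sees a constant. This hedged sketch uses only Base (`get_via_val` is a hypothetical name) and checks the inferred return type with Base.return_types; it does not show whether the closure inside the rrule above benefits in the same way.

```julia
# Hypothetical helper: the field name travels as a type parameter via Val,
# so inference sees a constant Symbol and can pick the exact field type.
get_via_val(x, ::Val{name}) where {name} = getfield(x, name)

nt = (x = 1, y = 2.0)
val = get_via_val(nt, Val(:x))                                    # 1
inferred = only(Base.return_types(get_via_val, (typeof(nt), Val{:x})))  # Int, a concrete type
```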

mcabbott · 2022-08-27T19:45:02Z

It's not in the tests here, but running the Metalhead example in the docs I still get this error (with or without getfield rule, 1.8 and 1.9):

julia> loss, (∇function, ∇model, ∇image) = Yota.grad(model, image) do m, x
                  sum(m(x))
                   end;
ERROR: BoundsError: attempt to access Nothing at index [1]
Stacktrace:
  [1] _getfield(value::Nothing, fld::Int64)
    @ Yota ~/.julia/packages/Yota/uu3H0/src/helpers.jl:40
  [2] mkcall(::Function, ::Umlaut.Variable, ::Vararg{Any}; val::Missing, line::Nothing, kwargs::NamedTuple{(), Tuple{}}, free_kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Umlaut ~/.julia/packages/Umlaut/vGy3v/src/tape.jl:192
  [3] mkcall
    @ ~/.julia/packages/Umlaut/vGy3v/src/tape.jl:174 [inlined]
  [4] chainrules_transform!(tape::Umlaut.Tape{Yota.GradCtx})
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:183
  [5] gradtape!(tape::Umlaut.Tape{Yota.GradCtx}; seed::Int64)
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:268
  [6] gradtape(::Function, ::ResNet, ::Vararg{Any}; ctx::Yota.GradCtx, seed::Int64)
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:297
  [7] grad(::Function, ::ResNet, ::Vararg{Any}; seed::Int64)
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:367
  [8] grad(::Function, ::ResNet, ::Vararg{Any})
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:359

dfdx · 2022-08-27T22:06:59Z

> But looking at the errors on CI, maybe it's from somewhere deeper inside, involving Core.Box because everything is type-unstable?

It's even curiouser! Running a random test in REPL works fine:

julia> re1 = destructure(m1)[2]
Restructure(Array, ..., 3)

julia> @test Yota_gradient(x -> re1(x)[1], rand(3))[1] == [1,0,0]
Test Passed

But wrap it in a @testset and it fails!

julia> @testset "using Yota" begin
              re1 = destructure(m1)[2]
             @test Yota_gradient(x -> re1(x)[1], rand(3))[1] == [1,0,0] 
          end
          
using Yota: Error During Test at REPL[69]:3
  Test threw exception
  Expression: (Yota_gradient((x->((re1(x))[1];)), rand(3)))[1] == [1, 0, 0]
  No deriative rule found for op %3 = getfield(%1, :re1)::Optimisers.Restructure{Vector{Float64}, Int64} , try defining it using 
  
        ChainRulesCore.rrule(::typeof(getfield), ::var"#95#96"{Optimisers.Restructure{Vector{Float64}, Int64}}, ::Symbol) = ...
...

Perhaps @testset captures the module it's running in and uses something like getfield(captured_data, :global_var_name). Essentially, this is the same issue as dfdx/Yota.jl#112.
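A simpler mechanism that also fits this symptom (an assumption, not confirmed in this thread) is plain closure capture. @testset introduces a local scope, so re1 becomes a local variable and x -> re1(x)[1] stores it as a field; note the closure type var"#95#96"{Optimisers.Restructure{...}} in the error, parametrised by the captured value's type. At the REPL's global scope the closure stores nothing. A Base-only sketch of the difference (`make_closure` is a hypothetical name):

```julia
# A closure that only refers to globals (here Base's `sin`) has no fields.
f_global = x -> sin(x)

# A closure over a *local* variable stores that variable as a field.
function make_closure()
    h = sin
    return x -> h(x)
end
f_local = make_closure()

fieldnames(typeof(f_global))   # ()
fieldnames(typeof(f_local))    # (:h,)
getfield(f_local, :h) === sin  # true: the kind of getfield an AD tape then has to differentiate
```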

> I think it should probably call ProjectTo [...]

Yes, that makes sense. Regarding type stability, I'm going to include your definition in Yota as is for now to keep the focus on correctness, and come back to performance later.

> It's not in the tests here, but running the Metalhead example in the docs I still get this error (with or without getfield rule, 1.8 and 1.9):

I'm looking at it.

dfdx · 2022-08-28T22:38:38Z

I may have spotted one of the bugs behind the failures in the Metalhead example, but I must make sure first. In this piece of code in generic broadcasting:

    ys3, backs = unzip_broadcast(args...) do a...
        rrule_via_ad(cfg, f, a...)
    end

does f refer to the function being broadcast or to Broadcast.broadcasted itself? For example, in this case:

f = x -> identity(x)
args = (rand(3),)
rrule(cfg, broadcasted, f, args...)

which of the following is invoked:

rrule_via_ad(cfg, broadcasted, f, args...)

or

rrule_via_ad(cfg, f, args...)

?

mcabbott · 2022-08-28T22:55:05Z

I don't see an obvious mistake. The intention is for rrule(cfg, broadcasted, BroadcastStyle, sqrt, [1,2,3]) to call y, bk = rrule(cfg, sqrt, 2), i.e. acting on the elements with no broadcasting. That gives y=1.414, and then the unzip gives ys3 = [1, 1.41, 1.73] and an array of functions.

This split_bc_pullbacks(cfg, f, args...) is never passed the broadcasted, BroadcastStyle parts, since it doesn't need them. So its second argument should be sqrt.
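A toy, Base-only sketch of the intended flow: a scalar rule returns (y, pullback) for one element, broadcasting that rule gives an array of tuples, and unzipping splits it into values and pullbacks. `toy_rrule` and `unzip_pairs` are hypothetical stand-ins for rrule_via_ad and unzip_broadcast; the real code works on Broadcasted objects and uses StructArrays instead.

```julia
# Hypothetical scalar rule for sqrt: primal value plus a pullback closure.
function toy_rrule(::typeof(sqrt), x::Real)
    y = sqrt(x)
    sqrt_pullback(dy) = dy / (2y)   # d/dx sqrt(x) = 1 / (2 sqrt(x))
    return y, sqrt_pullback
end

# Split an array of (value, pullback) tuples into two arrays of the same shape.
unzip_pairs(pairs::AbstractArray{<:Tuple}) = (map(first, pairs), map(last, pairs))

xs = [1.0, 4.0, 9.0]
# Broadcasting treats the function `sqrt` as a scalar, so toy_rrule sees one element at a time.
ys, backs = unzip_pairs(toy_rrule.(sqrt, xs))
dxs = map((bk, dy) -> bk(dy), backs, ones(3))   # pull back a cotangent of 1 per element
```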

dfdx · 2022-08-28T23:12:56Z

Oh, I don't think it's a mistake in the generic broadcasting, but rather in Yota.rrule_via_ad()! The example I'm currently testing is this:

using Flux, Yota

model = Dense(28*28, 1024, x -> identity(x))
x = rand(Float32, 28*28, 4)
grad((model, x) -> sum(model(x)), model, x)

which produces this nice stacktrace:

ERROR: all field arrays must have same shape
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] (::StructArrays.var"#6#7"{Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})(ci::Vector{Function})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:21
  [3] map
    @ ./tuple.jl:221 [inlined]
  [4] (StructArrays.StructArray{Tuple{Float32, Function}, 2, Tuple{Matrix{Float32}, Vector{Function}}})(c::Tuple{Matrix{Float32}, Vector{Function}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:20
  [5] (StructArrays.StructArray{Tuple{Float32, Function}})(c::Tuple{Matrix{Float32}, Vector{Function}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:97
  [6] _widenstructarray(dest::StructArrays.StructArray{Tuple{Float32, var"#25#27"}, 2, Tuple{Matrix{Float32}, Matrix{var"#25#27"}}, Int64}, i::Int64, #unused#::Type{Tuple{Float32, Function}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:118
  [7] widen_from_type(dest::StructArrays.StructArray{Tuple{Float32, var"#25#27"}, 2, Tuple{Matrix{Float32}, Matrix{var"#25#27"}}, Int64}, i::Int64, #unused#::Type{Tuple{Float32, var"#24#26"}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:109
  [8] widen_from_instance(dest::StructArrays.StructArray{Tuple{Float32, var"#25#27"}, 2, Tuple{Matrix{Float32}, Matrix{var"#25#27"}}, Int64}, i::Int64, el::Tuple{Float32, var"#24#26"})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:105
  [9] collect_to_structarray!(dest::StructArrays.StructArray{Tuple{Float32, var"#25#27"}, 2, Tuple{Matrix{Float32}, Matrix{var"#25#27"}}, Int64}, itr::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ChainRules.var"#1705#1707"{YotaRuleConfig, var"#141#142"}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}}}}, offs::Int64, st::Tuple{CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndex{2}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:77
 [10] _collect_structarray!
    @ ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:59 [inlined]
 [11] _collect_structarray(itr::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ChainRules.var"#1705#1707"{YotaRuleConfig, var"#141#142"}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}}}}, elem::Tuple{Tuple{Float32, var"#25#27"}, Tuple{CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndex{2}}}, ax::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}; initializer::StructArrays.StructArrayInitializer{typeof(StructArrays.alwaysfalse), typeof(StructArrays.arrayof)})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:54
 [12] collect_structarray(itr::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ChainRules.var"#1705#1707"{YotaRuleConfig, var"#141#142"}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}}}}; initializer::StructArrays.StructArrayInitializer{typeof(StructArrays.alwaysfalse), typeof(StructArrays.arrayof)})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:40
 [13] StructArrays.StructArray(v::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ChainRules.var"#1705#1707"{YotaRuleConfig, var"#141#142"}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}}}}; unwrap::typeof(StructArrays.alwaysfalse))
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:261
 [14] StructArray
    @ ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:260 [inlined]
 [15] unzip_broadcast
    @ ~/.julia/packages/ChainRules/DUopG/src/unzipped.jl:39 [inlined]
 [16] split_bc_pullbacks(cfg::YotaRuleConfig, f::var"#141#142", args::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}})
    @ ChainRules ~/.julia/packages/ChainRules/DUopG/src/rulesets/Base/broadcast.jl:127
 [17] rrule(cfg::YotaRuleConfig, #unused#::typeof(Base.Broadcast.broadcasted), #unused#::Base.Broadcast.DefaultArrayStyle{2}, f::var"#141#142", args::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}})
    @ ChainRules ~/.julia/packages/ChainRules/DUopG/src/rulesets/Base/broadcast.jl:44
 [18] mkcall(::Function, ::YotaRuleConfig, ::Vararg{Any}; val::Missing, line::Core.LineInfoNode, kwargs::NamedTuple{(), Tuple{}}, free_kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Umlaut ~/.julia/packages/Umlaut/vGy3v/src/tape.jl:192
 [19] chainrules_transform!(tape::Tape{GradCtx})
    @ Main ~/work/Yota/src/grad.jl:184
 [20] gradtape!(tape::Tape{GradCtx}; seed::Int64)
    @ Main ~/work/Yota/src/grad.jl:271
 [21] gradtape(::Function, ::Dense{var"#141#142", Matrix{Float32}, Vector{Float32}}, ::Vararg{Any}; ctx::GradCtx, seed::Int64)
    @ Main ~/work/Yota/src/grad.jl:300
 [22] grad(::Function, ::Dense{var"#141#142", Matrix{Float32}, Vector{Float32}}, ::Vararg{Any}; seed::Int64)
    @ Main ~/work/Yota/src/grad.jl:370
 [23] grad(::Function, ::Dense{var"#141#142", Matrix{Float32}, Vector{Float32}}, ::Vararg{Any})
    @ Main ~/work/Yota/src/grad.jl:362
 [24] top-level scope
    @ REPL[26]:1

From the stacktrace I infer that rrule_via_ad() returns not what unzip_broadcast() expects. I made a guess that split_bc_pullbacks() calls rrule_via_ad() on broadcasted itself, e.g.:

julia> y, bk = rrule_via_ad(YotaRuleConfig(), broadcasted, sqrt, [1.0, 2, 3])
...
julia> y
3-element Vector{Float64}:
 1.0
 1.4142135623730951
 1.7320508075688772

julia> bk
#24 (generic function with 1 method)

and that unzip_broadcast() expects y and bk to have the same length to pack them into StructArray. But if you say rrule_via_ad() is never invoked that way, then I'm going to get a good night's sleep before the next iteration of debugging 😄

mcabbott · 2022-08-28T23:26:31Z

Quite the stacktrace! These lines look correct to me: the same function f is being passed through, and it acts on args, which is the result of lazily broadcasting +. The broadcasted and DefaultArrayStyle arguments are marked unused:

 [16] split_bc_pullbacks(cfg::YotaRuleConfig, f::var"#141#142", args::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}})
    @ ChainRules ~/.julia/packages/ChainRules/DUopG/src/rulesets/Base/broadcast.jl:127
 [17] rrule(cfg::YotaRuleConfig, #unused#::typeof(Base.Broadcast.broadcasted), #unused#::Base.Broadcast.DefaultArrayStyle{2}, f::var"#141#142", args::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}})
    @ ChainRules ~/.julia/packages/ChainRules/DUopG/src/rulesets/Base/broadcast.jl:44

To get to line 39 (https://github.com/JuliaDiff/ChainRules.jl/blob/main/src/unzipped.jl#L39), the call to rrule_via_ad must have been inferred to return a Tuple, which is correct. I don't see how the shape can then come out wrong, but...
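One detail in frames [6] to [8] may be worth noting: StructArrays starts with a Matrix{var"#25#27"} for the pullbacks, meets an element whose pullback has a different closure type (var"#24#26"), widens to Function, and ends up pairing a Matrix{Float32} with a Vector{Function}, which is what the shape check rejects. For comparison, this Base-only sketch (with a hypothetical `pick_pullback`) shows plain broadcasting widening the same way while keeping the 2-D shape. It is a hint about where to look, not a confirmed diagnosis.

```julia
# Hypothetical helper: returns one of two *different* closure types depending on the input.
pick_pullback(x) = x > 0 ? (dy -> dy) : (dy -> -dy)

xs = [1.0 -1.0; 2.0 -2.0]   # 2×2 input; column-major order visits 1, 2, -1, -2
pbs = pick_pullback.(xs)    # Base widens the element type when the closure type changes

eltype(pbs)   # Function: the typejoin of the two closure types
size(pbs)     # (2, 2): the matrix shape survives the widening
```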

dfdx · 2022-09-03T22:43:21Z

Here's an interesting observation. If I run the same example as is:

using Flux, Yota, ChainRules


myid = x -> identity(x)
model = Dense(5, 3, myid)
x = rand(Float32, 5, 1);
val, g = grad((model, x) -> sum(model(x)), model, x)
@show val
@show g

I get the same stacktrace as posted above, complaining about "ERROR: all field arrays must have same shape". However, if I slightly modify unzip_broadcast() and just add Broadcast.materialize(bc):

function unzip_broadcast(f::F, args...) where {F}
    T = Broadcast.combine_eltypes(f, args)
    if isconcretetype(T)
        T <: Tuple || throw(ArgumentError("""unzip_broadcast(f, args) only works on functions returning a tuple,
            but f = $(sprint(show, f)) returns type T = $T"""))
    end
    bc = Broadcast.instantiate(Broadcast.broadcasted(f, args...))
    bcs = Broadcast.BroadcastStyle(typeof(bc))
    if bcs isa AbstractGPUArrayStyle
        # This is a crude way to allow GPU arrays, not currently tested, TODO.
        # See also https://github.com/JuliaArrays/StructArrays.jl/issues/150
        return unzip(broadcast(f, args...))
    elseif bcs isa Broadcast.AbstractArrayStyle
        Broadcast.materialize(bc)                                             # <-- this line added
        return StructArrays.components(StructArray(bc))
    else
        return unzip(broadcast(f, args...))  # e.g. tuples
    end
    # TODO maybe this if-else can be replaced by methods of `unzip(:::Broadcast.Broadcasted)`?
end

The error disappears!

The only hypothesis I have is that materialization of a broadcasted variable changes something in the global Julia state that makes it more friendly to StructArray, but I can't find any relevant information.

(ChainRules) pkg> st
Project ChainRules v1.44.5
Status `~/work/ChainRules.jl/Project.toml`
  [79e6a3ab] Adapt v3.4.0
  [d360d2e6] ChainRulesCore v1.15.3
  [34da2185] Compat v4.2.0
  [46192b85] GPUArraysCore v0.1.2
  [92d709cd] IrrationalConstants v0.1.1
  [c1ae055f] RealDot v0.1.0
  [09ab397b] StructArrays v0.6.12
  [cd998857] Yota v0.8.0 `https://github.com/dfdx/Yota.jl.git#fix-broadcast`
  [8ba89e20] Distributed
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random
  [2f01184e] SparseArrays
  [10745b16] Statistics

mcabbott · 2022-09-04T00:10:49Z

That is pretty odd.

I can reproduce this by running @eval ChainRules function unzip_broadcast(f::F, args...) where {F} ... with your definition in the REPL. What's strange is that if I then @eval the old code again (or the old code with some printout), it still works. (The signature has not changed, and the replacement code is run.) Does this mean it's some world-age problem or something?

Edit: I've pasted in a complete session below. This @eval has exactly the same code as the source, and somehow fixes the problem. Running it before grad seems to have no effect.

julia> using Flux, Yota, ChainRules

julia> ENV["JULIA_DEBUG"] = ChainRules;

julia> begin
        myid = x -> identity(x)
        model = Dense(5, 3, myid)
        x = rand(Float32, 5, 1)
       end;

julia> val, g = grad((model, x) -> sum(model(x)), model, x)
┌ Debug: broadcasting: plus
│   length(xs) = 2
└ @ ChainRules ~/.julia/packages/ChainRules/fgVxV/src/rulesets/Base/broadcast.jl:161
┌ Debug: split broadcasting generic
│   f = #7 (generic function with 1 method)
│   N = 1
└ @ ChainRules ~/.julia/packages/ChainRules/fgVxV/src/rulesets/Base/broadcast.jl:126
ERROR: all field arrays must have same shape
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] (::StructArrays.var"#6#7"{Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}})(ci::Vector{Function})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:21
  [3] map
    @ ./tuple.jl:273 [inlined]
  [4] (StructArrays.StructArray{Tuple{Float32, Function}, 2, Tuple{Matrix{Float32}, Vector{Function}}})(c::Tuple{Matrix{Float32}, Vector{Function}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:20
  [5] (StructArrays.StructArray{Tuple{Float32, Function}})(c::Tuple{Matrix{Float32}, Vector{Function}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:97
  [6] _widenstructarray(dest::StructArrays.StructArray{Tuple{Float32, Yota.var"#21#23"}, 2, Tuple{Matrix{Float32}, Matrix{Yota.var"#21#23"}}, Int64}, i::Int64, #unused#::Type{Tuple{Float32, Function}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:118
  [7] widen_from_type(dest::StructArrays.StructArray{Tuple{Float32, Yota.var"#21#23"}, 2, Tuple{Matrix{Float32}, Matrix{Yota.var"#21#23"}}, Int64}, i::Int64, #unused#::Type{Tuple{Float32, Yota.var"#20#22"}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:109
  [8] widen_from_instance(dest::StructArrays.StructArray{Tuple{Float32, Yota.var"#21#23"}, 2, Tuple{Matrix{Float32}, Matrix{Yota.var"#21#23"}}, Int64}, i::Int64, el::Tuple{Float32, Yota.var"#20#22"})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:105
  [9] collect_to_structarray!(dest::StructArrays.StructArray{Tuple{Float32, Yota.var"#21#23"}, 2, Tuple{Matrix{Float32}, Matrix{Yota.var"#21#23"}}, Int64}, itr::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ChainRules.var"#1707#1709"{Yota.YotaRuleConfig, var"#7#8"}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}}}}, offs::Int64, st::Tuple{CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndex{2}})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:77
 [10] _collect_structarray!
    @ ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:59 [inlined]
 [11] _collect_structarray(itr::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ChainRules.var"#1707#1709"{Yota.YotaRuleConfig, var"#7#8"}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}}}}, elem::Tuple{Tuple{Float32, Yota.var"#21#23"}, Tuple{CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndex{2}}}, ax::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}; initializer::StructArrays.StructArrayInitializer{typeof(StructArrays.alwaysfalse), typeof(StructArrays.arrayof)})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:54
 [12] collect_structarray(itr::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ChainRules.var"#1707#1709"{Yota.YotaRuleConfig, var"#7#8"}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}}}}; initializer::StructArrays.StructArrayInitializer{typeof(StructArrays.alwaysfalse), typeof(StructArrays.arrayof)})
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/collect.jl:40
 [13] StructArrays.StructArray(v::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, ChainRules.var"#1707#1709"{Yota.YotaRuleConfig, var"#7#8"}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}}}}; unwrap::typeof(StructArrays.alwaysfalse))
    @ StructArrays ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:261
 [14] StructArray
    @ ~/.julia/packages/StructArrays/w2GaP/src/structarray.jl:260 [inlined]
 [15] unzip_broadcast
    @ ~/.julia/packages/ChainRules/fgVxV/src/unzipped.jl:39 [inlined]
 [16] split_bc_pullbacks(cfg::Yota.YotaRuleConfig, f::var"#7#8", args::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}})
    @ ChainRules ~/.julia/packages/ChainRules/fgVxV/src/rulesets/Base/broadcast.jl:127
 [17] rrule(cfg::Yota.YotaRuleConfig, #unused#::typeof(Base.Broadcast.broadcasted), #unused#::Base.Broadcast.DefaultArrayStyle{2}, f::var"#7#8", args::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}})
    @ ChainRules ~/.julia/packages/ChainRules/fgVxV/src/rulesets/Base/broadcast.jl:44
 [18] mkcall(::Function, ::Yota.YotaRuleConfig, ::Vararg{Any}; val::Missing, line::Core.LineInfoNode, kwargs::NamedTuple{(), Tuple{}}, free_kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Umlaut ~/.julia/packages/Umlaut/vGy3v/src/tape.jl:192
 [19] chainrules_transform!(tape::Umlaut.Tape{Yota.GradCtx})
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:181
 [20] gradtape!(tape::Umlaut.Tape{Yota.GradCtx}; seed::Int64)
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:268
 [21] gradtape(::Function, ::Dense{var"#7#8", Matrix{Float32}, Vector{Float32}}, ::Vararg{Any}; ctx::Yota.GradCtx, seed::Int64)
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:297
 [22] grad(::Function, ::Dense{var"#7#8", Matrix{Float32}, Vector{Float32}}, ::Vararg{Any}; seed::Int64)
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:367
 [23] grad(::Function, ::Dense{var"#7#8", Matrix{Float32}, Vector{Float32}}, ::Vararg{Any})
    @ Yota ~/.julia/packages/Yota/uu3H0/src/grad.jl:359
 [24] top-level scope
    @ REPL[4]:1
 [25] top-level scope
    @ ~/.julia/packages/CUDA/DfvRa/src/initialization.jl:52

julia> @eval ChainRules function unzip_broadcast(f::F, args...) where {F}
           T = Broadcast.combine_eltypes(f, args)
           if isconcretetype(T)
               T <: Tuple || throw(ArgumentError("""unzip_broadcast(f, args) only works on functions returning a tuple,
                   but f = $(sprint(show, f)) returns type T = $T"""))
           end
           bc = Broadcast.instantiate(Broadcast.broadcasted(f, args...))
           bcs = Broadcast.BroadcastStyle(typeof(bc))
           if bcs isa AbstractGPUArrayStyle
               # This is a crude way to allow GPU arrays, not currently tested, TODO.
               # See also https://github.com/JuliaArrays/StructArrays.jl/issues/150
               return unzip(broadcast(f, args...))
           elseif bcs isa Broadcast.AbstractArrayStyle
               # Broadcast.materialize(bc)   # <-- this line added  # <-- now removed, identical to original
               return StructArrays.components(StructArray(bc))
           else
               return unzip(broadcast(f, args...))  # e.g. tuples
           end
           # TODO maybe this if-else can be replaced by methods of `unzip(:::Broadcast.Broadcasted)`?
       end
unzip_broadcast (generic function with 1 method)

julia> val, g = grad((model, x) -> sum(model(x)), model, x)
┌ Debug: broadcasting: plus
│   length(xs) = 2
└ @ ChainRules ~/.julia/packages/ChainRules/fgVxV/src/rulesets/Base/broadcast.jl:161
┌ Debug: split broadcasting generic
│   f = #7 (generic function with 1 method)
│   N = 1
└ @ ChainRules ~/.julia/packages/ChainRules/fgVxV/src/rulesets/Base/broadcast.jl:126
(2.6951299f0, (ChainRulesCore.ZeroTangent(), Tangent{Dense{var"#7#8", Matrix{Float32}, Vector{Float32}}}(σ = ChainRulesCore.ZeroTangent(), weight = Float32[0.88211715 0.71158904 … 0.74754727 0.49648; 0.88211715 0.71158904 … 0.74754727 0.49648; 0.88211715 0.71158904 … 0.74754727 0.49648], bias = Float32[1.0, 1.0, 1.0]), Float32[1.0559639; 1.8083295; … ; 0.78016365; -0.7226729;;]))

dfdx · 2022-09-04T09:14:18Z

Here's a hypothesis for the world-age problem:

1. bc contains a reference to a lazily defined function.
2. materialize(bc) triggers the definition and adds a new method to an existing method table.
3. StructArray(bc), with and without a prior call to materialize(bc), therefore takes different dispatch paths and hits different versions of the same function.

But:

- if I put Base.get_world_counter() before and after materialize(bc), I see the same world number;
- I don't see any function to which a new method is added, only completely new functions, which shouldn't change the dispatch path.
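For contrast, here is a minimal Base-only sketch (not from the thread; `brand_new_function` is a hypothetical name) of what the global world counter does when a method really is added at run time. The counter advances, so an unchanged reading around materialize(bc) suggests that no method was defined there.

```julia
# Read the global world counter, define a brand-new method at run time via
# @eval, then read the counter again.
w_before = Base.get_world_counter()
@eval brand_new_function(x) = x + 1   # hypothetical function, added at run time
w_after = Base.get_world_counter()

# Defining a method bumps the counter. Code that was compiled in an older
# world has to use `invokelatest` to reach the new method.
println("world counter: ", w_before, " -> ", w_after)
Base.invokelatest(brand_new_function, 1)
```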

mcabbott · 2022-09-04T13:15:15Z

How are you adding this materialize line? By editing the source used for a fresh session, or by loading something while running?

dfdx · 2022-09-04T13:44:53Z

I have a file called _main.jl inside the Yota/src directory, with contents similar to this:

include("core.jl")       # in its turn, core.jl includes all the files from Yota, so now Main ~ Yota

using Flux

# I think these imports are not needed anymore, but I'm just copy-pasting them
import ChainRules: unzip_broadcast, RCR, TRI_NO, AbstractGPUArrayStyle, StructArrays
import ChainRules.StructArrays: StructArray


@eval ChainRules function unzip_broadcast(f::F, args...) where {F}
    global BC_STATE = (f, args)
    T = Broadcast.combine_eltypes(f, args)
    if isconcretetype(T)
        T <: Tuple || throw(ArgumentError("""unzip_broadcast(f, args) only works on functions returning a tuple,
            but f = $(sprint(show, f)) returns type T = $T"""))
    end
    # bc - rrule_via_ad's wrapper broadcasted to all arguments
    bc = Broadcast.instantiate(Broadcast.broadcasted(f, args...))
    bcs = Broadcast.BroadcastStyle(typeof(bc))
    if bcs isa AbstractGPUArrayStyle
        # This is a crude way to allow GPU arrays, not currently tested, TODO.
        # See also https://github.com/JuliaArrays/StructArrays.jl/issues/150
        return unzip(broadcast(f, args...))
    elseif bcs isa Broadcast.AbstractArrayStyle
        println("World age before materialize(bc): $(Base.get_world_counter())")
        # Broadcast.materialize(bc)
        println("World age after materialize(bc): $(Base.get_world_counter())")
        # global BC = bc
        return StructArrays.components(StructArray(bc))
    else
        return unzip(broadcast(f, args...))  # e.g. tuples
    end
    # TODO maybe this if-else can be replaced by methods of `unzip(::Broadcast.Broadcasted)`?
end


function bc_test()
    myid = x -> identity(x)
    model = Dense(5, 3, myid)
    x = rand(Float32, 5, 1);
    grad((model, x) -> sum(model(x)), model, x)
end

Whenever I make a change, I include the whole file, thus updating all definitions from Yota plus ChainRules.unzip_broadcast.

I also noticed that the problem is fixed if I replace rrule_via_ad() with a dummy implementation that doesn't generate new functions:

function ChainRulesCore.rrule_via_ad(cfg::YotaRuleConfig, f, args...)
    return 1.0, dy -> (ZeroTangent(), [ZeroTangent() for _ in args]...)
end

In theory, I can make rrule_via_ad() always work as an interpreter and never compile code, but that would be a high price...

mcabbott · 2022-09-04T13:55:45Z

Focusing on these lines

  [4] (StructArrays.StructArray{Tuple{Float32, Function}, 2, Tuple{Matrix{Float32}, Vector{Function}}})(c::Tuple{Matrix{Float32}, Vector{Function}})

 [16] split_bc_pullbacks(cfg::Yota.YotaRuleConfig, f::var"#7#8", args::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{2}, Nothing, typeof(+), Tuple{Matrix{Float32}, Vector{Float32}}})

here's a smaller reproducer:

julia> using ChainRules, Yota

julia> y, bk = ChainRules.split_bc_pullbacks(Yota.YotaRuleConfig(), identity, Broadcast.broadcasted(+, [1 2; 3 4], [5, 6]));

julia> bk([7 8; 9 0])  # with identity it works fine, also sqrt
(ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), ChainRulesCore.NoTangent(), [7 8; 9 0])

julia> y, bk = ChainRules.split_bc_pullbacks(Yota.YotaRuleConfig(), x -> identity(x), Broadcast.broadcasted(+, [1 2; 3 4], [5, 6]));
ERROR: all field arrays must have same shape

(@v1.9) pkg> st Yota ChainRules
Status `~/.julia/environments/v1.9/Project.toml`
  [082447d4] ChainRules v1.44.5
  [cd998857] Yota v0.8.0

This seems to avoid my order-of-loading weirdness above. If I @eval your method with materialize, it starts working. If I @eval the old one without it again, it stops working.

mcabbott · 2022-09-04T14:30:23Z

Some half-way steps:

julia> using ChainRules, Yota

# Easy case

julia> broadcast(Broadcast.broadcasted(+, [1 2; 3 4], [5, 6])) do x
         ChainRules.rrule_via_ad(Yota.YotaRuleConfig(), identity, x)
       end
2×2 Matrix{Tuple{Int64, ChainRules.var"#identity_pullback#1201"}}:
 (6, identity_pullback)  (7, identity_pullback)
 (9, identity_pullback)  (10, identity_pullback)

julia> ChainRules.unzip(ans)
([6 7; 9 10], [ChainRules.var"#identity_pullback#1201"() ChainRules.var"#identity_pullback#1201"(); ChainRules.var"#identity_pullback#1201"() ChainRules.var"#identity_pullback#1201"()])

julia> broadcast(|>, [7 8; 9 0], ans[2])
2×2 Matrix{Tuple{ChainRulesCore.NoTangent, Int64}}:
 (NoTangent(), 7)  (NoTangent(), 8)
 (NoTangent(), 9)  (NoTangent(), 0)

julia> ChainRules.unzip_broadcast(Broadcast.broadcasted(+, [1 2; 3 4], [5, 6])) do x
         ChainRules.rrule_via_ad(Yota.YotaRuleConfig(), identity, x)
       end
([6 7; 9 10], [ChainRules.var"#identity_pullback#1201"() ChainRules.var"#identity_pullback#1201"(); ChainRules.var"#identity_pullback#1201"() ChainRules.var"#identity_pullback#1201"()])

julia> broadcast(|>, [7 8; 9 0], ans[2])
2×2 Matrix{Tuple{ChainRulesCore.NoTangent, Int64}}:
 (NoTangent(), 7)  (NoTangent(), 8)
 (NoTangent(), 9)  (NoTangent(), 0)

# Now try with y -> identity(y)

julia> broadcast(Broadcast.broadcasted(+, [1 2; 3 4], [5, 6])) do x
         ChainRules.rrule_via_ad(Yota.YotaRuleConfig(), y -> identity(y), x)
       end
2×2 Matrix{Tuple{Int64, Function}}:  ## <-- notice Function, abstract type
 (6, #21)  (7, #20)
 (9, #20)  (10, #20)

julia> ChainRules.unzip(ans)  ## notice Core.Box
([6 7; 9 10], Function[Yota.var"#21#23"(Core.Box(Yota.var"##pullback_#72#328#86"{ChainRules.var"#identity_pullback#1201"}(ChainRules.var"#identity_pullback#1201"()))) Yota.var"#20#22"(Core.Box(Yota.var"##pullback_#72#328#86"{ChainRules.var"#identity_pullback#1201"}(ChainRules.var"#identity_pullback#1201"()))); Yota.var"#20#22"(Core.Box(Yota.var"##pullback_#72#328#86"{ChainRules.var"#identity_pullback#1201"}(ChainRules.var"#identity_pullback#1201"()))) Yota.var"#20#22"(Core.Box(Yota.var"##pullback_#72#328#86"{ChainRules.var"#identity_pullback#1201"}(ChainRules.var"#identity_pullback#1201"())))])

julia> broadcast(|>, [7 8; 9 0], ans[2])
2×2 Matrix{Tuple{ChainRulesCore.ZeroTangent, Int64}}:
 (ZeroTangent(), 7)  (ZeroTangent(), 8)
 (ZeroTangent(), 9)  (ZeroTangent(), 0)

julia> ChainRules.unzip_broadcast(Broadcast.broadcasted(+, [1 2; 3 4], [5, 6])) do x
         ChainRules.rrule_via_ad(Yota.YotaRuleConfig(), y -> identity(y), x)
       end
ERROR: all field arrays must have same shape

# Name the function:

julia> myid(x) = x;

julia> broadcast(Broadcast.broadcasted(+, [1 2; 3 4], [5, 6])) do x
         ChainRules.rrule_via_ad(Yota.YotaRuleConfig(), myid, x)
       end
2×2 Matrix{Tuple{Int64, Function}}:  ## <-- looks as bad
 (6, #21)  (7, #20)
 (9, #20)  (10, #20)

julia> ChainRules.unzip_broadcast(Broadcast.broadcasted(+, [1 2; 3 4], [5, 6])) do x
         ChainRules.rrule_via_ad(Yota.YotaRuleConfig(), myid, x)   ## now this works, with Core.Box
       end
([6 7; 9 10], Yota.var"#20#22"[Yota.var"#20#22"(Core.Box(Yota.var"##pullback_myid#334#89"{ChainRules.var"#identity_pullback#1201"}(ChainRules.var"#identity_pullback#1201"()))) Yota.var"#20#22"(Core.Box(Yota.var"##pullback_myid#334#89"{ChainRules.var"#identity_pullback#1201"}(ChainRules.var"#identity_pullback#1201"()))); Yota.var"#20#22"(Core.Box(Yota.var"##pullback_myid#334#89"{ChainRules.var"#identity_pullback#1201"}(ChainRules.var"#identity_pullback#1201"()))) Yota.var"#20#22"(Core.Box(Yota.var"##pullback_myid#334#89"{ChainRules.var"#identity_pullback#1201"}(ChainRules.var"#identity_pullback#1201"())))])
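The abstract `Function` element type flagged in the sessions above can be reproduced with Base alone: whenever a broadcast returns tuples whose second components have different concrete types, Base widens the element type (via `Base.promote_typejoin`) to the common supertype. A sketch with a hypothetical `pick`, where `sin` and `cos` stand in for the differently typed pullback closures `#20` and `#21`:

```julia
# Hypothetical function: returns (x, sin) for odd x and (x, cos) for even x,
# so the second tuple component has two different concrete types.
pick(x) = isodd(x) ? (x, sin) : (x, cos)

out = broadcast(pick, [1 2; 3 4])
eltype(out)   # Tuple{Int64, Function}: typeof(sin) and typeof(cos) join to Function
out[1, 2]     # (2, cos)
```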

dfdx · 2022-09-04T23:02:24Z

Apparently, in the last example there's no error because the rrule() for myid had already been compiled in the previous broadcast() call. Running only the last statement gives the same error:

julia> ChainRules.unzip_broadcast(Broadcast.broadcasted(+, [1 2; 3 4], [5, 6])) do x
        ChainRules.rrule_via_ad(Yota.YotaRuleConfig(), myid, x)
end
...
ERROR: all field arrays must have same shape
...

dfdx · 2022-09-05T07:04:56Z

My current understanding is as follows:

- Yota.rrule_via_ad() generates a new rrule() and shifts the world age forward.
- unzip_broadcast() doesn't call rrule_via_ad() immediately; instead it creates a lazy broadcasting object bc = Broadcast.instantiate(Broadcast.broadcasted(f, args...)), where f is the wrapper created here:

unzip_broadcast(args...) do a...
    rrule_via_ad(cfg, f, a...)
end

- This broadcasted object is only materialized when StructArray(bc) is called.
- When rrule_via_ad() itself acts on broadcasted objects, the combination of double broadcasting, new function generation and something in the StructArray constructor leads to incorrect results.

Removing any of these factors solves the problem. Also, if in unzip_broadcast() I replace:

bc = Broadcast.instantiate(Broadcast.broadcasted(f, args...))

with this:

bc = broadcast(f, args...)

This forces earlier evaluation and fixes the issue too. A few lines later, all 3 branches materialize bc anyway:

        if bcs isa AbstractGPUArrayStyle
            # This is a crude way to allow GPU arrays, not currently tested, TODO.
            # See also https://github.com/JuliaArrays/StructArrays.jl/issues/150
            return unzip(broadcast(f, args...))
        elseif bcs isa Broadcast.AbstractArrayStyle        
            return StructArrays.components(StructArray(bc))
        else
            return unzip(broadcast(f, args...))  # e.g. tuples
        end

So why do we need bc to be lazily broadcasted here at all?
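The distinction under discussion is between a lazy `Broadcasted` object, which runs nothing until it is materialized or iterated, and eager `broadcast`, which evaluates immediately. A minimal Base-only sketch (the counter `calls` and the function `g` are hypothetical) that counts when the broadcast function actually runs:

```julia
# Hypothetical counter to observe when the broadcast function is called.
const calls = Ref(0)
g(a, b) = (calls[] += 1; a + b)

# Lazy: building and instantiating the Broadcasted object evaluates nothing.
bc = Broadcast.instantiate(Broadcast.broadcasted(g, [1 2; 3 4], [5, 6]))
println("calls after instantiate: ", calls[])   # 0

# Evaluation happens only when the object is materialized (or iterated, as a
# collection mechanism such as StructArray(bc) presumably does).
out = Broadcast.materialize(bc)
println("calls after materialize: ", calls[])   # 4, one per element
```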

mcabbott · 2022-09-05T09:54:04Z

in all 3 branches we materialize bc anyway:

But one of them materialises two arrays directly, instead of first allocating an array of tuples. This path is the entire reason for this function, and for depending on StructArrays.

Cc @piever in case this weird error rings any bells. (I wonder if it's possible to hit it without AD being involved?)
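A rough Base-only sketch of the idea of materialising two arrays directly. This is not how StructArrays implements it: `unzip2` is a hypothetical helper that assumes the broadcast has a concrete 2-tuple element type, so it skips the type-widening step that StructArrays has to handle in general.

```julia
# Hypothetical helper: evaluate a lazy broadcast whose function returns
# 2-tuples, writing each component straight into its own output array rather
# than first allocating an array of tuples and splitting it afterwards.
function unzip2(bc::Broadcast.Broadcasted)
    bci = Broadcast.instantiate(bc)
    T = Broadcast.combine_eltypes(bci.f, bci.args)
    # Simplifying assumption: concrete 2-tuple eltype, so no widening needed.
    (isconcretetype(T) && T <: Tuple{Any,Any}) ||
        throw(ArgumentError("sketch assumes a concrete 2-tuple eltype, got $T"))
    A = similar(bci, fieldtype(T, 1))   # same axes as the broadcast
    B = similar(bci, fieldtype(T, 2))
    for I in eachindex(bci)
        a, b = bci[I]                   # runs the function once per element
        A[I] = a
        B[I] = b
    end
    return A, B
end

A, B = unzip2(Broadcast.broadcasted((x, y) -> (x + y, x * y), [1 2; 3 4], [5, 6]))
# A == [6 7; 9 10], B == [5 10; 18 24]
```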

piever · 2022-09-05T13:39:07Z

Not sure if this is helpful, but here are some thoughts that might be useful.

- One reason for the mismatch could be that, for StructArrays with no fields, the size of the array is hard to preserve (it could be useful to test this with the "throw if no fields" branch, JuliaArrays/StructArrays.jl#235, though given the issues with eval and co. I imagine this isn't the problem).
- The StructArrays collection mechanism only uses inference when the iterator is empty.
- There is a PR to improve broadcasting for StructArrays, "Generalize StructArray's broadcast" (JuliaArrays/StructArrays.jl#215); maybe it helps here (ideally it would unify the GPU and CPU implementations of unzip_broadcast).

I am definitely puzzled as to why this is happening, though. It looks like the collection mechanism StructArray(bc) is failing on a lazy broadcasted object (indeed, this is a different code path from StructArray(::AbstractArray)). An "AD-free" reproducer would definitely help narrow this down.

dfdx · 2022-09-05T22:49:57Z

Here's a reproducible example without Yota and ChainRules:

import StructArrays
import StructArrays: StructArray

# eval a new function similar to rrule()
function make_rrule(f, args...)
    name = gensym()
    ex = :(function $name(f, args...)
        y = sum(args)
        pullback(dy) = dy + y
        return y, pullback
    end)
    return Base.eval(@__MODULE__, ex)
end

# wrap rrule-like function with required number of invokelatest()
function rrule_via_ad(f, args...)
    rr = make_rrule(f, args...)
    res = Base.invokelatest(rr, f, args...)
    y, pb_ = res
    pb = dy -> Base.invokelatest(pb_, dy)
    return y, pb
end

# original split_bc_pullbacks stripped to the bones
function split_bc_pullbacks(f::F, args::Vararg{Any,N}) where {F,N}
    wf = (a...) -> rrule_via_ad(f, a...)
    # comment/uncomment the next 2 lines to make the example fail/work
    bc = Broadcast.instantiate(Broadcast.broadcasted(wf, args...))
    # bc = broadcast(wf, args...)
    return StructArrays.components(StructArray(bc))
end

bce() = Broadcast.broadcasted(+, [1 2; 3 4], [5, 6])
split_bc_pullbacks(x -> identity(x), bce())

Since make_rrule() generates a new function on every call, this code can be run (and fail) multiple times in the same REPL. I think the pullback stuff can also be removed, but I want to try the aforementioned StructArrays branches first.

mcabbott · 2022-09-05T23:00:11Z

Still happens on the PR's branch. You can simplify a bit further; note that acting on a vector is OK, but higher ndims fail:

julia> [4,5,6] .|> split_bc_pullbacks(x -> identity(x), [1,2,3])[2]
3-element Vector{Int64}:
 5
 7
 9

julia> split_bc_pullbacks(x -> identity(x), [1 2; 3 4])[2]
ERROR: all field arrays must have same shape

Trying to pick bits out of the stack trace, is this correct?

julia> mat = [(; i, f) for i in 1:3, f in (sin, sin)] |> StructArray
3×2 StructArray(::Matrix{Int64}, ::Matrix{typeof(sin)}) with eltype NamedTuple{(:i, :f), Tuple{Int64, typeof(sin)}}:
 (i = 1, f = sin)  (i = 1, f = sin)
 (i = 2, f = sin)  (i = 2, f = sin)
 (i = 3, f = sin)  (i = 3, f = sin)

julia> StructArrays._widenstructarray(mat, 2, Tuple{Int, Function})
6-element StructArray(::Vector{Int64}, ::Vector{Function}) with eltype Tuple{Int64, Function}:
    (1, sin)
 #undef
 #undef
 #undef
 #undef
 #undef
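For comparison, the shape-preserving behaviour one would expect from widening can be sketched with Base alone: allocate with `similar(dest, S)`, which keeps the axes of `dest`, and copy the elements filled so far. `widen_preserving_shape` is a hypothetical helper, not StructArrays' `_widenstructarray`, and its argument order is only loosely modelled on the call above.

```julia
# Hypothetical helper: allocate a container with a wider element type but the
# same axes as `dest`, then copy over the first `n` elements (linear order).
function widen_preserving_shape(dest::AbstractArray, ::Type{S}, n::Integer) where {S}
    newdest = similar(dest, S)   # keeps size(dest), unlike a flat Vector
    for i in 1:n
        newdest[i] = dest[i]
    end
    return newdest
end

mat = [(i, sin) for i in 1:3, j in 1:2]   # 3×2 Matrix{Tuple{Int64, typeof(sin)}}
wide = widen_preserving_shape(mat, Tuple{Int, Function}, 2)
size(wide)   # (3, 2); entries 3 to 6 stay #undef until filled
```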

piever · 2022-09-06T07:40:50Z

Trying to pick bits out of the stack trace, is this correct?


Agh, no it isn't, well spotted! Somehow the widening mechanism was not updated to support arrays of arbitrary shape and only worked for 2D things... JuliaArrays/StructArrays.jl#246 should hopefully fix it!

dfdx · 2022-09-08T22:13:34Z

I can confirm JuliaArrays/StructArrays.jl#246 fixes all the issues up to my first reproducer using Flux and Yota. Thanks for the quick fix!

The Metalhead example still fails, but that's another story; I'm looking into it now.

mcabbott · 2022-10-17T22:42:18Z

docs/src/index.md

+
+loss, (∇function, ∇model, ∇image) = Yota.grad(model, image) do m, x
+  sum(m(x))
+end;


I rebased this and tests pass!

This example does not pass; it fails with a seemingly simple error:

julia> loss, (∇function, ∇model, ∇image) = Yota.grad(model, image) do m, x
         sum(m(x))
       end; loss, (_, ∇model) = Yota.grad(m -> sum(m(image)), model)
ERROR: No derivative rule found for op %454 = ntuple(%452, 4)::NTuple{4, Int64} , try defining it using ChainRulesCore.rrule(::typeof(ntuple), ::Flux.var"#336#337"{4, Array{Float32, 4}}, ::Int64) = ...
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] step_back!(tape::Umlaut.Tape{Yota.GradCtx}, y::Umlaut.Variable)
   @ Yota ~/.julia/packages/Yota/KJQ6n/src/grad.jl:219

That was on the tagged Yota release; with the latest version of everything it instead seems to take forever, and when interrupted it stops here:

julia> loss, (∇function, ∇model, ∇image) = Yota.grad(model, image) do m, x
         sum(m(x))
       end;
^CERROR: InterruptException:
Stacktrace:
  [1] collect(itr::Base.Generator{Vector{Umlaut.Variable}, Yota.var"#68#72"{Umlaut.Tape{Yota.GradCtx}}})
    @ Base ./array.jl:792
  [2] todo_list(tape::Umlaut.Tape{Yota.GradCtx}, y::Umlaut.Variable)
    @ Yota ~/.julia/packages/Yota/5CVY7/src/grad.jl:113
  [3] #68
    @ ./none:0 [inlined]
  [4] iterate
    @ ./generator.jl:47 [inlined]
  [5] collect(itr::Base.Generator{Vector{Umlaut.Variable}, Yota.var"#68#72"{Umlaut.Tape{Yota.GradCtx}}})
    @ Base ./array.jl:787
  [6] todo_list(tape::Umlaut.Tape{Yota.GradCtx}, y::Umlaut.Variable)
    @ Yota ~/.julia/packages/Yota/5CVY7/src/grad.jl:113
  [7] #68
    @ ./array.jl:0 [inlined]
  [8] iterate
    @ ./generator.jl:47 [inlined]
  [9] collect_to!(dest::Vector{Vector{Umlaut.Variable}}, itr::Base.Generator{Vector{Umlaut.Variable}, Yota.var"#68#72"{Umlaut.Tape{Yota.GradCtx}}}, offs::Int64, st::Int64)
    @ Base ./array.jl:845
 [10] collect_to_with_first!(dest::Vector{Vector{Umlaut.Variable}}, v1::Vector{Umlaut.Variable}, itr::Base.Generator{Vector{Umlaut.Variable}, Yota.var"#68#72"{Umlaut.Tape{Yota.GradCtx}}}, st::Int64)
    @ Base ./array.jl:823
 [11] collect(itr::Base.Generator{Vector{Umlaut.Variable}, Yota.var"#68#72"{Umlaut.Tape{Yota.GradCtx}}})
    @ Base ./array.jl:797
--- the last 10 lines are repeated 2 more times ---

(jl_aZPcXz) pkg> st
Status `/private/var/folders/yq/4p2zwd614y59gszh7y9ypyhh0000gn/T/jl_aZPcXz/Project.toml`
  [dbeba491] Metalhead v0.8.0-DEV `https://github.com/FluxML/Metalhead.jl.git#master`
  [3bd65402] Optimisers v0.2.10 `~/.julia/dev/Optimisers`
  [09ab397b] StructArrays v0.6.13 `https://github.com/JuliaArrays/StructArrays.jl.git#master`
  [cd998857] Yota v0.8.1 `https://github.com/dfdx/Yota.jl.git#main`

Hmm, I was indeed investigating the incredibly long processing time, but the profiler blamed type inference and the abstract interpreter, so I started a long search for a better way to trace functions (e.g. see my recent post on Discourse). However, your stack trace implies the problem may actually appear after tracing. I'll try to investigate this option too, closer to the end of the week.

FYI: I opened an issue to track this.

Fixed. The ResNet(18) example now compiles and runs in 61 seconds (compared to 47 seconds with Zygote). Subsequent calls take ~0.4 seconds on my CPU.

Great, I see something similar locally, on 0.8.2

ToucheSir · 2022-11-03T06:00:22Z

Are the failures on nightly easy to resolve?

dfdx · 2022-11-03T07:05:59Z

It's a failure in CompilerPluginTools.jl, which apparently has not been adapted for Julia 1.9 yet. I opened JuliaCompilerPlugins/CompilerPluginTools.jl#8 to track it.

mcabbott · 2022-11-27T04:42:53Z

Should we just skip tests on nightly, so that this can go in?

@dfdx do you know whether 1.9 works?

dfdx · 2022-11-28T21:15:18Z

It looks like there's more work to do in CompilerPluginTools.jl to make it work on Julia 1.9, so I don't think that will happen any time soon. If we can skip the Yota tests on Julia 1.9, that's probably the most efficient solution for now.

Note that Julia nightly now points to Julia 1.10, so perhaps we need a separate entry for 1.9.


mcabbott · 2022-12-08T03:32:09Z

Tests with Yota are now skipped for 1.9 & up.

Should be ready to go. Can someone approve?
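One way such version gating might look, as a hedged sketch: `RUN_YOTA` is a hypothetical name, and this is not necessarily the code the PR uses. Julia's `v"1.9-"` literal sorts below every 1.9 prerelease, so 1.9.0-DEV and later nightlies are excluded as well.

```julia
using Test

# Hypothetical gate: Yota's dependency CompilerPluginTools.jl did not yet
# support Julia 1.9 at the time, so only exercise Yota on earlier versions.
const RUN_YOTA = VERSION < v"1.9-"

@testset "AD backends" begin
    # Zygote-based tests would run unconditionally here.
    if RUN_YOTA
        # Yota-based tests would go here. Note that `using Yota` itself must
        # sit at top level, e.g. inside a top-level `if RUN_YOTA ... end`.
    else
        @info "Skipping Yota tests on Julia $VERSION"
        @test_skip false   # recorded as skipped (Broken), not as a failure
    end
end
```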

mcabbott commented Aug 19, 2022

test/destructure.jl Outdated

test/destructure.jl Outdated

mcabbott commented Aug 19, 2022


mcabbott marked this pull request as draft August 19, 2022 20:28

mcabbott added the gradients label Aug 19, 2022

mcabbott commented Aug 27, 2022


mcabbott mentioned this pull request Aug 27, 2022

No deriative rule found for struct constructor dfdx/Yota.jl#117

Closed

mcabbott commented Aug 27, 2022

test/destructure.jl Outdated

test/destructure.jl Outdated

test/destructure.jl Outdated

mcabbott force-pushed the yota branch from 277bc34 to e451d15 Compare October 17, 2022 18:40

mcabbott commented Oct 17, 2022


mcabbott marked this pull request as ready for review October 31, 2022 00:27

ToucheSir closed this Nov 3, 2022

ToucheSir reopened this Nov 3, 2022

mcabbott force-pushed the yota branch from 08c23f2 to 33a2e78 Compare November 25, 2022 06:23

mcabbott force-pushed the yota branch from 33a2e78 to 71d8a15 Compare November 27, 2022 04:44

mcabbott added 7 commits December 7, 2022 19:47

test with Yota too, and document this

76b681c

also test destructure

a7d575f

actually try out the doc examples

181c2f0

tidy, add summarysize

1a426ce

add again changes made on website which got lost in a local rebase without checking first because I forgot about this for ages

86a23bb

Yota 0.8.2, etc

e4a21d9

skip Yota tests on 1.9 & later

8562963

mcabbott force-pushed the yota branch from 773ba5e to 8562963 Compare December 8, 2022 00:48

skip more tests

ce3cc0c

ToucheSir approved these changes Dec 8, 2022


mcabbott merged commit 79269be into FluxML:master Dec 8, 2022

mcabbott deleted the yota branch December 8, 2022 03:44

cossio mentioned this pull request Dec 11, 2022

Investigate using a different AD for tests #96

Open

-Yota_gradient(f, xs...) = Base.tail(Yota.grad(f, xs...)[2])
+Yota_gradient(f, xs...) = map(y2z, Base.tail(Yota.grad(f, xs...)[2]));
+y2z(::AbstractZero) = nothing  # we don't care about different flavours
+y2z(t::Tangent) = map(y2z, ChainRulesCore.backing(canonicalize(t)))
+y2z(x) = x
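The y2z helper above collapses the different zero flavours (ZeroTangent vs NoTangent) to nothing and unwraps a Tangent into its backing, so the tests don't depend on which flavour a given AD returns. To show the pattern without loading ChainRulesCore, here is a sketch with hypothetical stand-in types (FakeZeroTangent, FakeNoTangent, FakeTangent); the real y2z also canonicalizes the Tangent first.

```julia
# Hypothetical stand-ins for ChainRulesCore's AbstractZero subtypes and
# Tangent, so the normalisation pattern runs with Base alone.
abstract type FakeAbstractZero end
struct FakeZeroTangent <: FakeAbstractZero end
struct FakeNoTangent <: FakeAbstractZero end
struct FakeTangent{B}
    backing::B   # a NamedTuple holding the tangent of each field
end

# Same shape as y2z: any zero flavour becomes `nothing`, structured tangents
# are unwrapped recursively, everything else passes through unchanged.
y2z_sketch(::FakeAbstractZero) = nothing
y2z_sketch(t::FakeTangent) = map(y2z_sketch, t.backing)
y2z_sketch(x) = x

g_a = FakeTangent((σ = FakeZeroTangent(), weight = [1.0, 2.0]))
g_b = FakeTangent((σ = FakeNoTangent(), weight = [1.0, 2.0]))
y2z_sketch(g_a) == y2z_sketch(g_b)   # true: both give (σ = nothing, weight = [1.0, 2.0])
```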
