Zygote: transposing a block yields DimensionMismatch (or missing adjoint) #170
Comments
Thanks for the issue. This problem is caused by returning circuit gradients as a vector. I made a patch for it; can you check whether it solves your issue? #171 |
@GiggleLiu perfect, I have tested a few cases and indeed this patch works for me! #171 Do you recommend I put the |
We will not add this patch to YaoBlocks, because Zygote is very slow to load and sometimes has version issues. For example, tests currently break on nightly because Zygote is used in them. In the future, we might switch to a more correct implementation that constructs a Tangent type for the circuit. |
@GiggleLiu ok understood! One remaining issue with the current rrules is the following tiny modification I made to the code:
Compared to the code at the opening of this issue, I only added the Zygote.accum and moved C, which is an Add block of chains, into the loss function code. There is not even any dependence on parameters, but I get the following error:
Not only is this a current issue, but in general I would also like to differentiate this type of block, for example if a parameter (from the perspective of Zygote, not a Yao dispatched parameter per se) appears in it. Please let me know if you want me to open a new issue as a copy of this comment. |
Hi, I just used the correct tangent type for gradients. It seems to solve your problem, and you do not need the patch anymore. However, some issues are still unsettled. I am not good at debugging Zygote; if someone can help, that would be great. I posted a WIP PR in #173 |
Yes, this works for my problem and the accum patch is no longer needed. Thanks! |
I haven't had time to look into the Zygote issue (just traveling, Boston), e.g. the one @GiggleLiu posted:
julia> Zygote.gradient(x->x.content.theta, Daggered(Rx(0.5)))
(nothing,)
Let me take a second look at this a bit later to fully resolve this issue. But I am glad to see the main case is fixed. |
For certain kinds of wavefunction (WF) overlap gradients, one needs to conjugate-transpose a unitary that has appeared before, so both the regular and the daggered version must appear in the expression (cost or loss function). However, I get errors when using the daggered version.
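For reference, the two losses discussed in this issue can be written out explicitly. This is only a sketch, assuming the loss has the usual expectation-value form, with $|\psi_0\rangle$ the initial register, $U(\theta)$ the parameterized block, and $C$ the operator block passed to `expect`. The label $L_\dagger$ is introduced here for illustration and does not come from Yao.

```latex
L(\theta) = \langle \psi_0 |\, U(\theta)^\dagger \, C \, U(\theta) \,| \psi_0 \rangle ,
\qquad
L_\dagger(\theta) = \langle \psi_0 |\, U(\theta) \, C \, U(\theta)^\dagger \,| \psi_0 \rangle .
```

The first form corresponds to applying U to the register, the second to applying U'. Both depend on the same parameters $\theta$, so a gradient with respect to $\theta$ should exist in both cases.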
A minimal reproducing example with just one block, which for now is the non-daggered version, U:
In this case, the above loss function effectively computes an expectation value equivalent to expect(C, psi_0 => U). Therefore, expect' and Zygote.gradient yield the same result, [-0.8912073600614354, -0.8084964038195902], as expected.
However, if we instead select the conjugate transpose (the daggered version) of U, the expect' version correctly returns the gradient as
But when I attempt the same in Zygote by setting psi1 = apply(psi0, U'), we get an error message:
I have also tried copying U first by setting psi1 = apply(psi0, copy(U)') or psi1 = apply(psi0, copy(U')) (because the 'lazy' dagger operation might cause trouble?), but then I think Zygote loses track of the parameters across the copied block, because I get another kind of error even if I don't perform the dagger operation and simply set psi1 = apply(psi0, copy(U)):
Any thoughts on how to enable daggered blocks with the Zygote patch? Am I taking the wrong approach, or is it simply a matter of defining the correct adjoint/rule, like @GiggleLiu did for other blocks such as Add? Is it an issue with double-daggered definitions? Thanks as always!
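A possible way to get a reference value for the daggered case, independent of both Zygote and Yao, is a finite-difference check on plain matrices. The sketch below is an assumption-laden illustration, not the code from this issue: it uses a single rotation `rx(θ) = exp(-iθX/2)` (the standard convention, written out by hand), the Pauli Y matrix as the observable, and hypothetical helper names `loss`, `loss_dag` and `fd`. For this toy case the daggered gradient is nonzero and has the opposite sign to the plain one, so a result of `nothing` for a daggered block would not be the expected answer.

```julia
using LinearAlgebra

# Pauli matrices as plain arrays (no Yao dependency; illustration only).
const X = ComplexF64[0 1; 1 0]
const Y = ComplexF64[0 -1im; 1im 0]

# Rx(θ) = exp(-iθX/2) = cos(θ/2) I - i sin(θ/2) X  (standard convention)
rx(θ) = cos(θ / 2) * Matrix{ComplexF64}(I, 2, 2) - im * sin(θ / 2) * X

psi0 = ComplexF64[1, 0]

# Plain block: <psi0| U' Y U |psi0>; analytically this equals -sin(θ).
function loss(θ)
    psi1 = rx(θ) * psi0
    return real(dot(psi1, Y * psi1))
end

# Daggered block: <psi0| U Y U' |psi0>; analytically this equals +sin(θ).
function loss_dag(θ)
    psi1 = rx(θ)' * psi0
    return real(dot(psi1, Y * psi1))
end

# Central finite difference, used as a gradient reference.
fd(f, θ; h=1e-6) = (f(θ + h) - f(θ - h)) / (2h)

@show fd(loss, 0.5) fd(loss_dag, 0.5)
```

The same comparison could be made against whatever an automatic-differentiation rule for the daggered block returns, which may help narrow down whether the rule or the parameter tracking is at fault.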