I can see the implementation computes KL div between q(z|x,c) and p(z)=N(0,I) but the original formulation has conditional prior p(z|c). Can you explain why you are still using zero mean unit Gaussian prior?