What is a Transformation? #2
Description
I was about to start writing code, but I think it's worth having a discussion first. If we're going to realize my eventual vision of autonomous creation of valid "directed graphs of transformations", then we need to approach the problem systematically. Every transformation should have the same type signature, otherwise we're going to have too many special cases for the framework to be useful.
A Transformation should:
- Know what input shape it expects (is it generative? if so, maybe `nothing` is the input; does it always get a scalar, or a vector?)
- Know the input domain (can it take any numbers? only reals? only probabilities?)
- Likewise, it should know the output shape and domain (Sigmoid produces probabilities, for example).
Should this info be part of the type signature as parameters? It might have to be, though I'm going to attempt to solve this through "query functions" that produce traits to be dispatched on. If I'm successful, we don't need the wrappers for Distributions at all... we just need the generic:
```julia
output_shape(::Distributions.Sampleable{Univariate})   = 0  # scalar
output_shape(::Distributions.Sampleable{Multivariate}) = 1  # vector
# etc.
```
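A hedged sketch of how such query functions could drive dispatch without wrapper types (the `describe` helper is hypothetical, purely for illustration):

```julia
using Distributions

# Query functions acting as traits: every Sampleable reports its output
# shape directly, so no wrapper types around Distributions are needed.
output_shape(::Distributions.Sampleable{Univariate})   = 0  # scalar
output_shape(::Distributions.Sampleable{Multivariate}) = 1  # vector

# Hypothetical framework code branching on the query result:
describe(t) = output_shape(t) == 0 ? "produces scalars" : "produces arrays"

describe(Normal(0.0, 1.0))            # scalar output
describe(MvNormal(zeros(2), eye(2)))  # vector output
```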
For generative/stochastic transformations (generating distributions): I wonder if we should think of the input as "randomness", i.e. we take "randomness" as input and generate `rand(t, output_dims)`. I don't know the best way to represent this... maybe just `immutable Randomness end`.
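One possible shape for that idea (the `transform` method below is a sketch of the API under discussion, not a settled signature):

```julia
using Distributions

# Singleton marker meaning "the input is pure randomness":
immutable Randomness end

# A stochastic transformation consumes Randomness and emits a draw
# (sketch only; the real signature is up for discussion):
transform(d::Distributions.Sampleable, ::Randomness) = rand(d)

transform(Normal(0.0, 1.0), Randomness())  # a single random draw
```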
The `transform` should, I think, take in `nothing` and output the center. Think about what happens if you have a generative process:

```
T := y = w*x + N(mu, sigma)
```

where T is a Transformation. What does it mean to `transform` x into y? I think it probably means to give our best estimate of y given x, so: `transform(T, x) == w * x + mu`. If that's the case, then `transform(N) == mu`.
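To make that concrete, here's a minimal sketch of the generative process above (the `AffineNormal` type and `generate` are hypothetical names for illustration):

```julia
using Distributions

# T := y = w*x + N(mu, sigma), bundled as one transformation:
immutable AffineNormal
    w::Float64
    noise::Normal  # Normal(mu, sigma)
end

# transform returns the best estimate of y given x (the center):
transform(t::AffineNormal, x) = t.w * x + mean(t.noise)
transform(d::Normal) = mean(d)  # so transform(N) == mu

# generate draws from the full stochastic process:
generate(t::AffineNormal, x) = t.w * x + rand(t.noise)
```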
What does it mean to `generate` an output from T? We're walking into the land of Bayesians here... I started to answer this, but need more time to think it through.
What does it mean to `learn` the parameters `w`? I think `learn(T, x, y)` means to learn `w`, `mu`, and `sigma` in parallel.
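A hedged sketch of what that joint fit could look like for the affine example (plain least squares, illustration only; a real `learn` would presumably dispatch on the Transformation type):

```julia
# Fit w, mu, sigma in parallel from paired data x, y:
function learn(x::Vector{Float64}, y::Vector{Float64})
    A = hcat(x, ones(length(x)))     # columns for w and mu
    w, mu = A \ y                    # joint least-squares fit
    sigma = std(y .- (w .* x .+ mu)) # spread of the residual noise
    (w, mu, sigma)
end
```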
Do we agree on these definitions? If so, then we need some drastic changes to the first PR.