Description
Firstly, thanks again for this package. I'm getting really nice results with it.
Given that I'm registering an image stack relative to its first image, I was wondering whether there are any easy opportunities for further parallelization. I say further because I've seen that qd_translate() does seem to exceed a single thread, which is great, but it only reaches ~250% on my 6-core CPU, which restricts the opportunity for per-image for-loop threading. Also, might there be any easy way to use CuArrays & GPUs?
For instance, the simplest per-image thread parallelization I could imagine would be something like this. Note that if tforms has already been calculated for the previous frame, it's used for initial_tfm; otherwise the default pre-populated (0.0, 0.0) is used (that's the idea at least):
using RegisterQD

# Obviously you wouldn't actually try to register random noise...
imgstack = map(x -> rand(100, 100), 1:100)
# Pre-populate one zero translation per frame
tforms = map(x -> RegisterQD.Translation(0.0, 0.0), 1:length(imgstack))
mxshift = (10, 10)
# Threads.@threads blocks until all iterations finish, so no wait() is needed
Threads.@threads for i in 1:length(imgstack)
    tforms[i], mm = qd_translate(
        imgstack[1],
        imgstack[i],
        mxshift;
        maxevals = 1000,
        crop = true,
        print_interval = typemax(Int),
        initial_tfm = tforms[max(i - 1, 1)],  # frame 1 has no predecessor
    )
end
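
One thing I'm unsure about with the loop above: iterations run concurrently, so the previous frame's entry in tforms will often still be the pre-populated default when a given iteration reads it. A chunked variant might sidestep that by giving each task a contiguous block of frames and carrying the previous result forward sequentially within the block. Here is a minimal sketch of that idea, assuming nothing about RegisterQD itself: register_stub and register_chunked are hypothetical names, and register_stub is only a stand-in for a call like qd_translate so the example runs on its own.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical stand-in for a registration call such as qd_translate.
# It ignores the images and returns the initial guess shifted by (1.0, 1.0),
# which makes the warm-start chain easy to see in the results.
register_stub(fixed, moving, init) = init .+ (1.0, 1.0)

function register_chunked(stack; nchunks = nthreads())
    n = length(stack)
    results = fill((0.0, 0.0), n)
    # Contiguous index blocks, one task per block
    chunks = Iterators.partition(1:n, cld(n, nchunks))
    tasks = map(chunks) do idxs
        @spawn begin
            prev = (0.0, 0.0)  # each block starts from the default guess
            for i in idxs
                # Sequential within the block, so prev is always finished
                prev = register_stub(stack[1], stack[i], prev)
                results[i] = prev  # each task writes only its own indices
            end
        end
    end
    foreach(wait, tasks)
    return results
end

stack = [rand(8, 8) for _ in 1:10]
results = register_chunked(stack; nchunks = 2)
```

The trade-off in this sketch is that the first frame of every block falls back to the default initial guess, so only frames inside a block benefit from the warm start.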