I'm just writing a bit of code to do some performance comparisons with matrices that contain floating-point data. Honestly, I just want to see what happens when I run forward and backward propagation while reusing the memory that was allocated in the intermediate steps, but first I need to figure out a decent implementation of the algorithms that I'm going to use.