Double-check function invocation overhead for NumPy and PyData/Sparse #19
Comments
As per the NumPy documentation:
When we benchmarked sparse tensor addition in TensorFlow (which also implements its kernels in compiled code) for a previous paper, we found that only about 10% of the runtime was actually spent on the addition, while the rest was split between validating the inputs and dispatching the compiled code. NumPy and PyData/Sparse might not have similar issues, but that's something we might want to double-check at some point.
I see. We should definitely check, then.
I was talking to Fred about this on Thursday, and we both agreed that we definitely need to check this. It will be useful for analyzing questions like "Why is TACO faster than NumPy/PyData?" and "What causes the speedup, and how much does each factor contribute to the speedup or slowdown of each system?" We believe that since PyData/Sparse still relies on dense NumPy calls, it spends most of its time breaking the tensor down into dense vectors for each iteration space. That code is (most likely) implemented in Python, so we need to be able to pinpoint what PyData/Sparse is doing, i.e. how much of the time is due to Python interpretation, how much to the splitting/re-merging algorithm, and how much to the NumPy function calls themselves.
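As a starting point for that breakdown, a minimal profiling sketch could look like the following. This is only an illustration: the shapes, densities, and iteration count are arbitrary placeholders, not values from our benchmark suite.

```python
# Sketch: use cProfile to see where pydata/sparse spends its time when
# dispatching a NumPy ufunc on sparse operands.
import cProfile
import pstats

import numpy as np
import sparse

# Arbitrary placeholder operands.
x = sparse.random((1000, 1000), density=0.01)
y = sparse.random((1000, 1000), density=0.01)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    np.logical_xor(x, y)  # dispatched to pydata/sparse via __array_ufunc__
profiler.disable()

# Sort by cumulative time to see which Python-level functions dominate.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```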
I looked into this a little bit. Here is the flamegraph for running logical_xor (rename the file to .svg and then view it in your browser): It doesn't look like there is any significant interpretation overhead. pydata/sparse spends most of its time in a list comprehension, followed by JIT compilation and calls into generated code via Numba. I'm not very familiar with how pydata/sparse actually implements the sparse ufunc operation, so I can't match these up to specific steps.
At some point (if feasible), it might be a good idea to measure how much overhead NumPy and PyData/Sparse incur when functions are invoked. That way, we can verify that any observed difference between NumPy and TACO is not just due to the overhead of interpreting Python functions or something trivial like that. This might require modifying NumPy's and PyData/Sparse's source code to insert some timers inside the functions we are benchmarking.
The text was updated successfully, but these errors were encountered: