[WIP] : improve current timing script #114
Conversation
Any way to show text for both the time speedup and the time itself?
I often care more about speed-ups when the length of time is large -- the current values shown make it hard to see how the length of time changes between 50 and 300 nodes. Should I care that parallel is slower for 50 nodes?
Any ideas about why it would be slower for 50 nodes? It is using all processors (n_jobs=-1) -- what is making it slower?
We could display both of them by making an annotation dataframe that stores formatted strings. Something like this:
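(A rough seaborn-based sketch; the numbers and labels below are placeholders, not measured results.)

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# placeholder speed-ups and parallel run times (seconds), indexed by
# edge probability (rows) and number of nodes (columns)
speedup = pd.DataFrame([[0.8, 2.1], [1.9, 3.4]], index=[0.5, 1.0], columns=[50, 300])
par_time = pd.DataFrame([[0.02, 1.5], [0.05, 3.8]], index=[0.5, 1.0], columns=[50, 300])

# one formatted string per cell: speed-up on top, run time below it
annot = speedup.round(2).astype(str) + "x\n" + par_time.round(2).astype(str) + "s"

ax = sns.heatmap(speedup, annot=annot, fmt="", cmap="Greens")
ax.set_xlabel("Number of Vertices")
ax.set_ylabel("Edge Probability")
plt.show()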
Another way could be to create two dataframes (one for the parallel run time and another for speed-ups) and overlaying one over the other. For our use case, I think the annotation dataframe would be much simpler.
not sure why the speed-up with 500 nodes is less than with 300 nodes. Could you please help me understand this?
How long does it take to run and create this timing picture? If it isn't very long, try running it a few times and see if the dip in performance between 300 and 500 goes away.
If the results change from run to run, we should try to pick "better" parameters that don't show those slowdowns. Most computers these days have background processes running that can impact timing scripts. To see through the haze of such noise, we could try multiple runs and take the minimum time from those runs. Another approach is to choose larger problems so the small differences from background processes look more like noise.
It's also nice to repeatedly double the parameter (or multiply by a constant factor other than 2, like 10). That allows you to see whether the time is increasing like n^2 or n^3. Doubling nodes will take ~8 times longer for O(n^3). Of course, the big-O is only valid in the limit as n -> infinity, so take it all with a grain of salt. But if it is consistently a factor of 8 longer when doubling, you can be fairly confident it is running an O(n^3) code.
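For instance, a rough sketch of that methodology (illustrative sizes and an illustrative function, not the actual timing script): take the minimum of a few runs per size, keep doubling the size, and look at the ratios of consecutive times.

import timeit
import networkx as nx

times = {}
for n in (100, 200, 400, 800):  # keep doubling the number of nodes
    G = nx.gnp_random_graph(n, 0.5, seed=42)
    # minimum of 3 runs to see through background-process noise
    times[n] = min(timeit.repeat(lambda: nx.betweenness_centrality(G), repeat=3, number=1))

sizes = sorted(times)
for a, b in zip(sizes, sizes[1:]):
    # a ratio consistently near 8 when doubling suggests O(n^3) behavior
    print(f"{a} -> {b} nodes: time ratio {times[b] / times[a]:.1f}")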
In this case, since we don't care so much about graphs with 50 nodes and we do care about large graphs, we should probably increase the number of nodes. Maybe start with 250 nodes and double it up to 4000 for this picture. Lots of problems should really be run on 10_000-100_000 nodes. But that would take a long time to do the timing... :)
I guess the point of these plots is to show the impact of parallel code. So the best results would show a factor near 2 for njobs=2 and near 5 for njobs=5. And the title or caption should say what njobs was for the experiment.
:)
@dschult Thank you for your insights!
How long does it take to run and create this timing picture? If it isn't very long, try running it a few times and see if the dip in performance between 300 and 500 goes away.
It didn’t take long to run, so I executed it a few more times to check if the dip was just a one-off. What I observed was that the speedup would initially peak, but then fall away for larger numbers of nodes.
After some debugging, I found the issue was in the is_reachable function, where num_in_chunk was being passed instead of n_jobs:
num_in_chunk = max(len(G) // n_jobs, 1)
node_chunks = nxp.chunks(G, num_in_chunk)
needs to be replaced with:
node_chunks = nxp.chunks(G, n_jobs)
This is being addressed in PR#112
After applying that fix, I ran it 5 times and considered the minimum time from these runs for both is_reachable (ref. heatmap) and betweenness_centrality (ref. heatmap). However, I still observe a decline in speedup at higher node counts. Could this be due to system limitations?
System details:
MacBook Pro 13-inch M2
macOS Sequoia 15.5
Apple M2 Chip
Memory 8GB
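For reference, the "minimum of 5 runs" measurement could look roughly like this sketch (not the script's actual code; it assumes nx_parallel exposes is_reachable at the package level):

import timeit
import networkx as nx
import nx_parallel as nxp

G = nx.tournament.random_tournament(500, seed=42)

# take the minimum of 5 single-shot runs for each implementation
std_time = min(timeit.repeat(lambda: nx.tournament.is_reachable(G, 1, 2), repeat=5, number=1))
par_time = min(timeit.repeat(lambda: nxp.is_reachable(G, 1, 2), repeat=5, number=1))
print(f"speedup: {std_time / par_time:.2f}x")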
Yay!! So good to hear you found one fix. Too bad there are still some mysteries.
Let's see -- are we running into memory issues? Probably not: 1600 nodes -> ~2.5M edges for the complete graph (prob=1), each taking about 32-128 bytes, so I think the 8GB memory would be OK unless the data has to be copied many times in the parallel version. I think they share memory, but I'm not sure.
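Back-of-envelope, using the 32-128 bytes-per-entry figure above:

n = 1600
entries = n * (n - 1)  # ~2.56M directed adjacency entries for prob=1
for bytes_per_entry in (32, 128):
    print(f"{entries * bytes_per_entry / 1e9:.2f} GB at {bytes_per_entry} bytes/entry")
# roughly 0.08-0.33 GB per copy, so well under 8 GB unless many copies are made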
Let's hope it is another bug. One that involves code that affects both of these functions. I guess start in the utility functions that are called in both functions.
(force-pushed from d12a986 to 6174eb3)
Hi @dschult, I had previously added another commit to move numpy to the test dependencies, but I think it might not have been pulled. Would you like me to reapply that?
Yes -- I'm very sorry to have lost that commit to the PR. I didn't pull the latest changes when I rebased on the main branch. It all looks good now that you reapplied that. 🚀
In the recent commit, I tried to implement memmapping: I replaced NetworkX's graph object with a NumPy-based adjacency matrix, because the problem involved making multiple copies of the input graph for each worker. The heatmap obtained gives speedups ranging from 11x to 25x with this change (view heatmap).
Note: the current implementation assumes that the graph nodes are integers -- I plan to make this more dynamic soon. One potential drawback is the memory usage in the case of sparse graphs.
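Roughly, the idea is to pass the NumPy adjacency matrix to the joblib workers and let joblib memmap it, so each worker reads the array from disk instead of getting a pickled copy. A minimal sketch (not the actual implementation -- the worker function and sizes here are made up):

import numpy as np
import networkx as nx
from joblib import Parallel, delayed

G = nx.fast_gnp_random_graph(2000, 0.5, directed=True, seed=42)
nodelist = list(G)
adj = nx.to_numpy_array(G, nodelist=nodelist, dtype=np.uint8)  # dense adjacency (~4 MB)

def count_out_edges(adj, rows):
    # hypothetical read-only worker: it only reads from the shared array
    return adj[rows].sum(axis=1)

# arrays larger than max_nbytes are automatically memmapped to disk by joblib
results = Parallel(n_jobs=-1, max_nbytes="1M")(
    delayed(count_out_edges)(adj, chunk)
    for chunk in np.array_split(np.arange(len(nodelist)), 4)
)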
Interesting that they have special tools for numpy arrays, but not for python objects.

nodelist = list(G)  # map of integers to nodes
nodemap = {n: i for i, n in enumerate(nodelist)}

Those are good speeds -- so it looks like memory might be an issue more generally too. It might be possible to use a numpy matrix in a more memory-efficient way than the adjacency matrix. Another approach would be to use a scipy sparse array. So, there are lighter-memory ways to handle adjacencies. But let's not worry about that yet.

Do I understand the memmap model? The array is written to a file and then each process has read access to that file. So it uses hard disk space instead of memory to share the data. It looks like there is a Python standard library version of this called mmap. I wonder if that works well with dicts. But numpy is very fast so it might be better to just stick with numpy.

Very cool!
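For example, a scipy sparse adjacency could look like this (just an illustration, not code from this PR):

import networkx as nx

G = nx.gnp_random_graph(1000, 0.01, seed=42)
A = nx.to_scipy_sparse_array(G)  # CSR format: stores only the existing edges
row = 0
neighbors_of_0 = A.indices[A.indptr[row]:A.indptr[row + 1]]  # adjacency of node 0
memory_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
print(len(neighbors_of_0), "neighbors;", memory_bytes, "bytes for the whole adjacency")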
I like this idea of using an NxM numpy array as an adjacency list -- it would be an efficient way of expressing it. I'll work on this approach. Thanks for the insights :)
Yep, that’s pretty much it!
+1 for moving the adjacency stuff to a separate PR. :)
Do we still need this file?
I think we can delete that file too.
This reverts commit 472d395.
(force-pushed from a82b405 to f45627d)
timing/timing_all_functions.py (Outdated)
I'm not sure if we should remove this file -- because we do wish to someday have a heatmap (or some other visual) to showcase all (or most) of the algorithms' speedups. LMKWYT. Also, we still have the heatmap generated by this script - https://github.com/networkx/nx-parallel/blob/main/timing/heatmap_all_functions.png
How would we manage the heatmap as the number of algorithms grows? Would we do something like regenerating this heatmap every time a new algorithm is added? I'm just wondering what the long-term plan may possibly be!
I meant something like this - https://github.com/python-graphblas/graphblas-algorithms/blob/main/README.md -- it doesn't have to be a heatmap.
We don't have to showcase all algorithms -- only 1-2 algorithms of each kind (like shortest paths, centrality measures, etc.) and on real-world networks.
timing/timing_comparison.md (Outdated)
I think we should keep this file (and populate it with all other heatmaps) because it contains important info like the machine's specifications and the heatmap results. I think here we can also mention other software specifications, like backend="loky", etc.
Maybe we should rename it to README.md (in the timing folder) so that it's visible as soon as someone opens the folder.
Also, currently, not all heatmaps are run on the same machine. These are very old machine specifications. LMKWYT.
# Default Config
joblib.parallel_config(n_jobs=-1)
# To use NetworkX's parallel backend, set the following configuration.
# nx.config.backends.parallel.active = True
# nx.config.backends.parallel.n_jobs = -1
No need for these comments -- we expect that the reader of the timing script knows how to use/configure nx-parallel. LMKWYT.
Suggested change -- replace:

# Default Config
joblib.parallel_config(n_jobs=-1)
# To use NetworkX's parallel backend, set the following configuration.
# nx.config.backends.parallel.active = True
# nx.config.backends.parallel.n_jobs = -1

with:

# Setting configs: Using all CPU cores
joblib.parallel_config(n_jobs=-1)
We wouldn’t need this anymore since we’re already updating the NetworkX configs to -1, right?
yes but that PR is not merged yet-- what merge order of these PRs would you prefer? should we merge this PR second?
stdTime = t2 - t1
tournament_funcs = ["is_reachable", "tournament_is_strongly_connected"]
bipartite_funcs = ["node_redundancy"]
random.seed(42)
Do we need to set the seed here?
I had included it for reproducibility, but happy to remove it if there's no need.
yes, but why do we need a global seed? is it being used by any function(s) below?
using "line"(top line, bottom line) is a bit confusing-- because usually in the context of graphs(not networks) a line might mean line graph. So, "value" or "number" might be more apt. or we can also color code the values
I tried running this script and I got a lot of findfont: Font family 'Arial Rounded MT Bold' not found. warnings at the end.
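One possible fix (an assumption on my part, not something this PR currently does) would be to fall back to a font that ships with matplotlib:

import matplotlib.pyplot as plt

# DejaVu Sans is bundled with matplotlib, so it is available on every machine
plt.rcParams["font.family"] = "DejaVu Sans"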
Also, for betweenness_centrality it ran once and then it started running again --
(nxp-dev) @Schefflera-Arboricola ➜ /nx-parallel (improve-timing) $ python timing/timing_individual_function.py
200
0.24375876500016602
0.4948986529998365
Finished <function betweenness_centrality at 0x7173ebb4d440>
400
2.0439088319999428
3.804219898000156
Finished <function betweenness_centrality at 0x7173ebb4d440>
800
17.221894170999803
29.955433169000116
Finished <function betweenness_centrality at 0x7173ebb4d440>
1600
139.0610084220002
237.87486764899995
Finished <function betweenness_centrality at 0x7173ebb4d440>
200
0.2752337769998121
0.5823823759992592
Finished <function betweenness_centrality at 0x7173ebb4d440>
400
2.3135327339996365
4.701841838999826
Finished <function betweenness_centrality at 0x7173ebb4d440>
800
19.94758787899991
...
Please make sure the script runs with all kinds of algorithms, i.e. that all code blocks run fine.
I think this is because for algorithms like betweenness centrality the script checks multiple edge probabilities, i.e. p = [1, 0.8, 0.6, 0.4, 0.2], which isn't the case for tournament graphs (where edge probability is always 1). I'm assuming this is what you meant when you said it started running again! Pls let me know if I've misunderstood anything.
ok, thank you for clarifying -- it would be helpful to print out the edge probability in the output
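Something like this (the variable names here are guesses based on the thread, not the script's actual ones):

for p in [1, 0.8, 0.6, 0.4, 0.2]:
    for num_nodes in [200, 400, 800, 1600]:
        print(f"p={p}, nodes={num_nodes}")
        # ... time the sequential and parallel runs for this (p, num_nodes) pair ...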
ax.set_xlabel("Number of Vertices", fontweight="bold", fontsize=12) | ||
ax.set_ylabel("Edge Probability", fontweight="bold", fontsize=12) | ||
|
||
n_jobs = nxp.get_n_jobs() |
I'd recommend using get_active_backend()[1] instead of get_n_jobs here -- because the first one gives the n_jobs that joblib has stored and used while running the code (and we are using joblib's config system in this script).
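A tiny sketch of what I mean, assuming joblib.parallel.get_active_backend(), which returns a (backend, n_jobs) pair:

import joblib
from joblib.parallel import get_active_backend

joblib.parallel_config(n_jobs=-1)  # same config call the script already uses
n_jobs = get_active_backend()[1]  # the n_jobs value joblib has stored
print(n_jobs)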
This reverts commit f45627d.
This reverts commit 924bc27.
Co-authored-by: Aditi Juneja <[email protected]>
improves #51