Description
This is a follow-up issue after discussing #131.
Right now the plotting function calculates the small coefficient of "real_graph" relative to the random graphs every time it is called. That makes it slow, which is not acceptable for a plotting function.
Our discussion led to two ways to solve it:
- Calculate small_world once and save the values as a property of GraphBundle, which means adding a field to the GraphBundle class. When plotting small coefficient values, we could check whether these values already exist, so small_world does not need to be calculated every time.
- Reduce the time needed for calculating small_world values.
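The first option could look roughly like the sketch below. It is only an illustration under assumptions: `CachedBundle`, its `_small_world` field and the `coefficient` argument are hypothetical names chosen for this example, not the existing GraphBundle API.

```python
class CachedBundle(dict):
    """Hypothetical stand-in for GraphBundle: a dict of name -> graph."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Hypothetical new field: cached results keyed by reference graph name
        self._small_world = {}

    def report_small_world(self, gname, coefficient):
        """Return {name: coefficient(self[gname], graph)}, computed only once."""
        if gname not in self._small_world:
            self._small_world[gname] = {
                name: coefficient(self[gname], graph)
                for name, graph in self.items()
            }
        return self._small_world[gname]


# Record every call so we can see how often the expensive step runs
calls = []


def fake_coefficient(G, R):
    """Placeholder for the real (slow) small coefficient calculation."""
    calls.append((G, R))
    return 1.0


bundle = CachedBundle(real_graph="G", random0_graph="R0")
first = bundle.report_small_world("real_graph", fake_coefficient)
second = bundle.report_small_world("real_graph", fake_coefficient)  # served from cache
```

The second call returns the stored dictionary without invoking `fake_coefficient` again. A real implementation would also need to decide when to invalidate the cache, for example when graphs are added to the bundle.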
@KirstieJane asked me to show which step takes the most time. Here it is:
Code under the hood
%%time
# Calculate the small coefficient of `gname` relative to each other graph in GraphBundle
bundleGraphs.report_small_world("real_graph")
What report_small_world does is the following:
%%time
global_dict = {}
for name, graph in bundleGraphs.items():
    global_dict[name] = small_coefficient(bundleGraphs["real_graph"], graph)

# Calculate the small coefficient of G relative to R
def small_coefficient(G, R):
    return small_world_sigma((nx.average_clustering(G),
                              nx.average_shortest_path_length(G)),
                             (nx.average_clustering(R),
                              nx.average_shortest_path_length(R)))

# Compute small world sigma from tuples
def small_world_sigma(tupleG, tupleR):
    Cg, Lg = tupleG
    Cr, Lr = tupleR
    return ((Cg/Cr)/(Lg/Lr))
Time to execute code:
Looking at what report_small_world essentially does, it is easy to see that for each pair ("real_graph", "random[i]_graph") we calculate average_clustering and average_shortest_path_length again and again, including those of "real_graph" itself, which are recomputed for every pair.
That is wasteful, because these measure values are already stored as properties of each graph, so there is no need to calculate them over and over:
bundleGraphs["<graph-name>"].graph["global_measures"]
So changing small_coefficient() to read the already available values, rather than calculating them again, makes the small_world calculations really fast:
def small_coefficient(G, R):
    # Read the measures already stored on each graph instead of recomputing them
    return small_world_sigma((G.graph["global_measures"]["average_clustering"],
                              G.graph["global_measures"]["average_shortest_path_length"]),  # noqa
                             (R.graph["global_measures"]["average_clustering"],
                              R.graph["global_measures"]["average_shortest_path_length"]))  # noqa
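To make the effect concrete, here is a minimal self-contained sketch of the lookup-based version. It assumes each graph exposes a `graph` dictionary with a precomputed "global_measures" entry, as described above; the `SimpleNamespace` objects, the `make_graph` helper and the numbers are made-up stand-ins for real networkx graphs, not measured values.

```python
from types import SimpleNamespace


def small_world_sigma(tupleG, tupleR):
    """Compute small world sigma from (clustering, path length) tuples."""
    Cg, Lg = tupleG
    Cr, Lr = tupleR
    return (Cg / Cr) / (Lg / Lr)


def small_coefficient(G, R):
    """Small coefficient of G relative to R, using stored measures only."""
    mG = G.graph["global_measures"]
    mR = R.graph["global_measures"]
    return small_world_sigma(
        (mG["average_clustering"], mG["average_shortest_path_length"]),
        (mR["average_clustering"], mR["average_shortest_path_length"]),
    )


def make_graph(clustering, path_length):
    """Hypothetical stand-in for a graph whose measures were computed earlier."""
    measures = {
        "average_clustering": clustering,
        "average_shortest_path_length": path_length,
    }
    return SimpleNamespace(graph={"global_measures": measures})


# Made-up illustrative values, not real results
bundle = {
    "real_graph": make_graph(0.5, 2.5),
    "random0_graph": make_graph(0.125, 2.0),
}

# Same loop as in report_small_world, but with no graph traversal at all
global_dict = {
    name: small_coefficient(bundle["real_graph"], graph)
    for name, graph in bundle.items()
}
```

Each call is now a handful of dictionary lookups and divisions, so the cost of the loop no longer depends on the size of the graphs.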
Thanks for reading till the end :)
P.S. This issue is so long because my initial goal was to document which part of the code takes the most time.