small_world calculations #141

Open
@wingedRuslan

This is a follow-up issue after discussing #131

Right now the plotting function recalculates the small coefficient of "real_graph" relative to the random graphs every time it is called. That makes it slow, which is not OK for a plotting function.

Our discussion led to two ways to solve it:

  1. Calculate small_world once and save the values as a property of GraphBundle, which means adding a field to the GraphBundle class. When plotting the small coefficient values, we could check whether they already exist and skip recalculating small_world.

  2. Reduce the time needed to calculate the small_world values.
    @KirstieJane asked me to show which step takes the most time. The breakdown follows under "Code under the hood".
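Before that breakdown, here is a rough sketch of what option 1 could look like. It is not the actual GraphBundle code: `CachingBundle`, its `small_world` field, and the `coefficient` argument are all hypothetical names, and plain strings stand in for graphs so the snippet runs on its own.

```python
class CachingBundle(dict):
    """Hypothetical stand-in for GraphBundle: a dict of graphs plus a cache field."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Hypothetical field: {gname: {other_graph_name: small coefficient}}
        self.small_world = {}

    def report_small_world(self, gname, coefficient):
        # Compute only on the first request for gname; later calls reuse the cache.
        if gname not in self.small_world:
            self.small_world[gname] = {
                name: coefficient(self[gname], graph)
                for name, graph in self.items()
            }
        return self.small_world[gname]


calls = []  # records every (G, R) pair the slow function is asked about


def slow_coefficient(G, R):
    # Placeholder for the expensive calculation; the value itself is made up.
    calls.append((G, R))
    return 1.0


bundle = CachingBundle(real_graph="G", random0_graph="R0", random1_graph="R1")
first = bundle.report_small_world("real_graph", slow_coefficient)
second = bundle.report_small_world("real_graph", slow_coefficient)
print(len(calls))  # 3: the second call is served from the cache
```

One thing to watch with this approach: the cache goes stale if graphs are added to or changed in the bundle after the first call, so it would need to be cleared in those cases.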

Code under the hood

%%time
# Calculate the small coefficient of `gname` relative to each other graph in GraphBundle
bundleGraphs.report_small_world("real_graph")

What report_small_world does is the following:

%%time
global_dict = {}
for name, graph in bundleGraphs.items():
    global_dict[name] = small_coefficient(bundleGraphs["real_graph"], graph)

import networkx as nx

# Calculate the small coefficient of G relative to R
def small_coefficient(G, R):
    return small_world_sigma((nx.average_clustering(G),
                              nx.average_shortest_path_length(G)),
                             (nx.average_clustering(R),
                              nx.average_shortest_path_length(R)))

# Compute small world sigma from tuples
def small_world_sigma(tupleG, tupleR):
    Cg, Lg = tupleG
    Cr, Lr = tupleR
    return (Cg / Cr) / (Lg / Lr)
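To make the formula concrete, here is a small worked example. The measure values are made up for illustration and do not come from any real graph; each tuple is (average clustering, average shortest path length).

```python
def small_world_sigma(tupleG, tupleR):
    """Return (Cg / Cr) / (Lg / Lr) from (clustering, path length) tuples."""
    Cg, Lg = tupleG
    Cr, Lr = tupleR
    return (Cg / Cr) / (Lg / Lr)


# Illustrative, made-up values: (average clustering, average shortest path length)
real_measures = (0.5, 4.0)
random_measures = (0.125, 2.0)

sigma = small_world_sigma(real_measures, random_measures)
print(sigma)  # (0.5 / 0.125) / (4.0 / 2.0) = 4.0 / 2.0 = 2.0
```

Note that the function itself is just arithmetic on four numbers, so it is essentially free. All of the time goes into producing those four numbers.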

Time to execute code:

image

image

Looking at what report_small_world essentially does, it is easy to see that for each pair ("real_graph", "random[i]_graph") we calculate average_clustering and average_shortest_path_length from scratch, so the measures of "real_graph" in particular get recalculated on every iteration.

image
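To put a number on the repetition, here is a small counting sketch. It does not use networkx: the two measure functions are stand-ins that only record which graph they were asked about and return made-up values, and the graph names (one real graph plus ten random ones) are hypothetical.

```python
from collections import Counter

calls = Counter()


def average_clustering(graph_name):
    # Stand-in for nx.average_clustering: only records the call.
    calls[("clustering", graph_name)] += 1
    return 0.5


def average_shortest_path_length(graph_name):
    # Stand-in for nx.average_shortest_path_length: only records the call.
    calls[("path_length", graph_name)] += 1
    return 2.0


def small_coefficient(G, R):
    # Same shape as the original: both graphs are measured on every call.
    Cg, Lg = average_clustering(G), average_shortest_path_length(G)
    Cr, Lr = average_clustering(R), average_shortest_path_length(R)
    return (Cg / Cr) / (Lg / Lr)


# Hypothetical bundle contents: one real graph and ten random graphs.
names = ["real_graph"] + [f"random{i}_graph" for i in range(10)]
sigmas = {name: small_coefficient("real_graph", name) for name in names}

total_calls = sum(calls.values())
real_calls = calls[("clustering", "real_graph")] + calls[("path_length", "real_graph")]
needed_calls = 2 * len(names)  # two measures per graph, each computed once

print(total_calls)   # 44
print(real_calls)    # 24 of them are on "real_graph" alone
print(needed_calls)  # 22 would be enough
```

So with 11 graphs, half of the expensive calls are redundant, and more than half of all calls go to the same graph.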

That is wasteful, because these measure values are already stored as properties of each graph, so there is no need to calculate them over and over:

bundleGraphs["<graph-name>"].graph["global_measures"]

So changing small_coefficient() to read the already available values, rather than calculating them again, makes the small_world calculations really fast:

def small_coefficient(G, R):
    # Reuse the measures already stored on each graph instead of recalculating them
    return small_world_sigma((G.graph["global_measures"]["average_clustering"],
                              G.graph["global_measures"]["average_shortest_path_length"]),  # noqa
                             (R.graph["global_measures"]["average_clustering"],
                              R.graph["global_measures"]["average_shortest_path_length"]))  # noqa
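Here is a self-contained sketch of the lookup-based version in use. To keep it runnable without networkx, `make_graph` is a hypothetical helper that builds a stand-in object with only a `.graph` attribute dict, and the stored measure values are made up. The keys match the ones used above.

```python
from types import SimpleNamespace


def small_world_sigma(tupleG, tupleR):
    Cg, Lg = tupleG
    Cr, Lr = tupleR
    return (Cg / Cr) / (Lg / Lr)


def small_coefficient(G, R):
    # Read the stored measures instead of recomputing them.
    g = G.graph["global_measures"]
    r = R.graph["global_measures"]
    return small_world_sigma(
        (g["average_clustering"], g["average_shortest_path_length"]),
        (r["average_clustering"], r["average_shortest_path_length"]),
    )


def make_graph(clustering, path_length):
    # Hypothetical stand-in for a graph: only the .graph attribute dict is modelled.
    return SimpleNamespace(graph={"global_measures": {
        "average_clustering": clustering,
        "average_shortest_path_length": path_length,
    }})


# Made-up measure values, for illustration only.
bundle = {
    "real_graph": make_graph(0.5, 4.0),
    "random0_graph": make_graph(0.125, 2.0),
    "random1_graph": make_graph(0.25, 2.0),
}

real = bundle["real_graph"]
small_world = {name: small_coefficient(real, graph) for name, graph in bundle.items()}
print(small_world)
# {'real_graph': 1.0, 'random0_graph': 2.0, 'random1_graph': 1.0}
```

This assumes the global measures have already been calculated and stored for every graph in the bundle. If they have not, the lookup raises a KeyError, so the real function may need to calculate them first or fail with a clearer message.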

Thanks for reading till the end :)

P.S. This issue is so long because my initial goal was to document which part of the code takes the most time.
