Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update clustering.py #608

Merged
merged 4 commits into from
Oct 25, 2024
Merged

Update clustering.py #608

merged 4 commits into from
Oct 25, 2024

Conversation

cosimoagostinelli
Copy link
Contributor

Fixing local_clustering_coefficient function.

I think that the two 'for' cycles are not correct because 'range(dv)' does not identify the hyper-edges to which the node n belongs.
Since we have to go over all the possible pairs of hyperedges (without repetitions) containing the node n, I guess that one possible correct implementation may be: 'for e1, e2 in combinations(ev, 2)'.

Fixing local_clustering_coefficient function
@nwlandry
Copy link
Collaborator

Looks good @cosimoagostinelli! Thanks for the contribution!

@nwlandry
Copy link
Collaborator

The only thing is that can you make sure that the unit tests pass? It looks like they need to be updated.

@maximelucas maximelucas linked an issue Oct 23, 2024 that may be closed by this pull request
@maximelucas
Copy link
Collaborator

The tests failing seem to be in algorithms/test_clustering.py::test_local_clustering_coefficient .
You can see them by clicking "Details" on the right, on this page.
Also if you could add the example (the one you gave to check that it matches the pairwise case) as a test, that would be amazing :)

@cosimoagostinelli
Copy link
Contributor Author

I think that the output given by my version of the function is the correct one, according to the definition proposed in "Properties of metabolic graphs: biological organization or representation artifacts?" (ref. 1 of the local_clustering function) while the test does not match this defeinition.

Indeed, in the test we have 3 nodes and 4 hyperedges: e1=[0,1], e2=[0,2], e3=[1,2], e4=[0,1,2].
Considering the node i, to compute its local hyper clustering coefficient we have to consider all the possible pairs of hyperedges containing i, sum the extra-overlap (EO) between these pairs and then divide by the number of edges containg i.

For instance, for the node 0 we have:

EO(e1,e2) = 1 -- because nodes 1 and 2 are connected through at least another edge.
EO(e1,e4) = 0 -- because all nodes in e1 are also elements of e4.
EO(e2,e4) = 0 -- as before.

Thus, the clustering is given by (1+0+0)/3. This is true for all the nodes as they are equivalent.
So, even if it is counter intuitive, each node of this triangle should have hyper clusterig coefficient 1/3 instead of 1.

@cosimoagostinelli
Copy link
Contributor Author

Does it make sense?
Correct me if I am wrong

@nwlandry
Copy link
Collaborator

Hi @cosimoagostinelli --- absolutely! I agree with you. What I meant was not that your new version of the function is wrong, but rather, could you update the unit tests so that they are also correct and match your new function? Sorry about the confusion.

@cosimoagostinelli
Copy link
Contributor Author

Oh, sorry, I completely misunderstood!
I will update the test as soon as possible. :)

Update test 2 and 6 for local_clutering_coefficient and added a 7th test with respect to pairwise clustering coefficient (networkx).
3: 0.625,
4: 0.5833333333333334,
5: 1.0,
6: 1.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice this make more sense, since 6's two neighbours are connected

}
for n in cc:
assert round(cc[n], 3) == round(true_cc[n], 3)

G = nx.erdos_renyi_graph(50, 0.1, seed=0)
Copy link
Collaborator

@maximelucas maximelucas Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
G = nx.erdos_renyi_graph(50, 0.1, seed=0)
# fix #607 to ensure it matches standard cc in pairwise case
G = nx.erdos_renyi_graph(50, 0.1, seed=0)

just for future reference

@maximelucas
Copy link
Collaborator

maximelucas commented Oct 25, 2024

Mini comment but looks good to me, thanks Cosimo! I'll let Nich approve since I think he wrote this function.

Edit: there's still examples in the docs that are failing the tests. Can you update the example there too?

update docs local_clustering_coefficient
@nwlandry
Copy link
Collaborator

Great work @cosimoagostinelli !! Merging now.

@nwlandry nwlandry self-requested a review October 25, 2024 14:37
@nwlandry nwlandry merged commit e137361 into xgi-org:main Oct 25, 2024
22 checks passed
@cosimoagostinelli
Copy link
Contributor Author

Thank you, happy to contribute!

@maximelucas
Copy link
Collaborator

Congrats on your first PR!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error in local_clustering_coefficient function
3 participants