
SONARPY-2548: Create rule S7470: PySpark's "RDD.groupByKey", when used in conjunction with "RDD.mapValues" with a commutative and associative operation, should be replaced by "RDD.reduceByKey" #4875


Draft: wants to merge 3 commits into base: master

Conversation

github-actions bot commented on Apr 3, 2025

You can preview this rule here (updated a few minutes after each push).

Review

A dedicated reviewer checked the rule description successfully for:

  • logical errors and incorrect information
  • information gaps and missing content
  • text style and tone
  • PR summary and labels follow the guidelines

github-actions bot added the python label on Apr 3, 2025
joke1196 changed the title from "Create rule S7470" to "SONARPY-2548: Create rule S7470: PySpark's "RDD.groupByKey", when used in conjunction with "RDD.mapValues" with a commutative and associative operation, should be replaced by "RDD.reduceByKey"" on Apr 7, 2025
joke1196 force-pushed the rule/add-RSPEC-S7470 branch 3 times, most recently from 0339673 to c675ef1 on April 7, 2025 09:22

thomas-serre-sonarsource left a comment

Looks very good!
Thanks for giving more details than what was in the first draft!
I left only a suggestion to rewrite a sentence. 💪

Comment on lines 8 to 9
When performing aggregations, data is usually shuffled between partitions of course,
this shuffling and its associated cost are needed to compute the result correctly.

Suggested change
- When performing aggregations, data is usually shuffled between partitions of course,
- this shuffling and its associated cost are needed to compute the result correctly.
+ When performing aggregations, data is usually shuffled between partitions.
+ This shuffling is needed to compute the result correctly. It has an associated cost that can impact performance.
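
For readers of this thread, the shuffle cost mentioned in the suggestion can be illustrated with a minimal sketch. It is plain Python, not PySpark, and only imitates the behaviour the rule relies on: `groupByKey` sends every record through the shuffle, while `reduceByKey` first combines values inside each partition. The `partitions` data and both helper functions are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from operator import add

# Hypothetical input: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]


def group_then_sum(parts):
    """Imitates rdd.groupByKey().mapValues(sum): every record is shuffled."""
    shuffled = [pair for part in parts for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, len(shuffled)


def reduce_by_key(parts, op):
    """Imitates rdd.reduceByKey(op): combine inside each partition first."""
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = op(local[key], value) if key in local else value
        # At most one record per key leaves each partition.
        shuffled.extend(local.items())
    totals = {}
    for key, value in shuffled:
        totals[key] = op(totals[key], value) if key in totals else value
    return totals, len(shuffled)


grouped_result, grouped_shuffled = group_then_sum(partitions)
reduced_result, reduced_shuffled = reduce_by_key(partitions, add)

print(grouped_result, grouped_shuffled)
print(reduced_result, reduced_shuffled)
```

Both approaches give the same totals because addition is commutative and associative, but the second one moves fewer records between partitions. With an operation that lacks those properties, the per-partition pre-combination could change the result, which is why the rule title restricts itself to commutative and associative operations.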

joke1196 and others added 3 commits April 8, 2025 09:37
…ion with "RDD.mapValues" with a commutative and associative operation, should be replaced by "RDD.reduceByKey"
joke1196 force-pushed the rule/add-RSPEC-S7470 branch from 48b2e70 to 8a5a51b on April 8, 2025 07:38

sonarqube-next bot commented Apr 8, 2025

Quality Gate passed for 'rspec-tools'

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

sonarqube-next bot commented Apr 8, 2025

Quality Gate passed for 'rspec-frontend'

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

2 participants