SONARPY-2548: Create rule S7470: PySpark's "RDD.groupByKey", when used in conjunction with "RDD.mapValues" with a commutative and associative operation, should be replaced by "RDD.reduceByKey" #4875

github-actions · 2025-04-03T14:40:01Z

You can preview this rule here (updated a few minutes after each push).

Review

A dedicated reviewer checked the rule description successfully for:

logical errors and incorrect information
information gaps and missing content
text style and tone
PR summary and labels follow the guidelines

thomas-serre-sonarsource

Looks very good!
Thanks for giving more details than what was in the first draft!
I left only a suggestion to rewrite a sentence. 💪

thomas-serre-sonarsource · 2025-04-08T07:13:44Z

rules/S7470/python/rule.adoc

+When performing aggregations, data is usually shuffled between partitions of course, 
+this shuffling and its associated cost are needed to compute the result correctly.


Suggested change

-When performing aggregations, data is usually shuffled between partitions of course,
-this shuffling and its associated cost are needed to compute the result correctly.
+When performing aggregations, data is usually shuffled between partitions.
+This shuffling is needed to compute the result correctly. It has an associated cost that can impact performance.
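
To make the shuffle cost mentioned in this suggestion concrete, here is a minimal sketch in plain Python (not PySpark, and not taken from the rule description). It models two hypothetical partitions and counts how many records would cross the shuffle boundary for each approach. The names `partitions`, `group_then_sum`, and `reduce_by_key` are illustrative assumptions; the comments note the PySpark calls they are meant to imitate.

```python
from collections import defaultdict
from operator import add

# Hypothetical input: two partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]


def group_then_sum(parts):
    """Rough model of rdd.groupByKey().mapValues(sum).

    Every record is sent through the shuffle before any aggregation happens.
    """
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}, len(shuffled)


def reduce_by_key(parts, op):
    """Rough model of rdd.reduceByKey(op).

    Values are combined inside each partition first, so at most one record
    per key and per partition is shuffled. This is only valid because op is
    commutative and associative.
    """
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = op(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result = {}
    for key, value in shuffled:
        result[key] = op(result[key], value) if key in result else value
    return result, len(shuffled)


grouped_result, grouped_shuffled = group_then_sum(partitions)
reduced_result, reduced_shuffled = reduce_by_key(partitions, add)
```

In this toy model both approaches produce the same sums, but the grouped version moves all six records while the reduced version moves four. Real Spark behaviour depends on partitioning and data distribution, so treat the counts as an illustration of the idea rather than a measurement.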

sonarqube-next · 2025-04-08T07:39:57Z

Quality Gate passed for 'rspec-tools'

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

sonarqube-next · 2025-04-08T07:41:06Z

Quality Gate passed for 'rspec-frontend'

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

github-actions bot assigned joke1196 Apr 3, 2025

github-actions bot added the python label Apr 3, 2025

joke1196 changed the title ~~Create rule S7470~~ SONARPY-2548: Create rule S7470: PySpark's "RDD.groupByKey", when used in conjunction with "RDD.mapValues" with a commutative and associative operation, should be replaced by "RDD.reduceByKey" Apr 7, 2025

joke1196 force-pushed the rule/add-RSPEC-S7470 branch 3 times, most recently from 0339673 to c675ef1 · April 7, 2025 09:22

joke1196 requested a review from thomas-serre-sonarsource April 7, 2025 09:46

thomas-serre-sonarsource approved these changes Apr 8, 2025

joke1196 and others added 3 commits April 8, 2025 09:37

Create rule S7470

5c5e941

Create rule S7470: PySpark's "RDD.groupByKey", when used in conjuct…

858aaa3

…ion with "RDD.mapValues" with a commutative and associative operation, should be replaced by "RDD.reduceByKey"

Fix after review

8a5a51b

joke1196 force-pushed the rule/add-RSPEC-S7470 branch from 48b2e70 to 8a5a51b · April 8, 2025 07:38

joke1196 requested a review from jean-jimbo-sonarsource April 8, 2025 08:30
