Commit 8e7e833

fixes after second review

1 parent e1877bb

1 file changed: +16 -2

rules/S7469/python/rule.adoc

Lines changed: 16 additions & 2 deletions
@@ -8,7 +8,7 @@ In PySpark, a `DataFrame` with duplicate column names can cause ambiguous and un
 * Joins with other DataFrames may produce unexpected results or errors
 * Saving to external data sources may fail
 
-Case-insensitive duplicates, for example a column named "name" and "Name", are also flagged. This is because having column names that differ only in casing creates confusion when referencing columns and makes code harder to understand and maintain. This can lead to subtle bugs that are difficult to detect and fix.
+Case-insensitive duplicates, for example a column named "name" and "Name", are also flagged. This is because having column names that differ only in casing creates confusion when referencing columns and makes code harder to understand and maintain, leading to subtle bugs that are difficult to detect and fix.
 
 == How to fix it
 To fix this issue, remove or rename the duplicate columns.
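
For reference, a minimal sketch of the pattern the rule targets and the fix it describes, assuming a local `spark` session and example data (illustrative only, not part of this commit's diff):

[source,python]
----
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
data = [(1, "Alice", 28), (2, "Bob", 35)]

# Duplicate "name" columns: later references to "name" are ambiguous
df = spark.createDataFrame(data, ["id", "name", "name"])  # Noncompliant

# Renaming the duplicate gives every column a unique name
df = spark.createDataFrame(data, ["id", "name", "age"])  # Compliant
----
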
@@ -41,7 +41,7 @@ df = spark.createDataFrame(data, ["id", "name", "age"]) # Compliant
 
 == Resources
 === Documentation
-- PySpark Documentation - https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/best_practices.html#do-not-use-duplicated-column-names[Best Practices]
+* PySpark Documentation - https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.SparkSession.createDataFrame.html[SparkSession.createDataFrame]
 
 ifdef::env-github,rspecator-view[]
 === Implementation Specification
@@ -89,6 +89,20 @@ nested_schema = StructType([
 
 In addition to that, parts can be passed as variables instead of literals. This seems to be especially common for schemas.
 
+This rule could also apply to pandas `DataFrame`s, as well as the `DataFrame`s from the Pandas API on Spark. However, this would increase the scope of an already big rule. Depending on whether the implementation raises on pandas (or Pandas API on Spark) `DataFrame`s or not, the rule description should be updated to reflect this. Below are examples of how to construct a pandas `DataFrame` with duplicate column names.
+
+[source,python]
+----
+import pandas as pd
+# the examples below also work with pyspark.pandas
+import pyspark.pandas as ps
+
+pd.DataFrame(data=[[1, 2]], columns=["name", "name"]) # Noncompliant
+pd.DataFrame.from_dict(data={"row_1": [1, 2], "row_2": [3, 4]}, orient="index", columns=["name", "name"]) # Noncompliant
+pd.DataFrame.from_records(data=[(3, 'a'), (2, 'b'), (1, 'c'), (0, 'd')], columns=['col', 'col']) # Noncompliant
+----
+
+Documentation for best practices for pandas on Spark: https://spark.apache.org/docs/latest/api/python/user_guide/pandas_on_spark/best_practices.html#do-not-use-duplicated-column-names[Best Practices]
 
 === Message
 
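The implementation note above mentions that column names and schemas are often passed as variables rather than literals; below is a minimal sketch of what that looks like, with hypothetical variable names and data chosen for illustration:

[source,python]
----
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# The column names arrive through a variable, so the duplicate only
# becomes visible once `columns` is resolved
columns = ["id", "name", "name"]
df = spark.createDataFrame([(1, "a", "b")], columns)  # Noncompliant

# The duplicate comes from a schema object built separately
schema = StructType([
    StructField("name", StringType()),
    StructField("name", StringType()),
])
df = spark.createDataFrame([("a", "b")], schema)  # Noncompliant
----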