
Commit 665bef1

[SNOW-1826257]: Refactor docs to provide one place for supported aggregation functions (#2680)
<!--- Please answer these questions before creating your pull request. Thanks! --->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. --->

   Fixes SNOW-1826257

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.
   - [ ] I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: [Thread-safe Developer Guidelines](https://docs.google.com/document/d/162d_i4zZ2AfcGRXojj0jByt8EUq-DrSHPPnTa4QvwbA/edit#bookmark=id.e82u4nekq80k)

3. Please describe how your code solves the related issue.

   There are four functions that provide aggregation - DataFrame.agg, Series.agg, DataFrameGroupBy.agg, and SeriesGroupBy.agg. They share similar code paths, so whenever an aggregation function is added, it generally supports more than one of the APIs. We document each API in a different page though, so code authors need to update 3-4 different docs - which can lead to inconsistent docs (in the case that someone forgets to update one or all of the docs). This refactor moves all documentation of supported aggregation functions to one page, which should help keep the docs consistent and correct.
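
For readers less familiar with the four entry points named in the description, the following is a minimal sketch using native pandas rather than Snowpark pandas (which needs a live Snowflake session). The column names and data are made up for illustration; only the four `agg` call shapes come from the description above.

```python
import pandas as pd

# Toy data, invented for this sketch.
df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# DataFrame.agg: one result per column.
frame_result = df[["x", "y"]].agg("sum")

# Series.agg: a single scalar.
series_result = df["x"].agg("sum")

# DataFrameGroupBy.agg: one row per group, one column per aggregated column.
frame_groupby_result = df.groupby("key").agg("sum")

# SeriesGroupBy.agg: one value per group.
series_groupby_result = df.groupby("key")["x"].agg("sum")
```

All four calls accept the same function name, which is why a single support table can cover them.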
1 parent 2fcf2ff commit 665bef1

5 files changed, +119 -19 lines changed
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+:orphan:
+
+Supported Aggregation Functions
+====================================
+
+This page lists which aggregation functions are supported by ``DataFrame.agg``,
+``Series.agg``, ``DataFrameGroupBy.agg``, and ``SeriesGroupBy.agg``.
+The following table is structured as follows: The first column contains the aggregation function's name.
+The second column is a flag for whether or not the aggregation is supported by ``DataFrame.agg``. The
+third column is a flag for whether or not the aggregation is supported by ``Series.agg``. The fourth column
+is whether or not the aggregation is supported by ``DataFrameGroupBy.agg``. The fifth column is whether or not
+the aggregation is supported by ``SeriesGroupBy.agg``.
+
+.. note::
+    ``Y`` stands for yes (supports distributed implementation), ``N`` stands for no (API simply errors out),
+    and ``P`` stands for partial (meaning some parameters may not be supported yet).
+
+    Both Python builtin and NumPy functions are supported for ``DataFrameGroupBy.agg`` and ``SeriesGroupBy.agg``.
+
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| Aggregation Function        | ``DataFrame.agg`` supports? (Y/N/P) | ``Series.agg`` supports? (Y/N/P) | ``DataFrameGroupBy.agg`` supports? (Y/N/P) | ``SeriesGroupBy.agg`` supports? (Y/N/P) |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``count``                   | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
+|                             | For ``axis=1``, ``Y`` if index is   |                                  |                                            |                                         |
+|                             | not a MultiIndex.                   |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``mean``                    | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
+|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``min``                     | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
+|                             | For ``axis=1``, ``Y`` if index is   |                                  |                                            |                                         |
+|                             | not a MultiIndex.                   |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``max``                     | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
+|                             | For ``axis=1``, ``Y`` if index is   |                                  |                                            |                                         |
+|                             | not a MultiIndex.                   |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``sum``                     | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
+|                             | For ``axis=1``, ``Y`` if index is   |                                  |                                            |                                         |
+|                             | not a MultiIndex.                   |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``median``                  | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
+|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``size``                    | ``Y`` for ``axis=0``.               | ``Y``                            | ``Y``                                      | ``Y``                                   |
+|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``std``                     | ``P`` for ``axis=0`` - only when    | ``P`` - only when ``ddof=0``     | ``P`` - only when ``ddof=0``               | ``P`` - only when ``ddof=0``            |
+|                             | ``ddof=0`` or ``ddof=1``.           | or ``ddof=1``.                   | or ``ddof=1``.                             | or ``ddof=1``.                          |
+|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``var``                     | ``P`` for ``axis=0`` - only when    | ``P`` - only when ``ddof=0``     | ``P`` - only when ``ddof=0``               | ``P`` - only when ``ddof=0``            |
+|                             | ``ddof=0`` or ``ddof=1``.           | or ``ddof=1``.                   | or ``ddof=1``.                             | or ``ddof=1``.                          |
+|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``quantile``                | ``P`` for ``axis=0`` - only when    | ``P`` - only when ``q`` is the   | ``P`` - only when ``q`` is the             | ``P`` - only when ``q`` is the          |
+|                             | ``q`` is the default value or       | default value or a scalar.       | default value or a scalar.                 | default value or a scalar.              |
+|                             | a scalar.                           |                                  |                                            |                                         |
+|                             | ``N`` for ``axis=1``.               |                                  |                                            |                                         |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
+| ``len``                     | ``N``                               | ``N``                            | ``Y``                                      | ``Y``                                   |
++-----------------------------+-------------------------------------+----------------------------------+--------------------------------------------+-----------------------------------------+
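
One way to read the new table is as a lookup from (function, API) to a support flag. The sketch below encodes a few rows that way. The names `APIS`, `SUPPORT`, and `supports` are hypothetical and exist only for this illustration; they are not part of Snowpark pandas, and the `DataFrame.agg` flags shown cover `axis=0` only.

```python
# Hypothetical encoding of a few rows of the support table (illustration only).
APIS = ("DataFrame.agg", "Series.agg", "DataFrameGroupBy.agg", "SeriesGroupBy.agg")

# Flags are listed in the same order as APIS.
# DataFrame.agg entries describe axis=0; axis=1 has extra restrictions in the table.
SUPPORT = {
    "count": ("Y", "Y", "Y", "Y"),
    "mean": ("Y", "Y", "Y", "Y"),
    "std": ("P", "P", "P", "P"),       # partial: only ddof=0 or ddof=1
    "quantile": ("P", "P", "P", "P"),  # partial: only default or scalar q
    "len": ("N", "N", "Y", "Y"),
}


def supports(func_name: str, api: str) -> str:
    """Return the 'Y', 'N', or 'P' flag recorded for a function/API pair.

    Raises KeyError for functions not encoded in SUPPORT and ValueError
    for API names not listed in APIS.
    """
    return SUPPORT[func_name][APIS.index(api)]
```

For example, `supports("len", "Series.agg")` returns `"N"`, matching the last row of the table.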

docs/source/modin/supported/dataframe_supported.rst

Lines changed: 3 additions & 9 deletions
@@ -65,15 +65,9 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``add_suffix``              | Y                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``agg``                     | P                               | ``margins``, ``observed``,       | If ``axis == 0``: ``Y`` when function is one of    |
-|                             |                                 | ``sort``                         | ``count``, ``mean``, ``min``, ``max``, ``sum``,    |
-|                             |                                 |                                  | ``median``, ``size``; ``std`` and ``var``          |
-|                             |                                 |                                  | supported with ``ddof=0`` or ``ddof=1``;           |
-|                             |                                 |                                  | ``quantile`` is supported when ``q`` is the        |
-|                             |                                 |                                  | default value or a scalar.                         |
-|                             |                                 |                                  | If ``axis == 1``: ``Y`` when function is           |
-|                             |                                 |                                  | ``count``, ``min``, ``max``, or ``sum`` and the    |
-|                             |                                 |                                  | index is not a MultiIndex.                         |
+| ``agg``                     | P                               | ``margins``, ``observed``,       | Check                                              |
+|                             |                                 | ``sort``                         | `Supported Aggregation Functions <agg_supp.html>`_ |
+|                             |                                 |                                  | for a list of supported functions.                 |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``aggregate``               | P                               | ``margins``, ``observed``,       | See ``agg``                                        |
 |                             |                                 | ``sort``                         |                                                    |
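
The notes in this hunk restrict ``std`` and ``var`` to ``ddof=0`` or ``ddof=1``. As a reminder of what those two settings compute, here is a small standard-library sketch; it is not Snowpark pandas code, and the sample values are made up.

```python
import statistics

# Invented sample: mean is 5.0, sum of squared deviations is 32.0.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# ddof=0: divide the squared deviations by N (population standard deviation).
population_std = statistics.pstdev(values)

# ddof=1: divide by N - 1 (sample standard deviation, the pandas default for std).
sample_std = statistics.stdev(values)

# The same distinction applies to variance.
population_var = statistics.pvariance(values)
sample_var = statistics.variance(values)
```

Any other ``ddof`` value would divide by ``N - ddof``, which is the case the table marks as unsupported.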

docs/source/modin/supported/groupby_supported.rst

Lines changed: 3 additions & 4 deletions
@@ -30,10 +30,9 @@ Function application
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | GroupBy method              | Snowpark implemented? (Y/N/P/D) | Missing parameters               | Notes for current implementation                   |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``agg``                     | P                               | ``axis`` other than 0 is not     | ``Y``, support functions are count, mean, min, max,|
-|                             |                                 | implemented.                     | sum, median, std, size, len, and var               |
-|                             |                                 |                                  | (including both Python and NumPy functions)        |
-|                             |                                 |                                  | otherwise ``N``.                                   |
+| ``agg``                     | P                               | ``axis`` other than 0 is not     | Check                                              |
+|                             |                                 | implemented.                     | `Supported Aggregation Functions <agg_supp.html>`_ |
+|                             |                                 |                                  | for a list of supported functions.                 |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``aggregate``               | P                               | ``axis`` other than 0 is not     | See ``agg``                                        |
 |                             |                                 | implemented.                     |                                                    |

docs/source/modin/supported/series_supported.rst

Lines changed: 3 additions & 6 deletions
@@ -76,12 +76,9 @@ Methods
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``add_suffix``              | Y                               |                                  |                                                    |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
-| ``agg``                     | P                               |                                  | ``Y`` when function is one of ``count``,           |
-|                             |                                 |                                  | ``mean``, ``min``, ``max``, ``sum``, ``median``,   |
-|                             |                                 |                                  | ``size``; ``std`` and ``var`` supported with       |
-|                             |                                 |                                  | ``ddof=0`` or ``ddof=1``; ``quantile`` is          |
-|                             |                                 |                                  | supported when ``q`` is the default value          |
-|                             |                                 |                                  | or a scalar.                                       |
+| ``agg``                     | P                               |                                  | Check                                              |
+|                             |                                 |                                  | `Supported Aggregation Functions <agg_supp.html>`_ |
+|                             |                                 |                                  | for a list of supported functions.                 |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+
 | ``aggregate``               | P                               |                                  | See ``agg``                                        |
 +-----------------------------+---------------------------------+----------------------------------+----------------------------------------------------+

tests/integ/modin/groupby/test_groupby_basic_agg.py

Lines changed: 48 additions & 0 deletions
@@ -284,6 +284,54 @@ def test_groupby_agg_with_float_dtypes_named_agg() -> None:
     )
 
 
+@pytest.mark.parametrize(
+    "grpby_fn",
+    [
+        lambda gr: gr.quantile(),
+        lambda gr: gr.quantile(q=0.3),
+    ],
+)
+@sql_count_checker(query_count=1)
+def test_groupby_agg_quantile_with_int_dtypes(grpby_fn) -> None:
+    native_df = native_pd.DataFrame(
+        {
+            "col1_grp": ["g1", "g2", "g0", "g0", "g2", "g3", "g0", "g2", "g3"],
+            "col2_int64": np.arange(9, dtype="int64") // 3,
+            "col3_int_identical": [2] * 9,
+            "col4_int32": np.arange(9, dtype="int32") // 4,
+            "col5_int16": np.arange(9, dtype="int16") // 3,
+            "col6_mixed": np.concatenate(
+                [
+                    np.arange(3, dtype="int64") // 3,
+                    np.arange(3, dtype="int32") // 3,
+                    np.arange(3, dtype="int16") // 3,
+                ]
+            ),
+            "col7_int_missing": [5, 6, np.nan, 2, 1, np.nan, 5, np.nan, np.nan],
+            "col8_mixed_missing": np.concatenate(
+                [
+                    np.arange(2, dtype="int64") // 3,
+                    [np.nan],
+                    np.arange(2, dtype="int32") // 3,
+                    [np.nan],
+                    np.arange(2, dtype="int16") // 3,
+                    [np.nan],
+                ]
+            ),
+        }
+    )
+    snowpark_pandas_df = pd.DataFrame(native_df)
+    by = "col1_grp"
+    snowpark_pandas_groupby = snowpark_pandas_df.groupby(by=by)
+    pandas_groupby = native_df.groupby(by=by)
+    eval_snowpark_pandas_result(
+        snowpark_pandas_groupby,
+        pandas_groupby,
+        grpby_fn,
+        comparator=assert_snowpark_pandas_equals_to_pandas_with_coerce_to_float64,
+    )
+
+
 @sql_count_checker(query_count=2)
 def test_groupby_agg_with_int_dtypes(int_to_decimal_float_agg_method) -> None:
     snowpark_pandas_df = pd.DataFrame(
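
To make the expected values of the new quantile test easier to reason about, here is a pure-Python sketch of a per-group quantile for the `col2_int64` column. It assumes pandas' default linear interpolation and NaN-skipping behaviour; `linear_quantile` is a hypothetical helper written for this illustration, not a function from pandas or Snowpark pandas.

```python
import math
from collections import defaultdict


def linear_quantile(values, q=0.5):
    """Quantile with linear interpolation between neighbours; NaN values are skipped."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return math.nan
    position = q * (len(data) - 1)  # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


# Same grouping keys and values as col1_grp / col2_int64 in the test above.
keys = ["g1", "g2", "g0", "g0", "g2", "g3", "g0", "g2", "g3"]
col2_int64 = [i // 3 for i in range(9)]  # equals np.arange(9) // 3

groups = defaultdict(list)
for key, value in zip(keys, col2_int64):
    groups[key].append(value)

# Mirrors the two parametrized cases: quantile() and quantile(q=0.3).
medians = {key: linear_quantile(vals) for key, vals in groups.items()}
q30 = {key: linear_quantile(vals, q=0.3) for key, vals in groups.items()}
```

Note that the results are floats even though the inputs are integers, which is consistent with the test comparing results after coercion to float64.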
