slice_min() differs from dplyr behavior in the presence of NULLs #1599

@MichaelChirico

Description

conn |>
  tbl(sql("SELECT 1 AS a UNION ALL SELECT NULL as a UNION ALL SELECT 3 AS a")) |>
  slice_min(a, n=2)
# # Source:   SQL [?? x 1]
# # Database: MyConnection
#         a
#   <int64>
# 1      NA
# 2       1

vs.

data.frame(a = c(1, NA, 3)) |>
  slice_min(a, n=2)
#   a
# 1 1
# 2 3

This seems at odds with the documentation, where na_rm = TRUE is the intended default. The generated SQL doesn't contain any NULL handling, either:

conn |>
  tbl(sql("SELECT 1 AS a UNION ALL SELECT NULL as a UNION ALL SELECT 3 AS a")) |>
  slice_min(a, n=2) |>
  show_query()
# <SQL>
# SELECT `a`
# FROM
#   (
#     SELECT `q01`.*, RANK() OVER (ORDER BY `a`) AS `col01`
#     FROM
#       (SELECT 1 AS a UNION ALL SELECT NULL AS a UNION ALL SELECT 3 AS a) `q01`
#   ) `q01`
# WHERE (`col01` <= 2)
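
A possible workaround until the translation handles this is to drop the NULLs explicitly before slicing. This is only a minimal sketch: it uses an in-memory SQLite table from `dbplyr::memdb_frame()` (which needs RSQLite installed) as a stand-in for the connection above, so NULL ordering on other backends may differ.

```r
library(dplyr)
library(dbplyr)

# In-memory SQLite table standing in for the original connection
# (an assumption for illustration; the issue used a different backend).
lazy <- memdb_frame(a = c(1, NA, 3))

# Filter out NULLs first so they can never receive a rank in the
# RANK() OVER (ORDER BY `a`) window that slice_min() generates.
result <- lazy |>
  filter(!is.na(a)) |>
  slice_min(a, n = 2) |>
  collect()
```

With the filter in place, `result` should hold the rows 1 and 3, matching the data.frame behavior shown earlier. Row order after `collect()` is not guaranteed, so sort locally if order matters.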
