Skip to content

scenario_list fails with mixture of data types #248

@kgoldfeld

Description

@kgoldfeld

Apologies in advance - I used Claude to help draft this - so it does not sound like me :(. But it does a good job of representing the motivating problem, solution, and potential issues.

Problem

Currently, scenario_list() returns a list of named vectors. This works well when all parameters are the same type, but when parameters have mixed types (e.g., numeric and character), R coerces the entire vector to character. This makes numeric values unusable without manual conversion.

Example:

scenarios <- scenario_list(
  effect = c(0.5, 1.0),      # numeric
  method = c("A", "B")        # character
)

scenarios[[1]]
#   effect   method scenario 
#    "0.5"      "A"      "1"    # effect is now character, not numeric!

Root Cause

The function uses:

  1. expand.grid() - creates a data.frame (handles mixed types correctly)
  2. asplit() - converts data.frame rows to vectors (forces type coercion)

Proposed Solution

Replace asplit() with split(), which creates a list of single-row data.frames instead of vectors. Data.frames naturally preserve mixed types.

Implementation:

# Current (line ~XX):
scenarios <- asplit(argmat, MARGIN = 1)

# Proposed:
scenarios <- split(x = argmat, f = seq_len(nrow(argmat)))

Result:

scenarios[[1]]
#   effect method scenario
#      0.5      A        1   # types preserved!

Breaking Change Concern

This change may break existing user code that converts scenarios to data.tables.

Current pattern (that will break):

argsvec <- scenarios[[1]]  # named vector
summary_stats <- data.table(t(argsvec), power = 0.8)
# t() transposes vector to single row

New pattern (required after change):

argsvec <- scenarios[[1]]  # single-row data.frame
summary_stats <- data.table(argsvec, power = 0.8)
# No transpose needed - more intuitive!

# Alternative:
summary_stats <- as.data.table(argsvec)[, power := 0.8]

With the change, data.table(t(argsvec)) will produce incorrect results because t() on a data.frame creates a matrix with different behavior.

Questions for Discussion

  1. Is this breaking change justified? The current behavior produces incorrect results with mixed types, which seems like a more serious issue than breaking code that uses t().

  2. Migration strategy options:

    • Document as breaking change with clear migration guide in NEWS
    • Add temporary package option for legacy behavior (e.g., options(simstudy.legacy_scenarios = TRUE))
    • Provide helper function (e.g., scenario_to_dt()) for easier conversion
    • Detect and warn when old pattern is used (harder to implement)
  3. Version bump: Should this be a major version bump given the breaking change?

  4. Communication: Update vignettes, add prominent NEWS entry, consider blog post about migration?

Recommendation (from Claude)

Proceed with the change because:

  • Fixes a real bug (incorrect behavior with mixed types)
  • New pattern is more intuitive (no transpose needed)
  • Migration is straightforward for affected users
  • Properly documented breaking change is better than silent incorrect behavior

I am inclined to agree with this approach, because I doubt it affects too many people, maybe only me, but need to think about it.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions