-
Notifications
You must be signed in to change notification settings - Fork 9
Description
Apologies in advance - I used Claude to help draft this - so it does not sound like me :(. But it does a good job of representing the motivating problem, solution, and potential issues.
Problem
Currently, scenario_list() returns a list of named vectors. This works well when all parameters are the same type, but when parameters have mixed types (e.g., numeric and character), R coerces the entire vector to character. This makes numeric values unusable without manual conversion.
Example:
scenarios <- scenario_list(
effect = c(0.5, 1.0), # numeric
method = c("A", "B") # character
)
scenarios[[1]]
# effect method scenario
# "0.5" "A" "1" # effect is now character, not numeric!
Root Cause
The function uses:
expand.grid()- creates a data.frame (handles mixed types correctly)asplit()- converts data.frame rows to vectors (forces type coercion)
Proposed Solution
Replace asplit() with split(), which creates a list of single-row data.frames instead of vectors. Data.frames naturally preserve mixed types.
Implementation:
# Current (line ~XX):
scenarios <- asplit(argmat, MARGIN = 1)
# Proposed:
scenarios <- split(x = argmat, f = seq_len(nrow(argmat)))
Result:
scenarios[[1]]
# effect method scenario
# 0.5 A 1 # types preserved!
Breaking Change Concern
This change may break existing user code that converts scenarios to data.tables.
Current pattern (that will break):
argsvec <- scenarios[[1]] # named vector
summary_stats <- data.table(t(argsvec), power = 0.8)
# t() transposes vector to single row
New pattern (required after change):
argsvec <- scenarios[[1]] # single-row data.frame
summary_stats <- data.table(argsvec, power = 0.8)
# No transpose needed - more intuitive!
# Alternative:
summary_stats <- as.data.table(argsvec)[, power := 0.8]
With the change, data.table(t(argsvec)) will produce incorrect results because t() on a data.frame creates a matrix with different behavior.
Questions for Discussion
-
Is this breaking change justified? The current behavior produces incorrect results with mixed types, which seems like a more serious issue than breaking code that uses
t(). -
Migration strategy options:
- Document as breaking change with clear migration guide in NEWS
- Add temporary package option for legacy behavior (e.g.,
options(simstudy.legacy_scenarios = TRUE)) - Provide helper function (e.g.,
scenario_to_dt()) for easier conversion - Detect and warn when old pattern is used (harder to implement)
-
Version bump: Should this be a major version bump given the breaking change?
-
Communication: Update vignettes, add prominent NEWS entry, consider blog post about migration?
Recommendation (from Claude)
Proceed with the change because:
- Fixes a real bug (incorrect behavior with mixed types)
- New pattern is more intuitive (no transpose needed)
- Migration is straightforward for affected users
- Properly documented breaking change is better than silent incorrect behavior
I am inclined to agree with this approach, because I doubt it affects too many people, maybe only me, but need to think about it.