@WladRamos

This PR introduces a new function, calculate_timestamps, which computes the timestamps at which to split a dataframe into train, validation, and test sets from percentage inputs. The function can optionally preserve continuous periods, which are identified using a time tolerance parameter.

Key Features:

  • Percentage-based splits: The function takes percentages for train, validation, and test sets and returns the corresponding timestamps for splitting the data.
  • Period preservation: When the preserve_periods option is enabled, the function ensures that the splits occur at the end of continuous periods, preventing data from being split in the middle of a period.
  • Time tolerance: The time_tolerance parameter defines the minimum gap between consecutive observations that marks the start of a new period.
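
To make the features above concrete, here is a minimal sketch of how such a split could be computed. It is not the code from this PR. The name calculate_timestamps_sketch, the argument order, the (train_end, val_end) return value, the rounding rule, and the choice to snap a split forward to the next period end are all assumptions made for illustration. It also assumes a non-empty dataframe with a DatetimeIndex and skips input validation.

```python
import numpy as np
import pandas as pd


def calculate_timestamps_sketch(df, train_pct, val_pct, test_pct,
                                preserve_periods=False,
                                time_tolerance=pd.Timedelta(hours=1)):
    """Hypothetical sketch: return the last timestamps of the train and validation sets."""
    # test_pct is implied by the other two fractions and is unused here.
    index = df.index.sort_values()
    n = len(index)

    # Positions of the last row in the train and validation sets.
    train_pos = max(round(n * train_pct) - 1, 0)
    val_pos = max(round(n * (train_pct + val_pct)) - 1, train_pos)

    if preserve_periods:
        # Assumption: a gap of at least time_tolerance starts a new period,
        # so the observation just before that gap ends the previous one.
        gaps = index.to_series().diff().iloc[1:]
        period_ends = np.flatnonzero((gaps >= time_tolerance).to_numpy())
        period_ends = np.append(period_ends, n - 1)  # the last row ends a period
        # Snap each split forward to the first period end at or after it.
        train_pos = period_ends[np.searchsorted(period_ends, train_pos)]
        val_pos = period_ends[np.searchsorted(period_ends, val_pos)]

    return index[train_pos], index[val_pos]


# Three days of four hourly observations each, separated by overnight gaps.
stamps = [pd.Timestamp(f"2024-01-0{d} 0{h}:00") for d in (1, 2, 3) for h in range(4)]
df = pd.DataFrame({"value": range(12)}, index=pd.DatetimeIndex(stamps))

plain = calculate_timestamps_sketch(df, 0.25, 0.25, 0.5)
preserved = calculate_timestamps_sketch(df, 0.25, 0.25, 0.5,
                                        preserve_periods=True,
                                        time_tolerance=pd.Timedelta(hours=2))
```

In this example the plain split ends the train set at 02:00 on the first day, in the middle of a period, while the period-preserving split moves it to 03:00, the last observation of that day. Snapping forward can leave later sets smaller than requested. Snapping to the nearest period end instead would be an equally reasonable choice.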

Unit Tests:

  • Added comprehensive unit tests to cover:
    • Period preservation behavior.
    • Validation of input percentages (ensuring they sum to 1).
    • Handling of dataframes without a DatetimeIndex.
    • Small time_tolerance cases to ensure gaps are correctly respected.
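
The input checks listed above could be exercised along these lines. This is a sketch only: validate_split_inputs and raises are hypothetical helpers, and the exception types (TypeError, ValueError) are assumptions, since the description does not say which errors the function raises.

```python
import math

import pandas as pd


def validate_split_inputs(df, train_pct, val_pct, test_pct):
    """Hypothetical validator mirroring the checks described in the test list."""
    if not isinstance(df.index, pd.DatetimeIndex):
        raise TypeError("df must be indexed by a DatetimeIndex")
    total = train_pct + val_pct + test_pct
    # Compare with a tolerance so that 0.6 + 0.2 + 0.2 is accepted
    # despite floating-point rounding.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"percentages must sum to 1, got {total}")


def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type, False otherwise."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


dated = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
)
undated = pd.DataFrame({"value": [1, 2, 3]})  # default RangeIndex

bad_sum_rejected = raises(ValueError, validate_split_inputs, dated, 0.6, 0.3, 0.3)
bad_index_rejected = raises(TypeError, validate_split_inputs, undated, 0.6, 0.2, 0.2)
valid_accepted = not raises(Exception, validate_split_inputs, dated, 0.6, 0.2, 0.2)
```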

…splits based on percentages with optional period preservation