Open
Labels
- code quality: Related to cleaning up the code base, rather than bug fixing or adding new features
- enhancement: New feature or request
Description
Right now, the only input this tool accepts is tabular CSV-like files (csv, tsv, and so on). Given that, under the hood, it's just Pandas loading and managing the data, it should be fairly simple to extend the code to accept any data format Pandas can read, giving users more options.
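One way the input side could be extended is a lookup from file extension to a Pandas reader. This is only a rough sketch: `READERS` and `load_table` are hypothetical names, not part of the existing code base, and the set of extensions is an assumption about which formats would be worth supporting.

```python
from pathlib import Path

import pandas as pd

# Hypothetical extension-to-reader table; extend it with any other
# pandas.read_* function the tool should accept.
READERS = {
    ".csv": pd.read_csv,
    ".tsv": lambda path, **kwargs: pd.read_csv(path, sep="\t", **kwargs),
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,  # needs an optional parquet engine installed
    ".feather": pd.read_feather,  # needs pyarrow installed
    ".xlsx": pd.read_excel,       # needs an optional Excel engine installed
}


def load_table(path, **kwargs):
    """Load a tabular file into a DataFrame, choosing the reader by extension."""
    suffix = Path(path).suffix.lower()
    try:
        reader = READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported input format: {suffix!r}") from None
    return reader(path, **kwargs)
```

Extra keyword arguments are passed straight through to the chosen reader, so format-specific options (for example `sheet_name` for Excel) would not need special handling.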
What will be trickier is the output. The SQLite database was chosen as the default for a number of reasons:
- By making the output of each combination of data, model, and study config its own unique table, we can allow the user to reuse a study config file without modification (as the study config dictates which file to save the results of the analysis to)
- The Pandas interface for it handles asynchronous writes to the output DB, allowing multiple studies to run in parallel and still write to the same file. The write isn't truly asynchronous, though: it is a queue with a timeout, so running LOTS of studies in parallel can result in some crashing out after waiting too long.
- Having all results saved to a single file makes moving them around (e.g. from a computer cluster to a local computer) much easier; no need to zip the results, or synchronize multiple files.
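For reference, the table-per-combination pattern with a write timeout might look roughly like the sketch below. It is not the tool's actual implementation: `table_name` and `write_results` are hypothetical helpers, the double-underscore naming scheme is made up, and it assumes the waiting behavior comes from the `timeout` argument of `sqlite3.connect` (how long a connection waits on a locked database before raising an error).

```python
import sqlite3

import pandas as pd


def table_name(data, model, study):
    """Build one table name per (data, model, study config) combination."""
    # Hypothetical naming scheme, chosen only for illustration.
    return f"{data}__{model}__{study}"


def write_results(db_path, results, data, model, study, timeout=30.0):
    """Append a results DataFrame to its own table in a shared SQLite file."""
    name = table_name(data, model, study)
    # `timeout` is the number of seconds to wait for a lock held by another
    # writer; if it expires, sqlite3 raises OperationalError.
    con = sqlite3.connect(db_path, timeout=timeout)
    try:
        results.to_sql(name, con, if_exists="append", index=False)
        con.commit()
    finally:
        con.close()
    return name
```

Under these assumptions, a study that waits longer than `timeout` for the lock fails with `sqlite3.OperationalError`, which would be consistent with the crashes seen when many studies run in parallel; raising the timeout or retrying the write are the obvious knobs to try.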
As such, changing the format of the output will likely need to be much more involved:
- For Excel, combining a file lock with sub-sheets (one per combination of data, model, and study config) could provide similar properties
- JSON output, cursed as it may be, could also be done in a similar manner. Don't think it's smart to provide this natively, though...
- Not a clue how we could make this work with CSV-like data without introducing some horrifically painful sync issues. Maybe add post-run utilities that reformat an SQLite database into CSV-like files?
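The post-run conversion idea could be as small as the following sketch, which dumps every table in a finished SQLite file to its own CSV. `export_tables_to_csv` is a hypothetical utility, not something that exists in the code base, and it assumes table names are safe to use as file names.

```python
import sqlite3
from pathlib import Path

import pandas as pd


def export_tables_to_csv(db_path, out_dir):
    """Write each user table in an SQLite database to `<out_dir>/<table>.csv`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        # Skip SQLite's internal bookkeeping tables.
        names = [row[0] for row in rows if not row[0].startswith("sqlite_")]
        for name in names:
            # Quote the identifier so unusual table names still work in SQL.
            quoted = '"' + name.replace('"', '""') + '"'
            frame = pd.read_sql_query(f"SELECT * FROM {quoted}", con)
            target = out_dir / f"{name}.csv"
            frame.to_csv(target, index=False)
            written.append(target)
    finally:
        con.close()
    return written
```

Because this runs once after all studies have finished, there is a single reader and no concurrent writers, which sidesteps the sync issues of writing CSV files directly during a run.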