Distribution to sample from when simulating from custom predict #17

sagarsimha · 2024-10-21T11:21:22Z

Hi,
Very nice work. Thank you for your contribution.
At the step where we supply "Custom" covariate models, I am interested in providing custom fit and predict functions. From the source code, I see that once the custom predict function returns predicted values on the simulated data (mean values for each patient at a given time point), there is no code that samples from a distribution. It is also not clear which distribution the values would be sampled from. Or do you expect the user to supply the distribution as well, based on the assumption the custom model makes about the underlying distribution of the target variable under consideration? If so, this is not clear from the documentation. For example, in the "normal" case, the predicted mean and variance are used to characterize a normal distribution, and patient data (counterfactual covariates) are drawn by sampling from this distribution.

LilJing · 2024-10-23T02:33:20Z

Hi,

Thank you for the question. Yes, once the predicted values are returned by the custom predict function, they are used directly in the simulation code without further sampling. If the predictor has a known distribution, i.e., one of the pre-defined distributions such as "binary" or "normal", the simulated values are drawn from that distribution in the simulation. If the predictor has a custom distribution, users are expected to specify the distribution in the custom fit function and sample from it in the custom predict function, so that the custom predict function returns sampled values. If there is no underlying distribution assumption on the predictor, as with a random forest model, then no sampling is needed in the custom predict function and the predicted values are used directly.
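As a minimal sketch of the custom-distribution case described above: the fit function estimates both a mean model and a noise scale, and the predict function returns sampled values rather than means. The names `custom_fit` and `custom_predict`, their signatures, and the dictionary used as the fitted model are hypothetical and chosen for illustration only; they are not the package's actual interface, and the normal distribution here is just an assumed example.

```python
import numpy as np

def custom_fit(X, y):
    """Hypothetical fit: least-squares mean model plus a residual standard deviation."""
    X1 = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    dof = max(len(y) - X1.shape[1], 1)
    sigma = float(np.sqrt(resid @ resid / dof))
    return {"coef": coef, "sigma": sigma}

def custom_predict(model, X_new, rng):
    """Hypothetical predict: draw one value per row from Normal(mean, sigma)."""
    X1 = np.column_stack([np.ones(len(X_new)), X_new])
    mean = X1 @ model["coef"]
    # Sampling happens here, so the caller can use the returned values as-is.
    return rng.normal(loc=mean, scale=model["sigma"])

# Toy data, assumed for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.7, size=500)

model = custom_fit(X, y)

# Five rows with identical covariates still receive different sampled values.
X_same = np.tile([[0.3, -1.2]], (5, 1))
draws = custom_predict(model, X_same, rng)
```

Because the sampling is done inside the predict function, the simulation code can use the returned values directly, which matches the behavior described above.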

I'll make it clear in the documentation. Feel free to let me know if you have any further questions on this.

Best,
Jing

sagarsimha · 2024-10-31T10:43:17Z

Thank you for your answer. In the case of a random forest model, where there is no assumption about the underlying distribution of the predictor, does it make sense to use the mean (predicted) value directly? Wouldn't the new data simulated for the n_simul patients all be the same, with no variance? Perhaps I am misunderstanding something here?

I also have another query. My application requires working with end-of-followup outcomes with a variable time length for each patient. Is there a way to handle this in the package? Right now, the package requires that all patients have the same time length.

LilJing · 2024-11-01T03:11:26Z

Hi,
The new simulated data for different patients would not be the same. Each simulated patient id has a different covariate history (saved in new_df), and the model gives different predictions for each patient because the covariate history it receives as input differs.
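A toy illustration of this point, not taken from the package: a deterministic predictor does no sampling, so its output varies only through its inputs. The function `deterministic_predict`, the coefficients, and the small array standing in for new_df are all hypothetical.

```python
import numpy as np

def deterministic_predict(coef, history):
    """Hypothetical deterministic predictor: the output depends only on the inputs."""
    return coef[0] + history @ coef[1:]

coef = np.array([1.0, 2.0, -0.5])

# Stand-in for new_df: one row of covariate history per simulated patient.
new_df = np.array([
    [0.3, -1.2],  # patient 0
    [1.1, 0.4],   # patient 1: different history
    [0.3, -1.2],  # patient 2: same history as patient 0
])

pred = deterministic_predict(coef, new_df)
```

Patients 0 and 1 get different predictions because their histories differ, while patients 0 and 2 get the same prediction because their histories are identical. Variation across simulated patients therefore comes entirely from variation in the simulated covariate histories.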

From a quick look, there should be a way to implement the EOF outcome with variable time length. You are welcome to add this new feature and submit a pull request. Alternatively, you can send me more details about your question and the required data structure; I'll review its compatibility with the current version and update the code when I have a moment.

Best,
Jing

sagarsimha · 2024-11-15T14:51:58Z

Hi Jing,

Thank you for your prompt replies. I have coded up the feature for EOF outcomes with variable time length and will submit it as a pull request. Please review it when you have time so that it can be merged. Also, could you add the 'Discussions' tab to your project? I would like to discuss some assumptions I made when developing the feature, as well as some conceptual questions. I feel they are more discussions than issues.

Thanks again for a very useful package!

Regards,
Sagar

LilJing · 2024-11-15T20:09:59Z

Hi,
Thank you for contributing the new feature. I'll review the pull request and merge it. P.S. I've opened the Discussions tab to allow more discussions.

Best,
Jing
