Distribution to sample from when simulating from custom predict #17

Open
sagarsimha opened this issue Oct 21, 2024 · 5 comments

Comments

@sagarsimha

sagarsimha commented Oct 21, 2024

Hi,
Very nice work. Thank you for your contribution.
At the step where we supply "custom" covariate models, I would like to provide my own fit and predict functions. From the source code, I see that the custom predict function returns predicted values on the simulated data (the mean for each patient at a given time point), but there is no code that then samples from a distribution, and it is not clear which distribution that would be. Is the user expected to supply the distribution as well, based on what the custom model assumes about the distribution of the target variable under consideration? If so, the documentation does not make this clear. For comparison, in the "normal" case the predicted mean and the variance characterize a normal distribution, and patient data (counterfactual covariates) are drawn by sampling from it.

@LilJing
Collaborator

LilJing commented Oct 23, 2024

Hi,

Thank you for the question. Yes, once the predicted values are returned by the custom predict function, they are used directly in the simulation code without any further sampling. If the predictor has one of the pre-defined distributions, e.g., "binary" or "normal", the simulated values are drawn from that distribution. If the predictor has a custom distribution, users are expected to estimate that distribution in the custom fit function and sample from it in the custom predict function, so that the predict function returns sampled values. If there is no distributional assumption on the predictor, for example with a random forest model, no sampling is needed in the custom predict function and the predicted values are used directly.
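
As a rough illustration of the custom-distribution case, here is a minimal sketch in plain Python. The names `fit_custom` and `predict_custom`, their signatures, and the dictionary passed between them are hypothetical and are not the package's actual interface. The normal residual model is also just an assumption chosen for the example. The point is only the division of labor: the fit step stores whatever describes the distribution, and the predict step draws from it before returning.

```python
import random


def fit_custom(covariate_values, outcome_values):
    """Hypothetical custom fit: one-covariate least squares plus residual spread."""
    n = len(covariate_values)
    mean_x = sum(covariate_values) / n
    mean_y = sum(outcome_values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(covariate_values, outcome_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(covariate_values, outcome_values)]
    # Residual standard deviation with n - 2 degrees of freedom
    # (two fitted parameters). This is the "distribution" part of the fit.
    rmse = (sum(r ** 2 for r in residuals) / (n - 2)) ** 0.5
    return {"intercept": intercept, "slope": slope, "rmse": rmse}


def predict_custom(fitted, new_covariate_values, rng, sample=True):
    """Hypothetical custom predict: return sampled values, not just means."""
    means = [fitted["intercept"] + fitted["slope"] * x
             for x in new_covariate_values]
    if not sample:
        # No distributional assumption: hand back the predictions as they are.
        return means
    # Assumed normal model: draw each value around its predicted mean.
    return [rng.gauss(m, fitted["rmse"]) for m in means]


fitted = fit_custom([0.0, 1.0, 2.0, 3.0], [1.0, 3.1, 4.9, 7.0])
rng = random.Random(42)  # seeded so repeated runs give the same draws
print(predict_custom(fitted, [1.5, 1.5, 1.5], rng))                # three different draws
print(predict_custom(fitted, [1.5, 1.5, 1.5], rng, sample=False))  # three identical means
```

With `sample=False` the function mimics the "no distributional assumption" case, where identical inputs give identical outputs.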

I'll make it clear in the documentation. Feel free to let me know if you have any further questions on this.

Best,
Jing

@sagarsimha
Author

sagarsimha commented Oct 31, 2024

Thank you for your answer. In the case of a random forest model, where there is no assumption about the underlying distribution of the predictor, does it make sense to use the mean (predicted) value directly? Wouldn't the new data simulated for the n_simul patients all be the same, with no variance? Perhaps I am misunderstanding something here.

I also have another question. My application requires end-of-follow-up outcomes where the follow-up length varies between patients. Is there a way to handle this in the package? Right now, the package requires all patients to have the same follow-up length.

@LilJing
Collaborator

LilJing commented Nov 1, 2024

Hi,
The simulated data would not be the same across patients. Each simulated patient ID has a different covariate history (saved in new_df), and the model gives different predicted values because each patient's covariate history is a different input.
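
A toy sketch of this point, under stated assumptions: `deterministic_predict` is a hypothetical stand-in for any fitted model that returns a prediction without sampling, `simulate` is not the package's simulation code, and the baseline values are made up. It only shows that a deterministic predictor still produces different trajectories when the simulated patients start from different covariate values, while two patients with exactly the same history would receive the same value.

```python
def deterministic_predict(prev_value, treatment):
    """Hypothetical fitted model with no sampling step (e.g. a tree ensemble)."""
    return 0.8 * prev_value + 1.5 * treatment


def simulate(baseline_values, n_times, treatment):
    """Roll each simulated patient forward from their own baseline value."""
    histories = []
    for start in baseline_values:
        history = [start]
        for _ in range(n_times - 1):
            # The next value depends only on this patient's own history.
            history.append(deterministic_predict(history[-1], treatment))
        histories.append(history)
    return histories


# Made-up baseline covariate values for four simulated patients;
# the first and last patient share the same baseline.
baselines = [4.0, 5.5, 7.0, 4.0]
for patient_id, history in enumerate(simulate(baselines, n_times=3, treatment=1)):
    print(patient_id, history)
```

Patients 0 and 3 end up with identical trajectories because their inputs are identical; the others differ because their histories differ.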

From a quick look, it should be possible to implement the end-of-follow-up (EOF) outcome with variable follow-up length. You are welcome to add this feature and submit a pull request. Alternatively, you can send me more details about your question and the required data structure; I'll review its compatibility with the current version and update the code when I have a moment.

Best,
Jing

@sagarsimha
Author

Hi Jing,

Thank you for your prompt replies. I have coded up the feature for EOF outcomes with variable follow-up length and will submit it as a pull request. Please review it when you have time so that it can be merged. Also, could you enable the 'Discussions' tab for your project? I would like to discuss some assumptions I made while developing the feature and ask a few conceptual questions. I feel they are more discussions than issues.

Thanks again for a very useful package!

Regards,
Sagar

@LilJing
Collaborator

LilJing commented Nov 15, 2024

Hi,
Thank you for contributing the new feature. I'll review the pull request and merge it. P.S. I've opened the Discussions tab to allow more discussion.

Best,
Jing
