proposed production-level config for humans #97

Merged (10 commits) on Oct 10, 2023

chriscrsmith
Collaborator

Soliciting feedback on the choice of settings in the proposed config. This will be what @lntran26 and I use for the hopefully final simulations in humans without scaling.

Notes:

  • I made a separate config for human. Thought it would be cleaner to break up sims for different species?
  • I think 100 samples is an increase from the 20 I've been using up to this point.
  • Added the demographic model OutOfAfricaArchaicAdmixture_5R19.

@stsmall
Contributor

stsmall commented Mar 29, 2023

@chriscrsmith,
I don't recall how to leave a line comment on the commit.
For the demofix, you need to follow the tiny_config.yaml format.
Specifically:

"mask_file": "workflows/masks/HapmapII_GRCh37.mask.bed"
# set any of the below to 'none' to skip annot masking
"stairway_annot_mask" : ""
"msmc_annot_mask" : ""
"gone_annot_mask" : ""
"smcpp_annot_mask" : ""
"methods" : ["stairwayplot", "gone", "smcpp", "msmc"]

@stsmall
Contributor

stsmall commented Mar 29, 2023

Also, you want "num_msmc_iterations" to be at least 20.

@stsmall
Contributor

stsmall commented Mar 29, 2023

The DFE and annotation lists need to be the same length:

"dfe_list": ["Gamma_H17", "Gamma_H17"]
"annotation_list": ["all_sites", "ensembl_havana_104_exons"]

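The same-length requirement could be checked before launching a run. Below is a minimal sketch, assuming the config is loaded into a plain Python dict and that the two lists are paired by position (the pairing is an assumption, not something stated in this thread). The function name `check_dfe_annotation_pairs` is hypothetical and is not part of the workflow.

```python
# Hypothetical config dict; the two keys mirror the ones quoted in the comment above.
config = {
    "dfe_list": ["Gamma_H17", "Gamma_H17"],
    "annotation_list": ["all_sites", "ensembl_havana_104_exons"],
}


def check_dfe_annotation_pairs(cfg):
    """Return (dfe, annotation) pairs, or raise if the lists differ in length."""
    dfes = cfg["dfe_list"]
    annots = cfg["annotation_list"]
    if len(dfes) != len(annots):
        raise ValueError(
            f"dfe_list has {len(dfes)} entries but "
            f"annotation_list has {len(annots)}"
        )
    # Assumption: entries are matched by position.
    return list(zip(dfes, annots))


pairs = check_dfe_annotation_pairs(config)
```

Failing early like this would surface a mismatched config before any simulation time is spent.
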
@stsmall
Contributor

stsmall commented Mar 29, 2023

For 'replicates' I think it should be more than 3. Not sure what an upper limit is ... maybe 10 or at least 20?

@chriscrsmith
Collaborator Author

Thanks @stsmall. OK, what do you think of the new version? I left reps=3 until we get input from others. 20 sounds like too many to me.

@stsmall
Contributor

stsmall commented Mar 30, 2023

> Thanks @stsmall. OK, what do you think of the new version? I left reps=3 until we get input from others. 20 sounds like too many to me.

The plots use seeds (reps) to create CI ribbons, so 3 reps will just be noisy. IDK if 20 is too many or not enough; I just picked a number. Since the reps run in parallel, it shouldn't be too much of a slowdown to do more, right? We could always add more later, but then we would have to rerun the n_t and dfe pipelines on the full dataset.
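
To illustrate why the number of reps matters for a ribbon, here is a rough sketch that builds a per-time-point band from replicate curves. It assumes each replicate is an estimated curve on a shared time grid and uses the min/max envelope around the median. The workflow's actual plotting code may compute its intervals differently, and the function name `ribbon` and the numbers are made up for illustration.

```python
import statistics


def ribbon(curves):
    """Summarize replicate curves as (low, median, high) per time point.

    curves: list of replicates, each a list of values on the same time grid.
    Assumption: the ribbon edges are the min and max across replicates.
    """
    if not curves:
        raise ValueError("need at least one replicate")
    n_points = len(curves[0])
    if any(len(c) != n_points for c in curves):
        raise ValueError("all replicates must share the same time grid")
    bands = []
    for values in zip(*curves):  # one tuple of replicate values per time point
        bands.append((min(values), statistics.median(values), max(values)))
    return bands


# Made-up values for 3 replicates at 2 time points.
three_reps = [[10000, 12000], [9000, 15000], [11000, 11000]]
bands = ribbon(three_reps)
```

With only 3 replicates, each band edge is set by a single run, which is one way to see why the ribbons would look noisy.
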

@stsmall
Contributor

stsmall commented Mar 30, 2023

Otherwise it looks good. :)
Do we want to do more variations for msmc2? Right now it is just 6. More haps (maybe the limit is 16?) will provide better resolution of <1000 gens, which is where msmc2 really seems to go awry. The run time will get way longer, and even though it is parallelized, it is still the last thing to finish.

@chriscrsmith
Collaborator Author

gotcha, the ribbons. 20 reps sounds good

@andrewkern
Member

andrewkern commented Apr 4, 2023

I'm a bit concerned about the compute cost of 20 reps up front. The way these runs go, we almost always have to rerun them.
I think we should start with 3 reps -- if that completes in a reasonable time, we can generate more reps if we want to. One way to do this would be to have two seeds: one for the first 3 reps, then a second seed for the next 17 (or whatever number).
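
The two-seed idea could look something like the sketch below. It assumes each replicate gets its own integer seed drawn from a generator initialized with a master seed. The function name `replicate_seeds` and the master seed values are hypothetical, and this is not how the workflow currently derives its seeds.

```python
import random


def replicate_seeds(master_seed, n_reps):
    """Derive n_reps per-replicate seeds deterministically from one master seed."""
    rng = random.Random(master_seed)
    return [rng.randrange(1, 2**31) for _ in range(n_reps)]


# First batch: 3 reps from one master seed (values here are placeholders).
first_batch = replicate_seeds(master_seed=12345, n_reps=3)
# Later, if the first batch finishes in reasonable time: 17 more from a second seed.
second_batch = replicate_seeds(master_seed=67890, n_reps=17)

all_seeds = first_batch + second_batch
```

Because the first batch depends only on its own master seed, adding the second batch later leaves the original 3 replicates unchanged, so they would not need to be rerun.
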

@chriscrsmith
Collaborator Author

> Otherwise it looks good. :) Do we want to do more variations for msmc2? Right now it is just 6. More haps (maybe the limit is 16?) will provide better resolution of <1000 gens, which is where msmc2 really seems to go awry. The run time will get way longer and even though it is paralleled, it is still the last thing to finish.

If it's already the longest-running part of the analysis, let's leave it for now and update later as needed?

@chriscrsmith
Collaborator Author

see new commit: changed genetic map, deleted some unused parameters

I have not done a full run yet, but if I turn on scaling it seems to get off the ground ok.

@chriscrsmith
Collaborator Author

There was some talk in the Tuesday meeting about potentially doing the Papuan demographic model. What does everyone think?

@RyanGutenkunst
Contributor

That would be a flex. :-) I guess we'd assume the DFE was the same in Denisova and Neanderthal as modern humans. We'd lose the easy comparison with the previous paper, but if we run and include the neutral analysis here, that's no problem.

@petrelharp
Contributor

Say, @chriscrsmith - could you clarify what exactly is being proposed? Like, is there going to be just one demographic model? Or, more than one? And, what DFE(s)?

@chriscrsmith
Collaborator Author

Demog.

  • I imagined at least running the same demographic model from the previous paper, for comparison. So, OutOfAfricaArchaicAdmixture_5R19.
  • However, we have been using OutOfAfrica_3G09. Is there something special about this one? Do we leave this model in the analysis?
  • Based on Ryan's feedback I'd lean towards skipping the Papuan model, but was wondering if we should include it alongside the other one(s).

DFEs

@petrelharp
Contributor

I'm still a bit fuzzy here - are we deciding which single model to run, or are we deciding between having 1 or 2 models? Or what? And, concretely, what goes in the paper - is this the demographic model(s) that'll be used for both (a) inferring DFEs and (b) the effect of selection on demographic inference? The same one(s) for both?

@chriscrsmith
Collaborator Author

chriscrsmith commented Apr 12, 2023

Demog.

  • I imagined at least running the same demographic model from the previous paper, for comparison. So, OutOfAfricaArchaicAdmixture_5R19.
  • However, we have been using OutOfAfrica_3G09. Is there something special about this one? Here are the options: 1. Leave this model in the analysis. 2. Take it out.
  • Based on Ryan's feedback I'd lean towards skipping the Papuan model, but was wondering if we should include it alongside the other one(s). Here are the options: 1. Use this model. 2. Skip this model.

DFEs

@chriscrsmith
Collaborator Author

> I'm still a bit fuzzy here - are we deciding which single model to run, or are we deciding between having 1 or 2 models?

I imagined at least running the same demographic model from the previous paper.

> And, concretely, what goes in the paper - is this the demographic model(s) that'll be used for both (a) inferring DFEs and (b) the effect of selection on demographic inference? The same one(s) for both?

I think that makes sense.

@petrelharp
Contributor

I agree about using the same model as in the last paper. There is nothing special (besides being an early model and thus jumping to our minds more easily?) about OutOfAfrica_3G09.

I don't have a good sense about whether we've got room for results about more than one model - that depends on what figures we want?

@chriscrsmith
Collaborator Author

Updated the PR to delete the human model we've been using, so it's now replaced with the model from the previous paper.

Here's the relevant post about our plan for the paper: #8

@petrelharp
Contributor

Thanks for finding the outline! =) So, the current proposal is to just have one model? That seems fine to me, really - unless there's a reason to think that methods might behave differently under some models than others? But, I guess if we're going to look at different scenarios I'd much rather look at different species than just different human models. So: I agree!

@petrelharp
Contributor

In the meeting just now we decided we can merge this.

@chriscrsmith
Collaborator Author

In the meeting just now we agreed this looks good, minus the Gamma_H17 DFE.

@petrelharp
Contributor

@chriscrsmith says merge!

@petrelharp petrelharp merged commit 8ae1c1a into popsim-consortium:main Oct 10, 2023
1 check passed
6 participants