Selection paper outline #8

andrewkern · 2021-07-13T16:15:05Z

Hey all-- I'm opening up an issue for us to start bashing away at an outline for the second paper. A particular goal is to have a solid list of the analyses we want to do and then, later, to delegate those analyses to particular individuals/groups.

We have a google doc going for the outline here, but it might be preferable to just use this issue, so I've copied that text below.

Selection & PopSim
Paper 2

Timeline for selection papers:
Late summer / early fall
Companion papers on 1) sweeps & 2) rescaling. Also similar timeline.

Outline of main analyses for main paper:
Comparison of different DFE methods like FitDadi, polyDFE, GRAPES (Ryan G’s group & Izabel can work on this). How is demography dealt with? Sample size?
Sweeps! (will be its own companion paper that Andy is leading, but some key results in the main paper).
Implement sweep models from literature. Make a model in StdPopSim “recurrent_sweeps”. Can put this model with different demographics & rec rates, etc.
Look at summary stats & power to detect sweeps in human genomes under different demographic models.
Look at power of ML methods
Confounders. Multiple sweeps. Sweeps & BGS.
How do DFE methods perform when sweeps are included?
Selection confounding demographic inference (can recycle a lot of pipelines from paper 1, just running them on models with selection).

What we need to do:
Decide what models to do:
DFE
Sweep
https://github.com/popsim-consortium/analysis2
Implement models
QC
Analyses

######################################################

Brainstorming of ideas for PopSim Selection paper from the call on 6/15 (not all will be in paper):

Comparison of different DFE methods (Ryan G’s group can work on this). How is demography dealt with? Sample size?

Scaling (maybe merits its own paper delving into theory of scaling...might be too ambitious for PopSim paper)
Ideally, PopSim paper will point to this companion paper. PopSim paper will have to mention scaling in some way. PopSim paper could connect it with applications...use guidelines from theory paper to do stuff for a particular organism
Do current models of DFEs/annotations in humans predict summaries of genetic variation (spatial pattern of pi, SFS, LD)? (Strength: leverages demographic models from before...annotations, DFE...all the fancy stuff together. Great way to showcase the whole resource! Guidance for how well the field is doing in terms of model adequacy.)
What if synonymous (or “neutral”) sites are actually under selection? Does that confound things?
Sweeps! (may be its own paper, but could put some key results in the main paper).
Implement sweep models from literature. Make a model in StdPopSim “recurrent_sweeps”. Can put this model with different demographics & rec rates, etc.
Look at summary stats & power to detect sweeps in human genomes under different demographic models.
Look at power of ML methods
Confounders. Multiple sweeps. Sweeps & BGS.
How do DFE methods perform when sweeps are included?
Selection confounding demographic inference
In paper say how stdpopsim can be used to test “your new method” for detecting selection. No one perfect statistic--depends on biology, data, etc.
Try to show an example in the paper from a non-human species.

izabelcavassim · 2022-02-22T22:23:02Z

We have made some decisions in terms of the manuscript's scope (ping me, correct me if I am wrong) based on the discussion we had today (02/22/22) during our biweekly meeting:

PART I

For the demography inference with flavors of selection (background selection)

Using three different software:
- msmc, update it to msmc2 (recombination information based), inclusion by @izabelcavassim
- stairwayplot (SFS based inference), inclusion by @izabelcavassim
- GONE (LD based inference?), inclusion by @chriscrsmith

These analyses are halfway implemented in our current analysis2 repository, specifically in the n_t.snake workflow.

We also want the multi-population analyses

Using (up to) three different software:
fastsimcoal
dadi
momi2 (conditional on someone being interested in including it into our workflow).
Species for these analyses:
Human, and drosophila (?)

Part II

For the DFE inference excluding the positive portion

Using four different software:
- dadi, inclusion by @xin-huang and @chriscrsmith
- polyDFE, inclusion by @xin-huang and @chriscrsmith
- DFE-alpha, inclusion by @xin-huang and @chriscrsmith
- GRAPES**, inclusion by @xin-huang and @chriscrsmith

I think implementations are almost finished, see the analysis2 repository for details, thanks to @andrewkern @petrelharp @mufernando, and others...

Part III

Understanding/simulating beneficial mutations as in a sweep using the positive portion of a DFE

This is still a work in progress, but two things could be evaluated here
methods inference:

"Back-of-the-envelope" power analyses to detect sweeps
How is dadi inference of the negative portion affected by the inclusion of the positive portion?
How well do the standard, currently used methods predict the simulated sweeps?
Could we think at the multi-population level, simulate across pops, and do an Fst analysis on top of it?

As @andrewkern @petrelharp have pointed out, there are multiple features to be added in terms of positive selection that could either be discussed in this manuscript and implemented in the next paper, or be implemented for this paper, though that would not be trivial. I would personally vote to simplify and leave it as future work just so we don't lose momentum.

chriscrsmith · 2023-04-13T15:51:49Z

Update based on @izabelcavassim 's previous post:

PART I
Single-population demographic inference methods:

msmc2
stairwayplot
GONE
SMC++

~~Multi-population demographic inference methods:~~

this was on the to-do list, but I don't think we've talked about it as a group so I've crossed it off.
the plan was to apply: dadi, fastsimcoal, momi2

Mostly complete. Need production sims and final plots.

PART II
DFE inference methods:

dadi
polyDFE
DFE-alpha
GRAPES

Mostly complete. Need production sims and final plots.

PART III
Sweeps:

Want to quantify the effect of BGS on sweep detection.
Compare different sweep methods?
How is dadi inference of negative fitness effects influenced by positive portion?
Analyze divergent selection (between pops)?
(@izabelcavassim had suggested to simplify and leave some of these as future work)

There is work left to do for this aim.

SPECIES:

Human

Demographic history: out of Africa model used in previous paper. (Don't see it as important to use more than one model here, since the paper is about selection?)
DFEs: there are two available, might as well run both and report on any differences since the paper is focused on selection?

Arabidopsis

One demographic history (the one with smallest N_ancestral)
One DFE (only one available)

PLOTS:

main figure(s) conveying our implementation of genome wide selection, and diversity changes along a chromosome
DFE analysis
N_e analysis
sweeps analysis

Plots in each of the above areas could be kept relatively simple and extra information reported in tables or supp mat; or they could get pretty big including panels for different methods, species, and DFEs... TBD
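
For the "diversity changes along a chromosome" figure, a minimal sketch of the underlying quantity might look like the following. It assumes biallelic sites given as hypothetical `(position, derived_allele_count)` pairs and computes per-base nucleotide diversity (pi) in fixed windows; the function name, the input layout, and the window size are illustrative assumptions, not what the analysis2 workflows actually use.

```python
def windowed_pi(sites, n_samples, seq_length, window_size):
    """Per-base nucleotide diversity in non-overlapping windows.

    sites: iterable of (position, derived_allele_count) for biallelic sites
           (hypothetical input layout, for illustration only).
    n_samples: number of sampled haploid genomes.
    Returns a list of (window_start, window_end, pi_per_base).
    """
    n_windows = -(-seq_length // window_size)  # ceiling division
    totals = [0.0] * n_windows
    n_pairs = n_samples * (n_samples - 1) / 2
    for position, derived_count in sites:
        ancestral_count = n_samples - derived_count
        # Fraction of sample pairs that differ at this site.
        totals[position // window_size] += derived_count * ancestral_count / n_pairs
    result = []
    for i, total in enumerate(totals):
        start = i * window_size
        end = min(start + window_size, seq_length)
        result.append((start, end, total / (end - start)))
    return result


# Toy input: three segregating sites in a sample of four genomes.
toy_sites = [(10, 1), (20, 2), (150, 2)]
toy_windows = windowed_pi(toy_sites, n_samples=4, seq_length=200, window_size=100)
```

Plotting the third element of each tuple against window position, once per simulated scenario, would give one panel of such a figure.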

RyanGutenkunst · 2023-04-13T19:53:49Z

It would be nice to do a non-gamma DFE for one of our simulations. Maybe a lognormal, even reaching back to Boyko 2008?
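
To make the gamma vs. lognormal contrast concrete, here is a rough standard-library sketch that draws deleterious selection coefficients from both shapes with the same mean. All parameter values (`MEAN_S`, `SHAPE`, `SIGMA`) are made-up illustrations, not the Boyko 2008 estimates or any stdpopsim catalog DFE, and the helper names are hypothetical.

```python
import math
import random


def gamma_dfe(mean_s, shape, rng):
    """One deleterious selection coefficient (negative) from a gamma DFE."""
    scale = mean_s / shape  # gamma mean = shape * scale
    return -rng.gammavariate(shape, scale)


def lognormal_dfe(mean_s, sigma, rng):
    """One deleterious selection coefficient (negative) from a lognormal DFE."""
    mu = math.log(mean_s) - sigma ** 2 / 2  # lognormal mean = exp(mu + sigma^2 / 2)
    return -rng.lognormvariate(mu, sigma)


def fraction_nearly_neutral(draws, cutoff=1e-4):
    """Share of draws with |s| below the cutoff."""
    return sum(abs(s) < cutoff for s in draws) / len(draws)


# Illustrative values only (assumptions, not published estimates).
MEAN_S, SHAPE, SIGMA = 0.01, 0.2, 1.0
rng = random.Random(1)
gamma_draws = [gamma_dfe(MEAN_S, SHAPE, rng) for _ in range(50_000)]
lognormal_draws = [lognormal_dfe(MEAN_S, SIGMA, rng) for _ in range(50_000)]

gamma_neutral = fraction_nearly_neutral(gamma_draws)
lognormal_neutral = fraction_nearly_neutral(lognormal_draws)
```

With these toy parameters the two distributions share a mean but put very different mass near s = 0, which is the kind of difference that could matter when a gamma-assuming inference method is run on lognormal simulations.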

nspope · 2023-04-24T22:32:38Z

> Sweeps:
>
> Want to quantify the effect of BGS on sweep detection.
>
> Compare different sweep methods?
>
> How is dadi inference of negative fitness effects influenced by positive portion?
>
> Analyze divergent selection (between pops)?
>
> (@izabelcavassim had suggested to simplify and leave some of these as future work)
>
> There is work left to do for this aim.

What we're set up to do is "compare different sweep detection methods" in terms of FPR/TPR in windows across a chromosome. In particular, there's a working pipeline that uses sweepfinder2 to detect sweeps in windows across a chromosome (under simulated neutral/BGS/BGS+sweep scenarios). There's a start at a similar pipeline for diploshic, but it isn't finished.

So, assuming that what'll go in the paper is a sweepfinder vs diploshic comparison, what remains to be done is:

Finish the diploshic prediction workflow
Merge the diploshic and sweepfinder workflows so they're applied to the same set of simulations and dumped into the same output format
Rule to generate a figure showing FPR/TPR vs position on chromosome, split by sweep detection method
Settle on a demographic model/population (probably easiest to match whatever is used for DFEs)

(I think? Tagging @mufernando and @andrewkern as they're the ones who've put these workflows together.)

This should serve as an illustration of what stdpopsim can do wrt sweeps, so maybe we don't need anything else? The other bullet points, while interesting, seem like a lot of work without clear questions in mind.
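
As a rough illustration of the "same output format" and FPR/TPR-by-window steps, here is a minimal sketch. It assumes each method's calls have already been reduced to hypothetical records of `(method, window_start, has_sweep, called_sweep)`, one per window per replicate; that record layout and the function name are assumptions for illustration, not the actual sweepfinder2 or diploshic workflow output.

```python
from collections import defaultdict


def window_rates(records):
    """Per-method, per-window true and false positive rates.

    records: iterable of (method, window_start, has_sweep, called_sweep),
             where has_sweep is the simulated truth for that window in one
             replicate and called_sweep is the method's call.
    Returns {(method, window_start): {"tpr": ..., "fpr": ...}}; a rate is
    None when no replicate of that class exists for the window.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for method, window_start, has_sweep, called_sweep in records:
        cell = counts[(method, window_start)]
        if has_sweep and called_sweep:
            cell["tp"] += 1
        elif has_sweep:
            cell["fn"] += 1
        elif called_sweep:
            cell["fp"] += 1
        else:
            cell["tn"] += 1

    rates = {}
    for key, c in sorted(counts.items()):
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[key] = {
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates


# Toy records: two replicates per window for each method.
toy_records = [
    ("sweepfinder2", 0, False, False),
    ("sweepfinder2", 0, False, True),
    ("sweepfinder2", 100_000, True, True),
    ("sweepfinder2", 100_000, True, False),
    ("diploshic", 0, False, False),
    ("diploshic", 0, False, False),
    ("diploshic", 100_000, True, True),
    ("diploshic", 100_000, True, True),
]
toy_rates = window_rates(toy_records)
```

Keying on `(method, window_start)` means the plotting rule only has to group by method and draw rate vs. position, regardless of which tool produced the calls.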

petrelharp · 2023-04-25T16:48:20Z

Notes from the meeting: proposal is for the sweeps section, discuss:

compare effect of BGS on sweep detection
ah ha but recombination rate is a more important factor
compare power in different pops
compare two methods
possibly look at how DFE inference works with beneficial mutations (i.e., misspecification), if "add beneficial-containing DFE" (stdpopsim#1469) gets in

petrelharp · 2023-04-25T17:01:48Z

We also discussed, following @RyanGutenkunst 's comment above, adding a non-gamma DFE for humans, and running the DFE inference pipeline on it: popsim-consortium/stdpopsim#1470

andrewkern · 2024-03-07T14:41:12Z

I started stubbing out a manuscript in a new repo here: https://github.com/popsim-consortium/analysis2_manuscript

I'm planning on starting the writing today.

andrewkern added the help wanted label Jul 13, 2021

chriscrsmith mentioned this issue Apr 13, 2023: proposed production-level config for humans #97 (merged)
