Selection paper outline #8

Open
andrewkern opened this issue Jul 13, 2021 · 7 comments

@andrewkern
Member

Hey all, I'm opening up an issue for us to start bashing away at an outline for the second paper. A particular goal is to have a solid list of the analyses we want to do and then, later, to delegate those analyses to particular individuals/groups.

We have a google doc going for the outline here, but it might be preferable to just use this issue, so I've copied that text below.


Selection & PopSim
Paper 2

Timeline for selection papers:
Late summer / early fall
Companion papers on 1) sweeps & 2) rescaling. Also similar timeline.

Outline of main analyses for main paper:
Comparison of different DFE methods like FitDadi, polyDFE, GRAPES (Ryan G’s group & Izabel can work on this). How is demography dealt with? Sample size?
Sweeps! (will be its own companion paper that Andy is leading, but some key results in the main paper).
Implement sweep models from literature. Make a model in StdPopSim “recurrent_sweeps”. Can put this model with different demographics & rec rates, etc.
Look at summary stats & power to detect sweeps in human genomes under different demographic models.
Look at power of ML methods
Confounders. Multiple sweeps. Sweeps & BGS.
How do DFE methods perform when sweeps are included?
Selection confounding demographic inference (can recycle a lot of pipelines from paper 1, just running them on models with selection).

What we need to do:
Decide what models to do:
DFE
Sweep
https://github.com/popsim-consortium/analysis2
Implement models
QC
Analyses

######################################################

Brainstorming of ideas for the PopSim Selection paper from the call on 6/15 (not all will be in the paper):

1) Comparison of different DFE methods (Ryan G’s group can work on this). How is demography dealt with? Sample size?

2) Scaling (maybe merits its own paper delving into the theory of scaling; might be too ambitious for the PopSim paper).
Ideally, the PopSim paper will point to this companion paper. The PopSim paper will have to mention scaling in some way, and could connect it with applications, e.g. use guidelines from the theory paper to do stuff for a particular organism.
3) Do current models of DFEs/annotations in humans predict summaries of genetic variation (spatial pattern of pi, SFS, LD)? (Strength: leverages the demographic models from before together with annotations and DFEs, all the fancy stuff together. Great way to showcase the whole resource! Gives guidance on how well the field is doing in terms of model adequacy.)
What if synonymous (or “neutral”) sites are actually under selection? Does that confound things?
Sweeps! (may be its own paper, but could put some key results in the main paper).
Implement sweep models from literature. Make a model in StdPopSim “recurrent_sweeps”. Can put this model with different demographics & rec rates, etc.
Look at summary stats & power to detect sweeps in human genomes under different demographic models.
Look at power of ML methods
Confounders. Multiple sweeps. Sweeps & BGS.
How do DFE methods perform when sweeps are included?
Selection confounding demographic inference
In the paper, say how stdpopsim can be used to test “your new method” for detecting selection. There is no one perfect statistic; it depends on biology, data, etc.
Try to show an example in the paper from a non-human species.
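
For the "summaries of genetic variation" item above, here is a minimal sketch of how two of those summaries (the SFS and pi) could be computed from simulated haplotypes. It is plain Python on a toy 0/1 matrix; the function names and the toy data are made up for illustration and are not part of stdpopsim or the analysis2 pipelines.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from a list of equal-length 0/1 haplotypes (1 = derived)."""
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Count derived alleles at each site, then tally sites by that count.
    counts = Counter(sum(h[j] for h in haplotypes) for j in range(num_sites))
    # sfs[k] = number of sites where exactly k of the n samples are derived.
    return [counts.get(k, 0) for k in range(n + 1)]


def nucleotide_diversity(sfs, sequence_length):
    """Mean number of pairwise differences per base, computed from the SFS."""
    n = len(sfs) - 1
    pairs = n * (n - 1) / 2
    # A site with k derived copies differs in k * (n - k) of the sample pairs.
    diffs = sum(k * (n - k) * sfs[k] for k in range(1, n))
    return diffs / pairs / sequence_length


# Toy data (hypothetical): 4 haplotypes, 5 segregating sites, 100 bp region.
toy = [
    [1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
]
sfs = site_frequency_spectrum(toy)
pi = nucleotide_diversity(sfs, sequence_length=100)
```

Running the same two functions on simulations with and without selection would give a simple way to ask whether a DFE model shifts the SFS or lowers pi in the expected direction.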

@andrewkern added the "help wanted" label Jul 13, 2021
@izabelcavassim
Member

izabelcavassim commented Feb 22, 2022

We have made some decisions in terms of the manuscript's scope (ping me, correct me if I am wrong) based on the discussion we had today (02/22/22) during our biweekly meeting:

PART I

For the demography inference with flavors of selection (background selection)

  • Using three different software packages:
    • msmc, to be updated to msmc2 (which uses recombination information); inclusion by @izabelcavassim

These analyses are halfway implemented in our current analysis2 repository, specifically in the n_t.snake workflow.

We also want the multi-population analyses

  • Using (up to) three different software packages:
    • fastsimcoal
    • dadi
    • momi2 (conditional on someone being interested in including it in our workflow).
  • Species for these analyses: human and Drosophila (?)

Part II

For the DFE inference excluding the positive portion

I think the implementations are almost finished; see the analysis2 repository for details. Thanks to @andrewkern, @petrelharp, @mufernando, and others...

Part III

Understanding/simulating beneficial mutations as in a sweep using the positive portion of a DFE

This is still a work in progress, but two things could be evaluated here
Methods inference:

  • "Back-of-the-envelope" power analyses to detect sweeps
  • How is dadi inference of the negative portion affected by the inclusion of the positive portion?
  • How well do the current/standard methods predict the simulated sweeps?
  • Could we think at the multi-population level: simulate across pops and do an Fst analysis on top of it?
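
To make the Fst idea in the last bullet a bit more concrete, below is a rough sketch of Hudson's Fst estimator (ratio of averages across sites) computed from per-site allele frequencies in two populations. The function name, the toy frequencies, and the sample sizes are all hypothetical; a real analysis would take frequencies from the simulated tree sequences or VCFs.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's Fst as a ratio of averages over sites.

    freqs1, freqs2: allele frequencies at the same sites in two populations.
    n1, n2: number of sampled chromosomes in each population.
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Numerator: squared frequency difference with a sample-size correction.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Denominator: probability that two alleles from different pops differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else float("nan")


# Toy windows (hypothetical frequencies, 20 chromosomes per population).
fst_background = hudson_fst([0.50, 0.30, 0.80], [0.45, 0.35, 0.75], 20, 20)
fst_sweep_window = hudson_fst([0.95, 0.90, 0.10], [0.10, 0.20, 0.85], 20, 20)
```

Comparing per-window values like these between a window containing a population-specific sweep and neutral background windows would be one simple way to frame the analysis.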

As @andrewkern and @petrelharp have pointed out, there are multiple features to be added in terms of positive selection. These could either be covered in the discussion of this manuscript and implemented in the next paper, or be implemented for this paper, but they are not trivial. I would personally vote to simplify and leave them as future work, just so we don't lose momentum.

@chriscrsmith
Collaborator

chriscrsmith commented Apr 13, 2023

Update based on @izabelcavassim's previous post:


PART I
Single-population demographic inference methods:

  • msmc2
  • stairwayplot
  • GONE
  • SMC++

Multi-population demographic inference methods:

  • this was on the to-do list, but I don't think we've talked about it as a group so I've crossed it off.
  • the plan was to apply: dadi, fastsimcoal, momi2

Mostly complete. Need production sims and final plots.


PART II
DFE inference methods:

  • dadi
  • polyDFE
  • DFE-alpha
  • GRAPES

Mostly complete. Need production sims and final plots.


PART III
Sweeps:

  • Want to quantify the effect of BGS on sweep detection.
  • Compare different sweep methods?
  • How is dadi inference of negative fitness effects influenced by positive portion?
  • Analyze divergent selection (between pops)?
  • (@izabelcavassim had suggested to simplify and leave some of these as future work)

There is work left to do for this aim.


SPECIES:

Human

  • Demographic history: out of Africa model used in previous paper. (Don't see it as important to use more than one model here, since the paper is about selection?)
  • DFEs: there are two available, might as well run both and report on any differences since the paper is focused on selection?

Arabidopsis

  • One demographic history (the one with smallest N_ancestral)
  • One DFE (only one available)


PLOTS:

  • main figure(s) conveying our implementation of genome wide selection, and diversity changes along a chromosome
  • DFE analysis
  • N_e analysis
  • sweeps analysis

Plots in each of the above areas could be kept relatively simple and extra information reported in tables or supp mat; or they could get pretty big including panels for different methods, species, and DFEs... TBD

@RyanGutenkunst
Contributor

It would be nice to do a non-gamma DFE for one of our simulations. Maybe a lognormal, even reaching back to Boyko 2008?
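
As a rough illustration of why a non-gamma DFE could matter, here is a small sketch that draws selection coefficient magnitudes from a gamma and from a lognormal with the same mean and compares how much mass each puts on nearly neutral mutations. It uses only the standard library; the parameter values are made up for illustration and are not taken from Boyko 2008 or from the stdpopsim catalog.

```python
import math
import random


def gamma_dfe_sampler(mean_s, shape, rng):
    """Return a function drawing |s| from a gamma DFE with the given mean and shape."""
    scale = mean_s / shape  # gamma mean = shape * scale
    return lambda: rng.gammavariate(shape, scale)


def lognormal_dfe_sampler(mean_s, sigma, rng):
    """Return a function drawing |s| from a lognormal DFE with the same mean.

    For a lognormal, E[X] = exp(mu + sigma^2 / 2), so mu is solved from mean_s.
    """
    mu = math.log(mean_s) - sigma ** 2 / 2
    return lambda: rng.lognormvariate(mu, sigma)


def fraction_below(draw, threshold, n=20000):
    """Fraction of n draws that fall below the threshold."""
    return sum(draw() < threshold for _ in range(n)) / n


rng = random.Random(1)
# Hypothetical parameter values, chosen only for illustration.
mean_s, shape, sigma = 0.01, 0.2, 1.0
draw_gamma = gamma_dfe_sampler(mean_s, shape, rng)
draw_lognormal = lognormal_dfe_sampler(mean_s, sigma, rng)

nearly_neutral_gamma = fraction_below(draw_gamma, 1e-5)
nearly_neutral_lognormal = fraction_below(draw_lognormal, 1e-5)
```

With these particular values the gamma puts far more mass near zero than the lognormal despite the matched mean, which is the kind of difference that could trip up inference methods that assume a gamma.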

@nspope
Contributor

nspope commented Apr 24, 2023

Sweeps:

  • Want to quantify the effect of BGS on sweep detection.
  • Compare different sweep methods?
  • How is dadi inference of negative fitness effects influenced by positive portion?
  • Analyze divergent selection (between pops)?
  • (@izabelcavassim had suggested to simplify and leave some of these as future work)

There is work left to do for this aim.

What we're set up to do is "compare different sweep detection methods" in terms of FPR/TPR in windows across a chromosome. In particular, there's a working pipeline that uses sweepfinder2 to detect sweeps in windows across a chromosome (under simulated neutral/BGS/BGS+sweep scenarios). There's a start at a similar pipeline for diploshic, but it isn't finished.

So, assuming that what'll go in the paper is a sweepfinder vs diploshic comparison, what remains to be done is:

  • Finish the diploshic prediction workflow
  • Merge the diploshic and sweepfinder workflows so they're applied to the same set of simulations and dumped into the same output format
  • Rule to generate a figure showing FPR/TPR vs position on chromosome, split by sweep detection method
  • Settle on a demographic model/population (probably easiest to match whatever is used for DFEs)
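
For the FPR/TPR figure, a minimal sketch of the aggregation step might look like the following, assuming both workflows can be dumped into a common per-window record format. The tuple layout, the function name, and the placeholder method names are assumptions for illustration, not the format the existing sweepfinder2 or diploshic workflows actually produce.

```python
from collections import defaultdict


def rates_by_window(records):
    """Aggregate per-window sweep calls into TPR and FPR.

    records: iterable of (method, window_index, is_sweep, called) tuples,
    one per simulation replicate and window.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for method, window, is_sweep, called in records:
        cell = counts[(method, window)]
        if is_sweep:
            cell["tp" if called else "fn"] += 1
        else:
            cell["fp" if called else "tn"] += 1

    rates = {}
    for key, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        # A rate is None when a window has no replicates of that class.
        rates[key] = {
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates


# Hypothetical calls; "method_a" and "method_b" stand in for two detectors.
records = [
    ("method_a", 0, False, False),
    ("method_a", 0, False, True),
    ("method_a", 1, True, True),
    ("method_a", 1, True, False),
    ("method_b", 0, False, False),
    ("method_b", 0, False, False),
    ("method_b", 1, True, True),
    ("method_b", 1, True, True),
]
rates = rates_by_window(records)
```

Keying the result by (method, window) keeps it easy to plot rate against chromosome position with one line or panel per method.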

(I think? Tagging @mufernando and @andrewkern as they're the ones who've put these workflows together.)

This should serve as an illustration of what stdpopsim can do wrt sweeps, so maybe we don't need anything else? The other bullet points, while interesting, seem like a lot of work without clear questions in mind.

@petrelharp
Contributor

Notes from the meeting: proposal is for the sweeps section, discuss:

  • compare effect of BGS on sweep detection
  • ah ha but recombination rate is a more important factor
  • compare power in different pops
  • compare two methods
  • possibly look at how DFE inference works with beneficial mutations (i.e., misspecification), if "add beneficial-containing DFE" (stdpopsim#1469) gets in

@petrelharp
Contributor

We also discussed, following @RyanGutenkunst's comment above, adding a non-gamma DFE for humans and running the DFE inference pipeline on it: popsim-consortium/stdpopsim#1470

@andrewkern
Member Author

I started stubbing out a manuscript in a new repo here: https://github.com/popsim-consortium/analysis2_manuscript

I'm planning on starting the writing today.


6 participants