Define null data simulation function #6

Merged: 24 commits merged into main on Apr 1, 2024

jsahrmann (Contributor)

This PR adds functionality for simulating null data. It adds the following new script and function:

  • `simulate.null()` in the new script `R/simulate.null.R`

The original code creates a matrix of 'raw' null transcripts and a separate matrix of noise, then adds the two. simulate.null() instead creates a single 'raw' null transcript together with its noise and directly returns their sum, which should reduce memory consumption. Its inputs are a vector of transcript values, the code indicating the optimal distribution for that transcript, the corresponding vector of (trimmed) residuals, and the code indicating the optimal distribution for those residuals. It generates the 'raw' null transcript and the noise and returns the sum of the two.
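
To make the single-pass idea concrete, here is a minimal sketch in R. It is not the code in `R/simulate.null.R`: the helper names `draw.from.code()` and `simulate.null.sketch()`, the numeric distribution codes, and the simple parameter estimates are hypothetical choices made only for illustration.

```r
# Hypothetical sketch: draw one 'raw' null transcript and one noise vector of
# the same length and return their sum, so no separate raw and noise
# matrices are ever allocated.
# ASSUMPTION: the codes below (1 = normal, 2 = log-normal, 3 = exponential,
# 4 = gamma) are illustrative, not the package's actual encoding.

# Estimate parameters of the chosen distribution from x and draw n values.
# Codes 2-4 assume strictly positive input.
draw.from.code <- function(x, code, n) {
    x <- as.numeric(unlist(x))
    x <- x[is.finite(x)]
    switch(
        as.character(code),
        '1' = rnorm(n, mean = mean(x), sd = sd(x)),
        '2' = rlnorm(n, meanlog = mean(log(x)), sdlog = sd(log(x))),
        '3' = rexp(n, rate = 1 / mean(x)),
        '4' = {
            shape <- mean(x)^2 / var(x)  # method-of-moments estimate
            rgamma(n, shape = shape, rate = shape / mean(x))
        },
        stop('unknown distribution code: ', code)
    )
}

# Hypothetical analogue of simulate.null(): raw null transcript plus noise.
simulate.null.sketch <- function(x, x.code, residuals, residuals.code) {
    n <- length(unlist(x))
    raw <- draw.from.code(x, x.code, n)
    noise <- draw.from.code(residuals, residuals.code, n)
    raw + noise
}

set.seed(1)
x <- c(1.2, 3.4, 2.2, 5.1, 4.0)
r <- c(-0.3, 0.1, 0.2, -0.1, 0.05)
simulate.null.sketch(x, 2, r, 1)  # five simulated values
```

Only one length-`n` vector per component exists at any time, which is the memory argument made in the paragraph above.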

This PR also incorporates the simulation of null data into the main detect.outliers() function, which gains a parameter num.null specifying the number of null transcripts to simulate. The function samples num.null transcripts from the observed data and passes each one to simulate.null(), along with the corresponding residuals and the codes indicating the optimal distributions of the transcript and its residuals. It combines the resulting null transcripts into a matrix and determines the optimal distribution for each null transcript. Finally, it calculates outlier statistics on the null data and adds the resulting quantities to the output list for inspection.
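
The sampling step might look roughly like the following sketch, which fills one preallocated num.null-by-ncol(data) matrix row by row. `simulate.null.stub()` is a hypothetical normal-only stand-in for simulate.null(), and `simulate.null.data()`, sampling with replacement, transcripts stored as rows, and the `transcript.null.` row names are all assumptions rather than the package's actual code.

```r
# Hypothetical stand-in for simulate.null(), included only so this sketch runs
# on its own: a normal draw fitted to the transcript plus normal noise whose
# spread matches the residuals.
simulate.null.stub <- function(x, residuals) {
    x <- as.numeric(unlist(x))
    residuals <- as.numeric(unlist(residuals))
    rnorm(length(x), mean(x), sd(x)) + rnorm(length(x), 0, sd(residuals))
}

# Sample num.null observed transcripts (rows of data) and fill a single
# preallocated num.null-by-ncol(data) matrix, one simulated row at a time.
# ASSUMPTIONS: transcripts are rows, sampling is with replacement, and the
# 'transcript.null.' row names are illustrative.
simulate.null.data <- function(data, residuals, num.null) {
    idx <- sample(nrow(data), size = num.null, replace = TRUE)
    null.data <- matrix(
        NA_real_,
        nrow = num.null,
        ncol = ncol(data),
        dimnames = list(paste0('transcript.null.', seq_len(num.null)), colnames(data))
    )
    for (j in seq_len(num.null)) {
        i <- idx[j]
        null.data[j, ] <- simulate.null.stub(data[i, ], residuals[i, ])
    }
    null.data
}
```

Preallocating the result and writing each row in place avoids building a list of vectors and binding it afterwards.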

Qualify namespace in function calls in `R/simulate.null.R`.
Add example to function `simulate.null()`, and update the description
of the return value.
Add `num.null` as a parameter to the function `detect.outliers()` to
give users control over how many null transcripts to simulate.
Add code to simulate null transcripts in the function
`detect.outliers()`, and include the resulting matrix in the list of
values to return.
Use `truncnorm::rtruncnorm()` rather than `extraDistr::rtnorm()` in
`simulate.null()`.  The two functions appear to be equivalent, and
switching avoids adding the package extraDistr as a dependency.
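
Both functions sample from a truncated normal distribution, but their positional arguments are ordered differently: `truncnorm::rtruncnorm(n, a = -Inf, b = Inf, mean = 0, sd = 1)` versus `extraDistr::rtnorm(n, mean = 0, sd = 1, a = -Inf, b = Inf)`, so naming the arguments is the safer way to swap one for the other. The base-R sketch below, with the hypothetical name `rtnorm.inverse.cdf()`, shows the standard inverse-CDF method for this distribution; it is for illustration only and not necessarily how either package samples.

```r
# Inverse-CDF sampler for a normal distribution truncated to [a, b]:
# draw u uniformly between F(a) and F(b), then map it back through the
# normal quantile function. Base R only; may lose accuracy far in the tails.
rtnorm.inverse.cdf <- function(n, mean = 0, sd = 1, a = -Inf, b = Inf) {
    p.lower <- pnorm(a, mean = mean, sd = sd)
    p.upper <- pnorm(b, mean = mean, sd = sd)
    u <- runif(n, min = p.lower, max = p.upper)
    qnorm(u, mean = mean, sd = sd)
}

set.seed(1)
draws <- rtnorm.inverse.cdf(10000, mean = 0, sd = 1, a = 0)
range(draws)  # all draws are non-negative
mean(draws)   # close to sqrt(2 / pi), the half-normal mean
```
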
Add code to `simulate.null()` to avoid errors when handling data
frame input.
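
The commit message does not describe the data frame fix itself. One plausible cause is that `df[i, ]` returns a one-row data frame rather than a numeric vector, and `mean()` returns `NA` with a warning for such input. A hypothetical helper, `as.transcript.vector()`, that coerces this input to a plain numeric vector might look like this:

```r
# Hypothetical helper: coerce a transcript that may arrive as a one-row data
# frame (e.g. from df[i, ]) into a plain, unnamed numeric vector.
as.transcript.vector <- function(x) {
    if (is.data.frame(x)) {
        x <- unlist(x, use.names = FALSE)
    }
    as.numeric(x)
}

df <- data.frame(s1 = 1.5, s2 = 2.5, s3 = 4.0)
row <- df[1, ]
suppressWarnings(mean(row))       # NA: mean() rejects a data frame
mean(as.transcript.vector(row))   # 2.666667
```
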
Add script `R/add.residual.noise.R` with a first draft of the function
`add.residual.noise()`.
Draft an alternative function definition, tentatively called
`simulate.null2()`, for simulating null transcripts.  This function
combines the computations from the original `simulate.null()` function
with those of the new `add.residual.noise()` function, with the
benefit being that we ultimately end up with one `num.null` by
`ncol(data)` matrix rather than two, thereby relieving memory
constraints.
Replace the original definition of the function `simulate.null()` with
the new definition.  Update documentation and examples.
Use the new definition of the function `simulate.null()` in the main
`detect.outliers()` function.
Delete script `R/add.residual.noise.R`, whose functionality has been
incorporated into `R/simulate.null.R`.
Add transcript names to `null.data` in function `detect.outliers()`.
Determine optimal distribution of each null transcript in function
`detect.outliers()`.
Calculate outlier statistics on null data in function
`detect.outliers()`.  To optimize memory use, overwrite the vectors
used for the observed data rather than create new vectors.
Combine outlier statistics from null transcripts into a matrix, and
calculate the rank product for each null transcript.
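
A minimal sketch of a rank product over a matrix of outlier statistics, assuming one row per transcript, one column per statistic, and that larger values are more outlying (so the largest value in each column gets rank 1). The function name `rank.product()` is hypothetical, and some definitions use the geometric mean of the ranks instead of their product; this sketch uses the plain product.

```r
# Hypothetical sketch: rank each statistic (column) so the most extreme value
# gets rank 1, then multiply the ranks across each transcript (row).
rank.product <- function(stats) {
    # matrix(..., nrow = ) keeps a matrix shape even when stats has one row,
    # where apply() would otherwise return a plain vector.
    ranks <- matrix(apply(-stats, 2, rank), nrow = nrow(stats))
    unname(apply(ranks, 1, prod))
}

stats <- matrix(c(5, 1, 3, 10, 2, 6), ncol = 2)
rank.product(stats)  # 1 9 4
```
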
Add new objects to the list returned by the function
`detect.outliers()`.
Don't compute ranks of null data in function `detect.outliers()`.
Rather, ranks are computed as each observed transcript is added to the
null data.
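
One reading of this commit: instead of ranking the null data on its own, each observed transcript's statistics are appended to the null statistics and ranks are computed on the combined matrix. The sketch below, with the hypothetical name `observed.rank.product()`, assumes larger statistics are more outlying and receive rank 1, and returns the observed transcript's rank product alongside the null rank products from that same ranking.

```r
# Hypothetical sketch: append one observed transcript (a vector with one value
# per statistic) to the null statistics, rank the combined columns (most
# extreme = rank 1), and split the rank products into observed and null parts.
observed.rank.product <- function(observed, null.stats) {
    combined <- rbind(null.stats, observed)
    ranks <- matrix(apply(-combined, 2, rank), nrow = nrow(combined))
    rp <- unname(apply(ranks, 1, prod))
    list(
        observed = rp[length(rp)],
        null = rp[-length(rp)]
    )
}

null.stats <- matrix(c(5, 1, 3, 10, 2, 6), ncol = 2)
res <- observed.rank.product(c(7, 12), null.stats)
res$observed  # 1: most extreme in both columns
res$null      # 4 16 9
```
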
jeeyunhan merged commit adee4c5 into main on Apr 1, 2024
2 checks passed
jsahrmann deleted the jsahrmann-calc-outlier-stats-null-data branch on May 28, 2024 at 18:03
2 participants