Computes precision weights that account for heteroscedasticity in RNA-seq count data based on non-parametric local linear regression estimates.
Arguments
- y
a numeric matrix of size G x n containing the raw RNA-seq counts or preprocessed expression from n samples for G genes.
- x
a numeric matrix of size n x p containing the model covariate(s) from n samples (design matrix).
- phi
a numeric design matrix of size n x K containing the K variable(s) of interest (e.g. bases of time).
- use_phi
a logical flag indicating whether conditional means should be conditioned on phi and on covariate(s) x, or on x alone. Default is TRUE, in which case conditional means are estimated conditionally on both x and phi.
- preprocessed
a logical flag indicating whether the expression data have already been preprocessed (e.g. log2 transformed). Default is FALSE, in which case y is assumed to contain raw counts and is normalized into log-counts per million.
- gene_based
a logical flag indicating whether to estimate weights at the gene level. Default is FALSE, in which case weights are estimated at the observation level.
- bw
a character string indicating the smoothing bandwidth selection method to use. See bandwidth for details. Possible values are 'ucv', 'SJ', 'bcv', 'nrd' or 'nrd0'. Default is 'nrd'.
- kernel
a character string indicating which kernel should be used. Possibilities are 'gaussian', 'epanechnikov', 'rectangular', 'triangular', 'biweight', 'tricube', 'cosine', 'optcosine'. Default is 'gaussian' (NB: 'tricube' kernel corresponds to the loess method).
- transform
a logical flag indicating whether values should be transformed to uniform for the purpose of local linear smoothing. This may be helpful if tail observations are sparse and the specified bandwidth gives suboptimal performance there. Default is TRUE.
- verbose
a logical flag indicating whether informative messages are printed during the computation. Default is TRUE.
- na.rm
logical: should missing values (including NA and NaN) be omitted from the calculations? Default is FALSE.
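To make the bw and kernel options more concrete, here is a minimal sketch. It assumes that the bw option names map onto the bandwidth selectors documented under bandwidth in the stats package (bw.nrd, bw.nrd0, bw.ucv, bw.bcv, bw.SJ). The vector mu is simulated and purely illustrative, and the function documented here is not called.

```r
# Illustration only: simulated values standing in for mean expression levels.
set.seed(1)
mu <- rnorm(200, mean = 5, sd = 2)

# Assumed mapping from each documented 'bw' value to a selector in 'stats'.
selectors <- list(
  nrd  = stats::bw.nrd,
  nrd0 = stats::bw.nrd0,
  ucv  = stats::bw.ucv,
  bcv  = stats::bw.bcv,
  SJ   = stats::bw.SJ
)

# Each selector returns a single positive bandwidth for the same data.
# ucv/bcv can warn when the optimum sits at a search boundary, so the
# warnings are silenced for this demonstration.
bandwidths <- suppressWarnings(
  vapply(selectors, function(f) f(mu), numeric(1))
)
print(bandwidths)

# density() accepts the same selector names, together with a kernel name.
d <- stats::density(mu, bw = "nrd", kernel = "epanechnikov")
print(d$bw)
```

Comparing the selectors on the data at hand is a quick way to see how much the amount of smoothing depends on the bw choice. Note that density() does not offer a 'tricube' kernel, so that option cannot be previewed this way.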
Value
a list containing the following components:
- weights: a matrix n x G containing the computed precision weights
- plot_utilities: a list containing the following elements:
  - reverse_trans: a function encoding the reverse function used for smoothing the observations before computing the weights
  - method: the weight computation method ("loclin")
  - smth: the vector of the smoothed values computed
  - gene_based: a logical indicating whether the computed weights are based on average at the gene level or on individual observations
  - mu: the transformed observed counts or averages
  - v: the observed variability estimates
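The relationship between mu, v, smth and weights can be sketched as follows. This is a simplified, gene-level illustration under stated assumptions, not the package's implementation: the helper local_linear is hypothetical, the log-cpm formula (with 0.5 and 1 offsets) is an assumed convention, the counts are simulated, and the uniform transform and conditioning on x and phi are omitted.

```r
# Hypothetical helper: local linear regression with a Gaussian kernel.
# For each evaluation point t, fit a weighted line centred at t and keep
# the intercept, which is the fitted value at t.
local_linear <- function(x, y, x0, h) {
  vapply(x0, function(t) {
    w <- stats::dnorm((x - t) / h)
    fit <- stats::lm.wfit(cbind(1, x - t), y, w)
    unname(fit$coefficients[1])
  }, numeric(1))
}

# Simulated raw counts: G genes (rows) by n samples (columns).
set.seed(42)
G <- 100
n <- 12
gene_mean <- rgamma(G, shape = 2, rate = 0.01)
counts <- matrix(rnbinom(G * n, mu = rep(gene_mean, n), size = 5),
                 nrow = G, ncol = n)

# Assumed preprocessing: log2 counts per million with small offsets.
lib_size <- colSums(counts)
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1)) * 1e6)

# Gene-level mean and variance, then smooth the variance against the mean.
mu <- rowMeans(log_cpm)
v <- apply(log_cpm, 1, var)
h <- stats::bw.nrd(mu)
smth <- local_linear(mu, v, mu, h)

# Local linear fits can dip below zero at the edges; floor them so that
# the inverse is well defined.
smth <- pmax(smth, 1e-8)

# Precision weights are inverse smoothed variances, laid out as n x G.
weights <- matrix(1 / smth, nrow = n, ncol = G, byrow = TRUE)
```

Genes whose smoothed variance is high receive low weights, which is how the weights account for heteroscedasticity in downstream weighted models.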
See also
bandwidth
density