Computes association-test p-values from a generalized linear model (GLM) for each considered matching threshold, and combines them across all thresholds into a single p-value through Fisher's method, using perturbation resampling.
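The combination step can be illustrated with a minimal R sketch. It is not taken from the package's code. It assumes that each perturbation yields one perturbed p-value per threshold and that these behave as draws from the null distribution. Under that assumption, the combined p-value is the proportion of perturbed Fisher statistics T* = -2 sum(log p*_k) at least as large as the observed T. A plausible reason to resample rather than use the usual chi-squared reference with 2K degrees of freedom is that p-values computed at different thresholds on the same data are correlated. The helper name fisher_perturb_combine and its arguments are hypothetical.

```r
# Hypothetical helper (illustration only, not ludic's internal code).
# Fisher's statistic T = -2 * sum(log(p_k)) over the K thresholds is compared
# with the same statistic computed on each set of perturbed p-values, which
# are ASSUMED to be draws from the null distribution.
fisher_perturb_combine <- function(pvals, ptbed_pvals) {
  # pvals: numeric vector of K observed p-values (one per threshold)
  # ptbed_pvals: K x B matrix of perturbed p-values (thresholds as rows)
  stopifnot(is.matrix(ptbed_pvals), length(pvals) == nrow(ptbed_pvals))
  fisher_stat <- function(p) {
    # guard against log(0) for p-values that underflow to zero
    -2 * sum(log(pmax(p, .Machine$double.xmin)))
  }
  t_obs <- fisher_stat(pvals)
  t_ptb <- apply(ptbed_pvals, 2, fisher_stat)  # one statistic per perturbation
  # empirical p-value: share of perturbed statistics at least as extreme
  mean(t_ptb >= t_obs)
}

# Toy usage: K = 10 thresholds, B = 200 perturbations, with p-values made
# correlated across thresholds through a shared normal component
set.seed(1)
K <- 10
B <- 200
shared <- stats::rnorm(B + 1)
z <- outer(rep(1, K), shared) + matrix(stats::rnorm(K * (B + 1), sd = 0.3), nrow = K)
p <- 2 * stats::pnorm(-abs(z))
fisher_perturb_combine(p[, 1], p[, -1])
```

A common variant, (1 + #{T* >= T}) / (B + 1), avoids reporting an exact zero when no perturbed statistic exceeds the observed one.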
Arguments
- match_prob
matching probabilities matrix (e.g. obtained through recordLink) of dimensions n1 x n2.
- y
response variable of length n1. Only binary phenotypes are supported at the moment.
- x
a matrix or a data.frame of predictors of dimensions n2 x p. An intercept is automatically added within the function.
- covar
a matrix or a data.frame of variables to adjust for in the test, of dimensions n3 x p. Default is NULL, in which case there is no adjustment.
- thresholds
a vector (possibly of length 1) containing the different thresholds to use to call a match. Default is seq(from = 0.5, to = 0.95, by = 0.05).
- nb_perturb
the number of perturbations used for the p-value combination. Default is 200.
- dist_family
a character string indicating the distribution family for the glm. Currently, only 'gaussian' and 'binomial' are supported. Default is 'gaussian'.
- impute_strategy
a character string indicating which strategy to use to impute x from the matching probabilities match_prob. Either "best" (in which case the most probable match above the threshold is imputed) or "weighted average" (in which case the weighted mean is imputed for each individual who has at least one match with a posterior probability above the threshold). Default is "weighted average".
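The two imputation strategies can be sketched in R for a single threshold. The helper impute_x below is hypothetical and written for illustration only. It is not the ludic implementation. It assumes that "above the threshold" means a strict inequality, that the weighted average uses the matching probabilities of the above-threshold candidates as weights, and that individuals with no candidate are returned as NA.

```r
# Hypothetical helper (illustration only, not ludic's internal code):
# impute an n1 x p predictor matrix from the n2 x p matrix x using the
# n1 x n2 matching probabilities, for a single threshold.
impute_x <- function(match_prob, x, threshold,
                     strategy = c("weighted average", "best")) {
  strategy <- match.arg(strategy)
  x <- as.matrix(x)
  stopifnot(ncol(match_prob) == nrow(x))
  out <- matrix(NA_real_, nrow = nrow(match_prob), ncol = ncol(x),
                dimnames = list(NULL, colnames(x)))
  for (i in seq_len(nrow(match_prob))) {
    # candidate matches: probabilities strictly above the threshold (assumption)
    cand <- which(match_prob[i, ] > threshold)
    if (length(cand) == 0) next  # no match called: row left as NA (assumption)
    if (strategy == "best") {
      # the single most probable candidate (first one in case of ties)
      j <- cand[which.max(match_prob[i, cand])]
      out[i, ] <- x[j, ]
    } else {
      # mean of the candidates' rows, weighted by their matching probabilities
      w <- match_prob[i, cand]
      out[i, ] <- colSums(x[cand, , drop = FALSE] * w) / sum(w)
    }
  }
  out
}

# Toy usage with n1 = 5, n2 = 4 and p = 2
set.seed(2)
toy_prob <- matrix(stats::rbeta(5 * 4, 1, 2), nrow = 5, ncol = 4)
toy_x <- matrix(stats::rnorm(4 * 2), nrow = 4, ncol = 2)
impute_x(toy_prob, toy_x, threshold = 0.5, strategy = "best")
impute_x(toy_prob, toy_x, threshold = 0.5)
```

In this sketch, "best" gives each individual the predictors of one record. "weighted average" spreads the uncertainty over several plausible matches, at the cost of imputing values that may not correspond to any single record.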
Value
a list containing the following:
influencefn_pvals
p-values obtained from influence function perturbations, with the covariates as columns and the thresholds as rows, plus an additional row at the top for the combination.
wald_pvals
a matrix containing the p-values obtained from the Wald test, with the covariates as columns and the thresholds as rows.
ptbed_pvals
a list containing, for each covariate, a matrix with the nb_perturb perturbed p-values, with the different thresholds as rows.
theta_impute
a matrix of the estimated coefficients from the glm when imputing the weighted average for covariates (as columns), with the thresholds as rows.
sd_theta
a matrix of the estimated SD (from the influence function) of the coefficients from the glm when imputing the weighted average for covariates (as columns), with the thresholds as rows.
ptbed_theta_impute
a list containing, for each covariate, a matrix with the nb_perturb perturbed estimated coefficients from the glm when imputing the weighted average for covariates, with the different thresholds as rows.
impute_strategy
a character string indicating which imputation strategy was used (either "weighted average" or "best").
References
Zhang HG, Hejblum BP, Weber G, Palmer N, Churchill S, Szolovits P, Murphy S, Liao KP, Kohane I and Cai T, ATLAS: An automated association test using probabilistically linked health records with application to genetic studies, JAMIA, in press (2021). doi:10.1093/jamia/ocab187.
Examples
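library(ludic)

# Empirical type-I error (size) of atlas() under the null hypothesis:
# y is simulated independently of x, so p-values below 0.05 are false positives.
n_sims <- 1 # use a large number (e.g. 5000) for a meaningful estimate
mysim <- function(i){
  # n2 = 99 records with p = 2 predictors
  x <- matrix(stats::rnorm(n = 99*2), nrow = 99, ncol = 2)
  # n1 x n2 = 103 x 99 random matching probabilities
  match_prob <- matrix(stats::rbeta(n = 103*99, 1, 2), nrow = 103, ncol = 99)
  # binary response of length n1 = 103, unrelated to x
  y <- stats::rbinom(n = 103, size = 1, prob = 0.5)
  # Gaussian alternative:
  # y <- stats::rnorm(n = 103, mean = 1, sd = 0.5)
  # return(atlas(match_prob, y, x, dist_family = "gaussian")$influencefn_pvals)
  return(atlas(match_prob, y, x, dist_family = "binomial")$influencefn_pvals)
}
# Parallel alternative (requires the pbapply package):
# res <- pbapply::pblapply(1:n_sims, mysim, cl = parallel::detectCores() - 1)
res <- lapply(1:n_sims, mysim)
# Rejection rate at the 0.05 level for each row (combination and thresholds)
# and each coefficient; the last two columns of influencefn_pvals are not used
size <- sapply(seq_len(ncol(res[[1]]) - 2),
  FUN = function(i){
    rowMeans(sapply(res, function(m){m[, i] < 0.05}), na.rm = TRUE)
  }
)
rownames(size) <- rownames(res[[1]])
colnames(size) <- head(colnames(res[[1]]), -2)
size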
#> (Intercept) x_impute1 x_impute2
#> Combined p-value 0 0 0
#> 0.1 0 0 0
#> 0.3 0 0 0
#> 0.5 0 0 0
#> 0.7 0 0 0
#> 0.9 0 0 0