A subsample of the RNA-seq data from Baduel et al. (2016), studying Arabidopsis arenosa physiology.
Usage
data(baduel_5gs)
Format
Three objects:
design: a design matrix for the 48 measured samples, containing the following variables:
- SampleName: the corresponding column names of expr_norm_corr
- Intercept: an intercept variable
- Population: a factor identifying the plant population
- AgeWeeks: numeric age of the plant at sampling time (in weeks)
- Replicate: a purely technical variable, as replicates do not come from the same individual across weeks. Should not be used in the analysis.
- Vernalized: a logical variable indicating whether the plant had undergone vernalization (exposure to cold and short-day photoperiods)
- PopulationKA: a binary variable indicating whether the plant belonged to the KA population
- AgeWeeks_Population: interaction variable between the AgeWeeks and Population variables
- AgeWeeks_Vernalized: interaction variable between the AgeWeeks and Vernalized variables
- Vernalized_Population: interaction variable between the Vernalized and Population variables
- AgeWeeks_Vernalized_Population: interaction variable between the AgeWeeks, Vernalized and Population variables
baduel_gmt: a gmt object containing 5 gene sets of interest (see GSA.read.gmt), which is simply a list with the following 3 components:
- genesets: a list of n vectors of gene identifiers, one per gene set (each gene set is represented as the vector of the identifiers of the genes composing it)
- geneset.names: a vector of length n containing the gene set names (i.e. gene set identifiers)
- geneset.descriptions: a vector of length n containing gene set descriptions (e.g. textual information on their biological function)
expr_norm_corr: a numeric matrix (genes in rows, samples in columns) containing the normalized, batch-corrected expression values for the 2454 genes included in any of the 5 gene sets of interest
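To make the expected layout of these three objects concrete, here is a small, self-contained toy mock-up. All sample names, gene identifiers and values are hypothetical, and the construction of the interaction columns as products of main-effect columns is an assumption (this page only says they are "interaction variables"). It also shows the `apply(as.matrix(...), 2, as.numeric)` idiom used in the Examples to turn selected design columns, including the logical Vernalized, into a numeric matrix, and the positional subsetting of gene sets.

```r
# Toy mock-up of the layout of the three baduel_5gs objects.
# All sample names, gene identifiers and values below are hypothetical.
set.seed(1)

# design: one row per sample
design_toy <- data.frame(
  SampleName   = paste0("s", 1:6),
  Intercept    = 1,
  AgeWeeks     = c(0, 2, 4, 0, 2, 4),
  Vernalized   = c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE),
  PopulationKA = c(0, 0, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)
# Assumption: interaction variables are element-wise products of main effects
design_toy$AgeWeeks_Population <- design_toy$AgeWeeks * design_toy$PopulationKA
design_toy$Vernalized_Population <-
  as.numeric(design_toy$Vernalized) * design_toy$PopulationKA

# expr_norm_corr-like matrix: genes in rows, samples in columns,
# with column names matching design_toy$SampleName
gene_ids <- paste0("gene", 1:10)
expr_toy <- matrix(rexp(10 * 6, rate = 0.01), nrow = 10,
                   dimnames = list(gene_ids, design_toy$SampleName))

# gmt-like list: each gene set is a vector of gene identifiers
gmt_toy <- list(
  genesets = list(gene_ids[1:4], gene_ids[3:7], gene_ids[8:10]),
  geneset.names = c("setA", "setB", "setC"),
  geneset.descriptions = c("toy set A", "toy set B", "toy set C")
)

# Same idiom as in the Examples: coerce selected columns to a numeric matrix
covariates_toy <- apply(
  as.matrix(design_toy[, c("Intercept", "Vernalized", "AgeWeeks"), drop = FALSE]),
  2, as.numeric)

# Gene set sizes, and a subset of gene sets selected by position
set_sizes <- sapply(gmt_toy$genesets, length)
selected_sets <- gmt_toy$genesets[c(1, 3)]
```

With the real data, the same checks (matching column names between the expression matrix and SampleName, gene set sizes) can be run on expr_norm_corr, design and baduel_gmt after data(baduel_5gs).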
References
Baduel P, Arnold B, Weisman CM, Hunter B & Bomblies K (2016). Habitat-Associated Life History and Stress-Tolerance Variation in Arabidopsis arenosa. Plant Physiology, 171(1):437-51. doi:10.1104/pp.15.01875
Agniel D & Hejblum BP (2017). Variance component score test for time-course gene set analysis of longitudinal RNA-seq data. Biostatistics, 18(4):589-604. doi:10.1093/biostatistics/kxx005, arXiv:1605.02351
Examples
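if(interactive()){
  library(dearseq)
  data('baduel_5gs')

  # Test the KA population effect (KA vs TBG) on gene sets 3 and 5,
  # adjusting for vernalization, age and their interactions with population
  set.seed(54321)  # reproducible permutations
  KAvsTBG <- dgsa_seq(exprmat = log2(expr_norm_corr + 1),  # log2 with pseudo-count 1
                      covariates = apply(as.matrix(design[,
                        c('Intercept', 'Vernalized', 'AgeWeeks',
                          'Vernalized_Population', 'AgeWeeks_Population'),
                        drop = FALSE]), 2, as.numeric),
                      variables2test =
                        as.matrix(design[, c('PopulationKA'), drop = FALSE]),
                      genesets = baduel_gmt$genesets[c(3, 5)],
                      which_test = 'permutation', which_weights = 'loclin',
                      n_perm = 1000,
                      preprocessed = TRUE)  # exprmat is already normalized and log-transformed

  # Test the vernalization effect (main effect and interaction with population)
  # on the same gene sets, adjusting for age, population and their interaction
  set.seed(54321)
  Cold <- dgsa_seq(exprmat = log2(expr_norm_corr + 1),
                   covariates = apply(as.matrix(design[,
                     c('Intercept', 'AgeWeeks', 'PopulationKA', 'AgeWeeks_Population'),
                     drop = FALSE]), 2, as.numeric),
                   variables2test = as.matrix(design[,
                     c('Vernalized', 'Vernalized_Population'), drop = FALSE]),
                   genesets = baduel_gmt$genesets[c(3, 5)],
                   which_test = 'permutation', which_weights = 'loclin',
                   n_perm = 1000, preprocessed = TRUE)
}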