This package provides a thin wrapper around Rlabkey and connects to the ImmuneSpace database, making it easier to fetch datasets (gene expression data, HAI, and so forth) from specific studies.
To connect to ImmuneSpace, you need a .netrc file in your home directory containing a machine name (the hostname of ImmuneSpace), a login, and a password. See here for more information.
A netrc file may look like this:
machine www.immunespace.org
login [email protected]
password supersecretpassword
Set up your netrc file now!
Put it in your home directory. If you type:
ls ~/.netrc
at the command prompt, you should see it there. If it’s not there, create one now. Make sure you have a valid login and password. If you don’t have one, go to ImmuneSpace now and set yourself up with an account.
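If you prefer to create the file from R, the following is a minimal sketch using only base R. It writes to a temporary directory so that running it cannot overwrite an existing file; for real use you would set netrc_path to "~/.netrc". The login and password are placeholders, and restricting the file permissions is an added precaution assumed here, not something this document requires.

```r
# Sketch only: writes a netrc-style file with placeholder credentials.
# For real use, set netrc_path <- "~/.netrc" and use your own login/password.
netrc_path <- file.path(tempdir(), ".netrc")

netrc_lines <- c(
  "machine www.immunespace.org",
  "login my_email@example.org",    # placeholder login
  "password supersecretpassword"   # placeholder password
)
writeLines(netrc_lines, netrc_path)

# The file holds a plain-text password, so limit access to the owner.
# (Sys.chmod has no meaningful effect on Windows.)
Sys.chmod(netrc_path, mode = "0600")

# Check that the file exists and inspect its contents.
file.exists(netrc_path)
readLines(netrc_path)
```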
We’ll be looking at study SDY269. If you want to use a different study, change that string. The connections have state, so you can instantiate multiple connections to different studies simultaneously.
library(ImmuneSpaceR)
sdy269 <- CreateConnection(study = "SDY269")
sdy269
## <ImmuneSpaceConnection>
## Study: SDY269
## URL: https://www.immunespace.org/Studies/SDY269
## User: unknown_user at not_a_domain.com
## 9 Available Datasets
## - demographics
## - elispot
## - hai
## - fcs_analyzed_result
## - cohort_membership
## - pcr
## - elisa
## - gene_expression_files
## - fcs_sample_files
## 2 Available Expression Matrices
The call to CreateConnection instantiates the connection. Printing the object shows where it’s connected, to what study, and the available datasets and gene expression matrices.
Note that when a script is running on ImmuneSpace, variables set in the global environment automatically indicate which study should be used, so the study argument can be skipped.
We can grab any of the datasets listed in the connection.
sdy269$getDataset("hai")
## participant_id age_reported gender race
## 1: SUB112868.269 37 Male White
## 2: SUB112869.269 41 Male White
## 3: SUB112841.269 28 Female White
## 4: SUB112836.269 28 Female White
## 5: SUB112844.269 29 Female White
## ---
## 332: SUB112847.269 24 Male White
## 333: SUB112877.269 27 Male White
## 334: SUB112860.269 32 Female White
## 335: SUB112879.269 35 Female White
## 336: SUB112871.269 46 Female Black or African American
## cohort study_time_collected study_time_collected_unit
## 1: LAIV group 2008 28 Days
## 2: LAIV group 2008 28 Days
## 3: TIV Group 2008 0 Days
## 4: LAIV group 2008 28 Days
## 5: LAIV group 2008 28 Days
## ---
## 332: TIV Group 2008 0 Days
## 333: TIV Group 2008 0 Days
## 334: LAIV group 2008 0 Days
## 335: TIV Group 2008 0 Days
## 336: TIV Group 2008 28 Days
## virus value_preferred
## 1: A/Uruguay/716/2007 5
## 2: B/Florida/4/2006 5
## 3: A/Uruguay/716/2007 5
## 4: A/South Dakota/06/2007 5
## 5: A/Uruguay/716/2007 20
## ---
## 332: B/Brisbane/03/2007 320
## 333: A/Brisbane/59/2007 5
## 334: A/Uruguay/716/2007 20
## 335: B/Brisbane/03/2007 5
## 336: B/Brisbane/03/2007 5
The sdy269 object is an instance of an R6 class, so it behaves like a true object. Methods (like getDataset) are members of the object, hence the $ syntax for calling them.
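For readers new to R6, here is a small sketch of how such objects carry their own state and expose methods through $. ToyConnection, its study field, and its describe method are hypothetical names invented for this illustration, and SDY180 is just an arbitrary second study identifier; none of this is ImmuneSpaceR's actual class definition.

```r
library(R6)

# Hypothetical toy class: it only mimics the shape of a connection object.
ToyConnection <- R6Class("ToyConnection",
  public = list(
    study = NULL,
    # Called by ToyConnection$new(); stores the study name on the object.
    initialize = function(study) {
      self$study <- study
    },
    # A method is a member of the object and is reached with `$`.
    describe = function() {
      paste("Connected to study", self$study)
    }
  )
)

# Two independent objects, each holding its own state.
a <- ToyConnection$new("SDY269")
b <- ToyConnection$new("SDY180")
a$describe()
b$describe()
```

Because each object keeps its own fields, several connections can coexist in one session without interfering with each other.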
The first time you retrieve a dataset, it will contact the database. The data is cached locally, so the next time you call getDataset on the same dataset, it will retrieve the cached local copy. This is much faster.
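The fetch-once behaviour described above can be sketched in base R as follows. This is a generic caching pattern, not the package's implementation: make_cached_fetcher, slow_fetch, and n_calls are hypothetical names, and slow_fetch merely stands in for a database query.

```r
# Hypothetical sketch of fetch-once caching; not ImmuneSpaceR's actual code.
make_cached_fetcher <- function(fetch_fun) {
  cache <- new.env(parent = emptyenv())
  function(name) {
    if (!exists(name, envir = cache, inherits = FALSE)) {
      # First request for this name: do the expensive fetch and remember it.
      assign(name, fetch_fun(name), envir = cache)
    }
    # Every request is answered from the cache.
    get(name, envir = cache, inherits = FALSE)
  }
}

# Stand-in for a slow database query; counts how often it is actually called.
n_calls <- 0
slow_fetch <- function(name) {
  n_calls <<- n_calls + 1
  data.frame(dataset = name, value = 1:3)
}

get_dataset <- make_cached_fetcher(slow_fetch)
first  <- get_dataset("hai")  # triggers slow_fetch
second <- get_dataset("hai")  # served from the cache, slow_fetch is not called
n_calls
```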
To get only a subset of the data and speed up the download, filters can be passed to getDataset. The filters are created using the makeFilter function of the Rlabkey package.
library(Rlabkey)
myFilter <- makeFilter(c("gender", "EQUAL", "Female"))
hai <- sdy269$getDataset("hai", colFilter = myFilter)
See ?Rlabkey::makeFilter for more information on the syntax.
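As a further sketch, makeFilter accepts several conditions in one call, each given as a vector of column name, operator, and value. The example below only builds the filter object, so it runs without a connection. Combining gender with study_time_collected is an illustrative choice, not something taken from this document, and the getDataset call is left as a comment because it needs a live connection.

```r
library(Rlabkey)

# Two conditions in one filter: female participants at day 28.
# Each condition is c(column name, operator, value).
twoFilters <- makeFilter(
  c("gender", "EQUAL", "Female"),
  c("study_time_collected", "EQUAL", "28")
)
twoFilters

# With a live connection, the filter would be passed on as before:
# hai28 <- sdy269$getDataset("hai", colFilter = twoFilters)
```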
For more information about getDataset’s options, refer to the dedicated vignette.
We can also grab a gene expression matrix:
sdy269$getGEMatrix("LAIV_2008")
## Downloading matrix..
## Constructing ExpressionSet
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 16442 features, 83 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: BS586175 BS586212 ... BS586178 (83 total)
## varLabels: study_time_collected study_time_collected_unit ...
## biosample_accession (5 total)
## varMetadata: labelDescription
## featureData
## featureNames: DDR1 RFC2 ... NUS1P3 (16442 total)
## fvarLabels: FeatureId gene_symbol
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
The object contacts the database and downloads the matrix file. This is stored and cached locally as a data.table. The next time you access it, it will be much faster since it won’t need to contact the database again.
It is also possible to call this function using multiple matrix names. In this case, all the matrices are downloaded and combined into a single ExpressionSet.
sdy269$getGEMatrix(c("TIV_2008", "LAIV_2008"))
## Downloading matrix..
## returning summary matrix from cache
## returning latest annotation from cache
## Constructing ExpressionSet
## Constructing ExpressionSet
## Combining ExpressionSets
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 16442 features, 163 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: BS586205 BS586250 ... BS586178 (163 total)
## varLabels: study_time_collected study_time_collected_unit ...
## biosample_accession (5 total)
## varMetadata: labelDescription
## featureData
## featureNames: 1 2 ... 16442 (16442 total)
## fvarLabels: FeatureId gene_symbol
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
Finally, setting the outputType argument to "summary" will let you download the matrix with gene symbols in place of probe ids.
gs <- sdy269$getGEMatrix("TIV_2008", outputType = "summary", annotation = "latest")
## returning TIV_2008_sum_eset from cache
If the connection was created with verbose = TRUE, some functions will display additional information, such as the valid dataset names.
A plot of a dataset can be generated using the plot method, which automatically chooses the type of plot depending on the selected dataset.
sdy269$plot("hai")
sdy269$plot("elisa")
However, the type argument can be used to manually select from “boxplot”, “heatmap”, “violin” and “line”.
To fetch data from multiple studies, simply create a connection at the project level.
con <- CreateConnection("")
This will instantiate a connection at the Studies level. Most functions work on cross-study connections just as they do on single-study connections.
You can get a list of the datasets and gene expression matrices available across all studies.
con
## <ImmuneSpaceConnection>
## Study: Studies
## URL: https://www.immunespace.org/Studies/
## User: unknown_user at not_a_domain.com
## 13 Available Datasets
## - neut_ab_titer
## - fcs_sample_files
## - elisa
## - fcs_analyzed_result
## - pcr
## - demographics
## - hai
## - elispot
## - hla_typing
## - cohort_membership
## - fcs_control_files
## - gene_expression_files
## - mbaa
## 76 Available Expression Matrices
In cross-study connections, getDataset and getGEMatrix will combine the requested datasets or expression matrices. See the dedicated vignettes for more information.
Likewise, plot will visualize across studies. Note that in most cases the datasets will have too many cohorts or subjects to plot at once, so filtering the data becomes a necessity. A filter created with makeFilter, as described in the getDataset section, can be passed to plot for this purpose.
plotFilter <- makeFilter(c("cohort", "IN", "TIV 2010;TIV Group 2008"))
con$plot("elispot", filter = plotFilter)
The figure above shows the ELISPOT results for two different years of TIV vaccine cohorts from two different studies.
sessionInfo()
## R version 3.5.1 Patched (2018-07-12 r74967)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.7-bioc/R/lib/libRblas.so
## LAPACK: /home/biocbuild/bbs-3.7-bioc/R/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Rlabkey_2.2.2 jsonlite_1.5 httr_1.3.1
## [4] ImmuneSpaceR_1.8.2 rmarkdown_1.10 knitr_1.20
##
## loaded via a namespace (and not attached):
## [1] Biobase_2.40.0 viridis_0.5.1 tidyr_0.8.1
## [4] viridisLite_0.3.0 foreach_1.4.4 gtools_3.8.1
## [7] assertthat_0.2.0 stats4_3.5.1 yaml_2.2.0
## [10] robustbase_0.93-2 pillar_1.3.0 backports_1.1.2
## [13] lattice_0.20-35 glue_1.3.0 digest_0.6.16
## [16] RColorBrewer_1.1-2 colorspace_1.3-2 preprocessCore_1.42.0
## [19] htmltools_0.3.6 plyr_1.8.4 pkgconfig_2.0.2
## [22] pheatmap_1.0.10 purrr_0.2.5 mvtnorm_1.0-8
## [25] scales_1.0.0 webshot_0.5.0 gdata_2.18.0
## [28] whisker_0.3-2 tibble_1.4.2 ggplot2_3.0.0
## [31] nnet_7.3-12 BiocGenerics_0.26.0 lazyeval_0.2.1
## [34] magrittr_1.5 crayon_1.3.4 mclust_5.4.1
## [37] heatmaply_0.15.2 evaluate_0.11 MASS_7.3-50
## [40] gplots_3.0.1 class_7.3-14 registry_0.5
## [43] tools_3.5.1 data.table_1.11.4 trimcluster_0.1-2.1
## [46] stringr_1.3.1 plotly_4.8.0 kernlab_0.9-27
## [49] munsell_0.5.0 cluster_2.0.7-1 fpc_2.1-11.1
## [52] bindrcpp_0.2.2 compiler_3.5.1 caTools_1.17.1.1
## [55] rlang_0.2.2 grid_3.5.1 iterators_1.0.10
## [58] rjson_0.2.20 htmlwidgets_1.2 labeling_0.3
## [61] bitops_1.0-6 codetools_0.2-15 gtable_0.2.0
## [64] flexmix_2.3-14 curl_3.2 reshape2_1.4.3
## [67] TSP_1.1-6 R6_2.2.2 seriation_1.2-3
## [70] gridExtra_2.3 prabclus_2.2-6 dplyr_0.7.6
## [73] bindr_0.1.1 rprojroot_1.3-2 KernSmooth_2.23-15
## [76] dendextend_1.8.0 modeltools_0.2-22 stringi_1.2.4
## [79] parallel_3.5.1 Rcpp_0.12.18 gclus_1.3.1
## [82] DEoptimR_1.0-8 tidyselect_0.2.4 diptest_0.75-7