AnnotationHub: How to write recipes for new resources for the AnnotationHub

Package: AnnotationHub
Authors: Martin Morgan [cre], Marc Carlson [ctb], Dan Tenenbaum [ctb], Sonali Arora [ctb]
Modified: 11 October, 2014
Compiled: Fri Sep 11 20:36:31 2015

Overview of the process

If you are reading this it is (hopefully) because you intend to write some code that will allow the processing of online resources into R objects that are to be made available via that the AnnotationHub package. In order to do this you will have to do three basic steps (outlined below). These steps will have you writing two functions and then calling a third function to do some automatic set up for you. The 1st function will contain instructions on how to process data that is stored online into metadata for describing your new R resources for the AnnotationHub. And the 2nd function is for describing how to take these online resources and transform them into an R object that is useful to end users.

Setup

It should go without saying that this vignette is intended for users who are comfortable with R. And in order to follow the instuctions in this vignette, you will need to install the AnnotationHubData package. This package is not meant to be used by most people, and in fact it's not really intended to be anything other than a support package. So it's not exposed via biocLite(). So to get it you will need to use svn to check it out from the following location:

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/AnnotationHubData

Once you have that checked out, you will need to use R CMD INSTALL to install the package from source.

Introducing AnnotationHubMetadata Objects

The AnnotationHubData package is a complementary package to the AnnotationHub package that provides a place where we can store code that processes online resources into R objects suitable for access through the AnnotationHub package. But before you can understand the requirements for this package it is important that you 1st learn about the objects that are used as intermediaries between the hub and its web based repository behind the scenes. That means that you need to know about AnnotationHubMetadata objects. These objects store the metadata that describes an online resource. And if you want to see a set of online resources added to the repository and maintained, then it will be necessary to become familiar with the AnnotationHubMetadata constructor. For each online resource that you want to process into the AnnotationHub, you will have to be able to construct an AnnotationHubMetadata object that describes it in detail and that specifies where the recipe function lives.

Step 1: Writing your AnnotationHubMetadata generating function

The 1st function you need to provide is one that processes some online resources into AnnotationHubMetadata objects. This function MUST return a list of AnnotationHubMetadata objects. It can rely on other helper functions that you define, but ultimately it (and it's helpers) need to contain all of the instructions needed to find resources and process those resources into AnnotationHubMetadata objects.

The following example function takes files from the latest release of inparanoid and processes them into AnnotationHubMetadata objects using Map. The calling of the Map function is really the important part of this function, as it shows the function creating a series of AnnotationHubMetadata objects. Prior to that, the function was just calling out to other helper functions in order to process the metadata so that it could be passed to the AnnotationHubMetadata constructor using Map. Notice how one of the fields specified by this function is the Recipe, which indicates both the name and location of the recipe function. We expect most people will want to submit their recipe to the same package as they are submitting their metadata processing function.

makeinparanoid8ToAHMs <- function(currentMetadata){
    baseUrl <- 'http://inparanoid.sbc.su.se/download/current/Orthologs_other_formats'
    ## Make list of metadata in a helper function
    meta <- .inparanoidMetadataFromUrl(baseUrl)
    ## then make AnnotationHubMetadata objects.
    Map(AnnotationHubMetadata,
        Description=meta$description,
        Genome=meta$genome,
        SourceFile=meta$sourceFile, 
        SourceUrl=meta$sourceUrl,
        SourceVersion=meta$sourceVersion,
        Species=meta$species,
        TaxonomyId=meta$taxonomyId,
        Title=meta$title,
        RDataPath=meta$rDataPath,
        MoreArgs=list(
          Coordinate_1_based = TRUE,
          DataProvider = baseUrl,
          Maintainer = "Marc Carlson <[email protected]>",
          RDataClass = "SQLiteFile",
          RDataDateAdded = Sys.time(),
          RDataVersion = "0.0.1",
          Recipe = "AnnotationHubData:::inparanoid8ToDbsRecipe",
          Tags = c("Inparanoid", "Gene", "Homology", "Annotation")))
}

Step 2: Writing your recipe

The 2nd kind of function you need to write is called a recipe function. It always must take an single argument that must be an AnnotationHubMetadata object. The job of a recipe function is to use the metadata from an AnnotationHubMetadata object to produce an R object or data file that will be retrievable from the AnnotationHub service later on. Below is a recipe function that calls some helper functions to generate an inparanoid database object from the metadata stored in it's AnnotationHubMetadata object.

inparanoid8ToDbsRecipe <- function(ahm){
    require(AnnotationForge)
    inputFiles <- metadata(ahm)$SourceFile
    dbname <- makeInpDb(dir=file.path(inputFiles,""),
                        dataDir=tempdir())
    db <- loadDb(file=dbname)
    outputPath <- file.path(metadata(ahm)$AnnotationHubRoot,
                            metadata(ahm)$RDataPath)
    saveDb(db, file=outputPath) 
    outputFile(ahm)
}

Step 3: Test your functions and then contact us when they work

So at this point you should make sure that the AnnotationHubMetadata generating function produces a list of AnnotationHubMetadata objects and that your recipe produces a path to a file that is generated in the way that you expect it to. Once this happens you should contact us about running your recipe so that your data can actually be put into the hub.

Session Information

## R version 3.2.2 (2015-08-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] AnnotationHub_2.0.4 BiocStyle_1.6.0    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.1                  futile.logger_1.4.1         
##  [3] BiocInstaller_1.18.4         formatR_1.2                 
##  [5] GenomeInfoDb_1.4.2           XVector_0.8.0               
##  [7] futile.options_1.0.0         bitops_1.0-6                
##  [9] tools_3.2.2                  zlibbioc_1.14.0             
## [11] digest_0.6.8                 evaluate_0.7.2              
## [13] RSQLite_1.0.0                shiny_0.12.2                
## [15] DBI_0.3.1                    curl_0.9.3                  
## [17] parallel_3.2.2               rtracklayer_1.28.10         
## [19] httr_1.0.0                   stringr_1.0.0               
## [21] knitr_1.11                   Biostrings_2.36.4           
## [23] S4Vectors_0.6.5              IRanges_2.2.7               
## [25] stats4_3.2.2                 Biobase_2.28.0              
## [27] R6_2.1.1                     AnnotationDbi_1.30.1        
## [29] BiocParallel_1.2.21          XML_3.98-1.3                
## [31] lambda.r_1.1.7               magrittr_1.5                
## [33] Rsamtools_1.20.4             htmltools_0.2.6             
## [35] BiocGenerics_0.14.0          GenomicRanges_1.20.6        
## [37] GenomicAlignments_1.4.1      mime_0.4                    
## [39] interactiveDisplayBase_1.6.0 xtable_1.7-4                
## [41] httpuv_1.3.3                 stringi_0.5-5               
## [43] RCurl_1.95-4.7               markdown_0.7.7