useR!2017: Full Schedule

11:54am CEST

Interactive and Reproducible Research for RNA Sequencing Analysis

Keywords: Shiny, microbiome, sequencing, ecology, 16S rRNA
Webpages: https://acnc-shinyapps.shinyapps.io/DAME/, https://github.com/bdpiccolo/ACNC-DAME
A renaissance in knowledge about the role of commensal microbiota in health and disease is well underway, facilitated by culture-independent sequencing technologies; however, microbial sequencing data pose new challenges (e.g., taxonomic hierarchy, overdispersion) not generally seen in more traditional sequencing outputs. Additionally, complex study paradigms from clinical or basic research studies necessitate a multilayered analysis pipeline that can seamlessly integrate both primary bioinformatics and secondary statistical analysis combined with data visualization.
To address this need, we created a web-based Shiny app, DAME, which allows users not familiar with R programming to import, filter, and analyze microbial sequencing data from experimental studies. DAME requires only two files (a BIOM file with sequencing reads combined with taxonomy details, and a CSV file containing experimental metadata); once these are uploaded, the app renders a linear workflow controlled by the user. Currently, DAME supports group comparisons of several ecological estimates of α-diversity (ANOVA) and β-diversity indices (ordinations and PERMANOVA). Additionally, pairwise differential comparisons of operational taxonomic units (OTUs) using Negative Binomial Regression can be performed at all taxonomic levels. All analyses are accompanied by dynamic graphics and tables for complete user interactivity. DAME leverages functions derived from the phyloseq, vegan, and DESeq2 packages for microbial data organization and analysis, and DT, highcharter* and scatterD3 for table and plot visualizations. Download options for α-diversity measurements and DESeq2 table outputs are also provided.
The current release (v0.1) is available online (https://acnc-shinyapps.shinyapps.io/DAME/) and in the GitHub repository (https://github.com/bdpiccolo/ACNC-DAME). *This app uses Highsoft software with non-commercial packages. The Highsoft software product is not free for commercial use. Funding was provided by United States Department of Agriculture-Agricultural Research Service Project: 6026-51000-010-05S.
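As an illustration of the α-diversity estimates mentioned in the abstract above, here is a minimal sketch of the Shannon index, one widely used α-diversity measure. The abstract does not say which indices DAME reports, so the choice of index, the function name `shannon_index`, and the toy OTU counts are assumptions made for illustration only.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over OTUs with non-zero counts.

    `otu_counts` is a list of read counts per OTU for a single sample
    (hypothetical input layout, not DAME's internal format).
    """
    total = sum(otu_counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    h = 0.0
    for count in otu_counts:
        if count > 0:  # OTUs with zero reads contribute nothing
            p = count / total
            h -= p * math.log(p)
    return h


# Toy samples: an even community and a community dominated by one OTU.
even_sample = [10, 10, 10, 10]
skewed_sample = [37, 1, 1, 1]
print(shannon_index(even_sample), shannon_index(skewed_sample))
```

Per-sample values like these are what a group comparison such as ANOVA would then take as its response variable.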

Speakers

Brian Piccolo

Thursday July 6, 2017 11:54am - 12:12pm CEST
2.01 Wild Gallery

Talk, Bioinformatics II

11:00am CEST

IntegratedJM - an R package to Jointly Model the Gene-Expression and Bioassay Data, Taking Care of the Fingerprint Feature effect

Keywords: Bioactivity, Biomarkers, Chemical Structure, Joint Model, Multi-source
Webpages: https://cran.r-project.org/web/packages/IntegratedJM/index.html
Increasingly, data from different sources need to be integrated in order to arrive at meaningful conclusions. In drug-discovery experiments, most of the data sources related to a new set of compounds under development are high-dimensional. For example, to investigate the properties of a new set of compounds, pharmaceutical companies need to analyse the chemical structure (fingerprint features) of the compounds, phenotypic bioactivity (bioassay read-outs) data for targets of interest, and transcriptomic (gene expression) data. Perualila-Tan et al. (2016) proposed a joint model in which the three data sources are included to better detect the association between gene expression and biological activity. For a given set of compounds, the joint modeling approach accounts for a possible effect of the chemical structure of the compound on both variables. The joint model allows us to identify genes as potential biomarkers for a compound's efficacy. The joint modeling approach proposed by Perualila-Tan et al. (2016) is implemented in the IntegratedJM R package, which provides, in addition to model estimation and inference, a set of exploratory and visualization functions that can be used to clearly present the results. The joint model and the IntegratedJM R package are also discussed in detail in Perualila et al. (2016).
References

Perualila, Nolen Joy, Ziv Shkedy, Rudradev Sengupta, Theophile Bigirumurame, Luc Bijnens, Willem Talloen, Bie Verbist, Hinrich W.H. Göhlmann, Adetayo Kasim, and QSTAR Consortium. 2016. “Applied Surrogate Endpoint Evaluation Methods with SAS and R.” In, edited by Ariel Alonso, Theophile Bigirumurame, Tomasz Burzykowski, Marc Buyse, Geert Molenberghs, Leacky Muchene, Nolen Joy Perualila, Ziv Shkedy, and Wim Van der Elst, 275–309. CRC Press.

Perualila-Tan, Nolen, Adetayo Kasim, Willem Talloen, Bie Verbist, Hinrich W.H. Göhlmann, QSTAR Consortium, and Ziv Shkedy. 2016. “A Joint Modeling Approach for Uncovering Associations Between Gene Expression, Bioactivity and Chemical Structure in Early Drug Discovery to Guide Lead Selection and Genomic Biomarker Development.” Statistical Applications in Genetics and Molecular Biology 15: 291–304. doi:10.1515/sagmb-2014-0086.

Speakers

Rudradev Sengupta

useR RudradevSengupta pdf

Friday July 7, 2017 11:00am - 11:18am CEST
3.02 Wild Gallery

Talk, Bioinformatics II

11:18am CEST

Detecting eQTLs from high-dimensional sequencing data using recount2

Keywords: eQTLs, RNA-seq, recount2, Batch Effect, GEUVADIS
Webpages: https://jhubiostatistics.shinyapps.io/recount/, https://www.bioconductor.org/packages/recount
recount2 is a recently launched multi-experiment resource of analysis-ready RNA-seq gene and exon count datasets for 2,041 different studies with over 70,000 human RNA-seq samples from the Sequence Read Archive (SRA), Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) projects (Collado-Torres et al. 2016). The raw sequencing reads were processed with Rail-RNA as described in Nellore et al. (2016). RangedSummarizedExperiment objects at the gene, exon or exon-exon junction level, the raw counts, the phenotype metadata used, and the URLs to the sample coverage bigWig files or the mean coverage bigWig file for a particular study can be accessed via the Bioconductor package recount or via a Shiny app.
We use this source of preprocessed RNA-seq expression data to present our recently developed analysis protocol for performing extensive eQTL analyses. The goal of an eQTL analysis is to detect patterns of gene expression related to specific genetic variants. We demonstrate how to integrate gene expression data from recount2 and genotype information to perform eQTL analyses and visualize the results with gene-SNP interaction plots. We explain in detail how expression and genotype data are filtered, transformed, and batch corrected. We also discuss possible pitfalls and artifacts that may occur when analyzing genomic data from different sources jointly. Our protocol is tested on a publicly available data set of the RNA-sequencing project from the GEUVADIS consortium and also applied to recently generated omics data from the GeneSTAR project at Johns Hopkins University.
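A minimal sketch of the core eQTL test described above, assuming an additive linear model in which the expression of one gene is regressed on genotype dosage (0, 1 or 2 copies of the alternate allele) at one SNP. The abstract does not state which model the protocol uses, so this is an illustration rather than the authors' method; the function name `fit_additive_eqtl` and the toy data are hypothetical, and filtering, transformation and batch correction are left out.

```python
def fit_additive_eqtl(dosages, expression):
    """Ordinary least-squares fit of expression ~ intercept + slope * dosage.

    Returns (slope, intercept). The slope is the estimated change in
    expression per additional copy of the alternate allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mean_d = sum(dosages) / n
    mean_e = sum(expression) / n
    sxx = sum((d - mean_d) ** 2 for d in dosages)
    if sxx == 0:
        # A SNP with no genotype variation cannot be tested.
        raise ValueError("SNP is monomorphic in this sample set")
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_d
    return slope, intercept


# Toy data: six samples, expression rises with each alternate allele.
toy_dosages = [0, 1, 2, 0, 1, 2]
toy_expression = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
slope, intercept = fit_additive_eqtl(toy_dosages, toy_expression)
print(slope, intercept)
```

In a real analysis this fit would be repeated over many gene-SNP pairs with covariates and multiple-testing correction; a gene-SNP interaction plot is essentially a display of expression grouped by the three dosage classes.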
References

Collado-Torres, Leonardo, Abhinav Nellore, Kai Kammers, Shannon E Ellis, Margaret A Taub, Kasper D Hansen, Andrew E Jaffe, Ben Langmead, and Jeffrey Leek. 2016. “Recount: A Large-Scale Resource of Analysis-Ready RNA-seq Expression Data.” bioRxiv. doi:10.1101/068478.

Nellore, Abhinav, Leonardo Collado-Torres, Andrew E Jaffe, Jose Alquicira-Hernandez, Christopher Wilks, Jacob Pritt, James Morton, Jeffrey T Leek, and Ben Langmead. 2016. “Rail-RNA: Scalable Analysis of RNA-seq Splicing and Coverage.” Bioinformatics. doi:10.1093/bioinformatics/btw575.

Speakers

Kai Kammers

20170707 useR Kammers pdf

Friday July 7, 2017 11:18am - 11:36am CEST
3.02 Wild Gallery

Talk, Bioinformatics II

11:36am CEST

Integrated analysis of digital PCR experiments in R

Keywords: digital PCR, multiple comparison, GUI, reproducible research
Webpages: https://CRAN.R-project.org/package=dpcR, http://michbur.github.io/dpcR_manual/, http://michbur.github.io/pcRuniveRsum/
Digital PCR (dPCR) is a variant of PCR in which the amplification is conducted in many small-volume reactions (termed partitions) instead of in bulk. The dichotomous status of each partition (positive or negative amplification) is used for absolute quantification of the template molecules by Poisson transformation of the proportion of positive partitions. The vast expansion of dPCR technology and its applications has been followed by the development of statistical data analysis methods. Yet the software landscape is scattered, consisting of scripts in various programming languages, web servers with narrow scopes, and closed-source vendor software packages that are usually tightly tied to their platform. This leads to unfavourable environments, as results from different platforms, or even from different laboratories using the same platform, cannot be easily compared with one another.
To address these challenges, we developed the dpcReport shiny server, which provides an open-source tool for the analysis of dPCR data. dpcReport provides a streamlined analysis framework to the dPCR community that is compatible with the data output (e.g., CSV, XLSX) from different dPCR platforms (e.g., Bio-Rad QX100/200, Biomark). This goes beyond the basic dPCR data analysis offered by vendor-supplied software, which is often limited to the computation of the mean template copy number per partition and its uncertainty. dpcReport gives users more control over their data analysis and lets them benefit from standardized, reproducible analysis.
Our web server analyses data regardless of the platform vendor or type (droplet or chamber dPCR). It is not limited to the commercially available platforms and can also be used with experimental systems by importing data through the universal REDF format, which follows the IETF RFC 4180 standard. dpcReport provides users with advanced tools for data quality control and incorporates statistical tests for comparing multiple reactions in an experiment [@burdukiewicz_methods_2016], which are currently absent from many dPCR-related software tools. The conducted analyses are fully integrated within extensive and customizable interactive HTML reports including figures, tables and calculations.
To improve reproducibility and transparency, a report may include R snippets enabling an exact reproduction of the analysis performed by dpcReport. We developed the dpcR package to collect all functionalities employed by the shiny server. Furthermore, the package provides additional functions facilitating analysis and quality control of dPCR data. Nevertheless, the core functionalities are available through the shiny server to minimize the entry barrier to using our software.
Both dpcReport and dpcR follow the standardized dPCR nomenclature of the dMIQE guidelines [@huggett_digital_2013]. Since the vast functionality offered by our software may be overwhelming at first, our software is extensively documented. The documentation is enriched by the analysis of sample data sets.
The dpcReport web server and dpcR package belong to pcRuniveRsum, a collection of R tools for the analysis of DNA amplification experiments.
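The Poisson transformation mentioned at the start of this abstract can be sketched as follows. If templates are distributed across partitions at random, the expected fraction of negative partitions is exp(-λ), so the mean number of template copies per partition is λ = -ln(1 - k/n) for k positive partitions out of n. This is a generic illustration and not code from dpcR or dpcReport; the function names and the partition volume used in the example are assumptions.

```python
import math


def copies_per_partition(positive, total):
    """Estimate lambda, the mean number of template copies per partition.

    Uses lambda = -ln(1 - positive/total). The estimate is undefined when
    every partition is positive, because the sample is then saturated.
    """
    if total <= 0 or positive < 0 or positive >= total:
        raise ValueError("need 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def copies_per_microlitre(positive, total, partition_volume_ul):
    """Convert lambda to a concentration, given the partition volume in µL."""
    if partition_volume_ul <= 0:
        raise ValueError("partition volume must be positive")
    return copies_per_partition(positive, total) / partition_volume_ul


# Toy run: 5,000 positive partitions out of 20,000.
lam = copies_per_partition(5000, 20000)
# 0.00085 µL is an assumed partition volume chosen only for this example.
conc = copies_per_microlitre(5000, 20000, partition_volume_ul=0.00085)
print(round(lam, 4), round(conc, 1))
```

The logarithm matters because a positive partition may hold more than one template molecule, so the raw proportion k/n underestimates the copy number as occupancy rises.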

Speakers

Stefan Rödiger

Friday July 7, 2017 11:36am - 11:54am CEST
3.02 Wild Gallery

Talk, Bioinformatics II
