useR!2017
Wednesday, July 5 • 1:48pm - 2:06pm
Clustering transformed compositional data using *coseq*


Abstract: Although there is no shortage of clustering algorithms proposed in the literature, the question of the most relevant strategy for clustering compositional data (i.e., data made up of profiles, whose rows belong to the simplex) remains largely unexplored, particularly in cases where the observed value is equal to or close to zero for one or more samples. This work is motivated by the analysis of two sets of compositional data, both focused on the categorization of profiles but arising from considerably different applications: (1) identifying groups of co-expressed genes from high-throughput RNA sequencing data, in which a given gene may be completely silent in one or more experimental conditions; and (2) finding patterns in the usage of stations over the course of one week in the Velib’ bike sharing system in Paris, France. For both of these applications, we propose the use of appropriate data transformations in conjunction with either Gaussian mixture models or K-means algorithms and penalized model selection criteria. Using our Bioconductor package coseq, we illustrate the user-friendly implementation and visualization provided by our proposed approach, with a focus on the functional coherence of the gene co-expression clusters and the geographical coherence of the bike station groupings.
Keywords: Clustering, compositional data, K-means, mixture model, transformation, co-expression
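
Illustrative sketch (not part of the original abstract, and not the coseq implementation): the general idea of transforming profiles before clustering can be shown in a few lines of plain Python. Everything below is an assumption made for illustration: the toy count matrix, the choice of the arcsine square-root transformation (picked here only because it is defined at exact zeros, where a log-ratio is not), the function names, and the fixed number of clusters. The penalized model selection step mentioned in the abstract is omitted.

```python
import math
import random


def to_profiles(counts):
    """Normalize each row to sum to 1, so that it lies on the simplex."""
    profiles = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a row with zero total has no profile")
        profiles.append([x / total for x in row])
    return profiles


def arcsin_sqrt(profiles):
    """Arcsine square-root transform (illustrative choice; finite at zero)."""
    return [[math.asin(math.sqrt(min(1.0, p))) for p in row] for row in profiles]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd-style K-means with k distinct data points as initial centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical toy counts: rows are items (e.g. genes or stations),
# columns are samples; some entries are exactly zero.
counts = [
    [90, 10, 0],
    [180, 15, 5],
    [0, 50, 50],
    [2, 98, 100],
]

profiles = to_profiles(counts)
transformed = arcsin_sqrt(profiles)
labels, centers = kmeans(transformed, k=2)
print(labels)
```

In this toy example the first two rows share one profile shape and the last two share another, even though their total counts differ, which is the reason for clustering profiles rather than raw counts.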


Wednesday July 5, 2017 1:48pm - 2:06pm CEST
2.01 Wild Gallery