Name: Extracting Meaningful Noisy Biclusters from a Binary Big-Data Matrix using the BiBitR R Package
Start: 2017-07-05T14:06:00+0200
End: 2017-07-05T14:24:00+0200




Keywords: R, package, biclustering, binary data
Webpages: https://cran.r-project.org/web/packages/BiBitR/index.html, https://github.com/ewouddt/BiBitR
Biclustering is a data analysis method that clusters the rows and columns of a (big) data matrix simultaneously in order to identify local submatrices of interest, i.e., local patterns in the matrix. For binary data matrices, the local submatrices that biclustering methods can identify consist of rectangles of 1’s. Several methods have been developed for biclustering binary data, such as the Bimax algorithm proposed by Prelić et al. (2006) and the BiBit algorithm by Rodriguez-Baena, Perez-Pulido, and Aguilar-Ruiz (2011). However, these methods can discover only perfect biclusters, meaning that noise is not allowed (i.e., zeros are not included in the bicluster). We present an extension of the BiBit algorithm (E-BiBit) that allows for noisy biclusters. While this method is very fast, its downside is that it often produces a large number of biclusters (typically >10000), which makes it very difficult to recover any meaningful patterns and to interpret the results. Furthermore, many of these biclusters overlap heavily.
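To make the difference between perfect and noisy biclusters concrete, here is a minimal Python sketch of a BiBit-style search. It is not the BiBitR implementation: the function name `noisy_bibit`, its parameters, and the reading of noise as "maximum number of zeros allowed per row inside the pattern columns" are assumptions chosen for illustration.

```python
from itertools import combinations


def noisy_bibit(matrix, minr=2, minc=2, noise=0):
    """Hypothetical BiBit-style search on a binary matrix (list of 0/1 rows).

    Each pair of rows defines a column pattern (the columns where both rows
    are 1). Every row with at most `noise` zeros inside that pattern joins
    the bicluster; noise=0 yields perfect biclusters only.
    """
    seen_patterns = set()
    biclusters = []
    for i, j in combinations(range(len(matrix)), 2):
        # Column pattern shared by the two seed rows (logical AND).
        cols = tuple(
            c for c, (a, b) in enumerate(zip(matrix[i], matrix[j])) if a and b
        )
        if len(cols) < minc or cols in seen_patterns:
            continue
        seen_patterns.add(cols)
        # Keep every row whose number of zeros within the pattern is tolerated.
        rows = [
            r for r, row in enumerate(matrix)
            if sum(1 for c in cols if not row[c]) <= noise
        ]
        if len(rows) >= minr:
            biclusters.append((rows, list(cols)))
    return biclusters


example = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
perfect = noisy_bibit(example, minr=2, minc=3, noise=0)
noisy = noisy_bibit(example, minr=2, minc=3, noise=1)
print(perfect)  # [([0, 1], [0, 1, 2])]
print(noisy)    # [([0, 1, 2], [0, 1, 2])]
```

With `noise=1`, the third row joins the bicluster even though it contains one zero in the pattern columns. On real data, many row pairs produce similar patterns, which is one way a search of this kind can end up with a very large number of overlapping results.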
We propose a data analysis workflow to extract meaningful noisy biclusters from binary data using an extended and ‘pattern-guided’ version of BiBit, combined with traditional clustering/networking methods. The proposed algorithm and the data analysis workflow are illustrated using the BiBitR R package, which is used to extract and visualize the results.
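The abstract does not spell out the individual workflow steps, so the following Python sketch only illustrates the general idea of reducing many overlapping biclusters with a traditional clustering step. It assumes that column patterns are compared with the Jaccard index and grouped by single linkage at a fixed threshold; the names `jaccard` and `group_patterns` and the threshold value are hypothetical and not taken from BiBitR.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of column indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_patterns(patterns, threshold=0.5):
    """Group patterns whose similarity reaches `threshold` (single linkage).

    Returns a sorted list of groups, each a list of indices into `patterns`.
    """
    parent = list(range(len(patterns)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(patterns)):
        for j in range(i + 1, len(patterns)):
            if jaccard(patterns[i], patterns[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for idx in range(len(patterns)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


patterns = [[0, 1, 2], [0, 1, 2, 3], [7, 8], [7, 8, 9], [4]]
groups = group_patterns(patterns, threshold=0.6)
print(groups)  # [[0, 1], [2, 3], [4]]
```

Each resulting group could then be summarised by a single representative pattern, so that thousands of near-duplicate biclusters collapse into a handful of candidates for interpretation.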
The proposed method and data analysis workflow are applied to high-dimensional real-life health data containing information on the disease symptoms of hundreds of thousands of patients. The E-BiBit algorithm is used to identify homogeneous subsets of patients who share the same disease symptom profiles.
The E-BiBit algorithm has also been included in the BiclustGUI R package (De Troyer and Otava 2016; De Troyer et al. 2016), an ensemble GUI package in which multiple biclustering and visualisation methods are implemented.
References

De Troyer, E., and M. Otava. 2016. Package ‘RcmdrPlugin.BiclustGUI’: ‘Rcmdr’ Plug-in GUI for Biclustering. https://ewouddt.github.io/RcmdrPlugin.BiclustGUI/aboutbiclustgui/.

De Troyer, E., M. Otava, J. D. Zhang, S. Pramana, T. Khamiakova, S. Kaiser, M. Sill, et al. 2016. “Applied Biclustering Methods for Big and High-Dimensional Data Using R.” Edited by A. Kasim, Z. Shkedy, S. Kaiser, S. Hochreiter, and W. Talloen. CRC Press Taylor & Francis Group, Chapman & Hall/CRC Biostatistics Series.

Prelić, A., S. Bleuler, P. Zimmermann, A. Wille, P. Bühlmann, W. Gruissem, L. Hennig, L. Thiele, and E. Zitzler. 2006. “A Systematic Comparison and Evaluation of Biclustering Methods for Gene Expression Data.” Bioinformatics 22: 1122–9.

Rodriguez-Baena, Domingo S., Antonio J. Perez-Pulido, and Jesus S. Aguilar-Ruiz. 2011. “A Biclustering Algorithm for Extracting Bit-Patterns from Binary Datasets.” Bioinformatics 27 (19).

Speakers

Ewoud De Troyer


Wednesday July 5, 2017 2:06pm - 2:24pm CEST
2.01 Wild Gallery

Talk, Clustering

Company 681


useR!2017
