useR!2017 has ended
Thursday, July 6 • 11:36am - 11:54am
**BradleyTerryScalable**: Ranking items scalably with the Bradley-Terry model

Keywords: Citation data, Directed network, Paired comparisons, Quasi-symmetry, Sparse matrices
Webpage: https://github.com/EllaKaye/BradleyTerryScalable
Motivated by the analysis of large-scale citation networks, we implement the familiar Bradley-Terry model (Zermelo 1929; Bradley and Terry 1952) in such a way that it can be applied, with relatively modest memory and execution-time requirements, to pair-comparison data from networks with large numbers of nodes. This provides a statistically principled method of ranking a large number of objects, based only on paired comparisons.
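To make the model concrete: under Bradley-Terry, item i beats item j with probability pi_i / (pi_i + pi_j), where each pi is a positive "strength". The sketch below fits those strengths by the classical iterative scheme that goes back to Zermelo (1929), storing only the pairs that were actually compared. It is an illustration in Python, not the package's own code; the function name `fit_bradley_terry`, the dictionary input format and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def fit_bradley_terry(wins, n_iter=1000, tol=1e-10):
    """Maximum-likelihood Bradley-Terry strengths (hypothetical helper).

    wins maps (winner, loser) -> number of wins; pairs never compared are
    simply absent, so storage grows with the number of observed pairs.
    Assumes no self-comparisons and a strongly connected comparison graph
    (otherwise the maximum-likelihood estimate does not exist).
    """
    items = sorted({item for pair in wins for item in pair})
    total_wins = defaultdict(float)  # W_i: total wins of item i
    n = defaultdict(float)           # n_ij: comparisons between i and j
    for (i, j), w in wins.items():
        total_wins[i] += w
        n[(i, j)] += w
        n[(j, i)] += w
    neighbours = defaultdict(set)
    for (i, j) in list(n):
        neighbours[i].add(j)

    pi = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        # Update: pi_i <- W_i / sum_j n_ij / (pi_i + pi_j)
        new = {
            i: total_wins[i]
            / sum(n[(i, j)] / (pi[i] + pi[j]) for j in neighbours[i])
            for i in items
        }
        scale = sum(new.values())  # strengths are identified only up to scale
        new = {i: v / scale for i, v in new.items()}
        delta = max(abs(new[i] - pi[i]) for i in items)
        pi = new
        if delta < tol:
            break
    return pi


# Toy data (invented for illustration): A wins 3 of 4 against both B and C,
# while B and C split their four comparisons evenly.
wins = {
    ("A", "B"): 3, ("B", "A"): 1,
    ("A", "C"): 3, ("C", "A"): 1,
    ("B", "C"): 2, ("C", "B"): 2,
}
strengths = fit_bradley_terry(wins)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

For this toy data the likelihood equations can be solved by hand, giving strengths of 0.6 for A and 0.2 each for B and C, which the iteration reproduces.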
The BradleyTerryScalable package complements the existing CRAN package BradleyTerry2 (Firth and Turner 2012) by permitting a much larger number of objects to be compared. In contrast to BradleyTerry2, the new BradleyTerryScalable package implements only the simplest, ‘unstructured’ version of the Bradley-Terry model. The new package leverages functionality in the additional R packages igraph (Csardi and Nepusz 2006), Matrix (Bates and Maechler 2017) and Rcpp (Eddelbuettel 2013) to provide flexibility in model specification (whole-network versus disconnected cliques) as well as memory efficiency and speed. The Bayesian approach of Caron and Doucet (2012) is provided as an optional alternative to maximum likelihood, in order to allow whole-network ranking even when the network of paired comparisons is not fully connected.
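The role of the Bayesian alternative can be illustrated with a small sketch. It assumes the setup described by Caron and Doucet (2012): independent Gamma(a, b) priors on the strengths and an EM update of the form lambda_i <- (a - 1 + W_i) / (b + sum_j n_ij / (lambda_i + lambda_j)), which reduces to the maximum-likelihood iteration when a = 1 and b = 0. The function name `fit_bradley_terry_map`, the default prior values and the toy data are assumptions for illustration only, and the package's own interface and defaults may differ.

```python
from collections import defaultdict


def fit_bradley_terry_map(wins, a=2.0, b=1.0, n_iter=2000, tol=1e-10):
    """MAP Bradley-Terry strengths under Gamma(a, b) priors (hypothetical helper).

    wins maps (winner, loser) -> number of wins. With a > 1 and b > 0 every
    item receives a finite, positive strength, even if it never lost or if
    the comparison network splits into disconnected pieces.
    """
    items = sorted({item for pair in wins for item in pair})
    total_wins = defaultdict(float)  # W_i: total wins of item i
    n = defaultdict(float)           # n_ij: comparisons between i and j
    for (i, j), w in wins.items():
        total_wins[i] += w
        n[(i, j)] += w
        n[(j, i)] += w
    neighbours = defaultdict(set)
    for (i, j) in list(n):
        neighbours[i].add(j)

    lam = {i: 1.0 for i in items}
    for _ in range(n_iter):
        # EM update: the prior adds a - 1 to the wins and b to the denominator.
        new = {
            i: (a - 1.0 + total_wins[i])
            / (b + sum(n[(i, j)] / (lam[i] + lam[j]) for j in neighbours[i]))
            for i in items
        }
        delta = max(abs(new[i] - lam[i]) for i in items)
        lam = new
        if delta < tol:
            break
    return lam


# Toy data (invented): two disconnected pairs, and A is undefeated, so the
# maximum-likelihood estimate does not exist for the network as a whole.
wins = {("A", "B"): 3, ("C", "D"): 2, ("D", "C"): 1}
strengths = fit_bradley_terry_map(wins)
ranking = sorted(strengths, key=strengths.get, reverse=True)
```

Because the prior places all items on a common scale, the four items can be ranked together (here A, C, D, B) even though A and B were never compared with C or D. Under maximum likelihood, each connected piece would have to be ranked separately.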
The BradleyTerryScalable package can readily handle data from directed networks with many thousands of nodes. The use of the Bradley-Terry model to produce a ranking from citation data was originally advocated in Stigler (1994), and was studied in detail more recently in Varin, Cattelan, and Firth (2016); here we will illustrate its use with a large-scale network of inter-company patent citations.
References

Bates, Douglas, and Martin Maechler. 2017. “Matrix: Sparse and Dense Matrix Classes and Methods.” R Package Version 1.2-8. http://cran.r-project.org/package=Matrix.

Bradley, Ralph Allan, and Milton E Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39: 324–45.

Caron, François, and Arnaud Doucet. 2012. “Efficient Bayesian Inference for Generalized Bradley–Terry Models.” Journal of Computational and Graphical Statistics 21: 174–96.

Csardi, Gabor, and Tamas Nepusz. 2006. “The igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. http://igraph.org.

Eddelbuettel, Dirk. 2013. Seamless R and C++ Integration with Rcpp. New York: Springer.

Firth, David, and Heather L Turner. 2012. “Bradley-Terry Models in R: The BradleyTerry2 Package.” Journal of Statistical Software 48 (9). http://www.jstatsoft.org/v48/i09.

Stigler, Stephen M. 1994. “Citation Patterns in the Journals of Statistics and Probability.” Statistical Science, 94–108.

Varin, Cristiano, Manuela Cattelan, and David Firth. 2016. “Statistical Modelling of Citation Exchange Between Statistics Journals.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 179: 1–63.

Zermelo, Ernst. 1929. “Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung.” Mathematische Zeitschrift 29: 436–60.




Speakers

Ella Kaye

Ms, University of Warwick
Ella is a Research Software Engineer in the Department of Statistics at the University of Warwick, UK. She works to increase sustainability and EDI (Equality, Diversity and Inclusion) in the R Project. She also runs rainbowR, a community that supports, promotes and connects LGBTQ...



Thursday July 6, 2017 11:36am - 11:54am CEST
4.01 Wild Gallery