Keywords: Citation data, Directed network, Paired comparisons, Quasi-symmetry, Sparse matrices
Webpage: https://github.com/EllaKaye/BradleyTerryScalable
Motivated by the analysis of large-scale citation networks, we implement the familiar Bradley-Terry model (Zermelo 1929; Bradley and Terry 1952) in such a way that it can be applied, with relatively modest memory and execution-time requirements, to paired-comparison data from networks with large numbers of nodes. This provides a statistically principled method of ranking a large number of objects, based only on paired comparisons.
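For orientation, the unstructured model referred to here can be written as below. The notation (positive strength parameters \pi_i for K objects, and the event "i beats j") is introduced only for illustration and does not appear in the abstract itself.

```latex
\Pr(i \text{ beats } j) = \frac{\pi_i}{\pi_i + \pi_j},
\qquad \pi_i > 0, \quad i = 1, \dots, K .
```

The strengths are identified only up to a common scale factor, so a ranking is read off from their relative sizes.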
The BradleyTerryScalable package complements the existing CRAN package BradleyTerry2 (Firth and Turner 2012) by permitting a much larger number of objects to be compared. In contrast to BradleyTerry2, the new BradleyTerryScalable package implements only the simplest, ‘unstructured’ version of the Bradley-Terry model. The new package leverages functionality in the additional R packages igraph (Csardi and Nepusz 2006), Matrix (Bates and Maechler 2017) and Rcpp (Eddelbuettel 2013) to provide flexibility in model specification (whole-network versus disconnected cliques) as well as memory efficiency and speed. The Bayesian approach of Caron and Doucet (2012) is provided as an optional alternative to maximum likelihood, in order to allow whole-network ranking even when the network of paired comparisons is not fully connected.
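The contrast between maximum likelihood and the Bayesian alternative can be illustrated with a minimal sketch of the iterative update usually associated with Caron and Doucet (2012), in which each strength receives a gamma prior with shape a and rate b. This is not the package's own code (which is written in R and C++). The function name `fit_bradley_terry`, the dictionary-of-counts input and the default tolerances are all hypothetical choices made for this example, and plain Python is used purely for illustration. Setting a = 1 and b = 0 is assumed to recover the classical maximum-likelihood iteration.

```python
from collections import defaultdict


def fit_bradley_terry(wins, a=1.0, b=0.0, max_iter=10000, tol=1e-10):
    """Estimate Bradley-Terry strengths from sparse win counts.

    wins: dict mapping (winner, loser) -> number of wins; absent pairs are zero.
    a, b: shape and rate of an assumed gamma prior on each strength.
          a=1, b=0 reduces the update to the maximum-likelihood iteration.
    """
    total_wins = defaultdict(float)              # W_i: total wins of item i
    n = defaultdict(lambda: defaultdict(float))  # n[i][j]: comparisons between i and j
    for (winner, loser), count in wins.items():
        if winner == loser or count <= 0:
            continue
        total_wins[winner] += count
        n[winner][loser] += count
        n[loser][winner] += count

    items = sorted(n)
    if not items:
        return {}

    pi = {i: 1.0 for i in items}
    for _ in range(max_iter):
        new_pi = {}
        for i in items:
            # Only the pairs actually observed contribute, so the cost per
            # sweep grows with the number of non-zero entries, not K squared.
            denom = b + sum(c / (pi[i] + pi[j]) for j, c in n[i].items())
            new_pi[i] = (a - 1.0 + total_wins[i]) / denom
        if b == 0.0:
            # Without a rate parameter the scale is arbitrary: fix the mean at 1.
            mean = sum(new_pi.values()) / len(new_pi)
            new_pi = {i: v / mean for i, v in new_pi.items()}
        change = max(abs(new_pi[i] - pi[i]) for i in items)
        pi = new_pi
        if change < tol:
            break
    return pi


if __name__ == "__main__":
    # Two disconnected pairs: maximum likelihood cannot place them on a
    # common scale, whereas the prior (a=2, b=1) keeps every strength finite.
    data = {("A", "B"): 2, ("B", "A"): 1, ("C", "D"): 1}
    print(fit_bradley_terry(data, a=2.0, b=1.0))
```

With a > 1 and b > 0 every item has a positive numerator and denominator, so the estimates stay finite and strictly positive even for items that never win or that sit in separate components. That is the property that makes a whole-network ranking possible in the disconnected case.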
The BradleyTerryScalable package can readily handle data from directed networks with many thousands of nodes. The use of the Bradley-Terry model to produce a ranking from citation data was originally advocated in Stigler (1994), and was studied in detail more recently in Varin, Cattelan, and Firth (2016); here we will illustrate its use with a large-scale network of inter-company patent citations.
References
Bates, Douglas, and Martin Maechler. 2017. “Matrix: Sparse and Dense Matrix Classes and Methods.” R Package Version 1.2-8. http://cran.r-project.org/package=Matrix.
Bradley, Ralph Allan, and Milton E. Terry. 1952. “Rank Analysis of Incomplete Block Designs: I. The Method of Paired Comparisons.” Biometrika 39: 324–45.
Caron, François, and Arnaud Doucet. 2012. “Efficient Bayesian Inference for Generalized Bradley–Terry Models.” Journal of Computational and Graphical Statistics 21: 174–96.
Csardi, Gabor, and Tamas Nepusz. 2006. “The igraph Software Package for Complex Network Research.” InterJournal Complex Systems: 1695. http://igraph.org.
Eddelbuettel, Dirk. 2013. Seamless R and C++ Integration with Rcpp. New York: Springer.
Firth, David, and Heather L. Turner. 2012. “Bradley-Terry Models in R: The BradleyTerry2 Package.” Journal of Statistical Software 48 (9). http://www.jstatsoft.org/v48/i09.
Stigler, Stephen M. 1994. “Citation Patterns in the Journals of Statistics and Probability.” Statistical Science, 94–108.
Varin, Cristiano, Manuela Cattelan, and David Firth. 2016. “Statistical Modelling of Citation Exchange Between Statistics Journals.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 179: 1–63.
Zermelo, Ernst. 1929. “Die Berechnung der Turnier-Ergebnisse als ein Maximumproblem der Wahrscheinlichkeitsrechnung.” Mathematische Zeitschrift 29: 436–60.