useR!2017: Full Schedule

11:00am CEST

Analysis of German Fuel Prices with R

Keywords: Analytics, Marketing, tidyverse, purrr, ggplot2, rgdal, sp and more
Webpages: https://creativecommons.tankerkoenig.de (sic), https://www.openstreetmap.org/
We present an R-based analysis to measure the impact of different market drivers on fuel prices in Germany. The analysis is based on the open dataset on German fuel prices, bringing in many additional open data sets along the way.

Overview of the dataset
1. History, Legal framework and data collection
2. Current uses in “price-finder apps”
3. Structure of the dataset
4. Preparation of the data
5. A first graphical analysis
- price levels
- weekly and daily pricing patterns

Overview of potential price drivers and corresponding data sources
1. A purrr workflow for preparing regional data from Destatis
- Number of registered cars
- Number of fuel stations
- Number of inhabitants
- Mean income, etc.
2. Determining geographical market drivers with OSM data using sp, rgdal and geosphere
- Branded vs independent
- Location: highway, close to highway exit (“Autohof”), etc.
- Proximity to competitors, etc.
3. Cost drivers
- Market prices for crude oil
- Distance of fuel station to fuel depot
- Land lease and property prices
4. Outlook:
- Weather
- Traffic density

Based on this data, we will present different modelling approaches to quantify the impact of the above drivers on average price levels. We will also give an outlook and first results on temporal pricing patterns and indicators for competitive or anti-competitive behaviour.
This talk is a condensed version of an online R workshop that I am currently preparing and that I expect to be fully available by the time of useR! 2017.
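
As an illustration of one geographical driver listed above, proximity to competitors, the following is a minimal sketch in base R. It is not the speaker's workflow: the station coordinates are made up, and `haversine_km` is a hypothetical helper standing in for the distance functions of the geosphere package named in the abstract.

```r
# Great-circle (haversine) distance in kilometres between two points.
# Base R only; longitude/latitude in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical station coordinates (illustrative, not from the dataset)
stations <- data.frame(
  id  = c("A", "B", "C"),
  lon = c(13.40, 13.45, 11.58),
  lat = c(52.52, 52.50, 48.14)
)

# Pairwise distance matrix; the diagonal is set to Inf so that a station
# is never counted as its own nearest competitor.
idx <- seq_len(nrow(stations))
d <- outer(idx, idx, function(i, j) {
  haversine_km(stations$lon[i], stations$lat[i],
               stations$lon[j], stations$lat[j])
})
diag(d) <- Inf

# Distance from each station to its nearest competitor
stations$nearest_competitor_km <- apply(d, 1, min)
print(stations)
```

A column such as `nearest_competitor_km` could then enter a price model as one explanatory variable alongside the regional and cost drivers.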

Speakers

Boris Vaillant

Presentation BV FuelPrices UseR 170712 V19 pdf

Wednesday July 5, 2017 11:00am - 11:18am CEST
PLENARY Wild Gallery

Talk, Kaleidoscope I

Company 510

11:18am CEST

When is an Outlier an Outlier? The O3 plot

Whether a case might be identified as an outlier depends on the other cases in the dataset and on the variables available. A case can stand out as unusual on one or two variables, while appearing middling on the others. If a case is identified as an outlier, it is useful to find out why. This paper introduces a new display, the O3 plot (Overview Of Outliers), for supporting outlier analyses, and describes its implementation in R.

Figure 1 shows an example of an O3 plot for four German demographic variables recorded for the 299 Bundestag constituencies. There is a row for each variable combination for which outliers were found and two blocks of columns. Each row of the block on the left shows which variable combination defines that row. There are 4 variables, so there are 4 columns, one for each variable, and a cell is coloured grey if that variable is part of the combination. The combinations (the rows) are sorted by numbers of outliers found within numbers of variables in the combination, and blue dotted lines separate the combinations with different numbers of variables. The columns in the left block are sorted by how often the variables occur. A boundary column separates this block from the block on the right that records the outliers found with whichever outlier identification algorithm was used (in this case Wilkinson’s HDoutliers with alpha=0.05). There is one column for each case that is found to be an outlier at least once and these columns are sorted by the numbers of times the cases are outliers.

Given \(n\) cases and \(p\) variables there would be \((p+1+n)\) columns if every case were an outlier on some combination of variables. And if outliers were identified for all possible combinations there would be \(2^p-1\) rows. An O3 plot has too many rows if there are many variables with many combinations having outliers, and it has too many columns if many cases are identified as outliers on at least one variable combination. Combinations are only reported if outliers are found for them, and cases are only reported if they occur at least once as an outlier.
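
The size bounds above can be checked with a short base R sketch using the figures from the example (4 variables, 299 constituencies). The variable names `V1` to `V4` are placeholders, and this only counts combinations; it does not reproduce the O3 plot implementation.

```r
p <- 4    # variables, as in the Bundestag example
n <- 299  # cases (constituencies)

# Upper bound on rows: every non-empty subset of the p variables
max_rows <- 2^p - 1
# Upper bound on columns: p variable columns + 1 boundary column + n cases
max_cols <- p + 1 + n

# Enumerate the variable combinations explicitly (placeholder names)
vars <- paste0("V", seq_len(p))
combos <- unlist(
  lapply(seq_len(p), function(k) combn(vars, k, simplify = FALSE)),
  recursive = FALSE
)

# Number of combinations by size (1, 2, 3 or 4 variables)
print(table(lengths(combos)))
print(c(rows = max_rows, columns = max_cols))
```

In practice a plot is far smaller than these bounds, since only combinations with outliers and cases flagged at least once are drawn.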

O3 plots show which cases are identified often as outliers, which are identified in single dimensions, and which are only identified in higher dimensions. They highlight which variables and combinations of variables may be affected by possible outliers.

Speakers

Antony Unwin

UnwinO3user2017Slides pdf

Wednesday July 5, 2017 11:18am - 11:36am CEST
PLENARY Wild Gallery

Talk, Kaleidoscope I

Company 442

11:36am CEST

Sports Betting and R: How R is changing the sports betting world

Keywords: Sports Betting, Sports Analytics, Vegas, Markets
Webpages: https://cran.r-project.org/web/packages/odds.converter/index.html, https://cran.r-project.org/web/packages/pinnacle.API/index.html, http://pinnacle.com/

Sports Betting markets are one of the purest prediction markets that exist and are yet vastly misunderstood by the public. Many assume that the center of the sports betting world is situated in Las Vegas. However, in the modern era, sports bookmaking is a task that looks a lot like market making in finance with sophisticated algorithmic trading systems running and constantly adjusting prices in real-time as events occur. But, unlike financial markets, sports are governed by a set of physical rules and can usually be measured and understood.

Since the late 90s, Pinnacle has been one of the largest sportsbooks in the world and one of the few sportsbooks that take wagers from professional bettors (who win in the long term). Most other sportsbooks ban these winners, much as casinos ban card counters in Blackjack. At Pinnacle the focus is on modelling, automation and data science; R is a central piece of the business, and a large number of customers use an API to interact with us.

In this talk, we dispel common misconceptions about the sports betting world and show that it is actually a very sexy problem in modelling and data science. We also show how we use R to try to beat Vegas and other sportsbooks every day in a form of data science warfare.

Since the rise of in-play betting markets, an operator must make a prediction in real time on the probability of outcomes for the remainder of an event within a very small margin of error. Customers can compete by building their own models or utilising information that might not be accounted for in the market and expressing their belief through wagering.

Naturally, a customer will generally wager when they believe they have an edge, and then the operator must determine how to change its belief after each piece of new information (wagers, in-game events, etc). This essentially involves predicting how much information is encoded in a wager, which depends partially on the sharpness of each customer, and then determining how to act on that information to maximise profits.

One way to look at this is that we are aggregating, in a smart way, the world’s models, opinions, and information when we come up with a price. This is a powerful concept and is why, for example, political prediction markets are much more accurate than polls or pundits.

For this reason, we will very soon release another package to CRAN. It contains all our odds for the entire MLB season and the US Election 2016, and can be combined with the very popular Lahman package to build predictive models and to compare their predictions against real market data, showing how a model would have performed in a real market.

We believe this is a very exciting (and difficult) problem to use for educational purposes. The package can be used in conjunction with two of our existing packages, which have been on CRAN for a few years: odds.converter (to convert between betting market odds types and probabilities) and pinnacle.API (to interact with Pinnacle’s real-time odds API from R).
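
To make the conversion between odds types and probabilities concrete, here is a minimal base R sketch. The helpers `us_to_dec` and `dec_to_prob` are hypothetical and written for illustration only; they are not the odds.converter API, and the quoted odds are made up.

```r
# Hypothetical helpers in base R, for illustration only;
# they are not functions of the odds.converter package.

# US (moneyline) odds -> decimal odds
us_to_dec <- function(us) ifelse(us > 0, 1 + us / 100, 1 + 100 / abs(us))

# Decimal odds -> implied probability
dec_to_prob <- function(dec) 1 / dec

# Made-up two-way market quoted in US odds
us_odds  <- c(home = -150, away = 130)
dec_odds <- us_to_dec(us_odds)
implied  <- dec_to_prob(dec_odds)

# The implied probabilities sum to more than 1; the excess is the
# bookmaker's margin. Proportional rescaling is one simple way to remove it.
margin <- sum(implied) - 1
fair   <- implied / sum(implied)

print(round(rbind(decimal = dec_odds, implied = implied, fair = fair), 4))
```

Proportional rescaling is only one of several ways to strip the margin and is used here as an assumption; comparing a model's probabilities with the rescaled values gives a rough sense of whether the model disagrees with the market.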

Even if you have no interest in sports or wagering, we believe this is a fascinating problem, and our data and tools are well suited for the R community at large to work with, whether for academic purposes or as a hobby.

Speakers

Marco Blume

Trading Director, Pinnacle

useR2017 PPT Final ppt

Wednesday July 5, 2017 11:36am - 11:54am CEST
PLENARY Wild Gallery

Talk, Kaleidoscope I

Company 434

11:54am CEST

Urban green spaces and their biophonic soundscape component

Keywords: soundscape ecology, urbanization, green space, indicators, soundscape
Abstract
Sustainable urban environments with urban green spaces like city parks and urban gardens provide enduring benefits for individuals and society. By providing recreational spaces, they encourage physical activity, resulting in improved physical and mental health of citizens. As such, the density and the quality of these areas are of high importance in urban area planning.
To study urban green spaces as a landscape, soundscape ecology has recently turned its attention to their soundscape, the holistic experience of their sounds. In R, the soundecology and seewave packages provide accessible processing tools for automating the calculation of soundecology indicators from long-run sound recordings made by permanent outdoor recorders. These indicators give information about the biophonic component of the soundscape, and as such give a clear indication of the quality of the green space. Since bird vocalizations contribute strongly to the biophonic component, their spring singing activity is clearly reflected in the yearly pattern of these indicators.
A pilot study focussing on the annual variations of the soundscape of a typical urban green space has been conducted.

Speakers

Paul Devos

useR!2017 Brussel distrib PaulDevos pdf

Wednesday July 5, 2017 11:54am - 12:12pm CEST
PLENARY Wild Gallery

Talk, Kaleidoscope I

Company 970

12:12pm CEST

Maps are data, so why plot data on a map?

Keywords: data maps, OpenStreetMap, spatial, visualization
Webpages: https://CRAN.R-project.org/package=osmplotr, https://github.com/ropensci/osmplotr, https://github.com/osmdatar/osmdata
R, like any and every other system for analysing and visualising spatial data, has a host of ways to overlay data on maps (or the other way around). Maps nevertheless contain data (nay, maps are data), making this act tantamount to overlaying data upon data. That’s likely not going to end well, and so this talk will present two new packages that enable you to visualise your own data with actual map data such as building polygons or street lines, rather than merely overlaying (or underlaying) them. The osmdata package enables publicly accessible data from OpenStreetMap to be read into R, and osmplotr can then use these data as a visual basis for your own data. Both categorical and continuous data can be visualised through colours or through structural properties such as line thicknesses or types. We think this results in more visually striking and beautiful data maps than any alternative approach that necessitates separating your data from map data.

Speakers

Mark Padgham

padgham pdf

Wednesday July 5, 2017 12:12pm - 12:30pm CEST
PLENARY Wild Gallery

Talk, Kaleidoscope I

Company 630
