Keywords: rvest, purrr, webscraping, fantasy, sports
Webpages: http://www.maxhumber.com
Really interesting data never actually lives inside of a tidy CSV. Unless, of course, you think iris or mtcars is super interesting. Interesting data lives outside of comma separators. It’s unstructured, and messy, and all over the place. It lives around us and on poorly formatted websites, just waiting and begging to be played with.
Finding, fetching, and cleaning your own data is a bit like cooking a meal from scratch instead of microwaving a frozen TV dinner. Microwaving food is simple. It’s literally one step: put thing in microwave. There is, however, no single step to making a proper meal from scratch. Every meal is different. The recipe for making coconut curry isn’t the same as the recipe for Brussels sprout tacos. But both require a knife and a frying pan!
In “Scraping data with rvest and purrr” I will talk through how to pair rvest (the knife) with purrr (the frying pan) to scrape interesting data from a bunch of websites. This talk is inspired by a recent blog post that I authored, which was well received by the r-bloggers.com community.
rvest is a popular R package that makes it easy to scrape data from HTML web pages.
purrr is a relatively new package that makes it easy to write code for a single element of a list and then quickly generalize it to the rest of that same list.
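To make the pairing concrete, here is a minimal sketch of the pattern: write the scraping code for one page with rvest, then let purrr apply it to every page in a list. The two HTML snippets, the player names and scores, and the helper name scrape_table are all made up for illustration (they stand in for real pages so the example runs without a network connection); this is not the code from the blog post.

```r
library(rvest)
library(purrr)

# Hypothetical stand-ins for real web pages: two tiny HTML documents kept in
# memory. In practice these would be URLs passed to read_html().
pages <- list(
  week1 = paste0(
    "<html><body><table>",
    "<tr><th>player</th><th>points</th></tr>",
    "<tr><td>Ann</td><td>10</td></tr>",
    "<tr><td>Bob</td><td>7</td></tr>",
    "</table></body></html>"
  ),
  week2 = paste0(
    "<html><body><table>",
    "<tr><th>player</th><th>points</th></tr>",
    "<tr><td>Ann</td><td>3</td></tr>",
    "<tr><td>Bob</td><td>12</td></tr>",
    "</table></body></html>"
  )
)

# The knife: code written for a single page.
# scrape_table is an illustrative helper name, not part of rvest.
scrape_table <- function(html) {
  html %>%
    read_html() %>%          # parse the HTML
    html_element("table") %>% # pick the first <table> on the page
    html_table()             # convert it to a data frame
}

# The frying pan: generalize the single-page code to every page in the list.
tables <- map(pages, scrape_table)

# Summarize each scraped table, again one element at a time.
total_points <- map_dbl(tables, ~ sum(.x$points))
```

Because scrape_table only knows about one page, swapping the in-memory snippets for a character vector of real URLs should not require changing the map() call, only the selector inside the helper.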