Episode 10: Adventures in Data Munging Part 2

Posted on Sunday, Sep 16, 2012 | Category: Podcast
I’m happy to present episode 10 of the R-Podcast! Season 1 of the R-Podcast concludes with part 2 of my series on data munging, in which I discuss issues surrounding importing data sets contained in HTML tables. I share how I used the XML and RCurl packages to validate and import data from hockey-reference.com for storage in a MySQL database. Our listener feedback segment contains another installment on the Pitfalls of R contributed by listener Frans. I want to thank everyone who has provided such positive feedback throughout the season, and I’m looking forward to providing some exciting new content for season 2. I hope you enjoy the episode, and check out our new contact page if you would like to provide any feedback. Thanks for listening!

Show Notes

Episode 10 Time Stamps

00:00 The R-Podcast #010 Adventures in Data Munging Part 2
00:33 Introduction
01:50 Wrapping up season 1 ... wait, what?
03:30 RStudio team expands
05:41 R Community milestone
07:53 Discovering hockey-reference.com
10:54 Tips for readHTMLTable
21:10 Checking for valid data first
29:23 Minor processing needed
35:18 Saving data to MySQL database
45:26 Listener Feedback: Andrew
54:58 Frans: Pitfalls of R segment 2
63:40 Wrapping up: subscribe to the podcast, theRcast@gmail.com, +1-269-849-9780, Twitter @theRcast
69:14 End
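
For readers who want something concrete to go with the segments from 10:54 to 35:18, here is a minimal sketch of the general workflow: parse an HTML table, check that it looks valid, do some light processing, and hand it to a database. It is not the code used in the episode. The inline HTML, the player names, the column names, and the helper functions `page_is_reachable()`, `is_valid_table()`, and `save_to_mysql()` are all hypothetical, and the MySQL step assumes the DBI and RMySQL packages and an existing connection, so it is defined but never called here.

```r
library(XML)

# Hypothetical stand-in for a downloaded page; a real page would be fetched
# first (for example with RCurl::getURL) and passed to htmlParse the same way.
html <- '<html><body>
<table id="skaters">
<tr><th>Player</th><th>GP</th><th>G</th></tr>
<tr><td>Player A</td><td>82</td><td>30</td></tr>
<tr><td>Player B</td><td>75</td><td>12</td></tr>
</table></body></html>'

# Hypothetical helper: confirm the URL responds before trying to parse it.
# Not called below because it needs network access.
page_is_reachable <- function(url) {
  RCurl::url.exists(url)
}

# Hypothetical helper: a table counts as valid if it is a non-empty data frame
# that contains every column we expect.
is_valid_table <- function(df, expected_cols) {
  is.data.frame(df) && nrow(df) > 0 && all(expected_cols %in% names(df))
}

# Hypothetical helper: append the data frame to a MySQL table.
# `con` is assumed to come from something like
# DBI::dbConnect(RMySQL::MySQL(), dbname = "...", user = "...", password = "...").
save_to_mysql <- function(df, con, table_name) {
  DBI::dbWriteTable(con, table_name, df, append = TRUE, row.names = FALSE)
}

# Parse the document and read every <table> into a list of data frames.
doc <- htmlParse(html, asText = TRUE)
tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
skaters <- tables[[1]]

# Check the table before doing anything else with it.
stopifnot(is_valid_table(skaters, c("Player", "GP", "G")))

# Minor processing: cells are read as text, so convert the numeric columns.
skaters$GP <- as.integer(skaters$GP)
skaters$G <- as.integer(skaters$G)

str(skaters)
```

Setting `stringsAsFactors = FALSE` keeps the cells as plain character strings, which makes the later conversion to integers straightforward.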


Eric Nantz

Eric Nantz is a principal research scientist at a large life sciences company, creating innovative analytical pipelines and capabilities supporting study designs and analyses. Outside of his day job, Eric is passionate about connecting with the R community as the creator and host of the R-Podcast and the Shiny Developer Series, and as a curator and podcast host for the R Weekly project. He also likes to share his adventures with R and general computing on Twitch livestreams at twitch.tv/rpodcast.