Sport Informatics and Analytics/Pattern Recognition/Using R
Introduction
This topic develops issues raised in Pattern Recognition, Theme 2 of this course. It starts a conversation about the use of R in sport analytics.
R is a programming language and a software environment for statistical computing and graphics that is supported by the R Foundation for Statistical Computing.[1]
Kurt Hornik and Friedrich Leisch[2] introduced R in the first issue of the R newsletter, R News, in 2001. The R Core Team provided a brief background report about R in the same issue.[3]
There is a detailed description of R on this Wikipedia page.
There is a vibrant R community on Twitter that includes RStudio and R-Ladies Global.
Learning about R
Using R in sport contexts
{{IDevice
|theme=Line
|type=Reading
|title=Parkrun
|body=Each weekend, Parkrun organisers around the world provide opportunities for people to take part in a 5 km run. The data from these runs are shared on public websites. As a gentle introduction to the use of R in sport contexts, you might like to look at the data shared by Keith Lyons (2019)[23]. The data describe 385 parkruns completed in under 40 minutes in Braidwood, NSW, in 2018. There is a GitHub repository for the data.}}
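A first look at a results table of this kind might go something like the sketch below. It does not use the Braidwood data: the data frame is synthetic, and the column names `event` and `minutes` are assumptions made for illustration only. With a real file you would replace the simulated data frame with a call such as `read.csv()` on the downloaded file.

```r
# Synthetic stand-in for a parkrun results table. This is NOT the Braidwood
# data; the column names and values are hypothetical.
set.seed(2018)
runs <- data.frame(
  event   = sample(1:50, size = 385, replace = TRUE),  # hypothetical event number
  minutes = round(runif(385, min = 17, max = 39.9), 2) # hypothetical finish time
)

# Inspect the structure and the first few rows
str(runs)
head(runs)

# Summary statistics for the finish times
summary(runs$minutes)

# Mean finish time for each event
per_event <- aggregate(minutes ~ event, data = runs, FUN = mean)
head(per_event)
```

The same three steps (inspect, summarise, aggregate) are a reasonable starting point for most small sport data sets before any modelling or plotting.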
Australian rules football
Netball
Association football
Cricket
Basketball
Tennis
Salaries in sport
Strava
Olympic medals
Extreme skiing and snowboarding
Baseball
NFL
Ice hockey
Visualising data with R
One of the options you have with R is to visualise your data. R provides built-in graphics functions and a range of add-on packages to support your visualisations.
If you would like to explore the potential of R to visualise data, you might find Remko Duursma, Jeff Powell and Glenn Stone's (2017)[77] introduction to learning R very helpful. Their Chapter 4 refers explicitly to visualizing data and the use of RStudio and includes discussion of: scatterplot; bar plot; histogram; curves; pie chart; box and whisker plot; and symbols.
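As a rough illustration of four of the plot types listed above, the sketch below uses base R graphics only. The data are synthetic and the variable names (`team`, `distance`, `sprints`) are invented for this example; it is not taken from Duursma, Powell and Stone's notes.

```r
# Synthetic player data; all names and values are hypothetical.
set.seed(1)
players <- data.frame(
  team     = rep(c("A", "B", "C"), each = 20),
  distance = rnorm(60, mean = 10, sd = 1.5),  # distance covered (km)
  sprints  = rpois(60, lambda = 25)           # number of sprints
)

# Mean number of sprints per team, used for the bar plot
team_means <- tapply(players$sprints, players$team, mean)

# Arrange four plots in a 2 x 2 grid, restoring the old settings afterwards
op <- par(mfrow = c(2, 2))
plot(players$distance, players$sprints,
     xlab = "Distance (km)", ylab = "Sprints", main = "Scatterplot")
hist(players$distance, xlab = "Distance (km)", main = "Histogram")
boxplot(distance ~ team, data = players,
        xlab = "Team", ylab = "Distance (km)", main = "Box and whisker plot")
barplot(team_means, xlab = "Team", ylab = "Mean sprints", main = "Bar plot")
par(op)
```

In RStudio the four panels appear in the Plots pane; when the script is run non-interactively, R writes them to its default graphics file instead.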
A powerful visualisation tool in R is ggplot2[78].
ggplot2 was inspired by Leland Wilkinson's (1999) The Grammar of Graphics[79] and is available as a CRAN package in R and RStudio.
Edwin Chen (2012)[80] provides "a bare-bones introduction to ggplot2" that "assumes no knowledge of R". A definitive introduction to ggplot2 is provided by Hadley Wickham (2016)[81].
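The grammar-of-graphics idea behind ggplot2 is that a plot is built from data, aesthetic mappings and layers. A minimal sketch, assuming the ggplot2 package is installed and using synthetic data with invented variable names, might look like this:

```r
library(ggplot2)

# Synthetic player data; all names and values are hypothetical.
set.seed(1)
players <- data.frame(
  team     = rep(c("A", "B", "C"), each = 20),
  distance = rnorm(60, mean = 10, sd = 1.5),  # distance covered (km)
  sprints  = rpois(60, lambda = 25)           # number of sprints
)

# Map variables to aesthetics, then add one layer at a time
p <- ggplot(players, aes(x = distance, y = sprints, colour = team)) +
  geom_point() +                                           # layer 1: points
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # layer 2: linear fits
  labs(x = "Distance (km)", y = "Sprints", colour = "Team") +
  theme_minimal()

print(p)
```

Because the plot is stored as an object (`p`), further layers or a different theme can be added later without rewriting the earlier code.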
R as an ePortfolio resource
References
- ↑ Hornik, Kurt; Leisch, Friedrich (November 26, 2015). "R FAQ". https://cran.r-project.org/doc/FAQ/R-FAQ.html#What-is-R_003f. Retrieved 9 February 2016.
- ↑ Hornik, Kurt; Leisch, Friedrich (1 January 2001). "Editorial". R-project. https://www.r-project.org/doc/Rnews/Rnews_2001-1.pdf. Retrieved 9 February 2016.
- ↑ The R Core Team (1 January 2001). "What is R?". R-project. https://www.r-project.org/doc/Rnews/Rnews_2001-1.pdf. Retrieved 9 February 2016.
- ↑ Hicks, Stephanie; Irizarry, Rafael (2016). "A Guide to Teaching Data Science". https://arxiv.org/ftp/arxiv/papers/1612/1612.07140.pdf.
- ↑ Campbell, Paul (September 2018). "A whirlwind tour of working with data in R". https://paulc91.github.io/intro_to_r/#1. Retrieved 23 September 2018.
- ↑ Dancho, Matt (4 November 2018). "New R cheatsheet: data science workflow with R". https://www.business-science.io/learning-r/2018/11/04/data-science-r-cheatsheet.html. Retrieved 5 November 2018.
- ↑ Wickham, Hadley (August 2019). "Mastering Shiny". https://mastering-shiny.org/. Retrieved 14 August 2019.
- ↑ Walum, Hasse; De Leon, Desiree (August 2019). "Introduction". https://tinystats.github.io/teacups-giraffes-and-statistics/02_bellCurve.html. Retrieved 15 August 2019.
- ↑ Schneider, Todd (2016). https://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/. Retrieved 18 October 2017.
- ↑ Frick, Hannah; Kosmidis, Ioannis (2017). "trackeR: Infrastructure for Running and Cycling Data from GPS-Enabled Tracking Devices in R". Journal of Statistical Software 82 (7).
- ↑ Frick, Hannah; Kosmidis, Ioannis (2017). "trackeR: Infrastructure for Running and Cycling Data from GPS-Enabled Tracking Devices in R". Journal of Statistical Software 82 (7): 1.
- ↑ Tran, Jacquie (15 February 2018). "Sport analytics in R". https://jacquietran.neocities.org/acu-gcpa-2018-02/presentation.html. Retrieved 15 February 2018.
- ↑ Nakagawara, Ryo (4 July 2018). https://datascienceplus.com/visualize-the-world-cup-with-r-part-1-recreating-goals-with-ggsoccer-and-ggplot2/. Retrieved 8 August 2018.
- ↑ Nakagawara, Ryo (6 August 2018). https://www.r-bloggers.com/animating-the-goals-of-the-world-cup-comparing-the-old-vs-new-gganimate-and-tweenr-api/. Retrieved 8 August 2018.
- ↑ Benz, Luke. https://github.com/lbenz730/ncaahoopR. Retrieved 8 August 2018.
- ↑ Positive Residual (2019). "Portfolio". https://positiveresidual.com/. Retrieved 7 January 2019.
- ↑ Arregoitia, Luis (January 2019). "Animate shot distances for NBA games". https://luisdva.github.io/rstats/bball-shots/. Retrieved 7 January 2019.
- ↑ Ward, Patrick (20 January 2019). "A Simple Approach to Analyzing Athlete Data in Applied Sports Science". http://optimumsportsperformance.com/blog/testing-syntax-highlighter-evolved/. Retrieved 21 January 2019.
- ↑ Averick, Mara (27 February 2019). "NBA Advanced Metrics". http://rpubs.com/maraaverick/470388. Retrieved 28 February 2019.
- ↑ Frigaard, Martin; Spangler, Peter (7 May 2019). "Exploring Chicago rideshare data in R". http://www.storybench.org/exploring-chicago-rideshare-data/. Retrieved 9 May 2019.
- ↑ O'Hara-Wild, Mitchell (17 June 2019). "Introducing tsibbledata". https://www.mitchelloharawild.com/blog/tsibbledata/. Retrieved 15 June 2019.
- ↑ Padgham, Mark (9 May 2019). "bikedata". https://cran.r-project.org/web/packages/bikedata/vignettes/bikedata.html. Retrieved 2 September 2019.
- ↑ Lyons, Keith (5 January 2019). "Braidwood Showground Parkruns 2018". https://keithlyons.me/blog/2019/01/05/braidwood-showground-parkruns-2018/. Retrieved 5 January 2019.
- ↑ Jovanović, Mladen (13 March 2015). "AFL Data Analysis Report". http://complementarytraining.net/wp-content/uploads/2015/03/AFL_Analysis.html. Retrieved 26 March 2016.
- ↑ Tran, Jacquie (12 January 2019). "Getting to know the fitzRoy package (AFL game statistics)". https://underthehood.jacquietran.com/2019/01/12/getting-to-know-the-fitzroy-package-afl-game-statistics/. Retrieved 13 January 2019.
- ↑ Sweeting, Alice (2017). "Discovering the Movement Sequences of Elite and Junior Elite Netball Athletes" (PhD). Institute of Sport, Exercise and Active Living, Victoria University, Melbourne, Australia. http://trove.nla.gov.au/work/227110648?q&versionId=249204357. Retrieved 18 July 2017.
- ↑ Sweeting, Alice (11 June 2016). "Introduction to R and A Basic Analysis of Athlete Load". https://sportstatisticsrsweet.wordpress.com/2016/06/. Retrieved 18 July 2017.
- ↑ Sweeting, Alice (29 January 2018). "k-means Clustering in R". https://sportstatisticsrsweet.wordpress.com/2018/01/29/k-means-clustering-in-r/. Retrieved 30 January 2018.
- ↑ Loridan, Thomas. "téouch analytics". https://teouchanalytics.wordpress.com/. Retrieved 8 September 2017.
- ↑ Loridan, Thomas. "Google Scholar Profile". https://scholar.google.com.au/citations?user=VVRMn3cAAAAJ&hl=en. Retrieved 8 September 2017.
- ↑ Loridan, Thomas. "Episode 1: feature engineering (and some data to play with)". https://teouchanalytics.wordpress.com/2017/07/08/episode-1-feature-engineering-and-some-data-to-play-with/. Retrieved 8 September 2017.
- ↑ Loridan, Thomas. "Episode 2: Assessing feature importance". https://teouchanalytics.wordpress.com/2017/07/10/episode-2-assessing-feature-importance/. Retrieved 8 September 2017.
- ↑ Loridan, Thomas. "Episode 3: Building and testing a predictive model". https://teouchanalytics.wordpress.com/2017/07/13/episode-3-building-and-testing-a-predictive-model/. Retrieved 8 September 2017.
- ↑ Loridan, Thomas. "Episode 4: Tuning a football predictive model with caret". https://teouchanalytics.wordpress.com/2017/07/18/tuning-a-football-prediction-model-with-caret/. Retrieved 8 September 2017.
- ↑ Loridan, Thomas. "Episode 5: how to bet on football using a prediction model". https://teouchanalytics.wordpress.com/2017/07/21/episode-5-how-to-bet-on-football-using-a-prediction-model/. Retrieved 8 September 2017.
- ↑ Loridan, Thomas. "Episode 6: where to from here?". https://teouchanalytics.wordpress.com/2017/08/04/episode-6-where-to-from-here/. Retrieved 8 September 2017.
- ↑ Loridan, Thomas. "Episode 6: where to from here?". https://teouchanalytics.wordpress.com/2017/08/04/episode-6-where-to-from-here/. Retrieved 8 September 2017.
- ↑ Wilson, Robbie et al (2017). "Skill not athleticism predicts individual variation in match performance of soccer players". Proceedings of the Royal Society B Biological Sciences 284(1869).
- ↑ Tyner, Sam; Briatte, François; Hofmann, Heike (2017). "Network Visualization with ggplot2". The R Journal 9(1).
- ↑ Curley, James. "Introducing engsoccerdata". https://github.com/jalapic/engsoccerdata. Retrieved 8 November 2017.
- ↑ . https://ewen.io/2018/12/10/understatr/. Retrieved 12 December 2018.
- ↑ "#15: Getting Started with Free StatsBomb Event Data – xG Shot Map Tutorial". 16 June 2019. https://thelastmananalytics.home.blog/2019/06/16/15-getting-started-with-free-statsbomb-event-data-xg-shot-map-tutorial/. Retrieved 18 June 2019.
- ↑ Torvaney, Ben (1 January 2019). https://stats-and-snakeoil.herokuapp.com/2019/01/01/predicting-the-premier-league-with-dixon-coles/. Retrieved 12 December 2018.
- ↑ Torvaney, Ben (6 August 2019). ggsoccer. https://github.com/Torvaney/ggsoccer. Retrieved 7 August 2019.
- ↑ Ganesh, Tinniam. "Introducing cricketr! : An R package to analyze performances of cricketers". https://gigadom.wordpress.com/2015/07/04/introducing-cricketr-a-r-package-to-analyze-performances-of-cricketers/. Retrieved 25 October 2017.
- ↑ Ganesh, Tinniam. "The making of cricket package yorkr – Part 1". https://gigadom.wordpress.com/2016/03/05/the-making-of-cricket-package-yorkr-part-1-2/. Retrieved 25 October 2017.
- ↑ Ganesh, Tinniam. "More book, more cricket! 2nd edition of my books now on Amazon". https://gigadom.wordpress.com/2017/03/26/more-book-more-cricket-2nd-edition-of-my-books-now-on-amazon/. Retrieved 25 October 2017.
- ↑ Ganesh, Tinniam. "cricketr sizes up legendary All-rounders of yesteryear". https://gigadom.wordpress.com/2016/09/10/cricketr-sizes-up-legendary-all-rounders-of-yesteryear/. Retrieved 25 October 2017.
- ↑ Ganesh, Tinniam. "Analysis of IPL T20 matches with yorkr templates". https://gigadom.wordpress.com/2017/03/04/analysis-of-ipl-t20-matches-with-yorkr-templates/. Retrieved 25 October 2017.
- ↑ Cervone, Daniel et al (4 August 2014). "A Multiresolution Stochastic Process Model for Predicting Basketball Possession Outcomes". https://arxiv.org/pdf/1408.0777.pdf. Retrieved 21 November 2017.
- ↑ Cervone, Daniel. "EPVDemo". https://github.com/dcervone/EPVDemo. Retrieved 21 November 2017.
- ↑ Schneider, Todd (8 March 2016). "BallR: Interactive NBA Shot Charts with R and Shiny". http://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/. Retrieved 4 April 2018.
- ↑ Schneider, Todd (8 March 2016). "BallR: Interactive NBA Shot Charts with R and Shiny". http://toddwschneider.com/posts/ballr-interactive-nba-shot-charts-with-r-and-shiny/. Retrieved 4 April 2018.
- ↑ Arregoitia, Luis (14 February 2019). "Quantifying point overlap for NBA shot chart data". https://luisdva.github.io/rstats/nba-overlap/. Retrieved 27 February 2019.
- ↑ Arregoitia, Luis (9 January 2019). "Animate shot distances for NBA games". https://luisdva.github.io/rstats/bball-shots/. Retrieved 27 February 2019.
- ↑ Greenberg, Neil (18 March 2019). "2019 NCAA tournament: The perfect bracket to win your March Madness pool". https://www.washingtonpost.com/sports/2019/03/18/ncaa-tournament-perfect-bracket-win-your-march-madness-pool/. Retrieved 19 March 2019.
- ↑ Firke, Sam (18 March 2019). "Predicting March Madness". https://github.com/sfirke/predicting-march-madness. Retrieved 19 March 2019.
- ↑ Brooks, Dan; Folsom, Keith (11 May 2016). "Predicting March Madness". https://rstudio-pubs-static.s3.amazonaws.com/180553_8d12f96839b74f4aa3b562beb54dff25.html. Retrieved 20 March 2019.
- ↑ Lopez, Michael; Matthews, Gregory (30 November 2014). "Building an NCAA men's basketball predictive model and quantifying its success". https://arxiv.org/abs/1412.0248. Retrieved 19 March 2019.
- ↑ Kovalchik, Stephanie (13 October 2017). "Measuring Match Fatigue". http://on-the-t.com/2017/10/13/fatigue-effects/. Retrieved 9 December 2017.
- ↑ Kovalchik, Stephanie (20 October 2017). "Is Fatigue Cumulative?". http://on-the-t.com/2017/10/20/cumulative-fatigue-effects/. Retrieved 9 December 2017.
- ↑ Burris, Kyle (7 September 2017). "Relief-Fatigue". https://github.com/burrisk/Relief-Fatigue. Retrieved 9 December 2017.
- ↑ Ritz, Christian et al (2015). "Dose-Response Analysis Using R". PLoS ONE 10(12).
- ↑ Kovalchik, Stephanie (18 March 2018). "Cape Town celebrates R and tennis data science at satRday". http://on-the-t.com/2018/03/16/satrday-capetown/. Retrieved 24 March 2018.
- ↑ Kovalchik, Stephanie (18 March 2018). "satRday". https://github.com/skoval/satRday. Retrieved 24 March 2018.
- ↑ Kovalchik, Stephanie (10 July 2018). "Material from 2018 UseR Conference: Statistical Models for Sport in R". https://github.com/skoval/UseR2018. Retrieved 24 July 2018.
- ↑ Tran, Jacquie (2 January 2018). "How much do you get paid? Part I - An initial exploration". http://underthehood.jacquietran.com/2018/01/02/how-much-do-you-get-paid-part-1/. Retrieved 3 January 2018.
- ↑ Smith, David (23 January 2018). http://blog.revolutionanalytics.com/2018/01/strava-visualization.html. Retrieved 24 January 2018.
- ↑ Rinker, Tyler (20 March 2018). "Building the Olympics blog: tidy data preparation". https://edwinth.github.io/olympics-dataprep/. Retrieved 22 March 2018.
- ↑ Rinker, Tyler (20 March 2018). "Building the Olympics blog: tidy data preparation". https://edwinth.github.io/olympics-dataprep/. Retrieved 22 March 2018.
- ↑ Rinker, Tyler (9 February 2014). "Sochi Olympic Medals". https://trinkerrstuff.wordpress.com/2014/02/09/sochi-olympic-medals-2/. Retrieved 22 March 2018.
- ↑ Oldach, Matthew (8 May 2018). "Analyzing extreme skiing and snowboarding in R: Freeride World Tour 1996–2018". https://medium.com/@MattOldach_65321/analyzing-extreme-skiing-and-snowboarding-in-r-freeride-world-tour-1996-2018-ffde401fb3ae. Retrieved 10 May 2018.
- ↑ Petti, Bill (21 September 2015). "A Short(-ish) Introduction to Using R Packages for Baseball Research". https://www.fangraphs.com/tht/a-short-ish-introduction-to-using-r-for-baseball-research/. Retrieved 2 June 2018.
- ↑ Protacio, Angeline (September 2019). "Using R and the Tidyverse to Play Fantasy Baseball". https://github.com/angelinepro/useR_july2019/blob/master/Using%20R%20and%20the%20Tidyverse%20to%20Play%20Fantasy%20Baseball_useR2019.pdf. Retrieved 11 September 2019.
- ↑ Petersen, Isaac. "Fantasy Football Analytics". https://fantasyfootballanalytics.net/. Retrieved 6 September 2018.
- ↑ . https://github.com/jflancer/nwhlR. Retrieved 12 December 2018.
- ↑ Duursma, Remko; Powell, Jeff; Stone, Glenn (28 August 2017). https://www.westernsydney.edu.au/__data/assets/pdf_file/0011/830909/Rnotes_20170828_web.pdf. Retrieved 26 November 2017.
- ↑ Wickham, Hadley (2011). "ggplot2". WIREs Computational Statistics 3 (2): 180-185.
- ↑ Wickham, Hadley (2007). http://ggplot2.org/resources/2007-past-present-future.pdf. Retrieved 26 November 2017.
- ↑ Chen, Edwin (17 January 2012). http://blog.echen.me/2012/01/17/quick-introduction-to-ggplot2/. Retrieved 26 November 2017.
- ↑ Wickham, Hadley (2016). ggplot2: Elegant Graphics for Data Analysis. Berlin: Springer.
- ↑ Atkinson, Anthony (1986). "Comment: Aspects of Diagnostic Regression Analysis". Statistical Science 1(3): 379-402.
- ↑ Healy, Kieran (2017). "Data Visualization for Social Science: A practical introduction with R and ggplot2". http://socviz.co/index.html. Retrieved 9 December 2017.
- ↑ MacKintosh, John (16 May 2016). "Intro to ggplot2". https://cdn.rawgit.com/johnmackintosh/ggplot2_demo/a18cc631/pres.html#1. Retrieved 22 February 2018.
- ↑ Tyner, Sam; Briatte, François; Hofmann, Heike (2017). "Network Visualization with ggplot2". The R Journal 9(1).
- ↑ Fry, Chris (9 April 2015). "Graphing in R". https://chrisfryperformanceanalyst.wordpress.com/2015/04/09/graphing-in-r/. Retrieved 21 February 2018.
- ↑ Toumi, Asmae (February 2018). "R for data visualization". https://docs.google.com/presentation/d/1f5PGhzkW0ouqvtow9JbnpNe9AKATKXJac5CLV7JSWbU/edit#slide=id.gc6f90357f_0_0. Retrieved 25 February 2018.
- ↑ Toumi, Asmae (February 2018). "R for data visualization". https://drive.google.com/drive/folders/1A-yoLHJ7VJHlo0QL28LMDg0CGogF6xeq. Retrieved 25 February 2018.
- ↑ Hvitfeldt, Emil (12 June 2018). "ggplot2 trial and error - US trade data". https://www.hvitfeldt.me/2018/06/ggplot2-trial-and-error-us-trade-data/. Retrieved 14 June 2018.
- ↑ Navarro, Danielle (6 April 2019). "Data visualisation in R". https://djnavarro.github.io/satrdayjoburg/. Retrieved 7 April 2019.
- ↑ Byrd, Larie (8 February 2018). "The First (and Namesake) Post: Is It Cake?". https://aczane.netlify.com/2018/02/08/the-first-and-namesake-post-is-it-cake/. Retrieved 10 February 2018.
- ↑ Robinson, David (14 November 2017). "Advice to aspiring data scientists: start a blog". http://varianceexplained.org/r/start-blog/. Retrieved 15 February 2018.
- ↑ Salmon, Maelle (15 March 2018). "Get on your soapbox!". http://www.masalmon.eu/rladiesct/slides#1. Retrieved 16 March 2018.
- ↑ Koehrsen, William (11 August 2018). "The most important part of a data science project is writing a blog post". https://towardsdatascience.com/the-most-important-part-of-a-data-science-project-is-writing-a-blog-post-50715f37833a. Retrieved 15 August 2018.
- ↑ SportSciData (4 April 2019). "How to Create Interactive Reports with R Markdown Part I:". https://www.sportscidata.com/2019/04/04/how-to-create-interactive-reports-with-r-markdown-part-i/. Retrieved 16 April 2016.
- ↑ SportSciData (12 April 2019). "How to Create Interactive Reports in R Markdown Part II: Data Visualisation". https://www.sportscidata.com/2019/04/12/using-data-visualisation-in-r-markdown/. Retrieved 16 April 2016.
- ↑ SportSciData (4 April 2019). "How to Create Interactive Reports with R Markdown Part I:". https://www.sportscidata.com/2019/04/04/how-to-create-interactive-reports-with-r-markdown-part-i/. Retrieved 16 April 2016.
- ↑ SportSciData (12 April 2019). "How to Create Interactive Reports in R Markdown Part II: Data Visualisation". https://www.sportscidata.com/2019/04/12/using-data-visualisation-in-r-markdown/. Retrieved 16 April 2016.
- ↑ Bajak, Aleszu (25 August 2017). "How to convert a Google Doc to RMarkdown and publish on Github pages". http://www.storybench.org/convert-google-doc-rmarkdown-publish-github-pages/. Retrieved 15 November 2017.
- ↑ Collins, Neil. "How to Create Reports In R Markdown I: Data Tables". https://www.sportscidata.com/2019/04/04/how-to-create-interactive-reports-with-r-markdown-part-i/. Retrieved 17 June 2019.
- ↑ Monkman, Martin. "Per-game run scoring by league". https://monkmanmh.shinyapps.io/MLBrunscoring_shiny/. Retrieved 17 February 2018.
- ↑ Monkman, Martin (26 March 2017). "Updated Shiny app". https://bayesball.blogspot.com.au/2017/03/updated-shiny-app.html. Retrieved 17 February 2018.
- ↑ Davis, Scott (9 June 2018). "NBA Finals Gamecast Summary". https://sdavis.shinyapps.io/NBAFinals/. Retrieved 10 June 2018.
- ↑ Berndsen, Chris (8 March 2018). "Introduction to RMarkdown and Shiny". https://youtu.be/O04l-LpmoE8. Retrieved 13 March 2018.
- ↑ Biecek, Przemysław; Kosiński, Marcin (2017). "archivist: An R Package for Managing, Recording and Restoring Data Analysis Results". Journal of Statistical Software 82(11): 10.18637/jss.v082.i11.
- ↑ Biecek, Przemysław (14 December 2017). "archivist: Boost the reproducibility of your research". http://smarterpoland.pl/index.php/2017/12/boost-the-reproducibility-of-your-research-with-archivist/. Retrieved 16 December 2017.
- ↑ Xiao, Nan (20 May 2017). "Persistent Reproducible Reporting with Docker and R". https://nanx.me/talks/#talk-chinar-2017. Retrieved 31 July 2018.
- ↑ Xiao, Nan (30 July 2018). "liftr: an R Package for Persistent Reproducible Research". https://nanx.me/talks/#talk-jsm-2018. Retrieved 31 July 2018.
- ↑ Turnbull, James (August 2018). "Documentation as a gateway to open source". https://increment.com/documentation/documentation-as-a-gateway-to-open-source/. Retrieved 10 August 2018.
- ↑ Vuorre, Matti; Curley, James (11 April 2018). "Curating Research Assets: A Tutorial on the Git Version Control System". Advances in Methods and Practices in Psychological Science https://doi.org/10.1177/2515245918754826.
- ↑ Sweeting, Alice (29 January 2019). "A little about me…". https://sportstatisticsrsweet.rbind.io/#about. Retrieved 29 January 2019.