This page lists resources that this course draws on frequently or at least occasionally. For a more extensive list of data science resources, see the General Resources page.

All materials described below should be freely available online. If you can’t get to them, let me know. Note that many of the listed resources are dynamic and ever-changing, so occasionally links might not work, sites go offline, chapters in online books get rearranged, etc. If you notice anything that’s not quite right, please let me know.

Books and online tutorials

  • IDS = Introduction to Data Science is the book I’ll refer to a good bit in the 1st part of the course.
  • R4DS = R for Data Science (2nd edition). This book will also be mentioned frequently. It is a very gentle and good introduction to data science in R. It has a lot of good exercises and solutions. Note that most of the course material was written when only the 1st edition existed. I’ve tried to update references to specific chapters/pages of R4DS on this website, but there could still be the occasional pointer to something in the 1st edition. If you find such outdated references/links, please let me know so I can fix them.
  • ADS = The Art of Data Science has some good big-picture, introductory-level chapters on different aspects of a data analysis. It is a pay-what-you-want book with a minimum price of zero, which means you can get it for free (note that the book + video bundle has a non-zero minimum price). I have heard that, once you register, the email with the book link sometimes goes to the spam folder, so make sure to check there.
  • ISL(R) = An Introduction to Statistical Learning (in R) is a good introduction to the statistical and machine learning (i.e. model fitting) part of the data analysis workflow. I’ll refer to it frequently in the second part of the course. While we won’t work through the exercises/labs that are part of ISL, if you are interested in trying out some of those, here is a website that does the labs using the tidymodels set of packages, which we will also be using in this course.
  • HMLR = Hands-on Machine Learning with R covers material somewhat similar to ISL, but with a different emphasis and a different approach. I refer to it in several of the later course modules.
  • FES = Feature Engineering and Selection focuses on a specific aspect of the data analysis workflow and is a good resource for those topics.
  • HGR = happygitwithR is a good resource for learning some of the basics of Git/GitHub with R.
  • RP = RStudio Primers are a great source of introductory, interactive tutorials covering the basics of R coding and data analysis.

Tools

Important notes

  • The last time this course was taught, the company now called Posit was called RStudio. You will likely see the old name show up. The software we use for coding in R is still called RStudio.
  • The last time the course was taught, Quarto did not exist; we used R Markdown instead. You will likely see references to R Markdown. Just think of it as now being Quarto (i.e. instead of .Rmd files we now work with .qmd files).

General help

  • The main place to get course-specific help is our course discussion boards. Use them freely to ask questions, to answer others’ questions, to post links to interesting resources, etc.
  • Most questions you will have are likely not course-specific, but will have to do with R/GitHub/R Markdown/etc. For that, Google will be your best friend. Most of the time, someone has had the same problem or question you have, and someone else has answered it. The only tricky thing is finding that post. Even after years of doing this, I probably google how to do something in R every day 😃.
  • When you search online for help, you will quite often land on a StackExchange site (often Stack Overflow). StackExchange is a collection of widely used online question-and-answer sites covering all kinds of topics (including R, GitHub, data analysis, etc.). Most of the time, you will find an answer, or at least something that gets you closer, on those sites. I rarely go to those sites directly; instead, I type my query into Google, and it often sends me to one of the StackExchange sites.
  • The RStudio community is another good place to ask questions. It is not as widely used as StackExchange, but it seems to be more newcomer-friendly. I haven’t used it much.
  • Another highly recommended resource is the R for Data Science community. They have a Slack workspace that you can join for free to ask questions about R, data science, and related topics. They are very newbie-friendly. I have not used them much, but they seem to respond fairly quickly and are helpful.

More resources

The general resources page has a more extensive list of relevant resources.