Course Resources

Author

Andreas Handel

Modified

2024-03-20

This page lists resources that this course draws on frequently or at least occasionally. For a more extensive list of data science resources, see the General Resources page.

All materials described below should be freely available online. If you can’t get to them, let me know. Note that many of the listed resources are dynamic and ever-changing. That means links occasionally stop working, sites go offline, chapters in online books get rearranged, etc. If you notice anything that’s not quite right, please let me know.

Books and online tutorials

  • Introduction to Data Science (IDS) is the book I’ll refer to a good bit in the 1st part of the course. The link above takes you to the 1st edition of the book. Mentions of specific chapters of IDS refer to the 1st edition. However, there is now a new edition, which splits the old book into two: Introduction to Data Science and Advanced Data Science. I assume the newer books are improved, so you might want to read them instead of the 1st edition. I just haven’t gotten around to updating the chapter references to the new edition.

  • R for Data Science (R4DS) (2nd edition). This book will also be mentioned frequently. It is a very gentle and good introduction to data science in R. It has a lot of good exercises and solutions. Note that most of the course material was written when only the 1st edition existed. I’ve tried to update references to specific chapters/pages of R4DS on this website, but there may still be the occasional pointer to something in the 1st edition. If you find such outdated references/links, please let me know so I can fix them.

  • The Art of Data Science (ADS) has some good big-picture, introductory-level chapters on different aspects of a data analysis. It is a pay-what-you-want book with a minimum price of zero, which means you can get it for free (note that the book + video bundle has a non-zero minimum price). I heard that sometimes, once you register, the email with the book link goes to the spam folder, so make sure to check there.

  • An Introduction to Statistical Learning (with R) (ISL) is a good introduction to the statistical and machine learning (i.e. model fitting) part of the data analysis workflow. I’ll refer to it frequently in the second part of the course. While we won’t work through the exercises/labs that are part of ISL, if you are interested in trying out some of those, Emil Hvitfeldt’s website does the labs using the tidymodels set of packages, which we will also be using in this course.

  • Hands-on Machine Learning with R (HMLR) covers material somewhat similar to ISL, but with a different emphasis and a different approach. I refer to it in several of the later course modules.

  • Feature Engineering and Selection (FES) focuses on a specific aspect of the data analysis workflow and is a good resource for those topics.

  • happygitwithR (HGR) is a good resource to learn some of the basics of Git/GitHub with R.

  • Posit Recipes (PR) are a great source for short code examples to do common tasks.

Tools

Important notes

  • The company now called Posit used to be called RStudio. You will likely see the old name show up. The software editor we use for writing R code is still called RStudio.
  • Quarto is fairly new. In the past we used R Markdown instead, so you will likely see references to R Markdown. Just think of it as now being Quarto (i.e. instead of .Rmd files we now work with .qmd files). The two are very similar; Quarto is basically the newer version of R Markdown. R Markdown still works, but Quarto is more powerful, so it’s the better one to learn and use.

General help

  • The main place to get any course-specific help is our course discussion boards. Use them widely to ask questions, to answer others’ questions, to post links to interesting resources, etc.
  • Most questions you will have are likely not course-specific, but will have to do with R/GitHub/R Markdown/etc. For that, Google will be your best friend. Most of the time, someone had the same problem/question you do and someone else answered it. The only tricky thing is finding that post. Even after years of doing this, I probably google how to do something in R every day 😃.
  • When you search online for help, quite often you land on some Stack Exchange site (often Stack Overflow). Stack Exchange is a collection of widely used online question-and-answer sites covering all kinds of topics (including R, GitHub, Data Analysis, etc.). The majority of the time, you will find an answer, or at least something that gets you closer, on those sites. I rarely go to those sites directly. Instead, I type my query into Google and it often sends me to one of the Stack Exchange sites.
  • The Posit community is another good place to ask questions. It is not as widely used as Stack Exchange, but it seems to be more newcomer-friendly. I haven’t used it much.
  • Another highly recommended resource is the R for Data Science learning community. They have a Slack workspace which you can join for free and ask questions about R, Data Science and related topics. They are very newbie-friendly. I have not used them much, but they seem to respond fairly quickly and are helpful.

More resources

The General Resources page has a more extensive list of relevant resources.