I kept adding resources until the Course Resources page became too large and unwieldy 😁, so I split it into two pages. The Course Resources page lists materials that are directly related to the course and used or mentioned in it. This page lists many other resources that are not heavily featured in the course but that might be useful and interesting. Everything listed here is broadly related to the course topic, i.e. the resources focus on Data Science/Stats/R Coding/GitHub/etc. For even more materials, see the links to various lists by others at the end of this document.

Most materials described below are (or should be) freely available online. For better or for worse, many of the resources I list are dynamic and ever-changing. That means that occasionally links stop working, sites go offline, chapters in online books get rearranged, etc. If a link does not work or you can’t access the materials for some reason, let me know so I can update this document.

I grouped the resources into categories by main topic, but there is a lot of overlap. Many R coding resources focus on data analysis, and most data science resources I list focus on R.

I am familiar with some, but not all, of these resources. For some, I only took a quick look to decide whether they were worth including here. If you find particular resources especially helpful or unhelpful (whether listed here or not), I’d love to receive feedback.

General Data Science

Pitfalls and best practices in data analysis

Researcher degrees of freedom (p-hacking)

  • The concept of Researcher degrees of freedom, which is related to Data Dredging and p-hacking, is an important idea to keep in mind when doing a data analysis. Note that this issue is often cast in the language of p-values since those are still (unfortunately) the most common approach to statistical analyses. But the concept applies even if one doesn’t use p-values.

  • You can find a fun hands-on exploration of the potential problem of researcher degrees of freedom in this 538 visualization and another choose-your-own-adventure story here.

  • For further discussions of this general problem, see e.g. this article from 538 (which goes with the hands-on example just mentioned) or this article by Gelman and Loken, with a closely related article here.

  • This paper provides a nice illustration of how researcher degrees of freedom, combined with incomplete reporting, can lead to apparently nonsensical results. The study is a (fake) psychology study, but everything applies in general and it is easy to follow.

  • Not surprisingly, xkcd has also covered the topic of p-hacking.
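To make the idea of researcher degrees of freedom concrete, here is a minimal simulation sketch. It is not taken from any of the resources above, and it rests on simplifying assumptions: every simulated study has no true effect, each outcome is independent standard normal noise, and a simple z-test with known variance stands in for whatever test a real study would use. The function names and parameters (`two_sample_p_value`, `false_positive_rate`, `n_outcomes`) are made up for this illustration. The "flexible" analyst measures several outcomes and reports only the one with the smallest p-value.

```python
import random
from statistics import NormalDist, mean


def two_sample_p_value(a, b):
    """Two-sided z-test p-value for a difference in means.

    Assumes both groups have a known standard deviation of 1
    (a simplification chosen to keep this sketch dependency-free).
    """
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=30,
                        alpha=0.05, seed=1):
    """Fraction of null studies that report a 'significant' result when
    the analyst keeps only the smallest p-value across n_outcomes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_outcomes):
            # Both groups come from the same distribution: no true effect.
            treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(two_sample_p_value(treated, control))
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


# One pre-specified outcome versus picking the best of ten outcomes.
honest = false_positive_rate(n_outcomes=1)
flexible = false_positive_rate(n_outcomes=10)
print(f"One pre-specified outcome: {honest:.3f}")
print(f"Best of ten outcomes:      {flexible:.3f}")
```

Under these assumptions, the single pre-specified outcome should come out "significant" in roughly 5% of studies, while theory puts the best-of-ten rate near 1 - 0.95^10, or about 40%, even though no effect exists in any simulated study. Real analyses involve correlated outcomes and many other choices (exclusions, covariates, subgroups), so treat the numbers as an illustration of the mechanism rather than an estimate for any actual study.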

Reproducible research

  • This study provides a nice glimpse at the problems that still exist when trying to reproduce/replicate prior studies by re-running the code.

Bayesian Analysis

While we don’t cover Bayesian methods in this course, I personally find them very useful and compelling. Here are some resources that could be worth checking out if you want to learn some Bayesian statistics/data analysis.

  • Statistical Rethinking by Richard McElreath. My favorite stats book (Bayesian or otherwise). It starts slow but goes pretty far. The book is not free (though worth the price), but there are free resources on the website.
  • Bayes Rules by Johnson, Ott and Dogucu. Very hands-on introduction to Bayesian statistics. The online version is free.

Longitudinal Analysis

Causal Analysis

Unfortunately, as part of this course, we cannot cover the broad and important topic of causal analysis. However, it is a topic worth learning. If you are interested, here are a few basic references that can get you started. Most of the ones listed are fairly non-technical and thus beginner-friendly.

  • This short paper provides a very basic and easy introduction to, and commentary on, the topic of causal analysis.

R coding

  • Swirl is a package that teaches R inside R. Beginners in particular have found it to be a good starting point since it provides very encouraging feedback. The downside is that all code writing happens interactively in the console, which is not the way one should write real code. It’s still worth checking out if you want to get some more direct, hands-on R practice.
  • RStudio primers are a great collection of lessons covering the basics of R coding and data analysis. I highly recommend them.
  • RStudio education is a fairly new website that I expect will contain a growing collection of all kinds of useful teaching resources related to R and Data Science. Check their Learn section for links to resources.
  • Ready for R - materials for a basic introductory online R course taught by Ted Laderas.
  • Intro to Programming for Analytics - materials for an online course teaching intro to programming with R, taught by John Paul Helveston.
  • Efficient R programming contains a lot of good tips and tricks towards writing better code.
  • R for Epidemiology - an introduction to R with a focus on tasks that are often used in Epidemiology/Public Health.
  • Tidy Modeling with R is the beginning of a hopefully great and comprehensive book that describes analysis/modeling using the tidyverse set of packages.
  • Learning statistics with R - I’ve not read/used it, but I’ve heard from others who like it.
  • What They Forgot to Teach You About R is the beginning of an online book which covers some topics rarely found elsewhere. As of this writing, the book is fairly incomplete, but still worth checking out. The first several chapters and the sections on debugging R code are especially worth reading.
  • The Introverse R package provides more novice-friendly help files for important tidyverse functions. If you struggle with the default help file for a function, check out this package.

Git/GitHub

  • Software Carpentry has a great introductory course that walks you through the basics of Git (and GitHub) step-by-step. This is useful if you want to know what exactly is going on, even if you mainly use a graphical interface for your Git/GitHub work. All of the course materials are online.

Data Visualization

  • Data Visualization - comprehensive materials for an online course on data visualization in R, taught by Andrew Heiss.
  • A great free book which discusses the principles of good data visualization is Fundamentals of Data Visualization. The book is not R-specific and doesn’t show R code, but all figures were made in R.
  • The books and resources by Edward Tufte are classics. He is one of the most influential people in the field of visualization. Unfortunately, as far as I’m aware, his books are not freely available.
  • Data Visualization - A practical introduction is a fairly complete free online draft of a book by the same name. It provides a general introduction to making good graphs, and the R code for the figures is shown.
  • Flowing Data is a website with a lot of cool information on how to make great data visualizations. Some content is free, other parts are not.
  • The Esquisse R package lets you quickly make ggplots in an interactive manner. It is a very good way to get started on some exploratory plots. You can take the ggplot code it generates and tweak it further.

Lists and other sources

  • By now, there are hundreds of books on R and Data Science available online. Many of these books are written in bookdown, an R package that extends R Markdown. You will learn all about it in this course. It is worth checking out the main bookdown website as well as the archive list and scrolling through the list of books. Some of the books you can find there are very good. Of course, there is also a good bit of “noise”.
  • Another recent list of good R and Data Science resources can be found here.
  • Teach Data Science - a blog with short, informative posts on various aspects related to data science using R.
  • Machine Learning - an online reference (almost book) which nicely explains some of the basics of machine learning.
  • Data Sci Guide is a website that is trying to collect and curate a lot of data science sources.
  • RStudio has a collection of materials for data science.
  • RStudio cheatsheets are one-page reference documents that quickly show you how to use specific R packages or do certain tasks. A very useful resource, definitely check them out.
  • A meta-cheatsheet - this is a cheatsheet showing you links to different R packages and their cheatsheets for specific tasks. A nice overview document, developed by the folks at business-science.io.
  • I created lists related to R and Data Analysis (as well as other topics). You can find all resource lists here. (These lists are works in progress, and some are better/more useful than others. Feel free to send me links/resources to include.)