I kept adding resources until things got too unwieldy and the
Course Resources page was becoming too large. So I decided
to split things into two pages. The Course Resources page lists
materials directly related to and used/mentioned in the course. This
page lists a lot of other resources that are not heavily featured in the
course, but that might be useful and interesting. Everything listed here
is broadly related to the course topic, i.e. the resources focus on Data
Science/Stats/R Coding/GitHub/etc. For even more materials, see the
links to various lists by others at the end of this document.
Most materials described below are (should be) freely available
online. For better or for worse, a lot of the resources I list below are
dynamic and ever changing. That means occasionally links might not work,
sites go offline, chapters in online books get re-arranged, etc. If any
link does not work and you can’t access the materials for some reason,
let me know so I can update this document.
I placed the resources into categories according to main topic, but there is a
lot of overlap. Many R coding resources focus on data analysis, and most
data science resources I list focus on R.
I am familiar with some, but not all of these resources. Sometimes I
just took a quick glimpse to decide if it was worth including them here.
If you find particular resources especially helpful or unhelpful (both
listed and not listed), I’d love to receive feedback.
General Data Science
- Cloud Based Data Science - a nice online course covering many of the
topics we cover at a somewhat more basic level. You can decide what to
pay for it, including getting it for free. That course used to be called
Chromebook Data Science and seems to have been updated and rebranded as
Cloud Based Data Science. It is done by Jeff Leek and his team. You’ll
run into Jeff multiple times throughout this course.
- “Data Science Specialization” on Coursera. One of the first comprehensive
online offerings. Coursera has gotten more restrictive over the years,
but I think you can still get each course for free.
- Stat 545 is the name of Jenny Bryan’s previous course on Data Wrangling
and exploratory analysis. She has since turned this into a stand-alone
website/book/course/resource. It covers topics somewhat similar to the R4DS
book, but with a different emphasis and from a more comprehensive and
advanced perspective.
- Advanced data
analysis for the social sciences
- Advanced Data Science version 1 and version
2
- Data science for
economists
- STOR 390 - Introduction to
data science
- Kaggle (owned by Google) is a
website that hosts data analysis competitions. Everyone can participate
and compete for - sometimes rather large - prizes. The website also has
a lot of good datasets and code, as well as other resources related to
data analysis. Definitely worth checking out.
- I used to recommend and use Datacamp, an online platform that has
interactive courses teaching R and Data Analysis (and other topics).
Unfortunately, the company dealt rather poorly with a case
of sexual harassment. They also became much less academic-friendly,
their student discount is much less nice than it used to be, and
apparently they recently sued R Studio (a company I think highly of).
I’m not sure what the current status is on both their company culture
and their academic/student-friendliness, but I have basically moved on.
Too much other good stuff available to bother further.
- Exploratory Data Analysis -
materials for an online course teaching exploratory data analysis using
R, taught by John Paul
Helveston.
- The journal PeerJ has a collection of articles on the topic of Practical
Data Science for Stats. A lot of the papers in that collection use
R.
- Roger Peng and Hillary Parker have a Stats and Data Science related
podcast called Not so standard
deviations.
- A few individuals, most notably Roger Peng, Brian Caffo and Jeff Leek, have books on Leanpub
related to R and data science. Most of the books have a minimum price of
zero and are worth looking at. If you feel any of these Leanpub books
are worth paying for, go ahead and do so. But I am fairly sure those
authors do not rely on the book royalties for their living, so if you
can’t or don’t want to pay, getting them for free is ok. As a side note,
Leanpub uses Markdown, which means if you write a report in (R)Markdown
and want to turn it into a (self)-published book, it is rather easy to
do with Leanpub. That’s how those individuals made their books, as
spin-offs from their RMarkdown course materials.
- ModernDive - Statistical Inference
via Data Science - another good recent book covering data analysis
with R.
- Introduction to Modern
Statistics is a free online textbook teaching statistics using R in
a modern framework.
- Telling Stories with
Data - an interesting way to discuss data analysis, focusing on the
story/message.
- Reproducible
Medical Research with R - free online book showing how to use R to
do basic analysis.
- Data Science for the Biomedical
Sciences - another free online textbook. Part of a workshop, but can
also be used for self-learning.
- Elements of Statistical Learning is a somewhat advanced book on
statistical/machine learning. Not useful as an introduction, but a
potentially good reference.
- Interpretable
Machine Learning is an online book that discusses approaches that
can be used to start making sense of sometimes complex ML models.
- Jesse Mostipak, aka Kiersi, streams data science sessions on Twitch.
- Nick Wan is another data science Twitch streamer.
- David
Robinson has videos of screencasts showing him digging into datasets
from TidyTuesday and other sources.
- Andrew Heiss has a lot of
good materials related to R and data analysis on his website.
Data Sources and Wrangling
Data Visualization
- Data
Visualization - comprehensive materials for an online course on data
visualization in R, taught by Andrew Heiss.
- A great free book which discusses the principles of good data
visualization is Fundamentals of Data Visualization. The book is not
R specific (and doesn’t show R code, but all figures are made in R).
- Data Visualization - A practical introduction is a fairly complete free
online draft of a book by the same name. It provides a general
introduction to making good graphs, and the R code for the figures is
shown.
- Flowing Data is a website
with a lot of cool information on how to make great data visualizations.
Some content is free, other parts are not.
- The Esquisse R
package lets you quickly make ggplots in an interactive manner. Very
good to get started on some exploratory plots. You can take the ggplot
code you generated and tweak further.
- Graphics
Principles is a website that gives general tips for effective visual
communication. Examples using R are also provided.
Pitfalls and best practices in data analysis
Researcher degrees of freedom (p-hacking)
- The concept of researcher degrees of freedom, which is related to data
dredging and p-hacking, is an important idea to keep in mind when doing a
data analysis. Note that this issue is often cast in the language of
p-values since those are still (unfortunately) the most common approach to
statistical analyses. But the concept applies even if one doesn’t use
p-values.
- You can find a fun hands-on exploration of the potential problem of
researcher degrees of freedom in this 538
visualization and another choose-your-own adventure story here.
- For further discussions of this general problem, see e.g. this
article from 538 (which goes with the hands-on example just
mentioned) or this
article by Gelman and Loken, with a closely related article here.
- This paper
provides a nice and easy to follow illustration of how researcher degrees
of freedom, combined with incomplete reporting, can lead to apparently
nonsensical results. The study is a (fake) psychology study, but
everything applies in general.
- Not surprisingly, xkcd has also covered the topic of p-hacking.
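To make the idea of researcher degrees of freedom a bit more concrete, here is a minimal simulation sketch. It is not taken from any of the resources above, and it uses only Python's standard library so it runs without extra packages (the same idea is easy to reproduce in R). All names and settings (`false_positive_rate`, 30 observations per study, 5 outcomes, known standard deviation of 1) are assumptions chosen for illustration. Every simulated study has no real effect. The only difference is whether the analyst tests one pre-specified outcome or measures several and reports whichever one looks best.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean = 0, assuming a known standard deviation."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_studies=2000, n_per_study=30,
                        alpha=0.05, seed=1):
    """Fraction of null studies reporting at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Each outcome is pure noise, so any 'finding' is a false positive.
        p_values = [
            z_test_p_value([rng.gauss(0, 1) for _ in range(n_per_study)])
            for _ in range(n_outcomes)
        ]
        # The analyst reports only the smallest p-value.
        if min(p_values) < alpha:
            hits += 1
    return hits / n_studies


one_outcome = false_positive_rate(n_outcomes=1)
five_outcomes = false_positive_rate(n_outcomes=5)
print(f"one pre-specified outcome: {one_outcome:.3f}")
print(f"best of five outcomes:     {five_outcomes:.3f}")
```

With one pre-specified outcome the rate should land near the nominal 0.05. With five independent outcomes, theory gives 1 - 0.95^5, roughly 0.23, so picking the best of five makes a "significant" result more than four times as likely even though nothing is going on.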
Reproducible research
- This
study provides a nice glimpse at the problems that still exist when
trying to reproduce/replicate prior studies by re-running the code.
- R Workflow is an online
book describing how to do reproducible research using the R ecosystem
and the still fairly new Quarto
framework.
- For more Quarto, the Awesome Quarto
repository has a nice curated list of links to resources.
General Statistical Analysis
- Common
statistical tests are linear models is a website that illustrates
how many standard statistical tests are equivalent to certain types of
linear models. Very useful if you are bewildered by the zoo of
statistical tests and wonder how they are related to regression
models.
- Library of Statistical
Techniques is a collection of short explanations and code covering a
range of different statistical topics. More general data analysis
topics, e.g. wrangling and visualization, are also covered.
- Common
statistical myths and how to push back - this is a collection of
links to references that address/refute common statistical myths (i.e.,
things that are wrong but that are commonly done/said/written in the
scientific literature anyway.)
- Improving
Your Statistical Inferences is an online resource with useful
information on how to improve various types of statistical
analyses.
- Moving
to a World Beyond “p<0.05” is a nice article with suggestions for
how to report statistical results more appropriately than being fixated
on p-values.
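As a small illustration of the claim that common statistical tests are linear models, here is a sketch (my own, not taken from the linked website) showing that the classic equal-variance two-sample t-test gives exactly the same t statistic as a simple linear regression of the outcome on a 0/1 group indicator. It uses only Python's standard library; the data values and function names are made up for this example.

```python
from math import sqrt
from statistics import mean

# Made-up measurements for two groups (illustrative only).
group_a = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
group_b = [5.6, 6.1, 5.2, 6.4, 5.9, 5.5, 6.0]


def pooled_t_statistic(a, b):
    """Classic two-sample t statistic with pooled (equal) variance."""
    ss_a = sum((v - mean(a)) ** 2 for v in a)
    ss_b = sum((v - mean(b)) ** 2 for v in b)
    pooled_var = (ss_a + ss_b) / (len(a) + len(b) - 2)
    return (mean(b) - mean(a)) / sqrt(pooled_var * (1 / len(a) + 1 / len(b)))


def slope_t_statistic(x, y):
    """t statistic for the slope in the least-squares fit y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_var = rss / (n - 2)
    return slope / sqrt(residual_var / s_xx)


# Code group membership as a dummy variable: 0 for group A, 1 for group B.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b

t_classic = pooled_t_statistic(group_a, group_b)
t_regression = slope_t_statistic(x, y)
print(t_classic, t_regression)
```

The two printed numbers agree up to floating point rounding. The regression slope is simply the difference in group means, and its standard error reduces algebraically to the pooled standard error of the t-test.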
Bayesian Analysis
While we don’t cover Bayesian methods in this course, I
personally find them very useful and compelling. Here are some resources
that could be worth checking out if you want to learn some Bayesian
statistics/data analysis.
- Statistical
Rethinking by Richard McElreath. My favorite stats book (Bayesian or
otherwise). It starts slow but goes pretty far. The book is not free
(but worth the price), but there are resources on the website which are
free.
- Bayes Rules by
Johnson, Ott and Dogucu. Very hands-on introduction to Bayesian
statistics. The online version is free.
Causal Analysis
Unfortunately, as part of this course, we cannot cover the broad and
important topic of causal
analysis. However, it is a topic worth learning. If you are
interested, here are a few basic references that can get you started.
Most of the ones listed are fairly non-technical and thus
beginner-friendly.
Machine Learning (ML)
- Machine Learning University
(MLU) is an educational offering from Amazon with several nice
tutorials covering important ML-related topics. It also includes very
basic statistical concepts such as linear/logistic regression.
R coding
- R Studio primers
are a great collection of lessons covering the basics of R coding and
data analysis. I highly recommend them.
- R Studio education is a
fairly new website that I expect will contain a growing collection
of all kinds of useful teaching resources related to R and Data Science.
Check their Learn section for links to resources.
- Swirl is a package that teaches
R inside R. Complete beginners in particular have found it to be a
nice start since it provides very encouraging feedback. The downside is
that all code writing happens interactively in the R console, which is
not the way one writes real code. It’s still worth
checking out if you want to get some more direct, hands-on R practice.
Unfortunately, the package seems dormant and hasn’t been updated in a
few years (but what’s in there probably still works?)
- Ready for R - materials
for a basic introductory online R course taught by Ted Laderas.
- Modern R with the tidyverse
- online book that provides a very nice introduction to important
concepts of R coding with a focus on data analysis.
- Intro to Programming for
Analytics - materials for an online course teaching intro to
programming with R, taught by John
Paul Helveston.
- Efficient R
programming contains a lot of good tips and tricks towards writing
better code.
- R for Epidemiology - an
introduction to R with a focus on tasks that are often used in
Epidemiology/Public Health.
- Tidy Modeling with R is the
beginning of a hopefully great and comprehensive book that describes
analysis/modeling using the tidyverse set of packages.
- Learning statistics
with R - I’ve not read/used it, but have heard from others who like
it.
- What They Forgot to Teach You About
R is the beginning of an online book which covers some topics rarely
found elsewhere. As of this writing, the book is fairly incomplete, but
still worth checking out. Especially the first several chapters and the
debugging R code sections are worth learning/reading.
- The Introverse R package provides more novice-friendly
help files for important tidyverse functions. If you
struggle with the default help file for a function, check out this
package.
Git/GitHub
- Software Carpentry
has a great introductory course that walks you through the basics of Git
(and GitHub) step-by-step. This is useful if you want to know what
exactly is going on, even if you mainly use a graphical interface for
your Git/GitHub work. All course materials are online.
Quarto
- The Quarto website has a ton of
great information and documentation.
- Here is another example and template of setting up a website with Quarto,
similar to what you are asked to do for the Introductory
Exercise.
- Quarto Club is a collection of
nice Quarto website examples. Most of them have their source code on
GitHub, so you can see how the creators of those pages accomplished what
they made, and shamelessly copy/paste/adapt.
Lists and other sources
- Big Book of R - a website
listing and summarizing several hundred books, many free, related to R
and Data Science. If you are looking for a resource on a specific topic,
this is a good place to check.
- By now, there are hundreds of books on R and Data Science available
online. Many of these books are written with bookdown, an R package built on R
Markdown. You will learn all about it in this course. It is worth
checking out the main bookdown
website as well as the archive list and scrolling
through the list of books. Some of the books you can find there are very
good. Of course, there is also a good bit of “noise”.
- Another recent list of good R and Data Science resources can be found
here.
- Teach Data Science - a
blog with short, informative posts on various aspects related to data
science using R.
- Machine
Learning - an online reference (almost book) which nicely explains
some of the basics of machine learning.
- RStudio has a collection
of materials for data science.
- R Studio
cheatsheets are one-page reference documents that quickly let you see
how to use specific R packages or do certain tasks. A very useful
resource, definitely check them out.
- A
meta-cheatsheet - this is a cheat-sheet showing you links to
different R packages and their cheat-sheets for specific tasks. A nice
overview document, developed by the folks at business-science.io.
- Data
Science Learning Resources - a collection of links to resources that
discuss general aspects of the data science field.
- I created lists related to R and Data Analysis (as well as other
topics). You
can find all resource lists here. (These lists are works in
progress, and some are better/more useful than others. Feel free to send
me links/resources to include).