This page lists resources that this course draws on frequently or at
least occasionally. For a more extensive list of data science resources,
see the General Resources page.
All materials described below should be freely available
online. If you can’t get to them, let me know. Note that many of the
listed resources are dynamic and change often. That means links
occasionally break, sites go offline, chapters in online books get
re-arranged, etc. If you notice anything that’s not quite right, please
let me know.
Books and online tutorials
- IDS = Introduction to
Data Science is the book I’ll refer to a good bit in the 1st part of
the course.
- R4DS = R for Data Science (2nd
edition). This book will also be mentioned frequently. It is a very
gentle and good introduction to data science in R. It has a lot of good
exercises and
solutions. Note that most of the course material was written when
only the 1st edition existed. I’ve
tried to update references to specific chapters/pages of R4DS on this
website, but it could be that there is still the occasional pointer to
something in the 1st edition. If you find such outdated
references/links, please let me know so I can fix them.
- ADS = The Art of Data
Science has some good big-picture, introductory level chapters on
different aspects of a data analysis. It is a pay-what-you-want
book with a minimum price of zero, which means you can get it for free
(note that the book + video bundle has a non-zero minimum price). I
heard that sometimes, once you register, the email with the book link
goes to the spam folder, so make sure to check there.
- ISL(R) = An Introduction to
Statistical Learning (in R) is a good introduction to the
statistical and machine learning (i.e. model fitting) part of the data
analysis workflow. I’ll refer to it frequently in the second part of the
course. While we won’t work through the exercises/labs that are part of
ISL, if you are interested in trying out some of those, here
is a website that does the labs using the tidymodels set of
packages, which we will also be using in this course.
- HMLR = Hands-on
Machine Learning with R covers material similar to ISL, but
with a different emphasis and a different approach. I refer to it in
several of the later course modules.
- FES = Feature Engineering
and Selection focuses on a specific aspect of the data analysis
workflow and is a good resource for those topics.
- HGR = happygitwithR is a
good resource for learning some of the basics of Git/GitHub with R.
- RP = RStudio
Primers are a great source for introductory, interactive tutorials
covering basics of R coding and data analysis.
Important notes
- The last time this course was taught, the company now called
Posit was called RStudio. You will
likely see the old name show up. The software we use
for coding in R is still called RStudio.
- The last time the course was taught, Quarto did not
exist, so we used R Markdown instead. You will likely see
references to R Markdown. Just think of it as now being
Quarto (i.e. instead of .Rmd files we now work with .qmd files).
General help
- The main place to get any course-specific help is our course
discussion boards. Use them freely to ask questions, to answer others’
questions, to post links to interesting resources, etc.
- Most questions you will have are likely not course-specific, but
will have to do with R/GitHub/R Markdown/etc. For that, Google will be
your best friend. Most of the time, someone has had the same
problem/question you have and someone else has answered it. The only tricky
thing is finding that post. Even after years of doing this, I probably
google how to do something in R every day 😃.
- When you search online for help, quite often you land on a Stack
Exchange site (often Stack Overflow). Stack Exchange is a collection of
widely-used online question and answer sites covering all kinds of
topics (including R, GitHub, data analysis, etc.). The majority of the
time, you will find an answer, or at least something that gets you
closer, on those sites. I rarely go to those sites directly. Instead, I
type my query into Google and it often sends me to one of the
Stack Exchange sites.
- The RStudio Community
is another good place to ask questions. It is not as widely used as
Stack Exchange, but it seems to be more newcomer friendly. I haven’t used
it much.
- Another highly recommended resource is the R for Data Science community.
They have a Slack workspace which you can join for free to ask
questions about R, data science and related topics. They are very
newbie-friendly. I have not used them much but they seem to respond
fairly quickly and are helpful.
More resources
The general resources page has
a more extensive list of relevant resources.