Quiz

Get the quiz sheet for this module from the general Assessments page. Fill it in, then submit to the online grading system before the deadline.

Exercise

Overview

For this exercise, we will join the R data science community and participate in Tidy Tuesday, a weekly exercise that allows you to practice some parts of data analysis. The focus for Tidy Tuesday is on getting, cleaning (i.e., tidying) and exploring a data set, though people sometimes do statistical analyses as well.

This is another solo exercise, so no group work this week, but of course more GitHub. This exercise will go into your portfolio.

What is Tidy Tuesday

You briefly encountered Tidy Tuesday before in the Data Analysis Motivation unit. If you can’t remember, revisit that unit and re-read that section. Also read the information at the provided links.

If you want to see another Tidy Tuesday example, here is one I did, as part of a previous version of MADA. I did it at the same time the students did theirs. I plan to do my own Tidy Tuesday again this time.

For even more examples of previous analyses, see this Shiny app which scans twitter for posts mentioning #tidytuesday. Unfortunately, it doesn’t seem to be updated/maintained anymore. But you can still find people on social media posting links to their Tidy Tuesday creations (feel free to share yours as well).

Apparently, there is now also a Tidy Tuesday podcast. I’m not familiar with it, so I’m not sure what it’s all about and if it is worth checking out. It probably is worth giving it a try, since it’s (almost) always good to check out new things 😁 and those are very short recordings, so I think they just introduce the data of the week. If you end up listening to some, let me know what you think (post somewhere on Slack).

Your Tidy Tuesday Exercise

Your assignment is to participate in Tidy Tuesday by analyzing this week’s dataset. You can start as soon as the dataset is posted, which is Mondays. The datasets are released on GitHub. As I’m writing this, I have no idea what the data will be the week this exercise happens. That’s part of the fun of it 😄.

Here are some more detailed instructions:

Use your portfolio website. Make sure it’s up to date and fully synced. Open it in R Studio. Then open the tidytuesday_exercise.Rmd file. That’s where you’ll write your Tidy Tuesday analysis.

Go to the TidyTuesday Github repository. Look for the dataset for this week, and read the instructions on how to get the data. You will also be provided with a data dictionary. If the data is available for download, place it somewhere in your portfolio repository (e.g., in a new folder called data).

Write a Quarto file (with R code as part of the Quarto file or in a separate file) that loads the data, performs any needed cleaning and wrangling, and produces some results (tables, figures). For this exercise, you don’t need to implement any statistical models. If you want, you can use some simple ones (e.g. adding a linear fit line to a scatterplot, or doing some simple stats tests). This is a fairly open-ended task. I trust that you get the right balance between spending too little and too much time/effort on this. Depending on your coding skills, I think around 2-4 hours (actually working, not having Facebook open at the same time 😁) might be a good time commitment.

Once done with your Tidy Tuesday analysis, rebuild your portfolio site to make sure everything works and looks good, that all the links work, etc. Then push to GitHub by the deadline. Since this will be on your portfolio website, and I know where to find it, there is no need to post any link this week.

And of course, for this and any other products you create, if you are active online (e.g., social media or LinkedIn), you can post an announcement of your exercise there.

Discussion

Write a post in this week’s discussion channel that answers this question:

What is (so far) your favorite R package or command or plot type or… for data cleaning or exploration? And what data cleaning or exploration operations do you find tricky to do and wish you knew a better way of tackling that (with R)?

Then comment on each others posts. Especially if someone is looking for better ways of doing certain cleaning/exploration operations in R, if you know of a good way, tell them.