Get the quiz sheet for this module from the general Assessments page. Fill it in, then submit to the online grading system before the deadline.
For this exercise, we will join the R data science community and participate in Tidy Tuesday, a weekly exercise that allows you to practice some parts of data analysis. The focus for Tidy Tuesday is on getting, cleaning (i.e., tidying) and exploring a data set, though people sometimes do statistical analyses as well.
This is another solo exercise, so no group work this week, but of course more GitHub. This exercise will go into your portfolio.
You briefly encountered Tidy Tuesday before in the Data Analysis Motivation unit. If you can’t remember, revisit that unit and re-read that section. Also read the information at the provided links.
If you want to see another Tidy Tuesday example, here is
one I did, as part of a previous version of MADA. I did it at the
same time the students did theirs. I plan to do my own Tidy Tuesday
again this time. For even more examples of previous analyses, see this nice
Shiny app which scans twitter for posts mentioning
Apparently, there is now also a Tidy Tuesday podcast. I’m not familiar with it, so I’m not sure what it’s all about and if it is worth checking out. It probably is worth giving it a try, since it’s (almost) always good to check out new things 😁 and those are very short recordings, so I think they just introduce the data of the week. If you end up listening to some, let me know what you think (post somewhere on Slack).
Your assignment is to participate in Tidy Tuesday by analyzing this week’s dataset. You can start as soon as the dataset is posted, which is Mondays. The datasets are released on GitHub. As I’m writing this, I have no idea what the data will be the week this exercise happens. That’s part of the fun of it 😄.
Here are some more detailed instructions:
Use your portfolio website. Make sure it’s up to date and fully
synced. Open it in R Studio. Then open the
tidytuesday_exercise.Rmd file. That’s where you’ll write
your Tidy Tuesday analysis.
Go to the TidyTuesday Github
repository. Look for the dataset for this week, and read the
instructions on how to get the data. You will also be provided with a
data dictionary. If the data is available for download, place it
somewhere in your portfolio repository (e.g., you can make a new folder
Write an R Markdown file that loads the data, performs any needed cleaning and wrangling, and produces some results (tables, figures). For this exercise, you don’t need to implement any statistical models. If you want, you can use some simple ones (e.g. adding a linear fit line to a scatterplot, or doing some simple stats tests). This is a fairly open-ended task. I trust that you get the right balance between spending too little and too much time/effort on this. Depending on your coding skills, I think around 1-4 hours (actually working, not having Facebook open at the same time 😁) might be a good time commitment.
Once done with your Tidy Tuesday analysis, rebuild your portfolio site to make sure everything works and looks good, that all the links work, etc. Then push to GitHub by the deadline. Since this will be on your portfolio website, and I know where to find it, there is no need to post any link this week.
And of course, for this and any other products you create, if you are active online (e.g., social media or LinkedIn), you can post an announcement of your exercise there.
Write a post in this week’s discussion channel that answers this question:
What is (so far) your favorite R package or command or plot type or… for data cleaning or exploration? And what data cleaning or exploration operations do you find tricky to do and wish you knew a better way of tackling that (with R)?
Then comment on each others posts. Especially if someone is looking for better ways of doing certain cleaning/exploration operations in R, if you know of a good way, tell them.