Get the quiz sheet for this module from the general Assessments page. Fill it in, then submit to the online grading system before the deadline.
For this exercise, you are asked to re-create a professional-looking graph that you find online. No group work this week, but of course more GitHub. This exercise will go into your portfolio.
For motivation take a quick look at this blog post by Rafael Irizarry (the author of the IDS book we’ve been using). He shows how one can use R and ggplot to reproduce figures that look very similar to professional ones. You’ll see that it usually doesn’t require that many lines of code to get an outstanding looking figure!
We’ll try to do that ourselves.
Open your portfolio website project in RStudio. Then open the
visualization_exercise.Rmd file. We’ll use that file for this exercise.
Find some interesting graph from a news website. My main suggestion is to look at graphs from FiveThirtyEight. They are known to produce high-quality graphs, and for some of their stories and figures, they also provide the original data source.
Good starting points for a chart from FiveThirtyEight might be their annual summaries of weirdest charts, which you can find here for 2020, 2019, 2018, 2016, and 2015. (No, I don’t know why there’s no 2017.)
If you can’t find a graph on FiveThirtyEight that you like and would like to reproduce, you are also allowed to check out other major news outlets (USA Today, NY Times, Wall Street Journal, The Guardian, etc.).
The only requirements are that the original graph must be interesting and good looking (a basic scatterplot is not enough) and freely available online, not behind a paywall.
Once you have a graph you’d like to reproduce, either extract (e.g, read off or some other extraction method) the data right from the graph or find the original data source. If neither is an option, move on to the next graph. Having access to the data is critical.
Make a new folder in your repository, call it
data. Place the data file you found or created inside that folder. If you found it, it will be in whatever format you got. If you made it yourself by extracting it from the original graph, I recommend a CSV file.
Once you have the data and the original graph, you’ll first need to add a few lines of code to your
visualization_exercise.Rmd file to load the data, and if needed, do some cleaning.
Then write R code to try and get as close as possible to the original graph. You will likely use ggplot2 & friends, but if you want to use base R, lattice or another plotting approach that’s ok too, as long as everything happens with R code. The goal is to put in some effort to get close, but don’t spend an insane amount of time trying to make a perfect copy. A few (maybe 1-4) hours of working on this should suffice. If it still looks a bit different in the end, that’s ok.
Your final product should be code and explanatory text inside your Rmd file that shows the original plot and links to its source, then shows your code that re-creates the graph, and finally shows your graph. Also, provide some additional information that helps readers understand how you went about making the plot. For instance, you should provide some additional text describing things you tried that did and didn’t work, or include links to sources from which you took inspiration, adapted your code, etc.
Once done, re-build your website and make sure that your data visualization exercise shows up nicely. Then, post the link to your online page showing your visualization into the module discussion channel (see next).
And of course, for this and any other products you create, if you are active on social media, you can post an announcement of your project there. I’ll be looking out for #MADA2021 tagged tweets 😁.
Post the link to your visualization page to the discussion channel for this module (module 5). Take a look at some of the figures your colleagues created, and provide feedback.
Since the exercise is due Friday, you can either look at and comment on those posts from classmates that finished the exercise early, or continue the discussion after Friday. I won’t check/assess until beginning of the following week.