Get the quiz sheet for this module from the general Assessments page. Fill it in, then submit to the online grading system before the deadline.


For this exercise, you will practice with GitHub some more, you are exploring a data analysis (at least a template), and you will start doing group work.


Start by going to the Github repository called dataanalyis-template. It is meant as a template for doing a data analysis project. You will also use it for the class project. Read the information in the README file (at the bottom of the website).

Next, use the template to start your own repository. Follow the same steps you used previously with the portfolio template. Name it YOURNAME-MADA-test-analysis. Clone it to your local machine, open the project in RStudio.

Take a look at the different folders, files, readme comments. Feel free to run the various R and Quarto scripts. The idea is that you get some familiarity with the whole setup to prepare you for the next steps.

Group setup

If you haven’t already, find your group. You can find that information in the pinned post in the announcements channel. Get in touch with your group members, e.g. by using the group-specific channels on Slack, or some other means. Assign each group member a role for this exercise. One person is the project owner, the others are contributors number 1, 2, … depending on the number of group members. (If you want to know what each role does, read through the rest of the exercise instructions first.) You also need to exchange GitHub user names.

Pretend data-analysis

The person your group designated as project owner should do the following: Go ahead and start another repository using the template. Name it GROUP-N-MADA-test-analysis (with N of course your group number). Next, find the repository on your GitHub page, go into Settings -> Manage Access and use the Invite Collaborator button to add the other group members to the repository. You need to give them write access. Your group members need to accept the invitation.

Once everyone is added and all invitations are accepted, all group members can go ahead and clone the repository to their local computer using GitKraken (or your preferred client). Make sure when you clone, you choose Github.com as source, and then you should be able to see the repository in the Repository to clone list.

Once cloned to the local computer, contributor 1 goes ahead and opens the exampledata.xlsx file (with Excel or a similar program) and adds 2 more columns to the data. One column should be something numeric, the other can be something consisting of (a few) different categories. As a (boring) example, you could add eye color and waist size. Feel free to come up with more creative attributes/variables to add. Also add descriptions of your new variables to the Codebook sheet. Once done, save the new data as exampledata2.xlsx, then commit your changes (write a meaningful commit message) and push them to GitHub.com.

Once contributor 1 is done, contributor 2 pulls the repository with the updated changes (i.e., the added data). Take a look at the new variabes (use the Codebook information if they aren’t obvious). Adjust the processingcode.R file such that it now loads the new data file called exampledata2.xlsx. Take a look at the new data. If necessary (depending on what contributor 1 added), add a few lines of code to clean the new data as needed. If this is needed, you either need to know at least a little bit of R coding, otherwise Google your way to a solution. Once everything works and the script creates a new clean dataset that can be saved as processeddata.rds, go to processingfile_v2.qmd and adjust as needed to make sure the updates you made to the R script are reflected in the Quarto file. Once both processingscript.R and processingfile_v2.qmd are updated and run without error messages, commit your changes and push to GitHub. Then let the next person in your group know.

If your team does not have 4 persons, then contributor 2 will also need to do the part of contributor 3.

Contributor 3 pulls the latest version of the repository, then opens statistical_analysis.R. Edit the code such that it creates a boxplot with the new categorical variable (whatever it is) on the x-axis, and height on the y-axis. Also create another scatterplot with weight on the x-axis and the new numerical variable on the y-axis. Save both figures to files. Once done, commit and push.

Finally, it’s back to the project owner. Pull the latest version of the repository. Then open Manuscript.qmd. Add code to display the two new figures for the new data your other group members produced. Process the Quarto file to make sure it renders into a word doc and everything looks good. Commit and push your changes. Finally, post the link to your repository with the updated manuscript into the module2_exercise Slack channel by the deadline.

After the deadline, I’ll check/grade the repository of each group. The grade will be a single one and the same for every group member.


Look online and find an example of a research project that provides (or claims to provide) all materials to allow reproduction of results, similar to Dr. Brian McKay’s project I shared with you. If you are able, download the materials and see if you can reproduce things. I suggest you focus on projects that are done with our set of tools (R/Quarto, etc.), but that’s not required. Report the project you found and your experience being able (or not) to reproduce it as a post in the Module 2 Discussion channel.

Then take a look at a few of your classmates postings and explore/comment on what they found.