Assessment - READy Workflow
Quiz
Get the quiz sheet for this module from the general Assessments page. Fill it in, then submit to the online grading system before the deadline.
Discussion
Describe (at least) one feature of the READy workflow that you found surprising, unintuitive, or confusing, or something you think is missing, or something for which you can think of an alternative and possibly better way of doing things.
Then comment/reply to your classmates’ posts.
Exercise
This project asks you to go through the steps of a simple data analysis. As you go about executing this small project, think about the READy concepts and implement them as suitable. This exercise also involves group work, and of course GitHub.
This exercise has staggered due dates; see below.
Group setup
As in the previous exercise, assign each member in your group a different, arbitrary number (I’m calling them M1, M2, …). Make sure you are teamed up with a different person this time. For this exercise, everyone will first work on their own and finish this part by Wednesday. Then M1 will contribute to M2’s repository, M2 will work on M3’s, etc. The last person (M3/M4/M5, based on the number of people in your group) will work on M1’s repository. This way, everyone will work on their own repository and on one group member’s repository.
Part 1
Adding data
You will be using another project template for this exercise (and the same one also for your class project). Go to the GitHub repository for the data-analysis-template and read the information in the README.md file (shown at the bottom of the repository website). You can ignore the bits about Package Management.
Then start a new repository using this template by clicking the green Use this template button on the top right of the repository page, then choose Create a new repository. Name the repository FirstnameLastname-ready-exercise. Make sure the repository is public. Add a description. Then create the repository. Once it has been created on GitHub.com, clone it to your local computer.
Once you have a local version, open it in Positron and look at the different folders, files, and readme.md comments that you find inside the various folders. You should be able to run all the various R and Quarto scripts. The idea is that you get some familiarity with the whole setup to prepare you for the next steps.
Now go ahead and open the exampledata.xlsx file (with Excel or a similar program) located in the raw-data folder, and add 2 more columns to the data. One column should be something numeric, the other can be something consisting of (a few) different categories. As a (boring) example, you could add eye color and waist size. Feel free to come up with more creative attributes/variables to add. Also add descriptions of your new variables to the Codebook sheet/tab. Once done, save the new data as exampledata2.xlsx.
Then find report.qmd in the products/report folder, open it, and add a few sentences explaining what you added to the data into the Description of data and data source section.
Commit your changes (write a meaningful commit message) and push them to GitHub.com.
Collaborating
Instead of using the fork + pull-request workflow, we will explore a different collaborative approach. In this approach, you and your collaborator work on the same repository. To that end, you need to add your classmate as a collaborator. Go to GitHub.com, find the repository for this exercise (FirstnameLastname-ready-exercise). Then go to Settings, then Collaborators. Choose Add People and add your classmate (based on their GitHub account name). Your classmate should receive an invitation, which they need to accept. With this, they are now able to push to and pull from your repository directly, without needing to create a fork. (You can remove them after this exercise if you don’t want them to keep write access to your repository.)
To avoid any potential merge conflicts, once your classmate takes over, you shouldn’t make further changes to the repository.
You need to have this done by Tuesday evening.
Part 2
You should have received an invitation to be a collaborator on your classmate’s repository. Accept it, then directly clone (not fork) the repository to your local computer.
Look at the exampledata2.xlsx file and information in Codebook, as well as the description in report.qmd to understand what new variables your classmate created.
Find processingfile-v1.qmd inside /code/processing-code/ and make a copy of the file. Update the code such that it now loads the new data file called exampledata2.xlsx. Take a look at the new data. Add code to clean the new data as needed. Have the code save the updated data to processeddata2.rds.
If you prefer, you can also make copies and edit processingcode.R and processingfile-v2.qmd instead of processingfile-v1.qmd. The first option is a bit easier since you only have to edit one file, but the second option of working in a separate R script can become more efficient once you are doing larger projects with more code.
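To give a rough idea of what such cleaning code might look like, here is a minimal sketch in base R. It uses a small made-up data frame in place of exampledata2.xlsx so that it runs on its own. The variables EyeColor and WaistSize, the function name clean_new_variables, the temporary save location, and the commented-out file path are all assumptions for illustration, not part of the template. Your classmate's variables, and the cleaning they need, will differ.

```r
# Minimal sketch of a processing step, using base R only.
# In the project you would load the spreadsheet instead of building toy data,
# for example (path is an assumption, check it against your repository):
#   rawdata <- readxl::read_excel(here::here("data", "raw-data", "exampledata2.xlsx"))

# Toy stand-in for the raw data; all values are made up for illustration.
rawdata <- data.frame(
  Height    = c("180", "175", "sixty", "178", "166"),
  Weight    = c(80, 70, 60, 76, 55),
  EyeColor  = c("brown", "Blue", "blue", "green", ""),  # hypothetical categorical variable
  WaistSize = c(85, 78, NA, 90, 70),                    # hypothetical numeric variable
  stringsAsFactors = FALSE
)

# Hypothetical helper that cleans the toy data
clean_new_variables <- function(d) {
  # Coerce Height to numeric; entries that are not numbers become NA
  d$Height <- suppressWarnings(as.numeric(d$Height))
  # Harmonize category labels and treat empty strings as missing
  d$EyeColor <- tolower(trimws(d$EyeColor))
  d$EyeColor[d$EyeColor == ""] <- NA
  d$EyeColor <- factor(d$EyeColor)
  # Keep only rows without any missing values
  d[complete.cases(d), ]
}

processeddata <- clean_new_variables(rawdata)
str(processeddata)

# Saved to a temporary folder here; in the project, save processeddata2.rds
# next to the existing processed data file instead.
save_path <- file.path(tempdir(), "processeddata2.rds")
saveRDS(processeddata, save_path)
```

Dropping every row with a missing value is the simplest choice, not necessarily the best one. Whatever you decide, explain it in comments or text so the repository owner can follow your reasoning.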
Next, repeat those steps for the exploratory data analysis (EDA) code. Make a copy of the relevant files using either approach (single qmd file or R-script + qmd file), add code to the new file to create a boxplot with the new categorical variable (whatever it is) on the x-axis, and height on the y-axis. Also create a scatterplot with weight on the x-axis and the new numerical variable on the y-axis. Save both figures to files in the appropriate results folder.
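The two figures could be produced along the following lines. This is only a sketch: it uses base R graphics (ggplot2 works just as well), a made-up data frame instead of the real processed data, and hypothetical variable names (EyeColor, WaistSize) and file names. It writes to a temporary folder so it can run outside the project; the commented-out path is an assumption about the template layout.

```r
# Toy stand-in for the processed data; values are made up for illustration.
# In the project you would instead load what the processing step saved, e.g.
#   d <- readRDS(here::here("data", "processed-data", "processeddata2.rds"))
# (path is an assumption, check it against your repository).
d <- data.frame(
  Height    = c(180, 175, 178, 166, 171, 169),
  Weight    = c(80, 70, 76, 55, 68, 62),
  EyeColor  = factor(c("brown", "blue", "green", "blue", "brown", "green")),
  WaistSize = c(85, 78, 90, 70, 80, 74)
)

# Temporary output folder; in the project, use the figures folder under results
fig_dir <- file.path(tempdir(), "figures")
dir.create(fig_dir, showWarnings = FALSE, recursive = TRUE)

# Boxplot: new categorical variable on the x-axis, height on the y-axis
box_file <- file.path(fig_dir, "height-eyecolor-boxplot.png")
png(box_file, width = 600, height = 450)
boxplot(Height ~ EyeColor, data = d, xlab = "Eye color", ylab = "Height")
invisible(dev.off())

# Scatterplot: weight on the x-axis, new numeric variable on the y-axis
scatter_file <- file.path(fig_dir, "weight-waistsize-scatter.png")
png(scatter_file, width = 600, height = 450)
plot(d$Weight, d$WaistSize, xlab = "Weight", ylab = "Waist size")
invisible(dev.off())
```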
Make sure everything works, then commit and push your changes to GitHub.com. Let your classmate know that you are done with your part.
Note that you are now directly pushing to the original repo, which is owned by your classmate. This is easier, since you don’t need to fork and open a pull request (PR). It’s also more dangerous, since you could potentially mess up your classmate’s repo. So make sure things work before committing and pushing.
You need to have this done by Thursday evening.
Part 3
Everyone goes back to their own repository. This should now contain the content your classmate contributed. Pull the changes from GitHub.com to your local computer.
Find statistical-analysis.R inside /code/analysis-code/. Make a copy. Edit the code such that it fits a third linear model with Height as outcome and the 2 new variables as predictors. Save the result as resulttable3.rds in the appropriate results folder.
I’m sure you noticed that for this last part, we switched from Quarto files to R scripts. I’m doing this to show you that either is fine. Sometimes it’s better to combine code and text in one file. Sometimes just having code by itself is easier/better. In any case, you should add lots of comments and documentation to the Quarto or R file.
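For the third model, the code might look roughly like the sketch below. It is written in base R with a made-up data frame so it runs on its own. The predictor names (WaistSize, EyeColor), the object names (lmfit3, lmtable3), and the temporary save location are assumptions for illustration. In your copy of the script you would use the processed data and the variables your classmate prepared.

```r
# Toy stand-in for the processed data; values are made up for illustration.
# In the project, load the data saved by the processing step instead.
d <- data.frame(
  Height    = c(180, 175, 178, 166, 171, 169),
  EyeColor  = factor(c("brown", "blue", "green", "blue", "brown", "green")),
  WaistSize = c(85, 78, 90, 70, 80, 74)
)

# Third linear model: Height as outcome, the 2 new variables as predictors
lmfit3 <- lm(Height ~ WaistSize + EyeColor, data = d)

# Turn the coefficient summary into a plain data frame, one row per term
lmtable3 <- as.data.frame(coef(summary(lmfit3)))
lmtable3 <- cbind(term = rownames(lmtable3), lmtable3)
rownames(lmtable3) <- NULL
print(lmtable3)

# Saved to a temporary folder here; in the project, save resulttable3.rds
# into the appropriate results folder instead.
result_file <- file.path(tempdir(), "resulttable3.rds")
saveRDS(lmtable3, result_file)
```

Because EyeColor is a factor, lm() estimates one coefficient for each level other than the reference level. With a toy data set this small the estimates mean nothing; the point is only the structure of the code.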
Now we are ready to include the new findings. Open report.qmd inside the products/report folder.
At the top of the file, list yourself and your classmate as authors. Remove everything else from the example that’s irrelevant for this exercise.
Write the appropriate text into the relevant Methods and Results sections. This will include code to pull in the new figures and the new table.
Adjust sections as you see fit, just make sure you explain everything that is happening in this - very basic - analysis. Also delete everything in this report that is not relevant/applicable to your exercise.
Once all is done, render the report. Find the report.html file, open it in your browser and make sure everything looks ok. If all is good, commit and push to make sure your local repo is in sync with your remote.
Now continue in your portfolio website repository.
Open the portfolio repo in Positron. Create a new folder inside the main folder called ready-exercise. Copy the report.html file you just created into that folder, and rename it ready-exercise.html.
Now open _quarto.yml. Find the menu section, copy and paste the 2 lines for “Starter Analysis Exercise”, then change the copy to “READy Exercise”, adjusting the path and the text accordingly. Make sure to point to ready-exercise.html, NOT .qmd. Save the file.
Fully rebuild/render your portfolio website. Make sure everything works and the new exercise shows up in the menu and links to the correct page. Commit and push your changes. Go to your website URL to make sure it also looks ok on your website. I will only be checking the entry on your portfolio website, so you need to make sure it is all there.
You need to have this done by Friday evening.
Since this will be part of your portfolio site, and you already posted a link to that previously, you don’t need to post anything; I know where to find it. I will assess both the contribution of the repository owner and that of the classmate who added to it.
Some more comments on GitHub workflows:
In general, if you work closely with someone on a project, it might make sense to add them as a collaborator and, as needed, coordinate with them to avoid merge conflicts. Otherwise, asking someone to contribute by forking and sending a pull request is the safer approach, since you keep control over whether to accept their changes.
There is yet another common way to use GitHub, namely collaborators working in the same repository, but with different branches. Think of a branch like a fork, but it happens inside the repository. Work can occur independently on branches, and at some point one can merge branches. This allows people to work in a single repository, but minimizes possible merge conflicts. This approach is standard for larger projects with many collaborators. For this class, we won’t use branches, but note that they are useful and commonly used.