Overview
This document provides more detailed instructions and grading
rubrics for each part of the project.
Note that you should not consider the scores below as corresponding
to a letter grade. Instead, I use them to differentiate between
different aspects of the project. Only at the end will I compute actual
grade-like scores, which will likely involve curving (up, never
down).
At my discretion, I might consider assignments even if they are
submitted after the deadline. In that case, I might take off a point for
lateness.
Part 1
The main objective for part 1 is to describe the data and the
question in enough detail that I can assess if the planned project is
feasible.
Use the Manuscript.qmd file. Remove or replace the template
information. Add your proposed project title and your name(s).
To that end, provide the following pieces of information:
- Briefly describe what the data is, how it was collected, and where you
will get (or got) it from. How many observations do you have, and what was
measured? Is there anything else important to report about the data?
- At this stage you are not required to already have and show the
data, but if you do, even better. In that case, add a few lines of code that
load the data and, using some of the commands you learned about, provide
summary descriptions of the data.
- Explain the question you want to answer using the data. What will be
your outcome(s) of interest (if any)? What (if any) specific predictors
will you focus on? What relations/patterns are you looking for in the
data?
- As much as you know, suggest how you will analyze it. At this stage
in the course, we haven’t covered analysis approaches yet, so you can
keep things vague and non-technical here.
- You are allowed, but not yet required, to provide background
information for the question you plan to answer. For instance you can
describe why it’s an interesting question, who else has done similar
analyses, how your analysis will be new/different, etc. Similar to what
you read in an introduction to a research paper. For the final report,
you’ll need these parts. For part 1, they are not required, but you are
welcome to already write down some of that.
- Eventually, for your final report, what you write for this part will
go into different sections of the full report. Some will go into the
introduction, some in the methods section. You can already place these
items there, or for now just write them as a single section.
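As a rough illustration of the "load the data and provide summary descriptions" item in the list above, here is a minimal sketch in base R. It is not a required template. The built-in `mtcars` data and the temporary CSV file are stand-ins (assumptions) so that the example runs on its own; you would replace them with your own data file.

```r
# Stand-in data (assumption): R's built-in mtcars, written to a temporary
# CSV so this sketch is self-contained. Use the path to your own file instead.
data_path <- tempfile(fileext = ".csv")
write.csv(mtcars, data_path, row.names = FALSE)

# Load the data
dat <- read.csv(data_path)

# How many observations and variables are there?
n_obs <- nrow(dat)
n_vars <- ncol(dat)
print(c(observations = n_obs, variables = n_vars))

# What was measured, and what does each variable look like?
str(dat)
print(summary(dat))

# How many missing values does each variable have?
n_missing <- colSums(is.na(dat))
print(n_missing)
```

A few lines like these, placed in a code chunk of the manuscript file, are usually enough to show that the data exists and roughly what it contains.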
Grading for this part will follow this rubric:
Sufficient | Submission is (almost) complete, contains enough information to allow assessment of feasibility of proposed project. Fully reproducible. | 3
Somewhat insufficient | Most components are present, but noticeable gaps exist, or existing materials can’t be fully reproduced. | 2
Not sufficient | Submission is rather incomplete, has major missing parts which do not allow assessment of feasibility of proposed project. | 1
Absent | (Almost) everything of submission is missing. | 0
Part 2
The main objective for part 2 is to have mostly completed the
data loading/cleaning/wrangling/exploring part.
To that end, provide the following pieces of information:
- Everything from part 1. That doesn’t mean you need to keep what you
wrote for part 1 frozen. Just that the description you provided there
should be part of this submission. As appropriate, you can
rewrite/reformat things to get it closer to a final report structure
(e.g. start moving some parts into a method section).
- A somewhat detailed description containing text and code showing
your cleaning/wrangling/exploring steps. Place this in separate Quarto/R
files inside the code folder or its sub-folders. Make sure it’s clear
which files are the relevant ones, and delete any non-relevant files.
- Update the main manuscript. Add a few of the exploratory results and
any other content you think should be shown.
- Removal or replacement of any leftover files, text, and code
from the templates. Update all readme files,
delete any files and folders that are not part of your project. Remove
any comments and bits of code that are not relevant. At this stage, only
information, code and files relevant to your project should be present,
with appropriate documentation.
- The main text should show plots or tables that explore the data,
with a focus on the quantities of main interest (outcome, main
predictor, co-variates of specific interest, etc.).
- It is up to you how you structure things. You can use a combination
of R or Rmd scripts. As long as things are well documented, reproducible
and logical, the exact setup is your choice.
- Everything needs to be fully reproducible and you need to provide
somewhere (e.g. in the main text file or in the readme file in your
repository) instructions on what one needs to do to completely reproduce
everything.
- Your main article and, if applicable, supplementary files should
knit into Word, PDF, or HTML documents.
- If you start including references, you should use a reference
manager and a bibtex file from which you cite references in your
manuscript. I recommend managing the bibtex file with the free Zotero
reference manager, but if you have another reference manager that can
handle bibtex files, you can use that too. Your bib file should be part
of the project repository (for instance in the same folder as the
manuscript). Feel free to pick any citation style you like (you can get
CSL files from e.g. this style
repository).
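To give a concrete sense of the cleaning/wrangling/exploring steps mentioned in the list above, here is a minimal sketch in base R. The data frame, its variable names, and the plausibility cutoff of 250 are all made up for illustration (assumptions); your own cleaning steps will depend entirely on your data.

```r
# Hypothetical raw data with typical problems: a missing value, an
# implausible value, and inconsistent category labels.
raw <- data.frame(
  id     = 1:6,
  height = c(170, 165, NA, 1800, 172, 168),  # 1800 is a likely entry error
  sex    = c("F", "f", "M", "m", "F", "M")
)

# Cleaning: harmonize labels, set implausible heights to NA,
# then drop incomplete rows. Document why each step is done.
clean <- raw
clean$sex <- factor(toupper(clean$sex))
clean$height[!is.na(clean$height) & clean$height > 250] <- NA
clean <- clean[complete.cases(clean), ]

# Exploring: a small summary table of the outcome by group
height_by_sex <- aggregate(height ~ sex, data = clean, FUN = mean)
print(height_by_sex)

# Save the processed data so later scripts can start from it.
# tempfile() is used here only to keep the sketch self-contained.
out_path <- tempfile(fileext = ".rds")
saveRDS(clean, out_path)
```

Saving the processed data to a file, rather than re-cleaning inside every script, makes it easier to describe in the readme which script produces which output.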
Grading for this part will follow this rubric:
Sufficient | Submission is (almost) complete. | 3
Somewhat insufficient | Submission is somewhat incomplete, parts missing or not reproducible. | 2
Insufficient | Submission is very incomplete, major parts missing or not reproducible. | 1
Absent | (Almost) everything of submission is missing. | 0
Part 3
The main objective for part 3 is to have started the analysis
part of the project and continued to improve everything.
To that end, provide the following pieces of information:
- All relevant files and documents needed to reproduce
everything.
- All non-relevant files (e.g., leftovers from the template) removed
or updated.
- Some documentation (e.g. a readme.md file) explaining how your
project is set up and which scripts need to be run in what order to
reproduce everything.
- Files with well-documented code (either R scripts or Rmd files) that
do all the previous tasks (cleaning/processing), as well as running a
few analyses. You can start with simple bivariate ones, looking for
patterns between your outcome(s) and individual predictors of interest.
I suggest that, as much as possible, you use the tidymodels
framework. You can also try a few multivariable GLMs. Results from those
explorations should be saved in whatever form you consider most
appropriate (figures or tables).
- A main article/manuscript file which contains the most pertinent
results and findings from everything you have done so far. You can
include more figures/tables here than you would in a regular manuscript.
However, it should still be nicely readable and somewhat focused, so
don’t produce page-long raw R output or a ton of exploratory figures or
similar things. The main results of your analysis should be in this
manuscript. Any further explorations and results (figures and tables)
should go into a separate Rmarkdown file that comprises the
supplementary material. For some idea on how this can be structured, you
can for instance revisit Brian’s project we looked at in this unit.
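The simple bivariate and multivariable fits mentioned in the list above could look roughly like the following sketch. It assumes the tidymodels package is installed and uses the built-in `mtcars` data with `mpg` as a stand-in outcome; the choice of predictors is arbitrary and only for illustration. A linear model is used here as the simplest case of a GLM.

```r
library(tidymodels)  # assumes the tidymodels package is installed

# Stand-in data (assumption): mtcars, with mpg as the outcome
dat <- mtcars

# Model specification: linear regression fitted with lm
lm_spec <- linear_reg() |> set_engine("lm")

# Bivariate fits: the outcome against one predictor at a time
fit_wt <- lm_spec |> fit(mpg ~ wt, data = dat)
fit_hp <- lm_spec |> fit(mpg ~ hp, data = dat)

# Multivariable fit with both predictors
fit_multi <- lm_spec |> fit(mpg ~ wt + hp, data = dat)

# Tidy coefficient tables that can be turned into manuscript tables
res_wt <- tidy(fit_wt)
res_hp <- tidy(fit_hp)
res_multi <- tidy(fit_multi)
print(res_wt)
print(res_hp)
print(res_multi)

# Save results so the manuscript can load them instead of refitting.
# tempfile() is used here only to keep the sketch self-contained.
out_path <- tempfile(fileext = ".rds")
saveRDS(res_multi, out_path)
```

Saved tables like `res_multi` can then be loaded in the manuscript or supplementary file, which keeps the analysis code separate from the report.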
Grading for this part will follow this rubric:
Sufficient | Submission is (almost) complete. | 3
Somewhat insufficient | Submission is somewhat incomplete, minor parts missing or not reproducible. | 2
Insufficient | Submission is very incomplete, major parts missing or not reproducible. | 1
Absent | (Almost) everything of submission is missing. | 0
This part of your project will be assessed by some of your
classmates. See the Reviews document for more details.
Part 4
The objective is to have mostly completed implementation of
analyses following the approaches covered in the course.
To that end, provide the following pieces of information:
- All relevant files and documents needed to reproduce
everything.
- Documentation for everything you are doing. Any files/documentation
not related to your project should be removed, only relevant information
should be present.
- Code that performs statistical analyses of your data using the
approaches we cover in class, such as: train/test split,
cross-validation, trying different models, exploring model quality
(performance, uncertainty, diagnostics, etc.). Depending on your data
and question, not all approaches will make sense for your data. Choose
the ones that make sense. E.g., if you happen to do an analysis of text
or high-dimensional data, use methods/models appropriate for that data.
The main point is that you should show you understand the main concepts
regarding analysis and model evaluation and can apply them to your data
with the tools we covered.
- Update your manuscript and supplementary files with the new results
(figures/tables) from your analysis.
- At this point, make sure you are also far along with your background
section, including referencing (using a reference manager and a bibtex
file, no manual references), and that everything starts
looking like a full analysis similar to what one could submit to a
journal for publication.
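The train/test split, cross-validation, and model comparison named in the list above can be sketched as follows. This assumes the tidymodels package is installed and again uses `mtcars` as stand-in data. The split proportion, the number of folds, and the two candidate models are arbitrary choices for illustration, not recommendations for your project.

```r
library(tidymodels)  # assumes the tidymodels package is installed

set.seed(123)  # make the random split and folds reproducible
dat <- mtcars  # stand-in data (assumption), with mpg as the outcome

# Train/test split: the test data is set aside until the very end
data_split <- initial_split(dat, prop = 0.75)
train_data <- training(data_split)
test_data <- testing(data_split)

# 5-fold cross-validation on the training data only
folds <- vfold_cv(train_data, v = 5)

# Two candidate models sharing the same model specification
lm_spec <- linear_reg() |> set_engine("lm")
wf_simple <- workflow() |> add_model(lm_spec) |> add_formula(mpg ~ wt)
wf_full <- workflow() |> add_model(lm_spec) |> add_formula(mpg ~ wt + hp)

# Cross-validated performance of each candidate
cv_simple <- fit_resamples(wf_simple, resamples = folds)
cv_full <- fit_resamples(wf_full, resamples = folds)
print(collect_metrics(cv_simple))
print(collect_metrics(cv_full))

# After choosing a model, fit it on all training data and
# evaluate it once on the held-out test data
final_fit <- fit(wf_full, data = train_data)
test_pred <- predict(final_fit, new_data = test_data) |> bind_cols(test_data)
test_rmse <- rmse(test_pred, truth = mpg, estimate = .pred)
print(test_rmse)
```

The key design point is that model comparison happens only through cross-validation on the training data, while the test data is used a single time for the final performance estimate.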
Grading for this part will follow this rubric:
Sufficient | Submission is (almost) complete. | 3
Somewhat insufficient | Submission is somewhat incomplete, minor parts missing or not reproducible. | 2
Insufficient | Submission is very incomplete, major parts missing or not reproducible. | 1
Absent | (Almost) everything of submission is missing. | 0
Part 5
The objective for part 5 is to have a fully completed
project, ready for peer review.
To that end, provide the following pieces of information:
- Every file needed to reproduce your complete analysis and report in
your project repository (that includes files such as readme, bibtex,
style files, etc).
- Detailed instructions (e.g. as a readme.md file)
explaining how to reproduce all your results.
- A complete, nicely readable and well-formatted report, structured
like a research paper
(Abstract/Introduction/Methods/Results/Discussion/Citations).
- Well-documented supplementary material.
- Well-documented code as Rmd or R scripts that clearly explains all
steps in your analysis (including the wrangling/exploring parts).
- Meta-data explaining your data as needed.
- Everything very well documented and polished as much as
possible.
Check the Project Review Template file to see how this submission will
be assessed. I will not grade this part; instead, your peers will
assess it, as described in the Project Review document.
A great project would be at the level of Brian McKay’s paper
example you checked out in the Motivating Examples document,
or one of the projects on the Project Examples page.
Part 6
The main objective for part 6 is to have a finished project,
ready for final grading.
To that end, provide the following pieces of information:
- A complete project, with as many further improvements as you want to
implement, based on feedback from your classmates and any other
improvements you can think of.
I will grade the final project using the same criteria your
classmates used for peer review. This will be graded on a 100-point
scale and combined with the other project scores for a final overall
project score.