Some of this will only fully make sense once we are a few weeks into the course. It’s nevertheless a good idea to read through it now to get an overall idea, even before we have discussed all the tools and details mentioned. Revisit and re-read as needed.
A semester-long data analysis project will be a major method of assessment for this class. The project should include all the different components of a data analysis we will cover in this class.
You can do the class project either on your own or in a team of two.
There are several deadlines throughout the course by which you need to submit parts of your project.
The deadlines are listed in the Schedule document (it’s unlikely, but they might change, so keep checking that document).
You can pick your project topic and data source pretty much however you like. The only requirements are:
You can use data from any source you like; here are some possible ones.
If you have data that you are using for some research project(s) you are doing, you are welcome and encouraged to work on it as part of the class project. Of course, what you do for this class project needs to be new work, not a previously done and recycled analysis. Also, since the analysis needs to be fully reproducible, you need to provide the data at least within the class (there is no need to make it publicly available). I encourage you to use the class project to help you do an analysis and write a report for a project you want to publish as part of your research!
You can use any data you can get access to. Since the analysis needs to be reproducible, you need to be able to share the data at least with me and the classmates who will review your project. If you need some ideas for data, check out this website I maintain with links to various resources, among them a list of data sources. Of course, you are not limited to the data listed there.
Often, the most interesting questions can be asked by combining data from more than one source.
Each assignment needs to be submitted in a fully reproducible form, using the tools we cover in the class (R, Quarto, GitHub, etc.). You should create a GitHub repository (using the template described below) that contains all the files for your project. Name it YOURLASTNAME-MADA-project. A public repository is preferred. If you have data that needs to be kept confidential, you can make it a private repository; in that case, please let me know.
The main document should be a Quarto file, which can be turned into a suitable output format (HTML, Word, or PDF). It should have the structure of a scientific paper. I suggest you use the Manuscript.qmd file of the template and replace the template content with your own. I also suggest having the output be a Word document, since that is what most journals want for submission of a scientific paper; the template I provide is set up for Word output. However, if you, for instance, decide to include interactive figures or apps (using, e.g., Shiny) in your analysis project, you can also pick HTML as output. If for some reason you want or need PDF, that’s ok too (you then need a LaTeX system installed; see the Quarto documentation for how to do that).
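As a rough sketch, the output format of a Quarto file is chosen in the YAML header at its top. The example below assumes a Word-first setup like the one suggested above; the title is a placeholder, and the template's actual header may contain additional fields.

```yaml
---
title: "Your project title"   # placeholder title
format:
  docx: default               # Word output, the suggested default
  # html: default             # uncomment for HTML output, e.g. with interactive figures
  # pdf: default              # requires a LaTeX installation
---
```

When several formats are listed, `quarto render Manuscript.qmd` renders all of them, while `quarto render Manuscript.qmd --to docx` renders only the Word version. For PDF output, one option for getting a LaTeX system is `quarto install tinytex`.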
Structure your project similarly to the provided template (see below), with data, scripts, results, and manuscript in different folders, various R scripts to perform different bits of the analysis, and a final Quarto file that pulls everything in and generates the report.
Use a setup that resembles a real research paper. A main manuscript file in Quarto format should contain text, the main results (figures/tables), references, etc. A supplementary Quarto file should contain additional results, such as some of your exploratory analysis findings. Any code and further results should be in additional Quarto files (with code either inside the Quarto file or in separate R files). Overall, your write-up can be a bit more detailed than what would normally go into a peer-reviewed paper, with the most salient parts shown in the main text and the rest in a supplementary file.
For all your submissions, you need to provide everything needed (data, code, etc.) to allow a full and automated reproduction of your analysis.
References should be included as a BibTeX file and cited in the Quarto file. See the template for an example.
I created a public GitHub repository called dataanalysis-template, which is meant as a template for doing a data analysis project. It has different folders for organizing your project. Various readme files explain what each folder should contain. The template also contains several example files to show how the whole project workflow (or any data analysis workflow, for that matter) can work.
Inside the manuscript folder, there is a Quarto template file called Manuscript.qmd with a suggested outline for the report write-up. I have also indicated which sections need to be completed for which part of the project. This template is just a guide; you do not have to follow that exact structure, as long as you provide all the requested parts for the different submission deadlines and, at the end, a final, complete, fully automated and reproducible project with all files in one GitHub repository.
In addition to the Quarto file, there is a bibliography file in BibTeX format, which contains the references, and a style file, which determines how references are formatted. You can use these as a starting point. Feel free to switch the reference formatting style. You do need to use a setup and reference manager that plays nicely with Quarto.
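A minimal sketch of how a bibliography file and a style file are typically connected to a Quarto document through its YAML header. The file names references.bib and apa.csl, and the citation key doe2020 used below, are hypothetical placeholders; use the names of the files actually in your manuscript folder.

```yaml
---
title: "Your project title"   # placeholder title
bibliography: references.bib  # hypothetical name of the BibTeX file
csl: apa.csl                  # hypothetical name of the citation style (CSL) file
format:
  docx: default
---
```

In the text, a reference from the BibTeX file is cited by its key, e.g. `[@doe2020]` for a parenthetical citation or `@doe2020` for an in-text one, and Quarto generates the formatted reference list when the document is rendered. Switching the reference style then only requires pointing `csl:` at a different style file.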
Use the provided template as a starting point for your project. To do so, go to its GitHub repository and follow the instructions to turn it into a new repository for your class project, and name it YOURLASTNAME-MADA-project. Once you have made the new repository, follow the usual GitHub workflow to get it to your local computer. Then open the readme file and change the text so that it states somewhere “This is YOURNAME class project repository”.
You will receive feedback from me and/or your classmates after each submission.
The initial submissions and peer reviews make up 40% of the project grade. The final submission counts for the remaining 60%.
More details on what exactly is expected and how it is assessed are provided in the Project Details document.
Any communication regarding this project should happen in the Project discussion channel. Go there to ask project-specific questions, to post links to your repository whenever you have a part finished, etc. I will also post any further or clarifying information there.