Some of this will only fully make sense once we are a few weeks into the course. It’s nevertheless a good idea to read through it now to get an overall idea, even before we have discussed all the tools and details mentioned. Revisit and reread it as needed.
A semester-long data analysis project will be a major method of assessment for this class. The project should include all the different components of a data analysis that we will cover in this class.
You can do the class project either on your own or in a team of two.
There are several deadlines throughout the course by which you need to submit parts of your project.
The deadlines are listed in the Schedule document (it’s unlikely, but they might change, so keep checking that document).
You can pick your project topic and data source pretty much however you like. The only requirements are:
You can use data from any source you like. Here are some possible sources:
If you have data that you are using for one or more of your research projects, you are welcome and encouraged to work with it as part of the class project. Of course, what you do for this class project needs to be new work, not a recycled analysis you did previously. Also, since the analysis needs to be fully reproducible, you need to share the data at least within the class (no need to make it publicly available). I encourage you to use the class project to do an analysis and write a report that moves forward a project you want to publish as part of your research!
You can use any data you can get access to. Since the analysis needs to be reproducible, you need to be able to share the data at least with me and with the classmates who will review your project. If you need some ideas for data, check out this website I maintain with links to various resources, among them a list of data sources. Of course, you are not limited to the data listed there.
Often, the most interesting questions can be asked by combining data from more than one source.
Each submission needs to be in a fully reproducible form. Create a Github repository that contains all the files for your project, and name it YOURNAME-MADA-project. A public repository is preferred. If you have data that needs to be kept confidential, you can make it a private repository.
The main document should be an R Markdown (or bookdown) file that can be rendered to a suitable output format (HTML, Word, or PDF). I suggest Word output, since that is what most journals want when you submit a scientific paper, and the template I provide is set up for Word output. However, if you decide, for instance, to include interactive figures or apps (e.g., using Shiny) in your analysis project, you can pick HTML as the output instead. If for some reason you want or need PDF, that’s ok too.
Structure your project similarly to the provided template (described below), with data, scripts, results and manuscript in separate folders, several R scripts that each perform a different part of the analysis, and a final Rmd file that pulls everything together and generates the report.
Use a setup that resembles a real research paper. A main manuscript file in R Markdown format should contain the text, the main results (figures/tables), references, etc. A supplementary R Markdown file should contain additional results, such as some of your exploratory analysis findings. All code and any further results should be in additional R or Rmd files. Overall, your write-up can be somewhat more detailed than what would normally go into a peer-reviewed paper, with the most salient parts shown in the main text and the rest in the supplementary R Markdown file.
For all your submissions, you need to provide everything needed (data, code, etc.) to allow a full and automated reproduction of your analysis.
References should be included as a BibTeX file and cited in the Rmd file. See the template for an example.
I created a public Github repository called dataanalyis-template which is meant as a template for doing a data analysis project. It has different folders for organizing your project. Various readme files are provided to explain what each folder should contain. The template also contains several example files to show how the whole project workflow (or any data analysis workflow for that matter) can work.
In the manuscript folder, there is an R Markdown template file with a suggested outline for the report write-up. I have also indicated which sections need to be completed for which part of the project. This template is just a guide. You do not have to follow that structure exactly, as long as you provide all the requested parts by the different submission deadlines and, at the end, a final, complete, fully automated and reproducible project with all files in one Github repository.
In addition to the Rmd file, there is a bibliography file in BibTeX format that contains the references, and a style file that determines the reference formatting. You can use these as a starting point. Feel free to switch to a different reference formatting style.
I recommend you use this template as a starting point for your project. To do so, go to its Github repository and follow the instructions to turn it into a new repository for your class project, named YOURNAME-MADA-project. Once you have made the new repository, follow the usual Github workflow to get it onto your local computer. Then open the readme file and change the text so that it states somewhere “This is YOURNAME class project repository”.
You will receive feedback from me and/or your classmates after each submission.
The initial submissions and peer reviews make up 25% of the project grade. The final submission counts for the remaining 75%.
More details on what exactly is expected and how it is assessed are provided in the Project Rubric document.
Any communication regarding this project should happen on the Class Project discussion board. Go there to ask project-specific questions, to post links to your repository whenever you have finished a part, etc. I will also post any further or clarifying information there.