R Coding Tips

Overview

This unit discusses a few (R) coding tips that are useful to consider as you learn how to write code.

Goals

  • Be familiar with some general good coding practices.
  • Be aware of some pitfalls related to R packages.
  • Know a few R coding tips to make your code more efficient and easier to read.

Reading

This is a collection of guides and tips related to getting started and using R.

General Coding Tips

Here are a few brief suggestions for good coding practices in R (and any other language).

Write modular code. The idea is that you break things into logical components, and structure your code that way. For instance R Markdown chunks are little modules. Each chunk should do some actions that make sense to be grouped together. For instance one or a few data cleaning steps can be in a code chunk. Then for the next set of actions (e.g. making a plot), you start a new chunk. You should also add comments explaining not only each line of code, but also what those modules do. Structuring your code into sections/modules that do specific things makes it easier to follow what is going on.

Write DRY code. DRY stands for Don’t Repeat Yourself. The idea is that if you have code that does the same thing more than once (or maybe more than a few times), you should rewrite your code in such a way that you write a single bit (module) of code that you call repeatedly. The most natural way this is done is with writing a function.

Write functions. Pretty much everything you use in R (or other languages) is a function. A function is a bit of code that you give some inputs (also called arguments), the function does something with those inputs, and usually returns something back to you (not always, sometimes the function might write a file to your computer). For instance if you define some object/variable x and send it to the function sum(), then sum(x) returns the sum of the elements in x. It likely also does a few other things, such as checking that everything contained in x is a number that can be summed. While writing your own functions is a bit more advanced, at some point you should do it. Functions are automatically modular and allow for DRY coding. They are therefore important components of most code. It takes a bit of practice, but once you understand how functions work, you should be able to write your own as part of your code. Functions can either live in separate files, or in the same file (usually at the top) from which you call it. We generally call code that just runs, without being passed in things and returning stuff, a script to differentiate it from a function.

Have consistent style. There are often many ways to implement a certain operation with code. There are also many ways how you style your code (e.g., give variables names, use indentation). It’s important to write code that looks consistent and is written in a way that’s easy to read and comprehend by others and your future self. This gets trickier with the use of AI tools that might generate code for you. You should always review and potentially re-style any code generated by AI tools to make sure it fits with your overall coding style.

R packages

  • Sometimes when you install or update a package, you might get the message Do you want to install from sources the packages which need compilation? What that means is that there is an R package available that has just been updated and exists as source but not yet turned into what’s called a binary format that’s pre-compiled for whatever operating system you use. There is almost never a reason to get the very latest source version. And if you want to install from source, you might need Rtools to do so. Therefore, I suggest that when you get this question/message, you check No.

  • Sometimes during install or updating of a package, you might get a package is loaded/in use message. You are usually offered to restart R. Agree to that. It might fix things, but not always. The best way to install new packages if you run into those problems is to completely close Positron/R, then restart. Next, immediately install the needed packages with install.packages('PACKAGENAME') in the console, without doing anything else. That should minimize any potential conflicts.

  • Sometimes when you try to install a package you get the message package ‘PACKAGENAME’ is not available for this version of R. This is a bit confusing, it sounds like you need to upgrade your R version. What really happens most of the time is that you mis-typed the package name, and that package just doesn’t exist (for any version of R). So first make sure you are spelling the package name correctly, including all capitalization being correct. That almost always fixes things. It is rare to find an R package you want that is incompatible with your version of R, unless your R installation is several years out of date.

R Coding

Vectorize. In R, most things you do can be vectorized. What that means is that if you have multiple elements in an object, e.g., multiple entries in a vector, or many rows and columns in a data frame, you can use functions that process each entry at once. For instance if you use the filter function from dplyr like this: filter(mydata, age > 50), the function looks at the age for each row in the column called age and keeps those that are larger than 50. If you wanted to do the same in another programming language, like C/C++, you would need to write code that explicitly runs through each row of mydata and looks at age for each row and decides to keep it or not. That would be multiple lines of code. There are other languages that are vectorized like R, so it’s not just R. But it’s an important feature of R and whenever possible, you should try to apply your operations on all elements at the same time instead of writing for loops. (Though sometimes, loops are easier to read and code, so don’t feel like you can’t ever use them again).

Don’t mix and match too much. You have already or will likely figure out soon that there are often many ways to do something in R, especially once you start using R packages, such as the tidyverse set of packages. While it is perfectly ok to write some code that uses just the basic R commands, and other code that makes use of packages, you do need to be somewhat careful when you mix and match. As an example, there are the basic R commands cbind and rbind to combine data frames by rows or columns. There are also the commands bind_cols and bind_rows in the dplyr package. They do more or less the same thing, but have slightly different behavior. For instance the returned object might be of a different class (e.g., if you cbind two matrices, the result is a matrix, if you use bind_cols, you get a tibble/data frame.) Sometimes, these slight differences do not matter, but other times they might. It is therefore important to pay attention if you switch between different ways of doing things in R, and if possible stick with a given approach (e.g. all/most base R commands or mostly tidyverse).

Summary

In this unit, we briefly discussed a few general tips and guidelines for writing good R code.

Further Resources

  • If you are completely new to functions, you might want to check out the Functions section in Hands-On Programming with R.
  • The tidyverse has a style guide which is a good one to use. But you don’t have to follow it, you can have your own style.

Test yourself

Which coding practice best helps reduce repeated code when performing the same operation multiple times in R?

Writing a function allows you to reuse the same logic without repeating code, which follows the DRY principle.

  • True
  • False
  • False
  • False

You see the message “package ‘PACKAGENAME’ is not available for this version of R” when installing a package. What should you check first?

Most of the time this message occurs because the package name was misspelled, not because of R version incompatibility.

  • False
  • False
  • False
  • True

Why is vectorized code generally preferred over writing explicit for-loops in R?

Vectorized operations apply functions to entire vectors or data frames at once, making code shorter and often easier to read.

  • False
  • False
  • True
  • False

Practice

  • Find some code, either online or your own or something AI produced, and see if you can identify ways to make it more modular, DRY, or otherwise improved.
  • Find some code, and ask AI to rewrite it in multiple ways. For instance with as many functions as possible, or in a certain style (e.g. tidyverse versus base R), etc.