Let’s begin at the most logical starting point: getting the software installed. To get started, I recommend this comprehensive installation guide that will walk you through the process seamlessly.
Think of R as a high-performance engine—capable of incredible feats but not particularly user-friendly on its own. RStudio is the sleek vehicle that lets you harness that power efficiently. While you could interact with R through a basic terminal, RStudio’s interface smoothly handles the otherwise clunky commands and makes certain tasks—like visualizing results or managing code—remarkably more accessible.
Despite RStudio’s popularity, it is hardly the only game in town. VSCode has emerged as a popular alternative—a versatile, general-purpose IDE that supports multiple programming languages. It’s well-designed, feature-rich, and, importantly, free and open-source. Personally, I use VSCode for most of my R programming work.
Another option on the horizon is Positron, developed by the same innovative team behind RStudio. It’s the modern evolution of RStudio, currently in beta but already showing great promise. It may eventually replace RStudio as the go-to IDE for R programming.
RStudio is an incredibly well-designed piece of software that makes working with R (and Python) easier. We’ll explore many of its features as we progress, starting with making your R code look better.
One of the subtle joys of programming is crafting code that’s not only functional but also aesthetically pleasing. To enhance the readability of your R scripts, consider installing the Fira Code font. Instructions can be found here and here.
Another way to make your code more readable is to format it according to the tidyverse style guide. Of course, no one wants to memorize every rule. Luckily, you can use the styler package to do this as an addin in RStudio (link).
In the grand tradition of programming tutorials, let’s start with the classic “Hello, World!”—a humble beginning to our journey with R.
print("Hello, World!")
The print() function displays the enclosed text on the console.
[1] "Hello, World!"
Next, you can use R as a straightforward calculator:
3 + 2
[1] 5
You can also store results in variables (so you can use them later):
x <- 3 + 2
x
Assign the value of 3 + 2 to the variable x.
Typing x displays the value of x on the console.
[1] 5
Why <-?
In R, <- is for assignment while = is for function arguments. You can technically use = for assignment in almost all cases, meaning x <- 3 + 2 is equivalent to x = 3 + 2. But then why does R continue to favor <-?
One reason is conceptual clarity. In R, the distinction between assignment and function arguments is explicit, providing a cleaner syntax and helping avoid ambiguity in complex code. By differentiating assignment with <-, R signals that an action is being performed, where data is transferred from one entity to another. This reinforces the principle that assignment and function arguments are inherently distinct constructs.
A second reason is flexibility. R allows for the reverse assignment arrow, ->, which lets you assign values in the opposite direction. For instance, 3 -> x assigns the value 3 to x, a feature that can sometimes be handy.
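The difference between the operators is easiest to see inside a function call. The following is a minimal sketch using only base R; the variable names (a, b, d, r1, r2) are chosen purely for illustration.

```r
# Three ways to bind the value of 3 + 2 to a name
a <- 3 + 2    # conventional assignment
b = 3 + 2     # `=` also assigns at the top level
3 + 2 -> d    # right-hand assignment
all_equal <- identical(a, b) && identical(b, d)

# Inside a call, `=` names an argument; it does not create a variable.
r1 <- round(3.14159, digits = 2)
digits_created_by_equals <- exists("digits", inherits = FALSE)

# Inside a call, `<-` assigns first and then passes the value by position,
# so a variable called `digits` now exists as a side effect.
r2 <- round(3.14159, digits <- 2)
digits_created_by_arrow <- exists("digits", inherits = FALSE)
```

Both calls to round() return the same result, but only the second one leaves a new variable behind, which is one reason to keep <- for assignment and = for arguments.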
But why don’t most other languages use a similar convention? One factor is typing efficiency: <- requires three keystrokes, while = only requires one (although you eventually get used to it). However, there’s also a historical element here: the arrow was inherited from APL by way of S, R’s predecessor, and early APL keyboards actually had a dedicated <- key, which made the operator as convenient as = (source).
Create a variable y and assign it the value of 5 * 3. Then display the value of y on the console.
Create a variable z that is half of the value of y, and then display the value of z on the console.
There are many basic functions in R that you can use. For example, the sqrt() function calculates the square root of a number. Remembering every function can be annoying, but luckily AI tools can help. Copilot, a powerful AI tool developed by GitHub, makes it pretty easy to find the function you need, and it is straightforward to use in RStudio (link).
However, AIs can be unreliable sometimes. To make sure a suggestion is correct, you can always use ? to double-check. For example, to see the documentation R has on the sqrt() function, you can use the following code:
?sqrt
In addition, you should check some simple cases to make sure the function works as expected. For example, you can use the following code to check the sqrt() function:
sqrt(1) == 1
[1] TRUE
sqrt(4) == 2
[1] TRUE
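The checks above compare values that are exactly representable. With other inputs, floating-point rounding can make == report FALSE even when a function behaves correctly. A small sketch of a more forgiving check, using base R's all.equal() (the variable names are illustrative only):

```r
# sqrt(2) cannot be stored exactly, so squaring it does not give exactly 2
exact_match <- sqrt(2)^2 == 2

# all.equal() compares within a small numerical tolerance;
# wrapping it in isTRUE() yields a plain TRUE/FALSE
close_enough <- isTRUE(all.equal(sqrt(2)^2, 2))

exact_match
close_enough
```

For quick sanity checks on non-integer results, the tolerant comparison is usually the safer habit.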
To unlock R’s full potential, you’ll often need to install additional packages; think of them as apps that extend your smartphone’s capabilities. This is a one-time process for each package on your computer. For example, to install the ggplot2 package:
install.packages("ggplot2")
After that, simply load it when you need it:
library(ggplot2)
This is the first of many design inconsistencies you will meet in R: when installing a package, you enclose its name in quotes, but when loading it, you don’t. It’s a small quirk, but one that can trip you up if you’re not careful (speaking from experience :-( ).
There will be many others along the way. I will try to point them out as we go along.
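As a side note on the quoting quirk, library() also accepts a quoted name, and it can take a name stored in a variable when character.only = TRUE is set. A brief sketch using packages that ship with R (the variable name pkg is just for illustration):

```r
# A quoted package name works with library() as well
library("stats")

# When the name lives in a variable, character.only = TRUE tells
# library() to use the variable's value rather than its name
pkg <- "utils"
library(pkg, character.only = TRUE)

# Both packages are now on the search path
c("package:stats", "package:utils") %in% search()
```

Quoting consistently in both install.packages() and library() is one way to sidestep the inconsistency.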
Notably, not all packages are on CRAN. For example, many of them are on GitHub, GitLab, or other platforms. To install these packages, you can use the pak package. For example, to install the ggthemr package from GitHub:
# install.packages("pak")
pak::pkg_install("cttobin/ggthemr")
What is this mysterious :: here? It simply means that we are calling the pkg_install() function from the pak package. It allows you to access a function from a package without loading the entire package. This can help avoid conflicts with other packages that might have a similarly named function (which is a huge source of hidden errors!).
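To see how :: sidesteps a name clash, here is a small sketch that uses only base R. It deliberately defines a function with the same name as stats::sd(); the names own_result and pkg_result are hypothetical.

```r
# A user-defined function that happens to share a name with stats::sd()
sd <- function(x) "my own sd"

own_result <- sd(c(1, 2, 3))         # calls the definition above
pkg_result <- stats::sd(c(1, 2, 3))  # explicitly calls the stats version

own_result
pkg_result

# Remove the shadowing definition so sd() refers to stats::sd() again
rm(sd)
```

The unqualified call silently picks up whichever definition comes first, while the qualified call always reaches the intended package.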
I highly recommend this approach because it clearly documents which dependencies the code comes with and provides a universal way to install packages from different sources.
There is another popular strategy. Consider using the pacman package. It automatically detects whether the requested package is installed, installs it if it’s not, and then loads it. Here’s how you can use it:
# install.packages("pacman")
pacman::p_load(ggplot2)
Use p_load() to install and load the ggplot2 package in one go.
This can greatly simplify things when sharing code. You don’t have to wonder whether a particular library is installed on someone else’s system: p_load() handles it gracefully.
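The install-if-missing idea can be sketched in a few lines of base R. This is not how pacman implements p_load(); load_or_install() is a hypothetical helper written only to illustrate the logic, and it is demonstrated on a package that ships with R so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # runs only when the package is not installed
  }
  library(pkg, character.only = TRUE)
}

# "stats" is always available, so this call only loads it
load_or_install("stats")
```

In practice p_load() is more convenient, since it accepts several unquoted package names at once.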
Install the package gt using the standard method, the pak method, and the pacman method.
As stated on its official website:
The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
Installing it may take a bit:
install.packages("tidyverse")
Once done, loading it is straightforward and a common part of most R scripts (I usually begin nearly all my scripts with this line):
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ lubridate 1.9.4 ✔ tibble 3.2.1
✔ purrr 1.0.4.9000 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Looking at the message generated by executing the above line, we see that the core tidyverse packages are now loaded: nine in total, called ggplot2, tibble, and so on. (ggplot2 does not appear in the list above, most likely because it was already loaded earlier.) We will get to know these in more detail throughout the class.
The message also shows which conflicts exist. These two conflicts are horrible design choices in tidyverse, and many mysterious bugs happen because of them. To avoid these, you can use the following code:
library(conflicted)
library(tidyverse)
conflict_prefer("filter", "dplyr")
conflict_prefer("lag", "dplyr")
Load the conflicted package.
Load the tidyverse package.
For filter, always prefer dplyr.
For lag, always prefer dplyr.
Another way is to just write your code using the package::function() format:
dplyr::filter()
dplyr::lag()
Given how often you’ll use these functions, I find this approach annoying. But it’s a personal choice.
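To make the filter conflict concrete, the sketch below shows what the base version, stats::filter(), actually does: it applies a linear filter to a numeric series rather than selecting rows. Only base R is run here; the dplyr call is left as a comment and assumes dplyr is installed. The names v and moving_avg are illustrative.

```r
v <- c(1, 2, 3, 4, 5)

# stats::filter() with three equal weights computes a centred
# 3-point moving average, with NA at both ends of the series
moving_avg <- stats::filter(v, rep(1 / 3, 3))
as.numeric(moving_avg)

# dplyr::filter() does something entirely different: it keeps the rows
# of a data frame that satisfy a condition, for example
# dplyr::filter(mtcars, mpg > 30)  # assumes dplyr is installed
```

Because the two functions share a name but not a purpose, calling the wrong one tends to produce confusing errors rather than an obvious failure.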
If you do not want to see the messages (although I do not recommend it), you can use the suppressPackageStartupMessages() function. For example, you can use the following code to load the tidyverse package:
suppressPackageStartupMessages(library(tidyverse))
This approach hides the typical start-up output, though it remains entirely optional.
We’ll rely on the tidyverse extensively throughout this course.