How to Set Seed in R

Introduction

When simulating data or generating random numbers, it is often necessary to set a seed so that results can be reproduced. In this article, we will learn how to set a seed in R.

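Computers do not generate truly random numbers. Instead, they use a pseudorandom number generator: a deterministic algorithm that produces a sequence of numbers that only looks random. The value the algorithm starts from is called the seed. If we do not supply one, R creates a seed from the current time and the process ID, so each session produces different numbers. In R, we can use the set.seed function to fix the seed. This makes the sequence of random numbers repeat exactly and allows us to reproduce results.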
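To make the idea of a deterministic generator concrete, here is a toy generator written in R. It is a minimal sketch for illustration only. It implements the Park-Miller "minimal standard" linear congruential generator, which is not the algorithm R itself uses (R's default is the Mersenne-Twister), and make_lcg is a hypothetical helper name.

```r
# Toy pseudorandom number generator, for illustration only.
# Park-Miller "minimal standard" LCG: state <- 16807 * state mod (2^31 - 1).
# This is NOT R's generator (R defaults to the Mersenne-Twister).
make_lcg <- function(seed) {
  state <- seed  # the seed is simply the starting state
  function(n) {
    out <- numeric(n)
    for (i in seq_len(n)) {
      # Advance the state; all values stay below 2^53, so doubles are exact
      state <<- (16807 * state) %% 2147483647
      out[i] <- state / 2147483647  # scale to the interval (0, 1)
    }
    out
  }
}

gen1 <- make_lcg(100)
gen2 <- make_lcg(100)
x1 <- gen1(5)
x2 <- gen2(5)
identical(x1, x2)       # TRUE: same seed, same sequence
identical(gen1(5), x1)  # FALSE: gen1's state has moved on
```

Nothing here is actually random: the whole sequence is determined by the starting value, which is exactly what set.seed exploits.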

Example without Seed

Let’s take a look at generating random samples without setting a seed. If we run the same function call twice, the randomness causes the computer to produce different results each time.

rnorm(10)

##  [1] -0.545167332  1.842017726  0.422230086  0.007563858 -0.732984341
##  [6]  0.170904793  1.201678973 -0.370823567 -0.429057895 -1.726838994

rnorm(10)

##  [1] -1.7311722 -1.8829807 -1.9160241 -0.7696578  1.5110700 -2.1699924
##  [7]  0.9709133  0.7222102  1.1091088 -0.2723699

Setting a Seed

To set a seed, we call the set.seed function before the random function call. Running the same code again then produces the same results.

set.seed(100)

rnorm(10)

##  [1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127  0.31863009
##  [7] -0.58179068  0.71453271 -0.82525943 -0.35986213

Your results could differ depending on your version of R and its random number generator settings, but if you use the same seed, 100, with the same settings, you should get the same results as shown here.
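One source of such differences is that R's default generator settings have changed over time; for example, R 3.6.0 changed the default algorithm used by sample(). The sketch below, assuming R 3.6.0 or later, checks the current settings with RNGkind() and pins them explicitly in the set.seed call so a script does not rely on version-specific defaults.

```r
# Report the generators in use: the uniform generator, the normal
# generator, and (since R 3.6.0) the method used by sample().
print(RNGkind())

# Pin all three settings when seeding. These values are the defaults in
# R 3.6.0 and later, so rnorm() output matches a plain set.seed(100).
set.seed(100, kind = "Mersenne-Twister", normal.kind = "Inversion",
         sample.kind = "Rejection")
x <- rnorm(10)
s <- sample(1:10, 3)
```

Passing sample.kind = "Rounding" instead reproduces sample() results from R versions before 3.6.0, at the cost of a warning that this sampler is non-uniform.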

.Random.seed

Let’s take a look at one caveat when using random numbers in R. Notice that if we run the following, we get two different samples.

set.seed(100)

rnorm(10)

##  [1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127  0.31863009
##  [7] -0.58179068  0.71453271 -0.82525943 -0.35986213

rnorm(10)

##  [1]  0.08988614  0.09627446 -0.20163395  0.73984050  0.12337950 -0.02931671
##  [7] -0.38885425  0.51085626 -0.91381419  2.31029682

One might expect both rnorm calls to return the same values. However, let’s try running this code again.

set.seed(100)

rnorm(10)

##  [1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127  0.31863009
##  [7] -0.58179068  0.71453271 -0.82525943 -0.35986213

rnorm(10)

##  [1]  0.08988614  0.09627446 -0.20163395  0.73984050  0.12337950 -0.02931671
##  [7] -0.38885425  0.51085626 -0.91381419  2.31029682

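Now we can see the pattern. After setting the seed, the first rnorm call always returns the same ten values, and the second call always returns the same, but different, ten values. The sequence as a whole is reproducible even though the two calls within it differ, which might not be what we expected.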
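One way to see that both calls read consecutive values from a single stream is to compare them with one call that draws all 20 values at once. A short sketch, assuming R's default normal generator (Inversion):

```r
# Two calls of 10 values after seeding ...
set.seed(100)
two_calls <- c(rnorm(10), rnorm(10))

# ... consume the same stream as one call of 20 values.
set.seed(100)
one_call <- rnorm(20)

identical(two_calls, one_call)  # TRUE
```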

The two samples differ because each call to a random function advances the state of the generator, which R stores in the global variable .Random.seed. Let’s save its value before each call to see this.

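set.seed(100)

# Generator state right after seeding
first.seed <- .Random.seed
first.samp <- rnorm(10)

# Generator state after the first rnorm() call has consumed random numbers
sec.seed <- .Random.seed
sec.samp <- rnorm(10)

# The state has changed, so the two samples differ
print(identical(first.seed, sec.seed))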

## [1] FALSE
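Because .Random.seed holds the generator's full state, we can also save it and restore it later to replay draws without calling set.seed again. A minimal sketch; assign() is used so the restored state lands in the global environment, which is the only place R looks for .Random.seed.

```r
set.seed(100)
first.samp <- rnorm(10)

# Save the generator state just before the second draw
saved.state <- .Random.seed
sec.samp <- rnorm(10)

# Put the saved state back; the next draw repeats sec.samp exactly
assign(".Random.seed", saved.state, envir = globalenv())
replay <- rnorm(10)

identical(sec.samp, replay)  # TRUE
```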

If we want to force both samples to be the same, we can call set.seed before each.

set.seed(100)
rnorm(10)

##  [1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127  0.31863009
##  [7] -0.58179068  0.71453271 -0.82525943 -0.35986213

set.seed(100)
rnorm(10)

##  [1] -0.50219235  0.13153117 -0.07891709  0.88678481  0.11697127  0.31863009
##  [7] -0.58179068  0.71453271 -0.82525943 -0.35986213
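Calling set.seed before each draw also overwrites the global generator state that other parts of a script may rely on. The sketch below seeds, evaluates an expression, and then restores whatever state existed before. with_fixed_seed is a hypothetical name, not a base R function; the withr package provides a similar with_seed() function.

```r
# Hypothetical helper: evaluate `expr` with a fixed seed, then restore the
# previous global random number generator state.
with_fixed_seed <- function(seed, expr) {
  had_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = globalenv())
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())  # no seed existed before
    }
  })
  set.seed(seed)
  expr  # lazy evaluation: expr is only evaluated here, after set.seed()
}

a <- with_fixed_seed(100, rnorm(10))
b <- with_fixed_seed(100, rnorm(10))
identical(a, b)  # TRUE
```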

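Note, however, that calling set.seed only once at the beginning and then running a whole simulation is still fully deterministic, and this is the common practice in real-world code: the individual draws differ from each other, but rerunning the script reproduces every one of them.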
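As an illustration, the sketch below runs a small simulation twice, setting the seed only once before each full run. run_simulation is a hypothetical function that averages 1,000 exponential draws, 100 times; the two complete runs match even though the draws inside each run all differ.

```r
# Hypothetical simulation: 100 replicates of the mean of 1,000 Exp(1) draws
run_simulation <- function(reps = 100, n = 1000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

set.seed(100)  # seed once, before the whole run
result1 <- run_simulation()

set.seed(100)
result2 <- run_simulation()

identical(result1, result2)  # TRUE: the whole run is reproduced
```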

Unset seed

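If we would like to stop using a fixed seed, we can either seed from the current time ourselves or call set.seed(NULL), which re-initializes the generator as if no seed had been set (R then creates a new seed from the current time and the process ID).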

set.seed(Sys.time())

set.seed(NULL)
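A quick sketch of unsetting the seed: after set.seed(NULL) the draws no longer follow the seed-100 sequence, but seeding with 100 again brings that sequence back.

```r
set.seed(100)
seeded <- rnorm(5)

# Drop the fixed seed: R re-initializes the generator from the current
# time and process ID, as if set.seed() had never been called
set.seed(NULL)
unseeded <- rnorm(5)
identical(seeded, unseeded)  # compares the new draws with the seed-100 draws

# Seeding with 100 again restores the original sequence
set.seed(100)
again <- rnorm(5)
identical(again, seeded)  # TRUE
```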

Why set the seed

There are two main reasons to set a seed in R. The first is to reproduce results. We often come across papers in mathematics or code online whose results we would like to try for ourselves. By using the same seed as the authors, we can compare our output with theirs and check whether we made any mistakes.

The second reason is debugging. Setting a seed makes every run use the same random numbers, so a problem can be reproduced reliably and we can rule out the random number generator as its cause.
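For debugging, a common pattern is to pick a seed, record it, and then use it, so that a failing run can be replayed exactly. A minimal sketch; drawing the seed with sample.int is just one convenient choice.

```r
# Draw a fresh seed, print it, and use it. If this run misbehaves,
# copy the printed seed into set.seed() to replay the same numbers.
seed <- sample.int(.Machine$integer.max, 1)
message("Using seed: ", seed)
set.seed(seed)
x <- rnorm(10)

# Replaying the run with the recorded seed gives identical draws
set.seed(seed)
identical(rnorm(10), x)  # TRUE
```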