When simulating data or generating random numbers, it is often required to set a seed to allow for reproducing results. In this article, we will learn how to set a seed in R.
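The random numbers a computer generates are not truly random. A pseudorandom number generator starts from an initial value, often derived from the current time or other factors, and produces a sequence that only looks random. This initial value is usually called a seed.
In R, we can use the set.seed function to fix the seed, which makes the sequence repeatable and allows us to reproduce results.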
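To make the idea of a seed concrete, here is a minimal sketch of a toy generator. It is not the algorithm R uses (R defaults to the Mersenne-Twister); the function name toy_rng and its constants are chosen only for illustration. It shows that the same seed always yields the same sequence.

```r
# Toy linear congruential generator, for illustration only.
# toy_rng is a hypothetical helper, not part of R.
toy_rng <- function(seed, n) {
  a <- 16807        # multiplier (classic "minimal standard" choice)
  m <- 2147483647   # modulus, 2^31 - 1
  state <- seed
  out <- numeric(n)
  for (i in seq_len(n)) {
    state <- (a * state) %% m   # next state depends only on the previous one
    out[i] <- state / m         # scale to the interval (0, 1)
  }
  out
}

# Same seed, same "random" numbers
identical(toy_rng(42, 5), toy_rng(42, 5))  # TRUE
```

Everything after the seed is fully determined by it, which is exactly why fixing the seed makes results reproducible.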
Let’s take a look at generating random samples without setting a seed. We can run the same code twice, but due to the randomness, the computer will produce different results.
rnorm(10)
## [1] -0.545167332 1.842017726 0.422230086 0.007563858 -0.732984341
## [6] 0.170904793 1.201678973 -0.370823567 -0.429057895 -1.726838994
rnorm(10)
## [1] -1.7311722 -1.8829807 -1.9160241 -0.7696578 1.5110700 -2.1699924
## [7] 0.9709133 0.7222102 1.1091088 -0.2723699
To set a seed, we can use the set.seed
function right before a random
function call to generate the same results.
set.seed(100)
rnorm(10)
## [1] -0.50219235 0.13153117 -0.07891709 0.88678481 0.11697127 0.31863009
## [7] -0.58179068 0.71453271 -0.82525943 -0.35986213
Now, your results could be different depending on your version of R and its random number generator settings, but if you use the same seed, 100, you should get the same results as me.
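If your numbers do not match, one thing worth checking is which generators your session uses. The sketch below uses RNGkind to inspect them and passes the kind and normal.kind arguments of set.seed to pin them explicitly. The values shown are R's usual defaults; treat them as an assumption about your setup. As one known example of version differences, R 3.6.0 changed the default method used by sample.

```r
# Inspect the generators currently in use
current_kinds <- RNGkind()
print(current_kinds)

# Pin the generators together with the seed (assumed defaults)
set.seed(100, kind = "Mersenne-Twister", normal.kind = "Inversion")
x <- rnorm(3)

set.seed(100, kind = "Mersenne-Twister", normal.kind = "Inversion")
y <- rnorm(3)

identical(x, y)  # TRUE
```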
Let’s take a look at one caveat when using random numbers in R. Notice that if we run the following, we get two different samples.
set.seed(100)
rnorm(10)
## [1] -0.50219235 0.13153117 -0.07891709 0.88678481 0.11697127 0.31863009
## [7] -0.58179068 0.71453271 -0.82525943 -0.35986213
rnorm(10)
## [1] 0.08988614 0.09627446 -0.20163395 0.73984050 0.12337950 -0.02931671
## [7] -0.38885425 0.51085626 -0.91381419 2.31029682
One might expect both of the rnorm calls to return the same values. However, let’s try to run this code again.
set.seed(100)
rnorm(10)
## [1] -0.50219235 0.13153117 -0.07891709 0.88678481 0.11697127 0.31863009
## [7] -0.58179068 0.71453271 -0.82525943 -0.35986213
rnorm(10)
## [1] 0.08988614 0.09627446 -0.20163395 0.73984050 0.12337950 -0.02931671
## [7] -0.38885425 0.51085626 -0.91381419 2.31029682
Now, we can see the pattern. If we set a seed, then run the two rnorm calls, we get the same pair of results as in the previous block. The seed fixes the whole sequence of draws, not each individual call, although this might not be what we expected.
The two samples differ because of the global variable .Random.seed, which stores the state of the random number generator and changes every time we run a random function. Let’s extract the value to see.
set.seed(100)
first.seed = .Random.seed
first.samp = rnorm(10)
sec.seed = .Random.seed
sec.samp = rnorm(10)
print(identical(first.seed, sec.seed))
## [1] FALSE
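Because .Random.seed holds the generator state, saving it and restoring it later should replay the draws that followed. The sketch below assumes the default generators and uses hypothetical variable names (saved_state, replayed). Assigning to .Random.seed directly is shown only to illustrate the mechanism; calling set.seed is usually the simpler choice.

```r
set.seed(100)
first_samp <- rnorm(10)

# Save the generator state after the first sample
saved_state <- get(".Random.seed", envir = globalenv())
second_samp <- rnorm(10)

# Restore the saved state, then draw again
assign(".Random.seed", saved_state, envir = globalenv())
replayed <- rnorm(10)

# The replayed draw matches the second sample, not the first
identical(second_samp, replayed)
identical(first_samp, replayed)
```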
If we want to force both samples to be the same, we can call set.seed
before each.
set.seed(100)
rnorm(10)
## [1] -0.50219235 0.13153117 -0.07891709 0.88678481 0.11697127 0.31863009
## [7] -0.58179068 0.71453271 -0.82525943 -0.35986213
set.seed(100)
rnorm(10)
## [1] -0.50219235 0.13153117 -0.07891709 0.88678481 0.11697127 0.31863009
## [7] -0.58179068 0.71453271 -0.82525943 -0.35986213
Note, however, that setting the seed once at the beginning and then running the whole simulation is still deterministic, and this is the more common approach in practice.
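As a minimal sketch of that pattern, the hypothetical function run_simulation below sets the seed once and then makes several random draws. The function name, its arguments, and the choice of averaging normal samples are illustrative assumptions, not part of any package.

```r
# Hypothetical simulation: set the seed once, then draw repeatedly
run_simulation <- function(seed, n_reps = 5, n = 10) {
  set.seed(seed)
  # Each repetition draws a fresh sample and records its mean
  vapply(seq_len(n_reps), function(i) mean(rnorm(n)), numeric(1))
}

run_a <- run_simulation(100)
run_b <- run_simulation(100)

# The individual repetitions differ from each other,
# but the whole run is reproduced exactly
identical(run_a, run_b)  # TRUE
```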
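If we would like to remove the seed, we can either set the seed to the current time or pass NULL, which re-initializes the generator as if no seed had been set. By default, R creates its seed from the current time and the process ID.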
set.seed(Sys.time())
set.seed(NULL)
There are two main reasons we would like to set a seed in R. The first is to reproduce results. Often we would like to try code from a mathematics paper or from an online source. By setting the same seed, we can compare our output to the author’s to see if we made any mistakes.
The second is debugging. Setting a seed helps us rule out random variation as the cause of an issue.