Inside-out version of the Fisher-Yates shuffle can be used to generate a random sample of size k from a very large source set as follows.
Order of the first k samples is immaterial, therefore the first loop can be removed and A can be initialized to be the first k items of Source. This yields Algorithm R (An instance of a family of randomized algorithms called Reservoir Sampling).
The modified algorithm is as follows:
This algorithm's time complexity is O(n).
Further Reading
Wikipedia Article on Reservoir Sampling
Tutorial Slides
Jeffery Stock's Reservoir Sampling Paper
Lecture Silde
Pseudocode:
Source[n] // n size array of elements
A[k] // k size array of containing random samples
A[0] ← S[0]
for i from 1 to k - 1 do
r ← random (0 .. i)
A[i] ← A[r]
A[r] ← Source[i]
for i from 1 to k - 1 do
r ← random (0 .. i)
A[i] ← A[r]
A[r] ← Source[i]
for i from k to n - 1 do
r ← random (0 .. i)
if (r < k) then A[r] ← Source[i]r ← random (0 .. i)
Order of the first k samples is immaterial, therefore the first loop can be removed and A can be initialized to be the first k items of Source. This yields Algorithm R (An instance of a family of randomized algorithms called Reservoir Sampling).
The modified algorithm is as follows:
Pseudocode:
Source[n] // n size array of elements
A[k] // k size array of containing random samples
for i from 0 to k - 1 do
A[i] ← Source[i]
A[i] ← Source[i]
for i from k to n - 1 do
r ← random (0 .. i)
if (r < k) then A[r] ← Source[i]r ← random (0 .. i)
This algorithm's time complexity is O(n).
Further Reading
Wikipedia Article on Reservoir Sampling
Tutorial Slides
Jeffery Stock's Reservoir Sampling Paper
Lecture Silde