A probability distribution is “the mathematical function that gives the probabilities (likelihood) of occurrence of different possible outcomes for an experiment” - Wikipedia
Probability distributions are widely used to model real-life phenomena such as customer arrival patterns, the lifespan of machine components, and even the spread of diseases.
If you’d like to learn more about probability distributions, here are some resources that might interest you:
- Probability concepts explained: probability distributions
- Understanding Probability Distributions
- An extensive list of probability distributions.
This article presents three tools that can be used to generate random samples from probability distributions:
I. random
The random module is part of the Python Standard Library, so it is readily available. Each of its distribution functions returns a single value per call. Thus, to create a sample of a desired size n, you'll need a loop, e.g. a list comprehension.
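As a minimal sketch of that looping approach (the seed, the sample size and the variable names here are illustrative assumptions, not part of the demo notebook):

```python
import random

# A dedicated Random instance keeps this sketch from disturbing the
# module-level state that other code may have seeded.
py_rng = random.Random(42)  # 42 is an arbitrary seed

n = 5  # desired sample size

# Each call to gauss() returns one float, so a list comprehension
# is used to collect n draws.
sample = [py_rng.gauss(0, 1) for _ in range(n)]

print(len(sample))  # 5
```

The result is a plain Python list of floats, which is convenient for small samples but slower than array-based generation for large ones.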
II. NumPy
NumPy is a third-party package, so it has to be installed. It is popular in the Python community because it provides array objects that enable fast operations. The numpy.random sub-module can be used to generate samples from various probability distributions.
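A short sketch of how this might look in practice, assuming NumPy is installed (the seed and shapes are arbitrary choices for illustration). Unlike the random module, the `size` argument lets a single call return a whole array:

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, for repeatable output

# `size` controls the shape of the returned array.
sample = rng.normal(loc=0, scale=1, size=5)
grid = rng.uniform(low=0, high=1, size=(2, 3))

print(type(sample).__name__, sample.shape)  # ndarray (5,)
print(grid.shape)  # (2, 3)
```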
III. SciPy
SciPy is also a third-party package that has to be installed. It is built on top of NumPy. The scipy.stats sub-module can be used to generate samples from a large number of probability distributions.
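One feature worth sketching here is that a scipy.stats distribution can be "frozen" with fixed parameters and then reused, both for sampling and for evaluating the distribution itself. The example below assumes SciPy is installed; the seed and variable names are illustrative:

```python
from scipy import stats

# Freeze a standard normal distribution (mean 0, standard deviation 1).
dist = stats.norm(loc=0, scale=1)

# Draw a sample; random_state makes the draw repeatable.
sample = dist.rvs(size=5, random_state=42)

print(sample.shape)  # (5,)
print(dist.cdf(0))   # 0.5, since half the probability mass lies below the mean
```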
Examples
What follows is a demonstration of how to create samples from the Normal, Uniform and Exponential distributions using the above tools.
NOTE: You can run the examples in the demo Jupyter notebook, courtesy of Binder.
import numpy as np
import pandas as pd
import random
import scipy.stats  # import the sub-module explicitly; a bare "import scipy" may not load it
import seaborn as sns
sns.set_theme(font="serif", style="white", palette="tab10")
SAMPLE_SIZE = 5000
# For reproducibility
SEED = 12345
random.seed(SEED)
numpy_gen = np.random.default_rng(SEED)
def plot_samples(distribution: str, **samples) -> None:
    """Get a layered kde-plot of the various `samples`.

    Args:
        distribution (str): The probability distribution sampled from.
        **samples: `dict` of samples to plot.
    """
    df = pd.DataFrame(samples)
    ax = sns.kdeplot(data=df)
    ax.set_title(
        f"{distribution.title()} Distribution Sample",
        pad=16,
        size=16,
        weight=600,
    )
    sns.despine()
1. Normal Distribution
random_normal = [random.gauss(mu=0, sigma=1) for _ in range(SAMPLE_SIZE)]
numpy_normal = numpy_gen.normal(loc=0, scale=1, size=SAMPLE_SIZE)
scipy_normal = scipy.stats.norm.rvs(loc=0, scale=1, size=SAMPLE_SIZE, random_state=SEED)
plot_samples("Normal", random=random_normal, numpy=numpy_normal, scipy=scipy_normal)
2. Uniform Distribution
random_uniform = [random.uniform(a=0, b=1) for _ in range(SAMPLE_SIZE)]
numpy_uniform = numpy_gen.uniform(low=0, high=1, size=SAMPLE_SIZE)
scipy_uniform = scipy.stats.uniform.rvs(loc=0, scale=1, size=SAMPLE_SIZE, random_state=SEED)
plot_samples("Uniform", random=random_uniform, numpy=numpy_uniform, scipy=scipy_uniform)
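The example above uses the interval [0, 1], where the three parameterisations happen to coincide. For a general interval the arguments differ: random and NumPy take the two endpoints, whereas scipy.stats.uniform takes `loc` (the lower bound) and `scale` (the width), so it samples from [loc, loc + scale]. A small sketch, with the endpoints, seeds and sample size chosen arbitrarily for illustration:

```python
import random

import numpy as np
from scipy import stats

a, b = 2.0, 5.0  # illustrative interval endpoints
n = 1000

py_rng = random.Random(0)  # separate instance, arbitrary seed
random_sample = [py_rng.uniform(a, b) for _ in range(n)]
numpy_sample = np.random.default_rng(0).uniform(low=a, high=b, size=n)
# SciPy: loc is the lower bound and scale is the interval width.
scipy_sample = stats.uniform.rvs(loc=a, scale=b - a, size=n, random_state=0)

for name, sample in [
    ("random", random_sample),
    ("numpy", numpy_sample),
    ("scipy", scipy_sample),
]:
    # Every draw should fall within the requested interval.
    print(name, a <= min(sample), max(sample) <= b)
```

Passing `scale=b` instead of `scale=b - a` to SciPy would sample from [2, 7] here, which is an easy mistake to make when porting code between the libraries.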
3. Exponential Distribution
random_exponential = [random.expovariate(lambd=1) for _ in range(SAMPLE_SIZE)]
numpy_exponential = numpy_gen.exponential(scale=1, size=SAMPLE_SIZE)
scipy_exponential = scipy.stats.expon.rvs(scale=1, size=SAMPLE_SIZE, random_state=SEED)
plot_samples("Exponential", random=random_exponential, numpy=numpy_exponential, scipy=scipy_exponential)
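Note that the three tools do not parameterise the exponential distribution in the same way: random.expovariate takes the rate `lambd`, while NumPy and SciPy take `scale`, which equals 1 / rate (the mean). With a rate of 1, as above, the two coincide. The sketch below uses an illustrative rate of 2 and arbitrary seeds to show the conversion; the sample means should all land near 1 / rate = 0.5:

```python
import random

import numpy as np
from scipy import stats

rate = 2.0       # illustrative rate parameter (lambda)
n = 100_000      # large sample so the means settle near 1 / rate

py_rng = random.Random(0)  # separate instance, arbitrary seed
random_sample = [py_rng.expovariate(rate) for _ in range(n)]
# NumPy and SciPy expect the scale, i.e. the reciprocal of the rate.
numpy_sample = np.random.default_rng(0).exponential(scale=1 / rate, size=n)
scipy_sample = stats.expon.rvs(scale=1 / rate, size=n, random_state=0)

for name, sample in [
    ("random", random_sample),
    ("numpy", numpy_sample),
    ("scipy", scipy_sample),
]:
    print(name, round(float(np.mean(sample)), 2))
```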
Further Reading
Please see
- Real-valued distributions (random)
- Random Generator: Distributions (NumPy)
- Statistical functions (SciPy)
for more probability distributions.