Sampling From Probability Distributions In Python



A probability distribution is “the mathematical function that gives the probabilities (likelihood) of occurrence of different possible outcomes for an experiment” - Wikipedia

Probability distributions are widely used to model real-life phenomena such as customer arrival patterns, the lifespan of machine components, and even the spread of diseases.

[Image: falling dice. Photo by Riho Kroll on Unsplash]

If you’d like to learn more about probability distributions, here are some resources that might interest you:

This article presents 3 tools that can be used to generate random samples from probability distributions:

I. random

The random module is part of the Python Standard Library, so it is readily available. Its distribution functions, such as random.gauss, return a single value per call. Thus, to create a sample of size n, you'll need a loop, for example a list comprehension.
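A minimal sketch of that pattern, using only the standard library: each call to gauss yields one float, so a list comprehension collects n of them. The names rng and n are illustrative, and using a dedicated random.Random instance (rather than the module-level functions) is a choice made here, not something the examples below require.

```python
import random

# A dedicated Random instance keeps this stream separate from the module-level one.
rng = random.Random(12345)

one_value = rng.gauss(0, 1)  # a single float per call
n = 10
sample = [rng.gauss(0, 1) for _ in range(n)]  # loop to build a sample of size n

print(type(one_value).__name__, len(sample))  # float 10
```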

II. NumPy

NumPy is a third-party package, so it has to be installed. It is much beloved in Python because its array objects enable fast, vectorized operations. The numpy.random sub-module can be used to generate samples from various probability distributions, returning a whole array in a single call when a size is given.
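For contrast with the loop above, here is a short sketch of the NumPy approach, assuming the Generator interface from np.random.default_rng (the one used in the examples below). The size argument replaces the Python-level loop, and it may also be a shape tuple.

```python
import numpy as np

rng = np.random.default_rng(12345)  # a Generator, NumPy's recommended interface

# One call returns a whole array; no Python-level loop is needed.
sample = rng.normal(loc=0, scale=1, size=5)
grid = rng.normal(loc=0, scale=1, size=(2, 3))  # size may also be a shape tuple

print(sample.shape, grid.shape)  # (5,) (2, 3)
```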

III. SciPy

SciPy is also a third-party package that has to be installed. It builds on NumPy: its functions accept and return NumPy arrays. The scipy.stats sub-module can be used to generate samples from a large number of probability distributions.
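One convenience of scipy.stats, sketched below under the assumption of a recent SciPy release, is the "frozen" distribution: fix the parameters once, then reuse the object for sampling and for other quantities such as the density and CDF. Passing a NumPy Generator as random_state is one way to tie SciPy's draws to an existing NumPy stream; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)

# "Freeze" a distribution by fixing its parameters once...
dist = stats.norm(loc=0, scale=1)

# ...then draw samples; random_state also accepts a NumPy Generator.
sample = dist.rvs(size=5, random_state=rng)

# The same object exposes other methods, e.g. the density and the CDF.
print(sample.shape, dist.pdf(0.0), dist.cdf(0.0))
```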

Examples

What follows is a demonstration of how to create samples from the Normal, Uniform and Exponential distributions using the above tools.

NOTE: You can run the examples in this demo Jupyter notebook, courtesy of Binder:

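import random

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats  # import the submodule explicitly; `import scipy` alone may not load it
import seaborn as sns

sns.set_theme(font="serif", style="white", palette="tab10")
SAMPLE_SIZE = 5000

# For reproducibility
SEED = 12345
random.seed(SEED)  # seeds the standard-library generator
numpy_gen = np.random.default_rng(SEED)  # NumPy Generator used in the examples


def plot_samples(distribution: str, **samples) -> None:
    """Plot layered KDE curves for the given `samples` on a fresh figure.

    Args:
        distribution (str): The probability distribution sampled from.
        **samples: Named samples to plot; each keyword becomes a legend label.
    """
    df = pd.DataFrame(samples)
    _, ax = plt.subplots()  # new axes, so successive calls don't overlay each other
    sns.kdeplot(data=df, ax=ax)
    ax.set_title(
        f"{distribution.title()} Distribution Sample",
        pad=16,
        size=16,
        weight=600,
    )
    sns.despine(ax=ax)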
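The seeding above is what makes the samples repeatable. The sketch below checks this for each tool with a few uniform draws; the variable names are illustrative. Note that an integer random_state makes SciPy seed a fresh generator on every call, so each scipy.stats call in the examples restarts from the same seed, whereas a shared NumPy Generator continues one stream.

```python
import random

import numpy as np
from scipy import stats

SEED = 12345

# Standard library: re-seeding the module-level generator replays the same stream.
random.seed(SEED)
first = [random.random() for _ in range(3)]
random.seed(SEED)
second = [random.random() for _ in range(3)]

# NumPy: two Generators built from the same seed produce identical draws.
a = np.random.default_rng(SEED).uniform(size=3)
b = np.random.default_rng(SEED).uniform(size=3)

# SciPy: an integer random_state seeds a fresh generator on every call.
c = stats.uniform.rvs(size=3, random_state=SEED)
d = stats.uniform.rvs(size=3, random_state=SEED)

print(first == second, np.array_equal(a, b), np.array_equal(c, d))  # True True True
```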

1. Normal Distribution

random_normal = [random.gauss(mu=0, sigma=1) for _ in range(SAMPLE_SIZE)]
numpy_normal = numpy_gen.normal(loc=0, scale=1, size=SAMPLE_SIZE)
scipy_normal = scipy.stats.norm.rvs(loc=0, scale=1, size=SAMPLE_SIZE, random_state=SEED)
plot_samples("Normal", random=random_normal, numpy=numpy_normal, scipy=scipy_normal)

[Figure: KDE plot of the normal distribution samples]
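Beyond eyeballing the plot, a rough numeric check is to compare each sample's mean and standard deviation with the parameters it was drawn from. The sketch below does this for all three tools; MU = 10 and SIGMA = 2 are illustrative values chosen here (not from the example above) so that loc and scale differ from the defaults.

```python
import random

import numpy as np
from scipy import stats

SEED, N = 12345, 5000
MU, SIGMA = 10.0, 2.0  # illustrative parameters, not from the example above

random.seed(SEED)
samples = {
    "random": np.array([random.gauss(mu=MU, sigma=SIGMA) for _ in range(N)]),
    "numpy": np.random.default_rng(SEED).normal(loc=MU, scale=SIGMA, size=N),
    "scipy": stats.norm.rvs(loc=MU, scale=SIGMA, size=N, random_state=SEED),
}

# Sample mean and standard deviation should sit close to MU and SIGMA.
for name, x in samples.items():
    print(f"{name:>6}: mean={x.mean():.3f}, std={x.std(ddof=1):.3f}")
```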

2. Uniform Distribution

random_uniform = [random.uniform(a=0, b=1) for _ in range(SAMPLE_SIZE)]
numpy_uniform = numpy_gen.uniform(low=0, high=1, size=SAMPLE_SIZE)
scipy_uniform = scipy.stats.uniform.rvs(loc=0, scale=1, size=SAMPLE_SIZE, random_state=SEED)
plot_samples("Uniform", random=random_uniform, numpy=numpy_uniform, scipy=scipy_uniform)

[Figure: KDE plot of the uniform distribution samples]
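With the interval [0, 1], the three APIs look interchangeable, but they parameterize the uniform distribution differently: random takes the endpoints a and b, NumPy takes low and high, and SciPy takes loc and scale, covering [loc, loc + scale]. The sketch below samples from an illustrative interval [2, 5] to make the difference visible.

```python
import random

import numpy as np
from scipy import stats

SEED, N = 12345, 5000
A, B = 2.0, 5.0  # illustrative interval endpoints

random.seed(SEED)
random_u = np.array([random.uniform(a=A, b=B) for _ in range(N)])
numpy_u = np.random.default_rng(SEED).uniform(low=A, high=B, size=N)
# SciPy's uniform covers [loc, loc + scale], so scale is the width B - A, not B.
scipy_u = stats.uniform.rvs(loc=A, scale=B - A, size=N, random_state=SEED)

for name, x in [("random", random_u), ("numpy", numpy_u), ("scipy", scipy_u)]:
    print(f"{name:>6}: min={x.min():.3f}, max={x.max():.3f}, mean={x.mean():.3f}")
```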

3. Exponential Distribution

random_exponential = [random.expovariate(lambd=1) for _ in range(SAMPLE_SIZE)]
numpy_exponential = numpy_gen.exponential(scale=1, size=SAMPLE_SIZE)
scipy_exponential = scipy.stats.expon.rvs(scale=1, size=SAMPLE_SIZE, random_state=SEED)
plot_samples("Exponential", random=random_exponential, numpy=numpy_exponential, scipy=scipy_exponential)

[Figure: KDE plot of the exponential distribution samples]
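In the example above, the rate and the scale are both 1, which hides a parameterization difference: random.expovariate takes the rate λ (lambd), while NumPy and SciPy take scale = 1/λ. A sketch with an illustrative rate of 2, where each sample mean should land near 1/λ = 0.5:

```python
import random

import numpy as np
from scipy import stats

SEED, N = 12345, 5000
RATE = 2.0  # illustrative rate parameter (lambda); the mean is 1 / RATE

random.seed(SEED)
# random.expovariate takes the rate lambda directly...
random_e = np.array([random.expovariate(lambd=RATE) for _ in range(N)])
# ...whereas NumPy and SciPy take scale = 1 / lambda.
numpy_e = np.random.default_rng(SEED).exponential(scale=1 / RATE, size=N)
scipy_e = stats.expon.rvs(scale=1 / RATE, size=N, random_state=SEED)

for name, x in [("random", random_e), ("numpy", numpy_e), ("scipy", scipy_e)]:
    print(f"{name:>6}: mean={x.mean():.3f} (expected {1 / RATE:.3f})")
```

Passing RATE as the NumPy or SciPy scale by mistake would give samples with mean 2 instead of 0.5, a bug that a rate of 1 never reveals.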

Further Reading

Please see

for more probability distributions.