Mastering The Poisson Distribution: Intuition And Foundations

You’ve most likely used the normal distribution one or two times too many. We all have. It’s a true workhorse. But sometimes, we run into problems. For instance, when predicting or forecasting values, simulating data given a particular data-generating process, or when we try to visualise model output and explain it intuitively to non-technical stakeholders. Suddenly, things don’t make much sense: can a user really have made -8 clicks on the banner? Or even 4.3 clicks? Both are examples of how count data doesn’t behave.

I’ve found that better encapsulating the data-generating process in my modelling has been key to getting sensible model output. Using the Poisson distribution when it was appropriate has not only helped me convey more meaningful insights to stakeholders, but it has also enabled me to produce more accurate error estimates, better inference, and sound decision-making.

In this post, my aim is to help you get a deep intuitive feel for the Poisson distribution by walking through example applications and taking a dive into the foundations: the maths. I hope you learn not just how it works, but also why it works, and when to apply the distribution.

If you know of a resource that has helped you grasp the concepts in this blog particularly well, you’re welcome to share it in the comments!

Outline

  1. Examples and use cases: Let’s walk through some use cases and sharpen the intuition I just mentioned. Along the way, the relevance of the Poisson distribution will become clear.
  2. The foundations: Next, let’s break down the equation into its individual components. By studying each part, we’ll uncover why the distribution works the way it does.
  3. The assumptions: Equipped with some formality, it will be easier to understand the assumptions that power the distribution, and at the same time set the boundaries for when it works and when it doesn’t.
  4. When real life deviates from the model: Finally, let’s explore the special links that the Poisson distribution has with the Negative Binomial distribution. Understanding these relationships can deepen our understanding and provide alternatives when the Poisson distribution is not suited for the job.

Example in an online marketplace

I chose to deep dive into the Poisson distribution because it often appears in my day-to-day work. Online marketplaces rely on binary user choices from two sides: a seller deciding to list an item and a buyer deciding to make a purchase. These micro-behaviours drive supply and demand, both in the short and long term. A marketplace is born.

Binary choices aggregate into counts: the sum of many such decisions as they occur. Attach a timeframe to this counting process, and you’ll start seeing Poisson distributions everywhere. Let’s explore a concrete example next.

Consider a seller on a platform. In a given month, the seller may or may not list an item for sale (a binary choice). We would only know if she did because then we’d have a measurable count of the event. Nothing stops her from listing another item in the same month. If she does, we count those events. The total could be zero for an inactive seller or, say, 120 for a highly engaged seller.

Over several months, we would observe a varying number of listed items by this seller, sometimes fewer, sometimes more, hovering around an average monthly listing rate. That is essentially a Poisson process. When we get to the assumptions section, you’ll see what we had to assume away to make this example work.

Other examples

Other phenomena that can be modelled with a Poisson distribution include:

  • Sports analytics: The number of goals scored in a match between two teams.
  • Queuing: Customers arriving at a help desk, or incoming customer support calls.
  • Insurance: The number of claims made within a given period.

Each of these examples warrants further inspection, but for the rest of this post, we’ll use the marketplace example to illustrate the inner workings of the distribution.

The mathy bit

… or foundations.

I find that opening up the probability mass function (PMF) of a distribution helps in understanding why things work as they do. The PMF of the Poisson distribution goes like:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Where λ is the rate parameter, and 𝑘 is the realised count of the random variable (𝑘 = 0, 1, 2, 3, … events). Very neat and compact.

Figure: The probability mass function of the Poisson distribution, for a few different lambdas.
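To make the formula tangible, here is a minimal sketch in plain Python that evaluates the PMF for a few rates. The function name poisson_pmf and the chosen values of λ are illustrative assumptions, not the values behind the figure.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Probability of k = 0..9 events for a few illustrative rates
for lam in (1.0, 4.0, 10.0):
    probs = [poisson_pmf(k, lam) for k in range(10)]
    print(f"lambda={lam}:", [round(p, 3) for p in probs])
```

For a small rate most of the mass sits at low counts, while a larger rate shifts and widens the distribution.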

Contextualising λ and k: the marketplace example

In the context of our earlier example, a seller listing items on our platform, λ represents the seller’s average monthly listings. As the expected monthly value for this seller, λ orchestrates the number of items she would list in a month. Note that λ is a Greek letter, so read: λ is a parameter that we can estimate from data. On the other hand, 𝑘 does not hold any information about the seller’s idiosyncratic behaviour. It’s the target value we set for the number of events that may happen, in order to learn about its probability.

The dual role of λ as the mean and variance

When I said that λ orchestrates the number of monthly listings for the seller, I meant it quite literally. Namely, λ is both the expected value and the variance of the distribution, for all values of λ. This means that the variance-to-mean ratio (index of dispersion) is always 1.

To put this into perspective, the normal distribution requires two parameters, 𝜇 and 𝜎² (the mean and variance respectively), to fully describe it. The Poisson distribution achieves the same with just one.

Having to estimate only one parameter can be beneficial for parametric inference, specifically by reducing the variance of the model and increasing the statistical power. On the other hand, it can be too limiting an assumption. Alternatives like the Negative Binomial distribution can alleviate this limitation. We’ll explore that later.
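The dual role of λ can be checked numerically. The sketch below computes the mean and variance directly from the PMF, truncating the infinite sum at a hypothetical cut-off k_max that is large enough for the rates used here.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def mean_and_variance(lam, k_max=150):
    """Mean and variance computed from the PMF, truncated at k_max."""
    ks = range(k_max + 1)
    mean = sum(k * poisson_pmf(k, lam) for k in ks)
    var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
    return mean, var

for lam in (0.5, 4.0, 20.0):
    mean, var = mean_and_variance(lam)
    print(f"lambda={lam}: mean={mean:.6f}, variance={var:.6f}")
```

Both numbers land on λ for every rate tried, which is the single-parameter property described above.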

Breaking down the probability mass function

Now that we know the smallest building blocks, let’s zoom out one step: what are λᵏ, 𝑒^⁻λ, and 𝑘!, and more importantly, what is each of these components’ function in the whole?

  • λᵏ is a weight that expresses how likely it is for 𝑘 events to happen, given that the expectation is λ. Note that “likely” here does not mean a probability, yet. It’s merely a signal strength.
  • 𝑘! is a combinatorial correction so that we can say that the order of the events is irrelevant. The events are interchangeable.
  • 𝑒^⁻λ normalises the PMF so that it sums to 1 over all values of 𝑘. Its reciprocal, 𝑒^λ, plays the role of the partition function in exponential-family distributions.
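Why 𝑒^⁻λ is exactly the right normaliser follows from the Taylor series of the exponential function, a standard identity rather than anything specific to this post:

```latex
\sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{\lambda}
\quad\Longrightarrow\quad
\sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!}
= e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}
= e^{-\lambda} \, e^{\lambda}
= 1
```

In words: the unnormalised weights λᵏ / 𝑘! add up to 𝑒^λ, so multiplying each of them by 𝑒^⁻λ turns them into probabilities.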

In more detail, λᵏ relates the observed value 𝑘 to the expected value of the random variable, λ. Intuitively, more probability mass lies around the expected value. Hence, if the observed value lies close to the expectation, its probability of occurring is larger than the probability of an observation far removed from the expectation. Before we can cross-check our intuition with the numerical behaviour of λᵏ, we need to consider what 𝑘! does.

Interchangeable events

Had we cared about the order of events, then the 𝑘 unique events could be ordered in 𝑘! ways. But because we don’t, and we deem each event interchangeable, we “divide out” 𝑘! from λᵏ to correct for the overcounting.

Since λᵏ is an exponential term, its output always grows with 𝑘, holding λ constant (for λ > 1). That runs opposite to our intuition that the probability is at its maximum when 𝑘 = λ, as the output is larger still when 𝑘 = λ + 1. But now that we know about the interchangeable events assumption, and the overcounting issue, we know that we have to factor in 𝑘! like so: λᵏ 𝑒^⁻λ / 𝑘!, to see the behaviour we expect.

Now let’s check the intuition of the relation between λ and 𝑘 through λᵏ, corrected for 𝑘!. For the same λ, say λ = 4, we should see λᵏ 𝑒^⁻λ / 𝑘! be smaller for values of 𝑘 that are far removed from 4, compared to values of 𝑘 that lie close to 4. For example, leaving out the constant factor 𝑒^⁻λ: 4²/2! = 8 is smaller than 4⁴/4! ≈ 10.7. This is consistent with the intuition of a higher likelihood of 𝑘 when it’s near the expectation. The image below shows this relation more generally, where you see that the output is larger as 𝑘 approaches λ.

Figure: The probability mass function without the normalising component e^-lambda.
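The same check can be run for a whole range of 𝑘. This is a small sketch with λ = 4 as in the example; the variable names are illustrative.

```python
import math

lam = 4
# Unnormalised weights lam^k / k!, i.e. the PMF without the factor e^-lam
weights = {k: lam**k / math.factorial(k) for k in range(11)}

for k, w in weights.items():
    print(f"k={k:2d}  weight={w:6.2f}")
```

The weights rise up to 𝑘 = 3 and 𝑘 = 4 and fall off afterwards. For an integer λ the two values 𝑘 = λ − 1 and 𝑘 = λ tie exactly, because λᵏ / 𝑘! is multiplied by λ / 𝑘 when moving from 𝑘 − 1 to 𝑘.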

The assumptions

First, let’s get one thing off the table: the difference between a Poisson process and the Poisson distribution. The process is a stochastic continuous-time model of points occurring in a given interval: in 1D, a line; in 2D, an area; or in higher dimensions. We data scientists most often deal with the one-dimensional case, I dare say, where the “line” is time and the points are the events of interest.

These are the assumptions of the Poisson process:

  1. The occurrence of one event does not affect the probability of a second event. Think of our seller going on to list another item tomorrow regardless of having done so already today, or of the one from five days ago for that matter. The point here is that there is no memory between events.
  2. The average rate at which events occur is independent of any occurrence. In other words, no event that happened (or will happen) alters λ, which remains constant throughout the observed timeframe. In our seller example, this means that listing an item today does not increase or decrease the seller’s motivation or likelihood of listing another item tomorrow.
  3. Two events cannot happen at exactly the same instant. If we were to zoom in to an infinitely granular level on the timescale, no two listings could have been placed simultaneously; always sequentially.

From these assumptions (no memory, constant rate, events happening one at a time), it follows that 1) the number of events in any interval of length 𝑡 is Poisson-distributed with parameter λ𝑡, and 2) the counts in disjoint intervals are independent: two key properties of a Poisson process.
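One way to see the first property at work is to simulate the process. The sketch below relies on a standard fact that is not derived in this post: in a homogeneous Poisson process, waiting times between events are exponentially distributed with rate λ. The rate of 4 listings per month and the helper name count_events are assumptions for illustration.

```python
import random
import statistics

def count_events(rate, interval_length, rng):
    """Count events in [0, interval_length) by accumulating exponential waiting times."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)  # waiting time until the next event
        if t >= interval_length:
            return count
        count += 1

rng = random.Random(42)
rate, interval_length = 4.0, 1.0  # hypothetical: 4 listings per month, one month
counts = [count_events(rate, interval_length, rng) for _ in range(20_000)]

mean_count = statistics.fmean(counts)
var_count = statistics.pvariance(counts)
print(f"mean={mean_count:.2f}, variance={var_count:.2f}")
```

Both the mean and the variance of the simulated monthly counts should come out close to λ𝑡 = 4, as expected for Poisson-distributed counts.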

A note on the distribution:
The distribution simply describes probabilities for various numbers of counts in an interval. Strictly speaking, one can use the distribution pragmatically whenever the data is nonnegative, can be unbounded on the right, has mean λ, and the distribution reasonably models the data. It is just convenient if the underlying process is a Poisson one, which really justifies using the distribution.

The marketplace example: Implications

So, can we justify using the Poisson distribution for our marketplace example? Let’s open up the assumptions of a Poisson process and take the test.

Constant λ

  • Why it may fail: The seller has patterned online activity; holidays; promotions; the listings are seasonal goods.
  • Consequence: λ is not constant, leading to overdispersion (a variance-to-mean ratio larger than 1) or to temporal patterns.

Independence and memorylessness

  • Why it may fail: The propensity to list again is higher after a successful listing, or conversely, listing once depletes the stock and interferes with the propensity to list again.
  • Consequence: Two events are no longer independent, as the occurrence of one informs the occurrence of the other.

Simultaneous events

  • Why it may fail: Batch-listing, a new feature, was introduced to help the sellers.
  • Consequence: Multiple listings would come online at the same time, clumped together, and they would be counted simultaneously.

Balancing rigour and pragmatism

As data scientists on the job, we may feel trapped between rigour and pragmatism. The three steps below should give you a sound foundation to decide on which side to err when the Poisson distribution falls short:

  1. Pinpoint your goal: is it inference, simulation or prediction, and is it about high-stakes output? List the worst thing that can happen, and what it would cost the business.
  2. Identify the problem and solution: why does the Poisson distribution not fit, and what can you do about it? List 2-3 solutions, including changing nothing.
  3. Balance gains and costs: Will your workaround improve things, or make them worse? And at what cost: interpretability, new assumptions introduced, and resources used? Does it help you achieve your goal?

That said, here are some counters I use when needed.

When real life deviates from your model

Everything described so far pertains to the standard, or homogeneous, Poisson process. But what if reality begs for something different?

In the next section, we’ll cover two extensions of the Poisson distribution for when the constant λ assumption does not hold. These are not mutually exclusive, but neither are they the same:

  1. Time-varying λ: a single seller whose listing rate ramps up before holidays and slows down afterward.
  2. Mixed Poisson distribution: multiple sellers listing items, each with their own λ, can be seen as a mixture of various Poisson processes.

Time-varying λ

The first extension allows λ to have its own value for each time 𝑡. The PMF then becomes

$$P(K(T) = k) = \frac{\Lambda(T)^k \, e^{-\Lambda(T)}}{k!}$$

Where the number of events 𝐾(𝑇) in an interval 𝑇 follows the Poisson distribution with a rate no longer equal to a fixed λ, but one equal to:

$$\Lambda(T) = \int_{t}^{t+i} \lambda(s) \, ds$$

More intuitively, integrating over the interval 𝑡 to 𝑡 + 𝑖 gives us a single number: the expected number of events over that interval. The integral will differ for each arbitrary interval, and that’s what makes λ change over time. To understand how that integration works, it was helpful for me to think of it like this: if the interval 𝑡 to 𝑡₁ integrates to 3, and 𝑡₁ to 𝑡₂ integrates to 5, then the interval 𝑡 to 𝑡₂ integrates to 8 = 3 + 5. That’s the two expectations summed up, and now the expectation of the entire interval.
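The additivity of the integrated rate can be verified with a few lines of numerical integration. The intensity function below, with a yearly cycle around 4 listings per month, is a made-up example, and the trapezoidal rule is just one convenient way to approximate the integral.

```python
import math

def rate(t):
    """Hypothetical listing intensity (items per month) with a 12-month cycle."""
    return 4.0 + 2.0 * math.sin(2 * math.pi * t / 12)

def expected_count(t_start, t_end, steps=10_000):
    """Trapezoidal approximation of the integral of rate(t) over [t_start, t_end]."""
    h = (t_end - t_start) / steps
    total = 0.5 * (rate(t_start) + rate(t_end))
    total += sum(rate(t_start + i * h) for i in range(1, steps))
    return total * h

first = expected_count(0, 3)    # expected listings in months 0 to 3
second = expected_count(3, 6)   # expected listings in months 3 to 6
whole = expected_count(0, 6)    # expected listings in months 0 to 6
print(f"{first:.3f} + {second:.3f} = {first + second:.3f}, whole interval: {whole:.3f}")
```

The two sub-interval expectations add up to the expectation of the full interval, and over one full cycle of 12 months the seasonal swing cancels out, leaving 12 × 4 = 48 expected listings.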

Practical implication
One may want to model the expected value of the Poisson distribution as a function of time, for instance to model an overall change in trend, or seasonality. In generative model notation, for example with a log link:

$$y_t \sim \text{Poisson}(\lambda_t), \qquad \log \lambda_t = \beta_0 + \beta_1 \, \text{Time}_t$$

Time may be a continuous variable, or an arbitrary function of it.
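To illustrate what such a generative model produces, here is a minimal simulation sketch. It assumes a log-linear trend in the rate, which is only one possible choice; the parameter names beta0 and beta1 and their values are hypothetical. The sampler uses Knuth's multiplication method so that the example needs nothing beyond the standard library.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(7)
beta0, beta1 = math.log(3.0), 0.05  # hypothetical intercept and monthly trend
months = range(24)

rates = [math.exp(beta0 + beta1 * t) for t in months]  # lambda_t, always positive
counts = [sample_poisson(lam, rng) for lam in rates]   # y_t ~ Poisson(lambda_t)

for t, lam, y in zip(months, rates, counts):
    print(f"month={t:2d}  rate={lam:5.2f}  listings={y}")
```

Modelling the logarithm of the rate keeps λ positive whatever the coefficients are. Replacing the linear term with, say, a sine of time would give seasonality instead of a trend.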

Process-varying λ: Mixed Poisson distribution

But then there’s a gotcha. Remember when I said that λ has a dual role as the mean and variance? That still applies here. Looking at the “relaxed” PMF*, the only thing that changes is that λ can vary freely with time. But it’s still the one and only λ that orchestrates both the expected value and the dispersion of the PMF*. More precisely, 𝔼[𝑋] = Var(𝑋) still holds.

There are various reasons for this constraint not to hold in reality. Model misspecification, event interdependence and unaccounted-for heterogeneity could be the issues at hand. I’d like to focus on the latter case, as it justifies the Negative Binomial distribution, one of the topics I promised to open up.

Heterogeneity and overdispersion
Imagine we are not dealing with one seller, but with 10 of them listing at different intensity levels, λᵢ, where 𝑖 = 1, 2, 3, …, 10 indexes the sellers. Then, essentially, we have 10 Poisson processes going on. If we unify the processes and estimate the grand λ, we simplify the mixture away. Meaning, we get a correct estimate of all sellers on average, but the resulting grand λ is naive and does not know about the original spread of the λᵢ. It still assumes that the variance and mean are equal, as per the axioms of the distribution. The data will then be overdispersed relative to the model and, in turn, the errors underestimated. Ultimately, this inflates the false positive rate and drives poor decision-making. We need a way to embrace the heterogeneity amongst the sellers’ λᵢ.
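A short simulation shows how pooling heterogeneous sellers breaks the mean-equals-variance property. The ten listing rates below are invented for illustration (they average 4), and the Poisson sampler is Knuth's multiplication method, chosen only to stay within the standard library.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(0)
seller_rates = [0.5, 1, 1, 2, 3, 4, 5, 6, 8, 9.5]  # hypothetical lambda_i, average 4
months_per_seller = 5_000

# Pool the monthly counts of all sellers, ignoring who produced them
pooled = [
    sample_poisson(lam, rng)
    for lam in seller_rates
    for _ in range(months_per_seller)
]

grand_lambda = statistics.fmean(pooled)
pooled_variance = statistics.pvariance(pooled)
print(f"grand lambda={grand_lambda:.2f}, pooled variance={pooled_variance:.2f}")
```

The grand λ comes out near 4, but the pooled variance is roughly three times as large. By the law of total variance it equals the average rate plus the variance of the rates, here 4 + 8.65, which a single Poisson(4) cannot represent.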

Negative binomial: Extending the Poisson distribution
Among the few ways one can look at the Negative Binomial distribution, one way is to see it as a mixed Poisson distribution (10 sellers, sound familiar yet?). That means the counts of multiple independent Poisson processes, each with its own rate, are pooled into a single distribution. Mathematically, first we draw λ from a Gamma distribution: λ ~ Γ(r, θ), then we draw the count 𝑋 | λ ~ Poisson(λ).

In one image, it is as if we sampled from many Poisson distributions, one corresponding to each seller.

Figure: A Negative Binomial distribution arises from many Poisson distributions.

The more revealing alias of the Negative Binomial distribution is the Gamma-Poisson mixture distribution, and now we know why: the governing λ comes from a continuous mixture. That’s what we needed to explain the heterogeneity amongst sellers.

Let’s simulate this scenario to gain more intuition.

Figure: Gamma mixture of lambda.

First, we draw λᵢ from a Gamma distribution: λᵢ ~ Γ(r, θ). Intuitively, the Gamma distribution tells us about the variety in the intensity (the listing rate) amongst the sellers.

On a practical note, one can instill their assumptions about the degree of heterogeneity in this step of the model: how different are sellers? By varying the levels of heterogeneity, one can observe the effect on the final Poisson-like distribution. Doing this type of check (e.g., a prior or posterior predictive check) is common in Bayesian modelling, where the assumptions are set explicitly.

Figure: Gamma-Poisson mixture distribution versus homogeneous Poisson distribution. The dashed line reflects λ, which is 4 for both distributions.

In the second step, we plug the obtained λ into the Poisson distribution: 𝑋 | λ ~ Poisson(λ), and get a Poisson-like distribution that represents the pooled subprocesses. Notably, this unified process has a larger dispersion than expected from a homogeneous Poisson distribution, but it is in line with the Gamma mixture of λ.
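The two-step recipe can be sketched with the standard library alone. This is not the code behind the figures; it assumes that θ is a scale parameter (the post does not say whether θ is a scale or a rate) and picks r = 2 and θ = 2 so that the average rate is 4.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(123)
shape_r, scale_theta = 2.0, 2.0  # hypothetical Gamma parameters, mean = r * theta = 4
n = 50_000

# Step 1: draw a rate per observation; step 2: draw the count given that rate
mixed = [sample_poisson(rng.gammavariate(shape_r, scale_theta), rng) for _ in range(n)]
# Reference: homogeneous Poisson with the same mean
plain = [sample_poisson(4.0, rng) for _ in range(n)]

for name, sample in (("Gamma-Poisson", mixed), ("Poisson", plain)):
    print(f"{name}: mean={statistics.fmean(sample):.2f}, "
          f"variance={statistics.pvariance(sample):.2f}")
```

Under this parameterisation the mixture has mean rθ = 4 and variance rθ + rθ² = 12, while the homogeneous Poisson keeps both at 4. Shrinking θ while holding rθ fixed makes the sellers more alike and pulls the mixture back towards the plain Poisson.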

Heterogeneous λ and inference

A practical consequence of introducing flexibility into your assumed distribution is that inference becomes more challenging. More parameters (i.e., the Gamma parameters) need to be estimated. Parameters act as flexible explainers of the data, tending to overfit and explain away variance in your variable. The more parameters you have, the better the explanation may seem, but the model also becomes more susceptible to noise in the data. Higher variance reduces the power to identify a difference in means, if one exists, because (well) it gets lost in the variance.

Countering the loss of power

  1. Confirm whether you indeed need to extend the standard Poisson distribution. If not, fall back to the simplest model that fits. A quick check on overdispersion may suffice for this.
  2. Pin down the estimates of the Gamma mixture distribution parameters using regularising, informative priors (think: Bayes).
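The quick overdispersion check from the first step could look like the sketch below: compute the variance-to-mean ratio of the observed counts and see how far it is from 1. The helper name dispersion_index and both data sets are made up for illustration, and the ratio is a rough diagnostic rather than a formal test.

```python
import statistics

def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 under a Poisson model."""
    mean = statistics.fmean(counts)
    if mean == 0:
        raise ValueError("dispersion index is undefined when all counts are zero")
    return statistics.variance(counts) / mean

# Two hypothetical sellers with the same average of 4 listings per month
steady_listings = [3, 5, 4, 2, 6, 4, 3, 5, 4, 4, 5, 3]
bursty_listings = [0, 0, 12, 1, 0, 15, 0, 2, 0, 14, 1, 3]

print(f"steady: {dispersion_index(steady_listings):.2f}")
print(f"bursty: {dispersion_index(bursty_listings):.2f}")
```

A ratio well above 1, as for the bursty seller, hints that the plain Poisson is too restrictive. With only a handful of observations the ratio is noisy, so it is better read as a prompt to look closer than as a verdict.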

During my research process for writing this blog, I learned a great deal about the connective tissue underlying all of this: how the binomial distribution plays a fundamental role in the processes we’ve discussed. And while I’d love to ramble on about this, I’ll save it for another post, perhaps. In the meantime, feel free to share your understanding in the comments section below 👍.

Conclusion

The Poisson distribution is a simple distribution that can be highly suitable for modelling count data. However, when the assumptions do not hold, one can extend the distribution by allowing the rate parameter to vary as a function of time or other factors, or by assuming subprocesses that collectively make up the count data. This added flexibility can address the limitations, but it comes at a cost: increased flexibility in your modelling raises the variance and, consequently, undermines the statistical power of your model.

If your end goal is inference, you may want to think twice and consider exploring simpler models for the data. Alternatively, switch to the Bayesian paradigm and leverage its built-in solution to regularise estimates: informative priors.

I hope this has given you what you came for: a better intuition about the Poisson distribution. I’d love to hear your thoughts about this in the comments!

Unless otherwise noted, all images are by the author.
Originally published at https://aalvarezperez.github.io on January 5, 2025.
