Intro to Statistics Unit 4 – Discrete Random Variables

Discrete random variables are a fundamental concept in statistics, describing variables that can only take on specific, countable values. These variables are crucial in modeling real-world scenarios involving counting or finite outcomes, such as the number of successes in a series of trials. This unit explores the properties of discrete random variables, including probability mass functions, expected values, and variance. It also covers common discrete distributions like binomial and Poisson, and their applications in various fields such as quality control, insurance, and clinical trials.

What Are Discrete Random Variables?

  • Discrete random variables are variables that can only take on a countable number of distinct values
  • Unlike continuous random variables, discrete random variables have a finite or countably infinite number of possible outcomes
  • Examples of discrete random variables include the number of heads in a series of coin flips or the number of defective items in a batch of products
  • Discrete random variables are often denoted by uppercase letters (X, Y, Z) and their specific values by lowercase letters (x, y, z)
  • The probability of a discrete random variable taking on a specific value is described by a probability mass function (PMF)
  • Discrete random variables are commonly used in scenarios involving counting, such as the number of successes in a fixed number of trials or the number of events occurring in a given time interval
  • The sum of probabilities for all possible values of a discrete random variable equals 1
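To make the idea concrete, the sketch below (Python, chosen here only for illustration) builds the PMF of X = number of heads in three fair coin flips. It lists every equally likely outcome and counts how many outcomes give each value of X. The helper name `heads_pmf` is hypothetical. Exact fractions are used so the probabilities can be checked to sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_pmf(n_flips):
    """Build the PMF of X = number of heads in n_flips fair coin flips
    by enumerating every equally likely outcome (hypothetical helper)."""
    outcomes = list(product("HT", repeat=n_flips))  # all 2**n_flips sequences
    counts = Counter(seq.count("H") for seq in outcomes)  # outcomes per value of X
    total = len(outcomes)
    # P(X = x) = (number of outcomes with x heads) / (total number of outcomes)
    return {x: Fraction(c, total) for x, c in sorted(counts.items())}


pmf = heads_pmf(3)
print(pmf)                # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
print(sum(pmf.values()))  # 1
```

X takes only the four countable values 0, 1, 2, 3, and their probabilities add to 1, as the list above requires.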

Probability Mass Functions (PMF)

  • A probability mass function (PMF) is a function that describes the probability distribution of a discrete random variable
  • The PMF assigns a probability to each possible value of the discrete random variable
  • For a discrete random variable X, the PMF is denoted as P(X = x), where x is a specific value that X can take
  • The PMF satisfies two conditions:
    • P(X = x) ≥ 0 for all values of x
    • The sum of P(X = x) over all possible values of x equals 1
  • The PMF can be represented as a table, graph, or formula, depending on the nature of the discrete random variable
  • The cumulative distribution function (CDF) of a discrete random variable is obtained by summing the PMF over all values up to and including a given point
  • The CDF, denoted as F(x), represents the probability that the random variable X takes on a value less than or equal to x
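A minimal sketch, assuming a PMF is stored as a Python dictionary that maps each value x to P(X = x). `is_valid_pmf` checks the two PMF conditions, and `cdf` adds up the PMF values for all values less than or equal to x. Both names are hypothetical helpers, not a library API. Exact `Fraction` values are used because a floating-point sum can differ from 1 by a tiny rounding error, which would make the exact check fail.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """Check both PMF conditions: every probability is >= 0
    and the probabilities sum to exactly 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1


def cdf(pmf, x):
    """F(x) = P(X <= x): add P(X = v) for every value v <= x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))


# Example PMF: number of heads in 3 fair coin flips
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

print(is_valid_pmf(pmf))  # True
print(cdf(pmf, 1))        # 1/2
print(cdf(pmf, 1.5))      # 1/2  (F stays flat between possible values)
print(cdf(pmf, 3))        # 1
```

Because X only takes whole-number values here, F(x) is a step function: it jumps at 0, 1, 2, 3 and is constant in between.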

Expected Value and Variance

  • The expected value (or mean) of a discrete random variable is a measure of the central tendency of its probability distribution
  • For a discrete random variable X with PMF P(X = x), the expected value is calculated as: $E(X) = \sum_{x} x \cdot P(X = x)$
  • The expected value is the long-run average of the values of X over many independent repetitions of the experiment; it need not be a value X can actually take (for a fair six-sided die, E(X) = 3.5)
  • The variance of a discrete random variable measures the spread or dispersion of its probability distribution around the expected value
  • For a discrete random variable X with PMF P(X = x) and mean μ = E(X), the variance is defined as $Var(X) = E[(X - \mu)^2] = \sum_{x} (x - \mu)^2 \cdot P(X = x)$ and is usually calculated with the equivalent shortcut formula: $Var(X) = E(X^2) - [E(X)]^2$
    • $E(X^2)$ is the expected value of the squared random variable, calculated as: $E(X^2) = \sum_{x} x^2 \cdot P(X = x)$
  • The standard deviation, $\sigma = \sqrt{Var(X)}$, is the square root of the variance; it is measured in the same units as X and describes a typical (root-mean-square) distance between the values of X and its expected value
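The formulas above can be checked numerically. This sketch stores a PMF as a dictionary and computes E(X), the variance with the shortcut E(X²) − [E(X)]², the variance with the definition E[(X − μ)²], and the standard deviation, for the number of heads in three fair coin flips. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def expected_value(pmf):
    """E(X) = sum over x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Shortcut formula: Var(X) = E(X^2) - [E(X)]^2."""
    e_x = expected_value(pmf)
    e_x2 = sum(x**2 * p for x, p in pmf.items())
    return e_x2 - e_x**2


def variance_by_definition(pmf):
    """Definition: Var(X) = E[(X - mu)^2], for comparison with the shortcut."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Number of heads in 3 fair coin flips
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mu = expected_value(pmf)
var = variance(pmf)
sd = math.sqrt(var)  # standard deviation = square root of the variance
print(mu, var, round(sd, 4))  # 3/2 3/4 0.866
```

With exact fractions, `variance(pmf)` and `variance_by_definition(pmf)` return the same value, 3/4, which illustrates that the shortcut is algebraically equivalent to the definition.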

Common Discrete Distributions

  • Bernoulli distribution: Models a single trial with two possible outcomes (success or failure), with a fixed probability of success (p)
  • Binomial distribution: Models the number of successes in a fixed number of independent Bernoulli trials, with a constant probability of success (p) for each trial
  • Poisson distribution: Models the number of events occurring in a fixed interval of time or space, given an average rate of occurrence (λ)
  • Geometric distribution: Models the number of trials needed to achieve the first success in a series of independent Bernoulli trials, with a constant probability of success (p) for each trial
  • Hypergeometric distribution: Models the number of successes in a fixed number of draws from a population without replacement, where the population consists of a known number of successes and failures
  • Negative binomial distribution: Models the number of failures before a specified number of successes is achieved in a series of independent Bernoulli trials, with a constant probability of success (p) for each trial
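Each of these PMFs fits in a line or two of code. The sketch below uses Python's `math.comb` (Python 3.8+) and follows the conventions stated in the list: the geometric variable counts trials up to and including the first success, and the negative binomial variable counts failures before the r-th success. Other textbooks use shifted versions of both, so check which convention a problem assumes. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli_pmf(x, p):
    """One trial: P(X = 1) = p (success), P(X = 0) = 1 - p (failure)."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0


def geometric_pmf(k, p):
    """Trials up to and including the first success: (1 - p)^(k - 1) * p, k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p


def hypergeometric_pmf(k, N, K, n):
    """k successes in n draws without replacement from N items, K of them successes:
    C(K, k) * C(N - K, n - k) / C(N, n), returned as an exact fraction."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def neg_binomial_pmf(k, r, p):
    """k failures before the r-th success: C(k + r - 1, k) * p^r * (1 - p)^k, k = 0, 1, ..."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k


print(geometric_pmf(3, Fraction(1, 4)))                        # 9/64
print(sum(hypergeometric_pmf(k, 20, 5, 4) for k in range(5)))  # 1
```

A quick consistency check: with r = 1, `neg_binomial_pmf(k, 1, p)` equals `geometric_pmf(k + 1, p)`, since k failures before the first success means the first success arrives on trial k + 1.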

Calculating Probabilities

  • To calculate probabilities for discrete random variables, use the probability mass function (PMF) specific to the distribution
  • For the binomial distribution with parameters n (number of trials) and p (probability of success), the PMF is given by: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
    • $\binom{n}{k}$ represents the binomial coefficient, which can be calculated as: $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
  • For the Poisson distribution with parameter λ (average rate of occurrence), the PMF is given by: $P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$
  • To find the probability of a range of values, sum the PMF values for each value in the range
  • For cumulative probabilities, use the cumulative distribution function (CDF) or sum the PMF values up to the desired value
  • When reading probabilities from tables or graphs, check whether the question asks for P(X = k), P(X ≤ k), P(X < k), or P(X ≥ k); for a discrete random variable these differ, because P(X < k) = P(X ≤ k − 1) excludes the value k itself
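The calculation steps above can be sketched in Python using only the standard library (`math.comb`, `math.exp`, `math.factorial`). The names `binomial_pmf`, `poisson_pmf`, and `prob_range` are hypothetical. A range probability is the PMF summed over each integer in the range, and a cumulative probability P(X ≤ k) is the special case that starts at 0.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def prob_range(pmf, lo, hi):
    """P(lo <= X <= hi): add the PMF value of every integer from lo to hi."""
    return sum(pmf(k) for k in range(lo, hi + 1))


# X ~ Binomial(n = 5, p = 0.5)
print(binomial_pmf(2, 5, 0.5))                              # 0.3125
print(prob_range(lambda k: binomial_pmf(k, 5, 0.5), 0, 2))  # P(X <= 2) = 0.5

# Y ~ Poisson(lambda = 2): P(Y <= 1) = 3 e^(-2)
print(round(prob_range(lambda k: poisson_pmf(k, 2.0), 0, 1), 4))  # 0.406
```

Passing the PMF as a function keeps `prob_range` independent of the distribution, so the same helper handles binomial, Poisson, or any other discrete PMF.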

Applications in Real Life

  • Quality control: The binomial distribution can be used to model the number of defective items in a batch of products, helping manufacturers make decisions about product inspection and acceptance
  • Call centers: The Poisson distribution can be used to model the number of calls arriving at a call center within a given time interval, aiding in staffing and resource allocation decisions
  • Insurance claims: The negative binomial distribution can be used to model the number of claims filed by policyholders, assisting insurance companies in setting premiums and managing risk
  • Inventory management: The geometric distribution can be used to model the number of items sold before restocking is needed, helping businesses optimize their inventory levels and minimize costs
  • Clinical trials: The hypergeometric distribution can be used to model the number of patients responding to a treatment when drawing a sample from a population with a known number of responders and non-responders
  • Rare events: The Poisson distribution is often used to model the occurrence of rare events, such as the number of earthquakes in a given region or the number of traffic accidents at a particular intersection

Key Formulas and Concepts

  • Discrete random variable: A variable that can only take on a countable number of distinct values
  • Probability mass function (PMF): A function that describes the probability distribution of a discrete random variable, denoted as P(X = x)
  • Expected value: The average value of a discrete random variable over a large number of trials, calculated as $E(X) = \sum_{x} x \cdot P(X = x)$
  • Variance: A measure of the spread or dispersion of a discrete random variable's probability distribution around its expected value, calculated as $Var(X) = E(X^2) - [E(X)]^2$
  • Binomial distribution: Models the number of successes in a fixed number of independent Bernoulli trials, with PMF $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
  • Poisson distribution: Models the number of events occurring in a fixed interval of time or space, with PMF $P(X = k) = \frac{e^{-\lambda}\lambda^k}{k!}$
  • Cumulative distribution function (CDF): F(x) = P(X ≤ x), the sum of the PMF values for all values less than or equal to x

Practice Problems and Examples

  1. A fair coin is tossed 5 times. Let X be the number of heads observed. Find the PMF of X and calculate the expected value and variance of X.
  2. A car manufacturer has found that 2% of its cars are defective. If a batch of 100 cars is selected, find the probability that exactly 3 cars are defective, using the binomial distribution.
  3. A call center receives an average of 10 calls per hour. Find the probability that the call center receives exactly 5 calls in a 30-minute period, using the Poisson distribution.
  4. A basketball player has a free throw success rate of 80%. Calculate the probability that the player makes at least 3 out of 5 free throws, using the binomial distribution.
  5. A company has 20 employees, 5 of whom are managers. If a committee of 4 employees is randomly selected, find the probability that the committee includes exactly 2 managers, using the hypergeometric distribution.
  6. A machine produces bolts, and the probability of a bolt being defective is 0.1. Calculate the expected number of bolts that need to be produced until the first defective bolt is encountered, using the geometric distribution.
  7. A store has 100 light bulbs in stock, 20 of which are known to be defective. If a customer buys 10 light bulbs at random, find the probability that at most 2 of the purchased bulbs are defective, using the hypergeometric distribution.
  8. A factory produces 10,000 items per day, and the probability of an item being defective is 0.001. Use the Poisson distribution to approximate the probability that exactly 5 defective items are produced in a day.
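For checking answers, the sketch below computes each practice problem numerically with standard-library Python. It assumes the conventions used earlier in this unit (for example, the geometric variable counts trials up to and including the first success) and, for Problem 3, that the hourly rate scales linearly to a 30-minute window. The helper names are hypothetical, and the values in the comments are rounded.

```python
import math
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam**k / math.factorial(k)


def hypergeom_pmf(k, N, K, n):
    # k successes in n draws without replacement; N items in total, K successes
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# 1. Heads in 5 fair tosses: X ~ Binomial(5, 1/2), kept exact with Fraction
half = Fraction(1, 2)
pmf1 = {k: binomial_pmf(k, 5, half) for k in range(6)}    # weights 1, 5, 10, 10, 5, 1 over 32
mean1 = sum(k * p for k, p in pmf1.items())               # 5/2
var1 = sum(k**2 * p for k, p in pmf1.items()) - mean1**2  # 5/4

# 2. X ~ Binomial(100, 0.02): P(X = 3)
p2 = binomial_pmf(3, 100, 0.02)  # about 0.1823

# 3. 10 calls/hour scaled to 30 minutes gives lambda = 5: P(X = 5)
p3 = poisson_pmf(5, 10 * 0.5)  # about 0.1755

# 4. X ~ Binomial(5, 0.8): P(X >= 3) = P(3) + P(4) + P(5)
p4 = sum(binomial_pmf(k, 5, 0.8) for k in range(3, 6))  # about 0.9421

# 5. N = 20 employees, K = 5 managers, n = 4 chosen: P(X = 2)
p5 = hypergeom_pmf(2, 20, 5, 4)  # 70/323, about 0.2167

# 6. Geometric with p = 0.1: expected bolts up to and including the first defective = 1/p
e6 = 1 / 0.1  # 10

# 7. N = 100 bulbs, K = 20 defective, n = 10 bought: P(X <= 2)
p7 = sum(hypergeom_pmf(k, 100, 20, 10) for k in range(3))  # about 0.6812

# 8. Poisson approximation, lambda = n * p = 10,000 * 0.001 = 10: P(X = 5)
p8 = poisson_pmf(5, 10000 * 0.001)  # about 0.0378

for label, value in [("1 mean", mean1), ("1 variance", var1), ("2", p2), ("3", p3),
                     ("4", p4), ("5", p5), ("6", e6), ("7", p7), ("8", p8)]:
    print(f"Problem {label}: {float(value):.4f}")
```

Problem 1 can also be cross-checked with the binomial shortcuts E(X) = np = 2.5 and Var(X) = np(1 − p) = 1.25.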


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.