Descriptive Statistics Formulas to Know for Intro to Statistics

Descriptive statistics summarize a dataset through measures of central tendency (mean, median, mode), variability (range, variance, standard deviation), and shape (skewness, kurtosis). These measures make it easier to analyze and compare different datasets.

  1. Mean (Arithmetic Average)

    • Calculated by summing all values in a dataset and dividing by the number of values.
    • Sensitive to extreme values (outliers), which can skew the mean.
    • Represents a central point of the data, useful for comparing different datasets.
  2. Median

    • The middle value when data is arranged in ascending or descending order.
    • Resistant to outliers, making it a better measure of central tendency for skewed distributions.
    • Divides the dataset into two equal halves, providing insight into the data's distribution.
  3. Mode

    • The value that appears most frequently in a dataset.
    • Can be used with nominal data, unlike mean and median.
    • A dataset may have one mode (unimodal), more than one mode (bimodal or multimodal), or no mode at all.
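
A minimal sketch of the three measures of central tendency, using Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The dataset is hypothetical and chosen so that one extreme value shows how the mean and median react differently.

```python
import statistics

# Hypothetical dataset with one extreme value (40)
values = [2, 3, 3, 5, 7, 10, 40]

mean = statistics.mean(values)        # sum of values / number of values
median = statistics.median(values)    # middle value of the sorted data
modes = statistics.multimode(values)  # list of every most-frequent value

print("mean:", mean)      # 10, pulled upward by the extreme value 40
print("median:", median)  # 5, the middle of the seven sorted values
print("modes:", modes)    # [3], the only value that appears twice
```

Replacing 40 with 400 would raise the mean further but leave the median at 5, which is why the median is often preferred for skewed data.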
  4. Range

    • The difference between the maximum and minimum values in a dataset.
    • Provides a simple measure of variability but does not account for the distribution of values.
    • Useful for understanding the spread of data at a glance.
  5. Variance

    • Measures the average of the squared differences from the mean (dividing by n for a population, or by n − 1 for a sample).
    • Indicates how much the data points deviate from the mean, providing insight into data variability.
    • A higher variance signifies greater dispersion in the dataset.
  6. Standard Deviation

    • The square root of the variance, representing the typical distance of data points from the mean.
    • Provides a more interpretable measure of spread than variance, as it is in the same units as the data.
    • Essential for understanding the distribution and variability of data.
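
The range, variance, and standard deviation described above can be computed by hand in a few lines. This is a sketch on a hypothetical dataset; it shows both the population form (divide by n) and the sample form (divide by n − 1), since intro courses use both.

```python
import math

# Hypothetical dataset
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Range: maximum minus minimum
data_range = max(data) - min(data)

# Squared differences from the mean
squared_devs = [(x - mean) ** 2 for x in data]

# Population variance divides by n; sample variance divides by n - 1
pop_variance = sum(squared_devs) / n
sample_variance = sum(squared_devs) / (n - 1)

# Standard deviation is the square root of the variance
pop_sd = math.sqrt(pop_variance)
sample_sd = math.sqrt(sample_variance)

print("range:", data_range)
print("population variance:", pop_variance, "population SD:", pop_sd)
print("sample variance:", sample_variance, "sample SD:", sample_sd)
```

The standard library offers the same calculations as `statistics.pvariance`, `statistics.variance`, `statistics.pstdev`, and `statistics.stdev`.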
  7. Interquartile Range (IQR)

    • The difference between the third quartile (Q3) and the first quartile (Q1), that is, Q3 − Q1; it spans the middle 50% of the data.
    • Useful for identifying the spread of the central portion of the dataset and for detecting outliers.
    • Less affected by extreme values compared to range.
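
As a rough illustration of the IQR and its use in flagging outliers, the sketch below uses `statistics.quantiles` (Python 3.8+) on a hypothetical dataset. Two assumptions are worth noting: textbooks and software use slightly different quartile conventions, so Q1 and Q3 may differ a little from hand calculations, and the 1.5 × IQR fence is a common rule of thumb rather than something the text above specifies.

```python
import statistics

# Hypothetical dataset with one suspiciously large value
data = [2, 4, 5, 6, 7, 9, 30]

# quantiles(n=4) returns the three cut points Q1, Q2 (the median), and Q3
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Rule-of-thumb fences (assumed convention): 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("flagged outliers:", outliers)
```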
  8. Percentiles

    • Indicate the value below which a given percentage of observations fall.
    • Useful for understanding the relative standing of a value within a dataset.
    • Commonly used in educational assessments and standardized testing.
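
One way to make the idea of relative standing concrete is a percentile rank: the percentage of observations that fall below a given value. The helper `percentile_rank` below is a hypothetical function written for this illustration, and it assumes the "strictly below" convention; some texts count values at or below instead.

```python
def percentile_rank(data, value):
    """Percentage of observations strictly below `value` (assumed convention)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical test scores
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_85 = percentile_rank(scores, 85)
print("percentile rank of 85:", rank_85)  # 60.0, since 6 of 10 scores are below 85
```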
  9. Z-score

    • Measures how many standard deviations a data point is from the mean.
    • Helps identify outliers and understand the relative position of a value within a distribution.
    • A Z-score of 0 indicates the value is exactly at the mean.
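
The Z-score calculation is z = (x − mean) / standard deviation. The sketch below applies it to hypothetical exam scores, using the population standard deviation (`statistics.pstdev`); a course working with sample data might use `statistics.stdev` instead.

```python
import statistics

# Hypothetical exam scores
scores = [60, 70, 70, 70, 75, 75, 85, 95]

mu = statistics.mean(scores)       # 75
sigma = statistics.pstdev(scores)  # population standard deviation, 10.0

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

for x in (60, 75, 95):
    print(x, "->", z_score(x, mu, sigma))
```

A score of 75 sits exactly at the mean (z = 0), 95 is two standard deviations above it, and 60 is one and a half below.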
  10. Skewness

    • Measures the asymmetry of the data distribution.
    • Positive skew indicates a longer tail on the right, while negative skew indicates a longer tail on the left.
    • Important for understanding the shape of the distribution and its implications for statistical analysis.
  11. Kurtosis

    • Measures the "tailedness" of the data distribution, indicating the presence of outliers.
    • High kurtosis indicates heavy tails and potential outliers, while low kurtosis indicates light tails.
    • Helps assess the risk of extreme values in a dataset.
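
Skewness and kurtosis can both be written in terms of central moments. The sketch below assumes the simple moment-based definitions (skewness = m3 / m2^1.5, kurtosis = m4 / m2^2, where mk is the average of the k-th powers of deviations from the mean). Statistical software often applies sample-size corrections or reports "excess" kurtosis (kurtosis minus 3, so that a normal distribution scores 0), so results may differ slightly from these. The function names and datasets are hypothetical.

```python
def central_moment(data, k):
    """Average of the k-th powers of the deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n

def skewness(data):
    # Moment-based skewness: third central moment scaled by variance^(3/2)
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    # Moment-based kurtosis: fourth central moment scaled by variance^2
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # long tail on the right
left_skewed = [-x for x in right_skewed]  # mirror image: long tail on the left

print("symmetric skew:", skewness(symmetric))
print("right-skewed skew:", skewness(right_skewed))  # positive
print("left-skewed skew:", skewness(left_skewed))    # negative
print("symmetric kurtosis:", kurtosis(symmetric))
```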
  12. Coefficient of Variation

    • A standardized measure of dispersion calculated as the ratio of the standard deviation to the mean, often expressed as a percentage.
    • Useful for comparing the relative variability of datasets with different units or means.
    • A higher coefficient indicates greater relative variability, aiding in decision-making across different contexts.
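
To see why the coefficient of variation helps when units differ, the sketch below compares two hypothetical datasets measured in centimeters and kilograms. It uses the sample standard deviation (`statistics.stdev`) and assumes the mean is positive, since the ratio is not meaningful when the mean is zero or near zero.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (assumes a positive mean)."""
    return statistics.stdev(data) / statistics.mean(data)

# Hypothetical measurements in different units
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"CV of heights: {cv_heights:.1%}")
print(f"CV of weights: {cv_weights:.1%}")
```

Because the units cancel in the ratio, the two values can be compared directly; here the weights vary more relative to their mean than the heights do.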


© 2024 Fiveable Inc. All rights reserved.
AP® and SAT® are trademarks registered by the College Board, which is not affiliated with, and does not endorse this website.
