Statistics — Describing & Inferring

Describing data — centre & spread

ELI5: two numbers summarize a pile of data — where it centres and how spread out it is.

mean = the balance point (average)
median = the middle value (robust to outliers)
standard deviation = the typical distance from the mean

data := [4, 8, 6, 5, 7]

[4, 8, 6, 5, 7]

five measurements

m_{data} = \mathrm{mean}\left(data\right)

6

the average: 6

med_{data} = \mathrm{median}\left(data\right)

6

the middle value: 6

sd_{data} = \mathrm{std}\left(data\right)

1.5811

typical spread ≈ 1.58 (this matches the sample standard deviation, which divides by n − 1)
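Where 1.5811 comes from: the worksheet does not say which convention std uses, but the result is consistent with the sample standard deviation (divisor n − 1). Under that assumption, and using the mean of 6 computed above, the arithmetic can be sketched as:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
  = \sqrt{\frac{(-2)^2 + 2^2 + 0^2 + (-1)^2 + 1^2}{5 - 1}}
  = \sqrt{\frac{10}{4}}
  \approx 1.5811
```

Dividing by n = 5 instead (the population convention) would give \sqrt{2} ≈ 1.4142, so the two conventions differ noticeably on a sample this small.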

✓ pass

m_{data} == 6 and med_{data} == 6

centre pinned two ways
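For readers who want to reproduce these numbers outside the worksheet, here is a minimal Python sketch using the standard library's statistics module. The variable names mirror the worksheet's symbols and are otherwise arbitrary. It assumes the worksheet's std is the sample standard deviation, which is what the value 1.5811 suggests.

```python
import statistics

data = [4, 8, 6, 5, 7]  # the five measurements from the worksheet

m_data = statistics.mean(data)      # balance point (average)
med_data = statistics.median(data)  # middle value after sorting
sd_data = statistics.stdev(data)    # sample standard deviation (divides by n - 1)
sd_pop = statistics.pstdev(data)    # population version (divides by n), for comparison

print(m_data, med_data, round(sd_data, 4), round(sd_pop, 4))
# prints: 6 6 1.5811 1.4142
```

The population figure is included only to show how the two conventions differ; the worksheet's result matches the sample version.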

Real-world hook: mean vs median is why "average income" can mislead (a few billionaires drag the mean up) while the median stays close to what a typical person earns; standard deviation is a common basis for the error bars reported with scientific measurements.
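The pull of an outlier on the mean can be seen in a small Python sketch. The income figures below are hypothetical, invented purely for illustration, and are not real data.

```python
import statistics

# Hypothetical incomes, made up for illustration only
incomes = [30_000, 35_000, 40_000, 45_000, 50_000]
with_outlier = incomes + [1_000_000_000]  # add one billionaire

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```

With these made-up numbers, the mean jumps from 40,000 to about 166.7 million, while the median only moves from 40,000 to 42,500.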

Try it yourself: the dataset [10, 12, 14] — what is its mean?

mean_{you} := mean([10, 12, 14])

12

✏️ Your turn: average the three numbers (add them, divide by 3).

✓ pass

abs(mean_{you} - mean([10, 12, 14])) < 1e-9

green when it matches the mean of [10, 12, 14]