NapkinCalc

Statistics — Describing & Inferring

Describing data — centre & spread

ELI5: two numbers summarize a pile of data — where it centres and how spread out it is.

  • mean = the balance point (average)
  • median = the middle value (robust to outliers)
  • standard deviation = the typical distance from the mean
data := [4, 8, 6, 5, 7] = [4, 8, 6, 5, 7]  five measurements
m_{data} = mean(data) = 6  the average: 6
med_{data} = median(data) = 6  the middle value: 6
sd_{data} = std(data) = 1.5811  typical spread ≈ 1.58
✓ pass  m_{data} == 6 and med_{data} == 6  centre pinned two ways
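
The same three numbers can be reproduced outside the notebook. The sketch below uses Python's standard `statistics` module as a stand-in; it is not NapkinCalc's own implementation. It assumes that `std` above is the sample standard deviation (dividing by n - 1), since that is the form that yields 1.5811 for this data; the population form (dividing by n) would give about 1.4142 instead.

```python
import math
import statistics

data = [4, 8, 6, 5, 7]  # the five measurements from the example

m_data = statistics.mean(data)      # balance point
med_data = statistics.median(data)  # middle value of the sorted list
sd_data = statistics.stdev(data)    # sample standard deviation (divides by n - 1)

# The same spread by hand: the squared deviations from the mean sum to 10,
# and 10 / (5 - 1) = 2.5, whose square root is about 1.5811.
squared_devs = [(x - m_data) ** 2 for x in data]
sd_by_hand = math.sqrt(sum(squared_devs) / (len(data) - 1))

# Population form (divides by n), shown only for contrast.
sd_population = statistics.pstdev(data)

print(m_data, med_data)         # both equal 6
print(round(sd_data, 4))        # 1.5811
print(round(sd_population, 4))  # 1.4142
```

If a tool reports 1.4142 for this dataset, it is most likely using the population formula rather than the sample one.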

Real-world hook: "average income" can mislead because a few billionaires drag the mean up, while the median still reflects a typical person. Standard deviation, in turn, underlies the error bars reported on many scientific measurements.
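
To see the income effect in numbers, here is a small sketch. The income figures are invented purely for illustration (thousands per year) and do not come from any real dataset.

```python
import statistics

# Hypothetical incomes in thousands; the values are made up for illustration.
incomes = [30, 35, 40, 45, 50]
with_outlier = incomes + [10_000]  # one very high earner joins the group

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)  # mean and median agree at 40
print(mean_after, median_after)    # mean jumps to 1700, median only moves to 42.5
```

One extreme value multiplies the mean by more than forty, while the median shifts by 2.5.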

Try it yourself: the dataset [10, 12, 14] — what is its mean?

mean_{you} := mean([10, 12, 14]) = 12  ✏️ Your turn: average the three numbers (add them, divide by 3).
✓ pass  abs(mean_{you} - mean([10, 12, 14])) < 1e-9  green when it matches the mean of [10, 12, 14]
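
The exercise and its check can be sketched by hand as well. This is only an illustration in Python, with `mean_you` and `passes` as stand-in names; the tolerance of 1e-9 mirrors the check above and allows for floating-point rounding instead of demanding exact equality.

```python
import statistics

values = [10, 12, 14]

# Add them, divide by 3: (10 + 12 + 14) / 3 = 36 / 3.
mean_you = sum(values) / len(values)

# Compare against a reference mean with a small tolerance,
# in the same way as the pass condition in the exercise.
reference = statistics.mean(values)
passes = abs(mean_you - reference) < 1e-9

print(mean_you, passes)
```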