Statistical Methods


Table of Contents

Summarising data
- Graphical summaries
- Descriptive summaries


Summarising data

data distribution: we want to know what the data looks like

a good summary needs to show location, spread, range, extremes, gaps/holes, symmetry, etc.

Graphical summaries

Frequency distribution (table)

Grade	Frequency
5	2
6	1
7	3
8	2
9	1
10	2
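
A minimal sketch of how such a table can be built from raw values. The `grades` list is hypothetical, chosen only so that its counts match the table above; `frequency_table` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical raw grades, chosen so the counts match the table above.
grades = [5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 10]


def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())


table = frequency_table(grades)

print("Grade\tFrequency")
for grade, freq in table:
    print(f"{grade}\t{freq}")
```

The frequencies always sum to the number of observations (11 here), which is a quick sanity check on any frequency table.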

Bar chart

Pareto bar chart

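bar chart with the categories ordered by frequency, most frequent first. only for nominal level of measurement (the categories have no natural order, so reordering them loses nothing)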
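
The ordering step can be sketched as below. The `answers` data and the `pareto_order` helper are made up for illustration; ties are broken alphabetically, which is just one possible choice.

```python
from collections import Counter

# Hypothetical nominal data (not from these notes): preferred transport mode.
answers = ["bike", "car", "bike", "train", "bike", "car", "walk"]


def pareto_order(values):
    """Return (category, frequency) pairs, most frequent first.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(values)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


pareto = pareto_order(answers)
for category, freq in pareto:
    print(f"{category:<6}{'#' * freq}")
```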

Pie chart

size of each slice shows the relative frequency of its category.

Histogram

for numerical data: the values are grouped into intervals (bins), and the height of each bar shows the frequency of values in that bin.
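
A rough sketch of the binning behind a histogram, assuming equal-width, half-open bins. The `heights` data, the bin settings, and the `histogram_counts` helper are all made up for illustration.

```python
# Hypothetical measurements (not from these notes), e.g. heights in cm.
heights = [151, 158, 160, 162, 165, 167, 168, 171, 174, 183]


def histogram_counts(values, start, width, n_bins):
    """Count values per bin [start + k*width, start + (k+1)*width).

    Values outside the covered interval are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


counts = histogram_counts(heights, start=150, width=10, n_bins=4)

# Text rendering: one '#' per observation in the bin.
for k, c in enumerate(counts):
    lo = 150 + 10 * k
    print(f"[{lo}, {lo + 10}): {'#' * c}")
```

Changing `width` changes how the same data looks, so the choice of bins is part of the summary.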

Time series

shows quantity that varies over time.

Descriptive summaries

qualitative description:

shape:
location: position on x axis (around 0, around 10, etc.)
dispersion: spread out graph == large dispersion

numerical description:

location: measure of center
- mean: average (sum all values, divide by the number of values)
- median: sort, find the middle value (for an even number of values, the average of the two middle ones)
- mode: most often occurring value (highest frequency)
  - unimodal: unique mode
  - bimodal: two modes
  - multimodal: more than two modes
dispersion:
- measures of variation
  - sample standard deviation (how much values deviate from mean)
    - same units as data (unlike variance)
    - standard deviation is $s = \sqrt{s^{2}}$
    - sample variance: $s^{2} = \frac{\sum_{i=1}^{n} (x_{i} - \bar{x})^{2}}{n-1}$
    - for population: σ², σ
  - range
    - (maximum - minimum)
    - sensitive to extreme values
- relative standing
  - percentiles, quartiles (special percentiles: Q1, Q2 (median), Q3)
  - IQR: interquartile range = (Q3 - Q1)
  - 5-number summary: min, Q1, median (Q2), Q3, max
    - boxplot is graph of this
    - whiskers are lines from the box (by default, not longer than 1.5 × IQR)
    - outliers: points outside of whiskers
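
The numerical measures above can be tried out with Python's standard `statistics` module. This is a sketch, not a reference implementation: the `grades` list is hypothetical (it matches the frequency table earlier on this page), `sample_variance` is an illustrative helper, and the quartiles use the module's "inclusive" method. Other quartile conventions exist and give slightly different values.

```python
import statistics

# Hypothetical grades, consistent with the frequency table earlier on this page.
grades = [5, 5, 6, 7, 7, 7, 8, 8, 9, 10, 10]


def sample_variance(values):
    """s^2 = sum((x_i - mean)^2) / (n - 1), written out by hand."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Location
mean = statistics.mean(grades)
median = statistics.median(grades)
modes = statistics.multimode(grades)  # list of all most frequent values

# Dispersion
s2 = statistics.variance(grades)      # sample variance (divides by n - 1)
s = statistics.stdev(grades)          # sample standard deviation, sqrt(s2)
data_range = max(grades) - min(grades)

# Relative standing (assumption: "inclusive" quartile method)
q1, q2, q3 = statistics.quantiles(grades, n=4, method="inclusive")
iqr = q3 - q1
five_number = (min(grades), q1, q2, q3, max(grades))

# Boxplot rule of thumb: points beyond 1.5 * IQR from the box are outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in grades if x < low_fence or x > high_fence]

print("mean", round(mean, 3), "median", median, "modes", modes)
print("s^2", round(s2, 3), "s", round(s, 3), "range", data_range)
print("five-number summary", five_number, "IQR", iqr, "outliers", outliers)
```

This data set has a single mode, so it is unimodal. With no value outside the fences, the whiskers of its boxplot would simply end at the minimum and maximum.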