Introduction: Data
statistics: the science of data - collecting, organising, analysing, interpreting, presenting
sample: a selected subcollection from the population
Collecting sample data
concepts:
- variables:
  - independent: might cause the effect being studied
  - dependent: represents the effect being studied
  - confounding: influences both the independent and the dependent variable, so you can’t tell what is actually causing the effect
sampling methods:
- voluntary response: subjects decide to be included
- random: each member from population has equal probability to be selected
- simple random: each sample of size n has equal probability to be selected
- systematic: after starting point, select every k-th member (based on a system)
- convenience: choose what’s convenient
- stratified: split population into subgroups (strata) with same characteristics, then simple random sample each group
- cluster: split population into clusters, then randomly select some of them
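
To make the probability-based methods above concrete, here is a minimal sketch using only Python's standard `random` module. The population (members numbered 1 to 20), the odd/even strata, the group sizes and all function names are made up for illustration. The cluster version assumes every member of a selected cluster is kept, which the notes above do not spell out.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible sample of size n is equally likely.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Random starting point among the first k members, then every k-th member.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(population, key, n_per_group, rng):
    # Split into subgroups (strata) by a shared characteristic,
    # then take a simple random sample from each subgroup.
    strata = {}
    for member in population:
        strata.setdefault(key(member), []).append(member)
    return [m for group in strata.values() for m in rng.sample(group, n_per_group)]

def cluster_sample(clusters, n_clusters, rng):
    # Randomly select whole clusters; assumption: keep all members of each one.
    chosen = rng.sample(clusters, n_clusters)
    return [m for cluster in chosen for m in cluster]

rng = random.Random(0)           # fixed seed so the run is repeatable
population = list(range(1, 21))  # hypothetical population: members 1..20

srs = simple_random_sample(population, 5, rng)
systematic = systematic_sample(population, 4, rng)
stratified = stratified_sample(population, lambda m: m % 2, 3, rng)  # strata: odd vs even
clusters = [population[i:i + 5] for i in range(0, 20, 5)]            # 4 clusters of 5
cluster = cluster_sample(clusters, 2, rng)
```

Voluntary response and convenience sampling are left out on purpose: they involve no random selection, so there is nothing to simulate.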
types of studies:
- observational study: subjects observed, not modified
  - retrospective: data from past
  - cross-sectional: data from one point in time
  - prospective: data to be collected (future)
- experiment: some treatment applied to subjects
  - sometimes control and treatment group
  - gotta watch out for placebo and observer effects
Types of data
What to do with data?
- parameter: numerical measurement of population (Greek letters: μ, σ, …)
- statistic: numerical measurement of sample (Latin letters: $\bar{x}$, s, …)
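
A rough sketch of the parameter vs. statistic distinction, using Python's standard `statistics` module. The population here is simulated (hypothetical heights in cm), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable
population = [rng.gauss(170, 10) for _ in range(10_000)]  # hypothetical heights in cm

# Parameters: computed from the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# Statistics: computed from a sample only.
sample = rng.sample(population, 50)
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

print(f"mu={mu:.2f} sigma={sigma:.2f}  x_bar={x_bar:.2f} s={s:.2f}")
```

In practice the whole population is usually not available, which is why the statistic gets used as an estimate of the parameter.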
data can be:
- qualitative: names or labels (strings)
- quantitative: numbers (ints, floats)
  - discrete: countable
  - continuous: not countable (on a continuous scale like length, weight, distance)
you have different levels of measurement:
- qualitative:
  - nominal: no ordering (gender, eye color)
  - ordinal: ordering, but differences between categories have no meaning (e.g. strongly disagree / disagree / agree / strongly agree)
- quantitative:
  - interval: ordering, differences, but no natural zero point (year of birth, temperatures in °F/°C)
  - ratio: ordering, differences, natural zero point, so ratios make sense (body length, marathon times)
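
One way to see the interval vs. ratio difference: on a ratio scale, "twice as much" survives a change of units, while on an interval scale it does not. A small sketch with made-up values (10 °C vs. 20 °C, 90 cm vs. 180 cm); the helper names are only for illustration.

```python
def c_to_f(celsius):
    # Celsius to Fahrenheit: the +32 shifts the (arbitrary) zero point.
    return celsius * 9 / 5 + 32

def cm_to_inches(cm):
    # Centimetres to inches: pure rescaling, zero stays zero.
    return cm / 2.54

# Interval level: 20 °C is not "twice as hot" as 10 °C.
ratio_c = 20 / 10                    # 2.0 in Celsius
ratio_f = c_to_f(20) / c_to_f(10)    # 68 / 50 = 1.36 in Fahrenheit

# Ratio level: 180 cm really is twice as long as 90 cm, in any unit.
ratio_cm = 180 / 90
ratio_in = cm_to_inches(180) / cm_to_inches(90)

# Differences stay consistent on both scales (up to the unit factor).
diff_c = 20 - 10                     # 10 Celsius degrees
diff_f = c_to_f(20) - c_to_f(10)     # 18 Fahrenheit degrees = 10 * 9/5

print(ratio_c, ratio_f, ratio_cm, round(ratio_in, 6))
```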