Supervised learning
Learn a functional relationship (the "unknown target function") from an observation to a target.
Assume an unknown conditional target distribution p(y|x).
The target can be seen as f(x) plus noise from a noise distribution (Bernoulli/categorical for a discrete target, normal for a continuous one).
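A minimal sketch of this data-generating view (the target function f and noise scales below are made up for illustration, not taken from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target function f, chosen only for illustration.
def f(x):
    return 2.0 * x + 1.0

x = rng.uniform(-1, 1, size=100)

# Continuous target: f(x) plus Gaussian noise.
y_continuous = f(x) + rng.normal(scale=0.3, size=x.shape)

# Discrete (binary) target: Bernoulli noise, with p(y=1|x) tied to f(x) via a logistic link.
p = 1.0 / (1.0 + np.exp(-f(x)))
y_discrete = rng.binomial(1, p)
```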
- Separate the dataset into training, validation, and test sets.
- Learn a function that fits the observed data in the training set
- stratify the training set if classes are unbalanced (e.g. oversample the minority class)
- Evaluate the generalizability of the function on the test set
- Stop the learning process based on the validation set.
- If the dataset is small, use cross-validation (see the sketch below).
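A sketch of this split with scikit-learn; the dataset, model, and split ratios are illustrative assumptions, not part of the notes:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = make_classification(n_samples=300, random_state=0)

# Hold out a test set; stratify so class proportions match the full dataset.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Carve a validation set out of the remaining data (e.g. for deciding when to stop training).
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, stratify=y_train, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("validation accuracy:", model.score(X_val, y_val))
print("test accuracy:", model.score(X_test, y_test))

# With little data, cross-validation reuses every point for both training and evaluation.
print("5-fold CV accuracy:", cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5).mean())
```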
Error measure:
- assume a hypothesis h for the target function f; how far h is from f overall is the risk, and its value per point is the loss
- since f is unknown, approximate the risk using the data we have
- in-sample error: the error made on the training set
- out-of-sample error: the error made on all other possible inputs
- we try to minimize the in-sample error (as a proxy for the out-of-sample error); see the sketch below
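A sketch of in-sample vs. out-of-sample error, assuming squared loss and a synthetic target (none of these specifics come from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: a true target f, noisy observations, and a fitted hypothesis h.
f = lambda x: np.sin(2 * np.pi * x)
x_train = rng.uniform(0, 1, 20)
y_train = f(x_train) + rng.normal(scale=0.2, size=x_train.shape)

# Fit a degree-9 polynomial as the hypothesis h (deliberately flexible).
coeffs = np.polyfit(x_train, y_train, deg=9)
h = lambda x: np.polyval(coeffs, x)

squared_loss = lambda y_hat, y: (y_hat - y) ** 2

# In-sample error: average loss over the training set.
e_in = squared_loss(h(x_train), y_train).mean()

# Out-of-sample error: approximated on fresh data from the same distribution.
x_new = rng.uniform(0, 1, 10_000)
y_new = f(x_new) + rng.normal(scale=0.2, size=x_new.shape)
e_out = squared_loss(h(x_new), y_new).mean()

print(f"E_in = {e_in:.3f}, E_out = {e_out:.3f}")  # E_out is usually noticeably larger here
```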
Model selection:
- select the hypothesis with the lowest error on the validation set (see the sketch after this list)
- watch out for overfitting, don’t use too many features
- PAC ("probably approximately correct") learnable: a formal definition of being able to learn an "almost perfect" model
- VC dimension: the maximum number of input vectors (points) that can be shattered (the model can represent every possible labelling of them)
- all hypothesis sets with finite VC-dimension are PAC learnable
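A sketch of validation-based model selection, with polynomial degree standing in for hypothesis-set complexity (the target function and sample sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(2 * np.pi * x)

def sample(n):
    x = rng.uniform(0, 1, n)
    return x, f(x) + rng.normal(scale=0.2, size=n)

x_train, y_train = sample(30)   # used to fit each candidate hypothesis
x_val, y_val = sample(30)       # used only to compare the candidates

# Candidate hypothesis sets: polynomials of increasing degree (complexity).
best_degree, best_val_error = None, np.inf
for degree in range(1, 10):
    coeffs = np.polyfit(x_train, y_train, degree)
    val_error = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_error < best_val_error:
        best_degree, best_val_error = degree, val_error

print("selected degree:", best_degree, "validation MSE:", round(best_val_error, 3))
```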
Predictive modeling without a notion of time
- Think about the learning setup (what do you want to learn)
- Don't overfit: select features with forward or backward selection, and consider regularization (penalizing more complex models); see the sketch after this list
- forward selection: iteratively add the most predictive feature
- backward selection: iteratively remove the least predictive feature
- regularization: add a term to the error function that penalizes more complex models
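A sketch of both ideas with scikit-learn; the dataset, number of selected features, and alpha values are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Forward selection: greedily add the feature that most improves the CV score.
# (Backward selection is the same call with direction="backward".)
forward = SequentialFeatureSelector(LinearRegression(), n_features_to_select=5,
                                    direction="forward").fit(X, y)
print("selected features:", np.flatnonzero(forward.get_support()))

# Regularization: Ridge adds alpha * ||w||^2 to the squared error,
# so larger alpha punishes larger (more complex) weight vectors.
for alpha in (0.01, 1.0, 100.0):
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    print(f"alpha={alpha}: mean CV R^2 = {score:.3f}")
```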