Linear Regression Overview
Main Objective
Predict the value of an unobserved variable y based on knowledge of a related variable x.
Scope
- Focus on cases where y has a continuous range
- Focus on linear predictors (linear regression)
- We do not discuss classification problems (predicting the "type" of an individual)
Learning Approach
Focus on supervised learning: learning from labeled examples
Regression Components
- Formulation
- Solution
- Interpretation
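The three components can be sketched for the simplest case, one feature x and a straight-line predictor. This is a minimal illustration, not the course's own code: the least-squares criterion, the function name fit_line, and the toy numbers are all assumptions made for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y ≈ theta0 + theta1 * x (one feature).

    Formulation: choose theta0, theta1 to minimize the sum of squared
    residuals sum((y - theta0 - theta1 * x) ** 2) over the examples.
    Solution: the closed-form minimizer computed below.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    theta1 = sxy / sxx                 # slope
    theta0 = y_bar - theta1 * x_bar    # intercept
    return theta0, theta1

# Toy labeled examples (made up): they lie exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

theta0, theta1 = fit_line(xs, ys)
prediction = theta0 + theta1 * 4.0     # predict y for a new x = 4

# Interpretation: theta1 is the change in predicted y per unit change in x.
print(theta0, theta1, prediction)
```

With noisy data the fitted line no longer passes through every point, which is where the performance assessment below comes in.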
Performance Assessment
Classical Approach
- Quantifying the quality of the estimates
- Confidence intervals and hypothesis testing
Enhancements
- Use nonlinear features of the data
- Regularization
- Data-driven performance assessment
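A rough sketch of how the three enhancements fit together, under stated assumptions: the feature map phi(x) = x**2, the ridge (squared-weight) penalty, the single-weight model without intercept, and the toy data are all chosen for illustration and are not taken from the notes.

```python
def features(x):
    """Nonlinear feature map (assumed for illustration): phi(x) = x**2."""
    return x ** 2

def fit_ridge(xs, ys, lam):
    """Single-weight ridge fit.

    Minimizes sum((y - theta * phi(x)) ** 2) + lam * theta ** 2.
    Setting the derivative to zero gives the closed form below;
    lam = 0 recovers ordinary least squares.
    """
    phis = [features(x) for x in xs]
    numerator = sum(p * y for p, y in zip(phis, ys))
    denominator = sum(p * p for p in phis) + lam
    return numerator / denominator

def mse(theta, xs, ys):
    """Mean squared prediction error on the given examples."""
    return sum((y - theta * features(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (made up): y = 3 * x**2 exactly.
train_x, train_y = [1.0, 2.0, 3.0], [3.0, 12.0, 27.0]
test_x, test_y = [4.0], [48.0]   # held out, never used for fitting

theta_plain = fit_ridge(train_x, train_y, lam=0.0)   # no regularization
theta_ridge = fit_ridge(train_x, train_y, lam=2.0)   # weight shrunk toward zero

# Data-driven assessment: compare errors on the held-out example.
print(theta_plain, mse(theta_plain, test_x, test_y))
print(theta_ridge, mse(theta_ridge, test_x, test_y))
```

The model is still linear in the weight theta even though it is nonlinear in x. On this noiseless toy data the penalty only adds bias, so the regularized fit has the larger held-out error; regularization tends to pay off when the data are noisy or the features are numerous.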
Medical Example
From a doctor's point of view:
- X: symptoms, test results, etc.
- Y: state of health
Training: the model is trained on records of many past patients. Goal: "predict" Y based on X.
Two types of predictions:
- Y: sick or not (binary) [classification]
- Y: life expectancy (any real number) [regression]
Data Flow
Data -> ML -> Prediction
Where data consists of (x₁, y₁), ..., (xₙ, yₙ)
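One way to write this flow in symbols is given below. The notation is an assumption introduced here: \hat{g} denotes the learned predictor and \theta_0, \theta its parameters. The least-squares criterion in the second display is a common choice for linear regression, not something the notes specify.

```latex
\[
\underbrace{\{(x_i, y_i)\}_{i=1}^{n}}_{\text{data}}
\;\longrightarrow\;
\underbrace{\hat{g}}_{\text{learned predictor (ML)}}
\;\longrightarrow\;
\underbrace{\hat{y} = \hat{g}(x)}_{\text{prediction for a new } x}
\]
% Linear regression instance (least-squares fit, an assumed criterion):
\[
\hat{g}(x) = \hat{\theta}_0 + \hat{\theta}^{\top} x,
\qquad
(\hat{\theta}_0, \hat{\theta}) \in \arg\min_{\theta_0,\, \theta}
\sum_{i=1}^{n} \bigl(y_i - \theta_0 - \theta^{\top} x_i\bigr)^2 .
\]
```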
Definition
Regression is a statistical approach for modeling the relationship between a dependent variable and one or more independent variables.
Objective
We want to create a theory and understand the mechanism of how the Xs cause the Ys. To do so, we use statistical methods to build a full probabilistic model that relates the Xs to the Ys.
Notable Quote
"All models are wrong, but some are useful." - George E.P. Box
[Figure: handwritten data flow diagram, Data → ML → Prediction, with an arrow pointing to "Data X" from "Prediction"]