Linear Regression Overview

Main Objective

Predict the value of an unobserved variable y based on knowledge of a related variable x.

Scope

  • Focus on cases where y has a continuous range
  • Focus on linear predictors (linear regression)
  • We do not discuss classification problems (predicting the "type" of an individual)

Learning Approach

Focus on supervised learning: learning from labeled examples

Regression Components

  1. Formulation
  2. Solution
  3. Interpretation

Performance Assessment

Classical Approach

  • Quantifying the quality of the estimates
  • Confidence intervals and hypothesis testing
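
As a rough illustration of the classical approach, the sketch below fits a one-variable least-squares line and computes standard errors and an approximate 95% confidence interval for the slope. The data are simulated, the function name is hypothetical, and the interval uses the large-sample normal quantile 1.96 rather than an exact t quantile, so treat it as a sketch under those assumptions.

```python
import numpy as np

def ols_with_standard_errors(x, y):
    """Least-squares fit of y ≈ b0 + b1 * x, with classical standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (intercept) next to x.
    A = np.column_stack([np.ones_like(x), x])
    n, p = A.shape
    gram = A.T @ A
    beta = np.linalg.solve(gram, A.T @ y)
    resid = y - A @ beta
    # Unbiased noise-variance estimate: residual sum of squares / (n - p).
    sigma2_hat = resid @ resid / (n - p)
    # Estimated covariance of the coefficients: sigma^2 * (A^T A)^{-1}.
    cov = sigma2_hat * np.linalg.inv(gram)
    return beta, np.sqrt(np.diag(cov))

# Simulated data (an assumption for illustration): y = 1 + 2x + unit-variance noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=x.size)

beta, se = ols_with_standard_errors(x, y)
z = 1.96  # large-sample approximation to the 97.5% quantile
ci_slope = (beta[1] - z * se[1], beta[1] + z * se[1])
print("slope estimate:", beta[1], "approx. 95% CI:", ci_slope)
```

A hypothesis test of "slope = 0" can be read off the same quantities: the ratio beta[1] / se[1] is compared against the same quantile.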

Enhancements

  • Use nonlinear features of the data
  • Regularization
  • Data-driven performance assessments
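
The three enhancements can be sketched together in a few lines. The example below builds polynomial (nonlinear) features, fits them with ridge regularization in closed form, and scores the fit on held-out data. The data set, the helper names, and the choice of polynomial features are assumptions made for illustration; the notes do not prescribe a specific method.

```python
import numpy as np

def poly_features(x, degree):
    """Columns [1, x, x^2, ..., x^degree] for a 1-D input x."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(A, y, lam):
    """Minimize ||A w - y||^2 + lam * ||w||^2.

    For simplicity the intercept is penalized too; lam = 0 gives ordinary
    least squares (assuming A^T A is invertible).
    """
    n_features = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n_features), A.T @ y)

def holdout_mse(x_tr, y_tr, x_val, y_val, degree, lam):
    """Fit on the training split, report mean squared error on the held-out split."""
    w = ridge_fit(poly_features(x_tr, degree), y_tr, lam)
    resid = poly_features(x_val, degree) @ w - y_val
    return float(np.mean(resid ** 2))

# Hypothetical noise-free data from a quadratic relationship.
x = np.linspace(-1.0, 1.0, 20)
y = x ** 2
x_tr, y_tr = x[::2], y[::2]      # even-indexed points for training
x_val, y_val = x[1::2], y[1::2]  # odd-indexed points held out

mse_linear = holdout_mse(x_tr, y_tr, x_val, y_val, degree=1, lam=0.0)
mse_quad = holdout_mse(x_tr, y_tr, x_val, y_val, degree=2, lam=0.0)
print("held-out MSE, linear features:   ", mse_linear)
print("held-out MSE, quadratic features:", mse_quad)
```

The predictor stays linear in the coefficients w even though it is nonlinear in x, which is why the same least-squares machinery still applies.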

Medical Example

Suppose you are a doctor:

  • X: symptoms, test results, etc.
  • Y: state of health

Training: we learn from the records of many past patients, for whom both X and Y are known. Goal: "predict" Y for a new patient based on X.

Two types of predictions:

  1. Y: sick or not (binary) [classification]
  2. Y: life expectancy (any real number) [regression]

Data Flow

Data → ML → Prediction

Where data consists of (x₁, y₁), ..., (xₙ, yₙ)
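
A minimal sketch of this flow, assuming a single real-valued x and a least-squares line as the "ML" step (the function names and the toy data are hypothetical):

```python
import numpy as np

def fit_linear(x, y):
    """Learn intercept and slope from labeled pairs (x_i, y_i) by least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with a column of ones for the intercept.
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1]

def predict(intercept, slope, x_new):
    """Apply the learned line to new, unlabeled inputs."""
    return intercept + slope * np.asarray(x_new, dtype=float)

# Data: toy pairs generated exactly from y = 1 + 2x (an assumption for illustration).
x_train = [0.0, 1.0, 2.0, 3.0]
y_train = [1.0, 3.0, 5.0, 7.0]

# ML: fit the model. Prediction: evaluate it at a new x.
intercept, slope = fit_linear(x_train, y_train)
y_hat = predict(intercept, slope, [4.0])
print(intercept, slope, y_hat)
```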

Definition

Regression is a statistical approach for modeling the relationship between a dependent variable and one or more independent variables.

Objective

We want to build a theory and understand the mechanism by which the Xs cause the Ys. To do so, we use statistical methods to build a full probabilistic model that relates the Xs to the Ys.
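
One common way to write such a probabilistic model, assumed here for illustration and not stated in the notes, is the linear model with independent Gaussian noise:

```latex
y_i = \beta_0 + \beta^\top x_i + \varepsilon_i,
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \ \text{independently},
\qquad i = 1, \dots, n
```

Here β₀, β, and σ² are unknown parameters to be estimated from the data (x₁, y₁), ..., (xₙ, yₙ).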

Notable Quote

"All models are wrong, but some are useful." - George E.P. Box

The data flow diagram in the handwritten notes is a simple linear flow, Data → ML → Prediction, with an additional arrow from "Prediction" pointing to "Data X".

Last updated on 2025-01-16