Systematic Approaches to Model Evaluation

We need systematic ways to accomplish several key tasks in model evaluation:

  1. Assess Standard Error
  2. Set Hyperparameters
  3. Choose Variables/Features to Include
  4. Try More or Less "Complex" Models
  5. Choose Between Different Learning Algorithms

Goal and Context

Our primary objective is to achieve both:

  • High R² (coefficient of determination)
  • Strong generalization, i.e., good performance on new, unseen data

This is particularly important when we have no applicable formulas (e.g., for standard errors) to guide us.

Core Question

How do we assess predictor performance based purely on data, without relying on formulas?

Validation Process

The validation process follows these steps:

  1. Data Division: We (randomly) divide our data into two parts:
    • Training set
    • Validation (hold-out) set
  2. Model Development: We perform regression on the training set to fit a model using a chosen algorithm.
  3. Error Assessment:
    • Use the validation set to measure prediction errors
    • Compare different predictors and algorithms
    • Repeat this process systematically for each candidate predictor and algorithm (a code sketch follows below)
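
A minimal sketch of this hold-out procedure is shown below, assuming scikit-learn and hypothetical toy data (the arrays X and y, the 70/30 split, and the linear model are illustrative choices, not part of the notes):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                                  # hypothetical feature matrix
y = X @ np.array([1.5, -2.0, 0.0, 0.0, 0.5]) + rng.normal(scale=0.5, size=200)

# 1. Randomly divide the data into a training set and a validation (hold-out) set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# 2. Fit the model on the training set only.
model = LinearRegression().fit(X_train, y_train)

# 3. Measure prediction error on the validation set; repeat for each candidate
#    predictor or algorithm and compare the resulting validation errors.
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"validation MSE: {val_mse:.3f}")
```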

Model Selection Strategy

  1. Choose the model structure that performs best on the validation set
  2. Important caveat: the selected model may have gotten "lucky" on the validation set, so its validation error can be an overly optimistic estimate of its true performance
  3. Final evaluation should therefore be performed on a third, untouched data set (the test set), as in the sketch below
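
The sketch below puts the whole strategy together on hypothetical toy data, again assuming scikit-learn: candidate models that use the first k features are compared on the validation set, and only the winner is scored on the held-out test set (the 60/20/20 split and the feature-subset candidates are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))                       # toy data; only 3 features matter
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=300)

# Split once into train / validation / test (roughly 60% / 20% / 20%).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=1)

# Candidate model structures: use the first k features, for k = 1, ..., 8.
results = {}
for k in range(1, X.shape[1] + 1):
    model = LinearRegression().fit(X_train[:, :k], y_train)
    val_mse = mean_squared_error(y_val, model.predict(X_val[:, :k]))
    results[k] = (model, val_mse)

# Choose the structure with the lowest validation error ...
best_k, (best_model, best_val_mse) = min(results.items(), key=lambda kv: kv[1][1])

# ... but report final performance on the untouched test set, because the
# winner's validation score may be optimistic (it may have gotten "lucky").
test_mse = mean_squared_error(y_test, best_model.predict(X_test[:, :best_k]))
print(f"chosen k = {best_k}, validation MSE = {best_val_mse:.3f}, test MSE = {test_mse:.3f}")
```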

Hyperparameters

Hyperparameters are parameters that are specified explicitly before training rather than learned from the data. They serve two crucial functions:

  1. They control the training process (and, in some cases, the model structure)
  2. They must themselves be tuned to optimize the model, typically by comparing performance on the validation set (see the sketch below)
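
As an illustration, the sketch below tunes the ridge penalty strength alpha, a typical hyperparameter, by comparing validation-set error over a grid of candidate values (the toy data, the grid, and the use of scikit-learn's Ridge are assumptions made for the example):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 10))                     # toy data for illustration
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=150)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=2)

# The penalty strength alpha is fixed before fitting and controls training;
# it is not estimated from the training data, so we choose it on the validation set.
best_alpha, best_mse = None, float("inf")
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:        # candidate hyperparameter values
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_val, model.predict(X_val))
    if mse < best_mse:
        best_alpha, best_mse = alpha, mse

print(f"selected alpha = {best_alpha} (validation MSE = {best_mse:.3f})")
```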

Visual Representation

Consider a plot of prediction error against the number of features used in the model:

This graph illustrates several key concepts:

  1. The x-axis shows model complexity (number of features)
  2. The y-axis shows prediction error
  3. Two curves are shown:
    • Training error (decreases with complexity)
    • Validation error (U-shaped curve)
  4. The graph highlights the regions of:
    • Underfitting (high error on both sets)
    • Optimal fitting (validation error near its minimum)
    • Overfitting (low training error but high validation error)

This visualization helps explain why we need to find the right level of model complexity to achieve good generalization.
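
The following sketch, assuming NumPy, scikit-learn, and matplotlib, reproduces this qualitative picture on toy data in which only the first few features are informative: training error keeps falling as features are added, while validation error follows the U-shape described above.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
n, p, p_informative = 60, 40, 5
X = rng.normal(size=(n, p))                 # only the first 5 features carry signal
beta = np.concatenate([rng.normal(size=p_informative), np.zeros(p - p_informative)])
y = X @ beta + rng.normal(scale=1.0, size=n)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.5, random_state=3)

ks = range(1, p + 1)                        # model complexity = number of features used
train_err, val_err = [], []
for k in ks:
    model = LinearRegression().fit(X_train[:, :k], y_train)
    train_err.append(mean_squared_error(y_train, model.predict(X_train[:, :k])))
    val_err.append(mean_squared_error(y_val, model.predict(X_val[:, :k])))

plt.plot(ks, train_err, label="training error")
plt.plot(ks, val_err, label="validation error")
plt.xlabel("number of features (model complexity)")
plt.ylabel("prediction error (MSE)")
plt.legend()
plt.show()
```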