Key Concepts
-
Distribution of the estimator \hat{\theta}
-
Standard error of an estimator and confidence interval
-
Hypothesis Testing
Estimator Properties and Distribution
Reliability of Estimates
-
Question: How noisy/reliable are estimates of θ^*?
-
Problem: If the predictors we get from different random finite samples vary widely, we can't trust any single one of them.
-
The estimate \hat{\theta}_j is random (due to randomness in sampling or noisy data)
Distribution Properties
-
Typically normally distributed, with mean \theta_j^* (the true value)
-
Standard deviation: \sigma_j (standard error)
[Visualization: A bell curve (normal distribution) centered at \theta_j^* representing the distribution of the estimator]
-
On average, our estimates will equal the true value
-
For any particular regression run, the estimate will differ from the true value
-
How far away it will be is described by the normal distribution
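The points above can be checked with a small simulation. This is a minimal sketch, not part of the original notes: it assumes a one-parameter, no-intercept model y = \theta^* x + noise, and the names `fit_slope` and `simulate_estimates` as well as the values `theta_star=2.0`, `n=50`, and `noise_std=1.0` are illustrative choices.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y = theta * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def simulate_estimates(theta_star=2.0, n=50, trials=2000, noise_std=1.0, seed=0):
    """Refit the model on many independent random samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        # Assumed data-generating process: true slope plus Gaussian noise.
        ys = [theta_star * x + rng.gauss(0, noise_std) for x in xs]
        estimates.append(fit_slope(xs, ys))
    return estimates


estimates = simulate_estimates()
mean_estimate = statistics.mean(estimates)   # should be close to theta_star
std_error = statistics.stdev(estimates)      # empirical standard error

print(f"mean of estimates: {mean_estimate:.3f}")
print(f"empirical standard error: {std_error:.3f}")
```

Each individual entry of `estimates` misses the true value by some amount, while their average sits close to it; the spread of the entries is what the notes call the standard error.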
Standard Error and Confidence Intervals
-
How far off an estimate is depends on the width of the normal distribution
-
Width is captured by standard deviation (std) or variance
-
\sigma_j is the std of the normal distribution, called the standard error
-
The standard error tells us how far an estimate typically falls from the true value
-
Smaller standard error is better
95% Confidence Rule
-
Going ±2 standard deviations from the mean captures about 95% of the distribution; the probability of being more than 2 std away from the mean is only about 5%.
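The ±2 rule is a rounded version of the exact normal quantile. A quick check using Python's standard library (`statistics.NormalDist`), shown here only as an illustration:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass within 2 standard deviations of the mean.
within_2 = standard_normal.cdf(2.0) - standard_normal.cdf(-2.0)

# Probability of landing more than 2 standard deviations away.
outside_2 = 1.0 - within_2

# Exact multiplier that captures 95% of the distribution.
z_95 = standard_normal.inv_cdf(0.975)

print(f"within 2 std: {within_2:.4f}")
print(f"outside 2 std: {outside_2:.4f}")
print(f"exact 95% multiplier: {z_95:.3f}")
```

The exact multiplier is roughly 1.96, so "2 standard deviations" slightly overshoots 95% coverage.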
Example with Real Data
-
In 2005, percentage of adults who smoked was 20.7%
-
95% confidence range surrounding the estimate = ±1.1%
-
95% certain the actual percentage was between 19.6% and 21.8%
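The interval in this example is just the estimate plus or minus the margin. The sketch below reproduces the arithmetic and backs out the standard error the margin would imply, assuming (this is not stated in the source) that the ±1.1 came from a normal approximation with the usual 1.96 multiplier. `confidence_interval` is a hypothetical helper name.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.95):
    """Normal-approximation interval: estimate +/- z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    margin = z * std_error
    return estimate - margin, estimate + margin


estimate_pct = 20.7   # reported percentage of adult smokers
margin_pct = 1.1      # reported 95% margin, in percentage points

# Assumption: margin = z_95 * standard error, so the standard error is margin / z_95.
z_95 = NormalDist().inv_cdf(0.975)
implied_std_error = margin_pct / z_95

lower, upper = confidence_interval(estimate_pct, implied_std_error)
print(f"implied standard error: {implied_std_error:.2f} percentage points")
print(f"95% interval: {lower:.1f}% to {upper:.1f}%")
```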
Mathematical Notation
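One compact way to write the preceding ideas, as a sketch using only the symbols already introduced (approximate normality of the estimator is assumed, and the multiplier 2 is the rounded value of 1.96):

```latex
\hat{\theta}_j \sim \mathcal{N}\!\left(\theta_j^*, \sigma_j^2\right),
\qquad
\Pr\!\left( \left| \hat{\theta}_j - \theta_j^* \right| \le 2\sigma_j \right) \approx 0.95,
\qquad
\text{95\% confidence interval: } \hat{\theta}_j \pm 2\sigma_j
```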
Hypothesis Testing
Testing \theta_j^* = 0
-
Null hypothesis: \theta_j^* = 0
-
Wald test: check whether 0 falls inside the 95% confidence interval around \hat{\theta}_j; if it falls outside, reject the null hypothesis
P-value Interpretation
-
P-value: Probability of seeing something at least as extreme as the observed \hat{\theta}_j, assuming the null hypothesis \theta_j^* = 0 is true
-
Reject the null hypothesis if p-value < 0.05
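The p-value rule can be made concrete with a two-sided Wald test. This is a minimal sketch assuming the estimator is approximately normal; the function name `wald_test` and the example numbers (an estimate of 0.5 or 0.3 with standard error 0.2) are made up for illustration.

```python
from statistics import NormalDist


def wald_test(theta_hat, std_error, null_value=0.0, alpha=0.05):
    """Two-sided Wald test of theta* = null_value under a normal approximation."""
    # Number of standard errors between the estimate and the null value.
    z = (theta_hat - null_value) / std_error
    # Probability of a result at least this extreme if the null is true.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


z_a, p_a, reject_a = wald_test(theta_hat=0.5, std_error=0.2)
z_b, p_b, reject_b = wald_test(theta_hat=0.3, std_error=0.2)

print(f"estimate 0.5: z={z_a:.2f}, p={p_a:.4f}, reject={reject_a}")
print(f"estimate 0.3: z={z_b:.2f}, p={p_b:.4f}, reject={reject_b}")
```

The first estimate is 2.5 standard errors from zero and is rejected at the 0.05 level; the second is only 1.5 standard errors away and is not.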
Important Interpretation Notes
-
Hypothesis tests must be interpreted carefully
-
Many papers' interpretations are wrong
-
When the null is not rejected, any of the following may hold:
-
No effect: \theta_j^* is zero
-
Small effect: \theta_j^* is non-zero, but too close to zero for the data to tell
-
Too little data: \theta_j^* is non-zero, but the dataset is too small to provide evidence
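Why a failure to reject is ambiguous can be seen by computing how often the test would reject at all. The sketch below assumes a simplified setting not taken from the notes: the standard error shrinks as `noise_std / sqrt(n)` (as it does for a sample mean), and the estimator is exactly normal. `rejection_probability` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def rejection_probability(theta_star, noise_std, n, alpha=0.05):
    """Chance that a two-sided Wald test rejects theta* = 0 at level alpha."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Assumption: standard error scales like noise_std / sqrt(n).
    std_error = noise_std / math.sqrt(n)
    # True effect measured in standard-error units.
    shift = theta_star / std_error
    upper_tail = 1.0 - std_normal.cdf(z_crit - shift)
    lower_tail = std_normal.cdf(-z_crit - shift)
    return upper_tail + lower_tail


no_effect = rejection_probability(theta_star=0.0, noise_std=1.0, n=100)
small_sample = rejection_probability(theta_star=0.1, noise_std=1.0, n=25)
large_sample = rejection_probability(theta_star=0.1, noise_std=1.0, n=2500)

print(f"no effect, n=100: {no_effect:.3f}")
print(f"effect 0.1, n=25: {small_sample:.3f}")
print(f"effect 0.1, n=2500: {large_sample:.3f}")
```

With the same non-zero effect, the small sample rejects only slightly more often than it would under no effect at all, while the large sample rejects almost always. A non-rejection on the small dataset therefore says little about whether the effect is truly zero.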
[The graphs shown include normal distributions with confidence intervals and critical regions for hypothesis testing. They illustrate the concepts of confidence intervals and hypothesis testing regions.]