STATS 191 (Autumn 2019, Stanford)
Description
Syllabus
Course Overview
Statistical tools for modern data analysis. Topics include regression and prediction, elements of the analysis of variance, bootstrap, and cross-validation. Emphasis is on conceptual rather than theoretical understanding. Student assignments require use of the software package R.
Expected outcomes
By the end of the course, students should be able to:
- Enter tabular data using R.
- Plot data using R, to help in exploratory data analysis.
- Formulate regression models for the data, while understanding some of the limitations and assumptions implicit in using these models.
- Fit models using R and interpret the output.
- Test for associations in a given model.
- Use diagnostic plots and tests to assess the adequacy of a particular model.
- Find confidence intervals for the effects of different explanatory variables in the model.
- Use some basic model selection procedures, as found in R, to find a best model in a class of models.
- Fit simple ANOVA models in R, treating them as special cases of multiple regression models.
- Fit simple logistic and Poisson regression models.
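As a rough illustration of what several of these outcomes look like in practice, here is a minimal R sketch. It is not course material: it assumes R's built-in `mtcars` data set as a stand-in for real data, and the particular variables and models are chosen only for demonstration.

```r
# Illustrative sketch only: uses R's built-in `mtcars` data, not course data.
data(mtcars)

# Exploratory plot of the response against one explanatory variable.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Fit a multiple linear regression and inspect the output.
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)  # coefficient estimates, t-tests, R-squared

# 95% confidence intervals for the coefficients.
ci <- confint(fit, level = 0.95)
print(ci)

# Standard diagnostic plots for the fitted model.
par(mfrow = c(2, 2))
plot(fit)
par(mfrow = c(1, 1))

# One-way ANOVA expressed as a regression on a factor.
anova_fit <- lm(mpg ~ factor(cyl), data = mtcars)
anova(anova_fit)

# Logistic regression for a binary response (transmission type, coded 0/1).
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Poisson regression for a count response (number of carburetors).
pois_fit <- glm(carb ~ hp, data = mtcars, family = poisson)
```

With your own data, the first step would typically be `read.csv()` or a similar import function instead of `data(mtcars)`.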
Course Information
- Term: Autumn 2019
- Units: 3
Textbook
- Required:
- (CH) Regression Analysis by Example.
- Authors: Samprit Chatterjee, Ali S. Hadi
- Edition: 5th Edition
- Print ISBN: 978-0-470-90584-5
Software
- In this course, we will use R for computing and R Markdown for producing lecture slides and homework solutions. Writing your homework solutions in R Markdown is highly recommended. Install the following software:
- R (required): https://www.r-project.org/.
- RStudio (highly recommended) for syntax highlighting, package management, document generation, and more: https://www.rstudio.com/.
- The newest version of RStudio is highly recommended.
- LaTeX, which will enable you to create PDFs directly from R Markdown in RStudio.
Evaluation
The final letter grade for this course will be determined by each method of assessment weighted as follows:
- 7 weekly homework assignments (55%)
- Midterm examination (15%, Wednesday, 10/23/2019)
- Final examination (30%, according to Stanford calendar: Wednesday, 12/11/2019 @ 3:30 PM, location TBD)
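As a sketch of how these weights combine, assume each component is scored on a 0 to 100 scale, and write \(H\) for the homework average, \(M\) for the midterm score, and \(F\) for the final examination score (these symbols are introduced here only for illustration). The overall course score would then be

\[
\text{Score} = 0.55\,H + 0.15\,M + 0.30\,F .
\]

For example, \(H = 90\), \(M = 80\), and \(F = 85\) give \(0.55(90) + 0.15(80) + 0.30(85) = 49.5 + 12 + 25.5 = 87\). How scores map to letter grades is not specified here.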
Lecture Notes
Course Schedule
Date | Week | Topic | Reading | Notes |
---|---|---|---|---|
09/23/2019 | Week 1 Lecture 1 | Course introduction and review | Syllabus | – |
09/25/2019 | Week 1 Lecture 2 | Review | CH: 1 | – |
09/27/2019 | Week 1 Lecture 3 | Some tips on R | – | Homework 1 posted |
09/30/2019 | Week 2 Lecture 4 | Simple linear regression 1 (introduction, correlation, model, estimation) | CH: 2.1-2.4 | – |
10/02/2019 | Week 2 Lecture 5 | Simple linear regression 2 (inference and prediction) | CH: 2.5-2.8 | – |
10/04/2019 | Week 2 Lecture 6 | Diagnostics for simple linear regression | CH: 2.9 | Homework 2 posted, Homework 1 Due |
10/07/2019 | Week 3 Lecture 7 | Multiple linear regression 1 (introduction, model, estimation, geometry of least squares) | CH: 3.1-3.5 | – |
10/09/2019 | Week 3 Lecture 8 | Multiple linear regression 2 (interpretation, matrix formulation, estimation, inference) | CH: 3.6-3.9 | – |
10/11/2019 | Week 3 Lecture 9 | Multiple linear regression 3 (prediction, contrasts, testing) | CH: 3.10-3.11 | Homework 3 posted, Homework 2 Due |
10/14/2019 | Week 4 Lecture 10 | Diagnostics in multiple linear regression (types of residuals, influence) | CH: 4 | – |
10/16/2019 | Week 4 Lecture 11 | Diagnostics in multiple linear regression (outlier detection, residual plots) | CH: 4 | – |
10/18/2019 | Week 4 Lecture 12 | Interactions and qualitative variables (interactions) | CH: 5 | Homework 4 posted, Homework 3 Due |
10/21/2019 | Week 5 Lecture 13 | Interactions and qualitative variables (visualization, ANOVA) | CH: 5 | – |
10/23/2019 | – | – | – | Midterm examination |
10/25/2019 | Week 5 Lecture 14 | ANOVA models (one-way ANOVA, testing, contrasts) | CH: 5 | – |
10/28/2019 | Week 6 Lecture 15 | ANOVA models (two-way ANOVA, testing, contrasts, mixed effects model) | CH: 5 | – |
10/30/2019 | Week 6 Lecture 16 | Transformations and weighted least squares | CH: 6, 7 | – |
11/01/2019 | Week 6 Lecture 17 | Correlated errors | CH: 8, 9 | Homework 5 posted, Homework 4 Due |
11/04/2019 | Week 7 Lecture 18 | Correlated errors | CH: 8, 9 | – |
11/06/2019 | Week 7 Lecture 19 | Bootstrapping regression | An Introduction to the Bootstrap by Bradley Efron, Robert Tibshirani, Chapter 9 | – |
11/08/2019 | Week 7 Lecture 20 | Model selection | CH: 11 | Homework 6 posted, Homework 5 Due |
11/11/2019 | Week 8 Lecture 21 | Model selection | CH: 11 | – |
11/13/2019 | Week 8 Lecture 22 | Model selection | CH: 11 | – |
11/15/2019 | Week 8 Lecture 23 | Penalized regression | CH: 10 | Homework 7 posted, Homework 6 Due |
11/18/2019 | Week 9 Lecture 24 | Penalized regression | CH: 10 | – |
11/20/2019 | Week 9 Lecture 25 | Penalized regression | CH: 10 | – |
11/22/2019 | Week 9 Lecture 26 | Logistic regression | CH: 12 | Homework 7 Due |
11/25/2019 | – | – | – | Thanksgiving Recess (no classes) |
11/27/2019 | – | – | – | Thanksgiving Recess (no classes) |
11/29/2019 | – | – | – | Thanksgiving Recess (no classes) |
12/02/2019 | Week 10 Lecture 27 | Logistic regression | CH: 12 | – |
12/04/2019 | Week 10 Lecture 28 | Poisson regression | CH: 13.3 | – |
12/06/2019 | Week 10 Lecture 29 | Final Review | Review will be posted | – |
12/11/2019 | – | – | – | End-Quarter examinations |
R Markdown files
The R Markdown files used to create the lecture slides and PDFs are available at https://github.com/PratheepaJ/STATS191.