STATS 3DS3 (Winter 2021, McMaster)
Syllabus
Course overview
An introduction to data science theory is provided with some focus on analytics. Topics covered include an introduction to R and other appropriate computational platforms, data types, data manipulation, data frames, data visualization, data reporting, statistical/machine learning, classification, clustering, cross-validation, classification and regression trees, gradient boosting, ridge regression, LASSO, and generalized additive models. Familiarity with a computing package, e.g., SAS, Python, or MATLAB, is required. This course includes a scientific communication component.
Expected outcomes
Upon completion of this course, the student will be able to:
Course Information
2 Lectures and 1 Lab.
Prerequisites
One of ECON 3EE3 or PNB 3XE3 or SFWRTECH 4DA3 or STATS 3A03.
Textbook
- Suggested textbooks:
- ISLR: An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani (Springer, 1st ed., 2013); available at the campus store.
- RDS: R for Data Science by Garrett Grolemund and Hadley Wickham; available online.
- RMC: R Markdown Cookbook by Yihui Xie, Christophe Dervieux, and Emily Riederer; available online.
Software
This course uses R and RStudio, which are both free. We recommend setting up the computing environment early. We will also use the first-week lab session to ensure everyone is up and running with the computing environment.
- Install the following software (follow the interactive tutorial to install R and RStudio):
- R (required): https://www.r-project.org/.
- RStudio is highly recommended for syntax highlighting, package management, document generation, and more: https://www.rstudio.com/.
- The newest version of RStudio is highly recommended.
- LaTeX, which will enable you to create PDFs directly from R Markdown in RStudio.
Evaluation
The homework assignments and bonus points will determine the final letter grade for this course:
- Six homework assignments will be assigned.
- The best five assignments will count towards your overall grade.
- Each of your five best assignments will be worth 20% of the final grade.
The final percentage to letter grade conversion will follow McMaster’s Grading Scale.
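The best-five-of-six rule above can be illustrated with a short calculation. This is only a sketch, not the official grading procedure: the function name `final_percentage` is hypothetical, each assignment is assumed to be scored out of 100, an unsubmitted assignment is assumed to score 0, and bonus points are assumed to be added directly to the percentage with the result capped at 100 (the syllabus does not say how bonus points are applied).

```python
def final_percentage(scores, bonus=0.0, n_counted=5, weight=20.0):
    """Return a final percentage from homework scores (each out of 100)."""
    # Assumption: an unsubmitted assignment counts as a score of 0.
    padded = list(scores) + [0.0] * max(0, n_counted - len(scores))
    # Keep only the best n_counted assignments (five of six in this course).
    best = sorted(padded, reverse=True)[:n_counted]
    # Each counted assignment is worth `weight` percent of the final grade.
    total = sum(s * weight / 100.0 for s in best)
    # Assumption: bonus points are added to the percentage, capped at 100.
    return min(100.0, total + bonus)


if __name__ == "__main__":
    homework = [80, 95, 70, 100, 60, 90]  # made-up scores for illustration
    print(final_percentage(homework))  # prints 87.0 (the 60 is dropped)
```

The resulting percentage would then be converted to a letter grade using McMaster's Grading Scale, which is not reproduced here.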
Lecture notes
Course Schedule
Date | Week | Topic | Readings | Notes |
---|---|---|---|---|
01/11/2021 or 01/12/2021 | Week 1 (Lab 1) | Set up R, RStudio, and an R Markdown example | Interactive tutorial to install R and RStudio, RMarkdown for Scientists Chapters 1-5 | |
01/13/2021 | Week 1 (Lecture 1) | Course introduction | RDS: 1, 2, and 4, RMarkdown for Scientists Chapters 6-13 | |
01/15/2021 | Week 1 (Lecture 2) | Data visualization I | RDS: 3, 7.1-7.2, Modern Statistics for Modern Biology: Chapter 3 | |
01/18/2021 or 01/19/2021 | Week 2 (Lab 2) | Week 1 computing in R | Homework assignment template and submission, R tips - RDS: 4, RStudio diagnostics - RDS: 6, Data visualization - RDS: 7.1-7.2 | |
01/20/2021 | Week 2 (Lecture 3) | Data visualization II | RDS: 5, 7.3-7.8 | Homework 1 posted |
01/22/2021 | Week 2 (Lecture 4) | Data visualization III | RDS: 7.3-7.8, Word clouds - Text Mining With R Case study: comparing Twitter archives, Network - Modern Statistics for Modern Biology Chapter 10, Time series plots - https://www.r-graph-gallery.com/279-plotting-time-series-with-ggplot2.html | |
01/25/2021 or 01/26/2021 | Week 3 (Lab 3) | Week 2 computing in R | RDS: Chapter 8 (create an RProject for STATS3DS3), R code for Lecture 3, R code for Lecture 4, Word clouds, Network, Time series plots | |
01/27/2021 | Week 3 (Lecture 5) | Interactive visualization and Shiny | Watch the video | |
01/29/2021 | Week 3 (Lecture 6) | Classification | ISLR: Pages 39 - 42 (K-Nearest Neighbors) | Homework 1 Due |
02/01/2021 or 02/02/2021 | Week 4 (Lab 4) | Week 3 computing in R | RMarkdown - RDS: 26-27.4.2, Shiny (Lecture 5), KNN Classifier (Lecture 6) | Homework 2 posted |
02/03/2021 | Week 4 (Lecture 7) | Classification tree | ISLR: 8.1, 8.1.2, 8.1.4 | |
02/05/2021 | Week 4 (Lecture 8) | Regression tree | ISLR: 8.1.1, 8.1.4 | |
02/08/2021 or 02/09/2021 | Week 5 (Lab 5) | Week 4 computing in R | Classification trees, Strings in R - RDS: 14 | |
02/10/2021 | Week 5 (Lecture 9) | Cross-validation | ISLR: 5.1 | |
02/12/2021 | Week 5 (Lecture 10) | Bagging | ISLR: 8.2.1 | Homework 3 posted, Homework 2 Due |
02/15/2021 or 02/16/2021 | Week 6 | Midterm recess | ||
02/17/2021 | Week 6 | Midterm recess | ||
02/19/2021 | Week 6 | Midterm recess | ||
02/22/2021 or 02/23/2021 | Week 7 (Lab 6) | Week 5 computing in R | ISLR: 8.3.2 (regression tree), ISLR: 8.3.3 (bagging) | |
02/24/2021 | Week 7 (Lecture 11) | Random forest and boosting | ISLR: 8.2.2, 8.2.3 | |
02/26/2021 | Week 7 (Lecture 12) | Neural network | The Elements of Statistical Learning: 11.1-11.4 | Homework 4 posted, Homework 3 Due |
03/01/2021 or 03/02/2021 | Week 8 (Lab 7) | Week 7 computing in R | ISLR: 8.3.3, 8.3.4 (RF and boosting), classification using NN | |
03/03/2021 | Week 8 (Lecture 13) | Clustering I | ISLR: 10.3.1 | |
03/05/2021 | Week 8 (Lecture 14) | Clustering II | ISLR: 10.3.1 | |
03/08/2021 or 03/09/2021 | Week 9 (Lab 8) | Week 8 computing in R | ||
03/10/2021 | Week 9 (Lecture 15) | PCA | ISLR: 10.2 | |
03/12/2021 | Week 9 (Lecture 16) | Discriminant analysis I | ISLR: 4.4.1, 4.4.2 | Homework 5 posted, Homework 4 Due |
03/15/2021 or 03/16/2021 | Week 10 (Lab 9) | Week 9 computing in R | ||
03/17/2021 | Week 10 (Lecture 17) | Discriminant analysis II | ISLR: 4.4.3, 4.4.4 | |
03/19/2021 | Week 10 (Lecture 18) | Subset selection | ISLR: 6.1 | |
03/22/2021 or 03/23/2021 | Week 11 (Lab 10) | Week 10 computing in R | ||
03/24/2021 | Week 11 (Lecture 19) | Penalized regression I | ISLR: 2.2, 6.2.1 | |
03/26/2021 | Week 11 (Lecture 20) | Penalized regression II | ISLR: 6.2.2, 6.2.3 | Homework 6 posted, Homework 5 Due |
03/29/2021 or 03/30/2021 | Week 12 (Lab 11) | Week 11 computing in R | ||
03/31/2021 | Week 12 (Lecture 21) | Logistic regression I | ISLR: 4.3.1 | |
04/02/2021 | Good Friday: No classes or examinations | |||
04/05/2021 or 04/06/2021 | Week 13 (Lab 12) | Week 12 computing in R | ||
04/07/2021 | Week 13 (Lecture 22) | Logistic regression II | ISLR: 4.3.2, 4.3.3 | |
04/09/2021 | Week 13 (Lecture 23) | Sensitivity and specificity | | Homework 6 Due |
04/12/2021 or 04/13/2021 | Week 14 (Lab 13) | Week 13 computing in R | ||
04/14/2021 | Week 14 (Lecture 24) | Wrap-up | ||
R Markdown files
Available upon request.