STATS 3DS3 (Winter 2021, McMaster)

Description

Course link.

Syllabus

Course overview

An introduction to data science theory is provided with some focus on analytics. Topics covered include an introduction to R and other appropriate computational platforms, data types, data manipulation, data frames, data visualization, data reporting, statistical/machine learning, classification, clustering, cross-validation, classification and regression trees, gradient boosting, ridge regression, LASSO, and generalized additive models. Familiarity with some computer package, e.g., SAS, Python, or MATLAB, is required. This course includes a scientific communication component.

Expected outcomes

Upon completion of this course, the student will be able to:

  • explore data using visualization tools in R
  • perform analysis using unsupervised and supervised learning methods
  • analyze a real data set of moderate size using R and interpret the output
  • write reusable data analysis reports using R, RStudio, and RMarkdown

Course Information

Two lectures and one lab per week.

Prerequisites

One of ECON 3EE3 or PNB 3XE3 or SFWRTECH 4DA3 or STATS 3A03.

Textbook

Software

This course uses R and RStudio, both of which are free. We recommend setting up the computing environment early. We will also use the first-week lab session to make sure everyone's computing environment is up and running.

  • Install the following software (an interactive tutorial covers installing R and RStudio):
    • R (required): https://www.r-project.org/.
    • RStudio is highly recommended for syntax highlighting, package management, document generation, and more: https://www.rstudio.com/.
      • The newest version of RStudio is highly recommended.
    • LaTeX, which lets you create PDFs directly from RMarkdown in RStudio.
      • Install the tinytex package: install.packages("tinytex", repos = "https://cloud.r-project.org/").
      • After installing tinytex, close RStudio.
      • Reopen RStudio.
      • Run the following to install the TinyTeX distribution: tinytex::install_tinytex().
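
Once the installation steps above are complete, a quick check along the following lines may help confirm that the LaTeX toolchain is visible from R. This is only a minimal sketch: the helper name check_latex_setup is hypothetical (not part of the course materials), and it assumes that PDF output relies on a pdflatex executable being on the PATH.

```r
# Hypothetical helper: reports whether the pieces installed above can be found.
check_latex_setup <- function() {
  # TRUE if the tinytex R package is installed.
  has_pkg <- requireNamespace("tinytex", quietly = TRUE)
  # TRUE if a pdflatex executable is visible on the PATH (assumption:
  # this is the engine used when knitting RMarkdown to PDF).
  has_pdflatex <- nzchar(unname(Sys.which("pdflatex")))
  c(tinytex_package = has_pkg, pdflatex_on_path = has_pdflatex)
}

check_latex_setup()
```

If either entry is FALSE, repeating the corresponding installation step (and restarting RStudio) is a reasonable first thing to try.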

Evaluation

The homework assignments and bonus points will determine the final letter grade for this course:

  • Six homework assignments will be assigned.
  • The best five assignments will count towards your overall grade.
  • Each of your five best assignments will be worth 20% of the final grade.
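
The best-five-of-six rule above can be illustrated with a short calculation in R. This is a sketch only: the scores are made up for illustration, and bonus points are not modelled because their weighting is not specified here.

```r
# Hypothetical scores (in percent) on the six homework assignments.
scores <- c(82, 91, 67, 88, 75, 95)

# Keep the five highest scores; the lowest one is dropped.
best_five <- sort(scores, decreasing = TRUE)[1:5]

# Each retained assignment is worth 20% of the final grade.
final_grade <- sum(0.20 * best_five)
final_grade
```

With these made-up scores, the 67 is dropped and the remaining five assignments give a final percentage of 86.2 before any bonus points.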

The final percentage to letter grade conversion will follow McMaster’s Grading Scale.

Lecture notes

Course Schedule

| Date | Week | Topic | Readings | Notes |
| 01/11/2021 or 01/12/2021 | Week 1 (Lab 1) | Set up R, RStudio, RMarkdown example | Interactive tutorial to install R and RStudio, RMarkdown for Scientists Chapters 1-5 | |
| 01/13/2021 | Week 1 (Lecture 1) | Course introduction | RDS: 1, 2, and 4, RMarkdown for Scientists Chapters 6-13 | |
| 01/15/2021 | Week 1 (Lecture 2) | Data visualization I | RDS: 3, 7.1-7.2, Modern Statistics for Modern Biology: Chapter 3 | |
| 01/18/2021 or 01/19/2021 | Week 2 (Lab 2) | Week 1 computing in R | Homework assignment template and submission, R tips - RDS: 4, RStudio diagnostics - RDS: 6, Data visualization - RDS: 7.1-7.2 | |
| 01/20/2021 | Week 2 (Lecture 3) | Data visualization II | RDS: 5, 7.3-7.8 | Homework 1 posted |
| 01/22/2021 | Week 2 (Lecture 4) | Data visualization III | RDS: 7.3-7.8, Word clouds - Text Mining With R Case study: comparing Twitter archives, Network - Modern Statistics for Modern Biology Chapter 10, Time series plots - https://www.r-graph-gallery.com/279-plotting-time-series-with-ggplot2.html | |
| 01/25/2021 or 01/26/2021 | Week 3 (Lab 3) | Week 2 computing in R | RDS: Chapter 8 (create an RProject for STATS3DS3), R codes for Lecture 3, R codes for Lecture 4, Word clouds, Network, Time series plots | |
| 01/27/2021 | Week 3 (Lecture 5) | Interactive visualization and Shiny | Watch the video | |
| 01/29/2021 | Week 3 (Lecture 6) | Classification | ISLR: Pages 39-42 (K-Nearest Neighbors) | Homework 1 due |
| 02/01/2021 or 02/02/2021 | Week 4 (Lab 4) | Week 3 computing in R | RMarkdown - RDS: 26-27.4.2, Shiny (Lecture 5), KNN Classifier (Lecture 6) | Homework 2 posted |
| 02/03/2021 | Week 4 (Lecture 7) | Classification tree | ISLR: 8.1, 8.1.2, 8.1.4 | |
| 02/05/2021 | Week 4 (Lecture 8) | Regression tree | ISLR: 8.1.1, 8.1.4 | |
| 02/08/2021 or 02/09/2021 | Week 5 (Lab 5) | Week 4 computing in R | Classification trees, Strings in R - RDS: 14 | |
| 02/10/2021 | Week 5 (Lecture 9) | Cross-validation | ISLR: 5.1 | |
| 02/12/2021 | Week 5 (Lecture 10) | Bagging | ISLR: 8.2.1 | Homework 3 posted, Homework 2 due |
| 02/15/2021 or 02/16/2021 | Week 6 | Midterm recess | | |
| 02/17/2021 | Week 6 | Midterm recess | | |
| 02/19/2021 | Week 6 | Midterm recess | | |
| 02/22/2021 or 02/23/2021 | Week 7 (Lab 6) | Week 5 computing in R | ISLR: 8.3.2 (regression tree), ISLR: 8.3.3 (bagging) | |
| 02/24/2021 | Week 7 (Lecture 11) | Random forest and boosting | ISLR: 8.2.2, 8.2.3 | |
| 02/26/2021 | Week 7 (Lecture 12) | Neural network | The Elements of Statistical Learning: 11.1-11.4 | Homework 4 posted, Homework 3 due |
| 03/01/2021 or 03/02/2021 | Week 8 (Lab 7) | Week 7 computing in R | ISLR: 8.3.3, 8.3.4 (RF and boosting), classification using NN | |
| 03/03/2021 | Week 8 (Lecture 13) | Clustering I | ISLR: 10.3.1 | |
| 03/05/2021 | Week 8 (Lecture 14) | Clustering II | ISLR: 10.3.1 | |
| 03/08/2021 or 03/09/2021 | Week 9 (Lab 8) | Week 8 computing in R | | |
| 03/10/2021 | Week 9 (Lecture 15) | PCA | ISLR: 10.2 | |
| 03/12/2021 | Week 9 (Lecture 16) | Discriminant analysis I | ISLR: 4.4.1, 4.4.2 | Homework 5 posted, Homework 4 due |
| 03/15/2021 or 03/16/2021 | Week 10 (Lab 9) | Week 9 computing in R | | |
| 03/17/2021 | Week 10 (Lecture 17) | Discriminant analysis II | ISLR: 4.4.3, 4.4.4 | |
| 03/19/2021 | Week 10 (Lecture 18) | Subset selection | ISLR: 6.1 | |
| 03/22/2021 or 03/23/2021 | Week 11 (Lab 10) | Week 10 computing in R | | |
| 03/24/2021 | Week 11 (Lecture 19) | Penalized regression I | ISLR: 2.2, 6.2.1 | |
| 03/26/2021 | Week 11 (Lecture 20) | Penalized regression II | ISLR: 6.2.2, 6.2.3 | Homework 6 posted, Homework 5 due |
| 03/29/2021 or 03/30/2021 | Week 12 (Lab 11) | Week 11 computing in R | | |
| 03/31/2021 | Week 12 (Lecture 21) | Logistic regression I | ISLR: 4.3.1 | |
| 04/02/2021 | | Good Friday: No classes or examinations | | |
| 04/05/2021 or 04/06/2021 | Week 13 (Lab 12) | Week 12 computing in R | | |
| 04/07/2021 | Week 13 (Lecture 22) | Logistic regression II | ISLR: 4.3.2, 4.3.3 | |
| 04/09/2021 | Week 13 (Lecture 23) | Sensitivity and specificity | | Homework 6 due |
| 04/12/2021 or 04/13/2021 | Week 14 (Lab 13) | Week 13 computing in R | | |
| 04/14/2021 | Week 14 (Lecture 24) | Wrap-up | | |

R Markdown files

Available upon request.