DATA 5600 Introduction to Regression and Machine Learning for Analytics

This course introduces machine learning for business analytics, including linear, logistic, and penalized regression. Emphasis is on building interpretable models, evaluating assumptions, and communicating results, with real-world projects connecting modeling techniques to business decision-making. Prerequisites: DATA 3100 and DATA 3300.

By the end of this course, students will be able to:

Build, evaluate, and interpret models to inform decision-making for non-technical stakeholders.
Diagnose and address violations of model assumptions to ensure appropriate model use.
Communicate model results clearly in a business context.

Study and Success

Successful students in this course will demonstrate conceptual understanding and skill mastery by applying the modeling workflow within their chosen business context and as part of a group. Each student is an essential member of a community of learners and should consider the instructor both a teacher and a mentor.

Students can focus on learning by using the following study tips:

Prepare for class by previewing material and identifying questions.
Engage during class by asking questions, taking notes, and actively coding.
Apply what you learn in class by completing exercises and working on projects.
Evaluate what you’re learning by reviewing and reflecting on course materials and exercise solutions.
Reinforce what you’re learning by utilizing office hours and working with classmates.

After completing the course, students should update their resumes to reflect the tools, skills, and methods they have learned and to showcase the projects they have completed.

DATA 5600 is the foundation and prerequisite for the subsequent courses in the modeling sequence: DATA 5610 Advanced Machine Learning for Analytics, DATA 5620 Advanced Regression for Causal Inference, and DATA 5630 Deep Forecasting.

Data Stack

Each student will need to bring a laptop, either their own or one rented from Utah State. While students are welcome to use their preferred tools, the following data stack is recommended and certain tools are required, as indicated below.

Python

Python is a general-purpose, open source programming language developed by computer scientists. It is the most commonly used programming language for data wrangling, visualization, and modeling. Python is required for the course. See the data stack training for details on how best to install and manage Python versions and project environments.

Positron

Aside from the programming language itself, a code editor or integrated development environment (IDE) is a data analyst’s most important tool. Positron is a next-generation data science IDE. Built on VS Code’s open source core, Positron combines the multilingual extensibility of VS Code with essential data tools common to language-specific IDEs. See the data stack training for a summary of Positron’s data-friendly features.

GitHub

GitHub is an online hosting service for project repositories managed using Git, a powerful version control system and the industry standard for software development and data projects. Git and GitHub facilitate collaboration on a single code base and enable students to organize an online portfolio of work. See the data stack training for the basics of using Git and GitHub and a project template.

Quarto

Quarto is an open source publishing system that combines text, code, and output. Quarto documents are similar to Jupyter notebooks, except the content can be rendered into a variety of formats, including PDFs, Word documents, PowerPoint presentations, Revealjs slide decks, interactive dashboards, websites, etc. While Quarto is not required for the course, students will be required to submit code and output in a PDF format. See the data stack training for more details on Quarto, including how to use Quarto to render a Jupyter notebook into a PDF.
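The notebook-to-PDF step mentioned above can be sketched in Python as a thin wrapper around the Quarto command line. This is a minimal sketch, not the procedure from the data stack training: the function names and the file name `exercise.ipynb` are hypothetical, and it assumes Quarto and a PDF engine (for example, a LaTeX distribution) are already installed.

```python
import shutil
import subprocess


def build_render_command(notebook: str, to: str = "pdf") -> list[str]:
    """Return the Quarto CLI call that renders `notebook` to format `to`."""
    return ["quarto", "render", notebook, "--to", to]


def render_notebook(notebook: str, to: str = "pdf") -> None:
    """Run Quarto on the notebook; raise if Quarto is missing or rendering fails."""
    if shutil.which("quarto") is None:
        raise RuntimeError("Quarto was not found on PATH")
    subprocess.run(build_render_command(notebook, to), check=True)


# Hypothetical file name, shown only to illustrate the command that would run.
print(" ".join(build_render_command("exercise.ipynb")))
```

By default Quarto renders a notebook from the outputs already stored in it. Adding the `--execute` flag to the command re-runs the cells before rendering.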

Copilot

Students may use their preferred AI tool to assist in studying and completing assignments. All students have access to Copilot through Utah State. However, students must remember that the objective of this course is learning. AI can contribute to learning, for example by helping to debug code and explaining concepts in new ways. AI can also be a detriment to learning, especially when students use it to think for them. See the data stack training for details on getting access to AI and a discussion of using AI responsibly.

Assessment

Assignments are designed to mirror what students will be expected to do in practice. No credit will be given for late work unless an arrangement is made prior to the relevant deadline. Students are encouraged to review their graded work and ask questions to avoid repeated mistakes.

Letter grades will follow the standard rubric and will be determined as follows.


Grade	Range	Grade	Range	Grade	Range
A	93-100%	B-	80-82%	D+	67-69%
A-	90-92%	C+	77-79%	D	63-66%
B+	87-89%	C	73-76%	D-	60-62%
B	83-86%	C-	70-72%	E	0-59%
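As an illustration only, the grade table above can be read as a lookup from a course percentage to a letter grade. The sketch below assumes that fractional percentages are compared against each range's lower bound without rounding; the syllabus does not say how fractions are handled, and the names `GRADE_FLOORS` and `letter_grade` are hypothetical.

```python
# Lower bound of each letter grade, taken from the table above.
GRADE_FLOORS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (67, "D+"), (63, "D"),
    (60, "D-"), (0, "E"),
]


def letter_grade(percent: float) -> str:
    """Map a course percentage (0-100) to a letter grade."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Assumption: no rounding; the first lower bound that the score
    # meets or exceeds determines the grade.
    for floor, letter in GRADE_FLOORS:
        if percent >= floor:
            return letter


print(letter_grade(91))    # A-
print(letter_grade(72.9))  # C-
```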

Exercises (20%)

Each lecture ends with an exercise designed to help students practice what was covered in class and prepare to apply it to their projects. Each exercise is due before the following class. While students are encouraged to work together, each student is required to submit their own work. Each class begins with a student being called on at random to share their exercise solution. Additionally, for each exercise, every student will be randomly assigned to review one other student’s exercise solution, including rating their work from 1-3 (i.e., “Needs Improvement,” “Good,” “Excellent”), by the end of the week that the exercise was due.

Students won’t get credit for an exercise if they don’t submit their exercise on time, aren’t prepared to share their exercise when called on at random, or don’t complete their randomly assigned exercise review on time.

Interviews (30%)

Interviews are an opportunity for students to demonstrate their personal understanding and prepare for future real-world job interviews. Designed to complement exercise practice and group project work, interviews will include questions about course concepts, exercise and project work (including code), and reflections on performance in the course.

Interviews with the instructor will occur at the beginning, middle, and end of the semester during office hours or by appointment.

Projects (50%)

Projects are the focus of learning by doing in the course, serving as the means for students to apply their conceptual understanding and skill mastery both as a group and within their business domain of interest. Students will complete two group projects, one focused on regression and the other on classification. For each project, the group will give a presentation and submit a report.

The week before the presentations, groups will submit a draft of their slides to get feedback and have time for revision. The other students in the class, as well as the group members themselves, will help evaluate each of the presentations.
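The category weights in this section (exercises 20%, interviews 30%, projects 50%) imply a weighted average. The sketch below shows that arithmetic under the assumption that each category is scored on a 0-100 scale; the syllabus states only the weights, and the names and example scores here are hypothetical.

```python
# Category weights from the Assessment section headings.
WEIGHTS = {"exercises": 0.20, "interviews": 0.30, "projects": 0.50}


def course_percentage(scores: dict[str, float]) -> float:
    """Weighted average of category scores, each on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing category scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"exercises": 90.0, "interviews": 80.0, "projects": 85.0}
print(round(course_percentage(example), 1))  # 84.5
```

Because projects carry half of the weight, a change of one point in the project score moves the course percentage by half a point, more than the same change in either of the other categories.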

Schedule

Please note that the instructor reserves the right to change the following schedule at any time and will give students sufficient notice of any change that affects assignment deadlines.

Week 01

Regression and Machine Learning
Modeling Workflow

Week 02

Decisions and Data
Probability and Statistics

Week 03

Linear Models

Week 04

Validity, Representativeness, and Linearity
Independence, Constant Variance, Normality, and Identifiability

Week 05

Ordinary Least Squares
Frequentist and Bayesian Inference

Week 06

Model Evaluation and Prediction
Communicating Results

Week 07

Presentations

Week 08

Asymmetric Loss
Generalized Linear Models

Week 09

Logistic Regression
Maximum Likelihood Estimation

Week 10

Spring Break

Week 11

Hyperparameters
Confusion and Cross-Validation

Week 12

Penalized Regression
Ridge Regression, LASSO, and Elastic Net

Week 13

Dimensionality Reduction
Principal Component Regression

Week 14

Interactions
Multilevel Models

Week 15

Presentations
Regression and Other Stories
