This course introduces machine learning for business analytics, including linear, logistic, and penalized regression. Emphasis is on building interpretable models, evaluating assumptions, and communicating results, with real-world projects connecting modeling techniques to business decision-making. Prerequisites: DATA 3100 and DATA 3300.
By the end of this course, students will be able to:
- Build, evaluate, and interpret models to inform decision-making for non-technical stakeholders.
- Diagnose and address violations of model assumptions to ensure appropriate model use.
- Communicate model results clearly in a business context.
Successful students in this course will demonstrate conceptual understanding and skill mastery by applying the modeling workflow within their chosen business context and as part of a group. Each student is an essential member of a community of learners and should consider the instructor as both a teacher and a mentor.
Students can focus on learning by using the following study tips:
- Prepare for class by previewing material and identifying questions.
- Engage during class by asking questions, taking notes, and actively coding.
- Apply what you learn in class by completing exercises and working on projects.
- Evaluate what you’re learning by reviewing and reflecting on course materials and exercise solutions.
- Reinforce what you’re learning by utilizing office hours and working with classmates.
After completing the course, students' resumes should reflect the tools, skills, and methods learned and showcase the projects completed.
DATA 5600 is the foundational prerequisite for the subsequent courses in the modeling sequence: DATA 5610 Advanced Machine Learning for Analytics, DATA 5620 Advanced Regression for Causal Inference, and DATA 5630 Deep Forecasting.
Each student will need to bring a laptop, either their own or one rented from Utah State. While students are welcome to use their preferred tools, the following data stack is recommended and certain tools are required, as indicated below.
Python is a general-purpose, open source programming language developed by computer scientists. It is the most commonly used programming language for data wrangling, visualization, and modeling. Python is required for the course. See the data stack training for details on how to best install and manage Python versions and project environments.
Apart from an open source programming language, a code editor or integrated development environment (IDE) is a data analyst’s most important tool. Positron is a next-generation data science IDE. Built on VS Code’s open source core, Positron combines the multilingual extensibility of VS Code with essential data tools common to language-specific IDEs. See the data stack training for a summary of Positron’s data-friendly features.
GitHub is an online hosting service for project repositories managed using Git, a powerful version control system and the industry standard for software development and data projects. Git and GitHub facilitate collaboration on a single code base and enable students to organize an online portfolio of work. See the data stack training for the basics of using Git and GitHub and a project template.
Quarto is an open source publishing system that combines text, code, and output. Quarto documents are similar to Jupyter notebooks, except the content can be rendered into a variety of formats, including PDFs, Word documents, PowerPoint presentations, Revealjs slide decks, interactive dashboards, and websites. While Quarto is not required for the course, students will be required to submit code and output in PDF format. See the data stack training for more details on Quarto, including how to use Quarto to render a Jupyter notebook into a PDF.
Students may use their preferred AI to assist in studying and completing assignments. All students have access to Copilot through Utah State. However, students must remember that the objective of this course is learning. AI can contribute to learning, including helping to debug code and explain concepts in new ways. AI can also be a detriment to learning, including when students use AI to think for them. See the data stack training for details on getting access to AI and a discussion on using AI responsibly.
Assignments are designed to be aligned with what students will be expected to do in practice. No credit will be given for late work unless an arrangement is made prior to the relevant deadline. Students are encouraged to review their graded work and ask questions to avoid repeated mistakes.
Letter grades will follow the standard rubric and will be determined as follows.
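| Grade | Range | Grade | Range | Grade | Range |
|-------|-------|-------|-------|-------|-------|
| A | 93-100% | B- | 80-82% | D+ | 67-69% |
| A- | 90-92% | C+ | 77-79% | D | 63-66% |
| B+ | 87-89% | C | 73-76% | D- | 60-62% |
| B | 83-86% | C- | 70-72% | E | 0-59% |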
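
The cutoffs in the table above can be read as a simple lookup. The following is a minimal Python sketch, not an official grading tool. The names `GRADE_FLOORS` and `letter_grade` are hypothetical, and because the syllabus does not say how fractional percentages are rounded, the function accepts whole-number percentages only and leaves any rounding to the caller.

```python
# Lowest whole-number percentage for each letter grade, copied from the table.
# Ordered from highest to lowest so the first match is the correct grade.
GRADE_FLOORS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
    (80, "B-"), (77, "C+"), (73, "C"), (70, "C-"),
    (67, "D+"), (63, "D"), (60, "D-"), (0, "E"),
]


def letter_grade(percent: int) -> str:
    """Return the letter grade for a whole-number percentage from 0 to 100."""
    # Assumption: rounding is not specified in the syllabus, so only
    # whole numbers are accepted here.
    if not isinstance(percent, int) or not 0 <= percent <= 100:
        raise ValueError("percent must be a whole number between 0 and 100")
    for floor, letter in GRADE_FLOORS:
        if percent >= floor:
            return letter
    raise AssertionError("unreachable: the 0 floor matches every valid input")


if __name__ == "__main__":
    for score in (95, 86, 72, 59):
        print(score, letter_grade(score))
```

Storing only the lower bound of each range keeps the lookup short and makes the boundaries easy to check against the table.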
Each lecture ends with an exercise designed to help students practice what was covered in class and prepare to apply it to their projects. Each exercise is due before the following class. While students are encouraged to work together, each student is required to submit their own work. Each class begins with a student being called on at random to share their exercise solution. Additionally, for each exercise, every student will be randomly assigned to review one other student’s exercise solution, rating the work on a scale of 1 to 3 (“Needs Improvement,” “Good,” or “Excellent”), by the end of the week in which the exercise was due.
Students won’t get credit for an exercise if they don’t submit their exercise on time, aren’t prepared to share their exercise when called on at random, or don’t complete their randomly assigned exercise review on time.
Interviews are an opportunity for students to demonstrate their personal understanding and prepare for future real-world job interviews. Designed to complement exercise practice and group project work, interviews will include questions about course concepts, exercise and project work (including code), and reflections on performance in the course.
Interviews with the instructor will occur at the beginning, middle, and end of the semester during office hours or by appointment.
Projects are the focus of learning by doing in the course, serving as the means for students to apply their conceptual understanding and skill mastery both as a group and within their business domain of interest. Students will complete two group projects, one focused on regression and another focused on classification. For each project, the group will give a presentation and submit a report.
The week before the presentations, groups will submit a draft of their slides to get feedback and have time for revision. The other students in the class, as well as the group members themselves, will help evaluate each of the presentations.
Please note that the instructor reserves the right to change the following schedule at any time and will give students sufficient notice of any changes that affect assignment deadlines.
- Regression and Machine Learning
- Modeling Workflow
- Decisions and Data
- Probability and Statistics
- Linear Models
- Validity, Representativeness, and Linearity
- Independence, Constant Variance, Normality, and Identifiability
- Ordinary Least Squares
- Frequentist and Bayesian Inference
- Model Evaluation and Prediction
- Communicating Results
- Presentations
- Asymmetric Loss
- Generalized Linear Models
- Logistic Regression
- Maximum Likelihood Estimation
- Spring Break
- Hyperparameters
- Confusion and Cross-Validation
- Penalized Regression
- Ridge Regression, LASSO, and Elastic Net
- Dimensionality Reduction
- Principal Component Regression
- Interactions
- Multilevel Models
- Presentations
- Regression and Other Stories

