Lecture 3 · Tue, 22 Sept 2026

Foundations and preprocessing pipelines

Data cleaning, feature engineering, and scikit-learn pipelines



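Preprocessing is part of the model, not a separate cleanup step. This lecture covers the key preprocessing tasks (imputation, encoding, and scaling), explains why the split-first rule (split the data before fitting any preprocessing step) prevents data leakage, and shows how scikit-learn’s Pipeline and ColumnTransformer keep the workflow reproducible. We also discuss how preprocessing needs differ across model families: tree-based methods are insensitive to feature scaling, distance-based methods need features on comparable scales, and linear models need categorical features encoded numerically and, when regularized, benefit from scaled features.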
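The workflow described above can be sketched in scikit-learn as follows. This is a minimal illustration, not the course's reference code: the toy data, the column names (`income`, `age`, `region`), and the choice of `LogisticRegression` and `RandomForestClassifier` are all hypothetical. It splits first, then wraps imputation, encoding, and scaling in a `ColumnTransformer` inside a `Pipeline`, and contrasts a linear model (scaled, one-hot encoded) with a tree-based model (no scaling, ordinal encoding).

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "age": rng.integers(18, 70, n).astype(float),
    "region": pd.Series(rng.choice(["north", "south", "west"], n), dtype=object),
})
y = (X["income"] > 50_000).astype(int)
# Inject missing values so the imputers have something to do.
X.loc[rng.choice(n, 20, replace=False), "income"] = np.nan
X.loc[rng.choice(n, 10, replace=False), "region"] = np.nan

# Split first: nothing below ever sees the test rows during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
numeric = ["income", "age"]
categorical = ["region"]

# Linear model: impute, scale numerics, one-hot encode categoricals.
linear_prep = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])
linear_model = Pipeline([("prep", linear_prep),
                         ("clf", LogisticRegression(max_iter=1000))])

# Tree-based model: scaling is unnecessary; ordinal codes suffice for splits.
tree_prep = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("ordinal", OrdinalEncoder(handle_unknown="use_encoded_value",
                                                 unknown_value=-1))]), categorical),
])
tree_model = Pipeline([("prep", tree_prep),
                       ("clf", RandomForestClassifier(random_state=0))])

scores = {}
for name, model in [("logistic", linear_model), ("forest", tree_model)]:
    # fit() learns medians, scaler means/stds, and categories from training rows only.
    model.fit(X_train, y_train)
    # score() reuses those training statistics to transform the test rows.
    scores[name] = model.score(X_test, y_test)
    print(name, round(scores[name], 3))
```

Because every preprocessing step lives inside the pipeline, the same object can be handed to `cross_val_score`, which refits the imputers, scaler, and encoders on each training fold and so avoids leaking validation rows into the preprocessing statistics.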

MST0052 Predictive Modelling with Machine Learning · Fall 2026 · BI Norwegian Business School