**Logistic Regression**

Pros:
- Simple and interpretable
- Fast to train

Cons:
- Assumes linear decision boundaries
- Poorly suited to complex, non-linear relationships

Use cases:
- Credit approval
- Medical diagnosis
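A minimal scikit-learn sketch, assuming synthetic stand-in data and illustrative hyperparameters (none of this comes from the source):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for something like a credit-approval dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
print("first coefficients:", clf.coef_[0][:3])  # interpretable per-feature weights
```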
**Decision Trees**

Pros:
- Intuitive and easy to visualize
- Can model non-linear relationships

Cons:
- Prone to overfitting
- Sensitive to small changes in the data

Use cases:
- Customer segmentation
- Loan default prediction
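A short scikit-learn sketch on made-up data; capping `max_depth` is one common guard against the overfitting noted above, and `export_text` shows the tree's interpretability:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# A shallow tree is less likely to memorize noise in the training data.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(clf, feature_names=[f"f{i}" for i in range(5)]))
```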
**Random Forest**

Pros:
- Resistant to overfitting (averages many trees)
- Can model complex relationships

Cons:
- Slower to train and predict than a single tree
- Black-box model (harder to interpret)

Use cases:
- Fraud detection
- Stock price movement prediction
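A minimal sketch, again on synthetic data with illustrative settings; averaging many decorrelated trees is what tames the single-tree overfitting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_jobs=-1 uses all cores; forests parallelize well at train time.
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```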
**Support Vector Machines (SVM)**

Pros:
- Effective in high-dimensional spaces
- Works well when classes have a clear margin of separation

Cons:
- Sensitive to the choice of kernel and hyperparameters
- Slow to train on large datasets

Use cases:
- Image classification
- Handwriting recognition
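A hypothetical scikit-learn sketch; feature scaling matters for SVMs, and the `kernel`/`C` values here are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=30, random_state=0)

# Scale first: SVM margins are distance-based, so feature scale matters.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
```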
**k-Nearest Neighbors (k-NN)**

Pros:
- Simple and intuitive
- No training phase (a "lazy" learner)

Cons:
- Slow at query time (computes distances to all stored points)
- Sensitive to irrelevant features and feature scale

Use cases:
- Product recommendation
- Document classification
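A small sketch on synthetic data; the scaler is included because, as noted above, raw feature magnitudes would otherwise dominate the distance computation:

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

clf = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
clf.fit(X, y)             # "training" just stores the data
print(clf.predict(X[:3])) # all the distance work happens at query time
```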
**Neural Networks**

Pros:
- Capable of approximating complex functions
- Flexible architecture
- Trainable with backpropagation

Cons:
- Can require a large number of parameters
- Prone to overfitting on small datasets
- Training can be slow

Use cases:
- Pattern recognition
- Basic image classification
- Function approximation
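A minimal feed-forward network via scikit-learn's `MLPClassifier`, with made-up layer sizes; the weights are fit by backpropagation (Adam by default):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Two small hidden layers; sizes here are arbitrary illustrations.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.3f}")
```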
**Deep Learning**

Pros:
- Can model highly complex relationships
- Excels with vast amounts of data
- State-of-the-art results in many domains

Cons:
- Requires large amounts of data
- Computationally intensive
- Interpretability challenges

Use cases:
- Advanced image and speech recognition
- Machine translation
- Game playing (e.g., AlphaGo)
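A toy Keras sketch, assuming TensorFlow 2.x is installed; the random data is a stand-in only, since real deep learning workloads need far larger datasets and usually GPU hardware:

```python
import numpy as np
import tensorflow as tf

# Made-up data purely so the example runs end to end.
X = np.random.rand(256, 10).astype("float32")
y = (X.sum(axis=1) > 5.0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```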
**Naive Bayes**

Pros:
- Fast
- Works well with large feature sets

Cons:
- Assumes features are conditionally independent
- Continuous features need extra distributional assumptions (e.g., Gaussian naive Bayes)

Use cases:
- Spam detection
- Sentiment analysis
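A tiny spam-filter-style sketch with a made-up four-document corpus; `MultinomialNB` over word counts is the classic text setup:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus invented for illustration; 1 = spam, 0 = ham.
texts = ["win cash now", "meeting at noon", "free prize claim", "lunch tomorrow"]
labels = [1, 0, 1, 0]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, labels)
print(clf.predict(["claim your free cash now"]))  # expected: [1]
```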
**Gradient Boosting Machines (GBM)**

Pros:
- High predictive performance
- Handles non-linear relationships

Cons:
- Prone to overfitting if not tuned
- Slow to train (trees are built sequentially)

Use cases:
- Web search ranking
- Ecology predictions
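A scikit-learn sketch with illustrative hyperparameters; the shallow trees are fit one after another, and `learning_rate` trades off against `n_estimators`:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=800, n_features=15, random_state=0)

# Each new tree fits the residual errors of the ensemble so far.
clf = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                 max_depth=3, random_state=0)
clf.fit(X, y)
```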
**Rule-Based Classifiers**

Pros:
- Transparent and explainable
- Easily updated and modified

Cons:
- Manual rule creation can be tedious
- May not capture complex relationships

Use cases:
- Expert systems
- Business rule enforcement
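A plain-Python sketch of hand-written rules; the field names and thresholds are entirely hypothetical, but they show why such systems are transparent and easy to amend:

```python
def route_application(app: dict) -> str:
    """Hypothetical business rules for loan routing (illustrative only)."""
    if app["credit_score"] < 580:
        return "reject"
    if app["debt_to_income"] > 0.45:
        return "manual_review"
    if app["credit_score"] >= 660 and app["income"] >= 30_000:
        return "approve"
    return "manual_review"

print(route_application(
    {"credit_score": 700, "debt_to_income": 0.2, "income": 55_000}))  # approve
```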
**Bagging (Bootstrap Aggregating)**

Pros:
- Reduces variance
- Training is parallelizable

Cons:
- Does not reduce bias
- Less interpretable than a single model

Examples:
- Random Forest is a popular example
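A minimal scikit-learn sketch with arbitrary settings; each tree is trained on a bootstrap sample and the predictions are aggregated, which is what drives the variance down:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# 100 trees, each on its own bootstrap resample of the training data.
clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        n_jobs=-1, random_state=0)
clf.fit(X, y)
```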
**Boosting**

Pros:
- Reduces bias
- Combines weak learners into a strong one

Cons:
- Sensitive to noisy data and outliers

Examples:
- AdaBoost
- Gradient Boosting
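An AdaBoost sketch via scikit-learn (synthetic data, illustrative parameters); each round reweights the misclassified points, which is also why noisy labels hurt:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Default weak learner is a depth-1 tree (a "decision stump").
clf = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
clf.fit(X, y)
```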
**XGBoost**

Pros:
- Scalable and efficient
- Built-in regularization

Cons:
- Requires careful hyperparameter tuning
- Can overfit if not used correctly

Use cases:
- Competitions on Kaggle
- Retail prediction
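A sketch assuming the `xgboost` package is installed; `reg_lambda` and `subsample` are among the built-in regularization knobs mentioned above, and the values shown are placeholders, not tuned settings:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # assumes xgboost is installed

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# reg_lambda = L2 penalty on leaf weights; subsample < 1 adds stochasticity.
clf = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                    reg_lambda=1.0, subsample=0.8, eval_metric="logloss")
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.3f}")
```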
**Linear Discriminant Analysis (LDA)**

Pros:
- Performs supervised dimensionality reduction
- Simple and interpretable

Cons:
- Assumes Gaussian-distributed data with equal class covariances

Use cases:
- Face recognition
- Marketing segmentation
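A scikit-learn sketch of the dimensionality-reduction use (synthetic data); LDA yields at most `n_classes - 1` discriminant axes, so three classes project 10 features down to 2:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)

# Project onto the 2 directions that best separate the 3 classes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_2d = lda.fit_transform(X, y)
print(X_2d.shape)  # (600, 2)
```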
**Regularized Regression**

Pros:
- Prevents overfitting
- Handles collinearity

Cons:
- Requires tuning the regularization strength
- Shrunken coefficients can be harder to interpret

Examples:
- Ridge and Lasso regression
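A small scikit-learn comparison on synthetic regression data (the `alpha` values are arbitrary); Ridge shrinks all coefficients, while Lasso can zero some out entirely:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2: shrinks all coefficients toward 0
lasso = Lasso(alpha=1.0).fit(X, y)  # L1: drives some coefficients to exactly 0
print("nonzero lasso coefficients:", np.sum(lasso.coef_ != 0))
```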
**Stacking**

Pros:
- Combines multiple models
- Can improve accuracy over any single base model

Cons:
- Increases model complexity
- Risk of overfitting if base models are correlated

Use cases:
- Meta-modeling
- Kaggle competitions
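A scikit-learn sketch with an illustrative choice of base models; their out-of-fold predictions become the inputs to a logistic-regression meta-model:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The final_estimator learns how to weight each base model's predictions.
clf = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("svc", SVC(random_state=0))],
    final_estimator=LogisticRegression(),
)
clf.fit(X, y)
```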