Classification vs Regression

Understand the two core ML tasks, decision boundaries, evaluation metrics, and logistic regression implementation.

Intermediate · 16 min read

Two Fundamental ML Tasks

Every supervised ML problem is either classification (predicting a category) or regression (predicting a continuous number). The choice determines your model architecture, loss function, and evaluation metrics.

Classification	Regression
Predicts discrete categories (cat/dog, spam/not-spam)	Predicts continuous numbers (price, temperature)
Output: probabilities + class label	Output: single numeric value
Loss: cross-entropy	Loss: mean squared error (MSE)
Metrics: accuracy, precision, recall, F1	Metrics: MSE, RMSE, MAE, R²
Examples: email spam, disease diagnosis, image recognition	Examples: house prices, stock forecasting, weather prediction

Decision Boundaries

In classification, the model learns a decision boundary — an invisible line (or surface) that separates different classes. A linear model draws a straight line; neural networks can learn curved, complex boundaries.

TIP: Analogy: A decision boundary is like a fence between two properties. A simple fence is a straight line. A complex fence might curve around trees and gardens. More complex models build more flexible fences.

The Confusion Matrix

For classification, the confusion matrix tells you exactly where the model succeeds and fails:

	Predicted Positive	Predicted Negative
Actually Positive	True Positive (TP) ✓	False Negative (FN) — missed it
Actually Negative	False Positive (FP) — false alarm	True Negative (TN) ✓

Evaluation Metrics

Metric	Formula	When to Use
Accuracy	(TP + TN) / Total	Balanced classes only
Precision	TP / (TP + FP)	When false positives are costly (spam filter)
Recall	TP / (TP + FN)	When false negatives are costly (cancer detection)
F1 Score	2 × (P × R) / (P + R)	Balance between precision and recall

CAUTION: Accuracy is misleading with imbalanced data. If 99% of emails are not spam, a model that always predicts "not spam" has 99% accuracy but catches zero spam. Use precision, recall, and F1 instead.

Logistic Regression Implementation

Logistic regression is linear regression wrapped with a sigmoid function, turning a continuous output into a probability between 0 and 1:

import numpy as np

class LogisticRegression:
    """Binary classifier using sigmoid activation."""

    def __init__(self, lr: float = 0.1):
        self.lr = lr
        self.w = None
        self.b = None

    def sigmoid(self, z: np.ndarray) -> np.ndarray:
        return 1 / (1 + np.exp(-np.clip(z, -250, 250)))

    def fit(self, X: np.ndarray, y: np.ndarray, epochs: int = 200):
        n, d = X.shape
        self.w = np.zeros(d)
        self.b = 0.0

        for epoch in range(epochs):
            # Forward pass
            z = X @ self.w + self.b
            probs = self.sigmoid(z)

            # Binary cross-entropy loss
            loss = -np.mean(y * np.log(probs + 1e-8) +
                            (1 - y) * np.log(1 - probs + 1e-8))

            # Gradients
            error = probs - y
            dw = (1 / n) * (X.T @ error)
            db = (1 / n) * np.sum(error)

            # Update
            self.w -= self.lr * dw
            self.b -= self.lr * db

            if epoch % 50 == 0:
                acc = np.mean((probs >= 0.5) == y)
                print(f"Epoch {epoch}: loss={loss:.4f}, acc={acc:.2%}")

    def predict(self, X: np.ndarray) -> np.ndarray:
        probs = self.sigmoid(X @ self.w + self.b)
        return (probs >= 0.5).astype(int)

# --- Demo: classify pass/fail based on hours studied + sleep ---
np.random.seed(42)
X = np.random.rand(200, 2) * [10, 8]  # hours studied, hours slept
y = ((X[:, 0] * 0.6 + X[:, 1] * 0.4) > 4.5).astype(float)

model = LogisticRegression(lr=0.5)
model.fit(X, y, epochs=300)

preds = model.predict(X)
accuracy = np.mean(preds == y)
print(f"\nFinal accuracy: {accuracy:.2%}")

Key Takeaways

Classification predicts categories; regression predicts numbers
The confusion matrix shows TP, FP, TN, FN — the basis for all classification metrics
Use precision when false positives are costly, recall when false negatives are costly
Logistic regression = linear model + sigmoid → outputs probability for binary classification

Part of the AI & Machine Learning series on Tekivex. Browse all tutorials or explore our open-source products.