Activity 7: Polynomial Features and Ridge Regression Live#

2026-02-19

Imports and setup#

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

def rmse(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Root mean squared error."""
    assert y_hat.shape == y.shape
    return np.sqrt(np.mean((y_hat - y)**2))

Data loading#

We’ll use the same California housing dataset and 80/20 split from Activity 6:

# Load data and split (same as Activity 6)
housing_df = pd.read_csv("~/COMSC-335/data/housing_data.csv")

housing_train = housing_df[:4000]
housing_test = housing_df[4000:]

X_train = housing_train.drop(columns=["MedHouseVal"])
X_test = housing_test.drop(columns=["MedHouseVal"])

# standardize all features; this helps with numerical stability
# we'll talk more about preprocessing in future lectures!
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

y_train = housing_train["MedHouseVal"].to_numpy()
y_test = housing_test["MedHouseVal"].to_numpy()

print(f"Training set: {X_train.shape[0]} examples, {X_train.shape[1]} features")
print(f"Test set: {X_test.shape[0]} examples, {X_test.shape[1]} features")
Training set: 4000 examples, 8 features
Test set: 1000 examples, 8 features
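As a quick sanity check on the standardization step, here is a small self-contained sketch (on synthetic data, not the housing set) showing what `StandardScaler` does: `fit()` learns each column's mean and standard deviation, and `transform()` applies them so each column ends up with mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic stand-in for our features

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # fit() learns column means/stds, transform() applies them

# after standardization, each column has mean ~0 and std ~1
print(np.allclose(X_scaled.mean(axis=0), 0.0, atol=1e-10))
print(np.allclose(X_scaled.std(axis=0), 1.0, atol=1e-10))
```

This is also why we call `fit_transform` on the training set but only `transform` on the test set: the test data must be scaled with the *training* means and standard deviations.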

Part 2#

PolynomialFeatures is what is known as a “transformer” in scikit-learn (not the neural network kind). The standard workflow is:

  • Initialize the transformer object with any hyperparameters

  • Call fit() on the training data: if the transformer has parameters, it will learn them from the training data

  • Call transform() on both the training and test data: use the learned parameters to transform the data

# generate all polynomial features up to degree 3 from our input features
poly_features = PolynomialFeatures(degree=3)

# fit the transformer to the training data
poly_features.fit(X_train)

# transform both the training and test data
X_poly_train = poly_features.transform(X_train)
X_poly_test = poly_features.transform(X_test)

# Examine the shape of the transformed data
print(X_train.shape)
print(X_poly_train.shape)
(4000, 8)
(4000, 165)
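Where does 165 come from? The number of monomials of degree at most \(d\) in \(n\) variables (including the constant/bias column) is \(\binom{n+d}{d}\), and \(\binom{8+3}{3} = 165\). A quick check:

```python
import math

# number of monomials of degree <= d in n variables, including the bias column
def n_poly_features(n: int, d: int) -> int:
    return math.comb(n + d, d)

print(n_poly_features(8, 3))  # 165, matching the shape above
print(n_poly_features(8, 5))  # 1287: the feature count grows quickly with degree
```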

We’ll combine polynomial features with a ridge-regularized linear regression model: Ridge(alpha=a).

The Ridge model operates exactly like all of our other scikit-learn models, with a fit() method and a predict() method.

We need to set the regularization hyperparameter alpha (called \(\lambda\) in the lecture slides and in most other ML literature) to a positive value if we want to use regularization.

If we set alpha=0, we get back the unregularized linear regression model.

# initialize the Ridge model with alpha = 1.0
# alpha in scikit-learn plays the role of lambda in the lecture slides
ridge_model = Ridge(alpha=1.0)

#ridge_model.fit()
#ridge_model.predict()
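As a sanity check that alpha=0 really recovers ordinary least squares, here is a small sketch on synthetic (made-up) data. Note that the scikit-learn docs advise against `Ridge(alpha=0)` in practice for numerical reasons; use `LinearRegression` if you want no regularization.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

lin = LinearRegression().fit(X, y)
ridge0 = Ridge(alpha=0).fit(X, y)  # no penalty: same objective as OLS

# the fitted weights agree (up to floating-point error)
print(np.allclose(lin.coef_, ridge0.coef_, atol=1e-6))
```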

Let’s see how large the fitted weights get under ridge regularization (you can set alpha=0 to compare against the unregularized fit):

deg = 5

poly_features = PolynomialFeatures(degree=deg)

# fit the transformer to the training data
poly_features.fit(X_train)

X_poly_train = poly_features.transform(X_train)
X_poly_test = poly_features.transform(X_test)

print("Degree", deg, "polynomial features:", X_poly_train.shape[1])

ridge_model = Ridge(alpha=1) # set alpha=0 to recover plain LinearRegression

ridge_model.fit(X_poly_train, y_train)

# we can access the fitted weights using the `coef_` attribute
#print(ridge_model.coef_)

# np.abs() returns the absolute value of each element in the array
# np.max() returns the maximum value in the array   
np.max(np.abs(ridge_model.coef_))
Degree 5 polynomial features: 1287
np.float64(0.9134298471142629)
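To actually see the contrast with the unregularized fit, we can train both models on the same polynomial features and compare the largest weights. Here is a self-contained sketch on synthetic 1-D data (the degree and alpha here are illustrative choices, not tuned):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * x[:, 0]) + rng.normal(scale=0.1, size=50)

# high-degree monomials are nearly collinear, so unregularized weights blow up
X_poly = PolynomialFeatures(degree=15, include_bias=False).fit_transform(x)

unreg = LinearRegression().fit(X_poly, y)
reg = Ridge(alpha=1.0).fit(X_poly, y)

print("max |weight|, unregularized:", np.max(np.abs(unreg.coef_)))
print("max |weight|, ridge:        ", np.max(np.abs(reg.coef_)))
```

The ridge penalty keeps the largest weight small, while the unregularized fit uses huge weights of opposite signs on the nearly-collinear monomials.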

By convention, we typically try regularization hyperparameters at powers of 10, e.g.:

\[ [10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^2, 10^3] \]

There is a convenient np.logspace() function that generates these values, evenly spaced on a log scale:

alphas = np.logspace(-3, 3, 7)
print(alphas)
[1.e-03 1.e-02 1.e-01 1.e+00 1.e+01 1.e+02 1.e+03]

Below is starter code for training a ridge model and evaluating its train and test RMSE. Complete the code to:

  • Transform the training and test data using PolynomialFeatures

  • Iterate over the range of alphas

Experiment with higher values of the degree. Compare with the folks around you which values of the regularization hyperparameter and degree give the best test RMSE, and submit your best test RMSE to PollEverywhere:

https://pollev.com/tliu

Discuss with folks around you: from what you’re seeing, is a higher or lower alpha needed to get most of the benefits of regularization?

degree = 3

poly_features = PolynomialFeatures(degree=degree) # TODO initialize the transformer

# TODO fit the transformer on the training data
poly_features.fit(X_train)

# TODO transform both the training and test data
X_poly_train = poly_features.transform(X_train)
X_poly_test = poly_features.transform(X_test)

print("Degree", degree, "polynomial features:", X_poly_train.shape[1])

# TODO iterate over alphas here
alphas = np.logspace(-3, 3, 7)

for alpha in alphas:

    # TODO initialize the model with the current alpha
    ridge_model = Ridge(alpha=alpha)

    # TODO fit the model on the polynomial training data
    ridge_model.fit(X_poly_train, y_train)

    # TODO compute the train and test RMSE
    train_rmse = rmse(ridge_model.predict(X_poly_train), y_train)
    test_rmse = rmse(ridge_model.predict(X_poly_test), y_test)

    print(f'  alpha={alpha:.4f} | Train: {train_rmse:.4f} | Test: {test_rmse:.4f}')
Degree 3 polynomial features: 165
  alpha=0.0010 | Train: 0.5248 | Test: 0.5578
  alpha=0.0100 | Train: 0.5248 | Test: 0.5578
  alpha=0.1000 | Train: 0.5248 | Test: 0.5578
  alpha=1.0000 | Train: 0.5249 | Test: 0.5575
  alpha=10.0000 | Train: 0.5264 | Test: 0.5560
  alpha=100.0000 | Train: 0.5413 | Test: 0.5686
  alpha=1000.0000 | Train: 0.6038 | Test: 0.6290
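If you'd rather search programmatically than eyeball the printout, one pattern is to collect the test RMSEs in a dict keyed by alpha and take the argmin. Sketched here on synthetic stand-in data so it runs on its own; in the notebook you would use X_poly_train, X_poly_test, y_train, and y_test from above instead.

```python
import numpy as np
from sklearn.linear_model import Ridge

def rmse(y_hat, y):
    return np.sqrt(np.mean((y_hat - y) ** 2))

# synthetic stand-ins for the polynomial train/test data from the notebook
rng = np.random.default_rng(3)
w_true = rng.normal(size=10)
X_tr = rng.normal(size=(200, 10)); y_tr = X_tr @ w_true + rng.normal(scale=0.5, size=200)
X_te = rng.normal(size=(100, 10)); y_te = X_te @ w_true + rng.normal(scale=0.5, size=100)

test_rmses = {}
for alpha in np.logspace(-3, 3, 7):
    model = Ridge(alpha=alpha).fit(X_tr, y_tr)
    test_rmses[alpha] = rmse(model.predict(X_te), y_te)

# pick the alpha with the lowest test RMSE
best_alpha = min(test_rmses, key=test_rmses.get)
print(f"best alpha: {best_alpha}, test RMSE: {test_rmses[best_alpha]:.4f}")
```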