HW 1 part 2#

Linear Regression: Models

—TODO your name here

Collaboration Statement

  • TODO brief statement on the nature of your collaboration.

  • TODO your collaborators’ names here

Part 2 Table of Contents and Rubric#

| Section | Points |
| --- | --- |
| Linear Regression Implementation | 1.5 |
| Analysis | 2 |
| Ethics | 1.5 |
| Reflection | 0.5 |
| **Total** | **5.5 pts** |

Notebook and function imports#

Tip

If you click on the vertical blue bar on the left of a cell, you can collapse the code, which can help keep the notebook organized as you work through the project.

If you tested your Part 1 implementation against the autograder, you will have generated a file called hw1_foundations.py. Let’s now import those functions into this notebook for use in Part 2.

If you are running this notebook on the JupyterHub allocated for the course:

  1. Open the file browser by going to the menu bar “View -> File Browser”

  2. Navigate to comsc335.github.io/hws/; you should see your hw1_models.ipynb file in that folder

  3. Click on the upload button in the upper right and upload the hw1_foundations.py file to this directory

  4. Run the cell below to import the functions.

import numpy as np
from sklearn.base import BaseEstimator
from typing import Self
rng = np.random.RandomState(42)

# import your functions from Part 1 
from hw1_foundations import linreg_grad_descent, mse_loss

3. Linear regression model class [1.5 pts]#

Let’s put the gradient descent implementation together and create a linear regression model class. Following the same pattern as we did in Worksheet 1 and Activity 4, we’ll create a class that inherits from scikit-learn’s BaseEstimator and implements the fit and predict methods.
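As a reminder of the fit/predict pattern (and not the solution to this section), here is a toy estimator that always predicts the mean of the training targets. The class name MeanRegressor is made up for illustration; note the scikit-learn conventions it follows: fitted attributes end with a trailing underscore, and fit returns self.

```python
import numpy as np
from sklearn.base import BaseEstimator

class MeanRegressor(BaseEstimator):
    """Toy estimator: always predicts the mean of the training targets."""

    def fit(self, X: np.ndarray, y: np.ndarray) -> "MeanRegressor":
        # Fitted attributes end with an underscore by scikit-learn convention
        self.mean_ = y.mean()
        # Returning self allows chaining, e.g. model.fit(X, y).predict(X)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # One prediction per row of X
        return np.full(X.shape[0], self.mean_)

model = MeanRegressor().fit(np.zeros((3, 1)), np.array([1.0, 2.0, 3.0]))
print(model.predict(np.zeros((2, 1))))  # two predictions, both 2.0
```

Your MHCLinearRegressor below follows the same shape, with the actual fitting done by your gradient descent function from Part 1.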

ML model class documentation

As Guido van Rossum, the creator of Python, likes to say:

Code is read much more often than it is written.

Part of growing as a computer scientist or data scientist is to be able to effectively communicate your implementation to others.

As such, a part of the homework assignments will be completing the documentation of the machine learning model classes you implement in this course.

Please make sure that you document every method parameter, and provide descriptions of the class and its methods.

The docstrings shown in the methods in Part 1 and in Worksheet 1 follow the Google Python Style Guide, and you can use them as examples for your own documentation.
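As a small sketch of that style (the function here is a made-up example, not part of the assignment), a Google-style docstring documents every parameter under Args and the output under Returns:

```python
import numpy as np

def scaled_add(x: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Computes x + alpha * y elementwise.

    Args:
        x: Array of shape (n,).
        y: Array of shape (n,), same shape as x.
        alpha: Scalar multiplier applied to y before the addition.

    Returns:
        Array of shape (n,) containing x + alpha * y.
    """
    return x + alpha * y

print(scaled_add(np.array([1.0, 2.0]), np.array([10.0, 20.0]), alpha=0.5))
# [ 6. 12.]
```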

class MHCLinearRegressor(BaseEstimator):
    """TODO class description"""
    
    def __init__(self, alpha: float, max_iters: int=5000):
        """TODO constructor description"""
        
        # TODO initialize the hyperparameters of alpha and max_iters
        pass

    def fit(self, X: np.ndarray, y: np.ndarray) -> Self:
        """TODO method description"""

        # TODO save the fitted weights, loss values, and weight history by
        # calling the linreg_grad_descent function with the correct parameters
        self.weights_, self.loss_values_, self.w_history_ = None, None, None
        # TODO return self so that the fitted model is returned

    def predict(self, X: np.ndarray) -> np.ndarray:
        """TODO method description"""

        # TODO use the fitted weights to make predictions on the input data
        pass

if __name__ == "__main__":
    # Initialize some simple data for testing, n=3, p=2
    X = np.array([[1, 2], 
                  [2, 3], 
                  [3, 3]])
    y = np.array([0, 1, 2])
    alpha = 0.05

    # Test the linear regression model
    model = MHCLinearRegressor(alpha=alpha)
    model = model.fit(X, y)
    assert model is not None, "The model should be fitted and returned in fit()"
    assert np.allclose(model.predict(X), y, atol=1e-3), "The predictions should be equal to the y targets"

4. Analysis [2 pts]#

4.1 Gradient descent alpha simulation [0.5 pts]#

You’ll now use your newly implemented MHCLinearRegressor class to explore the effect of the learning rate \(\alpha\) on gradient descent convergence.

Run the code cell below to see an interactive plot of the gradient descent algorithm path for fitting your MHCLinearRegressor model with different values of \(\alpha\). The plot shows the contour plot along with how the weights update at each iteration (represented by the smaller white circles), with the title showing the final MSE loss and the number of iterations needed to converge.
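If the widget is unavailable, the mechanics can be sketched on a 1D quadratic loss \(f(w) = w^2\): the update \(w \leftarrow w - \alpha f'(w)\) multiplies \(w\) by \((1 - 2\alpha)\) each step. This toy demo is independent of your MHCLinearRegressor and of the bikeshare data:

```python
import numpy as np

def gd_1d(alpha: float, w0: float = 1.0, iters: int = 20) -> float:
    """Runs gradient descent on f(w) = w**2 (gradient 2w) and returns the final w."""
    w = w0
    for _ in range(iters):
        w -= alpha * 2 * w  # each step multiplies w by (1 - 2*alpha)
    return w

print(abs(gd_1d(alpha=0.1)))  # shrinks toward 0: 0.8**20 ≈ 0.0115
print(abs(gd_1d(alpha=1.1)))  # grows each step: 1.2**20 ≈ 38.3
```

The contour plot in the widget shows the 2D analogue of this behavior on your actual model.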

if __name__ == "__main__":
    import sys; sys.path.insert(0, '..')
    from utils import explore_alpha
    import ipywidgets as widgets

    widgets.interact_manual(explore_alpha, 
        # Tells the widget to use the MHCLinearRegressor class
        LinModel=widgets.fixed(MHCLinearRegressor), 
        # Creates an interactive slider for the learning rate
        alpha=widgets.FloatSlider(value=0.1, min=0.1, max=0.66, step=0.05)
    );

4.1.1: Increase the learning rate by increments of 0.05 from 0.1 to 0.65. Comment on what you see in both the number of iterations needed to converge as well as the “shape” of the gradient descent path as \(\alpha\) increases.

4.1.2: Now, slide the learning rate all the way to 0.66. What do you observe in the ending loss value and the start and end points of the gradient descent path?

4.1.3: Summarize the tradeoffs you see between setting a high vs low \(\alpha\), and propose a potential strategy for picking \(\alpha\) in practice (It’s okay to speculate here as long as you provide a rationale, as this is an open-ended question).

TODO your responses:

4.1.1:

4.1.2:

4.1.3:

4.2 Bikeshare regression and feature selection [1 pt]#

Let’s now conduct a regression prediction task using bikeshare information. We can imagine that we’re machine learning engineers working for a bikeshare company, where our goal is to predict the number of daily bike rentals \(y\) to better model demand and plan for maintenance. Each row of the dataset contains weather and time-based information on a given day:

Predictors:

  • workingday: Is it a work day? Yes=1, No=0.

  • temp: Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39.

  • hum: Normalized humidity. The values are divided by 100 (max).

  • windspeed: Normalized wind speed. The values are divided by 67 (max).

Target:

  • bikers: the number of bike rentals in a day.

Normalized features

As a data preprocessing step, the dataset creators normalized the weather features, which means that all of the values are scaled to be between 0 and 1. This aids in the model fitting process and with the interpretation of the coefficients, as all of the features are in the same range. We will discuss data preprocessing principles in more detail in upcoming classes.
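As a sketch of the min-max scaling described above, using the temp bounds from the feature list (the raw Celsius readings here are made up):

```python
import numpy as np

t_min, t_max = -8.0, 39.0                 # temperature bounds used by the dataset creators
raw_temp = np.array([-8.0, 15.5, 39.0])   # hypothetical raw Celsius readings

# Min-max scaling maps [t_min, t_max] onto [0, 1]
temp = (raw_temp - t_min) / (t_max - t_min)
print(temp)  # [0.  0.5 1. ]
```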

We’ll first perform a feature selection task. Our goal in machine learning is to fit a model on the training data so that it has low loss on new data, which indicates that the model generalizes well to unseen data; this new data is often called the test set. Since our MHCLinearRegressor model is set up for data with two features, we’ll have to select two of the four features to use in our model. We have provided starter code below that loads the data and creates the X_train_all, X_test_all, y_train, and y_test NumPy arrays. It then iterates over all possible combinations of two features to include in the model.
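The six feature pairs hard-coded in the starter code could also be generated programmatically; this small sketch (separate from the starter code) uses itertools.combinations:

```python
from itertools import combinations

feature_names = ["workingday", "temp", "hum", "windspeed"]

# All unordered pairs of features: C(4, 2) = 6
pairs = list(combinations(feature_names, 2))
print(len(pairs))   # 6
print(pairs[0])     # ('workingday', 'temp')
```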

Complete the code below that:

  1. Trains a MHCLinearRegressor model on the selected training data, using alpha=0.05.

  2. Predicts the testing data using the trained model.

  3. Computes the loss of the model on the testing data, and extracts the weights of the model.

Which two features achieve the lowest test set MSE loss?

  • Lowest test MSE loss: TODO

  • Features used: TODO

if __name__ == "__main__":
    # Loads in the bikeshare data
    import os
    data = np.load(os.path.expanduser("~/COMSC-335/data/bikeshare_hw1.npz"), allow_pickle=True)

    X_train_all = data["X_train"]   # shape (n_train, 4) for all 4 features
    y_train = data["y_train"]       # shape (n_train,)
    X_test_all = data["X_test"]     # shape (n_test, 4) for all 4 features
    y_test = data["y_test"]         # shape (n_test,)

    # Feature names: ["workingday", "temp", "hum", "windspeed"]
    feature_names = data["feature_names"].tolist()  

    # Maps the feature names to their indices
    name_to_idx = {name: i for i, name in enumerate(feature_names)}

    # All possible pairs of features
    pairs = [
        ("workingday", "temp"),
        ("workingday", "hum"),
        ("workingday", "windspeed"),
        ("temp", "hum"),
        ("temp", "windspeed"),
        ("hum", "windspeed"),
    ]

    for feat1, feat2 in pairs:
        # Select the columns corresponding to the two features
        i, j = name_to_idx[feat1], name_to_idx[feat2]

        # Create the training and testing data
        X_train = X_train_all[:, [i, j]]
        X_test = X_test_all[:, [i, j]]

        # TODO initialize the model
        model = None

        # TODO train the model using X_train and y_train


        # TODO predict on X_test


        # TODO compute the loss of the model compared to y_test using your mse_loss function
        test_mse = 0

        # TODO extract the weights from the trained model object
        w0, w1, w2 = None, None, None

        # Prints the results for each feature pair
        print(f"Model: ({feat1}, {feat2})")
        print(f"\ttest MSE: {test_mse:.3f}")
        print(f"\tweights: {feat1}={w1:.3f}, {feat2}={w2:.3f}")
        print()

4.3 Bikeshare interpretation [0.5 pts]#

As we discussed in class, we sometimes take the square root of the MSE (abbreviated RMSE, for Root Mean Squared Error) to make model performance easier to interpret: taking the square root converts the loss back to the original units of the target variable. Compute the RMSE of the best model from the previous question, and provide an interpretation of the model’s performance in words.
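RMSE is simply the square root of MSE; a quick sketch with made-up numbers (your actual MSE comes from the previous section):

```python
import numpy as np

y_true = np.array([100.0, 200.0, 300.0])  # hypothetical daily rental counts
y_pred = np.array([110.0, 190.0, 310.0])

mse = np.mean((y_true - y_pred) ** 2)  # units: (bikers)^2
rmse = np.sqrt(mse)                    # units: bikers, same as the target
print(mse, rmse)  # 100.0 10.0
```

Here the model is off by about 10 rentals per day on average, which is directly comparable to the scale of the target.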

TODO your response here:

Let’s also interpret the fitted weights of the best model from above. For both of the two feature weights, when the feature increases, does it increase or decrease the predicted number of bike rentals? Give a brief explanation of whether you think the signs of the weights make intuitive sense.

  • TODO feature 1:

  • TODO feature 2:

5. “Data Neutrality” Ethics [1.5 pts]#

Alongside exploring the technical foundations of machine learning, we will also consider the ethics of machine learning within socio-technical systems. Here, we will examine the notion of “data neutrality.”

Prof. Catherine D’Ignazio interview on data neutrality

Read “Data is never a raw, truthful input, and it is never neutral” and answer the discussion questions below.

5.1: When the author says “data is never neutral,” what do they mean?

5.2: What are “who questions”? Why does the author advocate for using them?

5.3: Discuss who you believe has the responsibility to ask the “who questions.” Is it the person who funds the research? The researcher or engineer who collected the data? The data scientist or computer scientist who analyzed it? Other parties?

5.4: The creators of the bikeshare dataset included the following description:

Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent a bike from a particular position and return back at another position. Currently, there are about over 500 bike-sharing programs around the world which is composed of over 500 thousands bicycles. Today, there exists great interest in these systems due to their important role in traffic, environmental and health issues.

Apart from interesting real world applications of bike sharing systems, the characteristics of data being generated by these systems make them attractive for the research. Opposed to other transport services such as bus or subway, the duration of travel, departure and arrival position is explicitly recorded in these systems. This feature turns bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of important events in the city could be detected via monitoring these data.

The core data set is related to the two-year historical log corresponding to years 2011 and 2012 from Capital Bikeshare system, Washington D.C., USA which is publicly available in http://capitalbikeshare.com/system-data. We aggregated the data on a daily basis and then extracted and added the corresponding weather and seasonal information. Weather information are extracted from http://www.freemeteo.com.

Also note that Lyft bought the parent company of Capital Bikeshare in 2018. Building on the interview article, reflect on who benefits from using this dataset and a machine learning model that predicts bike rentals to guide decisions (e.g., where new stations go, how many bikes to stock, pricing changes), as well as who could be harmed or overlooked, and why. Discuss each in a brief paragraph (~2-3 sentences).

TODO your responses:

5.1:

5.2:

5.3:

5.4:

  • Potential benefits:

  • Potential harms:

6. Reflection [0.5 pts]#

  1. How much time did you spend on this assignment?

  2. Were there any parts of the assignment that you found particularly challenging?

  3. What is one thing you have a better understanding of after completing this assignment and going through the class content?

  4. Do you have any follow-up questions about concepts that you’d like to explore further?

  5. Indicate the number of late days (if any) you are using for this assignment.

TODO your responses here:

6.1:

6.2:

6.3:

6.4:

6.5:

How to submit

As with Worksheet 1, follow the instructions on the course website to submit your work. For all of Homework 1, your submission will include the files from both parts:

  • hw1_foundations.ipynb and hw1_foundations.py

  • hw1_models.ipynb and hw1_models.py

Acknowledgements#