Activity 2

2026-01-29

Part 1: NumPy and coding environment

Jupyter notebooks allow us to interleave code, text, math equations, and plots in a single document. The content is organized into cells, each of which is either a Python code cell or a Markdown cell. Markdown is a lightweight markup language that lets us write text with formatting instructions.

Warning

Unlike typical Python scripts, Jupyter notebooks execute code on a per-cell basis. This means that:

  • you can run a cell multiple times

  • it is possible for a cell that is “above” another cell to use variables defined in the cell below it, if the lower cell was run first

Please be careful about this! A good practice is to make sure that the cells are always run in order from top to bottom.
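The two pitfalls above can be imitated in plain Python. This is only a rough sketch: the functions `cell_1` and `cell_2` and the `namespace` dictionary are hypothetical stand-ins for notebook cells and the kernel's shared variables, not how Jupyter is actually implemented.

```python
# A notebook kernel keeps one shared set of variables for all cells.
# Here a plain dict plays that role (an assumption made for illustration).
namespace = {}


def cell_1(ns):
    # "Cell 1": define a variable
    ns["counter"] = 0


def cell_2(ns):
    # "Cell 2": modify the variable defined in cell 1
    ns["counter"] += 1


# Running cell 2 twice changes the result, even though the code is unchanged
cell_1(namespace)
cell_2(namespace)
cell_2(namespace)
print(namespace["counter"])  # 2

# Running cell 2 before cell 1 in a fresh session fails
# (a real notebook would raise NameError rather than KeyError)
fresh = {}
try:
    cell_2(fresh)
except KeyError as err:
    print("cell 2 ran before cell 1, missing:", err)
```

Restarting the kernel and running all cells from the top is a simple way to check that a notebook does not depend on a particular execution order.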

You can double-click a cell to edit it. You can also run a cell by clicking the play button in the toolbar above, or by pressing Ctrl+Enter.

# standard way to import numpy
import numpy as np

# create a numpy array from a list

Below is a cell that creates a numpy array representing our housing data:

| \(x_1\): sq ft | \(x_2\): # bedrooms | \(x_3\): yr built | \(y\): price |
|---|---|---|---|
| 2500 | 3 | 1999 | $500,000 |
| 4000 | 5 | 1950 | $1,000,000 |
| 1000 | 2 | 1980 | $250,000 |
| 900 | 2 | 2010 | $300,000 |

# create a 2 dimensional numpy array representing our housing data
data = np.array([
    #  x1,  x2,   x3,  y
    [2500,   3, 1999, 500000],
    [4000,   5, 1950, 1000000],
    [1000,   2, 1980, 250000],
    [900,    2, 2010, 300000]
])
# We can examine the "shape" of the array, which is the number of rows and columns
# To access a specific element, we use square bracket indexing for each dimension: row, column
# Example: access the cell that indicates year built: 1950
# We use the `:` operator to access a range of elements
# Example 1: access the first row 
# Example 2: access the x2 column
# We can also access a range of rows and columns by specifying a "slice" index
# Syntax: start_index:end_index

# Example: Access the "X" portion of the data by selecting all rows and the first three columns
X = None
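As a reference for the indexing syntax described in the comments above, here is a small sketch on a toy array. The array `grid` is made up for illustration and is not the housing data, so the exercises above are still yours to fill in.

```python
import numpy as np

# A toy 3x4 array filled with 0..11, row by row (hypothetical example data)
grid = np.arange(12).reshape(3, 4)

print(grid.shape)    # (number of rows, number of columns)
print(grid[1, 2])    # a single element: row 1, column 2 (indices start at 0)
print(grid[0, :])    # the entire first row
print(grid[:, 1])    # the entire second column
print(grid[:, 0:2])  # all rows, columns 0 and 1 (the end index is exclusive)
```

Note that a slice such as `0:2` stops before index 2, so it selects two columns, not three.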

Checkpoint: What line of code would we write to access the y column?

Your response: https://pollev.com/tliu

y = TODO

Part 2: Derivatives and LaTeX

Jupyter notebooks and markdown cells also support LaTeX, which is a powerful language for writing math equations. Math expressions are delimited by single dollar signs, which can be used to write inline equations like \(f(x) = x^2\).

We can also use LaTeX to write display equations using double dollar signs $$, which are centered and on a new line:

\[ f(x_1, x_2) = 4x_1^2 + 2x_2 + 1 \]

Tip

Double-click a markdown cell in any of the assignments to see the LaTeX code that generates it.

Superscripts are generated using ^: \(x^2\)

Subscripts are generated using _: \(x_1\)

The command to generate the curly d for partial derivatives is \partial: \(\partial\)

The command for fractions is \frac{<numerator>}{<denominator>}: \(\frac{1}{2}\)
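To see these commands working together, here is a sketch of the raw source a markdown cell might contain for a display equation. It takes the partial derivative of the \(f\) defined above with respect to \(x_2\); only the \(2x_2\) term depends on \(x_2\), so the result is the constant 2.

```latex
$$
\frac{\partial}{\partial x_2} f(x_1, x_2)
  = \frac{\partial}{\partial x_2} \left( 4x_1^2 + 2x_2 + 1 \right)
  = 2
$$
```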

Checkpoint: replace the TODO in the latex cell below with the partial derivative of \(f\) with respect to \(x_1\):

\[ \frac{\partial}{\partial x_1} f(x_1, x_2) = TODO \]

Part 3: scikit-learn

We’ll implement the “mean” regression model from scratch, having it extend the BaseEstimator class in scikit-learn.

from sklearn.base import BaseEstimator

# TODO In Python, we can inherit from a class by specifying the parent class in parentheses
class MeanRegressor:
    """Simple model that predicts the mean of the training data."""

    # constructors in Python are defined using the `__init__` method
    # A quirk of Python OOP: the first argument is always `self`, which refers to the object itself
    def __init__(self):
        pass


    # fit method trains the model on the given data, and always takes X and y as arguments
    def fit(self, X, y):
        """Fits the mean regressor to the training data.

        Args:
            X: the data examples of shape (n, p)
            y: the answers vector of shape (n,)

        Returns:
            self: the fitted model
        """

        # TODO store the mean prediction
        # fitted model parameters are stored in `self` as instance variables and suffixed with `_`


        # TODO As convention, sklearn fit() methods return self


    # predict method makes predictions on new data, and always takes X as an argument
    def predict(self, X):
        """Predicts the values for new points X.

        This model will only predict the mean value of the fitted data for all new points.

        Args:
            X: the new points of shape (n_new, p)

        Returns:
            the predicted values of shape (n_new,)
        """
        
        predictions = []

        # TODO this loops over the rows of X
        for x in X:
            pass 

        # TODO return the mean prediction for all new points
# Create a new model

# Fit the model to the data

# Predict the value of a new point
X_new = np.array([
    [2000, 3, 2015]
])
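For comparison, the conventions named in the comments above (inheriting from `BaseEstimator`, storing fitted parameters with a trailing underscore, returning `self` from `fit`) can be sketched on a different model. The `MedianRegressor` class and its toy training data below are hypothetical and are not part of this activity or of scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator


class MedianRegressor(BaseEstimator):
    """Hypothetical model that always predicts the median of the training targets."""

    def fit(self, X, y):
        # Fitted parameters are stored on self with a trailing underscore
        self.median_ = float(np.median(y))
        # Returning self allows chaining: model.fit(X, y).predict(X_new)
        return self

    def predict(self, X):
        # One identical prediction per row of X
        return np.full(len(X), self.median_)


# Toy data, made up for illustration
X_train = np.array([[1.0], [2.0], [3.0]])
y_train = np.array([10.0, 20.0, 60.0])

model = MedianRegressor().fit(X_train, y_train)
preds = model.predict(np.array([[5.0], [6.0]]))
print(preds)  # one prediction per new point, each equal to the training median
```

Because `fit` returns `self`, creating and fitting the model can be written on a single line, as in the usage above.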