Activity 2 Live

2026-01-29

Part 1: NumPy and coding environment

Jupyter notebooks allow us to interleave code, text, math equations, and plots all in a single document. The content is organized into cells, each of which is either a Python code cell or a markdown cell. Markdown is a lightweight markup language that allows us to write text with formatting instructions.

Warning

Unlike typical Python scripts, Jupyter notebooks execute code on a per-cell basis. This means that:

  • you can run a cell multiple times

  • it is possible for a cell that is “above” another cell to use variables defined in the cell below it, if the lower cell was run first

Please be careful about this! A good practice is to make sure that the notebook still works when its cells are run in order from top to bottom.
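A minimal sketch of why execution order matters, written as one script that mimics running notebook cells by hand. The variable name `total` and the cell labels are made up for illustration.

```python
# "Cell A": define a variable
total = 0

# "Cell B": update it. In a notebook, every run of this cell executes the line again.
total += 10   # first run of Cell B
total += 10   # running Cell B a second time repeats the update

# The value now depends on how many times Cell B was run,
# not only on what the notebook looks like from top to bottom.
print(total)
```

Restarting the kernel and running all cells from the top is one way to confirm that a notebook does not depend on this kind of hidden state.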

You can double-click a cell to edit it. You can also run a cell by clicking the play button in the toolbar above, or by pressing Ctrl+Enter.

print("hello, world!")
hello, world!
# standard way to import numpy
import numpy as np

# create a numpy array from a list
a = np.array([1,2,3,4])
print(a)
type(a)
[1 2 3 4]
numpy.ndarray

Below is a cell that creates a numpy array representing our housing data:

| \(x_1\): sq ft | \(x_2\): # bedrooms | \(x_3\): yr built | \(y\): price |
|---|---|---|---|
| 2500 | 3 | 1999 | $500,000 |
| 4000 | 5 | 1950 | $1,000,000 |
| 1000 | 2 | 1980 | $250,000 |
| 900 | 2 | 2010 | $300,000 |

# create a 2 dimensional numpy array representing our housing data
data = np.array([
    #  x1,  x2,   x3,  y
    [2500,   3, 1999, 500000],
    [4000,   5, 1950, 1000000],
    [1000,   2, 1980, 250000],
    [900,    2, 2010, 300000]
])
# We can examine the "shape" of the array: data.shape is (number of rows, number of columns), so index 0 gives the number of rows
data.shape[0]
4
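As a small sketch of how `shape` works (re-creating the `data` array so the block runs on its own): `data.shape` is a tuple with one entry per dimension, so index 0 is the number of rows and index 1 is the number of columns.

```python
import numpy as np

data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900,  2, 2010, 300000],
])

# unpack the (rows, columns) tuple into two variables
n_rows, n_cols = data.shape

print(data.shape)  # (4, 4)
print(n_rows)      # 4, same as data.shape[0]
print(n_cols)      # 4, same as data.shape[1]
```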
# To access a specific element, we use square bracket indexing for each dimension: row, column
# Example: access the cell that indicates year built: 1950
data[1, 2]
np.int64(1950)
# We use the `:` operator to access a range of elements
# Example 1: access the first row 
data[0, :]
array([  2500,      3,   1999, 500000])
# Example 2: access the x2 column
data[:, 1]
array([3, 5, 2, 2])
# We can also access a range of rows and columns by specifying a "slice" index
# Syntax: start_index:end_index (start is included, end is excluded)

# Example: Access the "X" portion of the data by selecting all rows and the first three columns
X = data[:, 0:3]
X
array([[2500,    3, 1999],
       [4000,    5, 1950],
       [1000,    2, 1980],
       [ 900,    2, 2010]])
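A short sketch of the slice rule, assuming the same housing array: the start index is included and the end index is excluded, so `0:3` selects indices 0, 1, and 2. Leaving out an index means "from the beginning" or "to the end".

```python
import numpy as np

data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900,  2, 2010, 300000],
])

# rows 1 and 2 (row 3 is excluded), columns 0 and 1 (column 2 is excluded)
middle = data[1:3, 0:2]
print(middle)

# omitting the start index is the same as starting at 0
same_X = data[:, :3]
print(np.array_equal(same_X, data[:, 0:3]))  # True
```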

Checkpoint: What line of code would we write to access the y column?

Your response: https://pollev.com/tliu

y = data[:, 3]
y
array([ 500000, 1000000,  250000,  300000])
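One alternative sketch for the same split uses negative indices, which count from the end of an axis, so the code does not depend on knowing that there are exactly four columns. The names `X_alt` and `y_alt` are chosen here only to avoid overwriting `X` and `y`.

```python
import numpy as np

data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900,  2, 2010, 300000],
])

X_alt = data[:, :-1]  # all rows, every column except the last
y_alt = data[:, -1]   # all rows, only the last column

print(X_alt.shape)  # (4, 3): a 2-dimensional array of features
print(y_alt.shape)  # (4,): a 1-dimensional array of prices
```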

Part 2: Derivatives and LaTeX

Jupyter notebooks and markdown cells also support LaTeX, which is a powerful language for writing math equations. LaTeX expressions are delimited by dollar signs; a single pair of dollar signs produces an inline equation like \(f(x) = x^2\).

We can also use LaTeX to write display equations using double dollar signs $$, which are centered and on a new line:

\[ f(x_1, x_2) = 4x_1^2 + 2x_2 + 1 \]

Tip

Double click a markdown cell in any of the assignments to see the LaTeX code that generates it.

Superscripts are generated using ^: \(x^2\)

Subscripts are generated using _: \(x_1\)

The command to generate the curly d for partial derivatives is \partial: \(\partial\)

The command for fractions is \frac{<numerator>}{<denominator>}: \(\frac{1}{2}\)
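These commands can be combined. As an illustration (not taken from the original activity), the markdown source below would typeset the notation for the partial derivative of \(f\) with respect to \(x_2\) as a display equation:

```latex
$$
\frac{\partial f}{\partial x_2}
$$
```

Here `\frac` builds the fraction, `\partial` supplies the curly d in both the numerator and the denominator, and `_` attaches the subscript 2 to \(x\).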

Checkpoint: replace the TODO in the LaTeX cell below with the partial derivative of \(f\) with respect to \(x_1\):

\[ \frac{\partial}{\partial x_1} f(x_1, x_2) = 8x_1 \]
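A quick numerical sanity check of this derivative, as a sketch: a central finite difference in \(x_1\) should closely approximate \(8x_1\). The test point and the step size `h` are arbitrary choices for illustration.

```python
def f(x1, x2):
    """The example function f(x1, x2) = 4*x1^2 + 2*x2 + 1."""
    return 4 * x1**2 + 2 * x2 + 1

def df_dx1(x1, x2):
    """The analytic partial derivative with respect to x1."""
    return 8 * x1

x1, x2 = 3.0, 5.0
h = 1e-5

# central difference: nudge x1 up and down while holding x2 fixed
numeric = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)

print(numeric)          # close to 24
print(df_dx1(x1, x2))   # 24.0
```

Holding \(x_2\) fixed is what makes this a partial derivative: the terms \(2x_2\) and \(1\) cancel in the difference, leaving only the contribution of \(4x_1^2\).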

Part 3: scikit-learn

We’ll implement the “mean” regression model from scratch, having it extend the BaseEstimator class in scikit-learn.

from sklearn.base import BaseEstimator

# TODO In Python, we can inherit from a class by specifying the parent class in parentheses
class MeanRegressor(BaseEstimator):
    """Simple model that predicts the mean of the training data."""

    # constructors in Python are defined using the `__init__` method
    # A quirk of Python OOP: the first argument is always `self`, which refers to the object itself
    def __init__(self):
        pass


    # fit method trains the model on the given data, and always takes X and y as arguments
    def fit(self, X, y):
        """Fits the mean regressor to the training data.

        Args:
            X: the data examples of shape (n, p)
            y: the answers vector of shape (n,)

        Returns:
            self: the fitted model
        """

        # TODO store the mean prediction
        # fitted model parameters are stored in `self` as instance variables and suffixed with `_`
        self.mean_ = np.mean(y)

        # TODO As convention, sklearn fit() methods return self
        return self

    # predict method makes predictions on new data, and always takes X as an argument
    def predict(self, X):
        """Predicts the values for new points X.

        This model will only predict the mean value of the fitted data for all new points.

        Args:
            X: the new points of shape (n_new, p)

        Returns:
            the predicted values of shape (n_new,)
        """
        
        predictions = []

        # TODO this loops over the rows of X
        for x in X:
            predictions.append(self.mean_) 

        # TODO return the mean prediction for all new points
        return np.array(predictions)
# Create a new model
model = MeanRegressor()

# Fit the model to the data
model.fit(X, y)

# Predict the value of a new point
X_new = np.array([
    [2000, 1, 2015]
])

model.predict(X_new)
array([512500.])
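The prediction can be checked by hand: the mean of the four prices is (500000 + 1000000 + 250000 + 300000) / 4 = 512500, and the features in `X_new` play no role. The sketch below reproduces this with NumPy alone. `predict_mean` is a hypothetical helper, not part of scikit-learn, that uses `np.full` in place of the explicit loop over rows.

```python
import numpy as np

y = np.array([500000, 1000000, 250000, 300000])
mean_ = np.mean(y)  # the single "parameter" learned by the mean model

def predict_mean(mean, X):
    """Hypothetical helper: return `mean` once for each row of X."""
    return np.full(X.shape[0], mean)

# two new houses; their features are ignored by this model
X_new = np.array([
    [2000, 1, 2015],
    [3000, 4, 1990],
])

preds = predict_mean(mean_, X_new)
print(mean_)   # 512500.0
print(preds)   # one copy of the mean per new row
```

The loop in the class above and `np.full` produce the same values; the loop makes the per-row logic explicit, while `np.full` is shorter.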