Activity 2


2026-01-29

Part 1: NumPy and coding environment

Jupyter notebooks allow us to interleave code, text, math equations, and plots in a single document. The content is organized into cells, each of which is either a Python code cell or a markdown cell. Markdown is a lightweight markup language that lets us write text with formatting instructions.

Warning

Unlike typical Python scripts, Jupyter notebooks execute code on a per-cell basis. This means that:

you can run a cell multiple times, and each run can change the notebook's state again
it is possible for a cell that is “above” another cell to use variables defined in the cell below it, if the cell below was run first

Please be careful about this! A good practice is to keep the cells ordered so that the notebook runs correctly from top to bottom. When in doubt, restart the kernel and run all cells in order.
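Here is a toy sketch that makes the warning concrete. It treats each cell as a string of Python code and executes every cell in one shared namespace dictionary, which is roughly how a kernel keeps variables alive between cells. The `cells` dictionary and its contents are hypothetical. This is an illustration only, not how Jupyter is actually implemented.

```python
# Toy model of a notebook kernel (illustration only, not Jupyter's implementation):
# each "cell" is a string of code, and all cells run in one shared namespace dict.
cells = {
    "A": "total = 0",
    "B": "total = total + 10",
}

# Run the cells top to bottom once: A, then B
namespace = {}
exec(cells["A"], namespace)
exec(cells["B"], namespace)
print(namespace["total"])  # 10

# Running cell B a second time builds on the state left by the first run
exec(cells["B"], namespace)
print(namespace["total"])  # 20

# In a fresh namespace, running B before A fails because `total` does not exist yet
fresh = {}
try:
    exec(cells["B"], fresh)
except NameError as err:
    print("NameError:", err)
```

The second run of cell B gives 20, not 10. This is exactly the kind of hidden state that a top-to-bottom rerun clears away.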

You can double-click a cell to edit it. You can also run a cell by clicking the play button in the toolbar above, or by pressing Ctrl+Enter.

# standard way to import numpy
import numpy as np

# create a numpy array from a list
example = np.array([1, 2, 3])
print(example)  # [1 2 3]

The table below shows our housing data. The cell after it stores the same data as a NumPy array:

| $x_1$: sq ft | $x_2$: # bedrooms | $x_3$: yr built | $y$: price (dollars) |
|---|---|---|---|
| 2500 | 3 | 1999 | 500,000 |
| 4000 | 5 | 1950 | 1,000,000 |
| 1000 | 2 | 1980 | 250,000 |
| 900 | 2 | 2010 | 300,000 |

# create a 2 dimensional numpy array representing our housing data
data = np.array([
    #  x1,  x2,   x3,  y
    [2500,   3, 1999, 500000],
    [4000,   5, 1950, 1000000],
    [1000,   2, 1980, 250000],
    [900,    2, 2010, 300000]
])

# We can examine the "shape" of the array, which is the number of rows and columns
print(data.shape)  # (4, 4)

# To access a specific element, we use square bracket indexing for each dimension: row, column (indices start at 0)
# Example: access the cell that indicates year built: 1950 (row 1, column 2)
print(data[1, 2])  # 1950

# We use the `:` operator to select a range of elements; a bare `:` selects everything along that dimension
# Example 1: access the first row
print(data[0, :])

# Example 2: access the x2 column
print(data[:, 1])

# We can also access a range of rows and columns by specifying a "slice" index
# Syntax: start_index:end_index (the element at end_index itself is excluded)

# Example: Access the "X" portion of the data by selecting all rows and the first three columns
X = data[:, 0:3]
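Here are a few more slicing rules, sketched on the same housing array (redefined so the cell runs on its own). The end index of a slice is excluded. An omitted start or end defaults to the beginning or end of that dimension. Negative indices count from the end. The name `first_two_rows` is just for illustration.

```python
import numpy as np

# Same housing data as above (columns: x1, x2, x3, y)
data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900,  2, 2010, 300000],
])

# end_index is excluded: 0:2 selects rows 0 and 1 only
first_two_rows = data[0:2, :]
print(first_two_rows.shape)  # (2, 4)

# Omitting start means "from index 0", so :3 is the same as 0:3
X = data[:, :3]
print(X.shape)  # (4, 3)

# Negative indices count from the end: -1 is the last column
print(data[:, -1].tolist())  # [500000, 1000000, 250000, 300000]
```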

Checkpoint: What line of code would we write to access the y column?

Your response: https://pollev.com/tliu

y = data[:, 3]
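One detail is worth checking for the y column. Indexing a column with a single integer drops that dimension, while a slice keeps it. This matters later, because the `fit` docstring in Part 3 expects y with shape (n,). The sketch below assumes the same housing data and uses the hypothetical name `y_col` for the 2-D version.

```python
import numpy as np

# Same housing data as above (columns: x1, x2, x3, y)
data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900,  2, 2010, 300000],
])

# A single integer index drops the column dimension: 1-D vector of shape (n,)
y = data[:, 3]
print(y.shape)  # (4,)

# A slice keeps the dimension: 2-D column of shape (n, 1)
y_col = data[:, 3:]
print(y_col.shape)  # (4, 1)

# ravel() flattens the column back to shape (n,)
print(y_col.ravel().shape)  # (4,)
```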

Part 2: Derivatives and LaTeX

Jupyter notebooks and markdown cells also support LaTeX, a powerful language for typesetting math equations. LaTeX is marked off by dollar signs: single dollar signs produce inline equations like $f(x) = x^2$.

We can also use LaTeX to write display equations using double dollar signs $$, which are centered and on a new line:

$$f(x_1, x_2) = 4x_1^2 + 2x_2 + 1$$

Tip

Double click a markdown cell in any of the assignments to see the LaTeX code that generates it.

Superscripts are generated using ^: $$x^2$$

Subscripts are generated using _: $$x_1$$

The command to generate the curly d for partial derivatives is \partial: $$\partial$$

The command for fractions is \frac{<numerator>}{<denominator>}: $$\frac{1}{2}$$
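Putting these commands together, the LaTeX below is one way to write the display equation for $f$ and its partial derivative with respect to $x_2$. That derivative is 2, because only the $2x_2$ term depends on $x_2$:

```latex
$$f(x_1, x_2) = 4x_1^2 + 2x_2 + 1$$

$$\frac{\partial}{\partial x_2} f(x_1, x_2) = 2$$
```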

Checkpoint: find the partial derivative of $f$ with respect to $x_1$. Treating $x_2$ as a constant, only the $4x_1^2$ term depends on $x_1$, and its derivative is $2 \cdot 4x_1 = 8x_1$:

$$\frac{\partial}{\partial x_1} f(x_1, x_2) = 8x_1$$
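As a numerical check on the checkpoint, a central finite difference $(f(x_1 + h, x_2) - f(x_1 - h, x_2)) / (2h)$ should land very close to the analytic partial derivative for small $h$. The helper names `partial_x1` and `partial_x2` and the test point $(3, -1)$ are illustrative choices, not part of the activity.

```python
def f(x1, x2):
    """The function from Part 2: f(x1, x2) = 4 x1^2 + 2 x2 + 1."""
    return 4 * x1**2 + 2 * x2 + 1


def partial_x1(func, x1, x2, h=1e-5):
    """Central-difference estimate of the partial derivative w.r.t. x1 (x2 held fixed)."""
    return (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)


def partial_x2(func, x1, x2, h=1e-5):
    """Central-difference estimate of the partial derivative w.r.t. x2 (x1 held fixed)."""
    return (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)


x1, x2 = 3.0, -1.0
print(partial_x1(f, x1, x2))  # approximately 8 * x1 = 24
print(partial_x2(f, x1, x2))  # approximately 2
```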

Part 3: scikit-learn

We’ll implement the “mean” regression model from scratch. This model predicts the mean of the training targets for every new point. We’ll have it inherit from scikit-learn’s BaseEstimator class.

import numpy as np
from sklearn.base import BaseEstimator

# In Python, we can inherit from a class by specifying the parent class in parentheses
class MeanRegressor(BaseEstimator):
    """Simple model that predicts the mean of the training data."""

    # constructors in Python are defined using the `__init__` method
    # A quirk of Python OOP: the first argument is always `self`, which refers to the object itself
    def __init__(self):
        pass


    # fit method trains the model on the given data, and always takes X and y as arguments
    def fit(self, X, y):
        """Fits the mean regressor to the training data.

        Args:
            X: the data examples of shape (n, p)
            y: the answers vector of shape (n,)

        Returns:
            self: the fitted model
        """

        # store the mean prediction
        # fitted model parameters are stored in `self` as instance variables and suffixed with `_`
        self.mean_ = np.mean(y)

        # As convention, sklearn fit() methods return self
        return self


    # predict method makes predictions on new data, and always takes X as an argument
    def predict(self, X):
        """Predicts the values for new points X.

        This model will only predict the mean value of the fitted data for all new points.

        Args:
            X: the new points of shape (n_new, p)

        Returns:
            the predicted values of shape (n_new,)
        """

        predictions = []

        # loop over the rows of X; every row gets the same prediction (the stored mean)
        for x in X:
            predictions.append(self.mean_)

        # return the mean prediction for all new points as an array of shape (n_new,)
        return np.array(predictions)
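Why the trailing underscore? When scikit-learn's `check_is_fitted` helper (in `sklearn.utils.validation`) is called without an explicit attribute list, it treats an estimator as fitted once the estimator has an instance attribute whose name ends in `_`. The sketch below uses a stripped-down, hypothetical `TinyMean` estimator. It shows the helper raising `NotFittedError` before `fit` and passing afterwards. Details of the check may differ slightly across scikit-learn versions.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted


class TinyMean(BaseEstimator):
    """Hypothetical minimal estimator, used only to illustrate the `_` convention."""

    def fit(self, X, y):
        # the trailing underscore marks this as a learned (fitted) attribute
        self.mean_ = np.mean(y)
        return self


model = TinyMean()

# Before fit: no attribute ending in "_" exists yet, so the check fails
try:
    check_is_fitted(model)
except NotFittedError:
    print("not fitted yet")

# After fit: `mean_` exists, so the check passes silently
model.fit(np.zeros((2, 1)), np.array([1.0, 3.0]))
check_is_fitted(model)
print(model.mean_)  # 2.0
```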

# Create a new model
model = MeanRegressor()

# Fit the model to the data
model.fit(X, y)

# Predict the value of a new point
X_new = np.array([
    [2000, 3, 2015]
])
print(model.predict(X_new))  # [512500.]
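scikit-learn also ships this baseline as `DummyRegressor` with `strategy="mean"`, which makes it a handy cross-check for the from-scratch model. The sketch below redefines the housing data so it runs on its own. It chains `fit(...).predict(...)`, which works because `fit` returns `self`. On this data, both models should predict the training mean of 512,500.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Same housing data as above (columns: x1, x2, x3, y)
data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900,  2, 2010, 300000],
])
X, y = data[:, :3], data[:, 3]
X_new = np.array([[2000, 3, 2015]])

# fit() returns self, so predict() can be chained directly onto it
pred = DummyRegressor(strategy="mean").fit(X, y).predict(X_new)
print(pred)  # [512500.]

# The prediction is just the mean of the training targets
print(np.isclose(pred[0], y.mean()))  # True
```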