Activity 2
2026-01-29
Part 1: NumPy and coding environment
Jupyter notebooks allow us to interleave code, text, math equations, and plots in a single document. The content is organized into cells, each of which is either a Python code cell or a Markdown cell. Markdown is a lightweight markup language that allows us to write text with formatting instructions.
Warning
Unlike typical Python scripts, Jupyter notebooks execute code on a per-cell basis. This means that:
- you can run a cell multiple times
- it is possible for a cell to use variables defined in a cell "below" it, as long as that lower cell was run first
Please be careful about this! A good practice is to make sure that the cells can always be run in order from top to bottom.
You can double-click a cell to edit it. You can also run a cell by clicking the play button in the toolbar above, or by pressing Ctrl+Enter.
# standard way to import numpy
import numpy as np
# create a numpy array from a list
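As a minimal sketch of the step named in the comment above, `np.array` accepts a plain Python list and returns an array. The names `prices` and `prices_arr` are illustrative and not part of the activity.

```python
import numpy as np

# A plain Python list (illustrative values)
prices = [500000, 1000000, 250000, 300000]

# np.array copies the list contents into a NumPy array
prices_arr = np.array(prices)

print(type(prices_arr))   # <class 'numpy.ndarray'>
print(prices_arr.shape)   # (4,)
```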
Below is a cell that creates a numpy array representing our housing data:
| \(x_1\): sq ft | \(x_2\): # bedrooms | \(x_3\): yr built | \(y\): price |
|---|---|---|---|
| 2500 | 3 | 1999 | $500,000 |
| 4000 | 5 | 1950 | $1,000,000 |
| 1000 | 2 | 1980 | $250,000 |
| 900 | 2 | 2010 | $300,000 |
# create a 2 dimensional numpy array representing our housing data
data = np.array([
    # x1, x2, x3, y
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900, 2, 2010, 300000]
])
# We can examine the "shape" of the array, which is the number of rows and columns
# To access a specific element, we use square bracket indexing for each dimension: row, column
# Example: access the cell that indicates year built: 1950
# We use the `:` operator to access a range of elements
# Example 1: access the first row
# Example 2: access the x2 column
# We can also access a range of rows and columns by specifying a "slice" index
# Syntax: start_index:end_index
# Example: Access the "X" portion of the data by selecting all rows and the first three columns
X = None
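The indexing operations described in the comments above can be sketched as follows. This is one possible way to write them, not necessarily the version shown in class. The variable names `year_built`, `first_row`, and `bedrooms` are illustrative.

```python
import numpy as np

# Housing data: columns are x1 (sq ft), x2 (# bedrooms), x3 (yr built), y (price)
data = np.array([
    [2500, 3, 1999, 500000],
    [4000, 5, 1950, 1000000],
    [1000, 2, 1980, 250000],
    [900, 2, 2010, 300000],
])

# shape is a tuple: (number of rows, number of columns)
print(data.shape)          # (4, 4)

# Indices start at 0: row 1, column 2 holds the year built for the second house
year_built = data[1, 2]
print(year_built)          # 1950

# A bare `:` selects everything along that dimension
first_row = data[0, :]     # all columns of the first row
bedrooms = data[:, 1]      # all rows of the x2 column

# A slice start:end includes start and excludes end
X = data[:, 0:3]           # all rows, columns 0, 1, and 2
print(X.shape)             # (4, 3)
```

Note that the end index of a slice is exclusive, so `0:3` selects three columns rather than four.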
Checkpoint: What line of code would we write to access the y column?
Your response: https://pollev.com/tliu
y = TODO
Part 2: Derivatives and LaTeX
Jupyter notebooks and markdown cells also support LaTeX, which is a powerful language for writing math equations. LaTeX sections are delimited by dollar signs: a pair of single dollar signs produces an inline equation like \(f(x) = x^2\).
We can also use LaTeX to write display equations using double dollar signs $$, which are centered and on a new line:
Tip
Double click a markdown cell in any of the assignments to see the LaTeX code that generates it.
- Superscripts are generated using ^: $x^2$ renders as \(x^2\)
- Subscripts are generated using _: $x_1$ renders as \(x_1\)
- The command to generate the curly d for partial derivatives is \partial: $\partial$ renders as \(\partial\)
- The command for fractions is \frac{<numerator>}{<denominator>}: $\frac{1}{2}$ renders as \(\frac{1}{2}\)
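As an illustration, the commands above can be combined in a single display equation. The function \(g\) here is a hypothetical example chosen for this sketch, not the function used in the checkpoint.

```latex
$$
g(x_1, x_2) = x_1^2 x_2
\qquad \Rightarrow \qquad
\frac{\partial g}{\partial x_1} = 2 x_1 x_2
$$
```

When differentiating with respect to \(x_1\), the other variable \(x_2\) is treated as a constant, which is why it carries through unchanged.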
Checkpoint: replace the TODO in the LaTeX cell below with the partial derivative of \(f\) with respect to \(x_1\):
Part 3: scikit-learn
We’ll implement the “mean” regression model from scratch, having it extend the BaseEstimator class in scikit-learn.
from sklearn.base import BaseEstimator

# TODO In Python, we can inherit from a class by specifying the parent class in parentheses
class MeanRegressor:
    """Simple model that predicts the mean of the training data."""

    # constructors in Python are defined using the `__init__` method
    # A quirk of Python OOP: the first argument is always `self`, which refers to the object itself
    def __init__(self):
        pass

    # fit method trains the model on the given data, and always takes X and y as arguments
    def fit(self, X, y):
        """Fits the mean regressor to the training data.

        Args:
            X: the data examples of shape (n, p)
            y: the answers vector of shape (n,)

        Returns:
            self: the fitted model
        """
        # TODO store the mean prediction
        # fitted model parameters are stored in `self` as instance variables and suffixed with `_`

        # TODO As convention, sklearn fit() methods return self

    # predict method makes predictions on new data, and always takes X as an argument
    def predict(self, X):
        """Predicts the values for new points X.

        This model will only predict the mean value of the fitted data for all new points.

        Args:
            X: the new points of shape (n_new, p)

        Returns:
            the predicted values of shape (n_new,)
        """
        predictions = []
        # TODO this loops over the rows of X
        for x in X:
            pass
        # TODO return the mean prediction for all new points
# Create a new model
# Fit the model to the data
# Predict the value of a new point
X_new = np.array([
    [2000, 3, 2015]
])
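For reference, here is a minimal sketch of how the skeleton above might be completed and used. It is one possible implementation under stated assumptions, not necessarily the one developed in class: the class name `MeanRegressorSketch` and the attribute name `mean_` are hypothetical choices, and `np.full` replaces the explicit loop over rows.

```python
import numpy as np
from sklearn.base import BaseEstimator


class MeanRegressorSketch(BaseEstimator):
    """Hypothetical completed version: predicts the training mean for every input."""

    def fit(self, X, y):
        # Fitted parameters get a trailing underscore by sklearn convention
        self.mean_ = float(np.mean(y))
        # Returning self allows calls to be chained: model.fit(X, y).predict(...)
        return self

    def predict(self, X):
        # One copy of the stored mean per row of X; the features are ignored
        return np.full(len(X), self.mean_)


# Housing features (sq ft, # bedrooms, yr built) and prices from the table
X = np.array([
    [2500, 3, 1999],
    [4000, 5, 1950],
    [1000, 2, 1980],
    [900, 2, 2010],
])
y = np.array([500000, 1000000, 250000, 300000])

# Create a new model and fit it to the data
model = MeanRegressorSketch().fit(X, y)

# Predict the value of a new point
X_new = np.array([
    [2000, 3, 2015]
])
preds = model.predict(X_new)
print(preds)  # [512500.]
```

The prediction is the average of the four training prices, (500000 + 1000000 + 250000 + 300000) / 4 = 512500, regardless of the features in `X_new`.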