Add Course 3 notes and code #19

Merged 5 commits on Nov 19, 2023

12 changes: 12 additions & 0 deletions Course3/Notes/collab_filter.md
@@ -0,0 +1,12 @@
# Collaborative Filtering Algorithm

Learn both the feature vectors X and the per-user linear-regression parameters W and b.
Which users (samples) rated which items, i.e. which (user, movie) pairs have a rating, is kept track of in
a binary matrix R. Matrix Y holds the ratings. The features X and the parameters W and b must be learned collaboratively.

Features = X
User parameters = W, b
R = binary mapping between users and movie ratings (1 if rated)
Y = movie ratings

Y(movie, user) = R(movie, user) * (w(user) . x(movie) + b(user))
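
As a minimal numpy sketch of this prediction (the matrices here are hypothetical learned values, purely for illustration):

```Python
import numpy as np

# Hypothetical learned values for 3 movies and 2 users
X = np.array([[1.0, 0.2], [0.5, 0.9], [0.0, 1.1]])  # movie features
W = np.array([[0.8, 0.1], [0.2, 1.0]])              # per-user parameters
b = np.array([[0.1, -0.3]])                         # per-user bias
R = np.array([[1, 0], [1, 1], [0, 1]])              # 1 where a rating exists

# Predicted ratings, zeroed out where no rating exists
Y_pred = R * (X @ W.T + b)
```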
119 changes: 119 additions & 0 deletions Course3/Notes/content_filter.md
@@ -0,0 +1,119 @@
# Content based filtering

## Difference from collaborative filtering

Learn to match user features to movie features, instead of
learning per-user parameters on movie features.

So users have features and movies have features; build a
vector from each feature set, then predict how well a user and a movie
match (recommend a movie to a user, or predict a user's score for a movie).

No constant vector `b`.

The prediction is the dot product `V_U . V_M`; each vector must be calculated from its feature vector.

### How to calculate V? Use deep learning (a neural network, NN)

The NN output layer should not have a single unit, but many:
one unit per vector element (how many is a design choice, e.g. 32). Hidden layers can be arbitrarily complex, but the output layers producing `V_M` and `V_U` must have matching sizes!

For binary labels, instead of using the raw dot product, apply a sigmoid:
predict `g(V_U . V_M)` and interpret it as the probability of a match.
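
A tiny numpy sketch of that binary prediction (the vectors are made-up values):

```Python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned vectors for one user and one movie
v_u = np.array([0.9, -0.2, 0.3])
v_m = np.array([1.1, 0.0, 0.4])

prob = sigmoid(v_u @ v_m)  # probability that the user likes the movie
```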

## Cost Function

```Latex
J = \sum_{(i,j):\, r(i,j)=1} \left( v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)} \right)^2 + \text{NN regularization}
```

Basically we need labels Y, i.e. existing movie/user ratings (matches).
The same cost function trains the NNs for both vectors.

### Tips

To find similar movies, take the squared L2 distance `||v_m^(k) - v_m^(i)||^2` between movie vectors.
This can and should be pre-computed!
The result is a similarity matrix: movies become related like
a graph.
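
A minimal sketch of that pre-computation, assuming the movie vectors are stacked in a matrix (shapes and values here are made up):

```Python
import numpy as np

# Hypothetical matrix of movie vectors, one row per movie
V_m = np.random.rand(100, 32)

# Pairwise squared L2 distances: ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
sq_norms = np.sum(V_m**2, axis=1)
dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (V_m @ V_m.T)

k = 0
nearest = np.argsort(dist2[k])[1:11]  # the 10 movies most similar to movie k
```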

NN benefit realized: the movie and user NNs are easy to
integrate, simply by taking the dot product of their output layers.
Really powerful!

The feature engineering is critical.

The algorithm as described is computationally expensive to run;
it needs modifications to scale.

## Scale up Recommender system

Retrieval & Ranking

### Retrieval

Generate a large list of plausible item candidates.

Use the pre-computed `||v_m^(k) - v_m^(j)||^2`.

Find similar movies, top movies in the user's 3 most-viewed genres, top movies of
all time, top X movies in the same country, etc.

### Ranking

Now that we have a small list of movies, rank them.
`V_m` can be pre-computed (new users appear and user
feature values change far more often than movie features).
We only need to calculate `V_u` and score the pared-down list from the retrieval
step, which is fast. It can even be done on the edge.
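
A rough sketch of that ranking step (the vector store, candidate ids, `user_features`, and the `user_nn` model from the TensorFlow section below are all hypothetical here):

```Python
import numpy as np

# Pre-computed movie vectors, one row per movie (hypothetical store)
V_m = np.load("movie_vectors.npy")
candidates = [3, 17, 42, 8]  # movie ids produced by the retrieval step

# V_u is computed once per request from the user NN
v_u = user_nn.predict(user_features).ravel()

scores = V_m[candidates] @ v_u                      # one dot product per candidate
ranked = np.array(candidates)[np.argsort(-scores)]  # best match first
```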

The retrieval step (e.g. how many candidates to fetch) should be tuned using
offline experiments, A/B testing, etc.

## Ethics

Don't be evil. Don't be naive.
Think about the goal. Think about bad actors.

Be transparent with users. Be careful with exploitative recommendations.

## TensorFlow Recommender Algorithm

Same as a regular NN: a Sequential model from Keras.

```Python
import tensorflow as tf

# Example layer sizes; the output layers of both towers must match (e.g. 32).
# num_user_features / num_item_features and the training arrays are assumed
# to be defined elsewhere.
user_nn = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(32),
])
item_nn = tf.keras.models.Sequential([
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(32),
])

# Input layers (note the trailing comma: shape must be a tuple)
input_user = tf.keras.layers.Input(shape=(num_user_features,))
input_item = tf.keras.layers.Input(shape=(num_item_features,))

vu = user_nn(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)  # scale each vector to unit L2 norm
vm = item_nn(input_item)
vm = tf.linalg.l2_normalize(vm, axis=1)

# Keras dot-product layer combines the two towers
output = tf.keras.layers.Dot(axes=1)([vu, vm])

# Use simple MSE for the loss
cost_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

# Build and train the model using the Keras functional API
n_iterations = 30
model = tf.keras.Model([input_user, input_item], output)
model.compile(optimizer=optimizer, loss=cost_fn)
model.fit([user_train, item_train], y_train, epochs=n_iterations)
```

### Lab

The lab uses sklearn's StandardScaler for the user features but MinMaxScaler for the target; it's not clear why. It uses the scaler's `inverse_transform` to recover the original values, and the ready-made `train_test_split` for the split, with a 20% test set.
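
A minimal sketch of that preprocessing, with placeholder data (the lab's actual features and targets differ):

```Python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

user_features = np.random.rand(100, 5)  # placeholder user features
targets = np.random.rand(100, 1) * 5.0  # placeholder ratings

user_scaled = StandardScaler().fit_transform(user_features)
target_scaler = MinMaxScaler()
y_scaled = target_scaler.fit_transform(targets)

x_train, x_test, y_train, y_test = train_test_split(
    user_scaled, y_scaled, test_size=0.20
)

# After predicting, map scaled values back to the original rating scale
y_original = target_scaler.inverse_transform(y_scaled)
```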

Because the test loss is similar to the training loss,
we infer that the model has not substantially overfit.
(It's odd not to use a CV set, but the model parameters and architecture
were simply given, so there was nothing to tune.)
11 changes: 11 additions & 0 deletions Course3/Notes/pca.md
@@ -0,0 +1,11 @@
# PCA

Each principal component is a projection that "explains" the maximum remaining variance.
PCA used to be popular for dimensionality reduction and compression,
especially during training or feature selection,
but nowadays it is mainly used for visualization in AI/ML.

Look into eigenvectors and eigenvalues for a deeper understanding.

Just use sklearn.
I published a paper on this, so no need for more.
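
Since the note just says to use sklearn, here is a minimal sketch with placeholder data:

```Python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(200, 10)  # placeholder high-dimensional data

pca = PCA(n_components=2)    # keep the 2 components for a 2D visualization
X_2d = pca.fit_transform(X)

print(pca.explained_variance_ratio_)  # fraction of variance each PC "explains"
```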
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -66,4 +66,7 @@ exclude = '''
| htmlcov
| .coverage
)/
'''
'''

[tool.mypy]
plugins = "numpy.typing.mypy_plugin"
31 changes: 31 additions & 0 deletions rawsight/recommender/collaborative_filtering.py
@@ -0,0 +1,31 @@
import numpy as np
import numpy.typing as npt


def cofi_cost_func(
    X: npt.NDArray[np.number],
    W: npt.NDArray[np.number],
    b: npt.NDArray[np.number],
    Y: npt.NDArray[np.number],
    R: npt.NDArray[np.number],
    lam: float,
) -> float:
    """Return the regularized collaborative-filtering cost, vectorized with numpy.

    Args:
        X (num_feature_samples, num_features): matrix of feature samples
        W (num_parameter_samples, num_features): matrix of parameter samples
        b (1, num_parameter_samples): constant parameter per parameter sample
        Y (num_feature_samples, num_parameter_samples): targets per feature sample
        R (num_feature_samples, num_parameter_samples): R(i, j) = 1 if feature sample i has parameters from parameter sample j
        lam (float): regularization parameter

    Simplest example: X holds movie features, W holds user parameters,
    Y is the matrix of user ratings for each movie, and R records whether a user rated a movie.
    """
    # Regularization is simple and applies to all values
    regularization: float = (np.sum(W**2) + np.sum(X**2)) * (lam / 2)

    # Vectorized implementation, analogous to linear regression
    cost: float = np.sum((R * (np.dot(X, W.T) + b - Y)) ** 2) / 2

    return cost + regularization
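
A quick smoke test of this function, assuming the package is importable as `rawsight` (shapes and values are arbitrary):

```Python
import numpy as np

from rawsight.recommender.collaborative_filtering import cofi_cost_func

rng = np.random.default_rng(0)
num_movies, num_users, num_features = 5, 4, 3
X = rng.normal(size=(num_movies, num_features))  # movie features
W = rng.normal(size=(num_users, num_features))   # user parameters
b = rng.normal(size=(1, num_users))              # per-user bias
Y = rng.integers(0, 6, size=(num_movies, num_users)).astype(float)  # ratings
R = rng.integers(0, 2, size=(num_movies, num_users)).astype(float)  # rated?

print(cofi_cost_func(X, W, b, Y, R, lam=1.0))  # scalar cost
```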