Collaborative Filtering with Embeddings

Most online ecommerce website use some kind of Recommendation engies to predict what prodcts the user would likely purchase and thus derive sales. They leverage the behavior of their previous customers: navigation, viewing, shopping history to deliver better recommendations. Collaborative filtering is a basic model for recommendation, such model is build with the assumtion that people like things similar to other things they like (if they like orange they will probably like oragne juice). Also people with similar taste would like same things.

There are different algorithms for collaborative filtering, the following implements Matrix factorization. The products of the factorizations gives the user-item ratings matrix. Then, gradient descent is used to find optimal solution (i.e. best factorization).

Data

In the following, the movie ratings dataset from Grouplens MovieLens is used. First download the data, un-compressed and have a look to the different files

The ratings.csv file contains ratings, it has 20 million ratings on 27,000 movies by 138,000 users.

In the user-item matrix, in a every cell (i, j) we will have the rating of user i on the movie j. A look into the first few rows:

The following pictures depicts the distribution of ratingsâ€™ mean per movie:

Model

This Base model for callaborative filtering (as depicted in the picture below - source), will try to learn user-item matrix using embeddings (i.e. a matrix of weights) for users and items, the dot product should give the rating matrix. When defining the embeddings, e.g. user_embed: the number of words in vocab is the number of users we have, and the number of factors represent the dimensional embeddings.

The model also try to learn bias by user and by movie (there are movies that too many people would like or hate), and there are users who likes (or hates) every movie. Then, it applies a sigmoid function to get a probability (a value between 0 and 1), which later is scaled to the appropriate ratings and get the predicted ratings.

The model loss function is simply an Mean Squared Error (MSE), and Gradient descent (or similar) algo can be used to find optimal weights.

Here is a full Keras-based implementation:

Training

The previous snippets are grouped together into a helper class for parsing Reuters dataset.

After trainning, print the history of losses and accuracy both available in the history variable.

The full jypiter notebook can be found here - link.