Evaluation module#

Python evaluation#

exception recommenders.evaluation.python_evaluation.ColumnMismatchError[source]#

Exception raised when there is a mismatch in columns.

This exception is raised when an operation involving columns encounters a mismatch or inconsistency.

message#

Explanation of the error.

Type:

str

exception recommenders.evaluation.python_evaluation.ColumnTypeMismatchError[source]#

Exception raised when there is a mismatch in column types.

This exception is raised when an operation involving column types encounters a mismatch or inconsistency.

message#

Explanation of the error.

Type:

str

recommenders.evaluation.python_evaluation.auc(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Calculate the Area-Under-Curve metric for an implicit-feedback recommender, where the true rating is binary and the prediction is a float ranging from 0 to 1.

https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve

Note

The evaluation does not require a leave-one-out scenario. This metric does not calculate group-based AUC, i.e. AUC scores averaged across users, and it is not limited to the top k items. Instead, the score is calculated on the entire set of prediction results, regardless of user.

Parameters:
  • rating_true (pandas.DataFrame) – True data

  • rating_pred (pandas.DataFrame) – Predicted data

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_rating (str) – column name for rating

  • col_prediction (str) – column name for prediction

Returns:

auc_score (min=0, max=1)

Return type:

float
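
Example (a minimal sketch; it assumes the recommenders package is installed and that the toy frames below use the default column names, with binary ground-truth ratings and scores in [0, 1]):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import auc

    rating_true = pd.DataFrame(
        {"userID": [1, 1, 2, 2], "itemID": [10, 11, 10, 12], "rating": [1, 0, 1, 0]}
    )
    rating_pred = pd.DataFrame(
        {"userID": [1, 1, 2, 2], "itemID": [10, 11, 10, 12], "prediction": [0.9, 0.3, 0.8, 0.4]}
    )

    # The score is computed over all user-item pairs at once (no per-user averaging).
    print(auc(rating_true, rating_pred))  # 1.0 for this perfectly separable toy data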

recommenders.evaluation.python_evaluation.catalog_coverage(train_df, reco_df, col_user='userID', col_item='itemID')[source]#

Calculate catalog coverage for recommendations across all users. The metric definition is based on the “catalog coverage” definition in the following reference:

Citation:

G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

Returns:

catalog coverage

Return type:

float

recommenders.evaluation.python_evaluation.distributional_coverage(train_df, reco_df, col_user='userID', col_item='itemID')[source]#

Calculate distributional coverage for recommendations across all users. The metric definition is based on formula (21) in the following reference:

Citation:

G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

Returns:

distributional coverage

Return type:

float
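
Example (a minimal sketch covering both coverage metrics; it assumes the recommenders package is installed and uses toy frames with the default userID/itemID column names):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import (
        catalog_coverage,
        distributional_coverage,
    )

    # Historical interactions (defining the catalog) and one recommendation per user.
    train_df = pd.DataFrame({"userID": [1, 1, 2, 3], "itemID": [10, 11, 10, 12]})
    reco_df = pd.DataFrame({"userID": [1, 2, 3], "itemID": [11, 12, 10]})

    print(catalog_coverage(train_df, reco_df))         # share of catalog items that get recommended
    print(distributional_coverage(train_df, reco_df))  # entropy of the recommendation distribution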

recommenders.evaluation.python_evaluation.diversity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#

Calculate average diversity of recommendations across all users.

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • item_feature_df (pandas.DataFrame) – (Optional) Required only when item_sim_measure=’item_feature_vector’. Contains two columns: col_item and features (a feature vector).

  • item_sim_measure (str) – (Optional) Item similarity measure to use. Available measures are item_cooccurrence_count (default) and item_feature_vector.

  • col_item_features (str) – Item feature column name.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

  • col_sim (str) – Column name for item similarity.

  • col_relevance (str) – Column name indicating whether the recommended item is actually relevant to the user.

Returns:

diversity.

Return type:

float
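
Example (a minimal sketch using the default item_cooccurrence_count similarity; it assumes the recommenders package is installed, uses the default column names, and gives every user at least two recommendations so that intra-list similarity is defined):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import diversity

    # Training interactions from which item co-occurrence similarity is derived.
    train_df = pd.DataFrame({"userID": [1, 1, 2, 2, 3], "itemID": [10, 11, 10, 12, 11]})
    # Two recommendations per user; all recommended items appear in train_df.
    reco_df = pd.DataFrame({"userID": [1, 1, 2, 2], "itemID": [11, 12, 10, 11]})

    # Average of (1 - intra-list similarity); higher values mean more diverse recommendations.
    print(diversity(train_df, reco_df))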

recommenders.evaluation.python_evaluation.exp_var(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Calculate explained variance.

Parameters:
  • rating_true (pandas.DataFrame) – True data. There should be no duplicate (userID, itemID) pairs

  • rating_pred (pandas.DataFrame) – Predicted data. There should be no duplicate (userID, itemID) pairs

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_rating (str) – column name for rating

  • col_prediction (str) – column name for prediction

Returns:

Explained variance (min=0, max=1).

Return type:

float

recommenders.evaluation.python_evaluation.get_top_k_items(dataframe, col_user='userID', col_rating='rating', k=10)[source]#

Take an input pandas DataFrame of customer-item-rating tuples and output a pandas DataFrame, in dense format, containing the top k items for each user.

Note

For implicit ratings, simply append a column of constant values to serve as the rating.

Parameters:
  • dataframe (pandas.DataFrame) – DataFrame of rating data (in the format customerID-itemID-rating).

  • col_user (str) – column name for user

  • col_rating (str) – column name for rating

  • k (int or None) – number of items for each user; None means the input has already been filtered to the top k items and sorted by rating, so there is no need to do it again.

Returns:

DataFrame of top k items for each user, sorted by col_user and rank

Return type:

pandas.DataFrame
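
Example (a minimal sketch; it assumes the recommenders package is installed):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import get_top_k_items

    scores = pd.DataFrame(
        {
            "userID": [1, 1, 1, 2, 2],
            "itemID": [10, 11, 12, 10, 12],
            "rating": [4.0, 2.5, 5.0, 3.0, 1.0],
        }
    )

    # Keep the two highest-rated items per user; the result is sorted by user and rank.
    top2 = get_top_k_items(scores, col_user="userID", col_rating="rating", k=2)
    print(top2)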

recommenders.evaluation.python_evaluation.historical_item_novelty(train_df, reco_df, col_user='userID', col_item='itemID')[source]#

Calculate novelty for each item. Novelty is computed as the minus logarithm of (number of interactions with item / total number of interactions). The definition of the metric is based on the following reference using the choice model (eqs. 1 and 6):

Citation:

P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011

The novelty of an item can be defined relative to a set of observed events on the set of all items. These can be events of user choice (item “is picked” by a random user) or user discovery (item “is known” to a random user). The above definition of novelty reflects a factor of item popularity. High novelty values correspond to long-tail items in the density function, which few users have interacted with, and low novelty values correspond to popular head items.

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

Returns:

A dataframe with the following columns: col_item, item_novelty.

Return type:

pandas.DataFrame

recommenders.evaluation.python_evaluation.logloss(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Calculate the log loss metric for an implicit-feedback recommender, where the true rating is binary and the prediction is a float ranging from 0 to 1.

https://en.wikipedia.org/wiki/Loss_functions_for_classification#Cross_entropy_loss_(Log_Loss)

Parameters:
  • rating_true (pandas.DataFrame) – True data

  • rating_pred (pandas.DataFrame) – Predicted data

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_rating (str) – column name for rating

  • col_prediction (str) – column name for prediction

Returns:

log_loss_score (min=0, max=inf)

Return type:

float

recommenders.evaluation.python_evaluation.mae(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Calculate Mean Absolute Error.

Parameters:
  • rating_true (pandas.DataFrame) – True data. There should be no duplicate (userID, itemID) pairs

  • rating_pred (pandas.DataFrame) – Predicted data. There should be no duplicate (userID, itemID) pairs

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_rating (str) – column name for rating

  • col_prediction (str) – column name for prediction

Returns:

Mean Absolute Error.

Return type:

float

recommenders.evaluation.python_evaluation.map(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#

Mean Average Precision for top k prediction items

The implementation of MAP is referenced from Spark MLlib evaluation metrics. https://spark.apache.org/docs/2.3.0/mllib-evaluation-metrics.html#ranking-systems

A good reference can be found at: http://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf

Note

MAP is meant to calculate the average precision over the relevant items, so it is normalized by the number of relevant items in the ground-truth data rather than by k.

Parameters:
  • rating_true (pandas.DataFrame) – True DataFrame

  • rating_pred (pandas.DataFrame) – Predicted DataFrame

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_prediction (str) – column name for prediction

  • relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.

  • k (int) – number of top k items per user

  • threshold (float) – threshold of top items per user (optional)

Returns:

MAP (min=0, max=1)

Return type:

float

recommenders.evaluation.python_evaluation.map_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#

Mean Average Precision at k

The implementation of MAP@k is referenced from the Spark MLlib evaluation metrics (apache/spark).

Parameters:
  • rating_true (pandas.DataFrame) – True DataFrame

  • rating_pred (pandas.DataFrame) – Predicted DataFrame

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_prediction (str) – column name for prediction

  • relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.

  • k (int) – number of top k items per user

  • threshold (float) – threshold of top items per user (optional)

Returns:

MAP@k (min=0, max=1)

Return type:

float
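
Example (a minimal sketch; it assumes the recommenders package is installed; with the default relevancy_method="top_k", the top-k predicted items of each user are scored against that user's ground-truth items):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import map_at_k

    rating_true = pd.DataFrame(
        {"userID": [1, 1, 2], "itemID": [10, 12, 11], "rating": [5.0, 4.0, 3.0]}
    )
    rating_pred = pd.DataFrame(
        {
            "userID": [1, 1, 1, 2, 2],
            "itemID": [10, 11, 12, 10, 11],
            "prediction": [0.9, 0.8, 0.7, 0.6, 0.5],
        }
    )

    print(map_at_k(rating_true, rating_pred, k=2))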

recommenders.evaluation.python_evaluation.merge_ranking_true_pred(rating_true, rating_pred, col_user, col_item, col_prediction, relevancy_method, k=10, threshold=10, **_)[source]#

Filter truth and prediction data frames on common users

Parameters:
  • rating_true (pandas.DataFrame) – True DataFrame

  • rating_pred (pandas.DataFrame) – Predicted DataFrame

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_prediction (str) – column name for prediction

  • relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.

  • k (int) – number of top k items per user (optional)

  • threshold (float) – threshold of top items per user (optional)

Returns:

DataFrame of recommendation hits sorted by col_user and rank; DataFrame of hit counts vs. actual relevant items per user; number of unique user ids.

Return type:

pandas.DataFrame, pandas.DataFrame, int

recommenders.evaluation.python_evaluation.merge_rating_true_pred(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Join truth and prediction data frames on userID and itemID and return the true and predicted ratings with a consistent index.

Parameters:
  • rating_true (pandas.DataFrame) – True data

  • rating_pred (pandas.DataFrame) – Predicted data

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_rating (str) – column name for rating

  • col_prediction (str) – column name for prediction

Returns:

Array with the true ratings; array with the predicted ratings.

Return type:

numpy.ndarray, numpy.ndarray

recommenders.evaluation.python_evaluation.ndcg_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, score_type='binary', discfun_type='loge', **_)[source]#

Normalized Discounted Cumulative Gain (nDCG).

Info: https://en.wikipedia.org/wiki/Discounted_cumulative_gain

Parameters:
  • rating_true (pandas.DataFrame) – True DataFrame

  • rating_pred (pandas.DataFrame) – Predicted DataFrame

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_rating (str) – column name for rating

  • col_prediction (str) – column name for prediction

  • relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.

  • k (int) – number of top k items per user

  • threshold (float) – threshold of top items per user (optional)

  • score_type (str) – type of relevance scores [‘binary’, ‘raw’, ‘exp’]. With the default option ‘binary’, the relevance score is reduced to either 1 (hit) or 0 (miss). Option ‘raw’ uses the raw relevance score. Option ‘exp’ uses (2 ** RAW_RELEVANCE - 1) as the relevance score

  • discfun_type (str) – type of discount function [‘loge’, ‘log2’] used to calculate DCG.

Returns:

nDCG at k (min=0, max=1).

Return type:

float
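
Example (a minimal sketch; it assumes the recommenders package is installed; with score_type="binary" each hit contributes a relevance of 1):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import ndcg_at_k

    rating_true = pd.DataFrame(
        {"userID": [1, 1, 2], "itemID": [10, 12, 11], "rating": [5.0, 4.0, 3.0]}
    )
    rating_pred = pd.DataFrame(
        {
            "userID": [1, 1, 1, 2, 2],
            "itemID": [10, 11, 12, 10, 11],
            "prediction": [0.9, 0.8, 0.7, 0.6, 0.5],
        }
    )

    print(ndcg_at_k(rating_true, rating_pred, k=2, score_type="binary"))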

recommenders.evaluation.python_evaluation.novelty(train_df, reco_df, col_user='userID', col_item='itemID')[source]#

Calculate the average novelty in a list of recommended items (this assumes that the recommendation list is already computed). Follows section 5 from

Citation:

P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

Returns:

novelty.

Return type:

float
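
Example (a minimal sketch of the two novelty functions; it assumes the recommenders package is installed and that every recommended item appears in train_df, so its interaction count is defined):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import historical_item_novelty, novelty

    train_df = pd.DataFrame({"userID": [1, 1, 2, 2, 3], "itemID": [10, 11, 10, 11, 12]})
    reco_df = pd.DataFrame({"userID": [1, 2, 3], "itemID": [11, 12, 10]})

    print(historical_item_novelty(train_df, reco_df))  # per-item novelty; long-tail items score higher
    print(novelty(train_df, reco_df))                  # average novelty of the recommended items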

recommenders.evaluation.python_evaluation.precision_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#

Precision at K.

Note

We use the same formula to calculate precision@k as that in Spark. More details can be found at http://spark.apache.org/docs/2.1.1/api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RankingMetrics.precisionAt In particular, the maximum achievable precision may be < 1, if the number of items for a user in rating_pred is less than k.

Parameters:
  • rating_true (pandas.DataFrame) – True DataFrame

  • rating_pred (pandas.DataFrame) – Predicted DataFrame

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_prediction (str) – column name for prediction

  • relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.

  • k (int) – number of top k items per user

  • threshold (float) – threshold of top items per user (optional)

Returns:

precision at k (min=0, max=1)

Return type:

float

recommenders.evaluation.python_evaluation.r_precision_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#

R-precision at K.

R-precision can be defined as precision@R for each user, where R is the number of relevant items for the query. It is also equivalent to the recall at the R-th position.

Note

Since R can be high, k serves as an upper bound on R here. If every user has more than k true items, then r-precision@k is equal to precision@k. You might need to raise the value of k to get meaningful results.

Parameters:
  • rating_true (pandas.DataFrame) – True DataFrame

  • rating_pred (pandas.DataFrame) – Predicted DataFrame

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_prediction (str) – column name for prediction

  • relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.

  • k (int) – number of top k items per user

  • threshold (float) – threshold of top items per user (optional)

Returns:

R-precision at k (min=0, max=1). The maximum value is 1 even when fewer than k items exist for a user in rating_true.

Return type:

float

recommenders.evaluation.python_evaluation.recall_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#

Recall at K.

Parameters:
  • rating_true (pandas.DataFrame) – True DataFrame

  • rating_pred (pandas.DataFrame) – Predicted DataFrame

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_prediction (str) – column name for prediction

  • relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.

  • k (int) – number of top k items per user

  • threshold (float) – threshold of top items per user (optional)

Returns:

recall at k (min=0, max=1). The maximum value is 1 even when fewer than k items exist for a user in rating_true.

Return type:

float
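
Example (a minimal sketch showing precision@k and recall@k together; it assumes the recommenders package is installed and uses the default column names):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import precision_at_k, recall_at_k

    rating_true = pd.DataFrame(
        {"userID": [1, 1, 2], "itemID": [10, 12, 11], "rating": [5.0, 4.0, 3.0]}
    )
    rating_pred = pd.DataFrame(
        {
            "userID": [1, 1, 1, 2, 2],
            "itemID": [10, 11, 12, 10, 11],
            "prediction": [0.9, 0.8, 0.7, 0.6, 0.5],
        }
    )

    print(precision_at_k(rating_true, rating_pred, k=2))  # hits in the top 2, divided by 2
    print(recall_at_k(rating_true, rating_pred, k=2))     # hits in the top 2, divided by the user's relevant items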

recommenders.evaluation.python_evaluation.rmse(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Calculate Root Mean Squared Error

Parameters:
  • rating_true (pandas.DataFrame) – True data. There should be no duplicate (userID, itemID) pairs

  • rating_pred (pandas.DataFrame) – Predicted data. There should be no duplicate (userID, itemID) pairs

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_rating (str) – column name for rating

  • col_prediction (str) – column name for prediction

Returns:

Root mean squared error

Return type:

float

recommenders.evaluation.python_evaluation.rsquared(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Calculate R squared

Parameters:
  • rating_true (pandas.DataFrame) – True data. There should be no duplicate (userID, itemID) pairs

  • rating_pred (pandas.DataFrame) – Predicted data. There should be no duplicate (userID, itemID) pairs

  • col_user (str) – column name for user

  • col_item (str) – column name for item

  • col_rating (str) – column name for rating

  • col_prediction (str) – column name for prediction

Returns:

R squared (min=0, max=1).

Return type:

float
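
Example (a minimal sketch of the rating metrics, which join the two frames on the user and item columns before scoring; it assumes the recommenders package is installed):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import exp_var, mae, rmse, rsquared

    rating_true = pd.DataFrame(
        {"userID": [1, 1, 2], "itemID": [10, 11, 10], "rating": [5.0, 3.0, 4.0]}
    )
    rating_pred = pd.DataFrame(
        {"userID": [1, 1, 2], "itemID": [10, 11, 10], "prediction": [4.5, 2.5, 3.8]}
    )

    print(rmse(rating_true, rating_pred))
    print(mae(rating_true, rating_pred))
    print(rsquared(rating_true, rating_pred))
    print(exp_var(rating_true, rating_pred))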

recommenders.evaluation.python_evaluation.serendipity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#

Calculate average serendipity for recommendations across all users.

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • item_feature_df (pandas.DataFrame) – (Optional) Required only when item_sim_measure=’item_feature_vector’. Contains two columns: col_item and features (a feature vector).

  • item_sim_measure (str) – (Optional) Item similarity measure to use. Available measures are item_cooccurrence_count (default) and item_feature_vector.

  • col_item_features (str) – Item feature column name.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

  • col_sim (str) – Column name for item similarity.

  • col_relevance (str) – Column name indicating whether the recommended item is actually relevant to the user.

Returns:

serendipity.

Return type:

float
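
Example (a minimal sketch; it assumes the recommenders package is installed and, as with diversity, uses the default item_cooccurrence_count similarity; the recommended items all appear in train_df and are new to each user):

    import pandas as pd
    from recommenders.evaluation.python_evaluation import serendipity

    train_df = pd.DataFrame({"userID": [1, 1, 2, 2, 3], "itemID": [10, 11, 10, 12, 11]})
    reco_df = pd.DataFrame({"userID": [1, 2, 3], "itemID": [12, 11, 10]})

    # With col_relevance=None, all recommended items are treated as relevant.
    print(serendipity(train_df, reco_df))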

recommenders.evaluation.python_evaluation.user_diversity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#

Calculate average diversity of recommendations for each user. The metric definition is based on formula (3) in the following reference:

Citation:

Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • item_feature_df (pandas.DataFrame) – (Optional) Required only when item_sim_measure=’item_feature_vector’. Contains two columns: col_item and features (a feature vector).

  • item_sim_measure (str) – (Optional) Item similarity measure to use. Available measures are item_cooccurrence_count (default) and item_feature_vector.

  • col_item_features (str) – Item feature column name.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

  • col_sim (str) – Column name for item similarity.

  • col_relevance (str) – Column name indicating whether the recommended item is actually relevant to the user.

Returns:

A dataframe with the following columns: col_user, user_diversity.

Return type:

pandas.DataFrame

recommenders.evaluation.python_evaluation.user_item_serendipity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#

Calculate serendipity of each item in the recommendations for each user. The metric definition is based on the following references:

Citation:

Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012

Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems, eugeneyan.com, April 2020

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • item_feature_df (pandas.DataFrame) – (Optional) Required only when item_sim_measure=’item_feature_vector’. Contains two columns: col_item and features (a feature vector).

  • item_sim_measure (str) – (Optional) Item similarity measure to use. Available measures are item_cooccurrence_count (default) and item_feature_vector.

  • col_item_features (str) – Item feature column name.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

  • col_sim (str) – Column name for item similarity.

  • col_relevance (str) – Column name indicating whether the recommended item is actually relevant to the user.

Returns:

A dataframe with columns: col_user, col_item, user_item_serendipity.

Return type:

pandas.DataFrame

recommenders.evaluation.python_evaluation.user_serendipity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#

Calculate average serendipity for each user’s recommendations.

Parameters:
  • train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.

  • reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • item_feature_df (pandas.DataFrame) – (Optional) Required only when item_sim_measure=’item_feature_vector’. Contains two columns: col_item and features (a feature vector).

  • item_sim_measure (str) – (Optional) Item similarity measure to use. Available measures are item_cooccurrence_count (default) and item_feature_vector.

  • col_item_features (str) – Item feature column name.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

  • col_sim (str) – Column name for item similarity.

  • col_relevance (str) – Column name indicating whether the recommended item is actually relevant to the user.

Returns:

A dataframe with the following columns: col_user, user_serendipity.

Return type:

pandas.DataFrame

PySpark evaluation#

class recommenders.evaluation.spark_evaluation.SparkDiversityEvaluation(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_user='userID', col_item='itemID', col_relevance=None)[source]#

Spark Evaluator for diversity, coverage, novelty, serendipity

__init__(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_user='userID', col_item='itemID', col_relevance=None)[source]#

Initializer.

This is the Spark version of the diversity metrics evaluator. The methods of this class calculate the following diversity metrics:

  • Coverage - it includes two metrics:
    1. catalog_coverage, which measures the proportion of items that get recommended from the item catalog;

    2. distributional_coverage, which measures how unequally different items are recommended in the recommendations to all users.

  • Novelty - A more novel item is a less popular one, i.e. one that users have interacted with less frequently.

  • Diversity - The dissimilarity of items being recommended.

  • Serendipity - The “unusualness” or “surprise” of recommendations to a user. When col_relevance is used, it indicates how much of a “pleasant surprise” the recommendations are to a user.

The metric definitions/formulations are based on the following references with modification:

Citation:

G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.

Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012

P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011

Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems, eugeneyan.com, April 2020

Parameters:
  • train_df (pyspark.sql.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.

  • reco_df (pyspark.sql.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.

  • item_feature_df (pyspark.sql.DataFrame) – (Optional) Required only when item_sim_measure=’item_feature_vector’. Contains two columns: col_item and features (a feature vector).

  • item_sim_measure (str) – (Optional) Item similarity measure to use. Available measures are item_cooccurrence_count (default) and item_feature_vector.

  • col_user (str) – User id column name.

  • col_item (str) – Item id column name.

  • col_relevance (str) – (Optional) Column name indicating whether the recommended item is actually relevant to the user.

catalog_coverage()[source]#

Calculate catalog coverage for recommendations across all users. The metric definition is based on the “catalog coverage” definition in the following reference:

Citation:

G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.

Returns:

catalog coverage

Return type:

float

distributional_coverage()[source]#

Calculate distributional coverage for recommendations across all users. The metric definition is based on formula (21) in the following reference:

Citation:

G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.

Returns:

distributional coverage

Return type:

float

diversity()[source]#

Calculate average diversity of recommendations across all users.

Returns:

diversity.

Return type:

float

historical_item_novelty()[source]#

Calculate novelty for each item. Novelty is computed as the minus logarithm of (number of interactions with item / total number of interactions). The definition of the metric is based on the following reference using the choice model (eqs. 1 and 6):

Citation:

P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011

The novelty of an item can be defined relative to a set of observed events on the set of all items. These can be events of user choice (item “is picked” by a random user) or user discovery (item “is known” to a random user). The above definition of novelty reflects a factor of item popularity. High novelty values correspond to long-tail items in the density function, which few users have interacted with, and low novelty values correspond to popular head items.

Returns:

A dataframe with the following columns: col_item, item_novelty.

Return type:

pyspark.sql.dataframe.DataFrame

novelty()[source]#

Calculate the average novelty in a list of recommended items (this assumes that the recommendation list is already computed). Follows section 5 from

Citation:

P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011

Returns:

A dataframe with the following column: novelty.

Return type:

pyspark.sql.dataframe.DataFrame

serendipity()[source]#

Calculate average serendipity for recommendations across all users.

Returns:

serendipity.

Return type:

float

user_diversity()[source]#

Calculate average diversity of recommendations for each user. The metric definition is based on formula (3) in the following reference:

Citation:

Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012

Returns:

A dataframe with the following columns: col_user, user_diversity.

Return type:

pyspark.sql.dataframe.DataFrame

user_item_serendipity()[source]#

Calculate serendipity of each item in the recommendations for each user. The metric definition is based on the following references:

Citation:

Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012

Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems, eugeneyan.com, April 2020

Returns:

A dataframe with columns: col_user, col_item, user_item_serendipity.

Return type:

pyspark.sql.dataframe.DataFrame

user_serendipity()[source]#

Calculate average serendipity for each user’s recommendations.

Returns:

A dataframe with the following columns: col_user, user_serendipity.

Return type:

pyspark.sql.dataframe.DataFrame
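
Example (a minimal PySpark sketch; it assumes recommenders and pyspark are installed, uses a local SparkSession, and feeds toy frames with the default userID/itemID column names):

    from pyspark.sql import SparkSession
    from recommenders.evaluation.spark_evaluation import SparkDiversityEvaluation

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    # Historical interactions and recommendations; all recommended items appear in train_df.
    train_df = spark.createDataFrame(
        [(1, 10), (1, 11), (2, 10), (2, 12), (3, 11)], ["userID", "itemID"]
    )
    reco_df = spark.createDataFrame([(1, 12), (2, 11), (3, 10)], ["userID", "itemID"])

    evaluator = SparkDiversityEvaluation(train_df=train_df, reco_df=reco_df)
    print(evaluator.catalog_coverage())          # float
    print(evaluator.distributional_coverage())   # float
    evaluator.historical_item_novelty().show()   # per-item novelty as a Spark DataFrame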

class recommenders.evaluation.spark_evaluation.SparkRankingEvaluation(rating_true, rating_pred, k=10, relevancy_method='top_k', col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction', threshold=10)[source]#

Spark Ranking Evaluator

__init__(rating_true, rating_pred, k=10, relevancy_method='top_k', col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction', threshold=10)[source]#

Initialization. This is the Spark version of the ranking metrics evaluator. The methods of this class calculate ranking metrics such as precision@k, recall@k, ndcg@k, and mean average precision.

The implementations of precision@k, ndcg@k, and mean average precision are referenced from the Spark MLlib evaluation metrics (https://spark.apache.org/docs/2.3.0/mllib-evaluation-metrics.html#ranking-systems).

Parameters:
  • rating_true (pyspark.sql.DataFrame) – DataFrame of true rating data (in the format of customerID-itemID-rating tuple).

  • rating_pred (pyspark.sql.DataFrame) – DataFrame of predicted rating data (in the format of customerID-itemID-rating tuple).

  • col_user (str) – column name for user.

  • col_item (str) – column name for item.

  • col_rating (str) – column name for rating.

  • col_prediction (str) – column name for prediction.

  • k (int) – number of items to recommend to each user.

  • relevancy_method (str) – method for determining relevant items. Possible values are “top_k”, “by_time_stamp”, and “by_threshold”.

  • threshold (float) – threshold for determining the relevant recommended items. This is used for the case that predicted ratings follow a known distribution. NOTE: this option is only activated if relevancy_method is set to “by_threshold”.

map()[source]#

Get mean average precision.

Returns:

MAP (min=0, max=1).

Return type:

float

map_at_k()[source]#

Get mean average precision at k.

Returns:

MAP at k (min=0, max=1).

Return type:

float

ndcg_at_k()[source]#

Get Normalized Discounted Cumulative Gain (NDCG)

Note

More details can be found in the PySpark documentation for ndcgAt.

Returns:

nDCG at k (min=0, max=1).

Return type:

float

precision_at_k()[source]#

Get precision@k.

Note

More details can be found in the PySpark documentation for precisionAt.

Returns:

precision at k (min=0, max=1)

Return type:

float

recall_at_k()[source]#

Get recall@K.

Note

More details can be found in the PySpark documentation for recallAt.

Returns:

recall at k (min=0, max=1).

Return type:

float
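
Example (a minimal PySpark sketch; it assumes recommenders and pyspark are installed and uses a local SparkSession with toy data and the default column names):

    from pyspark.sql import SparkSession
    from recommenders.evaluation.spark_evaluation import SparkRankingEvaluation

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    rating_true = spark.createDataFrame(
        [(1, 10, 5.0), (1, 12, 4.0), (2, 11, 3.0)], ["userID", "itemID", "rating"]
    )
    rating_pred = spark.createDataFrame(
        [(1, 10, 0.9), (1, 11, 0.8), (1, 12, 0.7), (2, 11, 0.6), (2, 10, 0.5)],
        ["userID", "itemID", "prediction"],
    )

    rank_eval = SparkRankingEvaluation(rating_true, rating_pred, k=2, relevancy_method="top_k")
    print(rank_eval.precision_at_k())
    print(rank_eval.recall_at_k())
    print(rank_eval.ndcg_at_k())
    print(rank_eval.map_at_k())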

class recommenders.evaluation.spark_evaluation.SparkRatingEvaluation(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Spark Rating Evaluator

__init__(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#

Initializer.

This is the Spark version of the rating metrics evaluator. The methods of this class calculate rating metrics such as root mean squared error, mean absolute error, R squared, and explained variance.

Parameters:
  • rating_true (pyspark.sql.DataFrame) – True labels.

  • rating_pred (pyspark.sql.DataFrame) – Predicted labels.

  • col_user (str) – column name for user.

  • col_item (str) – column name for item.

  • col_rating (str) – column name for rating.

  • col_prediction (str) – column name for prediction.

exp_var()[source]#

Calculate explained variance.

Note

Spark MLlib’s implementation is buggy (it can lead to values > 1), hence we use var() instead.

Returns:

Explained variance (min=0, max=1).

Return type:

float

mae()[source]#

Calculate Mean Absolute Error.

Returns:

Mean Absolute Error.

Return type:

float

rmse()[source]#

Calculate Root Mean Squared Error.

Returns:

Root mean squared error.

Return type:

float

rsquared()[source]#

Calculate R squared.

Returns:

R squared.

Return type:

float
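
Example (a minimal PySpark sketch; it assumes recommenders and pyspark are installed and uses a local SparkSession with toy data and the default column names):

    from pyspark.sql import SparkSession
    from recommenders.evaluation.spark_evaluation import SparkRatingEvaluation

    spark = SparkSession.builder.master("local[*]").getOrCreate()

    rating_true = spark.createDataFrame(
        [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)], ["userID", "itemID", "rating"]
    )
    rating_pred = spark.createDataFrame(
        [(1, 10, 4.5), (1, 11, 2.5), (2, 10, 3.8)], ["userID", "itemID", "prediction"]
    )

    rating_eval = SparkRatingEvaluation(rating_true, rating_pred)
    print(rating_eval.rmse())
    print(rating_eval.mae())
    print(rating_eval.rsquared())
    print(rating_eval.exp_var())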