Evaluation module#
Python evaluation#
- exception recommenders.evaluation.python_evaluation.ColumnMismatchError[source]#
Exception raised when there is a mismatch in columns.
This exception is raised when an operation involving columns encounters a mismatch or inconsistency.
- message#
Explanation of the error.
- Type:
str
- exception recommenders.evaluation.python_evaluation.ColumnTypeMismatchError[source]#
Exception raised when there is a mismatch in column types.
This exception is raised when an operation involving column types encounters a mismatch or inconsistency.
- message#
Explanation of the error.
- Type:
str
- recommenders.evaluation.python_evaluation.auc(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Calculate the Area-Under-Curve metric for an implicit-feedback recommender, where ratings are binary and predictions are float numbers ranging from 0 to 1.
https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve
Note
The evaluation does not require a leave-one-out scenario. This metric does not calculate group-based AUC, i.e. AUC scores averaged across users, and it is not limited to the top k items; instead, the score is calculated on the entire set of predictions, regardless of user.
- Parameters:
rating_true (pandas.DataFrame) – True data
rating_pred (pandas.DataFrame) – Predicted data
col_user (str) – column name for user
col_item (str) – column name for item
col_rating (str) – column name for rating
col_prediction (str) – column name for prediction
- Returns:
auc_score (min=0, max=1)
- Return type:
float
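Example (a minimal sketch; the data is hypothetical and the column names are the defaults above):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import auc

# Hypothetical data: binary ground-truth labels and scores in [0, 1].
rating_true = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 11, 10, 12],
    "rating": [1, 0, 1, 0],
})
rating_pred = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 11, 10, 12],
    "prediction": [0.9, 0.3, 0.8, 0.4],
})

# All positives are scored above all negatives, so the AUC should be 1.0.
print(auc(rating_true, rating_pred))
```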
- recommenders.evaluation.python_evaluation.catalog_coverage(train_df, reco_df, col_user='userID', col_item='itemID')[source]#
Calculate catalog coverage for recommendations across all users. The metric definition is based on the “catalog coverage” definition in the following reference:
- Citation:
G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
col_user (str) – User id column name.
col_item (str) – Item id column name.
- Returns:
catalog coverage
- Return type:
float
- recommenders.evaluation.python_evaluation.distributional_coverage(train_df, reco_df, col_user='userID', col_item='itemID')[source]#
Calculate distributional coverage for recommendations across all users. The metric definition is based on formula (21) in the following reference:
- Citation:
G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
col_user (str) – User id column name.
col_item (str) – Item id column name.
- Returns:
distributional coverage
- Return type:
float
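Example (a minimal sketch of both coverage metrics on hypothetical data; with 2 of the 4 catalog items ever recommended, catalog coverage should be 0.5):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import (
    catalog_coverage,
    distributional_coverage,
)

# Hypothetical training history: a catalog of 4 distinct items.
train_df = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3],
    "itemID": [10, 11, 12, 13, 10],
})
# Recommendations only ever surface items 12 and 13.
reco_df = pd.DataFrame({
    "userID": [1, 2, 3],
    "itemID": [12, 12, 13],
})

print(catalog_coverage(train_df, reco_df))        # 2 recommended items / 4 catalog items = 0.5
print(distributional_coverage(train_df, reco_df)) # entropy of the recommendation distribution
```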
- recommenders.evaluation.python_evaluation.diversity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#
Calculate average diversity of recommendations across all users.
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
item_feature_df (pandas.DataFrame) – (Optional) It is required only when item_sim_measure=’item_feature_vector’. It contains two columns: col_item and features (a feature vector).
item_sim_measure (str) – (Optional) The item similarity measure to use. Available measures include item_cooccurrence_count (the default) and item_feature_vector.
col_item_features (str) – item feature column name.
col_user (str) – User id column name.
col_item (str) – Item id column name.
col_sim (str) – Column name for item similarity.
col_relevance (str) – Name of the column indicating whether the recommended item is actually relevant to the user.
- Returns:
diversity.
- Return type:
float
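Example (a minimal sketch with hypothetical data, using the default item_cooccurrence_count similarity, so only train_df and reco_df are needed; each user gets at least two recommendations so that pairwise dissimilarity is defined):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import diversity

# Hypothetical history; every item appears in train_df so co-occurrence
# counts are available for the similarity computation.
train_df = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3, 3, 4, 4],
    "itemID": [10, 11, 10, 12, 11, 12, 10, 13],
})
# Recommendations avoid items each user has already interacted with.
reco_df = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [12, 13, 11, 13],
})

# Average dissimilarity of the item pairs in each user's recommendation list.
print(diversity(train_df, reco_df))
```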
- recommenders.evaluation.python_evaluation.exp_var(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Calculate explained variance.
- Parameters:
rating_true (pandas.DataFrame) – True data. There should be no duplicate (userID, itemID) pairs
rating_pred (pandas.DataFrame) – Predicted data. There should be no duplicate (userID, itemID) pairs
col_user (str) – column name for user
col_item (str) – column name for item
col_rating (str) – column name for rating
col_prediction (str) – column name for prediction
- Returns:
Explained variance (min=0, max=1).
- Return type:
float
- recommenders.evaluation.python_evaluation.get_top_k_items(dataframe, col_user='userID', col_rating='rating', k=10)[source]#
Take a customer-item-rating DataFrame as input and return a DataFrame with the top k items for each user, in dense format.
Note
If the ratings are implicit, just append a column of constants to serve as the ratings.
- Parameters:
dataframe (pandas.DataFrame) – DataFrame of rating data (in the format customerID-itemID-rating)
col_user (str) – column name for user
col_rating (str) – column name for rating
k (int or None) – number of items for each user; None means that the input has already been filtered to the top k items and sorted by ratings, so there is no need to do that again.
- Returns:
DataFrame of top k items for each user, sorted by col_user and rank
- Return type:
pandas.DataFrame
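Example (a minimal sketch on hypothetical scores):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import get_top_k_items

scores = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2],
    "itemID": [10, 11, 12, 10, 11],
    "rating": [4.0, 2.0, 5.0, 3.0, 1.0],
})

# Keep each user's 2 highest-rated items; the result is sorted by user and rank.
top2 = get_top_k_items(scores, col_user="userID", col_rating="rating", k=2)
print(top2)
```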
- recommenders.evaluation.python_evaluation.historical_item_novelty(train_df, reco_df, col_user='userID', col_item='itemID')[source]#
Calculate novelty for each item. Novelty is computed as the minus logarithm of (number of interactions with item / total number of interactions). The definition of the metric is based on the following reference using the choice model (eqs. 1 and 6):
- Citation:
P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011
The novelty of an item can be defined relative to a set of observed events on the set of all items. These can be events of user choice (item “is picked” by a random user) or user discovery (item “is known” to a random user). The above definition of novelty reflects a factor of item popularity. High novelty values correspond to long-tail items in the density function that few users have interacted with, and low novelty values correspond to popular head items.
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
col_user (str) – User id column name.
col_item (str) – Item id column name.
- Returns:
A dataframe with the following columns: col_item, item_novelty.
- Return type:
pandas.DataFrame
- recommenders.evaluation.python_evaluation.logloss(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Calculate the logloss metric for an implicit-feedback recommender, where ratings are binary and predictions are float numbers ranging from 0 to 1.
https://en.wikipedia.org/wiki/Loss_functions_for_classification#Cross_entropy_loss_(Log_Loss)
- Parameters:
rating_true (pandas.DataFrame) – True data
rating_pred (pandas.DataFrame) – Predicted data
col_user (str) – column name for user
col_item (str) – column name for item
col_rating (str) – column name for rating
col_prediction (str) – column name for prediction
- Returns:
log_loss_score (min=0, max=inf)
- Return type:
float
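Example (a minimal sketch; the data shape matches the auc() example above, with binary labels and predicted probabilities):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import logloss

rating_true = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 11, 10, 12],
    "rating": [1, 0, 1, 0],              # binary labels
})
rating_pred = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 11, 10, 12],
    "prediction": [0.8, 0.2, 0.7, 0.4],  # predicted probabilities
})

# Lower is better; 0 only for perfectly confident correct predictions.
print(logloss(rating_true, rating_pred))
```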
- recommenders.evaluation.python_evaluation.mae(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Calculate Mean Absolute Error.
- Parameters:
rating_true (pandas.DataFrame) – True data. There should be no duplicate (userID, itemID) pairs
rating_pred (pandas.DataFrame) – Predicted data. There should be no duplicate (userID, itemID) pairs
col_user (str) – column name for user
col_item (str) – column name for item
col_rating (str) – column name for rating
col_prediction (str) – column name for prediction
- Returns:
Mean Absolute Error.
- Return type:
float
- recommenders.evaluation.python_evaluation.map(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#
Mean Average Precision for top k prediction items
The implementation of MAP is referenced from Spark MLlib evaluation metrics. https://spark.apache.org/docs/2.3.0/mllib-evaluation-metrics.html#ranking-systems
A good reference can be found at: http://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf
Note
MAP is meant to calculate the average precision over the relevant items, so it is normalized by the number of relevant items in the ground-truth data rather than by k.
- Parameters:
rating_true (pandas.DataFrame) – True DataFrame
rating_pred (pandas.DataFrame) – Predicted DataFrame
col_user (str) – column name for user
col_item (str) – column name for item
col_prediction (str) – column name for prediction
relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.
k (int) – number of top k items per user
threshold (float) – threshold of top items per user (optional)
- Returns:
MAP (min=0, max=1)
- Return type:
float
- recommenders.evaluation.python_evaluation.map_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#
Mean Average Precision at k
The implementation of MAP@k is referenced from the Spark MLlib evaluation metrics (the apache/spark repository).
- Parameters:
rating_true (pandas.DataFrame) – True DataFrame
rating_pred (pandas.DataFrame) – Predicted DataFrame
col_user (str) – column name for user
col_item (str) – column name for item
col_prediction (str) – column name for prediction
relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.
k (int) – number of top k items per user
threshold (float) – threshold of top items per user (optional)
- Returns:
MAP@k (min=0, max=1)
- Return type:
float
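Example (a minimal sketch with hypothetical data; map() above takes the same arguments):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import map_at_k

rating_true = pd.DataFrame({
    "userID": [1, 1, 2],
    "itemID": [10, 11, 12],
    "rating": [5, 4, 5],
})
rating_pred = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 12, 12, 10],
    "prediction": [0.9, 0.7, 0.8, 0.6],
})

# Average precision over each user's relevant items, using the top 2 predictions.
print(map_at_k(rating_true, rating_pred, k=2))
```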
- recommenders.evaluation.python_evaluation.merge_ranking_true_pred(rating_true, rating_pred, col_user, col_item, col_prediction, relevancy_method, k=10, threshold=10, **_)[source]#
Filter truth and prediction data frames on common users
- Parameters:
rating_true (pandas.DataFrame) – True DataFrame
rating_pred (pandas.DataFrame) – Predicted DataFrame
col_user (str) – column name for user
col_item (str) – column name for item
col_prediction (str) – column name for prediction
relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.
k (int) – number of top k items per user (optional)
threshold (float) – threshold of top items per user (optional)
- Returns:
DataFrame of recommendation hits, sorted by col_user and rank;
DataFrame of hit counts vs. actual relevant items per user;
number of unique user ids.
- Return type:
pandas.DataFrame, pandas.DataFrame, int
- recommenders.evaluation.python_evaluation.merge_rating_true_pred(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Join truth and prediction data frames on userID and itemID and return the true and predicted ratings with consistent indexing.
- Parameters:
rating_true (pandas.DataFrame) – True data
rating_pred (pandas.DataFrame) – Predicted data
col_user (str) – column name for user
col_item (str) – column name for item
col_rating (str) – column name for rating
col_prediction (str) – column name for prediction
- Returns:
numpy.ndarray: Array with the true ratings
numpy.ndarray: Array with the predicted ratings
- Return type:
numpy.ndarray, numpy.ndarray
- recommenders.evaluation.python_evaluation.ndcg_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, score_type='binary', discfun_type='loge', **_)[source]#
Normalized Discounted Cumulative Gain (nDCG).
Info: https://en.wikipedia.org/wiki/Discounted_cumulative_gain
- Parameters:
rating_true (pandas.DataFrame) – True DataFrame
rating_pred (pandas.DataFrame) – Predicted DataFrame
col_user (str) – column name for user
col_item (str) – column name for item
col_rating (str) – column name for rating
col_prediction (str) – column name for prediction
relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.
k (int) – number of top k items per user
threshold (float) – threshold of top items per user (optional)
score_type (str) – type of relevance scores [‘binary’, ‘raw’, ‘exp’]. With the default option ‘binary’, the relevance score is reduced to either 1 (hit) or 0 (miss). Option ‘raw’ uses the raw relevance score. Option ‘exp’ uses (2 ** RAW_RELEVANCE - 1) as the relevance score
discfun_type (str) – type of discount function [‘loge’, ‘log2’] used to calculate DCG.
- Returns:
nDCG at k (min=0, max=1).
- Return type:
float
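Example (a minimal sketch contrasting the default binary gains with graded “raw” gains):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import ndcg_at_k

rating_true = pd.DataFrame({
    "userID": [1, 1, 1],
    "itemID": [10, 11, 12],
    "rating": [5, 4, 3],
})
rating_pred = pd.DataFrame({
    "userID": [1, 1, 1],
    "itemID": [12, 10, 11],
    "prediction": [0.9, 0.8, 0.7],
})

# Default: binary relevance (hit/miss) with natural-log discounting.
print(ndcg_at_k(rating_true, rating_pred, k=3))
# Graded relevance: raw ratings as gains, log2 discounting.
print(ndcg_at_k(rating_true, rating_pred, k=3, score_type="raw", discfun_type="log2"))
```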
- recommenders.evaluation.python_evaluation.novelty(train_df, reco_df, col_user='userID', col_item='itemID')[source]#
Calculate the average novelty in a list of recommended items (this assumes that the recommendation list is already computed). Follows section 5 from
- Citation:
P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
col_user (str) – User id column name.
col_item (str) – Item id column name.
- Returns:
novelty.
- Return type:
float
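Example (a minimal sketch with hypothetical data; item 10 is a popular head item and item 11 a long-tail item, so item 11 should receive the higher novelty):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import (
    historical_item_novelty,
    novelty,
)

# Item 10 has three interactions, item 11 only one.
train_df = pd.DataFrame({
    "userID": [1, 2, 3, 1],
    "itemID": [10, 10, 10, 11],
})
# Recommend the long-tail item to users who have not seen it.
reco_df = pd.DataFrame({
    "userID": [2, 3],
    "itemID": [11, 11],
})

print(historical_item_novelty(train_df, reco_df))  # per-item novelty scores
print(novelty(train_df, reco_df))                  # average novelty of the recommendations
```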
- recommenders.evaluation.python_evaluation.precision_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#
Precision at K.
Note
We use the same formula to calculate precision@k as Spark does. More details can be found at http://spark.apache.org/docs/2.1.1/api/python/pyspark.mllib.html#pyspark.mllib.evaluation.RankingMetrics.precisionAt. In particular, the maximum achievable precision may be less than 1 if the number of items for a user in rating_pred is less than k.
- Parameters:
rating_true (pandas.DataFrame) – True DataFrame
rating_pred (pandas.DataFrame) – Predicted DataFrame
col_user (str) – column name for user
col_item (str) – column name for item
col_prediction (str) – column name for prediction
relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.
k (int) – number of top k items per user
threshold (float) – threshold of top items per user (optional)
- Returns:
precision at k (min=0, max=1)
- Return type:
float
- recommenders.evaluation.python_evaluation.r_precision_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#
R-precision at K.
R-precision can be defined as the precision@R for each user, where R is the number of relevant items for the query. It is also equivalent to the recall at the R-th position.
Note
Since R can be large, k here acts as a cap on the maximum possible R. If every user has more than k true items, then r-precision@k is equal to precision@k. You might need to raise the value of k to get meaningful results.
- Parameters:
rating_true (pandas.DataFrame) – True DataFrame
rating_pred (pandas.DataFrame) – Predicted DataFrame
col_user (str) – column name for user
col_item (str) – column name for item
col_prediction (str) – column name for prediction
relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.
k (int) – number of top k items per user
threshold (float) – threshold of top items per user (optional)
- Returns:
R-precision at k (min=0, max=1). The maximum value is 1 even when fewer than k items exist for a user in rating_true.
- Return type:
float
- recommenders.evaluation.python_evaluation.recall_at_k(rating_true, rating_pred, col_user='userID', col_item='itemID', col_prediction='prediction', relevancy_method='top_k', k=10, threshold=10, **_)[source]#
Recall at K.
- Parameters:
rating_true (pandas.DataFrame) – True DataFrame
rating_pred (pandas.DataFrame) – Predicted DataFrame
col_user (str) – column name for user
col_item (str) – column name for item
col_prediction (str) – column name for prediction
relevancy_method (str) – method for determining relevancy [‘top_k’, ‘by_threshold’, None]. None means that the top k items are directly provided, so there is no need to compute the relevancy operation.
k (int) – number of top k items per user
threshold (float) – threshold of top items per user (optional)
- Returns:
recall at k (min=0, max=1). The maximum value is 1 even when fewer than k items exist for a user in rating_true.
- Return type:
float
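Example (a minimal sketch computing both metrics at k=2 on hypothetical data):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import precision_at_k, recall_at_k

rating_true = pd.DataFrame({
    "userID": [1, 1, 1, 2],
    "itemID": [10, 11, 12, 10],
    "rating": [5, 4, 3, 5],
})
rating_pred = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 13, 10, 11],
    "prediction": [0.9, 0.8, 0.7, 0.6],
})

# Each user gets one hit in their top 2, so precision@2 should be 0.5.
print(precision_at_k(rating_true, rating_pred, k=2))
# Recall divides hits by each user's relevant-item count (1/3 and 1 here).
print(recall_at_k(rating_true, rating_pred, k=2))
```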
- recommenders.evaluation.python_evaluation.rmse(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Calculate Root Mean Squared Error
- Parameters:
rating_true (pandas.DataFrame) – True data. There should be no duplicate (userID, itemID) pairs
rating_pred (pandas.DataFrame) – Predicted data. There should be no duplicate (userID, itemID) pairs
col_user (str) – column name for user
col_item (str) – column name for item
col_rating (str) – column name for rating
col_prediction (str) – column name for prediction
- Returns:
Root mean squared error
- Return type:
float
- recommenders.evaluation.python_evaluation.rsquared(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Calculate R squared
- Parameters:
rating_true (pandas.DataFrame) – True data. There should be no duplicate (userID, itemID) pairs
rating_pred (pandas.DataFrame) – Predicted data. There should be no duplicate (userID, itemID) pairs
col_user (str) – column name for user
col_item (str) – column name for item
col_rating (str) – column name for rating
col_prediction (str) – column name for prediction
- Returns:
R squared (min=0, max=1).
- Return type:
float
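Example (a minimal sketch running the four rating metrics in this module on one pair of aligned hypothetical DataFrames):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import exp_var, mae, rmse, rsquared

# No duplicate (userID, itemID) pairs, as the rating metrics require.
rating_true = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 11, 10, 12],
    "rating": [5.0, 3.0, 4.0, 2.0],
})
rating_pred = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [10, 11, 10, 12],
    "prediction": [4.5, 3.5, 4.0, 1.0],
})

# All four metrics share the same signature.
print("RMSE:", rmse(rating_true, rating_pred))
print("MAE:", mae(rating_true, rating_pred))
print("R squared:", rsquared(rating_true, rating_pred))
print("Explained variance:", exp_var(rating_true, rating_pred))
```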
- recommenders.evaluation.python_evaluation.serendipity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#
Calculate average serendipity for recommendations across all users.
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
item_feature_df (pandas.DataFrame) – (Optional) It is required only when item_sim_measure=’item_feature_vector’. It contains two columns: col_item and features (a feature vector).
item_sim_measure (str) – (Optional) The item similarity measure to use. Available measures include item_cooccurrence_count (the default) and item_feature_vector.
col_item_features (str) – item feature column name.
col_user (str) – User id column name.
col_item (str) – Item id column name.
col_sim (str) – Column name for item similarity.
col_relevance (str) – Name of the column indicating whether the recommended item is actually relevant to the user.
- Returns:
serendipity.
- Return type:
float
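Example (a minimal sketch on hypothetical data with the default co-occurrence similarity; the data mirrors the diversity() example above):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import serendipity

train_df = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3, 3, 4, 4],
    "itemID": [10, 11, 10, 12, 11, 12, 10, 13],
})
reco_df = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [12, 13, 11, 13],
})

# Average serendipity across all users in reco_df; no col_relevance is supplied here.
print(serendipity(train_df, reco_df))
```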
- recommenders.evaluation.python_evaluation.user_diversity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#
Calculate average diversity of recommendations for each user. The metric definition is based on formula (3) in the following reference:
- Citation:
Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
item_feature_df (pandas.DataFrame) – (Optional) It is required only when item_sim_measure=’item_feature_vector’. It contains two columns: col_item and features (a feature vector).
item_sim_measure (str) – (Optional) The item similarity measure to use. Available measures include item_cooccurrence_count (the default) and item_feature_vector.
col_item_features (str) – item feature column name.
col_user (str) – User id column name.
col_item (str) – Item id column name.
col_sim (str) – Column name for item similarity.
col_relevance (str) – Name of the column indicating whether the recommended item is actually relevant to the user.
- Returns:
A dataframe with the following columns: col_user, user_diversity.
- Return type:
pandas.DataFrame
- recommenders.evaluation.python_evaluation.user_item_serendipity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#
Calculate serendipity of each item in the recommendations for each user. The metric definition is based on the following references:
- Citation:
Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012
Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems, eugeneyan.com, April 2020
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
item_feature_df (pandas.DataFrame) – (Optional) It is required only when item_sim_measure=’item_feature_vector’. It contains two columns: col_item and features (a feature vector).
item_sim_measure (str) – (Optional) The item similarity measure to use. Available measures include item_cooccurrence_count (the default) and item_feature_vector.
col_item_features (str) – item feature column name.
col_user (str) – User id column name.
col_item (str) – Item id column name.
col_sim (str) – Column name for item similarity.
col_relevance (str) – Name of the column indicating whether the recommended item is actually relevant to the user.
- Returns:
A dataframe with columns: col_user, col_item, user_item_serendipity.
- Return type:
pandas.DataFrame
- recommenders.evaluation.python_evaluation.user_serendipity(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_item_features='features', col_user='userID', col_item='itemID', col_sim='sim', col_relevance=None)[source]#
Calculate average serendipity for each user’s recommendations.
- Parameters:
train_df (pandas.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows.
reco_df (pandas.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
item_feature_df (pandas.DataFrame) – (Optional) It is required only when item_sim_measure=’item_feature_vector’. It contains two columns: col_item and features (a feature vector).
item_sim_measure (str) – (Optional) The item similarity measure to use. Available measures include item_cooccurrence_count (the default) and item_feature_vector.
col_item_features (str) – item feature column name.
col_user (str) – User id column name.
col_item (str) – Item id column name.
col_sim (str) – Column name for item similarity.
col_relevance (str) – Name of the column indicating whether the recommended item is actually relevant to the user.
- Returns:
A dataframe with following columns: col_user, user_serendipity.
- Return type:
pandas.DataFrame
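Example (a minimal sketch of the per-user variants, which return DataFrames rather than a single float; the data is the same hypothetical set used for diversity() and serendipity() above):

```python
import pandas as pd

from recommenders.evaluation.python_evaluation import (
    user_diversity,
    user_serendipity,
)

train_df = pd.DataFrame({
    "userID": [1, 1, 2, 2, 3, 3, 4, 4],
    "itemID": [10, 11, 10, 12, 11, 12, 10, 13],
})
reco_df = pd.DataFrame({
    "userID": [1, 1, 2, 2],
    "itemID": [12, 13, 11, 13],
})

print(user_diversity(train_df, reco_df))    # columns: userID, user_diversity
print(user_serendipity(train_df, reco_df))  # columns: userID, user_serendipity
```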
PySpark evaluation#
- class recommenders.evaluation.spark_evaluation.SparkDiversityEvaluation(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_user='userID', col_item='itemID', col_relevance=None)[source]#
Spark Evaluator for diversity, coverage, novelty, serendipity
- __init__(train_df, reco_df, item_feature_df=None, item_sim_measure='item_cooccurrence_count', col_user='userID', col_item='itemID', col_relevance=None)[source]#
Initializer.
This is the Spark version of the diversity metrics evaluator. The methods of this class calculate the following diversity metrics:
- Coverage - it includes two metrics:
catalog_coverage, which measures the proportion of items that get recommended from the item catalog;
distributional_coverage, which measures how unequally different items are recommended to all users.
- Novelty - A more novel item is a less popular one, i.e. it gets recommended less frequently.
- Diversity - The dissimilarity of the items being recommended.
- Serendipity - The “unusualness” or “surprise” of recommendations to a user. When ‘col_relevance’ is used, it indicates how pleasantly surprising the recommendations are to the user.
The metric definitions/formulations are based on the following references with modification:
- Citation:
G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.
Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012
P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011
Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems, eugeneyan.com, April 2020
- Parameters:
train_df (pyspark.sql.DataFrame) – Data set with historical data for users and items they have interacted with; contains col_user, col_item. Assumed to not contain any duplicate rows. Interaction here follows the item choice model from Castells et al.
reco_df (pyspark.sql.DataFrame) – Recommender’s prediction output, containing col_user, col_item, col_relevance (optional). Assumed to not contain any duplicate user-item pairs.
item_feature_df (pyspark.sql.DataFrame) – (Optional) It is required only when item_sim_measure=’item_feature_vector’. It contains two columns: col_item and features (a feature vector).
item_sim_measure (str) – (Optional) The item similarity measure to use. Available measures include item_cooccurrence_count (the default) and item_feature_vector.
col_user (str) – User id column name.
col_item (str) – Item id column name.
col_relevance (str) – (Optional) Name of the column indicating whether the recommended item is actually relevant to the user.
- catalog_coverage()[source]#
Calculate catalog coverage for recommendations across all users. The metric definition is based on the “catalog coverage” definition in the following reference:
- Citation:
G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.
- Returns:
catalog coverage
- Return type:
float
- distributional_coverage()[source]#
Calculate distributional coverage for recommendations across all users. The metric definition is based on formula (21) in the following reference:
- Citation:
G. Shani and A. Gunawardana, Evaluating Recommendation Systems, Recommender Systems Handbook pp. 257-297, 2010.
- Returns:
distributional coverage
- Return type:
float
- diversity()[source]#
Calculate average diversity of recommendations across all users.
- Returns:
diversity.
- Return type:
float
- historical_item_novelty()[source]#
Calculate novelty for each item. Novelty is computed as the minus logarithm of (number of interactions with item / total number of interactions). The definition of the metric is based on the following reference using the choice model (eqs. 1 and 6):
- Citation:
P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011
The novelty of an item can be defined relative to a set of observed events on the set of all items. These can be events of user choice (item “is picked” by a random user) or user discovery (item “is known” to a random user). The above definition of novelty reflects a factor of item popularity. High novelty values correspond to long-tail items in the density function that few users have interacted with, and low novelty values correspond to popular head items.
- Returns:
A dataframe with the following columns: col_item, item_novelty.
- Return type:
pyspark.sql.dataframe.DataFrame
- novelty()[source]#
Calculate the average novelty in a list of recommended items (this assumes that the recommendation list is already computed). Follows section 5 from
- Citation:
P. Castells, S. Vargas, and J. Wang, Novelty and diversity metrics for recommender systems: choice, discovery and relevance, ECIR 2011
- Returns:
A dataframe with the following column: novelty.
- Return type:
pyspark.sql.dataframe.DataFrame
- serendipity()[source]#
Calculate average serendipity for recommendations across all users.
- Returns:
serendipity.
- Return type:
float
- user_diversity()[source]#
Calculate average diversity of recommendations for each user. The metric definition is based on formula (3) in the following reference:
- Citation:
Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012
- Returns:
A dataframe with the following columns: col_user, user_diversity.
- Return type:
pyspark.sql.dataframe.DataFrame
- user_item_serendipity()[source]#
Calculate serendipity of each item in the recommendations for each user. The metric definition is based on the following references:
- Citation:
Y.C. Zhang, D.Ó. Séaghdha, D. Quercia and T. Jambor, Auralist: introducing serendipity into music recommendation, WSDM 2012
Eugene Yan, Serendipity: Accuracy’s unpopular best friend in Recommender Systems, eugeneyan.com, April 2020
- Returns:
A dataframe with columns: col_user, col_item, user_item_serendipity.
- Return type:
pyspark.sql.dataframe.DataFrame
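Example (a minimal sketch with hypothetical data on a local SparkSession; the session setup is illustrative and not part of the evaluator's API):

```python
from pyspark.sql import SparkSession

from recommenders.evaluation.spark_evaluation import SparkDiversityEvaluation

spark = SparkSession.builder.master("local[1]").appName("diversity-eval").getOrCreate()

# Hypothetical interaction history and recommendations, no duplicate pairs.
train_df = spark.createDataFrame(
    [(1, 10), (1, 11), (2, 10), (2, 12), (3, 11), (3, 12), (4, 10), (4, 13)],
    ["userID", "itemID"],
)
reco_df = spark.createDataFrame(
    [(1, 12), (1, 13), (2, 11), (2, 13)],
    ["userID", "itemID"],
)

evaluator = SparkDiversityEvaluation(train_df, reco_df)
print(evaluator.catalog_coverage())
print(evaluator.diversity())
evaluator.historical_item_novelty().show()  # per-item novelty as a Spark DataFrame
```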
- class recommenders.evaluation.spark_evaluation.SparkRankingEvaluation(rating_true, rating_pred, k=10, relevancy_method='top_k', col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction', threshold=10)[source]#
Spark Ranking Evaluator
- __init__(rating_true, rating_pred, k=10, relevancy_method='top_k', col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction', threshold=10)[source]#
Initialization. This is the Spark version of the ranking metrics evaluator. The methods of this class calculate ranking metrics such as precision@k, recall@k, ndcg@k, and mean average precision.
The implementations of precision@k, ndcg@k, and mean average precision are referenced from Spark MLlib, which can be found at https://spark.apache.org/docs/2.3.0/mllib-evaluation-metrics.html#ranking-systems.
- Parameters:
rating_true (pyspark.sql.DataFrame) – DataFrame of true rating data (in the format of customerID-itemID-rating tuple).
rating_pred (pyspark.sql.DataFrame) – DataFrame of predicted rating data (in the format of customerID-itemID-rating tuple).
col_user (str) – column name for user.
col_item (str) – column name for item.
col_rating (str) – column name for rating.
col_prediction (str) – column name for prediction.
k (int) – number of items to recommend to each user.
relevancy_method (str) – method for determining relevant items. Possible values are “top_k”, “by_time_stamp”, and “by_threshold”.
threshold (float) – threshold for determining the relevant recommended items. This is used for the case that predicted ratings follow a known distribution. NOTE: this option is only activated if relevancy_method is set to “by_threshold”.
- map_at_k()[source]#
Get mean average precision at k.
Note
More details can be found in the PySpark documentation for meanAveragePrecision.
- Returns:
MAP at k (min=0, max=1).
- Return type:
float
- ndcg_at_k()[source]#
Get Normalized Discounted Cumulative Gain (NDCG)
Note
More details can be found in the PySpark documentation for ndcgAt.
- Returns:
nDCG at k (min=0, max=1).
- Return type:
float
- precision_at_k()[source]#
Get precision@k.
Note
More details can be found in the PySpark documentation for precisionAt.
- Returns:
precision at k (min=0, max=1)
- Return type:
float
- recall_at_k()[source]#
Get recall@K.
Note
More details can be found in the PySpark documentation for recallAt.
- Returns:
recall at k (min=0, max=1).
- Return type:
float
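Example (a minimal sketch with hypothetical data on a local SparkSession; the session setup is illustrative):

```python
from pyspark.sql import SparkSession

from recommenders.evaluation.spark_evaluation import SparkRankingEvaluation

spark = SparkSession.builder.master("local[1]").appName("ranking-eval").getOrCreate()

rating_true = spark.createDataFrame(
    [(1, 10, 5.0), (1, 11, 4.0), (2, 12, 5.0)],
    ["userID", "itemID", "rating"],
)
rating_pred = spark.createDataFrame(
    [(1, 10, 0.9), (1, 12, 0.7), (2, 12, 0.8), (2, 10, 0.6)],
    ["userID", "itemID", "prediction"],
)

# Evaluate the top 2 recommended items per user with the default relevancy method.
ranking_eval = SparkRankingEvaluation(rating_true, rating_pred, k=2)
print(ranking_eval.precision_at_k())
print(ranking_eval.ndcg_at_k())
print(ranking_eval.map_at_k())
```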
- class recommenders.evaluation.spark_evaluation.SparkRatingEvaluation(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Spark Rating Evaluator
- __init__(rating_true, rating_pred, col_user='userID', col_item='itemID', col_rating='rating', col_prediction='prediction')[source]#
Initializer.
This is the Spark version of the rating metrics evaluator. The methods of this class calculate rating metrics such as root mean squared error, mean absolute error, R squared, and explained variance.
- Parameters:
rating_true (pyspark.sql.DataFrame) – True labels.
rating_pred (pyspark.sql.DataFrame) – Predicted labels.
col_user (str) – column name for user.
col_item (str) – column name for item.
col_rating (str) – column name for rating.
col_prediction (str) – column name for prediction.
- exp_var()[source]#
Calculate explained variance.
Note
Spark MLlib’s implementation is buggy (can lead to values > 1), hence we use var().
- Returns:
Explained variance (min=0, max=1).
- Return type:
float
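Example (a minimal sketch with hypothetical data on a local SparkSession, calling the documented exp_var() method; per the initializer notes above, the class also computes the other rating metrics):

```python
from pyspark.sql import SparkSession

from recommenders.evaluation.spark_evaluation import SparkRatingEvaluation

spark = SparkSession.builder.master("local[1]").appName("rating-eval").getOrCreate()

rating_true = spark.createDataFrame(
    [(1, 10, 5.0), (1, 11, 3.0), (2, 10, 4.0)],
    ["userID", "itemID", "rating"],
)
rating_pred = spark.createDataFrame(
    [(1, 10, 4.5), (1, 11, 2.5), (2, 10, 4.0)],
    ["userID", "itemID", "prediction"],
)

rating_eval = SparkRatingEvaluation(rating_true, rating_pred)
print(rating_eval.exp_var())  # explained variance of predictions vs. true ratings
```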