How Recommendation Systems Are Helping in the Personalization of Products and Services
A recommender system is a tool that helps you make decisions when you are looking for something, especially when you do not yet know exactly what you are looking for. Examples of recommender systems are all around us. For instance, when you look at a product on Amazon, a recommender system suggests other products similar to the one you are viewing. As another example, Spotify offers playlists generated by a recommender system based on your taste.
A recommender system can be thought of as a kind of personal assistant that anticipates your needs and suggests what to do. A personal assistant can do the job well only by knowing you very well. Similarly, a recommender system can make good recommendations only if it “knows” something about you. That’s why building a recommender system is not only a matter of writing an algorithm, but also of selecting which data is relevant and should be provided to the system.
Recommendations have an increasing impact on our lives, as they are a valuable way to overcome information overload by filtering the alternatives. A recommender system exposes us to the items it judges most interesting, attractive, useful and relevant. Because the system decides what we see, this practice raises ethical and social issues such as identity, privacy and even manipulation.
Introduction to Recommender Systems:
A recommender system is a system that filters and analyses input data. Its goal is to provide users with hints and suggestions about items that meet their interests.
Some of the most common applications are e-commerce and streaming services. For example, makemytrip.com suggests where to go next based on your previous trips and other people's ratings.
Amazon keeps track of the items you look at to give you tailor-made suggestions about similar items that might interest you. Spotify analyzes genres, other users’ playlists and more to give you suggestions. In the same way, Netflix suggests what to watch next.
A list of available items is therefore the first main input to the recommender algorithm. The description of each item can be enriched by a set of attributes. For instance, if we recommend movies, then genre, director, and actors could be a meaningful set of attributes.
A recommender system also needs to know something about the users in order to provide them with recommendations. As a consequence, a second source of information for a recommender is the set of user attributes. Demographics, such as gender and age, are examples of user attributes.
A third important source of information is the set of interactions between users and items. Interactions reveal the opinions of users on some of the items. For instance, a user may have rated some movies. In this case, we explicitly know the opinion of the user on these movies.
Alternatively, we may know which book a user has bought in the past or which movies a user has watched. In this case, we can implicitly assume that if a user watched a movie or bought a book, probably the user likes that movie or that book.
Knowing these interactions helps the system recommend to other users what to read or watch next, based on what they have already read or watched. For example, if people who buy shoes usually also buy headbands, a recommender system will encourage a shoe buyer to buy both items.
Interactions have attributes as well, and these attributes are called context. Examples of contextual attributes are global location, day of the week, hour of the day, mood of the user, and so on. The same user may have different opinions on the same item based on context. For instance, a restaurant could be perfect for a business lunch, but not for a romantic dinner. Business lunch and romantic dinner are examples of context when recommending restaurants. Similarly, if the weather is sunny, the user might prefer a restaurant with an outdoor garden, while if the weather is rainy, the user might prefer a restaurant with a fireplace. Sun and rain are two other examples of context.
Taxonomy of Recommender Systems:
Recommender algorithms can be classified into two categories:
- Personalized recommenders
- Non-personalized recommenders
With non-personalized recommendations, all users receive the same recommendations. Examples of non-personalized recommendations are the most popular movies, recent music hits, or the best rated hotels.
With personalized recommendations, different users receive different recommendations. The goal of the personalized recommendations is to make better recommendations than non-personalized techniques.
But there are some scenarios in which a non-personalized recommendation is a good choice. For example, recommending the most popular movies works well because these are movies that most users like.
Personalized recommendation
Personalized recommendation techniques can be further divided into a number of categories, the most important being content-based and collaborative filtering.
The basic idea of content-based recommendation is to recommend items similar to the items that a user liked in the past. For instance, if we like a movie directed by a particular director, the recommender system will recommend other movies by the same director or featuring the same actors. As a second example, when you look at a product page on the Amazon website, you receive recommendations listing products similar to the one you are viewing. Content-based filtering was one of the first approaches used to build recommender systems. One of its prerequisites is a list of good-quality item attributes.
Collaborative filtering techniques, by contrast, do not require any item attributes to work, but rely on the opinions of the community of users. The first type of collaborative recommender to be invented was based on user-user similarities. The basic idea is to search for users with similar taste, i.e. users who share the same opinion on a number of items. As an example, if Arvind and Ashish have the same opinion on many movies and Arvind likes Inception, it is likely that Ashish also likes Inception. The user-user approach does not always work well, however, so a second type of collaborative recommender uses item-item algorithms. Item-based collaborative filtering was first used by Amazon.com in 1998.
The basic idea is to calculate the similarity between each pair of items according to how many users have the same opinion on the two items. For example, if many users who like Inception also like Interstellar, it is highly probable that another user who likes Inception will also like Interstellar. Nowadays, many commercial recommenders rely on item-item algorithms.
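The item-item idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy ratings, the names `URM`, `item_cosine` and `predict_scores`, and the choice of cosine similarity are not taken from the text above, and real systems use many refinements.

```python
import math

# Hypothetical toy data: rows are users, columns are items, 0 means "no rating".
URM = [
    [5, 4, 0],   # user 0
    [4, 5, 0],   # user 1
    [0, 0, 5],   # user 2
    [5, 0, 0],   # user 3 has rated only Inception
]
ITEMS = ["Inception", "Interstellar", "Top Gun"]


def item_cosine(urm, i, j):
    """Cosine similarity between the rating columns of items i and j."""
    col_i = [row[i] for row in urm]
    col_j = [row[j] for row in urm]
    dot = sum(a * b for a, b in zip(col_i, col_j))
    norm_i = math.sqrt(sum(a * a for a in col_i))
    norm_j = math.sqrt(sum(b * b for b in col_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def predict_scores(urm, user):
    """Score each unrated item as a similarity-weighted sum of the user's ratings."""
    n_items = len(urm[0])
    scores = {}
    for j in range(n_items):
        if urm[user][j] != 0:
            continue  # skip items the user has already rated
        scores[j] = sum(
            item_cosine(urm, i, j) * urm[user][i]
            for i in range(n_items)
            if urm[user][i] != 0
        )
    return scores


scores = predict_scores(URM, user=3)
best = max(scores, key=scores.get)
print(ITEMS[best])
```

In this toy data, users 0 and 1 rated both Inception and Interstellar highly, so the two columns are similar and user 3, who liked Inception, is pointed to Interstellar rather than Top Gun.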
Benchmarks
Netflix used to organize a competition that rewarded the best algorithm for its recommender system.
During one of these competitions, a new family of collaborative filtering techniques called matrix factorization emerged. It is part of a more general family of algorithms called dimensionality reduction.
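As a rough illustration of matrix factorization, the sketch below learns a small vector of latent factors for each user and each item, so that their dot product approximates the known ratings. It is a minimal example trained with plain stochastic gradient descent. The data, the function names `rmse` and `factorize`, and all hyperparameters are assumptions made for this example, not the method used in the competition.

```python
import math
import random

# Hypothetical (user, item, rating) triples: the known entries of a small URM.
RATINGS = [
    (0, 0, 5), (0, 1, 4),
    (1, 0, 4), (1, 1, 5),
    (2, 2, 5), (2, 0, 1),
    (3, 0, 5), (3, 2, 2),
]


def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on the known ratings."""
    squared = [(r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
               for u, i, r in ratings]
    return math.sqrt(sum(squared) / len(squared))


def factorize(ratings, n_users, n_items, n_factors=2,
              lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q with plain SGD."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    initial_error = rmse(ratings, P, Q)
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(n_factors):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q, initial_error, rmse(ratings, P, Q)


P, Q, error_before, error_after = factorize(RATINGS, n_users=4, n_items=3)
print(f"RMSE before: {error_before:.3f}, after: {error_after:.3f}")
# A missing entry (user 3, item 1) can now be estimated from the factors.
print(sum(p * q for p, q in zip(P[3], Q[1])))
```

The "dimensionality reduction" aspect is visible in the shapes: each user and each item is summarized by only `n_factors` numbers, regardless of how many items or users exist.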
Context-aware recommender systems, also known as CARS, extend collaborative filtering in order to use the context and improve the quality of recommendations.
Modern algorithms can simultaneously use multiple heterogeneous sources of information in order to build hybrid recommendation systems that merge and improve on the capabilities of content-based and collaborative techniques. These hybrid algorithms can be roughly considered collaborative filtering techniques in which users, items and interactions are enriched with side information. For example, we can build a recommender system for restaurants that takes into account the description of the restaurant, comments written by users, and photos of the dishes served. Hybrid algorithms that take advantage of side information require newer techniques, such as factorization machines or deep learning. These techniques can also be used as pure collaborative filtering algorithms, but their main advantage appears when they are used with contextual or other sources of side information.
Item Content Matrix:
The item content matrix, or ICM, is a mathematical way to define the input to a recommender system as a list of items and their attributes. Rows in the ICM represent items and columns represent attributes. In the simplest form, the values in the ICM are binary, either one or zero: if an item has a specific attribute, the corresponding value is set to one, otherwise it is set to zero. For example, the ICM can record Tom Cruise as an attribute of the movie Top Gun by placing a one in the corresponding cell.
In a more useful scenario, each number in the ICM represents how important an attribute is in characterizing an item, and can assume any positive value. For instance, Stan Lee made only a cameo appearance in the movie The Avengers, so the corresponding value in the ICM should be lower than the value we use for leading actors.
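A weighted ICM and a content-based similarity computed from it might look like the following sketch. The attribute columns, the weights (including 0.2 for the cameo) and the helper names `cosine` and `most_similar` are hypothetical choices made for illustration.

```python
import math

# Hypothetical attribute columns and weights, chosen only for illustration.
ATTRIBUTES = ["Tom Cruise", "Robert Downey Jr.", "Stan Lee", "action"]
ICM = {
    "Top Gun":             [1.0, 0.0, 0.0, 1.0],
    "Mission: Impossible": [1.0, 0.0, 0.0, 1.0],
    # Stan Lee only has a cameo, so his weight is lower than a leading actor's.
    "The Avengers":        [0.0, 1.0, 0.2, 1.0],
}


def cosine(a, b):
    """Cosine similarity between two attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(icm, title):
    """Return the other item whose attribute row is closest to `title`."""
    others = (t for t in icm if t != title)
    return max(others, key=lambda t: cosine(icm[title], icm[t]))


print(most_similar(ICM, "Top Gun"))
```

Because similarity is computed only from item attributes, this approach works even for items that no user has rated yet.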
User Rating Matrix:
One of the most important inputs to a recommender system is the set of past interactions between users and items. These interactions can be described mathematically as a user-rating matrix (URM). The rows of the URM represent the users, and the columns represent the items. If we have no information about the opinion of a user on an item, the corresponding value is set to zero.
The URM is the main input to our recommendation systems. The matrix is usually denoted by the letter R, the set of users by U and the set of items by I. Ratings in the URM can be implicit or explicit.
Implicit ratings typically take only the values zero or one, because we can only look at the behavior of a user to infer whether he or she liked something. We mark the interaction as one if we think the user is interested in the item, and zero if we think the user is not. With explicit ratings, we ask the user to rate an item, for instance on a one-to-five scale, and the value in the URM is that rating. Zero indicates that we have no information on that item for that user. The goal of any recommender system is to predict the missing values in the URM. The URM density, namely the percentage of non-zero elements, is usually very low, on the order of 0.01 percent. The figures quoted for large systems are of a similar magnitude: about 0.02 percent for Netflix and about 0.005 percent for MovieLens.
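The density of a URM is easy to compute. The sketch below uses a tiny hypothetical matrix stored as a list of lists; a real URM would be far larger and would normally be stored in a sparse format.

```python
# Hypothetical 3-user, 4-item URM; 0 means "no information".
URM = [
    [5, 0, 0, 0],
    [0, 3, 0, 0],
    [0, 0, 0, 1],
]


def urm_density(urm):
    """Fraction of entries in the URM that are non-zero."""
    total = sum(len(row) for row in urm)
    nonzero = sum(1 for row in urm for rating in row if rating != 0)
    return nonzero / total if total else 0.0


print(f"{urm_density(URM) * 100:.1f} percent")
```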
Inferring preferences:
There are different ways to estimate the opinion of a user on an item without asking for it explicitly. Opinions collected in this way are called implicit ratings. An implicit rating may be, for instance, the total viewing time of a movie, the number of times a user has listened to a song, or the fact that a user has made a purchase. In the case of movies, we can assume that a user who stopped watching after 20–30 minutes did not like the movie, whereas a user whose viewing time is close to the length of the movie probably enjoyed it. This is not an absolute criterion: the system does not know, for example, that a user received an important call and had to stop watching.
Another way to understand user preferences is, of course, to ask the user to grade the item explicitly. The real question here is how to organize the rating scale. We may want a large rating scale, with many possible grades, so that ratings reflect the opinion of the user precisely. On the other hand, choosing the correct rating on a large scale requires more effort from the user, so we have to expect fewer ratings. Another option is a simple, smaller scale (like or dislike). In this case, we will receive more ratings on average.
Another important decision is whether to use an even or an odd rating scale. An even rating scale has no neutral element in the middle, so you can receive only positive or negative ratings. The user is in a way forced to express an opinion, and that again can result in fewer ratings. If we instead opt for an odd rating scale, we have to be aware that the system will receive neutral ratings. The idea is that the possibility of giving a neutral rating makes users more comfortable, and therefore more ratings will be given. Unfortunately, users tend to prefer giving neutral ratings, so the system receives more ratings, but many of them are useless because they do not express a real opinion. In addition, users in general tend to publish a rating only if they had a positive experience. This creates a bias that affects the rating distribution.
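The viewing-time heuristic from the beginning of this section can be sketched as a small function. The name `implicit_rating` and the 80 percent threshold are assumptions made for this example; a real system would tune the threshold and combine several signals.

```python
def implicit_rating(watched_minutes, movie_minutes, threshold=0.8):
    """Return 1 if the user watched at least `threshold` of the movie, else 0."""
    if movie_minutes <= 0:
        raise ValueError("movie_minutes must be positive")
    return 1 if watched_minutes / movie_minutes >= threshold else 0


print(implicit_rating(25, 120))   # stopped after 25 minutes of a 120-minute movie
print(implicit_rating(115, 120))  # watched almost the whole movie
```

As noted above, the zero for the first call is only a guess: the user may have stopped for reasons unrelated to the movie.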
Non-personalized Recommenders:
Non-personalized techniques recommend the same list of items to all users. An intuitive example of this category is the top-popular recommender. We take the URM and count the number of non-zero ratings for each item. The items that have been rated the greatest number of times are the most popular. Note that the popularity of an item is computed without taking into account the opinions of the users, only the number of users who have rated the item.
Another non-personalized technique is based on best-rated items. To compute the best-rated items, we take the URM, extract the average rating per item and identify the items with the largest average rating. Mathematically, the average rating of an item is the sum of the non-zero ratings given to that item divided by the number of users who rated it. The drawback is that this formula puts items rated by hundreds of users on the same footing as items rated by a single user.
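Both non-personalized techniques can be sketched on a toy URM. The matrix and the helper names `item_popularity` and `item_average` are hypothetical.

```python
# Hypothetical URM: rows are users, columns are items, 0 means "no rating".
URM = [
    [5, 0, 3],
    [4, 0, 0],
    [5, 5, 0],
]


def item_popularity(urm):
    """Number of non-zero ratings received by each item."""
    return [sum(1 for row in urm if row[i] != 0) for i in range(len(urm[0]))]


def item_average(urm):
    """Average of the non-zero ratings of each item (0.0 if never rated)."""
    averages = []
    for i in range(len(urm[0])):
        ratings = [row[i] for row in urm if row[i] != 0]
        averages.append(sum(ratings) / len(ratings) if ratings else 0.0)
    return averages


popularity = item_popularity(URM)
averages = item_average(URM)
top_popular = max(range(len(popularity)), key=popularity.__getitem__)
best_rated = max(range(len(averages)), key=averages.__getitem__)
print("top popular item:", top_popular)
print("best rated item:", best_rated)
```

In this toy data, item 0 is the most popular, but item 1 is the best rated because its single rating of 5 is treated as a fully reliable average.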
To correct this bias and give the average statistical significance, we take the same formula and add a term C to the denominator. C is called the shrink term. It is a constant chosen depending on the properties of the URM. Consider an example with two items. Item i has been rated by three users with an average of 4. Item j has been rated by one user with an average of 5. Without the shrink term, item j is better rated than item i. If we introduce the shrink term and set it to 1, the rating of item i changes only moderately, from 4 to 12 / (3 + 1) = 3. Item j’s rating, on the other hand, is halved, from 5 to 5 / (1 + 1) = 2.5. As a result, item i now has a higher rating than item j, because the formula takes into account that item i has been rated by more users.
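The two-item example can be checked with a short sketch. The function name `shrunk_average` is hypothetical, and the individual ratings for item i are assumed to be three 4s (any three ratings summing to 12 give the same result).

```python
def shrunk_average(ratings, shrink):
    """Sum of the ratings divided by (number of ratings + shrink term)."""
    denominator = len(ratings) + shrink
    return sum(ratings) / denominator if denominator else 0.0


item_i = [4, 4, 4]  # three users, average 4 (assumed individual values)
item_j = [5]        # one user, average 5

print(shrunk_average(item_i, 0), shrunk_average(item_j, 0))  # without shrink
print(shrunk_average(item_i, 1), shrunk_average(item_j, 1))  # with C = 1
```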
Global Effects:
The contents of the user-rating matrix are prone to bias. To remove the bias, we first compute the average rating over all items and users.
We obtain this value by summing all the non-zero ratings in the user-rating matrix and dividing by the total number of non-zero ratings. This term is called the global bias. In the second step, we subtract this average from every non-zero rating in the URM to obtain normalized ratings. In the third step, we calculate the item bias: we divide the sum of the normalized ratings for item i by the number of users who have rated that item, adding the shrink term to the denominator. We then subtract the bias of each item from the normalized ratings of that item. The next bias we consider is the user bias. This bias arises because some users are more generous in their ratings while others are stricter. For each user, we take the sum of these item-corrected ratings and divide it by the number of items the user has rated plus the shrink term. To obtain the final estimate of a rating, we sum the global average, the item bias for that item and the user bias for that user.
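The steps above can be sketched as follows. This is one reading of the procedure, with hypothetical names (`global_effects`, `predict`) and a tiny made-up URM; the shrink term is left as a parameter.

```python
def global_effects(urm, shrink=1.0):
    """Return (global average, item biases, user biases) for a URM."""
    ratings = [(u, i, r)
               for u, row in enumerate(urm)
               for i, r in enumerate(row) if r != 0]
    n_users, n_items = len(urm), len(urm[0])

    # Step 1: global average of all non-zero ratings.
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Steps 2-3: item bias from the ratings normalized by the global average.
    item_sum, item_cnt = [0.0] * n_items, [0] * n_items
    for u, i, r in ratings:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    item_bias = [item_sum[i] / (item_cnt[i] + shrink) if item_cnt[i] else 0.0
                 for i in range(n_items)]

    # Step 4: user bias from the ratings with the item bias also removed.
    user_sum, user_cnt = [0.0] * n_users, [0] * n_users
    for u, i, r in ratings:
        user_sum[u] += r - mu - item_bias[i]
        user_cnt[u] += 1
    user_bias = [user_sum[u] / (user_cnt[u] + shrink) if user_cnt[u] else 0.0
                 for u in range(n_users)]
    return mu, item_bias, user_bias


def predict(mu, item_bias, user_bias, user, item):
    """Estimated rating: global average + item bias + user bias."""
    return mu + item_bias[item] + user_bias[user]


URM = [[5, 3],
       [4, 2]]
mu, item_bias, user_bias = global_effects(URM, shrink=0.0)
print(mu, item_bias, user_bias)
print(predict(mu, item_bias, user_bias, user=0, item=0))
```

With the shrink term set to zero, this small example reproduces the known ratings exactly. A positive shrink term pulls the biases of rarely rated items and rarely rating users toward zero.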
EVALUATION OF RECOMMENDER SYSTEMS
Quality of a Recommender System:
The quality of a recommender system depends on three aspects: the dataset, the algorithm and the user interface. All three combine to determine the quality of a recommendation system. For example, if the quality of the dataset is poor, even the best algorithm will not produce a high-quality recommender system. Moreover, the user interface must be simple and easy to understand.
The quality of a recommender system can be described by the following indicators:
- Relevance: the ability to recommend items that a user will appreciate.
- Coverage: the ability to recommend most of the items from a possibly very large catalog, measured as the percentage of items that the recommender system is able to recommend. Recommender systems with high relevance may have low coverage, since they tend to recommend the items that most users like.
- Novelty: the ability to recommend items unknown to the user. If a system such as Netflix has high relevance but low novelty, the overall quality of the recommender system will be poor.
- Diversity: the ability to diversify the recommended items. For example, if we know the user likes Chinese restaurants and we only ever suggest Chinese restaurants, the recommendations will become boring.
- Consistency: the stability of the recommendations. Some recommendation systems are very dynamic, continuously updating the user profiles and therefore modifying the recommendations.
- Confidence: the ability to measure how sure the system is about a recommendation. Not all recommendation systems are able to provide confidence estimates.
- Serendipity: the ability to surprise users, that is, to recommend something unusual that lets them discover something unexpected. A recommendation system is serendipitous if it recommends something that a user would never have searched for.
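Of the indicators in this list, coverage is the most straightforward to compute. The sketch below measures it as the share of catalog items that appear in at least one user's recommendation list. The function name `catalog_coverage` and the sample data are hypothetical.

```python
def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items recommended to at least one user."""
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended & set(catalog)) / len(catalog)


catalog = ["a", "b", "c", "d"]
recommendations = {
    "user_1": ["a", "b"],
    "user_2": ["a", "c"],
}
print(catalog_coverage(recommendations, catalog))
```

Here item "d" is never recommended, so coverage is three items out of four.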
Online Evaluation Techniques:
The first category of evaluation techniques consists of online techniques, of which there are several. The first is direct user feedback: we can ask users to state their level of satisfaction with the recommender system, for example through questionnaires. This may not be a good choice, because first, the sample must be large enough to be meaningful, and second, the opinions expressed by the users may not be reliable.
Another possibility is to monitor the online behavior of users and apply A/B testing. The core idea is to compare the behavior of users who receive recommendations (set A) with the behavior of users who do not (set B). For example, in e-commerce we would expect users who receive recommendations to buy more products. A/B testing is a powerful method of evaluation, but it is difficult to set up. In addition, its results may be difficult to interpret: if users do not follow recommendations, the cause could be a lack of relevance or a lack of diversity.
Another way is to run controlled experiments. In a controlled experiment, a mock-up application is made available to a group of potential users. Users are allowed to use the application and give their opinion on the recommendations they receive. The method is less expensive, but it has some problems. The application is not a real application, the users are not real users, and their opinions may not be reliable because they are not as motivated as real users of a real online system. For example, if I volunteer to test a movie recommendation system and it suggests that I watch Star Wars, I would probably watch 10 minutes of it and give a positive rating. If I were a real user, on the other hand, I would think twice before spending my time on a movie I am not really convinced about.
The last online evaluation technique is based on crowdsourcing. It consists of asking people, in exchange for some kind of compensation, to answer an online questionnaire expressing their opinion about a mock-up application. This technique is powerful because the crowd of volunteers can be large. However, we cannot fully trust the opinions of the volunteers, since they may be interested only in the compensation and give random answers. These users are typically less reliable, so we need very strong statistical tests to assess the reliability of their answers.
Offline Evaluation Techniques:
Four ingredients allow us to estimate the quality of our recommender system offline: the recommender system itself, the dataset, the partitioning of the dataset, and the error metrics. The first task we can use for offline evaluation is rating prediction. The goal is to get as close as possible to the true value of the rating. If the predicted value is 3.7 and the true value is 4, the prediction can be considered good. The second task is top-N recommendation. The goal is to find the items most relevant to each user.
The evaluation dataset represents all the information we have available to make recommendations. It can be represented by a URM, with users on the rows, items on the columns and ratings at the intersections. Usually we know only a small percentage of the possible ratings. These known, non-zero ratings are called the ground truth and can be divided into two parts: the positive opinions given by users and the negative opinions. The remaining entries, which are not part of the ground truth, are called unknown ratings: they correspond to user-item pairs in which the user has not rated the item.
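The two offline tasks are usually scored with different metrics. The sketch below shows one common choice for each: root mean squared error (RMSE) for rating prediction and precision at N for top-N recommendation. These specific metrics and the function names are assumptions made for this example; the text above does not name a particular metric.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_n(recommended, relevant, n):
    """Share of the top-n recommended items that are relevant to the user."""
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)
    return hits / n


# Rating prediction: the example from the text, predicted 3.7 versus true 4.
print(rmse([3.7], [4.0]))

# Top-N recommendation: two of the three recommended items are relevant.
print(precision_at_n(["a", "b", "c"], {"a", "c"}, n=3))
```

In practice, both metrics are computed on ratings held out from training, which is why the partitioning of the dataset matters.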
Use cases:
- E-commerce
- Retail
- Media
- Banking
- Telecom
- Education