Recommendation engines make predictions about what future users will like based on the preferences of prior users. Whenever you provide numerical feedback on a product or service, that information can be used to generate recommendations later.
This post will look at various ways in which recommendation engines derive their conclusions.
Ways of Recommending
There are two common ways to develop a recommendation engine in a machine learning context: collaborative filtering and content-based filtering. Content-based recommendations rely solely on the data provided by the user. A user develops a profile through their activity, and the engine recommends products or services that match that profile. The main problem is that when there is little data on a user, poor recommendations are made.
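As a minimal sketch of the content-based idea, here is a toy example in Python. All of the feature vectors, ratings, and names are invented for illustration: the user's profile is built as a rating-weighted average of the features of items they have rated, and an unseen item is scored by its cosine similarity to that profile.

```python
import numpy as np

# Hypothetical item feature matrix: rows are items, columns are
# attributes (e.g., genre flags). Values are made up for illustration.
item_features = np.array([
    [1, 0, 1],   # item 0
    [1, 1, 0],   # item 1
    [0, 1, 1],   # item 2 (unseen by this user)
])

# Ratings the user has given to items 0 and 1.
user_ratings = {0: 5.0, 1: 2.0}

# Build the user profile as a rating-weighted average of item features.
profile = sum(r * item_features[i] for i, r in user_ratings.items())
profile = profile / sum(user_ratings.values())

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Score the unseen item by its similarity to the user's profile.
print(cosine(profile, item_features[2]))
```

Notice that everything here comes from one user's own history, which is exactly why the approach struggles when that history is thin.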
Collaborative filtering produces crowd-based recommendations: the data of many users is used to make recommendations for one. This bypasses the lack-of-data concern that can arise with content-based recommendations.
There are four common techniques for building collaborative filters:

- User-based collaborative filtering
- Item-based collaborative filtering
- Singular value decomposition (SVD)
- Principal component analysis (PCA)
User-based Collaborative Filtering (UBCF)
UBCF finds a user's nearest neighbors using a similarity measure such as Pearson correlation (often via a k-nearest-neighbors search) to predict a missing rating for that user. Once the neighbors are determined, the algorithm averages their ratings to produce the prediction. The predicted value can then be used to decide whether the user will like a particular product or service: items with low predicted ratings are not recommended, while items with high predicted ratings may be. A major weakness of UBCF is that calculating similarities between users requires keeping all the data in memory, which is a computational challenge.
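To make this concrete, here is a small Python sketch of the approach, assuming a toy rating matrix with invented values (the data and helper names are hypothetical). It computes Pearson correlation over co-rated items and predicts a missing rating as a similarity-weighted average over the k most similar users who rated that item.

```python
import numpy as np

# Toy user-item rating matrix; np.nan marks unrated items.
# Rows are users, columns are items (values invented for illustration).
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, 3.0, 4.0, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

def pearson(u, v):
    """Pearson correlation over the items both users rated."""
    mask = ~np.isnan(u) & ~np.isnan(v)
    if mask.sum() < 2:
        return 0.0
    a, b = u[mask], v[mask]
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_ubcf(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated it."""
    sims = [(pearson(R[user], R[v]), v)
            for v in range(len(R))
            if v != user and not np.isnan(R[v, item])]
    # Keep the top-k neighbors with positive correlation.
    top = [(s, v) for s, v in sorted(sims, reverse=True)[:k] if s > 0]
    if not top:
        return np.nan
    return sum(s * R[v, item] for s, v in top) / sum(s for s, _ in top)

print(predict_ubcf(R, user=0, item=2))  # estimate user 0's rating for item 2
```

This is a sketch of the technique rather than a production implementation, but it shows where the memory problem comes from: every prediction requires comparing the target user against every other user's full rating history.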
Item-based Collaborative Filtering (IBCF)
IBCF uses the similarity between items to make recommendations. Similarity is calculated with the same measures as before (k-nearest neighbors, Pearson correlation, etc.). After finding the items most similar to the unknown item, the algorithm averages the user's own ratings of those items to predict the rating the user would give the unknown item.
To ensure accuracy, a large number of items is needed so that the similarities can be calculated reliably, which leads to the same computational problems mentioned earlier.
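A comparable sketch for IBCF, again on an invented toy matrix: item-item similarity is computed as the cosine between item columns over their common raters, and the prediction averages the user's own ratings of the most similar items.

```python
import numpy as np

# Toy rating matrix: rows = users, cols = items, np.nan = unrated
# (values invented for illustration).
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, 3.0, 4.0, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

def item_sim(R, i, j):
    """Cosine similarity between two item columns over common raters."""
    a, b = R[:, i], R[:, j]
    mask = ~np.isnan(a) & ~np.isnan(b)
    a, b = a[mask], b[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict_ibcf(R, user, item, k=2):
    """Average the user's own ratings of the k items most similar to `item`."""
    rated = [j for j in range(R.shape[1])
             if j != item and not np.isnan(R[user, j])]
    top = sorted(rated, key=lambda j: item_sim(R, item, j), reverse=True)[:k]
    weights = [item_sim(R, item, j) for j in top]
    if not any(weights):
        return np.nan
    return sum(w * R[user, j] for w, j in zip(weights, top)) / sum(weights)

print(predict_ibcf(R, user=0, item=2))  # estimate user 0's rating for item 2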
Singular Value Decomposition and Principal Component Analysis (SVD, PCA)
When the dataset is too big for the first two options, SVD or PCA can be an appropriate choice. Put simply, each of these methods reduces the dimensionality of the data by deriving latent variables. This cuts the computational effort and also reduces noise in the data.
With SVD, we can reduce the data to a handful of factors. The retained factors can be used to reproduce an approximation of the original matrix, and the approximated values serve as predictions for the missing ratings.
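Here is one way this can look in Python with NumPy, on the same invented toy matrix: fill the missing cells with item means, keep only the top singular values, and read predictions off the low-rank reconstruction. The mean-fill step is a simplifying assumption; real systems use more careful imputation or factorization methods that handle missing entries directly.

```python
import numpy as np

# Toy rating matrix (invented); np.nan marks the entries to predict.
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, 3.0, 4.0, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Fill missing entries with each item's mean so SVD can run.
col_means = np.nanmean(R, axis=0)
filled = np.where(np.isnan(R), col_means, R)

# Factor the matrix and keep only the top-k singular values/vectors.
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The low-rank reconstruction supplies estimates for the missing cells.
print(approx[0, 2])  # predicted rating of user 0 for item 2
```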
With PCA, items are combined into components, and items that load on the same component can be used to make predictions for a user's unknown data points.
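A comparable sketch with scikit-learn's PCA, again mean-filling the gaps first since PCA cannot operate on missing values: projecting users onto a few components and mapping back yields smoothed estimates for the missing ratings.

```python
import numpy as np
from sklearn.decomposition import PCA

# Same toy matrix as above (invented values); fill gaps with item means.
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, 3.0, 4.0, 1.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])
filled = np.where(np.isnan(R), np.nanmean(R, axis=0), R)

# Project users onto 2 principal components, then map back.
pca = PCA(n_components=2)
scores = pca.fit_transform(filled)           # users in component space
reconstructed = pca.inverse_transform(scores)

print(reconstructed[0, 2])  # smoothed estimate for the missing rating
```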
Conclusion
Recommendation engines play a critical part in generating sales for many companies. This post provided insight into how they are built; understanding this can help you develop recommendation engines from your own data.