A Conceptual and Computational Framework for Aspect-Based Collaborative Filtering Recommender Systems
1. Introduction
The ever-expanding growth of e-commerce websites and applications has enriched recommender datasets with aspect-level data [1] [2] [3] [4]. Many recommender datasets with ample information about the preferences of many users towards various aspects of many items are available [5] [6]. Aspect-related information is either explicitly stated or implicit, mainly hidden in texts such as reviews [7] [8] [9]. NLP algorithms are now advanced enough to extract users' aspect sentiments from texts [10] [11]. Different Aspect-Based Sentiment Analysis (ABSA) algorithms are regularly developed to mine the opinions of users towards different aspects of items [10] [12] [13] [14].
Despite the availability of detailed sentiment data, most Collaborative Filtering (CF) Recommender Systems (RS) are still based on overall ratings [15] [16] [17]. CF techniques typically recommend relevant items to a user based on the overall preferences of other users towards the items [18] [19], in part because a theoretical framework for using user sentiments towards aspects has not been developed.
Aspect-sentiment based studies generally mine the opinions of users towards different aspects of items and present the mined results [20] [21]. The extracted aspects and the related sentiments are left without further analysis, as in the two studies in [21] and [22]. In [23], the authors perform sentiment analysis of reviews to identify the nearest-neighbor items in terms of aspect sentiments, but they do not discuss the emphasis on aspects or the popularity of aspects, and they conduct no further analysis of aspect sentiments. In [24], the authors extract the aspect-level preferences of a user from the reviews, compare these preferences with the aspect-level details of a review to score the helpfulness of the review, and subsequently recommend reviews based on their helpfulness scores. The impact on item-level aspect sentiments, on the popularity of aspects, or on item recommendation was not included.
Aspect-based information was also discussed in [25] [26] [27], but its implications for recommendation systems were not fully explored. In [28], the authors consider the popularity of each aspect of an item during recommendation, but the study does not discuss the impact of a user's emphasis on aspects and does not consider the sentiment of a user towards specific aspects of an item. In [25] and [29], the authors introduce approaches to compute weighted aspect ratings, which are then used to infer a user's overall rating of an item, but they do not analyze the popularity of an aspect of an item. The authors in [30] consider the users' emphasis on aspects and the average sentiment of all users towards product of an item for recommendation, but do not consider individual users' sentiments towards an aspect of an item. The authors in [8] propose a CF RS relying on the user's experience with the aspects of particular items, but not on the overall emphasis of a user on an aspect; the overall popularity of an aspect of an item is also missing.
The impact of aspect sentiment on recommendation was explored in several studies, but incompletely [31] [32]. In [33], a method was proposed to include sentiment-based explanations of the features of items, so that users can make better choices. However, the emphasis of a user on an aspect and the sentiment of a user toward a particular aspect of an item were not used; we consider both in this paper as crucial ingredients in making recommendations based on the available aspect sentiment. Similarly, in [34], the user-level and item-level importance of aspects were discussed but the sentiments themselves were not used. The study in [35] combines the aspect-level popularity of items with the importance of an aspect to a user to make recommendations, but does not use the user sentiments. The study in [36] determines whether a user is influenced more by positive or negative opinions, then combines the influence score with the item-level aspect importance to rank items. The authors in [7] consider aspect emphasis and item-aspect availability but do not make use of aspect sentiments. The study in [37] uses aspect-utility and aspect-importance values to predict an overall rating of an item from a review, but the users and items are not related via user sentiments.
The above discussion suggests that researchers in this field are exploring the relationships and uses of several key concepts in the recommendation problem, but no clear framework that unifies these key concepts has been developed. In this paper, we propose and develop such a framework, mainly by showing that the key concepts can all be computed and related once the 3-index Aspect-Sentiment Tensor (AST) $S = [s_{uia}]$ is defined or sampled, where s, u, i, and a denote sentiment, user, item, and aspect, respectively. Subsequently, we will define, in terms of the AST, concepts like popularity, emphasis, controversy, and similarity of users, items, and aspects. The relationships between the concepts will be made clearer as they are all derived in a consistent manner from an underlying sentiment tensor.
Finally, we end up with an Aspect-based Collaborative Filtering Toolbox (ABCFT) that simplifies the process of developing aspect-based CF approaches and has extensive potential for developing an explanatory aspect-based CF recommender system. We encourage extending ABCFT with additional tools so that, together, we can speed up the use of aspect-based information to make more justifiable recommendations.
2. Notation and Concepts
The notation commonly used in this study is listed below:
1) Capital letters represent sets, matrices, or tensors. And the notation
to indicate their elements.
2) Lowercase letters represent elements in sets or matrices. When subscripted or superscripted, they represent the elements in the matrices or tensors. For example, $e_{ua}$ represents the emphasis score of aspect a according to user u and is the element in the corresponding matrix E.
3) $|S|$ represents the number of elements in the set S.
4) R is the Rating Matrix of size m by n, where m is the total number of users and n is the total number of items. It is further discussed in Section 2.1.
5) S is the Sentiment Tensor of size m by n by k, where m is the total number of users, n is the total number of items and k is the total number of aspects. It is further discussed in Section 2.1.
6) $U$ = the set of all the users in S.
7) $I$ = the set of all the items in S.
8) $A$ = the set of all the aspects in S.
9) $U'$ is a subset of the users in U, $I'$ is a subset of the items in I, and $A'$ is a subset of the aspects in A.
10) $U_i$ = the set of all users that have reviewed an item i.
11) $I_u$ = the set of all the items reviewed by a user u.
12) $A_i$ = the set of all aspects of an item i. In our study, usually $A_i = A$.
13) $p_{ia}$ = the popularity of an aspect a of the item i. It is discussed in Section 2.2.1.
14) $e_{ua}$ represents the emphasis score of a user u on an aspect a. It is discussed in Section 2.4.
15) $d^{i}_{uv}$ is the distance between users u and v based on item i.
16) $d^{a}_{uv}$ is the distance between users u and v based on aspect a.
17)
is a set of clusters of users in
.
2.1. Rating Matrix and Aspect Sentiment Tensor
A rating matrix R is the (aspect-free) matrix of m users and n items used in classical recommender systems, and $r_{ui}$ represents the rating of user u on item i [16]. Generally, the ratings in a rating matrix are discrete numerical values.
A sentiment tensor S is a three-index tensor of m users, n items and k aspects. Here, $s_{uia}$ denotes the sentiment of user u about an item i along an aspect a. A value in S is either +1, 0, or −1: +1 represents positive sentiment, −1 represents negative sentiment, and 0 represents no sentiment.
2.2. Popularity and Controversy
There may be different ways to define the popularity and controversy of an aspect of an item. Simple definitions of both, computable from the Sentiment Tensor S, are given in Sections 2.2.1 and 2.2.2.
The popularity of the aspects of a specific item can play a significant role in building an aspect-driven recommender system but is rarely used [38]. The aspect-level popularity of an item can be combined with the aspect-level preferences of users to build a CF-based RS [35]. For an item i recommended to a user u, the popularity scores of the aspects of that item can be used as the criterion for recommending the top aspects of that item to the user u.
2.2.1. Popularity of Aspects and Items
The popularity $p_{ia}$ of an aspect a of an item i is defined as the proportion of users reviewing aspect a of item i positively. It can be interpreted as the probability of assigning a positive sentiment to that aspect by a randomly selected user, given that the selected user reviewed the aspect. $P(s_{uia} = +1 \mid s_{uia} \neq 0)$ denotes this probability and can be estimated as follows:
$U_i$ = set of all users that have reviewed item i.
$n^{+}_{ia}$ represents the number of users who have rated aspect a of item i positively.
$n^{-}_{ia}$ represents the number of users who have rated aspect a of item i negatively.
$n^{0}_{ia}$ represents the number of users who have not rated aspect a of item i.
Then, the popularity $p_{ia}$ and its complement $\bar{p}_{ia}$ for an aspect a of an item i can be computed as:
$$p_{ia} = \frac{n^{+}_{ia}}{n^{+}_{ia} + n^{-}_{ia}} \tag{1}$$
$$\bar{p}_{ia} = \frac{n^{-}_{ia}}{n^{+}_{ia} + n^{-}_{ia}} \tag{2}$$
Note that $p_{ia} + \bar{p}_{ia} = 1$. However, $p_{ia}$ does not represent the probability of assigning a positive sentiment to that aspect by a randomly selected user, because such a user may not rate the aspect of the item. To estimate that probability, we can correct with the probability that a random user rates the aspect. This probability is:
$$\frac{n^{+}_{ia} + n^{-}_{ia}}{n^{+}_{ia} + n^{-}_{ia} + n^{0}_{ia}} \tag{3}$$
For simplicity, we will use $p_{ia}$ computed on $U_i$ as an estimate of the probability of a positive sentiment and $\bar{p}_{ia}$ as an estimate of its complement.
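The counting behind Equations (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the sparse dictionary representation of S (keys `(user, item, aspect)`, absent keys meaning 0), the function names, and the toy data are all assumptions made for the example.

```python
def aspect_counts(S, users, item, aspect):
    """Count positive, negative, and missing sentiments for one (item, aspect)."""
    values = [S.get((u, item, aspect), 0) for u in users]
    n_pos = sum(1 for v in values if v > 0)
    n_neg = sum(1 for v in values if v < 0)
    n_none = len(values) - n_pos - n_neg
    return n_pos, n_neg, n_none


def popularity(S, users, item, aspect):
    """Return (p, p_bar, rate_prob); p and p_bar are None if nobody rated the aspect."""
    n_pos, n_neg, n_none = aspect_counts(S, users, item, aspect)
    rated = n_pos + n_neg
    total = rated + n_none
    rate_prob = rated / total if total else 0.0   # probability that a user rates the aspect
    if rated == 0:
        return None, None, rate_prob
    return n_pos / rated, n_neg / rated, rate_prob


# Hypothetical toy tensor: keys are (user, item, aspect); absent keys mean 0.
S = {
    ("u1", "h0", "Value"): 1,
    ("u2", "h0", "Value"): 1,
    ("u3", "h0", "Value"): -1,
    ("u1", "h0", "Rooms"): 1,
}
U_h0 = ["u1", "u2", "u3", "u4"]   # assumed set of users who reviewed item h0
p, p_bar, rate_prob = popularity(S, U_h0, "h0", "Value")
```

Here two of the three users who rated Value did so positively, so the popularity is 2/3, while three of the four reviewers rated the aspect at all. Returning `None` when no user rated the aspect is a design choice of this sketch; the source does not say how that case is handled.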
2.2.2. Controversy of Aspects and Items
The controversy measure of an aspect a of an item i, $c_{ia}$, is a measure of disagreement in sentiment between the users regarding aspect a of item i. Moreover, $c_{ia}$ lies in [0, 1]. Mathematically, with $p_{ia}$ and $\bar{p}_{ia}$ the popularity and its complement from Equations (1) and (2),
$$c_{ia} = 1 - \left| p_{ia} - \bar{p}_{ia} \right| \tag{4}$$
Notice that if $p_{ia} = \bar{p}_{ia} = 0.5$, meaning that an equal number of users liked and disliked that aspect, then $c_{ia}$ will be one, indicating maximum controversy. On the other hand, complete agreement among users regarding aspect a of item i gives $p_{ia} = 1$ or $p_{ia} = 0$; then $c_{ia}$ will be zero, indicating no controversy, and hence consensus.
The most controversial aspect of an item i, noted as ConAsp(i), is the aspect a of item i having the highest controversy $c_{ia}$ among all $a \in A_i$. The most controversial item based on an aspect a, noted as ConItem(a), is the item i having the highest controversy score $c_{ia}$ among all $i \in I$.
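The boundary behaviour described above can be checked numerically. The sketch below assumes the controversy takes the form 1 − |p − (1 − p)|, where p is the popularity; this form is consistent with the stated properties (one at p = 0.5, zero at p = 0 or p = 1), but other functions share these properties, so it should be read as an assumption rather than as the definitive formula.

```python
def controversy(p):
    """Controversy of an aspect given its popularity p in [0, 1].

    Assumed form: 1 - |p - (1 - p)|, which equals 1 at p = 0.5
    and 0 at p = 0 or p = 1.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("popularity must lie in [0, 1]")
    p_bar = 1.0 - p
    return 1.0 - abs(p - p_bar)


# Controversy for a few illustrative popularity values.
examples = {p: controversy(p) for p in (0.0, 0.25, 0.5, 1.0)}
```

Under this assumed form, an aspect liked by a quarter of its raters sits halfway between consensus and maximum controversy.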
2.3. Relationship of Users Based on Aspects
The relationship between two or more users can be assessed based on how they rate the aspects of items. For instance, users can be related based on sentiments towards all aspects of an item or based on their sentiments towards one aspect but considering all the items.
In general, positively biased users are users who tend to review every item or aspect under consideration positively. The Most Positive Users (MPU) about all aspects of an item i, noted as MPU(i), are the users who review most or all of the aspects of item i positively.
Two users are said to be the most disagreeing users if they tend to review every item or aspect under consideration with opposite extreme values on the reviewing scale. The Most Disagreeing Users on a specific aspect a considering all the items, noted as MDU(a), are the users who have reviewed aspect a with opposite sentiments for most, if not all, of the items.
The Nearest Neighbors to a user u based on an aspect a considering all the items, denoted as NN(a), are the users who think most like the user u about aspect a across all the items.
In general, clustering is the process of grouping objects so that objects belonging to the same group are more similar to each other, based on certain criteria, than to the objects in other groups [39]. In our study, a cluster of users expressing similar sentiments towards all aspects of item i, denoted as ClSent(Ui), is a group of users who have the most similar sentiments towards all aspects of item i. Similarly, a cluster of users emphasizing similar aspects of item i, denoted as ClEmp(Ui), consists of users who emphasize similar aspects of item i.
2.4. Emphasis Score
The preference level of users toward different aspects is an important part of aspect-based CF approaches [40] [41] [42]. Usually, the importance of an aspect to a user is inferred from reviews and is included in aspect-based CF as one of the latent factors [18] [43] [44] [45]. Here, we present a simple approach to compute the emphasis of a user on an aspect based on the information stored in the Sentiment Tensor S.
The emphasis score of a user u towards an aspect a, $e_{ua}$, can be defined as the ratio of the number of items for which aspect a is reviewed by user u to the total number of items reviewed by u. The value of $e_{ua}$ lies in [0, 1]. Mathematically, with $I_u$ the set of items reviewed by u, the emphasis score of a user u toward an aspect a can be computed as:
$$e_{ua} = \frac{\sum_{i \in I_u} |s_{uia}|}{|I_u|} \tag{5}$$
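As a small worked instance of the emphasis score, with hypothetical numbers: suppose user u has reviewed four items and expressed a sentiment on the aspect Service for three of them (positive, negative, positive) and none for the fourth. Counting each nonzero sentiment once, regardless of sign, gives

$$e_{u,\text{Service}} = \frac{|{+1}| + |{-1}| + |{+1}| + |0|}{4} = \frac{3}{4} = 0.75 .$$

The sign of the sentiment does not matter here: a user who always complains about Service emphasizes it as much as one who always praises it.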
2.5. Similarity and Dissimilarity between Users
Item-Based User Disagreement, $d^{i}_{uv}$, between two users u and v is the dissimilarity score between them based on their sentiments towards all aspects of item i. Similarly, Item-Based User Agreement between two users u and v is the similarity score between them based on their sentiments towards all aspects of item i.
Aspect-Based User Disagreement, $d^{a}_{uv}$, between two users u and v is the dissimilarity score between them based on their sentiments toward aspect a considering all the items. Similarly, Aspect-Based User Agreement between two users u and v is the similarity score between them based on their sentiments toward an aspect a considering all the items.
In general, the similarity between two data objects is a numerical measure of how alike they are [46], and its value generally lies in [0, 1]. Dissimilarity is a numerical measure of how different two data objects are. The dissimilarity, or distance, between two objects does not necessarily lie in [0, 1] unless it is normalized.
The distance or similarity measure between objects is a key step in data mining tasks like classification and clustering [47]. The distances may be computed in different ways depending on the type of data we are dealing with. The distance between two numerical or ordinal vectors x and y can generally be defined by any mathematical norm of the difference vector $x - y$ [48]. The Minkowski distance of different orders can be used to compute the distance between vectors formed from numerical and ordinal data [49]. The Minkowski distance of order p between the ordinal vectors x and y can be computed as:
$$d_p(x, y) = \left( \sum_{j} |x_j - y_j|^{p} \right)^{1/p} \tag{6}$$
The Minkowski distance of order 1 (p = 1) is the Manhattan distance or 1-norm, and the Minkowski distance of order 2 (p = 2) is the Euclidean distance or 2-norm. K-means clustering, one of the most widely used unsupervised machine learning algorithms, also relies on such a distance during clustering: standard K-means uses the Euclidean distance, and variants use other orders.
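A direct transcription of the Minkowski distance is short enough to show in full. The function name and the two example vectors are illustrative only; the vectors are chosen to look like sentiment vectors with entries in {−1, 0, +1}.

```python
def minkowski(x, y, p=2):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("order p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Two hypothetical sentiment vectors over four aspects.
x = [1, -1, 0, 1]
y = [1, 1, -1, 0]
manhattan = minkowski(x, y, p=1)   # |0| + |-2| + |1| + |1|
euclidean = minkowski(x, y, p=2)   # sqrt(0 + 4 + 1 + 1)
```

The same pair of vectors is four units apart in the 1-norm and the square root of six apart in the 2-norm, which illustrates that the order p changes how strongly a single large disagreement is weighted.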
In this study, the distance between two users u and v based on item i, $d^{i}_{uv}$, is termed Item-Based User Disagreement (IBUD). $d^{i}_{uv}$ is computed as the Euclidean distance between the aspect sentiments of u and v over all aspects in A of item i. Mathematically,
$$d^{i}_{uv} = \sqrt{\sum_{a \in A} \left( s_{uia} - s_{via} \right)^{2}} \tag{7}$$
IBUD may be normalized as required by the problem. For a normalized IBUD, denoted as IBUD*, which lies in [0, 1], we define Item-Based User Agreement (IBUA) as IBUA = 1 − IBUD*. In this work, the similarity between two users u and v based on an item i is computed from the aspect sentiments of users u and v towards all aspects of item i. The weight between two users u and v is defined as
(8)
The distance between two users u and v based on aspect a, $d^{a}_{uv}$, is termed Aspect-Based User Disagreement (ABUD). $d^{a}_{uv}$ can be computed considering all the items in I as:
$$d^{a}_{uv} = \sqrt{\sum_{i \in I} \left( s_{uia} - s_{via} \right)^{2}} \tag{9}$$
ABUD may be normalized as required by the problem. For a normalized ABUD, denoted as ABUD*, which lies in [0, 1], we define Aspect-Based User Agreement (ABUA) as ABUA = 1 − ABUD*.
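The two disagreement measures differ only in which index of the sentiment tensor is held fixed, which a short sketch makes concrete. The dictionary representation of S and the function names are assumptions of this example. The normalization is also an assumption: the text leaves it open, and the sketch divides by the largest possible Euclidean distance between two vectors with entries in {−1, 0, +1}, namely 2√n for n terms.

```python
import math


def ibud(S, u, v, item, aspects):
    """Item-Based User Disagreement: Euclidean distance over the aspects of one item."""
    return math.sqrt(sum((S.get((u, item, a), 0) - S.get((v, item, a), 0)) ** 2
                         for a in aspects))


def abud(S, u, v, aspect, items):
    """Aspect-Based User Disagreement: Euclidean distance over items for one aspect."""
    return math.sqrt(sum((S.get((u, i, aspect), 0) - S.get((v, i, aspect), 0)) ** 2
                         for i in items))


def agreement(distance, n_terms):
    """1 - normalized distance, assuming normalization by the maximum 2*sqrt(n_terms)."""
    max_distance = 2.0 * math.sqrt(n_terms)
    return 1.0 - distance / max_distance if max_distance else 1.0


# Hypothetical toy tensor; absent keys mean "no sentiment" (0).
aspects = ["Location", "Service", "Value"]
S = {
    ("u", "h0", "Location"): 1, ("u", "h0", "Service"): 1, ("u", "h0", "Value"): -1,
    ("v", "h0", "Location"): 1, ("v", "h0", "Service"): -1,
    ("u", "h1", "Service"): 1, ("v", "h1", "Service"): 1,
}
d_item = ibud(S, "u", "v", "h0", aspects)            # sqrt(0 + 4 + 1)
d_aspect = abud(S, "u", "v", "Service", ["h0", "h1"])  # sqrt(4 + 0)
ibua = agreement(d_item, len(aspects))
abua = agreement(d_aspect, 2)
```

With this normalization, two users who give opposite sentiments on every term get an agreement of zero, and two users with identical sentiment vectors get an agreement of one.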
3. Methodology
In this section, the aspect-based tools, their tasks, and the processes for solving those tasks are discussed. The algorithm of each tool is discussed in Section 4. Each tool presented here is a tool in the proposed Aspect-based Collaborative Filtering Toolbox (ABCFT). ABCFT can be used to build a complete aspect-based explanatory recommender system.
The list of eight Aspect-Based CF Tools is as below:
1) Determine the most controversial aspect a of an item i denoted as ConAsp(i)
The tool ConAsp(i) determines the most controversial aspect a of an item i. This can be achieved by finding the aspect a of the item i with the highest controversy measure $c_{ia}$ or, equivalently, the lowest uncontroversial measure $1 - c_{ia}$. The proposed algorithm is presented in Section 4.1.
2) Find the most controversial item i based on aspect a denoted as ConItem(a)
The tool ConItem(a) finds the most controversial item i based on an aspect a. This is achieved by finding the item i with the highest controversy measure $c_{ia}$ or, equivalently, the lowest uncontroversial measure $1 - c_{ia}$ for the specific aspect a. The proposed algorithm is presented in Section 4.2.
3) Determine users who are most positive about all aspects of an item i denoted as MPU(A|i)
The tool MPU(A|i) finds the users who are most positive about all aspects in A of an item i. This is achieved by computing the dissimilarity of every user u with an assumed user u' who has positive sentiments for all the aspects in A of the item i. Here, u belongs to the set of users reviewing item i, i.e. $u \in U_i$. The users in $U_i$ with the lowest dissimilarity to u' are the most positive. The algorithm is presented in Section 4.3.
4) Determine users who feel most like (agree with) specified user u' based on an aspect a denoted as NN(u'|a)
The tool NN(u'|a) determines the users who feel most like (agree with) a specified user u' based on an aspect a. This is achieved by computing the dissimilarity between user u' and every other user u based on their sentiments toward aspect a of all the items. The users with the least dissimilarity to u' agree most with user u' based on aspect a. The algorithm is presented in Section 4.4.
5) Determine pairs of users disagreeing most on a specific aspect a considering all the items denoted as MDU(a|I)
The tool MDU(a|I) determines the pairs of users disagreeing most on a specific aspect a considering all the items in I. This is achieved by computing the dissimilarity between every unique pair of users u and u', meaning $u, u' \in U$ with $u \neq u'$. The dissimilarity is based on the sentiments of u and u' towards aspect a considering all the items in I. The pairs of users with the highest dissimilarity are the pairs of users disagreeing most on the specific aspect a considering all the items in I. The algorithm is presented in Section 4.5.
6) Find groups of users mostly agreeing on all aspects of an item i or find Aspect-Sentiment based User Clusters of a given item i, ASBUC(Ui)
The tool, ASBUC(Ui) finds the groups of users mostly agreeing on all aspects of an item i. This is achieved by clustering the users reviewing item i based on the sentiment values users provide to all aspects of the item i. The algorithm is as presented in Section 4.6.
7) Find groups of users who emphasize the same aspects of an item i or Aspect-Emphasis based User Clusters of a given item i, AEBUC(Ui)
The tool AEBUC(Ui) finds groups of users who emphasize the same aspects of an item i. This is achieved by clustering the users reviewing item i based on the sentiment values they provide for all aspects of the item i, while treating positive and negative sentiments as the same. The algorithm is presented in Section 4.7.
8) Rank the aspects based on the emphasis given by a user u to them or Emphasis based Ranking of Aspects in A for a given user u, EBRA(A|u)
The tool EBRA(A|u) ranks all aspects in A based on the emphasis given by a user u to them. This is achieved by computing the emphasis score of a user u towards every aspect in A. Then, aspects are sorted descending based on their emphasis scores. The one with the highest value of emphasis score gets the rank one and so on. The algorithm is as presented in Section 4.8.
The tools in ABCFT, their tasks and the concepts used in each tool are summarized in Table 1.
Table 1. Summary of tools in aspect-based CF toolbox.
4. Algorithms and Illustrations
In this section, the algorithms for the aspect-based CF tools proposed in Section 3 are presented, and example solutions obtained by applying the tools to a Hotel dataset are provided. The Hotel dataset [44] [45] involves around 6000 users and 400 hotels from Tripadvisor. The Hotel dataset was reformatted into an aspect-sentiment tensor made up of six aspect-sentiment matrices. The sentiment values in the hotel sentiment tensor are +1 for positive sentiment, −1 for negative sentiment and 0 for no sentiment. In the hotel dataset downloaded from [50], aspects were rated with discrete values from 1 to 5. Aspect ratings were converted to aspect sentiments as follows: if aspect rating > 3.0 then aspect sentiment = positive (1.0), and if aspect rating ≤ 3.0 then aspect sentiment = negative (−1.0). The aspects involved are Location, Service, Cleanliness, Value, Sleep Quality, and Rooms.
The proposed algorithms for the tools in ABCFT follow. All the tools assume the availability of the sentiment tensor S, where $s_{uia}$ represents the sentiment of user u about an aspect a of an item i.
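The conversion from 1-to-5 aspect ratings to sentiments can be sketched as follows. The threshold rule is the one stated in the text; the sparse dictionary layout, the use of `None` for a missing rating, and the function names are assumptions made for this illustration and do not describe the actual format of the Hotel dataset.

```python
def rating_to_sentiment(rating):
    """Map a 1-5 aspect rating to +1 (rating > 3) or -1 (rating <= 3); None maps to 0."""
    if rating is None:
        return 0
    return 1 if rating > 3.0 else -1


def build_sentiment_tensor(aspect_ratings):
    """Build a sparse tensor {(user, item, aspect): sentiment}, omitting zero entries."""
    S = {}
    for (user, item, aspect), rating in aspect_ratings.items():
        s = rating_to_sentiment(rating)
        if s != 0:
            S[(user, item, aspect)] = s
    return S


# Hypothetical aspect ratings; None marks an aspect the user did not rate.
ratings = {
    ("u1", "h0", "Location"): 5,
    ("u1", "h0", "Value"): 3,
    ("u2", "h0", "Location"): None,
    ("u2", "h0", "Rooms"): 4,
}
S = build_sentiment_tensor(ratings)
```

Storing only nonzero entries keeps the tensor compact, since most users review only a small fraction of the items; a lookup such as `S.get(key, 0)` then recovers the "no sentiment" value. Note that under the stated rule a neutral rating of 3 is counted as negative.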
4.1. Determine the Most Controversial Aspect a of an Item i
The algorithm for finding the most controversial aspect a of an item i, noted as ConAsp(i), is as follows:
1) For each aspect a of item i,
a) Compute the popularity $p_{ia}$ using Equation (1) and its complement $\bar{p}_{ia}$ using Equation (2).
b) Compute the controversy $c_{ia}$ of the aspect a of the item i using Equation (4).
2) The most controversial aspect a of item i: ConAsp(i) = aspect with the maximum $c_{ia}$.
Finding the most controversial aspect may look like a simple problem compared to the large machine learning problems in recommendation systems, but its solution can play a vital role in making meaningful recommendations to users when combined with other solutions.
The example in Table 2 gives the controversy $c_{ia}$ of the aspects of item 0. This example is based on the Hotel dataset used in this study. The aspect Value is the most controversial aspect of hotel 0, because it has the highest controversy measure $c_{ia}$ among the six aspects of the hotel under consideration.
4.2. Find the Most Controversial Item i Based on Aspect a
The algorithm for finding the most controversial item i based on an aspect a, noted as ConItem(a), is as follows:
1) For each item i,
a) Compute the popularity $p_{ia}$ using Equation (1) and its complement $\bar{p}_{ia}$ using Equation (2) for the specific aspect a.
b) Compute the controversy $c_{ia}$ of the specific aspect a of item i using Equation (4) and store it.
2) The most controversial item i based on the specific aspect a: ConItem(a) = item with the maximum $c_{ia}$ for the specific aspect a.
The solution to the problem of finding the most controversial item based on an aspect a can also help with the challenge of recommending items to new users: the most controversial items can be avoided when recommending to new users with insufficient rating or sentiment data.
For the hotel data used in this study, item 300 is the most controversial item based on the aspect Location. The result is shown in Table 3.
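The two controversy tools share the same inner computation and differ only in whether the search runs over aspects or over items. The sketch below assumes the controversy form 1 − |p − (1 − p)|, which matches the boundary conditions stated for Equation (4) but is not confirmed by the text; the data layout, function names, and toy values are likewise hypothetical.

```python
def controversy_of(S, users, item, aspect):
    """Controversy of one (item, aspect); None if no user rated the aspect."""
    values = [S.get((u, item, aspect), 0) for u in users]
    n_pos = sum(1 for v in values if v > 0)
    n_neg = sum(1 for v in values if v < 0)
    if n_pos + n_neg == 0:
        return None
    p = n_pos / (n_pos + n_neg)
    return 1.0 - abs(p - (1.0 - p))        # assumed form of Equation (4)


def con_asp(S, users_of_item, item, aspects):
    """Most controversial aspect of an item (first one wins on ties)."""
    scored = [(controversy_of(S, users_of_item, item, a), a) for a in aspects]
    scored = [pair for pair in scored if pair[0] is not None]
    return max(scored, key=lambda pair: pair[0])[1] if scored else None


def con_item(S, users_by_item, aspect):
    """Most controversial item for one aspect (first one wins on ties)."""
    scored = [(controversy_of(S, users, item, aspect), item)
              for item, users in users_by_item.items()]
    scored = [pair for pair in scored if pair[0] is not None]
    return max(scored, key=lambda pair: pair[0])[1] if scored else None


# Hypothetical toy data: two hotels, two aspects.
S = {
    ("u1", "h0", "Value"): 1, ("u2", "h0", "Value"): -1,
    ("u1", "h0", "Location"): 1, ("u2", "h0", "Location"): 1, ("u3", "h0", "Location"): 1,
    ("u1", "h1", "Location"): 1, ("u2", "h1", "Location"): -1,
    ("u3", "h1", "Location"): 1, ("u4", "h1", "Location"): 1,
}
users_by_item = {"h0": ["u1", "u2", "u3"], "h1": ["u1", "u2", "u3", "u4"]}
top_aspect = con_asp(S, users_by_item["h0"], "h0", ["Location", "Value"])
top_item = con_item(S, users_by_item, "Location")
```

In the toy data, hotel h0 is split evenly on Value but unanimous on Location, so Value is its most controversial aspect, while h1 is the more controversial hotel with respect to Location.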
Table 2. Example solution of the approach in Section 4.1 for determining the most controversial aspect a of an item 0 using Hotel Dataset.
Table 3. Example solution of the approach in Section 4.2 for finding the most controversial item i based on aspect a using Hotel Dataset.
4.3. Determine the Top N Users Who Are Most Positive about All Aspects of an Item i
Let Ui be the set of all users that have reviewed the item i. The proposed algorithm to solve the problem of finding the top N users who are most positive about all aspects of an item i is as follows:
1) Assume a reference user u' who has positive sentiments for all the aspects of the item i.
2) Compute the dissimilarity, Item-Based User Disagreement $d^{i}_{uu'}$, of each user u in Ui with the assumed user u' using Equation (7).
3) Sort the users in Ui in ascending order of the dissimilarity values $d^{i}_{uu'}$.
4) MPU(A|i) = top N sorted users of Ui. Here, MPU(A|i) are the N users that are most positive about all aspects of the item i.
Table 4 gives an example solution: the top 5 most positive users about all aspects of item 7. This example is based on the Hotel Dataset.
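The steps above can be sketched as a distance-to-ideal ranking. Because the reference user has +1 on every aspect, the Euclidean distance of Equation (7) reduces to comparing each sentiment with 1. The dictionary layout of S, the function name, and the toy data are assumptions of this example.

```python
import math


def most_positive_users(S, users_of_item, item, aspects, top_n):
    """Rank users by Euclidean distance to an all-positive reference user; keep the top N."""
    def distance_to_all_positive(u):
        return math.sqrt(sum((S.get((u, item, a), 0) - 1) ** 2 for a in aspects))

    ranked = sorted(users_of_item, key=distance_to_all_positive)  # stable, ascending
    return ranked[:top_n]


# Hypothetical toy data for one hotel and three aspects.
aspects = ["Location", "Service", "Value"]
S = {
    ("u1", "h7", "Location"): 1, ("u1", "h7", "Service"): 1, ("u1", "h7", "Value"): 1,
    ("u2", "h7", "Location"): 1, ("u2", "h7", "Service"): 1,
    ("u3", "h7", "Location"): 1, ("u3", "h7", "Service"): -1, ("u3", "h7", "Value"): -1,
    ("u4", "h7", "Location"): -1, ("u4", "h7", "Service"): -1, ("u4", "h7", "Value"): -1,
}
top_two = most_positive_users(S, ["u4", "u3", "u2", "u1"], "h7", aspects, top_n=2)
```

A missing sentiment (0) contributes a distance of 1 per aspect and a negative sentiment contributes 2, so a user who skips an aspect ranks as more positive than one who rates it negatively. Ties are resolved by input order in this sketch; the source does not specify a tie-breaking rule.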
Table 4. Example solution of the approach in Section 4.3 for determining the top N users who are most positive about all aspects of an item i based on Hotel Dataset.
Based on the proposed solution for finding the top N users who are most positive about all aspects of item i, the following hypothesis can be proposed:
Hypothesis1: The top N users who are most positive about k − 1 aspects of an item are likely to be positive about the kth aspect, which has not been used for finding the top N most positive users.
We evaluated Hypothesis1 using an approach called Leave One Aspect Out. The steps involved in this evaluation are as follows:
I = set of all items,
A = set of all aspects.
1) For an item i in I,
a) Ui = set of all users that have reviewed item i.
b) For an aspect a in A,
i) Split the data into training and test data. Let $A' = A \setminus \{a\}$. Trn = sub-tensor $s_{uia'}$ for all $u \in U_i$ and $a' \in A'$ is the training data. Tst = sub-tensor $s_{uia}$ for all $u \in U_i$ is the testing data.
ii) Let MPU = set of N most positive users about all aspects in A', found using the approach in Section 4.3.
iii) Find $n^{+}$ = number of users with positive sentiment towards aspect a of item i for all $u \in MPU$.
iv) Find $n^{-}$ = number of users with negative sentiment towards aspect a of item i for all $u \in MPU$.
v) Accuracy [51] $= n^{+} / (n^{+} + n^{-})$; the accuracy obtained is stored in a list accL.
c) Step 1b is performed for each aspect in A.
2) Step 1 is performed for each item in I.
3) Overall accuracy measure = arithmetic mean of the accuracy values in accL over all aspects and items.
A sentiment $s_{uia}$ of a user $u \in MPU$ for aspect a in the test set is not considered during evaluation if $s_{uia} = 0$.
The results of the evaluation of Hypothesis1 using the Hotel Dataset are tabulated in Table 5.
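The Leave One Aspect Out procedure can be sketched as a pair of nested loops. This is a hedged illustration: it assumes that accuracy is the fraction of nonzero held-out sentiments among the selected users that are positive, that folds with no nonzero test sentiment are skipped, and that S is stored as a dictionary keyed by `(user, item, aspect)`. All names and data are hypothetical.

```python
import math


def top_positive(S, users, item, aspects, top_n):
    """N users closest (Euclidean) to an all-positive reference user on the given aspects."""
    def dist(u):
        return math.sqrt(sum((S.get((u, item, a), 0) - 1) ** 2 for a in aspects))
    return sorted(users, key=dist)[:top_n]


def leave_one_aspect_out(S, users_by_item, aspects, top_n):
    """Mean accuracy over all (item, held-out aspect) folds; None if no fold is usable."""
    acc_list = []
    for item, users in users_by_item.items():
        for held_out in aspects:
            train_aspects = [a for a in aspects if a != held_out]
            mpu = top_positive(S, users, item, train_aspects, top_n)
            test = [S.get((u, item, held_out), 0) for u in mpu]
            n_pos = sum(1 for s in test if s > 0)
            n_neg = sum(1 for s in test if s < 0)
            if n_pos + n_neg:                 # zero sentiments are not evaluated
                acc_list.append(n_pos / (n_pos + n_neg))
    return sum(acc_list) / len(acc_list) if acc_list else None


# Hypothetical toy data: one hotel, three aspects, three users.
aspects = ["Location", "Service", "Value"]
S = {
    ("u1", "h0", "Location"): 1, ("u1", "h0", "Service"): 1, ("u1", "h0", "Value"): 1,
    ("u2", "h0", "Location"): 1, ("u2", "h0", "Service"): 1, ("u2", "h0", "Value"): -1,
    ("u3", "h0", "Location"): -1, ("u3", "h0", "Service"): -1, ("u3", "h0", "Value"): -1,
}
accuracy = leave_one_aspect_out(S, {"h0": ["u1", "u2", "u3"]}, aspects, top_n=2)
```

In the toy data, users u1 and u2 are selected in every fold; they are both positive on Location and Service when those are held out, but split on Value, so the three fold accuracies are 1, 1, and 0.5.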
Table 5. Evaluation of Hypothesis1 using Hotel Dataset.
4.4. Determine the Top N Users Who Feel Most Like a Specified User u’ Based on an Aspect a
The algorithm to find the top N users who feel most like a specified user u' based on an aspect a is as follows:
1) Find $I_{u'}$ = the set of all items reviewed by user u'.
2) Find U' = the set of all users reviewing at least one item in $I_{u'}$.
3) For all $u \in U'$, compute the Aspect-Based User Disagreement $d^{a}_{u'u}$ between users u' and u based on their sentiments toward aspect a, considering only the items rated by both, and normalize by the number of items reviewed by both u' and u. The normalized disagreement $d^{a*}_{u'u}$ can be computed as:
$$d^{a*}_{u'u} = \frac{1}{|I_u \cap I_{u'}|} \sqrt{\sum_{i \in I_u \cap I_{u'}} \left( s_{u'ia} - s_{uia} \right)^{2}} \tag{10}$$
where $I_u$ is the set of items reviewed by u, $I_{u'}$ is the set of items reviewed by u', and $|I_u \cap I_{u'}|$ is the cardinality of the set of items reviewed by both u and u'.
4) Sort the values $d^{a*}_{u'u}$ in ascending order, together with the user u associated with each value.
5) The N nearest neighbors to u' based on aspect a, NN(u'|a) = the top N users in ascending order of $d^{a*}_{u'u}$. Hence, the top N sorted users based on the values of $d^{a*}_{u'u}$ are the users who think most like u' based on aspect a.
Table 6 gives an example solution for finding the top 5 users who feel most like the user 10 based on the aspect Rooms. This example is based on Hotel Dataset.
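A compact sketch of the nearest-neighbor search follows. It assumes that the normalized disagreement is the Euclidean distance over the commonly reviewed items divided by the number of such items; the text states the normalization only in words, so this exact form is an assumption. The mapping `items_by_user` (user to set of reviewed items), the function name, and the data are hypothetical.

```python
import math


def nearest_neighbors(S, items_by_user, target, aspect, top_n):
    """Top-N users closest to `target` on one aspect, over commonly reviewed items."""
    target_items = items_by_user[target]
    scored = []
    for u, items in items_by_user.items():
        if u == target:
            continue
        common = target_items & items
        if not common:                      # no shared item: not a candidate
            continue
        dist = math.sqrt(sum((S.get((target, i, aspect), 0) - S.get((u, i, aspect), 0)) ** 2
                             for i in common))
        scored.append((dist / len(common), u))   # assumed normalization
    scored.sort()                            # ascending by normalized disagreement
    return [u for _, u in scored[:top_n]]


# Hypothetical toy data for the aspect Rooms.
S = {
    ("u0", "h0", "Rooms"): 1, ("u0", "h1", "Rooms"): 1, ("u0", "h2", "Rooms"): -1,
    ("u1", "h0", "Rooms"): 1, ("u1", "h1", "Rooms"): 1,
    ("u2", "h0", "Rooms"): -1, ("u2", "h2", "Rooms"): -1,
    ("u3", "h3", "Rooms"): 1,
    ("u4", "h0", "Rooms"): -1, ("u4", "h1", "Rooms"): -1, ("u4", "h2", "Rooms"): 1,
}
items_by_user = {
    "u0": {"h0", "h1", "h2"}, "u1": {"h0", "h1"}, "u2": {"h0", "h2"},
    "u3": {"h3"}, "u4": {"h0", "h1", "h2"},
}
neighbors = nearest_neighbors(S, items_by_user, "u0", "Rooms", top_n=2)
```

User u3 shares no item with the target and is therefore never a candidate, while u4, who reverses the target's sentiment on every shared hotel, ranks last among the candidates.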
Table 6. Example solution of the approach in Section 4.4 for finding the top N users who feel most like a specified user u' based on an aspect a based on Hotel Dataset.
4.5. Determine the Top N Pair of Users Disagreeing Most on a Specific Aspect a Considering All the Items
Let U = set of all the users and I = set of all the items. Then, the top N pairs of users disagreeing most on a specific aspect a considering all the items, noted as MDU(a|I), can be found using the following steps:
1) Compute the Aspect-Based User Disagreement $d^{a}_{u'u}$ between users u' and u for all $u, u' \in U$ with $u \neq u'$, considering the sentiments toward the specific aspect a of all the items in I, using Equation (9). In other words, compute the distance between each pair of users in U considering the sentiments toward the specific aspect a of all items in I.
2) Sort the pairs of users (u, u') in descending order of the distances $d^{a}_{u'u}$.
MDU(a|I) = the top N pairs (u, u') in descending order of $d^{a}_{u'u}$. Hence, MDU(a|I) are the pairs of users who disagree most on the specific aspect a considering all the items in I.
Table 7 gives an example solution for finding the top 5 pairs of users who disagree most on the aspect Location considering all the items, based on the Hotel dataset.
Table 7. Example solution of the approach in Section 4.5 for determining the top N pair of users disagreeing most on a specific aspect a considering all the items based on Hotel Dataset.
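The pairwise search can be sketched with `itertools.combinations`, which enumerates each unordered pair of users exactly once. The distance is the Euclidean form assumed for Equation (9); the dictionary layout of S, the function names, and the toy data are illustrative.

```python
import math
from itertools import combinations


def abud(S, u, v, aspect, items):
    """Aspect-Based User Disagreement: Euclidean distance over all items for one aspect."""
    return math.sqrt(sum((S.get((u, i, aspect), 0) - S.get((v, i, aspect), 0)) ** 2
                         for i in items))


def most_disagreeing_pairs(S, users, items, aspect, top_n):
    """Top-N unordered user pairs with the largest disagreement on one aspect."""
    pairs = list(combinations(users, 2))
    pairs.sort(key=lambda pair: abud(S, pair[0], pair[1], aspect, items), reverse=True)
    return pairs[:top_n]


# Hypothetical toy data for the aspect Location.
S = {
    ("u1", "h0", "Location"): 1, ("u1", "h1", "Location"): 1,
    ("u2", "h0", "Location"): -1, ("u2", "h1", "Location"): -1,
    ("u3", "h0", "Location"): 1,
}
top_pairs = most_disagreeing_pairs(S, ["u1", "u2", "u3"], ["h0", "h1"], "Location", top_n=2)
```

The number of pairs grows quadratically with the number of users: for roughly 6000 users there are about 18 million pairs, so a practical implementation would likely restrict the search to users with overlapping reviews or use vectorized distance computations.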
4.6. Find the Groups of Users Who Are Most Similar in All Aspects of an Item i
To find the groups of users who are most similar in all aspects of an item i, we can cluster the users based on the sentiment values they provided for all aspects of item i. The K-means clustering algorithm is used for this purpose as follows:
1) Find Ui = the set of users that have reviewed item i.
2) Cluster the users in Ui based on their sentiments toward all aspects of item i using K-means clustering.
This algorithm uses the Item-Based User Disagreement $d^{i}_{uv}$ computed using Equation (7) during K-means clustering.
The clusters obtained are the groups of users who are most similar in all the aspects of item i.
Figure 1 provides an example solution of finding the groups of users who are most similar in all aspects of an item i. This example is based on the Hotel Dataset. One can see 5 clusters, or groups, of users who are most similar in all aspects of item 99.
4.7. Find the Groups of Users Who Emphasize the Same Aspects of Item i
The groups of users who emphasize the same aspects of an item i can be found by clustering the users reviewing the same aspects of item i. In our approach, we cluster the users by treating positive and negative sentiments as the same. We use the K-means clustering algorithm to find users who emphasize the same aspects of item i as follows:
1) Find Ui = the set of the users that have reviewed the item i.
2) Cluster the users in Ui based on their sentiments toward all aspects of the item i, but treating positive and negative sentiments as the same. Here, the K-means clustering is performed using a distance $\tilde{d}^{i}_{uv}$ based on the absolute values of the aspect sentiments. Equation (7) is modified as below to compute the modified $\tilde{d}^{i}_{uv}$:
$$\tilde{d}^{i}_{uv} = \sqrt{\sum_{a \in A} \left( |s_{uia}| - |s_{via}| \right)^{2}} \tag{11}$$
Figure 1. Example solution of finding the groups of users who are most similar in all aspects of an item i based on Hotel Dataset.
Figure 2 provides an example solution of finding the groups of users who emphasize the same aspects of item 99 in the Hotel Dataset.
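The difference between sentiment-based and emphasis-based clustering comes down to whether the absolute value is taken before clustering. The sketch below uses a deliberately minimal K-means (plain Lloyd iterations with Euclidean distance and a deterministic initialization from the first k distinct points). The source does not specify the K-means implementation, the value of k, or the initialization, so all of these, along with the names and data, are assumptions.

```python
def user_vectors(S, users, item, aspects, emphasis_only=False):
    """One sentiment vector per user; with emphasis_only, +1 and -1 both become 1."""
    vectors = []
    for u in users:
        row = [S.get((u, item, a), 0) for a in aspects]
        vectors.append([abs(s) for s in row] if emphasis_only else row)
    return vectors


def kmeans(points, k, iterations=20):
    """Plain Lloyd's K-means with Euclidean distance; returns one cluster label per point."""
    centers = []
    for p in points:                          # deterministic init: first k distinct points
        if list(p) not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    labels = [0] * len(points)
    for _ in range(iterations):
        for idx, p in enumerate(points):      # assignment step
            dists = [sum((x - c) ** 2 for x, c in zip(p, center)) for center in centers]
            labels[idx] = dists.index(min(dists))
        for j in range(len(centers)):         # update step
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy data: four users, one hotel, two aspects.
aspects = ["Location", "Service"]
users = ["u1", "u2", "u3", "u4"]
S = {
    ("u1", "h99", "Location"): 1, ("u1", "h99", "Service"): 1,
    ("u2", "h99", "Location"): -1, ("u2", "h99", "Service"): -1,
    ("u3", "h99", "Location"): 1,
    ("u4", "h99", "Location"): -1,
}
sentiment_labels = kmeans(user_vectors(S, users, "h99", aspects), k=2)
emphasis_labels = kmeans(user_vectors(S, users, "h99", aspects, emphasis_only=True), k=2)
```

Users u1 and u2 hold opposite sentiments and fall into different sentiment clusters, yet both comment on Location and Service and therefore share an emphasis cluster; u3 and u4 comment only on Location and form the other emphasis cluster. In practice a library implementation with multiple random restarts would be a more robust choice than this fixed initialization.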
4.8. Rank the Aspects Based on the Emphasis Given by User u to Them
The aspects of an item can be ranked based on the emphasis given by a user u to them using following steps:
1) Find $I_u$ = set of items reviewed by user u and A = set of all aspects.
2) Compute the emphasis score $e_{ua}$ of the user u toward each aspect a using Equation (5).
3) Sort the aspects in descending order of $e_{ua}$ and rank them. The aspect with the highest value of $e_{ua}$ is the most emphasized aspect.
Table 8 presents an example solution for ranking the aspects based on the emphasis given to them by user 10. This example is based on the Hotel dataset. Table 8 shows the emphasis score of user 10 towards each aspect in A and indicates that user 10 gives strong emphasis to the aspects Service and Cleanliness, whereas the aspect Sleep Quality receives the least emphasis from user 10.
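The ranking steps can be sketched as follows, taking the emphasis score to be the fraction of the user's reviewed items on which the aspect received a nonzero sentiment, as described in Section 2.4. The dictionary layout of S, the function name, and the toy data are assumptions of this illustration and do not reproduce the values in Table 8.

```python
def rank_aspects_by_emphasis(S, user, items_reviewed, aspects):
    """Return (aspect, emphasis score) pairs sorted from most to least emphasized."""
    scores = {}
    for a in aspects:
        hits = sum(abs(S.get((user, i, a), 0)) for i in items_reviewed)
        scores[a] = hits / len(items_reviewed) if items_reviewed else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: one user, four reviewed hotels, three aspects.
items = ["h0", "h1", "h2", "h3"]
S = {
    ("u10", "h0", "Service"): 1, ("u10", "h1", "Service"): -1,
    ("u10", "h2", "Service"): 1, ("u10", "h3", "Service"): 1,
    ("u10", "h0", "Cleanliness"): 1, ("u10", "h1", "Cleanliness"): 1,
    ("u10", "h2", "Cleanliness"): -1,
    ("u10", "h3", "Sleep Quality"): 1,
}
ranking = rank_aspects_by_emphasis(S, "u10", items, ["Sleep Quality", "Service", "Cleanliness"])
```

The first element of `ranking` is the rank-one aspect. Because Python's sort is stable, aspects with equal scores keep their input order, which is one possible tie-breaking rule; the source does not specify one.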
Here, we presented eight aspect-based CF tools in ABCFT as a first step in compiling the tools that can be derived from the Aspect-Sentiment Tensor. We would like to encourage exploring and adding new tools to ABCFT, so that the area of recommendation techniques using aspect-based information can grow rapidly.
Figure 2. Example solution of finding the groups of users who emphasize the same aspects of an item i based on Hotel Dataset.
Table 8. Example solution of the approach in Section 4.8 for ranking the aspects based on emphasis given by a user u to them for Hotel Dataset.
5. Conclusions and Future Work
In this work, a general framework applicable to future studies of aspect-based Collaborative Filtering (CF) approaches is presented. We present an Aspect-Based Collaborative Filtering Toolbox (ABCFT) consisting of eight tools that can be developed from the Aspect-Sentiment Tensor (AST) alone. The eight tools in ABCFT solve partial aspect-based CF problems and can be utilized to develop sophisticated aspect-based recommendation approaches. One goal of developing ABCFT is to ease the process of incorporating aspect-based information into recommendation approaches, which can enhance the possibility of making rational recommendations to users. ABCFT promotes the extensive use of aspect sentiments extracted by well-advanced Aspect-Based Sentiment Analysis (ABSA) techniques, which in general are used only superficially and left aside after the extraction.
The use of ABCFT to develop new aspect-based recommender systems, from simple to complex, is encouraged, and its use to improve the performance of current recommender systems can be explored. We initiated the work with 8 simple tools in ABCFT, and the extension of ABCFT with additional tools can be pursued to expedite the development of aspect-based recommender approaches.