Uncovering and Displaying the Coherent Groups of Rank Data by Exploratory Riffle Shuffling

Abstract

Let n respondents rank order d items, and suppose that $d \ll n$. Our main task is to uncover and display the structure of the observed rank data by an exploratory riffle shuffling procedure which sequentially decomposes the n voters into a finite number of coherent groups plus a noisy group, where the noisy group represents the outlier voters and each coherent group is composed of a finite number of coherent clusters. We consider exploratory riffle shuffling of a set of items to be equivalent to optimal two blocks seriation of the items with crossing of some scores between the two blocks. A riffle shuffled coherent cluster of voters within its coherent group is essentially characterized by the following facts: 1) Voters have identical first TCA factor score, where TCA designates taxicab correspondence analysis, an L1 variant of correspondence analysis; 2) Any preference is easily interpreted as a riffle shuffling of its items; 3) The nature of the different riffle shufflings of the items can be seen in the structure of the contingency table of the first-order marginals constructed from the Borda scorings of the voters; 4) The first TCA factor scores of the items of a coherent cluster are interpreted as the Borda scale of the items. We also introduce a crossing index, which measures the extent of crossing of scores of voters between the two blocks seriation of the items. The novel approach is explained on the benchmarking SUSHI data set, where we show that this data set has a very simple structure, which can also be communicated in a tabular form.

Share and Cite:

Choulakian, V. and Allard, J. (2021) Uncovering and Displaying the Coherent Groups of Rank Data by Exploratory Riffle Shuffling. Open Journal of Statistics, 11, 178-212. doi: 10.4236/ojs.2021.111010.

1. Introduction

Ordering the elements of a set is a common decision-making activity, such as voting for a political candidate, choosing a consumer product, etc. So there is a huge literature concerning the analysis and interpretation of preference data, scattered across different disciplines. Often rank data are heterogeneous: they are composed of a finite mixture of components. The traditional methods of finding mixture components of rank data are mostly based on parametric probability models, distance or latent class models, and are useful for sparse data but not for diffuse data.

Rank data are sparse if there is at most a small finite number of permutations that capture the majority of the preferences; otherwise they are diffuse. As a running example in this paper, we will consider the famous benchmarking SUSHI data set enumerating n = 5000 preferences of d = 10 sushis, see [1]. The SUSHI data set is diffuse, because any observed permutation appears at most three times. It has been analyzed, among others, by [2] [3] [4].

A second data set that we shall also analyze is the APA data set of size n = 5738 by d = 5, see [5]. The APA data set is also considered non-sparse by [2], because all 120 permutations occur with positive probability.

For a general background on statistical methods for rank data, see the excellent monograph by [6] and the book [7].

1.1. Riffle Shuffle

The riffle shuffle, see [8], is considered the most popular method of card shuffling, in which one cuts a deck of d cards (aka items) into two piles of sizes d1 and d2, respectively, and successively drops the cards, one by one, so that the piles are interleaved into one deck again.

Let V, named a voting profile, be a set of n preferences on d items. Based on riffle shuffling ideas, [2] proposed the notion of riffled independence to model the joint probability distribution of the observed preferences of V. Independently, [9] used it for visual exploration of V, naming it two blocks partition of the Borda scored items with crossing of some scores; this will be further developed here under the following important assumption.

Assumption: $d \ll n$. This means that the sample size n is quite large compared to the number of items d.

SUSHI and APA data sets satisfy this assumption.

The most important first step in the application of a riffle shuffling procedure is how to partition the d items into two disjoint subsets. In the probabilistic riffle shuffling approach of [2], the partitioning step is done essentially by an ad hoc approach in the case of the SUSHI data set, or based on background second-order information on the items in the case of the APA data set. In the exploratory riffle shuffling approach of this paper, an optimal partition is obtained by maximizing the cut norm of the row-centered data, or equivalently by taxicab correspondence analysis of the nega-coded data.

We compare the two formulations of riffle shuffle, probabilistic and exploratory, in section 10.

1.2. Aim

Our aim is to explore and display a given voting profile V by sequentially partitioning it into G coherent groups plus a noisy group; that is,

$$V = \bigcup_{g=1}^{G} cohG(g) \,\cup\, noisyG, \qquad (1)$$

where G represents the number of coherent groups and $cohG(g)$ is the gth coherent group. Furthermore, each coherent group is partitioned into a finite number of coherent clusters; that is,

$$cohG(g) = \bigcup_{\alpha=1}^{c_g} cohC_g(\alpha) \quad \text{for } g = 1, \ldots, G, \qquad (2)$$

where $c_g$ represents the number of coherent clusters in the gth coherent group. So the coherent clusters are the building blocks of the coherent groups. We note the following facts:

Fact 1: The assumption $d \ll n$ induces the new notion of coherency for the clusters, and consequently for the groups; it is a stronger characterization than the notion of interpretability for groups discussed in [9].

Fact 2: Each coherent group and its clusters have the same latent variable summarized by the Borda scale.

Fact 3: Given that the proposed method sequentially peels the data like Occam's razor, the number of groups G is calculated automatically. Furthermore, outliers or uninformative voters belonging to $noisyG$ are easily tagged.

Fact 4: The approach is exploratory, visual, data analytic and is developed within the framework of taxicab correspondence analysis (TCA). TCA is an L1 variant of correspondence analysis developed by [10]. TCA is a dimension reduction technique similar to principal component analysis. In this paper, we shall use only the first TCA factor scores of the items and of the voters.

Two major advantages of our method are the following. First, we can easily identify outliers. For the SUSHI data, our method tags 12.36% of the voters as outliers, which form the noisy group; no outliers in the SUSHI data were identified in [3] [4]. Second, it provides a tabular summary of the preferences which compose a coherent group. For instance, consider the first mixture component of the SUSHI data given in [4], where the modal ordering is almost the same as the Borda scale ordering of the ten sushis in cohG(1) obtained by our method, see Table 14 in this paper. The sample size of their first mixture component is 27.56%, which is much smaller than 48.36%, the sample size of our cohG(1), see Table 14. Moreover, Table 13 of this paper provides a tabular-visual summary of the 2418 preferences which form cohG(1). The visual summary describes the different kinds of equivalent similar riffle shufflings of the 2418 preferences, and it provides further insight into the structure of the data. Such visual summaries are missing in [3] [4].

1.3. Highlights of a Coherent Cluster

A coherent cluster of voters has interesting mathematical properties and is essentially characterized by the following facts:

1) Voters have an identical first TCA factor score.

2) Any voter preference is easily interpreted as a particular riffle shuffling of its items.

3) The nature of riffle shuffling of the items can be observed in the structure of the contingency table of the first-order marginals constructed from the Borda scorings of the voters belonging to the coherent cluster.

4) The first TCA factor scores of the items of a coherent cluster are interpreted as Borda scale of the items.

5) We also introduce the crossing index, which measures the extent of interleaving or the crossing of scores of voters between two blocks seriation of the items in a coherent cluster.

1.4. Organization

This paper has eleven sections and its contents are organized as follows: Section 2 presents an overview of TCA; section 3 presents some preliminaries on the Borda coding of the data and related tables and concepts; section 4 presents Theorem 1, which shows that the first principal dimension of TCA clusters the voters into a finite number of clusters; section 5 discusses coherent clusters and their mathematical properties; section 6 discusses riffle shuffling in a coherent cluster; section 7 introduces the crossing index; section 8 introduces the coherent groups; section 9 presents the analysis of the APA data set; section 10 presents a comparison of the two formulations of riffle shuffle, probabilistic and exploratory; and finally we conclude in section 11.

All mathematical proofs are relegated to the Appendix. Details of the computation are shown only for the first coherent group of SUSHI data set.

2. An Overview of Taxicab Correspondence Analysis

Consider an $n \times p$ matrix $\mathbf{X}$ with $X_{ij} \ge 0$, and put $X_{**} = \sum_{j=1}^{p} \sum_{i=1}^{n} X_{ij}$. Let $\mathbf{P} = \mathbf{X}/X_{**}$ be the correspondence matrix associated with $\mathbf{X}$; as usual, we define $p_{i*} = \sum_{j=1}^{p} p_{ij}$ and $p_{*j} = \sum_{i=1}^{n} p_{ij}$. Let $\mathbf{D}_n = \mathrm{Diag}(p_{i*})$ be the diagonal matrix with diagonal elements $p_{i*}$; similarly $\mathbf{D}_p = \mathrm{Diag}(p_{*j})$. Let $k = \mathrm{rank}(\mathbf{P}) - 1$.

In TCA the calculation of the dispersion measures $(\delta_\alpha)$, principal axes $(\mathbf{u}_\alpha, \mathbf{v}_\alpha)$, basic principal vectors $(\mathbf{a}_\alpha, \mathbf{b}_\alpha)$, and principal factor scores $(\mathbf{f}_\alpha, \mathbf{g}_\alpha)$ for $\alpha = 1, \ldots, k$ is done in a stepwise manner. We put $\mathbf{P}_1 = (p^{(1)}_{ij}) = (p_{ij} - p_{i*}\, p_{*j})$, and let $\mathbf{P}_\alpha$ be the residual correspondence matrix at the $\alpha$th iteration.

The variational definitions of the TCA at the α-th iteration are

$$\begin{aligned} \delta_\alpha &= \max_{\mathbf{u} \in \mathbb{R}^p} \frac{\|\mathbf{P}_\alpha \mathbf{u}\|_1}{\|\mathbf{u}\|_\infty} = \max_{\mathbf{v} \in \mathbb{R}^n} \frac{\|\mathbf{P}'_\alpha \mathbf{v}\|_1}{\|\mathbf{v}\|_\infty} = \max_{\mathbf{u} \in \mathbb{R}^p,\, \mathbf{v} \in \mathbb{R}^n} \frac{\mathbf{v}' \mathbf{P}_\alpha \mathbf{u}}{\|\mathbf{u}\|_\infty \|\mathbf{v}\|_\infty} \\ &= \max \|\mathbf{P}_\alpha \mathbf{u}\|_1 \ \text{subject to } \mathbf{u} \in \{-1, +1\}^p \\ &= \max \|\mathbf{P}'_\alpha \mathbf{v}\|_1 \ \text{subject to } \mathbf{v} \in \{-1, +1\}^n \\ &= \max \mathbf{v}' \mathbf{P}_\alpha \mathbf{u} \ \text{subject to } \mathbf{u} \in \{-1, +1\}^p, \ \mathbf{v} \in \{-1, +1\}^n. \end{aligned}$$

The α-th principal axes are

$$\mathbf{u}_\alpha = \arg\max_{\mathbf{u} \in \{-1,+1\}^p} \|\mathbf{P}_\alpha \mathbf{u}\|_1 \quad \text{and} \quad \mathbf{v}_\alpha = \arg\max_{\mathbf{v} \in \{-1,+1\}^n} \|\mathbf{P}'_\alpha \mathbf{v}\|_1, \qquad (3)$$

and the α-th basic principal vectors are

$$\mathbf{a}_\alpha = \mathbf{P}_\alpha \mathbf{u}_\alpha \quad \text{and} \quad \mathbf{b}_\alpha = \mathbf{P}'_\alpha \mathbf{v}_\alpha, \qquad (4)$$

and the α-th principal factor scores are

$$\mathbf{f}_\alpha = \mathbf{D}_n^{-1} \mathbf{a}_\alpha \quad \text{and} \quad \mathbf{g}_\alpha = \mathbf{D}_p^{-1} \mathbf{b}_\alpha; \qquad (5)$$

furthermore the following relations are also useful

$$\mathbf{u}_\alpha = \mathrm{sgn}(\mathbf{b}_\alpha) = \mathrm{sgn}(\mathbf{g}_\alpha) \quad \text{and} \quad \mathbf{v}_\alpha = \mathrm{sgn}(\mathbf{a}_\alpha) = \mathrm{sgn}(\mathbf{f}_\alpha), \qquad (6)$$

where $\mathrm{sgn}(\cdot)$ is the coordinatewise sign function: $\mathrm{sgn}(x) = 1$ if $x > 0$, and $\mathrm{sgn}(x) = -1$ if $x \le 0$. The $\alpha$th taxicab dispersion measure $\delta_\alpha$ can be represented in many different ways:

$$\begin{aligned} \delta_\alpha &= \|\mathbf{P}_\alpha \mathbf{u}_\alpha\|_1 = \|\mathbf{a}_\alpha\|_1 = \mathbf{a}'_\alpha \mathbf{v}_\alpha = \|\mathbf{D}_n \mathbf{f}_\alpha\|_1 = \mathbf{v}'_\alpha \mathbf{D}_n \mathbf{f}_\alpha \\ &= \|\mathbf{P}'_\alpha \mathbf{v}_\alpha\|_1 = \|\mathbf{b}_\alpha\|_1 = \mathbf{b}'_\alpha \mathbf{u}_\alpha = \|\mathbf{D}_p \mathbf{g}_\alpha\|_1 = \mathbf{u}'_\alpha \mathbf{D}_p \mathbf{g}_\alpha. \end{aligned} \qquad (7)$$

The ( α + 1 ) -th residual correspondence matrix is

$$\mathbf{P}_{\alpha+1} = \mathbf{P}_\alpha - \mathbf{D}_n \mathbf{f}_\alpha \mathbf{g}'_\alpha \mathbf{D}_p / \delta_\alpha. \qquad (8)$$

The term $\mathbf{D}_n \mathbf{f}_\alpha \mathbf{g}'_\alpha \mathbf{D}_p / \delta_\alpha$ in (8) can be interpreted as the best rank-1 approximation of the residual correspondence matrix $\mathbf{P}_\alpha$ in the sense of the taxicab norm.

In CA and TCA, the principal factor scores are centered; that is,

$$\sum_{i=1}^{n} f_\alpha(i)\, p_{i*} = 0 = \sum_{j=1}^{p} g_\alpha(j)\, p_{*j} \quad \text{for } \alpha = 1, \ldots, k. \qquad (9)$$

The reconstitution formula in TCA and CA is

$$p_{ij} = p_{i*}\, p_{*j} \left[ 1 + \sum_{\alpha=1}^{k} f_\alpha(i)\, g_\alpha(j) / \delta_\alpha \right]. \qquad (10)$$

In TCA, the calculation of the principal component weights, $\mathbf{u}_\alpha$ and $\mathbf{v}_\alpha$, and the principal factor scores, $\mathbf{g}_\alpha$ and $\mathbf{f}_\alpha$, can be accomplished by two algorithms. The first is complete enumeration based on Equation (3). The second is iteration of the transition formulae (4), (5) and (6). The latter is an ascent algorithm; that is, it increases the value of the objective function at each iteration, see [11]. The iterative algorithm may converge to a local maximum, so it should be restarted from several initial configurations; the rows or the columns of the data can be used as starting values.
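The transition-formula iteration described above can be sketched in a few lines of NumPy. This is an illustrative implementation of Equations (4)-(6) under the definitions of this section, not the authors' code; the function name and the random-restart scheme (instead of row/column starts) are our own choices.

```python
import numpy as np

def tca_first_axis(X, n_starts=20, seed=0):
    """Sketch of the first TCA axis via the transition formulas (4)-(6):
    iterate u = sgn(P1'v), a = P1 u, v = sgn(a) until the L1 dispersion
    ||P1 u||_1 stops increasing; restart from several random sign vectors
    because the ascent may stop at a local maximum."""
    X = np.asarray(X, dtype=float)
    P = X / X.sum()                      # correspondence matrix P
    r = P.sum(axis=1)                    # row masses p_i*
    c = P.sum(axis=0)                    # column masses p_*j
    P1 = P - np.outer(r, c)              # first residual matrix P_1
    sgn = lambda z: np.where(z > 0, 1.0, -1.0)
    rng = np.random.default_rng(seed)
    best_delta, best_u, best_v = -np.inf, None, None
    for _ in range(n_starts):
        v = sgn(rng.standard_normal(P1.shape[0]))
        delta_old = -np.inf
        while True:
            u = sgn(P1.T @ v)            # u = sgn(b), with b = P1' v
            a = P1 @ u                   # basic vector a = P1 u
            v = sgn(a)                   # v = sgn(a) = sgn(f)
            delta = np.abs(a).sum()      # dispersion ||P1 u||_1
            if delta <= delta_old + 1e-12:
                break                    # ascent stalled: local maximum
            delta_old = delta
        if delta_old > best_delta:
            best_delta, best_u, best_v = delta_old, u, v
    f1 = (P1 @ best_u) / r               # row factor scores f1 = Dn^{-1} a
    g1 = (P1.T @ best_v) / c             # column factor scores g1 = Dp^{-1} b
    return best_delta, f1, g1
```

On small matrices the result can be checked against complete enumeration of $\mathbf{u} \in \{-1,+1\}^p$ as in Equation (3); the returned row scores satisfy the centering property (9).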

The TCA map is obtained by plotting $(\mathbf{g}_1, \mathbf{g}_2)$ or $(\mathbf{f}_1, \mathbf{f}_2)$.

3. Preliminaries

In this section we review: 1) the Borda scoring of a voting profile V into R and the Borda scale; 2) the contingency table of the first-order marginals of R; 3) the coded tables $R_{double}$ and $R_{nega}$.

3.1. Borda Scorings and Borda Scale

Let $A = \{a_1, a_2, \ldots, a_d\}$ denote a set of d alternatives/candidates/items, and V a set of n voters/individuals/judges. In this paper, we consider linear orderings/rankings/preferences, in which all d objects are rank-ordered according to their levels of desirability by the n voters. We denote a linear order by a sequence $s = (a_{k_1} \succ a_{k_2} \succ \cdots \succ a_{k_d})$, where $a_{k_1} \succ a_{k_2}$ means that the alternative $a_{k_1}$ is preferred to the alternative $a_{k_2}$. The Borda scoring of s, see [12], is the vector $b(s)$ which assigns to the element $a_{k_j}$ the score $(d - j)$, because $a_{k_j}$ is preferred to $(d - j)$ other alternatives; equivalently, it is the jth most preferred alternative. Let $R = (r_{ij})$ be the matrix having n rows and d columns, where $r_{ij}$ designates the Borda score of the ith voter's preference for the jth alternative. We note that the ith row of R is an element of $S_d$, the set of permutations of the elements of the set $\{0, 1, 2, \ldots, d-1\}$. A toy example of R is presented in Table 1 for n = 4 and d = 3.

The Borda scale of the elements of A is $\boldsymbol{\beta} = \mathbf{1}'_n R / n$, where $\mathbf{1}_n$ is a column vector of 1's having n coordinates. The Borda scale seriates/orders the d items of the set A according to their average scores: $\beta(j) > \beta(i)$ means item j is preferred to item i, and $\beta(j) = \beta(i)$ means both items $(a_i, a_j)$ are equally preferred. In the toy example of Table 1, the Borda scale seriates $\{A, B\} \succ C$.

Similarly, we define the reverse Borda scoring of s to be the vector $\bar{b}(s)$, which assigns to the element $a_{k_j}$ the score $(j - 1)$. We denote by $\bar{R} = (\bar{r}_{ij})$ the matrix having n rows and d columns, where $\bar{r}_{ij}$ designates the reverse Borda score of the ith judge's nonpreference of the jth alternative. The reverse Borda scale of the d items is $\bar{\boldsymbol{\beta}} = \mathbf{1}'_n \bar{R} / n$.

We note that

$$R + \bar{R} = (d-1)\, \mathbf{1}_n \mathbf{1}'_d$$

and

$$\boldsymbol{\beta} + \bar{\boldsymbol{\beta}} = (d-1)\, \mathbf{1}'_d.$$
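The Borda and reverse Borda codings, and the two identities above, can be checked on a small profile. The item names and rankings below are hypothetical illustration data (chosen so that the Borda scale seriates $\{A, B\} \succ C$, as in the toy example); they are not the actual contents of Table 1.

```python
import numpy as np

# A hypothetical profile of n = 4 voters ranking d = 3 items (A, B, C);
# each inner list orders items from most to least preferred.
items = ["A", "B", "C"]
prefs = [["A", "B", "C"], ["B", "A", "C"], ["A", "B", "C"], ["C", "B", "A"]]
n, d = len(prefs), len(items)

R = np.zeros((n, d), dtype=int)      # Borda scores: jth most preferred gets d - j
for i, s in enumerate(prefs):
    for j, item in enumerate(s):     # j = 0 means most preferred
        R[i, items.index(item)] = d - 1 - j

R_bar = (d - 1) - R                  # reverse Borda scores; R + R_bar = (d-1) 1 1'
beta = R.sum(axis=0) / n             # Borda scale: beta = 1' R / n
beta_bar = R_bar.sum(axis=0) / n     # reverse Borda scale; beta + beta_bar = (d-1) 1
```

Here `beta` comes out as (1.25, 1.25, 0.5), so items A and B are tied and both preferred to C.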

3.2. Contingency Table of First-Order Marginals

Table 1. Toy example with n = 4 preferences of d = 3 items.

The contingency table of first-order marginals of an observed voting profile V on d items is a square $d \times d$ matrix M, where $M(i, j)$ stores the number of times that item j has Borda score i, for $i = 0, \ldots, d-1$; see [6], p. 17. Table 2 displays the matrix M for the toy example R displayed in Table 1. We note the following facts:

1) It has uniform row and column marginals equal to the sample size.

2) We can compute the Borda scale β from it.

3) It reveals the nature of the crossing of scores attributed to the items for a given binary partition of the items. For the toy example, consider the partition $\{C\}$ and $\{B, A\}$ with attributed scores $\{0\}$ and $\{1, 2\}$ respectively (this is the first step in a riffle shuffle). Then the highlighted cells (marked in bold) in Table 2 show that there are two crossings of scores, a permutation (transposition) of the scores 0 and 1, between the sets $\{C\}$ and $\{B, A\}$ (this is the second step in a riffle shuffle). Furthermore, the third row of Table 2 shows that the score 2 is equally attributed to both items of the set $\{B, A\}$ and never crosses to $\{C\}$.
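The matrix of first-order marginals can be tabulated directly from R. The sketch below uses the same hypothetical 4-voter scores as before (the function name is ours); it also illustrates facts 1) and 2) above: uniform margins equal to n, and recovery of the Borda scale from M.

```python
import numpy as np

def first_order_marginals(R):
    """M[i, j] = number of voters giving Borda score i to item j."""
    R = np.asarray(R)
    n, d = R.shape
    M = np.zeros((d, d), dtype=int)
    for row in R:
        for j, score in enumerate(row):
            M[score, j] += 1
    return M

# Hypothetical 4-voter Borda-score matrix on items (A, B, C)
R = np.array([[2, 1, 0], [1, 2, 0], [2, 1, 0], [0, 1, 2]])
M = first_order_marginals(R)
# Fact 1: row and column sums of M both equal the sample size n = 4.
# Fact 2: the Borda scale is recoverable as beta = (scores . M) / n.
beta = np.arange(3) @ M / 4
```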

3.3. Coded Tables Rdouble and Rnega

Our methodological approach is based on Benzécri's platform, see [13], p. 1113, which we quote: "the main problem inductive statistics has to face is to build tables that, through appropriate coding and eventual supplementation, give to the available data such a shape that the analysis is able to extract from it the answer to any question that we are allowed to ask". Italics are ours.

There are three elements in Benzécri’s platform: 1) coding, a kind of pre-processing of data, will be discussed in the following paragraph; 2) eventual supplementation consists in applying TCA and not correspondence analysis (CA), because in the CA case we do not have a result similar to Theorem 1; 3) question that we are allowed to ask is to explore and visualize rank data.

Within the CA framework, there are two codings of rank data: $R_{double}$ and $R_{nega}$.

3.3.1. Rdouble

The first one is the doubled table of size $2n \times d$,

$$R_{double} = \begin{pmatrix} R \\ \bar{R} \end{pmatrix},$$

proposed independently by [14] [15], who showed that CA of $R_{double}$ is equivalent to the dual scaling of the Nishisato coding of rank data, see [16].

Table 2. The matrix of first-order marginals of R.

CA of $R_{double}$ is equivalent to CA of its first residual correspondence matrix

$$\mathbf{P}^{1}_{double} = \frac{1}{t} \begin{pmatrix} r_{ij} - \frac{d-1}{2} \\[4pt] -\left( r_{ij} - \frac{d-1}{2} \right) \end{pmatrix},$$

where $t = nd(d-1)$. The structure of $\mathbf{P}^{1}_{double}$ shows that each row is centered, as in Carroll's multidimensional preference analysis procedure, MDPREF, exposed in Alvo and Yu (2014, p. 15). In TCA the objective function to maximize is a combinatorial problem, see Equation (3); the first iteration in TCA of $R_{double}$ corresponds to computing

$$\delta_1^{double} = \max_{\mathbf{v} \in \{-1,1\}^n} \left\| (\mathbf{v}' \,|\, -\mathbf{v}')\, \mathbf{P}^{1}_{double} \right\|_1 = \max_{\mathbf{v} \in \{-1,1\}^n} \frac{2}{t} \sum_{j=1}^{d} \left| \sum_{i=1}^{n} \left( r_{ij} - \frac{d-1}{2} \right) v_i \right| \qquad (11)$$

3.3.2. Rnega

In the second approach, we summarize $\bar{R}$ by its column total; that is, we create a row named $nega = n \bar{\boldsymbol{\beta}} = \mathbf{1}'_n \bar{R}$, then we vertically concatenate nega to R, thus obtaining

$$R_{nega} = \begin{pmatrix} R \\ nega \end{pmatrix}$$

of size $(n+1) \times d$.
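Both codings can be constructed in a few lines from any $n \times d$ Borda-score matrix R; the helper name below is ours, for illustration only.

```python
import numpy as np

def double_and_nega(R):
    """Build the doubled table R_double = [R; R_bar] (2n x d) and the
    nega-coded table R_nega = [R; nega] ((n+1) x d), where the single
    row nega is the column total of the reverse scores R_bar."""
    R = np.asarray(R)
    n, d = R.shape
    R_bar = (d - 1) - R              # reverse Borda scores
    R_double = np.vstack([R, R_bar])
    nega = R_bar.sum(axis=0)         # nega = 1' R_bar = n * beta_bar
    R_nega = np.vstack([R, nega])
    return R_double, R_nega
```

Since every column of $R_{double}$ pairs each score with its reverse, its column totals are all equal to $n(d-1)$.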

[17] discussed the relationship between TCA of $R_{double}$ and TCA of $R_{nega}$: TCA of $R_{nega}$ can be considered as constrained TCA of $R_{double}$, because the part of the sign vector corresponding to $\bar{R}$ in (11) is constrained to $-\mathbf{1}'_n$; that is, the objective function to maximize corresponds to computing

$$\delta_1 = \max_{\mathbf{v} \in \{-1,1\}^n} \left\| (\mathbf{v}' \,|\, -\mathbf{1}'_n)\, \mathbf{P}^{1}_{double} \right\|_1 = \max_{\mathbf{v} \in \{-1,1\}^n} \left\| (\mathbf{v}' \ \ 1)\, \mathbf{P}^{1}_{nega} \right\|_1 = \max_{\mathbf{v} \in \{-1,1\}^n} \frac{1}{t} \sum_{j=1}^{d} \left| \sum_{i=1}^{n} \left( r_{ij} - \frac{d-1}{2} \right) (v_i + 1) \right| \qquad (12)$$

So we see that if the optimal value of $\mathbf{v}$ in (11) is $\mathbf{1}_n$, then $\delta_1^{double} = \delta_1$; otherwise $\delta_1^{double} > \delta_1$.

Let

$$\mathbf{v}_1 = \arg\max_{\mathbf{v} \in \{-1,1\}^n} \frac{1}{t} \sum_{j=1}^{d} \left| \sum_{i=1}^{n} \left( r_{ij} - \frac{d-1}{2} \right) (v_i + 1) \right|.$$

Define the sets of indices $I_+ = \{i \,|\, v_{1i} = 1\}$ and $I_- = \{i \,|\, v_{1i} = -1\}$, where $\mathbf{v}_1 = (v_{1i})$. Then

$$\delta_1 = \frac{2}{t} \sum_{j=1}^{d} \left| \sum_{i \in I_+} \left( r_{ij} - \frac{d-1}{2} \right) \right| \qquad (13)$$

shows that the summation in (13) is restricted to the subset of assessors that belong to I + . The subset I + indexes the voters having the same direction in their votes. Given that we are uniquely interested in the first TCA dimension, all the necessary information is encapsulated in I + , as discussed in [9] [17] using other arguments.

Furthermore, $\delta_1$ in (13) equals four times the cut norm of $R_{centered}(i, j) = \frac{1}{t}\left( r_{ij} - \frac{d-1}{2} \right)$, where the cut norm is defined to be

$$\| R_{centered} \|_{cut} = \max_{S, T} \frac{1}{t} \sum_{j \in S} \sum_{i \in T} \left( r_{ij} - \frac{d-1}{2} \right) = \frac{1}{t} \sum_{j \in S_+} \sum_{i \in I_+} \left( r_{ij} - \frac{d-1}{2} \right) = \delta_1 / 4,$$

where $S \subseteq \{1, \ldots, d\}$ and $T \subseteq \{1, \ldots, n\}$; it shows that the subsets $I_+$ and $S_+$ are positively associated. For further details see, for instance, [18] [19].
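For small profiles, both $\delta_1$ in (13) and the cut norm can be evaluated by brute force, which makes the factor-of-four relation easy to verify. The sketch below (function names ours) uses the same hypothetical 4 × 3 Borda-score matrix as earlier; each row of a Borda-score matrix sums to $d(d-1)/2$, so the centered matrix is row-centered, which is what makes the relation hold.

```python
import itertools
import numpy as np

def delta1_nega(R):
    """delta_1 of Equation (12): brute-force max over v in {-1,+1}^n of
    (1/t) * sum_j | sum_i (r_ij - (d-1)/2)(v_i + 1) |, with t = n d (d-1)."""
    R = np.asarray(R, dtype=float)
    n, d = R.shape
    t = n * d * (d - 1)
    C = R - (d - 1) / 2.0                    # row-centered scores
    best = 0.0
    for v in itertools.product([-1.0, 1.0], repeat=n):
        # sum_i C_ij (v_i + 1) = (v @ C)_j + column sums of C
        val = np.abs(np.asarray(v) @ C + C.sum(axis=0)).sum() / t
        best = max(best, val)
    return best

def cut_norm(R):
    """Brute-force cut norm of C/t: max over row subsets T and column
    subsets S of (1/t) sum_{i in T, j in S} (r_ij - (d-1)/2)."""
    R = np.asarray(R, dtype=float)
    n, d = R.shape
    t = n * d * (d - 1)
    C = R - (d - 1) / 2.0
    best = 0.0
    for T in itertools.product([0, 1], repeat=n):
        rowsum = np.asarray(T) @ C           # column totals over rows in T
        for S in itertools.product([0, 1], repeat=d):
            best = max(best, (rowsum @ np.asarray(S)) / t)
    return best
```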

In the sequel, we will consider only the application of TCA to R n e g a .

4. First TCA Voter Factor Scores of Rnega

We show the results on the SUSHI data set enumerating n = 5000 preferences of d = 10 sushis, see [1]. Even though our interest concerns only the first TCA voter factor scores of a voting profile $V_1$, it is common practice in CA circles to present the principal map of the row and column projections.

Figure 1 and Figure 2 display the principal maps obtained from CA and TCA of $R_{nega}$ of the SUSHI data, denoted by $V_1$. We observe that TCA clusters the voters into a finite number of discrete patterns, while CA does not: this is the main reason that we prefer TCA to the classical well-known dimension reduction technique CA.

Figure 1. CA map of SUSHI rank data.

Figure 2. TCA map of SUSHI rank data.

We have the following theorem concerning the first TCA principal factor scores $f_1(i)$, $i = 1, \ldots, n$, of the voters belonging to a profile $V_1$, where the first principal axis partitions the d items into $d_1$ and $d_2$ parts such that $d = d_1 + d_2$.

Theorem 1

a) The maximum number of distinct clusters of the n voters belonging to $V_1$ on the first TCA principal axis (distinct $f_1(i)$ values for $i \in V_1$) is $d_1 d_2 + 1$.

b) The maximum value that $f_1(i)$ can attain is $\frac{2 d_1 d_2}{d(d-1)}$.

c) The minimum value that $f_1(i)$ can attain is $-\frac{2 d_1 d_2}{d(d-1)}$.

d) If the number of distinct clusters is maximum, $d_1 d_2 + 1$, then the gap between two contiguous $f_1(i)$ values is $\frac{4}{d(d-1)}$.

Remark 1

a) We fix $f_1(nega) < 0$ to eliminate the sign indeterminacy of the first bilinear term in (10).

b) We partition $V_1$ into $d_1 d_2 + 1$ clusters, $V_1 = \bigcup_{\alpha=1}^{d_1 d_2 + 1} V_{1,\alpha}$, where the voters of the $\alpha$th cluster are characterized by their first TCA factor score; that is, $V_{1,\alpha} = \left\{ i \in V_1 : f_1^{V_1}(i) = \frac{2 d_1 d_2}{d(d-1)} - (\alpha - 1)\frac{4}{d(d-1)} \right\}$ for $\alpha = 1, \ldots, d_1 d_2 + 1$.

Example 1: In Figure 2, $d_1 = 4$ and $d_2 = 6$, and we observe:

Fact 1: by Theorem 1a, the 5000 preferences are clustered into $d_1 d_2 + 1 = 25$ clusters on the first TCA principal axis.

Fact 2: by Theorem 1b, the maximum value of $f_1(i)$ is 48/90.

Fact 3: by Theorem 1c, the minimum value of $f_1(i)$ is $-48/90$.

Fact 4: by Theorem 1d, the gap separating two contiguous clusters of voters on the first TCA principal axis is 4/90.
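The quantities in Facts 1-4 follow from Theorem 1 by direct arithmetic with $d_1 = 4$ and $d_2 = 6$; a quick exact check:

```python
from fractions import Fraction

d1, d2 = 4, 6                                    # two-block partition of the d = 10 sushis
d = d1 + d2

n_clusters = d1 * d2 + 1                         # Theorem 1a: 25 clusters
f_max = Fraction(2 * d1 * d2, d * (d - 1))       # Theorem 1b: 48/90
f_min = -f_max                                   # Theorem 1c: -48/90
gap = Fraction(4, d * (d - 1))                   # Theorem 1d: 4/90
# The 25 cluster scores form an arithmetic sequence from f_max down to f_min
scores = [f_max - (a - 1) * gap for a in range(1, n_clusters + 1)]
```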

A cluster of voters defined in Remark 1b, $V_{1,\alpha}$ for $\alpha = 1, \ldots, d_1 d_2 + 1$, can be classified as coherent or incoherent; this is discussed in the next section.

5. Coherent Cluster

The following definition characterizes a coherent cluster.

Definition 1 (Coherency of a cluster of voters $V_{1,\alpha}$ for $\alpha = 1, \ldots, d_1 d_2 + 1$)

A cluster of voters $V_{1,\alpha} \subseteq V_1$ is coherent if $f_1^{V_{1,\alpha}}(v) = \frac{2 d_1 d_2}{d(d-1)} - (\alpha - 1)\frac{4}{d(d-1)}$ for all $v \in V_{1,\alpha}$, where $f_1^{V_{1,\alpha}}(i)$ is the first TCA factor score of the voter $i \in V_{1,\alpha}$ obtained from TCA of the subprofile $V_{1,\alpha}$.

Remark 2:

a) It is important to distinguish between $f_1^{V_1}(i)$ for $i = 1, \ldots, |V_1|$, where $n = |V_1|$, and $f_1^{V_{1,\alpha}}(i)$ for $i = 1, \ldots, |V_{1,\alpha}|$, where $|V_{1,\alpha}|$ represents the sample size of the cluster $V_{1,\alpha}$.

b) Definition 1 implies that a cluster $V_{1,\alpha}$ is coherent when, for all voters $i \in V_{1,\alpha}$, the first TCA factor score $f_1^{V_{1,\alpha}}(i)$ does not depend on the voter i, but only on $(\alpha, d_1, d_2)$.

Corollary 1: It follows from Remark 1a and Equation (13) that a necessary, but not sufficient, condition for a cluster $V_{1,\alpha}$ to be coherent is that its first TCA factor score obtained from TCA of $V_1$ is strictly positive; that is, $f_1^{V_1}(i) > 0$ for $i \in V_{1,\alpha}$.

Example 2: Figures 3-9 show the coherency of the clusters of voters $V_{1,\alpha}$ for $\alpha = 1, \ldots, 7$, where dots represent clusters of voters; Figure 10 shows the incoherence of the cluster $V_{1,8}$. Further, the first three columns of Table 3 display the mathematical formulation of the 7 coherent clusters $cohC_1(\alpha) = V_{1,\alpha}$ for $\alpha = 1, \ldots, 7$, as defined in Remark 1b, and their sample sizes $|V_{1,\alpha}|$.

Proposition 1: For a voting profile V, $\delta_1(V) \ge |f_1(nega)|$, where $\delta_1(V)$ is the first TCA dispersion value obtained from TCA of V, and $f_1(nega)$ is the first TCA factor score of the row nega.

The equality in Proposition 1 is attained only for coherent clusters as shown in the following result.

Proposition 2: The first TCA dispersion value of a coherent cluster $cohC_1(\alpha)$ satisfies

$$\delta_1(cohC_1(\alpha)) = |f_1^{V_{1,\alpha}}(nega)| = \frac{2 d_1 d_2}{d(d-1)} - (\alpha - 1)\frac{4}{d(d-1)}.$$

Example 3: Proposition 2 can be observed by looking at columns 3 and 4 of Table 3, which concern the 7 coherent clusters $cohC_1(\alpha) = V_{1,\alpha}$ for $\alpha = 1, \ldots, 7$.

Figure 3. TCA map of V 1,1 .

Figure 4. TCA map of V 1,2 .

Figure 5. TCA map of V 1,3 .

Figure 6. TCA map of V 1,4 .

Figure 7. TCA map of V 1,5 .

Figure 8. TCA map of V 1,6 .

Figure 9. TCA map of V 1,7 .

Figure 10. TCA map of V 1,8 .

Table 3. Characteristics of c o h C 1 ( α ) = V 1, α of SUSHI data.

For the incoherent cluster $V_{1,8}$, with sample size $|V_{1,8}| = 335$, we observe $V_{1,8} = \{i : f_1^{V_1}(i) = 20/90 = 0.222\}$ and, by Proposition 1, $\delta_1(V_{1,8}) = 0.2354 > 2/9$. This means that the 335 voters belonging to $V_{1,8}$ form a cluster within the whole sample of 5000 voters, but taken separately these 335 voters do not form a coherent cluster.

Interpretability of a Coherent Cluster

The following result shows that for coherent clusters, the first TCA dimension can be interpreted as Borda scaled factor.

Proposition 3: The first TCA column factor score of item j, $g_1(j)$, is an affine function of the Borda scale $\beta(j)$; that is, $g_1(j) = \frac{2}{d-1}\beta(j) - 1$ for $j = 1, \ldots, d$. Equivalently, $\mathrm{corr}(\mathbf{g}_1, \boldsymbol{\beta}) = 1$.

Remark 3:

The first TCA principal factor score of item j for $j = 1, \ldots, d$ is bounded: $-1 \le g_1(j) \le 1$, because $0 \le \beta(j) \le d-1$.

Example 4: Table 4 displays the Borda scales of the items, sushis, in the seven coherent clusters $cohC_1(\alpha) = V_{1,\alpha}$ for $\alpha = 1, \ldots, 7$. To identify the sushi type, one has to refer to Figure 2; for instance, j10 corresponds to 10 cucumber roll in Figure 2. We observe the following main fact: for each of the seven coherent clusters, the first TCA principal axis produced the same binary partition of the items, $J_1 = \{j10, j7, j4, j9\}$ characterized by $\beta(j_1) < 4.5$ for $j_1 \in J_1$, and $J_2 = \{j3, j1, j2, j6, j5, j8\}$ characterized by $\beta(j_2) > 4.5$ for $j_2 \in J_2$. The six sushis in $J_2$ have Borda scales above the average score of $4.5 = (0 + 9)/2$, while the four sushis in $J_1$ have Borda scales below the average score of 4.5.

Now we ask: what are the differences among the seven coherent clusters? The answer is the riffle shuffling of the scores of the items, which we discuss next.

6. Exploratory Riffle Shuffling

[8] is the seminal reference on riffle shuffling of cards. [2] generalized the notion of independence of two subsets of items to riffled independence to uncover the structure of rank data.

Table 4. Borda scales of the 10 sushis in the seven coherent clusters.

Within the framework of data analysis of preferences, exploratory riffle shuffling can be described in the following way. We have two sets: J, a set of d distinct items, and S, a set of d Borda scores. We partition both sets into two disjoint subsets of sizes $d_1$ and $d_2 = d - d_1$; that is, $J = J_1 \cup J_2$ with $J_1 = \{j_1, j_2, \ldots, j_{d_1}\}$, and $S = S_1 \cup S_2$ with $S_1 = \{0, 1, \ldots, d_1 - 1\}$. Riffle shuffling consists of two steps. In the first step, we attribute the scores of $S_1$ to $J_1$ and the scores of $S_2$ to $J_2$. In the second step, we permute some scores attributed to $J_1$ with the same number of scores attributed to $J_2$. The second step can be described mathematically as the application of a permutation $\tau$ such that $\tau_J(S_1, S_2) = (\tau_{J_1}(S_1), \tau_{J_2}(S_2))$. We interpret $\tau_{J_1}(S_1)$ as the set of scores attributed to $J_1$, and $\tau_{J_2}(S_2)$ as the set of scores attributed to $J_2$.

Example 5: Table 5 displays a toy example with n = 7 voters' Borda scorings of d = 10 items, with $J_1 = \{a, b, c, d\}$ and $J_2 = \{e, f, g, h, i, j\}$. We observe the following: a) The first four voters have done only the first step of a riffle shuffle: each of them has attributed the scores in $S_1 = \{0, 1, 2, 3\}$ to the items in $J_1$ and the scores in $S_2 = \{4, 5, 6, 7, 8, 9\}$ to the items in $J_2$. This can be described as $\tau_J(S_1, S_2) = (S_1, S_2)$; that is, the permutation $\tau$ is the identity, so there is no crossing of scores between $J_1$ and $J_2$. b) Voters 5, 6 and 7 have done both steps of a riffle shuffle. Voters 5 and 6 have permuted score 3 with 5, so we have $\tau_{J_1}(S_1) = \{0, 1, 2, 5\}$ and $\tau_{J_2}(S_2) = \{4, 3, 6, 7, 8, 9\}$. Voter 7 has permuted the scores $\{2, 3\}$ with $\{4, 5\}$, so we have $\tau_{J_1}(S_1) = \{0, 1, 4, 5\}$ and $\tau_{J_2}(S_2) = \{2, 3, 6, 7, 8, 9\}$.

Further, we denote by $|\tau_{J_1}(S_1)|$ the number of voters who have done the riffle shuffle $(\tau_{J_1}(S_1), \tau_{J_2}(S_2))$. So $|\tau_{J_1}(S_1) = \{0, 1, 2, 3\}| = 4$, $|\{0, 1, 2, 5\}| = 2$ and $|\{0, 1, 4, 5\}| = 1$. The permuted scores between the two blocks of items are in bold in Table 5.
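The two-step mechanism can be mimicked with a tiny simulation; the helper below is hypothetical illustration code (name and randomization ours), not part of the authors' procedure. Step 1 assigns the low-score block $S_1$ to $J_1$; step 2 transposes a chosen number of score pairs across the two blocks.

```python
import numpy as np

def riffle_scores(d1, d2, swaps, rng):
    """Step 1: give scores S1 = {0..d1-1} to block J1 and S2 = {d1..d-1}
    to block J2 (each randomly permuted within its block). Step 2:
    transpose `swaps` randomly chosen score pairs across the two blocks.
    Returns one voter's Borda row, ordered J1 items first, then J2."""
    s1 = rng.permutation(np.arange(d1))            # scores attributed to J1
    s2 = rng.permutation(np.arange(d1, d1 + d2))   # scores attributed to J2
    for _ in range(swaps):
        i, j = rng.integers(d1), rng.integers(d2)
        s1[i], s2[j] = s2[j], s1[i]                # cross one score pair
    return np.concatenate([s1, s2])

rng = np.random.default_rng(0)
row = riffle_scores(4, 6, swaps=1, rng=rng)
# The row is always a permutation of {0,...,9}; with swaps=0 the
# permutation tau is the identity and no scores cross the blocks.
```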

Remark 4: A useful observation from Example 5 is that we can concentrate our study either on $J_1$ or on $J_2$: if we know $\tau_{J_1}(S_1)$, the scores attributed to $J_1$, we can deduce $\tau_{J_2}(S_2)$, the scores attributed to $J_2$, because of the mutual exclusivity constraints ensuring that no two items, say a and b, are mapped to the same rank by a voter.

Table 5. Borda scorings of 10 items by 7 voters.

A simple measure of the magnitude of a $(d_1, d_2)$ riffle shuffling of a voter i is the sum of the Borda scores it attributes to the items in $J_1$; that is,

$$T_i(\tau_{J_1}(S_1)) = \sum_{j \in J_1} r_{ij},$$

where $r_{ij}$ is the Borda score attributed to item j by voter i. In Table 5, for the first four voters, $T_i(\tau_{J_1}(S_1)) = 6$ for $i = 1, 2, 3, 4$, which is the minimum attainable sum of scores; it implies that for these voters there is no crossing of scores between the two blocks $J_1$ and $J_2$. For voters 5 and 6, $T_i(\tau_{J_1}(S_1)) = 8$; for voter 7, $T_7(\tau_{J_1}(S_1)) = 10$. These values show that the crossing of scores between the two blocks $J_1$ and $J_2$ is at a lower level for voters 5 and 6 than for voter 7.
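Since Table 5's score matrix survives only in the running text, the sketch below rebuilds a hypothetical profile consistent with Example 5 (the within-block orderings are our invention) and computes $T_i$ as the $J_1$ row sum:

```python
import numpy as np

J1 = [0, 1, 2, 3]   # columns of items a, b, c, d
# Hypothetical Borda rows consistent with Example 5: voters 1-4 keep
# S1 = {0,1,2,3} on J1; voters 5-6 swap score 3 with 5; voter 7 swaps {2,3} with {4,5}.
R = np.array([
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [3, 2, 1, 0, 9, 8, 7, 6, 5, 4],
    [1, 0, 3, 2, 5, 4, 7, 6, 9, 8],
    [2, 3, 0, 1, 6, 7, 4, 5, 8, 9],
    [0, 1, 2, 5, 4, 3, 6, 7, 8, 9],
    [5, 2, 1, 0, 9, 8, 7, 6, 3, 4],
    [0, 1, 4, 5, 2, 3, 6, 7, 8, 9],
])
T = R[:, J1].sum(axis=1)   # magnitudes T_i: 6 for voters 1-4, 8 for 5-6, 10 for 7
```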

For relatively small sample sizes, it is easy to enumerate the different types of ( d 1 , d 2 ) riffle shuffles. For relatively large sample sizes, we use the contingency table of first-order marginals, that we discuss next.

Types of (d1, d2) Riffle Shufflings in a Coherent Cluster

The contingency table of first-order marginals of an observed voting profile V on d items is a square $d \times d$ matrix M, where $M(i, j)$ stores the number of times that item j has Borda score i for $i = 0, \ldots, d-1$, see subsection 3.2. It helps us to observe the types of $(d_1, d_2)$ riffle shufflings in a coherent cluster, as we explain in Example 6.

Example 6: Tables 6-12 display $M_{1,\alpha}$ for $\alpha = 1, \ldots, 7$, the contingency tables of first-order marginals of the seven coherent clusters of the SUSHI data, respectively. We observe the following:

Table 6. $M_{1,1}$, contingency table of first-order marginals of $cohC_1(1)$.

Table 7. $M_{1,2}$, contingency table of first-order marginals of $cohC_1(2)$.

Table 8. $M_{1,3}$, contingency table of first-order marginals of $cohC_1(3)$.

Table 9. $M_{1,4}$, contingency table of first-order marginals of $cohC_1(4)$.

Table 10. $M_{1,5}$, contingency table of first-order marginals of $cohC_1(5)$.

Table 11. $M_{1,6}$, contingency table of first-order marginals of $cohC_1(6)$.

Table 12. $M_{1,7}$, contingency table of first-order marginals of $cohC_1(7)$.

Table 13. Types of riffle shuffles in the 7 coherent clusters of SUSHI data.

Each one of them reveals the nature of the riffle shuffles of its coherent cluster; these are summarized in Table 13. The number of observed (4, 6) blocks of scores, $(\tau_{J_1}(S_1), \tau_{J_2}(S_2))$, for the seven coherent clusters is only 27 in Table 13, out of the possible total number of $10!/(4!\,6!) = 210$. The counts of the observed (4, 6) blocks do not seem to be uniformly distributed in Table 13. Furthermore, we observe that as $\alpha$ increases from 1 to 7, the magnitude of the riffle shuffles, $T_v(\tau_{J_1}(S_1))$, increases in the coherent clusters from 6 to 12. Integers in bold in Table 13 are the shuffled-crossed scores.

The counts in Table 13 are calculated from $M_{1,\alpha}$ for $\alpha = 1, \ldots, 7$, by reasoning on the permutation of scores between the sets $S_1$ and $S_2$. Here are the details, where $J_1 = \{j10, j7, j4, j9\}$.

a) cohC_1(1)

|{0, 1, 2, 3}| = 314, which is the number of 0s attributed to J1 in M_{1,1}. Among the M_{1,α} for α = 1, …, 7, note that M_{1,1} is the only contingency table of first-order marginals which is block diagonal.

b) cohC_1(2)

|{0, 1, 2, 4}| = 235, which is the number of 4s attributed to J1 in M_{1,2}.

c) cohC_1(3)

|{0, 1, 2, 5}| = 171, which is the number of 5s attributed to J1 in M_{1,3}.

|{0, 1, 4, 3}| = 155, which is the number of 4s attributed to J1 in M_{1,3}.

d) cohC_1(4)

|{0, 1, 2, 6}| = 96, which is the number of 6s attributed to J1 in M_{1,4}.

|{0, 1, 5, 3}| = 119, which is the number of 5s attributed to J1 in M_{1,4}.

|{0, 4, 2, 3}| = 100, which is the number of 4s attributed to J1 in M_{1,4}.

e) cohC_1(5)

|{0, 1, 2, 7}| = 79, which is the number of 7s attributed to J1 in M_{1,5}.

|{0, 1, 6, 3}| = 84, which is the number of 6s attributed to J1 in M_{1,5}.

|{0, 5, 2, 3}| = 85, which is the number of 1s not attributed to J1 in M_{1,5}.

|{0, 1, 4, 5}| + |{0, 5, 2, 3}| = 170, which is the total number of 5s attributed to J1 in M_{1,5}; so |{0, 1, 4, 5}| = 170 − 85 = 85.

|{4, 1, 2, 3}| = 119, which is the number of 0s not attributed to J1 in M_{1,5}.

f) cohC_1(6)

|{0, 1, 2, 8}| = 48, which is the number of 8s attributed to J1 in M_{1,6}.

|{0, 1, 7, 3}| = 63, which is the number of 7s attributed to J1 in M_{1,6}.

|{5, 1, 2, 3}| = 98, which is the number of 0s not attributed to J1 in M_{1,6}.

|{0, 4, 2, 5}| = 152 − 98 = 54, where 152 is the total number of 5s attributed to J1 in M_{1,6}.

|{0, 1, 4, 6}| = 113 − 54 = 59, where 113 is the total number of 4s attributed to J1 in M_{1,6}.

|{0, 6, 2, 3}| = 112 − 59 = 53, where 112 is the total number of 6s attributed to J1 in M_{1,6}.

g) cohC_1(7)

|{0, 1, 2, 9}| = 26, which is the number of 9s attributed to J1 in M_{1,7}.

|{0, 1, 8, 3}| = 26, which is the number of 8s attributed to J1 in M_{1,7}.

For the remaining counts, we have to solve the following system of 7 linear equations, where u = |{0, 7, 2, 3}|, t = |{0, 4, 5, 3}|, s = |{0, 4, 2, 6}|, w = |{0, 1, 4, 7}|, z = |{0, 1, 5, 6}|, x = |{6, 1, 2, 3}|, and y = |{4, 1, 2, 5}|.

x + y = 147, which is the number of 0s not attributed to J1 in M_{1,7}.

u + w = 72, which is the number of 7s attributed to J1 in M_{1,7}.

s + z + x = 169, which is the number of 6s attributed to J1 in M_{1,7}.

t + z + y = 157, which is the number of 5s attributed to J1 in M_{1,7}.

t + s + w + y = 185, which is the number of 4s attributed to J1 in M_{1,7}.

u + t + x = 158, which is the number of 3s attributed to J1 in M_{1,7}.

u + s + x + y = 218, which is the number of 2s attributed to J1 in M_{1,7}.
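The system above is linear in the seven unknowns, so it can be solved mechanically. The sketch below is our own check (not part of the paper), using plain Gauss-Jordan elimination in exact rational arithmetic with the variable order (u, t, s, w, z, x, y):

```python
from fractions import Fraction

def solve(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination
    with partial pivoting, in exact rational arithmetic."""
    n = len(b)
    A = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(n):
            if r != col and A[r][col]:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Columns: u, t, s, w, z, x, y -- the seven unknown counts for cohC1(7).
A = [[0, 0, 0, 0, 0, 1, 1],   # x + y = 147
     [1, 0, 0, 1, 0, 0, 0],   # u + w = 72
     [0, 0, 1, 0, 1, 1, 0],   # s + z + x = 169
     [0, 1, 0, 0, 1, 0, 1],   # t + z + y = 157
     [0, 1, 1, 1, 0, 0, 1],   # t + s + w + y = 185
     [1, 1, 0, 0, 0, 1, 0],   # u + t + x = 158
     [1, 0, 1, 0, 0, 1, 1]]   # u + s + x + y = 218
b = [147, 72, 169, 157, 185, 158, 218]
u, t, s, w, z, x, y = solve(A, b)
# (u, t, s, w, z, x, y) = (33, 43, 38, 39, 49, 82, 65)
```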

7. Crossing Index

The following (d1, d2) crossing index is based on the internal dispersion of a voting profile.

Definition 3: For a voting profile V we define its crossing index to be

Cross(V) = 1 − δ1(V_{d1,d2}) / max_V δ1(V_{d1,d2}) = 1 − δ1(V_{d1,d2}) / (2 d1 d2 / (d(d − 1))), by Proposition 2,

where δ1(V_{d1,d2}) is the first taxicab dispersion obtained from TCA of V, and (d1, d2) represents the optimal TCA binary partition of the d items of V such that d = d1 + d2.

Proposition 4: The crossing index of a coherent cluster is

Cross(cohC(α)) = 2(α − 1)/(d1 d2).

Example 7: The last column in Table 3 contains the values of the crossing indices of the seven coherent clusters of the first iteration of the SUSHI data. We observe: a) Cross(cohC_1(1)) = 0, because the structure of its matrix of first-order marginals, M_{1,1}, is block diagonal; this means that the permutation τ is the identity permutation, so there is no crossing of scores between the two subsets of items J1 and J2 in cohC_1(1). b) Cross(cohC_1(α)) for α = 1, …, 7 is a uniformly increasing function of α, similar in spirit to the T_v(τ_{J1}(S1)) statistic. c) For the incoherent cluster V_{1,8}, we have δ1(V_{1,8}) = 0.2354, given in Example 3, and d1 = d2 = 5 from Figure 10. So

Cross(V_{1,8}) = 1 − 0.2354/(2 × 5 × 5/(10 × 9)) = 1 − 0.4237 = 0.5763.
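Definition 3 is a one-line computation; as our own illustration, the arithmetic of Example 7 can be reproduced as:

```python
def crossing_index(delta1, d1, d2):
    """Cross(V) = 1 - delta1 / (2*d1*d2 / (d*(d-1))), with d = d1 + d2."""
    d = d1 + d2
    return 1 - delta1 / (2 * d1 * d2 / (d * (d - 1)))

# Incoherent cluster V_{1,8} of the SUSHI data (Example 7):
cross = crossing_index(0.2354, 5, 5)
# 1 - 0.2354/0.5556 = 0.5763 (to four decimals)
```

A profile whose first taxicab dispersion attains the maximum 2 d1 d2/(d(d − 1)) gets a crossing index of 0, and the index grows toward 1 as the dispersion shrinks.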

8. Coherent Group

Our aim is to explore a given voting profile V by uncovering its coherent mixture groups, see Equation (1); that is, V = (∪_{g=1}^{G} cohG(g)) ∪ noisyG, where G represents the number of coherent groups and cohG(g) is the gth coherent group. The computation is done by an iterative procedure in nG steps, for nG ≥ G, that we now describe:

For g = 1: let V1 = V; compute cohG(1) from V1, then partition V1 = V2 ∪ cohG(1);

For g = 2: compute cohG(2) from V2, then partition V2 = V3 ∪ cohG(2);

By continuing the above procedure, after nG steps we get V = ∪_{g=1}^{nG} cohG(g).

However, some of the higher ordered coherent groups may have relatively small sample sizes; so by considering these as outliers, we lump them together thus forming the noisy group denoted by n o i s y G in Equation (1).
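The peeling loop itself is simple bookkeeping. The sketch below is entirely our own scaffolding: `extract_group` stands in for the paper's TCA-based extraction step (here replaced by a toy rule that peels off all votes sharing the most common first choice), and undersized leftover groups are lumped into the noisy group:

```python
from collections import Counter

def peel(votes, extract_group, max_steps=10, min_size=2):
    """Repeatedly split off a group from the remaining votes; any group
    smaller than min_size is treated as outliers (noisy group)."""
    groups, noisy, remaining = [], [], list(votes)
    for _ in range(max_steps):
        if not remaining:
            break
        group, remaining = extract_group(remaining)
        (groups if len(group) >= min_size else noisy).append(group)
    if remaining:
        noisy.append(remaining)
    return groups, noisy

def by_top_item(votes):
    """Toy extractor: peel off all votes sharing the most common first choice."""
    top = Counter(v[0] for v in votes).most_common(1)[0][0]
    return ([v for v in votes if v[0] == top],
            [v for v in votes if v[0] != top])

votes = [("A", "B", "C")] * 3 + [("B", "A", "C")] * 2 + [("C", "B", "A")]
groups, noisy = peel(votes, by_top_item)
# Two groups are peeled off (the 3 A-first and the 2 B-first votes);
# the lone C-first vote falls below min_size and ends up in the noisy group.
```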

Let us recall the definition of a coherent group given in Equation (2):

cohG(g) = ∪_{α=1}^{c_g} cohC_g(α) for g = 1, …, G;

that is, a coherent group is the union of its coherent clusters. This implies that the sample size of cohG(g) equals the sum of the sample sizes of its coherent clusters:

|cohG(g)| = Σ_{α=1}^{c_g} |cohC_g(α)|.

As an example, for the SUSHI data, from the 2nd column of Table 3 we can compute the sample size of the first coherent group:

|cohG(1)| = Σ_{α=1}^{c_1=7} |cohC_1(α)| = 2418.

Furthermore, cohG(1) is composed of 27 observed riffle shuffles, summarized in Table 13, which provides quite a detailed view of its inner structure.
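As a quick arithmetic check of our own, the pattern counts listed in Example 6 — with the cohC1(7) counts completed by solving its linear system — reproduce both the 27 observed blocks and the sample size 2418:

```python
# Observed (4,6)-block counts per coherent cluster, read off Example 6;
# the last seven entries of cluster 7 come from its linear system's solution.
counts = {1: [314],
          2: [235],
          3: [171, 155],
          4: [96, 119, 100],
          5: [79, 84, 85, 85, 119],
          6: [48, 63, 98, 54, 59, 53],
          7: [26, 26, 33, 43, 38, 39, 49, 82, 65]}

n_patterns = sum(len(c) for c in counts.values())   # 27 observed blocks
size_cohG1 = sum(sum(c) for c in counts.values())   # |cohG(1)| = 2418
```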

The next result shows important characteristics of a coherent group inherited from its coherent clusters.

Theorem 2 (Properties of a coherent group cohG(g)):

a) The first principal column factor score g1 of the d items in a coherent group is the weighted average of the first principal column factor scores g1 of the d items of its coherent clusters; that is,

g1(j | cohG(g)) = Σ_{α=1}^{c_g} (|cohC_g(α)|/|cohG(g)|) g1(j | cohC_g(α)) for j = 1, …, d

= (2/(d − 1)) Σ_{α=1}^{c_g} (|cohC_g(α)|/|cohG(g)|) β(j | cohC_g(α)) − 1, by Proposition 3.

And corr(g1(cohG(g)), β(cohG(g))) = 1.

b) The first TCA dispersion value of a coherent group is the weighted average of the first TCA dispersion values of its coherent clusters; that is,

δ1(cohG(g)) = Σ_{α=1}^{c_g} (|cohC_g(α)|/|cohG(g)|) δ1(cohC_g(α)).

c) The crossing index of a coherent group is the weighted average of the crossing indices of its coherent clusters; that is,

Cross(cohG(g)) = Σ_{α=1}^{c_g} (|cohC_g(α)|/|cohG(g)|) Cross(cohC_g(α)).
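Theorem 2c can be checked numerically on the first SUSHI coherent group (our own verification): taking the cluster sizes implied by the counts of Example 6 and the per-cluster values Cross(cohC1(α)) = 2(α − 1)/(d1 d2) from Proposition 4, the weighted average recovers the 27.3% crossing index reported for cohG(1):

```python
sizes = [314, 235, 326, 315, 452, 375, 401]  # |cohC1(alpha)|, alpha = 1..7
d1, d2 = 4, 6
# Proposition 4: Cross(cohC1(alpha)) = 2*(alpha - 1)/(d1*d2).
cross = [2 * k / (d1 * d2) for k in range(7)]  # k = alpha - 1
n = sum(sizes)                                 # |cohG(1)| = 2418
cross_group = sum(s * c for s, c in zip(sizes, cross)) / n
# cross_group is about 0.273, i.e. 27.3%
```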

Example 8: Table 14 summarizes the first four coherent groups of the SUSHI data, which emerged after 5 iterations. For g = 1, we get cohG(1) = ∪_{α=1}^{c_1=7} cohC_1(α); that is, the first coherent group of voters, the majority, is composed of 48.36% of the sample with a crossing index of 27.3%. The standard errors of the Borda scale of the items in cohG(1) in Table 14 are:

(0.046, 0.051, 0.042, 0.042, 0.053, 0.047, 0.037, 0.034, 0.037, 0.025).

We can discern the following grouped seriation (bucket ranking) of the items:

j8 ≻ j5 ≻ j6 ≻ {j3, j2} ≻ j1 ≻ {j9, j4} ≻ {j7} ≻ {j10}.

The groupings are based on the standard 95% confidence intervals of the Borda scale of the items.

The 2nd coherent group cohG(2), summarized by its Borda scales in Table 14, is made up of eight coherent clusters; it comprises 19.0% of the sample with a crossing index of 35.38%. The voters in this coherent group disapprove of {uni (sea urchin), sake (salmon roe)}, which are considered more “daring” sushis.

The third coherent group cohG(3), summarized by its Borda scales in Table 14, is made up of eight coherent clusters; it comprises 13.24% of the sample with a crossing index of 27.3%. The voters in this coherent group prefer the three types of tuna sushis together with the sea urchin sushi.

The fourth coherent group cohG(4), summarized by its Borda scales in Table 14, is made up of eight coherent clusters; it comprises 6.94% of the sample with a crossing index of 35.27%. These voters disapprove of the three types of tuna sushis.

Remark 6:

a) Note that the number of preferred sushis in cohG(1) and cohG(2) is six; that is, |J2| = 6. In contrast, the number of preferred sushis in cohG(3) and cohG(4) is four.

b) The four coherent groups summarized in Table 14 can also be described as two bipolar latent factors: the only major difference between the first two coherent groups is that (5. uni (sea urchin), 6. sake (salmon roe)) are swapped with (7. tamago (egg), 4. ika (squid)), while the only major difference between the third and fourth coherent groups is that the three tunas are swapped with (4. ika (squid), 5. uni (sea urchin), 1. ebi (shrimp)).

c) We consider the fifth group as noisy (outliers not shown), composed of 12.36% of the remaining sample: it contains cohG(5) = ∪_{α=1}^{2} cohC_5(α), whose sample size is 38, a very small number. For the sake of completeness we also provide the sample sizes of its two coherent clusters: |cohC_5(1)| = 22 and |cohC_5(2)| = 16.

9. APA Data Set

The 1980 American Psychological Association (APA) presidential election had five candidates: {A, C} were research psychologists, {D, E} were clinical

Table 14. The first four coherent groups of SUSHI data and related statistics.

psychologists, and B was a community psychologist. In this election, voters ranked the five candidates in order of preference. Among the 15,449 votes, 5738 ranked all five candidates. We consider the data set which records the 5738 complete votes; it is available in [20, p. 96] and [5, Table 1]. The winner was candidate C.

Table 15 compares the results obtained by our method and the best distance-based mixture model given in [21]. Distance-based models have two parameters: a central modal ranking and a precision parameter, which measures the peakedness of the distribution. [21] found that the Cayley distance produced better results than the Kendall and Spearman distances using the BIC (Bayesian information criterion) and ICL (integrated complete likelihood) criteria. Parts a and b of Table 15 are reproduced from [21], Table 4 and Table 5.

Part c of Table 15 summarizes the results of our approach, where we describe only the first four coherent groups. We find only the first two coherent groups to be meaningfully interpretable based on a priori knowledge of the candidates. Voters in cohG(1), comprising 31% of the sample, prefer the research-oriented psychologists {A, C} over the rest. Voters in cohG(2), comprising 23.7% of the sample, prefer the clinical psychologists {D, E} over the rest. We interpret cohG(3) and cohG(4) as mixed groups involving B, with 14.23% and 12% of the voters, respectively. Additionally, there is a noisyG making up 19.1% of the sample, which comprises cohG(5), displayed in Table 15.

[5] discussed this data set in detail; surprisingly, our results confirm his observations: a) There are two groups of candidates, {A, C} and {D, E}. The voters line up behind one group or the other; b) The APA divides into academicians and clinicians who are on uneasy terms. Voters seem to choose one type or the other, and then choose within; but the group effect predominates; c) Candidate B seems to fall in the middle, perhaps closer to D and E.

The following important observation emerges from the comparison of results in Table 15. We have two distinct concepts of groups for rank data, categorical and latent variable based. To see this, consider groups 3 and 4 in part a of Table 15:

Table 15. A summary of results derived from three methods of analysis of APA election data. Parts (a) and (b) are from Murphy and Martin (2003).

Group 3 is based on the modal ranking B C A D E and group 4 on the modal ranking B C A E D. The only difference between these two modal rankings is the permutation of the two least-ranked clinical psychologist candidates {D, E}; this difference is not important, and it does not appear in our approach, which is a latent variable approach.

Description

The eight coherent clusters of the first four coherent groups can simply be described as:

cohC_1(1): T_v(τ_{J2}(S2) = τ_{{A,C}}{3, 4} = {3, 4}) = 7 for v = 1, …, 1233.

cohC_1(2): T_v(τ_{J2}(S2) = τ_{{A,C}}{3, 4} = {2, 4}) = 6 for v = 1, …, 545.

cohC_2(1): T_v(τ_{J2}(S2) = τ_{{D,E}}{3, 4} = {3, 4}) = 7 for v = 1, …, 834.

cohC_2(2): T_v(τ_{J2}(S2) = τ_{{D,E}}{3, 4} = {2, 4}) = 6 for v = 1, …, 526.

cohC_3(1): T_v(τ_{J1}(S1) = τ_{{C,E}}{0, 1} = {0, 1}) = 1 for v = 1, …, 512.

cohC_3(2): T_v(τ_{J1}(S1) = τ_{{C,E}}{0, 1} = {0, 2}) = 2 for v = 1, …, 305.

cohC_4(1): T_v(τ_{J1}(S1) = τ_{{A,D}}{0, 1} = {0, 1}) = 1 for v = 1, …, 350.

cohC_4(2): T_v(τ_{J1}(S1) = τ_{{A,D}}{0, 1} = {0, 2}) = 2 for v = 1, …, 338.

In this case, we can also visualize all the orderings belonging to a coherent group: Figure 11 and Figure 12 display all the preferences belonging to the two coherent clusters of the first coherent group. The label CAEBD162 in Figure 11 should be interpreted as the preference C ≻ A ≻ E ≻ B ≻ D repeated 162 times.

Figure 11. TCA map of cohC_1(1) of APA data.

Figure 12. TCA map of cohC_1(2) of APA data.

10. Riffle Independence Model

Riffle independence is a nonparametric probabilistic modelling method for preferences developed by [2], which generalizes the independence model. It can be described in the following way:

(a) Partition the set J of d distinct items into two disjoint subsets J1 of size d1 and J2 of size d2. Then generate an ordering of the items within each subset according to a certain ranking model. This implies that any ordering of the d items can be written as a direct product of two disconnected orderings, which in turn implies the independence of the two subsets J1 and J2. So the model complexity of this step is of order d1! + d2!.

(b) Interleave the two independent orderings of these two subsets using a riffle shuffle to form a combined ordering. An interleaving is a binary mapping from the set of orderings to {J1, J2}. The model complexity of this step is of order d!/(d1! d2!). The interleaving step generates the riffled independence of the two subsets J1 and J2.

So the combined model complexity of both steps is d1! + d2! + d!/(d1! d2!), which is much smaller than d! = (d1 + d2)!.

For example, consider an ordering of the items in the set J = {A, B, C, D, E, F} built from its two subsets J1 = {A, C} and J2 = {B, D, E, F}. In the first step, relative orderings of the items in J1 and J2 are drawn independently. Suppose we obtain the relative ordering φ(J1) = (C, A) in J1, and the relative ordering φ(J2) = (B, D, F, E) in J2. Then, in the second step, the two relative orderings are combined by interleaving the items of the two subsets. For instance, if the interleaving process is ω(J1, J2) = (J1, J2, J2, J1, J2, J2), where the relative ordering of the items in each subset remains unchanged, the combined ordering is determined by the composition

ω(J1, J2)(φ(J1), φ(J2)) = (C, B, D, A, F, E) = φ(J).
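The two-step construction can be coded directly; the sketch below (our own) reproduces the example's combined ordering:

```python
def riffle(order1, order2, interleaving):
    """Merge two within-subset orderings according to an interleaving word,
    a sequence of 1s and 2s saying which subset supplies the next item."""
    it1, it2 = iter(order1), iter(order2)
    return tuple(next(it1) if w == 1 else next(it2) for w in interleaving)

phi_J1 = ("C", "A")               # relative ordering drawn in J1
phi_J2 = ("B", "D", "F", "E")     # relative ordering drawn in J2
omega = (1, 2, 2, 1, 2, 2)        # interleaving (J1, J2, J2, J1, J2, J2)
combined = riffle(phi_J1, phi_J2, omega)
# combined == ("C", "B", "D", "A", "F", "E")
```

For this example the model complexity is 2! + 4! + 6!/(2! 4!) = 41, versus 6! = 720 for an unrestricted ranking model.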

Given the two subsets J1 and J2 with their orderings φ(J1) and φ(J2) and interleaving ω(J1, J2), generated from models with probability distributions f_{J1}, g_{J2} and m_ω, respectively, the probability of the observed ordering under the riffle independence model is

P(φ(J)) = m_ω(ω(J1, J2)) f_{J1}(φ(J1)) g_{J2}(φ(J2)).

There are two formulations of riffle shuffle for rank data in statistics: probabilistic and exploratory. In the riffled independence model, the set of items is partitioned recursively, while in the exploratory approach the set of voters is partitioned recursively.

11. Conclusions

The main contribution of this paper is the introduction of an exploratory riffle shuffling procedure to reveal and display the structure of diffuse rank data for large sample sizes. The new notion of a coherent cluster, that we developed, is simply based on the geometric notion of taxicab projection of points on the first TCA axis globally and locally; furthermore, it has nice mathematical properties. Coherent clusters of a coherent group represent the same latent variable opposing preferred items to disliked items, and can easily be interpreted and displayed.

Like Occam’s razor, step by step, our procedure peels the essential structural layers (coherent groups) of rank data.

Our method was able to discover some other aspects of the rank data, such as outliers or small groups, which are eclipsed or masked by well-established methods, such as distance or random utility-based methods. The major reason for this is that in random utility-based methods the multivariate nature of a preference is reduced to binary preferences (paired comparisons), and in Mallows distance related methods distances between any two preferences are bounded.

We presented a new index, Cross, that quantifies the extent of crossing of scores between the optimal binary partition of the items that resulted from TCA. The crossing index of a group is based on the first taxicab dispersion measure: it takes values between 0 and 100%, so it is easily interpretable.

The proposed approach can easily be generalized to the analysis of rankings with ties and partial rankings.

The R package TaxicabCA, available on CRAN, can be used to do the calculations.

Acknowledgements

Choulakian’s research has been supported by NSERC grant (RGPIN-2017-05092) of Canada.

Appendix

Let R = (r_ij) for i = 1, …, n and j = 1, …, d represent the Borda scorings of the preferences, where r_ij takes values 0, …, d − 1. Similarly, let R̄ represent the reverse Borda scorings, whose column sums are the coordinates of the row named nega = n β̄ = 1′_n R̄. We consider the application of TCA to the data set

R_nega = (R; nega)

of size (n + 1) × d. So let

P = R_nega / t

be the correspondence table associated with R_nega, where t = 2n Σ_{j=0}^{d−1} j = n d(d − 1). We have

p_{i•} = 1/(2n) for i = 1, …, n, (14)

= 1/2 for i = n + 1, (15)

and

p_{•j} = 1/d for j = 1, …, d. (16)

The first residual correspondence matrix will be

p_ij^(1) = p_ij − p_{i•} p_{•j} (17)

= r_ij/t − (1/(2n))(1/d) for i = 1, …, n, (18)

= nega_j/t − (1/2)(1/d) for i = n + 1. (19)

Consider the nontrivial binary partition of the set S = {0, 1, …, d − 1} into S = S1 ∪ S2, where |S1| = d1, |S2| = d2 and d = d1 + d2. To eliminate the sign indeterminacy in the first TCA principal axis, we fix v1(nega) = v1(n + 1) = −1; and we designate by S1 the set of indices such that the first TCA principal axis coordinates are negative, that is, u1(j) = −1 for j ∈ S1. It follows that u1(j) = +1 for j ∈ S2.

Now we have by (4), for i = 1, …, n,

a_i1 = Σ_{j=1}^{d} u1(j) p_ij^(1) = Σ_{j∈S1} u1(j) p_ij^(1) + Σ_{j∈S2} u1(j) p_ij^(1) = −Σ_{j∈S1} p_ij^(1) + Σ_{j∈S2} p_ij^(1) = −2 Σ_{j∈S1} p_ij^(1), by (17), = −2 Σ_{j∈S1} (r_ij/t − (1/(2n))(1/d)), by (18), = d1/(nd) − (2/t) Σ_{j∈S1} r_ij; (20)

from which we deduce by (5), for i = 1, …, n,

f_i1 = a_i1/p_{i•} = 2d1/d − (4/(d(d − 1))) Σ_{j∈S1} r_ij. (21)

We have the following theorem concerning the first TCA principal factor scores f_i1 of the respondents, for i = 1, …, n.

Theorem 1:

a) The maximum number of distinct clusters of the n respondents on the first TCA principal axis (distinct f_i1 values) is d1 d2 + 1.

Proof: We consider the two extreme cases of S1 and calculate the summation term in (21):

For S1 = {0, 1, …, d1 − 1}: Σ_{j∈S1} r_ij = Σ_{j=0}^{d1−1} j = d1(d1 − 1)/2.

For S1 = {d − d1, …, d − 1}: Σ_{j∈S1} r_ij = Σ_{j=d−d1}^{d−1} j = Σ_{j=d2}^{d−1} j = d1(2d2 + d1 − 1)/2.

It follows that

d1(d1 − 1)/2 ≤ Σ_{j∈S1} r_ij ≤ d1(2d2 + d1 − 1)/2;

so Σ_{j∈S1} r_ij can take at most d1(2d2 + d1 − 1)/2 − d1(d1 − 1)/2 + 1 = d1 d2 + 1 values.

b) The maximum value that f_i1 can attain is 2 d1 d2/(d(d − 1)).

Proof: From (21) and part a), it follows that the maximum value that f_i1 can attain is 2d1/d − (4/(d(d − 1)))(d1(d1 − 1)/2) = 2 d1 d2/(d(d − 1)).

c) The minimum value that f_i1 can attain is −2 d1 d2/(d(d − 1)).

Proof: From (21) and part a), it follows that the minimum value that f_i1 can attain is 2d1/d − (4/(d(d − 1)))(d1(2d2 + d1 − 1)/2) = −2 d1 d2/(d(d − 1)).

d) If the number of distinct clusters is maximal, d1 d2 + 1, then the gap between two contiguous f_i1 values is 4/(d(d − 1)).

Proof: Suppose that the number of distinct clusters is maximal, d1 d2 + 1. Consider a first TCA factor score f_i1 = 2d1/d − (4/(d(d − 1))) Σ_{j∈S1} r_ij which differs from the two extreme values ±2 d1 d2/(d(d − 1)). Then f′_i1 = 2d1/d − (4/(d(d − 1)))(Σ_{j∈S1} r_ij − 1) is the contiguous higher value to f_i1, and similarly f″_i1 = 2d1/d − (4/(d(d − 1)))(Σ_{j∈S1} r_ij + 1) is the contiguous lower value to f_i1; and the required result follows.
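Theorem 1 can be illustrated for the SUSHI case d = 10, d1 = 4, d2 = 6 (our own check): enumerating the possible values of f_i1 from (21) over the admissible sums confirms the count d1 d2 + 1 = 25, the extremes ±2 d1 d2/(d(d − 1)) = ±8/15, and the gap 4/(d(d − 1)) = 2/45:

```python
from fractions import Fraction

d1, d2 = 4, 6
d = d1 + d2
s_min = d1 * (d1 - 1) // 2            # S1 receives the scores {0, ..., d1-1}
s_max = d1 * (2 * d2 + d1 - 1) // 2   # S1 receives the scores {d2, ..., d-1}
# Equation (21): f_i1 = 2*d1/d - (4/(d*(d-1))) * sum of J1's scores.
f = [Fraction(2 * d1, d) - Fraction(4, d * (d - 1)) * s
     for s in range(s_min, s_max + 1)]
# len(f) == d1*d2 + 1 == 25; f runs from 8/15 down to -8/15 in steps of 2/45.
```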

Proposition 1: For a voting profile V, δ1 ≥ |f1(nega)|.

Proof: Let a1 = (a′11, a1(nega))′, where a11 denotes the subvector of the first n coordinates of a1. We need the following three observations.

First, it is well known that a1 is centered, by (5) and (9):

1′_{n+1} a1 = 0 = 1′_n a11 + a1(nega);

from which we get

|1′_n a11| = |a1(nega)|. (22)

Second, by the triangle inequality of the L1 norm we have

‖a11‖_1 ≥ |1′_n a11|. (23)

Third, the marginal relative frequency of the nega row is p_{nega•} = 1/2 by (15), and f_i1 = a_i1/p_{i•} for i = 1, …, n + 1 by (5); so we have

f1(nega) = 2 a1(nega). (24)

Now we have by (7)

δ1 = ‖a1‖_1 = ‖a11‖_1 + |a1(nega)| ≥ |1′_n a11| + |a1(nega)|, by (23), = 2|a1(nega)|, by (22), = |f1(nega)|, by (24). (25)

Proposition 2: Let cohC_m(α) = V_{m,α} be the αth coherent cluster of the mth coherent group, characterized by f1^{V_{m,α}}(σ) = f_α^{V_m} for all σ ∈ cohC_m(α). Then δ1 = f_α^{V_m} = |f1(nega)|.

Proof: By Definition 1 of the coherency of the cluster V_{m,α}, we have 0 < f1^{V_{m,α}}(i) = f_α^{V_m} for i = 1, …, |cohC_m(α)|; by (5) it follows that 0 < a_i1 = f_α^{V_m}/(2n) for i = 1, …, |cohC_m(α)|; so (25) becomes an equality, ‖a11‖_1 = Σ_{i=1}^{n} a_i1 = |1′_n a11|, and the required result follows.

Proposition 3 is a corollary of the following general result.

Theorem 3: If the first TCA principal axis of the columns of R_nega is v1 = (1′_n, −1)′, then

the first principal column factor score g1 of the d items is an affine function of the Borda scale β; that is, g1(j) = 2β(j)/(d − 1) − 1, or corr(g1, β) = 1.

Proof: Suppose that v1 = (1′_n, −1)′; then by (4), for j = 1, …, d,

b1(j) = Σ_{i=1}^{n+1} v1(i) p_ij^(1) = Σ_{i=1}^{n} p_ij^(1) − p_{(n+1)j}^(1)

= 2 Σ_{i=1}^{n} p_ij^(1), by (17),

= 2 Σ_{i=1}^{n} (p_ij − p_{i•} p_{•j})

= 2(Σ_{i=1}^{n} r_ij/t − p_{•j}/2), by (14),

= 2nβ(j)/t − p_{•j}.

Thus by (5), for j = 1, …, d,

g1(j) = b1(j)/p_{•j} = (2nβ(j)/t − p_{•j})/p_{•j} = 2β(j)/(d − 1) − 1.
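Theorem 3 can be verified numerically on a toy profile (our own check, in exact arithmetic): build R_nega, form the residual matrix (17), project on v1 = (1, …, 1, −1), and compare g1 with the affine function of the Borda scale:

```python
from fractions import Fraction

R = [[0, 1, 2, 3],         # n = 3 voters, d = 4 items, Borda scores
     [1, 0, 2, 3],
     [0, 2, 1, 3]]
n, d = len(R), len(R[0])
nega = [sum(d - 1 - row[j] for row in R) for j in range(d)]  # reverse scores
X = R + [nega]                              # R_nega, of size (n+1) x d
t = n * d * (d - 1)                         # grand total of R_nega
P = [[Fraction(x, t) for x in row] for row in X]
pr = [sum(row) for row in P]                # row masses: 1/(2n), ..., 1/2
pc = [sum(P[i][j] for i in range(n + 1)) for j in range(d)]  # 1/d each

v1 = [1] * n + [-1]                         # first principal axis for the rows
b1 = [sum(v1[i] * (P[i][j] - pr[i] * pc[j]) for i in range(n + 1))
      for j in range(d)]                    # projection of the residual (17)
g1 = [b1[j] / pc[j] for j in range(d)]      # column factor scores

beta = [Fraction(sum(row[j] for row in R), n) for j in range(d)]  # Borda scale
affine = [2 * bj / (d - 1) - 1 for bj in beta]
# g1 == affine, exactly, as claimed by Theorem 3
```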

Proposition 4: The crossing index of a coherent cluster is

Cross(cohC(α)) = 2(α − 1)/(d1 d2).

Proof: Easily shown by using Definition 3 and Proposition 2.

The proof of Theorem 2a easily follows from Theorem 3. The proof of Theorem 2b is similar to the proof of Proposition 1. The proof of Theorem 2c is similar to the proof of Proposition 4.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Kamishima, T. (2003) Nantonac Collaborative Filtering: Recommendation Based on Order Responses. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, August 2003, 583-588.
https://doi.org/10.1145/956750.956823
[2] Huang, J. and Guestrin, C. (2012) Uncovering the Riffled Independence Structure of Ranked Data. Electronic Journal of Statistics, 6, 199-230.
https://doi.org/10.1214/12-EJS670
[3] Lu, T. and Boutilier, C. (2014) Effective Sampling and Learning for Mallows Models with Pairwise Preference Data. Journal of Machine Learning Research, 15, 3783-3829.
[4] Vitelli, V., Sørenson, Ø., Crispino, M., Frigessi, A. and Arjas, E. (2018) Probabilistic Preference Learning with the Mallows Rank Model. Journal of Machine Learning Research, 18, 1-49.
[5] Diaconis, P. (1989) A Generalization of Spectral Analysis with Application to Ranked Data. Annals of Statistics, 17, 949-979.
https://doi.org/10.1214/aos/1176347251
[6] Marden, J.I. (1995) Analyzing and Modeling Rank Data. Chapman & Hall, London.
[7] Alvo, M. and Yu, P. (2014) Statistical Methods for Ranking Data. Springer, New York.
https://doi.org/10.1007/978-1-4939-1471-5
[8] Bayer, D. and Diaconis, P. (1992) Trailing the Dovetail Shuffle to Its Lair. Annals of Applied Probability, 2, 294-313.
https://doi.org/10.1214/aoap/1177005705
[9] Choulakian, V. (2016) Globally Homogenous Mixture Components and Local Heterogeneity of Rank Data. arXiv:1608.05058
[10] Choulakian, V. (2006) Taxicab Correspondence Analysis. Psychometrika, 71, 333-345.
https://doi.org/10.1007/s11336-004-1231-4
[11] Choulakian, V. (2016) Matrix Factorizations Based on Induced Norms. Statistics, Optimization and Information Computing, 4, 1-14.
https://doi.org/10.19139/soic.v4i1.160
[12] De Borda, J. (1781) Mémoire sur les élections au scrutin. Histoire de L’Académie Royale des Sciences, 102, 657-665.
[13] Benzécri, J.P. (1991) Comment on Leo A. Goodman’s Invited Paper. Journal of the American Statistical Association, 86, 1112-1115.
https://doi.org/10.1080/01621459.1991.10475157
[14] Van de Velden, M. (2000) Dual Scaling and Correspondence Analysis of Rank Order Data. In: Heijmans, R.D.H., Pollock, D.S.G. and Satorra, A. Eds., Innovations in Multivariate Statistical Analysis, Vol. 36, Kluwer Academic Publishers, Dordrecht, 87-99.
https://doi.org/10.1007/978-1-4615-4603-0_6
[15] Torres, A. and Greenacre, M. (2002) Dual Scaling and Correspondence Analysis of Preferences, Paired Comparisons and Ratings. International Journal of Research in Marketing, 19, 401-405.
https://doi.org/10.1016/S0167-8116(02)00101-5
[16] Nishisato, S. (1980) Analysis of Categorical Data: Dual Scaling and Its Applications. University of Toronto Press, Toronto.
https://doi.org/10.3138/9781487577995
[17] Choulakian, V. (2014) Taxicab Correspondence Analysis of Ratings and Rankings. Journal de la Société Française de Statistique, 155, 1-23.
[18] Khot, S. and Naor, A. (2012) Grothendieck-Type Inequalities in Combinatorial Optimization. Communications on Pure and Applied Mathematics, 65, 992-1035.
https://doi.org/10.1002/cpa.21398
[19] Choulakian, V. and Abou-Samra, G. (2020) Mean Absolute Deviations about the Mean, Cut Norm and Taxicab Correspondence Analysis. Open Journal of Statistics, 10, 97-112.
https://doi.org/10.4236/ojs.2020.101008
[20] Diaconis, P. (1988) Group Representations in Probability and Statistics. Institute of Mathematical Statistics, Hayward, CA.
[21] Murphy, T.B. and Martin, D. (2003) Mixtures of Distance-Based Models for Ranking Data. Computational Statistics and Data Analysis, 41, 645-655.
https://doi.org/10.1016/S0167-9473(02)00165-2
