Note on Rank-Biserial Correlation when There Are Ties

José Moral de la Rubia

School of Psychology, Universidad Autónoma de Nuevo León, Monterrey, México.

**DOI:** 10.4236/ojs.2022.125036

The objective of this article is to demonstrate with examples that the *two-sided* *tie* correction does not work well. This correction was developed by Cureton so that Kendall’s tau-type and Spearman’s rho-type formulas for rank-biserial correlation yield the same result when ties are present. However, a correction based on the *bracket* *ties* achieves the desired goal, which is demonstrated algebraically and checked with three examples. On the one hand, the 10-element random sample given by Cureton, in which the *two-sided* *tie* correction performs well, is taken up. On the other hand, two other examples are given, one with a 7-element random sample and the other with a clinical random sample of 31 participants, in which the *two-sided* *tie* correction does not work, but the new correction does. It is concluded that the new corrected formulas coincide with Goodman-Kruskal’s gamma, whereas Glass’ formula matches Somers’ *d _{Y|X}*.

Keywords

Ordinal Variable, Dichotomy, Linear Association, Nonparametric Statistics, Descriptive Statistics

Share and Cite:

Moral-de la Rubia, J. (2022) Note on Rank-Biserial Correlation when There Are Ties. Open Journal of Statistics, 12, 597-622. https://doi.org/10.4236/ojs.2022.125036

1. Introduction

The objective of this statistical methodology article is to derive and verify a new correction for the rank-biserial correlation in the case of ties, so that Kendall’s tau-type and Spearman’s rho-type formulas give the same result. The motivation is that the correction given by Cureton [1] and taken up by Willson [2] does not work well. The derivation of that correction is not explicit in the articles written by Cureton [1] and Willson [2], and it is verified only by one example with a sample of 10 elements in Cureton’s article [1].

The article begins by defining what rank-biserial correlation is and how it is calculated, with or without ties, using Kendall’s tau-type coefficient developed by Cureton [3]. Cureton’s two formulas, one to use when there are ties and the other when there are no ties, give the same result as Goodman-Kruskal’s gamma, which is a non-directional association coefficient. It continues with the presentation of Spearman’s rho-type formula given by Glass [4] and the one based on the Mann-Whitney U statistic [5] created by Willson [2]. These formulas yield the same result as Cureton’s coefficient [3] when there are no ties. It should be noted that Glass’ and Willson’s formulas, whether or not there are ties, coincide with Somers’ *d _{Y|X}*.

2. Kendall’s Tau-Type Formulas for Rank-Biserial Correlation

Rank-biserial correlation is a measure of association that has been developing since 1949 with Brogden’s pioneering work [6]. It was specified by Cureton [3], studied and expanded by Glass [4], Stanley [7], and Willson [2], and disseminated by authors such as Khamis [8] and Berry, Johnston, and Mielke Jr. [9]. However, it is underreported [9] [10], does not appear in statistical software [11], and is not usually taught in undergraduate statistics courses [12] [13].

In 1956, the American psychologist Edward Eugene Cureton (1902-1992) developed two Kendall’s tau-type formulas to measure the association between a ranking (1 to *n*) or a ranking variable Y and a dichotomous qualitative variable X (0 and 1) or dichotomized from an assumed or unknown ranking X’ (1 to *n*). One formula is applied when there are ties or repeated values in Y that affect both groups of X. The other formula is used in all other cases. Cureton named this measure the rank-biserial correlation coefficient and denoted it by *τ _{rb}* [3]. Its potential amplitude ranges from −1 (the

In case there are no ties in Y (ranking), the formula for *τ _{rb}* is a quotient (Formula (1)). Its numerator is the difference between the number of concordant and discordant pairs. Its denominator is the product between the number of zeros and ones, that is, between the sample sizes of groups 0 and 1 of X. This coefficient coincides with the Goodman-Kruskal gamma [14], as shown in Formula (1).

${\tau}_{rb}=\frac{A-D}{{n}_{0}\times {n}_{1}}=\frac{A-D}{A+D}=\gamma $ (1)

*A* = agreements or number of concordant pairs with a direct association between X (dichotomy) and Y (ranking). Let be the data pairs (*x _{i}*,

*D* = disagreements or number of discordant pairs with a direct association between X (dichotomy) and Y (ranking). We say that the data pairs (*x _{i}*,

*n _{0}* = sample size of group 0 of X.

*n _{1}* = sample size of group 1 of X.

*n* = *n _{0}* + *n _{1}*

*A* + *D* = *n _{0}* × *n _{1}* (when there are no ties, as in Formula (1)).

Agreements (*A*) and disagreements (*D*) can be computed from a 2 × *k* contingency table, using its joint absolute frequencies, *n _{ij}*. The two values of X (dichotomy) are placed per row: 0 (first row) and 1 (second row). The

Table 1. Joint frequencies between X (dichotomy) per row and Y (ranking or ordinal variable) per column and marginal frequencies of X.

column of the frequency gives the agreements or number of concordant pairs (Formula (2)).

$A={\displaystyle {\sum}_{j=1}^{k-1}\left({n}_{1j}\times {\displaystyle {\sum}_{{j}^{\prime}>j}^{k}{n}_{2{j}^{\prime}}}\right)}$ (2)

The sum of products between each of the last *k* − 1 frequencies in the first row (*n _{12}* to

$D={\displaystyle {\sum}_{j=2}^{k}\left({n}_{1j}\times {\displaystyle {\sum}_{{j}^{\prime}<j}^{k-1}{n}_{2{j}^{\prime}}}\right)}$ (3)

From Table 1, *n _{0}* is the marginal frequency of the value 0 of X and is obtained through the sum of the

${n}_{i}={\displaystyle {\sum}_{j=1}^{k}{n}_{ij}};i=0,1$ (4)

If there are ties or repeated values in Y that affect both groups of X (0 and 1), which Cureton [3] calls *bracket* *ties*, a correction in the denominator of coefficient *τ _{rb}* is required. This corrected formula also matches the Goodman-Kruskal gamma [14], as shown in Formula (5).

${\tau}_{rb}=\frac{A-D}{{n}_{0}\times {n}_{1}-{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}}=\frac{A-D}{A+D}=\gamma $ (5)

Agreements (*A*) and disagreements (*D*) are calculated from the 2 × *k* contingency table as seen in the previous paragraphs (Formulas (2) and (3)). From Table 1, identifying the *bracket* *ties* is easy. They are those values of Y that have non-zero frequencies in both rows. The frequency in group 0 (*n _{0l}*) and the frequency in group 1 (

With tied data

${n}_{0}\times {n}_{1}-{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}=A+D$

${n}_{0}\times {n}_{1}=A+D+{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}$ (6)

See an example from Cureton [1] [3] with a random sample of ten paired data (*x _{i}*,

Agreements or number of concordant pairs according to Formula (2).

$\begin{array}{c}A=1\times \left(0+2+1+1+2\right)+2\times \left(2+1+1+2\right)+0\times \left(1+1+2\right)+1\times \left(1+2\right)+0\times 2\\ =6+12+0+3+0=21\end{array}$

Disagreements or number of discordant pairs according to Formula (3).

$\begin{array}{c}D=0\times \left(0+0+2+1+1\right)+0\times \left(0+0+2+1\right)+1\times \left(0+0+2\right)+0\times \left(0+0\right)+2\times 0\\ =0+0+2+0+0=2\end{array}$

*Bracket* *ties* or Y values that are repeated in the two groups of X: value 4 of Y.

Tie correction or sum of products between the frequencies of the *bracket* *ties* in groups 0 and 1 of X according to the subtrahend of Formula (6).

${\sum}_{l=1}^{1}{n}_{0l}\times {n}_{1l}={n}_{01}\times {n}_{11}=1\times 1=1$

Rank-biserial correlation coefficient according to Formula (5).

${\tau}_{rb}=\frac{A-D}{{n}_{0}\times {n}_{1}-{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}}=\frac{21-2}{4\times 6-1}=\frac{19}{23}\approx 0.8261$

The statement in Formula (5) that the value of the coefficient *τ _{rb}* [3] is equal to the Goodman-Kruskal gamma [14] is verified.

$\gamma =\frac{A-D}{A+D}=\frac{21-2}{21+2}=\frac{19}{23}\approx 0.8261={\tau}_{rb}$
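The calculation in Formulas (2), (3), and (5) can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names are hypothetical, and the two frequency rows are reconstructed from the worked sums for *A* and *D* above (the body of Table 2 is not reproduced here), so they should be read as an assumption. Summing the products over all columns equals the sum over *bracket* *ties*, because a column that is not a bracket tie has a zero frequency in one of the groups.

```python
from fractions import Fraction


def concordant_discordant(row0, row1):
    """Count concordant (A) and discordant (D) pairs in a 2 x k table.

    row0[j] and row1[j] are the frequencies of the j-th ordered Y value
    in groups X = 0 and X = 1 (hypothetical argument names).
    """
    k = len(row0)
    # Formula (2): group-0 cases paired with group-1 cases at higher Y values.
    A = sum(row0[j] * sum(row1[j + 1:]) for j in range(k))
    # Formula (3): group-0 cases paired with group-1 cases at lower Y values.
    D = sum(row0[j] * sum(row1[:j]) for j in range(k))
    return A, D


def tau_rb(row0, row1):
    """Formula (5): (A - D) / (n0 * n1 - sum of bracket-tie products)."""
    A, D = concordant_discordant(row0, row1)
    n0, n1 = sum(row0), sum(row1)
    bracket = sum(a * b for a, b in zip(row0, row1))
    return Fraction(A - D, n0 * n1 - bracket)


# Frequencies reconstructed from the worked sums (assumption).
row0 = [1, 2, 0, 1, 0, 0]  # X = 0
row1 = [0, 0, 2, 1, 1, 2]  # X = 1

A, D = concordant_discordant(row0, row1)
print(A, D)                # 21 2
print(tau_rb(row0, row1))  # 19/23
print(Fraction(A - D, A + D) == tau_rb(row0, row1))  # True: equals gamma
```

Exact fractions are used so that the equality with gamma can be checked without rounding.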

3. Spearman’s Rho-Type and U-Statistic-Based Formulas for Rank-Biserial Correlation with Untied Data

The American statistician Gene V. Glass provided three formulas corresponding to Spearman’s rho-type coefficient for calculating rank-biserial correlation [4]. All three formulas are equivalent (Formula (7)). In addition, Glass demonstrated that these formulas are equal to Kendall’s tau-type coefficient developed by Cureton [3] when there are no ties. The new coefficient was denoted by *r _{rb}* and has a range from −1 to 1.

Table 2. Joint frequencies between X (dichotomy) per row and Y (ordinal variable) per column and marginal frequencies of X.

*Note*. *Bracket* *ties* or values of Y that are repeated in both groups of X are indicated in parentheses, *n _{i}* = ∑

$\begin{array}{c}{r}_{rb}=\frac{2\left({\bar{r}}_{Y|X=1}-{\bar{r}}_{Y|X=0}\right)}{n}=\frac{2}{{n}_{0}}\times \left({\bar{r}}_{Y|X=1}-{\bar{r}}_{Y}\right)=\frac{2}{{n}_{0}}\times \left({\bar{r}}_{Y|X=1}-\frac{n+1}{2}\right)\\ =\frac{2}{{n}_{1}}\times \left({\bar{r}}_{Y}-{\bar{r}}_{Y|X=0}\right)=\frac{2}{{n}_{1}}\times \left(\frac{n+1}{2}-{\bar{r}}_{Y|X=0}\right)\end{array}$ (7)

*n* = sample size that includes *n _{0}* values 0 and *n _{1}* values 1 of X.

${\bar{r}}_{Y|X=0}$ = conditional mean rank of Y given the value 0 of X.

${\bar{r}}_{Y|X=1}$ = conditional mean rank of Y given the value 1 of X.

${\bar{r}}_{Y}$ = (*n* + 1)/2 = (unconditional) mean rank of Y.

Newson [15] [16] proved that the Glass coefficient *r _{rb}* matches Somers’ [17] *d _{Y|X}* (Formulas (8) and (9)).

${r}_{rb}={d}_{Y|X}=\frac{A-D}{A+D+{E}_{Y}}$ (8)

${E}_{Y}={\displaystyle {\sum}_{j=1}^{k}{n}_{0j}\times {n}_{1j}}$ (9)
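The agreement between Formula (7) and Formula (8) can be checked numerically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, ranks are assigned in ascending order with averaged ranks for ties, and the ten (*x*, *y*) pairs are reconstructed from the frequencies used in the worked example of the previous section.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def glass_r_rb(x, y):
    """First expression of Formula (7): 2 * (mean rank of group 1 - mean rank of group 0) / n."""
    ranks = average_ranks(y)
    r0 = [r for r, g in zip(ranks, x) if g == 0]
    r1 = [r for r, g in zip(ranks, x) if g == 1]
    return 2 * (sum(r1) / len(r1) - sum(r0) / len(r0)) / len(y)


def somers_d_yx(x, y):
    """Formula (8), using A + D + E_Y = n0 * n1 from Formula (6)."""
    y0 = [v for v, g in zip(y, x) if g == 0]
    y1 = [v for v, g in zip(y, x) if g == 1]
    A = sum(1 for a in y0 for b in y1 if b > a)
    D = sum(1 for a in y0 for b in y1 if b < a)
    return (A - D) / (len(y0) * len(y1))


# Ten-element sample reconstructed from the worked example (assumption).
x = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y = [1, 2, 2, 4, 3, 3, 4, 5, 6, 6]

print(abs(glass_r_rb(x, y) - somers_d_yx(x, y)) < 1e-12)  # True
```

Both functions return 19/24 for these data, which differs from the value 19/23 of *τ _{rb}* because of the bracket tie.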

Kendall’s tau-type coefficient developed by Cureton [3] and Spearman’s rho-type coefficient created by Glass [4] give the same result when there are no ties or repeated values in Y (ranking) in both groups of X (dichotomy). The latter formula can be related to the Mann-Whitney U test [5], which makes it possible to calculate the rank-biserial correlation from the U-statistic, as a coefficient in absolute value or measure of effect size [2] [4]. In turn, this relationship allows obtaining a formula to compute the asymptotic standard error (*ASE*), based on the convergence to a normal distribution. This error makes interval estimation and significance testing of the rank-biserial correlation coefficient possible (Formula (10)). A sample of at least 20 elements in one of the two independent groups and no less than 8 in the other group is recommended to use this asymptotic approximation [2].

$\left|{r}_{rb}\right|=1-\frac{2U}{{n}_{0}\times {n}_{1}}\in \left[0,1\right]$ (10)

*U* = Mann-Whitney U statistic [5]. The Y ranks are separated by the two groups of X, and the ranks of each group are summed (Formula (11)). These sums of ranks allow obtaining the *U _{0}* and *U _{1}* values, the smaller of which is *U* (Formula (12)).

$S{R}_{Y|X=0}={\sum}_{i=1}^{{n}_{0}}{r}_{{y}_{i0}}$

$S{R}_{Y|X=1}={\sum}_{i=1}^{{n}_{1}}{r}_{{y}_{i1}}$ (11)

${U}_{0}={n}_{0}{n}_{1}+\frac{{n}_{0}\left({n}_{0}+1\right)}{2}-S{R}_{Y|X=0}$

${U}_{1}={n}_{0}{n}_{1}+\frac{{n}_{1}\left({n}_{1}+1\right)}{2}-S{R}_{Y|X=1}$

$\mathrm{min}\left({U}_{0},{U}_{1}\right)=U$ (12)

The asymptotic standard error of *r _{rb}* is computed using Formula (13).

$AS{E}_{{r}_{rb}}=\sqrt{\frac{n+1}{3{n}_{0}{n}_{1}}}$ (13)

The confidence interval for *r _{rb}* is shown in Formula (14). If 0 is not included within the interval, the population correlation is taken to be non-null at a confidence level of 1 − *α*.

$P\left({r}_{rb}-{z}_{1-\frac{\alpha}{2}}\times \sqrt{\frac{n+1}{3{n}_{0}{n}_{1}}}\le {\rho}_{rb}\le {r}_{rb}+{z}_{1-\frac{\alpha}{2}}\times \sqrt{\frac{n+1}{3{n}_{0}{n}_{1}}}\right)=1-\alpha $ (14)

${\rho}_{rb}\ne 0\to 0\notin \left[{r}_{rb}-{z}_{1-\frac{\alpha}{2}}\times \sqrt{\frac{n+1}{3{n}_{0}{n}_{1}}},{r}_{rb}+{z}_{1-\frac{\alpha}{2}}\times \sqrt{\frac{n+1}{3{n}_{0}{n}_{1}}}\right]$

*z*_{1−α/2} = 1 − *α*/2 quantile of the standard normal distribution N(0, 1). If *α* = 0.05 (conventional value), *z*_{0.975} ≈ 1.96.
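Formulas (10) to (14) chain together naturally, as the following sketch shows. It is an illustration only: the function names are hypothetical, and the rank sums (60 and 346 for groups of 8 and 20 untied cases) are invented values chosen to satisfy the recommended minimum group sizes, not data from the article.

```python
import math
from statistics import NormalDist


def u_statistics(rank_sum0, rank_sum1, n0, n1):
    """U_0 and U_1 from the rank sums of groups 0 and 1 (Formula (12))."""
    u0 = n0 * n1 + n0 * (n0 + 1) / 2 - rank_sum0
    u1 = n0 * n1 + n1 * (n1 + 1) / 2 - rank_sum1
    return u0, u1


def abs_r_rb_from_u(u, n0, n1):
    """Formula (10): |r_rb| = 1 - 2U / (n0 * n1)."""
    return 1 - 2 * u / (n0 * n1)


def ase_untied(n0, n1):
    """Formula (13): asymptotic standard error without ties."""
    return math.sqrt((n0 + n1 + 1) / (3 * n0 * n1))


def confidence_interval(r, n0, n1, alpha=0.05):
    """Formula (14): r -/+ z_{1 - alpha/2} * ASE."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * ase_untied(n0, n1)
    return r - half_width, r + half_width


# Hypothetical untied sample: n0 = 8, n1 = 20, rank sums 60 and 346.
n0, n1 = 8, 20
u0, u1 = u_statistics(60, 346, n0, n1)
u = min(u0, u1)
r = abs_r_rb_from_u(u, n0, n1)
low, high = confidence_interval(r, n0, n1)
print(u0, u1)   # 136.0 24.0
print(low > 0)  # True: 0 lies outside the 95% interval
```

Note that the interval is not truncated to [−1, 1]; with a large coefficient and a moderate sample the upper limit can exceed 1, which is a known limitation of this normal approximation.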

4. U-Statistic-Based and Spearman’s Rho-Type Formulas for Rank-Biserial Correlation with Tied Data

In 1968, Cureton [1] remarked that Spearman’s rho-type coefficient developed by Glass [4] and Kendall’s tau-type coefficient developed by him in 1956 only coincide when there are no ties in Y affecting both groups of X. In 1956, such a tie was named a *bracket* *tie* [3]. In the 1968 paper, Cureton [1] proposed a formula that allows the convergence of results of the two types of coefficients when there is a tie that affects the lowest *n _{0}* ranks and the highest

${r}_{rb}=\frac{{\bar{r}}_{Y|X=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{b}{{n}_{1}}}$ (15)

In turn, when the rank-biserial correlation is calculated from the Mann-Whitney U statistic [5], Willson [2] raises the following formula for the correction in the case of a *two-sided* *tie* (Formula (16)).

$\left|{r}_{rb}\right|=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2b}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)$ (16)

To obtain the value of *b*, the *n* values of Y are sorted in descending order. Unaveraged ranks are assigned to Y values in one column and averaged ranks in case of ties in another column. Ranks are separated by a horizontal line. Above the line are the *n _{1}* highest ranks, and below the line are the

Cureton [1] takes up the example of 10 elements, which he had previously presented in 1956 [3] and in which a *two-sided* *tie* appears. With the data, he computed the described correction and checked that *r _{rb}* gives the same result as *τ _{rb}*.

Table 3. Random sample of 10 elements sorted in descending order from Y-ranks.

*Note*. *i* = sampling order, *x _{i}* = group membership of element

${\bar{r}}_{Y|X=1}=\frac{{\sum}_{i=1}^{{n}_{1}}{r}_{yi}}{{n}_{1}}=\frac{4.5+4.5+6.5+8+9.5+9.5}{6}=7.08\overline{3}$

${\bar{r}}_{Y}=\frac{n+1}{2}=\frac{{n}_{0}+{n}_{1}+1}{2}=\frac{4+6+1}{2}=\frac{11}{2}=5.5$

In Table 3, when the highest six ranks are separated from the lowest four ranks by a horizontal line, a *two-sided* *tie* is found that corresponds to a Y value of 3. The sum of the unaveraged ranks of the *two-sided* *tie* among the six highest ranks is 5 and the corresponding sum of averaged ranks is 4.5, so the value of *b*, the difference between both sums, is 0.5. When Spearman’s rho-type coefficient *r _{rb}* is computed using Formula (15) given by Cureton [1], it gives the same result as Kendall’s tau-type coefficient (Formula (5)) developed by Cureton [3].

$b=5-4.5=0.5$

${r}_{rb}=\frac{{\bar{r}}_{Y|X=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{b}{{n}_{1}}}=\frac{7.08\overline{3}-5.5}{\frac{4}{2}-\frac{0.5}{6}}=\frac{1.58\overline{3}}{1.91\overline{6}}\approx 0.8261={\tau}_{rb}$

${r}_{rb}={\tau}_{rb}=\frac{A-D}{{n}_{0}\times {n}_{1}-{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}}=\frac{19}{23}\approx 0.8261$

From the formula based on the U-statistic to calculate *r _{rb}* [2], equality of results with

$S{R}_{Y|X=0}={\displaystyle {\sum}_{i=1}^{4}{r}_{yi}}=1+2.5+2.5+6.5=12.5$

$S{R}_{Y|X=1}={\displaystyle {\sum}_{i=1}^{6}{r}_{yi}}=4.5+4.5+6.5+8+9.5+9.5=42.5$

${U}_{0}={n}_{0}{n}_{1}+\frac{{n}_{0}\left({n}_{0}+1\right)}{2}-S{R}_{Y|X=0}=4\times 6+\frac{4\times 5}{2}-12.5=21.5$

${U}_{1}={n}_{0}{n}_{1}+\frac{{n}_{1}\left({n}_{1}+1\right)}{2}-S{R}_{Y|X=1}=4\times 6+\frac{6\times 7}{2}-42.5=2.5$

$U=\mathrm{min}\left({U}_{0},{U}_{1}\right)=\mathrm{min}\left(21.5,2.5\right)=2.5$

$\begin{array}{c}\left|{r}_{rb}\right|=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2b}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)=\frac{4\times 6}{4\times 6-2\times 0.5}\times \left(1-\frac{2\times 2.5}{4\times 6}\right)\\ =\frac{24}{23}\times \left(1-\frac{5}{24}\right)=\frac{24}{23}\times 0.791\overline{6}=\left|{\tau}_{rb}\right|\\ =\frac{\left|A-D\right|}{{n}_{0}{n}_{1}-{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}=\frac{19}{23}\approx 0.8261\end{array}$
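The constant *b* can also be obtained programmatically. The sketch below reflects one reading of the procedure, checked only against the value *b* = 0.5 of this example: ranks are taken in ascending order, the line is drawn between unaveraged ranks *n _{0}* and *n _{0}* + 1, and *b* is computed for the tied Y value that straddles that line. The function names and the raw Y values (reconstructed from the ranks of the example) are assumptions.

```python
def two_sided_tie_b(y, n1):
    """Difference between unaveraged and averaged rank sums of the tie that
    straddles the line separating the n1 highest ranks from the rest.

    Returns 0.0 when no tie crosses the line.
    """
    ys = sorted(y)
    n0 = len(ys) - n1
    # The line falls between sorted positions n0 and n0 + 1 (1-based).
    if n0 == 0 or n1 == 0 or ys[n0 - 1] != ys[n0]:
        return 0.0
    tied_value = ys[n0]
    positions = [i + 1 for i, v in enumerate(ys) if v == tied_value]
    mean_rank = sum(positions) / len(positions)
    upper = [p for p in positions if p > n0]  # tied ranks above the line
    return sum(upper) - mean_rank * len(upper)


def cureton_r_rb(mean_rank_group1, n0, n1, b):
    """Formula (15) with the two-sided tie constant b."""
    n = n0 + n1
    return (mean_rank_group1 - (n + 1) / 2) / (n0 / 2 - b / n1)


# Y values reconstructed from the ranks of the 10-element example (assumption).
y = [1, 2, 2, 4, 3, 3, 4, 5, 6, 6]
b = two_sided_tie_b(y, n1=6)
print(b)  # 0.5
print(abs(cureton_r_rb(42.5 / 6, 4, 6, b) - 19 / 23) < 1e-12)  # True
```

Only the tie that crosses the line enters *b*; ties located entirely above or below the line are ignored by this correction.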

It should be noted that Formula (15) given by Cureton [1] does not work well in all cases with a *two-sided* *tie*. In turn, the other formula based on the Mann-Whitney U statistic (Formula (16)) developed by Willson [2] also does not fit this definition of *b*. However, both formulas do achieve equality of results with Kendall’s tau-type formula when using the correction based on *bracket* *ties* given by Cureton [3] for *τ _{rb}*. The constant

$A-D=2\left[\left({\sum}_{i=1}^{{n}_{1}}{r}_{{y}_{i}|x=1}\right)-\frac{{n}_{1}\left(n+1\right)}{2}\right]=2\left({n}_{1}{\bar{r}}_{Y|X=1}-\frac{{n}_{1}\left(n+1\right)}{2}\right)$

Both sides of the equality are divided by *n _{0}n_{1}* − ∑ *n _{0l}* × *n _{1l}*, the denominator of Formula (5).

$\frac{A-D}{{n}_{0}{n}_{1}-{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}=\frac{2\left({n}_{1}{\bar{r}}_{Y|X=1}-\frac{{n}_{1}\left(n+1\right)}{2}\right)}{{n}_{0}{n}_{1}-{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}$

${\tau}_{rb}=\frac{2\left({n}_{1}{\bar{r}}_{Y|X=1}-\frac{{n}_{1}\left(n+1\right)}{2}\right)}{{n}_{0}{n}_{1}-{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}$

The numerator and denominator are multiplied by the inverse of *n _{1}*, that is, both are divided by *n _{1}*.

$=\frac{\frac{2}{{n}_{1}}\left({n}_{1}{\bar{r}}_{Y|X=1}-\frac{{n}_{1}\left(n+1\right)}{2}\right)}{\frac{1}{{n}_{1}}\left({n}_{0}{n}_{1}-{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}\right)}=\frac{2\left(\frac{{n}_{1}{\bar{r}}_{Y|X=1}}{{n}_{1}}-\frac{{n}_{1}\left(n+1\right)}{2{n}_{1}}\right)}{\frac{{n}_{0}{n}_{1}}{{n}_{1}}-\frac{{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}{{n}_{1}}}=\frac{2\left({\bar{r}}_{Y|X=1}-\frac{n+1}{2}\right)}{{n}_{0}-\frac{{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}{{n}_{1}}}$

$=\frac{{\bar{r}}_{Y|X=1}-\frac{n+1}{2}}{\frac{1}{2}\left({n}_{0}-\frac{{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}{{n}_{1}}\right)}=\frac{{\bar{r}}_{Y|X=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}{2{n}_{1}}}$

The new tie correction, which is denoted by *b**, is half of the sum of products between the frequencies in groups 0 and 1 of X of the *c* bracket ties (Formula (17)).

${b}^{*}=\frac{{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}}{2}$ (17)

The correction *b** is applied to the formula given by Cureton [1] without requiring any changes (Formula (18)) and gives the same result as his tau-type coefficient for rank-biserial correlation (Formula (5)).

${r}_{rb}^{*}=\frac{{\bar{r}}_{Y|X=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{{b}^{*}}{{n}_{1}}}={\tau}_{rb}$ (18)

It is also applied to the formula given by Willson [2] without requiring any changes (Formula (19)) and gives the same result as Cureton’s [3] tau-type coefficient (Formula (5)).

$\left|{r}_{rb}^{*}\right|=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2{b}^{*}}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)=\left|{\tau}_{rb}\right|$ (19)
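The step behind Formula (19) can be spelled out. The following is a sketch under an assumption that is consistent with the worked examples but is not stated in this form in the text: with averaged ranks, each pair of cases tied across the two groups adds 1/2 to both *U _{0}* and *U _{1}*, so that the smaller of the two is the smaller pair count plus *b**.

$U=\mathrm{min}\left(A,D\right)+\frac{1}{2}{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}=\mathrm{min}\left(A,D\right)+{b}^{*}$

$1-\frac{2U}{{n}_{0}{n}_{1}}=\frac{{n}_{0}{n}_{1}-2\mathrm{min}\left(A,D\right)-2{b}^{*}}{{n}_{0}{n}_{1}}=\frac{A+D-2\mathrm{min}\left(A,D\right)}{{n}_{0}{n}_{1}}=\frac{\left|A-D\right|}{{n}_{0}{n}_{1}}$

$\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2{b}^{*}}\times \frac{\left|A-D\right|}{{n}_{0}{n}_{1}}=\frac{\left|A-D\right|}{A+D}=\left|{\tau}_{rb}\right|$

The second and third lines use Formula (6) in the form *n _{0}n_{1}* = *A* + *D* + 2*b**. In the 10-element example, *D* = 2 and the single bracket-tie product equals 1, so *U* = 2 + 1/2 = 2.5, which is the value obtained from the rank sums.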

This new proposal is verified with the previous example of 10 elements in which Formula (15) of Cureton [1] and Formula (16) of Willson [2] work well with the constant *b* defined from the *two-sided* *tie*, that is, they achieve the same result as Cureton’s tau-type coefficient (Formula (5)) and, therefore, as the Goodman-Kruskal gamma [14]. The constant *b** is calculated by Formula (17) and used in Formulas (18) (rho-type coefficient) and (19) (U-statistic-based coefficient).

${b}^{*}=\frac{{\sum}_{l=1}^{1}{n}_{0l}\times {n}_{1l}}{2}=\frac{{n}_{01}\times {n}_{11}}{2}=\frac{1\times 1}{2}=\frac{1}{2}=0.5$

$\begin{array}{c}{r}_{rb}^{*}=\frac{{\bar{r}}_{Y|X=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{{b}^{*}}{{n}_{1}}}=\frac{7.08\overline{3}-5.5}{\frac{4}{2}-\frac{0.5}{6}}=\frac{1.58\overline{3}}{1.91\overline{6}}={\tau}_{rb}=\frac{A-D}{{n}_{0}{n}_{1}-{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}\\ =\frac{19}{23}\approx 0.8261\end{array}$

$U=\mathrm{min}\left({U}_{0},{U}_{1}\right)=\mathrm{min}\left(21.5,2.5\right)=2.5$

$\begin{array}{c}\left|{r}_{rb}^{*}\right|=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2{b}^{*}}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)=\frac{4\times 6}{4\times 6-2\times 0.5}\times \left(1-\frac{2\times 2.5}{4\times 6}\right)\\ =\frac{24}{23}\times \left(1-\frac{5}{24}\right)=\frac{24}{23}\times 0.791\overline{6}=\left|{\tau}_{rb}\right|=\frac{19}{23}\approx 0.8261\end{array}$

A new seven-element example with two *bracket* *ties* and a *two-sided* *tie* is presented. With these data, Formula (15) of Cureton [1] and Formula (16) of Willson [2] fail to coincide with *τ _{rb}* (Formula (5)), whereas Formulas (18) and (19), with the constant *b**, do coincide.

Table 4. Joint frequencies between X (dichotomy) per row and Y (ordinal variable) per column and marginal frequencies of X.

*Note*. *Bracket* *ties* or values of Y that are repeated in both groups of X are indicated in parentheses, *n _{i}* = ∑

We start by calculating Cureton’s tau-type coefficient [3], for which agreements or number of concordant pairs and disagreements or number of discordant pairs are computed using Formulas (2) and (3).

$A=1\times \left(2+1+1\right)+1\times \left(1+1\right)+1\times 1=4+2+1=7$

$D=0\times \left(0+2+1\right)+1\times \left(0+2\right)+1\times 0=0+2+0=2$

*Bracket* *ties* or Y values that are repeated in the two groups of X are identified: values 2 and 3 of Y.

Tie correction or sum of products between the frequencies of the *bracket* *ties* in groups 0 and 1 of X is obtained (subtrahend of Formula (6)).

${\sum}_{l=1}^{2}{n}_{0l}\times {n}_{1l}=1\times 2+1\times 1=3$

After these calculations, Kendall’s tau-type coefficient for the rank-biserial correlation can be calculated using Formula (5), which yields 5/9, that is, 0.5 with the digit 5 repeating.

${\tau}_{rb}=\frac{A-D}{{n}_{0}\times {n}_{1}-{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}=\frac{7-2}{3\times 4-3}=\frac{5}{9}=0.\overline{5}$

The statement of Formula (5) that the *τ _{rb}* [3] coincides with the value of the Goodman-Kruskal gamma [14] is verified.

$\gamma =\frac{A-D}{A+D}=\frac{7-2}{7+2}=\frac{5}{9}=0.\overline{5}={\tau}_{rb}$

In Table 5, seven sample data are sorted in descending order from the ranks of Y (with averaged ranks in case of ties). When the four highest ranks are separated from the three lowest by a horizontal line, a *two-sided* *tie* is discovered, which corresponds to a Y value of 2. The sum of the unaveraged ranks of the *two-sided* *tie* among the four highest ranks is 4 and the corresponding sum of averaged ranks is 3, so the value of *b*, the difference between both sums, is 1.

*Two-sided* *tie*: value 2 of Y.

$b=4-3=1$

The corrected Spearman’s rho-type coefficient [1] is calculated according to Formula (15).

${r}_{rb}=\frac{{\bar{r}}_{Y|X=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{b}{{n}_{1}}}=\frac{4.625-4}{\frac{3}{2}-\frac{1}{4}}=\frac{0.625}{1.25}=0.5\ne {\tau}_{rb}=0.\overline{5}$

The corrected formula for rank-biserial correlation from the U-statistic [2] is computed according to Formula (16). The sums of ranks *SR _{Y}*

$S{R}_{Y|X=0}={\displaystyle {\sum}_{i=1}^{3}{r}_{yi}}=1+3+5.5=9.5$

$S{R}_{Y|X=1}={\displaystyle {\sum}_{i=1}^{4}{r}_{yi}}=3+3+5.5+7=18.5$

Table 5. Random sample of seven elements sorted in descending order from Y-ranks.

*Note*. *i* = sampling order, *x _{i}* = group membership of element

${U}_{0}={n}_{0}{n}_{1}+\frac{{n}_{0}\left({n}_{0}+1\right)}{2}-S{R}_{Y|X=0}=3\times 4+\frac{3\times 4}{2}-9.5=8.5$

${U}_{1}={n}_{0}{n}_{1}+\frac{{n}_{1}\left({n}_{1}+1\right)}{2}-S{R}_{Y|X=1}=3\times 4+\frac{4\times 5}{2}-18.5=3.5$

$U=\mathrm{min}\left({U}_{0},{U}_{1}\right)=\mathrm{min}\left(8.5,3.5\right)=3.5$

$\begin{array}{c}\left|{r}_{rb}\right|=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2b}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)=\frac{3\times 4}{3\times 4-2\times 1}\times \left(1-\frac{2\times 3.5}{3\times 4}\right)\\ =1.2\times 0.41\overline{6}=0.5\ne \left|{\tau}_{rb}\right|=0.\overline{5}\end{array}$

Formula (15) of Cureton [1] and Formula (16) of Willson [2] give the same result, 0.5, but differ from the result of Cureton’s *τ _{rb}* (Formula (5)), which is 5/9 (0.5 with the digit 5 repeating). However, equality is recovered with the constant *b**, which is calculated using Formula (17).

${b}^{*}=\frac{{\displaystyle {\sum}_{l=1}^{2}{n}_{0l}\times {n}_{1l}}}{2}=\frac{1\times 2+1\times 1}{2}=\frac{3}{2}=1.5$

Next, with this constant *b**, Formula (18) (rho-type coefficient) and Formula (19) (U-statistic-based coefficient) are used to compute the rank-biserial correlation, and both yield 5/9 (0.5 with the digit 5 repeating).

$\begin{array}{c}{r}_{rb}^{*}=\frac{{\bar{r}}_{Y|X=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{{b}^{*}}{{n}_{1}}}=\frac{4.625-4}{\frac{3}{2}-\frac{1.5}{4}}=\frac{0.625}{1.125}=0.\overline{5}={\tau}_{rb}=\frac{A-D}{{n}_{0}{n}_{1}-{\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}\\ =\frac{5}{9}=0.\overline{5}=\gamma =\frac{A-D}{A+D}=\frac{5}{9}=0.\overline{5}\end{array}$

$\begin{array}{c}\left|{r}_{rb}^{*}\right|=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2{b}^{*}}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)=\frac{3\times 4}{3\times 4-2\times 1.5}\times \left(1-\frac{2\times 3.5}{3\times 4}\right)\\ =\frac{12}{9}\times \left(1-\frac{7}{12}\right)=1.\overline{3}\times 0.41\overline{6}=0.\overline{5}=\left|{\tau}_{rb}\right|=\left|\gamma \right|\end{array}$
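The new correction lends itself to a direct implementation from the 2 × *k* table. The sketch below is illustrative: the function names are hypothetical, ranks are assumed to be ascending with averaged ranks for ties, and the frequency rows of the seven-element example are reconstructed from the worked sums for *A* and *D* (the body of Table 4 is not reproduced here). Exact fractions avoid rounding when comparing with 5/9.

```python
from fractions import Fraction


def b_star(row0, row1):
    """Formula (17): half the sum of products of bracket-tie frequencies."""
    return Fraction(sum(a * b for a, b in zip(row0, row1)), 2)


def mean_rank_group1(row0, row1):
    """Mean averaged rank of group 1, with Y values ranked in ascending order."""
    total = Fraction(0)
    seen = 0  # number of cases with a lower Y value
    for a, b in zip(row0, row1):
        t = a + b
        avg_rank = seen + Fraction(t + 1, 2)  # mean of ranks seen+1 .. seen+t
        total += b * avg_rank
        seen += t
    return total / sum(row1)


def r_rb_star(row0, row1):
    """Formula (18): rho-type coefficient with the bracket-tie constant b*."""
    n0, n1 = sum(row0), sum(row1)
    numerator = mean_rank_group1(row0, row1) - Fraction(n0 + n1 + 1, 2)
    denominator = Fraction(n0, 2) - b_star(row0, row1) / n1
    return numerator / denominator


# Seven-element example; rows reconstructed from the worked sums (assumption).
row0 = [1, 1, 1, 0]  # X = 0
row1 = [0, 2, 1, 1]  # X = 1

print(b_star(row0, row1))     # 3/2
print(r_rb_star(row0, row1))  # 5/9
```

Because *b** depends only on the joint frequencies, no sorting of the raw data or inspection of a dividing line is needed, which is the practical difference from the *two-sided* *tie* constant.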

5. Asymptotic Standard Error with Tied Data

From the relationship between *r _{rb}* and the Mann-Whitney U test [5], Willson [2] derives an asymptotic standard error for the significance test (Formula (20)) and interval estimation when there are ties in the sample data for Y. This error also applies to the new definition of

$AS{E}_{{r}_{rb}}=\sqrt{\frac{{n}^{3}-n-{\sum}_{l=1}^{k}\left({t}_{{x}_{l}}^{3}-{t}_{{x}_{l}}\right)}{3n\left(n-1\right){n}_{0}{n}_{1}}}$ (20)

*t _{l}* = non-unit frequencies of the Y values (of the

$\begin{array}{l}P({r}_{rb}-{z}_{1-\frac{\alpha}{2}}\times \sqrt{\frac{{n}^{3}-n-{{\displaystyle \sum}}_{l=1}^{k}\left({t}_{{x}_{l}}^{3}-{t}_{{x}_{l}}\right)}{3n\left(n-1\right){n}_{0}{n}_{1}}}\le {\rho}_{rb}\\ \le {r}_{rb}+{z}_{1-\frac{\alpha}{2}}\times \sqrt{\frac{{n}^{3}-n-{{\displaystyle \sum}}_{l=1}^{k}\left({t}_{{x}_{l}}^{3}-{t}_{{x}_{l}}\right)}{3n\left(n-1\right){n}_{0}{n}_{1}}})=1-\alpha \end{array}$
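A short sketch of Formula (20) follows. It assumes that the sum runs over the frequencies of all distinct Y values, since a value that occurs once contributes 1³ − 1 = 0; the function name and the tie frequencies (taken from the 10-element example as reconstructed from its ranks) are assumptions for illustration.

```python
import math


def ase_tied(tie_sizes, n0, n1):
    """Formula (20): asymptotic standard error of r_rb with ties in Y.

    tie_sizes holds the frequency of every distinct Y value.
    """
    n = n0 + n1
    if sum(tie_sizes) != n:
        raise ValueError("tie sizes must add up to n0 + n1")
    correction = sum(t**3 - t for t in tie_sizes)
    return math.sqrt((n**3 - n - correction) / (3 * n * (n - 1) * n0 * n1))


# Frequencies of the six Y values in the 10-element example (assumption).
tied = ase_tied([1, 2, 2, 2, 1, 2], n0=4, n1=6)
untied = ase_tied([1] * 10, n0=4, n1=6)

print(tied < untied)  # True: ties reduce the standard error
print(abs(untied - math.sqrt(11 / (3 * 4 * 6))) < 1e-12)  # True: reduces to Formula (13)
```

With no ties the expression simplifies to the untied standard error, because *n*³ − *n* = *n*(*n* − 1)(*n* + 1).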

6. Statistical Significance with Small Samples

In the case of small samples, a critical value for *r _{rb}* can be used to test its statistical significance [2]. The calculation of the critical value of the rank-biserial correlation (with a sample size of

${H}_{0}:{\rho}_{rb}=0$

${H}_{1}:{\rho}_{rb}\ne 0$

$\left|{r}_{crit}\right|=1-2\times \frac{{}_{\alpha}U{}_{{n}_{0}{n}_{1}}}{{n}_{0}{n}_{1}}\in \left[0,1\right]$ (21)

If $\left|{r}_{rb}\right|\le \left|{r}_{crit}\right|$, ${H}_{0}$ is retained.

If $\left|{r}_{rb}\right|>\left|{r}_{crit}\right|$, ${H}_{0}$ is rejected.

The previous example is taken up again with its small sample of 10 elements (four in group 0 and six in group 1). If the test is two-tailed with a significance level of 0.05, the rank-biserial correlation coefficient is significant. The |*r _{rb}*| value is greater than the critical value, so the null hypothesis of no correlation is rejected. The critical value for

$\left|{r}_{crit}\right|=1-2\times \frac{{}_{\alpha =0.05}U{}_{{n}_{0}=4,{n}_{1}=6}}{{n}_{0}{n}_{1}}=1-2\times \frac{5}{4\times 6}=1-0.41\overline{6}=0.58\overline{3}$

${r}_{rb}=\frac{19}{23}\approx 0.8261>\left|{r}_{crit}\right|=0.58\overline{3}$

In the second example with an even smaller sample of seven elements (three in group 0 and four in group 1), the critical value of *U *is not defined at a significance level of 0.05 in a two-tailed test, so the critical value for |*r _{rb}*| cannot be obtained. However, it is defined at a significance level of 0.1 in a one-tailed test (

$\left|{r}_{crit}\right|=1-2\times \frac{{}_{\alpha /2=0.1}U{}_{{n}_{0}=3,{n}_{1}=4}}{{n}_{0}{n}_{1}}=1-2\times \frac{1}{3\times 4}=1-0.1\overline{6}=0.8\overline{3}$

${r}_{rb}=0.\overline{5}<\left|{r}_{crit}\right|=0.8\overline{3}$
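The decision rule built on Formula (21) can be sketched as follows. The function names are hypothetical, and the critical values of *U* (5 and 1) are simply the ones used in the two worked examples above; they would normally be read from a table of critical values, which is not reproduced here.

```python
def critical_r(u_crit, n0, n1):
    """Formula (21): critical value of |r_rb| from a critical value of U."""
    return 1 - 2 * u_crit / (n0 * n1)


def is_significant(r_rb, u_crit, n0, n1):
    """Reject H0 (no correlation) when |r_rb| exceeds the critical value."""
    return abs(r_rb) > critical_r(u_crit, n0, n1)


# First example: n0 = 4, n1 = 6, critical U = 5 as used in the text.
print(is_significant(19 / 23, u_crit=5, n0=4, n1=6))  # True

# Second example: n0 = 3, n1 = 4, critical U = 1 as used in the text.
print(is_significant(5 / 9, u_crit=1, n0=3, n1=4))    # False
```

The comparison uses the absolute value of the coefficient, so the same rule applies to negative correlations.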

7. Example with a Clinical Sample of 31 Participants

A third example is presented with a larger clinical sample than the previous ones. A random sample of 31 middle-aged women, 20 with diabetes mellitus and 11 without diabetes, was recruited and their socioeconomic status, SES = {1 = low, 2 = medium-low, 3 = medium, 4 = medium-high, and 5 = high}, was recorded. The objective was to find out whether the relationship between clinical diabetes mellitus status, DM = {0 = no case, 1 = case}, and socioeconomic status is significant at a significance level of 0.05 in a two-tailed test, using the rank-biserial correlation.

The data of the 31 participants are shown in Table 6. In this 2 × 5 contingency table, the dichotomous variable of health status (0 = without diabetes and 1 = with diabetes) is placed per row and the ordinal variable of socioeconomic status with five ordered categories is arranged per column. This contingency table is made to facilitate the computation of agreements (*A*) and disagreements (*D*), the identification of *bracket* *ties* (SES values that are repeated in both groups of DM), and the calculation of the correction based on *bracket* *ties*. All of them are calculations required by Kendall’s tau-type coefficient given by Cureton [3] to estimate the rank-biserial correlation (Formula (5)).

Table 6. Joint frequencies between diabetes mellitus (dichotomy) per row and socioeconomic status (ordinal variable) per column and marginal frequencies of diabetes mellitus.

*Note*. Diabetes mellitus (DM): 0 = no case and 1 = case, socioeconomic status (SES): 1 = low, 2 = medium-low, 3 = medium, 4 = medium-high, and 5 = high. *Bracket* *ties* are identified with the ordered categories of SES that are repeated in the two DM groups and are placed in parentheses, *n _{i}* = ∑

The sum of products between each of the first four frequencies (1 to 4) in the first row and the sum of the remaining frequencies to the right and below (in the second row) after removing the row and column of the frequency provides the agreements or concordant pairs (Formula (2)).

$A=2\times \left(9+5+0+0\right)+2\times \left(5+0+0\right)+4\times \left(0+0\right)+2\times 0=28+10+0+0=38$

The sum of products between each of the last four frequencies (from *k* to 2) in the first row and the sum of the remaining frequencies to the left and below (in the second row) after removing the row and column of the frequency provides the disagreements or discordant pairs (Formula (3)).

$\begin{array}{c}D=1\times \left(6+9+5+0\right)+2\times \left(6+9+5\right)+4\times \left(6+9\right)+2\times 6\\ =20+40+60+12=132\end{array}$

The *bracket* *ties* in this example are the SES values 1, 2, and 3. The sum of products between the frequencies in groups 0 and 1 of DM (dichotomy) of the *bracket* *ties* in SES (ordinal variable) provides the correction that appears in the denominator of Formula (5) as a subtrahend.

${\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}=2\times 6+2\times 9+4\times 5=12+18+20=50$

The coefficient is calculated following Formula (5), as there are *bracket* *ties*.

${\tau}_{rb}=\frac{A-D}{{n}_{0}\times {n}_{1}-{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}}=\frac{38-132}{11\times 20-50}=\frac{-94}{170}=-\frac{47}{85}\approx -0.5529$

It is found that the rank-biserial correlation, estimated by Kendall’s tau-type coefficient given by Cureton [3], yields the same result as the Goodman-Kruskal gamma [14], as stated by Formula (5), and this value corresponds to a large strength of association, that is, greater than 0.50 and less than 0.70 [18].

$\gamma =\frac{A-D}{A+D}=\frac{38-132}{38+132}=\frac{-94}{170}=-\frac{47}{85}\approx -0.5529$

${\tau}_{rb}=\gamma $
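As a hedged illustration of why the two values coincide, the sketch below computes the bracket-tie-corrected tau-type coefficient and gamma with exact fractions. It assumes the Table 6 frequencies used in the calculations above; the helper names are hypothetical.

```python
from fractions import Fraction


def pair_counts(row0, row1):
    """Return agreements A, disagreements D, and pairs tied per column E."""
    k = len(row0)
    A = sum(row0[j] * sum(row1[j + 1:]) for j in range(k))
    D = sum(row0[j] * sum(row1[:j]) for j in range(k))
    E = sum(a * b for a, b in zip(row0, row1))  # only bracket ties are nonzero
    return A, D, E


def tau_rb_bracket(row0, row1):
    """Tau-type coefficient with the bracket-tie sum subtracted from n0 * n1."""
    A, D, E = pair_counts(row0, row1)
    return Fraction(A - D, sum(row0) * sum(row1) - E)


def goodman_kruskal_gamma(row0, row1):
    A, D, _ = pair_counts(row0, row1)
    return Fraction(A - D, A + D)


dm0 = [2, 2, 4, 2, 1]
dm1 = [6, 9, 5, 0, 0]
tau = tau_rb_bracket(dm0, dm1)
gamma = goodman_kruskal_gamma(dm0, dm1)
print(tau, gamma, round(float(tau), 4))  # -47/85 -47/85 -0.5529
```

The equality holds because every one of the $n_0 \times n_1$ cross-group pairs is an agreement, a disagreement, or a tie, so subtracting the tied pairs from $n_0 \times n_1$ leaves exactly $A + D$.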

In Table 7, the data from the 31 participants are ranked. Unaveraged ranks appear in one column and averaged ranks (in case of ties) in another. The data are sorted in descending order of the Y (SES) ranks. The 20 highest ranks are separated from the 11 lowest by a horizontal line, thus uncovering a *two-sided* *tie* corresponding to the SES value of 2 (medium-low). The sum of the unaveraged ranks of the *two-sided* *tie* among the highest ranks is 124 and the corresponding sum of averaged ranks is 112, so the value of *b*, the difference between the two sums, is 12.

*Two-sided* *tie*: value 2 (medium-low) of SES

$b=124-112=12$
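The value of *b* can be reproduced from the SES marginal frequencies alone. The sketch below is one reading of the procedure described above, not code from the article: it assumes the marginals 8, 11, 9, 2, 1 (the column sums implied by the Table 6 frequencies) and that unaveraged ranks are assigned consecutively from the lowest category. The function name `two_sided_tie_b` is illustrative.

```python
def two_sided_tie_b(marginals, n_low):
    """Return b for the tied category that straddles the cut after rank n_low.

    marginals: frequencies of the ordered categories, lowest category first.
    n_low: number of lowest ranks placed below the horizontal line.
    """
    start = 0  # last unaveraged rank used by the previous categories
    for t in marginals:
        first, last = start + 1, start + t  # unaveraged ranks of this category
        if first <= n_low < last:  # the category has ranks on both sides of the cut
            upper = range(n_low + 1, last + 1)  # its ranks among the highest ones
            mean_rank = (first + last) / 2  # averaged rank shared by the category
            return sum(upper) - mean_rank * len(upper)
        start = last
    return 0.0  # the cut falls between two categories: no two-sided tie


ses_marginals = [8, 11, 9, 2, 1]  # SES 1..5
b = two_sided_tie_b(ses_marginals, n_low=11)
print(b)  # 12.0
```

Here SES = 2 occupies the unaveraged ranks 9 to 19 with averaged rank 14; its ranks 12 to 19 lie above the cut, giving 124 − 8 × 14 = 12.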

The corrected Spearman’s rho-type coefficient is calculated according to Formula (15).

${r}_{rb}=\frac{{\bar{r}}_{SES|DM=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{b}{{n}_{1}}}=\frac{\frac{273}{20}-\frac{31+1}{2}}{\frac{11}{2}-\frac{12}{20}}=\frac{-2.35}{4.9}\approx -0.4796\ne {\tau}_{rb}=-\frac{47}{85}\approx -0.5529$

The corrected formula for rank-biserial correlation from U-statistic is computed using Formula (16). The sums of ranks *SR _{SES}*

$S{R}_{SES|DM=0}={\displaystyle {\sum}_{i=1}^{11}{r}_{yi}}=31+2\times 29.5+4\times 24+2\times 14+2\times 4.5=223$

$S{R}_{SES|DM=1}={\displaystyle {\sum}_{i=1}^{20}{r}_{yi}}=5\times 24+9\times 14+6\times 4.5=273$

${U}_{0}={n}_{0}{n}_{1}+\frac{{n}_{0}\left({n}_{0}+1\right)}{2}-S{R}_{SES|DM=0}=11\times 20+\frac{11\times 12}{2}-223=63$

${U}_{1}={n}_{0}{n}_{1}+\frac{{n}_{1}\left({n}_{1}+1\right)}{2}-S{R}_{SES|DM=1}=11\times 20+\frac{20\times 21}{2}-273=157$

$U=\mathrm{min}\left({U}_{0},{U}_{1}\right)=\mathrm{min}\left(63,157\right)=63$

$\begin{array}{c}\left|{r}_{rb}\right|=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2b}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)=\frac{11\times 20}{11\times 20-2\times 12}\times \left(1-\frac{2\times 63}{11\times 20}\right)\\ =\frac{220}{196}\times 0.4\overline{27}=\frac{55}{49}\times 0.4\overline{27}\approx 0.4796\ne \left|{\tau}_{rb}\right|=\frac{47}{85}\approx 0.5529\end{array}$
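The U-based calculation with the *two-sided* *tie* correction can be sketched as follows. This is a minimal illustration that takes the rank sums 223 and 273 and *b* = 12 from the text as given; the function names are hypothetical.

```python
def mann_whitney_u(n0, n1, sr0, sr1):
    """Return U0, U1, and U = min(U0, U1) from the two rank sums."""
    u0 = n0 * n1 + n0 * (n0 + 1) / 2 - sr0
    u1 = n0 * n1 + n1 * (n1 + 1) / 2 - sr1
    return u0, u1, min(u0, u1)


def abs_r_rb_from_u(n0, n1, u, b):
    """Absolute rank-biserial correlation from U with a tie correction b."""
    return n0 * n1 / (n0 * n1 - 2 * b) * (1 - 2 * u / (n0 * n1))


u0, u1, u = mann_whitney_u(n0=11, n1=20, sr0=223, sr1=273)
r_two_sided = abs_r_rb_from_u(11, 20, u, b=12)
print(u0, u1, u)  # 63.0 157.0 63.0
print(round(r_two_sided, 4), round(47 / 85, 4))  # 0.4796 0.5529
```

The two printed values differ, which is the mismatch with the tau-type coefficient that this example is meant to expose.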

It is important to clarify that Glass [4], in his proofs, uses an operational assignment of the values of the dichotomous variable of membership: {0 = group with the lowest sum of ranks or lowest ranks, 1 = group with the highest sum of ranks or higher ranks}. However, this classification can be reversed: the rank-biserial correlation coefficient will be positive or negative, but with the same absolute value, depending on how the labels 0 and 1 are assigned to the two values of the dichotomy. In clinical research, 0 is often used for no cases and 1 for cases of a disease or disorder, as in this example, although these labels are ultimately arbitrary.

The rank-biserial correlation is recalculated with Spearman’s rho-type coefficient [1], using the tie correction based on *bracket* *ties* (Formula (17)). Applying this correction according to Formula (18), the results are the same as Kendall’s tau-type coefficient developed by Cureton [3], as was the goal of Cureton [1] and Willson [2].

Table 7. Random sample of 31 participants sorted in descending order from the SES ranks, and SES marginal frequencies.

*Note*. *i* = sampling order, DM* _{i}* = group membership to participant

*b** = half of the sum of products between the frequencies in groups 0 (no case) and 1 (case) of DM in the *c* *bracket* *ties* of SES (Formula (17)).

${b}^{*}=\frac{{\displaystyle {\sum}_{l=1}^{3}{n}_{0l}\times {n}_{1l}}}{2}=\frac{2\times 6+2\times 9+4\times 5}{2}=\frac{50}{2}=25$

${r}_{rb}^{*}=\frac{{\bar{r}}_{SES|DM=1}-\frac{n+1}{2}}{\frac{{n}_{0}}{2}-\frac{{b}^{*}}{{n}_{1}}}=\frac{13.65-16}{\frac{11}{2}-\frac{25}{20}}=\frac{-2.35}{4.25}={\tau}_{rb}=\gamma =-\frac{47}{85}\approx -0.5529$

The same result is obtained, although in absolute value, if Willson’s formula with the Mann-Whitney U statistic [5] is used to estimate the rank-biserial correlation (Formula (19)), applying the correction *b** based on *bracket* *ties* instead of the correction *b* based on a *two-sided* *tie*.

${U}_{0}={n}_{0}{n}_{1}+\frac{{n}_{0}\left({n}_{0}+1\right)}{2}-S{R}_{SES|DM=0}=11\times 20+\frac{11\times 12}{2}-223=63$

${U}_{1}={n}_{0}{n}_{1}+\frac{{n}_{1}\left({n}_{1}+1\right)}{2}-S{R}_{SES|DM=1}=11\times 20+\frac{20\times 21}{2}-273=157$

$U=\mathrm{min}\left({U}_{0},{U}_{1}\right)=63$

$\begin{array}{c}\left|{r}_{rb}^{*}\right|=\frac{{n}_{0}{n}_{1}}{{n}_{0}{n}_{1}-2{b}^{*}}\times \left(1-\frac{2U}{{n}_{0}{n}_{1}}\right)=\frac{11\times 20}{11\times 20-2\times 25}\times \left(1-\frac{2\times 63}{11\times 20}\right)\\ =\frac{220}{170}\times \left(1-\frac{126}{220}\right)=\frac{22}{17}\times 0.4\overline{27}=\left|{\tau}_{rb}\right|=\left|\gamma \right|\approx 0.5529\end{array}$
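The agreement of the rho-type and U-based formulas under the *bracket* *ties* correction can be verified exactly. The sketch below is illustrative only: it assumes the Table 6 frequencies used in the calculations above, derives averaged ranks from the column totals, and uses exact fractions so that the equalities are not blurred by rounding. All variable names are hypothetical.

```python
from fractions import Fraction

row0 = [2, 2, 4, 2, 1]  # DM = 0, SES 1..5
row1 = [6, 9, 5, 0, 0]  # DM = 1, SES 1..5
n0, n1 = sum(row0), sum(row1)
n = n0 + n1

# Averaged rank of each SES category, from the column totals.
mid_ranks, start = [], 0
for t in (f0 + f1 for f0, f1 in zip(row0, row1)):
    mid_ranks.append(Fraction(2 * start + t + 1, 2))
    start += t

sr0 = sum(f * r for f, r in zip(row0, mid_ranks))  # rank sum of DM = 0
sr1 = sum(f * r for f, r in zip(row1, mid_ranks))  # rank sum of DM = 1

# b*: half of the sum of products of the paired frequencies (bracket ties).
b_star = Fraction(sum(f0 * f1 for f0, f1 in zip(row0, row1)), 2)

# Rho-type coefficient with b* in place of b.
rho_type = (sr1 / n1 - Fraction(n + 1, 2)) / (Fraction(n0, 2) - b_star / n1)

# U-based coefficient (absolute value) with b* in place of b.
u = min(n0 * n1 + Fraction(n0 * (n0 + 1), 2) - sr0,
        n0 * n1 + Fraction(n1 * (n1 + 1), 2) - sr1)
u_type = Fraction(n0 * n1) / (n0 * n1 - 2 * b_star) * (1 - 2 * u / (n0 * n1))

print(sr0, sr1, b_star, u)  # 223 273 25 63
print(rho_type, u_type)     # -47/85 47/85
```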

Once the rank-biserial correlation has been point-estimated, its asymptotic standard error (*ASE*) is calculated using Formula (20). This approximation is adequate, since the group with diabetes (DM = 1) has 20 participants and the group without diabetes (DM = 0) has more than 8 (*n _{0}* = 11), giving a total sample of 31 participants [2]. See Table 7 for the SES marginal frequencies used to calculate the tie correction in the numerator of Formula (20).

$\begin{array}{c}AS{E}_{{r}_{rb}}=\sqrt{\frac{{n}^{3}-n-{{\displaystyle \sum}}_{l=1}^{k}\left({t}_{{x}_{l}}^{3}-{t}_{{x}_{l}}\right)}{3n\left(n-1\right){n}_{0}{n}_{1}}}=\sqrt{\frac{{31}^{3}-31-2550}{3\times 31\times 30\times 11\times 20}}\\ =\sqrt{0.0443304}\approx 0.2105\end{array}$
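The standard error above, including the tie term of 2550, can be reproduced with a few lines. This is a minimal sketch of the displayed formula, assuming the SES marginal frequencies 8, 11, 9, 2, 1 implied by the Table 6 column totals; the function name is illustrative.

```python
import math


def ase_rank_biserial(marginals, n0, n1):
    """Asymptotic standard error with the tie term from the marginal frequencies."""
    n = n0 + n1
    tie_term = sum(t ** 3 - t for t in marginals)
    variance = (n ** 3 - n - tie_term) / (3 * n * (n - 1) * n0 * n1)
    return math.sqrt(variance), tie_term


ase, tie_term = ase_rank_biserial([8, 11, 9, 2, 1], n0=11, n1=20)
print(tie_term, round(ase, 4))  # 2550 0.2105
```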

The statistical significance of the rank-biserial correlation coefficient is tested. The statistical hypothesis is two-tailed.

${H}_{0}:{\rho}_{rb}=0$

${H}_{1}:{\rho}_{rb}\ne 0$

The z-statistic in absolute value is greater than the critical value and its probability value is less than the significance level, so the null hypothesis of zero correlation at a significance level of 0.05 in a two-tailed test is rejected.

$\left|Z\right|=\frac{\left|{r}_{rb}\right|}{AS{E}_{{r}_{rb}}}=\frac{0.5529}{0.2105}\approx 2.6262>{z}_{0.975}=1.96$

$sig.=2\times \left(1-P\left(Z\le \left|z\right|\right)\right)=2\times \left(1-P\left(Z\le 2.6262\right)\right)=2\times 0.0043=0.0086<\alpha =0.05$

The rank-biserial coefficient is also estimated by an interval with a 95% confidence level. This confidence interval does not include 0, since the correlation is significant.

$P\left({r}_{rb}-{z}_{1-\frac{\alpha}{2}}\times AS{E}_{{r}_{rb}}\le {\rho}_{rb}\le {r}_{rb}+{z}_{1-\frac{\alpha}{2}}\times AS{E}_{{r}_{rb}}\right)=1-\alpha $

$P\left(-0.5529-1.96\times 0.2105\le {\rho}_{rb}\le -0.5529+1.96\times 0.2105\right)=0.95$

$P\left({\rho}_{rb}\in \left[-0.9656,-0.1403\right]\right)=0.95$
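The test and the interval can be recomputed from the unrounded estimate and standard error. The following sketch uses only the Python standard library (`statistics.NormalDist`, available from Python 3.8); it assumes the normal approximation described in the text and the values −47/85 and 27210/613800 obtained in the preceding calculations.

```python
import math
from statistics import NormalDist

r_rb = -47 / 85                       # point estimate with the bracket-tie correction
ase = math.sqrt(27210 / 613800)       # asymptotic standard error computed above
alpha = 0.05

std_normal = NormalDist()             # standard normal distribution
z = abs(r_rb) / ase
p_two_tailed = 2 * (1 - std_normal.cdf(z))
z_crit = std_normal.inv_cdf(1 - alpha / 2)

ci_low = r_rb - z_crit * ase
ci_high = r_rb + z_crit * ase

print(round(z, 4), round(z_crit, 2))
print(p_two_tailed < alpha)
print(round(ci_low, 4), round(ci_high, 4))
```

Working from unrounded inputs avoids the small drift that appears when 0.5529 and 0.2105 are divided directly.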

It should be noted that when the rank-biserial correlation is calculated using the Spearman’s rho-type coefficient given by Glass (Formula (7)) with current data, having SES ties in both DM groups, this coefficient yields the same result as the Somers [17] asymmetric d coefficient of the ordinal variable with respect to the dichotomous variable (*d _{SES}*

$\begin{array}{c}{r}_{rb}=\frac{2\left({\bar{r}}_{SES|DM=1}-{\bar{r}}_{SES|DM=0}\right)}{n}=\frac{2\times \left(\frac{273}{20}-\frac{223}{11}\right)}{31}\\ =\frac{2\times \left(13.65-20.\overline{27}\right)}{31}=-0.4\overline{27}\end{array}$

For the computation of the Somers [17] asymmetric measure of the ordinal variable (SES) with respect to the dichotomous variable (DM), data from Table 6 are used following the Formula (8). Apart from agreements (Formula (2)) and disagreements (Formula (3)), it is necessary to calculate ties per column or non-concordant and non-discordant pairs that are tied in SES. They are obtained through the sum of products between the two frequencies in each column of the 2 × 5 contingency table (Formula (9) using data of Table 6). This table has the two values of DM (0 = no case and 1 = case) per row and the five ordered categories of SES per column.

${d}_{SES|DM}=\frac{A-D}{A+D+{E}_{SES}}=\frac{38-132}{38+132+50}=\frac{-94}{220}=-0.4\overline{27}$

${E}_{SES}={\displaystyle {\sum}_{j=1}^{k}{n}_{0j}\times {n}_{1j}}=2\times 6+2\times 9+4\times 5+2\times 0+1\times 0=50={\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}$

${r}_{rb}=-0.4\overline{27}={d}_{SES|DM}\ne \gamma =-\frac{47}{85}\approx -0.5529$
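The coincidence between Glass’ formula and Somers’ asymmetric *d* in this sample can be confirmed with exact arithmetic. The sketch below is illustrative, assuming the Table 6 frequencies and the rank sums 223 and 273 reported in the text; the variable names are not from the article.

```python
from fractions import Fraction

row0 = [2, 2, 4, 2, 1]  # DM = 0, SES 1..5
row1 = [6, 9, 5, 0, 0]  # DM = 1, SES 1..5
n0, n1 = sum(row0), sum(row1)
k = len(row0)

A = sum(row0[j] * sum(row1[j + 1:]) for j in range(k))  # agreements
D = sum(row0[j] * sum(row1[:j]) for j in range(k))      # disagreements
E = sum(f0 * f1 for f0, f1 in zip(row0, row1))          # ties per column

somers_d = Fraction(A - D, A + D + E)

# Glass' formula from the mean averaged ranks of the two groups.
mean_r0 = Fraction(223, n0)
mean_r1 = Fraction(273, n1)
glass_r = 2 * (mean_r1 - mean_r0) / (n0 + n1)

gamma = Fraction(A - D, A + D)

print(somers_d, glass_r, gamma)  # -47/110 -47/110 -47/85
```

The value −47/110 is the exact form of the repeating decimal −0.4272727….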

8. The Other Way Around: Tau-Type, Rho-Type, and U-Statistic-Based Formulas for Rank-Biserial Correlation Equivalent to Somers’ *d _{Y}*

If any frequency per column is null in Table 6, the summand corresponding to that socioeconomic status category (untied value) is canceled within the sum of products in the computation ties per column, so *E _{SES}* equals the sum of products between the frequencies of the

$A+D+{E}_{Y}=A+D+{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}\times {n}_{1l}}={n}_{0}\times {n}_{1}=220$

If the difference between agreements and disagreements is included as a numerator, we have Cureton’s *τ _{rb}* calculated from Formula (1) in this random sample of 31 participants with tied data.

${\tau}_{br}=\frac{A-D}{{n}_{0}\times {n}_{1}}=\frac{38-132}{220}=\frac{-94}{220}=-0.4\overline{27}$

Now, *τ _{br}* from Formula (1) is equal to Glass’

$\left|{r}_{rb}\right|=1-\frac{2U}{{n}_{1}{n}_{0}}=1-\frac{2\times 63}{11\times 20}=1-\frac{126}{220}=0.4\overline{27}=\left|{d}_{SES|DM}\right|\ne \left|\gamma \right|=\frac{47}{85}\approx 0.5529$

From these last findings, it follows that:

${\tau}_{br}={r}_{br}={d}_{Y|X}\ne \gamma $

$\frac{A-D}{{n}_{0}{n}_{1}}=\frac{2\left({\bar{r}}_{Y|X=1}-{\bar{r}}_{Y|X=0}\right)}{{n}_{0}+{n}_{1}}=\frac{A-D}{A+D+{E}_{Y}}\ne \gamma $

$\begin{array}{l}\frac{38-132}{11\times 20}=\frac{2\times \left(\frac{273}{20}-\frac{223}{11}\right)}{11+20}=\frac{38-132}{38+132+50}=-0.4\overline{27}\\ \ne \gamma =\frac{38-132}{38+132}\approx -0.5529\end{array}$

$\left|{\tau}_{br}\right|=\left|{r}_{br}\right|=\left|{d}_{Y|X}\right|=1-\frac{2U}{{n}_{0}{n}_{1}}\ne \left|\gamma \right|$

$\begin{array}{l}\left|\frac{38-132}{11\times 20}\right|=\left|\frac{2\times \left(\frac{273}{20}-\frac{223}{11}\right)}{11+20}\right|=\left|\frac{38-132}{38+132+50}\right|=1-\frac{2\times 63}{11\times 20}=0.4\overline{27}\\ \ne \left|\gamma \right|=\frac{47}{85}\approx 0.5529\end{array}$

See these equivalences applied to the examples of 10 and 7 elements.

$A+D+{E}_{Y}={n}_{0}\times {n}_{1}$

$21+2+1=4\times 6=24$

${\tau}_{br}={r}_{br}={d}_{Y|X}\ne \gamma $

$\frac{A-D}{{n}_{0}{n}_{1}}=\frac{2\left({\bar{r}}_{Y|X=1}-{\bar{r}}_{Y|X=0}\right)}{{n}_{0}+{n}_{1}}=\frac{A-D}{A+D+{E}_{Y}}\ne \gamma $

$\frac{21-2}{4\times 6}=\frac{2\times \left(\frac{42.5}{6}-\frac{12.5}{4}\right)}{4+6}=\frac{21-2}{21+2+1}=0.791\overline{6}\ne \gamma =\frac{21-2}{21+2}=\frac{19}{23}\approx 0.8261$

$\left|{\tau}_{br}\right|=\left|{r}_{br}\right|=\left|{d}_{Y|X}\right|=1-\frac{2U}{{n}_{0}{n}_{1}}\ne \left|\gamma \right|$

$\left|\frac{21-2}{4\times 6}\right|=\left|\frac{2\times \left(\frac{42.5}{6}-\frac{12.5}{4}\right)}{4+6}\right|=\left|\frac{21-2}{21+2+1}\right|=1-\frac{2\times 2.5}{4\times 6}=0.791\overline{6}\ne \left|\gamma \right|\approx 0.8261$

$A+D+{E}_{Y}={n}_{0}\times {n}_{1}$

$7+2+3=3\times 4=12$

${\tau}_{br}={r}_{br}={d}_{Y|X}\ne \gamma $

$\frac{A-D}{{n}_{0}{n}_{1}}=\frac{2\left({\bar{r}}_{Y|X=1}-{\bar{r}}_{Y|X=0}\right)}{{n}_{0}+{n}_{1}}=\frac{A-D}{A+D+{E}_{Y}}\ne \gamma $

$\frac{7-2}{3\times 4}=\frac{2\times \left(\frac{18.5}{4}-\frac{9.5}{3}\right)}{3+4}=\frac{7-2}{7+2+3}=0.41\overline{6}\ne \gamma =\frac{7-2}{7+2}=0.\overline{5}$

$\left|{\tau}_{br}\right|=\left|{r}_{br}\right|=\left|{d}_{Y|X}\right|=1-\frac{2U}{{n}_{0}{n}_{1}}\ne \left|\gamma \right|$

$\left|\frac{7-2}{3\times 4}\right|=\left|\frac{2\times \left(\frac{18.5}{4}-\frac{9.5}{3}\right)}{3+4}\right|=\left|\frac{7-2}{7+2+3}\right|=1-\frac{2\times 3.5}{3\times 4}=0.41\overline{6}\ne \left|\gamma \right|=0.\overline{5}$
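The three sets of equivalences can be checked side by side from the summary counts quoted in this section. The sketch below is illustrative and takes *A*, *D*, the ties per column, the group sizes, and *U* exactly as they appear in the calculations above. It also relies on a standard property of the Mann-Whitney statistic with averaged ranks, namely that *U* equals the smaller of *A* and *D* plus half of the tied pairs, which explains why the U-based expression reduces to Somers’ *d* in absolute value.

```python
from fractions import Fraction

# (label, A, D, E_Y, n0, n1, U): summary counts quoted in the text.
examples = [
    ("10 elements", 21, 2, 1, 4, 6, Fraction(5, 2)),
    ("7 elements", 7, 2, 3, 3, 4, Fraction(7, 2)),
    ("31 participants", 38, 132, 50, 11, 20, Fraction(63)),
]

results = {}
for label, A, D, E, n0, n1, U in examples:
    pairs_ok = (A + D + E == n0 * n1)          # every cross-group pair is counted once
    u_ok = (U == min(A, D) + Fraction(E, 2))   # U from pair counts and half the ties
    somers_d = Fraction(A - D, A + D + E)      # equals (A - D) / (n0 * n1)
    from_u = 1 - 2 * U / (n0 * n1)             # uncorrected U-based formula
    gamma = Fraction(A - D, A + D)
    results[label] = (pairs_ok, u_ok, somers_d, from_u, gamma)
    print(label, float(somers_d), float(from_u), round(float(gamma), 4))
```

In each example the absolute value of Somers’ *d* equals the U-based value and is smaller in magnitude than gamma, because the tied pairs enlarge only the denominator.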

A question arises as to which equivalence (with gamma or with asymmetric d) is more appropriate. Equivalence with the Goodman-Kruskal gamma [14] allows a non-directional estimation, which is the objective of a correlation, whereas the Somers [17] *d _{Y}*

${\displaystyle {\sum}_{l=1}^{c}{n}_{0l}{n}_{1l}}\ne 0\Rightarrow \frac{A-D}{{n}_{0}\times {n}_{1}-{\displaystyle {\sum}_{l=1}^{c}{n}_{0l}{n}_{1l}}}>\frac{A-D}{{n}_{0}\times {n}_{1}}$

In this connection, Metsämuuronen [20] advises the use of Somers’ *d* to measure the association between dichotomous items and an ordinal scale test. In this example, the items determine the total score of the test. Consequently, the ordinal variable conditional on the dichotomous variable (Y|X) makes sense. However, the author notes that the coefficient *d _{Y}*

Some websites [23] advise calculating the rank-biserial correlation using Spearman’s rank correlation, which is an option offered by SPSS and other statistical software, and some studies follow this suggestion [22]. However, this is incorrect: it gives a very different and lower result due to the large number of ties in X. Cureton’s tau-type formula for the rank-biserial correlation [1] [3] is equivalent to Goodman-Kruskal’s gamma [14] and Glass’ rho-type formula to Somers’ *d _{Y}*

9. Conclusions

Let Y be a ranking from 1 to *n* or a variable with *k* ordered categories and X a dichotomy (0 and 1, assigning 0 to the group with the lowest Y-rank sum and 1 to the group with the highest Y-rank sum). When there are no ties for Y in the two groups of X, Cureton’s tau-type, Glass’ rho-type, and Willson’s U-statistic-based formulas for the rank-biserial correlation, as well as Goodman-Kruskal’s gamma and Somers’ *d*, give the same result. These calculations are very easy to perform from a 2 × *k* contingency table. When there are ties in the variable with *k* ordered categories that appear in both groups of X, that is, *bracket* *ties*, not all coefficients coincide. In his 1956 paper, Cureton [3] uses a correction based on the *bracket* *ties* when calculating the rank-biserial correlation. In 1968, Cureton [1] provided a new correction so that his tau-type formula and Glass’ rho-type formula give the same result, taking into account that ties cause an underestimation of the relationship. In the present work, this new Cureton correction is called the correction based on a *two-sided* *or* *riding* *tie* to differentiate it. Willson [2] applied the *two-sided* *tie* correction to the U-statistic-based formula with the same goal, namely to give the same result as the tau-type formula with a correction based on *bracket* *ties*. The only support for the correction based on a *two-sided* *tie* is a 10-element example given by Cureton [1]; there is no algebraic derivation. In the present work, it is shown that the correction does not work well in an example very similar to the one given by Cureton and in another example with a sample of 31 participants. Starting from an equality given by Glass [4] in his proof that the tau-type and rho-type coefficients coincide when there are no *bracket* *ties*, a new correction based on the *bracket* *ties* is derived. It is verified with the example of Cureton [1] and the two new examples.
This correction consists of dividing by two the correction given by Cureton for the tau-type formula. Thus, it is much easier to calculate than the *two-sided* *tie* correction and is the only change required to use the formulas developed by Cureton [1] and Willson [2], achieving the initial goal that the three coefficients coincide with each other and are equivalent to Goodman-Kruskal’s gamma. Hence, the asymptotic standard error based on the Mann-Whitney U test derived by Willson [2] is valid to test the significance and estimate the confidence interval with the new proposed correction. Naturally, one could change the objective and invert it, namely that all three formulas coincide with Somers’ *d _{Y}*

The rank-biserial correlation does not necessarily require the dichotomization of a ranking and can be used to estimate the linear association between a dichotomous qualitative variable and an ordinal one [3] [4], as well as to estimate the effect size with the Mann-Whitney U test [5] and the fit of a logistic regression model. Another option for these same objectives is to use the polyserial correlation proposed by Metsämuuronen [24]. However, the rank-biserial correlation formulas are much easier to calculate and yield very similar estimates [11]. For example, they can be computed with Excel or MATLAB. It is also possible to take advantage of the equivalences with Goodman-Kruskal’s gamma (with correction for ties when present) and Somers’ *d _{Y}*

For further research, a simulation study is suggested to compare the coverage, power, and efficiency of confidence intervals for the rank-biserial correlation (with the new proposed correction in case of ties) based on the approximation from the Mann-Whitney U test proposed by Willson [2] versus bootstrap confidence intervals. In turn, the confidence intervals developed for Goodman-Kruskal’s gamma and the Somers’ *d _{Y}*

Acknowledgements

The author would like to thank the referee for their helpful comments.

Conflicts of Interest

The author declares no conflicts of interest.

[1] Cureton, E.E. (1968) Rank-Biserial Correlation when Ties Are Present. Educational and Psychological Measurement, 28, 77-79. https://doi.org/10.1177/001316446802800107

[2] Willson, V.L. (1976) Critical Values of the Rank-Biserial Correlation Coefficient. Educational and Psychological Measurement, 36, 297-300. https://doi.org/10.1177/001316447603600207

[3] Cureton, E.E. (1956) The Rank-Biserial Correlation. Psychometrika, 21, 287-290. https://doi.org/10.1007/BF02289138

[4] Glass, G.V. (1966) Note on Rank-Biserial Correlation. Educational and Psychological Measurement, 26, 623-631. https://doi.org/10.1177/001316446602600307

[5] Mann, H.B. and Whitney, D.R. (1947) On a Test of Whether One or Two Random Variables Is Stochastically Larger than the Other. Annals of Mathematical Statistics, 18, 50-60. https://doi.org/10.1214/aoms/1177730491

[6] Brogden, H.E. (1949) A New Coefficient: Application to Biserial Correlation and to Estimation of Selective Efficiency. Psychometrika, 14, 169-182. https://doi.org/10.1007/BF02289151

[7] Stanley, J.C. (1968) An Important Similarity between Biserial R and the Brogden-Cureton-Glass Biserial R for Ranks. Educational and Psychological Measurement, 28, 249-253. https://doi.org/10.1177/001316446802800204

[8] Khamis, H. (2008) Measures of Association: How to Choose? Journal of Diagnostic Medical Sonography, 24, 155-162. https://doi.org/10.1177/8756479308317006

[9] Berry, K.J., Johnston, J.E. and Mielke Jr., P.W. (2018) The Measurement of Association. A Permutation Statistical Approach. Springer, Cham. https://doi.org/10.1007/978-3-319-98926-6

[10] Kraemer, H.C. (2006) Biserial Correlation. In: Kotz, S., Read, C.B., Balakrishnan, N., Vidakovic, B. and Johnson, N.L., Eds., Encyclopedia of Statistical Sciences, John Wiley & Sons, Inc., Hoboken, 276-279. https://doi.org/10.1002/0471667196.ess0153.pub2

[11] Metsämuuronen, J. (2022) Rank-Polyserial Correlation: A Quest for a “Missing” Coefficient of Correlation. Frontiers in Applied Mathematics and Statistics, 8, 914-932. https://doi.org/10.3389/fams.2022.914932

[12] Beck, C.T. (1986) Use of Nonparametric Correlation Analysis in Graduate Students’ Research Projects. Journal of Nursing Education, 25, 41-42. https://doi.org/10.3928/0148-4834-19860101-14

[13] Kerby, D.S. (2014) The Simple Difference Formula: An Approach to Teaching Nonparametric Correlation. Comprehensive Psychology, 3, Article No. 1. https://doi.org/10.2466/11.IT.3.1

[14] Goodman, L.A. and Kruskal, W.H. (1954) Measures of Association for Cross Classifications. Journal of the American Statistical Association, 49, 732-764. https://doi.org/10.1080/01621459.1954.10501231

[15] Newson, R. (2008) Identity of Somers’ D and the Rank Biserial Correlation Coefficient. https://www.rogernewsonresources.org.uk/miscdocs/ranksum1.pdf

[16] Newson, R. (2022) Interpretation of Somers’ D under Four Simple Models. https://www.rogernewsonresources.org.uk/miscdocs/intsomd1.pdf

[17] Somers, R.H. (1962) A New Asymmetric Measure of Association for Ordinal Variables. American Sociological Review, 27, 799-811. https://doi.org/10.2307/2090408

[18] Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences. 2nd Edition, Lawrence Erlbaum and Associates, Hillsdale.

[19] Volaco, A., Cavalcanti, A.M., Filho, R.P. and Précoma, D.B. (2018) Socioeconomic Status: The Missing Link between Obesity and Diabetes Mellitus? Current Diabetes Reviews, 14, 321-326. https://doi.org/10.2174/1573399813666170621123227

[20] Metsämuuronen, J. (2020) Somers’ D as an Alternative for the Item-Test and Item-Rest Correlation Coefficients in the Educational Measurement Settings. International Journal of Educational Methodology, 6, 207-221. https://doi.org/10.12973/ijem.6.1.207

[21] Porterhouse, M.D., Paul, S., Lieberenz, J.L., Stempel, L.R., Levy, M.A. and Alvarado, R. (2022) Black Women Are Less Likely to Be Classified as High-Risk for Breast Cancer Using the Tyrer-Cuzick 8 Model. Annals of Surgical Oncology, 29, 6419-6425. https://doi.org/10.1245/s10434-022-12140-9

[22] Khan, M.I., Mehmood, M., Husain, S.O., Waqar, S., Asim, M. and Rehman, N. (2021) Correlation of Severity of Disease and Changes in Basic Hematological Parameters in Patients of COVID-19. Journal of Medical Sciences, 29, 69-73. https://doi.org/10.52764/jms.21.29.3.1

[23] Heidel, E. (2022) Rank Biserial Correlation between Dichotomous and Ordinal Variables. Statistics. https://www.scalestatistics.com/rank-biserial.html

[24] Metsämuuronen, J. (2019) Rank Polyserial Correlation. http://dx.doi.org/10.13140/RG.2.2.31217.53608

[25] IBM (2019) IBM SPSS Statistics 24 Algorithms. SPSS Inc., Chicago.


Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.