Measure of Departure from Point-Symmetry for the Analysis of Collapsed Square Contingency Tables

Abstract

For square contingency tables with ordered categories, one may wish to analyze collapsed 3 × 3 tables obtained by combining some adjacent categories of the original table. This paper considers the point-symmetry model (Wall and Lienert, 1976) for collapsed tables and proposes a measure to represent the degree of departure from point-symmetry for collapsed tables. It also gives an approximate confidence interval for the proposed measure.

Share and Cite:

Iki, K., Yamamoto, K. and Tomizawa, S. (2021) Measure of Departure from Point-Symmetry for the Analysis of Collapsed Square Contingency Tables. Open Journal of Statistics, 11, 1062-1071. doi: 10.4236/ojs.2021.116063.

1. Introduction

Consider an $r \times r$ square contingency table with the same row and column classifications. Let $p_{ij}$ denote the probability that an observation will fall in the $i$th row and $j$th column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). The point-symmetry (PS) model is defined by

$$p_{ij} = p_{i^* j^*} \quad (i = 1, \ldots, r;\ j = 1, \ldots, r),$$

where the symbol $*$ denotes $i^* = r + 1 - i$; see Wall and Lienert [1]. This indicates that the probability of an observation falling in the $(i, j)$th cell is equal to the probability of the observation falling in the point-symmetric $(i^*, j^*)$th cell with respect to the center cell (when $r$ is odd) or center point (when $r$ is even). Now, we consider the $[(r-1)/2]$ ways of collapsing the $r \times r$ original table with ordered categories into a $3 \times 3$ table by choosing cut points after the $h$th and $h'$th ($h' = r - h$) rows and after the $h$th and $h'$th columns for $h = 1, \ldots, [(r-1)/2]$, where

$$\left[\frac{r-1}{2}\right] = \begin{cases} \dfrac{r-1}{2} & (r \text{ is odd}), \\[4pt] \dfrac{r-2}{2} & (r \text{ is even}). \end{cases}$$

We refer to each collapsed $3 \times 3$ table as the $T_{hh'}$ table ($h = 1, \ldots, [(r-1)/2]$; $h' = r - h$). In the collapsed $T_{hh'}$ table, let $G_{kl}^{(h,h')}$ denote the corresponding cumulative probability for row value $k$ ($k = 1, 2, 3$) and column value $l$ ($l = 1, 2, 3$); i.e.,

$$\begin{aligned}
G_{11}^{(h,h')} &= \sum_{i=1}^{h}\sum_{j=1}^{h} p_{ij}, &
G_{12}^{(h,h')} &= \sum_{i=1}^{h}\sum_{j=h+1}^{h'} p_{ij}, &
G_{13}^{(h,h')} &= \sum_{i=1}^{h}\sum_{j=h'+1}^{r} p_{ij}, \\
G_{21}^{(h,h')} &= \sum_{i=h+1}^{h'}\sum_{j=1}^{h} p_{ij}, &
G_{22}^{(h,h')} &= \sum_{i=h+1}^{h'}\sum_{j=h+1}^{h'} p_{ij}, &
G_{23}^{(h,h')} &= \sum_{i=h+1}^{h'}\sum_{j=h'+1}^{r} p_{ij}, \\
G_{31}^{(h,h')} &= \sum_{i=h'+1}^{r}\sum_{j=1}^{h} p_{ij}, &
G_{32}^{(h,h')} &= \sum_{i=h'+1}^{r}\sum_{j=h+1}^{h'} p_{ij}, &
G_{33}^{(h,h')} &= \sum_{i=h'+1}^{r}\sum_{j=h'+1}^{r} p_{ij}.
\end{aligned}$$
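As a rough illustration of the collapsing step, the following Python sketch sums the cells of an r × r table into the 3 × 3 table obtained with cut points after categories h and r − h. The function name `collapse` and the 5 × 5 counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def collapse(table, h):
    """Collapse an r x r table into the 3 x 3 table with cut points after
    categories h and h' = r - h (1-based), so the three groups are
    categories 1..h, h+1..h', and h'+1..r."""
    r = len(table)
    if any(len(row) != r for row in table):
        raise ValueError("table must be square")
    if not 1 <= h <= (r - 1) // 2:
        raise ValueError("h must satisfy 1 <= h <= [(r - 1) / 2]")
    h_prime = r - h
    # 0-based half-open index ranges for the three groups.
    groups = [range(0, h), range(h, h_prime), range(h_prime, r)]
    return [[sum(table[i][j] for i in rows for j in cols) for cols in groups]
            for rows in groups]


# Hypothetical 5 x 5 counts (the integers 1..25), made up for illustration.
counts = [[5 * i + j + 1 for j in range(5)] for i in range(5)]

# For r = 5 there are [(5 - 1) / 2] = 2 collapsed tables: T14 and T23.
for h in range(1, (len(counts) - 1) // 2 + 1):
    print(f"T{h}{len(counts) - h}:", collapse(counts, h))
# T14: [[1, 9, 5], [33, 117, 45], [21, 69, 25]]
# T23: [[16, 11, 28], [23, 13, 29], [76, 41, 88]]
```

The same function applies to probabilities as well as counts, since collapsing is just block summation.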

Then, Yamamoto et al. [2] considered the collapsed point-symmetry (CoPS) model as

$$G_{ij}^{(h,h')} = G_{i^* j^*}^{(h,h')} \quad (i = 1, 2, 3;\ j = 1, 2, 3;\ (i, j) \neq (2, 2)),$$

for all $h = 1, \ldots, [(r-1)/2]$, where, for the collapsed table, the symbol $*$ denotes $i^* = 4 - i$ and $h' = r - h$. Note that the PS model implies the CoPS model, but the PS model is not equivalent to the CoPS model.
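To make the remark that PS implies CoPS but not conversely concrete, here is a small Python sketch. The helper names and the 4 × 4 table are hypothetical. In this made-up table the only asymmetry lies between cells (2, 2) and (3, 3), which both fall into the excluded center cell of the single collapsed table, so CoPS holds while PS fails.

```python
def collapse(table, h):
    """3 x 3 table with cut points after categories h and r - h."""
    r = len(table)
    groups = [range(0, h), range(h, r - h), range(r - h, r)]
    return [[sum(table[i][j] for i in rows for j in cols) for cols in groups]
            for rows in groups]


def is_point_symmetric(table):
    """PS: cell (i, j) equals cell (r + 1 - i, r + 1 - j) for every i, j."""
    r = len(table)
    return all(table[i][j] == table[r - 1 - i][r - 1 - j]
               for i in range(r) for j in range(r))


def is_cops(table):
    """CoPS: every collapsed 3 x 3 table is point-symmetric.
    The center cell (2, 2) is its own partner, so it never matters."""
    r = len(table)
    return all(is_point_symmetric(collapse(table, h))
               for h in range(1, (r - 1) // 2 + 1))


# Hypothetical 4 x 4 counts: all ones except cell (2, 2) in 1-based indexing.
table = [[1] * 4 for _ in range(4)]
table[1][1] = 2

print(is_point_symmetric(table))  # False: cell (2, 2) differs from cell (3, 3)
print(is_cops(table))             # True: the difference is absorbed into the center cell
```

Exact equality is used because the example holds integer counts; with estimated probabilities a tolerance-based comparison would be the more sensible choice.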

When the CoPS model does not hold, we are interested in measuring the degree of departure from CoPS. For square contingency tables with ordered categories, Tomizawa et al. [3] proposed a measure to represent the degree of departure from PS.

Consider the data in Table 1, taken from Hashimoto [4]. These data describe the cross-classification of father’s and son’s occupational status categories in Japan, examined in 1975 and 1995. For the data in Table 1(a) and Table 1(b), each having five categories, we may want to combine the occupational statuses into three simpler categories, namely “high”, “middle” and “low”. For example, in the collapsed 3 × 3 table T14, the “high” category is the “(1) Capitalist” category of the original 5 × 5 table, the “middle” category is obtained by combining the “(2) New middle”, “(3) Working” and “(4) Self-employed” categories of the original table, and the “low” category is the “(5) Farming” category. Similarly, in the collapsed 3 × 3 table T23, the “high” category is obtained by combining the “(1) Capitalist” and “(2) New middle” categories of the original 5 × 5 table, the “middle” category is the “(3) Working” category, and the “low” category is obtained by combining the “(4) Self-employed” and “(5) Farming” categories. Table 2 and Table 3 give the collapsed 3 × 3 tables T14 and T23 (for observations) for the data in Table 1(a) and Table 1(b), respectively. Now, we are interested in the degree of departure from PS for each of the tables T14 and T23. So, the present paper proposes a measure which represents the degree of departure from CoPS by using collapsed 3 × 3 tables. For related research, see Iki et al. [5] and Balcha [6].

Table 1. Occupational status for Japanese father-son pairs; from Hashimoto [4]. (a) examined in 1975; (b) examined in 1995.

Note: Status (1) is Capitalist, (2) New middle, (3) Working, (4) Self-employed and (5) Farming.

Table 2. Collapsed tables T14 and T23 for the data in Table 1(a). (a) T14 table; (b) T23 table.

Table 3. Collapsed tables T14 and T23 for the data in Table 1(b). (a) T14 table; (b) T23 table.

The new measures are introduced in Section 2. Section 3 presents an approximate variance and a confidence interval for the proposed measure. Section 4 gives examples. Finally, Section 5 concludes the paper.

2. Measure of Departure from Point-Symmetry for Collapsed Tables

Assume that $\{G_{ij}^{(h,h')} + G_{i^* j^*}^{(h,h')} \neq 0\}$. Let

$$D = \{(i, j) \mid i = 1, 2, 3;\ j = 1, 2, 3;\ (i, j) \neq (2, 2)\},$$

and

$$\delta_{hh'} = 1 - G_{22}^{(h,h')}, \qquad \tilde{G}_{ij}^{(h,h')} = \frac{G_{ij}^{(h,h')}}{\delta_{hh'}} \quad ((i, j) \in D),$$

$$Q_{ij}^{(h,h')} = \frac{\tilde{G}_{ij}^{(h,h')} + \tilde{G}_{i^* j^*}^{(h,h')}}{2} \quad ((i, j) \in D).$$

Consider a measure to represent the degree of departure from CoPS, defined by

$$\Psi^{(\lambda)} = \frac{1}{[(r-1)/2]} \sum_{h=1}^{[(r-1)/2]} \Psi_{hh'}^{(\lambda)} \quad (\lambda > -1),$$

where

$$\Psi_{hh'}^{(\lambda)} = \frac{\lambda(\lambda+1)}{2^{\lambda} - 1}\, I_{hh'}^{(\lambda)},$$

$$I_{hh'}^{(\lambda)} = \frac{1}{\lambda(\lambda+1)} \sum_{(i,j) \in D} \tilde{G}_{ij}^{(h,h')} \left\{ \left( \frac{\tilde{G}_{ij}^{(h,h')}}{Q_{ij}^{(h,h')}} \right)^{\lambda} - 1 \right\},$$

and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Namely,

$$\Psi^{(0)} = \frac{1}{[(r-1)/2]} \sum_{h=1}^{[(r-1)/2]} \Psi_{hh'}^{(0)},$$

where

$$\Psi_{hh'}^{(0)} = \frac{1}{\log 2}\, I_{hh'}^{(0)},$$

$$I_{hh'}^{(0)} = \sum_{(i,j) \in D} \tilde{G}_{ij}^{(h,h')} \log \left( \frac{\tilde{G}_{ij}^{(h,h')}}{Q_{ij}^{(h,h')}} \right).$$

The submeasure $\Psi_{hh'}^{(\lambda)}$ represents the degree of departure from PS for the collapsed $T_{hh'}$ table. We note that $I_{hh'}^{(\lambda)}$ is the power-divergence between the two probability distributions $\{\tilde{G}_{ij}^{(h,h')}\}$ and $\{Q_{ij}^{(h,h')}\}$, and in particular $I_{hh'}^{(0)}$ is the Kullback-Leibler information between them. (For more details of the power-divergence $I_{hh'}^{(\lambda)}$, see Cressie and Read [7] and Read and Cressie [8].)
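The definitions above translate fairly directly into code. The following Python sketch computes each submeasure from the power-divergence form and then averages over h. The function names `submeasure` and `measure` and both example tables are hypothetical. The sketch also assumes, as in the text, that no point-symmetric pair of collapsed cells is zero in both cells.

```python
import math


def collapse(table, h):
    """3 x 3 table with cut points after categories h and r - h."""
    r = len(table)
    groups = [range(0, h), range(h, r - h), range(r - h, r)]
    return [[sum(table[i][j] for i in rows for j in cols) for cols in groups]
            for rows in groups]


def submeasure(table, h, lam):
    """Submeasure for one collapsed table (lambda > -1), via the power-divergence."""
    if lam <= -1:
        raise ValueError("lambda must be greater than -1")
    G = collapse(table, h)
    D = [(i, j) for i in range(3) for j in range(3) if (i, j) != (1, 1)]
    # Sum of the off-center cells; equals 1 - G_22 when the entries are probabilities.
    delta = sum(G[i][j] for i, j in D)
    divergence = 0.0
    for i, j in D:
        g = G[i][j] / delta               # normalized cell
        g_star = G[2 - i][2 - j] / delta  # its point-symmetric partner
        q = (g + g_star) / 2
        if g == 0:
            continue                      # a zero cell contributes nothing
        if lam == 0:
            divergence += g * math.log(g / q)
        else:
            divergence += g * ((g / q) ** lam - 1) / (lam * (lam + 1))
    if lam == 0:
        return divergence / math.log(2)
    return lam * (lam + 1) / (2 ** lam - 1) * divergence


def measure(table, lam):
    """Overall measure: unweighted mean of the submeasures over h = 1..[(r-1)/2]."""
    r = len(table)
    hs = range(1, (r - 1) // 2 + 1)
    return sum(submeasure(table, h, lam) for h in hs) / len(hs)


# Hypothetical 5 x 5 counts (the integers 1..25), for illustration only.
counts = [[5 * i + j + 1 for j in range(5)] for i in range(5)]
for lam in (-0.5, 0, 1, 2):
    print(lam, round(measure(counts, lam), 4))
```

Dividing by the sum of the off-center cells lets the same function accept raw counts or probabilities; for probabilities that sum equals the quantity 1 − G22 used in the text.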

Let

$$G_{ij}^{c(h,h')} = \frac{G_{ij}^{(h,h')}}{G_{ij}^{(h,h')} + G_{i^* j^*}^{(h,h')}} \quad ((i, j) \in D).$$

Also let $E = \{(1, 1), (1, 2), (1, 3), (2, 1)\}$. Then, with $\tilde{G}_{ij}^{(h,h')} = G_{ij}^{(h,h')} / \delta_{hh'}$ denoting the normalized probabilities, the submeasure $\Psi_{hh'}^{(\lambda)}$ is expressed as

$$\Psi_{hh'}^{(\lambda)} = \frac{\lambda(\lambda+1)}{2^{\lambda} - 1} \sum_{(i,j) \in E} \left( \tilde{G}_{ij}^{(h,h')} + \tilde{G}_{i^* j^*}^{(h,h')} \right) I_{ij(h,h')}^{(\lambda)} \quad (\lambda > -1),$$

where

$$I_{ij(h,h')}^{(\lambda)} = \frac{1}{\lambda(\lambda+1)} \left[ G_{ij}^{c(h,h')} \left\{ \left( \frac{G_{ij}^{c(h,h')}}{1/2} \right)^{\lambda} - 1 \right\} + G_{i^* j^*}^{c(h,h')} \left\{ \left( \frac{G_{i^* j^*}^{c(h,h')}}{1/2} \right)^{\lambda} - 1 \right\} \right],$$

and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Namely,

$$\Psi_{hh'}^{(0)} = \frac{1}{\log 2} \sum_{(i,j) \in E} \left( \tilde{G}_{ij}^{(h,h')} + \tilde{G}_{i^* j^*}^{(h,h')} \right) I_{ij(h,h')}^{(0)},$$

$$I_{ij(h,h')}^{(0)} = G_{ij}^{c(h,h')} \log \left( \frac{G_{ij}^{c(h,h')}}{1/2} \right) + G_{i^* j^*}^{c(h,h')} \log \left( \frac{G_{i^* j^*}^{c(h,h')}}{1/2} \right).$$

Moreover, the submeasure $\Psi_{hh'}^{(\lambda)}$ is also expressed as

$$\Psi_{hh'}^{(\lambda)} = 1 - \frac{\lambda 2^{\lambda}}{2^{\lambda} - 1} \sum_{(i,j) \in E} \left( \tilde{G}_{ij}^{(h,h')} + \tilde{G}_{i^* j^*}^{(h,h')} \right) H_{ij(h,h')}^{(\lambda)},$$

where

$$H_{ij(h,h')}^{(\lambda)} = \frac{1}{\lambda} \left[ 1 - \left( G_{ij}^{c(h,h')} \right)^{\lambda+1} - \left( G_{i^* j^*}^{c(h,h')} \right)^{\lambda+1} \right],$$

and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Namely,

$$\Psi_{hh'}^{(0)} = 1 - \frac{1}{\log 2} \sum_{(i,j) \in E} \left( \tilde{G}_{ij}^{(h,h')} + \tilde{G}_{i^* j^*}^{(h,h')} \right) H_{ij(h,h')}^{(0)},$$

$$H_{ij(h,h')}^{(0)} = - G_{ij}^{c(h,h')} \log G_{ij}^{c(h,h')} - G_{i^* j^*}^{c(h,h')} \log G_{i^* j^*}^{c(h,h')}.$$

Note that $H_{ij(h,h')}^{(\lambda)}$ is Patil and Taillie’s [9] diversity index of degree $\lambda$ for $\{G_{ij}^{c(h,h')}\}$ and $\{G_{i^* j^*}^{c(h,h')}\}$, which includes the Shannon entropy (when $\lambda = 0$) as a special case.
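The step from the power-divergence form to the diversity-index form can be checked in a few lines. The following is a brief verification for $\lambda \neq 0$, using the shorthand $a$ and $b$ (introduced here only for readability) for the two conditional probabilities of a point-symmetric pair, so that $a + b = 1$:

```latex
\begin{align*}
\lambda(\lambda+1)\, I_{ij(h,h')}^{(\lambda)}
  &= a\left\{(2a)^{\lambda} - 1\right\} + b\left\{(2b)^{\lambda} - 1\right\} \\
  &= 2^{\lambda}\left(a^{\lambda+1} + b^{\lambda+1}\right) - 1 \\
  &= 2^{\lambda}\left(1 - \lambda H_{ij(h,h')}^{(\lambda)}\right) - 1 \\
  &= \left(2^{\lambda} - 1\right) - \lambda 2^{\lambda} H_{ij(h,h')}^{(\lambda)} .
\end{align*}
```

Dividing by $2^{\lambda} - 1$ and summing over $E$ with the normalized pair weights, which add up to one, gives the expression $1 - \frac{\lambda 2^{\lambda}}{2^{\lambda} - 1} \sum_{E} (\cdot)\, H_{ij(h,h')}^{(\lambda)}$ for the submeasure.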

We note that for all $h = 1, \ldots, [(r-1)/2]$ and $\lambda > -1$, (i) $0 \le H_{ij(h,h')}^{(\lambda)} \le (2^{\lambda} - 1)/(\lambda 2^{\lambda})$, (ii) $H_{ij(h,h')}^{(\lambda)} = 0$ if and only if $G_{ij}^{c(h,h')} = 1$ (then $G_{i^* j^*}^{c(h,h')} = 0$) or $G_{i^* j^*}^{c(h,h')} = 1$ (then $G_{ij}^{c(h,h')} = 0$), and (iii) $H_{ij(h,h')}^{(\lambda)} = (2^{\lambda} - 1)/(\lambda 2^{\lambda})$ if and only if $G_{ij}^{c(h,h')} = G_{i^* j^*}^{c(h,h')} = 1/2$, that is, $G_{ij}^{(h,h')} = G_{i^* j^*}^{(h,h')}$.

We see that the measure $\Psi^{(\lambda)}$ lies between 0 and 1. Also, the submeasures $\Psi_{hh'}^{(\lambda)}$ lie between 0 and 1 for $h = 1, \ldots, [(r-1)/2]$. For each $\lambda$ ($> -1$), there is the structure of CoPS if and only if $\Psi^{(\lambda)} = 0$; and the degree of departure from CoPS is the largest, in the sense that $G_{ij}^{c(h,h')} = 1$ (then $G_{i^* j^*}^{c(h,h')} = 0$) or $G_{i^* j^*}^{c(h,h')} = 1$ (then $G_{ij}^{c(h,h')} = 0$) for $(i, j) \in E$ and $h = 1, \ldots, [(r-1)/2]$, if and only if $\Psi^{(\lambda)} = 1$.
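The two boundary cases can be illustrated numerically with the diversity-index form of the submeasure. The Python sketch below works directly on an already collapsed 3 × 3 table. The function names and the three example tables are hypothetical, and the code assumes every point-symmetric pair of cells has a positive total.

```python
import math

E = [(0, 0), (0, 1), (0, 2), (1, 0)]  # the index set E, shifted to 0-based


def diversity(a, b, lam):
    """Diversity index of degree lam for the pair (a, b) with a + b = 1;
    the Shannon entropy is used at lam = 0."""
    if lam == 0:
        return -sum(x * math.log(x) for x in (a, b) if x > 0)
    return (1 - a ** (lam + 1) - b ** (lam + 1)) / lam


def submeasure_from_collapsed(G, lam):
    """Submeasure of a collapsed 3 x 3 table G via the diversity-index form."""
    delta = sum(G[i][j] + G[2 - i][2 - j] for i, j in E)  # off-center total
    weighted = 0.0
    for i, j in E:
        pair = G[i][j] + G[2 - i][2 - j]
        a, b = G[i][j] / pair, G[2 - i][2 - j] / pair
        weighted += (pair / delta) * diversity(a, b, lam)
    scale = 1 / math.log(2) if lam == 0 else lam * 2 ** lam / (2 ** lam - 1)
    return 1 - scale * weighted


symmetric = [[1, 2, 3], [4, 9, 4], [3, 2, 1]]  # point-symmetric off the center
extreme = [[1, 2, 3], [4, 9, 0], [0, 0, 0]]    # one cell of every pair is empty
between = [[2, 2, 3], [4, 9, 4], [3, 2, 1]]    # mild departure

for lam in (-0.5, 0, 1, 2):
    print(lam,
          round(submeasure_from_collapsed(symmetric, lam), 6),
          round(submeasure_from_collapsed(extreme, lam), 6),
          round(submeasure_from_collapsed(between, lam), 6))
```

Up to floating-point rounding, the symmetric table gives 0 and the extreme table gives 1 for each λ, while the third table falls strictly in between.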

3. Approximate Confidence Interval for Measure

Let $n_{ij}$ denote the observed frequency in the $i$th row and $j$th column of the table ($i = 1, \ldots, r$; $j = 1, \ldots, r$). The sample version of $\Psi^{(\lambda)}$, that is, $\hat{\Psi}^{(\lambda)}$, is given by $\Psi^{(\lambda)}$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$, where $\hat{p}_{ij} = n_{ij}/n$ and $n = \sum\sum n_{ij}$. We assume that $\{n_{ij}\}$ result from full multinomial sampling. We consider an approximate standard error for $\hat{\Psi}^{(\lambda)}$ and a large-sample confidence interval for $\Psi^{(\lambda)}$. By the delta method, $\sqrt{n}\,(\hat{\Psi}^{(\lambda)} - \Psi^{(\lambda)})$ has asymptotically (as $n \to \infty$) a normal distribution with mean zero and variance $\sigma^2[\Psi^{(\lambda)}]$. See the Appendix for the details of $\sigma^2[\Psi^{(\lambda)}]$.

Let $\hat{\sigma}^2[\Psi^{(\lambda)}]$ denote $\sigma^2[\Psi^{(\lambda)}]$ with $\{p_{ij}\}$ replaced by $\{\hat{p}_{ij}\}$. Then $\hat{\sigma}[\Psi^{(\lambda)}]/\sqrt{n}$ is an estimated approximate standard error for $\hat{\Psi}^{(\lambda)}$, and $\hat{\Psi}^{(\lambda)} \pm z_{p/2}\, \hat{\sigma}[\Psi^{(\lambda)}]/\sqrt{n}$ is an approximate $100(1 - p)$ percent confidence interval for $\Psi^{(\lambda)}$, where $z_{p/2}$ is the percentage point from the standard normal distribution corresponding to a two-tail probability equal to $p$.
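The interval itself is a standard normal-approximation interval, sketched below in Python. The function name `confidence_interval` is hypothetical, and the inputs in the example (an estimate of 0.10, an estimated σ of 0.5 and n = 2500) are invented numbers, not results from the paper.

```python
import math
from statistics import NormalDist


def confidence_interval(psi_hat, sigma_hat, n, p=0.05):
    """Approximate 100(1 - p)% interval: psi_hat +/- z_{p/2} * sigma_hat / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - p / 2)  # upper p/2 point of the standard normal
    half_width = z * sigma_hat / math.sqrt(n)
    return psi_hat - half_width, psi_hat + half_width


# Made-up inputs, for illustration only.
low, high = confidence_interval(0.10, 0.5, 2500)
print(round(low, 4), round(high, 4))  # 0.0804 0.1196
```

The sketch does not truncate the interval to [0, 1], so for estimates near the boundary the limits may fall outside the range of the measure.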

4. Examples

Consider the data in Table 1(a) and Table 1(b) again. From Table 4(a) and Table 4(b), the confidence intervals for $\Psi^{(\lambda)}$ applied to the data in each of Table 1(a) and Table 1(b) do not include zero for any $\lambda$, which would indicate that there is not a structure of CoPS in either table. When the degrees of departure from CoPS in Table 1(a) and Table 1(b) are compared using the confidence intervals for $\Psi^{(\lambda)}$, the degree is greater for Table 1(a) than for Table 1(b).

We further analyze the data in Table 1(a) and Table 1(b) using the submeasures $\Psi_{hh'}^{(\lambda)}$. We see from Table 5(a) that for Table 1(a), the degree of departure from point-symmetry in the collapsed table T23 is smaller than that in T14. Thus it is seen that (i) when we combine the categories (2), (3) and (4) in Table 1(a), the degree of departure from point-symmetry for the collapsed table T14 is large, and (ii) when we combine the categories (1) and (2), and combine (4) and (5) in Table 1(a), the degree for the collapsed table T23 is less than in case (i). Similarly, we see from Table 5(b) that for Table 1(b), the degree of departure from point-symmetry in the collapsed table T23 is smaller than that in T14, so that the same statements (i) and (ii) hold for Table 1(b).

Table 4. Estimate of measure $\Psi^{(\lambda)}$, approximate standard error for $\hat{\Psi}^{(\lambda)}$ and approximate 95% confidence interval for $\Psi^{(\lambda)}$, applied to Table 1(a) and Table 1(b).

Table 5. Estimate of submeasures $\{\Psi_{hh'}^{(\lambda)}\}$ applied to Table 1(a) and Table 1(b).

5. Conclusions

When the CoPS model does not hold for the original 5 × 5 table, we are interested in (i) the degree of departure from point-symmetry for each of the tables T14 and T23, (ii) which of the tables T14 and T23 has the larger degree of departure from point-symmetry, and (iii) the degree of departure from CoPS for the original 5 × 5 table. For (i) and (ii), the proposed submeasures $\{\Psi_{hh'}^{(\lambda)}\}$ are useful, and for (iii) the proposed measure $\Psi^{(\lambda)}$ is useful.

Since the collapsed tables are obtained by combining adjacent categories, it is meaningful to consider collapsed 3 × 3 tables only when an original square contingency table has ordered categories. Therefore, a measure for CoPS in square ordinal tables should depend on the order of listing the categories. We note that it does not matter whether or not the submeasures for the collapsed tables are invariant to the order of the categories, because each collapsed 3 × 3 table obtained from an original square table is unique.

In addition, the measure $\Psi^{(\lambda)}$ is expressed by using the same weight $1/[(r-1)/2]$ for each of the submeasures $\{\Psi_{hh'}^{(\lambda)}\}$. It seems useful to analyze an original square contingency table using the measure $\Psi^{(\lambda)}$ when we cannot decide which collapsed 3 × 3 table is important.

Acknowledgements

The authors would like to thank the referee for their helpful comments.

Appendix

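Using the delta method, $\sqrt{n}\,(\hat{\Psi}^{(\lambda)} - \Psi^{(\lambda)})$ has asymptotic variance $\sigma^2[\Psi^{(\lambda)}]$ as follows:

$$\sigma^2[\Psi^{(\lambda)}] = \sum_{k=1}^{r} \sum_{l=1}^{r} p_{kl} \left( \frac{1}{[(r-1)/2]} \sum_{h=1}^{[(r-1)/2]} \Delta_{kl(h,h')}^{(\lambda)} \right)^{2},$$

where

$$\Delta_{kl(h,h')}^{(\lambda)} = \frac{2^{\lambda}}{1 - 2^{\lambda}} \cdot \frac{1}{\delta_{hh'}}\, A_{kl(h,h')}^{(\lambda)} + \frac{1 - \Psi_{hh'}^{(\lambda)}}{\delta_{hh'}}\, B_{kl(h,h')}^{(\lambda)},$$

$$\begin{aligned}
A_{kl(h,h')}^{(\lambda)} = \sum_{(i,j) \in E} \Big[\, & C_{kl}^{(ij)} \left\{ 1 - \left( G_{ij}^{c(h,h')} \right)^{\lambda} - \lambda\, G_{i^* j^*}^{c(h,h')} \left( \left( G_{ij}^{c(h,h')} \right)^{\lambda} - \left( G_{i^* j^*}^{c(h,h')} \right)^{\lambda} \right) \right\} \\
{} + {} & D_{kl}^{(ij)} \left\{ 1 - \left( G_{i^* j^*}^{c(h,h')} \right)^{\lambda} - \lambda\, G_{ij}^{c(h,h')} \left( \left( G_{i^* j^*}^{c(h,h')} \right)^{\lambda} - \left( G_{ij}^{c(h,h')} \right)^{\lambda} \right) \right\} \Big],
\end{aligned}$$

$$B_{kl(h,h')}^{(\lambda)} = 1 - I(h + 1 \le k \le h')\, I(h + 1 \le l \le h'),$$

$$C_{kl}^{(ij)} = \begin{cases}
I(k \le h)\, I(l \le h) & (i, j) = (1, 1), \\
I(k \le h)\, I(h + 1 \le l \le h') & (i, j) = (1, 2), \\
I(k \le h)\, I(h' + 1 \le l) & (i, j) = (1, 3), \\
I(h + 1 \le k \le h')\, I(l \le h) & (i, j) = (2, 1),
\end{cases}$$

$$D_{kl}^{(ij)} = \begin{cases}
I(h' + 1 \le k)\, I(h' + 1 \le l) & (i, j) = (1, 1), \\
I(h' + 1 \le k)\, I(h + 1 \le l \le h') & (i, j) = (1, 2), \\
I(h' + 1 \le k)\, I(l \le h) & (i, j) = (1, 3), \\
I(h + 1 \le k \le h')\, I(h' + 1 \le l) & (i, j) = (2, 1),
\end{cases}$$

and $I(\cdot)$ is the indicator function, $I(\cdot) = 1$ if true, 0 if not.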
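For readers who want to experiment with the variance formula, the following Python sketch evaluates it for a table of cell probabilities. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the case λ = 0 (which needs the limiting form) is excluded, the probabilities are assumed to sum to one, and every collapsed off-center cell is assumed to be strictly positive. It relies on the observation that a cell (k, l) falling in an off-center collapsed cell has exactly one nonzero indicator among the C and D terms, and B equal to 1.

```python
E = [(0, 0), (0, 1), (0, 2), (1, 0)]  # the index set E, shifted to 0-based


def group(k, h, r):
    """Collapsed index (0, 1 or 2) of the 0-based original category k."""
    return 0 if k < h else (1 if k < r - h else 2)


def psi_and_delta(p, h, lam):
    """Return the submeasure and the r x r array of Delta values for one h."""
    r = len(p)
    G = [[0.0] * 3 for _ in range(3)]
    for k in range(r):
        for l in range(r):
            G[group(k, h, r)][group(l, h, r)] += p[k][l]
    delta = 1.0 - G[1][1]
    Gc = {}  # conditional probabilities for the eight off-center cells
    for i, j in E:
        pair = G[i][j] + G[2 - i][2 - j]
        Gc[(i, j)] = G[i][j] / pair
        Gc[(2 - i, 2 - j)] = G[2 - i][2 - j] / pair
    # Submeasure from the diversity-index form.
    weighted_h = sum(
        (G[i][j] + G[2 - i][2 - j]) / delta
        * (1 - Gc[(i, j)] ** (lam + 1) - Gc[(2 - i, 2 - j)] ** (lam + 1)) / lam
        for i, j in E)
    psi = 1 - lam * 2 ** lam / (2 ** lam - 1) * weighted_h
    Delta = [[0.0] * r for _ in range(r)]
    for k in range(r):
        for l in range(r):
            cell = (group(k, h, r), group(l, h, r))
            if cell == (1, 1):
                continue  # center block: A = B = 0, so Delta = 0
            a = Gc[cell]                            # own conditional probability
            b = Gc[(2 - cell[0], 2 - cell[1])]      # partner's conditional probability
            A = 1 - a ** lam - lam * b * (a ** lam - b ** lam)
            Delta[k][l] = 2 ** lam / (1 - 2 ** lam) * A / delta + (1 - psi) / delta
    return psi, Delta


def measure_and_variance(p, lam):
    """Return the overall measure and its asymptotic variance (lambda != 0)."""
    if lam <= -1 or lam == 0:
        raise ValueError("this sketch covers lambda > -1 with lambda != 0 only")
    r = len(p)
    parts = [psi_and_delta(p, h, lam) for h in range(1, (r - 1) // 2 + 1)]
    m = len(parts)
    psi = sum(ps for ps, _ in parts) / m
    variance = sum(
        p[k][l] * (sum(D[k][l] for _, D in parts) / m) ** 2
        for k in range(r) for l in range(r))
    return psi, variance


# Hypothetical probabilities from the made-up counts 1..25 (n = 325).
n = 325
p = [[(5 * i + j + 1) / n for j in range(5)] for i in range(5)]
psi_hat, var_hat = measure_and_variance(p, 1.0)
print(psi_hat, (var_hat / n) ** 0.5)  # estimate and approximate standard error
```

One informal check on such an implementation is that the averaged Delta values should match a finite-difference derivative of the measure with respect to each cell, and that the variance should vanish for a table satisfying CoPS.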

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Wall, K.D. and Lienert, G.A. (1976) A Test for Point-Symmetry in J-Dimensional Contingency-Cubes. Biometrical Journal, 18, 259-264.
[2] Yamamoto, K., Murakami, S. and Tomizawa, S. (2013) Point-Symmetry Models and Decomposition for Collapsed Square Contingency Tables. Journal of Applied Statistics, 40, 1446-1452.
https://doi.org/10.1080/02664763.2013.786028
[3] Tomizawa, S., Yamamoto, K. and Tahata, K. (2007) An Entropy Measure of Departure from Point-Symmetry for Two-Way Contingency Tables. Symmetry: Culture and Science, 18, 279-297.
[4] Hashimoto, K. (1999) Class Structure in Modern Japan: Theory, Method and Quantitative Analysis. Toshindo Press, Tokyo. (In Japanese)
[5] Iki, K., Okada, M. and Tomizawa, S. (2018) An Extended Bivariate T-Distribution Type Symmetry Model for Square Contingency Tables. Open Journal of Statistics, 8, 249-257.
https://doi.org/10.4236/ojs.2018.82015
[6] Balcha, A. (2020) Curve Fitting and Least Square Analysis to Extrapolate for the Case of COVID-19 Status in Ethiopia. Advances in Infectious Diseases, 10, 143-159.
https://doi.org/10.4236/aid.2020.103015
[7] Cressie, N. and Read, T.R.C. (1984) Multinomial Goodness-of-Fit Tests. Journal of the Royal Statistical Society, Series B, 46, 440-464.
https://doi.org/10.1111/j.2517-6161.1984.tb01318.x
[8] Read, T.R.C. and Cressie, N. (1988) Goodness-of-Fit Statistics for Discrete Multivariate Data. Springer-Verlag, New York.
https://doi.org/10.1007/978-1-4612-4578-0
[9] Patil, G.P. and Taillie, C. (1982) Diversity as a Concept and Its Measurement. Journal of the American Statistical Association, 77, 548-561.
https://doi.org/10.1080/01621459.1982.10477845

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.