Estimating Vertex Measures in Social Networks by Sampling Completions of RDS Trees


This paper presents a new method for obtaining network properties from incomplete data sets. Missing data is a well-known stumbling block in social network analysis. The method of “estimating connectivity from spanning tree completions” (ECSTC) is designed for situations where only one or more spanning trees of a network are known, such as those obtained through respondent-driven sampling (RDS). Using repeated random completions derived from degree information, the method forgoes the usual step of trying to obtain final edge or vertex rosters and instead estimates network-centric properties of vertices probabilistically from the spanning trees themselves. We discuss the problem of missing data, describe the protocols of our completion method, and report the results of an experiment in which ECSTC was used to estimate graph-dependent vertex properties from spanning trees sampled from a graph whose characteristics were known ahead of time. The results show that ECSTC holds more promise for obtaining network-centric properties of individuals from limited data than researchers may have previously assumed. The approach breaks with past strategies for handling missing data, which have mainly sought to complete the graph; ECSTC instead estimates the network properties themselves without deciding on a final edge set.
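The general idea of repeated random completions can be illustrated with a minimal sketch. It is not the paper's protocol: the stub-pairing rule, the choice of closeness centrality as the vertex measure, and all names (`complete_tree`, `closeness`, `estimate_closeness`, `reported_degree`) are assumptions made for illustration. The sketch assumes every tree vertex has a reported degree, adds random edges until the residual degrees are used up (discarding pairs that would create self-loops or duplicate edges), and averages the measure over many completions.

```python
import random
from collections import deque


def complete_tree(tree_edges, degrees, rng):
    """Return one random completion: the tree plus randomly paired residual stubs."""
    adj = {v: set() for v in degrees}
    for u, v in tree_edges:
        adj[u].add(v)
        adj[v].add(u)
    # One stub per unit of reported degree not already explained by the tree.
    stubs = [v for v in degrees for _ in range(max(degrees[v] - len(adj[v]), 0))]
    rng.shuffle(stubs)
    # Pair consecutive stubs; skip pairs that would form a self-loop or duplicate edge
    # (an illustrative simplification, so some residual degree may go unused).
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v and v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def closeness(adj, source):
    """Closeness of `source`: reachable vertices divided by the sum of BFS distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def estimate_closeness(tree_edges, degrees, n_completions=200, seed=0):
    """Average each vertex's closeness over many random completions of the tree."""
    rng = random.Random(seed)
    sums = {v: 0.0 for v in degrees}
    for _ in range(n_completions):
        adj = complete_tree(tree_edges, degrees, rng)
        for v in degrees:
            sums[v] += closeness(adj, v)
    return {v: s / n_completions for v, s in sums.items()}


# Hypothetical input: a 5-vertex recruitment chain with self-reported degrees.
tree = [(0, 1), (1, 2), (2, 3), (3, 4)]
reported_degree = {0: 2, 1: 3, 2: 2, 3: 3, 4: 2}
estimates = estimate_closeness(tree, reported_degree)
```

In this sketch no single completed graph is ever chosen as "the" network; only the averaged vertex measure is kept, which mirrors the abstract's point about estimating properties without deciding on a final edge set.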

Share and Cite:

Khan, B. , Dombrowski, K. , Curtis, R. and Wendel, T. (2015) Estimating Vertex Measures in Social Networks by Sampling Completions of RDS Trees. Social Networking, 4, 1-16. doi: 10.4236/sn.2015.41001.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.