An Original Solution for Completing Research through Snowball Sampling: The Handicapping Method
1. Introduction
Snowball sampling belongs to the group of non-random, or non-probability, sampling methods, sometimes also called directed, empirical or subjective surveys. All these names refer to the same principle, namely the deliberate and voluntary selection of the survey units, adapted to the type of sampling established. With random sampling methods it is possible to calculate the probability that a unit of the population is included in the sample, but this is impossible in non-probability surveys. However, random surveys require up-to-date databases, their costs are not negligible, and they often need a longer design and organization period, which makes them less operational.
Non-random sampling is less strict and easier to apply, and it does not treat representativeness as a desirable goal for describing the sample. These sampling methods usually leave it up to the researcher to decide which units of the investigated community will be selected and when to end the research. Therefore, schemes in this category are mainly used in exploratory surveys, carried out with qualitative data and research methods. The advantages of non-probability surveys include: they can be used successfully when there is no access to (or no list of) the population studied (e.g. there is no list of those who prefer a certain brand of cigarettes, beer, etc.); they are the only methods that can be used when the target population is difficult to identify (e.g. alcoholics, drug users, etc.) or is very specific; no sampling frame is required to design the research; they are less costly than random surveys; and they allow results to be obtained much more quickly than probability surveys.
Among the disadvantages of the method: units are sampled in an arbitrary manner, so the probability of a unit entering the sample cannot be calculated; consequently, the variance and the bias of the estimator cannot be calculated, so the precision of the indicators cannot be measured; there is no guarantee that all units of the population have an equal chance of entering the sample; and the selection procedure depends heavily on the experience of the researcher, so the resulting sample is quite often biased.
The major disadvantage of this method is the absence of objective, quantitative criteria for terminating the survey. The size of the sample, and therefore the number of interviewees needed to ensure the consistency and relevance of the information collected, is ambiguous. This number cannot be calculated in advance; it depends on the experience of the researcher and on the moment at which the researcher considers that enough information has been collected.
This paper attempts to counteract this shortcoming by providing a method that gives statistical consistency to the final decision on whether to terminate the research, under known statistical risks.
2. Literature Review
The literature review was conducted systematically on the main topics where the snowball method is most commonly applied.
First, we review the treatment of some methodological problems of snowball sampling. Browne (2005) emphasizes the interpersonal relationships between groups that help to build the sample, and Snijders (1992) critically reviews the possibilities of sampling in a network, presented in a synthetic, schematic form. Berg (1988) and Naderifar et al. (2017) contribute to the development of the method by establishing chain-referral procedures, and Goodman (1961) provides a mathematical formalization of the procedure. Of particular interest is the work of Lecy and Beatty (2012), who conducted an extensive review of the relevant international literature. Dragan and Isaic-Maniu (2012, 2013) also contribute to the development of the snowball sampling method. Estimating the number of homeless people and identifying their health problems is a topic developed by the Institute of Medicine, Committee on Health Care for Homeless People (1988) in Washington DC. Fisher (1994) estimates the number and mental health of groups of homeless people. The same topic, important for urban agglomerations, is addressed by D’Onise et al. (2007) to help city administrations better calibrate their social services. Child abuse is another topic well covered in the scientific literature using the snowball method: it is extensively developed by Park-Higgerson et al. (2008), and ways to reduce this social scourge are researched by Ttofi and Farrington (2011). Menesini and Salmivalli (2017), as well as Walton (2005), discuss the prevalence, age and gender differences and the different types of bullying. The consequences of violence against children are analysed by Moore et al. (2017), who emphasize bullying and the increased risk of suicidal behaviour in adolescents.
Najam and Kashif (2018) compare the level of bullying in state and private schools in Islamabad, Pakistan, through field research among 400 students, involving a series of interviews with children in grades 4 and 5, as well as with parents, teachers and school managers. Research with the same aim, identifying the causes of bullying and combating it, has also been conducted by Hymel and Swearer (2015), Patton et al. (2017), and Fullchange and Furlong (2016). Mishna (2004) organized a survey to find ways to reduce physical and verbal bullying. More recently, cyberbullying has become a serious public health problem, faced especially by young people; it is studied by Dennehy et al. (2020), who continue the research conducted by Vandebosch and Van Cleemput (2008) through focus groups, and by Navarro and Serna (2016). In the same sphere of concern lies the tension between the right to privacy and the expansion of the internet, namely the rise of Big Data, a topic developed by Hazarika et al. (2019).
Research on migration and trafficking in human beings continues the studies published by Salt and Almeida (2006), who, starting from the lack of data on migration flows, initiated field research; Reichel and Morales (2017) study migration in European countries using both official data and field research, a topic also addressed by Pastore and Roman (2020). The health status of migrants arriving in waves is studied by Indatwa (2020). Dowle (2021), also using the snowball method, analyses the events of the 2015-2016 European migration “crisis” and border management in each of the four Nordic countries: Denmark, Finland, Norway and Sweden. The risks of different economic and social activities are the concern of Mohammed Ameen and Mourshed (2017), who use a survey to follow the consequences of accelerated urbanization and its multiple effects in terms of pollution, reduction of green spaces and carbon emissions. The hypertrophied growth of urban tourism and its social consequences are the subject of the team of Zmyślony and Kowalczyk-Anioł (2019), and Kraidi et al. (2020) identify the risks in managing pipeline systems for the transport of petroleum products, using a questionnaire distributed among different economic agents. McCarthy and Schurmann (2018) follow Australian farmers’ perceptions of the risks associated with organic farming. Shanmugam et al. (2022) conduct a survey of financial risk awareness among the urban population in India. Stjepić et al. (2021) investigate the business success of SMEs in relation to Business Intelligence Systems (BIS) and the associated risks.
Various other economic issues. The environmental risk generated by excessive, uncontrolled urbanization that ignores environmental issues is analysed by Raed and Monjur (2017). Kaya et al. (2020) analyse the impact and the risks of migrating IT infrastructure and applications to the cloud. The snowball method is also used by Rashid & Mohd Harif (2020) in major banks in Cairo, Egypt, to identify risk factors in SME lending. Sequeira (2022) develops a topic of particular economic interest, poultry production and marketing in India and the spread of specific diseases, a topic also addressed by Salman and Hassan (2020). Cross-border cooperation between Spain and Portugal is studied by González-Gómez and Estrella (2016, 2020), and Peck and Mulvey (2018) study cross-border work between England and Scotland.
Domestic violence (DV) has become a major problem in contemporary society, and the snowball method lends itself well to this area. Alhabib et al. (2009) conduct research based on databases, studies and field research on violence against women. Ruiz-Pérez et al. (2007) address domestic violence research by comparing different methods of conducting field research. The mental health consequences of domestic violence are studied by Hackett (2011), Golding (1999) and Trevillion (2012). The psychological causes of gender-based violence are analysed by Lucena et al. (2018). Shah et al. (2012) study domestic violence by population segment, psychotic/non-psychotic status and age category of spouses. Among the effects of the COVID-19 pandemic is a sharp increase in domestic violence, a situation analysed by Boserup et al. (2020). Also in relation to the pandemic, Debashish and Al-Khalifa (2022) study volunteerism in Islamic society in Bahrain in population testing and vaccination activities, and Wang et al. (2021) conduct a literature survey on the spread and control of the epidemic in different countries. Stevenson and Wakefield (2021) analyse group behaviour under pandemic travel restrictions.
Other areas of application. Ruban (2017) investigates the limits of the European Union’s Cross-Border Co-operation (CBC) objective, collecting interview data through the snowball method. Jaisuekul and Teerasukittim (2017) identify the factors that can contribute to the promotion of medical and cosmetic tourism in Thailand. Consumer behaviour is studied by Yoshida et al. (2013). Killick and Griffiths (2022) survey gamblers to determine the impact of aggressive advertising on their behaviour. Gabor (2016) covers data collection and the entire data-processing workflow in marketing research. Khan and Bashir (2020) use snowball sampling to study the transfer of practices from the commercial sector to non-profit activities.
3. Sequential Test—Short History
A sequential test is a procedure in which, after each “test” (interview, measurement, determination, trial, etc.), a given hypothesis can be accepted, the same hypothesis can be rejected, or additional information (evidence) can be requested. For this reason, the size of the sample examined is not known a priori but is a random variable; in some cases it is very small or, on the contrary, uneconomically large. The state of uncertainty, i.e. the situation in which an additional observation is requested, may last a shorter or longer time depending on the information provided by each additional interviewee. The theoretical foundation of sequential analysis was laid in the 1940s, with independent research on the subject by Barnard (1946) in the UK and by Wald (1947) in the USA, the latter a leader in the field who rigorously established the key results of this methodology. Working independently, the two developed a procedure they called sequential analysis, in which inference about the population is carried out “step by step”.
Abraham Wald (1902-1950) was born in Cluj, Romania, studied in Cluj and then, from 1927, in Vienna, where he completed his doctorate with Karl Menger and developed an interest in econometric research. He emigrated to the USA after the Nazi expansion in Europe and worked in the Statistical Research Group at Columbia University (SRGCU), which addressed statistical problems of military interest and application for the US war effort, including the sequential test. In 1943, Wald wrote a technical report entitled “Sequential Analysis of Statistical Data: Theory”, whose results were later published in the seminal work Sequential Analysis.
Synthesis of the Sequential Method
The essence of the method is given by the so-called Sequential Probability Ratio Test.
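If X is a continuous or discrete random variable with density $f(x;\theta)$, where $\theta$ is an unknown parameter, we wish to test the statistical hypothesis

$$H_0:\ \theta = \theta_0 \tag{1}$$

against the alternative

$$H_1:\ \theta = \theta_1 \qquad (\theta_1 \neq \theta_0). \tag{2}$$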
The likelihood functions associated with the two hypotheses are

$$P_{0,n} = \prod_{i=1}^{n} f(x_i;\theta_0), \tag{3}$$

$$P_{1,n} = \prod_{i=1}^{n} f(x_i;\theta_1), \tag{4}$$

where $f(x;\theta_0)$ is the density of X when H0 is true and $f(x;\theta_1)$ is the density of X when H1 is true. Wald founded sequential analysis on the Sequential Probability Ratio Test, which is built on the likelihood ratio

$$\Gamma_n = \frac{P_{1,n}}{P_{0,n}} = \prod_{i=1}^{n} \frac{f(x_i;\theta_1)}{f(x_i;\theta_0)}, \tag{5}$$

where $x_1, x_2, \dots, x_n$ are successive draws from the population described by the density $f(x;\theta)$.
The sequential test is constructed as follows: two positive constants A and B, with A > B, are chosen. At each draw the ratio $\Gamma_n = P_{1,n}/P_{0,n}$ is computed, and if

$$B < \Gamma_n < A \tag{6}$$

the experiment continues with a new unit. If

$$\Gamma_n \ge A \tag{7}$$

the research ends by accepting the alternative hypothesis H1 and rejecting the null hypothesis H0. If

$$\Gamma_n \le B \tag{8}$$

the research ends by accepting H0 and rejecting H1. The best approximations for the limits A and B, in terms of the type I risk α and the type II risk β, are (Wald, 1947: p. 44):

$$A \approx \frac{1-\beta}{\alpha}, \tag{9}$$

$$B \approx \frac{\beta}{1-\alpha}, \tag{10}$$

values that keep the actual error probabilities of the test close to the nominal risks α and β.
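For practical reasons it is much more convenient to work with the logarithm of the ratio $P_{1,n}/P_{0,n}$ than with the ratio itself (Girshick, 1946), because the logarithm turns the product (5) into a sum of n terms:

$$\log \Gamma_n = \sum_{i=1}^{n} \log \frac{f(x_i;\theta_1)}{f(x_i;\theta_0)}. \tag{11}$$

Denoting

$$z_i = \log \frac{f(x_i;\theta_1)}{f(x_i;\theta_0)}, \tag{12}$$

the decision rules can be written as follows:

$$\text{if } \log B < \sum_{i=1}^{n} z_i < \log A, \text{ the experiment continues;} \tag{13}$$

$$\text{if } \sum_{i=1}^{n} z_i \ge \log A, \text{ hypothesis } H_1 \text{ is accepted and } H_0 \text{ is rejected;} \tag{14}$$

$$\text{if } \sum_{i=1}^{n} z_i \le \log B, \text{ hypothesis } H_0 \text{ is accepted and } H_1 \text{ is rejected.} \tag{15}$$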
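The decision rules (13)-(15) translate almost line for line into code. The Python sketch below is an illustrative implementation written for this text (the function name `sprt` and its arguments are our own, not taken from any library); it uses the natural logarithm and Wald's approximate limits (9)-(10), and it is exercised on the Bernoulli case treated in Section 4.

```python
import math
from typing import Callable, Iterable, Tuple


def sprt(
    observations: Iterable[float],
    log_lr: Callable[[float], float],
    alpha: float,
    beta: float,
) -> Tuple[str, int]:
    """Illustrative Wald SPRT in logarithmic form.

    log_lr(x) must return z = log f(x; theta1) - log f(x; theta0) for one draw.
    Returns the decision and the number of draws used.
    """
    log_a = math.log((1 - beta) / alpha)  # upper limit, Relation (9)
    log_b = math.log(beta / (1 - alpha))  # lower limit, Relation (10)
    total = 0.0                           # running sum of z_i, Relation (12)
    n = 0
    for n, x in enumerate(observations, start=1):
        total += log_lr(x)
        if total >= log_a:
            return "accept H1", n         # Relation (14)
        if total <= log_b:
            return "accept H0", n         # Relation (15)
    return "continue", n                  # data ran out while (13) still held


# Usage on a Bernoulli variable with p0 = 0.01 and p1 = 0.10.
P0, P1 = 0.01, 0.10


def bernoulli_log_lr(x: float) -> float:
    return math.log(P1 / P0) if x == 1 else math.log((1 - P1) / (1 - P0))


print(sprt([0] * 30, bernoulli_log_lr, alpha=0.05, beta=0.10))  # ('accept H0', 24)
print(sprt([1, 1], bernoulli_log_lr, alpha=0.05, beta=0.10))    # ('accept H1', 2)
```

With these inputs, each normal finding lowers the running sum by about 0.095, so 24 consecutive normal findings are needed to cross log B ≈ −2.251, while two negative findings (about +2.303 each) are enough to cross log A ≈ 2.890.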
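Because the sample size is random, its expected value must be evaluated in order to know how many draws are needed, on average, before a final decision to accept or reject the null hypothesis can be made. The Average Sample Number (ASN), denoted $E_\theta(n)$, is given by

$$E_\theta(n) \approx \frac{L(\theta)\,\log B + [1 - L(\theta)]\,\log A}{E_\theta(z)}, \tag{16}$$

where θ is the true value of the parameter, $E_\theta(z)$ is the expected value of the variable z defined in (12), and $L(\theta)$ is the operating characteristic (OC) function, which gives the discriminating power of the test:

$$L(\theta) = \frac{A^{h} - 1}{A^{h} - B^{h}}, \tag{17}$$

where $h = h(\theta) \neq 0$ is the solution of

$$\int_{D} \left[\frac{f(x;\theta_1)}{f(x;\theta_0)}\right]^{h} f(x;\theta)\,dx = 1, \tag{18}$$

D being the domain of definition of the variable X (for a discrete variable the integral is replaced by a sum).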
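For the Bernoulli case of Section 4, Relation (18) reduces to $p\,(p_1/p_0)^h + (1-p)\,[(1-p_1)/(1-p_0)]^h = 1$, which can be solved for p in closed form, so the OC and ASN curves can be traced parametrically in h. The sketch below assumes this parametrisation (a standard device, written here for illustration; the function name `oc_asn_point` is hypothetical) and uses natural logarithms.

```python
import math


def oc_asn_point(h, p0, p1, alpha, beta):
    """One point (p, L(p), E_p(n)) of the OC and ASN curves, Bernoulli case.

    h is the free parameter of Relation (18); h = 0 is excluded (0/0 limit).
    """
    if h == 0:
        raise ValueError("h = 0 is a removable singularity; use h != 0")
    up = p1 / p0                  # f(1; p1) / f(1; p0)
    down = (1 - p1) / (1 - p0)    # f(0; p1) / f(0; p0)
    # Relation (18) for a Bernoulli variable: p * up**h + (1 - p) * down**h = 1
    p = (1 - down ** h) / (up ** h - down ** h)
    a = (1 - beta) / alpha        # Relation (9)
    b = beta / (1 - alpha)        # Relation (10)
    oc = (a ** h - 1) / (a ** h - b ** h)                     # Relation (17)
    ez = p * math.log(up) + (1 - p) * math.log(down)          # E_p(z)
    asn = (oc * math.log(b) + (1 - oc) * math.log(a)) / ez    # Relation (16)
    return p, oc, asn


for h in (2, 1, 0.5, -0.5, -1, -2):
    p, oc, asn = oc_asn_point(h, 0.01, 0.10, 0.05, 0.10)
    print(f"p = {p:.4f}   L(p) = {oc:.3f}   E_p(n) = {asn:.1f}")
```

Two points serve as checks: h = 1 gives p = p0 and L(p0) = 1 − α, while h = −1 gives p = p1 and L(p1) = β. With p0 = 0.01, p1 = 0.10, α = 0.05 and β = 0.10, the expected number of interviews is roughly 28 when p = p0 and roughly 16 when p = p1.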
4. Sequential Validation of the Researched Fraction in the Total of a Population
For the proportion p of the population that has the characteristic X studied in the research, the sequential procedure is specialized under the assumption of a binomial distribution. The Sequential Probability Ratio Test is used to check the statistical hypothesis

$$H_0:\ p \le p_0 \tag{19}$$

against

$$H_1:\ p \ge p_1 \qquad (p_1 > p_0), \tag{20}$$

with the type I and type II risks α and β that accompany the test fixed in advance.
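Let $x_i$ be the result recorded for the i-th person interviewed: $x_i = 1$ if the finding concerns the negative aspect under study and $x_i = 0$ otherwise, and let $s(n) = \sum_{i=1}^{n} x_i$ be the number of the n persons found in the situation x = 1. Then

$$P_{0,n} = p_0^{\,s(n)}\,(1-p_0)^{\,n-s(n)} \tag{21}$$

and

$$P_{1,n} = p_1^{\,s(n)}\,(1-p_1)^{\,n-s(n)}, \tag{22}$$

so the logarithm of the likelihood ratio becomes

$$\log \Gamma_n = s(n)\,\log\frac{p_1}{p_0} + [n - s(n)]\,\log\frac{1-p_1}{1-p_0}. \tag{23}$$

The investigation continues as long as

$$\log\frac{\beta}{1-\alpha} < \log \Gamma_n < \log\frac{1-\beta}{\alpha}. \tag{24}$$

If

$$\log \Gamma_n \ge \log\frac{1-\beta}{\alpha}, \tag{25}$$

we reject H0, accept H1 and conclude the research; if

$$\log \Gamma_n \le \log\frac{\beta}{1-\alpha}, \tag{26}$$

the null hypothesis is accepted and the research is concluded.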
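Denoting by A(n) the quantity we call the acceptance line,

$$A(n) = \frac{\log\frac{\beta}{1-\alpha} + n\,\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} + \log\frac{1-p_0}{1-p_1}}, \tag{27}$$

and by R(n) the quantity we call the rejection line,

$$R(n) = \frac{\log\frac{1-\beta}{\alpha} + n\,\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} + \log\frac{1-p_0}{1-p_1}}, \tag{28}$$

the decision can be expressed as follows:
If $A(n) < s(n) < R(n)$, the research must continue by questioning the (n + 1)-th person;
If $s(n) \ge R(n)$, the research is concluded by rejecting the hypothesis H0;
If $s(n) \le A(n)$, the research is concluded by accepting the hypothesis H0.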
The quantities A(n) and R(n) depend only on p0, p1 and on the risks α and β; they can therefore be calculated before the investigation of the target population, so the field survey can be organized much more efficiently. If A(n) is not a whole number, it is rounded down to the largest integer less than A(n), and if R(n) is not a whole number, it is rounded up to the smallest integer greater than R(n), since both are compared with a number of people. Because s(n) is itself an integer, this rounding does not change any decision.
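Geometrically, A(n) and R(n) are two parallel straight lines in the variable n, with the common slope ρ and the intercepts χ0 and χ1:

$$
\begin{aligned}
\rho &= \frac{\log\frac{1-p_0}{1-p_1}}{\log\frac{p_1}{p_0} + \log\frac{1-p_0}{1-p_1}},\\
\chi_0 &= \frac{\log\frac{\beta}{1-\alpha}}{\log\frac{p_1}{p_0} + \log\frac{1-p_0}{1-p_1}},\\
\chi_1 &= \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{p_1}{p_0} + \log\frac{1-p_0}{1-p_1}}.
\end{aligned}
\tag{29}
$$

Therefore, the decision lines have the equations

$$A(n) = \chi_0 + \rho\, n \tag{30}$$

and

$$R(n) = \chi_1 + \rho\, n. \tag{31}$$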
The symbol “log” usually denotes the decimal logarithm; however, in the continuous case many densities are exponential, and the natural logarithm then leads to forms that are more convenient in practice. Since ρ, χ0 and χ1 are ratios of logarithms, their values do not depend on the base chosen.
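Relations (27)-(31), together with the rounding rule, can be sketched in Python as follows. This is an illustrative implementation under our own naming (`decision_lines` and `sequential_decision` are hypothetical helpers). It uses natural logarithms, which leave ρ, χ0 and χ1 unchanged because each is a ratio of logarithms.

```python
import math
from typing import List, Tuple


def decision_lines(p0: float, p1: float, alpha: float, beta: float) -> Tuple[float, float, float]:
    """Slope rho and intercepts chi0, chi1 of the decision lines, Relation (29)."""
    g1 = math.log(p1 / p0)              # log-weight of a negative finding (x = 1)
    g2 = math.log((1 - p0) / (1 - p1))  # log-weight of a normal finding (x = 0)
    rho = g2 / (g1 + g2)
    chi0 = math.log(beta / (1 - alpha)) / (g1 + g2)
    chi1 = math.log((1 - beta) / alpha) / (g1 + g2)
    return rho, chi0, chi1


def sequential_decision(results: List[int], p0: float, p1: float,
                        alpha: float, beta: float) -> Tuple[str, int]:
    """Compare the running count s(n) with the rounded lines A(n) and R(n)."""
    rho, chi0, chi1 = decision_lines(p0, p1, alpha, beta)
    s = 0
    for n, x in enumerate(results, start=1):
        s += x
        a_n = math.floor(chi0 + rho * n)  # A(n), Relation (30), rounded down
        r_n = math.ceil(chi1 + rho * n)   # R(n), Relation (31), rounded up
        if s <= a_n:
            return "accept H0", n
        if s >= r_n:
            return "reject H0", n
    return "continue", len(results)


rho, chi0, chi1 = decision_lines(0.01, 0.10, 0.05, 0.10)
print(f"rho = {rho:.5f}, chi0 = {chi0:.5f}, chi1 = {chi1:.6f}")
# rho = 0.03975, chi0 = -0.93886, chi1 = 1.205379
print(sequential_decision([0] * 30, 0.01, 0.10, 0.05, 0.10))  # ('accept H0', 24)
```

With the case-study inputs (p0 = 0.01, p1 = 0.10, α = 0.05, β = 0.10), this sketch reproduces the intercepts χ0 ≈ −0.93886 and χ1 ≈ 1.205379 reported in Section 5, but it returns a slope ρ ≈ 0.03975 rather than the printed 0.04318, so the printed slope is worth recomputing from Relation (29).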
5. Handicapping Method and Case Study
The proposed procedure is practical and easy to implement, and it is presented directly in applied form, on the concrete case of homeless people in Bucharest, Romania. Homeless people and street children appeared in Romania after 1990, and the authorities’ concern for them became especially visible after 2007, with the country’s accession to the European Union and the emergence of new rules on supporting these people. At European level, the European Federation of National Organizations Working with the Homeless (FEANTSA, 2021, 2022) focuses on the urban dimension of EU policy, through cooperation with local authorities in the European Forum to Combat the Phenomenon and through the collection of data on homelessness. FEANTSA has developed a European Typology of Homelessness and Housing Exclusion (ETHOS). The number of homeless people in Romania is very difficult to establish: the situation is very volatile and the available data are unreliable. According to the European Social Policy Network (ESPN, 2019), an estimated 15,000 people were living rough in 2008, a figure estimated by the Samusocial Foundation and Médecins Sans Frontières survey (KE-02-19-507-EN-N-pdf, p. 32). According to the 2011 Census, the number of homeless people in the country was 162,375; the 2021 census was postponed to 2022, when newer figures are expected. The report of the Samusocial Romania association (Samusocial Romania, 2013) indicates 5000 homeless people in Bucharest in 2010. The Quality of Life journal (Dan & Dan, 2005: pp. 101-122) gives the same figure of 5000 people, and the local authorities in Bucharest (DGASMB, 2021) report that they take care of about 900 people.
This homeless population is fed by the depopulation of villages and the loss of homes, but also by the roughly 3000 young people who turn 18 and leave childcare institutions every year to start life on their own; some of them fail to integrate into society and join the ranks of homeless people (White Paper on Homeless Youth in Romania, 2017).
As part of an action to assess the health of homeless people in the North Station area of Bucharest and of the beneficiaries of the “Sfantul Andrei” Care Centre, harmful alcohol consumption, a factor likely to aggravate or trigger various diseases, was monitored. The World Health Organization (WHO) defines harmful consumption as a regular average consumption of more than 40 g of alcohol/day for women and 60 g of alcohol/day for men (WHO, 2018). A proportion of up to 1% of people consuming alcohol above the WHO limit was considered acceptable at group level (p0 ≤ 0.01), while a proportion of 10% or more was taken to indicate alcohol abuse at community level (p1 ≥ 0.10). The interviews were conducted individually and successively. The number s(n) of people consuming alcohol above the WHO limit is compared with the decision limits A(n) and R(n) given by Relations (30) and (31). The type I and type II risks associated with the procedure were set at the usual levels for sample surveys (α = 5% and β = 10%). If s(n) ≤ A(n), the research ends with the acceptance of the H0 hypothesis; if s(n) ≥ R(n), the research ends by rejecting H0 and accepting H1, meaning that there is alcohol abuse at community level. Instead of the classical sequential procedure, we use the handicapping method, starting from Relations (30) and (31), which can also be written as
$$s(n) \le \chi_0 + \rho\, n \tag{32}$$

and

$$s(n) \ge \chi_1 + \rho\, n. \tag{33}$$
The proposed procedure, which we call the handicapping method, takes its name from sports terminology: handicapping was first applied in sailing races, to allow boats of different classes to take part in the same race. According to the Cambridge Dictionary (2022), a handicap is “a disadvantage given to a person taking part in a game or competition in order to reduce their chances of winning, or a sports event in which such disadvantages are given”. The process has been standardized and extended to other sports (Jensen, 2017). Participants who differ in previous performance or size class are assigned a score or a time, so that competitors start from different positions; the handicap aims to equalize the chances of competitors of different strength. Over time, the practice has spread to horse racing, chess, Go, shogi, polo, gliding, sailing, tennis, motorcycle speedway, golf, etc.
In snowball sampling, the proposed procedure likewise uses the handicap (1 − ρ)/ρ to penalize, during the series of draws, each person classified in the negative category, thereby delaying the decision by the size of this handicap.
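Next, the decision relations are rewritten to highlight the number of people outside the negative criterion, n − s(n):

$$[n - s(n)] - \frac{1-\rho}{\rho}\, s(n) \ge -\frac{\chi_0}{\rho}, \tag{34}$$

$$[n - s(n)] - \frac{1-\rho}{\rho}\, s(n) \le -\frac{\chi_1}{\rho}. \tag{35}$$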
Relation (34) gives the condition for accepting the H0 hypothesis, i.e. for completing the survey with the conclusion that the proportion of alcohol abusers does not exceed the threshold established as admissible, and Relation (35) gives the condition for accepting the alternative H1. The handicapping method parameters are:
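$$H = \frac{1-\rho}{\rho} \quad \text{(handicap)}, \tag{36.1}$$

$$S_0 = \frac{\chi_1}{\rho} \quad \text{(initial value of the cumulative score)}, \tag{36.2}$$

$$S^{*} = \frac{\chi_1 - \chi_0}{\rho} \quad \text{(acceptance limit)}. \tag{36.3}$$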
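The link between these parameters and the decision lines (30) and (31) takes a few lines of algebra. In the derivation below, m(n) = n − s(n) is the number of people outside the negative criterion and S(n) is the running score (notation introduced here for illustration). Starting from χ1/ρ, adding 1 for each such person and subtracting (1 − ρ)/ρ for each person in the negative category gives

$$
\begin{aligned}
S(n) &= \frac{\chi_1}{\rho} + m(n) - \frac{1-\rho}{\rho}\,s(n)
      = \frac{\chi_1}{\rho} + n - \frac{s(n)}{\rho}
      = \frac{R(n) - s(n)}{\rho},\\[4pt]
S(n) \le 0 &\iff s(n) \ge R(n) \qquad \text{(accept } H_1\text{)},\\
S(n) \ge \frac{\chi_1 - \chi_0}{\rho} &\iff s(n) \le \chi_0 + \rho\, n = A(n) \qquad \text{(accept } H_0\text{)}.
\end{aligned}
$$

The score is therefore the vertical distance from s(n) to the rejection line, rescaled by 1/ρ. Since ρ > 0, the rescaling changes none of the decisions; it only replaces the comparison with two moving lines by a comparison with two fixed bounds.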
The value (1 − ρ)/ρ is the handicap, and the research follows a simple algorithm. First, the interview is conducted with the first person identified in the target group. If he/she is not an abusive consumer, 1 is added to the initial value of the cumulative score; if he/she is an abusive consumer, the handicap is subtracted from it. The second person identified in the group is then interviewed: if he/she is not an abusive consumer, 1 is added to the updated score, and if he/she is an abusive consumer, the handicap is subtracted from the updated score. The survey continues in this way, interview after interview. If the score reaches a value one unit above the acceptance limit, the research ends with the acceptance of the hypothesis H0: p ≤ p0; in this study, this means concluding that the group does not contain abusive alcohol consumers in a proportion greater than the established threshold (p0 = 1%), a statement subject to the two decision risks α and β. If the score becomes negative during the sequence of interviews, the survey is stopped with the acceptance of the hypothesis H1: p ≥ p1, and the conclusion is that alcohol abuse in the researched group is above the limit set as the research hypothesis (10% in this case).
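The handicap algorithm described above can be sketched in Python. The sketch assumes the parameter expressions obtained by rewriting Relations (30) and (31) (initial score χ1/ρ, handicap (1 − ρ)/ρ, acceptance limit (χ1 − χ0)/ρ), works with unrounded values and stops at the first crossing of either bound. The names `handicap_parameters` and `handicap_run` are hypothetical. Because the score equals (R(n) − s(n))/ρ, the sketch should reach the same decision, at the same interview, as a direct comparison of s(n) with A(n) and R(n).

```python
import math
from typing import List, Tuple


def handicap_parameters(p0: float, p1: float, alpha: float, beta: float) -> Tuple[float, float, float]:
    """Initial score, handicap and acceptance limit (unrounded)."""
    g1 = math.log(p1 / p0)
    g2 = math.log((1 - p0) / (1 - p1))
    rho = g2 / (g1 + g2)                        # slope of the decision lines
    chi0 = math.log(beta / (1 - alpha)) / (g1 + g2)
    chi1 = math.log((1 - beta) / alpha) / (g1 + g2)
    start = chi1 / rho                          # initial value of the score
    handicap = (1 - rho) / rho                  # penalty per negative finding
    limit = (chi1 - chi0) / rho                 # acceptance limit for H0
    return start, handicap, limit


def handicap_run(results: List[int], p0: float, p1: float,
                 alpha: float, beta: float) -> Tuple[str, int, float]:
    """+1 for each x_i = 0, minus the handicap for each x_i = 1; stop at a bound."""
    start, handicap, limit = handicap_parameters(p0, p1, alpha, beta)
    score = start
    for n, x in enumerate(results, start=1):
        score += -handicap if x == 1 else 1.0
        if score >= limit:
            return "accept H0", n, score       # p <= p0 accepted, survey ends
        if score <= 0:
            return "accept H1", n, score       # p >= p1 accepted, survey ends
    return "continue", len(results), score


start, handicap, limit = handicap_parameters(0.01, 0.10, 0.05, 0.10)
print(f"start = {start:.2f}, handicap = {handicap:.2f}, limit = {limit:.2f}")
# start = 30.33, handicap = 24.16, limit = 53.95
```

A handicap of about 24 agrees with the amount subtracted in the worked example of the case study (from 47 to 23). Field staff would use rounded values of these three parameters, so that only integer additions and subtractions are needed.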
Based on Relations (27), (28) and (29), we obtain ρ = 0.04318, χ0 = −0.93886 and χ1 = 1.205379; consequently, the decision lines (30) and (31) are

$$A(n) = -0.93886 + 0.04318\, n$$

and

$$R(n) = 1.205379 + 0.04318\, n.$$
Carrying out the field investigation with this procedure involves successive interviews; after each result, the cumulative number of recorded cases is compared with A(n) and R(n), and a decision is then made to accept or reject the hypotheses advanced. The difficulty and slowness of this process highlight the advantages of the handicapping method, which only requires calculating Indicators (36.1), (36.2) and (36.3) beforehand. In this study, the following values are obtained (rounded, since they refer to people):
,
,
.
In the field survey, the first subject interviewed was not an abusive consumer, so 1 was added to the initial value. The next 18 subjects were also non-abusive consumers, and the cumulative value reached 47; after each update, the result was compared with the acceptance limit of the null hypothesis. The next subject was an abusive consumer, so the handicap was subtracted from 47 and the cumulative value fell to 23. The next 28 subjects were non-abusive consumers, bringing the cumulative value to 51, which exceeds the acceptance limit. With the assumed risks (α = 0.05, β = 0.10), the survey can therefore be concluded by accepting the H0 hypothesis: the proportion of alcohol abusers in the investigated community does not exceed the 1% threshold, contrary to the general opinion on this issue.
6. Discussions and Conclusion
Snowball sampling, as a non-random sampling method, is less strict and easier to apply than probabilistic sampling methods, as it does not involve considerations of representativeness in describing the sample, and it leaves it to the researcher to decide which components of the investigated communities will be selected and when the research is completed. The method can be used successfully when there is no database or list of the studied population, and its results can be obtained faster than those of probabilistic surveys. It should also be noted that units are included in the sample in an arbitrary manner, so the probabilities of inclusion cannot be calculated; thus the variance and bias of the estimator cannot be calculated, and consequently the accuracy of the indicators cannot be measured. The major disadvantage is the absence of objective, quantitative criteria for the decision to terminate the survey: the number of people to be included in the survey cannot be calculated in advance but depends on the researcher’s decision.
We also consider it necessary to specify that the main goal of this article was not to analyse the situation of homeless people in Bucharest, but to present the handicapping method as a simple procedure for completing snowball sampling, usable in the wide range of practical cases reflected in the literature review. The case of homeless people in Bucharest served only as a practical illustration of the handicapping method.
The handicapping method, proposed as a customized version of the sequential method, aims to eliminate the uncertainty about when to complete the research by introducing a quantitative, objective element, under known probabilities and decision risks. The results obtained can contribute to improving the methods of sociological research in a special field, namely sociological surveys conducted in closed groups and hidden populations. The major advantage of the handicapping method is the introduction of a quantitative, objective criterion for completing the survey. Another advantage is that it does not require specialized statistical knowledge. The specific parameters can be established before fieldwork begins, and there is a wide range of options for designing the procedure, from the proportions considered acceptable or unacceptable (p0 and p1) to the type I and type II risks (α and β), depending on the desired accuracy, the aspect investigated and the nature of the indicators pursued.
Staff conducting the field survey, in addition to conducting the interviews, only have to perform simple additions or subtractions and comparisons with previously calculated limits. Contrary to the general perception, the results of the survey indicated that most homeless people were not heavy drinkers, so the widespread opinion was not confirmed, at least in the community of residents in the Bucharest North Station area.