Statistical Methods and Electoral Integrity: The 2022 Brazilian Elections
1. Introduction
Electoral fraud, the utilization of covert and illegal methods to manipulate the outcome of elections, distorts representation, hampers accountability, and undermines the legitimacy of governments (Lehoucq, 2003). Consequently, ensuring that electoral results accurately reflect the preferences and will of voters is a crucial aspect of representative democracies (Deckert, Myagkov, & Ordeshook, 2011). This is one of the main aspects of electoral integrity.
Traditionally, in political science, electoral integrity refers to international standards and global norms that govern the appropriate conduct of elections (Norris et al., 2014). However, in Brazilian electoral law, the concept of electoral integrity pertains to much more than that.
Brazil adopts a model of electoral governance characterized by an independent Electoral Management Body (EMB), the Brazilian Electoral Court, which is a branch of the Federal Judiciary entirely dedicated to managing all aspects related to the electoral process (Marchetti, 2008). Although independent EMBs are now the most common institutional model for electoral management in the world (Catt et al., 2014; López-Pintor, 2000; Wall et al., 2006), the Brazilian case is emblematic, as the country has one of the oldest EMBs in the world, with the Brazilian Electoral Court having been created in 1932.
Therefore, in Brazilian law, the concept of electoral integrity pertains to a collection of principles, rules, and norms aimed at guaranteeing fairness, transparency, and justice throughout the electoral process. This encompasses safeguarding the fundamental rights of voters, ensuring the security and reliability of electronic voting machines, promoting transparency in campaign financing, maintaining transparency in vote counting, and other crucial elements that foster the dependability and legitimacy of elections.
In this brief paper, I propose a combination of quantitative methods for detecting potential anomalies in electoral results. I suggest combining the digit-focused method (last digit) with a quick count to identify any substantial deviations between an ideal fair election (see Note 1) and the results obtained in the 2022 presidential runoff election in Brazil.
Brazil is a notable example of a country that has faced a unique political situation in recent years. In the 2018 presidential election, the victorious candidate, Jair Bolsonaro, received 55.13% of the valid votes and alleged that he was a victim of electoral fraud, claiming that he would have won in the first round if not for the alleged fraudulent activity. Following this, Brazil experienced a period of increased political polarization.
Although the 2018 Brazilian presidential elections have already been the subject of a comprehensive statistical analysis by Figueiredo Filho, Silva, and Carvalho (2022), and no concrete evidence of fraud was identified, the topic of electoral fraud once again gained prominence in 2022.
Lula da Silva, the former President of Brazil and Workers’ Party (PT) candidate, was elected President of Brazil after defeating incumbent President Jair Bolsonaro of the Liberal Party (PL) in the runoff election held on October 30, 2022. He built a broad coalition that spanned left-wing, center, and moderate right-wing leaders, which helped him secure a narrow victory.
The national election was closely contested, with a difference of just over two million votes, which was less than 2%. As a result of this narrow outcome, supporters of President Bolsonaro took to the streets in protest. This election also highlighted an unfortunate prejudice against the population of the northeast region of Brazil, which is Lula da Silva’s place of origin and primary electoral base.
I decided to focus my analysis on data from a single state, Alagoas, despite the limitations this may impose on the research. This choice was made in response to a demand that arose following the circulation of a false document on the internet that cast doubt on the integrity of the electronic voting machines used in the northeast region of Brazil (Nogueira, 2023). The document presented a lot of false information about Alagoas, leading to confusion among those who were not well-informed about the country’s electoral system.
Alagoas is a state in the northeast region of Brazil, made up of 102 municipalities, of which only 4 have more than 50,000 voters (Maceio, its capital, with 627,485 voters; Arapiraca, with 150,627; Rio Largo, with 62,255 and Palmeira dos Indios, with 52,692).
The analysis of the electoral results in Alagoas serves as a representative sample for the rest of the northeast region. Out of the 1,794 municipalities in the region, only 114 have an electorate of over 50,000 voters.
Politically, it is a state with great relevance, despite its relative size. The first democratically elected president after the end of the dictatorship, Collor de Mello, made his political career in that state. In addition, the current president of the Brazilian Chamber of Deputies, Arthur Lira, is also from Alagoas.
The findings suggest no evidence of fraud in the electoral results released by the Brazilian Electoral Court.
The following section outlines the materials and methods used in the study. The third section displays the statistical findings. The last section concludes and explores the implications of the research results and acknowledges its limitations.
2. Materials and Methods
I used data that are publicly available on the Open Data Portal of the Superior Electoral Court (TSE, 2022), the Brazilian Electoral Management Body (EMB).
First, I examined the distribution of the last digit of the count of valid votes given to each candidate, combined with an analysis of how often 0 and 5 occur as last digits.
Mack and Stoetzer (2019) argue that the last-digit test is a unique method to detect election fraud. It assumes that a manipulator replaces the vote counts of an election result sheet with fake numbers but will fail to make the numbers look random. Theoretically, manipulation-free data should exhibit an approximate mean of 4.5 and a uniform distribution of digits—each digit should appear 10% of the time.
Similarly, according to Beber and Scacco (2012), the last digits will occur with equal frequency for a large class of theoretical distributions. Non-fraudulent electoral returns are likely to be drawn from such a distribution.
In essence, due to the inherent inability of humans to generate truly random sequences, it is expected that in a fair election, the distribution of the last digit of the number of votes received by each candidate should be evenly distributed.
On the other hand, the disproportionate occurrence of any number could indicate that the total number of votes was intentionally manipulated (fraud). This logic is at the basis of the frequency analysis of the last digits 0 and 5. In the absence of fraud, their average frequency should approach 0.2, or 20%.
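The two benchmarks used in the digit tests (a mean of 4.5 and a combined 0/5 frequency of 0.2) follow directly from the assumption that the last digit D is uniformly distributed over 0 to 9. A short derivation is sketched below; N (the number of vote counts examined) and the standard-error expressions are additions for illustration and assume independent counts:

```latex
\begin{align*}
P(D = d) &= \tfrac{1}{10}, \qquad d \in \{0, 1, \dots, 9\} \\
\mathbb{E}[D] &= \tfrac{1}{10}\sum_{d=0}^{9} d = \tfrac{45}{10} = 4.5 \\
\operatorname{Var}(D) &= \tfrac{1}{10}\sum_{d=0}^{9} d^{2} - 4.5^{2} = 28.5 - 20.25 = 8.25 \\
P(D \in \{0, 5\}) &= \tfrac{2}{10} = 0.2 \\
\operatorname{SE}(\bar{D}) &= \sqrt{8.25 / N}, \qquad
\operatorname{SE}(\hat{p}_{0,5}) = \sqrt{0.2 \times 0.8 / N}
\end{align*}
```

Under these assumptions, the larger the number of vote counts, the more tightly the observed mean and the observed 0/5 share should cluster around 4.5 and 0.2.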
Additionally, I carried out a simulation of a quick count, a statistical evaluation method widely used by international election observers (Enikolopov et al., 2013; Long, 2023; Mulyadi & Aridhayandi, 2020; Pusdiklatwas, 2019; Williams & Curiel, 2020; Wibowo & Darmanto, 2019).
A quick count is a statistical evaluation method used to estimate the results of an election in near real-time. It typically collects a sample of votes from randomly selected polling stations and extrapolates the results to the entire population.
The goal of a quick count is to provide an accurate and timely estimate of the election results, which helps to build confidence in the election process and reduce the risk of electoral fraud.
Quick counts are widely used by international election observers and are considered an effective tool for detecting fraud and promoting electoral integrity.
As Estak, Nevitte, and Cowan (2002: p. 1) remind us, “quick counts can project or verify official results, detect and report irregularities or expose fraud. In most cases, quick counts build confidence in the work of election officials and the legitimacy of the electoral process.”
I had to adapt this procedure. A typical quick count involves recruiting volunteers to collect partial results from pre-selected polling stations; these form the sample that is compared with the official results released by the local electoral authority. Furthermore, data collection for a quick count is usually performed immediately after the polls close.
Since the election had already taken place, I relied instead on the ballot box results released by the Brazilian Electoral Court on the internet.
The ballot box report is a paper document issued by the electronic voting machine at the end of the election. The Brazilian Electoral Court encourages party representatives and citizens to immediately verify the number of votes in all voting machines against the information published online, right after the polls are closed.
I established a sample based on the total number of polling stations in the state (6,626), with a sampling error of 5% and a confidence level of 95%. I assumed maximum heterogeneity in the population (a 50/50 split), which resulted in a sample of 364 sections, selected using the random number generation function of MS Office Excel.
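The sample size of 364 can be reproduced with a standard calculation. The sketch below assumes Cochran's formula with a finite population correction and rounding up; the text does not state which formula was used, so this is one plausible reconstruction. The function name, the seed, and the 1-to-6,626 section numbering are hypothetical, and Python's `random.sample` stands in for the Excel function actually used.

```python
import math
import random


def sample_size(population: int, margin: float = 0.05,
                z: float = 1.96, p: float = 0.5) -> int:
    """Cochran sample size with finite population correction, rounded up."""
    # Sample size for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Correction for a finite population of polling stations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


n_sections = sample_size(6626)
print(n_sections)  # 364

# Draw the sections without replacement (hypothetical IDs 1..6626).
rng = random.Random(2022)  # arbitrary seed, for reproducibility only
drawn = rng.sample(range(1, 6627), n_sections)
```

With p = 0.5 the product p(1 - p) is at its maximum, which is why the 50/50 assumption gives the most conservative (largest) sample.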
All statistical estimates were performed using IBM SPSS Statistics version 27. Adhering to best scientific practices, the materials for replication, including data and spreadsheets, are available on a public access platform (Nogueira, 2022).
The false document is also available (Nogueira, 2023), but solely for academic purposes. I must emphasize that spreading fake news about the Brazilian electoral system is a punishable crime that may result in imprisonment.
3. Findings
This section presents the main findings of the research in the following order: average of the last digit; analysis of the frequency of the last digits 0 and 5; and quick count.
3.1. Last Digit Average
Table 1 shows the average of the last digit of valid votes obtained by the candidates in each of the 6626 polling stations in the state of Alagoas. It is worth noting that the values obtained by the candidates closely match the expected theoretical parameter of 4.5.
Table 1. Average of the last digit of valid votes in the runoff election—Alagoas.
Source: Own elaboration (2023).
These results are different from those reported by Hicken and Mebane (2017) in their analyses of elections in Afghanistan (μ = 4.112) and South Africa (μ = 4.069), where the variation in the last digit average was approximately 10%. In contrast, in the Brazilian case, the average varied by less than 1% from the expected value (approximately 0.6% to be more precise).
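For readers who wish to reproduce this kind of check outside SPSS, the sketch below computes the last-digit average together with a percentile bootstrap interval based on one thousand resamples, in line with Note 3. The vote counts are simulated placeholders, not the TSE data, and the function names, seeds, and count range are assumptions made for illustration.

```python
import random
import statistics


def last_digit(count: int) -> int:
    """Return the final decimal digit of a vote count."""
    return abs(count) % 10


def bootstrap_mean_ci(digits, n_boot=1000, alpha=0.05, seed=0):
    """Observed mean plus a percentile bootstrap interval for it."""
    rng = random.Random(seed)
    n = len(digits)
    # Resample the digits with replacement and record each resample mean.
    means = sorted(
        statistics.fmean(rng.choices(digits, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(digits), lower, upper


# Simulated counts for 6,626 stations (placeholder data, NOT official results).
sim = random.Random(42)
simulated_counts = [sim.randint(50, 249) for _ in range(6626)]
digits = [last_digit(c) for c in simulated_counts]

mean, lower, upper = bootstrap_mean_ci(digits)
print(f"mean = {mean:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

If the interval contains 4.5, the observed average is compatible with the theoretical expectation for untampered counts.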
It should be noted that the last-digit average is an extremely sensitive method for detecting manipulation. To make the evidence more solid, I also examined the observed frequency of each last digit in the candidates’ valid votes.
According to the literature (Beber & Scacco, 2012; Dlugosz & Müller-Funk, 2009; Skovoroda & Lankina, 2017), a fair distribution of digits should be uniform. By observing the frequency in the distribution of last digits, I aimed to identify any abnormalities in the valid votes obtained by each candidacy (Table 2).
Table 2. Frequency in the distribution of the last digits of valid votes obtained by each candidacy.
Source: Own elaboration (2023).
Again, the idea behind this approach is that, in a fair election, the distribution of the last digits of the vote count should follow a uniform distribution, with each last digit having a probability of 0.1 of appearing. In other words, each last digit (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) should occur approximately 10% of the time in the count of valid votes for each candidate.
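One common way to formalize the comparison with the uniform benchmark is a chi-square goodness-of-fit test. This test is not reported in the paper; the sketch below is offered only as a complementary illustration, with toy digit lists rather than the Alagoas data. The critical value 16.919 corresponds to 9 degrees of freedom at the 5% level.

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% level.
CHI2_CRIT_DF9_05 = 16.919


def chi_square_uniform(digits) -> float:
    """Chi-square statistic of last digits against a uniform distribution."""
    n = len(digits)
    expected = n / 10  # each digit is expected 10% of the time
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - expected) ** 2 / expected for d in range(10)
    )


# Toy data: a perfectly uniform list and one with an excess of zeros.
fair = list(range(10)) * 100
skewed = list(range(10)) * 90 + [0] * 100

print(chi_square_uniform(fair))    # 0.0
print(chi_square_uniform(skewed))  # 90.0
print(chi_square_uniform(skewed) > CHI2_CRIT_DF9_05)  # True
```

A statistic above the critical value would flag a digit distribution that departs from uniformity more than chance alone would typically produce.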
To confirm the validity of the result, I repeated the last-digit analysis for three other indicators that, in theory, would be more difficult to manipulate and would require a highly sophisticated statistical and computational enterprise (see Note 5): 1) the number of eligible voters per section; 2) the total number of voters who attended the polls; and 3) the total number of invalid votes (sum of blank and null votes) (Table 3).
Table 3. Frequency in the distribution of the last digits of voters, attendance, and invalid votes.
Source: Own elaboration (2023).
For all indicators, the frequency of the last digit is close to a uniform distribution. These distributions resemble the results found by Beber and Scacco (2012) for the vote count in Sweden.
The only indicator in which the count exceeds the rounding margin is the number of eligible voters. However, this indicator does not follow a truly random distribution. In election years, it is common to aggregate smaller voting sections into larger ones in the same polling location, which affects the distribution of voters by voting section.
3.2. Analysis of the Frequency of the Last Digits 0 and 5
Next, I evaluated the frequency of 0 and 5 as the last digits of valid votes in the 2022 presidential runoff election. In a fair election, the average relative frequency of digits 0 and 5 is expected to approach 0.2, i.e., 20% of the total.
The count of 6626 polling stations resulted in an average of 0.199 for Jair Bolsonaro and 0.205 for Lula da Silva, as shown in Table 4 below.
Table 4. Frequency of the last digits 0 and 5 of the valid votes in the runoff presidential elections.
Source: Own elaboration (2023).
According to the literature, if there are no irregularities in the digit count, digits 0 and 5 together yield an average frequency of 0.2 (Beber & Scacco, 2012). For the results to be consistent with a fair election, the observed values must be statistically equivalent to those predicted by the theory.
The data in Table 4 show no significant deviations. The frequency of digits 0 and 5 was approximately 0.2 (or 20%), which closely matches the expected frequency.
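A rough sense of how close the reported values are to 0.2 can be obtained with a one-sample z-score. The sketch below uses the rounded averages reported above (0.199 and 0.205) and N = 6,626; treating them as simple proportions under a normal approximation is an assumption of this illustration, not the procedure used to build Table 4.

```python
import math


def zero_five_z(share: float, n: int, p0: float = 0.2) -> float:
    """z-score of an observed 0/5 share against the expected share p0."""
    # Standard error of a proportion under the null hypothesis.
    se = math.sqrt(p0 * (1 - p0) / n)
    return (share - p0) / se


reported = {"Jair Bolsonaro": 0.199, "Lula da Silva": 0.205}
z_scores = {name: zero_five_z(share, 6626) for name, share in reported.items()}

for name, z in z_scores.items():
    # |z| below 1.96 means no significant deviation at the 5% level.
    print(f"{name}: z = {z:.2f}, within +/-1.96: {abs(z) < 1.96}")
```

Both z-scores fall inside the conventional range of plus or minus 1.96, which agrees with the reading that neither candidate's 0/5 frequency departs significantly from 20%.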
3.3. Quick Count
Given the lack of abnormalities in the distribution of valid votes, I ran a quick count simulation, computing the results from 364 sections drawn at random from the 6,626 sections installed in the state of Alagoas. The draw selected ballot boxes from 82 of the state’s 102 municipalities.
The results obtained are presented in Table 5 below. The sample corresponds to just over 5% of the valid votes of the state, making a total of 90,216 valid votes.
Table 5. Distribution of valid votes (sample).
Source: Own elaboration (2023).
Based on the valid votes of the 364 sampled sections, the simulation projects that Lula da Silva would win the state of Alagoas with approximately 58% of the votes, against 42% for Jair Bolsonaro.
The official results released by the Superior Electoral Court are close to the sample, as shown in Table 6 below:
Table 6. Distribution of valid votes (elections).
Source: Own elaboration (2023).
In the state of Alagoas, Lula da Silva won with just over 58% of the valid votes, confirming the prediction by the sample used in the quick count simulation.
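The logic of the comparison can be sketched in a few lines: pool the valid votes of the sampled sections, compute each candidate's share, and check whether the official share lies within the sampling error. The tallies and the official share below are hypothetical toy numbers, not the figures from Tables 5 and 6, and the function names are illustrative.

```python
def quick_count_share(tallies) -> float:
    """Pooled share of candidate A across sampled sections.

    `tallies` is a list of (votes_a, votes_b) pairs, one per section.
    """
    total_a = sum(a for a, _ in tallies)
    total_valid = sum(a + b for a, b in tallies)
    return total_a / total_valid


def consistent(sample_share: float, official_share: float,
               margin: float = 0.05) -> bool:
    """True when the official share lies within the sampling error."""
    return abs(sample_share - official_share) <= margin


# Hypothetical tallies for three sections (toy numbers only).
toy_tallies = [(120, 80), (90, 110), (150, 100)]
toy_share = quick_count_share(toy_tallies)

print(f"sample share = {toy_share:.3f}")
print(consistent(toy_share, official_share=0.58))  # hypothetical official share
```

Pooling votes rather than averaging per-section percentages weights each section by its number of valid votes, which matches how a statewide result is tallied.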
4. Discussion
The idea that elections involve much more than the simple act of voting is quite consolidated (Cox, 1997; Lijphart, 1994; Taagepera & Shugart, 1989).
It is crucial to remember that elections are the foundation of modern democracies. Furthermore, while they have their roots in independent events, voting and the solidification of democracies cannot be separated in the 21st century.
Many even point out that well-administered elections would be a prerequisite for democracy (Pastor, 1999; Birch, 2011; Norris, 2015). As Hicken and Mebane (2017) recall, if a ballot is violated, universal suffrage loses its characteristic power of ensuring the vertical accountability necessary for democracy.
Manipulation of electoral results is a serious threat to democracy, and the issue of fraud prevention is directly linked to the maintenance of electoral integrity (Fortin-Rittberger, Harfst, & Dingler, 2017; James & Clark, 2020; Levin & Alvarez, 2012).
Internationally, the criteria for assessing electoral integrity include transparency, impartiality, access to voting, accurate voter registration, secrecy of the vote, reliable vote counting, and independent oversight (Van Ham, 2015; Van Ham & Garnett, 2019).
Alvim (2015) condenses five criteria for assessing electoral integrity: guarantee of freedom to exercise the right to vote; strict adherence to the legality of the contest; recognition of the authenticity of election results; certainty of impartiality and firmness in conducting elections by electoral administration and jurisdiction bodies; and preservation of equality of opportunity among candidates who submit themselves to popular choice.
The guarantee that the election results accurately express the will of the voters is a fundamental condition for recognizing electoral integrity.
In Brazil, electoral integrity is, above all, a matter of legal doctrine. The Federal Constitution provides that popular sovereignty shall be exercised by vote, and the principles of legitimacy and normality of elections are enshrined in article 14, paragraph 9 of the Constitution.
The Brazilian Electoral Court system emerged in response to a demand for clean elections after the collapse of the Old Republic in 1930. Its first task was to create a national voter registration to eliminate the traditional practice of violating the principle of “one man, one vote” that had persisted since the imperial period (Nogueira, 2021).
Brazil is currently experiencing a highly polarized period, where long-standing prejudices against the northeastern population have resurfaced as allegations questioning the reliability of the electoral systems in that region. However, I have not found any empirical evidence to support claims of fraud in the receipt, counting, or reporting of valid votes.
The research has spatial limitations, because it was restricted to a single state in the Brazilian federation, and temporal limitations, because it covers only the runoff election. Nevertheless, it is possible to infer that the allegations of vote manipulation disclosed on the internet lack any solid evidence.
The analysis of the average of the last digits of valid votes revealed results that closely aligned with the theoretical expectation for fair elections (4.5).
The distribution reached values close to 10% (the ideal theoretical value) with a maximum variation of 0.7%. Similarly, the analysis of digits 0 and 5 did not reveal any abnormality. The frequencies were close to the ideal of 0.2, or 20% of the total.
When I employed the same approach to the numbers of eligible voters, voter turnout by section, and invalid votes (the sum of blank and null votes), the frequency distribution of the final digits also approached 10%.
Finally, I was able to simulate the application of the quick count method, which is internationally recognized as an effective mechanism for detecting electoral fraud. The sample data closely matched the official results and gave no indication of fraudulent vote manipulation.
Of course, the lack of evidence of fraud does not provide irrefutable proof of the accuracy of the electronic voting system adopted in Brazil. According to Popper (2018), the criterion for the scientific status of a theory is its falsifiability. Absolute certainties are not found in the realm of science, but rather in that of religious faith.
As I mentioned earlier, electoral integrity is a fundamental principle of Brazilian electoral law. Therefore, the discussion about electoral integrity in Brazil is primarily a legal matter. However, even a dogmatic discipline like law can benefit from empiricism. There is a great demand from the legal community for cross-sectional studies that can provide empirical tools to researchers in the field of electoral law.
In this paper, I aim to contribute to studies on electoral integrity by gathering empirical evidence that can be subjected to testing and examination. Never has the public debate in the democratic realm required so much from researchers. May we live up to the challenge.
Acknowledgements
I would like to express my gratitude to my advisor, Dr. Camila Villard Duran (Ph.D. Paris 1 Panthéon-Sorbonne, and University of Sao Paulo and Associate Professor of Law at ESSCA School of Management), my colleague, Julia Lambert Gomes Ferraz (LL.M. University of Sao Paulo), and my friend, Renato Nora Coelho (public servant in the Brazilian Electoral Court) for their valuable comments that helped improve the contents of my manuscript.
Furthermore, I would like to thank Dr. Frederico Franco Alvim (Ph.D. University of the Argentine Social Museum) for his assistance in clarifying initial concerns about the data.
NOTES
1. Fair elections are those that are free and “clean”, without any manipulation in favor of a particular candidate (Beber & Scacco, 2012).
2. µ represents the observed average of the last digits. The theoretical expectation for a fair election is µ = 4.5.
3. Values calculated from one thousand nonparametric bootstrap iterations.
4. N represents the number of polling stations considered.
5. I want to emphasize that recent advancements in artificial intelligence technology make it possible to generate statistically coherent numbers electronically. However, in the Brazilian electoral system, the results of each electronic voting machine are printed and posted at the polling location before centralized tabulation by the Electoral Court, which makes data manipulation virtually impossible. Any inconsistencies would be detected immediately.