Fundraising and vote distribution: A non-equilibrium statistical approach

Authors: Hygor P. M. Melo ^aff001; Nuno A. M. Araújo ^aff001; José S. Andrade, Jr. ^aff004
Authors place of work: Centro de Física Teórica e Computacional, Universidade de Lisboa, Lisboa, Portugal ^aff001; Instituto Federal de Educação, Ciência e Tecnologia do Ceará, Avenida Des. Armando de Sales Louzada, Acaraú, Ceará, Brazil ^aff002; Departamento de Física, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal ^aff003; Departamento de Física, Universidade Federal do Ceará, Fortaleza, Ceará, Brazil ^aff004
Published in the journal: PLoS ONE 14(10)
Category: Research Article
doi: https://doi.org/10.1371/journal.pone.0223059

Summary

The number of votes correlates strongly with the money spent in a campaign, but the relation between the two is not straightforward. Among other factors, the outcome of a ballot depends on the number of candidates, voters, and available resources. Here, we develop a conceptual framework based on Shannon entropy maximization and Superstatistics to establish a relation between the distributions of money spent by candidates and their votes. By establishing such a relation, we provide a tool to predict the outcome of a ballot and to flag possible misconduct either in the reporting of campaign fundraising and spending or in vote counting. As an example, we consider real data from two proportional elections with more than 6000 candidates each, where a detailed data verification is virtually impossible, and show that the number of candidates to audit for potential misconduct can be reduced to less than ten.

Keywords:

Fats – Finance – Probability distribution – Brazil – Decision making – Entropy – Statistical distributions – Elections

Introduction

In an effort towards fair electoral processes, regulations and reforms are constantly on the agenda of many countries around the world [1]. To prevent the decision-making process from being dominated by wealth and influence, the most pertinent processes to legislate are arguably fundraising and spending [2]. Different countries have different rules, but in general, candidates and parties are the ones that report on the financial details of their own campaigns, which raises obvious doubts about the veracity of the reported data. As the number of collected votes correlates with the money spent in the campaign [3], establishing a quantitative relation between the distribution of votes and financial resources among the candidates is instrumental in raising flags about possible misconduct.

Within some regulated boundaries, several individuals or institutions can contribute financially to a campaign. The value of the contribution is very subjective, depending on their interests and on the economic and political conjuncture [4–7]. Thus, predicting the distribution of funds raised and money spent in a campaign from “first principles” is likely a hopeless endeavor, which makes the reported data hard to verify. In sharp contrast, the distribution of votes among candidates is well studied. It is known to differ for proportional and plurality elections, and to depend on the country, number of candidates, and money spent in campaigns [8–13]. Different models were developed to explain this distribution [3, 14–19], as well as methodologies to identify vote-counting irregularities [20–25]. Here we propose an approach based on Shannon entropy maximization and Superstatistics to derive a relation between the distribution of financial resources declared by candidates and the distribution of their votes in proportional elections.

Results

Given a certain amount of money m_i spent by a candidate i in the campaign, the conditional probability for i to receive v votes is p(v|m_i). Since the money spent is heterogeneously distributed among candidates, the probability p(v) that a candidate receives v votes is given by

$$p(v) = \int_{0}^{m_{\max}} p(v|m)\, p(m)\, dm, \tag{1}$$

where p(m) is the probability that a candidate spends an amount of money m in the campaign and m_max is the maximum amount of money that can be spent (see Fig 1).

Fig. 1. By employing the principle of maximum entropy under the constraints of a fixed number of voters and candidates, we derive the conditional probability p(v|m_i) that a candidate i receives v votes, provided that i spends an amount of money m_i in the campaign.

Eq (1) is the basis of Superstatistics for non-equilibrium systems [26]. This theoretical framework was developed to describe the thermostatistics of an ensemble of particles where the temperature fluctuates in space and/or time. Boltzmann-Gibbs statistics assumes that all intensive quantities are invariant, so the weight of a configuration is always the same. By contrast, in Superstatistics, since different particles are at different effective temperatures, the weight of a configuration depends on the effective temperature. Thus, all probabilities depend on the temperature distribution. In an election, the probability that a candidate obtains a certain number of votes is a function of the amount of money m spent in the campaign, so m is the analogue, for elections, of the temperature in a thermal system. In the limit where all candidates spend the same amount of money m, Boltzmann-Gibbs statistics should be recovered.

To calculate p(v|m), let us consider a proportional election with N_c candidates and N_v total votes. Based on the principle of maximum entropy [27], p(v|m) should maximize the Shannon entropy,

$$S = -\sum_{i=1}^{N_c} \sum_{v=v_0}^{\beta m_i} p(v|m_i) \ln p(v|m_i), \tag{2}$$

where v₀ and βm_i are the minimum and maximum number of votes that the candidate i can receive, and β is a constant. For simplicity, hereafter we assume that v₀ is the same for all candidates. At this point, two constraints need to be imposed, as both the number of candidates N_c and the total number of votes N_v are fixed (see Fig 1). The first constraint is

$$\sum_{v=v_0}^{\beta m_i} p(v|m_i) = 1, \tag{3}$$

which ensures the normalization of p(v|m), while the second one is

$$\sum_{i=1}^{N_c} \sum_{v=v_0}^{\beta m_i} v\, p(v|m_i) = N_v. \tag{4}$$

By maximizing S subject to Eqs (3) and (4), we obtain

$$p(v|m) = \frac{e^{-\mu v}}{Z(m)}, \tag{5}$$

where Z(m) is a normalization factor that depends on m and is the analogue of the partition function in a thermal system, given by

$$Z(m) = \sum_{v=v_0}^{\beta m} e^{-\mu v}, \tag{6}$$

where μ is the Lagrange multiplier related to the second constraint (Eq (4)). Since the number of votes is limited, p(v|m) decays exponentially for v ∈ [v₀, βm] and is zero otherwise.
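As a minimal sketch of Eqs (5) and (6), the following Python snippet evaluates Z(m) and p(v|m) by direct summation over integer vote counts. The function names and the numerical values of μ, β, v₀ and m are illustrative assumptions, not fitted values from the data.

```python
import math

def partition_function(mu, v0, v_max):
    """Z(m) = sum_{v=v0}^{v_max} exp(-mu * v), with v_max = floor(beta * m)."""
    return sum(math.exp(-mu * v) for v in range(v0, v_max + 1))

def conditional_pmf(mu, beta, m, v0):
    """Return {v: p(v|m)} on the support v0..floor(beta * m); zero elsewhere."""
    v_max = int(beta * m)
    z = partition_function(mu, v0, v_max)
    return {v: math.exp(-mu * v) / z for v in range(v0, v_max + 1)}

# Illustrative (assumed) parameter values, not taken from the election data.
pmf = conditional_pmf(mu=1e-3, beta=0.5, m=10_000, v0=10)
total = sum(pmf.values())
mean_votes = sum(v * p for v, p in pmf.items())
print(f"normalization = {total:.6f}, mean votes = {mean_votes:.1f}")
```

Dividing by Z(m) is what makes the probabilities sum to one on the finite support, so candidates with different m share the same decay rate μ but have different cutoffs and different normalizations.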

In order to verify whether the distribution predicted by Eq (5) is compatible with real data, we consider the 2014 and 2018 elections for federal deputies in Brazil, using the datasets available in Refs. [28, 29]. Each state has its own ballot, with different candidates and voters. Countrywide, these elections had more than 6000 candidates each, roughly 140 million voters, and over US$300 million invested in the campaigns. We first analyze the results for the four most populous Brazilian states, namely, São Paulo, Rio de Janeiro, Minas Gerais, and Bahia. Each of these states has more than 10 million voters and had between 501 (Bahia) and 1686 (São Paulo) candidates in the 2018 election. For each state, we grouped the candidates by the amount of money that they reported to have spent in their campaigns. Fig 2A shows the standard deviation σ_v of the number of votes received by a candidate as a function of the average number of votes 〈v〉 for each group. For most data points, the results are consistent with a linear behavior (dashed line), as expected for an exponential distribution, for which the average and standard deviation are always equal. To verify the functional form of the distribution, Fig 2B shows the distribution of votes, rescaled as v̄ = (v − 〈v〉)/σ_v, where 〈v〉 and σ_v are the average and standard deviation of the number of votes per candidate in the same interval (logarithmic binning) of money spent. The distribution clearly follows the predicted exponential behavior of Eq (5) for more than 99% of the candidates. However, for v̄ > 6 the distribution deviates from the predicted one (highlighted region in Fig 2B). For 2014 there are eight candidates in this region in the entire country, all of them running in São Paulo. This is remarkable, as the theory predicts only one in São Paulo. For 2018, there are eleven candidates in the entire country (six in São Paulo), although we would expect only seven. This observation raises doubts about these outliers and could therefore call for a detailed analysis and validation of their reported campaign funding data.
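The expected number of candidates in the highlighted region can be estimated with a short calculation. For a pure exponential distribution (taking v₀ = 0 and ignoring the upper cutoff, both simplifying assumptions made here), 〈v〉 = σ_v = 1/μ, so the rescaled variable is v̄ = μv − 1 and the probability of v̄ > 6 is e^−7. The sketch below applies this to the 7950 candidates of 2018 quoted in Materials and methods; the function name is hypothetical.

```python
import math

def expected_outliers(n_candidates, threshold=6.0):
    """Expected number of candidates with rescaled votes above `threshold`,
    assuming a pure exponential distribution (mean == standard deviation)."""
    # v_bar > threshold  <=>  mu * v > threshold + 1
    tail_probability = math.exp(-(threshold + 1.0))
    return n_candidates * tail_probability

n_2018 = 7950  # candidates countrywide in 2018, as quoted in Materials and methods
print(f"expected outliers in 2018: {expected_outliers(n_2018):.1f}")
```

Under these assumptions the estimate is about seven candidates, in line with the expectation quoted above for 2018.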

Fig. 2. Empirical data for the 2014 and 2018 elections for federal deputies in Brazil, which counted more than 6000 candidates, roughly 140 million voters, and more than 280 million dollars of total investment in each campaign.

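From the partition function (6), the average number of votes received by a candidate that spent an amount of money m in the campaign is

$$\langle v(m) \rangle = \sum_{v=v_0}^{\beta m} v\, p(v|m) = -\frac{\partial \ln Z(m)}{\partial \mu}. \tag{7}$$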

The value of μ is obtained by imposing the second constraint (Eq (4)), with β treated as a free parameter. Fig 3A shows the number of votes per candidate against the money spent in the 2018 São Paulo campaign (gray circles) and the average value for candidates in the same money group (orange circles); the blue circles correspond to the outliers. Fitting the data with Eq (7) thus involves a single fitting parameter, β. As shown in Fig 4, β correlates strongly with the total money spent per voter in campaigns, so one can estimate β from the latter. As a proof of concept, we estimate the value of β for the 2018 election from the data for 2014 (see Materials and methods). For that, we assume a linear relation between β and the inverse of the total money spent per voter, parameterized using the data for 2014 (see Fig 4). The solid line in Fig 3A is the number of votes as a function of the money spent for the state of São Paulo in 2018, obtained with the estimated value of β. We observe an excellent agreement with the empirical data that extends over four orders of magnitude. The deviation for candidates with very scarce resources can be explained as follows. For simplicity, we have considered that the minimum number of votes v₀ is the same for all candidates, obtained by assuming that v₀ equals the average number of votes for candidates who spent less than 1200 dollars [3]. In general, however, every candidate has a different v₀, depending on several factors such as party, visibility, and social status.

Fig. 3. Results for the 2018 election for federal deputies in the state of São Paulo, Brazil, with 1686 candidates, more than 33 million voters, and about 29 million dollars of total investment in the campaigns.

Fig. 4. Parameter β as a function of the ratio of total number of votes N_v and money spent R.

From the predicted relation between β and the money spent per voter, we can also forecast the distribution of votes in 2018 using only the reported amount of money spent in this election, as shown in Fig 3B. More precisely, this is done by randomly assigning a number of votes to each candidate from the distribution given by Eq (5), with m equal to the amount of money spent in the campaign, as declared by the candidate. The solid line in Fig 3B is the predicted outcome, which is in excellent agreement with the empirical data.

Discussion

We have shown, using the principle of maximum entropy, that the number of votes received by a candidate should follow an exponential distribution parameterized by the amount of money that was spent in her/his campaign. This prediction is consistent with real data from a very large proportional election, with more than 6000 candidates. Furthermore, as the money spent in a campaign is heterogeneously distributed among candidates, we developed a framework based on Superstatistics to establish the relation between the distribution of money spent and that of votes. Within this framework, it was possible to predict the outcome of a ballot from the distribution of money spent, and to identify potential cases of misconduct either in the reporting of fundraising and spending or in vote counting.

For several proportional elections, the distribution of votes per candidate is fat tailed [30], which has motivated an enthusiastic discussion about the underlying mechanism [10]. The fat-tailed character of the distribution of votes was first interpreted as the result of a multiplicative process [8]. A different model was proposed based on word-of-mouth spreading for the case of proportional elections with open lists [19]. However, the empirical analysis performed in Ref. [30] showed that, although some countries yield similar distributions, the final shape of the distribution depends strongly on the specific election rules. Our theoretical approach shows that, if all candidates in an election spent the same amount of money in their campaigns, the expected distribution of votes would actually be exponential. So, the fat-tailed distribution is a consequence of a heterogeneous distribution of resources. This is consistent with the reported power-law distribution of money spent by candidates in the same elections [3].
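To illustrate how heterogeneous spending alone can generate a fat tail, consider a simplified limit that is an assumption made here for illustration rather than part of the analysis above: v₀ = 0, continuous v, and μβm ≪ 1 for all candidates, so that p(v|m) is nearly uniform on [0, βm]. If the money distribution has a power-law tail p(m) ≃ A m^−γ (A and γ are hypothetical constants), then Eq (1), with the upper limit extended to infinity for v ≪ βm_max, gives

```latex
\begin{aligned}
p(v|m) &\approx \frac{1}{\beta m}, \qquad 0 \le v \le \beta m, \\
p(v) &\approx \int_{v/\beta}^{\infty} \frac{A\, m^{-\gamma}}{\beta m}\, dm
      = \frac{A}{\gamma \beta} \left( \frac{v}{\beta} \right)^{-\gamma}.
\end{aligned}
```

In this limit the vote distribution inherits the power-law exponent of the money distribution, whereas a single value of m shared by all candidates would produce no such tail.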

Materials and methods

Electoral data

The data for the elections for federal deputies in Brazil in 2014 and 2018 were collected from the website of the Brazilian Superior Electoral Court [28, 29]. For each year, we analyzed two large datasets: the financial report of each candidate and the electoral results. The first dataset contains detailed information about the expenditures of all candidates. For each one, we calculated the total amount of money spent in the campaign by adding all their expenditures. The second dataset consists of the number of votes for each candidate in every electoral zone. We coarse-grained this information by adding all votes for the same candidate. By combining these two datasets, we obtained, for each of the 26 Brazilian states, the list of candidates, the total amount of money that they spent in the campaign, and the final number of votes that they obtained. This adds up to 6353 and 7950 candidates, 87 million and 90 million votes, and 316 million and 335 million dollars spent in 2014 and 2018, respectively. The dataset is in the Supporting Information.

An ensemble for elections

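To determine p(v|m_i) we maximize W, defined as

$$W = -\sum_{i=1}^{N_c} \sum_{v=v_0}^{\beta m_i} p(v|m_i) \ln p(v|m_i) - \lambda \left[ \sum_{v=v_0}^{\beta m_i} p(v|m_i) - 1 \right] - \mu \left[ \sum_{i=1}^{N_c} \sum_{v=v_0}^{\beta m_i} v\, p(v|m_i) - N_v \right],$$

where the first term is the entropy, the second term is the constraint (3) with the Lagrange multiplier λ, and the last term is the constraint (4) with μ as the second Lagrange multiplier.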

Imposing dW/dp = 0, we find that p(v|m_i) = e^{−1−λ−μv} = e^{−μv}/Z(m_i). The expression for the partition function (6) is obtained from the normalization, Z(m_i) = ∑_{v=v₀}^{βm_i} e^{−μv}. From Eq (6), we obtain the average number of votes as

$$\langle v(m_i) \rangle = -\frac{\partial \ln Z(m_i)}{\partial \mu}.$$

In order to calculate the numerical value of 〈v〉 for each candidate, we first determine μ by applying the constraint (4), so that μ is the root of

$$\sum_{i=1}^{N_c} \langle v(m_i) \rangle - N_v = 0.$$

This equation cannot be solved analytically; therefore, we used the SciPy implementation of Brent’s method [31]. For the 2014 election, we used the dataset of campaign expenditures, and the free parameter β was chosen as the value that minimizes the mean squared error between the expected number of votes, Eq (7), and the actual vote data.

To find the value of μ for 2018, we used the financial report of each candidate for that election. Since β correlates with N_v/R (see Fig 4), we used the linear relation obtained for 2014 to estimate β for 2018.
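The root-finding step can be sketched as follows. This is an illustration under stated assumptions, not the code used for the analysis: plain bisection stands in for Brent’s method (available in SciPy as `scipy.optimize.brentq`) so that the snippet needs only the standard library, and the function names, the bracket [lo, hi], and the toy spending values are hypothetical.

```python
import math

def mean_votes(mu, v0, v_max):
    """<v> = sum_v v e^{-mu v} / sum_v e^{-mu v} over v = v0..v_max (direct sums)."""
    # Shifting the exponent by v0 leaves the ratio unchanged and avoids tiny weights.
    weights = [math.exp(-mu * (v - v0)) for v in range(v0, v_max + 1)]
    z = sum(weights)
    return sum(v * w for v, w in zip(range(v0, v_max + 1), weights)) / z

def solve_mu(money, n_votes, beta, v0, lo=1e-9, hi=1.0, tol=1e-12):
    """Find mu such that sum_i <v(m_i)> = n_votes, by bisection on [lo, hi]."""
    def residual(mu):
        total = sum(mean_votes(mu, v0, max(v0, int(beta * m))) for m in money)
        return total - n_votes
    # <v> decreases monotonically with mu, so the residual changes sign at most once.
    if residual(lo) < 0 or residual(hi) > 0:
        raise ValueError("n_votes cannot be reached within the bracket for this beta")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

money = [200.0, 1_000.0, 5_000.0, 20_000.0]  # toy spending values (assumed)
mu = solve_mu(money, n_votes=3_000, beta=0.5, v0=1)
print(f"mu = {mu:.6f}")
```

Fitting β would wrap this solver in an outer one-dimensional minimization of the mean squared error between the expected and observed votes.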

Data binning

To reduce the statistical noise in Figs 2 and 3, for each state, candidates were grouped by the amount of money that they officially spent in their campaigns. For that, we performed a logarithmic binning spanning the range from the minimum to the maximum amount of money spent, always with 20 bins.
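The binning procedure can be sketched in a few lines of Python. The helper names and the toy data are assumptions made for illustration; candidates reporting zero spending would have to be excluded or handled separately, since a logarithmic scale requires strictly positive values.

```python
import math

def log_bin_edges(values, n_bins=20):
    """Edges of n_bins logarithmically spaced bins from min(values) to max(values)."""
    lo, hi = min(values), max(values)
    if lo <= 0 or hi <= lo:
        raise ValueError("need strictly positive values spanning a nonzero range")
    ratio = (hi / lo) ** (1.0 / n_bins)
    return [lo * ratio**k for k in range(n_bins + 1)]

def bin_index(value, edges):
    """Index of the bin containing value; the last bin includes its upper edge."""
    n_bins = len(edges) - 1
    k = int(n_bins * math.log(value / edges[0]) / math.log(edges[-1] / edges[0]))
    return min(max(k, 0), n_bins - 1)

def group_stats(money, votes, n_bins=20):
    """Count, mean and standard deviation of votes within each money bin."""
    edges = log_bin_edges(money, n_bins)
    groups = [[] for _ in range(n_bins)]
    for m, v in zip(money, votes):
        groups[bin_index(m, edges)].append(v)
    stats = []
    for g in groups:
        if g:
            mean = sum(g) / len(g)
            std = math.sqrt(sum((v - mean) ** 2 for v in g) / len(g))
            stats.append((len(g), mean, std))
        else:
            stats.append((0, None, None))  # empty bin
    return edges, stats

money = [10 ** (k / 10) for k in range(41)]  # toy data spanning 1 to 10^4 (assumed)
votes = [float(k) for k in range(41)]
edges, stats = group_stats(money, votes)
print(len(edges) - 1, "bins,", sum(n for n, _, _ in stats), "candidates")
```

The per-bin mean and standard deviation returned here are the quantities needed for the rescaled variable v̄ and for the comparison of σ_v with 〈v〉.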

Model for the distribution of votes

To forecast the distribution of votes in 2018 (Fig 3B), we considered the list of all candidates and the total amount of money spent in their campaign. For each candidate, we generated their number of votes at random, following the distribution derived in Eq (5), assuming Z(m) = 1/μ. In the limit m → ∞, we recover an exponential distribution p(v|m) = μe^−μv.

The results in Fig 3B are averages over 10⁴ independent samples.
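One possible way to generate such samples is inverse-transform sampling, sketched below under a continuous approximation of Eq (5) on [v₀, βm]. The function names, parameter values, and number of samples are assumptions for illustration; when μβm is large the cutoff becomes irrelevant and the draw reduces to the pure exponential case with Z(m) = 1/μ.

```python
import math
import random

def sample_votes(mu, beta, m, v0=0.0, rng=random):
    """Draw one value from p(v|m) proportional to exp(-mu * v) on [v0, beta * m]."""
    width = max(beta * m - v0, 0.0)
    u = rng.random()
    # CDF: F(v) = (1 - exp(-mu * (v - v0))) / (1 - exp(-mu * width)); solve F(v) = u.
    scale = -math.expm1(-mu * width)  # equals 1 - exp(-mu * width)
    return v0 - math.log1p(-u * scale) / mu

def simulate_election(money, mu, beta, v0=0.0, rng=random):
    """One synthetic election: a vote count for each candidate's declared spending."""
    return [sample_votes(mu, beta, m, v0, rng) for m in money]

rng = random.Random(42)             # fixed seed for reproducibility
money = [500.0, 5_000.0, 50_000.0]  # toy spending values (assumed)
samples = [simulate_election(money, mu=1e-3, beta=0.5, rng=rng) for _ in range(1_000)]
print(len(samples), "independent samples of", len(money), "candidates")
```

Histogramming the votes of each sample and averaging the histograms over samples would give a smooth predicted distribution to compare with the empirical one.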

Supporting information

S1 Table [csv]
Data of 2014 election.

S2 Table [csv]
Data of 2018 election.

References

1. UN Secretary General. Guidance Note of the Secretary-General on Democracy. 2009.

2. International Institute for Democracy and Electoral Assistance. International Electoral Standards: Guidelines for Reviewing the Legal Framework of Elections. 2002.

3. Melo HPM, Reis SD, Moreira AA, Makse HA, Andrade JS. The price of a vote: Diseconomy in proportional elections. PLoS ONE. 2018;13(8):e0201654. doi: 10.1371/journal.pone.0201654; pmid: 30133469

4. Jacobson GC. The effects of campaign spending in congressional elections. American Political Science Review. 1978;72:469–491. doi: 10.2307/1954105

5. Morton R, Cameron C. Elections and the theory of campaign contributions: A survey and critical analysis. Economics & Politics. 1992;4:79–108. doi: 10.1111/j.1468-0343.1992.tb00056.x

6. Gerber AS. Does campaign spending work? Field experiments provide evidence and suggest new theory. American Behavioral Scientist. 2004;47:541–574. doi: 10.1177/0002764203260415

7. Gordon SC, Hafer C, Landa D. Consumption or investment? On motivations for political giving. The Journal of Politics. 2007;69(4):1057–1072. doi: 10.1111/j.1468-2508.2007.00607.x

8. Costa Filho RN, Almeida MP, Andrade JS, Moreira JE. Scaling behavior in a proportional voting process. Physical Review E. 1999;60:1067. doi: 10.1103/PhysRevE.60.1067

9. Costa Filho RN, Almeida MP, Moreira JE, Andrade JS. Brazilian elections: voting for a scaling democracy. Physica A: Statistical Mechanics and its Applications. 2003;322:698–700. doi: 10.1016/S0378-4371(02)01823-X

10. Castellano C, Fortunato S, Loreto V. Statistical physics of social dynamics. Reviews of Modern Physics. 2009;81(2):591. doi: 10.1103/RevModPhys.81.591

11. Mantovani MC, Ribeiro HV, Moro MV, Picoli S Jr, Mendes RS. Scaling laws and universality in the choice of election candidates. EPL (Europhysics Letters). 2011;96:48001. doi: 10.1209/0295-5075/96/48001

12. Mantovani MC, Ribeiro HV, Lenzi EK, Picoli S Jr, Mendes RS. Engagement in the electoral processes: scaling laws and the role of political positions. Physical Review E. 2013;88:024802. doi: 10.1103/PhysRevE.88.024802

13. Bokányi E, Szállási Z, Vattay G. Universal scaling laws in metro area election results. PLoS ONE. 2018;13:e0192913. doi: 10.1371/journal.pone.0192913; pmid: 29470518

14. Moreira AA, Paula DR, Costa Filho RN, Andrade JS. Competitive cluster growth in complex networks. Physical Review E. 2006;73:065101. doi: 10.1103/PhysRevE.73.065101

15. Araújo NAM, Andrade JS, Herrmann HJ. Tactical voting in plurality elections. PLoS ONE. 2010;5:e12446. doi: 10.1371/journal.pone.0012446; pmid: 20856800

16. Fernández-Gracia J, Suchecki K, Ramasco JJ, San Miguel M, Eguíluz VM. Is the voter model a model of voters? Physical Review Letters. 2014;112:089903. doi: 10.1103/PhysRevLett.113.089903

17. Calvão AM, Crokidakis N, Anteneodo C. Stylized facts in brazilian vote distributions. PLoS ONE. 2015;10:e0137732. doi: 10.1371/journal.pone.0137732; pmid: 26418863

18. Borghesi C, Raynal JC, Bouchaud JP. Election Turnout Statistics in Many Countries: Similarities, Differences, and a Diffusive Field Model for Decision-Making. PLoS ONE. 2012;7:e36289. doi: 10.1371/journal.pone.0036289; pmid: 22615762

19. Fortunato S, Castellano C. Scaling and universality in proportional elections. Physical Review Letters. 2007;99:138701. doi: 10.1103/PhysRevLett.99.138701; pmid: 17930647

20. Lehoucq F. Electoral fraud: Causes, types, and consequences. Annual Review of Political Science. 2003;6:233–256. doi: 10.1146/annurev.polisci.6.121901.085655

21. Alvarez RM, Hall TE, Hyde SD. Election fraud: detecting and deterring electoral manipulation. Brookings Institution Press; 2009.

22. Deckert J, Myagkov M, Ordeshook PC. Benford’s Law and the detection of election fraud. Political Analysis. 2011;19:245–268. doi: 10.1093/pan/mpr014

23. Klimek P, Yegorov Y, Hanel R, Thurner S. Statistical detection of systematic election irregularities. Proceedings of the National Academy of Sciences. 2012;109(41):16469–16473. doi: 10.1073/pnas.1210722109

24. Beber B, Scacco A. What the numbers say: A digit-based test for election fraud. Political Analysis. 2012;20:211–234. doi: 10.1093/pan/mps003

25. Enikolopov R, Korovkin V, Petrova M, Sonin K, Zakharov A. Field experiment estimate of electoral fraud in Russian parliamentary elections. Proceedings of the National Academy of Sciences. 2013;110:448–452. doi: 10.1073/pnas.1206770110

26. Beck C, Cohen EGD. Superstatistics. Physica A: Statistical Mechanics and its Applications. 2003;322:267–275. doi: 10.1016/S0378-4371(03)00019-0

27. Jaynes ET. Information theory and statistical mechanics. II. Physical Review. 1957;108:171. doi: 10.1103/PhysRev.108.171

28. Dataset for the 2014 election for federal deputies in Brazil, from http://www.tse.gov.br/.

29. Dataset for the 2018 election for federal deputies in Brazil, from http://www.tse.gov.br/.

30. Chatterjee A, Mitrović M, Fortunato S. Universality in voting behavior: an empirical analysis. Scientific Reports. 2013;3:1049. doi: 10.1038/srep01049