Saturday, May 10, 2014

Going from vote share estimates to seat estimates

by Rajeeva Karandikar, Director, Chennai Mathematical Institute.

My previous blog posts (link, link) showed that opinion polls help us predict the vote shares of major parties (or alliances). This leads us to the next question: how do we convert estimates of vote share into estimates of seats? This is much harder than it looks. Sampling is not done in all constituencies. Even in constituencies where we have respondents in our sample, the sample size is not large enough to predict the winner in that constituency in isolation.

Hence, we have to build a mathematical model of voting behavior. It is widely believed that an individual's identity (caste, religion, economic status, gender, age) plays a role in his/her vote. Moreover, this behavior varies from state to state. So if we were to build a model incorporating these factors, we would end up with a large number of free parameters, particularly as these correlations are likely to change from state to state. Data on these factors is also not available at the constituency level, as census data is compiled and reported at the district level. Thus, this approach is unlikely to yield a good result.

To get a tractable solution, we do not need to build a model for the voting intention of an individual voter; it suffices to build a model for the voting behavior of a constituency. One can assume that the socio-economic composition of a constituency does not undergo a major change in five years (this is true for most constituencies). We assume that the change in vote share for a given party in a constituency, from the previous election to the current one, is constant across a state, or across a smaller geographic sub-region of a larger state. We call this the uniform swing assumption.

If the sample size at the state level, or in a sub-region of a large state, is adequate, we can estimate these vote shares via a methodologically proper poll.

Then, under the uniform swing assumption described above, we combine the actual results of the previous election with the estimated vote shares of parties across a state (or a sub-region of a state) to estimate the vote shares of the major parties in every constituency.
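To make the uniform swing step concrete, here is a minimal sketch in Python. The party names, the two constituencies and all the numbers are hypothetical, and the function name `apply_uniform_swing` is mine. Flooring negative shares at zero and renormalising to 100% are assumptions added for the illustration, not steps described in this post.

```python
def apply_uniform_swing(prev_constituency, prev_state, est_state):
    """Project constituency vote shares (in %) under a uniform swing.

    prev_constituency: {seat: {party: share at the previous election}}
    prev_state:        {party: state-level share at the previous election}
    est_state:         {party: state-level share estimated from the poll}
    """
    # Swing per party = estimated state share now minus state share last time.
    swing = {party: est_state[party] - prev_state[party] for party in est_state}

    projected = {}
    for seat, shares in prev_constituency.items():
        # Add the same swing in every constituency.
        # Assumption: a share cannot go below zero.
        raw = {party: max(shares.get(party, 0.0) + swing[party], 0.0)
               for party in swing}
        total = sum(raw.values())
        # Assumption: rescale so that the shares add up to 100 again.
        projected[seat] = {party: 100.0 * value / total
                           for party, value in raw.items()}
    return projected


# Hypothetical data: party A loses 4 points statewide, party B gains 6.
prev_state = {"A": 40.0, "B": 35.0, "Others": 25.0}
est_state = {"A": 36.0, "B": 41.0, "Others": 23.0}
prev_constituency = {
    "Seat 1": {"A": 45.0, "B": 30.0, "Others": 25.0},
    "Seat 2": {"A": 38.0, "B": 40.0, "Others": 22.0},
}

projected = apply_uniform_swing(prev_constituency, prev_state, est_state)
for seat, shares in projected.items():
    print(seat, {party: round(share, 1) for party, share in shares.items()})
```

In this toy example A still leads in Seat 1 after the swing, while B extends its lead in Seat 2.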

This is a crude model! Historically, reality has diverged from the uniform swing assumption. However, it turns out that with some further work, this model yields fairly good estimates of seats at the national level.

Consider a scenario where in one constituency, out of a sample of size 101, candidate A gets 52 votes while candidate B gets 49, and in an adjacent constituency, also with a sample of size 101, candidate C gets 59 votes while candidate D gets 42. While we can be fairly confident that C will win, the same cannot be said of A. What is the best case scenario for B? It is that A and B are almost neck and neck, with B having a slight edge, and yet a sample of size 101 shows B to be behind A by 3 votes. The probability of this happening is the same as the probability of seeing 49 or fewer heads in 101 tosses of a fair coin, which is 0.42. We assign B a winning probability of 0.42 and A a winning probability of 0.58. On the other hand, the probability of 42 or fewer heads in 101 tosses of a fair coin is 0.06. Thus, we assign a winning probability of 0.06 to candidate D and 0.94 to C.
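The two coin-toss probabilities quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic for the two-candidate case; the function name `best_case_win_probability` is hypothetical.

```python
from math import comb


def best_case_win_probability(votes_for_trailer, sample_size):
    """P(X <= votes_for_trailer) where X ~ Binomial(sample_size, 1/2).

    This is the chance of seeing the trailing candidate's sample count
    (or fewer) when the true contest is a fair coin toss.
    """
    favourable = sum(comb(sample_size, k) for k in range(votes_for_trailer + 1))
    return favourable / 2 ** sample_size


p_b = best_case_win_probability(49, 101)  # candidate B, 49 of 101
p_d = best_case_win_probability(42, 101)  # candidate D, 42 of 101

print("B:", round(p_b, 2), "A:", round(1 - p_b, 2))
print("D:", round(p_d, 2), "C:", round(1 - p_d, 2))
```

The sum is exact, because Python integers do not overflow; only the final division is done in floating point.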

This analogy can be extended to 3 or more significant candidates. We have been using this for the top three candidates: first the best case scenario for the third candidate, then the best case scenario for the second candidate and the remaining for the first candidate. This needs an assessment of the standard deviation of the vote estimates.

To summarise, based on an opinion poll (or our day-after poll), we obtain statewide vote shares and vote shares in sub-regions of a state and build vote estimates for all major parties in each constituency. Then we convert the vote share estimates in each constituency to predicted win probabilities for the top three candidates. Finally, we add up the probability of wins for a given party across all the 543 constituencies and this yields an estimate of the seats for the party.
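The last step, adding up win probabilities to get a seat estimate, can be sketched as follows. The probabilities are the ones from the two-constituency example above. The party labels X and Y, and the assignment of candidates to them, are hypothetical.

```python
from collections import defaultdict


def expected_seats(win_probabilities):
    """Sum each party's win probability over all constituencies."""
    totals = defaultdict(float)
    for seat_probabilities in win_probabilities.values():
        for party, probability in seat_probabilities.items():
            totals[party] += probability
    return dict(totals)


# Hypothetical: candidates A and C belong to party X, B and D to party Y.
win_probabilities = {
    "Constituency 1": {"X": 0.58, "Y": 0.42},
    "Constituency 2": {"X": 0.94, "Y": 0.06},
}

seats = expected_seats(win_probabilities)
for party, estimate in seats.items():
    print(party, round(estimate, 2))
```

With all 543 constituencies in the input, the same sum gives the national seat estimate for each party.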

This methodology, developed over 15 years ago, has yielded useful seat estimates. Of course, this element also has an error. Sometimes the error in the vote share estimates and the error in converting them to seat estimates go in opposite directions, which is lucky. Sometimes, the two errors conspire to go together and give bad results.

From October 2005 onwards, CNN-IBN, CSDS-Lokniti and I have done numerous poll projections. Most of these are based on post poll surveys, but occasionally these are also based on pre-election polls. Here is the listing of all such occasions: a comparison of what we said and what happened. I am giving vote share estimates and my seat estimate corresponding to that and the actual vote share and seats.

BIHAR (October 2005)

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
JDU-BJP         36                 37               127-137        147
RJD+            31                 31               72-80          65
Others          33                 32               29-39          31

ASSAM 2006

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
Congress        31                 31               52-60          53
BJP             11                 12               10-15          10
AGP             22                 20               25-31          24
Others          36                 37               26-35          39

TAMIL NADU 2006

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
AIADMK+         35                 40               64-74          69
DMK+            45                 45               157-167        163
DMDK            10                 8                2-6            1
Others          10                 7                -              1

KERALA 2006

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
LDF             51                 49               107-117        98
UDF             41                 43               25-31          42
Others          8                  8                0-1            0

WEST BENGAL 2006

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
LF              53                 50               230-240        235
INC             16                 15               17-23          24
TMC+            27                 29               32-40          31

PUNJAB 2007

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
SAD-BJP         41                 45               50-60          68
Congress        41                 41               50-60          44
Others          18                 14               3-9            5

UTTARAKHAND 2007

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
Congress        31                 30               21-27          21
BJP             34                 32               33-39          35
Others          35                 38               8-12           14

UTTAR PRADESH 2007

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
SP              25                 25               99-111         97
BSP             29                 30               152-168        206
BJP+            22                 18               80-90          52
Congress        11                 9                25-33          26
Others          13                 18               21-27          26

GUJARAT 2007

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
BJP             47                 49               92-100         117
Congress        42                 39               77-85          62
Others          11                 12               3-7            3

KARNATAKA 2008

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
BJP             30                 34               79             110
Congress        35                 35               86             80
JDS             21                 19               45             28
Others          14                 12               14             6

LOK SABHA 2009

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
UPA             36                 36               210-225        262
NDA             28                 24               180-195        159
Left            -                  8                30-40          24
BSP             -                  6                24-32          21
Others          -                  26               -              77

BIHAR 2010

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
JDU-BJP         46                 39               185-201        206
Congress        9                  8                6-12           4
RJD-LJP         27                 26               22-32          25
Others          18                 27               9-19           8

ASSAM 2011

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
Congress        36                 39               64-72          78
BJP             9                  11               7-11           5
AGP             18                 16               16-22          10
AIUDF           13                 13               11-17          18
Others          24                 21               12-20          15

KERALA 2011

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
LDF             36                 45               69-77          68
UDF             45                 46               63-71          72
Others          9                  9                0              0

TAMIL NADU 2011

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
DMK+            44                 39               102-114        31
AIADMK+         46                 52               120-132        203
BJP Front       3                  2                -              -
Others          7                  7                -              -

WEST BENGAL 2011

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
Left            40                 41               60-72          62
TMC+            50                 48               222-234        227
Others          10                 11               -              5

UTTARAKHAND 2012

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
Congress        39                 34               31-41          32
BJP             32                 33               22-32          31

PUNJAB 2012

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
SAD+BJP         41                 42               51-63          68
Congress        40                 40               48-60          46

MANIPUR 2012

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
Congress        30                 42               24-32          42
TMC             14                 17               7-13           7

UTTAR PRADESH 2012

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
SP              34                 29               232-250        224
BSP             24                 26               65-79          80
BJP+            14                 15               36-44          47
Congress        12                 12               28-38          28

GUJARAT 2012

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
BJP             48                 48               129-141        116
Congress+       36                 39               37-45          60
Others          16                 13               4-10           6

HIMACHAL PRADESH 2012

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
Congress        41                 43               29-35          36
BJP             40                 38               29-35          26

KARNATAKA 2013

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
BJP             23                 20               39-49          40
Congress        37                 37               117-129        122
JDS             20                 20               34-44          40
Others          20                 23               14-22          21

MADHYA PRADESH 2013

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
BJP             41                 45               136-146        165
Congress        35                 36               67-77          58
Others          24                 19               13-21          7

RAJASTHAN 2013

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
BJP             43                 45               126-136        162
Congress        33                 33               49-57          21
Others          24                 22               12-20          16

CHHATTISGARH 2013

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
BJP             42                 41               45-55          49
Congress+       38                 40               32-40          39
Others          20                 19               7-13           2

DELHI 2013

Party/Alliance  Vote Estimate (%)  Vote Actual (%)  Seat Estimate  Seat Actual
BJP             33                 34               32-42          31
Congress        23                 24               9-17           8
AAP             27                 30               13-21          28
Others          17                 12               1-5            3

1 comment:

  1. Interesting. So if one were to attempt to report an average error statistic for all your seat count predictions so far what would it be?

    In fact, what's a good metric to judge the quality of predictions? Is a % error on seat count reasonable? Or do you prefer a better metric?

    What would be fun is if someone designed a prediction market for the Indian polls.
