## Saturday, May 10, 2014

### Going from vote share estimates to seat estimates

by Rajeeva Karandikar, Director, Chennai Mathematical Institute.

My previous blog posts (link, link) showed that opinion polls help us predict the vote shares of major parties (or alliances). This leads us to the next question: converting estimates of vote share into estimates of seats. This is much harder than it looks. Sampling is not done in all constituencies, and even in constituencies where we do have respondents, the sample size is not large enough to predict the winner in that constituency in isolation.

Hence, we have to build a mathematical model of voting behavior. It is widely believed that an individual's identity (caste, religion, economic status, gender, age) plays a role in his or her vote. Moreover, this behavior varies from state to state. A model incorporating these attributes would therefore have a large number of free parameters, particularly as the correlations are likely to change from state to state. Data on these attributes at the constituency level is also not available, as census data is compiled and reported at the district level. Thus, this approach is unlikely to yield a good result.

To get a tractable solution, we do not need to build a model for the voting intention of an individual voter; it suffices to build a model for the voting behavior of a constituency. One can assume that the socio-economic composition of a constituency does not undergo a major change in 5 years (this is true for most constituencies). We assume that the change in vote share for a given party in a constituency, from the previous election to the current one, is constant across a state, or across a smaller geographic sub-region of a larger state. We call this the uniform swing assumption.

If the sample size at the state level, or in a sub-region of a large state, is adequate, we can estimate these vote shares via a methodologically sound poll.

Then, under the uniform swing assumption described above, we combine the actual data from the previous election with the estimated vote shares of parties across a state (or a sub-region of a state) to estimate the vote shares of major parties in every constituency.
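The uniform swing step can be sketched in a few lines of Python. This is only an illustration, not the code used for the projections: the function name, the data layout, and all the numbers are made up, and the handling of negative shares (clip at zero, then rescale to 100%) is an assumption that the text above does not specify.

```python
def apply_uniform_swing(prev_by_seat, prev_region, poll_region):
    """Shift each constituency's previous vote shares by the regional swing.

    All shares are percentages. Every party (including a catch-all "Others")
    is assumed to appear in all three inputs. The swing for a party is its
    polled regional share minus its regional share in the previous election.
    """
    swing = {party: poll_region[party] - prev_region[party] for party in poll_region}
    estimates = {}
    for seat, shares in prev_by_seat.items():
        # Assumption: clip negative shares at zero, then rescale to 100%.
        shifted = {p: max(shares.get(p, 0.0) + s, 0.0) for p, s in swing.items()}
        total = sum(shifted.values())
        estimates[seat] = {p: 100.0 * v / total for p, v in shifted.items()}
    return estimates


# Hypothetical region with two constituencies, X and Y (made-up numbers).
prev_region = {"A": 40.0, "B": 35.0, "Others": 25.0}   # previous election
poll_region = {"A": 36.0, "B": 41.0, "Others": 23.0}   # current poll estimate
prev_by_seat = {
    "X": {"A": 45.0, "B": 40.0, "Others": 15.0},
    "Y": {"A": 50.0, "B": 30.0, "Others": 20.0},
}

estimates = apply_uniform_swing(prev_by_seat, prev_region, poll_region)
for seat, shares in estimates.items():
    leader = max(shares, key=shares.get)
    print(seat, shares, "leader:", leader)
```

In this toy example the swing is -4 points for A and +6 for B, so constituency X flips from A to B while Y stays with A, even though both constituencies receive exactly the same shift.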

This is a crude model! Historically, reality has diverged from the uniform swing assumption. However, it turns out that with some further work, this model yields fairly good estimates of seats at the national level.

Consider a scenario where, in one constituency, out of a sample of size 101, candidate A gets 52 votes while candidate B gets 49, and in an adjacent constituency, also with a sample of size 101, candidate C gets 59 votes while candidate D gets 42. While we can be fairly confident that C will win, the same cannot be said of A. What is the best-case scenario for B? It is that A and B are almost neck and neck, with B having a slight edge, and yet a sample of size 101 shows B behind A by 3 votes. The probability of this happening is the same as the probability of seeing 49 or fewer heads in 101 tosses of a fair coin, which is 0.42. We assign B a winning probability of 0.42 and A a winning probability of 0.58. On the other hand, the probability of 42 or fewer heads in 101 tosses of a fair coin is 0.06. Thus, we assign a winning probability of 0.06 to candidate D and 0.94 to C.
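The two coin-toss probabilities can be checked directly with an exact binomial sum. The short Python sketch below does only that; the function name `prob_at_most_heads` is an illustrative choice, and nothing here goes beyond the two-candidate case described above.

```python
from math import comb


def prob_at_most_heads(k, n):
    """Probability of at most k heads in n tosses of a fair coin."""
    # Count the favourable outcomes exactly, then divide by all 2**n outcomes.
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


p_b = prob_at_most_heads(49, 101)  # best case for B (49 of 101 in the sample)
p_d = prob_at_most_heads(42, 101)  # best case for D (42 of 101 in the sample)

print(f"B: {p_b:.2f}, A: {1 - p_b:.2f}")
print(f"D: {p_d:.2f}, C: {1 - p_d:.2f}")
```

Rounded to two decimals, these reproduce the 0.42 / 0.58 and 0.06 / 0.94 figures quoted in the text.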

This analogy can be extended to 3 or more significant candidates. We have been using this for the top three candidates: first the best case scenario for the third candidate, then the best case scenario for the second candidate and the remaining for the first candidate. This needs an assessment of the standard deviation of the vote estimates.

To summarise, based on an opinion poll (or our day-after poll), we obtain statewide vote shares and vote shares in sub-regions of a state and build vote estimates for all major parties in each constituency. Then we convert the vote share estimates in each constituency to predicted win probabilities for the top three candidates. Finally, we add up the probability of wins for a given party across all the 543 constituencies and this yields an estimate of the seats for the party.
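The final step, adding win probabilities across constituencies, is simple enough to show as a sketch. The party labels P and Q are hypothetical: purely for illustration, assume candidates A and C from the earlier example belong to party P and candidates B and D to party Q.

```python
def expected_seats(win_probs_by_seat):
    """Sum each party's win probability over all constituencies."""
    totals = {}
    for probs in win_probs_by_seat.values():
        for party, p in probs.items():
            totals[party] = totals.get(party, 0.0) + p
    return totals


# Hypothetical mapping of the two example constituencies to parties P and Q.
win_probs_by_seat = {
    "constituency 1": {"P": 0.58, "Q": 0.42},  # candidates A and B
    "constituency 2": {"P": 0.94, "Q": 0.06},  # candidates C and D
}

seats = expected_seats(win_probs_by_seat)
print({party: round(s, 2) for party, s in seats.items()})
```

With only two constituencies, P's estimate is about 1.5 seats and Q's about 0.5. The same sum taken over all 543 constituencies gives the national seat estimate, which is why a constituency that is too close to call contributes a fraction of a seat to each contender rather than a whole seat to one of them.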

This methodology, developed over 15 years ago, has yielded useful seat estimates. Of course, this element also has an error. Sometimes the error in the vote share estimates and the error in converting to seat estimates go in opposite directions, which is lucky. Sometimes, the two errors conspire to go together and give bad results.

From October 2005 onwards, CNN-IBN, CSDS-Lokniti and I have done numerous poll projections. Most of these are based on post-poll surveys, but occasionally they are based on pre-election polls. Here is a listing of all such occasions, comparing what we said with what happened. For each, I give my vote share estimate and the corresponding seat estimate, alongside the actual vote share and seats.

BIHAR (October 2005)

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| JDU-BJP | 36 | 37 | 127-137 | 147 |
| RJD+ | 31 | 31 | 72-80 | 65 |
| Others | 33 | 32 | 29-39 | 31 |

ASSAM 2006

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| Congress | 31 | 31 | 52-60 | 53 |
| BJP | 11 | 12 | 10-15 | 10 |
| AGP | 22 | 20 | 25-31 | 24 |
| Others | 36 | 37 | 26-35 | 39 |

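TAMIL NADU 2006

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| AIADMK+ | 35 | 40 | 64-74 | 69 |
| DMK+ | 45 | 45 | 157-167 | 163 |
| DMDK | 10 | 8 | 2-6 | 1 |
| Others | 10 | 7 | - | 1 |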

KERALA 2006

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| LDF | 51 | 49 | 107-117 | 98 |
| UDF | 41 | 43 | 25-31 | 42 |
| Others | 8 | 8 | 0-1 | 0 |

WEST BENGAL 2006

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| LF | 53 | 50 | 230-240 | 235 |
| INC | 16 | 15 | 17-23 | 24 |
| TMC+ | 27 | 29 | 32-40 | 31 |

PUNJAB 2007

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| SAD-BJP | 41 | 45 | 50-60 | 68 |
| Congress | 41 | 41 | 50-60 | 44 |
| Others | 18 | 14 | 3-9 | 5 |

UTTARAKHAND 2007

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| Congress | 31 | 30 | 21-27 | 21 |
| BJP | 34 | 32 | 33-39 | 35 |
| Others | 35 | 38 | 8-12 | 14 |

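UTTAR PRADESH 2007

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| SP | 25 | 25 | 99-111 | 97 |
| BSP | 29 | 30 | 152-168 | 206 |
| BJP+ | 22 | 18 | 80-90 | 52 |
| Congress | 11 | 9 | 25-33 | 26 |
| Others | 13 | 18 | 21-27 | 26 |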

GUJARAT 2007

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| BJP | 47 | 49 | 92-100 | 117 |
| Congress | 42 | 39 | 77-85 | 62 |
| Others | 11 | 12 | 3-7 | 3 |

KARNATAKA 2008

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| BJP | 30 | 34 | 79 | 110 |
| Congress | 35 | 35 | 86 | 80 |
| JDS | 21 | 19 | 45 | 28 |
| Others | 14 | 12 | 14 | 6 |

LOK SABHA 2009

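| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| UPA | 36 | 36 | 210-225 | 262 |
| NDA | 28 | 24 | 180-195 | 159 |
| Left | - | 8 | 30-40 | 24 |
| BSP | - | 6 | 24-32 | 21 |
| Others | - | 26 | - | 77 |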

BIHAR 2010

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| JDU-BJP | 46 | 39 | 185-201 | 206 |
| Congress | 9 | 8 | 6-12 | 4 |
| RJD-LJP | 27 | 26 | 22-32 | 25 |
| Others | 18 | 27 | 9-19 | 8 |

ASSAM 2011

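| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| Congress | 36 | 39 | 64-72 | 78 |
| BJP | 9 | 11 | 7-11 | 5 |
| AGP | 18 | 16 | 16-22 | 10 |
| AIUDF | 13 | 13 | 11-17 | 18 |
| Others | 24 | 21 | 12-20 | 15 |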

KERALA 2011

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| LDF | 36 | 45 | 69-77 | 68 |
| UDF | 45 | 46 | 63-71 | 72 |
| Others | 9 | 9 | 0 | 0 |

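TAMIL NADU 2011

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| DMK+ | 44 | 39 | 102-114 | 31 |
| AIADMK+ | 46 | 52 | 120-132 | 203 |
| BJP Front | 3 | 2 | - | - |
| Others | 7 | 7 | - | - |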

WEST BENGAL 2011

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| Left | 40 | 41 | 60-72 | 62 |
| TMC+ | 50 | 48 | 222-234 | 227 |
| Others | 10 | 11 | - | 5 |

UTTARAKHAND 2012

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| Congress | 39 | 34 | 31-41 | 32 |
| BJP | 32 | 33 | 22-32 | 31 |

PUNJAB 2012

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| SAD+BJP | 41 | 42 | 51-63 | 68 |
| Congress | 40 | 40 | 48-60 | 46 |

MANIPUR 2012

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| Congress | 30 | 42 | 24-32 | 42 |
| TMC | 14 | 17 | 7-13 | 7 |

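UTTAR PRADESH 2012

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| SP | 34 | 29 | 232-250 | 224 |
| BSP | 24 | 26 | 65-79 | 80 |
| BJP+ | 14 | 15 | 36-44 | 47 |
| Congress | 12 | 12 | 28-38 | 28 |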

GUJARAT 2012

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| BJP | 48 | 48 | 129-141 | 116 |
| Congress+ | 36 | 39 | 37-45 | 60 |
| Others | 16 | 13 | 4-10 | 6 |

HIMACHAL PRADESH 2012

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| Congress | 41 | 43 | 29-35 | 36 |
| BJP | 40 | 38 | 29-35 | 26 |

KARNATAKA 2013

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| BJP | 23 | 20 | 39-49 | 40 |
| Congress | 37 | 37 | 117-129 | 122 |
| JDS | 20 | 20 | 34-44 | 40 |
| Others | 20 | 23 | 14-22 | 21 |

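MADHYA PRADESH 2013

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| BJP | 41 | 45 | 136-146 | 165 |
| Congress | 35 | 36 | 67-77 | 58 |
| Others | 24 | 19 | 13-21 | 7 |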

RAJASTHAN 2013

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| BJP | 43 | 45 | 126-136 | 162 |
| Congress | 33 | 33 | 49-57 | 21 |
| Others | 24 | 22 | 12-20 | 16 |

CHHATTISGARH 2013

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| BJP | 42 | 41 | 45-55 | 49 |
| Congress+ | 38 | 40 | 32-40 | 39 |
| Others | 20 | 19 | 7-13 | 2 |

DELHI 2013

| Party | Vote Estimate (%) | Vote Actual (%) | Seat Estimate | Seat Actual |
|---|---|---|---|---|
| BJP | 33 | 34 | 32-42 | 31 |
| Congress | 23 | 24 | 9-17 | 8 |
| AAP | 27 | 30 | 13-21 | 28 |
| Others | 17 | 12 | 1-5 | 3 |

#### 1 comment:

1. Interesting. So if one were to attempt to report an average error statistic for all your seat count predictions so far what would it be?

In fact, what's a good metric to judge the quality of predictions? Is a % error on seat count reasonable? Or do you prefer a better metric?

What would be fun is if someone designed a prediction market for the Indian polls.
