by Rajeeva Karandikar, Director, Chennai Mathematical Institute.
My previous blog posts (link, link) showed that opinion polls help us predict the vote shares of major parties (or alliances). This leads us to the next question: converting estimates of vote share into estimates of seats. This is much harder than meets the eye. Sampling is not done in all constituencies. Even in constituencies where we do have respondents, the sample is too small to predict the winner from that constituency's data alone.
Hence, we have to build a mathematical model of voting behavior. It is widely believed that an individual's identity (caste, religion, economic status, gender, age) plays a role in his/her vote. Moreover, this behavior varies from state to state. So if one were to build a model incorporating these factors, one would end up with a large number of free parameters, particularly as the correlations are likely to change from state to state. Data on these factors is not available at the constituency level, as census data is compiled and reported at the district level. Thus, this approach is unlikely to yield a good result.
To get a tractable solution, we do not need to build a model for the voting intention of an individual voter; it suffices to build a model for the voting behavior of a constituency. One can assume that the socio-economic composition of a constituency does not undergo a major change in 5 years (this is true for most constituencies). We assume that the change in vote share for a given party in a constituency, from the previous election to the current one, is constant across a state, or across a smaller geographic sub-region of a larger state. We call this the uniform swing assumption.
If the sample size at the state level, or in a sub-region of a large state, is adequate, we can estimate these vote shares via a methodologically proper poll.
Then, under the uniform swing assumption described above, combining the actual results of the previous election with the estimated vote shares across a state or sub-region enables us to estimate the vote shares of major parties in every constituency.
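To make the uniform swing step concrete, here is a minimal sketch in Python. The function name `uniform_swing`, the party labels and all the numbers are hypothetical, chosen only for illustration. Clipping negative shares at zero is an assumption of this sketch and not something the method above prescribes.

```python
def uniform_swing(prev_by_seat, prev_region, est_region):
    """Project constituency vote shares (in percent) under uniform swing.

    prev_by_seat: {constituency: {party: share at the previous election}}
    prev_region:  {party: regional share at the previous election}
    est_region:   {party: regional share estimated from the current poll}
    """
    # Swing = change in a party's regional share since the previous election.
    swing = {p: est_region[p] - prev_region.get(p, 0.0) for p in est_region}
    projected = {}
    for seat, shares in prev_by_seat.items():
        # Add the same swing in every constituency; clip at zero (an assumption).
        projected[seat] = {
            p: max(s + swing.get(p, 0.0), 0.0) for p, s in shares.items()
        }
    return projected


# Hypothetical numbers, for illustration only.
prev_by_seat = {
    "Seat 1": {"P": 40.0, "Q": 35.0},
    "Seat 2": {"P": 30.0, "Q": 45.0},
}
projected = uniform_swing(
    prev_by_seat,
    prev_region={"P": 38.0, "Q": 36.0},
    est_region={"P": 41.0, "Q": 33.0},
)
print(projected)
```

In this toy example party P gains 3 points and party Q loses 3 points in every constituency, so each lead shifts by 6 points but neither constituency changes hands.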
This is a crude model! Historically, reality has diverged from the uniform swing assumption. However, it turns out that with some further work, this model yields fairly good estimates of seats at the national level.
Consider a scenario where, in one constituency, out of a sample of size 101, candidate A gets 52 votes while candidate B gets 49, and in an adjacent constituency, also with a sample of size 101, candidate C gets 59 votes while candidate D gets 42. While we can be fairly confident that C will win, the same cannot be said of A. What is the best case scenario for B? It is that A and B are almost neck and neck, with B having a slight edge, and yet a sample of size 101 shows B behind A by 3 votes. The probability of this happening is the same as the probability of seeing 49 or fewer heads in 101 tosses of a fair coin, which is 0.42. We therefore assign B a winning probability of 0.42 and A a winning probability of 0.58. Similarly, the probability of 42 or fewer heads in 101 tosses of a fair coin is 0.06, so we assign a winning probability of 0.06 to candidate D and 0.94 to C.
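The two coin-toss probabilities can be checked directly. The sketch below sums the exact binomial probabilities for a fair coin. The helper name `prob_at_most` is made up for this illustration.

```python
from math import comb


def prob_at_most(k, n):
    """P(X <= k) when X counts heads in n tosses of a fair coin."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


p_b = prob_at_most(49, 101)  # best case for B: 49 or fewer heads
p_d = prob_at_most(42, 101)  # best case for D: 42 or fewer heads
print(round(p_b, 2), round(1 - p_b, 2))
print(round(p_d, 2), round(1 - p_d, 2))
```

Rounded to two decimals, these should agree with the winning probabilities assigned to B, A, D and C above.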
This analogy can be extended to 3 or more significant candidates. We have been using this for the top three candidates: first the best case scenario for the third candidate, then the best case scenario for the second candidate and the remaining for the first candidate. This needs an assessment of the standard deviation of the vote estimates.
To summarise, based on an opinion poll (or our day-after poll), we obtain statewide vote shares and vote shares in sub-regions of a state and build vote estimates for all major parties in each constituency. Then we convert the vote share estimates in each constituency to predicted win probabilities for the top three candidates. Finally, we add up the probability of wins for a given party across all the 543 constituencies and this yields an estimate of the seats for the party.
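The final step, adding up win probabilities to get a seat estimate, can be sketched as follows. The function name `expected_seats` and the party labels are hypothetical. The two constituencies reuse the probabilities from the coin-toss example, with the assumption (made only for this illustration) that A and C belong to party P while B and D belong to party Q.

```python
from collections import defaultdict


def expected_seats(win_probs):
    """Sum each party's win probability over all constituencies."""
    totals = defaultdict(float)
    for probs in win_probs:
        for party, p in probs.items():
            totals[party] += p
    return dict(totals)


# One dict per constituency: {party: probability of winning that seat}.
win_probs = [
    {"P": 0.58, "Q": 0.42},
    {"P": 0.94, "Q": 0.06},
]
seats = expected_seats(win_probs)
print({party: round(s, 2) for party, s in seats.items()})
```

Since the probabilities in each constituency add up to 1, the seat estimates add up to the number of constituencies.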
This methodology, developed over 15 years ago, has yielded useful seat estimates. Of course, this element also has an error. Sometimes the error in the vote estimates and the error in converting them to seat estimates go in opposite directions, which is lucky. Sometimes, the two errors conspire to go together and give bad results.
From October 2005 onwards, CNN-IBN, CSDS-Lokniti and I have done numerous poll projections. Most of these are based on post-poll surveys, but occasionally they are based on pre-election polls. Here is a listing of all such occasions: a comparison of what we said and what happened. For each, I give our vote share estimates (in percent) and my corresponding seat estimates, alongside the actual vote shares and seats.
BIHAR (October 2005)
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
JDU-BJP | 36 | 37 | 127-137 | 147 |
RJD+ | 31 | 31 | 72-80 | 65 |
Others | 33 | 32 | 29-39 | 31 |
ASSAM 2006
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
Congress | 31 | 31 | 52-60 | 53 |
BJP | 11 | 12 | 10-15 | 10 |
AGP | 22 | 20 | 25-31 | 24 |
Others | 36 | 37 | 26-35 | 39 |
TAMIL NADU 2006
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
AIADMK+ | 35 | 40 | 64-74 | 69 |
DMK+ | 45 | 45 | 157-167 | 163 |
DMDK | 10 | 8 | 2-6 | 1 |
Others | 10 | 7 | - | 1 |
KERALA 2006
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
LDF | 51 | 49 | 107-117 | 98 |
UDF | 41 | 43 | 25-31 | 42 |
Others | 8 | 8 | 0-1 | 0 |
WEST BENGAL 2006
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
LF | 53 | 50 | 230-240 | 235 |
INC | 16 | 15 | 17-23 | 24 |
TMC+ | 27 | 29 | 32-40 | 31 |
PUNJAB 2007
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
SAD-BJP | 41 | 45 | 50-60 | 68 |
Congress | 41 | 41 | 50-60 | 44 |
Others | 18 | 14 | 3-9 | 5 |
UTTARAKHAND 2007
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
Congress | 31 | 30 | 21-27 | 21 |
BJP | 34 | 32 | 33-39 | 35 |
Others | 35 | 38 | 8-12 | 14 |
UTTAR PRADESH 2007
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
SP | 25 | 25 | 99-111 | 97 |
BSP | 29 | 30 | 152-168 | 206 |
BJP+ | 22 | 18 | 80-90 | 52 |
Congress | 11 | 9 | 25-33 | 26 |
Others | 13 | 18 | 21-27 | 26 |
GUJARAT 2007
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
BJP | 47 | 49 | 92-100 | 117 |
Congress | 42 | 39 | 77-85 | 62 |
Others | 11 | 12 | 3-7 | 3 |
KARNATAKA 2008
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
BJP | 30 | 34 | 79 | 110 |
Congress | 35 | 35 | 86 | 80 |
JDS | 21 | 19 | 45 | 28 |
Others | 14 | 12 | 14 | 6 |
LOK SABHA 2009
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
UPA | 36 | 36 | 210-225 | 262 |
NDA | 28 | 24 | 180-195 | 159 |
Left | - | 8 | 30-40 | 24 |
BSP | - | 6 | 24-32 | 21 |
Others | - | 26 | - | 77 |
BIHAR 2010
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
JDU-BJP | 46 | 39 | 185-201 | 206 |
Congress | 9 | 8 | 6-12 | 4 |
RJD-LJP | 27 | 26 | 22-32 | 25 |
Others | 18 | 27 | 9-19 | 8 |
ASSAM 2011
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
Congress | 36 | 39 | 64-72 | 78 |
BJP | 9 | 11 | 7-11 | 5 |
AGP | 18 | 16 | 16-22 | 10 |
AIUDF | 13 | 13 | 11-17 | 18 |
Others | 24 | 21 | 12-20 | 15 |
KERALA 2011
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
LDF | 36 | 45 | 69-77 | 68 |
UDF | 45 | 46 | 63-71 | 72 |
Others | 9 | 9 | 0 | 0 |
TAMIL NADU 2011
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
DMK+ | 44 | 39 | 102-114 | 31 |
AIADMK+ | 46 | 52 | 120-132 | 203 |
BJP Front | 3 | 2 | - | - |
Others | 7 | 7 | - | - |
WEST BENGAL 2011
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
Left | 40 | 41 | 60-72 | 62 |
TMC+ | 50 | 48 | 222-234 | 227 |
Others | 10 | 11 | - | 5 |
UTTARAKHAND 2012
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
Congress | 39 | 34 | 31-41 | 32 |
BJP | 32 | 33 | 22-32 | 31 |
PUNJAB 2012
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
SAD+BJP | 41 | 42 | 51-63 | 68 |
Congress | 40 | 40 | 48-60 | 46 |
MANIPUR 2012
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
Congress | 30 | 42 | 24-32 | 42 |
TMC | 14 | 17 | 7-13 | 7 |
UTTAR PRADESH 2012
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
SP | 34 | 29 | 232-250 | 224 |
BSP | 24 | 26 | 65-79 | 80 |
BJP+ | 14 | 15 | 36-44 | 47 |
Congress | 12 | 12 | 28-38 | 28 |
GUJARAT 2012
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
BJP | 48 | 48 | 129-141 | 116 |
Congress+ | 36 | 39 | 37-45 | 60 |
Others | 16 | 13 | 4-10 | 6 |
HIMACHAL PRADESH 2012
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
Congress | 41 | 43 | 29-35 | 36 |
BJP | 40 | 38 | 29-35 | 26 |
KARNATAKA 2013
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
BJP | 23 | 20 | 39-49 | 40 |
Congress | 37 | 37 | 117-129 | 122 |
JDS | 20 | 20 | 34-44 | 40 |
Others | 20 | 23 | 14-22 | 21 |
MADHYA PRADESH 2013
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
BJP | 41 | 45 | 136-146 | 165 |
Congress | 35 | 36 | 67-77 | 58 |
Others | 24 | 19 | 13-21 | 7 |
RAJASTHAN 2013
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
BJP | 43 | 45 | 126-136 | 162 |
Congress | 33 | 33 | 49-57 | 21 |
Others | 24 | 22 | 12-20 | 16 |
CHHATTISGARH 2013
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
BJP | 42 | 41 | 45-55 | 49 |
Congress+ | 38 | 40 | 32-40 | 39 |
Others | 20 | 19 | 7-13 | 2 |
DELHI 2013
| Vote Estimate | Vote Actual | Seat Estimate | Seat Actual |
BJP | 33 | 34 | 32-42 | 31 |
Congress | 23 | 24 | 9-17 | 8 |
AAP | 27 | 30 | 13-21 | 28 |
Others | 17 | 12 | 1-5 | 3 |
Interesting. So if one were to attempt to report an average error statistic for all your seat count predictions so far, what would it be?
In fact, what's a good metric to judge the quality of predictions? Is a % error on seat count reasonable? Or do you prefer a better metric?
What would be fun is if someone designed a prediction market for the Indian polls.