Our approach in the current analysis is based on the work of Tsakalidis et al. [1]. Focusing on the 2014 EU elections in three different countries, the authors demonstrated that Twitter data can be incorporated effectively into poll-based regression models in order to predict election results. Several parameters of their method were then improved and the model was applied again to the Greek general elections of January 2015, achieving better results than all of the 31 most recent opinion polls, as well as the three exit polls that were announced once the ballots closed. The methodology has now been adjusted for the UK general elections, using different features and tools.
Since the 10th of March we have been using Twitter's streaming API to collect tweets that are written in English and contain keywords related to UK politics. Such keywords include the names of the main parties, their leaders, their Twitter accounts, etc. From the collected data we extract several features for each party and each political leader on a daily basis, and generate a separate time series from each feature. Some examples of such features can be found in the Timeline. For the sentiment analysis task (that is, determining a tweet's sentiment) we follow the approach developed by Townsend et al. [2]. In this approach, different sets of features are extracted from every tweet (lexicon-based, n-grams, word embeddings and others) and the sentiment of the tweet is then determined by a previously trained model.
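As a rough illustration of the daily feature extraction described above, the sketch below counts party mentions and the share of positive tweets per day. The keyword lists, the feature names (`mentions`, `positive_share`) and the `(date, text, sentiment)` input format are assumptions made for this example; they are not the features or keyword lists actually used in the project.

```python
from collections import defaultdict
from datetime import date

# Hypothetical keyword lists, for illustration only.
PARTY_KEYWORDS = {
    "conservatives": ["conservative", "tories", "david cameron"],
    "labour": ["labour", "ed miliband"],
}


def daily_features(tweets):
    """Aggregate tweets into one feature record per party and day.

    tweets: iterable of (day, text, sentiment), with sentiment in {-1, 0, 1}
    as produced by some sentiment classifier (assumed to exist upstream).
    Returns {party: {day: {"mentions": int, "positive_share": float}}}.
    """
    # counts[party][day] = [number of mentions, number of positive mentions]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for day, text, sentiment in tweets:
        lowered = text.lower()
        for party, keywords in PARTY_KEYWORDS.items():
            if any(keyword in lowered for keyword in keywords):
                cell = counts[party][day]
                cell[0] += 1
                cell[1] += 1 if sentiment > 0 else 0
    return {
        party: {
            day: {"mentions": m, "positive_share": p / m}
            for day, (m, p) in sorted(days.items())
        }
        for party, days in counts.items()
    }
```

Each inner dictionary, read in date order, is one time series per feature; any other daily statistic could be added in the same way.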
At the same time, we have been manually collecting opinion polls from different sources, creating time series of the reported values for all parties. The values of both the Twitter features and the poll features are then normalised and provided as input to a regression model, which tries to forecast the poll values for the next day. Our final prediction for every party is the model's prediction for the poll-based feature of that party.
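The normalise-then-regress step can be sketched as follows for a single party with one poll series and one Twitter series. The text does not specify the regression model or the normalisation scheme, so min-max scaling, a linear model fitted by (lightly regularised) least squares, and all function names here are assumptions chosen for illustration.

```python
def min_max(series):
    """Scale a list of numbers to [0, 1]; a constant series maps to zeros."""
    lo, hi = min(series), max(series)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in series]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def forecast_next_poll(poll, twitter, ridge=1e-6):
    """Forecast tomorrow's poll value from today's poll and Twitter features.

    Fits poll[t+1] ~ w0 + w1 * poll[t] + w2 * twitter[t] on normalised data,
    then maps the prediction back to the original poll scale.
    The small ridge term only guards against a singular system.
    """
    p, t = min_max(poll), min_max(twitter)
    rows = [[1.0, a, b] for a, b in zip(p[:-1], t[:-1])]
    targets = p[1:]
    k = 3
    xtx = [
        [sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    w = solve(xtx, xty)
    predicted = w[0] + w[1] * p[-1] + w[2] * t[-1]
    lo, hi = min(poll), max(poll)
    return lo + predicted * (hi - lo)
```

Calling `forecast_next_poll(poll_values, twitter_values)` with two equally long daily series returns a single next-day estimate on the original poll scale; in practice several Twitter features per party would enter the model, which only widens the design matrix.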
Issues and Future Directions
The approach we have been following has been successfully applied in four different cases, albeit with some modifications. However, there do exist some important differences to consider for the case of UK general elections.
The first of them is the Twitter crawling process itself. In all of our past approaches we looked for tweets written in the respective language (German, Dutch, Greek), and we are following the same approach for the UK general elections. However, looking for the keyword “conservative” in tweets written in English returns noisy data; in other words, it is not necessarily the case that the collected tweets are indeed about the UK elections. Furthermore, we have noticed a considerable number of spam tweets in our collection. Word sense disambiguation and spam filtering are two directions for future analysis that we have not yet employed in the project and that could probably improve our results.
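To make the two future directions concrete, the sketch below shows a deliberately naive stand-in for each: a context-term filter in place of proper word sense disambiguation, and exact-duplicate removal (after stripping links and mentions) in place of proper spam filtering. Neither is used in the project; the term list and function names are hypothetical.

```python
import re

# Hypothetical list of terms that are unambiguous in a UK election context.
UK_CONTEXT = {
    "ge2015", "cameron", "miliband", "clegg", "farage",
    "westminster", "tories", "ukip", "manifesto",
}


def is_relevant(text):
    """Keep a tweet only if it contains at least one unambiguous context term.

    A tweet matching only an ambiguous keyword such as "conservative"
    (for example "a conservative estimate") is rejected.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return bool(words & UK_CONTEXT)


def drop_duplicates(texts):
    """Remove tweets whose text is identical once links and mentions are stripped."""
    seen, kept = set(), []
    for text in texts:
        stripped = re.sub(r"https?://\S+|@\w+", " ", text.lower())
        key = " ".join(re.findall(r"[a-z0-9]+", stripped))
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept
```

A filter of this kind trades recall for precision: it would also discard genuine election tweets that happen to contain no context term, which is one reason a learned disambiguation model is the more promising direction.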
The second difference concerns the sentiment analysis algorithm. In the current project we are particularly interested in studying the effects of applying an entity-level (target-specific) sentiment analysis approach instead of a tweet-level approach. The latter determines the sentiment of the whole tweet, whereas the former determines the sentiment towards each of the entities (e.g., different politicians, parties or topics such as “immigration”) mentioned in a tweet. Tsakalidis et al. [1] have demonstrated that even naive sentiment analysis methods can work effectively for the election prediction task. One would expect that incorporating methods better suited to this task would have a positive impact on the forecaster's accuracy; however, this study has been left for post-election analysis. We have been collecting a cleaner set of tweets annotated according to entity and topic for this purpose.
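The difference between the two levels can be seen on a toy example. The sketch below uses a tiny hand-made lexicon and a fixed word window around each entity; this is only an illustration of the distinction and is not the trained, feature-based model of Townsend et al. The lexicon, the entity list and the function names are all assumptions.

```python
import re

# Toy lexicon and entity list, for illustration only.
LEXICON = {"love": 1, "great": 1, "hate": -1, "awful": -1}
ENTITIES = {"miliband", "cameron"}


def tweet_level(text):
    """Sum the lexicon scores of all words: one score for the whole tweet."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def entity_level(text, window=1):
    """Score each mentioned entity using only the words within `window` of it."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {}
    for i, word in enumerate(words):
        if word in ENTITIES:
            nearby = words[max(0, i - window): i + window + 1]
            local = sum(LEXICON.get(n, 0) for n in nearby)
            scores[word] = scores.get(word, 0) + local
    return scores
```

For the tweet "I love Miliband but hate Cameron", `tweet_level` returns 0 because the two opinions cancel out, while `entity_level` returns `{"miliband": 1, "cameron": -1}`, which is the signal a per-party forecaster actually needs.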
Finally, the extracted Twitter features differ from those used in our past approaches, and this is the first time that the model has been tested with them as input.
[1] A. Tsakalidis, S. Papadopoulos, A.I. Cristea and Y. Kompatsiaris (2015). Predicting Elections for Multiple Countries Using Twitter and Polls. IEEE Intelligent Systems, 30(2), 10-17.
[2] R. Townsend, A. Tsakalidis, Y. Zhou, B. Wang, M. Liakata, A. Zubiaga, A.I. Cristea and R. Procter (2015). From Phrase-Based to Target-Specific Sentiment Recognition. To appear in the Proceedings of SemEval 2015.