
Thursday 27 November 2014

Sentiment Analysis: British politicians compared with a Happiness Histogram.

I'm currently making some text mining videos, one of which is about sentiment analysis. For fun, I thought I would analyse the sentiment of the speeches that three current British politicians, David Cameron, Nick Clegg and Ed Miliband, gave at their respective party conferences to see what we can learn. Of course, and I stress, this is by no means an exhaustive or thorough analysis; it's just a bit of fun.

I used RapidMiner with the Text Mining and WordNet extensions, specifically WordNet 3.0 and the SentiWordNet 3.0.0 database. I divided each text into tokens (i.e. words) and then split it into consecutive equal-sized parts of 100 words each. I then used the Extract Sentiment (English) operator to score each part with a sentiment, which ranges between -1 for negative and +1 for positive. I used a dash of R to draw some of the graphs below; the last one was drawn with RapidMiner's advanced charts.
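For anyone who wants to try the same idea outside RapidMiner, here is a minimal Python sketch of the steps above: tokenise, cut into 100-word parts, and average a per-word score over each part. It is not the Extract Sentiment (English) operator. The `TOY_LEXICON` values are made up for illustration and are not SentiWordNet scores, and dropping a short final part is my assumption about how to keep the parts equal-sized.

```python
import re

# Hypothetical toy lexicon: word -> score in [-1, 1].
# Illustrative values only; these are NOT taken from SentiWordNet.
TOY_LEXICON = {"good": 0.5, "great": 0.75, "bad": -0.5, "crisis": -0.75}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def split_into_parts(tokens, size=100):
    """Split tokens into consecutive parts of `size` words.

    Assumption: a final part shorter than `size` is dropped so that
    every part has the same number of words.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def score_part(part, lexicon):
    """Average the per-word scores; words not in the lexicon count as 0."""
    return sum(lexicon.get(word, 0.0) for word in part) / len(part)


def score_text(text, lexicon=TOY_LEXICON, size=100):
    """Return one sentiment score per `size`-word part of the text."""
    return [score_part(p, lexicon) for p in split_into_parts(tokenize(text), size)]
```

Because most words in a real speech carry little or no sentiment, averaging over 100 words pulls every part's score towards zero.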

Let's compare the three speeches using a histogram of the sentiments - the Happiness Histogram. The colours represent the parties (Ed Miliband: Red, Nick Clegg: Orange, David Cameron: Blue). The graphs show the distribution of sentiment scores over the 100-word parts of each speech, and you can see that the values range between -0.04 and +0.1. With 100 words you would not expect very high scores, because the sentiment calculation simply assigns a sentiment value to each word and averages over all the words. Nonetheless, the variations are slightly more than would be expected from random sampling; I did some brief checking to confirm this.
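One way to check a claim like this is a shuffle test: scramble the word order, re-cut the 100-word parts, and see how often the scrambled parts vary as much as the real ones. The sketch below is only my illustration of that idea, not a record of the checking actually done for this post. The names `part_means` and `shuffle_check` are hypothetical, and the input is assumed to be a list with one sentiment score per word.

```python
import random
import statistics


def part_means(word_scores, size=100):
    """Mean score of each consecutive `size`-word part (short tail dropped)."""
    return [sum(word_scores[i:i + size]) / size
            for i in range(0, len(word_scores) - size + 1, size)]


def shuffle_check(word_scores, size=100, trials=200, seed=0):
    """Compare the spread of part scores with the spread after shuffling.

    Returns the observed standard deviation of the part means and the
    fraction of shuffles whose spread is at least as large.
    """
    rng = random.Random(seed)
    observed = statistics.pstdev(part_means(word_scores, size))
    shuffled = list(word_scores)
    at_least = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if statistics.pstdev(part_means(shuffled, size)) >= observed:
            at_least += 1
    return observed, at_least / trials
```

A small fraction suggests the speech groups its positive and negative words more than chance word order would.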



This next graph compares them directly.
We notice that the speeches are resolutely perky in that they are always more positive than negative on average. The Miliband speech has an outlying region of happiness (ironically to the right) whereas the other two are more middle of the road.

Now let's see how sentiment varies as we move through the speeches.


This graph is a moving average of 10 data points (i.e. 1000 words) for each of the 3 speeches, with the colours as before. The minutes axis assumes a speaking rate of 125 words a minute, which is what I observed the speeches to average. This means the first moving average value falls at 1000 words, or about 8 minutes in (1000 / 125 = 8).
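The smoothing and the minutes axis can be sketched in a few lines of Python. This is a rough reconstruction under my own assumptions, not the R or RapidMiner code behind the graph: I assume a trailing window, so that each averaged value is plotted at the end of its 10 parts, and the function names are hypothetical.

```python
def moving_average(scores, window=10):
    """Trailing moving average: one value per complete window of scores."""
    return [sum(scores[i - window:i]) / window
            for i in range(window, len(scores) + 1)]


def minutes_axis(n_scores, window=10, part_size=100, words_per_minute=125):
    """Minutes elapsed at the end of each averaged window."""
    return [i * part_size / words_per_minute
            for i in range(window, n_scores + 1)]
```

With 100-word parts and 125 words a minute, each part lasts 0.8 minutes, so the first averaged point sits at 8 minutes.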

It's quite interesting to see how the different politicians vary their sentiment. Ed Miliband approaches the end of the speech in a series of steps, gradually getting happier with mini-spells of relative gloom. Nick Clegg seems to get more and more positive but perhaps peaks too early and ends on a down. David Cameron starts happy, gets gloomy, then quickly recovers, but again maybe too early, and ends on a down. Perhaps Messrs Clegg and Cameron have to temper what they say with the realism of being in government.

It is also possible to correlate the extremes of the sentiment with the words being used. There is a wealth of detail and interesting things to note but time prevents me from detailing this today and so I will save that for another post.