Can we see bias in political speeches through machine learning? I took a look at Danish parliamentary speech data and, using the word2vec algorithm, tried to see what interesting biases might show up. Done in collaboration with Zetland (article in Danish, linked below).

 Go to Zetland Article

Details – Detection

Word2vec is an algorithm that can detect related words based on sentence context. The idea is that words appearing in the same context are semantically related. For instance, in the following sentences:

The house had many floors.
The building had many floors.

Both house and building occur around the same words. With enough data, we start to see fairly reliable detection of similar words.
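The intuition can be sketched in a few lines of plain Python. This is not the actual word2vec training procedure (which learns dense vectors with a shallow neural network), just a minimal illustration of the underlying idea: words are compared by the contexts they appear in, so "house" and "building" come out similar. The toy corpus and window size are my own assumptions for the example.

```python
from collections import Counter
from math import sqrt

# Toy corpus echoing the example sentences above. In the real analysis,
# this would be thousands of parliamentary speeches.
sentences = [
    "the house had many floors",
    "the building had many floors",
    "the cat sat on the mat",
]

def context_vector(target, sentences, window=2):
    """Count the words appearing within `window` positions of `target`."""
    counts = Counter()
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            if w == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(words[lo:i] + words[i + 1:hi])
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = lambda v: sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

house = context_vector("house", sentences)
building = context_vector("building", sentences)
cat = context_vector("cat", sentences)

# "house" and "building" share identical contexts, so their similarity
# is far higher than that of "house" and "cat".
print(cosine(house, building))  # 1.0
print(cosine(house, cat))       # ~0.33
```

Word2vec refines this idea by learning low-dimensional vectors rather than raw counts, which generalizes much better across a large corpus.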

For a more detailed explanation of word2vec, along with the reasoning and methodology, please see my similar post on Swedish parliamentary data.

Sanity Checking

As mentioned in the previous post, one major issue with this approach is interpreting the results without political knowledge. For instance, we saw that bureaucracy and trouble had a strong similarity score. That could mean politicians use these two words interchangeably, or it could mean there was a law called "Bureaucracy is Trouble" that they happened to talk about all the time.

Luckily, Philip Flores helped make sure that none of the results we saw stemmed from strange political factors like that. We can be fairly certain that the reported results reflect approximate semantic similarity.


In the end, this kind of analysis produces a good mix of the interesting and the obvious. Hopefully these techniques can be refined to better detect and pinpoint bias.


Nimish Gåtam