
An in-depth analysis of US political discussion on Reddit


Using R to scrape and analyse Reddit comments

Summary

This project looks at the polarisation of US politics in terms of language. Please be aware that, as a result, strong and foul language may feature from here on.

Reddit, the so-called 'Front Page of the Internet', is the first port of call for many internet users when it comes to politics.


It is also no secret that the freedom the internet provides often leads to the use of abusive, racist and other foul language. This project aims to take a closer look at the language used on the social news network Reddit, with a specific focus on US politics, by examining four key topics across four subreddits.


The four subreddits are r/The_Donald (a highly right-wing subreddit), r/politics (a neutral subreddit), r/LateStageCapitalism (a left-wing subreddit) and r/all (all posts from all subreddits, used mainly as a comparison).


Using the statistical programming language R and RedditExtractoR, an R package for extracting Reddit data, I scraped Reddit comments directly containing the key terms 'Trump', 'Obama', 'Democrats' and 'Republicans'. These comments were then broken down into individual words for sentiment analysis using the Afinn lexicon, which gives each word in its word list an integer score between -5 and +5 based on the word's positive or negative connotations. I also used an 'adjusted score', which multiplies a word's score by its frequency (the number of times the word appears).
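The scoring step can be illustrated with a short sketch. The project itself was written in R; the Python below only illustrates the same idea. It assumes that comments are lower-cased and split into runs of letters and apostrophes, and that words missing from the lexicon are dropped. `LEXICON` is a tiny hypothetical stand-in for the real Afinn word list, and its values are illustrative rather than copied from Afinn. `adjusted_scores` is a hypothetical helper that returns each scored word's Afinn value, its frequency and its adjusted score (value multiplied by frequency).

```python
import re
from collections import Counter

# Hypothetical, illustrative excerpt of an Afinn-style lexicon (word -> integer score in [-5, 5]).
# The real Afinn list is far larger; these values are placeholders for the example.
LEXICON = {"good": 3, "great": 3, "bad": -3, "terrible": -3, "love": 3}


def tokenize(comment):
    """Lower-case a comment and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", comment.lower())


def adjusted_scores(comments, lexicon=LEXICON):
    """Return {word: (score, frequency, score * frequency)} for words found in the lexicon."""
    counts = Counter(w for c in comments for w in tokenize(c) if w in lexicon)
    return {w: (lexicon[w], n, lexicon[w] * n) for w, n in counts.items()}


comments = [
    "Great speech, but the policy is bad.",
    "Bad idea, terrible execution.",
    "I love this, great work!",
]
for word, (score, freq, adj) in sorted(adjusted_scores(comments).items()):
    print(f"{word:<10} score={score:+d} freq={freq} adjusted={adj:+d}")
```

Here 'bad' and 'great' each appear twice, so their adjusted scores (-6 and +6) carry twice the weight of a single occurrence.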


I then compared the average word scores and score distributions to examine how each subreddit talks about differing political opinions and parties.
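Below is a minimal sketch of that comparison. It is written in Python rather than the R used for the project. It assumes each scored word occurrence has already been tagged with its subreddit and topic. With that in place, the mean word score per group is a simple group-and-average. The `rows` data and the `mean_scores` helper are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (subreddit, topic, Afinn score of one scored word occurrence).
rows = [
    ("The_Donald", "Trump", 3),
    ("The_Donald", "Trump", -2),
    ("The_Donald", "Obama", -3),
    ("politics", "Trump", -3),
    ("politics", "Trump", 2),
    ("politics", "Obama", 2),
]


def mean_scores(rows):
    """Group scores by (subreddit, topic) and return the mean score of each group."""
    groups = defaultdict(list)
    for subreddit, topic, score in rows:
        groups[(subreddit, topic)].append(score)
    return {key: mean(values) for key, values in groups.items()}


for (subreddit, topic), value in sorted(mean_scores(rows).items()):
    print(f"r/{subreddit:<12} {topic:<6} mean={value:+.2f}")
```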


The results, before being broken down into individual subreddits, showed a slant towards negativity, with a positive to negative ratio of 47:77.
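The Python sketch below shows one way such a ratio could be computed; the project itself used R. It counts positively and negatively scored words, ignores zero scores, and reduces the pair by its greatest common divisor. The page does not state whether the 47:77 figure was reduced this way, so the reduction is an assumption. The input scores and the `pos_neg_ratio` name are hypothetical.

```python
from math import gcd


def pos_neg_ratio(scores):
    """Count positive and negative scores (zeros ignored) and reduce the pair by its GCD."""
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    divisor = gcd(positive, negative) or 1  # avoid dividing by zero when both counts are 0
    return positive // divisor, negative // divisor


scores = [3, -2, -3, 2, -1, -4, 1, -2]  # hypothetical word scores
positive, negative = pos_neg_ratio(scores)
print(f"{positive}:{negative}")  # prints 3:5
```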


The violin plot shows the breakdown of words by their Afinn score.

This plot shows the overall distribution of scores across all the comments scraped, for all subreddits and topics. Negative comments clearly outweigh their positive counterparts.
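The counts behind a distribution plot like this can be tabulated with a short Python sketch. It is only an illustration; the original work was done in R. The sketch counts how many scored words fall on each Afinn value from -5 to +5 and prints a crude text histogram. The sample `scores` list and the `score_distribution` helper are hypothetical.

```python
from collections import Counter


def score_distribution(scores):
    """Map every Afinn value from -5 to +5 to the number of words with that score."""
    counts = Counter(scores)
    return {value: counts.get(value, 0) for value in range(-5, 6)}


scores = [-4, -3, -3, -2, -2, -2, 2, 3, 3, 5]  # hypothetical word scores
for value, n in score_distribution(scores).items():
    print(f"{value:+d} {'#' * n}")  # one '#' per word at this score
```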

90,000+ Reddit comments scraped

12,000+ relevant words scored with Afinn

47:77 overall ratio of positive comments to negative comments

-0.55 mean Afinn word score for all subreddits

Conclusion



Looking at the data as a whole, many of the results are as expected: unsurprisingly, r/The_Donald talks about Donald Trump in a positive way. However, the analysis also produced some interesting results. Looking at the 100 most common words and their distribution in r/LateStageCapitalism, there is a strong use of positive language relating to Trump: the mean word score of -0.58, while negative, is actually higher than that of r/all, which hovers around -0.7.
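A 'top 100 words' mean can be computed in two ways, and the page does not say which was used. The first averages each distinct word's score once. The second weights each word by how often it occurs, which equals the adjusted-score total divided by the number of occurrences. The Python sketch below, a stand-in for the original R code, supports both through a `weighted` flag. The `top_n_mean` helper, the lexicon values and the word list are hypothetical, and only words found in the lexicon are counted.

```python
from collections import Counter


def top_n_mean(words, lexicon, n=100, weighted=False):
    """Mean Afinn score over the n most common words that appear in `lexicon`.

    weighted=False averages each distinct word's score once;
    weighted=True weights each score by the word's frequency.
    """
    counts = Counter(w for w in words if w in lexicon)
    top = counts.most_common(n)
    if not top:
        return 0.0  # no scored words at all
    if weighted:
        return sum(lexicon[w] * c for w, c in top) / sum(c for _, c in top)
    return sum(lexicon[w] for w, _ in top) / len(top)


lexicon = {"great": 3, "bad": -3, "terrible": -3, "win": 4}  # illustrative values only
words = ["bad"] * 4 + ["win"] * 3 + ["great"] * 2 + ["terrible", "the", "and"]
print(top_n_mean(words, lexicon, n=3))                 # unweighted
print(top_n_mean(words, lexicon, n=3, weighted=True))  # frequency-weighted
```

With `n=3`, the three most frequent scored words are 'bad', 'win' and 'great'. Their unweighted mean is 4/3. Weighting by frequency pulls it down to 2/3, because 'bad' occurs most often.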


In r/politics, the top 100 words in comments about the current and former presidents produced near-identical mean word scores. Surprisingly, there were more positive words associated with Donald Trump than with Obama.


From this analysis we can conclude that the majority of the Reddit comments analysed carry a negative sentiment score, and that language use in this political context is often polarised: strongly positive and strongly negative words are used heavily, while more intermediate words are used less. Could this be a result of the two-party nature of US politics, or perhaps a symptom of internet anonymity making people more extreme in their language use? A wider analysis of non-political subreddits and topics may provide an answer.
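One way to make the polarisation claim measurable is to compare how many scored words sit at the extremes of the scale against how many sit in the middle. The Python sketch below is not the project's R code. It treats words with an absolute Afinn score of 3 or more as 'strong' and the remaining non-zero scores as 'intermediate'. That threshold, the `polarisation_shares` helper and the sample scores are all assumptions made for illustration.

```python
def polarisation_shares(scores, threshold=3):
    """Return (strong_share, intermediate_share) of the non-zero scores.

    A score counts as 'strong' when its absolute value is at least `threshold`
    (an assumed cut-off, not one taken from the original analysis).
    """
    nonzero = [s for s in scores if s != 0]
    if not nonzero:
        return 0.0, 0.0
    strong = sum(1 for s in nonzero if abs(s) >= threshold)
    strong_share = strong / len(nonzero)
    return strong_share, 1 - strong_share


scores = [-5, -4, -3, -3, -1, 2, 3, 4]  # hypothetical word scores
strong, intermediate = polarisation_shares(scores)
print(f"strong: {strong:.0%}, intermediate: {intermediate:.0%}")
```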
