In this post, we will look at how to analyze text from Twitter. We will do each of the following for tweets that refer to Donald Trump and for tweets that refer to Barack Obama.
- Conduct a sentiment analysis
- Create a word cloud
This is a somewhat complex analysis, so I am assuming that you are familiar with Python; explaining everything would make the post much too long. To achieve the two objectives above, we need to do the following.
- Obtain all of the necessary information from your Twitter app account
- Download and clean the tweets
- Perform the analysis
Before we begin, here is a list of the modules we will need to load to complete our analysis.
import re
import matplotlib.pyplot as plt
import numpy
import twython
import wordcloud
Obtain all Needed Information
From your Twitter app account, you need the following information.
- App key
- App key secret
- Access token
- Access token secret
All this information needs to be stored in individual objects in Python. Then each individual object needs to be combined into one object. The code is below.
# Each credential is a string, so the placeholders must be quoted
TWITTER_APP_KEY = 'XXXXXXXXXXXXXXXXXXXXXXXXXX'
TWITTER_APP_KEY_SECRET = 'XXXXXXXXXXXXXXXXXXX'
TWITTER_ACCESS_TOKEN = 'XXXXXXXXXXXXXXXXXXXXX'
TWITTER_ACCESS_TOKEN_SECRET = 'XXXXXXXXXXXXXX'
# Combine the four credentials into one Twython client object
t = twython.Twython(
    app_key=TWITTER_APP_KEY,
    app_secret=TWITTER_APP_KEY_SECRET,
    oauth_token=TWITTER_ACCESS_TOKEN,
    oauth_token_secret=TWITTER_ACCESS_TOKEN_SECRET,
)
In the code above, we first saved each piece of information in its own object and then combined them. You will, of course, replace each XXXXXXX placeholder with your own information, written as a quoted string.
Next, we need to create a function that will pull the tweets from Twitter. Below is the code.
def get_tweets(twython_object, query, n):
    # Page through the search results and keep the text of the first n tweets
    count = 0
    result_generator = twython_object.cursor(twython_object.search, q=query)
    result_set = []
    for r in result_generator:
        result_set.append(r['text'])
        count += 1
        if count == n:
            break
    return result_set
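I will not walk through the code line by line, but in short the function pages through the search results for the query, keeps the text of each tweet, and stops once it has collected n tweets. We can now download the tweets.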
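The counting logic in get_tweets can be tried out without Twitter credentials. The sketch below is only an illustration: fake_results is a made-up stand-in for the generator that twython_object.cursor returns, and first_n_texts is a hypothetical helper that does the same job as the loop in get_tweets, using itertools.islice instead of a manual counter.

```python
from itertools import islice

def fake_results():
    # Hypothetical stand-in for the search cursor: yields tweet-like dicts forever
    i = 0
    while True:
        yield {'text': 'tweet number %d' % i}
        i += 1

def first_n_texts(result_generator, n):
    # Take at most n items from the generator and keep only the tweet text
    return [r['text'] for r in islice(result_generator, n)]

sample = first_n_texts(fake_results(), 3)
print(sample)
```

If the generator runs out before n items have been seen, fewer than n texts come back. The same is true of get_tweets, so the number requested is an upper limit rather than a guarantee.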
Downloading & Cleaning Tweets
Downloading the tweets involves making an empty dictionary that we can save our information in. We need two keys in our dictionary, one for Trump and the other for Obama, because we are downloading tweets about these two people.
There are also two additional things we need to do. We need to use regular expressions to get rid of punctuation, and we also need to convert all words to lowercase. All this is done in the code below.
tweets = {}
# Lowercase each tweet, then replace the listed punctuation marks with spaces
tweets['trump'] = [re.sub(r'[-.#/?!.":;()\']', ' ', tweet.lower())
                   for tweet in get_tweets(t, '#trump', 1500)]
tweets['obama'] = [re.sub(r'[-.#/?!.":;()\']', ' ', tweet.lower())
                   for tweet in get_tweets(t, '#obama', 1500)]
The get_tweets function is also used in the code above, along with our Twitter app information. We requested up to 1500 tweets concerning Obama and up to 1500 tweets about Trump; the search may return fewer than that. We were able to download and clean our tweets at the same time. We can now do our analysis.
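To see what the cleaning step does to a single tweet, here is a small sketch that applies the same regular expression to two made-up example tweets. The clean helper and the PUNCTUATION name are introduced here for illustration only.

```python
import re

# Same character class as in the post: each listed punctuation mark becomes a space
PUNCTUATION = r'[-.#/?!.":;()\']'

def clean(tweet):
    # Lowercase first, then replace punctuation with spaces
    return re.sub(PUNCTUATION, ' ', tweet.lower())

print(repr(clean('Great speech! #Trump')))
print(repr(clean('Well said, #Obama!')))
```

Two things are worth noticing in the output. Each removed character leaves a space behind, so runs of several spaces appear. Characters that are not in the class, such as the comma in the second example, stay attached to their word, so "said," will not match a dictionary entry for "said".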
Analysis
To do the sentiment analysis you need dictionaries of positive and negative words. The ones in this post were taken from GitHub. Below is the code for loading them into Python.
positive_words = open('XXXXXXXXXXXX').read().split('\n')
negative_words = open('XXXXXXXXXXXX').read().split('\n')
We will now make a function to calculate the sentiment.
def sentiment_score(text, pos_list, neg_list):
    # Score = number of positive words minus number of negative words
    positive_score = 0
    negative_score = 0
    for w in text.split(' '):
        if w in pos_list:
            positive_score += 1
        if w in neg_list:
            negative_score += 1
    return positive_score - negative_score
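As a quick check of how the score behaves, the sketch below runs the same function on three made-up sentences. The two short word lists are invented for the example and stand in for the much longer dictionaries loaded from file.

```python
def sentiment_score(text, pos_list, neg_list):
    # Same logic as in the post: positive matches minus negative matches
    positive_score = 0
    negative_score = 0
    for w in text.split(' '):
        if w in pos_list:
            positive_score += 1
        if w in neg_list:
            negative_score += 1
    return positive_score - negative_score

# Tiny illustrative word lists (assumptions, not the dictionaries used in the post)
pos = ['great', 'good', 'win']
neg = ['bad', 'sad', 'lose']

scores = [
    sentiment_score('great speech   trump', pos, neg),
    sentiment_score('bad deal and a sad day', pos, neg),
    sentiment_score('good start but bad finish', pos, neg),
]
print(scores)
```

The third sentence shows that a score of zero does not always mean a tweet is neutral: one positive word and one negative word cancel each other out.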
Now we create an empty dictionary and run the analysis for Trump and then for Obama.
tweets_sentiment = {}
tweets_sentiment['trump'] = [sentiment_score(tweet, positive_words, negative_words)
                             for tweet in tweets['trump']]
tweets_sentiment['obama'] = [sentiment_score(tweet, positive_words, negative_words)
                             for tweet in tweets['obama']]
Now we can make visuals of our results with the code below.
# Five-bin histogram of the sentiment scores for each group
trump = plt.hist(tweets_sentiment['trump'], 5)
obama = plt.hist(tweets_sentiment['obama'], 5)
Obama is on the left and Trump is on the right. It seems that the Trump tweets tend to be more positive. Below are the means for both.
numpy.mean(tweets_sentiment['trump'])
Out[133]: 0.36363636363636365
numpy.mean(tweets_sentiment['obama'])
Out[134]: 0.2222222222222222
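The two means can be put in context by also looking at how many tweets score above, below, or exactly zero. The sketch below is one possible way to do that with the standard library; summarize is a hypothetical helper, and the scores are made up for illustration and are not the results from this post.

```python
from collections import Counter
from statistics import mean

def summarize(scores):
    # Bucket each score by its sign and report the share of tweets in each bucket
    signs = Counter(
        'positive' if s > 0 else 'negative' if s < 0 else 'neutral'
        for s in scores
    )
    total = len(scores)
    return {
        'mean': mean(scores),
        'positive': signs['positive'] / total,
        'neutral': signs['neutral'] / total,
        'negative': signs['negative'] / total,
    }

# Made-up scores for illustration only
illustrative_scores = [1, 0, 0, -1, 2, 0, 1, 0]
summary = summarize(illustrative_scores)
print(summary)
```

With the real data, the same helper could be called as summarize(tweets_sentiment['trump']) and summarize(tweets_sentiment['obama']).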
Trump tweets are slightly more positive than Obama tweets. Below is the code for the Trump word cloud.
Here is the code for the Obama word cloud.
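A word cloud for either group of tweets can be produced along the following lines. This is a minimal sketch under stated assumptions, not necessarily the code used for this post: word_frequencies and draw_word_cloud are hypothetical helper names, the stop-word list is a tiny made-up example, and the drawing step assumes the wordcloud and matplotlib packages are installed.

```python
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer)
STOPWORDS = {'the', 'a', 'is', 'to', 'and', 'of', 'rt'}

def word_frequencies(tweet_list, stopwords=STOPWORDS):
    # Count how often each word appears across all tweets, skipping stop words
    counts = Counter()
    for tweet in tweet_list:
        counts.update(w for w in tweet.split() if w not in stopwords)
    return dict(counts)

def draw_word_cloud(frequencies):
    # Imported here so the counting step above works without the plotting packages
    import matplotlib.pyplot as plt
    import wordcloud
    cloud = wordcloud.WordCloud(background_color='white')
    cloud.generate_from_frequencies(frequencies)
    plt.imshow(cloud, interpolation='bilinear')
    plt.axis('off')
    plt.show()

# Made-up tweets standing in for tweets['trump'] or tweets['obama']
sample_tweets = ['rt trump is great', 'trump rally today', 'great rally']
frequencies = word_frequencies(sample_tweets)
print(frequencies)
```

With the real data, the call would look like draw_word_cloud(word_frequencies(tweets['trump'])), and likewise for tweets['obama']. Words that occur more often are drawn larger.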
There is a lot one could speculate about from the word clouds and the sentiment analysis. However, the results will change every time the code is run because of the dynamic nature of Twitter. People are always posting new tweets, which changes the results.
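Because a fresh download gives different tweets each time, one option is to save the cleaned tweets to disk once and rerun the analysis on that saved copy. The sketch below uses the standard json module; the file name, the helper names, and the sample data are all made up for illustration.

```python
import json
import os
import tempfile

def save_tweets(tweets, path):
    # Write the dictionary of cleaned tweets to a JSON file
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(tweets, f)

def load_tweets(path):
    # Read the dictionary of cleaned tweets back from the JSON file
    with open(path, encoding='utf-8') as f:
        return json.load(f)

# Made-up data standing in for the tweets dictionary built earlier
snapshot = {'trump': ['great rally'], 'obama': ['good speech']}
path = os.path.join(tempfile.gettempdir(), 'tweets_snapshot.json')
save_tweets(snapshot, path)
restored = load_tweets(path)
print(restored == snapshot)
```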
Conclusion
This post provided an example of how to download and analyze tweets from Twitter. It is important to develop a clear idea of what you want to know before attempting this sort of analysis, as it is easy to become confused and not accomplish anything.