Sentiment Analysis of Company Specific Tweets

Sentiment analysis of tweets reveals the emotion expressed in their content.
It can be used to automatically categorize text into positive, negative and neutral classes.
This Python code performs sentiment analysis of tweets downloaded from Twitter.

import simplejson as json
from urllib.request import urlopen
from zipfile import ZipFile
import io
import re
import csv

The Twitter data is stored in the folder ‘twitterData’ as a JSON file. The output of the sentiment analysis is written to the CSV file ‘tweetsSentiScoreAndCls.csv’ in the same folder.
Folder and file details:

twitterDataFldr='twitterData'
twitterDataJsonFn='tweetsForAllCompany.json'
twitterSentimentInfoFn='tweetsSentiScoreAndCls.csv'

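This function tokenizes the given input line: it lowercases the text, replaces every run of non-word characters with a space, and splits the result into words.

def textTokenizer(text):
    # Raw string avoids the invalid-escape warning for \W; 'text' avoids shadowing the built-in input()
    return re.sub(r'\W+', ' ', text.lower()).split()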
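To make the tokenizer's behaviour concrete, here is a small sketch that redefines the same function locally and applies it to a made-up tweet. The sample text and the name AcmeCorp are invented for illustration and are not taken from the data set.

```python
import re

def textTokenizer(text):
    # Lowercase, collapse runs of non-word characters into one space, split on whitespace
    return re.sub(r'\W+', ' ', text.lower()).split()

# Invented sample tweet (assumption, not from the original data)
sample = "Loving the new phone!!! #happy @AcmeCorp"
tokens = textTokenizer(sample)
print(tokens)  # ['loving', 'the', 'new', 'phone', 'happy', 'acmecorp']
```

Note that punctuation, hashtags and mention symbols are dropped, and that an apostrophe also splits a word, so "can't" becomes the two tokens "can" and "t".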

This function takes two input parameters, tokens and afinnDict. The afinnDict is a dictionary that maps words to their sentiment scores.
The function looks up each token in afinnDict, adds the scores of the words it finds to a running total, and keeps a list of the AFINN score of each matched word. Tokens that are not in the dictionary are ignored. The function returns the total score and the list of per-word scores.

def findAFINNSentiment(tokens, afinnDict):
    total = 0.0
    afinnScores=[]
    for t in tokens:
        if t in afinnDict:
            total += afinnDict[t]
            afinnScores.append(afinnDict[t])
    return total,afinnScores
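The scoring step can be tried out on its own with a toy lexicon. The sketch below repeats the function so that it runs standalone; the dictionary toyLexicon and the token list are hypothetical and only stand in for the real AFINN data.

```python
def findAFINNSentiment(tokens, afinnDict):
    total = 0.0
    afinnScores = []
    for t in tokens:
        if t in afinnDict:
            total += afinnDict[t]
            afinnScores.append(afinnDict[t])
    return total, afinnScores

# Hypothetical mini-lexicon used only for this illustration
toyLexicon = {"good": 3, "bad": -3, "pretty": 1}

total, scores = findAFINNSentiment(
    ["pretty", "good", "service", "bad", "wifi"], toyLexicon
)
print(total, scores)  # 1.0 [1, 3, -3]
```

The words "service" and "wifi" are not in the toy lexicon, so they contribute nothing to the total and do not appear in the score list.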

The following function displays each tweet together with its tokens, its total sentiment score and the per-word scores.

def displayTweetAndSentiVals():
    # Relies on the module-level variables tweets and afinnDict defined later in the script
    for t in tweets:
        tokens = textTokenizer(t)
        total, afinnScore = findAFINNSentiment(tokens, afinnDict)
        print(t, tokens, total, afinnScore)

The AFINN lexicon is downloaded as AFINN.zip. The following function reads each entry of the file AFINN/AFINN-111.txt inside the archive and stores it in a dictionary afinn.
Each word of the AFINN list is a key of the dictionary and its value is the sentiment score. For instance:
pretend: -1
pretends: -1
pretending: -1
pretty: 1
prevent: -1
prevented: -1

def AFINNtoDic(zipfileAFINN):
    afinnFile = zipfileAFINN.open('AFINN/AFINN-111.txt','r')
    afinn = dict()
    for line in afinnFile:
        line=line.decode()
        words = line.strip().split()
        if len(words) == 2:
            afinn[words[0]] = int(words[1])
    return afinn
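The parsing logic can be checked without downloading anything by building a tiny stand-in archive in memory. This is only a sketch: the three entries written to the archive are a hand-made sample, and the function is repeated here (using a with block) so that the example runs on its own.

```python
import io
from zipfile import ZipFile

def AFINNtoDic(zipfileAFINN):
    afinn = {}
    with zipfileAFINN.open('AFINN/AFINN-111.txt', 'r') as afinnFile:
        for line in afinnFile:
            words = line.decode().strip().split()
            # Only lines of the form "<word> <score>" are kept
            if len(words) == 2:
                afinn[words[0]] = int(words[1])
    return afinn

# Build a small stand-in archive in memory (sample entries, not the full lexicon)
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as zf:
    zf.writestr('AFINN/AFINN-111.txt',
                "pretty\t1\nprevent\t-1\ncan't stand\t-3\n")

with ZipFile(buffer) as zf:
    toyAfinn = AFINNtoDic(zf)
print(toyAfinn)  # {'pretty': 1, 'prevent': -1}
```

Because each line is split on all whitespace, an entry made of more than one word (such as "can't stand" in the sample) yields three fields and is skipped by the len(words) == 2 check.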

This program uses the AFINN lexicon, a list of words with corresponding sentiment values.
The lexicon is downloaded from www2.compute.dtu.dk as a zip file. A ZipFile object is created to read the downloaded archive, and its content is loaded into a dictionary.

url = urlopen('http://www2.compute.dtu.dk/~faan/data/AFINN.zip')
zipfileAFINN = ZipFile(io.BytesIO(url.read()))
afinnDict=AFINNtoDic(zipfileAFINN)

The JSON file containing the tweets is opened and its content is loaded into the Python variable tweetsAllComapy.

with open(twitterDataFldr+'/'+twitterDataJsonFn,'r') as f:
    tweetsAllComapy=json.load(f)
# The with block closes the file automatically, so no explicit f.close() is needed
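The post does not show the JSON file itself, but the way the script indexes it suggests a structure of company, then a grouping key, then a list of tweet strings in which a "|" separates the date from the text. The sketch below illustrates that assumed layout; the company name, the inner key "batch1" and the tweets are all invented, and the standard json module is used here in place of simplejson.

```python
import json

# Hypothetical sample mirroring the layout the script appears to expect
sampleJson = '''
{
  "AcmeCorp": {
    "batch1": [
      "2020-01-01|Great service today",
      "2020-01-02|Terrible delays again"
    ]
  }
}
'''

data = json.loads(sampleJson)
for company, groups in data.items():
    for groupKey, tweetList in groups.items():
        for entry in tweetList:
            # Split only on the first "|" so a "|" inside the text is preserved
            date, text = entry.split("|", 1)
            print(company, groupKey, date, text)
```

If the real file uses a different layout, the loops over companies and inner keys would need to be adapted accordingly.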

A CSV file is created to hold the computed sentiment information about the tweets, and csv.writer() is used to create the CSV file writer.

# newline='' lets the csv module control line endings and avoids blank rows on Windows
ftweetSentiCls = open(twitterDataFldr+'/'+twitterSentimentInfoFn, 'w', encoding="utf-8", newline='')
csvWriterTweetSentiCls = csv.writer(ftweetSentiCls)

The script iterates over all companies. For each company, a list tweets is created and filled with all the tweets that belong to that company. Each tweet is then tokenized, that is, broken into words, and a sentiment score is computed for it.
The total sentiment score of each tweet is computed by looking up its words in the AFINN lexicon.

for company in tweetsAllComapy.keys():
    print ("Sentiment calculation for...",company)
    tweets = []
    for itweets in tweetsAllComapy[company].keys():
        print (itweets,len(tweetsAllComapy[company][itweets]))
        tweets.extend(tweetsAllComapy[company][itweets])
    tokens = [textTokenizer(t) for t in tweets] 
    afinnScoreForAllToks = []
    
    for tOftweet in tokens:
        total,afinnScore = findAFINNSentiment(tOftweet , afinnDict)  
        afinnScoreForAllToks.append(total)

The tweets are separated into positive, negative and neutral classes based on the sentiment score. A tweet with a total sentiment score greater than zero is considered positive, a score equal to zero is neutral, and a score less than zero is negative.
The company name, sentiment class, date of the tweet, text of the tweet and sentiment score are written to the CSV file ‘tweetsSentiScoreAndCls.csv’.

    positiveTweet = []
    negativeTweet = []
    neutralTweet = []

Separating positive, negative and neutral tweets based on afinnScoreForAllToks

    for i in range(len(afinnScoreForAllToks)):
        if afinnScoreForAllToks[i] > 0:
            positiveTweet.append(afinnScoreForAllToks[i])
            csvWriterTweetSentiCls.writerow([company,"positive", str(tweets[i].split("|")[0]), str(tweets[i].split("|")[1]), float(afinnScoreForAllToks[i])])
        elif afinnScoreForAllToks[i] < 0:
            negativeTweet.append(afinnScoreForAllToks[i])
            csvWriterTweetSentiCls.writerow([company,"negative",  str(tweets[i].split("|")[0]), str(tweets[i].split("|")[1]),float(afinnScoreForAllToks[i])])
        else:
            neutralTweet.append(afinnScoreForAllToks[i])
            csvWriterTweetSentiCls.writerow([company,"neutral", str(tweets[i].split("|")[0]), str(tweets[i].split("|")[1]),float(afinnScoreForAllToks[i])])
    
ftweetSentiCls.close()
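The thresholding rule described above can also be expressed as a small helper, which removes the three near-identical writerow calls. This is a sketch rather than the script's own code: classifySentiment is a hypothetical helper, the tweets and scores are invented, and the rows are written to an in-memory buffer instead of the real CSV file.

```python
import csv
import io

def classifySentiment(score):
    # Hypothetical helper: maps a total AFINN score to a class label
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Invented tweets in the "date|text" form implied by the split("|") calls
tweets = [
    "2020-01-01|Great service today",
    "2020-01-02|Terrible delays again",
    "2020-01-03|Store opens at nine",
]
scores = [3.0, -3.0, 0.0]  # made-up totals for illustration

out = io.StringIO()
writer = csv.writer(out)
for tweet, score in zip(tweets, scores):
    date, text = tweet.split("|", 1)
    writer.writerow(["AcmeCorp", classifySentiment(score), date, text, float(score)])

print(out.getvalue())
```

Each output row has the same five columns as in the script: company, sentiment class, date, text and score.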
