Twitter Data Features Extraction and Status of Stock

This post builds features from Twitter text data. The program reads Twitter data and financial data and combines them. Text-related features (per-day counts of positive, negative, and neutral tweets) are extracted from the Twitter data, and the financial data is used to prepare two classes for each company's stock: up and down.

import datetime
import numpy as np
import csv
import simplejson as json

The following folders are used. Twitter data is read from the folder ‘twitterData’, financial data is read from ‘yahooFinData’, and the extracted features are written to ‘twFeaturesAndCls’.
The CSV file ‘tweetsSentiScoreAndCls.csv’ stores the tweets and their sentiment scores. The JSON files ‘stockPriceOpenAllCompany.json’ and ‘stockPriceCloseAllCompany.json’ hold the opening and closing prices, respectively, for the various stocks.
# folder and file details

featuresFldr='twFeaturesAndCls'
financeDataFldr='yahooFinData'
twitterDataFldr='twitterData'
featuresFn="stockpredict.txt"
tweetSentiScrFn='tweetsSentiScoreAndCls.csv'
stockPrzOpenFn='stockPriceOpenAllCompany.json'
stockPrzCloseFn='stockPriceCloseAllCompany.json'

A file named “stockpredict.txt” is created in the folder ‘twFeaturesAndCls’ to hold the tweet features and the class information.
# open a file for writing the features and class information

file = open(featuresFldr+'/'+featuresFn, "w",encoding="utf-8")

The files ‘stockPriceOpenAllCompany.json’ and ‘stockPriceCloseAllCompany.json’ are loaded into the Python variables stockPriceOpenAllCompany and stockPriceCloseAllCompany using the json.load() function. The opening prices are read first.

with open(financeDataFldr+'/'+stockPrzOpenFn,'r') as f:
    stockPriceOpenAllCompany=json.load(f)

Read the stock close prices from the JSON file

with open(financeDataFldr+'/'+stockPrzCloseFn,'r') as ff:
    stockPriceCloseAllCompany=json.load(ff)
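
The loaded objects are nested dictionaries: a company abbreviation maps to a dictionary from date string to price. The following is a minimal sketch of that lookup. It uses the standard-library json module rather than simplejson, takes two values from the AAPL sample shown later in this post, and the variable names are illustrative only.

```python
import json

# Excerpt of the opening-price structure (two dates only)
raw = '{"AAPL": {"Nov 14 2018": 193.9, "Nov 13 2018": 191.63}}'
stock_open = json.loads(raw)

# First level: company abbreviation; second level: date key such as 'Nov 14 2018'
aapl_open = stock_open["AAPL"]
print(aapl_open["Nov 14 2018"])
```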

The CSV file ‘tweetsSentiScoreAndCls.csv’, which contains the tweet information, is opened for reading.

fileRCSV=open(twitterDataFldr+'/'+tweetSentiScrFn, 'r',encoding='utf-8')

All the tweets in the CSV file are then read into a list. csv.reader() returns a reader object for the file, and list() turns it into a list of rows, one row per tweet.

inpAllTweets = csv.reader(fileRCSV, delimiter=',')
inpAllTweets=list(inpAllTweets)

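Each row of the Twitter CSV file holds the company abbreviation, the sentiment of the tweet, the date of the tweet, the text of the tweet, and the sentiment score. The sample rows show the date as MM/DD/YYYY; note that getStockPriceDetails(), defined further down, parses dates as YYYY-MM-DD, so the date column must be in that form when the script runs.
For instance, the file contains rows such as the following: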
AAPL neutral 11/14/2018 #bitcoin #btc #crypto video still coming tonight. This will also go over the movements in the $spy $aapl $amzn $brk… https://t.co/eUBGeBhtND 0
AAPL neutral 11/14/2018 RT @Jessicalessin: Killer iPhone chart that shows price != profit for $aapl. https://t.co/jDD37ScgCb 0
AAPL neutral 11/14/2018 RT @AsshatTrading: When a day goes according to plan @UTR_INFO $AMZN $SPY $SPX $AAPL https://t.co/hFnzumPH8x 0
AAPL positive 11/14/2018 RT @SlopeOfHope: After careful consideration, I have decided that the public’s obsession with the TRILLION DOLLAR valuation of $AMZN and $A… 2
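
To make the column layout concrete, here is a minimal sketch that parses one row with csv.reader(). The sample row is hypothetical (a shortened tweet with a YYYY-MM-DD date), it is read from an in-memory string instead of the real file, and the variable names are illustrative only.

```python
import csv
import io

# Hypothetical one-row file in the assumed column order:
# company, sentiment, date, tweet text, sentiment score
sample = 'AAPL,positive,2018-11-14,"After careful consideration, $AAPL looks strong",2\n'

rows = list(csv.reader(io.StringIO(sample), delimiter=','))
company, sentiment, date, text, score = rows[0]

# The quoted text field keeps its comma, so the row still has five columns
print(len(rows[0]), company, sentiment, date, float(score))
```

A tweet whose text contains a comma stays in one column only if the field is quoted in the file; an unquoted comma would produce a row with more than five columns.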
The following function goes through all the tweets. For a given company abbreviation, it collects the matching rows into the list compTweets and the corresponding tweet dates into the list compTweetsOnDate. Rows that do not have exactly five columns are skipped.

def collectCompTweetsAndDates(companyAbb,inpAllTweets):
    compTweetsOnDate =[]
    compTweets= []
    for row in inpAllTweets:
        if len(row) == 5 and row[0] == companyAbb:            
            compTweets.append(row)
            print(row[0],companyAbb,row)
            date = row[2]
            compTweetsOnDate.append(date)
    return compTweets,compTweetsOnDate

For a given company and date, the numbers of positive, negative, and neutral tweets are counted, along with the total number of tweets on that date. These statistics show how many tweets were posted about the company that day and how they split by sentiment. The function also sums the sentiment scores, although that sum is not returned.

def getDatewiseSentiDetail(aDate,compTweets):
    dateTotalCount = 0
    datePosCount = 0
    dateNegCount = 0
    dateNutCount = 0
    totalSentimentScore = 0.
    for row in compTweets:
        sentiment = row[1]
        temp_date = row[2]
        sentiment_score = row[4]
        if(temp_date == aDate):
            totalSentimentScore += float(sentiment_score)
            dateTotalCount+=1
            if (sentiment == 'positive'):
                datePosCount+=1
            elif (sentiment == 'negative'):
                dateNegCount+=1
            elif (sentiment == 'neutral'):
                dateNutCount+=1    
    s = str(dateTotalCount)+" "+str(datePosCount)+" "+str(dateNegCount)+" "+str(dateNutCount)
    return s,dateTotalCount,datePosCount, dateNegCount, dateNutCount
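
The same tally can be expressed more compactly with collections.Counter. This is only a sketch of the idea, not a replacement for the function above: the helper name count_sentiments_on and the sample rows are hypothetical.

```python
from collections import Counter

def count_sentiments_on(a_date, comp_tweets):
    """Hypothetical helper: tally sentiment labels for one date."""
    # row[1] is the sentiment label, row[2] is the tweet date
    counts = Counter(row[1] for row in comp_tweets if row[2] == a_date)
    total = sum(counts.values())
    return total, counts['positive'], counts['negative'], counts['neutral']

tweets = [
    ['AAPL', 'neutral', '2018-11-14', 'tweet a', '0'],
    ['AAPL', 'positive', '2018-11-14', 'tweet b', '2'],
    ['AAPL', 'negative', '2018-11-13', 'tweet c', '-1'],
]
result = count_sentiments_on('2018-11-14', tweets)
print(result)
```

A Counter returns 0 for a label that never occurs, so no explicit initialisation of the three counts is needed.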

For a given company and date, the opening and closing stock prices are looked up in company_open_price and company_close_price. These are dictionaries that map each date in a given period to the stock's opening price and closing price, respectively.
When the date falls on a weekday, the prices are fetched directly for that date. If the date falls on a Saturday or Sunday, it is moved back to the preceding Friday, and the opening and closing prices for that adjusted date are used.

def getStockPriceDetails(aDate,company_open_price,company_close_price):
    # Parse the tweet date; it is expected in YYYY-MM-DD form
    tradeDate = datetime.datetime.strptime(aDate.strip(), '%Y-%m-%d')
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6
    if tradeDate.weekday() == 5:
        # Saturday: move back one day to Friday
        tradeDate -= datetime.timedelta(days=1)
    elif tradeDate.weekday() == 6:
        # Sunday: move back two days to Friday
        tradeDate -= datetime.timedelta(days=2)
    # timedelta also handles a weekend at the start of a month,
    # where subtracting from the day number would give an invalid date
    # The price dictionaries are keyed by dates such as 'Nov 14 2018'
    dateInMonthDayYr = tradeDate.strftime("%b %d %Y")
    openingPrice = company_open_price[dateInMonthDayYr]
    closingPrice = company_close_price[dateInMonthDayYr]
    # The date key is returned twice because the caller unpacks four values
    return dateInMonthDayYr,dateInMonthDayYr,openingPrice,closingPrice
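
The weekend adjustment can be tried in isolation. The sketch below uses a hypothetical helper, to_price_key, that converts a YYYY-MM-DD date into the 'Nov 14 2018' style key used by the price dictionaries, moving Saturdays and Sundays back to Friday with datetime.timedelta. It assumes an English locale for the month abbreviation.

```python
import datetime

def to_price_key(a_date):
    """Hypothetical helper: map 'YYYY-MM-DD' to the price-dictionary key."""
    day = datetime.datetime.strptime(a_date.strip(), '%Y-%m-%d')
    # weekday(): Monday is 0, ..., Friday is 4, Saturday is 5, Sunday is 6
    days_back = max(0, day.weekday() - 4)
    day -= datetime.timedelta(days=days_back)
    return day.strftime('%b %d %Y')

print(to_price_key('2018-11-14'))  # a Wednesday, left unchanged
print(to_price_key('2018-11-10'))  # a Saturday, moved back to Friday
print(to_price_key('2018-12-02'))  # a Sunday at the start of a month
```

Market holidays are not handled here; a date on which the market was closed would still raise a KeyError during the price lookup.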

For each company abbreviation in stockPriceOpenAllCompany, the opening and closing stock prices by date are collected into company_open_price and company_close_price. The JSON data has the following form:
Opening prices: {"AAPL": {"Nov 14 2018": 193.9, "Nov 13 2018": 191.63, "Nov 12 2018": 199.0, "Nov 09 2018": 205.55, "Nov 08 2018": 209.98}}
Closing prices: {"AAPL": {"Nov 14 2018": 186.8, "Nov 13 2018": 192.23, "Nov 12 2018": 194.17, "Nov 09 2018": 204.47, "Nov 08 2018": 208.49}}
For a given company abbreviation, all the tweets and their dates are collected from the Twitter data. For each unique date among them, the total number of tweets and the numbers of positive, negative, and neutral tweets are determined.
Suppose ten tweets related to AAPL were collected on 11/14/2018, of which seven are neutral, two are positive, and one is negative. These statistics are collected using getDatewiseSentiDetail().
The function getStockPriceDetails() returns the opening price and the closing price for a particular date.
When the closing price is greater than the opening price, the status of the stock is considered UP (1). Otherwise, including when the two prices are equal, it is taken as DOWN (-1).
For each company and date, the number of positive tweets, the number of negative tweets, the number of neutral tweets, the total number of tweets, and the stock status are written, in that order, to the file “stockpredict.txt”.
The following is an example of the output:
5,1,4,10,-1
1,0,9,10,1
4,1,5,10,1
3,1,6,10,-1
3,3,4,10,-1
1,2,7,10,-1
1,0,7,8,-1

for companyAbb in stockPriceOpenAllCompany.keys():
    company_open_price=stockPriceOpenAllCompany[companyAbb]
    company_close_price=stockPriceCloseAllCompany[companyAbb]
    print (companyAbb)
    compTweets,compTweetsOnDate=collectCompTweetsAndDates(companyAbb,inpAllTweets)
    datewiseSentiDetails = {}
    for aDate in np.unique(compTweetsOnDate):
        s,dateTotalCount,datePosCount,dateNegCount,dateNutCount=getDatewiseSentiDetail(aDate,compTweets)
        dateInMonthDayYr,dateInMonthDayYr,openingPrice,closingPrice=getStockPriceDetails(aDate,company_open_price,company_close_price)
        compMarketStatus = 0
        if (float(closingPrice)-float(openingPrice)) > 0:
            compMarketStatus = 1
        else:
            compMarketStatus =-1
        file.write( str(datePosCount) + "," + str(dateNegCount) + "," + str(dateNutCount) +"," + str(dateTotalCount) + "," + str(compMarketStatus) + "\n")
file.close()
fileRCSV.close()
print( "Dataset containing sentiment info and stock status is prepared.\n")
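
Once the file is written, it can be read back into feature rows and class labels for a classifier. This is a minimal sketch under the assumption that every line has the five comma-separated integers described above. It reads three sample rows from an in-memory string; in practice the file twFeaturesAndCls/stockpredict.txt would be opened instead.

```python
import csv
import io

# Sample rows copied from the example output
sample = "5,1,4,10,-1\n1,0,9,10,1\n4,1,5,10,1\n"

features, labels = [], []
for row in csv.reader(io.StringIO(sample)):
    *counts, status = (int(v) for v in row)
    features.append(counts)   # [positive, negative, neutral, total]
    labels.append(status)     # 1 for UP, -1 for DOWN

print(features)
print(labels)
```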

 
