Stock Status prediction using Twitter Sentiment Information

This Python code predicts the stock status of a company. Sentiment information from tweets about the company is used as the features, and the stock status is the class to be predicted.

import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_validate
from sklearn.utils import shuffle

The features extracted from the tweets are stored in “stockpredict.txt”, which is kept in the ‘twFeaturesAndCls’ folder.

featuresFldr='twFeaturesAndCls'
featuresFn="stockpredict.txt"

Open the file containing the features and classes

fileFeas = open(featuresFldr+'/'+featuresFn, 'r',encoding="utf-8")

Load the text file into a NumPy array. Each row of the array holds the features followed by the class.

vector = np.loadtxt(fileFeas,dtype=float, delimiter=',')

Separate the features and the target classes.

features = np.array(vector[:,0:-1], dtype='float')
target = vector[:,-1]
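
As a quick check of the slicing above, the sketch below loads a few comma-separated rows from an in-memory string rather than from the real file. The io.StringIO stand-in and the name sample_text are assumptions made only so that the snippet runs without stockpredict.txt; the rows themselves are taken from the sample rows listed in this post.

```python
import io
import numpy as np

# Stand-in for stockpredict.txt.
# Columns: positive, negative, neutral, total, status
sample_text = """5,1,4,10,-1
1,0,9,10,1
4,1,5,10,1
"""

vector = np.loadtxt(io.StringIO(sample_text), dtype=float, delimiter=',')

features = vector[:, :-1]            # every column except the last
target = vector[:, -1].astype(int)   # last column holds the class label

print(features.shape)    # (3, 4)
print(target.tolist())   # [-1, 1, 1]
```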

The tweets collected for different companies are subjected to sentiment analysis.
After sentiment analysis, several values are recorded for each set of tweets: the number of positive tweets, the number of negative tweets, the number of neutral tweets, the total number of tweets, and the stock status.
Here the target is the stock status, which indicates whether the stock price went up or down between the opening and closing prices. The status is 1 if the stock went up; otherwise it is -1.
For example, below are a few rows of features and targets.
5,1,4,10,-1
1,0,9,10,1
4,1,5,10,1
3,1,6,10,-1
3,3,4,10,-1
1,2,7,10,-1
1,0,7,8,-1
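
The status column in the rows above follows the rule stated earlier: 1 if the price went up, otherwise -1. A minimal sketch of that rule is shown here, assuming the opening and closing prices are available as arrays. The prices and the names open_prices and close_prices are hypothetical and are not part of the original data.

```python
import numpy as np

# Hypothetical opening and closing prices for three trading days
open_prices = np.array([100.0, 52.5, 31.0])
close_prices = np.array([101.2, 51.9, 31.0])

# 1 if the close is above the open, otherwise -1
# (an unchanged price therefore maps to -1)
status = np.where(close_prices > open_prices, 1, -1)

print(status.tolist())  # [1, -1, -1]
```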
The features are normalized by dividing each feature value by the total number of tweets. A second normalization divides the feature value by (total number of tweets – number of neutral tweets). The code below adds 1 to each denominator, presumably to avoid division by zero.
The number of positive tweets (at position 0) and the number of negative tweets (at position 1) give a better picture of the users' emotional opinion. Therefore only these two counts are used for classification, each normalized in both ways, which gives four features.

normlzdFeas=np.zeros((features.shape[0],4))
normlzdFeas[:,0]=features[:,0]/(1+features[:,3])
normlzdFeas[:,1]=features[:,1]/(1+features[:,3])
normlzdFeas[:,2]=features[:,0]/(1+features[:,3]-features[:,2])
normlzdFeas[:,3]=features[:,1]/(1+features[:,3]-features[:,2])
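
To make the arithmetic concrete, here is a small worked sketch of the same normalization on two of the sample rows. It builds the four columns with np.column_stack instead of filling a zero array; that is only a stylistic alternative, and the names all_denom and opinion_denom are introduced here for illustration.

```python
import numpy as np

# Columns: positive, negative, neutral, total
features = np.array([[5, 1, 4, 10],
                     [1, 0, 9, 10]], dtype=float)

pos, neg, neu, total = features.T

all_denom = 1 + total            # all tweets, plus 1 as in the code above
opinion_denom = 1 + total - neu  # non-neutral tweets, plus 1

normlzdFeas = np.column_stack([pos / all_denom, neg / all_denom,
                               pos / opinion_denom, neg / opinion_denom])

# First row:  5/11, 1/11, 5/7, 1/7
# Second row: 1/11, 0,    1/2, 0
print(np.round(normlzdFeas, 3))
```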

X, the input feature matrix, is created from the normalized features, and y is the array of target classes (-1 or 1).
An SVM classifier is constructed and evaluated with 5-fold cross-validation using cross_validate(), which trains the classifier on four folds and scores it on the remaining fold, once for each fold.

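X = np.array(normlzdFeas)
y = np.array(target)
# Shuffle the rows so that the folds do not depend on the order of the file
X, y = shuffle(X, y)
# max_iter=100 caps the solver iterations; scikit-learn warns if the cap is reached
clf = svm.SVC(C=0.1, tol=0.001, max_iter=100, random_state=1, verbose=1)
scoring = ['accuracy', 'precision_macro', 'recall_macro']
scores = cross_validate(clf, X, y, scoring=scoring, cv=5, return_train_score=False)
# scores is a dictionary; each test_* entry holds one value per fold
print(scores.keys())
print(scores['test_accuracy'])
fileFeas.close()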
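
The dictionary returned by cross_validate() holds one score per fold for every metric, so it is often easier to read as a mean and standard deviation. The sketch below shows one way to summarize it. It runs on synthetic data generated here purely for illustration, not on the tweet features, so the numbers it prints say nothing about stock prediction. It also omits max_iter so that the solver runs to convergence, which is a choice made for this sketch.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)

# Synthetic stand-in for the normalized features: 100 rows, 4 columns in [0, 1)
X = rng.random((100, 4))
# Synthetic labels: 1 when the first column exceeds the second, otherwise -1
y = np.where(X[:, 0] > X[:, 1], 1, -1)

clf = svm.SVC(C=0.1, random_state=1)
scoring = ['accuracy', 'precision_macro', 'recall_macro']
scores = cross_validate(clf, X, y, scoring=scoring, cv=5)

# Each test_* entry holds one score per fold; report the mean and the spread
for name in scoring:
    fold_scores = scores['test_' + name]
    print(f"{name}: mean={fold_scores.mean():.3f}, std={fold_scores.std():.3f}")
```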

 
