This Python code predicts a company's stock status. Twitter sentiment information related to the company is used as the features for the prediction.
import numpy as np
from sklearn import svm
from sklearn.model_selection import cross_validate
from sklearn.utils import shuffle
The features extracted from the tweets are stored in “stockpredict.txt”. This file is kept in the ‘twFeaturesAndCls’ folder.
Open the file containing the features and classes.
featuresFldr = 'twFeaturesAndCls'
featuresFn = 'stockpredict.txt'
fileFeas = open(featuresFldr + '/' + featuresFn, 'r', encoding="utf-8")
Load the text file into a NumPy array. Each row of the array holds the features followed by the class in the last column.
vector = np.loadtxt(fileFeas, dtype=float, delimiter=',')
Separate the features and the target classes.
features = np.array(vector[:, 0:-1], dtype='float')
target = vector[:, -1]
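To make the expected file layout concrete, here is a minimal sketch that loads two made-up rows from an in-memory string instead of the real file. The row values and the names `sample_text`, `sample_features`, and `sample_target` are hypothetical. The column order (positive, negative, neutral, total, status) is assumed from the indices used later in this post.

```python
import io
import numpy as np

# Hypothetical rows in the assumed layout:
# positive, negative, neutral, total, status
sample_text = "12,5,3,20,1\n4,9,7,20,-1\n"

# np.loadtxt accepts a file-like object, so StringIO stands in for the real file.
sample = np.loadtxt(io.StringIO(sample_text), dtype=float, delimiter=',')

sample_features = sample[:, 0:-1]  # every column except the last
sample_target = sample[:, -1]      # the last column holds the class

print(sample_features.shape)
print(sample_target)
```

The same slicing is applied to the real data above: everything except the last column becomes the feature matrix, and the last column becomes the target.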
The collections of tweets for the different companies are subjected to sentiment analysis.
After sentiment analysis, several values are collected for the tweets. Each record consists of the number of positive tweets, the number of negative tweets, the number of neutral tweets, the total number of tweets, and the stock status.
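As a rough illustration of how such counts could be assembled, the sketch below tallies per-tweet sentiment labels into one row of counts. The label strings and the list `labels` are hypothetical. The sentiment analysis step itself is not shown, and this is not necessarily how the original feature file was produced.

```python
from collections import Counter

# Hypothetical per-tweet sentiment labels for one company.
labels = ["positive", "negative", "neutral", "positive", "positive"]

counts = Counter(labels)

# Order assumed to match the feature file: positive, negative, neutral, total.
row = [counts["positive"], counts["negative"], counts["neutral"], len(labels)]
print(row)
```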
Here the target is the stock status, which indicates whether the stock price went up or down between the opening and closing prices. The status is 1 if the stock went up; otherwise it is -1.
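The labelling rule can be sketched as a small function. The name `stock_status` and its parameters are hypothetical. Following the rule as stated, a price that is unchanged counts as "not up" and so maps to -1.

```python
def stock_status(open_price: float, close_price: float) -> int:
    """Return 1 if the closing price is above the opening price, otherwise -1."""
    return 1 if close_price > open_price else -1


print(stock_status(100.0, 101.5))
print(stock_status(100.0, 98.0))
```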
For example, below are a few feature rows and targets formed from the tweets.
The features are normalized by dividing each feature value by the total number of tweets. Another way to normalize is to divide the feature value by (total number of tweets - number of neutral tweets). In the code, 1 is added to each denominator, which avoids division by zero.
The number of positive tweets (at position 0) and the number of negative tweets (at position 1) give a better picture of users' emotional opinion. Therefore only these two counts are used for classification; normalizing each of them in both ways gives four features.
normlzdFeas = np.zeros((features.shape[0], 4))
normlzdFeas[:, 0] = features[:, 0] / (1 + features[:, 3])
normlzdFeas[:, 1] = features[:, 1] / (1 + features[:, 3])
normlzdFeas[:, 2] = features[:, 0] / (1 + features[:, 3] - features[:, 2])
normlzdFeas[:, 3] = features[:, 1] / (1 + features[:, 3] - features[:, 2])
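As a worked example of the two normalizations, the sketch below applies them to a single made-up row. The helper name `normalize_counts` and the row values are hypothetical. The arithmetic mirrors the code above, including the added 1 in each denominator.

```python
import numpy as np


def normalize_counts(features: np.ndarray) -> np.ndarray:
    """Return four columns: pos and neg over (1 + total), then over (1 + total - neutral)."""
    pos, neg = features[:, 0], features[:, 1]
    neu, total = features[:, 2], features[:, 3]
    by_total = 1 + total
    by_non_neutral = 1 + total - neu
    return np.column_stack(
        [pos / by_total, neg / by_total, pos / by_non_neutral, neg / by_non_neutral]
    )


# Hypothetical row: 12 positive, 5 negative, 3 neutral, 20 tweets in total.
example = np.array([[12.0, 5.0, 3.0, 20.0]])
normalized = normalize_counts(example)
print(normalized)
```

For this row the denominators are 21 and 18, so the positive count becomes 12/21 under the first normalization and 12/18 under the second.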
X, the input feature matrix, is created from the normalized features. y is the array of target classes (-1 or 1).
An SVM classifier is constructed, and 5-fold cross-validation is performed with cross_validate(), which trains and tests the classifier on each fold.
X = np.array(normlzdFeas)
y = np.array(target)
X, y = shuffle(X, y)
clf = svm.SVC(C=0.1, tol=0.001, max_iter=100, random_state=1, verbose=1)
scoring = ['accuracy', 'precision_macro', 'recall_macro']
scores = cross_validate(clf, X, y, scoring=scoring, cv=5, return_train_score=False)
print(scores.keys())
print(scores['test_accuracy'])
fileFeas.close()
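The per-fold scores returned by cross_validate() are easier to read when summarized as a mean and standard deviation. The sketch below shows one way to do that. Because the real feature file is not available here, it uses synthetic data from make_classification and an SVC with default settings, both of which are stand-ins and not the data or configuration used above. The names `X_demo`, `y_demo`, and `summary` are hypothetical.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate

# Synthetic stand-in data: 100 samples, 4 features, labels remapped to -1 / 1.
X_demo, y01 = make_classification(n_samples=100, n_features=4, random_state=0)
y_demo = np.where(y01 == 1, 1, -1)

clf_demo = svm.SVC()  # default settings, not the configuration used above
scoring = ['accuracy', 'precision_macro', 'recall_macro']
scores_demo = cross_validate(clf_demo, X_demo, y_demo, scoring=scoring, cv=5)

# Collect the mean and standard deviation of each test metric across the 5 folds.
summary = {}
for name in scoring:
    fold_scores = scores_demo['test_' + name]
    summary[name] = (fold_scores.mean(), fold_scores.std())
    print(f"{name}: {summary[name][0]:.3f} +/- {summary[name][1]:.3f}")
```

A large standard deviation across folds would suggest that the result depends strongly on how the data happened to be split.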