Projects

Sentiment Analysis of Tweets with Application to Stock Markets

 

Social media acts as one of the key platforms for sharing information and expressing opinions, and it holds rich information about companies, their products, and the services they offer. Sentiment analysis of tweets is carried out to understand the market perspective on selected companies. N-gram feature vectors are prepared from the tweets, and sentiment classification is performed using a Naïve Bayes classifier and a Support Vector Machine. A pattern association between the sentiment of tweets and stock values is then established.
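A minimal sketch of the n-gram and Naïve Bayes step, written in plain Python so it runs without extra packages. It assumes whitespace tokenization, unigram plus bigram features, and add-one (Laplace) smoothing; the function and class names are hypothetical, and the Support Vector Machine and stock-association steps are not shown.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_values=(1, 2)):
    """Return a Counter of word n-grams (unigrams and bigrams by default)."""
    tokens = text.lower().split()
    features = Counter()
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)  # label -> n-gram counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = ngrams(text)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        grams = ngrams(text)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus smoothed log likelihood of every known n-gram
            score = math.log(count / n_docs)
            denom = self.totals[label] + len(self.vocab)
            for gram, freq in grams.items():
                if gram in self.vocab:  # n-grams never seen in training are ignored
                    score += freq * math.log((self.feature_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Bigrams let the model pick up short phrases such as "beat expectations" that single words miss, at the cost of a larger vocabulary.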

      

3D Visualization of Sentiment Measures and Prediction of Amazon Customer Reviews

Review comments written by consumers contain rich information, such as how the products are used and what customers think of them. Customer reviews are collected from Amazon.com. VADER, a lexicon- and rule-based sentiment analysis tool, is used to compute the sentiment of each review. Three-dimensional visualizations are constructed with Python's Matplotlib for further analysis of the sentiments. A combined classifier built from Logistic Regression, Decision Tree, and Support Vector Machine models performs sentiment classification of the customer reviews.
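The paragraph does not say how the three classifiers are combined. A common choice, assumed here, is hard majority voting; the sketch also maps a VADER compound score to a label using the ±0.05 cut-offs suggested in the VADER documentation. The function names and the tie-breaking rule are hypothetical.

```python
from collections import Counter


def vader_label(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def majority_vote(predictions, tie_breaker=0):
    """Combine per-classifier label lists by hard majority vote.

    predictions: one list of labels per classifier (e.g. LR, DT, SVM), all the same length.
    tie_breaker: index of the classifier whose label is used when no label has a majority.
    """
    combined = []
    for votes in zip(*predictions):
        label, count = Counter(votes).most_common(1)[0]
        combined.append(label if count > len(votes) // 2 else votes[tie_breaker])
    return combined
```

With three voters and three classes a three-way split is possible, which is why the sketch needs an explicit tie-breaker.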

  

Text Summary Generation using Keyword Frequency

Text summarization has been found useful for quick searching, automatic sorting, abstract generation, and similar tasks. A text summarization method is implemented to generate an abstract of a given document. Methods are developed to identify keywords in the text and compute their frequencies. Word and line weights are computed from these frequencies, and the important lines are extracted from the original text using the word weights. Techniques are also implemented to replace pronouns with the nouns they refer to, which yields a better summary.
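The keyword-frequency approach can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it uses a small hypothetical stop-word list, weights each keyword by its frequency divided by the highest frequency, scores a line by summing its keyword weights, and keeps the top lines in their original order. Pronoun replacement is not included.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "was", "it", "on", "for", "with", "as", "by", "that", "this"}


def summarize(text, num_lines=2):
    """Extract the highest-weighted sentences, keeping their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)
    if not freq:
        return sentences[:num_lines]
    top = max(freq.values())
    weight = {w: c / top for w, c in freq.items()}  # keyword weight in (0, 1]

    def line_weight(sentence):
        # stop words have no entry in `weight`, so they contribute 0
        return sum(weight.get(w, 0.0) for w in re.findall(r"[a-z']+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: line_weight(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_lines])  # restore document order
    return [sentences[i] for i in keep]
```

Summing weights favors long lines; dividing each line's score by its keyword count would instead favor short, dense lines.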

 

Sentiment Analysis Visualization and Classification of Summarized News Articles

A huge amount of data is generated every day on platforms such as social media (Twitter, Facebook), websites, expert blogs, YouTube videos, Wikipedia, and online news sites. Users are now bombarded with this volume of data, and this project develops a method to help them comprehend it. The data samples are news articles from the BBC. First, the news articles are summarized using the pronoun replacement method, which produces a concise representation of each article. Sentiment analysis and visualization are then applied to the summarized BBC articles. This representation helps readers quickly comprehend the news articles.
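A naive version of pronoun replacement is sketched below. It assumes that a sentence-initial He, She, It, or They refers to the run of capitalized words that opens the previous (already resolved) sentence; this heuristic and the function names are hypothetical, and a production system would use proper coreference resolution. Replacement matters because extracted summary sentences lose the context that named their subject.

```python
import re

SUBJECT_PRONOUNS = {"He", "She", "It", "They"}


def leading_capitalized(sentence):
    """Naive subject guess: the run of capitalized words at the start of a sentence."""
    run = []
    for word in sentence.split():
        token = word.rstrip(",;:.!?")
        if not token[:1].isupper():
            break
        run.append(token)
        if token != word:  # trailing punctuation ends the noun phrase
            break
    return " ".join(run)


def replace_pronouns(text):
    """Replace sentence-initial subject pronouns with the previous sentence's subject."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    resolved = []
    for sentence in sentences:
        first, _, rest = sentence.partition(" ")
        if first in SUBJECT_PRONOUNS and resolved:
            antecedent = leading_capitalized(resolved[-1])
            if antecedent and antecedent not in SUBJECT_PRONOUNS:
                sentence = f"{antecedent} {rest}"
        resolved.append(sentence)
    return resolved
```

Because each sentence is resolved against the already-resolved previous one, a chain of "He ..." sentences keeps pointing back to the original name.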

  

Measuring and Prediction of Customer Loyalty using Twitter Data

Social media data can be used to understand consumer satisfaction with products and services. Tweets about several airlines are gathered, and their sentiment is analyzed using TextBlob. A method is implemented to measure consumer loyalty based exclusively on Twitter data. In addition, three classifiers (Random Forest, Decision Tree, and Logistic Regression) are implemented to predict consumer loyalty to airlines.

  • Packages Used: Tweepy, TextBlob, Matplotlib, Scikit-Learn.
  • Method Implemented: Random Forest, Decision Tree and Logistic Regression.
  • Research Output: http://thesai.org/Publications/ViewPaper?Volume=9&Issue=6&Code=IJACSA&SerialNo=52
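The loyalty measure itself is not defined above, so the sketch below only illustrates one possible Twitter-based score; it is hypothetical and is not the metric from the published paper. It assumes each tweet already carries a polarity in [-1, 1] (TextBlob's `sentiment.polarity` produces values in this range) and rates each customer-airline pair by the share of positive tweets minus the share of negative tweets.

```python
from collections import defaultdict


def loyalty_scores(tweets, threshold=0.0):
    """Hypothetical loyalty score for each (user, airline) pair.

    tweets: iterable of (user, airline, polarity), polarity in [-1, 1].
    Score = share of positive tweets - share of negative tweets, in [-1, 1];
    a pair is flagged loyal when its score exceeds `threshold`.
    """
    counts = defaultdict(lambda: {"pos": 0, "neg": 0, "total": 0})
    for user, airline, polarity in tweets:
        c = counts[(user, airline)]
        if polarity > 0:
            c["pos"] += 1
        elif polarity < 0:
            c["neg"] += 1
        c["total"] += 1
    scores = {}
    for key, c in counts.items():
        score = (c["pos"] - c["neg"]) / c["total"]
        scores[key] = (score, score > threshold)
    return scores
```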

Prediction Method for Fast and Reliable Identification of Harmful Variants

Genome projects and sequencing have generated huge amounts of data. A new computational tool, PON-P2, is developed to predict whether amino acid substitutions in human proteins are harmful. Various features are collected, such as sequence evolutionary conservation, properties of amino acids, and functional annotations. Feature selection is implemented to identify the most informative and useful features for classification. A Random Forest classifier is developed to classify each variant as pathogenic, neutral, or unknown.
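The text does not say how the unknown class arises. One simple way, assumed here, is a reject option on the forest's vote: the fraction of trees voting pathogenic is treated as a probability, and variants between two cut-offs are reported as unknown. The cut-off values and the function name are hypothetical.

```python
def classify_variant(tree_votes, lower=0.3, upper=0.7):
    """Turn Random Forest tree votes into pathogenic / neutral / unknown.

    tree_votes: per-tree predictions, each "pathogenic" or "neutral".
    lower, upper: hypothetical cut-offs on the pathogenic vote fraction.
    Returns (class, pathogenic vote fraction).
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    p = tree_votes.count("pathogenic") / len(tree_votes)
    if p >= upper:
        return "pathogenic", p
    if p <= lower:
        return "neutral", p
    return "unknown", p  # too uncertain to call either way
```

Widening the gap between the cut-offs trades coverage (fewer variants classified) for reliability of the calls that are made.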

Protein Disorder Prediction on Amino Acid Substitutions

Many proteins contain intrinsically disordered regions. Many of these regions are important for protein function, and variants can also cause changes in disorder. Several prediction programs have been developed to detect disordered regions, and their performance is evaluated. A tool, PON-Diso, is developed to predict the effects of amino acid substitutions on protein disorder. Sequence evolutionary conservation features and amino acid index values are used as features for Random Forest classifiers.
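As an illustration of amino acid index features, the sketch below computes the change in one index for a substitution written as wild type, position, mutant (for example `L45P`). The Kyte-Doolittle hydropathy scale serves only as an example index; which indices PON-Diso actually uses is not stated here, and the function name is hypothetical.

```python
import re

# Kyte-Doolittle hydropathy index, used here as one example amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def index_change(variant, index=KYTE_DOOLITTLE):
    """Return (wild_type, position, mutant, mutant_value - wild_type_value)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", variant)
    if match is None:
        raise ValueError(f"unrecognized substitution: {variant!r}")
    wt, pos, mut = match.group(1), int(match.group(2)), match.group(3)
    return wt, pos, mut, index[mut] - index[wt]
```

For `L45P`, replacing hydrophobic leucine with proline gives a large negative hydropathy change, the kind of signal a classifier can combine with conservation features.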

Protein Variant Stability Predictor

Amino acid substitutions can affect protein stability. A Random Forest classifier is developed to predict the effects of amino acid substitutions on protein stability. The classifier is trained on the ProTherm data set and predicts one of three classes for each substitution: stability increasing, stability decreasing, or no effect.
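Training a three-class model on ProTherm requires turning measured stability changes into class labels. The sketch assumes a ΔΔG value in kcal/mol where positive means stabilizing, and a hypothetical ±1.0 kcal/mol band for "no effect"; sign conventions differ between data sources, so the sign may need flipping for a given data set.

```python
def stability_class(ddg, threshold=1.0):
    """Map a stability change (kcal/mol) to one of three classes.

    Assumes ddg > 0 means the substitution stabilizes the protein.
    The 1.0 kcal/mol threshold is a hypothetical choice.
    """
    if ddg >= threshold:
        return "increasing"
    if ddg <= -threshold:
        return "decreasing"
    return "no effect"
```

Usage: `[stability_class(x) for x in (2.1, -0.3, -1.7)]` labels three measurements as increasing, no effect, and decreasing.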

Robust Object Recognition Using Binarized Gabor Features

An object recognition system is developed for numerals. The numeral images are subject to significant noise and illumination variation. Gabor filters are used for feature extraction, and a grid system is implemented to collect the Gabor features. Noise- and illumination-invariant features are achieved through binarized Gabor feature extraction, with an adaptive mechanism developed to carry out the binarization. Object recognition is performed using a K-Nearest Neighbor classifier.
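A minimal sketch of adaptive binarization followed by K-Nearest Neighbor matching. It assumes each image has already been reduced to a vector of Gabor response magnitudes, one per grid cell, and uses the vector's own mean as the adaptive threshold; Hamming distance between bit vectors is an assumed choice of metric, and the function names are hypothetical.

```python
from collections import Counter


def binarize(features):
    """Adaptive binarization: threshold a feature vector at its own mean.

    The bits are unchanged if every feature is scaled by the same positive
    factor or shifted by the same amount, because the threshold moves too.
    """
    threshold = sum(features) / len(features)
    return tuple(1 if f > threshold else 0 for f in features)


def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, training, k=3):
    """training: list of (bit_vector, label); majority label among the k nearest."""
    nearest = sorted(training, key=lambda item: hamming(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```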

Design of a Decision Tree to Classify Similar-Looking Characters

Indian languages such as Kannada have large character sets. The Kannada script has around 19090 characters, and many of them have similar shapes. A subimage extraction technique is implemented to locate and extract the part of a character that distinguishes it from similar characters. A decision tree is developed to carry out recognition using these unique subimages. The decision tree incorporates three modular classifiers.
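The subimage idea can be illustrated with a toy tree in which each node looks only at the region where the candidate characters differ and tests its ink density. The characters `X`, `Y`, `Z`, the regions, and the thresholds below are all hypothetical; the sketch does not reproduce the project's three modular classifiers.

```python
def subimage(image, top, bottom, left, right):
    """Crop rows top:bottom and columns left:right from a binary image (list of rows)."""
    return [row[left:right] for row in image[top:bottom]]


def ink_density(region):
    """Fraction of foreground (1) pixels in a region."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels) if pixels else 0.0


def classify_lookalikes(image):
    """Hypothetical two-level tree separating three similar characters X, Y, Z."""
    h, w = len(image), len(image[0])
    # Node 1: X is assumed to have an empty top-right quadrant.
    if ink_density(subimage(image, 0, h // 2, w // 2, w)) < 0.2:
        return "X"
    # Node 2: Y is assumed to have an empty bottom-left quadrant; Z does not.
    return "Y" if ink_density(subimage(image, h // 2, h, 0, w // 2)) < 0.2 else "Z"
```

Testing only a small distinguishing region keeps each decision cheap and less sensitive to variation in the parts the characters share.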

OCR for Kannada with Application to Braille Translation

An Optical Character Recognition (OCR) system is developed for Kannada, a South Indian language, and Braille translation is then implemented on top of the Kannada OCR system. Document image analysis of Kannada documents includes projection profile based segmentation, word segmentation, and character recognition. Braille translation is implemented using translation rules.
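Projection profile segmentation can be sketched as follows for a binary page image (1 = ink). Rows whose ink count falls below a minimum separate text lines; the function names and the `min_ink` parameter are assumptions for illustration.

```python
def horizontal_profile(image):
    """Number of foreground (1) pixels in each row of a binary page image."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    A text line is a maximal run of rows whose projection is at least min_ink;
    the blank rows between lines act as separators.
    """
    lines, start = [], None
    for r, ink in enumerate(horizontal_profile(image)):
        if ink >= min_ink and start is None:
            start = r
        elif ink < min_ink and start is not None:
            lines.append((start, r))
            start = None
    if start is not None:  # a line that runs to the bottom edge
        lines.append((start, len(image)))
    return lines
```

Applying the same function to a transposed line image (`[list(col) for col in zip(*line_rows)]`) splits the line at blank columns; separating words rather than characters additionally needs a minimum-gap rule.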

Multi-Layer Perceptron Neural Network with Adapting Structure

A Multi-Layer Perceptron Neural Network is implemented with application to number recognition. A method is developed to measure the learning process of each parameter in the multilayer perceptron. Using this measurement, the network structure can be adapted during the training phase. Network freezing and pruning are developed based on this measurement.
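The paragraph does not specify how learning is measured. The sketch below assumes one simple measure, the total absolute change of a weight over its last few epochs: a weight that has stopped changing is frozen, and a frozen weight near zero is pruned. The thresholds, window size, and function name are hypothetical.

```python
def update_mask(history, window=5, freeze_tol=1e-3, prune_tol=1e-2):
    """Decide the state of one weight from its values after each epoch.

    history: the weight's value after each epoch, most recent last.
    Returns "active", "frozen" (stopped learning) or "pruned"
    (stopped learning and close to zero).
    """
    if len(history) <= window:
        return "active"  # not enough epochs to judge yet
    recent = history[-(window + 1):]
    change = sum(abs(b - a) for a, b in zip(recent, recent[1:]))
    if change >= freeze_tol:
        return "active"
    return "pruned" if abs(history[-1]) < prune_tol else "frozen"
```

Frozen weights can skip gradient updates to save computation, while pruned weights are removed, shrinking the network during training.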

Rotation Invariant Electronic Component Recognition

A Gabor filter based feature extraction technique is implemented for object recognition. The images are of electronic components at arbitrary rotations. A bank of Gabor filters is constructed, and spatially localized features are extracted. A combined classifier, using a K-Nearest Neighbor classifier together with a Minimum Distance classifier, is developed to carry out recognition.

  • Packages Used: MATLAB Gabor Filtering, Spatially Localized Feature Extraction
  • Method Implemented: Combined classifier using K-Nearest Neighbor and Minimum distance classifier.
  • Research Output: https://ieeexplore.ieee.org/document/5578669/
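A bank of Gabor filters and one simple route to rotation invariance can be sketched in plain Python. Each kernel is the real part of the standard Gabor function, g(x, y) = exp(-(x'^2 + γ^2 y'^2) / (2σ^2)) cos(2πx'/λ + ψ), where x' and y' are the coordinates rotated by θ. The parameter values are hypothetical, and the circular shift in `rotation_normalize` is a common technique assumed here, not necessarily the project's method; it is exact only for rotations that are multiples of the orientation step.

```python
import math


def gabor_kernel(size, theta, wavelength=4.0, sigma=2.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor kernel as a size x size list of rows."""
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)   # rotated x'
            yr = -x * math.sin(theta) + y * math.cos(theta)  # rotated y'
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel


def filter_bank(size=7, n_orientations=4):
    """Kernels at evenly spaced orientations in [0, pi)."""
    return [gabor_kernel(size, k * math.pi / n_orientations) for k in range(n_orientations)]


def response(patch, kernel):
    """Filter response at the patch centre (patch and kernel have the same size)."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


def rotation_normalize(orientation_energies):
    """Circularly shift per-orientation energies so the strongest comes first."""
    start = max(range(len(orientation_energies)), key=orientation_energies.__getitem__)
    return orientation_energies[start:] + orientation_energies[:start]
```

Computing `response` for every kernel in the bank over each grid cell gives spatially localized features; normalizing the orientation order per cell before classification makes a rotated component produce a shifted-back feature vector.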