Testing

The process for testing the system has so far been rigorous. As the term has gone on it has become more automated, but initially a great deal of hand checking and data scraping was required to collect the necessary information. To start the project, I first collected six months of historical market data and the corresponding tweets from President Donald J. Trump for that time frame. The six months of historical tweet data and market data were then loaded into a data frame, along with the news headlines for the same six months. At this point I needed to test whether the hypothesis, that tweet volume and sentiment correlate with S&P 500 price, volume, and volatility, held any validity before moving forward.

Sentiment was scored with NLTK's VADER module, an add-on package for Python whose name stands for Valence Aware Dictionary and sEntiment Reasoner. The tool is designed specifically for analyzing sentiment in social media posts. VADER works by using a dictionary that maps lexical features to emotion intensity scores on a scale from -4 to +4, where -4 is the most negative, +4 is the most positive, and 0 is neutral. These scores were assigned by human raters on Amazon Mechanical Turk and together form the scoring lexicon.
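As a point of reference, a minimal sketch of scoring a single tweet with NLTK's built-in VADER analyzer might look like the following (the example tweet text is purely illustrative):

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # fetch the built-in scoring lexicon once

sia = SentimentIntensityAnalyzer()

# polarity_scores() returns negative, neutral, positive, and compound scores;
# the compound score is normalized to the range [-1, +1].
example_tweet = "The stock market is doing GREAT!"
print(sia.polarity_scores(example_tweet))
```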

Because VADER takes into account elements such as punctuation, hashtags, mentions, emojis, and URLs, the tweets did not have to be cleaned from their input form. A custom lexicon was also loaded into VADER for the analysis; it is one I created over three years of research and consists of finance-related terms and their positive and negative sentiment values. Using the two lexicons, the tweets were scored and then sorted into data frames depending on whether they contained words that were found in the news headlines.
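As an illustration of how the custom lexicon can be merged into the analyzer before scoring, the terms and valence values below are hypothetical placeholders; the SentimentIntensityAnalyzer exposes its lexicon as an ordinary Python dictionary, so custom entries can simply be added to it:

```python
from nltk.sentiment.vader import SentimentIntensityAnalyzer

sia = SentimentIntensityAnalyzer()

# Hypothetical entries: financial terms mapped to valence scores on VADER's
# -4 to +4 scale (the actual custom lexicon is far larger than this).
custom_financial_lexicon = {
    "tariff": -1.5,
    "rally": 2.0,
    "selloff": -2.5,
    "bullish": 2.5,
    "bearish": -2.5,
}

# Merge the custom terms into the built-in lexicon before scoring.
sia.lexicon.update(custom_financial_lexicon)

score = sia.polarity_scores("Markets rally despite new tariff threats")
```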

The data was then exported along with the stock data and loaded into Tableau for manipulation, graphing, and validation of the hypothesis. A few days with varying tweet volume were hand-picked, and the data for those days was closely investigated. Each tweet's sentiment score was checked to see whether it made sense in the context of the tweet, and the list of tweets filtered on the top words found in news headlines was compared against the full set of tweets to make sure the relevant tweets were pulled and no unrelated tweets were included. After that, the data was graphed so that conclusions could be drawn.
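A rough sketch of the headline-based filtering and export step, assuming hypothetical file and column names, is shown below:

```python
import pandas as pd

# Hypothetical file and column names; the actual data frames may differ.
tweets = pd.read_csv("trump_tweets_scored.csv")   # columns: date, text, compound
headlines = pd.read_csv("news_headlines.csv")     # columns: date, headline

# Build the set of top words appearing in the news headlines.
top_words = set(
    headlines["headline"].str.lower().str.split().explode().value_counts().head(100).index
)

# Keep only tweets that contain at least one of the top headline words.
relevant = tweets[
    tweets["text"].fillna("").str.lower().str.split()
    .apply(lambda words: bool(top_words & set(words)))
]

# Export alongside the market data for graphing in Tableau.
relevant.to_csv("filtered_tweets.csv", index=False)
```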

The next steps for testing, now that the hypothesis has been tested and verified, are a little different but follow the same approach. Six months of historical market data, the corresponding tweets from President Donald J. Trump for that time frame, and the news headlines for those six months will be stored in a data frame. The tweets will be sorted based on the news headlines, and sentiment will be determined for each tweet using both the custom VADER lexicon and NLTK's built-in VADER lexicon. The data will then be hand-tagged for instances where we determined there was a correlation between tweet volume and sentiment on one side and the index's volume, price changes, and calculated implied volatility on the other. The data will be merged into a single data frame and exported as a CSV for the predictive engine. This data set will make up the training set for the machine learning algorithm.
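One possible shape for assembling the training CSV, again with hypothetical file and column names, is sketched below:

```python
import pandas as pd

# Hypothetical inputs for the assembled training set.
tweets = pd.read_csv("tweets_with_sentiment.csv", parse_dates=["date"])
market = pd.read_csv("sp500_daily.csv", parse_dates=["date"])             # price, volume, implied_vol
tags = pd.read_csv("hand_tagged_correlation.csv", parse_dates=["date"])   # correlated: 0/1

# Aggregate tweet volume and average sentiment per trading day.
daily_tweets = tweets.groupby("date").agg(
    tweet_volume=("text", "count"),
    avg_sentiment=("compound", "mean"),
).reset_index()

# Merge the daily tweet features, market data, and hand tags into one frame.
training = daily_tweets.merge(market, on="date").merge(tags, on="date")

training.to_csv("training_set.csv", index=False)
```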

Six additional months of historical market data, the corresponding tweets from President Donald J. Trump for that time frame, and the news headlines for those additional six months will be stored in a data frame. The tweets will be sorted based on the news headlines, and sentiment will be determined using both the custom VADER lexicon and NLTK's built-in VADER lexicon. The data will be merged into a data frame and exported as a CSV for the predictive engine. This data set will make up the testing set for the machine learning algorithm once the system has been trained.

The predictive engine, using machine learning, will determine when there is a correlation between tweet volume and sentiment and the index's volume, price changes, and calculated implied volatility. This will happen after the machine learning model has been trained, verified, and tested with the two data sets described above.

Using the six months of historical data that was hand-sorted and pre-tagged for correlation between tweet volume and sentiment and the index's volume, price changes, and calculated implied volatility, the machine learning system will be trained. The algorithm will also predict the implied volatility for those six months of data, and its predictions will be compared against the actual values to see whether training was successful. If it was, the next step is to load the six additional months of historical data into the machine learning system.
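As a sketch of the training step, assuming the hypothetical column names from the training CSV above and with scikit-learn's RandomForestRegressor standing in for whichever model the predictive engine ends up using:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

training = pd.read_csv("training_set.csv")

# Hypothetical feature and target column names.
features = ["tweet_volume", "avg_sentiment", "index_volume", "price_change"]
target = "implied_vol"

model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(training[features], training[target])

# Check the fit on the training window before moving to the held-out six months.
train_pred = model.predict(training[features])
print("training MAE:", mean_absolute_error(training[target], train_pred))
```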

Using the testing data set, we will have the machine learning system predict the implied volatility for each time period and compare that to the calculated historical volatility for the same period to see whether the model was trained correctly. Once those tests are complete, the training results will be compared to the testing results. If there is a significant correlation between the testing data and the training data, the machine learning algorithm will start predicting volatility in real time; otherwise, the system will continue to be trained until the results are satisfactory.
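Continuing the training sketch above, the comparison between predicted and calculated volatility on the testing window could be checked with a simple correlation test; the 0.7 threshold here is an illustrative placeholder, not a project requirement:

```python
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical column names on the held-out testing window.
testing = pd.read_csv("testing_set.csv")

features = ["tweet_volume", "avg_sentiment", "index_volume", "price_change"]

# `model` is the regressor fit in the training sketch above.
testing["predicted_vol"] = model.predict(testing[features])

# Compare predicted implied volatility to the calculated historical volatility.
r, p_value = pearsonr(testing["predicted_vol"], testing["historical_vol"])
print(f"Pearson r = {r:.3f} (p = {p_value:.3g})")

# Only move to real-time prediction if the correlation is strong enough.
if r < 0.7:
    print("Correlation too weak; continue training before going live.")
```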

Once we move into real-time prediction, tweets, news articles, and stock quotes will be pulled into the system in real time and fed into the predictive engine, with the results graphed in Tableau in real time. Because the tweets are constantly being curated and the list of current events is constantly being updated, the model will always be up to date with current events, which ensures that only tweets of significance are included. Different machine learning models will then be swapped in and the tests run again, with the best-performing model chosen in the end for the final product.
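A minimal sketch of such a real-time loop is given below; fetch_tweets() and fetch_quote() are hypothetical stand-ins for the live data sources, and `model` and `sia` come from the earlier sketches:

```python
import time
import pandas as pd

def run_realtime(model, sia, poll_seconds=300):
    while True:
        tweets = fetch_tweets()       # hypothetical: newest tweets since the last poll
        quote = fetch_quote("^GSPC")  # hypothetical: latest S&P 500 volume and price change

        compounds = [sia.polarity_scores(t)["compound"] for t in tweets]
        features = pd.DataFrame([{
            "tweet_volume": len(tweets),
            "avg_sentiment": sum(compounds) / len(compounds) if compounds else 0.0,
            "index_volume": quote["volume"],
            "price_change": quote["price_change"],
        }])

        prediction = model.predict(features)[0]

        # Append the new row to the CSV that the Tableau dashboard reads.
        features.assign(timestamp=pd.Timestamp.now(), predicted_vol=prediction) \
                .to_csv("live_feed.csv", mode="a", header=False, index=False)

        time.sleep(poll_seconds)
```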