Project

Twitter as a Predictor of S&P 500 Volatility

My Capstone project, “Twitter as a Predictor of Stock Market Volatility,” examines whether Twitter can be used to predict S&P 500 volatility. Over the course of the last term, I investigated natural language processing (NLP), machine learning, parallel computing, speech recognition and processing, sentiment analysis, big data analytics, and financial analysis to determine whether there is a correlation between tweets and market volatility. The premise of the project is that social media provides large amounts of data that have not yet been assigned a numerical value. Social media data is unstructured, usually in the form of text. In its raw form, this data cannot be processed by computers to predict changes in structured data sets such as market data.

To overcome this obstacle, I used NLP and sentiment analysis to parse data from the social media site Twitter and from news services, including Yahoo News, Bloomberg, Reuters, FactSet, Eikon, and CNBC. Custom Pandas DataFrames allowed the data to be streamed live and stored. By combining machine learning and NLP with sentiment analysis based on a custom lexicon, I obtained a numerical value for the weight that the words carried in each social media post. I then identified the posts that were relevant to current news events by finding the most frequently used words in the news headlines and checking whether the tweets contained those words. Changes in these scores were then correlated with changes in stock index price, volume, and volatility.
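The steps described above (finding the top headline words, keeping only the tweets that contain them, scoring those tweets against a lexicon, and correlating score changes with market changes) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the lexicon weights, the stopword list, and every function name are hypothetical, and plain Python lists stand in for the Pandas DataFrames so that the sketch runs without dependencies.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical lexicon: word -> sentiment weight (illustrative values only).
LEXICON = {"crash": -2.0, "fear": -1.5, "selloff": -2.0,
           "rally": 1.5, "gain": 1.0, "surge": 1.5}
# Hypothetical stopword list, kept tiny for the example.
STOPWORDS = {"the", "a", "an", "of", "on", "in", "to", "as", "and", "for", "is"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_headline_words(headlines, n=5):
    """Return the n most frequent non-stopword tokens across the headlines."""
    counts = Counter(
        tok for h in headlines for tok in tokenize(h) if tok not in STOPWORDS
    )
    return {word for word, _ in counts.most_common(n)}


def is_relevant(tweet, keywords):
    """A tweet is relevant if it shares at least one token with the keywords."""
    return bool(set(tokenize(tweet)) & keywords)


def sentiment_score(tweet, lexicon=LEXICON):
    """Sum the lexicon weights of the tweet's tokens; unknown words add 0."""
    return sum(lexicon.get(tok, 0.0) for tok in tokenize(tweet))


def changes(values):
    """Period-over-period differences of a series."""
    return [b - a for a, b in zip(values, values[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # raises ZeroDivisionError if a series is constant


# Usage with made-up headlines and tweets.
headlines = ["Fed signals rate hike", "Markets slide on rate hike fear"]
keywords = top_headline_words(headlines, n=3)
tweets = ["Rate hike fear is back, expect a selloff", "Lovely weather today"]
relevant = [t for t in tweets if is_relevant(t, keywords)]
scores = [sentiment_score(t) for t in relevant]  # [-3.5]
```

With real data, `pearson(changes(daily_scores), changes(daily_volatility))` would give one correlation figure per pairing of score and market series, where `daily_scores` and `daily_volatility` are hypothetical per-day aggregates.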

The motivation for this project comes from my passion for the stock market and my desire to predict events before they happen, which in today’s market is harder than it seems. Market predictions used to be based on fundamental analysis of a stock’s financials, but we now live in a digital age in which information is available to everyone instantaneously and people can share their opinions with the world. For the market, this means that external factors and key players in positions of power now influence prices. Those with political power can change economic policy, and those with social power, such as an influencer with many followers, can cause huge swings in the market with a single post that is shared and seen by billions of people. Fundamental analysis alone would never have predicted these swings. Given that social media platforms have become a major source of news and that the political climate is highly volatile, we need another tool to help predict market changes.

The goal of this project is to develop a machine learning network that is cost effective, meaning that it is not processor intensive and does not require special software or hardware to run. The accompanying algorithm should be accurate, should keep changing and learning, and should use social media data to predict market volatility.