Design Alternatives

SELECTING A PROGRAMMING LANGUAGE

In selecting a programming language, I took several factors into account, including simplicity as well as the availability and quality of machine learning packages. MATLAB, R, Python, and Java are amongst the most popular languages. For the sake of comparison, I have broken down each language's attributes in Table 1 below [14].

Ultimately, I selected Python. Python provides an easier and faster way to build high-performing algorithms, especially since it offers a huge collection of specialized libraries. Companies like Google have even developed open-source machine learning libraries for Python, which lends credibility to the strength and robustness of these libraries. Indeed, Python is the most popular coding language amongst data scientists—another indicator of the language's success [14].

After selecting Python as my coding language, I had to choose amongst its various libraries [15], [16], [17], [18], [19]. Again, I broke down my options into a table, which is shown in Table 2 below.

The decision about which Python libraries to use was impacted not only by these factors but also by my choice of machine learning model (which I will discuss in the next section). I chose to use Keras with TensorFlow as a backend. Ultimately, Keras supports the type of neural network I wanted to implement, and the use of tensors for computation was ideal for the size and type of datasets I wanted to utilize. Furthermore, Keras's provision for fast and easy prototyping was ideal for working within the time constraint of this project, where fast experimentation proved to be necessary.
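As an illustration of this prototyping speed, the sketch below defines a small LSTM regression model in a handful of lines. The layer size and input shape are placeholder assumptions for illustration, not the final architecture used in this project.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# A minimal LSTM regressor: each sample is 60 timesteps of 6 features
# (placeholder shape), and the output is a single predicted closing price.
model = Sequential([
    LSTM(50, input_shape=(60, 6)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.summary()
```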

SELECTING A MACHINE LEARNING MODEL

Numerous machine learning models exist. Broadly, these models can be broken down into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning consists of creating a function that maps input independent variables to an output target variable. Unsupervised learning is implemented when no target variable exists. Reinforcement learning trains continually using trial and error. In the context of this problem, my data is historical data, and the target variable is future stock price [19]. Thus, this project falls into the category of supervised learning. A literature review revealed a wide variety of models that have been used in stock price prediction. I decided to start by exploring the simplest model, linear regression, and progress to researching more complex neural networks.

1. Linear Regression

Linear regression is a linear model that assumes a linear relationship between our input variable (x) and output variable (y). Our linear equation (y = mx + b) assigns a weighting/scaling value (m) to our input. There is an additional coefficient, b, that acts as a bias term, which gives the line an additional degree of freedom. As the linear regression algorithm trains on our training data, the bias and weighting terms are adjusted. The goal of the algorithm is to find a line of best fit for the data, where the best-fit line has a total prediction error that is as small as possible. The error is calculated by finding the distance between each point and the regression line. A cost function helps to determine the best possible values for the weight coefficient and bias coefficient. This is a minimization problem, where we want to minimize the error between the predicted value and the actual value [20]. A literature review revealed that linear regression is a fairly common form of stock market prediction. Its main purpose seems to be as a basic baseline model; typically, its performance is worse (providing a large amount of predictive error) than that of more complicated models [21], [22], [23].
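As a brief, hedged illustration of such a baseline (using scikit-learn here purely for compactness; this is not the project's final toolchain), the sketch below fits a line that predicts tomorrow's price from today's. The single lagged-price feature is an assumption for demonstration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy series: predict tomorrow's price from today's (one lagged feature).
prices = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3])
X = prices[:-1].reshape(-1, 1)  # today's price as the input variable x
y = prices[1:]                  # tomorrow's price as the target variable y

model = LinearRegression().fit(X, y)
print("weight m:", model.coef_[0], "bias b:", model.intercept_)
```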

2. Feed-forward Artificial Neural Network

The next machine learning method I explored was the Artificial Neural Network (ANN). This type of network is built from units that mimic biological neurons, called perceptrons. An example perceptron is shown in Fig. 3 below:

Fig. 3: Example Perceptron.

The perceptron consists of multiple parts: an input layer (whose values come from our input features), weights and a bias, a net sum, and an activation function. The perceptron inputs (x_j) are multiplied by their respective weights (w_j), which are real numbers that express the importance of each input. The results of these multiplications are then added together, along with the bias, in the weighted sum Σ_j w_j x_j + b. This result is passed to an activation function, and the output is determined by whether the weighted sum is less than or greater than some threshold value determined by the activation function. The purpose of the weights in our perceptron is to express the strength of a particular node, while the bias value allows the activation function curve to be shifted up or down. A neural network's activation function serves to map the output to a value within a certain range specified by the function [24].
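A minimal sketch of this computation, with a simple threshold activation (the weights and inputs are illustrative values only):

```python
import numpy as np

def perceptron(x, w, b):
    """Forward pass of a single perceptron with a step (threshold) activation."""
    z = np.dot(w, x) + b       # weighted sum of inputs plus bias
    return 1 if z > 0 else 0   # fire only if the sum exceeds the threshold

# Example: two inputs with hand-picked weights (illustrative values only).
print(perceptron(np.array([0.5, 0.8]), np.array([0.6, -0.4]), 0.1))
```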

A literature review revealed that the performance of feed-forward neural networks significantly exceeded that of linear regression. While the linear regression models typically showed prediction root-mean-squared errors (RMSE) much greater than 1, the feed-forward networks in my research typically achieved RMSEs of less than 1. For instance, the results found by Weng showed consistent RMSEs of less than 1 for predictions using feed-forward neural networks [21], [22], [25]. Furthermore, Weng's overall system achieved an average of 60% accuracy in stock price prediction.

3. Recurrent Neural Network

The feed-forward neural networks discussed in the previous section merely feed input values forward through the network; there are no feedback connections. The issue with a purely feed-forward network is that information can get lost over time, meaning that greater emphasis is placed on more recent data. For stock price prediction, this can be an issue: the most recent knowledge is not necessarily the most pertinent, and sometimes past information can be more relevant to current market patterns. As such, I expanded my research into networks with feedback capabilities: the recurrent neural network (RNN).

Unlike other ANNs, RNNs have loops in them to allow past information to persist.

Fig. 4: Simple model of Recurrent Neural Network (RNN).

Fig. 4 above shows a simple model of a recurrent neural network, A, with input xt and output ht. Note that it contains a loop, enabling information to be passed between different steps of the network. An RNN is basically a chain of multiple copies of the unit shown above, so “unrolling” this unit yields the following architecture:

Fig. 5: “Unrolled” recurrent neural network.
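To make the recurrence concrete, here is a minimal sketch of a vanilla RNN forward pass (the weight shapes and sizes are illustrative assumptions): the same cell is applied at every time step, passing its hidden state forward.

```python
import numpy as np

def rnn_forward(xs, W_x, W_h, b):
    """Vanilla RNN over a sequence: h_t = tanh(W_x·x_t + W_h·h_{t-1} + b)."""
    h = np.zeros(W_h.shape[0])   # initial hidden state
    outputs = []
    for x_t in xs:               # the same cell is reused at each time step
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        outputs.append(h)
    return outputs

# Example: hidden size 4, input size 3, sequence of 5 random vectors.
rng = np.random.default_rng(0)
hs = rnn_forward(rng.normal(size=(5, 3)), rng.normal(size=(4, 3)),
                 rng.normal(size=(4, 4)), np.zeros(4))
```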

Because of its chain-like architecture, the RNN is useful for analyzing data with a similar sequential structure, such as time-series data like historical stock prices. However, the standard RNN shown in Fig. 5 above does not totally avoid the issue of long-term dependencies that we have been concerned with thus far. As the gap between relevant steps of the RNN grows, it starts to "forget" past information, so the standard RNN still fails at learning long-term dependencies. The simple internal architecture of the standard RNN (a single tanh layer) is shown below:

Fig. 6: "Unrolled" standard RNN showing its simple internal architecture.

The solution is a variation of the RNN: the Long Short-Term Memory (LSTM) network. The LSTM still has a chain-like architecture; however, it has added complexity. Instead of one neural network layer, it has four, as shown below in Fig. 7:

Fig. 7: Internal architecture of the Long Short-Term Memory network.

LSTM Steps:

  1. Step One: “Forget Gate”

The first step in our process is shown in Fig. 8 below and is represented by the corresponding equation:

f_t = σ(W_f · [h_{t−1}, x_t] + b_f)

Fig. 8: “Forget Gate” of RNN-LSTM.

The first step is known as the "forget gate" layer. The inputs to the gate are the output from the previous cell (h_{t−1}) and the current input (x_t). The output is f_t, whose value (on a scale between 0 and 1) determines how much of the previous information will be "remembered." A zero means that none of the information is kept, while a one means all of the information is kept.

  2. Step Two: Input Gate and Candidate Values

The second step in our process is shown in Fig. 9 below and is represented by the corresponding equations:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)

Fig. 9: Updating the Cell State of RNN LSTM.

This step contains two layers. The first is the "input gate," a sigmoid layer that determines which values will be updated. The second is a tanh layer, which creates a vector of candidate values, denoted C̃_t, that are possibilities for being added to the current cell state.

  3. Step Three: Updating the Cell State

The third step in our process is shown in Fig. 10 below and is represented by the corresponding equation:

C_t = f_t × C_{t−1} + i_t × C̃_t

Fig. 10: Calculating the New Cell State of the RNN-LSTM.

In this step, we take the outputs from Step Two and perform the operation pictured to update the old cell state, C_{t−1}, to the new cell state, C_t. First, we take the output of the "forget gate," f_t, and multiply it by C_{t−1} to "forget" the desired previous information. Then, this product is added to i_t × C̃_t (the candidate values scaled by how much the network decided to update each state value). The result is the new cell state, C_t.

  4. Step Four: Calculating the Output

The fourth step in our process is shown in Fig. 11 below and is represented by the corresponding equations:

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)
h_t = o_t × tanh(C_t)

Fig. 11: Calculating Output RNN LSTM.

The output is based on some additional calculations on the new cell state. First, a sigmoid layer (the "output gate" pictured) decides which parts of the cell state will be included in the output. Next, the cell state is passed through a tanh function, scaling its values to between −1 and 1, and the result is multiplied by the output of the sigmoid gate, yielding our output, h_t [9].

In some cases, the literature review showed that an RNN and its "selective memory" capabilities provided greater predictive accuracy [26], [27], [28]. For instance, in his paper, Aamodt utilized an LSTM hybrid, which was able to achieve relatively low error with directional accuracies consistently above 60% [27]. Additionally, work by Selvin et al. shows that an LSTM exhibited the least prediction error when compared with other neural network architectures like a regular RNN and a Convolutional Neural Network (CNN) [28]. Thus, the LSTM seems to have a measured degree of success in similar stock price applications as compared to other networks. Additionally, its "memory" is a built-in method of likely mitigating concept drift. Taking all of these factors into consideration, I chose the LSTM as my model framework.
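Putting the four steps together, here is a minimal NumPy sketch of a single LSTM cell step; the weight matrices, shapes, and names are illustrative assumptions rather than the project's actual implementation (which uses Keras):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One forward step of an LSTM cell, following the four steps above."""
    z = np.concatenate([h_prev, x_t])    # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)         # Step 1: forget gate
    i_t = sigmoid(W_i @ z + b_i)         # Step 2: input gate
    c_tilde = np.tanh(W_c @ z + b_c)     # Step 2: candidate values
    c_t = f_t * c_prev + i_t * c_tilde   # Step 3: new cell state
    o_t = sigmoid(W_o @ z + b_o)         # Step 4: output gate
    h_t = o_t * np.tanh(c_t)             # Step 4: output
    return h_t, c_t
```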

SELECTING A MECHANISM TO COMBAT CONCEPT DRIFT

Although the LSTM does have a "memory," I wanted to explore other alternatives for mitigating concept drift. The markets are not stable, meaning indicators that might have been predictive at one moment will disappear as other investors spot the pattern and implement it in their trading strategies. As the underlying concept in the data changes, the model's performance may decline over time. There are several methods that can combat concept drift, which I will discuss in this section.

The first alternative is retraining the model on new data. For stock data, this is not the best option. Depending on the amount of input data, retraining could be time-intensive, which is not ideal for applications like stock trading that require speed to make faster decisions and find "hot spots" of predictability [4]. An alternative that is similar to retraining is updating the model with recent data at checkpoints. Since end-of-day stock prices are released daily, the model can be updated on a daily basis. By maintaining a checkpoint in the model, the checkpoint can be used as a point at which to add new training data without retraining on the whole dataset, i.e., the algorithm will pick up where it left off at the checkpoint [29].
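A minimal sketch of this checkpoint-based updating with Keras, assuming a previously saved checkpoint file and using placeholder arrays in place of the day's real data:

```python
import numpy as np
from tensorflow.keras.models import load_model

# Resume from the saved checkpoint instead of retraining from scratch
# ("checkpoint.h5" is a hypothetical path written by an earlier model.save()).
model = load_model("checkpoint.h5")

# Placeholders for the newest day's samples; in practice these would come
# from the daily end-of-day price feed (shape: samples x timesteps x features).
x_new = np.random.rand(1, 60, 6)
y_new = np.random.rand(1, 1)

# A brief additional fit continues training where the checkpoint left off.
model.fit(x_new, y_new, epochs=1, batch_size=1, verbose=0)

# Overwrite the checkpoint so the next daily update picks up from here.
model.save("checkpoint.h5")
```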

The second alternative is to use a sliding window, which is represented in Fig. 12 below. The sliding window uses a specified window size w from times t-w+1 to t (where t represents the “current time”).

Fig. 12: Sliding Window

The model will train at each window with the specified performance metrics before sliding to the next set of data that can fit in the window. Using a small sliding window is, theoretically, supposed to reflect the current distribution of data and prevent outdated information from affecting the model. Conversely, a larger window will contain more training instances and be able to perform better during moments of stability [30]. Thus, there is a balance in determining the fixed window size. If the window is too small, the classifier will not contain a large enough number of instances and will overfit. If the window is too large, the classifier may be built using data that contains too many different concepts. There will be further discussion about the window size as it relates to testing of the actual model.
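A short sketch of generating such windows from a price series (the window size and toy values are illustrative):

```python
import numpy as np

def sliding_windows(series, w):
    """Yield (window, target) pairs: each window covers times t-w+1..t,
    and the target is the value at time t+1."""
    for t in range(w - 1, len(series) - 1):
        yield series[t - w + 1 : t + 1], series[t + 1]

# Example: windows of size 3 over a toy price series.
prices = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 10.9])
for window, target in sliding_windows(prices, 3):
    print(window, "->", target)
```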


SELECTING INPUT FEATURES 

Feature engineering is key to machine learning. By definition, feature engineering is the process of transforming raw data into features that better represent the underlying problem to predictive models, with the goal of improving model accuracy [32]. Basically, this means that we must properly select our input values/features. Because a machine learning framework learns a solution to a problem based on sample data, the goal is to input the representation of the data that best enables the model to learn that solution. With stock data, this is both pivotal and challenging because stocks are influenced by so many varying factors. In order to determine the features to be input into the algorithm, a large amount of testing was required. Thus, in this section, I will briefly discuss the potential features and combinations of features that I decided to utilize as inputs.

The target variable is the daily closing stock price, so historical closing prices are a necessary input. Thus, one necessary combination of inputs is the historical daily close, adjusted close, open, high, and low prices, as well as volume. The open and close prices represent the start-of-day and end-of-day stock prices, and the low and high prices represent the stock's minimum and peak prices within the span of a day, respectively. The volume indicates the number of shares that were bought and sold on a given day.

Again, because stock prices are highly influenced by a variety of market forces, choosing disparate data sources could be key to improving the accuracy of the model. Significant stock market movements are often driven by investor perceptions of a stock based on information collected from various data sources. Because stock price is based on investor sentiment, I sought out data sources that are freely accessible to the public. My chosen source for price data was Yahoo Finance, where historical stock price data can be accessed.
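As a sketch of this step, the snippet below pulls daily OHLCV history with the third-party yfinance package (an assumption for illustration; any Yahoo Finance download mechanism would do):

```python
import yfinance as yf  # third-party wrapper around Yahoo Finance data

# Download daily OHLCV history for a sample ticker and date range.
data = yf.download("AAPL", start="2015-01-01", end="2019-01-01")

# Columns include Open, High, Low, Close, Adj Close, and Volume.
print(data.head())
```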

The next type of data that was experimented with was technical indicators, which are used widely by quantitative traders and could give insight into their trading strategies as they affect market movements. Technical indicators are mathematical calculations that can be applied to a stock's past patterns (such as price and volume). The two general categories of technical indicators are leading and lagging. Leading indicators give trade signals where a trend is supposed to start, while lagging indicators follow price action (meaning they give a signal after a trend or reversal has started). Under these two major categories, there are four other types of technical indicators: trend, momentum, volatility, and volume. Trend indicators measure the direction and strength of a trend, using some form of price averaging to establish a baseline; as price moves above the average, this indicates a bullish (upward) trend, and as price moves below, this indicates a bearish (downward) trend. Momentum indicators measure the speed of price movement, helping to identify overbought or oversold conditions. Volatility indicators measure the rate of price movement, regardless of direction; this is generally based on changes in the highest and lowest historical prices. They provide useful information about the range of buying and selling that takes place in a given market and help traders determine points where the market may change direction. Volume indicators measure the strength of a trend or confirm a trading direction based on some form of averaging or smoothing of raw volume. The strongest trends often occur while volume increases; in fact, it is the increase in trading volume that can lead to large price movements [33]. The key to utilizing technical indicators is choosing indicators that complement each other and are not redundant. The technical indicators used in the feature testing are listed in Table 3 below.

Table 3: Types of Technical Indicators
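As a hedged sketch of computing such features (using a simple moving average as a trend example and the RSI as a momentum example; the actual indicators used are those listed in Table 3):

```python
import pandas as pd

def add_indicators(df, window=14):
    """Append two representative indicators to an OHLCV DataFrame:
    a simple moving average (trend) and the RSI (momentum)."""
    out = df.copy()
    # Trend: simple moving average of the closing price.
    out["SMA"] = out["Close"].rolling(window).mean()
    # Momentum: relative strength index from average gains and losses.
    delta = out["Close"].diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)
    return out
```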

The next set of data that I utilized relates to market fundamentals. This type of data goes beyond the trading patterns of the stock itself, but these fundamentals are still usually expected to have some impact on the value of the stock by influencing investors. Fundamental data is generally released on a quarterly basis with company balance sheets. These balance sheets report a company's assets, liabilities, and shareholders' equity. Such financial statements show how a company performed in a given quarter, meaning they could impact the stock's price in the next quarter. The fundamental factor I chose to incorporate is the price-to-earnings (P/E) ratio. The P/E ratio is equal to the market value per share divided by the earnings per share. The P/E is expressed as a multiple of a company's earnings, so it can give a look at the overall health of the company [36]. This fundamental was used in testing in combination with the technical indicators and price data.

The final data used to predict stock prices was data reflective of the economy's overall health. The first macroeconomic factor is the ISM Manufacturing Index. Based on over 300 manufacturing firms, this index monitors employment, production, inventories, new orders, and supplier deliveries. It is released at the start of each month, impacting the confidence of businesses and investors [34]. The second macroeconomic factor is the University of Michigan Survey of Consumers, a highly regarded and trusted consumer sentiment index. It reflects the strength of consumer confidence and spending, providing another indication of overall economic health [35]. The third macroeconomic factor is the number of housing permits issued. This reflects the strength of the U.S. household sector and, thus, the consumer. The number of permits issued is also reflective of how much banks are able to lend, which in turn is reflected in the Gross Domestic Product (GDP) [36]. The GDP is the primary indicator of a nation's economic health; it represents the total value of all the goods and services produced by a country over a given time frame, or, more succinctly, the total size of the economy [37]. The fourth macroeconomic factor is the ISM Non-Manufacturing Index, which measures employment trends, prices, and new orders in non-manufacturing industries. These factors were used in combination with the fundamental data to see whether indications of overall economic health, alongside historical prices, would help the training algorithm gain further insight and make more accurate stock price predictions.
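Because these series are released monthly or quarterly while prices are daily, they must be aligned before training. A minimal sketch of one way to do this (column names and values are hypothetical), forward-filling each release so every trading day carries the most recent figure:

```python
import pandas as pd

# Daily closing prices (toy values).
daily = pd.DataFrame(
    {"Close": [10.0, 10.2, 10.1, 10.4]},
    index=pd.to_datetime(["2019-01-02", "2019-01-03",
                          "2019-01-04", "2019-01-07"]),
)

# A monthly macroeconomic series, e.g. a consumer sentiment reading.
macro = pd.DataFrame(
    {"sentiment": [98.3]},
    index=pd.to_datetime(["2019-01-02"]),
)

# Forward-fill so every trading day carries the latest released figure.
combined = daily.join(macro).ffill()
print(combined)
```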

