Major-Project-Final

Comparing Machine Learning algorithms for stock price prediction and stock index movement using trend deterministic data preparation.

If you prefer presentations to README files, we suggest checking out the project presentation.

🚩 Table of Contents

💡 Introduction

In 2014, Xinjie Di, an SCPD student from Apple Inc., submitted a paper on predicting the near-future price trend of a company's stock. The feature space was derived from the stock's own time series, that is, from its past price movements. A tree-based algorithm was applied for feature selection, and it suggested that a subset of stock technical indicators is critical for predicting the stock trend.

Experimental results suggested an accuracy of more than 70% in predicting the 3-10 day average price trend with an SVM.
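To make the "average price trend" target concrete, here is a minimal sketch of how such a label could be built from closing prices. The labelling rule (compare the mean close of the next `horizon` days with today's close) and the function name `average_trend_labels` are assumptions made for illustration; the paper's exact definition is not reproduced here.

```python
from statistics import mean


def average_trend_labels(closes, horizon=3):
    """Label day t as +1 if the mean close over the next `horizon` days
    is above the close on day t, otherwise -1.

    Days without a full look-ahead window get no label.
    (Assumed rule for illustration, not taken from the paper.)
    """
    labels = []
    for t in range(len(closes) - horizon):
        future_avg = mean(closes[t + 1 : t + 1 + horizon])
        labels.append(1 if future_avg > closes[t] else -1)
    return labels


# Made-up closing prices, used only to exercise the function.
closes = [10.0, 11.0, 12.0, 11.0, 9.0, 8.0, 9.5]
labels = average_trend_labels(closes, horizon=3)
```

A horizon between 3 and 10 would correspond to the range of averaging windows mentioned above.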


Another paper, published in the Elsevier journal Expert Systems with Applications by Jigar Patel, Sahil Shah, Priyank Thakkar, and K. Kotecha, addressed the problem of predicting the direction of movement of stocks and stock price indices in Indian stock markets. It is cited as: Patel, J., et al. Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning techniques. Expert Systems with Applications (2014).

The paper compares four prediction models, Artificial Neural Network (ANN), Support Vector Machine (SVM), Random Forest, and Naive Bayes, with two approaches for the input to these models.

The first approach computes ten technical parameters from stock trading data (open, high, low, and close prices), while the second approach represents these technical parameters as trend deterministic data. The accuracy of each prediction model was evaluated for each of the two input approaches. The evaluation was carried out on 10 years of historical data, from 2003 to 2012, for two stocks: Reliance Industries and Infosys Ltd.
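The difference between the two input approaches can be sketched with a single indicator. The example below computes a simple moving average (a continuous technical parameter) and then converts it to a +1/-1 trend signal. The conversion rule used here (close above the average means "up") is an assumption for illustration; the paper defines its own rule for each of the ten parameters, and those rules are not reproduced here. The function names are hypothetical.

```python
def simple_moving_average(closes, window):
    """Return SMA values aligned with closes[window - 1:]."""
    return [
        sum(closes[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(closes))
    ]


def to_trend_signal(closes, window):
    """Map each day to +1 if its close is above the SMA, otherwise -1.

    (Assumed conversion rule for illustration.)
    """
    sma = simple_moving_average(closes, window)
    aligned_closes = closes[window - 1:]
    return [1 if c > m else -1 for c, m in zip(aligned_closes, sma)]


# Made-up closing prices, used only to exercise the functions.
closes = [10.0, 11.0, 12.0, 11.0, 9.0, 8.0, 9.5]
sma = simple_moving_average(closes, window=3)   # approach 1: continuous values
signal = to_trend_signal(closes, window=3)      # approach 2: discrete trend
```

In the first approach a model would receive the values in `sma` directly; in the second it would receive only the discrete values in `signal`.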

Experimental results suggested that, for the first input approach, Random Forest outperforms the other three prediction models on overall performance. The results also show that the performance of all the prediction models improves when the technical parameters are represented as trend deterministic data.


🎓 What are we doing

In our initial research we found that the efficiency of predicting stock price movements and their trends may be improved if the prediction model could "remember" the historical movement of the stocks.

The general prediction models used in the research papers described above, such as Naive Bayes, Support Vector Machines, and Random Forest, can only take into consideration the most recent input given to the model.

Our aim is to apply Long Short-Term Memory (LSTM) networks to the prediction of stock price movements and their trends, as LSTM networks have internal contextual state cells that act as long-term or short-term memory cells.

The output of the LSTM network is modulated by the state of these cells. This is a very important property when we need the prediction of the neural network to depend on the historical context of inputs, rather than only on the very last input.
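The role of the cell state can be seen in a minimal sketch of a single scalar LSTM cell using the standard gate equations. The weights below are hand-picked, hypothetical values rather than trained ones, and the helper names (`lstm_step`, `run_sequence`) are made up for this example. Both input sequences end with the same value, yet the final outputs differ because the cell state carries the earlier inputs forward.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    `w` maps each gate name to (input weight, recurrent weight, bias).
    """
    def gate(name, activation):
        wx, wh, b = w[name]
        return activation(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much of the candidate to write
    g = gate("candidate", math.tanh)   # candidate value for the cell state
    o = gate("output", sigmoid)        # how much of the cell state to expose
    c = f * c_prev + i * g             # updated cell state (the "memory")
    h = o * math.tanh(c)               # output, modulated by the cell state
    return h, c


def run_sequence(xs, w):
    """Feed a sequence through the cell and return the final output."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, w)
    return h


# Hypothetical hand-picked weights, not trained values.
WEIGHTS = {
    "forget": (0.0, 0.0, 2.0),
    "input": (1.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 2.0),
}

# Same last input (0.5), different history.
after_rising = run_sequence([1.0, 1.0, 1.0, 0.5], WEIGHTS)
after_falling = run_sequence([-1.0, -1.0, -1.0, 0.5], WEIGHTS)
```

With these weights, `after_rising` comes out positive and `after_falling` negative, even though the last input is identical in both runs.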


📙 Base Research Papers

  1. Xinjie Di, Stock Trend Prediction with Technical Indicators using SVM

  2. Jigar Patel, Sahil Shah, Priyank Thakkar, K. Kotecha, Predicting stock and stock price index movement using Trend Deterministic Data Preparation and machine learning techniques


💻 Technology Stack

Python 3.6 is the programming language used in the experiment.

For editing code and creating files, we are using the following editors:

For development, training, and deployment of the models, we are using Jupyter Notebook along with the Anaconda distribution.

All datasets will be sourced from quandl.com. Quandl is a platform for financial, economic, and alternative data that serves investment professionals. Quandl sources data from over 500 publishers. All of Quandl's data is accessible via an API, with client packages available for multiple programming languages, including R, Python, and Matlab.

We will be using stock price datasets in OHLC (open, high, low, close) format for the following companies to train and test our prediction models:

  1. Google (GOOGL)
  2. Apple (AAPL)
  3. Amazon (AMZN)

📚 Further Reading


🐾 Screenshots


🚀 Results


👫 Team

  1. Samar Srivastava
  2. Mohak Kulshrestha
  3. Saurabh Arya
  4. Kashvi Agarwal

🤯 Mentor

Mr. Munish Khanna, Head Of Department, CSE, HCST, Farah


Guidelines

Please follow these guidelines when committing to the repository.


📜 License

This software is licensed under the GNU General Public License v3.0