A hybrid forecasting approach using ARIMA models and self-organising fuzzy neural networks for capital markets

Linear time series models, such as the autoregressive integrated moving average (ARIMA) model, are among the most popular statistical models used to forecast time series. In recent years non-linear computational models, such as artificial neural networks (ANN), have been shown to outperform traditional linear models when dealing with complex data, like financial time series. This paper proposes a novel hybrid forecasting model which exploits the linear modelling strengths of the ARIMA model, and the flexibility of a self-organising fuzzy neural network (SOFNN). The system's performance is evaluated using several datasets, and our results indicate that a hybrid system is an effective tool for time series forecasting.


I. INTRODUCTION
Time series forecasting is a central area of statistical research, both in terms of theory and application. Various disciplines rely heavily on the ability to forecast one or more variables. For example, time series analysis is used extensively in economics and finance to predict inflation levels, currency exchange rates and equity prices. Traditional time series approaches tend to focus on methods first proposed by Box and Jenkins [1]. Here the emphasis is on linear models that can be identified, fitted and evaluated using a simple framework. Among the most popular of these linear models are the auto-regressive (AR) model, the moving average (MA) model, and the auto-regressive integrated moving average (ARIMA) model. However, these models have proven to be inadequate in dealing with complex data sets, where non-linear relationships may be present [2]. They also rely heavily on the assumption of stationarity [1]. A stationary process is one whose statistical properties are time invariant; this condition can be difficult to test for and verify in practical applications.
To overcome the problems encountered when using linear models, artificial neural networks (ANNs), which were initially developed to try and replicate the learning behaviour of the human brain, have been used for forecasting. A survey of their use in finance is presented in [3]. ANNs can be thought of as parameter-free, non-linear regression models, made up of layers of interconnected neurons. With enough hidden neurons ANNs are universal approximators, meaning they can be fitted to any function, to within a set level of tolerance. This ability to model highly non-linear data has allowed them to emerge as a widely used forecasting tool. However, the flexibility of ANNs is also their downfall, as they are essentially a black-box system. The relationships and weightings found by the network often have very little meaning to the observer [4]. To make ANNs more transparent, fuzzy neural networks were developed, which fuse the learning power of ANNs with the readability of fuzzy logic [5]. The fuzzy neural network (FNN) constructs a collection of "if-then" fuzzy rules, whose meaning can be more easily interpreted by the user. FNNs have been widely used in financial forecasting applications [6]-[9].

Manuscript received March 1, 2013. This work was supported by the Northern Ireland Capital Markets Engineering Initiative. S. McDonald is with the Intelligent Systems Research Centre, University of Ulster, Magee, Northern Ireland, BT48 7JL, U.K. (phone: +44-28-71675122; e-mail: mcdonald-s9@email.ulster.ac.uk). S. Coleman is with the Intelligent Systems Research Centre, University of Ulster, Magee, Northern Ireland, BT48 7JL, U.K. (e-mail: sa.coleman@ulster.ac.uk). T.M. McGinnity is with the Intelligent Systems Research Centre, University of Ulster, Magee, Northern Ireland, BT48 7JL, U.K. (e-mail: tm.mcginnity@ulster.ac.uk). Y. Li is with the Intelligent Systems Research Centre, University of Ulster, Magee, Northern Ireland, BT48 7JL, U.K. (e-mail: y.li@ulster.ac.uk).

II. RELATED WORK
By combining the linear modelling of the ARIMA class of models with the non-linear modelling capabilities of artificial neural networks, several authors have shown improvements in forecast accuracy using a number of different data sets. The use of other non-linear modelling tools has also been investigated; for example, a support vector machine (SVM) is combined with an ARIMA model in [10] to model the daily closing prices of ten stocks traded on the US equity market. These hybrid models assume that the time series data comprise both a linear and a non-linear component. An ARIMA model is used to handle the linear portion of the data. The residuals of the model will contain the non-linear component of the data that the ARIMA model could not capture. Even if the selected ARIMA model had passed all the required model validation procedures, non-linear patterns may still exist in the residuals, as there are no diagnostic checks for non-linear auto-correlation structures [2].
A number of authors have proposed hybrid forecasting systems where non-linear computational models supplement traditional time series models. Zhang [2] proposed one of the earliest models. The ARIMA model is combined with an ANN trained using the GRG-2 non-linear optimizer. Improved forecast results are presented for three data sets: the Canadian lynx population; Wolf's sunspot data; and the GBP/USD exchange rate. In [11] an evolutionary fuzzy neural network is developed to forecast future financial values using bank prime loan rates, the federal fund rate and discount rates as inputs. The network is trained using a genetic algorithm, and tuned further using gradient descent. Their results show an improvement over a previous iteration of their algorithm. A SVM is used in [10] to forecast foreign exchange rates. Its parameters are tuned using a genetic algorithm, and the system is tested on the monthly exchange rates of four currencies. Their results are compared against three benchmark models, including the random walk, with their hybrid model outperforming all of these. A hybrid modelling system is used in [12] to develop foreign exchange trading strategies. Fuzzy reasoning is combined with one-step ahead forecasts from an ANN. Their neuro-fuzzy approach outperforms a buy-and-hold benchmark portfolio, after transaction costs are taken into account, using data from Canadian banks. Hybrid neural networks have also been used in other disciplines. In [13] a hybrid system is used to forecast stream flow levels of the Colorado River, while in [14], water quality levels in Turkey are predicted using an ARIMA model with a multi-layer perceptron.
This paper presents a hybrid forecasting approach using an ARIMA model and a self-organising fuzzy neural network. The self-organising ability of the FNN should combat the rule explosion often seen in other fuzzy inference systems [15]. Here the emphasis is on financial time series forecasting, though some results on widely used benchmark data sets will also be presented. Section III will outline some of the theory behind ARIMA models and the self-organising fuzzy neural network used. Section IV outlines our hybrid approach. Section V describes the datasets used, while Section VI presents some experimental results. Section VIII concludes this paper.

III. PRELIMINARIES
We present some of the key theoretical features of the ARIMA model, and outline the architecture of the self-organising fuzzy neural network.

A. ARIMA Modelling
Autoregressive integrated moving average (ARIMA) models are amongst the most used time-series models in existence. The ARIMA model is a generalisation of the autoregressive moving average (ARMA) model, which is used to describe stationary time series. The ARIMA model introduces a differencing step, which can be applied to remove any non-stationarity present in the data. Both the ARIMA and ARMA models belong to a class of models known as linear models, as future values are assumed to be linear combinations of historical observations. In an ARIMA model, the future value of the random process is a linear function of past observations, both of the variable being studied and of random shocks to the system. The general form of an ARIMA model for a response series {y_t} is:

\phi(B)(w_t - \mu) = \theta(B) a_t,

where B y_t = y_{t-1} is the backshift operator, w_t = (1 - B)^d y_t is the response series after differencing and \mu is the mean of the series. The sequence of random shocks {a_t} is assumed to be independently and identically distributed with mean zero and constant variance. The auto-regressive and moving average operators are given by

\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p

and

\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q

respectively. The integers p, d and q are referred to as the order of the model. The central task when fitting an ARIMA model is determining the correct model order. In [1], a framework is presented for identifying ARIMA models. The basic approach follows three steps, which focus on theoretical features of the autocorrelation of the process.
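To make the structure concrete, the following minimal Python sketch fits the autoregressive part of an ARIMA(p, d, 0) model by ordinary least squares and integrates the differenced forecasts back. It is purely illustrative: the function names are our own, d is restricted to 0 or 1 for brevity, and a full ARIMA fit would also estimate the moving average terms via maximum likelihood.

```python
import numpy as np

def fit_ar(w, p):
    """Fit an AR(p) model to a stationary series w by ordinary least squares."""
    # Row t of the design matrix holds [1, w_{t-1}, ..., w_{t-p}].
    X = np.column_stack([np.ones(len(w) - p)] +
                        [w[p - i:len(w) - i] for i in range(1, p + 1)])
    coeffs, *_ = np.linalg.lstsq(X, w[p:], rcond=None)
    return coeffs  # [intercept, phi_1, ..., phi_p]

def forecast_arima(y, p, d, steps=1):
    """Forecast with an ARIMA(p, d, 0) model; d must be 0 or 1 in this sketch."""
    assert d in (0, 1)
    y = np.asarray(y, dtype=float)
    w = np.diff(y) if d == 1 else y          # difference to remove non-stationarity
    c = fit_ar(w, p)
    hist = list(w)
    for _ in range(steps):
        # Next value is a linear combination of the p most recent observations.
        hist.append(c[0] + np.dot(c[1:], hist[-p:][::-1]))
    fc = np.array(hist[len(w):])
    # Undo the differencing step (integration) when d = 1.
    return np.cumsum(fc) + y[-1] if d == 1 else fc

# Example: recover the coefficients of a simulated AR(2) process.
rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
print(forecast_arima(y, p=2, d=0, steps=3))
```

With enough data the OLS estimates are close to the simulated coefficients 0.6 and -0.3, which is all the sketch is intended to show.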
Firstly, a suitable model must be selected. The data must be made stationary, and any periodicity must be taken into account. Stationarity means that the statistical properties of the model, like its mean and auto-covariance, do not change with time, and is important for forecasting. Box and Jenkins [1] suggest using differencing to achieve stationarity. Other transformations can also be used. For example, in financial applications it is common to examine the log-returns of the series, rather than the raw prices themselves [16]. Seasonality does not need to be removed from the data, but the orders of the seasonal terms should be identified.
To identify the values of p and q, Brockwell and Davis [17] suggest using the Akaike information criterion with correction (AICc), while other authors use the plots of the auto-correlation and partial auto-correlation functions of the process. The second step is to fit the selected model to the data. This is done using maximum likelihood estimation or a similar optimisation routine. The third step is to validate the chosen model. If the model is well chosen, the residuals should be indistinguishable from white noise. Any autocorrelations in the residuals indicate an inadequate model, and the user should return to the first step. As noted previously, any non-linear dependencies in the residual series cannot be identified using current residual analysis techniques, but may be modelled by a non-linear forecasting tool, such as a neural network.
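AICc-based order selection can be sketched for pure AR models as follows. This is a simplification of the full ARMA search (no MA terms, and k counts only the intercept and lag coefficients, which is one common convention); the function names are illustrative.

```python
import numpy as np

def aicc_ar(w, p):
    """AICc of an AR(p) OLS fit; k counts the intercept and the p lag coefficients."""
    n = len(w) - p
    X = np.column_stack([np.ones(n)] +
                        [w[p - i:len(w) - i] for i in range(1, p + 1)])
    resid = w[p:] - X @ np.linalg.lstsq(X, w[p:], rcond=None)[0]
    k = p + 1
    aic = n * np.log(np.sum(resid ** 2) / n) + 2 * k
    # Small-sample correction term distinguishes AICc from plain AIC.
    return aic + 2 * k * (k + 1) / (n - k - 1)

def select_order(w, max_p=10):
    """Return the AR order with the smallest AICc."""
    return min(range(1, max_p + 1), key=lambda p: aicc_ar(w, p))
```

For a genuinely AR(2) series, the AICc at order 2 should be markedly smaller than at order 1, since the penalty term is dominated by the drop in residual variance.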

B. Self-Organising Fuzzy Neural Networks
The self-organising fuzzy neural network applied in the proposed hybrid model is based on [15], [18]-[20]. It is based on a number of fuzzy neural network designs, namely the adaptive-network based fuzzy inference system (ANFIS) [21], the dynamic fuzzy neural network (DFNN) [22] and the unsymmetrical Gaussian function network [23]. The network contains a total of five layers (Figure 1), and can dynamically add and remove neurons as required. This adaptive behaviour is controlled by two error criteria. The network uses ellipsoidal basis functions (EBFs) as membership functions. The centres and widths of these membership functions vary from neuron to neuron during the training process.
The first layer of the network is the input layer. Here there is one neuron for each input to the network. The second layer consists of EBF nodes. The neurons in this layer can be thought of as the premise ("if") parts of the fuzzy rules. In layer three of the network, the firing strengths of the layer two neurons are normalized. The number of neurons in layers three and four is the same as in layer two. In layer four, the "then" parts of the fuzzy rules are constructed. In the model presented here, a Takagi-Sugeno (TS) fuzzy inference scheme is implemented, although a singleton fuzzy model can also be used by altering the bias values in this layer. The fifth layer is the output layer, where the signals from the fourth layer neurons are simply summed. Mathematically, the output of the network, given an r-dimensional input vector X, is:

y(X) = \sum_{j=1}^{u} w_{2j} \exp\left(-\sum_{i=1}^{r} \frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2}\right) \Big/ \sum_{j=1}^{u} \exp\left(-\sum_{i=1}^{r} \frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2}\right),

where x_i is the i-th input variable, u is the number of EBF neurons, and w_{2j} is the weight of each fourth layer neuron. The centres and widths of the i-th EBF membership function in neuron j are given by c_{ij} and \sigma_{ij} respectively. The weights are the consequent (then) parts of the fuzzy rules. The Takagi-Sugeno inference scheme implies that the weights will have the form

w_{2j} = b_{0j} + b_{1j} x_1 + \cdots + b_{rj} x_r.

The output of the j-th neuron in the third layer can be written as

\psi_j = \exp\left(-\sum_{i=1}^{r} \frac{(x_i - c_{ij})^2}{2\sigma_{ij}^2}\right) \Big/ \sum_{k=1}^{u} \exp\left(-\sum_{i=1}^{r} \frac{(x_i - c_{ik})^2}{2\sigma_{ik}^2}\right),

meaning we can rewrite the output of the system as a weighted sum of the \psi_j, where W_2 is the parameter matrix, and \Psi_{jt} is the output of the j-th neuron in the third (normalised) layer when the t-th training pattern is presented to the network. This can be represented as a linear regression model, given by:

d(t) = \sum_{i=1}^{M} p_i(t)\theta_i + \epsilon(t),

where d(t) is the target of the SOFNN, p_i(t) are regressors, \theta_i are the parameters to be estimated, and \epsilon(t) is the error term.
The upper summation limit is calculated as M = u × (r + 1).
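The layer-by-layer computation described above can be condensed into a short forward-pass sketch. This is an illustrative numpy rendering of the TS-type EBF network, not the Matlab implementation used in our experiments; the variable names are our own.

```python
import numpy as np

def sofnn_output(x, centres, widths, B):
    """
    Forward pass of a TS-type EBF fuzzy network with u neurons and r inputs.
    centres, widths : (u, r) arrays, one row per EBF neuron
    B               : (u, r + 1) array of TS consequent parameters [b_0j, ..., b_rj]
    """
    # Layer 2: EBF firing strength of each neuron (product of Gaussian memberships).
    phi = np.exp(-np.sum((x - centres) ** 2 / (2 * widths ** 2), axis=1))
    # Layer 3: normalised firing strengths.
    psi = phi / np.sum(phi)
    # Layer 4: TS consequents w_2j = b_0j + b_1j x_1 + ... + b_rj x_r.
    w2 = B @ np.concatenate(([1.0], x))
    # Layer 5: weighted sum of the consequents.
    return np.sum(psi * w2)
```

Setting all consequent slopes to zero recovers the singleton fuzzy model mentioned above, since each w_2j then reduces to the bias b_0j.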
In matrix form, when t = n,

D = P\Theta + E,

where D = [d(1), \ldots, d(n)]^T is the target vector, P is the n \times M matrix of regressors, \Theta = [\theta_1, \ldots, \theta_M]^T and E is the vector of error terms. The network is trained using a recursive least squares (RLS) algorithm. The parameter update equations are given by:

\hat{\Theta}(t) = \hat{\Theta}(t-1) + Q(t)p(t)e(t)

and

Q(t) = Q(t-1) - \frac{Q(t-1)p(t)p^T(t)Q(t-1)}{1 + p^T(t)Q(t-1)p(t)},

where e(t) = d(t) - p^T(t)\hat{\Theta}(t-1) is the estimation error. The matrix Q(t) is the Hermitian matrix defined as Q(t) = [P^T(t)P(t)]^{-1}. The dynamic behaviour of the network is outlined in the following subsections.
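A single RLS update can be sketched directly from the recursion above. The following illustrative Python function (not the code used in our experiments) performs one step; on noiseless data the estimate converges to the true parameters.

```python
import numpy as np

def rls_step(theta, Q, p, d):
    """One recursive-least-squares update: regressor vector p, scalar target d."""
    e = d - p @ theta                      # a-priori estimation error
    g = Q @ p / (1.0 + p @ Q @ p)          # gain vector
    theta = theta + g * e                  # parameter update
    Q = Q - np.outer(g, p @ Q)             # update inverse correlation matrix
    return theta, Q
```

Initialising Q as a large multiple of the identity (a standard choice) keeps the bias from the initial condition negligible after a modest number of samples.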
1) Adding a neuron: The network is augmented during the training phase when it is not generalising well. The network can also be expanded when a new input is presented to the network that cannot be handled by a pre-existing EBF neuron. The growth of the network is controlled by two error criteria. The first is defined as

e(t) = |d(t) - y(t)| > \delta,

where d(t) is the network target and y(t) is the network output for the t-th training example, and \delta is a predefined error tolerance. The second error criterion is

\phi_{max}(t) = \max_j \phi_j(t) < 0.1354,

where \phi_j(t) is the output of the j-th neuron in layer two. In [19] this is referred to as the "if-part" criterion. If the input data is assumed to be normally distributed, 95% of the data for each membership function will lie within two standard deviations of its centre. By ensuring that no input can have a fuzzy membership grade less than 0.1354, the \epsilon-completeness of fuzzy rules is satisfied for \epsilon = 0.1354. As we have two error criteria, we have four distinct scenarios: if neither criterion is triggered, the structure of the network should not be altered, and only the parameters should be adjusted; if only the if-part criterion is triggered, the widths of the existing membership functions are enlarged; if only the error criterion is triggered, a new EBF neuron is added; and if both are triggered, the widths are updated first and a neuron is added if the criteria are still not satisfied. The EBF functions can also be merged if neurons are found to have similar centre and width vectors.
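The two growth criteria can be checked with a few lines of code. The sketch below is illustrative only; the threshold values \delta and the membership floor are the tunable quantities discussed above, and the function name is our own.

```python
import numpy as np

def growth_check(d, y, phi, delta=0.1, phi_min=0.1354):
    """
    Evaluate the two growth criteria for one training sample.
    d, y : target and network output; phi : array of layer-2 firing strengths.
    Returns (error_ok, coverage_ok); a False entry signals that the network
    structure (widths, or the neuron count) needs adjusting.
    """
    error_ok = abs(d - y) <= delta          # first criterion: output error
    coverage_ok = np.max(phi) >= phi_min    # second criterion: if-part coverage
    return error_ok, coverage_ok
```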
2) Pruning the network: If a neuron is found to no longer be useful, it can be removed from the network. The Optimal Brain Surgeon (OBS) approach [24] is used to determine which neurons should be removed. The OBS approach uses second order information to determine the usefulness of a neuron. This approach seeks to minimize the mean squared error of the output, defined as

E(\Theta) = \frac{1}{n}\sum_{t=1}^{n} (d(t) - y(t))^2.

We approximate the functional Taylor series of the error with respect to the parameter matrix \Theta by

\Delta E \approx \frac{1}{2}\, \Delta\Theta^T H\, \Delta\Theta,

where H is the Hessian of the error with respect to the parameters; the first-order term vanishes once training has converged to a minimum. As noted above, the matrix Q(t), which approximates the inverse Hessian up to a scale factor, is generated by the RLS training algorithm. This means no extra computational effort is required for calculating this matrix.
The process for pruning a neuron, based on a predetermined error tolerance level \lambda, is as follows: 1) Compute the root mean squared error (RMSE) of the training vector at time t, denoted E_RMSE. 2) Define the error tolerance limit \lambda E_RMSE. 3) Calculate \Delta E, the change in the squared error caused by removing each neuron. Redundant neurons will have small \Delta E values. 4) Using a predefined limit k_RMSE, calculate the pruning threshold E. 5) Select and delete the least important neuron, and recalculate E_RMSE. If E_RMSE < E, delete the neuron permanently, then move on to the next least important neuron. If the error condition is not met, stop the process and do not delete.
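The pruning loop can be sketched as follows. Note that this illustrative version recomputes the error directly after each trial deletion rather than using the OBS second-order approximation via Q(t), so that the sketch stays self-contained; the function names and the tolerance parameter are our own.

```python
import numpy as np

def prune(predict, params, X, y, lam=1.05):
    """
    Greedy pruning sketch: repeatedly try deleting the least important neuron
    (the one whose removal raises the training RMSE the least) and keep the
    deletion while the RMSE stays below lam times the current RMSE.
    predict(params, X) -> predictions; params is a list of per-neuron parameters.
    """
    def rmse(p):
        return np.sqrt(np.mean((predict(p, X) - y) ** 2))

    while len(params) > 1:
        base = rmse(params)
        # Error after deleting each neuron in turn.
        trials = [(rmse(params[:j] + params[j + 1:]), j)
                  for j in range(len(params))]
        err, j = min(trials)
        if err < lam * base:
            params = params[:j] + params[j + 1:]   # neuron j is redundant
        else:
            break                                   # no neuron can be spared
    return params
```

In the toy usage below (a sum of Gaussian bumps standing in for EBF neurons), a zero-weight neuron is removed while the two genuinely useful ones survive.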

IV. HYBRID APPROACH
The hybrid methodology used in this work is based on the approach outlined in [2]. It is widely accepted that no single model will be optimal in all situations; it therefore makes sense to use the strengths of each model. Combining a linear model with a non-linear model allows the user to capture more of the information available in the data.
We assume that the raw data comprise a linear component L_t and a non-linear component N_t:

y_t = L_t + N_t.

When we fit our ARIMA model, whose forecast we denote \hat{L}_t, we are left with the residuals:

e_t = y_t - \hat{L}_t.

We can use the residuals of the ARIMA model as inputs to the SOFNN, which is then used to forecast values of the residual series. This represents the non-linear component of the data. Denoting the forecast generated by the SOFNN as \hat{N}_t, our hybrid forecast is

\hat{y}_t = \hat{L}_t + \hat{N}_t.

The hybrid forecast is generated in two parts. The ARIMA model is first used to deal with the linear dependencies in the data. Whatever could not be fitted, the residual series, is sent to the SOFNN, which generates forecasts of the error of the ARIMA model. The two forecasts are combined to generate a single value which contains more information than either of the two methods used in isolation.
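The two-stage procedure can be sketched end to end. Our experiments used R for the ARIMA stage and Matlab for the SOFNN; the Python sketch below is purely illustrative and substitutes a simple kernel regression on lagged residuals for the SOFNN, which is a stand-in, not the network described in Section III.

```python
import numpy as np

def ar_fit_predict(y, p):
    """OLS AR(p): return in-sample one-step predictions and the coefficients."""
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - i:len(y) - i] for i in range(1, p + 1)])
    c, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return X @ c, c

def kernel_forecast(resid, p=3, h=1.0):
    """Nadaraya-Watson forecast of the next residual from its last p lags
    (a stand-in for the SOFNN residual model)."""
    lags = np.column_stack([resid[p - i:len(resid) - i] for i in range(1, p + 1)])
    targets = resid[p:]
    query = resid[-1:-p - 1:-1]            # the p most recent residuals
    w = np.exp(-np.sum((lags - query) ** 2, axis=1) / (2 * h ** 2))
    return np.sum(w * targets) / np.sum(w)

def hybrid_forecast(y, p_ar=2, p_res=3):
    """Hybrid one-step forecast: linear AR part plus a non-linear residual correction."""
    fitted, c = ar_fit_predict(y, p_ar)
    resid = y[p_ar:] - fitted              # e_t = y_t - L_t
    linear_next = c[0] + np.dot(c[1:], y[-1:-p_ar - 1:-1])
    return linear_next + kernel_forecast(resid, p_res)  # L_t + N_t
```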

V. EXPERIMENTAL SETUP
In our experiments, three datasets are examined. Two of these, the Canadian lynx data and the Wolf's sunspot data, are widely used as benchmark series. The third set of data comprises adjusted daily closing prices of some of the constituent companies of the Dow Jones Industrial Average. The three series have differing statistical properties, although all have been shown to exhibit non-linear behaviour to varying degrees.
Plotting the lynx series illustrates the periodic nature of the series. The period of the data is close to 10 years. For comparative purposes, we use the base-ten log of these data in our experiment. The sunspot data also exhibits periodic behaviour. In this case the period length is approximately 11 years.
The financial data are the most chaotic of the sample data sets. Financial data are notoriously difficult to forecast, as the market participants and other impacting factors are too numerous to model with any degree of accuracy. A popular theory in finance is that of the efficient market [25], where all market prices fully reflect the amount of information available to market participants. If the efficient market hypothesis (EMH) is true, prices follow a random walk, and the best estimate for tomorrow's price (or indeed the price at any future time) is today's price. The random walk model is often used as a benchmark for financial forecasting models. Again, for comparative reasons, we take the natural log of the price series in our analysis. Many authors choose to focus on log-returns of financial data, where the log-returns are calculated as the differenced log prices. However, in this case we do not focus on returns, as the SOFNN will handle the information in the data that the ARIMA model cannot deal with.
In our experiments, the ARIMA models were identified and fitted using the statistical software environment R. The models used for the financial data were identified using the auto.arima function in the forecast package, while the other models were identified using the "ARfit" and "stat" packages. The models identified were largely in keeping with those found in [2]. The ARIMA forecasts were also generated in R, with the SOFNN being implemented in Matlab. In each experiment the forecasts generated by our hybrid method are compared to those generated by an ARIMA model, and those generated by a SOFNN.
The sunspot series contains a total of 289 entries. We used 221 data points as the training data, and 68 as the test set. An AR(9) model was identified as the most appropriate fit for the training set. The residual series was modelled using a SOFNN with six inputs: five previous residuals and their average. The lynx population data contains a total of 105 observations. Here we used 85 samples as the training set, and 20 as the testing data. An AR(8) model was used to generate our initial forecast. The residual series was then modelled using a SOFNN with eight inputs. The inputs to this network were seven previous residual values along with their average. For the financial tests, we have selected ten of the thirty stocks that make up the Dow Jones Industrial Average. Adjusted closing prices were used as they already take into consideration any corporate actions that may take place between trading days. The time series contained daily prices from the 253 trading days of 2008, split in a 4:1 ratio between training and testing data. Different ARIMA models were identified by R for each financial series, but in each case the residuals were modelled using the same SOFNN.

VI. RESULTS
The results of the experiments are presented in Tables I to III. We use three error metrics to assess the forecast accuracy: mean absolute error (MAE), mean absolute percentage error (MAPE) and root mean squared error (RMSE). We compared the hybrid model's forecasts with those generated by an ARIMA model and a SOFNN in isolation. The SOFNN in each case used five previous values of the series as inputs. An extensive parameter search was carried out for each series to find the optimal combination of network parameters. In almost all of the experiments, the hybrid approach improved the forecast accuracy in at least two of the three error metrics used. The increases in accuracy are more substantial compared to the ARIMA model on its own, rather than compared to the SOFNN in isolation. This is no doubt due to the flexibility of the SOFNN as a forecasting tool. Our experiments have shown that using a SOFNN, either on its own or as part of a hybrid system, offers increased forecast accuracy when compared to the forecasts generated by an ARIMA model. However, how to assess the accuracy of the forecasts is open to debate. The usefulness of some error metrics, for example the RMSE, is debated by many in the forecasting community [26]. Also, in a financial context, some of these metrics are essentially meaningless. The strength of a forecasting model used for trading purposes will be assessed on the profit it generates, with an emphasis on the correct identification of the direction and magnitude of price changes.
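For completeness, the three error metrics have the following standard definitions (an illustrative sketch; note MAPE is undefined when any actual value is zero, which is not an issue for the strictly positive series used here).

```python
import numpy as np

def mae(y, f):
    """Mean absolute error between actuals y and forecasts f."""
    return np.mean(np.abs(y - f))

def rmse(y, f):
    """Root mean squared error; penalises large errors more heavily than MAE."""
    return np.sqrt(np.mean((y - f) ** 2))

def mape(y, f):
    """Mean absolute percentage error, in percent; requires all y nonzero."""
    return 100.0 * np.mean(np.abs((y - f) / y))
```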

VIII. CONCLUSIONS
We have presented a novel hybrid forecasting model, which utilises the strengths of the linear ARIMA model and the non-linear modelling capabilities of a fuzzy neural network. The performance of the system was assessed using several distinct datasets, and was compared to the forecasts generated by an ARIMA model and a SOFNN used independently. The proposed model first generates an ARIMA forecast, which is then supplemented with a prediction of the forecast error, generated by the SOFNN. The proposed hybrid approach offers improvements over the ARIMA forecasts when two of the three error metrics are considered. The performance improvements are not as clear cut when compared to the SOFNN in isolation, with the most accurate model changing depending on which error metric is used.

Fig. 2. Comparative forecasts for Bank of America adjusted closing price.

TABLE III
COMPARISON OF RESULTS - ROOT MEAN SQUARED ERROR (RMSE)