STREAMFLOW FORECASTING IN BUKIT MERAH WATERSHED BY USING ARIMA AND ANN

This paper presents the application of linear and non-linear time series modeling approaches for simulating and forecasting streamflow at three stations located on three different rivers, namely the Kurau River, Ara River and Krian River, in the Bukit Merah watershed of Malaysia. The performance of the linear autoregressive integrated moving average (ARIMA) model and the non-linear artificial neural network (ANN) model in forecasting the monthly streamflow of Malaysian river basins has been evaluated based on mean absolute percentage error (MAPE), root mean squared error (RMSE) and coefficient of determination (R²). The results show that both ARIMA and ANN methods are suitable for streamflow forecasting. However, ANN is better than ARIMA in dealing with short-memory streamflow data. In addition, the ANN method is more flexible when applied to inconsistent data.


Streamflow forecasting is very important in water resources management and planning. Medium- to long-term forecasting is particularly useful in reservoir operations and irrigation management as well as in the institutional and legal aspects of water resources management and planning (Abudu et al., 2010). However, streamflow is one of the most complex and difficult elements of the hydrological cycle to model (Shabri and Suhartono, 2012). Streamflow depends on numerous factors, including the spatio-temporal characteristics of rainfall, soil, land use and weather as well as catchment characteristics, which are often either not available or difficult to obtain in practice, especially in developing countries like Malaysia. Furthermore, the influence of those variables in combination with other variables is difficult to understand and to express as systems of non-linear partial differential equations for modelling (Beven, 2012; Katimon et al., 2013).
This has inspired many researchers to focus on time series modeling for streamflow forecasting because of its forecasting capability, inclusion of richer information, and more systematic way of building models (Abudu et al., 2010). In comparison to physically based models, the time series modelling approach has several advantages (Ali and Dechemi, 2004; Young, 2006). Physically based hydrological models require parameterization and are based on the predetermined theory of hydrology, whereas a time series model is essentially a `black-box' (Lohani et al., 2011; Hipel and McLeod, 1994). The time series modeling approach does not require any theory to link the input and output series. In places where hydrological parameters are not available, time series modelling approaches are found to be appropriate (Katimon et al., 2013).
The application of time series models for streamflow forecasting includes univariate models that deal with one time series and more complex multivariate models which incorporate exogenous time series variables. Linear time series models have been most widely applied for streamflow modeling. The autoregressive integrated moving average (ARIMA) model is one such time series model, which has long been applied in streamflow forecasting, particularly in the modeling of monthly streamflow (McKerchar and Delleur, 1974; Noakes et al., 1985; Salas and Obeyskara, 1992; Bender and Simonovic, 1994; Hipel and McLeod, 1994; Abrahart and See, 2000; Yurekli et al., 2005). However, linear time series models are built under the assumption that the process follows a normal distribution, and generally they focus on modeling and predicting the mean behavior, or the first moment, of the variable. These models are usually insufficient for capturing the nonlinear properties of the processes governing the temporal variation of hydrological variables. Hence, the artificial neural network (ANN) model has gained more and more popularity for hydrological forecasting in recent decades because of its ability to identify complex nonlinear relationships between input and output data sets without the necessity of understanding the nature of the phenomena and without making any underlying assumptions (Abudu et al., 2010). A number of studies in recent years have shown the efficacy of ANN in streamflow forecasting (Hu et al., 2001; Shamseldin et al., 2002; Dolling and Varas, 2003; Kisi, 2008; Wang et al., 2009; Abudu et al., 2010).
Prediction of streamflow is critical to many hydrological activities such as designing flood protection and estimating the water available for withdrawal. As more than 97% of the total water supply in Malaysia comes from surface water, forecasting streamflow is essential for such activities in the country. This paper presents the application of autoregressive integrated moving average (ARIMA) and artificial neural network (ANN) models in forecasting the monthly streamflow of three river basins of Malaysia to assess the capability of linear and non-linear modeling approaches to forecast streamflow. It is expected that the study will be beneficial for a number of stakeholders, especially local and national water resources planning and management organizations, in selecting tools for hydrological forecasting of Malaysian river basins.

II. STUDY LITERATURE
The study area is located in Bukit Merah in the Perak state of Malaysia. Due to optimal rainfall and sunlight, extensive agricultural activities have developed in the region. Water resources have also been developed over time to supply the growing agricultural activities. For example, the Bukit Merah Reservoir at the confluence of the Kurau River and the Merah River was constructed in 1906 to provide irrigation water for 24,000 hectares of paddy fields. It is also used as a domestic water supply for around 160,000 inhabitants of the Krian district (Hasan et al., 2012). Unfortunately, floods and droughts are common phenomena in the study area. For example, a flood in November 2008 and a drought in June 2009 hampered agricultural activities in the region in two consecutive years. Forecasting of river discharge is therefore very important for water resources planning and management and for a continuous supply to the growing agricultural activities in the region. The data sets at the study stations include both long-memory and short-memory series, and their distributions range from consistent to inconsistent. Therefore, it is expected that streamflow simulation and forecasting at these stations will help to explore the advantages and disadvantages of ARIMA and ANN under different circumstances.

III. METHODOLOGY

Methodology Of The ARIMA Model
The autoregressive integrated moving average (ARIMA) approach was first developed by Box and Jenkins (1976) with three stages of analysis, viz. model identification, parameter estimation and diagnostic check. Later, for forecasting purposes, Makridakis et al. (1998) added two more stages: preliminary data preparation, and model execution or forecasting. The processing steps are described below:

Model identification
Model identification was done through the analysis of autocorrelation function (ACF) and partial autocorrelation function (PACF) plots generated from the input data. The first step of model identification is to determine the order of differencing needed to make the series stationary. Stationarity is reached when the ACF values are small and die down quickly with increasing lag. If the series shows a continuing periodic pattern, seasonal differencing is required. In the present study, a lag of twelve was used for seasonal differencing because the input is a monthly time series.
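The differencing step described above can be illustrated with a minimal sketch. The functions `difference` and `acf` and the synthetic `flows` series are hypothetical names and data introduced here for illustration only; they are not the data or software used in the study.

```python
def difference(series, lag=1):
    """Return the lag-differenced series y[t] - y[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


# Hypothetical monthly flows: a linear trend plus a repeating 12-month pattern.
pattern = [5, 7, 9, 12, 10, 6, 4, 3, 4, 8, 14, 11]
flows = [0.1 * t + pattern[t % 12] for t in range(72)]

# A large autocorrelation at lag 12 signals that seasonal differencing is needed.
r12 = acf(flows, 12)[11]

seasonal = difference(flows, lag=12)      # D = 1: removes the 12-month cycle
stationary = difference(seasonal, lag=1)  # d = 1: removes the remaining trend
```

In this artificial example the seasonal difference leaves a constant and the additional first difference leaves values that are numerically zero; real streamflow would leave a noisy residual series whose ACF is then inspected.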
The identification of the autoregressive (p) and moving-average (q) orders was also based on the observation of the ACF and PACF plots. If the series has autoregressive terms, the ACF plot dies down following a damped cycle and the PACF cuts off suddenly after p lags. In that case, p is taken as the autoregressive order.
If the series has moving-average terms, the PACF plot dies down following a damped cycle and the ACF cuts off suddenly after q lags. Then q is taken as the moving-average order. If both the ACF and the PACF die down after a few lags, then the mixed model is used.
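The cut-off and die-down patterns described above can be checked numerically. The sketch below computes partial autocorrelations from a list of autocorrelations with the Durbin-Levinson recursion; the function name `pacf_from_acf` and the theoretical AR(1) and MA(1) autocorrelations used as input are illustrative assumptions, not values from the study.

```python
def pacf_from_acf(rho):
    """Partial autocorrelations phi_kk for k = 1 .. len(rho).

    rho[k - 1] is the autocorrelation at lag k (Durbin-Levinson recursion).
    """
    pacf = []
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1} from the previous order
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            current = [phi_kk]
        else:
            num = rho[k - 1] - sum(prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Theoretical AR(1) with coefficient 0.6: the ACF dies down as 0.6**k,
# so the PACF should cut off after lag 1 (p = 1).
ar1_pacf = pacf_from_acf([0.6 ** k for k in range(1, 6)])

# Theoretical MA(1) with rho_1 = 0.4: the ACF cuts off after lag 1 (q = 1),
# so the PACF should die down with alternating sign.
ma1_pacf = pacf_from_acf([0.4, 0.0, 0.0, 0.0, 0.0])
```

With sample data the cut-off is never exact, so values are usually judged against approximate confidence bands rather than against zero.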

Parameter Estimation
Usually, two or more tentative models are proposed through the model identification process. Each of the suggested models needs to pass the parameter estimation process, which involves the statistical t-test and its p-value. A good model should have parameter p-values of less than 0.05, i.e. significance at the 95% level of confidence. The t-test is a hypothesis test in which the null hypothesis can only be rejected if the test statistic fulfils a certain criterion. The rejection criterion is $|t_{\text{test}}| > t_{\alpha/2,\,df}$, where α is the significance level and df is the degrees of freedom, which is equal to the total number of data points minus the number of parameters used.
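A minimal sketch of the rejection rule is given here. It assumes that the t statistic is the parameter estimate divided by its standard error, and it approximates the critical t value by the standard normal quantile, which is only reasonable for large degrees of freedom; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def is_significant(estimate, std_error, n_obs, n_params, alpha=0.05):
    """Return (reject_null, t_statistic, degrees_of_freedom) for one parameter."""
    t_stat = estimate / std_error
    df = n_obs - n_params  # number of data points minus number of parameters
    # Assumption: for large df the t critical value is close to the normal quantile.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(t_stat) > critical, t_stat, df


# Hypothetical estimates for a model fitted to 240 monthly values with 3 parameters.
strong = is_significant(0.45, 0.10, 240, 3)
weak = is_significant(0.05, 0.10, 240, 3)
```

For short series the exact t quantile should be taken from a table or a statistics library instead of the normal approximation.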

Diagnostic Check
The tentative models which passed the parameter estimation process were then checked to assess their suitability for forecasting. The diagnostic check examines the ACF of the residuals produced by the tentative models by using hypothesis testing. The null hypothesis, which states that the residuals are white noise, can be rejected if the test statistic is higher than the tabulated chi-square value at a particular level of significance and degrees of freedom.
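As an illustration of such a residual check, the sketch below computes the Ljung-Box statistic, Q = n(n+2) Σ r_k² / (n − k), over the first h lags of the residual ACF. The function name, the artificial residuals and the choice h = 10 are assumptions made for this example; the critical value 18.307 is the tabulated chi-square value for 10 degrees of freedom at the 5% level, whereas for a fitted model the degrees of freedom are usually reduced by the number of estimated parameters.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag of the residual ACF."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


# Artificial residuals that alternate in sign, i.e. clearly not white noise.
residuals = [1.0, -1.0] * 20
q_stat = ljung_box_q(residuals, max_lag=10)

chi2_critical = 18.307  # tabulated chi-square value, 10 degrees of freedom, 5% level
reject_white_noise = q_stat > chi2_critical
```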

Methodology Of The Ann Model
To design the ANN, the basic processing elements (PE) are arranged in layers, viz. the input and the output layers. A multilayer perceptron, with one or more hidden layers between the input and output layers, was used (Rosenblat, 1959). Basheer and Hajmeer (2000) found that the selection of the number of hidden layers and their neurons determines the success rate of neural network systems. Many researchers have used one hidden layer for forecasting (Cybenko, 1989). However, different problems have different characteristics; therefore, the number of layers and their neurons also differ from problem to problem (Kisi, 2005). In the present study, the topology of the ANN is selected through trial and error.
A back-propagation learning algorithm known as the Levenberg-Marquardt backpropagation algorithm was used to train the network in the present study. It has been reported that the Levenberg-Marquardt backpropagation algorithm is the most stable algorithm for time series analysis with a medium data size. The back-propagation algorithm can be divided into two steps, viz. forward activation to get the solution and backward activation to adjust the weights (Basheer and Hajmeer, 2000).
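The forward activation step can be sketched as follows for a network with one hidden layer, assuming the sigmoid hidden units and linear output unit adopted in this study. The function name `forward`, the layer sizes and the all-zero hidden weights are placeholders chosen to keep the example easy to verify; they are not trained values.

```python
import math


def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: sigmoid hidden layer followed by a linear output neuron."""
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        hidden.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid activation
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out  # linear activation


# Placeholder network: 12 lagged monthly flows in, 9 hidden neurons, 1 output.
n_in, n_hidden = 12, 9
lagged_flows = [float(i) for i in range(1, n_in + 1)]
w_hidden = [[0.0] * n_in for _ in range(n_hidden)]
b_hidden = [0.0] * n_hidden
w_out = [1.0] * n_hidden
b_out = 0.25

forecast = forward(lagged_flows, w_hidden, b_hidden, w_out, b_out)
```

With zero hidden weights every hidden neuron outputs 0.5, so the forecast is 9 × 0.5 + 0.25; the backward step would then adjust the weights to reduce the forecast error.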
In the beginning, each neuron in one layer is connected to the neurons of the next layer, and each connection is given a random weight value. The Levenberg-Marquardt algorithm uses the following equation for learning:

$$\chi_{k+1} = \chi_k - \left(J^{T}J + \mu I\right)^{-1} J^{T} e$$

where $\chi_{k+1}$ is the new weight, $\chi_k$ is the old weight, J is the Jacobian matrix which contains the first derivatives of the error, μ is a constant learning coefficient, I is the identity matrix, e is the vector of errors and χ is the weight and bias function. A sigmoid activation function is used in the input-hidden layer and a linear activation function is used in the hidden-output layer in the present study.
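To make the Levenberg-Marquardt update concrete, the sketch below applies the standard rule, new weights = old weights − (JᵀJ + μI)⁻¹ Jᵀe, to a deliberately tiny model with one weight and one bias, y = w·x + b, so that the 2 × 2 system can be solved by hand. The function name `lm_step`, the data and the value of μ are illustrative assumptions; training the full network applies the same update to all weights and biases at once.

```python
def lm_step(w, b, xs, ys, mu):
    """One Levenberg-Marquardt update for the model y = w * x + b."""
    errors = [w * x + b - y for x, y in zip(xs, ys)]  # e: model minus target
    # The Jacobian J has rows [x_i, 1]; form J^T J + mu * I and J^T e explicitly.
    a11 = sum(x * x for x in xs) + mu
    a12 = sum(xs)
    a22 = len(xs) + mu
    g1 = sum(x * e for x, e in zip(xs, errors))
    g2 = sum(errors)
    # Solve the 2 x 2 system (J^T J + mu * I) d = J^T e by Cramer's rule.
    det = a11 * a22 - a12 * a12
    dw = (a22 * g1 - a12 * g2) / det
    db = (a11 * g2 - a12 * g1) / det
    return w - dw, b - db


# Artificial data generated from y = 2x + 1; training starts from zero weights.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = 0.0, 0.0
for _ in range(50):
    w, b = lm_step(w, b, xs, ys, mu=0.01)
```

A small μ makes the step close to a Gauss-Newton step, while a large μ makes it behave like a short gradient-descent step; many implementations adapt μ during training rather than keeping it constant.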

Streamflow Forecasting by Using ARIMA
The ARIMA model is first applied to simulate streamflow at the station located on the Kurau River (4907422). Analysis of the ACF plot reveals that non-seasonal differencing (1-lag differenced, d = 1) is needed to make the time series stationary. The ACF plot also shows the existence of a seasonal pattern in the data and therefore seasonal differencing (12-lag differenced, D = 1) is also applied. Analysis of the ACF and PACF plots for the non-seasonally differenced streamflow indicates that ARIMA (1,1,0), ARIMA (0,1,1) and ARIMA (1,1,1) could be the possible models.
This verifies that the tentative ARIMA models are statistically significant. The Ljung-Box test is used to determine the accuracy of the models by observing the residuals. It is found that the critical values for both ARIMA models are greater than the tabulated chi-square values. This indicates that the models can be assumed to be statistically significant for forecasting.
Mean absolute percentage error (MAPE), root mean squared error (RMSE) and coefficient of determination (R²) are used to compare the observed values with the predicted values in order to find the best ARIMA model. From the results given in Table I, ARIMA (1,1,1)(0,1,1)₁₂ gives the least error and can therefore be chosen for forecasting. The forecasting equation yielded by this model shows that streamflow data from one, two, twelve, thirteen and fourteen months earlier are necessary to predict the streamflow of the present month. Figure 1 shows the observed and forecast streamflow by ARIMA (1,1,1)(0,1,1)₁₂ for the station located on the Kurau River.
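The lag structure quoted above follows from the general form of an ARIMA (1,1,1)(0,1,1)₁₂ model. The derivation below is a sketch with symbolic coefficients only: the symbols $\phi_1$, $\theta_1$, $\Theta_1$, the backshift operator $B$ and the minus-sign convention for the moving-average terms are notational assumptions, and the fitted coefficient values of the study are not reproduced here.

```latex
% General ARIMA(1,1,1)(0,1,1)_{12} model for the streamflow y_t with error \varepsilon_t
(1 - \phi_1 B)(1 - B)(1 - B^{12})\, y_t = (1 - \theta_1 B)(1 - \Theta_1 B^{12})\, \varepsilon_t

% Expanding both sides and solving for y_t
y_t = (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + y_{t-12}
      - (1 + \phi_1)\, y_{t-13} + \phi_1\, y_{t-14}
      + \varepsilon_t - \theta_1\, \varepsilon_{t-1}
      - \Theta_1\, \varepsilon_{t-12} + \theta_1 \Theta_1\, \varepsilon_{t-13}
```

The streamflow terms on the right-hand side appear at lags 1, 2, 12, 13 and 14, which matches the months named in the text.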

Streamflow Forecasting by Using ANN
The structure of the ANN for streamflow forecasting is selected by using the rule proposed by Dopico et al. (2009): the optimum number of neurons in the hidden layer is two-thirds of the total number of neurons in the input and output layers. With 12 input neurons and one output neuron, this gives 2/3 × 13 ≈ 9 hidden neurons. Following that rule, ANN(12,9,1) is used as the starter model in the present study. To find the best fitted model, the numbers of neurons in the input and hidden layers of the starter model are adjusted by trial and error. Mean absolute percentage error (MAPE), root mean squared error (RMSE) and coefficient of determination (R²) are computed for the different ANN models to find the best model. The results (Table IV) show that ANN(12,9,1) is the best fitted model for forecasting at the station located on the Kurau River. The observed and forecast streamflow from this model are shown in Figure 4.
The same procedure is applied to the other two stations. The ANN(12,9,1) model is found to be the best model for the station located on the Ara River (Table V), while the ANN(12,6,1) model is found to be most suitable for the station on the Krian River (Table VI). The comparisons between observed and one-month-ahead forecast streamflow by the corresponding ANN models are shown in Figures 5 and 6, respectively.
Finally, the forecasting abilities of ARIMA and ANN are compared. The forecasting accuracy of the ARIMA and ANN models is measured by using mean absolute percentage error (MAPE), root mean squared error (RMSE) and efficiency coefficient (R²). Table VII shows the forecasting accuracy of the autoregressive integrated moving-average (ARIMA) and artificial neural network (ANN) models. The results show that the forecasting ability of both methods is more or less the same.
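The three accuracy measures can be sketched as follows. The formulas are not stated in the text, so the definitions below are assumptions: MAPE and RMSE follow their usual definitions, and R² is taken as 1 − SS_res/SS_tot, the form that is also used as an efficiency coefficient. The observed and predicted values are made up for illustration.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error, in the units of the data."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def r_squared(observed, predicted):
    """Assumed definition: 1 - (residual sum of squares) / (total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up monthly flows and forecasts, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]

scores = (mape(observed, predicted), rmse(observed, predicted), r_squared(observed, predicted))
```

Because MAPE divides by the observed flow, months with very low flow can dominate it, which is one reason to report it alongside RMSE and R².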

V. CONCLUSIONS
The objective of the study was to apply autoregressive integrated moving-average (ARIMA) and artificial neural network (ANN) methods for forecasting monthly streamflow data in Malaysian rivers. Three monthly streamflow data sets around Bukit Merah, Perak, Malaysia were selected for this purpose. The characteristics of the streamflow data differ between the three locations. The stations on the Kurau River and Ara River are located upstream of the Bukit Merah Reservoir, while the station on the Krian River is located near the reservoir. Streamflow data at the Kurau River and Ara River stations are inconsistent, with some outliers. After the upgrading of the Bukit Merah Reservoir in 1965, streamflow at these two stations suddenly decreased to very low levels, probably due to the damming of the river flow during the construction process. On the other hand, the streamflow data at the Krian River station are fairly stable and consistent.
The results showed that both ARIMA and ANN methods are suitable for monthly streamflow time series forecasting. However, the artificial neural network performed better for the stations on the Kurau and Ara rivers, whereas both methods were found to give similar results in forecasting streamflow at the Krian River. Generally, it can be concluded that ARIMA is suitable for long-memory time series analysis, while ANN is more accurate in short-memory time series analysis. In addition, the ANN method was more flexible when applied to inconsistent data sets. The ANN model has the ability to learn data patterns better than ARIMA.
Daily streamflow data at three gauging stations located around the Bukit Merah Reservoir were collected from the Drainage and Irrigation Department (DID) of Perak, Malaysia for the present study. These are Station No. 4907422 on the Kurau River, Station No. 5007423 on the Ara River and Station No. 5206432 on the Krian River. The stations on the Kurau and Ara rivers are located upstream of the Bukit Merah Reservoir, while the station on the Krian River is located near the reservoir. However, the streamflow at the Krian River station has no direct link with reservoir water control. The stations were selected based on the availability of data. Streamflow at each station has different characteristics and regimes.

Table I. Accuracy check of the ARIMA models for the Kurau River (4907422)

Table II. Accuracy check of the ARIMA models for the Ara River (5007423)

Table III. Accuracy check of the ARIMA models for the Krian River (5206432)

Table IV. Accuracy check of the ANN models for the Kurau River (4907422)

Table V. Accuracy check of the ANN models for the Ara River (5007423)

Table VI. Accuracy check of the ANN models for the Krian River (5206432)

Table VII. Comparison of selected models