Time Series: An ARMA Example

Alparslan Mesri
Nov 19, 2022

This article is written by Alparslan Mesri and Hale Kizilduman.

Photo by Bruno Ticianelli on Pexels

The aim of this study is to fit a time series model to synthetic data and to assess it from two perspectives: how closely the forecast follows the actual values visually, and how often it predicts the direction of change correctly.

You can access the files via the link.

First, the required imports are made.

Then the CSV file is read into df, and the column “y7” is renamed to “actual_val”. df contains 1500 rows. The first 1450 rows are used for training (df_train) and the last 50 rows are held out for validation (df_val).
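A minimal sketch of this step with pandas is shown here. The original file is not reproduced, so a randomly generated column stands in for it. The file name in the comment is hypothetical. Only the names y7, actual_val, df_train, df_val, and the 1450/50 split come from the text.

```python
import numpy as np
import pandas as pd

# Stand-in for the article's CSV (assumption: one numeric column named "y7").
# With the real file this would be something like:
#   df = pd.read_csv("data.csv")   # hypothetical file name
rng = np.random.default_rng(0)
df = pd.DataFrame({"y7": rng.normal(size=1500)})

# Rename the column as described in the text.
df = df.rename(columns={"y7": "actual_val"})

# First 1450 rows for training, last 50 rows for validation.
df_train = df.iloc[:1450]
df_val = df.iloc[1450:]

print(len(df_train), len(df_val))
```

Splitting by position rather than at random keeps the time order intact, which matters because the validation rows must come after the training rows.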

df_train is then plotted.

In a time series analysis, the correlogram of the data needs to be examined. The first correlogram is the ACF (Autocorrelation Function). For purely random data, the ACF drops to near zero right after lag 0, which shows that there is no correlation between observations. In this case, the autocorrelation dies out after about 15 lags, so the data does not have long-term memory. This suggests that the data is stationary. The idea will be tested more formally with the ADF test. Whether the data is stationary or not determines whether the model to be used is ARMA or ARIMA.
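To make clear what the ACF correlogram computes, here is a small NumPy sketch of the sample autocorrelation. The helper name sample_acf and the simulated ARMA(1,1) series with its coefficients are assumptions for illustration, not the article's data.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation for lags 0..nlags (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    lags = [np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)]
    return np.array([1.0] + lags)

# Simulated ARMA(1,1) series used as a stand-in (assumed coefficients).
rng = np.random.default_rng(0)
n = 1450
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + eps[t] + 0.3 * eps[t - 1]

acf_vals = sample_acf(y, nlags=20)

# Approximate 95% band for white noise: lags inside it are
# not distinguishable from zero correlation.
band = 1.96 / np.sqrt(n)
significant = [k for k in range(1, 21) if abs(acf_vals[k]) > band]
print(np.round(acf_vals[:6], 2))
print("lags outside the band:", significant)
```

A correlogram whose bars fall inside the band after a handful of lags is the pattern the paragraph above describes as short memory.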

Second, the PACF (Partial Autocorrelation Function) graph is examined.

Now the ADF (Augmented Dickey-Fuller) test is performed to determine whether the data is stationary or not. The null hypothesis of the test is that the series has a unit root. If there is a unit root, the data is non-stationary.

In the next phase, the Ljung-Box test is performed to check whether the data is serially correlated, that is, whether it has a time series structure or not.

H0: Absence of serial correlation

Ha: Existence of serial correlation

The p-value is less than 5%, which means the H0 hypothesis is rejected: the data is serially correlated. Thus ARMA can be used in this case. In the next code block, the ARMA model is fitted for each combination of orders and the AIC (Akaike’s Information Criterion) value is calculated for each variation.

The best model is determined as ARMA(1,1) according to the AIC values. In the next phase, the predicted and observed values will be compared in the graph.

Blue lines represent the observed values and red lines represent the forecasted values. As can be seen, the forecast values do not differ much from the observed values, so the ARMA(1,1) model appears to work well.

Next, the ARMA results are assigned to the results_series variable, which is converted to a dataframe. The index of df_val is reset. The diff_actual_val, diff_prediction, and evaluation columns are added.

After val_results is obtained, the values of the new columns are calculated.

In this case, the evaluation criterion is whether the direction of change in the actual data is estimated correctly. If an increase or decrease is predicted in accordance with the actual data, a 1 is written in the evaluation column. If it is predicted incorrectly, a 0 is written.
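The direction check can be sketched on a tiny hand-made example. The numbers and the column name prediction are made up for illustration, and it is an assumption that the diff columns hold the change from the previous row. The other column names follow the text.

```python
import numpy as np
import pandas as pd

# Tiny made-up example; "prediction" is a hypothetical column name.
val_results = pd.DataFrame({
    "actual_val": [1.0, 2.0, 1.5, 1.8, 1.2],
    "prediction": [1.1, 1.5, 1.7, 1.9, 1.0],
})

# Change from the previous row for actual and predicted values.
val_results["diff_actual_val"] = val_results["actual_val"].diff()
val_results["diff_prediction"] = val_results["prediction"].diff()

# The first row has no previous value, so it is left out of the scoring.
scored = val_results.dropna().copy()

# 1 when both series move in the same direction, 0 otherwise.
same_direction = np.sign(scored["diff_actual_val"]) == np.sign(scored["diff_prediction"])
scored["evaluation"] = same_direction.astype(int)

accuracy = scored["evaluation"].mean()
print(scored["evaluation"].tolist())  # [1, 0, 1, 1]
print(accuracy)                       # 0.75
```

The mean of the evaluation column is the share of correctly predicted directions, which is the kind of figure reported as the final result.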

As a result, our model correctly predicted the direction of the ups and downs 57% of the time.
