Discussing the article: "Two-sample Kolmogorov-Smirnov test as an indicator of time series non-stationarity"

 

Check out the new article: Two-sample Kolmogorov-Smirnov test as an indicator of time series non-stationarity.

The article considers one of the most famous non-parametric homogeneity tests – the two-sample Kolmogorov-Smirnov test. Both model data and real quotes are analyzed. The article also provides an example of constructing a non-stationarity indicator (iSmirnovDistance).

In this study, I will test financial time series for strict-sense stationarity using empirical distribution functions. Much of probability theory and mathematical statistics rests on stationarity assumptions. There are many methods for analyzing stationary processes, including regression analysis, autocorrelation analysis, spectral analysis, and neural networks. However, applying these methods to non-stationary data can lead to significant forecast errors.

For traders, the issue of stationarity is closely related to choosing how much data to use when calculating indicators. For stationary processes, the more data is available, the more accurately all statistical characteristics can be estimated. For non-stationary processes, however, it is difficult to determine the optimal amount of data. Too large a sample may contain outdated information that no longer affects the current situation. Too small a sample is not representative enough to assess the statistical properties of the process adequately.

The most complete characteristic of a random process is its distribution law (distribution function). Therefore, constructing an indicator that tracks changes in the distribution function of a time series over time is an important task. Such an indicator, in turn, signals when the amount of data used to calculate standard technical analysis indicators should be revised. In mathematical statistics, the problem of testing whether the distribution function of a random variable has changed over time is called "testing the homogeneity hypothesis".

Author: Evgeniy Chernish

 
Let's move on to the analysis of real data. As an example, I took minute bars of the EURUSD currency pair and gold XAUUSD.

I didn't understand what was being compared at all. Apparently, it is necessary to study the source.

P.S. I looked at the source. It looks like it should be run (I haven't tried it) on D1, with fewer bars than 1440 taken (PERIOD_M5 instead of PERIOD_M1). And the "balls" are Close increments.

 
It is worth saying that it matters on which timeframe the Smirnov distance is calculated. For minute data, as we have seen, the series is substantially non-stationary, while for the five-minute timeframe the series is more stationary and the homogeneity hypothesis is rejected much less often. This is partly due to the amount of data: 1440 for the minute timeframe versus 287 for the five-minute one. As the amount of data is gradually increased from 287 to 1440, the rejection rate of the null hypothesis grows; nevertheless, the homogeneity hypothesis is rejected more often for the minute chart.

I'm sure that if you compare weeks rather than days (1435 M5 increments), M5 will still be "more stationary" than M1. It's not about the amount of data, it's about how the increments are constructed.

You can construct increments in different ways: TF (timeframe) increments, ZZ increments, etc.

That is, the result of stationarity tests depends on how the initial data is prepared. Why TF increments should be used is a mystery. Small-timeframe bars over a whole day mix together unrelated things: rollovers, low liquidity, and news. It would be better to compare parts of the day, for example EURUSD from 02:00 to 08:00.


After all, no one is forcing you to trade or train around the clock.

 
fxsaber #:

... PERIOD_M1). And "balls" - Close increments

Run the indicator on the daily timeframe; it analyses the time series of logarithmic price increments on PERIOD_M5. Minute data can also be used, but it is too non-stationary.

Each day we measure how much the distribution of the series' returns has changed, using the Smirnov homogeneity test. The statistic is based on comparing two empirical distribution functions: the distance between them is taken to be the maximum absolute difference.
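
As a rough illustration of the statistic described above, here is a minimal Python sketch (it is not the article's MQL5 code). The function names `log_increments`, `smirnov_distance` and `critical_distance` are made up for this example, and the critical value uses the standard asymptotic approximation, which is an assumption and may differ from what the iSmirnovDistance indicator actually does.

```python
from bisect import bisect_right
import math


def log_increments(prices):
    """Logarithmic increments ln(p[t] / p[t-1]) of a price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def smirnov_distance(x, y):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    distribution functions of the samples x and y.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # The maximum can only be attained at one of the observed points.
    for v in xs + ys:
        fx = bisect_right(xs, v) / n  # share of x values <= v
        fy = bisect_right(ys, v) / m  # share of y values <= v
        d = max(d, abs(fx - fy))
    return d


def critical_distance(n, m, alpha=0.05):
    """Asymptotic critical value for the distance (assumed approximation).

    Homogeneity is rejected at level alpha when the distance exceeds it.
    """
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))
```

For example, `smirnov_distance(log_increments(yesterday_closes), log_increments(today_closes))` gives the distance between two consecutive days (both argument names are hypothetical), which can then be compared with `critical_distance(n, m)` for the two sample sizes.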

 
Evgeniy Chernish #:

Run the indicator on the daily timeframe; it analyses the time series of logarithmic price increments on PERIOD_M5. Minute data can also be used, but it is too non-stationary.

Each day we measure how much the distribution of the series' returns has changed, using the Smirnov homogeneity test. The statistic is based on comparing two empirical distribution functions: the distance between them is taken to be the maximum absolute difference.

I read the article from top to bottom. I got to this phrase.

Let's move on to the analysis of real data. As an example, I took minute bars of the EURUSD currency pair and gold XAUUSD.

After that come charts without explanations, as if a short paragraph had been left out.

 
fxsaber #:

I'm sure that if you compare weeks rather than days (1435 M5 increments), M5 will still be "more stationary" than M1. It's not about the amount of data, it's about how the increments are constructed.

You can construct increments in different ways: TF (timeframe) increments, ZZ increments, etc.

That is, the result of stationarity tests depends on how the initial data is prepared. Why TF increments should be used is a mystery. Small-timeframe bars over a whole day mix together unrelated things: rollovers, low liquidity, and news. It would be better to compare parts of the day, for example EURUSD from 02:00 to 08:00.

That is true: M5 is more stationary than M1, at least according to the Smirnov test.

The minute timeframe, or at most the 5-minute one, is used to keep the lag acceptable. If we analyse, say, 1440 five-minute increments per sample, the Smirnov test yields a result only once every two weeks. Rather than increasing the timeframe, we should probably go the other way, one level down to ticks, to react faster to changing conditions. Unfortunately, I had no tick database for the analysis, so I used minute data.

But comparing separate intraday sessions is a good idea. Remove the Asian session, for example, since it has many zero increments, quite different volatility, and so on.

P.S. What are ZZ increments?

 
fxsaber #:

I was reading the article from top to bottom. I got to this sentence.

Then there are graphs without explanations. It was as if they forgot to write a small paragraph.

There are brief explanations below the graphs. If something specific is not quite clear, write to me and I will try to explain everything.

 
Good article, thank you. The conclusion that the window size should be variable was interesting.
 
Maxim Dmitrievsky #:
Good article, thank you. The conclusion that the window size should be variable was interesting.
Thanks. Of course, I have not discovered anything new here =)
 

IMHO, it would be worth accounting for the intraday pattern of volatility. For example, you could normalise the increments by the average volatility at that time of day. According to my estimates, increments normalised in this way differ significantly less from noise.
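
The normalisation suggested above could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `normalise_by_time_of_day` is made up, "volatility" is taken to be the mean absolute increment in each intraday slot, and the slot index (for example, the minute of the day divided by 5 for M5 bars) is supplied by the caller.

```python
from collections import defaultdict


def normalise_by_time_of_day(increments, slots):
    """Divide each increment by the average volatility of its intraday slot.

    increments[i] is the increment observed in intraday slot slots[i].
    Volatility is measured as the mean absolute increment (an assumption).
    Note: the averages use the whole sample, so for live use they should
    be estimated from past days only to avoid look-ahead.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, s in zip(increments, slots):
        sums[s] += abs(r)
        counts[s] += 1
    avg = {s: sums[s] / counts[s] for s in sums}
    # Slots where every increment is zero are mapped to 0.0.
    return [r / avg[s] if avg[s] > 0 else 0.0
            for r, s in zip(increments, slots)]
```

The normalised series can then be fed into a homogeneity test in place of the raw increments, so that the regular intraday volatility cycle is not mistaken for a change in distribution.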

For time series, people more often speak of change-point tests. Such tests are often built on homogeneity tests, but in principle this is a separate area of mathematical statistics. For example, Pettitt's test is often used in econometrics.
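
For readers unfamiliar with Pettitt's test, here is a minimal Python sketch of its statistic. The function name `pettitt_test` and the return convention are made up for this example, and the p-value uses the usual approximation 2·exp(−6K²/(n³+n²)), so treat it as a rough guide rather than an exact result.

```python
import math


def _sign(v):
    return (v > 0) - (v < 0)


def pettitt_test(x):
    """Pettitt's change-point test for a single shift in a series.

    Returns (k, split, p): k is max |U_t|, split is the number of
    observations before the most likely change point (so the segments
    are x[:split] and x[split:]), and p is the approximate p-value.
    """
    n = len(x)
    u = 0
    k = 0
    split = 0
    for t in range(n - 1):
        # U_t = U_{t-1} + sum_j sign(x_t - x_j), which equals
        # the sum of sign(x_i - x_j) over i <= t < j.
        u += sum(_sign(x[t] - xj) for xj in x)
        if abs(u) > k:
            k = abs(u)
            split = t + 1
    p = min(1.0, 2.0 * math.exp(-6.0 * k * k / (n ** 3 + n ** 2)))
    return k, split, p
```

This version is quadratic in the series length, which is fine for a day of M5 increments but would need a rank-based formulation for long tick series.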

In general, the article is good.

 
Evgeniy Chernish #:

P.S. What are ZZ increments?

Build a ZigZag on HighBid/LowAsk and take the increments between its vertices.
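
One possible reading of this recipe, as a Python sketch. The reversal rule is an assumption: the post does not say how the ZigZag is parameterised, so this example uses a fixed price `threshold` (a hypothetical parameter) rather than the Depth/Deviation/Backstep settings of the standard indicator. Tops are taken from the `high_bid` series and bottoms from the `low_ask` series; the function names are made up.

```python
def zigzag_vertices(high_bid, low_ask, threshold):
    """Return ZigZag vertices as (bar_index, price) pairs.

    A top is confirmed once low_ask falls at least `threshold` below the
    running high_bid maximum; a bottom is confirmed once high_bid rises
    at least `threshold` above the running low_ask minimum.
    The last, still unconfirmed leg is not included.
    """
    pivots = []
    trend = 0          # 0 = unknown, 1 = up leg, -1 = down leg
    hi_i = lo_i = 0    # indices of the running extreme candidates
    for i in range(1, len(high_bid)):
        if trend == 0:
            if high_bid[i] > high_bid[hi_i]:
                hi_i = i
            if low_ask[i] < low_ask[lo_i]:
                lo_i = i
            if lo_i < i and high_bid[i] - low_ask[lo_i] >= threshold:
                pivots.append((lo_i, low_ask[lo_i]))
                trend, hi_i = 1, i
            elif hi_i < i and high_bid[hi_i] - low_ask[i] >= threshold:
                pivots.append((hi_i, high_bid[hi_i]))
                trend, lo_i = -1, i
        elif trend == 1:
            if high_bid[i] > high_bid[hi_i]:
                hi_i = i
            elif high_bid[hi_i] - low_ask[i] >= threshold:
                pivots.append((hi_i, high_bid[hi_i]))
                trend, lo_i = -1, i
        else:
            if low_ask[i] < low_ask[lo_i]:
                lo_i = i
            elif high_bid[i] - low_ask[lo_i] >= threshold:
                pivots.append((lo_i, low_ask[lo_i]))
                trend, hi_i = 1, i
    return pivots


def zz_increments(pivots):
    """Price increments between consecutive ZigZag vertices."""
    return [b[1] - a[1] for a, b in zip(pivots, pivots[1:])]
```

Unlike timeframe increments, the resulting increments are spaced by price movement rather than by clock time, so their distribution can differ noticeably from that of M1 or M5 increments.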