Wavelets based estimation of trend in sub-divisional rainfall in India

Presence of long memory in climatic variables is frequently observed. The trend assessment becomes difficult in the presence of long-memory as the usual methods are not capable to take care of this property during trend estimation. In order to estimate the trend in presence of long memory, the non-parametric wavelet method has become popular in the recent time. The discrete wavelet transformation (DWT) re-expresses a time-series in terms of coefficients that are associated with a particular time and a particular scale. In the present study, DWT has been applied to estimate the monthly rainfall trend for the monsoon months: June-September in ten selected sub-divisions of India using “Haar” wavelet filter. The results from DWT were cross checked with the non-parametric Mann-Kendall (M-K) test. The investigation reveals that the monthly rainfall trend for the monsoon months of different sub-divisions in India are significantly decreasing over the years. However, in some of the sub-divisions, rainfall trend is increasing. DWT reveals significant trend in most of the sub-divisions whereas M-K test reveals that most of the trends are not significant at 5% level.


Introduction
A common issue in time series analysis is the decomposition of several time series components viz., low frequency (trend) component and high frequency (noise and periodicity) component. Most of the time series of aggregated variables show a steadily increasing or decreasing pattern, known as trend. Over last few decades Box Jenkin's Autoregressive integrated moving average (ARIMA) methodology (Box et al., 2007), a parametric approach of time series analysis has been used for forecasting time series data. But there are certain circumstances where it is not possible to postulate appropriate parametric relationship for the underlying phenomena; in this case nonparametric approach is called for. A plausible statistical model for such series can consist of non-stochastic or trend component and stochastic component : Y (t) = T(t) + X(t), where Y(t) is the value of an observable time series at time t, T(t) is the trend component and X(t) is called the noise process with mean zero. Trend assessment is the problem of determining whether or not T(t) is actually present in the time series, i.e., test the null hypothesis H 0 : T(t) = 0. In many agricultural data, like daily commodity price data, daily rainfall and temperature data it is seen that the distant observations are dependent that means the dataset have characteristic feature of long memory or long range dependency. A time-series process is called as long memory or Fractional differenced (FD) process if the autocorrelation function decays very slowly towards zero unlike the exponential decay in usual ARIMA model. A major problem in practice is the discrimination of a FD(d) (d denotes the long memory parameter) process with trend from a FD(d) process without trend, since FD(d) processes for 0 < d < ½ usually looks like having some trend in the series. This makes it hard to distinguish the existence of trend in the time series.
Here our interest lies on the testing of low-frequency part or called as trend, in wavelet domain (Antoniadis, 1997). Let us consider the assumptions that the trend T(t) is well approximated at least locally by a low order polynomial (such as linear) and that the stochastic component X(t) is a FD process. Under these assumptions, the discrete wavelet transform (DWT) (Percival and Walden, 2000) based upon the Haar family of wavelet filters can be used to transform the series Y(t). The ability of the DWT to cleanly separate Y(t) into these components allows us to test for its significance. Almasri et al. (2010) have proposed a test statistic by using wavelet decompositions to test the significance of trend in a time series data. The most difficult problem of testing for linear trend is the presence of dependence among the residuals because of which, tests for trend based on the classical ordinary least squares (OLS) regression are inappropriate. In many situations, the error auto covariance function exhibits a slow decay reflecting the possible presence of long memory process. The wavelet analysis, however, has been extensively used for such purposes, since it suitably matches the structure of these processes. Paul et al. (2011) studied wavelet methodology for estimation of trend in Indian monsoon rainfall time-series data and reported significant trend.
Indian agriculture is mainly rain-dependent; approximately two-third net cropped area is under rain-fed cultivation. Therefore, regular and uniform rainfall pattern is crucial for betterment of agriculture; extreme situations can affect the agriculture devastatingly. But problem is that the regular rainfall pattern may not be observed all the time. Several studies have been done on the pattern of regional rainfall in India (Kumar et al., 2004;Gowsami et al., 2006). Guhathakurta and Rajeevan (2006) have found that there is a decline in rainfall in the months of June, July and September and an increase in the month of August in few sub-divisions in India. Goswami et al. (2006) have studied the rainfall behaviour over central India of the period 1951-2000 and found that a significant increase in heavy rainfall events and a decrease in moderate rainfall events. Some studies have also carried out to predict the rainfall and estimate the trend using some non-parametric methods like wavelet (Kallache et al., 2005). Paul et al. (2013) have investigated the modelling of Indian monsoon rainfall data and concluded that wavelet methodology has greater accuracy than that of ARIMA model. Paul et al. (2015) investigated the trend in mean temperatures in different agro-climatic zones in India using both parametric and nonparametric methods. Paul and Birthal (2016) have applied wavelet approach for describing variability in rainfall in different agro-climatic zones of India. Paul (2017) found the significance presence of long memory in maximum and minimum temperatures in India. In this study we use wavelet based test statistic and non-parametric Mann-Kendall test statistic to test the existence of trend in the time series data. The paper is organized as follows: section 2 describes about long memory process; section 3 deals with the basics of wavelets; section 4 describes methods of detection of trend; section 5 deals with empirical illustration followed by conclusions in section 6.

Long memory process
Most of the research works in time-series analysis assume that the observation separated by long time span are independent of each other or nearly so. But in many practical situations it is seen that many empirical economic series show that the distant observations are dependent, though the correlation is small but not negligible. Let ; ( = 0, 1, 2, … ) be a stationary timeseries process and the autocorrelation function of the timeseries with a time lag of is . For long memory processes, decaying of autocorrelations functions occur at much slower rate (hyperbolic rate) which is consistent with ≈ C 2 −1 , as increases indefinitely, where C is a constant and is the long memory parameter. A study of long memory time series in climate can be found in Paul and Anjoy (2018).

Wavelet
Wavelets (Daubechies, 1992;Ogden, 1997 andVidakovic, 1999) are fundamental building block functions, analogous to the trigonometric sine and cosine functions. As with a sine or cosine wave, a wavelet function oscillates about zero. This oscillating property makes the function a wave. However, the oscillations for a wavelet damp down to zero, hence the name wavelet. If ψ(.) be a real valued function defined over the real axis (-∞, ∞) and satisfying two basic properties: the integral of ψ(.) is zero and the square of ψ(.) integrates to unity. The detail discussion of wavelets can be found in (Percival and Walden, 2000).

Discrete wavelet transform (DWT)
DWT of a time-series observation is used to capture high and low frequency components. The DWT re-expresses a time-series in terms of coefficients that are associated with a particular time and a particular dyadic scale 2 j-1 (Nason & Von Sachs, 1999). Let, = 0 , 1 , … , −1 ′ be a column vector containing N observations of a real-valued time series, where we assume that N is an integer multiple of 2 M , where M is a positive integer. The DWT of level J is an orthonormal transform of X defined by where, W is an orthonormal N × N realvalued matrix, i.e., −1 = ′ so ′ = ′ = and called the wavelet matrix. Dj = {d j,k }, j = 1, 2, . . . , J , are N/λj × 1 real-valued vectors of wavelet coefficients at level j associated with scale λ j and location k, where λ j = 2 j . The real-valued vector S J = {s J,k } is made up of N/2 J scaling coefficients. Thus, the first N − N/2 J elements of D are wavelet coefficients and the last N/2 J elements are scaling coefficients, where J ≤ M. In DWT several filters can be used, "Haar filter" is one of them.

Trend
Trend is defined as a long term change in the underlying mean level per unit time (Jain and Kumar, 2012). There are different models for describing various forms of trend, both linear and nonlinear and both stochastic and deterministic. The model that represents the time series by using a j th -order polynomial function is given as when j = 0, there is no long-run increase or decline in the time series over time and j = 1, implies that there is a straight-line long-run growth (if a 1 > 0) or decline (if a 1 < 0) over time.

Mann-Kendall test
The non-parametric Mann-Kendall test is commonly employed to detect monotonic linear trend (Jayawardane et al., 2005) in time series data. The null hypothesis, H 0 , is that the data come from a population with independent realizations and are identically distributed. The alternative hypothesis, H 1 , is that the data follow a monotonic trend.
The Mann-Kendall test statistic is calculated according to: where, sgn is the signum function. If S is greater than zero, trend is said to be increasing otherwise if S is less than zero trend is decreasing. More details of the test can be found in Paul et al. (2014).

Wavelet-based estimation of the trend in presence of long memory
Yajima (1988) considered a polynomial regression consisting of a polynomial trend with p = 1 and a stationary process with long memory. Based on that we consider the following model: where, the process Z t is a residual term which is a long-memory process defined by where, 0 ≤  < 1/2 is the long-memory parameter, { } is a Gaussian white noise process with mean zero and 2 > 0.
From (1), we can write where, d w is an N×1 vector containing the wavelet coefficients and zeros at all other positions and d s is an N×1 vector containing the scaling coefficients and zeros at all other positions. Since = ′ , we can write, where, is an estimator of the polynomial trend T at level J , while is a tapered "version" of X.
An important issue is how to choose the wavelet filter. The Haar wavelet, which is a piecewise constant function, preserves the discontinuities and therefore it is most suitable to identify a structural break in the data.

Dataset
In this study monthly rainfall data corresponding to different zones of India is collected from Indian Institute of Tropical Meteorology (www.tropmet.res.in), Pune, India for detection as well as estimation of trend in monsoon rainfall data of different sub-divisions of India. The data set comprises of monthly rainfall over 128 years (from 1887 to 2014) measured in mm. There are total 30 sub-divisions in India, out of which 10 zones are considered in this present study. These 10 zones are selected in such a manner that it can represent the whole India. Here we are considering only four monsoon months (namely June, July, August and September). The names of the selected sub-divisions along with their short names are listed below:

Descriptive statistics
The descriptive statistics of the rainfall data is reported in Table 1. Here, we consider mean, maximum value, minimum value, standard deviation (SD), coefficient of variation (CV), skewness and kurtosis to represent the data set. The values of these statistics are calculated for each of the monsoon months corresponding to each of the sub-divisions under consideration. A perusal of Table 1 indicates that the average monthly monsoon rainfall is higher for Assam & Meghalaya (ASMEG) and Gangetic West Bengal (GNWBL) sub-divisions than the other sub-divisions. The table also indicates that the maximum rainfall is higher for Assam & Meghalaya (ASMEG), Gangetic West Bengal (GNWBL), Gujarat (GUJRT) and Madhya Maharashtra (MADMH) subdivisions for each of the monsoon months. In terms of CV the variability in the monthly monsoon rainfall data is higher for West Uttar Pradesh Plains (WUPPL), Punjab (PUNJB) and Gujarat (GUJRT) sub-divisions. Almost all the series under consideration are positively skewed (except July month of WUPPL and MADMH) and all the series are leptokurtic.

Test for long memory
The presence of long memory in a time series data can be confirmed either by investigating the autocorrelation function (ACF) plot of the data set or by using some parametric and nonparametric tests like GPH (Geweke and Porter-Hudak, 1983) and Figs. 1(a-j). DWT plot of different zones using Haar wavelet filter

Fig. 2.Trend plots of different zones
Sperio test (Reisen, 1994). In this study we have applied Sperio test to test the presence of long memory for each of the monsoon month rainfall data of the selected zones and the test results are provided in Table 2. It is found that the test is significant at 5% level for some of the series which establishes the presence of long range dependency in the respective monthly rainfall data like July (ASMEG), August (WUPPL) etc.

Variation in rainfall using DWT
DWT plots of rainfall in different zones for the months of June, July, August and September using Haar wavelet filter at level 5 are reported in Figs. 1(a-j). The number of the plots in Figs. 1(a-j) are given for the different zones accordingly to the order as per section 5.1. Wavelet coefficients are plotted as bars, up or down. The sizes of the bars are relative to magnitudes of coefficients. The number of wavelet coefficients at the lowest resolution level (level = 1) is exactly half the number of original data points and the number of coefficients decreases by half at each level (Nason and Sachs, 1999). The plot of actual data is shown at the bottom of each plot of Figs. 1(a-j) which is very rough and it is full of noise component. DWT attempts to extract the actual signal from the noisy time series data at each level. The graph of W 2 is much smoother than the W 1 . Similarly smoothness increases as we are going to top of the graphs with upper coefficients. In the graph V 5 scaling coefficient shows the smooth plot.

Detection of trend using wavelet
Trends of different sub-divisional zones are estimated using "Haar" DWT and plotted in the Fig. 2. Each plot in Fig. 2 is represented by two subscripts. The first subscript refers to zone as described in section 5.1 and the second subscript refers to the monsoon months, i.e., 1 for June; 2 for July; 3 for August and 4 for September. Therefore, the first plot, i.e., 1.1 represents the trend in rainfall in ASMEG zone for the month June. The rainfall trend in the month of June for ASMEG zone clearly indicates that trend is negative. From the year 1887 to 1915 trend remains constant then it falls during 1916. Again it remains stagnant during the year 1918 to 1949. After that a slight increase in the rainfall has been detected till the year 1951. A rapid decreasing trend has been observed from 1997 to 2014. For other three months of this zone, the rainfall pattern remains same with a difference in magnitude. The trend in other zones can be interpreted in similar pattern. The overall pattern of trend in rainfall remains almost same in all the zones except the variation in different time epochs.

Test for detection of trend
In this section we test the existence of trend using Mann-Kendall test. The values of test statistics along with corresponding p-values are given in Table 3. This table also includes the values of G statistic along with their corresponding F-values for testing the significance of existing trend as described in section 4.3. It has been seen that the Mann-Kendal test could not capture the significant trend in many of the series whereas the Wavelet based test successfully captured the trend in almost all the months of different zones. This is because, wavelet based test has more power than that of the Mann-Kendall test in capturing the trend particularly when there is long memory present in the series.

Conclusions
In the present study, two nonparametric methods namely wavelet analysis and Mann-Kendall test have been used for estimation of trend in monthly rainfall in ten selected sub-divisions of India. Presence of long memory in rainfall data was tested using suitable statistical test and it was found that long memory was significantly present in rainfall for the month of June in COAPR; July of ASMEG, PUNJB and EMPRA; August of GUJRT and EMPRA; September of EMPRA and COAPR. In presence of long memory, Mann-Kendall test may not be appropriate to detect the trend. Therefore, wavelet based trend test was advocated and it was found that in most of the sub-divisions, the rainfall trend was significant at 5% level. Mostly, significant negative trend in rainfall has been found in the monsoon months which is not a positive signal for the water sector in the country. The variation in monthly rainfall for June-September for the studied zones were also investigated by wavelet using Haar filter. The local as well as global variation in rainfall is observed by DWT plot of the respective months in the respective locations. The variability in rainfall is evident in the recent decades in all the locations.