Statistical prediction of Sri Lankan rainfall during October to December

Sri Lanka receives most rainfall during October to December (OND). Here we construct multiple linear regression models to forecast the OND Sri Lankan rainfall during 1979-2012 for lead times of 1 and 2 months. Correlation analysis was used to examine the relationship between Sri Lankan OND rainfall and global sea surface temperature (SST) anomalies. Three independent predictors were identified through partial least square regression method which includes the southern Atlantic SST tendency, southern Pacific SST tendency and western Pacific and Maritime Continent SST tendency at two different lead times. Three-year-out cross validation concludes that the multiple linear regression models can produce forecast the OND rainfall forecast at correlation coefficient skill of 0.69 and 0.68 for the 1 and 2 month lead times respectively. The physical processes associated with these three predictors show that they contribute to increase in OND rainfall of Sri Lanka.


Introduction
Sri Lanka receives the major portion of its annual rainfall during October to December (OND) (Zubair and Ropelewski, 2006). This primary rainy and agricultural season is locally known as "Maha" period. In addition to agricultural productivity, hydropower planning, water resource management and disaster preparedness of Sri Lanka are strongly influenced by the OND seasonal rainfall. During the season, the rainfall rate is intense [approximately 150 ~ 200 mm month -1 ; Figs. 1(a&b)], which often causes floods and landslides . Therefore, prediction of the seasonal rainfall during OND is important, yet one of the most challenging tasks in Sri Lanka. Although interannual relationships of seasonal rainfall with leading climate modes, such as the El Niño Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD), have been examined (Zubair and Ropelewski, 2006;Zubair et al., 2003). The main objective of this study is hence to construct a statistical model, which will be a valuable tool to reduce disaster risk and economic losses in Sri Lanka.
The rainfall during the OND season can be attributed to multiple meteorological phenomena, such as the formation of tropical cyclones and depressions in the Bay For precipitation, we use the monthly Global Precipitation Climatology Project (GPCP) Version 2.2 (Huffman et al., 1997) at the spatial resolution of 2.5° latitude by 2.5° longitude. The monthly SSTs are obtained from the Version 1.1 of Hadley Centre Global Ice and Sea Surface Temperature dataset (HadISST1) provided by the United Kingdom Meteorological Office (UKMO) (Rayner et al., 2003). The SST dataset is originally at the 1.0° latitude by 1.0° longitude resolution but is regridded to the 2.5° by 2.5° resolution of the GPCP. For monthly zonal wind and meridional wind (at what level?), the ERA-Interim dataset produced by European Centre for Medium-Range Weather Forecasts (ECMWF) (Dee et al., 2011) is used. The ERA-Interim is downloaded at a native horizontal resolution of ~ 60 km but is also interpolated to the 2.5° by 2.5° resolution.

Methodology
Two multiple regression models are constructed to predict the OND rainfall anomalies over Sri Lanka. Setting up the Sri Lanka rainfall index (SLRI) as the predict and is the first step to build the prediction model. The SLRI is defined as the normalized time series of OND rainfall anomaly averaged over Sri Lanka region (79° E to 82° E and 5° N to 10° N) for the period of 1979-2012. The index will be presented in Section 3.
To provide stable and effective prediction at seasonal time scale, SST is often chosen as one of slowly varying boundary conditions (Lau et al., 2000;Park et al., 2015;Yim et al., 2013). Most of the statistical seasonal prediction models use lead-lag relationships between SSTA tendency and predict and (Yim et al., 2015;Lee and Seo, 2013;Kim et al., 2017;Yim et al., 2013). We use the Pearson correlation coefficients, , to identify the relationships between the Sri Lankan OND rainfall and SSTA tendency. The predictors are selected when the correlation coefficient exceeds the 95% significance level. Estimation of the statistical significance is based on a t-test that uses the t- In this equation, the number of seasons is used as the degree of freedom (n).
To find the lead-lag linkage between the predictors and the predict and, the correlation coefficients between the SLRI and tendency of SSTAs were calculated at 1and 2-month lead times. We first define the 1-month lead time as the difference between August minus June. This forecast model uses the predictors which have information before and during August and then the 1-month lead time is defined as the difference between July minus May, which includes the information during and before July.
To make sure the predictors are independent from each other, the partial least square regression (PLSR) method, e.g., Black (2017) is employed as follows: (i) Grid by grid correlation coefficients between the global SSTAs and SLRI are calculated to obtain the first correlation map, (ii) Statistically significant regions at 95% confidence level are selected as predictors. Based on the significance test, the first predictor field is identified [show a table giving a list of selected predictors (time period of derived predictor) along with correlation coefficients with significant level], (iii) Area-weighted predictor field is normalized by subtracting its mean and dividing it by its standard deviation, (iv) First partial regression is obtained by using conventional least squares fitting and regressing the SSTAs against the first predictor, (v) First partial regression is linearly removed from both the predictor field and all SSTA field. The residual predictor field became as the new predictor and the residual SSTA field became as the new predictor field and (vi) The residual SSTA field is used to find the second predictor.
Steps 1-5 are repeated to obtain the other predictors. This procedure is terminated when there are no further significant predictor fields. We limit the number of predictors up to 3 so as to avoid the over fitting problem (Lee and Seo, 2013;Kim et al., 2017)

Cross-validation
The regression coefficients remain stable when using a cross validation method, which is widely used in climate prediction. That is, to examine the performance of the model, we employ two cross validation approaches. The first validation approach is following Blockeel & Struyf (2002), who suggest that 50%--70% data can be used to construct the regression model and the remaining data can be used to validate the model. For this approach, we divide the entire 34-year data into two subsets as the training period and the validation period. For the training period, first 21 years data  are used to obtain the regression coefficients for the model. The remaining 13 years data (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012) are then used to make the independent forecast.

Figs. 2(a&b).
Correlation coefficient maps between the SSTA and SLRI are shown at (a) 1-and (b) 2-month lead times, respectively. The boxes indicate the regions of the three predictors. Black crosses mark the areas that are statistically significant at the 95% confidence level The second approach is based on the three years out cross validation method (Yim et al., 2013). For this approach, we develop the model all years, but excluding the three years centered at the year that the prediction and hence the validation is performed. The procedure is repeated by taking 3-year out around each predicted year. That is, this leave-three-years-out cross validation involves using 3 observations as the validation set and the remaining observations as the training set.

Selection of the best predictors
To investigate the monthly rainfall evolution, monthly climatology of the Sri Lankan rainfall is computed by averaging the rainfall amount over the domain (79° E to 82° E and 5° N to 10° N; red box in Fig. 1(a) during the period of 1979-2012 [ Fig. 1(b)]. Bimodality is apparent in the mean annual cycle of the rainfall with a primary peak from October to December and a subsidiary peak from April to June. The rainfall amount gradually increases from September and attains the highest rainfall from October to November. The OND mean low-level winds at 850 hPa [vectors in Fig. 1(a)] show cyclonic circulations and easterly / northeasterly trade winds over Bay of Bengal.
The formation of the low-level cyclonic circulation to the east of Sri Lanka and the moist northeasterly winds blowing across Sri Lanka are favorable for heavy rainfall over the island. The OND mean precipitation is centered near the   Maritime Continent and decreases toward Sri Lanka [shading in Fig. 1(a)].
To select the predictors for the 1-month lead time, the correlation coefficients between the SLRI and August minus June SSTA tendencies are calculated [ Fig. 2(a), Table 1]. The correlation pattern shows some regions being significantly correlated with the SLRI. When there are many potential predictors while their physical relationships with the predict and are not well defined, a

Figs. 3(a&b).
Seasonal rainfall predictions are made using the multiple regression models at (a) 1-month and (b) 2-month lead times. The observation, i.e., SLRI is shown in black. The three-year-outcross-validated prediction is shown in blue. The cross validation is performed by taking 3-year out around the predicted year. Independent prediction for the validation period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012) is shown in red. The model for the independent prediction is built using the data of the training period  few of them should be selected based on statistical methods to avoid collinearity (Sahai et al., 2003). This procedure begins with the construction of simple linear regression models for each potential predictor variable. The predictor field, which has the maximum correlation coefficient with the minimum root mean square error (RMSE), is selected as the first predictor field (Sahai et al., 2003;Del Sole and Shukla, 2002). Based on this condition, an area over the southern Atlantic SSTA tendency (SA) is selected as the first predictor field [box near the date line in Fig. 2(a)]. The southern Atlantic SSTA tendency field is indexed through, first, area averaging and, then, normalization. The index will be referred to SA hereafter (Table 1). The first partial regression coefficient is obtained by regressing the SA against the SLRI and the value is 0.292 [Eqn. (1)].
The second predictor is obtained via the correlation coefficients between the SLRI and the residual SSTA field, where the residual is defined by linearly removing the SA from the SSTA field. Through this procedure, the southern Pacific area [box near 140° W in Fig. 2(a)] is chosen as the second predictor field. Similarly, for the SA, the southern Pacific SSTA tendency (hereafter SP, Table 1) is area averaged and normalized, before it is regressed against the SLRI to obtain the second partial  Table 1). Note that the three predictors are constructed to ensure independence from each other and hence their inter-correlations between the predictors are negligible.
The same procedure is repeated for the 2-month lead time, where the SSTA tendency of the 2-month lead time is defined as July minus May. We note that despite the different lead times, significant correlations of the SSTA tendencies with SLRI are found over similar regions [Figs. 2(b)]. The exact locations of the three predictors for each lead time are listed in Table 1. We also note that all the variables are normalized so that the regression coefficients in the statistical model represent the relative weighting among the predictors (Lee and Seo, 2013;Kim et al., 2017).
The correlation coefficients between the SLRI and the predictors are summarized in Table 2. The correlation coefficients between the SLRI and normalized SA index are 0.32 and 0.37 at the 1-and 2-month lead times, respectively. After the SA signal is linearly removed from the SLRI, the correlation coefficients between the residual time series of the SLRI and the normalized SP index are -0.47 and -0.40 at the 1-and 2-month lead times, respectively. Lastly, we linearly remove the SP from the residual time series and compute its correlation coefficients with normalized WP & MC index, which are -0.40 and -0.40 at the 1-and 2-month lead times, respectively. The values that exceed the 95% confidence level are marked by asterisks (Table 2).

Prediction skills of PLSR forecasts
Having established the three predictors for each 1and 2-month lead time, multiple linear regression models are constructed. First, the model is built for the training period, i.e., 1979-1999. The equation (1) The regression models for the 2-month lead time is formulated as (2): Using the Eqns. (1&2), we perform the seasonal rainfall forecasts for the validation period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012) at the two lead times (red lines in Fig. 3). The observed SLRI is shown in black lines in Fig. 3. Temporal correlation coefficients are computed between the SLRI and the prediction models and their statistical significance are examined (Table 3). At the 1-month lead time, the correlation coefficient reaches 0.69 for the training period and it is 0.65 for the validation period. Similarly, at the 2-month lead time the correlation Figs. 6(a-f). Same as Fig. 4, but the regressed OND 850-hPa velocity potential anomalies (10 6 m 2 s -1 ) are shown coefficient is 0.68 for the training period and is 0.71 for the validation period.
To further verify the predictive capability of the statistical models, the cross-validation method with three years out scheme is used. That is, as explained earlier, the model is constructed for each year using the entire time series except for the three years centered at the year where the prediction is made. As a result, the predicted time series (blue lines in Fig. 3) are correlated with the SLRI by values of 0.69 and 0.68 at the 1-and 2-month lead times, respectively. These values are very similar to those that are obtained from Eqns. (1&2). In Fig. 3, one can also notice that the blue and red lines are overlapped by each other.

Processes associated with the predictors
In this subsection, we attempt to understand the physical linkage between the predictors of the Sri Lankan OND rainfall. First to verify the large-scale precipitation pattern associated with the precipitation anomalies over Sri Lanka, we compute the regressed precipitation anomalies against the SA, reversed SP and reversed WP & MC indices for the two lead times (Fig. 4). The signs of Negative anomalies are seen at the same time over the Maritime Continent and the Bay of Bengal. This suggests that a dipole-like precipitation pattern, which is influenced by the SSTs over the selected regions, plays an important role on Sri Lankan OND rainfall. Therefore, the physical linkage can be examined by investigating the processes that induce the large-scale dipole pattern of precipitation over the Indian Ocean.
Regressed OND SSTAs against the reversed WP & MC index similarly illustrates the dipole pattern with the warm anomalies over Indian Ocean and cold anomalies over the Maritime Continent and western Pacific [ Fig. 5(a)]. The SSTAs retain their structure throughout the lags, contributing to a continuous warming and cooling at the corresponding locations [ Fig. 5(b)]. The widespread warm SSTA scan act as a forcing that favors deep convection over Indian Ocean. Also, the SST gradient between warm north Indian Ocean and cool western Pacific induces easterly winds over the Bay of Bengal with favor moisture transport toward Sri Lanka. The positive velocity potential anomalies, which are also obtained through the regression analysis against the reversed WP & MC index, represent large-scale lower level convergence over the western Indian Ocean when the SSTs over the Maritime Continent and western Pacific decreases [top row in Figs. 6(a&b)]. The velocity potential  shows a baroclinic structure with a negative anomaly at the upper troposphere (not shown), indicating the upper level divergence. Over the Maritime Continent and western Pacific, the sign for the velocity potential is reversed, exhibiting the dipole structure that has previously been shown in the large-scale precipitation pattern (Fig. 4), as well as the SSTAs (Fig. 5). The strong lower-level convergence anomaly over the western Indian Ocean and the strong lower-level divergence anomaly over the western Pacific are consistent with the strong positive and negative SSTAs over the respective regions.
Despite its smaller amplitude compare to that for the WP & MC, the regressed OND SSTAs against the SA and reversed SP indices show positive anomalies over Indian Ocean [Figs. 7&8(a&b)]. Similarly, over the Maritime Continent and western Pacific, negative anomalies can be seen, which forms again the dipole-like structure. A positive anomaly can be seen over the western Indian Ocean in the regressed field of the OND 850 hPa velocity potential against the reversed SP index [Figs. 6(c&d)]. As was for the WP & MC, this low-level large-scale convergence is vertically associated with an upper-level divergence (not shown) and horizontally with a low-level positive velocity potential anomaly over the western Pacific and Maritime Continent. The similar features are observed for the SA index [Figs. 6 (e&f)].

Summary and discussion
Sri Lanka receives the highest amount of rainfall during October to December (OND). Our study aims to build an empirical model to predict the Sri Lankan OND rainfall. Through the partial least square regression approach, three predictors of sea surface temperature anomaly tendency fields are over the western Pacific and Maritime Continent, southern Atlantic and southern Pacific. Using the predictors, multiple linear regression models have forecasted the Sri Lankan OND rainfall at 1and 2-month lead times. The three year out cross validated prediction skill for the period of 1979-2012 reaches 0.69 and 0.68 at the 1-and 2-month lead times respectively. Similar skills can be identified by dividing the entire 34 years into the 21-year calibration periods  and the 13-year verification period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012).
The physical processes associated with the reversed WP & MC, reversed SP and SA predictors show positive (negative) SSTAs over the western Indian (western Pacific) Ocean with a dipole structure at all three lead times. Low-level convergence and divergence in the warm western Indian Ocean and the cool western Pacific, respectively, tend to induce cyclonic circulations to the east of Sri Lanka and to enhance easterlies in Bay of Bengal, which are likely to increase precipitation in Sri Lanka. These results indicate that all three predictors have a crucial impact on the Sri Lankan OND rainfall.
In this study, the spatial characteristics of Sri Lankan rainfall is not considered when building a model. Although the seasonal prediction model relies on largescale variability of climate, the local orography of Sri Lanka and climatological circulation makes the south western sector receive the highest rainfall. The annual rainfall of this region exceeds 2500 mm, which separates it from the other regions by 500-1000 mm per year. We plan to use higher resolution data to regionally evaluate our model's performance.
Our study develops a statistical prediction model of seasonally averaged precipitation for a targeted domain, Sri Lanka. As explained above, the predictors of the model are chosen by investigating linear correlations with the predictand. This approach has taken in many previous studies. However, one may instead pursue to find a source of predictability from well-established tropical climate modes, such as the El Niño Southern Oscillation (ENSO) and the Indian Ocean Dipole (IOD). To explore possibility of this alternative method, we have checked the correlations between the OND Sri Lankan rainfall and the ENSO and IOD indices ( Table 4). The correlation with the IOD is 0.39, which exceeds the 95% confidence level. This implies that the IOD is closely related to the Sri Lankan rainfall variability, which seems to be implicated in the effect of the WP & MC index of the present study. The correlations with the Niño3.4 and Niño3.0 indices are similarly above 0.3. Therefore, both of the climate modes may provide good source of skill for seasonal Sri Lankan rainfall. To do so, however, we feel that one needs to carefully consider the seasonality of the climate modes, as well as their interconnections.