Assessing the performance of multi-sources gridded data to estimate long-term rainfall change over north-central region of India

As the quality and availability of station data are not adequate to reliably estimate observed climate change over many parts of the country, multi-source observational gridded datasets have been employed in the present study. The performances of several observational gridded datasets (IMD gridded data, CRU, APHRODITE, GPCC and NCAR/NCEP reanalysis) have been compared with the reference rainfall data from IITM over North Central India (NCI), a region of subtropical monsoon climate, for the four main seasons (MAM, JJAS, ON and DJF) as well as at annual scale for the period 1951-2003. All the gridded data except CRU and NCEP have secured good skill scores in all seasons as well as at annual scale. APHRODITE and NCEP reanalysis have shown large dry bias in all seasons. The reference rainfall data over NCI have shown increases of 6.3 mm, 4.2 mm, 1.9 mm and 11.2 mm per decade for the MAM, JJAS and DJF seasons and for annual rainfall, respectively, whereas a decrease of 2.2 mm per decade has been found for the ON season. Only the GPCC dataset has been able to capture a similar trend for all seasons. The performance of NCEP reanalysis has been worse than that of the other datasets. GPCC and IMD high resolution data have shown the smallest bias among all the datasets and have also obtained higher skill scores than the others. Therefore, based on visual inspection and the results from different conventional measures, GPCC high resolution gridded data and high resolution IMD gridded data may be reliably used for climatic analysis of this region.


Introduction
Major parts of the North central region of India are rainfed, and agricultural production depends significantly on monsoon as well as annual rainfall, both of which are highly erratic in nature. Thus, in the context of climate change, it is necessary to assess the amount of past rainfall change using multi-source observational data in order to understand the impact of climate change on agricultural production. Gridded representations of observed data based on a variety of instruments, locations, platforms, retrieval algorithms and analysis schemes are widely employed in climate research with various goals (Schneider et al., 2014). Typically, only a limited number of such datasets have been available, and most climate studies have employed a single dataset that includes the features needed for their analyses. Gridded observational datasets are one of the key ingredients of climatic research, both for the assessment of climate change over a region and for the validation of global climate model (GCM) and regional climate model (RCM) outputs for that region. We have to depend heavily on these datasets as real observations are unavailable or inadequate in many cases (Kannan et al., 2014). Lack of accurate observations may also be a principal barrier to climate model improvement (Collins et al., 2013). Many researchers and institutions have introduced newly developed observation-based gridded analysis datasets of global or regional coverage with fine spatial resolutions (Adler et al., 2003). The analysis methodology used in producing these gridded datasets, such as the quality control of input data, spatial/temporal interpolation and retrieval algorithms, plays a crucial role in determining the characteristics of the precipitation climatology represented by individual datasets (Kim et al., 2015).
[Fig. 1: adopted from Sontakke et al. (2008); North Central India (NCI) has been shaded with gray]
As different datasets are created from different sources of observation, the number of stations used is not uniform, and the methods of analysis also differ, the estimated amount of rainfall change over the North-central region of India (NCI zone) may vary from one dataset to another. Therefore, to obtain a comprehensive idea about the effective use of different gridded datasets, the present study was carried out with the following major objectives:
(i) Assessment of the skills of different sources of gridded analysis data in capturing the observed rainfall over the NCI zone using conventional statistical measures.
(ii) Identification of the most reliable gridded dataset over this zone.

Study area
The present study has been carried out over North-Central India (NCI), which is one of the seven homogeneous zones identified by Sontakke et al. (2008). NCI is the portion of the country extending from the latitude of 21° N and lying between 80° and 88° E, covering an area of 5,99,860 km² (Fig. 1).

Data
Two types of datasets have been used in this study. Firstly, area-averaged data of the NCI zone, available at the Data Archival of the IITM website (http://www.tropmet.res.in/), have been used as reference data for evaluating the performance of the different gridded data sources. The longest possible instrumental area-averaged monthly, seasonal and annual rainfall series of the NCI zone have been developed by Sontakke et al. (2008) using highly quality-controlled data from a well-spread network of 65 rain gauge stations. The stations used in these data are listed in Table 1.

Methodology
Before evaluation, all gridded datasets have been interpolated to the station locations shown in Table 1 using the bilinear interpolation technique, and area-averaged data have been prepared as the simple arithmetic mean of the interpolated data over all stations, similar to Sontakke et al. (2008). The bilinear method uses a minimum of 4 nearest grid points from the domain and nearby areas to interpolate to each station/point location, as recommended by Das et al. (2012). To evaluate the performance of the gridded data, a visual comparison has first been made by plotting the time series of the gridded data and the reference data. Next, the skills of the gridded data have been evaluated using two skill scores following Taylor (2001). Firstly, skill score-1 (SS1) has been defined as,
$$SS1 = \frac{4(1 + R)}{\left(SDR + \frac{1}{SDR}\right)^{2}(1 + R_0)} \qquad (1)$$
and secondly, skill score-2 (SS2) has been defined as,
$$SS2 = \frac{4(1 + R)^{4}}{\left(SDR + \frac{1}{SDR}\right)^{2}(1 + R_0)^{4}} \qquad (2)$$
where R is the correlation coefficient between the two data series, SDR is the ratio of their standard deviations, and R_0 is the maximum attainable correlation. We assumed that R_0 = 1, as each gridded dataset contains a single simulation.
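The interpolation and area-averaging step described above can be illustrated with a minimal Python sketch. It assumes a regular latitude-longitude grid with ascending coordinates and uses exactly the four grid points surrounding each station; the function names bilinear and area_average and the toy grid are illustrative assumptions, not the software actually used in the study.

```python
from bisect import bisect_right


def bilinear(lons, lats, grid, lon, lat):
    """Bilinearly interpolate grid[j][i] (lat index j, lon index i) to (lon, lat).

    lons and lats must be ascending and the point must lie inside the grid.
    """
    if not (lons[0] <= lon <= lons[-1] and lats[0] <= lat <= lats[-1]):
        raise ValueError("point lies outside the grid")
    # Index of the lower-left corner of the enclosing cell,
    # clamped so that i + 1 and j + 1 are always valid indices.
    i = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    j = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    # Fractional position of the point inside the cell (0 to 1).
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    # Weighted mean of the four surrounding grid points.
    return ((1 - tx) * (1 - ty) * grid[j][i]
            + tx * (1 - ty) * grid[j][i + 1]
            + (1 - tx) * ty * grid[j + 1][i]
            + tx * ty * grid[j + 1][i + 1])


def area_average(lons, lats, grid, stations):
    """Simple arithmetic mean of the values interpolated to each (lon, lat) station."""
    values = [bilinear(lons, lats, grid, lon, lat) for lon, lat in stations]
    return sum(values) / len(values)


# Toy 2 x 2 grid (hypothetical rainfall values in mm) and two hypothetical stations.
toy_lons = [80.0, 81.0]
toy_lats = [21.0, 22.0]
toy_grid = [[0.0, 10.0],
            [20.0, 30.0]]
toy_mean = area_average(toy_lons, toy_lats, toy_grid, [(80.5, 21.5), (81.0, 22.0)])
```

Applying area_average to each time step of a gridded dataset would give an area-averaged series comparable in construction to the reference series.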
The performances of the gridded data have also been assessed by comparing linear trends and percentage bias. Linear trends have been fitted using linear regression.
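Fitting a linear trend by ordinary least squares and expressing it in mm per decade can be sketched as follows. This is a minimal illustration in Python; the function name trend_per_decade and the synthetic series are assumptions made for the example, and the significance testing reported in the results is not reproduced here.

```python
def trend_per_decade(years, rainfall):
    """Least-squares slope of rainfall against year, scaled to units per decade."""
    n = len(years)
    if n != len(rainfall) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(years) / n
    mean_y = sum(rainfall) / n
    # Sum of squares of x and sum of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, rainfall))
    slope_per_year = sxy / sxx
    return 10.0 * slope_per_year


# Synthetic series for 1951-2003 that rises by exactly 0.63 mm per year.
example_years = list(range(1951, 2004))
example_rain = [100.0 + 0.63 * (y - 1951) for y in example_years]
example_trend = trend_per_decade(example_years, example_rain)
```

For the synthetic series the function returns a trend of about 6.3 mm/decade, since the series was built with a slope of 0.63 mm/year.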

[Table: gridded datasets used in the study; columns: Datasets, Source, Resolution]
Percentage bias (PB) has been calculated following equation (3), in which m_i and o_i indicate the i-th value of each data series. The optimal value of PB is 0.0, with low-magnitude values indicating better performance.
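A percentage bias of this kind can be sketched as below. The formula itself is not legible in this text, so the sketch assumes one common convention, PB = 100 × Σ(m_i − o_i) / Σ o_i, with m_i taken as the gridded values and o_i as the reference values; under this assumed sign convention a negative PB corresponds to a dry bias. The function name percent_bias is illustrative.

```python
def percent_bias(m, o):
    """Percentage bias of series m relative to reference series o.

    Assumed convention: 100 * sum(m_i - o_i) / sum(o_i),
    so negative values mean m underestimates o (dry bias).
    """
    if len(m) != len(o) or not o:
        raise ValueError("need two non-empty series of equal length")
    total_reference = sum(o)
    if total_reference == 0:
        raise ValueError("reference series sums to zero")
    return 100.0 * sum(mi - oi for mi, oi in zip(m, o)) / total_reference


# Hypothetical seasonal totals (mm): gridded values 10% below the reference.
example_pb = percent_bias([90.0, 180.0], [100.0, 200.0])
```

Some studies define the numerator with the opposite sign (o_i − m_i), in which case underestimation appears as a positive value; only the magnitude is unaffected by that choice.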

Results and discussion
In the present study, the analyses have been carried out for four seasons, i.e., pre-monsoon (MAM), monsoon (JJAS), post-monsoon (ON) and winter (DJF), as well as at annual scale during 1951-2003. The results have been discussed in the following sub-sections.

Visual comparison of time series over NCI
The ability of the different global gridded datasets to reproduce the temporal variation of annual rainfall over the study domain during 1951-2003 has been compared in Fig. 2. Visual comparison has indicated that the GPCC-H, GPCC-M and IMD high and low resolution data have reproduced the time series of the reference data more accurately over this period. APHRODITE (both low and high resolution) and NCEP data have underestimated the reference rainfall series over this zone. Therefore, APHRODITE and NCEP data may not be very reliable over the NCI zone. These visual impressions have been further verified using different statistical metrics in the subsequent sections.

Percentage bias of analysis datasets
Percentage bias as defined by equation (3) in section 2.3 has been calculated for all the gridded datasets and the results have been shown in Fig. 3. In the pre-monsoon season, all gridded data except GPCC-low have shown dry biases, among which the maximum dry bias (~40%) has been found for NCEP. During the monsoon season, APHRO high and low data along with NCEP data have shown relatively large dry biases, whereas no significant biases have been found for the other gridded datasets. Again, no significant biases have been observed during the post-monsoon season except for APHRO high and low. However, during the winter season almost all the data except GPCC-low have shown a dry bias, among which NCEP has shown the largest dry bias (~30%). Overall, at annual scale, APHRO high and low along with NCEP datasets have shown significant dry bias, whereas negligible biases have been found for the other datasets (Fig. 3).

Comparison of skill scores among various analysis datasets
Apart from checking the bias in the data, two different types of skill scores (SS1 and SS2), as defined in section 2.3, have been calculated for all gridded data for the four seasons (pre-monsoon, monsoon, post-monsoon and winter) along with the annual scale, and the results have been compared in Tables 3 and 4. During the pre-monsoon season, both the high and medium resolution GPCC data have shown the best performance, as indicated by their higher skill scores (SS1 = 0.99; SS2 = 0.98), whereas NCEP data have shown the worst performance with SS1 = 0.71 and SS2 = 0.25 (Tables 3 and 4). In addition to GPCC data, both high and low resolution gridded data from IMD have also shown high SS1 (0.97-0.98) and SS2 (0.92-0.93) values, indicating their superior skills. Therefore, IMD gridded data may be used as an efficient indigenous alternative source in addition to GPCC. Similarly, during the monsoon season IMD (high and low) and GPCC (low, medium and high) have shown higher skills, whereas NCEP and CRU data have obtained lower skill scores (Tables 3 and 4). In the post-monsoon season, all the gridded data except NCEP (SS1 = 0.84; SS2 = 0.62) and CRU (SS1 = 0.93; SS2 = 0.80) performed consistently well, as the estimated SS1 and SS2 have been above 0.93 and 0.91, respectively. Similar results have also been found for the dry winter season. Finally, based on the overall analysis of skill scores for all seasons and the annual scale, all gridded data have performed satisfactorily except NCEP and CRU.
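The two skill scores compared in this section can be sketched in Python as below. The sketch assumes the forms given by Taylor (2001), SS1 = 4(1 + R) / [(SDR + 1/SDR)² (1 + R0)] and SS2 = 4(1 + R)⁴ / [(SDR + 1/SDR)² (1 + R0)⁴], with SDR taken as the standard deviation of the gridded series divided by that of the reference series and R0 = 1. The function name taylor_skill_scores and the sample values are illustrative assumptions.

```python
import math


def taylor_skill_scores(gridded, reference, r0=1.0):
    """Return (SS1, SS2) for a gridded series against a reference series."""
    n = len(gridded)
    if n != len(reference) or n < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_g = sum(gridded) / n
    mean_r = sum(reference) / n
    # Population standard deviations of both series.
    sd_g = math.sqrt(sum((g - mean_g) ** 2 for g in gridded) / n)
    sd_r = math.sqrt(sum((o - mean_r) ** 2 for o in reference) / n)
    if sd_g == 0 or sd_r == 0:
        raise ValueError("both series must have non-zero variance")
    # Pearson correlation coefficient R and standard deviation ratio SDR.
    cov = sum((g - mean_g) * (o - mean_r) for g, o in zip(gridded, reference)) / n
    r = cov / (sd_g * sd_r)
    sdr = sd_g / sd_r
    spread = (sdr + 1.0 / sdr) ** 2
    ss1 = 4.0 * (1.0 + r) / (spread * (1.0 + r0))
    ss2 = 4.0 * (1.0 + r) ** 4 / (spread * (1.0 + r0) ** 4)
    return ss1, ss2


# Hypothetical reference series and a gridded series with doubled variability.
sample_reference = [100.0, 120.0, 90.0, 110.0, 130.0]
sample_gridded = [2.0 * v for v in sample_reference]
sample_scores = taylor_skill_scores(sample_gridded, sample_reference)
```

Because the correlation enters SS2 through a fourth power, SS2 penalises low correlation more heavily than SS1, which is consistent with the NCEP pre-monsoon values quoted above (SS1 = 0.71 but SS2 = 0.25).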

Linear trend analysis over NCI
Linear trend analysis has revealed that the reference dataset has shown a significant (at 10% level) increasing trend during the pre-monsoon season at the rate of 6.30 mm/decade. APHRO high and low and all three GPCC datasets have been able to reproduce a trend magnitude close to that of the reference data (Table 5). However, APHRO high and low have shown a significant increasing trend at 5% level. All other datasets have shown a non-significant increasing trend except NCEP, which has shown a declining trend (~3.37 mm/decade). During the wet monsoon season, IMD-low and all three types of GPCC data showed a non-significant increasing trend similar to that of the reference data (4.18 mm/decade). It is noteworthy that an abrupt, significant declining trend has been observed for the CRU gridded data (30.5 mm/year), which closely resembles the findings of Meena et al. (2015). Some observational studies using station data have also revealed a decreasing trend during the monsoon over some pockets of this zone (Kishore et al., 2016; Guhathakurta et al., 2015; Guhathakurta and Rajeevan, 2008). However, Dubey and Krishnakumar (2014) have not found any significant trend in seasonal monsoon rainfall over central India. During the post monsoon season, however all the gridded datasets [...]
On the basis of visual comparison, skill scores and trend analysis, it has been found that GPCC data have been the most reliable among the different datasets. On the other hand, CRU and NCEP data have emerged as less skilful datasets. Our findings are in line with Satya Prakash et al. (2014, 2015), who have put more confidence in the GPCC dataset over the Indian domain as well as over the global land region. They have also found poor performance of the NCEP dataset compared to the others. Satya Prakash et al. (2014) have reported that APHRODITE data have skill similar to that of the GPCC dataset over the Indian domain, but the present study has indicated large biases in APHRODITE data for various seasons over NCI.

Conclusions
In the present study, different types of gridded datasets have been evaluated with respect to the reference data of the NCI zone through visual comparison and various statistical measures. The findings of the study are as follows:
(i) Visual impressions have put more confidence in GPCC-H, GPCC-M and IMD high and low resolution data and less confidence in APHRODITE and NCEP data.
(ii) All gridded data except CRU and NCEP have obtained good skill scores in all seasons as well as at annual scale.
(iii) NCEP and APHRODITE data have shown large biases in all cases. GPCC high and medium resolution data have shown the least biases.
(iv) All three versions of GPCC and IMD-low resolution data have consistently captured the observed trend in all seasons as well as at annual scale. CRU data have shown unrealistic trend values for JJAS and the annual scale.
Overall, it has been found that NCEP data have been the least reliable in representing observed rainfall over the NCI region. GPCC data, especially the high and medium resolution data, have been found to be more reliable for this region in every aspect tested in this study. Resolution-wise, IMD high-resolution and APHRODITE data have the highest resolution among all; IMD high has performed well except in capturing observed trends, but the performance of APHRODITE high has not been satisfactory. Therefore, for high-resolution rainfall analysis, IMD high-resolution gridded data may be recommended.
The contents and views expressed in this research paper/article are the views of the authors and do not necessarily reflect the views of the organizations they belong to.