Choice of ideal sunshine hour based model to predict global solar radiation in India

Abstract: Solar radiation is the principal energy source for most energy conversion systems, whether biological or mechanical. It is also the most fundamental energy source for meeting future energy demand. Like most developing countries, India lacks adequate instrument facilities to measure global solar radiation (GSR) at the recommended spatial intervals, so alternative methods must be used to obtain GSR data. In this paper, six well-known empirical models were tested for estimating GSR at twelve major cities of India using long-term global solar radiation and bright sunshine hour data. Empirical coefficients were calculated for all the models and every location by regression analysis. Daily GSR was then calculated from these regression constants and evaluated statistically. The results show that all the models give close estimates, with low mean bias error (MBE), root mean square error (RMSE) and mean percentage error (MPE) values. Among all the models, the linear exponential and linear logarithmic models are highly recommended for predicting GSR across the country. The exception is Shillong, where the Bakirci exponent model is recommended. A significance test, namely the t-test, also confirms that these two models give the most significant results compared with the others.


Introduction
Given global warming, high pollution levels and depleting conventional energy sources, more emphasis should be placed on using renewable energy sources, especially in developing countries. Scientists and researchers around the world regard solar energy as a key renewable energy source for the future (Ulgen and Hepbasli, 2004). Solar radiation (SR), the electromagnetic radiation emitted by the sun, is the direct form of the abundant and permanent solar energy resource available on earth. At every moment, about one hundred thousand terawatts (TW) of solar power reaches the earth's surface. The resource is so large that harvesting 71 minutes of un-attenuated solar radiation would satisfy the earth's total energy demand for a whole year (Gadiwala et al., 2013). The amount of solar energy received at a particular place on earth is governed by attenuation due to clouds, water vapour, and pollutants (including aerosols and other particulate matter) present in the troposphere (Schiermeier et al., 2008).
Solar engineers, architects, meteorologists, agriculturists and hydrologists need reasonably accurate knowledge of the solar resource available at a geographical location. They use this knowledge to design solar energy systems and for research in meteorology, agronomy, soil physics, etc. (Wan et al., 2008; Moradi, 2009; Pandey and Katiyar, 2009; Benghanem and Mellit, 2010). According to Allen et al. (1998), SR is indispensable for photosynthesis and evapotranspiration and is thus a mandatory input for crop growth simulation models. The best way to gather information about the global solar radiation (GSR) of a region is to install instruments with monitoring facilities, such as pyranometers and pyrheliometers, at suitable spatial intervals. A pyranometer records global solar radiation, and it records diffuse radiation when the direct beam is shaded. A pyrheliometer, by contrast, measures only the direct beam solar irradiance. To keep a pyrheliometer pointed at the sun, a solar tracker rotates it about two axes: the zenith (up-down) axis and the azimuth (east-west) axis. However, these instruments are costly and require regular monitoring and maintenance (Teke and Başak Yildirim, 2014). Researchers across the world are therefore looking for alternative approaches that correlate GSR with other, more frequently measured meteorological parameters. India is a developing country where energy shortages and high demand are prime concerns, so scientists need to harness solar energy to address energy-related issues. Located in the tropics, India has adequate solar potential to support its national energy demand and to provide electricity to rural areas. With growing interest in solar energy applications, the Government of India has set a goal of 100 GW of solar capacity by 2022 (MNRE, 2017; NITI Aayog, 2017). However, very few meteorological stations in the country measure GSR. The India Meteorological Department (IMD), a Government of India organisation, is the principal authority for measuring meteorological data in the country.
The state of West Bengal covers 88,750 km², yet IMD measures GSR at only one location in the whole state. In such situations, scientists have to depend on predictive models that estimate GSR from other meteorological parameters (Hay, 1979; Supit and Van Kappel, 1998; Dorvlo and Ampratwum, 2000; Falayi et al., 2008). Some researchers used sunshine duration (Suehrcke, 2000; Akinoglu, 2008; Salima and Chavula, 2012; Umoh et al., 2014), and others used relative humidity and temperature (Fagbenle and Karayiannis, 1994). A few used the number of rainy days, sunshine hours and a factor that depends on latitude and altitude (Skeiker, 2006; Chiemeka, 2008). According to the World Meteorological Organisation (2003), sunshine duration during a given period is defined as the sum of the sub-periods during which the direct solar irradiance exceeds 120 W m⁻². For climatological purposes, derived terms such as "hours per day" or "daily sunshine hours" are used. Meteorological observatories generally use the simple Campbell-Stokes sunshine recorder. This instrument registers sunshine when the solar beam, concentrated by a special lens, burns a trace on a special dark paper card. Nowadays, automatic weather stations use automated measurement procedures instead. These avoid the expense of visual evaluation and record more precise results on data carriers that permit direct computerised processing. Several studies confirm that SR calculated from sunshine duration is precise enough for the derived data to be used safely for various purposes, including agricultural and hydrological studies (Trnka et al., 2005; Sahin, 2007; Akpabio and Etuk, 2003; Li et al., 2011a; Iziomon and Mayer, 2002; Podesta et al., 2004).
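The WMO definition of sunshine duration translates directly into counting sampling intervals. Below is a minimal sketch that assumes equally spaced direct-irradiance readings from an automatic station. Each reading is treated as representative of its whole interval, which is itself an approximation. The function and constant names are hypothetical.

```python
WMO_THRESHOLD = 120.0  # W m^-2, direct solar irradiance threshold (WMO, 2003)


def sunshine_hours(direct_irradiance, interval_minutes):
    """Sum the sampling intervals whose direct irradiance exceeds the WMO threshold.

    direct_irradiance: equally spaced readings in W m^-2 (hypothetical logger output).
    interval_minutes: time between consecutive readings.
    """
    # "Exceeds" is strict: a reading of exactly 120 W m^-2 does not count as sunshine.
    sunny_intervals = sum(1 for g in direct_irradiance if g > WMO_THRESHOLD)
    return sunny_intervals * interval_minutes / 60.0


# Eight 10-minute readings, four of which exceed the threshold -> 40 minutes of sunshine.
print(sunshine_hours([0, 50, 130, 400, 800, 119, 121, 0], 10))
```

Summing such daily values over a month and dividing by the number of days gives the monthly average daily sunshine duration, n, used by the regression models.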
Against this background, the present study has two main objectives. The first is to test six well-known regression models for their ability to predict GSR from sunshine data in India. The second is to identify the best-fitting model for Indian cities by comparing the models with statistical indicators.

Location description and collection of meteorological data
India is a vast country covering 3,287,263 km², and its varied topography gives rise to a wide range of climates. To fulfil our research goal, twelve major cities from different corners of India were selected, namely Kolkata, Chennai, Visakhapatnam, Thiruvananthapuram, Hyderabad, Pune, Nagpur, Ahmedabad, Jodhpur, Dehradun, Varanasi and Shillong. The geographical positions of these twelve locations are shown on the map of India in Fig. 1, and their climatic characteristics are presented in Table 1. Daily meteorological data (including sunshine hours, GSR, etc.) were collected from IMD; the data availability periods are also given in Table 1. This set of weather data was used for testing and evaluating the models. The number and distribution of data periods differ among the cities because weather data were unavailable or missing. Missing data were handled by excluding from the calculations any month in which data for more than 5 days were missing.

Regression models used to calculate GSR
Angstrom (1924), one of the pioneers of model development, proposed the first correlation to predict daily global irradiation from sunshine hours. The equation relates the monthly average daily irradiation at a given location to the clear-day irradiation and the average fraction of possible sunshine hours. The original Angstrom equation is:

$$\frac{H}{H_c} = a + b\,\frac{n}{N}$$

In this equation, $H$ is the monthly average daily global radiation and $H_c$ is the monthly average daily clear-day radiation. $n$ is the monthly average daily sunshine duration, and $N$ is the monthly average maximum possible daily sunshine duration (day length). $a$ and $b$ are empirical constants. The basic difficulty with this equation lies in defining the terms $n/N$ and $H_c$. Prescott (1940) and later Page (1961) modified the equation into its current form by replacing the clear-day radiation ($H_c$) with the monthly average daily extraterrestrial radiation ($H_0$). This equation is known as the Angstrom-Prescott (A-P) model:

$$\frac{H}{H_0} = a + b\,\frac{n}{N}$$

Since its development, researchers worldwide have tried to improve the accuracy of the A-P model, though in a piecemeal fashion (Bahel et al., 1987; Akinoglu and Ecevit, 1990; Samuel, 1991; Katiyar and Pandey, 2010; Li et al., 2011b; Muzathik et al., 2011; Behrang et al., 2011). Others took a different approach and introduced new terms that proved more effective in producing the right coefficients. These modified equations gained worldwide acceptance because they predict GSR closely. A review of the literature shows that most of the models are based on two quantities: the monthly average daily sunshine duration and the monthly average maximum possible daily sunshine duration. Newland (1988) proposed a linear logarithmic model, while Ampratwum and Dorvlo (1999) used a logarithmic model. Later, Almorox and Hontoria (2004) suggested an exponential regression model, which Bakirci (2009) modified into a linear exponential form. Bakirci also proposed a new exponent model that is very effective for GSR calculation. These six well-established models were selected for the present study. All of them, including the A-P model, are listed in Table 2, and each is given a model number for easy identification.
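The A-P model and the other models in Table 2 all need the extraterrestrial radiation $H_0$ and the maximum possible sunshine duration $N$ for each site and period. The text does not state how these were derived. The sketch below assumes the FAO-56 equations of Allen et al. (1998): solar constant 0.0820 MJ m⁻² min⁻¹, inverse relative Earth-Sun distance $d_r$, declination $\delta$ and sunset hour angle $\omega_s$. The paper may have used a different formulation or a different way of forming monthly means. The function names are hypothetical.

```python
import math

GSC = 0.0820  # solar constant in MJ m^-2 min^-1, as in FAO-56 (assumed formulation)


def h0_and_daylength(lat_deg, doy):
    """Return (H0 in MJ m^-2 day^-1, N in hours) for a latitude and day of year."""
    phi = math.radians(lat_deg)
    angle = 2 * math.pi * doy / 365
    dr = 1 + 0.033 * math.cos(angle)        # inverse relative Earth-Sun distance
    delta = 0.409 * math.sin(angle - 1.39)   # solar declination (rad)
    # Sunset hour angle; the clamp only matters for polar day/night, not for India.
    ws = math.acos(max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta))))
    h0 = (24 * 60 / math.pi) * GSC * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )
    day_length = 24 / math.pi * ws           # maximum possible sunshine duration N
    return h0, day_length


def monthly_mean(lat_deg, days):
    """Average H0 and N over a sequence of day-of-year values (e.g. one month)."""
    pairs = [h0_and_daylength(lat_deg, d) for d in days]
    h0_mean = sum(p[0] for p in pairs) / len(pairs)
    n_mean = sum(p[1] for p in pairs) / len(pairs)
    return h0_mean, n_mean


# January at roughly Kolkata's latitude (22.57 deg N, approximate).
h0_jan, n_max_jan = monthly_mean(22.57, range(1, 32))
print(round(h0_jan, 2), round(n_max_jan, 2))
```

The measured monthly means then give the regression variables directly: $H/H_0$ from `h0_jan` and $n/N$ from `n_max_jan`.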

Comparison techniques
The present research work aimed to identify the best regression model for twelve major cities of India. The regression constants for the different models and locations were calculated by statistical regression on the available data series. The correlation coefficient (r) tests the linear relationship between predicted and measured values; it was calculated along with the coefficient of determination (R²). To assess modelling accuracy, the mean percentage error (MPE), mean bias error (MBE) and root mean square error (RMSE) were also calculated (Tadros, 2000; Sabziparvar and Shetaee, 2007; Banerjee et al., 2016; Menges et al., 2006). The closer MBE, MPE and RMSE are to zero, and the closer r or R² is to one, the better the model predicts the target values (Muzathik et al., 2011; Menges et al., 2006; Martínez-Lozano et al., 1984; Khorasanizadeh and Mohammadi, 2013). The Nash-Sutcliffe efficiency (NSE) is a simple measure of model precision. It indicates how well the plot of observed versus simulated values fits the 1:1 line (Nash and Sutcliffe, 1970; Chen et al., 2004; Akpootu and Sanusi, 2015). NSE ranges from -∞ to 1.0, and the closer NSE is to 1.0, the more efficient the model. Values between 0.0 and 1.0 are generally viewed as acceptable levels of performance, whereas negative values indicate unacceptable predictions. The t-statistic was also computed to determine the statistical significance of the models. Detailed information on all of these equations and the other indicators is presented in Table 3. The research outline is briefly presented in Fig. 2.
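The indicators listed above can be computed together from paired monthly values. Table 3 is not reproduced here, so the sketch below assumes common conventions:

- MBE and RMSE use predicted minus measured.
- MPE and MAPE are taken relative to the measured values, with a sign convention that varies between authors.
- NSE follows Nash and Sutcliffe (1970).
- The t-statistic uses the form widely used in solar radiation studies, $t = \sqrt{(n-1)\,\mathrm{MBE}^2 / (\mathrm{RMSE}^2 - \mathrm{MBE}^2)}$.

```python
import math


def indicators(measured, predicted):
    """Error and agreement statistics for paired measured/predicted GSR values."""
    n = len(measured)
    diffs = [p - m for m, p in zip(measured, predicted)]  # predicted minus measured
    mbe = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Percentage errors relative to measured values (sign convention assumed).
    mpe = 100 * sum((m - p) / m for m, p in zip(measured, predicted)) / n
    mape = 100 * sum(abs(m - p) / m for m, p in zip(measured, predicted)) / n
    mean_m, mean_p = sum(measured) / n, sum(predicted) / n
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_mp = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    r = s_mp / math.sqrt(s_mm * s_pp)                # Pearson correlation coefficient
    nse = 1 - sum(d * d for d in diffs) / s_mm       # Nash-Sutcliffe efficiency
    spread = rmse ** 2 - mbe ** 2                    # variance of the errors
    if spread > 0:
        t = math.sqrt((n - 1) * mbe ** 2 / spread)
    else:
        t = 0.0 if mbe == 0 else math.inf            # all errors identical
    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "MPE": mpe,
            "MAPE": mape, "r": r, "NSE": nse, "t": t}


print(indicators([10.0, 20.0, 30.0], [11.0, 19.0, 32.0]))
```

With twelve monthly means per city, n = 12, and the t value is compared with the critical value for n - 1 = 11 degrees of freedom.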

Generation of empirical constants
The monthly average values of $H/H_0$ and $n/N$ for six selected cities are presented in Fig. 3. The scatter plots show a good relationship between $H/H_0$ and $n/N$. The values of $H/H_0$ also indicate the abundance of solar energy available in the study areas. Regression analysis was carried out for all six models at each selected city using the collected data series. The resulting empirical coefficients are summarised in Table 4, together with the coefficient of determination (R²) and the correlation coefficient (r). Depending on location, the empirical coefficients a and b of the A-P correlation ranged from 0.2343 to 0.3932 and from 0.1887 to 0.4360, respectively. Angstrom (1924) recommended values of 0.25 and 0.75 for a and b, respectively, based on data from Stockholm. Martínez-Lozano et al. (1984) reviewed the literature for 101 locations around the world and reported that a and b may vary from 0.06 to 0.4 and from 0.19 to 0.87, respectively. Katiyar and Pandey (2010) reported ranges of 0.2229 to 0.2623 for a and 0.3952 to 0.5309 for b. The values of a and b obtained in the present study are therefore broadly consistent with those reported by other researchers. For the models other than A-P, however, the literature survey showed that the values of a, b and the other coefficients are not well established.
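The regression step can be sketched as ordinary least squares on basis functions of x = n/N, with y = H/H₀. The model forms below are an assumption. They are the forms commonly attributed to these authors, and Table 2 of the paper is the authoritative list. The logarithm base (base 10 here) only rescales the log coefficient. Model 6 (y = a·x^b) is fitted in log space, as ln y = ln a + b ln x. This is a common simplification, not a full nonlinear least-squares fit. All names are hypothetical.

```python
import math

# Assumed basis functions for models 1-5 (x = n/N, y = H/H0); model 6 is fit_exponent().
MODELS = {
    "1 Angstrom-Prescott": [lambda x: 1.0, lambda x: x],
    "2 Newland": [lambda x: 1.0, lambda x: x, math.log10],
    "3 Ampratwum-Dorvlo": [lambda x: 1.0, math.log10],
    "4 Almorox-Hontoria": [lambda x: 1.0, math.exp],
    "5 Bakirci linear exponential": [lambda x: 1.0, lambda x: x, math.exp],
}


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [row[:] + [v] for row, v in zip(matrix, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(aug[r][c] * coef[c] for c in range(r + 1, k))
        coef[r] = (aug[r][k] - tail) / aug[r][r]
    return coef


def fit_linear(basis, xs, ys):
    """Least-squares coefficients for y = sum(c_i * f_i(x)) via the normal equations."""
    rows = [[f(x) for f in basis] for x in xs]
    k = len(basis)
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def fit_exponent(xs, ys):
    """Model 6 (y = a * x**b) fitted as ln y = ln a + b ln x."""
    ln_a, b = fit_linear([lambda u: 1.0, lambda u: u],
                         [math.log(x) for x in xs], [math.log(y) for y in ys])
    return [math.exp(ln_a), b]
```

Given a city's twelve monthly pairs `xs` (n/N) and `ys` (H/H₀), `fit_linear(MODELS["1 Angstrom-Prescott"], xs, ys)` returns `[a, b]`. Models 2 and 5 have nearly collinear bases over the usual range of n/N, so a QR-based solver such as `numpy.linalg.lstsq` would be more robust than the normal equations used here.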
Because different countries experience different climates, the regression coefficients obtained here also differ from the values cited in the previous literature. Among all the cities, the data of Thiruvananthapuram showed the best correlation. Using these coefficients, GSR was calculated to identify the best-fit model for each location. In addition, F-tests and t-tests were performed during the regression analysis to determine statistical significance (Tables 5 and 6, respectively). The F-test assesses the significance of the regression equation as a whole, whereas the t-test assesses the significance of each empirical coefficient. Table 5 shows that all the models are statistically significant (P < 0.05). A smaller P value indicates stronger evidence that the regression is significant. The t-test results also show that all the coefficients are highly significant (Table 6).
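For the two-parameter A-P model, both tests follow from the ordinary least-squares fit. The F statistic compares the regression and residual mean squares. Each t statistic divides a coefficient by its standard error, and for a single predictor F equals the square of the slope's t. The sketch below covers only model 1. The three-parameter models need the full covariance matrix, and the function name is hypothetical.

```python
import math


def ap_regression_tests(xs, ys):
    """OLS fit of y = a + b*x with the overall F statistic and coefficient t statistics."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # residual sum of squares
    sst = sum((y - my) ** 2 for y in ys)                        # total sum of squares
    ssr = sst - sse                                             # regression sum of squares
    mse = sse / (n - 2)                                         # two fitted parameters
    f_stat = ssr / mse                                          # one regression d.o.f.
    se_b = math.sqrt(mse / sxx)
    se_a = math.sqrt(mse * (1 / n + mx ** 2 / sxx))
    return {"a": a, "b": b, "R2": ssr / sst, "F": f_stat,
            "t_a": a / se_a, "t_b": b / se_b}


print(ap_regression_tests([0.3, 0.4, 0.5, 0.6, 0.7], [0.40, 0.43, 0.49, 0.50, 0.56]))
```

P values come from the F(1, n - 2) and t(n - 2) distributions. For example, `scipy.stats.f.sf(F, 1, n - 2)` gives the F-test P value, and `2 * scipy.stats.t.sf(abs(t), n - 2)` gives the two-sided t-test P value.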

Comparison of model output
The monthly mean GSR values estimated by the six models were compared with the measured data for each station. Fig. 4 presents the measured GSR for all selected cities along with the model-generated GSR values. Most of the cities received the highest GSR from March to May; in Jodhpur, this high-GSR period extended into June. December had the lowest global irradiation across the country. The figure shows that all the selected models agree well with the measured values, although models 3 and 6 performed somewhat worse than the others. The only exception was Shillong, where model 1 (the A-P model) overestimated GSR by more than 10% throughout the year. For the other locations, all the models gave such close estimates that the percentage difference rarely exceeded 5%. A few exceptions occurred in July and August, when cloud cover interrupted solar insolation.

Identification of best-fit model
To identify the best-fit model for all the selected locations, the analysed statistical indicators (MBE, MAE, RMSE, MPE, MAPE, etc.) were compared; their magnitudes are summarised in Table 7. The table shows that all the models exhibit high correlation, with coefficients of determination above 80%, for all locations except Kolkata. At Kolkata, each model gave its lowest R² value. These values indicate good long-term performance of the models. NSE values of 0.75 or more for all locations except Kolkata also show that all the models fit the 1:1 line well. The NSE results further indicate that the Newland and Bakirci linear exponential models performed best. However, the significance test of the model predictions showed no uniform trend. The computed t-statistics were higher than the critical t value (2.201 at the 5% significance level) in some cases and lower in others.
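The critical value of 2.201 is the two-tailed 5% point of Student's t with 11 degrees of freedom. This is consistent with n = 12 monthly means and n - 1 degrees of freedom, although the text does not state the degrees of freedom explicitly. The hypothetical helper below ranks one city's models by RMSE. It also flags whether each model's t-statistic stays below the critical value, which would mean the estimates do not differ significantly from the measurements at the 5% level. Ranking on RMSE alone is a simplification of the multi-indicator comparison described above.

```python
# Two-tailed critical value of Student's t at the 5% significance level for
# 11 degrees of freedom (assumed here to come from n = 12 monthly means).
T_CRIT = 2.201


def rank_models(city_results):
    """Rank one city's models by RMSE and flag whether t stays below T_CRIT.

    city_results maps a model name to a dict with "rmse" and "t" entries
    (a hypothetical structure holding precomputed indicators).
    """
    ranked = sorted(city_results.items(), key=lambda item: item[1]["rmse"])
    return [(name, ind["rmse"], ind["t"] < T_CRIT) for name, ind in ranked]


# Illustrative numbers only, not results from this study.
print(rank_models({"A": {"rmse": 0.8, "t": 1.5}, "B": {"rmse": 0.5, "t": 3.0}}))
```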
The statistical indicators show that all the models can be applied to estimate monthly mean daily GSR precisely across the country. The predictions of all the models are so close to one another that the statistical indicators also take very similar values. This makes it difficult to single out the best model for each city. Considering overall accuracy, however, the linear logarithmic and linear exponential models give the best results, while the logarithmic and exponent models perform worse than the others. Even so, the Bakirci exponent model is recommended for high-rainfall areas such as Shillong.

Conclusions
In the present study, six well-known regression models were tested for calculating daily global irradiation, and all of them were found to estimate GSR reliably. Only the A-P model deviated noticeably at the Shillong station, possibly owing to its high altitude and distinct climate. The Bakirci exponent model was identified as the best model for this location, and it may also suit places with similar geographical and climatic conditions. For the other cities, the Newland model and the Bakirci linear exponential model are highly recommended for estimating monthly mean daily GSR. If a single model must be chosen for predicting GSR across the Indian subcontinent, the Bakirci linear exponential model is the best choice. This study should also help policy makers and solar-product companies estimate the mean daily GSR at their locations of interest across the country.