Evaluation of Precipitation in the Chinese Regional Reanalysis Using Satellite Estimates, Gauge-Based Observations and Global Reanalysis

Two high-resolution Chinese regional reanalysis (CNRR) datasets at a resolution of 18 km for the period 1998–2009 are generated using the Gridpoint Statistical Interpolation (GSI) data assimilation system and the spectral nudging (SN) method. The precipitation from CNRR is comprehensively evaluated against observational datasets and the global reanalysis ERA5 over East Asia. The climatological mean, seasonal variability, extreme events, and summer diurnal cycle of precipitation are analyzed. Results show that CNRR reasonably reproduces the observed characteristics of rainfall, although some biases exist. The spatial distribution of climatological mean precipitation is well simulated by CNRR, while overestimation exists, especially on the west side of the Tibetan Plateau (TP). CNRR reproduces the unimodal feature of the annual cycle with overestimation of summer precipitation, and reproduces the probability of light and moderate rainfall well, but tends to overestimate heavy and extreme precipitation over most regions in China. The overall spatial distribution of extreme precipitation indices is captured by CNRR. The diurnal cycle of summer precipitation, as well as its amplitude, is better reproduced by CNRR-GSI, which captures the eastward propagation of the diurnal phase from the TP along the Yangtze River. CNRR-GSI generally outperforms CNRR-SN over most regions of China, except in reproducing heavy and extreme rainfall in the Yangtze River Basin (YRB) and South China (SC) regions. CNRR-GSI shows results comparable with the latest ERA5 and outperforms it in simulating the diurnal cycle of precipitation. This dataset can be considered a reliable source for precipitation-related applications.


INTRODUCTION
Precipitation is a vital component of the hydrological cycle and the predominant and direct source of water for the land surface water budget (Trenberth, 1999;Maurer et al., 2001;Risi et al., 2010;Zhou et al., 2011;Schneider et al., 2017). By affecting drinking water, lakes and rivers, precipitation has a remarkable influence on human life. Meanwhile, floods accompanying extreme precipitation can cause large economic losses and serious damage to both urban and rural areas (Kunkel, 2003;Wang and Zhou, 2005;Li et al., 2011;Powell and Reinhard, 2016). Long-term precipitation data are fundamental to climate research. Thus, accurate and reliable precipitation data at high spatiotemporal resolution are highly desirable for a wide range of applications, from the global freshwater budget (Veldkamp et al., 2016;Keune and Miralles, 2019;Llovel et al., 2019) to local hydrology (Orth et al., 2016;Lei et al., 2017).
In recent decades, atmospheric reanalysis datasets have been widely used in climate research. Reanalysis datasets are derived from numerical weather prediction (NWP) systems anchored on long-term observed climate data, which are reanalyzed with data-assimilation techniques. Several global reanalysis datasets have been developed, such as the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) global reanalysis (Kalnay et al., 1996), the European Center for Medium-Range Weather Forecast (ECMWF) reanalysis ERA-Interim (Dee et al., 2011) and ERA5 (Hersbach et al., 2020), the Modern-Era Retrospective Analysis for Research and Application (MERRA) reanalysis (Rienecker et al., 2011) and the Japan Meteorological Agency JRA-55 (Kobayashi et al., 2015). They have been applied in numerous studies, including the detection of climate variability, the investigation of climate change and the evaluation of climate models (Sylla et al., 2010;Rudeva and Gulev, 2011;Olauson and Bergkvist, 2015;Hawcroft et al., 2017;Chen et al., 2019). Global reanalysis with high horizontal resolution is a milestone for climate research. For example, the ERA5 global reanalysis with a 0.25-degree horizontal grid spacing, released since 2018, introduced significant improvements in representing the atmosphere, with much higher vertical and temporal resolutions compared to prior reanalyses (Delhasse et al., 2020;Dullaart et al., 2020;Nogueira, 2020;Tarek et al., 2020;Taszarek et al., 2020). However, ERA5 has been found unsuitable for trend analysis (Gleixner et al., 2020;Nogueira, 2020), and shows a positive temperature bias over complex topography, such as the Rocky Mountain area, possibly because some processes are not accounted for in ERA5 (Tarek et al., 2020). Over the TP, a WRF-driven regional reanalysis shows an advantage in simulating precipitation compared to ERA5 (He et al., 2019).
The ability of such global reanalysis datasets to depict regional-scale climate, climate over complex terrain, and diurnal-scale variability requires further evaluation and improvement (von Storch, 1999;Sotillo et al., 2005;Bromwich et al., 2016;Zhang et al., 2017). Thus, long-term reanalysis datasets with higher spatial and temporal resolutions are still needed for regional climate research and for investigating phenomena such as extreme rainfall events and mesoscale weather systems.
Regional reanalysis datasets have been developed to meet the demand for local applications with finer spatial resolutions (about 10–30 km) that can provide more detailed meteorological information. To avoid the assumption of a perfect model and to reduce errors at the regional scale, observations are also assimilated through a data assimilation framework in regional reanalysis systems. The North American Regional Reanalysis (NARR) was the first successful regional reanalysis, utilizing the NCEP Eta Model and its Data Assimilation System for the North American domain (Mesinger et al., 2006), and proved to be useful with detailed precipitation information. NARR has been widely used to study the precipitation characteristics in North America, such as understanding the characteristics of daily precipitation (Becker et al., 2009), investigating the precipitation recycling variability (Dominguez et al., 2008), and studying the suitability for hydrologic modeling and analysis in complex terrain (Trubilowicz et al., 2016). After NARR, many regional reanalysis datasets have been generated with regional models and data assimilation techniques over Europe (Bollmeyer et al., 2015;Dahlgren et al., 2016), the Arctic (Bromwich et al., 2016), Australia (Su et al., 2019) and the Indian monsoon region (Mahmood et al., 2018). Previous studies showed that regional reanalysis datasets with higher resolution are able to reproduce local extreme events well and reflect the spatial and temporal variability of precipitation. A comparison between two European regional reanalyses developed in the European Reanalysis and Observations For Monitoring (EURO4M) project and the global reanalysis revealed that regional reanalysis could improve the representation of precipitation, especially for high-threshold events and extreme events (Jermey and Renshaw, 2016).
Besides the traditional data assimilation method, the spectral nudging (SN) technique (von Storch et al., 2000) is also used to produce regional reanalysis datasets (Kanamaru and Kanamitsu, 2007). When applying the SN technique, the variables simulated by the regional atmospheric model are nudged toward the large-scale global reanalysis forcing in the inner model domain. Using the dynamical downscaling method with the SN technique, von Storch et al. (2017) developed a different procedure to produce a regional reanalysis without local data, which permits the generation of additional regional details in observation-sparse regions. Although several regional reanalysis datasets have been constructed and used in climate research around the world, very few reanalysis datasets are available over China. Zhang et al. (2017) constructed a high-resolution (18 km) regional reanalysis for China and its adjacent area via data assimilation and spectral nudging (SN), which was the first preliminary, experimental attempt to construct a high-resolution reanalysis for mainland China. To extend the CNRR data to decadal length, two 12-year long CNRR datasets (1998–2009) have been generated by using the data assimilation method based on GSI and the SN technique. This study aims to (1) comprehensively evaluate the capability of CNRR in reproducing the mean climate, seasonal cycle, extreme rainfall, and daily and diurnal variability of precipitation over China, and (2) compare the effects of the GSI data assimilation technique and the SN method in generating precipitation in CNRR. The paper is organized as follows. Section "Data and Methods" offers the details of the CNRR datasets, the reference data, and the validation methods. In Section "Results," the evaluation of the CNRRs in reproducing the precipitation over China and its adjacent areas is presented. A summary and discussion are given in Section "Conclusion and Discussion."


DATA AND METHODS

Chinese Regional Reanalysis Datasets
The Weather Research and Forecasting (WRF, version 3.8.1, Skamarock et al., 2008) model is used to generate the CNRR dataset. The CNRR domain covers mainland China and most parts of East Asia, with the domain center at 30°N, 115°E, and 481 × 361 grid points in the east-west and south-north directions at 18 km horizontal resolution (Figure 1). The configuration of the model physical parameterizations is the same as that used in Zhang et al. (2017). Two WRF experiments are conducted to produce the CNRR datasets: one uses the GSI data assimilation technique (Shao et al., 2016) (CNRR-GSI), and the other uses the SN method (CNRR-SN). The WRF simulations are initialized at 0000 UTC on 01 July 1997 and integrated to 0000 UTC on 01 January 2010, with the first half-year as spin-up time. The 12-year period (1998–2009) is used for the evaluation of precipitation. The CNRR precipitation is at an hourly temporal scale with 18 km horizontal resolution.

Data Used for Evaluation
To assess the general capability of the CNRR in reproducing precipitation, observations, including satellite-based datasets and gridded gauge-based datasets, and a global reanalysis dataset are collected as references.
The satellite-derived precipitation dataset used in this study is the Climate Prediction Center morphing technique (CMORPH v1.0) (Joyce et al., 2004), which is a high-resolution precipitation dataset provided by the Climate Prediction Center (CPC), National Oceanic and Atmospheric Administration (NOAA). It has been widely used to study precipitation characteristics from seasonal mean to diurnal cycle in China. The 3-hourly bias-corrected CMORPH (CMORPH-CRT) precipitation product, which has a spatial resolution of 0.25° latitude by 0.25° longitude, is used to evaluate the spatial-temporal features and the daily and diurnal variations of the CNRRs.
The gridded gauge-based land precipitation over the simulation domain (named OBS_Land) is constructed by combining the CN05.1 dataset (Wu and Gao, 2013) in China and the daily precipitation produced by the project Asian Precipitation-Highly Resolved Observational Data Integration Towards the Evaluation of Water Resources (APHRODITE) (Yatagai et al., 2012) over the land area outside China. CN05.1 contains gridded daily precipitation data with a resolution of 0.25° latitude by 0.25° longitude, based on the interpolation of over 2400 observing stations in China. Due to the lack of in-situ observations over western China, such as the TP, gridded precipitation data over such regions may have potential uncertainties. APHRODITE aims to develop state-of-the-art daily precipitation datasets on high-resolution grids covering the whole of Asia, including monsoon Asia, Russia, and the Middle East. The daily precipitation is generated from the Global Telecommunication Systems (GTS) reports, precompiled data and individual data collected by the APHRODITE project, and rain gauge data provided by the National Hydrological and Meteorological Services (NHMs). The combined gridded
The global reanalysis dataset ERA5 has been released by ECMWF (Hersbach et al., 2020) since 2018, with a spatial resolution of 0.25 degree (∼31 km), covering the period from 1950 to the present. The reanalysis uses a significantly more advanced 4D-Var assimilation technique, providing several improvements compared to its previous generation, ERA-Interim.

Evaluation Methods
Since the characteristics of precipitation vary in space and time over China and its adjacent areas, 7 sub-regions are further selected for detailed evaluation (Figure 1): Northeast China (NEC), North China (NC), Northwest China (NWC), Yangtze River Basin (YRB), South China (SC), Southwest China (SWC) and Tibetan Plateau (TP). Comparison is conducted on the climatological mean, seasonal cycle, daily and extreme precipitation, and diurnal variations. A threshold of 0.1 mm/day is chosen to separate dry and wet days. To assess CNRR's capability of simulating the diurnal variation of precipitation, we calculate and standardize the annual, summer (June, July and August, JJA) and winter (December, January and February, DJF) mean of the diurnal cycle of precipitation amount during 1998–2009. Harmonic analysis is also applied to the diurnal variation of precipitation, and the first harmonic is chosen to represent the diurnal cycle of precipitation (Dai and Trenberth, 2004;Koo and Hong, 2010;Mascaro, 2017). In the harmonic model, R(t_k) is the rainfall amount accumulated from time t_(k−1) to t_k. For CNRR-GSI and CNRR-SN, which are hourly data, t_k = 0, 1, 2, ..., 23; for CMORPH, whose temporal resolution is 3 hours, t_k = 0, 3, 6, 9, ..., 21. R̄ stands for the daily mean precipitation, which equals zero here because of the standardization, and A_i and ϕ_i are the amplitude and the phase of the ith harmonic.
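The first-harmonic fit described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 24-h harmonic is written as R(t) = R̄ + A cos(2π(t − t_peak)/24), a cosine convention chosen here so that the phase maps directly to the hour of maximum rainfall. The function name `first_harmonic` and its arguments are hypothetical.

```python
import numpy as np

def first_harmonic(hours, rain, period=24.0):
    """Least-squares fit of mean + A*cos(2*pi*(t - t_peak)/period).

    Assumed convention (not taken from the paper): cosine form, so the
    returned peak_hour is the time of the fitted maximum.
    """
    hours = np.asarray(hours, dtype=float)
    rain = np.asarray(rain, dtype=float)
    omega = 2.0 * np.pi / period
    # Design matrix: constant term plus cosine and sine of the 24-h harmonic.
    X = np.column_stack([np.ones_like(hours),
                         np.cos(omega * hours),
                         np.sin(omega * hours)])
    mean, a, b = np.linalg.lstsq(X, rain, rcond=None)[0]
    amplitude = np.hypot(a, b)                       # A = sqrt(a^2 + b^2)
    peak_hour = (np.arctan2(b, a) / omega) % period  # hour of fitted maximum
    return mean, amplitude, peak_hour

# Synthetic 3-hourly series (CMORPH-like sampling) peaking at 16 h.
t = np.arange(0, 24, 3)
r = 2.0 + 1.5 * np.cos(2.0 * np.pi * (t - 16.0) / 24.0)
mean, amp, peak = first_harmonic(t, r)
print(mean, amp, peak, amp / mean)  # last value: amplitude normalized by the daily mean
```

The same function works for hourly series (t = 0, 1, ..., 23); only the sampling of `hours` changes.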
To quantify the performance of the CNRR precipitation datasets, the root mean square error (RMSE) and the correlation coefficient between CNRR and observations are calculated in this study. Before calculating the correlation and RMSE, the CNRR datasets are first interpolated to the grids of CMORPH and OBS_Land, respectively.
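The equations of RMSE and correlation are:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(M_i - O_i\right)^2}$$

$$\mathrm{Corr} = \frac{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)\left(O_i - \bar{O}\right)}{\sqrt{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)^2}\,\sqrt{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}}$$

where $M_i$ is the value from CNRR at grid $i$, $O_i$ is the value from observations at grid $i$, $\bar{M}$ is the spatially averaged value from CNRR, $\bar{O}$ is the spatially averaged value from observations, and $n$ is the total number of grid points.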
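As an illustration of the two scores, the sketch below computes the RMSE and the Pearson pattern correlation between a model field and an observed field that are already on the same grid. The function names and the handling of missing values (NaN grid points are skipped) are assumptions made for this example; the unweighted averaging over grid points follows the definitions of n, M and O given above.

```python
import numpy as np

def _valid_pairs(model, obs):
    """Flatten both fields and keep only grid points valid in both."""
    model = np.asarray(model, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    ok = np.isfinite(model) & np.isfinite(obs)
    return model[ok], obs[ok]

def rmse(model, obs):
    """Root mean square error over all valid grid points."""
    m, o = _valid_pairs(model, obs)
    return float(np.sqrt(np.mean((m - o) ** 2)))

def pattern_correlation(model, obs):
    """Pearson correlation of the two fields across grid points."""
    m, o = _valid_pairs(model, obs)
    dm, do = m - m.mean(), o - o.mean()
    return float(np.sum(dm * do) / np.sqrt(np.sum(dm ** 2) * np.sum(do ** 2)))

# Toy 2 x 3 fields in mm/day; NaN marks a grid point without observations.
model = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
obs = np.array([[2.0, 3.0, 4.0], [5.0, 6.0, np.nan]])
print(rmse(model, obs), pattern_correlation(model, obs))
```

In this toy case the model is uniformly 1 mm/day wetter than the observations, so the RMSE reflects the bias while the correlation stays perfect; this is why the two scores are reported together.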
Considering the climatology of China, 4 extreme precipitation indices, namely consecutive dry days (CDD), consecutive wet days (CWD), very wet day precipitation (R95p), maximum 5-day precipitation amount (Rx5) established by the widely used Expert Team on Climate Change Detection and Indices (ETCCDI), are selected in this study to analyze extreme rainfall. The former two indices deal with rainfall frequency and the latter two indices deal with rainfall intensity. The definitions of the indices are given in Table 1.
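A minimal sketch of how the four indices could be computed from one daily series is given below. Table 1 is not reproduced in this text, so the sketch follows the general ETCCDI definitions and exposes the wet-day threshold as a parameter; the default of 1 mm/day is the ETCCDI convention, not necessarily the value used in this study. The function names and the use of the input series itself as the percentile base period are assumptions.

```python
import numpy as np

def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in mask:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best

def extreme_indices(daily, wet_threshold=1.0, base=None):
    """CDD, CWD, R95p and Rx5 for one series of daily precipitation (mm/day).

    wet_threshold: assumed wet-day threshold (ETCCDI uses 1 mm/day).
    base: series used for the 95th percentile; defaults to `daily` itself.
    """
    daily = np.asarray(daily, dtype=float)
    base = daily if base is None else np.asarray(base, dtype=float)
    wet = daily >= wet_threshold
    # 95th percentile of wet-day amounts in the base series.
    p95 = np.percentile(base[base >= wet_threshold], 95)
    # Running 5-day totals via convolution with a window of ones.
    rx5 = float(np.convolve(daily, np.ones(5), mode="valid").max())
    return {
        "CDD": longest_run(~wet),                  # longest dry spell (days)
        "CWD": longest_run(wet),                   # longest wet spell (days)
        "R95p": float(daily[daily > p95].sum()),   # total from very wet days (mm)
        "Rx5": rx5,                                # maximum 5-day total (mm)
    }

series = [0, 0, 0, 5, 10, 0, 2, 2, 2, 2, 0, 40, 0, 0]
print(extreme_indices(series))
```

In practice the function would be applied year by year at each grid point and the results averaged over 1998–2009 before mapping.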

RESULTS
To evaluate the precipitation characteristics of the CNRR products in detail, the mean climatology, seasonal cycle, daily precipitation and diurnal variations of precipitation are assessed against CMORPH, OBS_Land, and ERA5. Figure 2 depicts the spatial distribution of 12-year annual, summer, and winter mean precipitation from observations (CMORPH and OBS_Land), CNRRs, and ERA5 over East Asia. According to the reference data, the 12-year annual mean precipitation decreases from about 10 mm/day in the south near the equator to around 1 mm/day in the north, with more rainfall over the ocean than over land. Over mainland China, precipitation decreases from the southeast to the northwest. The reanalysis datasets reproduce annual mean precipitation well compared to the observations, with spatial correlation coefficients (SCCs) all above 0.79 and RMSEs below 2.13 mm/day over the simulation domain. CNRR-GSI tends to overestimate precipitation over southern China, while CNRR-SN tends to overestimate precipitation over southwestern China. Over ocean, both CNRR-GSI and CNRR-SN reproduce the main rainfall belts over the Northwest Pacific well compared to CMORPH, although CNRR-GSI overestimates precipitation over tropical regions. Seasonal mean precipitation has a spatial distribution similar to the annual mean, and most of the precipitation occurs in summer. In summer, CNRR-GSI simulates the spatial patterns of precipitation over most regions in East Asia, except for a significant overestimation over southern China. CNRR-SN produces a wet bias over SWC and NEC. It is worth noting that both CNRR-GSI and CNRR-SN have large wet biases over the TP. Over ocean, summer precipitation is evidently overestimated by CNRR-SN over the Bay of Bengal. For winter precipitation, CNRR-GSI and CNRR-SN produce a wet bias over regions south of the YRB in China.

Mean Climatology
To quantitatively assess the CNRR precipitation products, the SCCs and RMSEs of annual, summer, and winter mean precipitation over the whole domain, ocean, and land areas between the CNRRs and observations are computed and listed in Table 2. CNRRs perform well, with most of the SCCs above 0.8. Generally, CNRR-GSI results are relatively close to ERA5, and CNRR-GSI outperforms CNRR-SN in simulating the spatial distribution of climatological precipitation with higher SCCs, while it generates relatively larger RMSEs over ocean. Over land area, CNRR-GSI generates a better spatial distribution with higher SCCs and lower RMSEs of annual precipitation compared to the others.
FIGURE 3 | Monthly precipitation produced by the CNRRs, observations and ERA5 over 7 sub-domains over China. The correlation coefficients are calculated for the different sub-domains. GSI_CMO, SN_CMO and ERA5_CMO stand for the correlation between the reanalyses (CNRR-GSI, CNRR-SN and ERA5) and CMORPH. GSI_Land, SN_Land and ERA5_Land stand for the correlation between the reanalyses and OBS_Land.

Seasonal Cycle
The seasonal cycles of regionally averaged precipitation over the 7 sub-regions are displayed in Figure 3. Overall, the reanalysis datasets reproduce the seasonal variations of precipitation over all sub-regions with temporal correlation coefficients (TCCs) above 0.93, but obvious differences exist among regions. Over the northern regions of China (NC, NEC, and NWC), CNRRs portray the annual cycles of precipitation well, with TCCs above 0.97. CNRR-SN significantly overestimates the precipitation amounts over these regions during the warm season. CNRR-GSI shows better results in NEC and NWC than CNRR-SN and ERA5. ERA5 simulates better over NEC and NC than the CNRRs. For the YRB region, ERA5 and CNRR-SN perform better in simulating the seasonal cycle, with a TCC above 0.98, while CNRR-GSI simulates the peak of the monthly precipitation in July, which is one month later than in the observations. Over the SC region, the observed seasonal cycle of precipitation has two peaks, in June and August. CNRRs and ERA5 simulate the seasonal cycle with the two peaks, although wet biases exist in CNRR-GSI in the warm season. In the SWC and TP regions, the peaks of the seasonal cycle are reasonably captured by the CNRRs. ERA5 and CNRR-SN yield significantly larger overestimations over the two regions, reaching up to twice the observed values.
Figure 4 shows the seasonal variations of meridional mean (105°E–120°E) precipitation from the CNRRs, ERA5 and observations, which represent the evolution of monsoon precipitation. As can be seen from the references, the time of peak precipitation varies with latitude under the influence of the propagation of monsoon precipitation. The reanalyses reasonably reproduce the propagation, but clear biases exist in the CNRRs. Over regions south of 20°N, both CNRR-GSI and CNRR-SN capture the peak precipitation in August and September. Over regions between 20°N and 30°N, the observed rainfall peaks in June, and the propagation is well captured by the CNRRs, while larger wet biases are found during the warm season in CNRR-GSI compared to CNRR-SN. Over northern regions above 30°N, the CNRRs portray the precipitation pattern well.

Daily Precipitation
An examination of the daily precipitation of the CNRRs and ERA5 is conducted by comparing the wet day (defined as daily precipitation >0.1 mm/day) probability between the reanalyses and observations in several intensity ranges, namely 0.1–10 mm, 10–25 mm, and 25–50 mm, representing light, moderate, and heavy rainfall, respectively. Figure 5 shows the spatial distributions of wet day probability at different precipitation intensities, and the SCCs and RMSEs between the CNRRs, ERA5 and the references are listed in Table 3. For light rainfall, ERA5 and the CNRRs reproduce the patterns well over land compared to the OBS_Land observations, with SCCs above 0.72 (Table 3). The CNRRs tend to underestimate the light rainfall frequency over the TP and Sichuan Province, while ERA5 reduces this bias. The reanalyses tend to generate more light rainfall events over ocean compared to CMORPH, and ERA5 produces larger biases than the CNRRs. For moderate rainfall events, CNRR-GSI simulates the spatial pattern of precipitation probability well, with an SCC larger than 0.8. However, CNRR-SN clearly overestimates the probability, especially over southwestern China. Both CNRR-GSI and CNRR-SN simulate the distribution of moderate rainfall probability over ocean, with positive biases compared to CMORPH, while ERA5 simulates more moderate rainfall events than the CNRRs. For heavy rainfall events, CNRR-GSI reproduces the distribution of precipitation probability, and the SCCs between CNRR-GSI and observations are all above 0.75. CNRR-SN again produces larger biases over southwestern China than CNRR-GSI.
ERA5 shows similar performance to CNRR-GSI, and has better results over land with a higher SCC of 0.82. Over the ocean, the reanalyses tend to underestimate the heavy rainfall frequency, and ERA5 produces a larger bias. Table 3 shows that CNRR-GSI outperforms CNRR-SN in simulating the distribution of precipitation probability for all three precipitation intensity classes, with higher SCCs and lower RMSEs. ERA5 provides better light and heavy rainfall over land than CNRR-GSI, while CNRR-GSI shows better results over the entire domain. The probability density function (PDF) of precipitation offers further detail on the error structure of the CNRR precipitation datasets. The PDFs of daily precipitation for the seven sub-regions in the CNRRs, ERA5 and OBS_Land are depicted in Figure 6. Only daily precipitation above 0.1 mm/day is used to generate the PDFs. Precipitation probability generally decreases as the precipitation intensity increases. Differences in the PDFs exist between the sub-regions owing to the spatial variation in precipitation intensity. For instance, the southern regions of China (SC and SWC) exhibit larger probabilities of daily precipitation above 25 mm/day. In general, the CNRRs and ERA5 reproduce the observed PDFs well, although the CNRRs overestimate the probability at heavy precipitation intensities. CNRR-GSI generally captures the light rainfall probability more accurately over all 7 sub-regions, while ERA5 shows obvious overestimation. Moderate rainfall events are generally overestimated by CNRR-SN and ERA5, especially over NEC, NC, and SWC. Both CNRR-GSI and CNRR-SN generate too many heavy rainfall events in NEC, NC, NWC, SWC, and TP, but give comparable results for heavy rainfall events in YRB and SC. CNRR-GSI generates even more extreme rainfall events (daily precipitation >50.0 mm/day) in NC, YRB, and SC than CNRR-SN. ERA5 simulates the probability of heavy rainfall events well except over NWC.
Over the TP region, the light and moderate rainfall frequency is well simulated by the CNRRs compared to observations, while the frequency of daily precipitation above 15 mm/day is significantly overestimated, especially in CNRR-SN; ERA5 shows results similar to CNRR-GSI. To further assess the CNRRs' capability of reproducing extreme precipitation events, the spatial distributions of 4 extreme precipitation indices (CDD, CWD, Rx5, R95p) from ETCCDI over land area are calculated and shown in Figure 7. According to OBS_Land, large CDDs above 100 days are mainly located in northwestern China. Although ERA5 and the CNRRs simulate the distribution of CDD well, they tend to overestimate CDD over Xinjiang province, especially in the CNRR-SN experiment. Both ERA5 and the CNRRs reproduce the large CWD over southwestern China, and CNRR-GSI also simulates the spatial pattern of CWD over the Indochina Peninsula, where underestimation (overestimation) clearly exists in CNRR-SN (ERA5). For the extreme precipitation intensity indices (Rx5 and R95p), the spatial distributions are well simulated, with SCCs above 0.93 in the CNRR experiments, while overestimation clearly exists in southern China, especially in the CNRR-GSI experiment. Generally, the CNRRs and ERA5 reasonably capture the observed distributions of the extreme precipitation indices. CNRR-GSI better simulates the patterns of CDD and CWD with higher SCCs and lower RMSEs, while it has larger RMSEs for Rx5 and R95p (as shown in Table 4).
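The intensity classes used in this section (light, moderate, heavy, and the >50 mm/day extreme class) can be turned into occurrence frequencies with a few lines of code. The sketch below is an illustration only: it assumes each class is the half-open interval (lower, upper] in mm/day and that the probability is the fraction of all days falling in the class; the text does not state how the class boundaries or the denominator are handled, and the names are hypothetical.

```python
import numpy as np

# Class limits in mm/day; boundary handling (lower < p <= upper) is an assumption.
CLASSES = {
    "light": (0.1, 10.0),
    "moderate": (10.0, 25.0),
    "heavy": (25.0, 50.0),
    "extreme": (50.0, np.inf),
}

def intensity_frequency(daily):
    """Fraction of days in each intensity class.

    `daily` has time as its first axis; any remaining axes (e.g. lat, lon)
    are preserved, so a (time, lat, lon) array yields one map per class.
    """
    daily = np.asarray(daily, dtype=float)
    n_days = daily.shape[0]
    return {
        name: np.sum((daily > lower) & (daily <= upper), axis=0) / n_days
        for name, (lower, upper) in CLASSES.items()
    }

sample = [0.0, 0.05, 1.0, 9.0, 12.0, 30.0, 60.0, 0.0, 3.0, 26.0]
print(intensity_frequency(sample))
```

Dividing by the number of wet days instead of all days would give the conditional frequencies that underlie a PDF of wet-day amounts.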

Diurnal Variation
The diurnal cycle of precipitation is an important aspect of regional climate. Evaluation of the diurnal variation of precipitation is helpful to further assess the ability of CNRR in producing the spatiotemporal characteristics of precipitation. In this study, only the diurnal variation of summer precipitation is discussed. Local Standard Time (LST) is used to represent the diurnal cycle.
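Because model and satellite outputs are typically stamped in UTC, the hour axis has to be shifted before peak times at different longitudes can be compared. The sketch below assumes that LST is approximated by local solar time, i.e., UTC plus one hour per 15° of east longitude; the text does not spell out the conversion, so this is an assumption, and the function name is hypothetical.

```python
def utc_to_lst(utc_hour, lon_deg_east):
    """Approximate local time (hours) from UTC and longitude.

    Assumption: local solar time, 15 degrees of longitude per hour,
    wrapped into the range [0, 24).
    """
    return (utc_hour + lon_deg_east / 15.0) % 24.0

# 0000 UTC at 120E (eastern China) and 2000 UTC at 90E (Tibetan Plateau).
print(utc_to_lst(0, 120.0))
print(utc_to_lst(20, 90.0))
```

With a single national clock time the same UTC hour would map to one value across China, which would mask east-west phase differences such as the propagation discussed in this section.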
The CMORPH satellite-derived precipitation, which represents the 3-hourly rain rate on a 0.25° × 0.25° latitude-longitude grid, is used as the observation. Harmonic analysis is applied to characterize the diurnal cycle (24-h period) of precipitation, and the first harmonic component is used to calculate the amplitudes and phases of diurnal variations. Figure 8 shows the distribution of the phase of the diurnal cycle of precipitation amount (PA) and frequency (PF), reflecting the time of occurrence of daily maximum PA and PF, respectively. In the observations, precipitation mainly occurs in the afternoon (from 1500 to 1800 LST) over most land areas, except for southwestern China, where rainfall usually occurs from night to early morning. An eastward propagation of the peak time of PA can be observed from the TP to the YRB (marked in Figure 8A). The spatial pattern of the peak time of PA over land produced by CNRR-GSI is consistent with that of CMORPH, reflecting the spatial evolution of the diurnal phase of rainfall, although a bias of 1–2 h exists in certain areas. The eastward propagation of the peak time of PA is best reproduced in CNRR-GSI, with a 2-h delay over the TP. CNRR-SN simulates the spatial distribution of the peak time of PA, but with the peak occurring about 2 h earlier over most regions in China and about 4 h earlier over the Indochina Peninsula compared to the observations. It also fails to simulate the eastward propagation of the diurnal phase. ERA5 captures the spatial distribution and reproduces the phase well over the TP and south China. Both the CNRRs and ERA5 reproduce the peak time of the diurnal variation of PA over ocean, with the maximum precipitation occurring in daytime, and CNRR-GSI better simulates the pattern of peak time over the Bay of Bengal. The distribution of the phase of the diurnal cycle of PF is quite similar to that of PA.
CNRR-GSI captures the characteristics of maximum precipitation frequency over most land regions, with the peak occurring about 2 h earlier than observed over NEC, Inner Mongolia, and the TP. CNRR-SN shows a pattern similar to ERA5, and both outperform CNRR-GSI in reproducing the phase of the diurnal cycle of PF over NEC and Inner Mongolia, but they simulate an earlier peak time of PF over most land areas, especially over SC and the Indochina Peninsula, and do not reproduce the nighttime peak of frequency over southwestern China. Both the CNRRs and ERA5 simulate the maximum precipitation frequency in the morning over most offshore areas (East China Sea, Yellow Sea, etc.), and CNRR-GSI performs slightly better than CNRR-SN over the South China Sea and the Bay of Bengal. Figure 9 depicts the distribution of the normalized amplitudes of the diurnal harmonic of PA from CMORPH, the CNRRs and ERA5. The amplitude is normalized by dividing it by the daily mean precipitation. In CMORPH (Figure 9A), the maximum normalized amplitude of PA is mainly located over the TP and along the coastal regions of southern China and the Indochina Peninsula. CNRR-GSI reproduces the distribution of the diurnal harmonic amplitude of precipitation, with the strongest diurnal rainfall amplitudes over the TP and along the coastal regions of SC and the Indochina Peninsula. The SCC between CNRR-GSI and CMORPH is the highest, reaching 0.89. ERA5 captures the spatial pattern of PA amplitude well, while it underestimates the amplitude over the TP and southeastern China. Although CNRR-SN simulates the spatial pattern of precipitation amplitude with an SCC of 0.88, it clearly overestimates the normalized amplitude over southern China, the Indochina Peninsula, and northern and northwestern China.
To further reveal the regional features of the diurnal cycles of PA and PF in China, Figure 10 compares the normalized diurnal harmonic cycle of summer PA and PF from the CNRRs, ERA5 and CMORPH over the 7 sub-regions. Obvious diurnal variations in precipitation can be observed in all 7 sub-regions in China. In most regions (NEC, NWC, YRB, SC), both PA and PF peak in the afternoon between 1500 LST and 1700 LST, consistent with previous studies (Yu et al., 2007;Zhou et al., 2008;Chen et al., 2018). PA in TP and SWC peaks around midnight. The NEC, NWC, SC, SWC, and TP regions exhibit larger diurnal variations of PA and PF. CNRR-GSI simulates the diurnal cycle of PA much closer to the observations, especially in NC and SWC, where the correlation coefficients exceed 0.89, although a 2 h bias exists in regions such as NEC and NWC. ERA5 simulates the diurnal cycle of PA quite close to CMORPH, with the TCC close to 1.0 over NEC and SC, but its performance is unstable across regions, with biases of 2–9 h, TCCs below 0.5 over SWC and TP, and a TCC of −0.28 over NC. CNRR-SN produces identical diurnal variations of PA among all 7 sub-regions, with significant overestimation of afternoon PA and underestimation of early morning PA in the SWC and TP regions. In terms of PF, the peak frequency occurs during the afternoon (1400 to 1700 LST) in all 7 sub-regions. Both the CNRRs and ERA5 simulate the diurnal variations of PF well, although CNRR-SN significantly overestimates the magnitude of the diurnal variation of PF. CNRR-GSI outperforms CNRR-SN and ERA5 with higher TCCs, although a 1–2 h delay exists over most regions of China.

CONCLUSION AND DISCUSSION
Two Chinese regional reanalysis datasets, namely CNRR-GSI and CNRR-SN, are generated at a horizontal resolution of 18 km from 1998 to 2009, utilizing the data assimilation technique and the SN method, respectively. The satellite-based global precipitation dataset CMORPH, the gauge-based gridded land precipitation over the simulation domain (OBS_Land) constructed using CN05.1 and APHRODITE, and the global reanalysis dataset ERA5 are used to comprehensively evaluate the skill of the CNRRs in reproducing the key features of the climatological, seasonal, daily, and diurnal precipitation and extreme rainfall over East Asia.
Both CNRR-GSI and CNRR-SN are reliable in depicting the spatial distribution of precipitation from a climatological view, but warm season rainfall is overestimated, especially south of the TP. In general, CNRR-GSI generates a better spatial distribution with higher SCCs, but more rainfall with higher RMSEs, than CNRR-SN and ERA5 over land. In terms of the seasonal cycle, the CNRRs reproduce the seasonal variations of precipitation over China as well as the evolution of monsoon precipitation, and generate results close to ERA5. CNRR-GSI outperforms CNRR-SN in reproducing summer rainfall in most regions of China except YRB and SC.
Analysis of the daily precipitation reveals that the CNRRs are capable of reproducing the distribution of precipitation occurrence at different precipitation intensities. The CNRRs slightly underestimate the light rainfall probability and overestimate the heavy rainfall probability over the simulation domain. For different regions in China, the CNRRs overestimate the heavy rainfall probability, especially over TP and NWC. CNRR-GSI outperforms CNRR-SN in reproducing precipitation probability at different intensities over most regions, except that CNRR-GSI produces more heavy and extreme rainfall over YRB and SC. The probability of moderate and heavy rainfall is well captured by ERA5, while the CNRRs simulate the occurrence of light rainfall and the frequency of extreme rainfall better than ERA5. The CNRRs also reasonably capture the observed distributions of the extreme indices, and CNRR-GSI best simulates the patterns of CDD and CWD, with higher SCCs and lower RMSEs than CNRR-SN and ERA5, while it has larger RMSEs for Rx5 and R95p.
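For readers less familiar with the extreme indices named above, the following Python sketch shows how they could be computed from a single daily precipitation series. It assumes the standard ETCCDI definitions with a 1 mm wet-day threshold and, for simplicity, derives the R95p threshold from the same series when no base-period value is supplied; the exact configuration used for the CNRR evaluation may differ. The function names and the sample values are hypothetical.

```python
import numpy as np

WET_DAY_MM = 1.0  # assumed wet-day threshold (ETCCDI convention)


def longest_run(mask):
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in mask:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def cdd(daily):
    """Consecutive dry days: longest spell with daily precipitation below 1 mm."""
    return longest_run(np.asarray(daily, dtype=float) < WET_DAY_MM)


def cwd(daily):
    """Consecutive wet days: longest spell with daily precipitation of at least 1 mm."""
    return longest_run(np.asarray(daily, dtype=float) >= WET_DAY_MM)


def rx5day(daily):
    """Maximum 5-day accumulated precipitation (series must cover at least 5 days)."""
    daily = np.asarray(daily, dtype=float)
    return float(np.convolve(daily, np.ones(5), mode="valid").max())


def r95p(daily, threshold=None):
    """Total precipitation on wet days exceeding the 95th percentile of wet days."""
    daily = np.asarray(daily, dtype=float)
    wet = daily[daily >= WET_DAY_MM]
    if wet.size == 0:
        return 0.0
    if threshold is None:
        # Simplification: percentile from this series, not a separate base period.
        threshold = np.percentile(wet, 95)
    return float(wet[wet > threshold].sum())


# Synthetic 12-day series in mm/day, for illustration only.
sample = [0, 0, 5, 12, 0.5, 0, 0, 0, 30, 2, 1, 0]
indices = {
    "CDD": cdd(sample),
    "CWD": cwd(sample),
    "Rx5day": rx5day(sample),
    "R95p": r95p(sample, threshold=20.0),
}
```

Applied to each grid cell of a model and an observational dataset, these functions yield the index maps from which pattern statistics such as SCC and RMSE can then be computed.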
Comparison of the peak time of summer PA and PF in China between CMORPH, the CNRRs and ERA5 reveals that CNRR-GSI outperforms CNRR-SN and ERA5 in capturing the regional characteristics of the diurnal cycle of precipitation, especially over eastern China. The eastward propagation of the peak time of PA, starting from the TP and following the Yangtze River, is well reproduced in CNRR-GSI but not in CNRR-SN, despite slight shifts of about 1-2 hours in the diurnal phase. CNRR-GSI and ERA5 also reasonably capture the amplitude of the diurnal cycle, while significant overestimations exist in CNRR-SN over most regions in the simulation domain. Diurnal variations of PA and PF in different regions in China are better revealed by CNRR-GSI with higher SCCs and lower RMSEs.
With the refinement of the dynamical and physical processes in regional numerical weather models and the application of 3D-Var data assimilation and SN techniques, the CNRR datasets can provide reliable precipitation products. CNRR-GSI generally outperforms CNRR-SN in simulating the characteristics of precipitation over China, shows comparable results with the latest ERA5, and outperforms it in simulating the diurnal cycle of precipitation. Overestimations of heavy and extreme rainfall are observed over YRB, SC, and the south of TP in CNRR-GSI. Special care should be taken when using the CNRR datasets to study the regional climate over TP as well as in the case of extreme rainfall. With increasing computing resources, optimized assimilation techniques (e.g., 4D-Var, ensemble Kalman filter, and ensemble transform Kalman filter), and the selection of observations, CNRR could be further improved to provide more reliable precipitation products. Thus, there is great potential for the application of the CNRR products in precipitation-related research and applications in East Asia.

AUTHORS BIOGRAPHIES
YL is a second-year postgraduate student at the School of Atmospheric Sciences, Nanjing University, Nanjing, China. Her major field of study is regional climate change and simulation. She earned her BA degree from the School of Atmospheric Sciences at Nanjing University. So far, her interests have been in data assimilation methods and the evaluation of regional reanalysis data.
YP currently works as an Associate Professor in the School of Atmospheric Sciences, Nanjing University, Nanjing, China, teaching Principle of Synoptic Meteorology. His research interest focuses on mesoscale synoptic analysis and numerical simulation. Prof Pan was awarded Outstanding Individual of Employment Guidance by Nanjing University in 2004 and the Second Prize of the Advancement of Science and Technology, Jiangsu Province, in 1999, and was nominated for the First Prize of the National Award for Natural Sciences. Prof Pan's e-mail address is pyn@nju.edu.cn.
JT earned his Ph.D. in Meteorology from the School of Atmospheric Sciences, Nanjing University, Nanjing, China in 2004. His major field is regional climate modeling. He has worked as a Professor in the School of Atmospheric Sciences, Nanjing University, Nanjing, China since 2016. He is devoted to studying regional climate modeling, extremes, land-atmosphere interaction, and the diurnal cycle of precipitation and convective systems. His teaching interest focuses on Statistical Weather Forecasting and Statistical Methods in Meteorological Big Data Analysis. Prof Tang is a member of the Key Laboratory of Mesoscale Severe Weather/Ministry of Education, Nanjing University. His e-mail address is jptang@nju.edu.cn.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
YL conceived the work, analyzed the results, and wrote the manuscript. MS and JT contributed to the conception and design of the study and reviewed and edited the manuscript. JF and YP read and approved the manuscript. All authors contributed to the article and approved the submitted version.