Reinterpretation of classic proton charge form factor measurements

In 1963, a proton radius of $0.805(11)~\mathrm{fm}$ was extracted from electron scattering data and this classic value has been used in the standard dipole parameterization of the form factor. In trying to reproduce this classic result, we discovered that there was a sign error in the original analysis and that the authors should have found a value of $0.851(19)~\mathrm{fm}$. We additionally made use of modern computing power to find a robust function for extracting the radius using this 1963 data's spacing and uncertainty. This optimal function, the Pad\'{e} $(0,1)$ approximant, also gives a result which is consistent with the modern high precision proton radius extractions.


Introduction
The proton charge radius, r E , is the conventional measure for the size of the proton, a fundamental constituent of matter. This constant is defined as the derivative of the proton charge form factor, G p E , at zero four-momentum transfer, Q 2 = 0: and can be determined by both hydrogen spectroscopy and elastic lepton scattering [1]. The first determination of the radius was done with elastic electron scattering data by Hand et al. [2], who determined the radius of 0.805 (11) fm, the value used in the standard dipole parameterization of the form factor [3,4]. The original study was followed by several decades of dedicated nuclear scattering and spectroscopic experiments, which led to a recommended value for the proton charge radius of 0.8791(79) fm (CODATA 2010, [5]). This result was called into question when the extremely precise spectroscopic measurements on muonic hydrogen [6,7] reported a significantly smaller value of 0.84087 (39) fm. The observed discrepancy, colloquially known as "the proton radius puzzle" [8] motivated several new experiments [9][10][11][12]. These experiments have been accompanied by different reanalyses of the existing data [13][14][15][16][17][18][19][20], focusing on data of Bernauer et al. [21,22]. In this paper we follow a different path and revisit the first data of Hand et al., and evaluate their result by using modern analysis techniques.
In the first determination of the radius, existing data on proton charge form factor from five different measurements were considered [23][24][25][26][27], as noted in Table 1. In an attempt to reconstruct the radius of 0.81 fm we followed the original analysis approach and compared the data to the quadratic function in Q 2 : This model depends on two free parameters: the radius, r E , in front of the linear term, and the parameter a that determines the curvature of the function. Since the data are normalized, the constant term of the model is simply 1. In the first step the two parameters were determined by fitting Eq. (2) to the data with Q 2 ≤ 3 fm −2 , considering the entire region with the high density of experimental points. The obtained results were r E = 0.819 (21) fm and a = 0.00787(309) fm 4 . However, the radius obtained in this manner should not be trusted since the true shape of the G p E (Q 2 ) may be more complex than a second order polynomial. At Q 2 ≈ 3 fm −2 the contributions of the Q 6 and Q 8 terms are not negligible and their omission from the fit causes a systematic shift in the determined radius.
To avoid model dependent bias in the radius extraction, the contributions of higher order terms should be kept minimal. The way Hand achieved this with a model, such as Eq. (2), is by keeping the parameter a at a value determined in their first step and then only fitting the radius, using data with Q 2 ≤ 1.05 fm −2 . Assuming that the determined value for a is a good estimate for the size of the Q 4 term, this preserves the curvature of the model. Additionally, we were able to determine that at 1 fm −2 the Q 4 term contributes less than a percent to the value of G p E . Hence, even a 10 % error in the value of a would result in a modification of the form-factor much smaller than the statistical uncertainty of each measurement. Hence, the described two step fitting technique should result in a more reliable estimate of the proton charge radius. We determined it to be r E = 0.851 (19) fm, which is inconsistent with the original result (see Fig. 1). The obtained value is 5 % larger than the original radius while its uncertainty is almost twice as large as the uncertainty of the first result.
To find the source of the discrepancy the last step of the analysis was repeated with different values of a. Since r E and a are strongly correlated, it is important to evaluate the effect of a on r E . Additionally, the original paper does not report the value of a. The analysis demonstrated in Fig. 2 shows that the radius depends almost linearly on a and reveals that the original value of r E can be reproduced if a, determined in the first step of our analysis, is used, but with the opposite (wrong) sign.  To confirm this hypothesis, we again fitted model (2) to the data with Q 2 < 1.05 fm −2 , but this time kept the radius fixed at 0.805 (11) fm and adjusted only a. We obtained a = −0.00749(63) fm 4 , which strongly supports our assumption that a mistake was made in the original analysis. Additionally, our analysis has also revealed that the original study failed to acknowledge the uncertainty of a in the determination of r E . Their analysis considered only statistical uncertainty and thus underestimated the final uncertainty of the radius.
To test the stability of the extracted radius, we have repeated the analysis by using all combinations of four of the five data sets. The results presented in Fig. 3 demonstrate the tension between the two most precise data sets, Drickey et al. [25] and Lehmann et al. [27]. The data of Lehmann et al. prefer a larger value of the proton charge radius and dominate the result when considering the data with small Q 2 . The data of Drickey et al., on the other hand, favor a smaller proton charge radius and control the result at Q 2 > 1.4 fm −2 . While the discrepancy is too small to exclude a statistical fluctuation in the data, the most probable source of the tension are unaccounted for systematic effects, e.g. offsets in the absolute normalization of the reported data. The tension between the data is reduced if the normalizations of the data sets are kept as free parameters, as is being done in modern analyses of form factor measurements [15,22,28], but does not disappear completely. Furthermore, introduction of additional five free parameters to the fits (normalizations) increases the variance of the extracted result and dilutes the significance of the extracted radius, which in the given case equals to 0.865(48) fm, see Fig. 3.

Robust analysis
The key problem of radius calculation is our ignorance of the true functional form of the proton charge form factor. Consequently, the form factor is approximated by various parameterizations. So far we  considered function (2). Although the model was applied carefully to the data, it is not clear whether the quadratic function is an acceptable model for its description. The choice of a model can impact the result and can lead to a biased radius, i.e. a value that is systematically different from the true value. The bias is associated with the nature of the function and is typically smaller for functions with more free parameters. However, models with many parameters are justifiable only when data sets with large kinematic range and sufficient precision are available. Otherwise the variance of the radius increases to the level that the obtained result has no practical value. Hence, a model needs to be selected that exhibits a minimal bias of the extracted radius while keeping the variance of the result reasonably small. To achieve this, we have complemented the original analysis with a different technique based on a Monte-Carlo study of different form factor models, and are able to offer a more reliable determination of the radius.
Since the majority of the available data were measured only at small Q 2 and with limited precision, we investigated only models that depend on up to three parameters in order to keep the uncertainty of the extracted radius below the difference between the two competing values of the proton radius  problem. Beside model (2), we considered: where n 1 , n 2 , n 3 , m 1 , m 2 and m 4 represent adjustable parameters of the models. Using these parameters the r E for each model can be calculated using Eq. (1). The quadratic (Eq. (2)) and cubic functions (Eq. (3)) were considered as well as four rational functions. They are interesting because, like the dipole model, they introduce higher order terms and define the curvature of the form factor at higher Q 2 , although they depend on relatively few parameters. For completeness, we considered also the dipole model, which is known to report biased results [29], but can serve as a test of our approach. The evaluation of the chosen models and tests of their capacity to reliably extract the radius can not be performed on the real data. Therefore we developed a Monte-Carlo simulation which generated many sets of pseudo data on a desirable kinematic interval using specific form factor models with known corresponding radii. These pseudo data were used to establish statistically relevant estimates on the size of the bias and variance of the extracted radius. The goal was to find a model that would (for a chosen kinematic range) return a radius with uncertainty smaller than σ r E ≤ σ 0 = 0.02 fm and with the bias below ∆r E ≤ 1/(2σ 0 ). Therefore, we have defined the estimator which combines both conditions and could be used to quantify the quality of the selected model and search for the model with RMSE ≤ √ 2. The six models were tested by using the parameterization of Bernauer et al. [22] determined from real data, the fifth-order continued-fraction model of Arrington and Sick [30], and the theoretical prediction of Alarcon, Higinbotham, Weiss and Ye [20]. For each parameterization the pseudo data were generated and studied on the interval [0, Q 2 max ]. The results of the analysis are gathered in Table 2 and presented in Fig. 4. Table 2. Summary of the Monte-Carlo study of the form-factor models (2) - (7). For every model listed in column one, the table shows the results for the most pessimistic case, as can be seen in Fig. 4. Column two shows the "best" value of Q 2 max at which RMSE reaches its minimum and defines the range of the data [0, Q 2 best ] to be used in the fit and in the extraction of the radius. Columns three and four contain the expected bias (extracted minus input radius) and uncertainty of the radius obtained with a chosen model. At small momentum transfers, the value of RMSE(Q 2 max ) is governed by the variance, which decreases with the increasing number of data points considered in the fit. For large Q 2 max , the model is no longer capable of satisfactorily describing the data. Consequently, the extracted radius becomes biased and the RMSE(Q 2 max ) again starts to increase. The position of the minimum determines the ideal momentum transfer range over which a given model gives the most reliable radius for a chosen form factor parameterization. Unfortunately, since we do not know the true functional form of the charge form factor, one cannot simply select a minimum from a single specific parameterization. Thus, we try to be conservative and choose the minimum with the highest RMSE value, Q 2 best , assuming that the form-factor parameterizations considered in the analysis form a representative set of functions and that the true form factor may be somewhere inbetween.
Once the Q 2 best for each of the models was estimated, the data could be fitted on the interval [0, Q 2 best ] and the proton charge radius could be determined. The results of the fits to the real data are shown in Table 2, Table 3 and in Fig. 5. However, the Monte-Carlo analysis demonstrates that only model (4) satisfies the condition for the RMSE = 1.54 ≈ √ 2. All other models have RMSE values larger than 2, which means that the radius results will not meet our criterion regarding the bias and variance. While quadratic and dipole functions are expected to have a large bias and should  Figure 4. Results of the Monte-Carlo study of the form-factor models (2) -(7). RMSE as a function of Q 2 max , obtained with realistic form factor paramaterisations is used to evaluate the behaviour of each model. According to our selection criterion a model is appropriate for the analysis if the minima of all the curves on a given plot lie below the threshold of ≈ √ 2. The selection threshold is marked on the plots with gray bands. The black arrows on each plot denote the positions of the highest minimum which determines the interval [0, Q 2 best ] of the data that should be considered in the fit. therefore be excluded, the remaining functions could still be considered, because their RMSE values are dominated by the large variance, but the calculated radii are expected to have large uncertainties. Hence, our best estimate for the radius is obtained with the Padé (0, 1) approximant, yielding the radius of 0.841(9) fm.

Conclusions
In this paper we reanalyzed the proton charge form factor data from classical experiments performed in the 1960s by utilizing modern analysis tools that were not available at the time of the original anal-  ysis. Repeating the steps of Hand et al., we determined the radius to be 0.851 (19) fm, a value which is 5 % larger than the result of the original paper. Using Monte-Carlo simulation we determined that the observed discrepancy is most probably related to a mistake in the interpretation of the Q 4 -term when fitting the radius. To evaluate and minimize the dependence of the radius on the model applied in the analysis, the classical approach was superseded by a Monte Carlo-based analysis using pseudo-data generated with realistic form-factor parameterizations. In this approach the most appropriate fitting interval and the model function was selected by using a predefined selection criterion RMSE ≤ √ 2. Among the considered functions only Padé (0, 1) fulfilled the set condition. Using this function the best estimate for the proton charge radius was determined to be 0.841 (9) fm. The obtained result is in good agreement with recent extractions of the radius and with the new recommended value (CO-DATA2018, [31]), see Fig. 6. Minimization of the model dependence of the extracted radius is key for reaching consistent interpretation of the modern electron scattering data. Here we offer an approach, which, relying on predefined selection criterion and using Monte-Carlo simulations, simultaneously examines both the model bias and variance. The method successfully applied to the data of Hand et al. can be directly extended to more complex models and used for a robust interpretation of the recent data. Table 3. The parameters for the form-factor models (2), (4), (6) and (7), which have more than one free parameter. Each table shows the values for a given model extracted from the data. The relative contributions of the terms equipped with the given parameters to the total value of the form-factor at Q 2 best are also presented. The alternating signs of the parameters of the quadratic model (r, a) and cubic function (n 1 , n 2 , n 3 ) indicate that the true nature of the form-factor is more complex than a low order polynomial, thus requiring higher-order terms to match its slope and the curvature in a chosen Q 2 -range. The positive values of m 1 , m 2 and m 4 show that the Padé (0, 2) and the hybrid model do not have poles, while automatically ensure a correct asymptotic behaviour of the form-factor. The large uncertainties of the higher-order terms (n 2 , n 3 , m 2 , m 4 ) are governed by the large uncertainties of the available measurements.  Figure 6. The result of this work compared to other extractions of the proton charge radius. Full circles show findings of modern nuclear scattering experiments ( [11,[32][33][34]) together with the original result of Hand et al [2]. Full squares represent values obtained from the recent atomic hydrogen spectroscopy measurements ( [9,10,12]). The triangles denote values determined from the muonic hydrogen (deuterium) measurements ( [35][36][37]). The uncertainties of data from Pohl et al. and Antognini et al. are multiplied by factor 5 for clarity. The diamonds show recent reanalyses of the electron scattering experiments ( [14,15,[17][18][19][20][38][39][40][41][42]). The gray line with the corresponding band is the recommended value (CODATA 2018, [31]).