Mixing-length estimates from binary systems. A theoretical investigation on the estimation errors

We performed a theoretical investigation on the recovery of the mixing-length parameter from an eclipsing double-lined binary system. We focused on a synthetic system composed of a primary of mass M = 0.95 Msun and a secondary of M = 0.85 Msun. Monte Carlo simulations were conducted at three metallicities and three evolutionary stages of the primary. For each configuration, artificial data were sampled assuming an increasing difference between the mixing-length values of the two stars. The mixing-length values were reconstructed using three alternative set-ups. The first method, which assumes full independence between the two stars, had great difficulty constraining the mixing-length values: the recovered values were nearly unconstrained, with a standard deviation of 0.40. The second technique imposes the constraint of common age and initial chemical composition for the two stars in the fit. We found that the $\alpha_{ml,1}$ values match the ones recovered under the previous configuration, but the $\alpha_{ml,2}$ values are peaked around unbiased estimates. This occurs because the primary star provides a much tighter age constraint in the joint fit than the secondary. Within this second scenario we also explored, for systems sharing a common $\alpha_{ml}$, the difference in the mixing-length values of the two stars due only to random fluctuations arising from the observational errors. The posterior distribution of these differences was peaked around zero, with a large standard deviation of 0.3 (15% of the solar-scaled value). The third technique also imposes the constraint of a common mixing-length value for the two stars, and served as a test for the identification of wrong fitting assumptions. In this case the common mixing-length is mainly dictated by the value of $\alpha_{ml,2}$. [...] For $\Delta \alpha_{ml}>0.4$ less than half of the systems can be recovered, and only 20% at $\Delta \alpha_{ml} = 1.0$.


Introduction
Despite the huge refinements in the accuracy and reliability of stellar evolutionary predictions, several mechanisms involved in the evolution of stars are still poorly understood. A major and long-standing problem affecting stellar model computations is the treatment of superadiabatic convection. The lack of a rigorous treatment prevents a firm and reliable prediction of the extent of the external convective regions.
A precise treatment of external convection would require 3D hydrodynamical calculations. These have greatly improved in recent years (see e.g. Tanner et al. 2013; Trampedach et al. 2014; Magic et al. 2015); however, they still cannot cover the wide range of input physics needed for stellar model computations. Moreover, defining $\alpha_{ml}$ in one-dimensional codes from the results of 3D simulations is quite ambiguous, owing to the differences between the mixing-length representation of convection and the behaviour of convection in 2D/3D simulations, even though there have been relevant attempts in the literature (Lydon et al. 1992; Ludwig et al. 1999; Tanner et al. 2013; Magic et al. 2015; Mosumgaard et al. 2017). Therefore the current generation of stellar evolution codes still addresses this problem by relying, almost universally, on the mixing-length theory (Böhm-Vitense 1958). In this framework the efficiency of the convective transport and the stellar structure in the superadiabatic transition layers depend on the mixing-length $l$, which is assumed to be proportional to the pressure scale height $H_p$, i.e. $l = \alpha_{ml} H_p$, where $\alpha_{ml}$ is a non-dimensional free parameter to be calibrated.
Send offprint requests to: G. Valle, valle@df.unipi.it
As a result of this freedom, neither the effective temperature nor the radius of stars with a thick outer convective envelope can be firmly predicted by the current generation of 1D stellar models, since they strongly depend on the calibrated value of $\alpha_{ml}$. This obviously influences the stellar characteristics recovered by fitting techniques that exploit these observables.
The classical target for mixing-length calibration is the Sun, but the generalization of this calibration to different evolutionary phases, metallicities, and mass ranges has been questioned on both theoretical and observational grounds. In particular, a growing number of observations suggest that adopting the solar-calibrated $\alpha_{ml}$ does not allow all kinds of stars to be properly modelled (see e.g. Guenther & Demarque 2000; Yıldız et al. 2006; Yıldız 2007; Clausen et al. 2009; Deheuvels & Michel 2011; Bonaca et al. 2012; Mathur et al. 2012; Wu et al. 2015; Joyce & Chaboyer 2018a,b; Li et al. 2018). Several of the above-mentioned studies consider stars in double-lined detached eclipsing binary systems, because in this case it is possible to obtain accurate measurements of both the masses and the radii of the two stars. The availability of these two fundamental quantities allows for stringent tests of the stellar models (see e.g. Claret 2007; Stancliffe et al. 2015; Gallenne et al. 2016; Claret & Torres 2016; Valle et al. 2017; Claret & Torres 2017), and thus also of the possible variations of the mixing-length parameter in different mass and metallicity ranges.
A recent theoretical analysis (Valle et al. 2019) pointed out many limitations of a mixing-length calibration from field stars, so it is interesting to investigate the reliability of this calibration in binary systems. The aim of this work is to perform such an investigation for a system composed of two low-mass stars, thus avoiding the additional complication of dealing with the concurrent calibration of the convective core overshooting parameter.
The investigation is focussed on a synthetic binary system composed of a primary of $M_1 = 0.95$ $M_{\odot}$ and a secondary of $M_2 = 0.85$ $M_{\odot}$, resulting in a mass ratio of about 1.1. This configuration, quite common for real systems, was also chosen to allow the stars to be sampled in different evolutionary phases, because this provides the most stringent constraints for the recovery (Valle et al. 2017; Claret & Torres 2016). The focus of our analysis is on quantifying the errors in the recovered mixing-length values arising only from observational errors. Therefore, no systematic discrepancies between the grid of models adopted in the recovery and the artificial stars are assumed. Although this is a far too optimistic assumption when dealing with real-world binary systems, it is a mandatory step that allows us to assess the very minimum errors that can be expected in the calibration process. Moreover, only a theoretical investigation can highlight the presence of hidden biases and dependencies (see e.g. Valle et al. 2019, for an analysis of spurious metallicity dependencies of the recovered mixing-length on field stars) that would otherwise be neglected.

Methods
The analysis was performed at three different metallicity values, centred around the solar value (Z = 0.0074, 0.0129, 0.0221), and at three different evolutionary stages of the primary star. More precisely, we define a relative age r with respect to the main sequence (MS) lifetime, and we selected models at r = 60%, 90%, and 120% of the central hydrogen exhaustion time. The first two points correspond to a primary at the middle and nearly at the end of the MS, while the third one corresponds to a primary in the red giant branch (RGB) phase. Correspondingly, the secondary star is always in the MS, at about 40%, 55%, and 75% of its central hydrogen exhaustion time. The initial helium abundances of the synthetic stars were obtained from the linear relation $Y = Y_p + \frac{\Delta Y}{\Delta Z} Z$, with the primordial abundance $Y_p = 0.2485$ from WMAP (Peimbert et al. 2007a,b) and a helium-to-metal enrichment ratio $\Delta Y/\Delta Z = 2.0$ (see e.g. Gennaro et al. 2010).
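As a worked illustration of the linear enrichment relation above, the following minimal sketch evaluates the initial helium abundance at the three metallicities of the analysis. The function and variable names are ours and purely illustrative; only the numerical values of $Y_p$, $\Delta Y/\Delta Z$, and Z come from the text.

```python
# Initial helium abundance from the linear enrichment law Y = Y_p + (dY/dZ) * Z.
Y_P = 0.2485    # primordial helium abundance (value quoted in the text)
DY_DZ = 2.0     # helium-to-metal enrichment ratio of the synthetic systems

def initial_helium(z, dy_dz=DY_DZ, y_p=Y_P):
    """Return the initial helium abundance for metallicity z."""
    return y_p + dy_dz * z

# The three metallicities considered in the analysis.
metallicities = [0.0074, 0.0129, 0.0221]
helium = {z: round(initial_helium(z), 4) for z in metallicities}

for z, y in helium.items():
    print(f"Z = {z:.4f}  ->  Y = {y:.4f}")
```

With these inputs the three synthetic compositions have Y of about 0.263, 0.274, and 0.293.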
We considered different possible values of $\alpha_{ml}$ for the two stars. The reference scenario has the two stars with a common mixing-length parameter, $\alpha_{ml,1} = \alpha_{ml,2} = 2.0$. The other five cases adopt a systematically increasing difference between the two mixing-length values, in steps of 0.2: the mixing-length of the primary star was increased in steps of 0.1 ($\alpha_{ml,1}$ = 2.1, 2.2, 2.3, 2.4, and 2.5), while the values of $\alpha_{ml,2}$ were correspondingly decreased in steps of 0.1 ($\alpha_{ml,2}$ = 1.9, 1.8, 1.7, 1.6, and 1.5). Thus the extreme scenario has a difference in mixing-length values of $\alpha_{ml,1} - \alpha_{ml,2} = 1.0$. The considered simulation parameters are summarized in Table 1.
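The sampling design described above can be enumerated with a short sketch. The variable names are illustrative; the total number of configurations is simply the product of the quoted numbers of metallicities, evolutionary stages, and mixing-length pairs.

```python
# Enumerate the six (alpha_ml_1, alpha_ml_2) pairs and the full sampling design.
REFERENCE_ALPHA = 2.0   # common value of the reference scenario
STEP = 0.1              # per-star step; the difference grows in steps of 0.2
N_SHIFTED_CASES = 5

scenarios = []
for k in range(N_SHIFTED_CASES + 1):
    alpha_1 = round(REFERENCE_ALPHA + STEP * k, 1)   # primary: 2.0 ... 2.5
    alpha_2 = round(REFERENCE_ALPHA - STEP * k, 1)   # secondary: 2.0 ... 1.5
    scenarios.append((alpha_1, alpha_2, round(alpha_1 - alpha_2, 1)))

metallicities = [0.0074, 0.0129, 0.0221]
relative_ages = [0.6, 0.9, 1.2]   # fraction of the central hydrogen exhaustion time

configurations = [
    (z, r, a1, a2)
    for z in metallicities
    for r in relative_ages
    for (a1, a2, _) in scenarios
]

for a1, a2, delta in scenarios:
    print(f"alpha_ml_1 = {a1}, alpha_ml_2 = {a2}, difference = {delta}")
print(len(configurations), "configurations in total")
```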
For each possible mixing-length combination, metallicity, and evolutionary stage, artificial stars were sampled from the grid described in Sect. 2.2, and their observables were perturbed by means of a Monte Carlo procedure assuming Gaussian errors. The procedure was repeated N = 5000 times for each artificial system. The adopted observable constraints were the effective temperature $T_{\rm eff}$, metallicity [Fe/H], mass M, and radius R of both stars. We assumed uncertainties of 100 K in $T_{\rm eff}$, 0.1 dex in [Fe/H], 0.5% in M, and 0.25% in R. Such precision in masses and radii is achievable only for a small subset of binary systems, but it is mandatory for calibration purposes (see e.g. Valle et al. 2017).
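The Monte Carlo perturbation step can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the errors on the four observables are assumed to be independent, and the "true" observables of the star are hypothetical placeholder numbers rather than values taken from the model grid.

```python
import random

# Observational uncertainties quoted in the text.
SIGMA_TEFF = 100.0          # K (absolute)
SIGMA_FEH = 0.1             # dex (absolute)
REL_SIGMA_MASS = 0.005      # 0.5% (relative)
REL_SIGMA_RADIUS = 0.0025   # 0.25% (relative)

def perturb(star, rng):
    """Return one Gaussian-perturbed copy of the observables of a star.

    Assumption: the four errors are independent of each other.
    """
    return {
        "teff": rng.gauss(star["teff"], SIGMA_TEFF),
        "feh": rng.gauss(star["feh"], SIGMA_FEH),
        "mass": rng.gauss(star["mass"], REL_SIGMA_MASS * star["mass"]),
        "radius": rng.gauss(star["radius"], REL_SIGMA_RADIUS * star["radius"]),
    }

# Hypothetical unperturbed observables (illustrative numbers only).
primary = {"teff": 5600.0, "feh": 0.0, "mass": 0.95, "radius": 1.0}

rng = random.Random(42)     # fixed seed for reproducibility
N = 5000                    # number of Monte Carlo replicates per system
samples = [perturb(primary, rng) for _ in range(N)]

mean_mass = sum(s["mass"] for s in samples) / N
print(f"mean perturbed mass = {mean_mass:.4f} Msun")
```

In a full simulation the same perturbation would be applied to both stars of each system before the recovery.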

Grid-based recovery technique
Each artificial system was then reconstructed adopting the SCEPtER-binary pipeline (Valle et al. 2015a), modified to include the mixing-length value in the estimation process. Details about the technique can be found in Valle et al. (2015a, 2016). We adopted the pipeline in three configurations. The first one fits the two stars independently, thus effectively losing the binary constraint. This result is useful as a reference for comparison with the other results. The second configuration imposes the binary constraint, forcing the pipeline to return identical ages, initial metallicity Z, and initial helium abundance Y for the two stars; however, each star can have a different mixing-length value. The third configuration adds the constraint of a common mixing-length value. The latter scenario is particularly useful to test the sensitivity of grid techniques in identifying an inadequate specification of the fitting model. Indeed, as the difference in the mixing-length of the synthetic stars grows, the constraint of a common $\alpha_{ml}$ value becomes more and more difficult to satisfy, causing several fits to return no acceptable values for the system. The adopted procedure allows us to quantify the theoretical fraction of systems for which such behaviour is expected.
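The difference between the three configurations can be illustrated with a toy grid search. This sketch is not the SCEPtER-binary implementation: the quadratic chi-square surfaces, the age grid, the adopted widths, and the acceptance threshold `max_chi2` are all hypothetical, and the common chemical composition constraint is omitted for brevity. It only shows how the space of admissible solutions shrinks from one configuration to the next.

```python
from itertools import product

# Hypothetical grids: ages in Gyr and mixing-length values in [1.0, 3.0].
AGES = [round(6.0 + 0.5 * i, 1) for i in range(9)]
ALPHAS = [round(1.0 + 0.1 * i, 1) for i in range(21)]

def toy_chi2(age, alpha, age0, sigma_age, alpha0, sigma_alpha):
    """Hypothetical quadratic chi-square standing in for a real model grid."""
    return ((age - age0) / sigma_age) ** 2 + ((alpha - alpha0) / sigma_alpha) ** 2

def chi2_primary(age, alpha):
    # Assumed: tight age constraint, loose mixing-length constraint.
    return toy_chi2(age, alpha, 8.0, 0.5, 2.5, 0.40)

def chi2_secondary(age, alpha):
    # Assumed: loose age constraint, tighter mixing-length constraint.
    return toy_chi2(age, alpha, 8.0, 2.0, 1.5, 0.15)

def total_chi2(c):
    age_1, alpha_1, age_2, alpha_2 = c
    return chi2_primary(age_1, alpha_1) + chi2_secondary(age_2, alpha_2)

def fit(mode, max_chi2=9.0):
    """Return (age_1, alpha_1, age_2, alpha_2), or None if no acceptable fit."""
    if mode == "independent":      # no binary constraint at all
        candidates = product(AGES, ALPHAS, AGES, ALPHAS)
    elif mode == "binary":         # common age, free mixing-lengths
        candidates = ((t, a1, t, a2) for t, a1, a2 in product(AGES, ALPHAS, ALPHAS))
    elif mode == "coupled":        # common age and common mixing-length
        candidates = ((t, a, t, a) for t, a in product(AGES, ALPHAS))
    else:
        raise ValueError(f"unknown mode: {mode}")
    best = min(candidates, key=total_chi2)
    return best if total_chi2(best) <= max_chi2 else None

for mode in ("independent", "binary", "coupled"):
    print(mode, fit(mode))
```

In this toy set-up the coupled fit lands close to the secondary's value because its chi-square surface is narrower in $\alpha_{ml}$, and lowering `max_chi2` makes the coupled fit fail altogether, mimicking the systems for which no acceptable solution is returned.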

Stellar models grid
The estimation procedure required a grid of stellar models sufficiently extended to cover the whole parameter space. To this purpose, we adopted the same stellar model grid as in Valle et al. (2019), computed by means of the FRANEC code (Degl'Innocenti et al. 2008; Tognelli et al. 2011), in the same configuration as was adopted to compute the Pisa Stellar Evolution Data Base for low-mass stars (Dell'Omodarme et al. 2012). Models were calculated for the solar heavy-element mixture by Asplund et al. (2009). Atomic diffusion was included, taking into account the effects of gravitational settling and thermal diffusion with coefficients given by Thoul et al. (1994). Outer boundary conditions were determined by integrating the $T(\tau)$ relation by Krishna Swamy (1966). Further details on the stellar models can be found in Valle et al. (2009, 2015b) and references therein. Although the choices in the input physics play a relevant role when estimating stellar parameters from real observational data, they are of minor relevance for our aim, because artificial stars are recovered from the same model grid adopted for their sampling. The adoption of microscopic diffusion in the stellar computations causes the surface [Fe/H], which is adopted as one of the observational constraints in the analysis, to evolve with time.
For each metallicity we computed models for nine different values of the initial helium abundance by following the above-mentioned linear relation $Y = Y_p + \frac{\Delta Y}{\Delta Z} Z$, with a helium-to-metal enrichment ratio $\Delta Y/\Delta Z$ from 1 to 3 with a step of 0.25 (Gennaro et al. 2010). The value $\Delta Y/\Delta Z = 2.0$ corresponds to the reference value for the synthetic systems. Ultimately, the grid spans a set of 153 different initial chemical compositions. For each mass, metallicity, and initial helium abundance, we computed models for 21 values of the mixing-length parameter $\alpha_{ml}$ in the range [1.0, 3.0] with a step of 0.1.
With the assumed input physics, the solar-calibrated value is $\alpha_{ml} = 2.1$. All the adopted grid steps are small enough to have a negligible impact on the estimates.
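The grid dimensions quoted above can be cross-checked with a few lines of arithmetic. The number of metallicities in the grid is not stated explicitly in the text; the value derived below is only what the quoted 153 compositions and nine helium values imply, and the variable names are illustrative.

```python
# Cross-check of the grid dimensions quoted in the text.
alpha_grid = [round(1.0 + 0.1 * i, 1) for i in range(21)]   # 1.0, 1.1, ..., 3.0
dy_dz_grid = [1.0 + 0.25 * i for i in range(9)]             # 1.0, 1.25, ..., 3.0

N_COMPOSITIONS = 153   # number of initial chemical compositions (from the text)

# Implied number of metallicities (an inference, not a value given in the text).
n_metallicities = N_COMPOSITIONS // len(dy_dz_grid)

# Number of (composition, mixing-length) combinations computed for each mass.
tracks_per_mass = N_COMPOSITIONS * len(alpha_grid)

print(len(alpha_grid), "mixing-length values from", alpha_grid[0], "to", alpha_grid[-1])
print(len(dy_dz_grid), "enrichment ratios from", dy_dz_grid[0], "to", dy_dz_grid[-1])
print(n_metallicities, "metallicities implied;", tracks_per_mass, "combinations per mass")
```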

Results
The analysis of the fit results for the considered scenarios revealed some expected behaviours and also some peculiar effects. The following subsections explore in detail the outcomes in the three considered fitting configurations.

Independent recovery
The fit of the binary systems under full independence between the stars (i.e. the stars can be fitted at different chemical compositions and different ages) revealed a great difficulty in constraining the mixing-length value of the two stars. The recovered marginalized posterior densities of $\alpha_{ml,1}$ and $\alpha_{ml,2}$ are presented in Fig. 2 and Table 2. While the mean values of the mixing-length are in general consistently recovered for both stars, there is a huge variability in the results, which practically cover the whole allowed $\alpha_{ml}$ range. The $\alpha_{ml,1}$ values are underestimated for the two most extreme scenarios ($\alpha_{ml,1}$ = 2.4, 2.5), as a consequence of an edge effect that truncates the estimates at the upper edge of the grid ($\alpha_{ml} = 3.0$). This effect is evident in Fig. 2A: for the sampling at $\alpha_{ml} = 2.5$ the posterior density is clearly truncated at the upper edge. A corresponding tendency to overestimation appears in the last two cases for the secondary star ($\alpha_{ml,2}$ = 1.6 and 1.5). These results confirm the theoretical finding by Valle et al. (2019), obtained for field stars. It seems that, even for stars with exceptionally well constrained masses and radii, a calibration of the mixing-length parameter is poorly reliable.
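The edge effect can be illustrated with a simple analytical sketch. We assume, purely for illustration, that the untruncated estimates are Gaussian with the standard deviation of 0.40 quoted in the abstract, and that the grid boundaries at 1.0 and 3.0 act as a hard truncation; the actual posterior densities need not be Gaussian. The function names are ours.

```python
import math

def normal_pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_mean(mu, sigma, lower, upper):
    """Mean of a Gaussian N(mu, sigma) truncated to the interval [lower, upper]."""
    a = (lower - mu) / sigma
    b = (upper - mu) / sigma
    return mu + sigma * (normal_pdf(a) - normal_pdf(b)) / (normal_cdf(b) - normal_cdf(a))

SIGMA = 0.40                 # standard deviation quoted for the independent fit
GRID_MIN, GRID_MAX = 1.0, 3.0

for true_alpha in (1.5, 2.0, 2.5):
    recovered = truncated_mean(true_alpha, SIGMA, GRID_MIN, GRID_MAX)
    print(f"true alpha_ml = {true_alpha}: truncated mean = {recovered:.3f}")
```

Under these assumptions the mean is unbiased at $\alpha_{ml} = 2.0$, shifted downwards by about 0.08 at 2.5, and shifted upwards by the same amount at 1.5, in qualitative agreement with the direction of the biases described above.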

Binary constraints in age and chemical composition
Imposing the constraint of a common age and initial chemical composition for the two stars modifies the results in an interesting way. As shown in Fig. 3 and Table 3, the recovered $\alpha_{ml,1}$ values closely match the ones presented in Sect. 3.1, with the same mean values and the characteristic large dispersion.
On the other hand, the recovered $\alpha_{ml,2}$ values are much more peaked around their mean values, which provide unbiased estimates of the values adopted in the sampling. These results suggest that, while the mixing-length parameter of the primary star is not further constrained by the fit, that of the secondary is. Indeed, the standard deviations of the recovered $\alpha_{ml,2}$ values range from one half to one third of those of $\alpha_{ml,1}$. Moreover, the tendency to overestimate $\alpha_{ml,2}$ for the two most extreme scenarios (shown in Sect. 3.1) disappears, as a consequence of the much smaller variability, which prevents edge effects from playing a role. The striking difference in the behaviour of the two stars is dictated by the different constraints they provide to each other in the joint fit. While the independently recovered initial chemical compositions are nearly identical for both stars, this is not the case for the age. As discussed in detail in Valle et al. (2015a), the primary star provides a much tighter age constraint in the joint fit than the secondary. This occurs because the relative error on the age becomes smaller as a star evolves along the MS, due to the faster evolutionary time scale: the age range allowed by the errors on the observable constraints is smaller in rapid evolutionary phases. This is clearly demonstrated in Fig. 4, which presents the $\alpha_{ml,2}$ posterior density for mock data with $\alpha_{ml,1} = \alpha_{ml,2} = 2.0$, as a function of the evolutionary stage of the primary star. The distribution of the recovered secondary mixing-length values shrinks as the primary evolves. The RGB scenario, for which the evolutionary time scale is the fastest, provides the lowest variance for the estimated $\alpha_{ml,2}$ values. This effect is further shown in Table 4. It is apparent that the standard deviation of the recovered mixing-length value for the secondary star shrinks as the primary evolves, mainly when the difference in the sampled $\alpha_{ml}$ values is lower than about 0.6.
As a consequence, the common age constraint leads to the rejection of several extreme solutions for the secondary, while the solutions of the primary are unaffected. The net result is a se-

Expected difference in $\alpha_{ml}$ value for the two stars
The analysis conducted assuming independent mixing-length values for the two stars also allows us to estimate an interesting parameter, namely the expected dispersion of the difference between the recovered $\alpha_{ml}$ values when both stars are sampled at a common $\alpha_{ml} = 2.0$. An estimate of this value can help in judging how reliable calibrations from binary systems are when independent mixing-length values are allowed.
The question requires considering the reconstructed differences $\Delta \alpha_{ml} = \alpha_{ml,1} - \alpha_{ml,2}$ for all the binary systems simulated with the same $\alpha_{ml}$. The distribution of these differences is due only to random errors on the observables and should therefore be considered as the minimum variability of the mixing-length parameter. Figure 5 shows the estimated distribution of $\Delta \alpha_{ml}$, with the identification of the expected $1\sigma$ and $2\sigma$ quantiles. It appears that a fluctuation of $\Delta \alpha_{ml} = \pm 0.3$ is expected at the $1\sigma$ level, implying that about 32% of the systems with true common $\alpha_{ml}$ values can be reconstructed with differences larger than this owing only to the observational errors.
This result should be carefully considered, because a difference of 0.3 in $\alpha_{ml}$ is as high as 15% of the solar-scaled value. For investigations that report a difference in the mixing-length values of the two stars in a binary system lower than this, one should consider the possibility that random errors on the observables can explain the discrepancy.
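The two numbers quoted above can be reproduced with a short calculation. The sketch assumes that the distribution of $\Delta \alpha_{ml}$ is approximately Gaussian, which is what the 32% figure at $1\sigma$ corresponds to; the actual distribution in Fig. 5 may deviate from this. The function name is illustrative.

```python
import math

def two_sided_tail(n_sigma):
    """Fraction of a Gaussian distribution lying beyond +/- n_sigma."""
    return 1.0 - math.erf(n_sigma / math.sqrt(2.0))

SIGMA_DELTA = 0.3        # 1-sigma fluctuation of the recovered difference
REFERENCE_ALPHA = 2.0    # common value adopted in the sampling
SOLAR_ALPHA = 2.1        # solar-calibrated value for the adopted input physics

frac_beyond_1sigma = two_sided_tail(1.0)
frac_beyond_2sigma = two_sided_tail(2.0)

print(f"beyond 1 sigma: {100 * frac_beyond_1sigma:.1f}% of systems")
print(f"beyond 2 sigma: {100 * frac_beyond_2sigma:.1f}% of systems")
print(f"0.3 relative to alpha_ml = 2.0: {100 * SIGMA_DELTA / REFERENCE_ALPHA:.0f}%")
print(f"0.3 relative to alpha_ml = 2.1: {100 * SIGMA_DELTA / SOLAR_ALPHA:.0f}%")
```

The relative size of the fluctuation is about 15% with respect to the sampled value of 2.0 and about 14% with respect to the solar-calibrated value of 2.1.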

Fully coupled recovery
The last explored reconstruction also imposes the constraint of a common mixing-length value for the two stars, besides those on the common initial chemical composition and age discussed in Sect. 3.2. For real systems, such an assumption is theoretically justified when the stars have similar masses and are in the same evolutionary phase, but it can otherwise be questioned. For our mock data this assumption is not valid for most of the systems, which are sampled with different mixing-length values for the two stars. The assumption of a common $\alpha_{ml}$ is thus useful as a test of the robustness of the fit against wrong assumptions on the mixing-length values.
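As shown in Fig. 6 and in Table 5, the recovered common mixing-length is mainly dictated by the value of $\alpha_{ml,2}$: under the binary constraints of Sect. 3.2 the posterior distribution of the mixing-length of the secondary is much more peaked than that of the primary, thus providing a much stronger constraint on the joint estimate.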
In the extreme scenario of $\Delta \alpha_{ml} = 1.0$ the mixing-length value of the primary is severely underestimated (1.5 instead of the true 2.5). This severe underestimation comes at a cost: only a marginal share of the systems in this configuration can be reconstructed by the algorithm. For the vast majority the fitting pipeline is not able to provide an acceptable fit, thus suggesting the existence of some wrong assumptions in the modelling. Indeed, Fig. 6B shows the fraction of systems for which a fit was possible. While a common mixing-length value is found for more than three quarters of the systems with $\Delta \alpha_{ml} \leq 0.4$, this fraction rapidly drops to 20% for $\Delta \alpha_{ml} = 1.0$. Moreover, when the analysis is restricted to systems with both stars in the MS, the fraction of systems for which the fit was possible decreases even more and is nearly zero for $\Delta \alpha_{ml} = 1.0$.
Therefore it seems that large discrepancies between the $\alpha_{ml}$ assumptions in the fit and in the mock data are easily detected. This is not the case for moderate differences: here the fitting algorithm is able to provide a common solution, which is however biased towards $\alpha_{ml,2}$.
The results presented in the previous sections assume a fixed error on the observational constraints. It is interesting to explore how this assumption influences the outcome of the fit in the three explored configurations. Owing to the strong dependence of the effective temperature on the adopted mixing-length, we repeated the analysis assuming an error of 50 K in $T_{\rm eff}$, that is, one half of the value previously assumed.
The results are summarised in Table 6. A comparison with Tables 2, 3, and 5 shows a minor impact of this change. One can observe a moderate reduction of the standard deviations and an equally small reduction of the biases in the recovered mixing-length values. Overall, the standard deviation of the recovered mixing-length values in the three scenarios decreases by about 20% in response to a 50% reduction of the $T_{\rm eff}$ uncertainty. Therefore the results and the trends discussed so far can be considered robust against this particular source of uncertainty.
Table 6. Means and standard deviations of the recovered mixing-length values adopting an observational error of 50 K in $T_{\rm eff}$. The results cover all three considered scenarios, as a function of the $\alpha_{ml,1}$ and $\alpha_{ml,2}$ values adopted in the generation.

Conclusions
We performed a theoretical investigation on the biases and random uncertainties affecting the calibration of the mixing-length value from a mock eclipsing double-lined binary system, composed of a primary artificial star of mass $M_1 = 0.95$ $M_{\odot}$ and a secondary of mass $M_2 = 0.85$ $M_{\odot}$. We used the SCEPtER-binary pipeline (Valle et al. 2015a) to estimate the mixing-length of the mock stars, adopting as observational constraints the effective temperature, the metallicity [Fe/H], the radius, and the mass of the two stars. The comparison between the true and the estimated mixing-length values allows us to evaluate the reliability of the calibration. In more detail, several Monte Carlo simulations were conducted considering nine different scenarios, consisting of three metallicities coupled to three different evolutionary stages of the primary (0.6, 0.9, and 1.2 of the central hydrogen exhaustion time). For each configuration, data were sampled assuming an increasing difference between the mixing-length values of the two stars, from perfect agreement at $\alpha_{ml,1} = \alpha_{ml,2} = 2.0$ to a maximum difference of 1.0 ($\alpha_{ml,1} = 2.5$, $\alpha_{ml,2} = 1.5$).
The mixing-length values were then estimated adopting different hypotheses in the recovery procedure. In the first case we assumed full independence between the two stars and reconstructed them without imposing any constraint on age and chemical composition between the stars. Under these hypotheses, the mixing-length values of the two stars proved very difficult to estimate. The standard deviation of the recovered values was about 0.40, confirming the difficulties pointed out for field stars in Valle et al. (2019). Thus, even for stars with exceptionally well constrained masses and radii, the calibration of the mixing-length parameter seems unreliable.
In the second case we imposed the constraint of common age and initial chemical composition for the two stars in the recovery. While the fitted $\alpha_{ml,1}$ values closely match those recovered under full independence, the $\alpha_{ml,2}$ values are much more peaked around unbiased estimates of the values adopted in the sampling. The standard deviations of the recovered $\alpha_{ml,2}$ values range from one half to one third of those of $\alpha_{ml,1}$. This occurs because the primary star provides a much tighter age constraint in the joint fit than the secondary. This leads to the rejection of several extreme solutions for the secondary, while the solutions of the primary are unaffected. In this scenario we also explored the difference in the mixing-length values of the two stars due to random fluctuations arising from the observational errors. We considered stars sampled at a common $\alpha_{ml} = 2.0$ and focussed the analysis on the distribution of the differences $\alpha_{ml,1} - \alpha_{ml,2}$. We found that the posterior distribution of these differences was peaked around zero, with a somewhat large standard deviation of 0.3 (about 15% of the solar-scaled value). Therefore about 32% of the systems with truly identical $\alpha_{ml}$ are expected to show differences larger than this, caused only by random errors. This result should be carefully considered when obtaining a fit for a real binary system, because a difference lower than this has a high chance of being only a random fluctuation.
In the third case we also imposed the constraint of a common mixing-length value for the two stars, besides those on chemical composition and age. Two interesting effects were detected. First, the estimated common mixing-length is mainly dictated by the value of $\alpha_{ml,2}$. This happens because, as discussed above, the posterior distribution of the mixing-length of the secondary star under partial independence is much more peaked than that of the primary, thus dominating the joint estimate. Second, an increasing share of systems cannot be fitted by the algorithm as the difference between the true $\alpha_{ml}$ values increases. For $\Delta \alpha_{ml} > 0.4$ less than half of the systems can be recovered; at $\Delta \alpha_{ml} = 1.0$ the fraction decreases to 20%. Therefore it seems that moderate differences in the mixing-length value between the two stars are difficult to detect, and in these cases the solution is biased towards $\alpha_{ml,2}$.
While most of the results presented in this paper can be considered general, such as the shrinking of the estimated mixing-length value around the value of the secondary star, this work deals only with a specific binary system with fixed masses. Therefore the quantitative results cannot be expected to be valid without modifications for other binary systems with different masses and in different evolutionary phases. Although checking the robustness of the results presented here for different mass ranges or for a different mass ratio would be highly desirable, this possibility is currently limited by the huge computational burden required to compute the stellar models for the recovery at the required level of accuracy. For these reasons our study should be considered as a first step in this exploration, adopting quite common values of the masses and mass ratio. More theoretical investigations are required to fully address this open topic.