Likelihood Methods for CMB Experiments

A great deal of experimental effort is currently being devoted to the precise measurements of the cosmic microwave background (CMB) sky in temperature and polarization. Satellites, balloon-borne, and ground-based experiments scrutinize the CMB sky at multiple scales, and therefore enable to investigate not only the evolution of the early Universe, but also its late-time physics with unprecedented accuracy. The pipeline leading from time ordered data as collected by the instrument to the final product is highly structured. Moreover, it has also to provide accurate estimates of statistical and systematic uncertainties connected to the specific experiment. In this paper, we review likelihood approaches targeted to the analysis of the CMB signal at different scales, and to the estimation of key cosmological parameters. We consider methods that analyze the data in the spatial (i.e., pixel-based) or harmonic domain. We highlight the most relevant aspects of each approach and compare their performance.


INTRODUCTION
After 71 years from its first predictions, and after 55 years from its first observational evidences, the cosmic microwave background (CMB) is nowadays one of the most important probes in cosmology. During the past decades, theoretical efforts have elucidated the physics leading to the pattern of anisotropies in temperature and polarization (see e.g., [1] for a review on CMB physics and [2] for an exhaustive review on early Universe physics). Quantum fluctuations in the early universe generate metric perturbations. Scalar perturbations are converted into matter perturbations and radiation anisotropies that evolve in the expanding universe according to a set of coupled Einstein, Boltzmann and fluid equations. Matter perturbations eventually grow into galaxies and galaxy clusters. Primary CMB anisotropies are frozen at the time of matter-radiation decoupling, and subsequently modified during the propagation through evolving structures from the last scattering surface to the observer. Scattering between free electrons and CMB photons in two distinct epochs (recombination and reionization) further enriches the CMB structure with the addition of a polarization "curl-free" (E-mode) pattern in the CMB radiation. Gravitational lensing of the CMB due to the propagation of CMB photons throughout large-scale structures (LSS) generate a polarized "divergence-free" (B-mode) patter from the distortion of the CMB E-modes. Perhaps more elusive, though of paramount importance, is the primordial B-mode signal sourced by tensor perturbations to the metric (gravitational waves).
On the other hand, observational efforts have progressively lead the field to the current stage of precision cosmology. Observations of the CMB sky from space missions [3][4][5], balloonborne experiments [6][7][8], and from ground-based telescopes [9][10][11][12][13] provided measurements of CMB anisotropies in temperature and polarization over a wide range of angular scales. While we are writing this review, the Planck collaboration [14] is preparing the final public release of the Planck legacy products, which will likely represent the state of the art of CMB measurements from a single experiment for the next decade and more. Current observations [3] are in agreement with the standard cosmological model of a homogeneous and isotropic Universe at large scales, based on General Relativity and on the standard model (SM) of particle physics, complemented with a mechanism for the generation of primordial perturbations, i.e., the inflationary paradigm. When interpreted in this CDM framework, cosmological data point to a spatially flat Universe composed by baryons ( b h 2 = 0.02237 ± 0.00015, ∼ 5% of the total density), dark matter ( c h 2 = 0.1200 ± 0.0012, ∼ 25%), and dark energy ( = 0.6847 ± 0.0073, ∼ 70%), a component that behaves like a cosmological constant, and is responsible for the present accelerated expansion, plus photons (a few parts in 10 5 ) and light neutrinos. Further advances in CMB observations are still to come. Planned upgrades of existing ground-based and balloon experiments are ongoing [15][16][17][18]. The next generation of CMB observatories is under construction and is paving the way to the "stage IV" experiments targeting the ultimate measurements of the CMB polarization field [19][20][21][22][23].
The long run that lead from the pivotal observations of Penzias and Wilson [24] to the Planck legacy release has seen the dramatic improvement of the sensitivity to key cosmological parameters. Planck 2018 data provides sub-percent constraints on the base-CDM parameters 1 [3]. Moreover, advances in experimental cosmology over the past decades made cosmology itself a new avenue to the investigation of fundamental physics properties complementary to laboratory searches. A clear example is given by the possibility to constrain neutrino properties, such as their number N eff and the sum of their masses m ν . Indeed, the combination of Planck 2018 data and LSS information (in the form of measurements of baryon acoustic oscillations, BAO) can exclude at 95% c.l. the presence of light thermal relics decoupling after the QCD phase transition (T < 100 MeV) and provides a bound on the sum of the neutrino masses of m ν < 0.12 eV at 95% c.l. 2 [3].
In this context, a key ingredient is the suitable choice of the likelihood function to compare observed data with theoretical 1 The base CDM parameters are: the angular size of the sound horizon at recombination θ * , the amplitude A S and tilt n s of the spectrum of primordial scalar perturbations, the reionization optical depth τ , the energy density in baryonic matter b h 2 and in cold dark matter c h 2 . 2 Constraints derived in the context of minimal extensions of the standard CDM model. predictions in order to constrain the model parameters. In the standard cosmological model of the early universe, primordial perturbations are Gaussian distributed, and so are CMB fluctuations. Therefore, all relevant physical information in the CMB field are contained in the variance of the distribution. This is the reason why the full-sky power spectra of CMB fluctuations are a sufficient statistics. The power spectra of observed data also provide an unbiased estimator of the ensemble averaged variance of the CMB fluctuations. In the simple case of full-sky observations and isotropic and mode-uncorrelated experimental noise, the likelihood function can be derived analytically. In particular, for correlated temperature and polarization field, the probability of the data given the theoretical model (i.e., the likelihood L) is given by a Wishart distribution. However, this simple case does not capture the properties of realistic observations. Depending on the experimental platform (satellite, balloon, ground), each telescope has access to fractions of the sky f sky of different size. As an example, compare the almost full-sky observations of the Planck satellite [14] with the f sky ∼ 1% sky coverage of the ground-based BICEP-2 experiment [11]. Even in the case of full-sky observations, only a certain fraction of the sky can be retained for cosmological analysis. Foreground emissions from astrophysical and galactic sources should be masked if particularly bright contaminants. In addition to limited access to the sky coverage, a particular choice of the observational (or scanning) strategies of the sky can break the assumption of isotropic noise, due to repeated visits to the same part of the sky. As an example, consider the Planck scanning strategy featuring a longer integration time in the proximity of the Ecliptic poles (i.e., at lower Galactic latitudes, where galactic foreground contaminations are smaller).
In general, complications to the simple case of full-sky and isotropic noise arising from realistic experimental conditions require a different likelihood analysis. First of all, specific estimators of the power spectra should be defined in the partial-sky regime, which take into account spurious correlations between fields induced by the incomplete sky coverage. Secondly, the use of a Wishart distribution as a likelihood function is no longer possible. Either the new estimators are no longer distributed according to a Wishart, and therefore this choice is not exact anymore. Or, the use of the exact likelihood is unfeasible as one moves to the analysis of smaller scales (larger multipoles) and higher-resolution maps, due to the huge computational cost of inverting large covariance matrices. At large scales and for lowenough angular resolutions, the exact likelihood in pixel space can still be adopted.
In all the above situations, approximate forms of the likelihood functions have been developed. At small scales, the central limit theorem allows to approximate the Wishart distribution as a Gaussian in the power spectra. In general, quadratic forms in some functions of the CMB spectra have been adopted as approximate likelihood functions, with various choices of the covariance matrix. To conclude this long introduction, it has to be stressed that the choice of the likelihood strongly depends on the characteristics of the experiment at large, i.e., on the observational strategy, on the range of scales probed, on the noise properties, etc.
The aim of this manuscript is to review the basics of likelihood analysis in CMB experiments. This goal is motivated by the fact the we are at a crucial point in the history of observational CMB cosmology. The level of maturity and complexity reached by current CMB experiments boosted the theoretical efforts in finding smart solutions to the issue of identifying a suitable likelihood choice. A rich literature has been produced in this sense, although an exhaustive overview of the topic is not available, to the best of our knowledge. This review would fill the gap. Such a review could also serve as a good starting point for those who are approaching the field of CMB data analysis today or in the next future, and would be ideally contributing to the advances of CMB science in the next decades.
The structure of the manuscript is as follows. Section 2 is devoted to the statistics of the CMB fields. The approach to the topic is pedagogical, in a sense that we begin with a discussion in the single-field, temperature-only regime and introduce the basic statistical properties of CMB fluctuations. Then, we move to the more general case of correlated T, E, B fields. The discussion is carried over in the full-sky regime, with no distinction made between applications to large-and small-scales. We conclude section 2 with the introduction of the exact likelihood in full sky. Specific approximations to the exact likelihood are presented in section 3 (applications to the small-scale regime) and in section 5 (applications to the large-scale regime). In both sections, attention is devoted to complications due to partial-sky coverage and noise contamination. The inclusion of physical late-time Universe effects on the CMB photons in terms of gravitational lensing is detailed in section 4, whereas the important issues related to the presence of foreground emissions are described in section 6. The discussion of the various likelihood approaches in terms of computational cost (where applicable) and robustness with respect to the ability to provide unbiased estimations of cosmological parameters is detailed in section 7. Our conclusions are summarized in section 8. Some useful tools that will be mentioned throughout the main text are further discussed in Appendix. In particular, in Appendix A, we review the basic notions of statistics needed to develop the formalism of CMB statistics. In Appendix B, we discuss the construction of power spectrum estimators, including pseudo-C ℓ , the "pure" formalism, and quadratic maximum likelihood (QML) estimator.

STATISTICS OF THE COSMIC MICROWAVE BACKGROUND
We now introduce some basic aspects of the statistics of the CMB. The basic object that we are interested in is the likelihood function L, i.e., the probability of the observed data d given a model, regarded as a function of the model itself. If the model is defined in terms of a vector of parameters θ, we thus have: The notation used throughout this review is presented in Appendix A, were we also recall some basic notion of probability and statistics.
We first derive the exact likelihood function for the CMB fields in harmonic space. The main statistical concepts are introduced in the limit of single field (section 2.1), i.e., temperature only, for the sake of simplicity. We then generalize these main findings in the case of joint temperature and polarization analysis (section 2.2). The exact likelihood in real space are derived in section 2.3.
We assume an ideal scenario of full-sky observations with infinite angular resolution and absence of noise and foreground contaminations. Obviously, this scenario is highly idealized. Nevertheless, it allows to easily derive the basic concepts of CMB statistics. Modifications to this picture arising from realistic observational issues (limited sky coverage, masked sky, experimental noise, finite angular resolution) are introduced in section 3.3. Foregrounds are briefly discussed in section 6. We also assume that the temperature and polarization fluctuations are Gaussian, thus neglecting any non-Gaussianity, either of primordial origin (which are anyway bounded to be small [25]), or coming from unresolved systematics (e.g., foreground residuals).
A final remark concerns the physical, late-time-Universe effects on the CMB fields due to the propagation of CMB photons from the last-scattering surface to the observer throughout the evolving large-scale structures. Weak gravitational lensing due to the gravitational potential of cosmological structures deflects CMB photons and modifies the observed statistics of CMB anisotropies with respect to the pattern arising at decoupling. In what follows, we will implicitly consider unlensed CMB fields, i.e., we will ignore the effects of gravitational lensing for the sake of simplicity. The non-trivial modifications induced by the gravitational potential will be discussed later in section 4.

Statistics of CMB Temperature Field-Exact Likelihood in Harmonic Space
The CMB temperature field T( x,n, τ ) =T(τ ) 1 + ( x,n, τ ) observed 3 in a given directionn is defined at every point ( x, τ ) in space and time. The field has been decomposed in an isotropic background valueT(τ ) and a small perturbation, the anisotropy field ( x,n, τ ) = (T −T)/T. Anisotropies are assumed to be the result of a Gaussian random process originated from quantum fluctuations in the early Universe. The observed temperature field generated by scalar fluctuations is a linear operator acting on three-dimensional perturbation fields: where ξ ( k) is the primordial curvature perturbation, and the source function S (s) T for scalar temperature fluctuations is a linear combination of the cosmological perturbation fields (see [26] for the explicit expression). A similar expression holds for temperature fluctuations generated by tensor perturbations [26].
In Equation (2), we have suppressed the τ and x dependence in as we are implicitly assuming that the temperature field is observed in a given position at a fixed point in time.
Assuming a given cosmological model, we cannot directly predict the particular realization of the temperature field. Instead, we shall infer statistical properties of the observed perturbation field. It is useful, to this purpose, to decompose the angular dependence of the temperature anisotropy field in spherical harmonics Y ℓm (p) where the harmonic Y ℓm corresponds to an angular scale θ ∼ π/ℓ with (2ℓ + 1) m-modes for each multipole ℓ. Low multipoles (low-ℓ) in the expansion correspond to large angular scales in the sky, whereas high multipoles (high-ℓ) correspond to small scales. Since is real, the decomposition coefficients a ℓm have to satisfy the reality condition All the information about the x and τ dependence of is now encoded in the a ℓm 's.
We are interested in extracting information about the statistical properties of the a ℓm 's from the observations. In the standard cosmological model, a ℓm s follow a Gaussian distribution, with vanishing average ( a ℓm = 0, since the a ℓm are expansion coefficients of the anisotropy field, whose mean vanishes), and covariance where the constraints imposed by the two Dirac delta functions follow from the a ℓm being independent random variables (diagonal covariance). Moreover, statistical isotropy ensures that the variance does not depend on m (rotational invariance of C ℓ ). The C ℓ 's are the angular power spectrum of the CMB temperature field. The power spectrum is related to the two-point correlation function of the field C(θ ) = (n 1 ) (n 2 ) observed at two directionsn 1 andn 2 in the sky such thatn 1 ·n 2 = cos θ : where P ℓ is the Legendre polynomial of order ℓ. If a random variable is Gaussian distributed, all the statistical properties are encoded in its mean and variance, which are the only momenta of the distribution we need to know. In fact, for a Gaussian distribution, odd momenta vanish and even momenta beyond the second can be recast as a function of the variance (Wick's theorem). Thus, the power spectrum C ℓ , or equivalently the two-point correlation function C(θ ), completely characterizes the statistical properties of the anisotropy field.
Since the a ℓm 's follow a Gaussian distribution with zero mean and variance C ℓ , we can readily write the probability density function p(a ℓm |C ℓ ) of the a ℓm 's conditioned by the C ℓ 's: Given the observed temperature field and the corresponding a ℓm 's, this expression already provides the likelihood function for the theoretical (model) C ℓ 's. However the information contained in the a ℓm 's can be further compressed, as we shall see in the following. Statistical isotropy of the C ℓ 's allows us to rewrite Equation (5) as: Some considerations are in order at this point. The average operation defined with the symbol ... in Equations (5) and (8) is an ensemble average. As noted above, the CMB field is a realization of a random process and statistical information about the outcome of such a process should be obtained by averaging over all possible realizations. In practice, however, we can only observe a single realization of the CMB field. A way out is provided by the statistical omogeneity and isotropy of the CMB fluctuations, that in principle allows to substitute the ensemble average in Equation (5) with an average over different positions and directions. According to this ergodic hypothesis, different regions that are widely separated in the sky are statistically independent from each other and can be considered as different statistical realizations of the same stochastic process. Since we only have access to the CMB field observed at x 0 and τ 0 , i.e., the CMB field here and now, what we are really left is the average over different directions, or equivalently over different values of m. In other words, for a given ℓ, all the a ℓm are drawn from the same distribution, which can be therefore sampled by measuring all the 2ℓ + 1 coefficients. We are thus led to define an estimator of the observed power spectrum with the property Ĉ ℓ = C ℓ . Note that in Equation (9) the ensemble average does not appear: we are forced to measure C ℓ only with a limited number of values. This induces an intrinsic source of inaccuracy due to replacing the true variance C ℓ with the observed powerĈ ℓ (i.e., by replacing the ensemble average with the average over directions). This effect is known as cosmic variance: where the third equality follows from Wick's theorem. Cosmic variance is an irreducible source of uncertainty in cosmological measurements of the CMB power spectrum, and one of the major sources of uncertainties especially at the largest scales (low-ℓ), where we have only a limited number of coefficients a ℓm to average over with respect to the smallscale (high-ℓ) regime. Equation (10) is valid provided full-sky observations. However, in real data analysis, even if we are able to observe the full sky (e.g., with space missions), we are nevertheless forced to mask a certain fraction of the sky, e.g., to avoid foreground contamination. An approximate estimate of the increase is given by a factor 1/f sky , where f sky but see e.g., [27] for a careful counting of the degrees of freedom available in cut-sky regimes. Current experiments like the Planck satellite are ideally cosmic-variance-limited up to very high multipoles, i.e., ℓ ∼ 1500.
To derive the distribution of the observed C ℓ 's, we note that the sum of ν = (2ℓ + 1) standard Gaussian variables follows a χ 2 distribution with ν degrees of freedom. If we defineŶ ℓ = m (|a ℓm / √ C ℓ | 2 ), this new variable has a χ 2 distribution: The estimator (hereafter observed)Ĉ ℓ is a multiple ofŶ ℓ :Ĉ ℓ = C ℓŶℓ /(2ℓ + 1), and multiples of χ 2 -distributed variables follow a Gamma distribution: The previous expression is the probability of the observed power spectrum given the fiducial power, and for fixed data it can be still regarded as a likelihood L(C ℓ ), in which the role of the data is not played by the a ℓm 's as in Equation (7), but by thê C ℓ . The mean and variance of the distribution of theĈ ℓ 's are E[Ĉ ℓ ] = C ℓ and Var[Ĉ ℓ ] = 2C 2 ℓ /ν. The maximum of the distribution is in (ν − 2)/νC ℓ , that does not coincide with the mean of the distribution. As such, the distribution of observed C ℓ is skewed. However, in the limit ν → ∞, the distribution in Equation (12) tends to a Gaussian distribution with same mean and variance, according to the central limit theorem. Note that the variance of the distribution is exactly the cosmic variance introduced in Equation (10). This further stresses the meaning of the cosmic variance as an irreducible source of uncertainty due to the limitation of having access to a single realization of the Universe (i.e., the limitation due to estimating the true power spectrum C ℓ with the observed power spectrumĈ ℓ ).

Statistics of Joint CMB Temperature and Polarization Fields-Exact Likelihood in Harmonic Space
The above treatment has to be generalized in the case of the joint analysis of temperature and polarization fields T, E, B. In analogy to the temperature case, we can define two sets of spherical harmonics coefficients for E and B: where ±2 a ℓm are the expansion coefficients of the combinations of Stokes parameters describing the polarization state of the CMB signal-(Q ± iU)-in spin-2 spherical harmonics ±2 Y ℓm (see e.g., [28] for a derivation of the formalism).
The variable X a = (a T ℓm , a E ℓm , a B ℓm ) is distributed according to a Gaussian multivariate distribution with covariance matrix where it is explicitly seen that the temperature and the Epolarization fields are correlated, whereas the parity-even fields (T and E) are uncorrelated with the parity-odd field B (although this is strictly true only in the standard cosmological model when parity violation processes are forbidden in the early Universe). In analogy to Equation (9), the estimators for the observed power spectra are given by the following matrix: where the observed cross-correlations TB and EB may be nonvanishing as well.
The probability of X a at each ℓ can therefore be expressed as: Note that S ℓ represents sufficient statistics for this likelihood function: in the full-sky regime, Equation (16a) only depends on the data through S ℓ and therefore information on the CMB sky can be losslessly compressed to a set of power spectrum estimators S ℓ = S ℓ (Ĉ XY ℓ ), X, Y = T, E, B. The probability of S ℓ given V ℓ = V ℓ (C XY ℓ ) is obtained by properly normalizing (Equation 16a). In the previous section, we have seen that the single-fieldĈ ℓ is Gamma-distributed. It is easy to understand that the full set of observed power spectra S ℓ has a Wishart distribution, i.e., a multi-dimensional generalization of the Gamma distribution, with ν = (2ℓ + 1) degrees of freedom in p = 3 dimensions: (17) represents the exact expression of the likelihood function of the C XY ℓ . Since V ℓ is separable in the two blocks TE and B, we can simplify the problem and consider two separate Wishart distributions for the block TE and for the block B: The latter is further simplified since it reduces to the onedimensional Gamma distribution, as described in details in the previous section. The distribution for the TE block can be fully expanded as: The marginal distribution of each individual diagonal element of S TE ℓ can be obtained by integrating p(S ℓ |W ℓ ) TE overĈ TE ℓ and the other diagonal element, and it is again a Gamma distribution as we expect it to be, in analogy to discussion in the previous section.
However, the marginal distribution of the off-diagonal termsĈ TE ℓ is not a Gamma distribution, and it is interesting to note that it depends on C TT ℓ and C EE ℓ in addition to C TE ℓ (see [29] for a detailed calculation).
In the limit ν → ∞, the Wishart distribution ofX C = (Ĉ ℓ TT ,Ĉ ℓ TE ,Ĉ ℓ EE ) tends to a multivariate Gaussian distribution with covariance matrix: The variance ofĈ ℓ TT andĈ ℓ EE is the same of the single-field limit, whereas the variance of the cross-correlationĈ ℓ TE reflects the different marginalized distribution ofĈ ℓ TE itself.

Statistics of Joint CMB Temperature and Polarization Fields-Exact Likelihood in Real Space
The discussion in sections 2.1 and 2.2 refers to the CMB statistics in harmonic space, i.e., the space in which the CMB fields are expanded in spherical harmonics and the physical information are encoded in the expansion coefficients a ℓm . In this subsection, we will review the basics of CMB statistics in real space. The starting point are the observed CMB maps of the three Stokes parameters T, Q, and U. These maps can be discretized into N pixels and arranged in N-dimensional vectors T, Q, and U. As discussed in the previous section, the statistical properties of these objects are fully encoded in the auto-and cross-power spectra C XY ℓ , with X, Y = {T, E, B} for temperature, E-mode and B-mode polarization.
The exact likelihood function in real space (also called the pixel-based likelihood) is defined as (21) where m is the vector with 3N elements built from the justaxposition of T, Q, and U, and M is the total covariance matrix. The matrix M depends only on the angle between two directions in the skyn i,j The (3 × 3) entries in Equation (22) for any given pair of pixels ij depend on the Legendre polynomial P ℓ and the fiducial power spectra. As a straightforward example, the entry T i T j is the expression where . A detailed description of the full procedure to obtain the covariance matrix, together with the expressions of the (3 × 3) entries, can be found in appendix A of Tegmark and de Oliveira-Costa [30].
It should be noted that the pixel-based likelihood in Equation (21) is exact even in the case of partial sky coverage; this not the case for the likelihood in harmonic space in Equation (17). Note however that it is still possible to derive an exact form for the harmonic-space likelihood even for partial sky coverage [31]. The pixel-based approach ensures mathematical rigor in the evaluation of the likelihood function. Nevertheless, it is highly expensive from a computational point of view. Indeed, the number of pixels needed to retain the information in the first ℓ max multipoles of the power spectrum scales as ℓ 2 max and therefore the Cholesky decomposition required to evaluate the inverse of the covariance matrix in Equation (21) scales roughly as ℓ 6 max , where ℓ max is the highest multipole retained in the analysis. The computational cost is therefore driven by the evaluation of inverse matrix and determinants and becomes prohibitive for ℓ max larger than few hundreds. For this reason, this exact approach is feasible only to study large angular scales, where the information is contained in a relatively small number of multipoles.

LIKELIHOOD APPROACHES-SMALL-SCALE REGIME
The exact likelihood of the observed CMBĈ XY ℓ as a function of the underlying fiducial CMB C XY ℓ is given by Equation (17) in case of full-sky observations: (24) However, complications arise in real analysis that make it necessary to replace Equation (17) with a suitable approximation. Complications usually include time-consuming evaluations of Equation (17) due to the inversion of large covariance matrices for each theoretical model.
A standard approach is to develop an approximation of Equation (17) in the full-sky regime that is quadratic in some function of C XY ℓ , and that can be easily generalized to the cut-sky regime with a proper estimate of the covariance matrix: In what follows, we introduce a list of the most common approximate forms among those proposed in the literature (see e.g., [29,[32][33][34][35][36]). We further quantify the goodness of the approximation in the full-sky regime following the approach in Percival and Brown [29]: we expand the exact likelihood and the approximate forms along the standard axes (TT, TE, EE) around the maximum X C =X C , and compare the expansion coefficients up to a certain order.
As already commented in the previous section, the analytic comparison of the various approximations is carried in absence of noise contaminations and in the limit of infinite angular resolution. We also implicitly assume that the CMB spectra are unlensed, i.e., they are the spectra as they would be observed in absence of gravitational lensing effects on the CMB photons. The inclusion of experimental noise, experimental angular resolution, and gravitational lensing effects will be discussed in sections 3.3 and 4.
Before moving to the list of the most common approximate likelihood functions, we would like to mention that it is not trivial to construct an unbiased estimator of the true C ℓ in the cutsky regime. We don't have access to the full-sky set of a ℓm and therefore we cannot directly constructĈ ℓ . In the case of cut-sky maps, appropriate algorithms have been developed to derive the unbiased estimatorĈ ℓ to be used in the likelihood analysis. For example, pseudo-C ℓ power spectraC ℓ can be defined from cutsky harmonic coefficientsã ℓm , see section B for further details and references. The pseudo-C ℓ are related to the true C ℓ in ensemble average as where M ℓ 1 ℓ 2 is a coupling matrix that encodes the geometrical effects of cut-sky observations. From Equation (26), it is possible to operatively define an estimator for the C ℓ in the cut-sky regime as:Ĉ The interested reader can find a detailed discussion in Appendix B, where we also report alternative methods adopted to construct estimators for the BB spectrum B.2, and for power spectra at large scales via the quadratic maximum likelihood (QML) approach B3. A final remark on the cut-sky case: the compression of information from CMB maps (∼ (N pix × N pix ) pixels) to CMB spectra (∼ (ℓ max − ℓ min ) bandpowers) is lossless in the full-sky regime, i.e., the power spectra represent sufficient statistics. In the cut-sky regime, the compression is partly lossy, as the masked regions induce correlations between multipoles which have to be taken into account (see e.g., discussion in Appendix B).

Approximate Forms
The most common approximations are given by quadratic/Gaussian expressions in C XY ℓ with different choices for the covariance matrix. Alternatively, quadratic expressions involving more complicated functions f = f (C XY ℓ ) ≡ Z C , as well as ad-hoc combinations of various approximations have been developed to match the exact likelihood up to a certain order in the perturbative regime (see section 3.2).
• Symmetric Gaussian. This approximation is quadratic in Z C = C XY ℓ , with covariance matrix given by the curvature of the Wishart, see Equation (29): The inverse of the covariance matrix Y C is the curvature of the Wishart distribution in Equation (19a), i.e., Y −1 • Improper Gaussian. This approximation is similar to the Symmetric Gaussian in Equation (28), with the covariance matrix that appears in the first term replaced by Y = Y(C XY ℓ ); i.e., the covariance matrix is given in terms of the model C XY ℓ . This approximation is an improper Gaussian in a sense that there is no determinant term: • Determinant Gaussian. The expression in Equation (30) can be slightly modified to provide a better fit to the exact likelihood approach (see section 3.2). The modification consists in the addition of a C XY ℓ -dependent determinant term: This approximation is similar to Equations (28) and (30), with a constant determinant term (as in Equation 28) and the covariance matrix computed for a given fiducial model (as in Equation 30). The fiducial model for the covariance matrix is however kept fixed, and assumed to be smooth and a close approximation to the underlying model under scrutiny: The fiducial Gaussian approximation is used in the official analyses of the Planck [37,38], ACT [9], and SPT [10] collaborations.
• Log-normal. This approximation is quadratic in a peculiar function of theoretical and observed spectra, i.e., • Offset log-normal. This approximation is a generalization of Equation (33). The data vector is generalized to , with a XY a suitable real offset coefficient that may or may be not be the same for every XY pair. The covariance matrix is again as in Equation (33) • One-third-two-thirds. We briefly mention this approximation as an example of combined likelihood appositely built to match the exact likelihood up to the third order. It is a weighted combination of the improper Gaussian in Equation (30) (with weight 1/3) and of the log-normal approximation in Equation (33) (with weight 2/3). Note that the approximation was explicitly built for the single-field TT-only WMAP analysis [39]: In Hamimeche and Lewis [32], Hamimeche & Lewis (HL) have developed a form of the likelihood for correlated Gaussian fields (CMB temperature and polarization) that coincides with the exact likelihood in full sky. The authors show with simulations that it provides a very good approximation to the exact likelihood in the cut-sky regime at small scales 4 (ℓ ≥ 30). The form of the likelihood is quadratic in some peculiar function of the observed, fiducial, and theoretical C ℓ , as we shall see in section 3.2. The covariance matrix is precomputed for a fixed fiducial model.
The dependence on the fiducial model is negligible. Moreover, should the fiducial fail in matching the true sky, the likelihood is still exact in full sky. The HL likelihood was used in the analysis of the BICEP2/KECK data [11].
In the HL formalism, the likelihood in cut-sky can be approximated as: where X g is a vector of a specific function of the observed, fiducial, and theoretical C ℓ , and M f is the fiducial model covariance block matrix with n(n + 1)/2 × n(n + 1)/2 blocks (n is the number of fields), labeled by ℓ and ℓ ′ , i.e., we explicitly take into account the possibility that either the cut-sky or anisotropic noise can induce correlations between different multipoles (non-diagonal covariance). The derivation of Equation (36) is provided in section 3.2, where we will show that it is equivalent to the exact likelihood in the full-sky regime. For more details and a thorough definition of the notation, see Hamimeche and Lewis [32]

Comparison With the Exact Likelihood in the Full-Sky Regime
In this section, we comment on the goodness of the approximations listed in the previous section. The goodness is defined in terms of the ability to match the exact likelihood in the full-sky regime up to a certain order, when both the exact likelihood and the approximate form are expanded around the maximum. This approach is described in Percival and Brown [29].
Let's start by expanding the Wishart distribution in Equation (19a) along the TT direction. In particular, we write Equation (19a) with the following substitutions: We further expand in ǫ.
We obtain: to show that the expansion along the EE axis provides the same form of Equation (38) for −2 ln[L(C EE ℓ )]. Let's now expand along the TE axis with the following The expansion is: Let's now move to expand each of the approximations reported in section 3.1.
• Symmetric Gaussian. The expansion of Equation (28) is of the same form along each of the standard axes TT, TE, EE, with a different normalization factor in the case of the expansion along TE: Note that the expansion is truncated at the second order on ǫ. This result is an exact expansion and it is expected, given the initial form (symmetric Gaussian) of the approximate likelihood. If we compare Equation (40a) with the expansion of the Wishart in Equations (38)-(39), we observe what follows. Firstly, the Symmetric Gaussian matches the Wishart only up to the second order on ǫ. Secondly, the approximate form is, by definition, symmetric in ǫ and therefore fails in capturing the skewness of the exact likelihood. Finally, it is biased low along the TT, EE axes in a sense that −2 ln(L gaussian (ǫ)) < −2 ln(L exact (ǫ)) for ǫ > 0 (i.e., for C ℓ >Ĉ ℓ ). For the opposite reason, it is biased high along the TE axis. • Improper Gaussian. The expansion of Equation (30) along the standard axes are as follows: With respect to the symmetric Gaussian, the improper Gaussian approximation is skewed in the same direction of the Wishart. However, it is still a correct match only up to the second order. With respect to the exact likelihood, Equation (41a) show that the improper Gaussian is biased high along the TT, EE directions, where −2 ln(L improper (ǫ)) > −2 ln(L exact (ǫ)) for ǫ > 0 (i.e., for C ℓ >Ĉ ℓ ). For the opposite reason, it is biased low along the TE direction.
• Determinant Gaussian. In this case, it is clear from Equation (31) that the likelihood is biased at each multipole ℓ as the C ℓ -dependent determinant term implies that the minimum value for this approximate form is not in ǫ = 0. Indeed, the expansions of Equation (31) along the standard axes include a term of order ǫ: However, it can be shown that this approximation, although biased at each individual ℓ, is unbiased "on average", i.e., reproduces the correct result with reasonable accuracy when summed over a wide-enough range of multiples (see HL). • Fiducial Gaussian. The expansion of Equation (32) is equivalent to the expansion in Equation (40a), only with a different normalization factor. Indeed, the covariance matrix in Equation (32) is fixed to that of a given fiducial model, and therefore the approximation is quadratic in ǫ. Note however that, although the form of the expansion is similar at any ℓ between the symmetric Gaussian and the fiducial Gaussian, the latter provides a better approximation of the exact likelihood when summed over a range of multipoles (see discussion in Hamimeche and Lewis [32]). • Log-normal. In this case, we have: in Equation (25), with Y C = Y C (Ĉ XY ℓ ) being the curvature matrix. The expansions along the auto-and cross-spectra directions can be easily obtained up to normalization factors: Regardless of the normalization factors, a comparison between Equations (44a)-(44c) and Equations (38) and (39) shows that the log-normal distribution provides a good approximation to the exact likelihood up to the second order in the expansion around the maximum. The two distributions have a different shape starting from the third-order term in the series expansion. It is also interesting to note that, normalization factors aside, the expansions along the standard axes are identical. This is a further difference with respect to the case of the Wishart distribution. • Offset log-normal. The log-normal distribution can be slightly modified in a way that it could approximate the exact Wishart distribution up to third order. The modified log-normal, or offset log-normal, is quadratic in: where the offset factors a TT , a TE , a EE can be adjusted to match the Wishart distribution up to the third order. The covariance matrix is again assumed to be Y C = Y C (Ĉ XY ℓ ). Expanding the offset log-normal in the usual way, one gets: A comparison between Equations (44a)-(44c) and Equations (38,39) makes it clear that the offset lognormal distributions is a good approximation to the Wishart distribution up to the third order in the expansion, provided that Comparing the TT expansion in Equation (41a) and in Equation (44a) with the TT expansion of the exact likelihood in Equation (38), it is clear that the weighted sum of the improper Gaussian and the lognormal distribution with weights 1/3 and 2/3, respectively matches the Wishart distribution up to the third order in ǫ: • Hamimeche-Lewis. By construction, this likelihood approximation matches exactly the Wishart distribution in the full sky regime. Indeed, the true power of this approximation stands in the fact that the exact quadratic form derived from the full-sky exact likelihood result is assumed to be valid also in the cut-sky regime and at high multipoles, where it is faster to evaluate than the exact calculation.
We show here the equivalence between the exact likelihood in Equation (17) and the Hamimeche-Lewis formalism in full sky. In what follows, we make use of the matrix notation adopted in Hamimeche and Lewis [32]. This notation is somehow different from the formalism used in the previous examples. However, it is a more suitable choice to better appreciate the H&L approximation. We assume the matrix of the estimatorsĈ ℓ to be positive definite. In the full-sky limit, given n gaussian fields, the likelihood function is defined as in Equation (16a) where 5 , with respect to Equation (16a), S ℓ →Ĉ ℓ and V ℓ → C ℓ . In passing from Equations (49a) to (49c), we consider that the symmetric form is defined using the Hermitian square root and C −1/2 for orthogonal U ℓ and diagonal D ℓ . In other words, we diagonalize the matrix C −1/2 In order to generalize Equation (49c) to the cut sky, we want to reshuffle it in such a way that it resembles a quadratic form: where the function g(x) is defined as and In order to transform Equation (50b) in a form that is quadratic also in the matrix elements, we exploit the following matrix identity 6 : Frontiers in Physics | www.frontiersin.org where X gℓ is the vector of C gℓ ≡ C elements for a given fiducial model C f ℓ , with dimension n(n + 1)/2, and M f ℓ is the covariance ofX evaluated at C ℓ = C f ℓ . Therefore, Equation (50b) can be rewritten as We stress that this formulation is exact in the full sky regime, as it has been obtained by means of matrix identities and no approximations have been adopted so far.

Including the Effects of Noise and Beam Smearing
The signal observed with a real CMB experiment is affected by various sources of experimental contaminations. Here, we focus on two main classes of experimental effects: the noise bias and the beam smearing. The noise bias is due to the instrumental noise from detectors that adds up to the cosmological signal. It has to be characterized and subtracted from the (overall) signal (an alternative approach posits in cross-correlating different detectors and assuming their individual noise to be uncorrelated, see section 3.4). In the simple case of isotropic noise in real space, the noise level is independent from the direction. This translates in a diagonal noise in harmonic space, i.e., the noise power spectrum N ℓ is an additive bias for the CMB power spectrum. A very simple example is the case of isotropic and homogenous noise in real space, i.e., the noise level is the same in each pixel. This translates in a "white noise" in harmonic space, i.e., N ℓ is constant in ℓ. A usual assumption is also to consider the noise in temperature and polarization to be uncorrelated. If the noise is anisotropic (i.e., it changes from pixel to pixel) for example because of a particular scanning strategy that induces anisotropic sky coverage, it may induce correlations between a ℓm , and different considerations apply.
The beam smearing is due to the fact that the instrument has a finite angular response. The signal observed along a certain direction takes contributions from all angular directions. These contributions are weighted with the angular response of the instrument. In real space, this effect is described as a convolution of the observed signal with the angular response (hereafter beam) In harmonic space, the convolution becomes a product between the harmonic expansions of the CMB fields and the beam a ′ ℓm = b ℓm a ℓm . In the simple case of gaussian beam of width σ FWHM = 2 √ 2 ln 2σ , the harmonic expression of the beam is independent from m and takes the simple form of The beam smearing is a multiplicative bias for the CMB power spectrum.
In presence of noise and beam smearing, the observed signal is d ℓm = B ℓ a ℓm + n ℓ , and the estimator in Equation (9) becomeŝ From Equation (55) it is clear thatD ℓ is a biased estimator of the true power spectrum C ℓ , D ℓ = B 2 ℓ C ℓ + N ℓ . In addition, the variance of the estimator takes an additional contribution. In presence of noise and beam effects, the variance becomes The variableD ℓ ≡Ĉ ℓ +N ℓ /B 2 ℓ still has a Ŵ distribution, and all the considerations made for theĈ ℓ estimator still apply to the (slightly) more general case of noise and beam biases, provided thatĈ ℓ is replaced withD ℓ . More in detail, when both temperature and polarization are considered, the matrix of estimators S ℓ still has a Wishart distribution [see Equation (17)] with a revised W ℓ = V ℓ /(2ℓ + 1) matrix 7 where we have allowed for different (and uncorrelated) noise levels in temperature and polarization, and for different beam sizes in temperature and polarization.
The covariance matrix of the estimatorsD ℓ is equivalent to that ofĈ ℓ in Equation (20a), provided that C TT ℓ , C EE ℓ are replaced with the power spectra corrected for noise and beam.

Multi-frequency Analysis
The results presented so far have been discussed assuming the special viewpoint of a single-frequency experiment. In reality, CMB experiments often rely on multi-frequency observations to better characterize the cosmological signal and extract it from the multi-component sky-signal observed (see discussions in e.g., [37,40]). Moreover, multiple detectors sharing the same central frequency are always available, so that the final signal at a certain frequency can be effectively thought as a weighted average of the signals observed with multiple detectors.
In general, if n maps are available, there are n − 1 combinations that are independent from the signal (if two maps share the same signal and have different noise properties, their difference is independent from the common signal). There is one independent combination defined as the weighted average of the n available maps where w i are the noise weights associated to each of the n maps. In the simple scenario of isotropic noise N ℓ for each map, the weights can be defined as The noise-weighted map is a sufficient statistics for the CMB field, and therefore all the considerations above about the choice of the likelihood also apply to the combined map. An estimator for the power spectrum can be constructed from the noise-weighted map.
Another possibility is to build estimatorsĈ ij ℓ = (1/(2ℓ + 1)) m a i ℓm (a j ℓm ) * from pairs of maps and then define a weighted estimatorĈ NW ℓ = ij w ijĈ ij ℓ , where the weights w ij depend on the noise levels in the individual maps. It can be shown that the latter solution is equivalent to estimating the observed power spectrum from the noise-weighted map, and again all the considerations about the likelihood choice apply to this case as well (see discussion in Hamimeche and Lewis [32], appendix C).
Regarding the latter solution, a more robust choice is to build the noise-weighted estimatorĈ ℓ from cross-spectra only, i.e., from pairs (ij) with i = j. If the noise in the individual maps is uncorrelated, to take cross-spectra is safe with respect to the introduction of possible biases in the final estimator due to unaccounted errors in the noise model 8 . However, the statistics of the estimator obtained from cross-spectraĈ CS ℓ may differ from that of the generic noise-weighted estimator. In particular, the cross-spectra might not be positive-definite. Therefore, in principle one should use a distribution forĈ CS ℓ other than the Wishart. However, it can be demonstrated (see e.g., appendix C in Hamimeche and Lewis [32]) that, in the limit of many maps available, the distribution ofĈ CS ℓ approaches that ofĈ NW ℓ , and hence one can use the same approximations developed in the case of the generic noise-weighted estimator.
Before concluding this subsection, a note on the covariance matrix. When multiple maps are available and the estimators are build from a combination of those maps, the expression for the covariance matrix can be generalized as follows: where ij, ab denote all possible combinations of pairs of maps, XY, WZ are pairs of fields T, E, B, and we have explicitly taken into account the possibility of mode-coupling between ℓ 1 , ℓ 2 (e.g., in the cut-sky regime). In the simple case of single-map in full-sky, Equation (59) reduces to Equation (20a).

GRAVITATIONAL LENSING
In the discussions so far, we have implicitly assumed that the CMB fields are unlensed. This is not the case in the reality. CMB photons traveling from the last scattering surface to the observer feel the gravitational effects of the evolving structures in the Universe. This effect is analogous to the weak lensing effect observed in galaxy surveys, where images of source galaxies are distorted and magnified by foreground structures acting as gravitational lenses. In the CMB case, the CMB as emitted at the last scattering surface is the source and the whole distribution of total (cold and baryonic) matter along the line of sight acts as the foreground lens. In practice, this means that the anisotropy 8 If a i ℓm = s i ℓm + n i ℓ , with noise n i uncorrelated for any map i = 1, ..., nmaps and signal s i , then m |a i ℓm (a fields observed at a certain direction in the sky are displaced with respect to the original direction of emission: where X = T, E, B and φ is the lensing potential. The gradient of the lensing potential gives the deflection angle α = ∇φ. The typical deflection that CMB anisotropies undergo is of order 2.5 arcmin [41]. The lensing potential is given by the integrated contribution of the gravitational potential along the line of sight 9 : where is the (Weyl) gravitational potential, χ is the conformal distance and W(χ) is a geometrical kernel. Gravitational lensing preserves the total variance of the CMB fields, being a bare displacement of the anisotropy distribution. Very roughly speaking, if we extracted the CMB power spectra from small patches of the sky 10 , we would observe the acoustic peaks to shift to either smaller or larger scales with respect to the full sky average (see discussion in e.g., [42]). The net effect is a smoothing of the acoustic peak structure in the CMB power spectra that can be as high as 20% in the case of the sharper structure in the EE power spectrum with respect to the unlensed case. Another important effect is the generation of spurious (i.e., not primordial) B-modes from the lensing of primordial E-modes, with a power spectrum that resembles a white noise contribution with noise level of ∼ 5 µK arcmin at ℓ < 100, representing a serious contaminant for searches of primordial B-modes.
A detailed description of the effects of gravitational lensing on the CMB spectra is beyond the scope of this manuscript, and can be found in the excellent review by Lewis and Challinor [41]. Here, it is relevant to stress that gravitational lensing modifies the statistical properties of the primary CMB fields in two ways. First, let's consider a fixed lensing potential realization and ensemble average over the CMB realizations. If we Taylorexpand Equation (60), take the harmonic expansion coefficients and compute the covariance of two fields X, X ′ = T, E, B, we get [43,44] x where the term in brackets is the Wigner-3j and f α is a weight of different xx ′ pairs depending on the unlensed power spectra and on geometrical factors (a full list can be found in Okamoto and Hu [43]). The C ℓ in the first term of the RHS are the lensed power spectra. From Equation (62), it is clear that gravitational lensing not only modifies the structure of the primary CMB spectra 9 The unperturbed line of sight in the Born approximation. 10 Small enough that the convergence and shear components of the deflection field can be assumed constant in the patch. by smearing the acoustic peaks, but also induces off-diagonal covariance terms (the second term in the RHS). Therefore, for a fixed lensing realization, the CMB field becomes anisotropic. This property can be exploited to construct an estimator for the lensing potential φ.

The lensing power spectrum
(where we are now taking the ensemble average over the lensing realizations as well) is related to the 4-point correlation function of the primary CMB fields, as it is clear by inspecting Equation (62). In other words, the second modification to the CMB statistics induced by lensing, when the ensemble average is taken over both the CMB and the lensing realizations, is a certain amount of non-Gaussianity measurable from the 4-point function. This property is exploited to construct an estimator for the lensing power spectrum. The presence of gravitational lensing effects represent a pernicious contaminant for searches for primordial GW. In this case, "delensing" procedures (see e.g., [41] and references therein for a description of delensing procedures) aimed to remove the lensing contamination from the measured CMB sky are crucial to allow for the possibility to detect the primordial tensor BB signal. On the other hand, the presence of gravitational lensing also enriches the amount of information that we can extract from the observations of the CMB sky. As an example, since the gravitational lensing is induced by forming structures, it carries information about the late time evolution of the Universe and can be exploited to constrain those cosmological parameters that govern those stages of the Universe expansion, such as massive neutrinos.
There are two ways in which the information encoded in the gravitational lensing signal can be accessed from CMB measurements. One can exploit the anisotropy and non-Gaussianity properties of the lensed CMB fields to construct estimators for the gravitational potential and for the lensing power spectrum, and use those observables directly in a likelihood analysis. This is what is done e.g., by the Planck collaboration [45], ACT [46], SPT [47], POLARBEAR [48], and this is the goal of future CMB experiments that will be able to reconstruct the lensing signal over a large range of angular scales with exquisite sensitivity (CORE [49], Simons Observatory [20], CMB-S4 [19], PICO [23]). Concerning the choice of the likelihood function for the lensing signal itself, it has been shown [50] that a quadratic expression in the observed lensing power spectrum with a non-diagonal fiducial covariance matrix provides satisfying results.
As seen in Equation (62), cosmological information carried by the gravitational lensing signal are also encoded in the CMB power spectra. In fact, the lensing contribution modifies the primary acoustic structure as emerged from last scattering. Current generation of Boltzmann solvers usually employed in cosmological analysis, such as CAMB [51] and CLASS [52], carefully compute the lensing effect when constructing the theoretical spectra to be compared with the measured spectra in the likelihood analysis. In general, when lensed spectra are considered, the likelihood analysis should reflect the different statistical properties of the lensed CMB. First of all, the non-Gaussian distribution of the lensed field may require to build a different likelihood function for the lensed fields. Even if the Gaussian approximation can be retained given the current experimental sensitivity, the covariance matrix may still need to reflect the presence of off-diagonal correlations induced by the lensing signal [53]. It has been shown that ignoring such correlations can still be safe for the sensitivity level of Planck [53,54], but it will become relevant for the next generation of experiments [55,56].
A final remark concerns cosmological analysis employing the combination of CMB power spectra and the lensing power spectrum. These two data sets are usually treated as independent. However, they are extracted from the same map and, as such, there is a certain level of correlation between the two. As pointed out in Schmittfull et al. [50], there are two sources of correlations: cosmic variance in the CMB field, which may affect the lensing reconstruction; cosmic variance in the lensing field, which affects not only the lensing reconstruction, but can propagate to the lensed CMB power spectra. An alternative solution that might reduce the correlation between fields is the joint analysis of the lensing power spectrum and unlensed CMB spectra. The reconstructed lensing signal-either from CMB observations themselves or from tracers of large-scale structure correlated with the lensing signal such as the cosmic infrared background (CIB)-can be used to delens the primary CMB spectra. This procedure is not only key to unveil the primordial BB tensor spectrum, but it can also improve the sensitivity to those cosmological parameters that are mostly constrained via measurements of the high-ℓ damping tail in the TT, TE, EE spectra, such as N eff [19,57].
All the above concerns will become much more significant for the next generation of experiments, when more sensitive polarization-based reconstructions will be available.

LIKELIHOOD APPROACHES-LARGE-SCALE REGIME
In this section, we review the basics of the likelihood approaches at large scales (low multipoles). We consider two generic classes of likelihood methods: pixel-based and simulationbased. When the focus of the likelihood analysis is the sky at large angular scales, the resolution of the map to be analyzed is low enough to make a pixel-based approach computationally feasible. Alternative approaches exploit the information encoded in harmonic space, and build the likelihood function from a simulation-based method or componentseparation based method (Blackwell-Rao). For large scale studies the Hamimeche-Lewis likelihood can be also considered, [58,59], for completeness we will explore later the performance of such likelihood compared with other approximations.

Pixel-Based Approach
The great advantage of the pixel-based approach lies in the fact that the likelihood function so defined is always exact (see section 2.3), including in the cut-sky regime. Equation (21) can be easily generalized in the presence of (Gaussian-distributed) noise, by defining the data vector as m = s + n, where s is the signal per pixel in temperature and polarization (s = (T, Q, U)) and n is the instrumental noise. We report here the likelihood function in pixel space for convenience: (63) The full covariance matrix M in Equation (63) is consequently generalized to the sum of the signal and noise covariance matrices M = S + N. Furthermore, the effect of beam smearing discussed in section 3.3-and also relevant for the large-scale data-is now taken into account when constructing the full covariance matrix in terms of the beam-weighted sum of Legendre polynomial (see Equation (23)). For example, the explicit expression for the weight P TT ℓ for temperature becomes wheren i is the unit vector pointing toward the ith pixel, B ℓ is given by the product of the instrumental beam Legendre transform and the (HEALPix [60]) pixel window 11 , and P ℓ is the Legendre polynomial of order ℓ. As already noted above, the actual feasibility of this mathematically rigorous approach only applies to the very large scales. Even so, massive parallel coding and memory requirements could still be a necessary ingredient. As an example, the Planck collaboration employed this approach in the low-ℓ likelihood analysis in temperature and polarization for the 2015 data release [37]. The map resolution was fixed at N side = 16 (for comparison, the analysis conducted by the WMAP team employed N side = 8 maps) to accommodate the ℓ < 47 multipoles in the analysis, resulting in N pix = 3 × 3072 = 9216 total number of pixels, further reduced by the application of the analysis mask. In evaluating the likelihood function in Equation (63), the data vector and the noise covariance matrix are fixed, while the signal covariance matrix is recomputed for any given cosmological model to be compared against data, following Equation (23). In practice, only a subsection of S is recomputed, in particular that subsection corresponding to ℓ < 30. The portion of S corresponding to multipoles 30 ≤ ℓ < 47 is precomputed from a fixed fiducial model. The choice of the fiducial model does not affect the performance of the likelihood. In fact, at the resolution employed in the large-scale analysis, the sensitivity to multipoles above ℓ ∼ 30 is strongly suppressed.
In 2013, a hybrid approach coupling a MonteCarlobased approach in temperature (Blackwell-Rao estimator, see section 5.2) to a pixel-based approach in polarization for ℓ < 23 was adopted [40]. In order to speed up the evaluation of the fully 11 The signal in each pixel is the average over the signal at each point within the surface area of the pixel p , f p = d w p f ( ), where w p = 1/ p is the weight inside the pixel and zero otherwise. The harmonic transform of the weight w p is the pixel window function. Including the full dependence of the window function on m, ℓ can be computationally demanding. However, if the size of the pixel is small compared to the angular resolution of the experiment, the m-dependence can be neglected and the m-averaged window function w ℓ can be defined. The power spectra computed from the pixelised maps C p ℓ are related to the unpixelised spectra C ℓ via C p ℓ = w 2 ℓ C ℓ .
pixel-based likelihood function for the 2015 release at any given cosmological model, the "brute-force" approach described here has been optimized with the implementation of the Sherman-Morrison-Woodbury formula, described below.

Sherman-Morrison-Woodbury Formula
The computational cost required by the pixel-based approach can be dramatically reduced if one consider that only a portion of the signal covariance matrix is reconstructed at any evaluation of the likelihood function. The full covariance matrix can then be decomposed into a varying part, which is function of the theoretical C ℓ (i.e., the power spectra of the theoretical models to be compared against data), and a fixed part given by the fiducial S and the noise covariance matrix. The following step is to further decompose the varying part of the covariance as S = V T AV, via a transformation V that effectively reduces the dimension of the actual evaluation cost from a N pix × N pix inversion to a n λ × n λ inversion, where n λ = 2ℓ + 1 is the dimension of the transformed matrix A. The latter is the only matrix that depends on the theoretical C ℓ and, therefore, it is the only matrix to be recomputed and inverted. The fixed portions of the covariance matrix as well as the transformation matrix V can be pre-computed and stored. For the Planck 2015 data release, the application of such mathematical formalisms allowed to speed-up the likelihood evaluation by an order of magnitude. The mathematical details leading to the application of the Sherman-Morrison-Woodbury formula can be found in Aghanim et al. [37] (appendix B.1) and Hinshaw et al. [61] (appendix A.1).

Blackwell-Rao Estimator
An alternative approach to the likelihood evaluation that overcomes the computational cost of an exact likelihood evaluation in pixel space is provided by the combination of the Gibbs sampling method with the Blackwell-Rao estimator. The Gibbs sampling is a MCMC method applied to the estimation of the observed CMB signal from a raw map containing signal, noise, and foreground contamination in a Bayesian framework [62][63][64]. The crucial output for the subsequent construction of the likelihood estimator is a set of samples of the CMB sky, or more precisely, a set of sample variances of the sky samples obtained with the Gibbs algorithm. The Blackwell-Rao estimator is then built as an average over the set of sample variances. Let's assume that the observed map m is composed by the CMB signal s and noise n (the following procedure can be generalized to the case in which foreground f also contribute to the total signal, see e.g., [65]). What we really want to evaluate is the joint probability of having a certain s with a certain C ℓ given the observed sky m, i.e., we want to evaluate P(s, C ℓ |m). A brute force evaluation by computing a grid of s and C ℓ is computationally prohibitive (it is even more prohibitive once one considers the inclusion of foreground into the equation). Another approach is to sample directly from the distribution via specific algorithms. Although it is again computationally unfeasible to sample directly from the joint distribution, as it would require inverting large and dense covariance matrices, it has been proved that the joint distribution can be reconstructed by iteratively sampling over the conditional distributions P(s|C ℓ , m) and P(C ℓ |s, m). In fact, the conditional distributions are known. They are a multivariate Gaussian (posterior distribution of a Wiener filter) for P(s|C ℓ , m) and an inverse Gamma distribution for P(C ℓ |s, m). As for the foregrounds, their marginal distribution does not usually have an analytic representation. However, it can be reconstructed numerically [65]. The iterative sampling is obtained with the implementation of a specific MCMC sampling algorithm, the Gibbs sampling, and follows these steps (we omit foregrounds for simplicity): • i) start from a guess power spectrum C 0 ℓ ; • ii) draw a sample s 1 from P(s|C 0 ℓ , m); • iii) draw a sample C 1 ℓ from P(C ℓ |s 1 , m); • iv) cycle over step ii) and iii) until convergence is reached.
This algorithm lays the ground for the subsequent evaluation of the posterior P(C ℓ (θ )|m), where θ is a set of cosmological parameters of a theoretical cosmological model. In principle, one could reconstruct a histogram of the sampled C ℓ from the Gibbs sampling and use the histogram to interpolate for a given theoretical model. However, this procedure is not efficient. Luckily, the Gibbs sampling allows to construct an efficient and arbitrarily exact estimator of the likelihood function, the so called Blackwell-Rao estimator. The basic idea [64,66] is to note that the C ℓ only depend on the CMB signal and not on the total sky signal (i.e., once we know the CMB sky, there is no additional information coming from the knowledge of other components), P(C ℓ |s, m) ∝ P(C ℓ |s). In addition, the C ℓ depend on the CMB signal only through its variance, i.e., P(C ℓ |s) ∝ P(C ℓ |Ĉ ℓ ), wherê C ℓ is the power spectrum of the sample CMB map s. Note that we are already familiar with the probability distribution P(C ℓ |Ĉ ℓ ) (see Equation (17)). At this point, we can write down the following chain of equivalent probability integrals (we omit the dependence of C ℓ from θ for brevity): where N Gibbs is the number of Gibbs samples evaluated in the component separation analysis, andĈ i ℓ is the power spectrum of the i-th sample. In other words, the posterior on the cosmological parameters of interest given the observed data can be obtained by averaging the individual posteriors over the Gibbs samples, that can be stored during the Gibbs implementation. The evaluation can be made more accurate by increasing the number of samples. It has to be noted that one only needs to store the variance of the sampled CMB maps s i up to a given multipole ℓ max , i.e., it is not necessary to store the much more memory-demanding samples themselves.
The accuracy of the Blackwell-Rao estimator depends on the number of samples required to reach convergence. The number of samples grows very steeply with the ℓ max considered in the analysis [66]. This limitation makes the Blackwell-Rao estimator more suited for the likelihood analysis of CMB large scales (low multipoles). Modified version of the Blackwell-Rao estimator have been proposed that allow to reduce the number of samples required for convergence, and therefore allow to extend the viability of this approach to higher ℓ max [67].
The Gibbs method has been employed by the Planck collaboration as a component separation method to obtain maps of the CMB temperature signal and foreground contaminants [68][69][70]. The Blackwell-Rao estimator, based on the Gibbs samples so obtained, has been employed by the Planck collaboration as an alternative likelihood method for the analysis of the temperature data at large scales [37,40]. It has been employed by the WMAP collaboration for the likelihood analysis of the large-scale (ℓ < 32) temperature data [4,71].

Simulation-Based Approach
In some cases accurate estimate of the noise are not available and/or the probability distribution of residual systematic effects, in map space, is not perfectly Gaussian, and therefore the probability distribution, in map or harmonic space, cannot be expressed analytically and it needs to be learned from simulations.
We describe in this section a simulation-based approach to evaluate the likelihood function in the low-multipole regime. The likelihood distribution is evaluated through realistic simulations of CMB, noise and possible residual systematics. We report here the main steps: • the initial step is the simulation of n theoretical auto-or crosspower spectra, related to n different cosmological models, represented by a set of cosmological parameters θ j . • For each theoretical power spectrum C th,i ℓ , m CMB maps are produced, including noise and other residual contaminants.
• For each map, the corresponding power spectrum is evaluated.
Therefore, k = n × m new simulated power spectra C sim,k ℓ are obtained. • By histogramming the k simulated power spectra ℓ-by-ℓ and θ j by θ j , the probability P(C sim ℓ |C th ℓ ) is built empirically. In evaluating this probability, it is necessary to interpolate it with a suitable function, in order to smooth the scatter due to the limited available number of simulations. In particular, one can define a low-order polynomial interpolation function f i ℓ (C sim ℓ , C th ℓ ) of the logarithmic of the C sim ℓ histogram for each i-th initial power spectrum, such that • Starting from this approximation, the likelihood function for the observed power spectra C data ℓ can be finally evaluated. The n couples (C th,i ℓ , f i ℓ (C th ℓ , C data ℓ )) can be considered as a tabulated version of the log of the joint probability log P(C data ℓ , C th ℓ ).
The joint probability can then be interpolated with a suitable low-order polynomial (as done above) for each multipole, g ℓ (C data ℓ , C th ℓ ), such that The sum in the multipole range provides the approximation for the log-likelihood, up to a constant, This approach has been used in the latest analysis of the Planck collaboration [3,38,59] to evaluate the low-multipoles polarization likelihood. We note that this likelihood approach requires the data to be provided in harmonic space as power spectra C data . A good estimator for the power spectrum at large scales is obtained via the quadratic maximum likelihood (QML) method. We provide a description of the method in section B.3.
As a final remark, we note that the simulation-based approach is very similar in spirit to the class of methods known as Approximate Bayesian Computation (ABC). ABC methods have been initially employed in various fields as a way to bypass the evaluation of the likelihood function with the use of simulated data [72][73][74][75][76][77][78][79][80][81]. Recent applications to astrophysics and cosmology include [82][83][84][85][86]. Similarly to ABC, the simulation-based approach approximates the likelihood by comparing simulated data with observed data using the QML approximations to the ML points of the spectra as a distance metric. However, while the likelihood approximation in ABC is usually done by discarding mismatching simulations, the simulation-based approach described here does so by fitting the constructed histogram and then slicing.

FOREGROUND CONTAMINATION: MODELING AND COMPONENT SEPARATION
We have neglected so far the possibility that the microwave sky contains more components than the CMB. In realistic observations, one has to take into account the fact that the observed signal at a given frequency is a combination of the CMB signal plus additional emissions from so called foreground contaminants. For the range of wavelengths of interest to CMB observations (from few tens to few hundreds GHz), the most relevant contaminants are atmospheric emission (for groundbased experiments in particular) and astrophysical foreground emission. The latter includes Galactic dust, synchrotron and freefree emission and extragalactic contaminants such as clustered and Poisson CIB emission, radio point sources, molecular lines; see e.g. [87,88] for a description of the multi-component microwave sky and way to create synthetic maps, and [89] for a concise review. The temperature sky is much more composite than the polarization sky. Nevertheless, CMB temperature signal is the dominant component over a wide range of frequencies and angular scales. On the other hand, polarized foreground can easily dominate over the cosmological signal, especially in the case of CMB B-modes.
To reduce the contamination from atmospheric emission, ground-based CMB experiments are usually placed in specific sites, at high altitudes (to reduce the thickness of the atmosphere above the telescope) and dry locations. Residual contamination can be further removed either by detector pair-differencing (see e.g., [11]) or by modulation of the signal (see e.g., [90]). Balloon-borne and especially satellite missions are less or not at all concerned with atmospheric contaminations. Emission from foreground sources is instead a common issue to all CMB observatories. Of course, it is necessary to account for such foreground contaminations in a proper likelihood analysis, i.e., to decompose the observed map into the individual components. This process is called component separation; various component separation methods exist. They differ for the domain of applicability (pixel space, harmonic space, wavelet space), and for the way data are described. For example, CMB and foregrounds can be parametrized as frequency spectra (parametric methods), can be separated imposing certain conditions to the various components modeled as arbitrary templates (non-parametric), can be separated with no a-priori knowledge of the individual components (blind). A rather complete list of component separation methods can be found in the LAMBDA archive 12 .
After component separation, the cleaned CMB maps may still be affected by residual contamination, which have to be further taken into account. At small scales, this is achieved in various steps, see e.g., the Planck analysis [37]. First, regions of the sky that show an excess contamination from foreground emission are masked away from the analysis. Secondly, the residual contamination in the remaining areas can be modeled at the power spectrum level via specific templates. In other words, one can exploit the fact that the harmonic shape of the foreground is different from the CMB power spectra, and the fact that the spectral dependence of the foreground emission is also different from the CMB spectral dependence. The data vector to be fed in the likelihood function would therefore contain both the CMB signal and the residual contamination, as obtained from observations of the real sky. At the same time, the theoretical spectra to be compared with the data would be given by the sum of the theoretical CMB spectra and the foreground templates. It is crucial that multi-frequency channels are available in order to efficiently fit for both the CMB and the foreground contaminants simultaneously.
The above prescription implicitly assumed that all the information about foreground emission in harmonic space is fully captured by their power spectrum. However, there is no 12 Component separation methods and related references can be accessed at the following url: https://lambda.gsfc.nasa.gov/toolbox/tb_comp_separation.cfm reason to believe that foreground are Gaussian distributed, and indeed they are not. The assumption that is usually made is that most of the non-Gaussian contribution is removed from the maps once the foreground-saturated regions are masked away. The non-Gaussianity of the remaining contaminations can be (and actually are) neglected at the likelihood level. This assumption is demonstrated to be reasonably accurate via dedicated simulations (see e.g., discussions in Aghanim et al. [37] and Ade et al. [40]) for the current generation of cosmological surveys. At the same time, a huge effort is devoted to the study of the propagation of uncertainties in the foreground modeling and removal to the final constraints on cosmological parameters in the context of the high-sensitivity next generation CMB surveys (see e.g., [19]).
Foreground contaminations are of course present at large angular scales as well, and must be taken into account when realistic data are analyzed. Taking the results from the Planck collaboration as an example [37,40], the foreground treatment at large scales is somehow different from the prescription described in the case of small scales. At low multipoles in temperature, the cleaned CMB map is obtained via Gibbs sampling (see section 5.2) implemented in COMMANDER and the bright region along the Galactic plane is further masked for likelihood analysis ( 7% of the sky). In polarization, the foreground-cleaning is implemented via template-fitting. Residual contamination can be taken into account in the construction of the noise covariance matrix to be employed in the likelihood function.
To conclude this section, we would like to emphasize that the topic of foreground modeling and component separation is extremely wide and the list of references reported in this manuscript is far from being exhaustive. The interested reader is encouraged to consider these references as a mere starting point. Further details can be accessed from the specific collaboration papers that describe the corresponding data processing, and from the overview papers of science forecasts by upcoming collaborations that describe their simulated pipelines of data reduction.

COMPARING LIKELIHOOD PERFORMANCE
In this section, we compare the performance of various likelihood approaches introduced in section 3 for the high-ℓ regime and in section 5 for the low-ℓ regime. In particular, we are interested in testing the different likelihood performance with respect to the ability to produce unbiased constraints on cosmological parameters. Where relevant, we also compare the different performance in terms of computational costs. In what follows, we first discuss the comparison of likelihood approaches in the small-scale regime (section 7.1), and then move to the presentation of the results in the large-scale regime (section 7.3).

Small-Scale Regime-Full Sky
In this section we test the performance of the different likelihood approximations at high-ℓ on simulated data and compare the results with the assessment presented in section 3.2, which was based on analytic arguments. We mainly want to show whether adopting a particular power spectrum likelihood approximation, when estimating cosmological parameters, may introduce a bias in the parameters recovered values or/and a misestimation of their associated error bars. In the following, we neglect the impact of foregrounds, and we consider CMB plus noise full-sky maps, for which we derive the temperature and polarization angular power spectra following Equation (15).
The simulated dataset consists of CMB plus noise maps. Specifically, we generate a set of 1000 maps of the CMB sky, m = (T, Q, U), drawn as Gaussian random realizations of fiducial temperature and polarization power spectra, that correspond to a set of given cosmological parameters. The full-sky maps are generated at a resolution of 3.4 arcmins (N side = 1024 in the HEALPix scheme [60]) and smoothed with a symmetric Gaussian beam of FWHM 10 arcmins 13 . To each of these maps we add a simulated realization of white isotropic noise corresponding to a noise level in polarization of σ n = 1µK arcmin, which is in the ballpark of what is expected from future CMB experiments. In this way temperature anisotropy maps are signal dominated across almost all the multipoles that are relevant for primary anisotropies, up to ℓ 2000, compatibly with present and forthcoming measurements.

Temperature-Only Results
Focusing on temperature alone, we adopt the different likelihood approximations of section 3 to estimate the cosmological parameters from the power spectra of the simulations. For simplicity we only fit for two parameters, specifically A s and n s , that are respectively the amplitude and the spectral index of the power spectrum of primordial density fluctuations. We generate a grid of model C ℓ 's keeping all the other CDM parameters fixed, while letting A s and n s vary in a broad range of values around the input fiducial model, C f ℓ . Given the angular power spectrum of each simulation,Ĉ ℓ , we evaluate the likelihood of the models at each multipole, L(C ℓ ), accounting for the noise contribution as described in section 3.3. Since here we are considering the small scale regime, the total likelihood for the set of cosmological parameters θ = (A s , n s ) is obtained by summing the log-likelihoods over the range of multipoles ℓ = [30,2000]: lnL(θ ) ∝ ℓ lnL(C ℓ ).
Once we have the total likelihood, assuming flat priors on parameters, we can estimate the expectation value of each parameter following Equation (78). By building histograms of the parameter values obtained from the 1000 simulated maps, we can reconstruct the posterior distribution of each parameter in a "frequentist" fashion. In particular, since the simulations input parameters are known, we can study the distribution of the biases in the estimated parameters, defined as the shifts with respect to the input fiducial values, normalized to the 1 σ marginal error of the posterior distribution. For each simulation, the bias reads p = (p − p in )/σ p , where p can be either A s or n s . The distributions we derived are shown in Figure 1 for the different likelihood approximations. They should be centered in zero, if there is no bias in the mean recovered parameters value, and they should have unit variance, if σ p , as derived from the standard bayesian analysis, is consistent with the true error.
Consistently with what has been presented in section 3.2, we find that the Log-Normal together with the symmetric and improper Gaussian approximations show significant bias in the recovered parameters. The bias is due to the fact that the approximations fail in capturing the correct skewness of the likelihood function. The symmetric Gaussian approximation appears to be the one with the largest associated bias and the wider distribution. For this approximation the covariance is proportional to the measured power spectrum squared,Ĉ 2 ℓ . This means that the uncertainties turn out to be larger than they should be ifĈ ℓ fluctuates upward with respect to the input fiducial model, and viceversa ifĈ ℓ has a downward fluctuation. As a consequence, upward fluctuations are given less weight in the likelihood than downward fluctuations. This translates into an overall downward bias on the amplitude of the power spectrum, and thus on A s , as it can be easily seen in Figure 1. The one-third-two-third approximation is a good match to the exact likelihood at third order, as shown in Figure 2, and it appears to provide unbiased parameter results. This is the likelihood approximation adopted by the WMAP team for the analysis of the temperature anisotropies on small angular scales [39]. The offset-lognormal also results in unbiased constraints (bottom left panel in Figure 1), as expected since the offset a TT = −1/4 has been chosen to match the exact likelihood up to third order in the expansion, see section 3.2. Also the fiducial and determinant Gaussian approximations provide unbiased cosmological parameter estimates, despite having a wrong shape at each single ℓ with respect to the exact likelihood. This is because at each particular ℓ the shape of the likelihood can randomly be wrong upwards or downwards with respect to the true C ℓ (see Figure 2), so when integrating over the entire range of multipoles there is negligible bias left. An obvious advantage of the fiducial Gaussian approximation with respect to the determinant Gaussian is that the covariance matrix can be pre-computed, inverted, and then kept fixed while sampling the parameter space. Aside from speeding up the computations, this makes it easier to account for additional uncertainties, non-Gaussianities and correlations in the data by simply adding extra terms in the pre-computed covariance matrix. The fiducial Gaussian approximation is the one used for the official smallscale likelihood analysis of the Planck data [40], and it has also been adopted for the cosmological analysis of data collected by the ground-based experiments ACT [97] and SPT [98]. Note, however, that this approximation works well if the models we are trying to fit are smooth, and if the fiducial model used to build the covariance matrix is sufficiently close to the true model. In the tests presented here the fiducial model entering the covariance matrix actually coincides with the true model, i.e., the power spectrum from which the simulated maps are generated. Obviously, in this case the covariance matrix is an optimal description of the simulated data. However, when dealing with real data we are not in the same fortunate position, since the true power spectrum underlying the observed sky is unknown and is exactly what we indent to estimate. In a real situation we can only deduce a fiducial model for the covariance at the best of our knowledge from both theory and observations. We can also try to improve the accuracy of a first guess covariance matrix by iterating the cosmological parameter extraction a few times, upgrading the covariance matrix at each iteration. Note that, we did not include the Hamimeche-Lewis approximation in the tests, the reason being that on the full-sky it coincides with the exact likelihood.
In order to perform an even more stringent validation of the likelihood approximations, we can estimate the bias of the mean of the recovered parameters from all the simulations: ( p − p in )/(σ p / √ N sims ), where the shift with respect to the input is normalized by the standard deviation of the mean, and N sims = 1000. This test confirms, at high-significance level, that Log-Normal, symmetric and improper Gaussian approximations provide biased results, while the other approximations are unbiased. For the latter likelihood approximations we find that the estimated uncertainties on parameters, σ p , agree with those derived from the exact likelihood at the sub-precent level. Furthermore, we find that the standard deviation of the values of (p − p in )/σ p is consistent to 1, with a precision well within the accuracy allowed by the finite number of simulations, i.e., 1/ √ 2 N sims = 0.022. This means that the uncertainties estimated from the bayesian analysis, σ p , are consistent with the scatter of the parameters of the simulations.

Temperature and Polarization Joint Results
For the fiducial Gaussian approximation, we repeat the test of parameters recovery from the full-sky simulated maps also including the polarization power spectra. As we have already commented, this likelihood approximation works well provided that the fiducial power spectra used to build the covariance matrix are close enough to the true power spectra. In order to investigate the impact on the final parameters of misestimating the covariance matrix, we change the fiducial model entering the likelihood with power spectra at the edge of the grid of models introduced above. These power spectra correspond to parameters more than 10 σ away from the input parameters. As expected, results show some sensitivity to the choice of the model, however not for the bias, but rather for the parameters marginal error bars. In fact, using the "wrong" covariance matrix leaves parameters estimates unbiased. This can be noticed in Figure 3, where we show the biases on A s and n s estimated from the simulations, both using the covariance matrix with the correct fiducial model and the one with the modified model. The latter covariance matrix, however, does not match exactly the signal in the simulated dataset, and as a consequence the estimated 1 σ marginal error bars on the cosmological parameters do not agree with the standard deviation from the simulations, showing a ∼ 15% mismatch.

Small-Scale Regime-Cut-Sky Tests
We repeat the above analysis introducing some further level of complication, specifically we assume to work with partial-sky and anisotropic noise data. We generate noise simulations which have a different level of noise variance for each pixel, thus mimicking the effect of a realistic scan strategy for which some regions of the FIGURE 1 | 2D histograms of the biases on the parameters expectation values estimated from 1000 simulated full-sky maps of CMB temperature plus noise. As described in more detail in the text, the biases are computed with respect to the input parameter values and normalized to the 1 σ uncertainty associated to the parameter, p = (p − p in )/σ p , where p can be either A s or n s . The blue circle indicates a null bias, while the red square is the center of the distribution. Note that for the cases plotted in the first row the two almost completely overlap.   sky are observed more often than others. To the simulated CMB plus noise maps we then apply a galactic sky mask, that removes the region of the galactic plane where the emission of foregrounds is expected to dominate over the CMB signal. For the sake of this test, we focus on temperature alone and we use a galactic mask that leaves for the analysis 73% of sky. The mask has been apodised with a Gaussian taper that smooths sharp edges and, thus, it helps in localizing the mask power in multipole space. The angular power spectra of the cut-sky maps are extracted using an estimator that corrects for the loss of modes due to the masking and which is based on the pseudo-C ℓ formalism. For completeness the estimator is described in Appendix B. The covariance matrix associated to these power spectra has been estimated using the analytic approximation given in Efstathiou [27], and assuming as fiducial model the same model from which the simulations have been generated.
For the parameters recovery test on the cut-sky we explore the fiducial gaussian and Hamimeche-Lewis likelihood approximations. The latter in the single field regime reduces to: (69) where the function g(x) has been introduced in Equation (51), and M f ℓℓ ′ is the covariance matrix of theĈ ℓ evaluated for C f ℓ .
Following the same procedure described in the previous subsection, we fit for the cosmological parameters A s and n s . We find that, also under more realistic conditions, the recovered parameters for the fiducial gaussian are unbiased well within the precision allowed by the finite number of Monte Carlo simulations. For the Hamimeche-Lewis approximation, instead, we detect a small bias at the level of 20 and 11% of the sigma on the parameter for A s and n s respectively, see Figure 4. For both likelihood approximations, however, we find that the parameters marginal errors from the bayesian analysis are consistent with the standard deviation of the simulations, and thus they appear to be a good description of the true uncertainties.
Despite the small average bias, the Hamimeche-Lewis approximation is expected to be more robust against the choice of the fiducial model entering the likelihood, see [32]. In order to verify this point, when computing the likelihood and the covariance matrix of theĈ ℓ , we use a fiducial model C f ℓ that corresponds to parameters about 10 σ away from those used to generate the simulations. As we have already shown in the previous subsection, in this case the fiducial Gaussian approximation provides still unbiased results, but the bayesian error bars on the parameters are not a good description of the true uncertainties. They differ from the standard deviation of the parameters from the simulations by about 14% on A s and 11% on n s , respectively. On the contrary, estimates with the Hamimeche-Lewis approximation still show the same level of average bias, but the uncertainties on the parameters are better characterized and they are in agreement with the standard deviation from the simulations.
Furthermore, since it is customary to assess the goodness of the parameters fit with the chi-square statistics, we may use the value of the likelihood at the best-fit to define an effective chisquare as χ 2 eff = min[−2lnL(θ |Ĉ ℓ )]. If we now compare the values we get for each likelihood approximation when simply varying the fiducial model, we find χ 2 eff of order a few hundreds for the fiducial Gaussian approximation, and χ 2 eff of a few tens for the Hamimeche-Lewis. As a consequence, trying to assess the goodness of the fit with the latter likelihood can surely provide more stable results, regardless of the fiducial model. However, it is worth stressing that in the present test we used a C f ℓ that deviates significantly from the model behind the simulations, had we chosen a model closer to the simulations, also the fiducial Gaussian approximation would have provided sensible results. For further discussion of this topic refer to the appendix B of Hamimeche and Lewis [32].

Large-Scale Regime
In this subsection, we focus on the comparison of different likelihood approaches devoted to the analysis of large-scale CMB data. We limit the comparison of large-scale likelihoods to the polarization signal only, since this is the main target of future CMB missions [99]. Moreover, the standard approaches used in temperature, i.e., pixel-based and Blackwell-Rao (see section 5.2) likelihoods, have been extensively characterized and validated by the Planck Collaboration, see e.g., [37,38].
We consider three approaches: pixel-based (see section 5.1), HL (see Equations 49a-54a) and simulation-based (see section 5.3) likelihood. As we did for the small scale regime, we show whether adopting a particular likelihood approximation when estimating cosmological parameters may introduce a bias, either in the recovered values of the parameters or on their associated error bars.
We start by generating a set of 1000 maps of the CMB sky drawn as Gaussian random realizations of a single fiducial power spectrum corresponding to a set of known cosmological parameters. The full sky maps are generated on a N side = 16 HEALPix grid, which roughly corresponds to a resolution of 3.7 degrees, smoothed with a cosine window function [38,100]. To each of these maps we add a realization of white isotropic noise corresponding to σ N = 0.01 µK 2 on a pixel at our resolution. We choose this particular noise level since it is small but not completely negligible with respect to the typical peak-to-peak CMB signal in a model with a reionization optical depth τ = 0.055, that would be roughly ∼ 0.5µK. We do not include foreground and systematics effect residuals in our simulations. We process the maps through a QML code computing the auto spectra of all our simulations.
To test the performance of the likelihood approximations, for simplicity, we only fit the reionization optical depth τ parameter keeping fixed the overall amplitude of the perturbations as parametrized through A s exp (−2τ ). For each realization we compute the mean value of τ . The histograms of the τ values, recovered using the three different approximations, are shown in Figure 5. The distributions all show a bias smaller than 5% of σ (τ ), perfectly compatible with the resolution of 1000 simulations, (i.e., 1/ √ 1000 ∼ 3%). The Hamimeche-Lewis likelihood and the simulation-based likelihood, both built on the QML estimates, perform similarly and are substantially unbiased. Likewise, the pixel-based likelihood results unbiased as it should, as also already extensively demonstrated by both the WMAP [101] and Planck [37] collaborations. The comparison with the pixel-based likelihood allows also to validate the width of the distributions for the other two likelihood approximations, which are compatible with the pixel based at the 2% level, within the resolution of the Montecarlo (i.e., 1/ √ 2 × 1000 ∼ 2%).

CONCLUSIONS
CMB science is nowadays a mature yet still flourishing branch of physics. The ultimate results from the Planck satellite and from large-aperture ground-based experiments (ACT and SPT) fully characterize the CMB temperature field with great accuracy up to very small angular scales, where the contaminations from Galactic and extra-Galactic foreground emissions become dominant. In addition, they have just scratched the surface of the rich information contained in the CMB polarization fields, also the target of small-aperture telescopes (BICEP/Keck, POLARBEAR/Simons Array, SPIDER). A large number of upcoming CMB surveys (Simons Observatory, CMB-S4, LiteBIRD) will continue the journey toward the understanding of the early phases of our Universe and its subsequent evolution.
In the highly sophisticated process that aims to efficiently compress the information contained in the raw observed data into maps, spectra, and eventually constraints on cosmological parameters, a key ingredient is the likelihood function L(d|θ ), i.e., the probability of observing a certain data collection d given a certain model θ (see section A). In the context of CMB analysis, d can be either a CMB map or CMB spectra, while θ can be the set of cosmological parameters describing the cosmological model under scrutiny. In the simple case of full-sky observations, in absence of foreground contaminations and latetime-Universe effects on the CMB distribution, an exact form of the likelihood function in both real (multivariate Gaussian) and harmonic space (Wishart distribution) can be easily derived, see section 2. Complications arising from realistic observations and data analysis, such as complicated noise properties, foreground obscuration, limited sky coverage, computational costs, etc, may spoil the assumptions on which the derivation of the exact likelihood functions mentioned above is based, or may limit the actual evaluation of the exact functions. Therefore, realistic analysis employ appropriate approximations of the exact likelihood functions, see sections 3, 5. The choice of the likelihood depends on multiple factors, such as sky-fraction retained, data resolution, computational costs, signal-to-noise properties. For example, it may (and actually did) happen that a certain likelihood approximation could work for a certain experiment given its sensitivity, and yet it could turn out to provide biased results for another, more sensitive experiment.
In this review, we have summarized the basics of CMB statistics (section 2) that lead to the definition of the exact likelihoods in real and harmonic space. Then, we have moved to the descriptions of the most common likelihood approximations employed by various CMB collaborations. For the sake of simplicity, we have separated the approximations better suited to the analysis of small angular scales (higher harmonic multipoles) in section 3 and those that better represent the data at large angular scales (lower multipoles) in section 5. Although we have mostly assumed an idealized scenario of isotropic instrumental noise, Gaussian beam, unlensed CMB spectra, and absence of foreground for pedagogical reasons, we have commented about the impact of realistic deviations from the aforementioned scenario, in such a way that the reader is aware of their existence and of the extensive effort devoted to their mitigation. In particular, non-trivial modifications to the primary CMB statistics induced by the gravitational lensing of CMB photons by the evolving large-scale structure in the Universe have been discussed in section 4. In section 6, we have briefly discussed the main foreground contaminants in the microwave sky and mentioned methods of component separation aimed to disentangle the cosmological signal from Galactic and extra-Galactic emissions.
In order to provide concrete examples of the different performance of various likelihood methods, we have reported results of the comparison between different likelihood approximations in section 7. In particular, we have tested the property of the likelihood function to be unbiased, i.e., to produce a posterior distribution of cosmological parameters that matches the true cosmology (the input values, in the case of simulated data) both in terms of the mean and of the variance of the distribution. Again, we have distinguished between approximations at small scales (section 7.1) and at large scales (section 7.3).
In Appendix, the interested reader can find more details about some basics of statistical inference (section A) and about the most common methods adopted to estimate CMB power spectra in the (realistic) cut-sky regime (section B).
The aim of this review is to bring together and summarize a large amount of information related to a crucial aspect of CMB data analysis. The final scope is to provide a handy, yet selfconsistent document to those who are approaching the field and those who are interested in learning some basic aspects of the field, and help them to navigate the vast literature produced over the past decades on these topics. If you have gone so far in your reading, we hope to have reached, at least partly, our goal.