Emulation of Cosmological Mass Maps with Conditional Generative Adversarial Networks

Weak gravitational lensing mass maps play a crucial role in understanding the evolution of structures in the Universe and our ability to constrain cosmological models. The prediction of these mass maps is based on expensive N-body simulations, which can create a computational bottleneck for cosmological analyses. Simulation-based emulators of map summary statistics, such as the matter power spectrum and its covariance, are starting to play increasingly important role, as the analytical predictions are expected to reach their precision limits for upcoming experiments. Creating an emulator of the cosmological mass maps themselves, rather than their summary statistics, is a more challenging task. Modern deep generative models, such as Generative Adversarial Networks (GAN), have demonstrated their potential to achieve this goal. Most existing GAN approaches produce simulations for a fixed value of the cosmological parameters, which limits their practical applicability. We propose a novel conditional GAN model that is able to generate mass maps for any pair of matter density Ωm and matter clustering strength σ 8, parameters which have the largest impact on the evolution of structures in the Universe, for a given source galaxy redshift distribution n(z). Our results show that our conditional GAN can interpolate efficiently within the space of simulated cosmologies, and generate maps anywhere inside this space with good visual quality high statistical accuracy. We perform an extensive quantitative comparison of the N-body and GAN -generated maps using a range of metrics: the pixel histograms, peak counts, power spectra, bispectra, Minkowski functionals, correlation matrices of the power spectra, the Multi-Scale Structural Similarity Index (MS-SSIM) and our equivalent of the Fréchet Inception Distance. We find a very good agreement on these metrics, with typical differences are <5% at the center of the simulation grid, and slightly worse for cosmologies at the grid edges. The agreement for the bispectrum is slightly worse, on the <20% level. This contribution is a step toward building emulators of mass maps directly, capturing both the cosmological signal and its variability. We make the code 1 and the data 2 publicly available.


INTRODUCTION
The N-body technique simulates the evolution of the Universe from soon after the big bang, where the mass distribution was approximately a Gaussian random field, to today, where, under the action of gravity, it becomes highly non-Gaussian. The result of an N-body simulation consists of a 3D volume where the positions of particles represent the density of matter in specific regions. This 3-dimensional representation can then be projected in 2 dimensions by integrating the mass along the line of sight with a lensing kernel. The resulting images are called sky convergence maps, often referred to simply as the cosmological mass maps. These maps can be compared with real observations with the purpose of estimating the cosmological parameters and testing cosmological models. Their simulation, however, is a very challenging task: a single large N-body simulation can take from a few hours to several weeks on a supercomputer (Springel et al., 2005;Potter et al., 2017;Collaboration et al., 2019;Sgier et al., 2019).
One approach to overcome this challenge is to use simulationbased emulators of summary statistics of the maps. Emulators have so far focused on: (a) the power spectrum, which is commonly used in cosmology Heitmann et al., 2016;Knabenhans et al., 2020;Angulo et al., 2020), (b) covariance matrices of 2-pt functions (Sgier et al., 2019;Taylor et al., 2013;Sato et al., 2011), and (c) non-Gaussian statistics of mass maps, which can be a source of significant additional cosmological information (Pires et al., 2009;Petri et al., 2013;Zürcher et al., 2020;Fluri et al., 2018). These approaches, however, always considered a specific summary statistic, which limits the type of analysis that can be performed using the massmap data. They typically do not simultaneously capture both the signal and its variation: the emulators interpolate the power spectrum across the cosmological parameter space, without considering the change in its covariance matrix, which is typically taken from the fiducial cosmology parameter set. This is a known source of potential error in the analysis (Eifler et al., 2009) and was shown to have a large impact on the deep learningbased constraints . The solution proposed in this work address these problems simultaneously. We construct a maplevel probabilistic emulator that generates the mass maps directly, and can accurately capture the signal and its variability. This emulator, built for a specific target survey dataset, would be of great practical use for innovative map-based cosmological analyses, additionally capturing the variation of the maps across the cosmological parameter space.
With a similar goal, multiple contributions have leveraged the recent advances in the field of deep learning to aid the generation of cosmological simulations. In particular, recent works (Mustafa et al., 2017;Rodriguez et al., 2018;Nathanaël et al., 2019;Tröster et al., 2019) have demonstrated the potential of Generative Adversarial Networks (GAN) (Goodfellow et al., 2014) for production of N-body simulations. The work of (Mustafa et al., 2017;Rodriguez et al., 2018;Nathanaël et al., 2019;Tröster et al., 2019;Giusarma et al., 2019;He et al., 2019) has shown deep generative models that can accurately model dark matter distributions and other related cosmological signals.
However, a practical application of these approaches in an end-to-end cosmological analysis is yet to be demonstrated. In this work, we take an essential step toward the practical use of generative models by creating the first emulator of weak lensing mass maps as function of cosmological parameters. This step allows the generate mass maps with any parameters without the need to retrain the generative model. Our conditional GAN model generates convergence maps dependent on values of two parameters that have the largest impact on the evolution of the Large Scale Structure (LSS) of the Universe: Ω m , which controls the matter density as a fraction of total density, and σ8, which controls the strength of matter density fluctuations (see (Refregier, 2003;Kilbinger, 2015) for reviews). Those are the only two parameters that can be effectively measured using the convergence maps data. After training, the conditional model can then interpolate to unseen values of σ 8 and Ω m by varying the distribution of the input latent variable. Other works (Tamosiunas et al., 2020;Villaescusa-Navarro et al., 2020) have since also explored such models, although with the emphasis on generating various cosmological fields themselves, either in 2 D or 3D.
To assess that the GAN-generated maps are statistically very close to the originals, we perform an extensive quantitative comparison. We evaluate our GAN using both cosmological and image processing metrics: the power spectral density, mass map histogram, peak histogram, the bispectrum, Minkowski functionals, Multi-Scale Structural Similarity (MS-SSIM) (Wang et al., 2003), and an adaptation of the Fréchet Inception Distance (FID) (Heusel et al., 2017). We also compare the statistical consistency of a batch of generated maps by computing the correlation matrices of power spectra. Moreover, we assess the agreement as a function of cosmological parameters. This set of comparisons is the most exhaustive presentation of the capacity of generative models to learn the dark matter maps, to date. In this work we use the data generated by (Fluri et al., 2019).
We build a sky convergence map dataset made of 57 different cosmologies (set of parameters) divided into a training set and a test set. The test set consists of 11 cosmological parameters sets was used to asses the capacity of the GAN to interpolate to unseen cosmologies.
This paper is structured as follows. In Section 2 we present a new type of generative adversarial network whose generated output can be conditioned on a set of parameters in the form of continuous values. Section 3 describes the simulation dataset used in this work. In Section 4 we describe the metrics used to evaluate the quality of the generative model. Section 5 shows the maps generated by our machine learning model, as well as compares its results to the original, simulated data. We summarize our findings and discuss the future prospects in Section 6. Appendix A contains the architectures of the neural networks used in this work.
discriminator D is to distinguish real (training) data from fake (generated) data. Meanwhile, the generator G produces samples with the goal of deceiving the discriminator into believing that the generated data is real. Both networks are trained simultaneously and if the optimization process is carried out successfully, the generator will learn to produce the data distribution (Goodfellow et al., 2014). Learning the optimal parameters of the discriminator and generator networks can be formulated as optimizing a minmax objective. Optimizing a GAN is a challenging task due to the fact that it consists of two networks competing against each other. In practice, one often observes unstable training behaviors which can be mitigated by relying on various types of regularization methods (Roth et al., 2017;Gulrajani et al., 2017). In this paper, we rely on Wasserstein GANs  with the regularization approach suggested in (Gulrajani et al., 2017). The model we use conditions both the generator and the discriminator on a given random variable y, yielding the following objective function, where P r and P z are the data and latent variable distributions. The parameter λ ≥ 0 is the penalty coefficient of the regularization term that ensures that the gradient norm of the discriminator is close to 1. This ensures that the discriminator is 1-Lipschitz, which is a requirement for optimizing the Wasserstein distance (Gulrajani et al., 2017;Arjovsky et al., 2017). The prior distribution of the latent variable, e.g., a uniform or a Gaussian distribution, defines implicitly the generator distribution P g by (x, y) G(z, y), z ∼ P z , y ∼ P y . Practically, there exist many techniques and architectures to condition the generator and the discriminator (Gauthier, 2014;Perarnau et al., 2016;Reed et al., 2016;Odena et al., 2017;Miyato and Koyama, 2018). However, all the architectures in these works are conditioning on discrete parameters. We instead propose a different design that works specifically for continuous parameters and will be shown to have good performance in practice. We note that our conditioning technique could be used with other architectures as well. For simplicity we describe the case of a single parameter, but our technique was implemented for the case of two parameters. Our idea is to adapt the distribution of the latent vector according to the conditioning parameters using the function z f (z, y). Specifically, the function f simply rescales the norm of the latent vector according to the parameter y. Given the range y ∈ [a, b], f reads: Using this function, the length of the z vector is mapped to the interval [l 0 , l 1 ]. In our case, we used l 0 0.1 n √ and l 1 n √ , where n is the size of the latent vector. For the discriminator, the parameters are concatenated directly after the convolutional layers as in (Reed et al., 2016). The relation between the features extracted from the convolutional layers and the parameters might in general be non-local. We therefore increase the complexity of the mapping functions of the discriminator and generator by adding some linear layers (as in a multi-layer perceptron) at the end of the discriminator and the beginning of the generator. The proposed model is sketched in Figure 1 and the architecture is described in more details in Appendix A. Specific parameters can be found in Table A1.

SKY CONVERGENCE MAPS DATASET
The data used in this work is the non-tomographic training and testing set introduced in (Fluri et al., 2019), without noise and intrinsic alignments. The simulation grid consists of 57 different cosmologies in the standard cosmological model: a flat Universe with cold dark matter (ΛCDM) (Lahav and Liddle, 2019). Each of these 57 configurations was run with different values of Ω m and σ 8 , resulting in the parameter grid shown in Figure 2. The output of the simulator consists of the particle positions in 3D space. The mass maps are obtained by the gravitational lensing technique (see (Bartelmann, 2010) for review). It consists of a tomographic projection of the particle densities along the radial (redshift) direction against the lensing kernel. This kernel is dependent on the relative distances between the observer and the lensed galaxies that are used to create the mass maps. The source galaxy redshift distribution n(z) used in this work is the non-tomographic distribution from (Fluri et al., 2019). The projected matter distribution is pixelized into images of size 128 px × 128 px, which corresponds to 5°× 5°of the sky. Eventually, the resulting dataset consists of 57 sets of 12,000 sky convergence maps for a total of 684,000 samples. At training time, we randomly rotate and flip the input image to augment the dataset.
The dataset is split into a training and test set in the following way: 11 cosmologies (132,000 samples) are selected for the test set, and the remaining 46 cosmologies (552,000 samples) are assigned to the training set, as depicted in Figure 2. This split is used to ensure that the model could interpolate to unseen cosmologies. At evaluation time, we use the cosmologies from the test set to validate the interpolation ability of our network. In the following sections, we show detailed summary statistics for the cosmologies marked with letters A, B, C, and D. We make the dataset publicly available. 3 2. the distribution of mass map pixels N pixels , compared using Wasserstein-1 distance and histograms, 3. the distribution of mass map peaks N peaks , which describes the distribution of values at the local maxima of the map, compared also compared using Wasserstein-1 distance and histograms, 4. the bispectrum B ℓ , which describes the three-point correlation of the folded triangles of different size, 5. Minkowski functionals, which are morphological measures of the map, and consist of three functions: V 0 , which describes the area of the islands after thresholding of the map at some density level, V 1 , their perimeter, and V 2 , their Euler characteristic (their number count minus the number of holes), 6. the Pearson's correlation matrices R ℓℓ' between the C ℓ of maps at different cosmologies, 7. the Multi-Scale Structural Similarity Index (MS-SSIM) (Wang et al., 2003;Odena et al., 2017), which is an image similarity measure commonly used in computer vision, 8. the Fréchet Distance between the output of a CNN regressor trained to predict Ω m , σ 8 , similarly to the Fréchet Inception Distance calculated using the Google Inception v3 network (Heusel et al., 2017).
The mass map histograms and the peak counts are simple statistics used to compare the maps and constrain cosmological models (see Gatti et al., 2020;Kacprzak et al., 2016 for examples). These metrics, however, ignore the spatial information in the maps. The angular power spectrum C ℓ or its real-space equivalent, the angular correlation function, is the most common statistic for constraining cosmology with LSS (see (Kilbinger, 2015) for review). The 2-pt functions capture only the Gaussian part of the fluctuations. The 3-pt correlation function, or the bispectrum, probes higher order information and has also been used for constraining cosmological models (Takada and Jain, 2003;Fu et al., 2014). Similarly, the Minkowski functionals have also been used for cosmological measurements (Petri et al., 2015) as an alternative statistic that extracts topological information from the maps.
The agreement between the pixel and peak values of N-body and GAN-generated images is quantified using the Wasserstein-1 distance W 1 (P, Q). This distance corresponds to the optimal transport of probability mass to turn the distribution P into Q. As it is scale-dependent, we calculate it after normalizing the pixel values: we subtract the mean and divide by the standard deviation. We use mean and standard deviation of all N-body generated images for a given cosmology, for both samples. This FIGURE 2 | The cosmological parameter grid used in this work, from (Fluri et al., 2019). The circles and diamonds show the training and the test sets, respectively. The total number of models was 57, of which 46 were used as the training set and 11 as the test set. The models labeled A, B, C, and D are investigated in more detail in Section 5.
Frontiers in Artificial Intelligence | www.frontiersin.org June 2021 | Volume 4 | Article 673062 way, the W 1 distance is easily interpretable: for a Gaussian with μ 0, σ 1, a 1σ shift of the mean corresponds to W 1 1, and scaling its variance by ×2 lead to W 1 ≈ 0.8. For B ℓ , C ℓ , and V 0,1,2 we calculate the simple fractional difference between the original and generated samples, defined as f x (x GAN /x N−body )/x N−body . We quantify the agreement between correlation matrices by comparing their Frobenious norms · F . For a N-dimensional, diagonal covariance matrix with elements σ 2 i , the Frobenious norm scales linearly with σ/ N √ . This way, it can be interpreted as a linear proxy for information content. We define the fractional difference between the Frobenious norm of GAN and N-body correlation matrices as: The Multi-Scale Structural Similarity Index (MS-SSIM) is useful in order to detect the problem commonly known as mode collapse, where the generator produces only a small subset of the training data distribution. Detecting this undesirable behavior is non-trivial as summary statistics can still agree during mode collapse. Taking inspiration from (Odena et al., 2017), one solution is to leverage the MS-SSIM score from (Wang et al., 2003) to quantify this effect. This metric was first proposed for prediction of similarity in human perception of images. Taking two images as inputs, it returns a value between 0 and 1, where 1 means "identical" and 0 means "completely different." As the mass maps are stochastic and only similar in a statistical way, we are not interested in the similarity between a pair of specific images, but in the average similarity of a large set of images. We calculate the significance of the difference in the SSIM measures in the following way: where 〈SSIM〉 is the mean score, and σ[SSIM] is the standard deviation. Large differences in the SSIM score indicate a significant difference in the samples generated by the GAN, thus pointing out to potential problems with the quality of the generated samples. On the other hand, a small difference will be an indicator that the generative model preserve the data statistics. Finally, we calculate an adaptation of the Fréchet Inception Distance (FID) (Heusel et al., 2017) between N-body and GAN -generated images. The Inception Score (IS) (Salimans et al., 2016) and FID have become standard measures for GANs. The idea consists to compare statistics of the output of the Google Inception-v3 network (Szegedy et al., 2016) for the ImageNet dataset (Deng et al., 2009). This has proven to be well correlated with human score. As the reference Inception network used for the FID was trained with the ImageNet dataset, its output statistics are meaningless for cosmological mass maps. To solve this challenge, we create our own reference network that is well suited for cosmological mass maps. This network is a CNN trained to perform a regression task and predict the true σ 8 , Ω m parameters, similarly to Schmelzle et al., 2017;Gupta et al., 2018). Its parameters and detailed explanations of its construction can be found in Table A2 and in Appendix B. The adapted FID score is obtained by comparing the regressor outputs for the N-body and GAN images. As regressor is composed of seven layers, this comparison depends on high order moments. Naturally, we expect that a well working conditional GAN should generate samples with similar output distribution to the one of the real samples. To estimate the distance between the two statistics distributions, we first approximate the network predictions with a normal distributions μ r , Σ r and μ g , Σ g , for the N-body and GAN -generated input, respectively. The FID is then calculated as: Note that this formula also correspond to the Wasserstein-1 distance between the two Gaussian distributions (Dowson and Landau, 1982). Eventually, before calculating FID, we normalize the network outputs for each true cosmology: we subtract the mean and divide by the standard deviation of the N-body sample.
For the ease of interpretation, we report the square root of FID. This way, a 1σ difference in the mean CNN predictions will correspond to FID 1/2 1. Similarly, a change of 1σ in the covariance matrix also leads to FID 1/2 1.

RESULTS
We trained the GAN model described in Section 2 and Appendix A. We used RMSPROP as an optimizer with an initial learning rate of 10 − 5 and a batch size of 64. The discriminator was updated 5 times more than the generator. The gradient penalty was set to 10 and the negative slope of the LeakyRelu α 0.2. It took a week to train the model for 40 epochs on a GeForce GTX 1080 GPU. Similar to (Reed et al., 2016;Odena et al., 2017;Miyato and Koyama, 2018;Mirza and Osindero, 2014), we use batches composed of samples from different parameter sets. Note that the batches were composed of samples from different cosmologies from the training set. The summary statistics are computed using 5,000 real and fake samples for every pair of parameters of the test set. The peaks are extracted by searching for all pixels greater than their 5 × 5 patch neighborhood, i.e. their 24 neighbors. Then, the histogram of the extracted peaks values is computed. We rely on LENSTOOLS (Petri, 2016) to compute the power spectra, bispectra and the Minkowski functionals. For the bispectrum, we use the folded configuration, with the ratio between one of the triangle sides and the base is set to the default value of 0.5. The SSIM is computed using the SCIKIT-IMAGE packge 4 (Van der Walt et al., 2014). We make our code is publicly available. 5 Figure 3 shows images generated by the conditional GAN and as well as original ones, for several values of Ω m and σ 8 parameters. They are visually indistinguishable. Furthermore, the image structure evolves similarly with respect of the cosmological parameters change. As predicted by the theory, increasing Ω m results in convergence maps with additional mass and increasing σ 8 in images with higher variance in pixel intensities. In Figure 4 the same latent variable z is used to generate different cosmologies. The smooth transition from low to high mass density hints that the latent variable control the overall mass distribution and the conditioning parameter its two cosmological properties σ 8 , Ω m . Figure 5 shows the histograms of pixels (top) and peaks (bottom) of the original maps simulated using N-body simulations (blue), and their GAN-generated equivalents (red), for the four models A,B,C,D shown in Figure 2. The peak counts were selected as maxima of the surrounding 24 neighbors. The solid line corresponds to the median of the histograms from 5,000 realisations, and the bands to 32% and 68% percentiles. The bottom part of each panel shows the fractional difference between the statistics, defined as f x (x GAN /x N−body ). The normlized Wasserstein-1 distance of the pixel values distribution (see Section 4) is: W pixel 1 0.04, 0.02, 0.02, 0.02, for models A, B, C, and D, respectively. That indicates that the histograms differ on the level of <5%. Similarly, the Wasserstein-1 distances of the peak value distribution for models A, B, C, and D is: W peak 1 0.04, 0.03, 0.01, 0.02. The agreement here is also very good, on <5% level.
The 2-pt and 3-pt statistics are shown in Figure 6. The power spectra C ℓ overlap almost perfectly for all the cosmologies lying inside the parameter grid used for training. Again, the agreement is better than 5%. The agreement for the bispectrum B ℓ is good for models B and C, but worse for A and B; the GAN model seems to underestimate the strength of the 3-pt correlations for these models, which differ by ≈20%. We note that the 3-pt signal is very weak and has a large variation, which may be difficult to model for the GANs.
The Minkowski functionals are presented in Figure 7. They were calculated using LENSTOOLS (Petri, 2016). The functional V 0 (first line) corresponds to the area of the emerging "islands," V 1 (second line) to their circumference, and V 2 (third line) to their Euler characteristic (their number count minus the number of holes in them). The value of threshold κ, above which the functional values, i.e. the "islands" are calculated, is shown on the x-axis. Here the agreement is typically better than 10%, with some model D agreeing much better, to ≈2%. The large differences in the fractional difference plots are due to instability close to value of V 0. The confidence limits of the summary statistics shown in these figures overlap very well, which indicates that the variability of these statistics is also captured very well by the GAN system.
The Pearson's correlation matrices R of the power spectra are shown in Figure 8. Those correlations were created from a coarsely-binned C ℓ in range ℓ ∈ [300, 3000]. The upper and lower triangular parts of the matrix show the original N-body correlations and the GAN correlations, respectively. We calculate the Frobenious norms of these matrices and compare their ratios using Eq. 3. For the models A, B, C, D this difference is: f R 0.02, 0.14, 0.06, 0.06. This agreement is overall very good, with model B being slightly worse. As the precision requirements for covariance matrices are not as strict as for the summary statistics, this level of agreement can be considered satisfactory for upcoming applications (Taylor et al., 2013).
We calculate the mean and standard deviation of MS-SSIM score between 5,000 randomly selected images for each cosmology, both for GAN and original N-body maps. We test if the mean SSIM score is consistent between the N-body and GAN data using Eq. 4. The SSIM difference significance for the four models A, B, C, and D are: s SSIM 0.08, 0.23, −0.27, 0.42,. This indicates very good statistical agreement for these models. Figure 9 shows the prediction of a regressor CNN trained on the N-body images with true σ 8 , Ω m values. For each category, we make the prediction with 500 randomly selected maps. The shaded areas show the 68 and 95% probability contours for the N-body image input (blue) and the GAN image input (red). The agreement is relatively good, but differences in the spread of these distributions is noticeable. The Fréchet Distance (FID) computed using the reference cosmological CNN, as described in Section 4, is: FID 1/2 1.26, 1.44, 1.00, 1.19, for models A,B,C, and D. This indicates a slight difference according to this metric and agree with the distributions in Figure 9.
We compare the summary statistics as a function of cosmological parameters for both the training and the test set. We used the training and test sets displayed in Figure 2. Figure 10 shows the six quantities as a function of cosmological parameters: top left: significance of the difference s SSIM in Multi-Scale Structural Similarity Index (Eq. 4), top center: Fréchet distance using a CNN regressor (Eq. 5).
Note that for a Gaussian distribution, a difference of FID 1/2 1 corresponds to either a 1σ shift in the mean or 1σ difference in standard deviation, top right: normalized Wasserstein-1 distance of the pixel value distribution. For a Gaussian distribution, a 1σ change in the mean corresponds to W 1 1, and an increase in standard deviation of ×2 to W 1 ≈ 0.8, bottom left: average fractional differences in the power spectrum f C ℓ , bottom center: fractional differences in the power bispectrum f Bℓ , bottom left: fractional difference in the Frobenious norm of the correlation matrices f R (Eq. 3).
Overall, we notice that the agreement between the N-body simulations and GAN-generated maps is the best in the center of the grid for both the training and test set. The fact that the differences in neighboring cosmologies are similar indicates that the GAN system can efficiently learn the latent interpolation of FIGURE 4 | Images generated with the same random seed but with different input cosmology parameters Ω m and σ 8 .
Frontiers in Artificial Intelligence | www.frontiersin.org June 2021 | Volume 4 | Article 673062 the maps. The agreement worsens at the edges of the grid. We observe the biggest deterioration in the realism of the GAN model for the high Ω m and low σ 8 parameters. This is most prominent for the correlation matrix and the SSIM differences. Conversely, the biggest difference for the bispectrum is present for low Ω m and high σ 8 .  Figure 2. The x-axis value κ is the map pixel intensity. The solid line is the median histogram from 5,000 randomly selected maps. The bands correspond to 32 and 68% confidence limits of the ensemble of histograms. The lower panels show the fractional difference between the median histograms.
FIGURE 6 | Comparison of the 2-pt and 3-pt functions between the original N-body maps and GAN-generated maps. The structure of this figure is the same as for Figure 5.
Frontiers in Artificial Intelligence | www.frontiersin.org June 2021 | Volume 4 | Article 673062 8 FIGURE 7 | Comparison of the Minkowski functional between the original N-body maps and GAN-generated maps. The value of threshold κ, above which the functional value is calculated, is shown on the x-axis. The functional V 0 corresponds to the area of the emerging "islands," V 1 to their circumference, and V 2 to their Euler characteristic (their number count minus the number of holes in them). The structure of this figure is the same as for Figure 5.  Figure 2. The upper triangular corresponds to the original N-body power spectra, while the lower triangular to the power spectra of GAN-generated images. The fractional difference of the Frobenious norms (Eq. 3) of these matrices is f R 0.02, 0.14, 0.06, 0.06, for models A, B, C, and D, respectively.
Frontiers in Artificial Intelligence | www.frontiersin.org June 2021 | Volume 4 | Article 673062 9 We proposed a new conditional GAN model for continuous parameters where conditioning is done in the latent space. We demonstrated the ability of this model to generate sky convergence maps when conditioning on the cosmological parameters Ω m and σ 8 . Our model is able to produce samples that resemble samples from the test set with good statistical accuracy, which demonstrates its generalization abilities. The agreement of the low order summary statistics (pixel and peak histograms and power spectrum) is very good, typically on the <5% level. Higher order statistics (Minkowski functionals, bispectrum) agree well, but with larger differences, generally around ≈10%, and in some cases ≈20%. The comparison of the Multi-Scale Structural Similarity Index (MS-SSIM) shows a good agreement in this metric, with the exception of the low σ 8 and high Ω m edge of the grid. Moreover, the GAN model is able to capture the variability in the conditioned dataset: we observe that the scatter of the summary statistics computed from an ensemble is very similar FIGURE 9 | Predictions of a regressor CNN trained to predict the Ω m , σ 8 from input images. The details of this experiment are described in Section 4 and the network architecture in Table A1 in Appedix A. The contours encircle the 68 and 95% samples for the N-body maps (blue) and GANgenerated maps from the test set. The black stars show the true values of the test set parameters. between the original and generated images. The investigation of the correlation matrices of the power spectra also shows a good agreement, with a quality deteriorating close to the edges of the grid, especially for low σ 8 and high Ω m . This is not unexpected, as the training set contains less information near the edges of the grid. More investigation is needed to more close inspect the behavior of the generative model in these areas. As generative models are rapidly growing in popularity in machine learning, we anticipate to be able to solve these problems in the near future. Our results offer good prospects for GAN-based conditional models to be used as emulators of cosmologydependent mass maps. As these models efficiently capture both the signal and its variability, the map-level emulators could potentially be used for cosmological analyses. They can accurately predict the power spectrum and its covariance, which is often unattainable in standard cosmological analyses (Eifler et al., 2009). It can also be used for non-Gaussian analyses of lensing mass maps, such as, for example, in (Zürcher et al., 2020;Parroni et al., 2020). Further experiments will be needed, however, to bring the generative models to a level where they can be of practical use in a full, end-to-end cosmological analysis.
In this paper, we have demonstrated the ability of generative AI models to serve as emulators of cosmological mass maps for a given redshift distribution of source galaxies n(z). Generative models have also been shown to work directly on the full or sliced 3D matter density distributions (Nathanaël et al., 2019;Tröster et al., 2019;Villaescusa-Navarro et al., 2020). The three dimensional generation of cosmological fields proves to be particularly difficult. As most of the survey experiments publish their lensing catalogs and their corresponding redshift distributions, the generation of projected maps, as shown in this work, could be of direct practical use. Another challenge will be posed by the large sky area of the upcoming surveys and their spherical geometry. Spherical convolutional neural networks architectures have been proposed (Perraudin et al., 2018;Krachmalnicoff and Tomasi, 2019;McEwen et al., 2021). These architectures are expected to be easy to implement with generative models, which offers good prospect for the development of spherical mass map emulators.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://renkulab.io/projects/nathanael. perraudin/darkmattergan, https://zenodo.org/record/4646764 AUTHOR CONTRIBUTIONS NP experiment lead, experiment execution, code maintenance, paper writing SM experiment execution, implementation of conditional GANs AL machine learning advice, consulting, paper writing TK dataset creation, cosmology advice, consulting, paper writing, plotting, method evaluation. Model Table A1 summarizes the architecture of the GAN system, i.e. the generator and the discriminator. From the latent variable z and the cosmological parameters σ 8 and Ω m , the generator starts by computing z using Eq. 2. As a second step, z is transformed with three linear layers. i.e. a Multi Layer Perceptron (MLP), that outputs a 32768 tensor (g 0 to g 2 ). The data is then reshaped to 8 × 8 × 512 (h 3 ) and further transformed with four deconvolutional layers with stride 2 and kernel sizes of 3 × 3 or 5 × 5 (g 4 to g 7 ). The last generator layer consists of a deconvolution with stride 1 and kernel size 7 × 7 and it is intended to generate fine-grained details (g 8 ). The discriminator is symmetric to the generator with two exceptions. First the parameters σ 8 and Ω m are concatenated in d 5 just before the first linear layer. Second, an extra linear layer is added at the end of the discriminator (d 9 ) in order to recover a single output. All layers are separated by a LeakyRelu activation function with the parameter α 0.2 (Maas et al., 2013).
Training The cosmological dataset described in Section 3 is used to train the GAN, where the batches are composed of samples from different cosmologies. We select a Wasserstein loss, with a gradient penalty of 10 . We use RMSProp as an optimizer with an initial learning rate of 10 − 5 , and a batch size of 64. The discriminator is updated 5 times more often than the generator. The model is trained for 10 − 5 epochs on a GeForce GTX 1080 GPU, which takes around 170 h.

APPENDIX B. REGRESSOR TRAINING
Given real and generated images, the general idea of the Frechet Inception Distance (FID) is to compute the distance between some of their complex statistics. For natural images, these statistics are given by the last layer, i.e. the logits, of a pretrained Inception-V3 network (Szegedy et al., 2016). As these statistics are meaningless for our cosmological data, we build new ones using a carefully designed regressor. Given an image, the regressor is trained to predict the two parameters Ω m and σ 8 . We provide the regressor weights with the code to make our FID metric reusable.
Data Naturally, we use the training dataset described in Section 3, i.e. 46 different cosmologies composed by 12000 images each. This training dataset is further randomly split into a regressor training set (80%) and a restressor test set (20%).
Model The architecture of the regressor is described in Table A2. It shares the same structure as the GAN discriminator. It consists of a four convolutional layers followed by three linear layers with leaky relu non-linearity. The last layer is a linear layer with two outputs and it is responsible for producing the predicted parameters. We select the LeakyRelu activation functions for better gradient propagation.
Training We use the mean squared error between the predicted and true parameters as a loss function. The model was trained for 20 epochs using an Adam (Kingma and Ba, 2014) optimizer with an initial learning rate of 3 · 10 − 5 , β 1 0.9, β 2 0.999 and ϵ 10 − 8 and a batch size of 64. The mean squared error evaluated on the test set corresponds to 8.93e −5 , which is low enough for the purpose of computing the FID.