Double k-Grid Method for Solving the Bethe-Salpeter Equation via Lanczos Approaches

Convergence with respect to the size of the k-points sampling grid of the Brillouin zone is the main bottleneck in the calculation of optical spectra of periodic crystals via the Bethe-Salpeter equation (BSE). We tackle this challenge by proposing a double grid approach to k-sampling compatible with the effective Lanczos-based Haydock iterative solution. Our method relies on a coarse k-grid that drives the computational cost, while a dense k-grid is responsible for capturing excitonic effects, albeit in an approximated way. Importantly, the fine k-grid requires minimal extra computation due to the simplicity of our approach, which also makes the latter straightforward to implement. We performed tests on bulk Si, bulk GaAs and monolayer MoS2, all of which produced spectra in good agreement with data reported elsewhere. This framework has the potential of enabling the calculation of optical spectra in semiconducting systems where the efficiency of the Haydock scheme alone is not enough to achieve a computationally tractable solution of the BSE, e.g., large-scale systems with very stringent k-sampling requirements for achieving convergence.


I. INTRODUCTION
Many-body perturbation theory (MBPT) offers the right framework for treating neutral excitations via Green's function methods [1][2][3][4][5].This requires solving the Bethe-Salpeter equation (BSE) [6][7][8], which relies on a two-particle propagator to account for the presence of electron-hole pairs (i.e., excitons).The description of excitonic effects is crucial to compute optical spectra in extended systems, particularly in semi-conductors and insulators, for which methods based on the Random Phase Approximation (RPA) or time-dependent (TD-) density functional theory (DFT) with (semi-)local exchange-correlation functionals tend not to agree with experimental results [1,2].Calculations within the BSE framework are generally much more cumbersome and computationally demanding than DFT ones, and it is rather easy to reach the limits of what can be practically computed.Hence, the need for convergence studies is a key aspect of every MBPT calculation to alleviate the computational burden as much as possible while, at the same time, trying to ensure an accurate description of the system at hand.In general, electronic structure calculations in periodic systems treated plane waves require convergence with respect to the size of this basis set as well as the sampling of the Brillouin zone (BZ).In particular, MBPT methods also require the inclusion of unoccupied states in the form of an, in principle, infinite summation that needs to be truncated to the minimum value that nonetheless captures the physics at play.Furthermore, the solution of the BSE requires far denser k-sampling than DFT calculations to achieve an accurate description of excitons.This is because the excitonic wave-functions are usually quite spread out, with a periodicity well beyond the unit cell, and in order to expand them in a basis of transitions {vck} (electron-hole space), very dense k-grids are required.Moreover, BSE methods do not exploit symmetry so they are solved in the full BZ.The cubic scaling of the number of k-points in bulk systems from one grid to the next makes matters even worse (the quadratic scaling in 2D systems is more manageable).Such k-grid requirements may still be feasible for small systems, with few atoms per unit cell and few valence electrons per atoms.However, medium to large size unit cells of atoms with many valence electrons (e.g.transition metals) become prohibitively costly as the number of k-points increases, and the solution of the BSE in a very dense grid (e.g. 60 × 60 × 60) is simply out of reach.For all these reasons, the issue of k-point convergence is critical for the solution of the BSE and represents the bottleneck in its computational implementation.Therefore, the introduction of alternative numerical methods and approximations that can effectively deal with k-point convergence in the BSE is of utmost importance.Albeit with the limitations described above, there are currently several approaches to solve the BSE.These are, in order of decreasing computational cost, inversion, full diagonalisation and Lanczos approaches, and will be described below.A first distinction would be based on whether the equation is solved in its Dyson-like form or re-cast as a two-particle Hamiltonian in transition space.The first approach requires the inversion of the BSE kernel matrix which, depending on the size of the matrix, can become impracticable.In such cases one would turn to the Hamiltonian formulation of the problem (see, for example, ref. [2]).In the latter, the two-particle Hamiltonian is diagonalised to obtain the eigen-values (excitonic energies) and eigenvectors (excitonic wave-functions).If the BSE matrix of a given system is still too big for full diagonalisation, one can resort to Lanczos [9] approaches which are usually a cost-effective option for sparse matrices [10].These algorithms have been widely used for the calculation of response functions, both at the TD-DFT [11,12] and BSE [13] levels.In the latter, Lanczos approaches eliminate the need for inverting the BSE kernel or fully diagonalising the two-particle Hamiltonian.Rather, the latter is re-expressed as a tri-diagonal matrix based on recursive relations, which leads to an iterative solution of the problem that is computationally cheaper than full diagonalisation.Unfortunately, while previously described solvers produce the full set of both excitonic energies and wavefunctions of the system at hand, Lanczos schemes lead to a partial solution of the problem.For instance, Haydock's implementation [14] of the Lanczos approach provides only the full set of the eigen-values of the two-particle Hamiltonian (i.e., one obtains the full spectrum but not the excitonic wave-functions).Despite the numerical advantages of Lanczos solvers, a given system could still be too big for computing optical spectra.As the diagonalisation itself ceases to be a problem with Lanczos schemes, the bottleneck now shifts to the previous step of computing and storing the BSE kernel, which can render the calculation impracticable depending on the size of the electron-hole (e-h) basis.Nothing too extreme would be required to reach this condition, e.g., a magnetic system with around 100 electrons per unit cell, slow convergence with respect to bands and a 6 × 6 × 6 k-grid would certainly be beyond reach.At this point, there is little alternative for solving the BSE and computing optical spectra, which is the challenge we intend to tackle in this manuscript.
The work presented here concentrates on improving the convergence of optical response spectra calculations within the BSE with respect to the number of k-points.
This issue has been the target of many research efforts over the years.Rohlfing et al.
introduced a scheme to interpolate the BSE matrix in the BZ [15].Their strategy is based on a double grid approach by which the kernel matrix elements are properly calculated on a coarse k-grid and approximated on a fine k-grid.As a function of q, the k-point difference between two transitions in e-h space, the BSE kernel is sharply peaked at the origin and a regular interpolation in the BZ would fail.However, expressing these matrix elements as aq −2 + bq −1 + c results in the coefficients varying slowly in the BZ.These coefficients are then interpolated by virtue of knowing them exactly in the coarse k-grid.Their approximation also considers the varying phases of the single-particle states in the BZ, which requires knowledge of the wavefunctions in the fine k-grid.This crucial point becomes a drawback when one is limited by memory and disk storage rather than computation, which is increasingly the case nowadays.More recently, Fuchs et al. proposed the use of hybrid k-meshes in the form of a coarse k-grid for the whole BZ and a denser k-grid around the Γ-point [16] only.Even though the kernel matrix elements are properly calculated on both grids, this method allows to refine k-sampling only where is needed, resulting in fewer k-points in total.The downside of using non-uniform grids becomes apparent in the calculation of the electron-hole attraction term of the BSE kernel, as knowledge of the screening at q-points not included in the hybrid grid itself will be needed.This complication requires additional computation (or at least an interpolation) if one intends to use the RPA screening, as is the case in this work.Kammerlander et al. applied double grid techniques to solving the BSE by inversion [17].In the latter, the BSE is solved on the coarse k-grid while the fine k-grid us used compute the independent particle part of the two-particle response function.This technique, which also benefits from Wannier interpolation of the Kohn-Sham (KS) orbitals, has proven successful in accurately reproducing the spectra of several materials.However, as it ultimately relies on matrix inversion, its application is limited to small systems, i.e., systems which could be computed by the inversion solver in the coarse grid, albeit underconverged.Finally, an interesting generalisation of the method in ref. [15] has been proposed by Gillet et al. [18], where the interpolation of the BSE kernel matrix element at a given fine-grid k-point considers eight coarse-grid k-points around it.
Importantly, this method is compatible with Haydock's solution scheme to the BSE.Moreover, substantial savings in memory requirements and disk storage are achieved by interpolating kernel matrix elements on the fly.Nevertheless, this method still requires knowledge of the KS orbitals in the fine grid.Depending on the number of bands and density of the fine grid, this can entail prohibitive memory requirements.
In this work, we also propose a double grid approach, including a coarse k-grid where the BSE kernel is properly calculated and a fine, denser k-grid where the corresponding matrix elements are approximated.At variance with ref. [17], we propose an approximation that is compatible with the computationally cheapest solution to the BSE, namely Lanczos-based iterative solvers.Crucially, this allows us to target materials for which optical spectra cannot be currently computed, i.e., relatively big systems that can only be solved by Lanczos approaches and in k-grids that fall short of a converged solution.Another distinctive feature of the method presented here is its simplicity.It is far easier to implement than previous attempts [15,16].Impor-tantly, the introduction of the fine k-grid requires minimal extra computation and memory with respect to the coarse k-grid.In particular, knowledge of the wavefunctions in the fine k-grid is not needed, nor is the calculation of the RPA screening in any extra k-point.Therefore, the computational cost remains roughly at the level of the coarse k-grid, while an approximate description of broad excitons is achieved by virtue of adding a fine k-grid.The remainder of the manuscript is structured as follows.Section II describes the proposed double grid approach in detail while Section III reports the results obtained for a variety of semiconductors.Section IV presents the gains in computational cost that our implementation achieves.It also outlines a comparative assessment of particular choices made within the method and discusses the limitations of the approach.

A. Haydock solution of the BSE
Optical absorption spectra are represented by the imaginary part of the macroscopic dielectric function Im[ M ], which is obtained by taking the long wavelength limit of an expression involving the microscopic dielectric function (q, ω)-where q represents the transferred momenta while ω is the frequency.For neutral excitations, (q, ω) is defined in terms of the polarisation or density-density response function χ, which is in turn calculated within the RPA, i.e., as a Dyson-like equation being the non-interacting polarisation χ 0 a product of non-interacting one-particle Green's functions that describe the propagation of one electron or one hole.However, optical spectra of extended systems require the inclusion of excitonic effects, which will ultimately lead us to a two-particle Green's function that describes the dynamics of an electron-hole pair {vck} (we only consider vertical transitions at point k in the BZ between an occupied band v and an empty band c).This is achieved by defining the macroscopic dielectric function via an interacting polarisation χ, i.e., M (q, ω) ≡ 1−v(q) χG=0,G =0 (q, ω).This interacting polarisation is obtained in terms of an electron-hole (e-h) Green's function L as in Eq. 1, In Eq. 1, ρ nmk (q, G) = nk|e i(q+G)•r |mk − q are the oscillator strengths.For simplicity, unpolarised electrons are assumed in the discussion, however we stress that the method is not limited to non-magnetic systems.The Bethe-Salpeter equation is then the Dyson-like equation for L, where the matrix Ξ is the so called BSE kernel, Vnmk The BSE kernel is written as shown in Eq. 3 and its two contributions, namely the e-h attraction W and the e-h exchange V , can be calculated as in Eqs. 4 and 5.The solution of this Dyson-like equation would require to invert the BSE kernel, which can be prohibitively costly as explained in Section I. Hence, the problem is re-cast in terms of a two-particle Hamiltonian in e-h space, where E nmk is the energy of the vertical transition from band n to band m at point k according to either the KS or quasi-particle (QP) energies.
Diagonalising this matrix would provide the excitonic eigen-values and eigenvectors required to compute optical spectra, however in this work, we focus on Lanczos-based methods.In particular, Haydock's algorithm [14,19,20] consists on an iterative method based on a set of recursive relations, namely, with n being the iteration index.This set of equations corresponds to Hermitian Hamiltonians (the pseudo-Hermitian case has a slightly more complicated form [21]).
Eqs. 7-9 allow for the calculation of the factors a and b, and the so called Haydock vector for the next iteration |V n+1 .The initial Haydock vector is calculated as On each iteration n, the optical spectrum is calculated according to, until the difference between spectra of successive iterations is below an acceptable threshold.

B. Double grid approach
First, we consider a coarse k-grid where no approximations are applied, i.e., the BSE kernel is computed for all vertical transitions involving k-points in this grid, which requires knowledge of the KS orbitals and energies (potentially corrected to QP energies) at each of these k-points.The solution of the BSE in this grid would typically be computationally manageable but produce underconverged optical spectra.
Thus, a much denser fine k-grid will be added to the system.We will denote k-points belonging to the fine grid with the letter κ, while those in the coarse grid will be labelled K.Moreover, κ-points will be grouped in domains centred around the Kpoints in such way that Dom(K i ) will be composed by the κ-points that are closer to K i than to any other K-point.The number of k-points in this fine grid would ordinarily be too large for the BSE to be solved in full, and hence, approximations will be introduced for the fine grid.The two-particle Hamiltonian in Eq. 6 can be thought of as a shift (the diagonal matrix containing the energies of each transition) plus a rotation (the BSE kernel).The approximation proposed implies that the diagonal matrix is calculated in the fine grid, for which knowledge of the KS energies of each band at every κ-point in the fine grid is required.The BSE kernel, however, will not be calculated in full but rather, every matrix element involving at least one transition in the fine grid will be approximated according to some rules for kernel extension.This allows the method to dispense with the KS orbitals in the fine grid, which has a great impact on memory requirements.
The way in which the BSE kernel is extended from the coarse to the fine grid has been carefully considered as it has significant impact on the results.The best agreement with experimental spectra was achieved with an approach we refer to as diagonal kernel extension (DKE).Let us consider one k-point in the coarse grid, K I .There will be a group of κ-points in the fine grid that map to it, namely those in the domain Dom(K I ).We will label those with a second numerical sub-index as κ I 1 , κ I 2 , κ I 3 , ..., κ I i , .... Given that the fine grid contains the coarse grid, we have that κ I 1 = K I , while κ I i with i = 1 are other fine grid points close to K I .Having established the nomenclature in this way, then DKE would imply the definition where the R.H.S is known and calculated exactly while the L.H.S is the unknown matrix element we are trying to approximate (see SI for a visual representation of Eq. 12).Thus, Eq. 12 is only exact for the k-point that belongs both the coarse and the fine grids (i = i = 1), and approximated otherwise.Even though the BSE kernel is not, in general, a diagonally-dominant matrix, it is true that the diagonal matrix elements usually have values orders of magnitude higher than those of immediately close off-diagonal elements.The DKE approach preserves this character when extending the kernel from the coarse grid to the fine grid.Essentially, each matrix element of the coarse grid BSE kernel expands into a block in the fine grid matrix.
The DKE method ensures that each block is strictly diagonal, which is very relevant when expanding one of the diagonal matrix elements of the coarse grid matrix.In Section IV B, the DKE is compared with a possible alternative kernel extension.
Finally, let us discuss how this double grid method fits within the Haydock algorithm.It is apparent from Eqs. 7-9 that this scheme relies mainly on the matrix vector multiplication H 2p |V n , so we will focus on how this is adapted to account for the fine grid.The two-particle Hamiltonian has already been described above, i.e., the BSE kernel is approximated by DKE (Eq.12) and the diagonal part needs no approximation as the KS (or QP) energies are known in the fine grid.All there is left is then to define how the Haydock vectors |V n are extended to the fine grid and initialised.The initial Haydock vector |V 0 is calculated in the coarse grid according to Eq. 10.Each component is associated to one transition vck and thus, when moving from the coarse to the fine grid, the number of components will increase according to the ratio between the number of κ-points and K-points.From Eq. 10, it is clear that the KS orbitals at the κ-points would be required to properly initialise the Haydock vector in the fine grid.As these orbitals are not available in our method, those components will be initialised as being equal to the corresponding transition in the coarse grid.In other words, where FG denotes the fine grid.It is apparent that |P FG has many more com- grouping components by their κ-point as fragments.These fragments are convenient for the matrix vector multiplication.In fact, this operation can be expressed as where c vcκ I i = vcκ I i |V are the components of the vector to be multiplied and r are, analogously, the coefficients of the resulting vector.Applying the DKE to the BSE matrix (Eq.12), we obtain where matrix elements in the R.H.S are those of the coarse-grid BSE kernel and the resulting summation runs over the K-points in the coarse grid only.Computationally, this means adding a loop over the κ-points in the domain of each K.

III. RESULTS
The double grid method proposed here to calculate optical spectra via the BSE has been implemented in the Haydock solver of the Yambo code [5,22] and tested on a variety of semiconductors.In this section, we present the resulting optical spectra of each material.We note that the spectra produced by this approach represented a sharp improvement with respect to the coarse grid solution, while requiring only a marginal increase in computational cost.Although Gamma-centred k-grids were used throughout this study, our method can also be used with shifted grids (see an example in SI).

A. Si bulk
It is notoriously difficult to converge the optical spectrum of bulk Si with respect to k-points since a very dense k-sampling is required to properly describe its excitons.
The starting point for our Si calculations is a severely under-converged 8 × 8 × 8 kpoint grid.The spectrum produced by this coarse grid alone shows numerous spurious peaks (see Fig. 1), which reveals a high degree of artificial localisation of the excitons imposed by the 8 × 8 × 8 k-grid.We then took the latter as the coarse grid for the double grid method and added a fine grid of κ-points to it.Fig. S1 shows that a fine (double) grid of 24 × 24 × 24 κ-points on top of this coarse grid immediately suppresses this artificial localisation.Denser double grids improve upon this result (See Fig. S2).Ultimately, the spectrum obtained with a 60 × 60 × 60 fine κ-grid on top of an 8 × 8 × 8 coarse K-grid is in close agreement with experiments (see Fig. 1).The comparison here is done with experimental data at 10 K available in the literature for Si bulk [23].

B. GaAs bulk
As in the case of Si, GaAs also requires very dense k-sampling for its optical response to be converged.The coarse grid in this case is an under-converged 10 × 10 × 10 Gamma-centred k-point grid.Indeed, the spectrum produced by this coarse grid alone also presents various spurious peaks, revealing a high degree of artificial localisation of its excitons (see Fig. 2).Fig. S3 shows that adding a fine (double) κ-grid of 20 × 20 × 20 does not solve the problem fully.However, the spectra with 40 × 40 × 40 or 60 × 60 × 60 κ-grids match the experimental data relatively well (see Fig. S4 and Fig. 2, respectively).In particular, the latter grid appears to capture the splitting of the first exciton present in the experimental data, although a denser coarse grid would be required as a better starting point to fully reproduce this feature.
Indeed, this result still seems to suffer from slight artificial localisation, e.g., at 3.5 eV.All k-grids are Gamma-centred.
The comparison is drawn to available experimental data for GaAs at 22 K [24].

C. MoS 2 monolayer
The convergence of the absorption spectrum of monolayer MoS 2 with the k-grid within BSE has been discussed in the appendix of Ref. [25].The latter study reports a converged spectrum using a very dense k-point grid and clarifies that when spin-orbit coupling is not considered, the first excitonic peak should not show any splitting.In fact, other works which used an under-converged k-grid, mistakenly showed a splitting of the first excitonic peak in collinear spin-polarised calculations with no spin-orbit coupling accounted for.
Here, we aim to reproduce the spectra in Ref. [25] via the double-grid method (i.e., at a fraction of the computational cost).Our results follow a similar trend to the spectrum thereby reported.In the work of Molina-Sanchez et al., spectra with 12 × 12 × 1 or 18 × 18 × 1 (single grids) show splitting of the first peak while (single) k-grids of 24 × 24 × 1 or 30 × 30 × 1 solve the issue.Our result with a 12 × 12 × 1 single grid is equivalent to that of Ref. [25] (except for a rigid shift in energy), i.e., it shows undue splitting (see Fig. 3).Adding a double grid of 24 × 24 × 1 κ-points also results in undue splitting while 48 × 48 × 1 appears to eliminate it (see Figs. S5 and S6, respectively).A double grid of 60 × 60 × 1 further improves upon this result (see Fig. 3).Importantly, the quality of the double grid spectrum with a double grid of 60 × 60 × 1 κ-points is not far from what was achieved in Ref. [25] with a 30 × 30 × 1 single grid.The double grid approach correctly captures the physics at play despite representing roughly the computational cost of a 12 × 12 × 1 regular BSE Haydock calculation (see Sec. IV).IV.DISCUSSION

A. Computational cost
As described above, Lanczos approaches to the BSE eliminate the need to invert the BSE kernel or fully diagonalise the two-particle Hamiltonian, which would become the bottleneck of the calculation whenever required.Instead, Lanczos solvers replace these highly demanding tasks by very efficient and computationally inexpensive iterative schemes.This numerical advantage means that the solution step itself does not drive the computational cost any longer, but rather, computing and storing the BSE matrix now becomes the bottleneck of the calculation.The method proposed in this work addresses this issue directly.First, the KS orbitals in the fine grid need not be available, i.e., not stored nor loaded into memory.Moreover, the kernel matrix elements in the fine grid, and consequently, the corresponding oscillator strengths, need not be calculated.As a result, the size of the BSE kernel matrix will effectively be that of the coarse grid.For instance, if we consider a coarse grid of 10 × 10 × 10 and a fine grid of 60 × 60 × 60, then there would be 1000 K-points and 216000 κ- Let us consider monolayer MoS 2 to address how the computational cost compares between our double grid approach with a given fine κ-grid and the regular (full ) BSE calculation using that same fine grid as the only (single) k-grid.Fig. 4 shows the combined time required to calculate the BSE kernel and solve the eigen-problem via Haydock's scheme as a function of k-points used in the calculations.For comparability purposes, all the calculations shown in Fig. 4 have been carried out with just one processor.For the full solution of the problem (brown circles) the number of k-points has quadratic scaling from one k-grid to another (as it does for any 2D material) and the CPU time scales quadratically with the total number of k-points.This latter dependence stems from the size of the e-h basis set and the number of matrix elements of the BSE kernel, i.e., (N K × N c × N b ) 2 .The computational cost of the double grid approach proposed in this work (green-blue diamonds connected by lines) increases only slightly with the size of the fine-grid, when the same coarse grid is used.Since the BSE kernel is calculated only in the coarse-grid, this increase is due to the Haydock solver, which now has to process larger Haydock vectors.Nonetheless, it is apparent that the increased CPU time due to Haydock is minor and far more manageable than the scaling of the full BSE problem.Overall, fine grid has little impact on the CPU time required by the method we propose.In fact, Fig. 4 clearly shows that the computational cost of the double grid method is roughly driven by the coarse grid.

B. Kernel extension to fine grid
The kernel extension is the key approximation used in the double-grid approach.
The DKE in Sec.II was selected for its simplicity and low computational cost.There are other possible kernel extensions.In particular, we also considered a kernel extension with similar characteristics, that we refer to as full kernel extension (FKE).First, let us define the FKE approach formally: FKE implies that each matrix element of the coarse grid BSE kernel is expanded into an all-ones block times the original matrix element, which leads to (see SI for a visual representation of Eq. 16).As a result, the way in which the fine-grid matrix vector multiplication is carried out also differs from DKE.In FKE, this operation is performed as Lines connect double-grid calculations using the same coarse-grid.All calculations were carried out in one processor for comparability purposes.We consider all e − h pairs from the top 3 valence bands to the 5 bottom conduction bands.
In terms of the spectra produced by either kernel extensions, the comparison consistently favours DKE over FKE in all the materials tested in this work.The difference may be less noticeable in systems with weaker excitonic effects.A comparison for the materials in Sec.III is shown in Fig. 5.In the case of Silicon (Fig. 5A), it is apparent that DKE is better than FKE at suppressing the artificial localisation found around 3.6 eV.For GaAs (Fig. 5B), DKE also shows an improvement with respect to FKE when dealing with the artificial localisation at around 3.1 eV.Finally, monolayer MoS 2 (Fig. 5C) also follows the trend found in this work, i.e., DKE is consistently better than FKE.In this case in particular, the difference between both approaches is very stark.In fact, the FKE approach shows little to none improvement with respect to the 12 × 12 × 1 single k-grid as far as the first exciton is concerned (cf.Fig. 3).
In order to explain the better performance of DKE over FKE, we will discuss the properties of the BSE kernel and the two-particle Hamiltonian matrices, which are related by Eq. 6.In general, the kernel matrix elements Ξ nmk n m k are sharply peaked at q = 0 [15,26], i.e., for k = k .This does not mean that every matrix element with q = 0 will have a higher value than the remaining matrix elements.
In fact, that is only true for the diagonal elements Ξ nmk nmk , while the q = 0 elements coupling different sets of bands (Ξ nmk n m k ) are closer in value to all other q = 0 matrix elements.This is again exemplified with monolayer MoS 2 in Fig. 6.The latter shows the module of every matrix element between a given transition (v = 13, c = 14 and k 1 = (−0.166,−0.166, 0)) and every other transition in the e-h space, i.e., one row of the BSE kernel matrix.This data is plotted as a function of the magnitude ||q||/||q|| max sgn(q x ), where q = k − k 1 .Fig. 6A shows the BSE kernel as obtained with a single grid of 6 × 6 × 1 k-points, where we can see that the diagonal matrix element (the selected transition with itself) is an order of magnitude higher than all other matrix elements (many of which also have q = 0).The fine grid of 12 × 12 × 1 k-points better captures the build-up to the peak of the graph as it has many more kpoints around the selected one (see Fig. 6B).Unfortunately, the double grid approach proposed here cannot capture this feature because it is meant not to imply any extra computation or storage of matrix elements at fine grid κ-points.However, the reader should bear in mind that while this feature is missing in our approximated BSE kernel, the benefits of this double grid approach reside in exactly knowing the transition energies at the fine grid κ-points (see SI for detailed discussion).At this point, what we expect from the approximated kernel is not to introduce unphysical matrix elements, and in this regard DKE performs much better than FKE.Fig. 6C shows how the BSE kernel matrix elements approximated by DKE still represent a function of q that is sharply peaked at the origin.Conversely, the FKE approach means that many matrix elements in Dom(k 1 ), and consequently at q = 0, will take the value of the peak.We know that such behaviour as a function of q would not arise should more k-points be included (see 6B).Hence, we believe DKE constitutes a better approximation of the BSE kernel than FKE.Further arguments in favour of the DKE over the FKE are presented in SI.

C. Limitations of the approach
The double-grid approach presented in Sec.II B is based on two approximations: the DKE (Eq.12) and the approximation of the starting Haydock vector (Eq.13).
The DKE has been extensively analysed in Sec.IV B. From the analysis, it emerges that the predominance of the matrix elements with q ≈ 0 is crucial to the success of the approximation.This is consistent with the spatial de-localisation of the exciton over many unit cells.Conversely, when the exciton is localised on few unit cells-as it is the case for instance in wide-gap insulators-the approximation may break down because of the significant contribution to the BSE kernel of matrix elements with q = 0. We verified this is the case, for example, for bulk hexagonal boron nitride (h-BN).The breakdown of the approach for these cases is, however, not critical.In fact, excitons that are localised on few unit cells can be described accurately with a modest k-point sampling and the double-grid is not needed.
The approximation for the starting Haydock vector (Eq.13) implies the assumption that (within the length gauge and dipole approximation) the dipole matrix elements in the fine grid can be approximated by those in the coarse grid, namely, for κ I i ∈ Dom(K I ), where r is the position operator.This assumption can be verified at the level of the independent particle approximation (IPA) by comparing the IPA spectrum obtained with the double-grid approach (which we call Haydock-IP) with the IPA spectrum calculated on the fine grid.In fact, in the independent particle case, Eq. 18 is the only approximation introduced by the double grid.For the systems considered in Sec.III, we verified that indeed the IPA spectra obtained within the double-grid approach agree well with the IPA calculated on the corresponding fine grid (see Fig. 7).It is also interesting to note that this particular approximation is valid for h-BN, which singles out the BSE kernel (q ≈ 0) approximation as the only factor hindering the application of the double-grid method to this material.In particular, GaAs shows a minor discrepancy in the IPA spectra around 2.1 eV (see Fig. 7), a region of the spectrum where k-point convergence is markedly difficult.This is due to the steep dispersion of the conduction band of GaAs around the Gamma point, where the optical gap occurs (see, for example, [24]).As a result, the approximation of the oscillator strengths around Gamma by the corresponding matrix element at Gamma (Eq.18) is a rather poor one, which translates into an unphysical feature around 2.1 eV in the BSE spectrum as well (see Fig. 2).
There are also instances in which the approximation in Eq. 18 breaks down substantially.As an example, Fig. 7 shows this breakdown for the optical absorption of bulk black-phosphorous (BP) along the armchair direction [27].The IPA spectrum obtained within the double-grid approach has strong peaks around 0.3 eV which are not present in the reference calculation.The appearance of this artefact can be understood considering that the dipole matrix elements (Eq.18) are calculated as , where v is the velocity operator.BP has a minimum KS band-gap of about 0.2 eV (0.1 eV at the DFT level) and thus the corresponding dipole matrix element is large.Within the double-grid approach, all fine-grid k-points in the domain of the k-point corresponding to the minimum KS band-gap use the same value which largely overestimates the actual dipole matrix element.Notably, carrying out the calculations in the primitive rather than in the conventional unit cell (see Fig. 7), improves the agreement with the reference IPA fine-grid spectrum, suggesting that in this case the coarse grid does a better job at sampling the Brillouin zone around the k-point corresponding to the minimum KS band-gap.Nevertheless, this Haydock-IP spectrum still presents artificial features between 0.5-1.0eV, preventing the application of the double-grid method presented in this work to BP.
Further to note is that using Eq.18 the extension from the coarse to the full grid is done for the position dipoles, i.e. within the length gauge.This implies that, to ensure gauge invariance [28], nκ i.e. to perform the extension of the velocity matrix elements and accordingly obtain Preliminary results show that such choice may in fact lead to better results for BP.

D. Workflow implementation
Based on this discussion, we can propose a workflow to assess whether a given material satisfies these approximations and could thus be described well by the doublegrid method presented here.Firstly, a full fine grid IP calculation must be converged with respect to k-points.

V. CONCLUSIONS
In this work, we presented a double grid approach to the problem of k-point sampling in the solution of the BSE equation for the calculation of optical spectra of semiconductors.This responds to the fact that very dense k-point grids are required for BSE calculations to be fully converged due to the large periodicity of excitonic wavefunctions, usually reaching several supercells.This sampling requirement is the bottleneck in BSE calculations and, for a wide variety of solids, this imposes a computational burden that renders the calculation prohibitively costly.We tackled this challenge by applying a double grid approach to the computationally cheapest among the BSE solvers, i.e., the Lanczos-based Haydock scheme, thus maximising the size and range of materials for which this method could be useful.Our double grid approach is based on combining a coarse k-grid where both KS eigen-values and eigenvectors are known with a fine k-grid where only KS energies are required, which eases memory and disk storage requirements.With this strategy, the coarse k-grid drives the computational cost while the k-fine grid tries to capture the physics of spread out excitons in an approximated way without requiring significant extra computation.
This scheme was implemented in the Yambo code (see SI for availability) and tested for bulk Si, bulk GaAs and monolayer MoS 2 , all of which are known to require very dense k-point grids to achieve convergence.The results are satisfactory in all cases, reproducing data reported elsewhere with a relatively low computational cost close to that of the coarse grid alone.There is a slight increase in the CPU time required by the Haydock step, however this scales very favourably with increasingly dense k-meshes, at variance with regular non-double grid approaches.Different ways to extend the BSE kernel calculated in the coarse grid to the fine grid are discussed and compared, determining that the so-called diagonal kernel extension is the preferred method.
The approximations introduced with the double-grid approach have been discussed, together with the limits they impose to its validity.On the one hand, the diagonal kernel extension limits the applicability of this approach to systems with excitons delocalised over many unit cells.On the other hand, the latter are precisely the main target of the double-grid approach, given that spatially localised excitons are usually well described by a relatively coarse k-grids.Further, we discussed how the validity of the approximation on the dipole matrix elements can be verified and controlled with inexpensive calculations at the level of the independent particle approximation.
In light of the promising results achieved by the double grid approach presented in this work, considering its simplicity and taking into account its compatibility with the very efficient Lanczos based BSE solution schemes (i.e., Haydock), we hope our work will facilitate the calculation of optical spectra in semiconductors that could not be computationally afforded to date.
ponents than |P , due to each coarse grid transition (at K I ) being replicated into many transitions at all the κ-points in the domain of K I .The recursive relations in Eqs.7-9 would formally require the multiplication of the fine grid (full) BSE kernel times Haydock vectors of the size of |P FG .In our implementation, we calculate the matrix-vector multiplication without storing the BSE kernel on the fine grid.Let us divide a given Haydock vector |V in fragments according to the κ-point of each transition.Formally, this would mean projecting |V over the different transitions (i.e, the abstract vectors |vcκ I i that form the new/extended basis of e-h space) and then

3 FIG. 1 :
FIG.1: Optical absorption spectra of bulk Si.BSE spectra calculated using the double-grid approach described in this work are assessed against the experimental spectrum at 10 K from Ref.[23].Spectra are calculated on an 8 × 8 × 8 coarse k-grid and a denser fine k-grid, indicated by N κ .N K = 8 3 corresponds to a standard calculation on an 8 × 8 × 8 k-grid while N κ = 60 3 corresponds to a double-grid calculation with a 60 × 60 × 60 fine grid.We consider all e − h pairs from the top 4 valence bands to the 4 bottom conduction bands.

3 FIG. 2 :
FIG.2: Optical absorption spectra of bulk GaAs.BSE spectra calculated using the doublegrid approach described in this work are assessed against the experimental spectrum at 22 K from Ref.[24].Spectra are calculated on an 10 × 10 × 10 coarse k-grid and a denser fine k-grid, indicated by N κ .N K = 10 3 corresponds to a standard calculation on an 10 × 10 × 10 k-grid while N κ = 60 3 corresponds to a double-grid calculation with a 60 × 60 × 60 fine grid.We consider all e − h pairs from the top 4 valence bands to the 4 bottom conduction bands.All k-grids are Gamma-centred.

2 FIG. 3 :
FIG.3: Optical absorption spectra of monolayer MoS 2 .BSE spectra calculated using the double-grid approach described in this work.Spectra are calculated on an 12 × 12 × 1 coarse k-grid and a progressively denser fine k-grid, indicated by N κ .N K = 12 2 corresponds to a standard calculation on an 12 × 12 × 1 k-grid while N κ = 60 3 corresponds to a double-grid calculation with a 60 × 60 × 1 fine grid.We consider all e − h pairs from the top 3 valence bands to the 5 bottom conduction bands.All k-grids are Gamma-centred.
points.The full BSE kernel would have (200 × N c × N b ) 2 more matrix elements than the approximated one.Depending on the number of bands required for convergence, the steps of computing and storing that many matrix elements may draw the line between what is feasible and what is not, not only in terms of processing power, but also due to memory and disk-storage limitations.

FIG. 4 :
FIG. 4: CPU time in seconds required to calculate and store the BSE kernel, and solve it via Haydock's iterative scheme as a function of the number of k-points.Brown circles represent full BSE calculations.The data for 48 × 48 × 1 and 60 × 60 × 1 k-grids have been estimated via a quadratic fitting.The diamonds denote double-grid BSE calculations.
FIG. 6: Module of the BSE Kernel matrix element between one transition (vck 1 ) and every other transition in the e-h space (nmk).The data plotted here corresponds to MoS 2 with all the bands required for convergence.Panel A shows the matrix elements considering only a single grid of 6 × 6 × 1 k-points.The DKE and FKE matrix elements are obtained from a 6 × 6 × 1 coarse K-grid and a 12 × 12 × 1 double κ-grid.The fine grid data is simply what DKE and FKE try to approximate, i.e., the kernel matrix elements obtained with one single grid of 12 × 12 × 1 k-points.
Alternatively one could choose the densest k-grid that can be treated at the IP level, where limitations usually reside on memory.Let us take an example where this first step of the procedure returns a 60 × 60 × 60 k-grid.The next stage would be to find an appropriate coarse k-grid, i.e., one that satisfies the approximation in Eq. 18.This would entail running several double-grid calculations at the IPA level (Haydock-IP) with varying coarse grid and with a fine grid of 60×60×60.By matching the Haydock-IP spectrum to the full 60 × 60 × 60 fine-grid IP spectrum, a sound coarse grid can be chosen by means of a fairly inexpensive procedure (e.g., 8 × 8 × 8).The next step in the workflow would necessarily involve BSE calculations as the approximation on the BSE kernel cannot be tested at the IPA level.With the chosen coarse grid, successive double-grid BSE calculations with varying fine grids should be carried out in order to converge the dimensions of the latter.It is worth mentioning that all these calculations have roughly the same computational cost and requirements, at the level of the coarse-grid 8 × 8 × 8 BSE calculation.It should be highlighted that convergence of the fine-grid in the double-grid method does not guarantee the validity of the DKE approximation.At this point, one should turn to available data, either experimental or theoretical, in order to assess the validity of the results on physical grounds.