DiffraGAN: a conditional generative adversarial network for phasing single molecule diffraction data to atomic resolution

Introduction: Proteins that adopt multiple conformations pose significant challenges in structural biology research and pharmaceutical development, as structure determination via single particle cryo-electron microscopy (cryo-EM) is often impeded by data heterogeneity. In this context, the enhanced signal-to-noise ratio of single molecule cryo-electron diffraction (simED) offers a promising alternative. However, a significant challenge in diffraction methods is the loss of phase information, which is crucial for accurate structure determination.
Methods: Here, we present DiffraGAN, a conditional generative adversarial network (cGAN) that estimates the missing phases at high resolution from a combination of single particle high-resolution diffraction data and low-resolution image data.
Results: For simulated datasets, DiffraGAN effectively determines protein structures at atomic resolution from diffraction patterns and noisy low-resolution images.
Discussion: Our findings suggest that combining single particle cryo-electron diffraction with advanced generative modeling, as in DiffraGAN, could revolutionize the way protein structures are determined, offering an alternative and complementary approach to existing methods.


DiffraGAN intra-class variability assessment
To assess the intra-class variability of our generative model, we performed a series of experiments in which multiple outputs were generated from the same but slightly augmented set of inputs. The objective was to determine the degree of variation in the model's output when exposed to identical input conditions.
A randomly selected batch of samples from the test dataset was used to generate several outputs from each sample (Figure S4).
Intra-class variability was assessed with the SSIM metric: for all pairs of intra-class images, the SSIM was higher than 0.98, indicating a high degree of similarity among the generated images (Figure S4).
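The pairwise comparison described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it uses a simplified single-window SSIM computed over the whole image (the standard metric averages over local sliding windows, e.g. as in scikit-image's `structural_similarity`), and the "generated outputs" here are stand-in arrays.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed from global image statistics.
    (The full metric averages SSIM over local sliding windows.)"""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

# Stand-in for several generator outputs produced from the same input:
rng = np.random.default_rng(0)
base = rng.random((64, 64))
outputs = [base + 0.005 * rng.standard_normal((64, 64)) for _ in range(4)]

# SSIM for every unordered pair of outputs (the intra-class comparison)
scores = [global_ssim(a, b)
          for i, a in enumerate(outputs) for b in outputs[i + 1:]]
```

High pairwise scores (close to 1) indicate that the outputs for one input are nearly identical, which is the consistency criterion used in the text.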
These findings suggest that our generative model maintains a high level of consistency in its output. The following are ways in which the underlying structure of this GAN could be altered to increase its generator's accuracy:

More Filters.
Adding more filters to the Conv2D layers of the discriminator and/or the generator increases the number of trainable parameters in the GAN, so the GAN can perform more nuanced computations. The downside of adding more filters is that larger GANs take longer to train. In addition, adding more filters could eventually lead to filter redundancy, though it is not yet clear where that point lies in this case.
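The parameter-count trade-off can be made concrete with a small calculation. The sketch below assumes 4x4 convolution kernels (a common choice in pix2pix-style cGANs; the paper's actual kernel sizes may differ) and shows that widening every layer by a factor of two roughly quadruples the parameter count, because both the input and output channel counts of each interior layer grow.

```python
def conv2d_params(in_channels, out_filters, kernel=4, use_bias=True):
    """Trainable parameters in one Conv2D layer:
    kernel_h * kernel_w * in_channels * filters, plus one bias per filter."""
    weights = kernel * kernel * in_channels * out_filters
    return weights + (out_filters if use_bias else 0)

# Two stacked layers at the original width vs. double the width:
narrow = conv2d_params(64, 128) + conv2d_params(128, 256)
wide = conv2d_params(128, 256) + conv2d_params(256, 512)
ratio = wide / narrow  # roughly 4x more parameters to train
```

This quadratic growth is why adding filters lengthens training disproportionately and can push the network into filter redundancy.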

Label Smoothing.
In a traditional GAN, the discriminator is trained to label 'real' images with ones and generated images with zeros. Changing the values of these labels (e.g., using 0.2 as the 'real' label and 0.8 as the generated label) changes the weighting of the penalties on the generator and discriminator, and can therefore change their training trajectories and the outcomes of training.

Adding/Removing Batch Normalisation and Dropout.
Adding batch normalisation and/or dropout layers to the discriminator, or removing some from the generator, could alter the training trajectory and outcome. These alterations could also end up worsening the GAN's performance. One example is adding a batch normalisation layer after every Conv2D layer in the discriminator: regardless of the number of training epochs, this caused the generator to produce unstable outputs, and it also delayed the GAN's convergence. By contrast, after adding a batch normalisation layer before each LeakyReLU layer in the generator, once the GAN converged, the generator produced more reasonable outputs.
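The Conv2D -> BatchNorm -> LeakyReLU ordering described for the generator can be sketched as below. This is a simplified stand-in: it normalises per feature over a flattened batch with no learned scale/shift parameters, whereas a real Conv2D batch-normalisation layer normalises per channel and learns gamma and beta; the mock "conv output" array is purely illustrative.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Normalize activations across the batch dimension, per feature."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def leaky_relu(x, alpha=0.2):
    """LeakyReLU: pass positives through, scale negatives by alpha."""
    return np.where(x > 0, x, alpha * x)

# Generator-style ordering: (mock) conv output -> BatchNorm -> LeakyReLU
rng = np.random.default_rng(1)
conv_out = 5.0 + 3.0 * rng.standard_normal((8, 16))  # off-center activations
h = leaky_relu(batch_norm(conv_out))
```

Placing the normalisation before the activation recentres the conv output, so the LeakyReLU actually sees both signs of activation instead of saturating on one side.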

Changing The Size of The Discriminator Output.
As discussed in the methods section, in traditional GANs the output of the discriminator is 1x1 (one 'decision' per input image), whereas the discriminator structure used here produces an n-pixel patch of decisions. Increasing the size of the discriminator's output shape has been observed to reduce blurring, at the expense of increasing the frequency of 'hallucinated' objects appearing in images generated by the trained generator. While the discriminator is not guaranteed to behave this way when its output shape is varied, as before, the change could alter the training trajectory and outcome.
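How the number of downsampling layers sets the discriminator's output size can be sketched with the standard convolution output-size formula. The kernel size (4) and padding (1) below are assumptions borrowed from common pix2pix-style patch discriminators, not necessarily the paper's exact architecture.

```python
def patch_output_size(input_size, layer_strides, kernel=4, pad=1):
    """Spatial size of a patch discriminator's output map, applying
    size = (size + 2*pad - kernel) // stride + 1 for each conv layer."""
    size = input_size
    for stride in layer_strides:
        size = (size + 2 * pad - kernel) // stride + 1
    return size

# Enough stride-2 layers collapse a 256-pixel input to a single 1x1 decision:
whole_image = patch_output_size(256, [2] * 8)
# A shallower pix2pix-style stack instead yields a 30x30 map of decisions,
# each judging a local patch of the input:
patch_map = patch_output_size(256, [2, 2, 2, 1, 1])
```

Removing (or adding) downsampling layers is thus the knob that trades one global decision per image for many local, patch-level decisions.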

Figure S1. Validation of DiffraGAN using diffraction and image data from the second test protein. Top row: input high-resolution diffraction patterns. Second row: input low-resolution, defocused images. Third row: images generated by the generator from these inputs. Bottom row: ground-truth target images. The axes show pixel number.

Figure S2. Validation of DiffraGAN using diffraction and image data from the third test protein. Top row: input high-resolution diffraction patterns. Second row: input low-resolution, defocused images. Third row: images generated by the generator from these inputs. Bottom row: ground-truth target images. The axes show pixel number.

Figure S4. The plot shows the sets of generated images for each input sample in columns, illustrating the subtle variations and the overall consistency of the model's output.

Figure S5. Validation of DiffraGAN using diffraction and image data from the first test protein with a defocus value of 1500 Å. Top row: input high-resolution diffraction patterns. Second row: input low-resolution, defocused images. Third row: images generated by the generator from these inputs. Bottom row: ground-truth target images. The axes show pixel number.

Figure S6. Average FRC calculated from DiffraGAN-generated images and ground-truth high-resolution projection images of the first test protein (PDB ID: 1AUP). The low-resolution images were generated with 1500 Å defocus.

Figure S7. Average FRC calculated from DiffraGAN-generated images and ground-truth high-resolution projection images of the first test protein (PDB ID: 1AUP). The low-resolution images were generated with 2000 Å defocus.