Deep-Learning Computational Holography: A Review (Invited)

Deep learning has been developing rapidly, and many holographic applications have been investigated using deep learning. They have shown that deep learning can outperform previous physically-based calculations using lightwave simulation and signal processing. This review focuses on computational holography, including computer-generated holograms, holographic displays, and digital holography, using deep learning. We also discuss our personal views on the promise, limitations and future potential of deep learning in computational holography.


INTRODUCTION
Holography (Gabor, 1948) can record three-dimensional (3D) information of light waves on a two-dimensional (2D) hologram and reproduce the 3D information from that hologram. Computer-generated holograms and holographic 3D measurements (digital holography) can be realized by simulating this physical process on a computer. Computer-generated holograms are generated by calculating the propagation (diffraction) of light waves emitted from 3D objects. If such a hologram is displayed on a spatial light modulator (SLM), the 3D image can be reproduced in space. Holographic displays can successfully reproduce the wavefront of 3D objects, making them ideal 3D displays (Hilaire et al., 1990; Takaki and Okada, 2009; Chang et al., 2020).
In contrast, digital holography (Goodman and Lawrence, 1967;Kim, 2010;Liu et al., 2018;Tahara et al., 2018) is a technique that uses an image sensor to capture a hologram of real macroscale objects or cells.Diffraction calculations are used to obtain numerically reproduced images from the hologram.Digital holography has been the subject of much research in 3D sensing and microscopy.In addition to coherent light, the technique of capturing holograms with incoherent light has been actively studied in recent years (Liu et al., 2018;Rosen et al., 2019).
Computational holography is the general term for handling holography on a computer. It has been widely used in 3D display, projection, measurement, optical cryptography, and memory. The following common problems of computational holography need to be addressed:
(1) A high computational complexity for hologram and diffraction calculations.
(2) A limited image quality of the reproduced images from holograms, due to speckle noise, optical aberrations, etc.
(3) A large amount of data required to store holograms.
The computational complexity of hologram calculations increases with the complexity of 3D objects and the resolution of a hologram. Digital holography requires diffraction calculations to obtain the complex amplitude of object light, followed by aberration correction of the optical system and, if necessary, phase unwrapping. Additionally, autofocusing using an object position prediction may be necessary. These are time-consuming calculations.
The quality of the reproduced images from a hologram is also a critical issue in holographic displays and digital holography. The following factors degrade reproduced images: high-order diffracted light due to the pixel structure of SLMs, quantized and non-linear light modulation of SLMs, alignment accuracy, and aberration of optical systems.
The amount of data in holograms is also a major problem. Data compression is essential for real-time hologram transmission and wide-viewing-angle holographic displays, which require holograms with a large spatial bandwidth product (Blinder et al., 2019). Hologram compression using existing data compression methods, such as JPEG and JPEG2000, and original compression methods for holograms have been investigated (Blinder et al., 2014; Birnbaum et al., 2019; Stepien et al., 2020), and recently the JPEG committee (ISO/IEC JTC 1/SC 29/WG 1) initiated the standardization of compression technology for holographic data.
Many studies have developed algorithms based on the physical phenomena of holography (diffraction and interference of light) and signal processing. In this paper, we refer to these algorithms as physically-based calculations. In 2012, AlexNet (Krizhevsky et al., 2012), which uses deep neural networks (DNNs), achieved an improvement of more than 10% over conventional methods in the ImageNet large-scale visual recognition challenge, a competition on object recognition accuracy. This led to a great deal of interest in deep learning (LeCun et al., 2015). In 2017, research using deep learning started increasing in computational holography. Initially, simple problems, such as hologram identification and restoration of holographic reproduced images, were investigated using deep learning (Shimobaba et al., 2017a; Shimobaba et al., 2017b; Jo et al., 2017; Muramatsu et al., 2017; Pitkäaho et al., 2017). Currently, more complex deep-learning-based algorithms have been developed, and many results have been reported that outperform physically-based calculations.
This review presents an overview of deep-learning-based computer-generated hologram and digital holography.In addition, we outline diffractive neural networks, which are closely related to holography.It is worth noting that deep learning outperforms conventional physically-based calculations in terms of computational speed and image quality in several holographic applications.Additionally, deep learning has led to the development of techniques for interconverting images captured by digital holographic and other microscopes, blurring the boundaries between research areas.Furthermore, we will discuss our personal views on the relationship between physically-based calculations and deep learning in the future.
Figure 1 shows the data processing pipeline of holographic displays.From 3D data, acquired using computer graphics and 3D cameras, the distribution of light waves on a hologram is calculated using diffraction theory.The generated hologram is usually complex-valued data (complex holograms); however, SLMs can only modulate amplitude or phase.Therefore, we must encode the complex hologram into amplitude or phase-only holograms.The encoded hologram can be displayed on the SLM and the 3D image can be observed through the optical system.

Supervised Learning
In 1998, hologram generation using a neural network with three fully-connected layers was investigated (Yamauchi et al., 1998).However, this is not deep learning, but it is similar to current deep-learning-based hologram calculations.To the best of our knowledge, this is the pioneering work using neural networks for hologram computation.It performed end-to-end learning to train the neural network using a dataset consisting of 16 × 16-pixel input images and holograms.The end-to-end learning method is a supervised learning technique and allows a DNN to learn physical processes used in physically-based calculations from a dataset alone.This study showed that the neural network could optimize holograms faster than direct binary search (Seldowitz et al., 1987).It was impossible to adopt the current deep network structure due to poor computing resources.Additionally, even if DNN could be created, there was no algorithm (optimizer) to optimize its large number of parameters.For a while, neural networks were not the mainstream in hologram calculation, and physically-based calculations were actively studied.However, since 2018, hologram calculations have developed rapidly using deep learning.
Figure 2 shows DNN-based hologram computation using supervised learning. Horisaki et al. (2018) designed a DNN that directly infers holograms from input 2D images using end-to-end learning. End-to-end learning requires a large dataset of input images X and their holograms Y. A DNN can represent an arbitrary function by combining convolutional and other layers with nonlinear activation functions. In this paper, a DNN is represented as a function $\mathcal{N}(X;\Theta)$, where Θ denotes the network parameters. The parameters Θ of the DNN in Horisaki et al. (2018) are updated by solving the minimization problem $\underset{\Theta}{\mathrm{minimize}}\ \mathcal{L}(\mathcal{N}(X;\Theta), Y)$ (1), where $\mathcal{L}$ is the loss function for calculating the error between the predicted hologram output from the DNN, $\mathcal{N}(X;\Theta)$, and the ground-truth hologram Y. This DNN can infer a hologram from a 64 × 64-pixel 2D image several times faster, and the image quality is the same as that obtained with the Gerchberg-Saxton (GS) algorithm (Gerchberg, 1972; Fienup, 1982).
Goi et al. (2020) proposed a method for generating binary holograms directly from 2D images using a DNN. This study prepared a dataset of binary random patterns (binary holograms) and their reproduced images (original objects). The DNN was trained using end-to-end learning with the reproduced images as the input of the DNN and the binary holograms as the output. The output layer of the DNN should be a step function, since it must output binary values; however, a step function is not differentiable. Goi et al. (2020) therefore used a differentiable activation function that approximates the step function.

Unsupervised Training
Unsupervised learning does not require the dataset of original images and their holograms that was needed in Section 2.1. Figure 3 shows DNN-based hologram calculation using unsupervised learning (Hossein Eybposh et al., 2020; Horisaki et al., 2021; Wu et al., 2021). We input the original 3D scene (or 2D image) X into the DNN and perform an inverse diffraction calculation ($\mathcal{P}^{-1}$) from the predicted hologram to the location of the original object to obtain the reproduced image. We calculate a loss function between the reproduced image and the original data. Then, we update the DNN parameters by solving the following minimization problem: $\underset{\Theta}{\mathrm{minimize}}\ \mathcal{L}(\mathcal{P}^{-1}(\mathcal{N}(X;\Theta)), X)$ (2). We can use any diffraction calculation for the propagation calculation, provided that it is differentiable. We usually use the angular spectrum method (Goodman and Goodman, 2005). The lightwave distribution on a plane $u_d$, which is a distance z away from a plane $u_s$, can be calculated using the angular spectrum method as follows: $u_d = \mathcal{F}^{-1}\left[\mathcal{F}[u_s]\exp\left(2\pi i z\sqrt{1/\lambda^2 - f_x^2 - f_y^2}\right)\right]$ (3), where $i=\sqrt{-1}$; $\mathcal{F}$ and $\mathcal{F}^{-1}$ are the forward and inverse Fourier transforms, respectively; λ is the wavelength; and $(f_x, f_y)$ represent the spatial frequencies.
Wu et al. (2021) showed that a hologram of a 4K 2D image could be generated in 0.15 s using unsupervised learning. The network structure of the DNN used is U-Net (Ronneberger et al., 2015). Instead of the angular spectrum method, the inverse diffraction calculation for obtaining the reproduced images was a single fast Fourier transform (FFT) Fresnel diffraction, which is computationally light. The DNN was trained using Eq. 2 with a weighted combination of a negative Pearson correlation coefficient and a perceptual loss function (Johnson et al., 2016). The DNN method is superior to the GS method and Wirtinger holography (Chakravarthula et al., 2019) in computational speed, i.e., 100 times faster for the same reconstruction quality (Wu et al., 2021).
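To make the propagation step concrete, the following is a minimal NumPy sketch of the angular spectrum method described above. It is not the implementation used in any of the cited studies. The function name `angular_spectrum`, the parameter `pitch` (sampling pitch), and the numerical values are our assumptions for illustration. The sketch applies no zero padding or band limiting, so the propagation is a circular convolution, and evanescent components are simply discarded. The inverse diffraction calculation is obtained by propagating over −z.

```python
import numpy as np

def angular_spectrum(u_s, z, wavelength, pitch):
    """Propagate the complex field u_s over a distance z (all lengths in metres)."""
    ny, nx = u_s.shape
    fx = np.fft.fftfreq(nx, d=pitch)  # spatial frequencies along x [1/m]
    fy = np.fft.fftfreq(ny, d=pitch)  # spatial frequencies along y [1/m]
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    # Assumption: evanescent components (arg <= 0) are discarded.
    propagating = arg > 0
    kz = np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(2j * np.pi * z * kz), 0.0)
    return np.fft.ifft2(np.fft.fft2(u_s) * transfer)

# Hypothetical example: a random phase-only hologram, 8 um pitch, 532 nm light.
rng = np.random.default_rng(0)
hologram = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (64, 64)))
image = angular_spectrum(hologram, z=0.05, wavelength=532e-9, pitch=8e-6)
# Propagating over -z plays the role of the inverse diffraction calculation.
restored = angular_spectrum(image, z=-0.05, wavelength=532e-9, pitch=8e-6)
```

Because every step is an FFT or an element-wise product, the same computation can be written in an automatic differentiation framework, which is the property that unsupervised training relies on.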
Hossein Eybposh et al. (2020) developed an unsupervised method called DeepCGH to generate holograms of 3D scenes using a DNN. They developed this method for two-photon holographic photostimulation, but it can also be used for holographic displays. The network structure is U-Net. When 3D volume data X(x, y, z) representing a 3D scene are input to the DNN, the DNN outputs its hologram. From the output hologram, multiple inverse propagations ($\mathcal{P}^{-1}$) are performed to compute the 3D reproduced image $X'(x,y,z) = |\mathcal{P}^{-1}(\mathcal{N}(X(x,y,z)))|$. The DNN was trained by Eq. 2 with a loss function using the following cosine similarity: $\frac{\sum_{x,y,z} X(x,y,z)\,X'(x,y,z)}{\sqrt{\sum_{x,y,z} X^2(x,y,z)}\,\sqrt{\sum_{x,y,z} X'^2(x,y,z)}}$.
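As an illustration of the similarity measure, the following NumPy sketch computes the cosine similarity between a target volume and a reproduced volume. The function name and the small constant `eps` are our assumptions. How the similarity is turned into a loss (for example, one minus the similarity) is also an assumption here and is not taken from the cited study.

```python
import numpy as np

def cosine_similarity(target, reproduced, eps=1e-12):
    """Cosine similarity between two real, non-negative volumes of equal shape."""
    numerator = np.sum(target * reproduced)
    denominator = np.sqrt(np.sum(target**2)) * np.sqrt(np.sum(reproduced**2))
    return numerator / (denominator + eps)  # eps guards against division by zero

# Hypothetical 3D volume with 4 depth planes of 8 x 8 pixels.
rng = np.random.default_rng(0)
volume = rng.random((4, 8, 8))
similarity = cosine_similarity(volume, 3.0 * volume)  # global scaling does not matter
loss = 1.0 - similarity                               # assumed loss form
```

The measure is insensitive to a global intensity scale, which is convenient because the absolute brightness of a reproduced image is usually arbitrary.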
Since 3D volume data require much memory, DNNs tend to be large. Therefore, Hossein Eybposh et al. (2020) used a method called interleaving (Shi et al., 2016) to reduce the DNN size.
By employing the method of Figure 3, Horisaki et al. (2021) trained a U-Net-based DNN using the 3D mean squared error (MSE) as the loss function. The hologram computation using DNN (Wu et al., 2021) introduced in this subsection was shown to produce higher-quality reproductions than conventional methods. However, the reproduced images were limited to two dimensions. Methods for calculating holograms of 3D objects using DNNs (Hossein Eybposh et al., 2020; Horisaki et al., 2021) were also proposed, but the number of layers was limited to a few due to the resources of the computer hardware. The method introduced in the next subsection overcomes these limitations.

Layer Hologram Calculation Using the Deep Neural Network
Generally, layer-based hologram calculations (Okada et al., 2013; Chen et al., 2014; Chen and Chu, 2015; Zhao et al., 2015) generate sectional images at each depth from RGB and depth images. Diffraction calculations are applied to the sectional images, and the results are combined to obtain the final hologram. Although the diffraction calculation can be accelerated using FFTs, the computational complexity of the layer method is still large, making it difficult to calculate 2K-size holograms at video rate.
Layer-based hologram calculations using DNN have been investigated in Hossein Eybposh et al. (2020) and Horisaki et al. (2021). The study by Shi et al. (2021), published in Nature, had a great impact on holographic displays using the layer method. Figure 4 shows the outline of layer-based hologram calculations using DNN. Its results significantly outperform existing physically-based layer methods in computational speed and image quality. The network structure was similar to that of ResNet (He et al., 2016). Additionally, the DNN was trained using two types of label data: RGBD images and their holograms. Since DNNs are well suited to 2D images, they work well with the RGBD images used in layer hologram calculations.
This DNN was trained using two loss functions.The first loss function, L 1 , calculates the error between the hologram output from the DNN and the ground-truth hologram.The second loss function, L 2 , calculates the error between a reproduced image, obtained by an inverse diffraction calculation (P −1 ) with the propagation distance z from the predicted hologram, and its corresponding sectional image at z. Here, the hologram output from the DNN is in complex amplitude at an intermediate position between the 3D scene and final hologram.The study Shi et al. (2021) explained the reason for using intermediate holograms as follows: The convolutional layers of DNN use a 3 × 3 filter.If a 3D scene and hologram are far apart, it is impossible to represent the spread light waves without connecting many convolution layers, making the DNN very large.The DNN outputs a complex hologram at an intermediate position to alleviate the above problem.In the middle position, the light wave does not spread; thus, reducing the number of convolution layers.
Additionally, if the 3D scene and intermediate hologram are sufficiently close, these images will be similar, facilitating the DNN training.The intermediate hologram is propagated to the final hologram plane using the angular spectrum method and converted to an anti-aliased double phase hologram (Hsueh and Sawchuk, 1978;Shi et al., 2021).By displaying the anti-aliased double phase hologram on a phase-only SLM, speckle-free, natural, and high-resolution 3D images can be observed at video rates.
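The principle of double-phase encoding can be sketched in a few lines of NumPy. A complex value $a e^{i\varphi}$ with $0 \le a \le 1$ equals the average of two unit-amplitude phasors with phases $\varphi \pm \arccos a$, and the two phase maps are interleaved in a checkerboard pattern on the phase-only SLM. The sketch below shows only this basic encoding. The anti-aliasing step of Shi et al. (2021) is not reproduced, and the function name, the normalization by the maximum amplitude, and the checkerboard layout are our assumptions.

```python
import numpy as np

def double_phase_encode(field):
    """Encode a complex field as a phase-only hologram (basic double-phase method)."""
    amplitude = np.abs(field)
    amplitude = np.clip(amplitude / amplitude.max(), 0.0, 1.0)  # arccos needs [0, 1]
    phase = np.angle(field)
    offset = np.arccos(amplitude)
    phase_a = phase + offset
    phase_b = phase - offset
    # Interleave the two phase maps in a checkerboard pattern.
    yy, xx = np.indices(field.shape)
    checker = (yy + xx) % 2 == 0
    hologram = np.where(checker, phase_a, phase_b)
    return hologram, phase_a, phase_b

# Hypothetical complex hologram.
rng = np.random.default_rng(0)
field = rng.normal(size=(16, 16)) + 1j * rng.normal(size=(16, 16))
hologram, phase_a, phase_b = double_phase_encode(field)
# The average of the two phasors reproduces the normalized complex field.
decoded = 0.5 * (np.exp(1j * phase_a) + np.exp(1j * phase_b))
```

In an optical system the averaging is performed approximately by the limited bandwidth of the imaging optics, which is why neighbouring pixels carry the two phases.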
The study trained the DNN using their RGBD image dataset called MIT-CGH-4K. This dataset consists of 4,000 sets of RGBD images and intermediate holograms. It allows DNNs to work well with RGBD images rendered by computer graphics and real RGBD images captured by RGBD cameras. In many DNN-based color 3D reproductions, including this study, the time-division method (Shimobaba and Ito, 2003; Oikawa et al., 2011) is employed. The time-division method enables color reproduction by displaying the holograms of the three primary colors synchronously with the RGB illumination light. However, it requires an SLM capable of high-speed switching.
The trained DNN can generate 1,920 × 1,080-pixel holograms at a rate of 60 Hz using a graphics processing unit. It can also generate holograms interactively at 1.1 Hz on a mobile device (iPhone 11 Pro) and at 2.0 Hz on an edge device with a Google tensor processing unit (TPU). For the TPU, a float32-precision DNN was compressed into an Int8-precision DNN using quantization, which is one of the model compression methods for DNNs.

Camera-in-the-Loop Holography
The quality of reproduced images of holographic displays will be degraded because of the following factors: misalignment of optical components (beam splitters and lenses), SLM cover glass, aberrations of optical components, uneven light distribution of a light source on the SLM, and quantized and non-linear light modulation of SLM, as shown in the graph of Figure 5.
The GS algorithm, Wirtinger holography, and stochastic gradient methods (Chakravarthula et al., 2019) determine a hologram that yields the desired reproduced image by solving $\underset{\phi}{\mathrm{minimize}}\ \mathcal{L}(\mathcal{P}_{\mathrm{ideal}}(\phi), a_o)$. Here, ϕ, $a_o$, and $\mathcal{L}$ represent the hologram, the target image, and the loss function (defined as the error between the target and reproduced images), respectively. Successful optimization with these methods will be achieved when the actual optical system and the ideal light wave propagation model $\mathcal{P}_{\mathrm{ideal}}$ are consistent.
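For reference, the GS algorithm mentioned above can be sketched as follows. This is a minimal version for a Fourier hologram, in which the ideal propagation model is assumed to be a single unitary FFT. The cited works may use other propagation models, and the function name, iteration count, and target pattern are our assumptions.

```python
import numpy as np

def gerchberg_saxton(target_amp, iterations=30, seed=0):
    """Find a phase-only Fourier hologram whose reproduced amplitude approaches target_amp."""
    rng = np.random.default_rng(seed)
    # Scale the target so that its energy matches that of a unit-amplitude hologram.
    target = target_amp * np.sqrt(target_amp.size / np.sum(target_amp**2))
    phi = rng.uniform(0.0, 2.0 * np.pi, target.shape)  # random initial hologram phase
    errors = []
    for _ in range(iterations):
        image = np.fft.fft2(np.exp(1j * phi), norm="ortho")   # hologram -> image plane
        errors.append(float(np.sum((np.abs(image) - target) ** 2)))
        image = target * np.exp(1j * np.angle(image))         # impose the target amplitude
        field = np.fft.ifft2(image, norm="ortho")             # image -> hologram plane
        phi = np.angle(field)                                 # keep the phase only
    return phi, errors

# Hypothetical target: a bright square on a dark background.
target = np.zeros((32, 32))
target[8:24, 8:24] = 1.0
phi, errors = gerchberg_saxton(target)
```

The list `errors` records the loss before each update and does not increase from one iteration to the next, which is the well-known error-reduction property of this type of algorithm.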
Although some studies have been conducted to manually correct aberrations to get closer to the ideal propagation model $\mathcal{P}_{\mathrm{ideal}}$, the camera-in-the-loop holography (Peng et al., 2020) has been proposed to automatically correct these image quality degrading factors. Figure 5 shows the outline of the camera-in-the-loop holography. The camera-in-the-loop holography differs from the GS algorithm, Wirtinger holography, and gradient descent methods because it uses actual reproduced images in the optimization loop.
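In the camera-in-the-loop holography, a gradient descent method was used to find an ideal hologram as $\phi \leftarrow \phi - \alpha\,\partial\mathcal{L}/\partial\phi$, where α is the learning rate, $\mathcal{L}$ is the loss function used to calculate the error between an actual reproduced image captured by a camera and the target image, and $\frac{\partial\mathcal{L}}{\partial\phi} = \frac{\partial\mathcal{L}}{\partial\mathcal{P}}\frac{\partial\mathcal{P}}{\partial\phi}$, where $\mathcal{P}$ represents the actual optical system, including unknown aberrations. However, the gradient $\partial\mathcal{P}/\partial\phi$ cannot be calculated because $\mathcal{P}$ contains unknown parameters. The camera-in-the-loop holography approximates the unknown gradient as $\frac{\partial\mathcal{P}}{\partial\phi} \approx \frac{\partial\mathcal{P}'}{\partial\phi}$, where $\mathcal{P}'$ is a known propagation model. For example, if $\mathcal{P}'$ is a free-space propagation between the SLM and the reproduced image, a diffraction calculation can simply be used, i.e., $\mathcal{P}' = \mathcal{P}_{\mathrm{ideal}}$. The gradient $\partial\mathcal{L}/\partial\mathcal{P}$ can be calculated using reproduced images captured by a camera.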
The following studies extend the camera-in-the-loop holography: a high-quality holographic display using partially coherent light (LED light source) (Peng et al., 2021), a holographic display using a Michelson setup to eliminate undiffracted light of the SLM (Choi et al., 2021), optimization of binary phase holograms (Kadis et al., 2021), a holographic display that suppresses high-order diffracted light using only computational processing without any physical filters (Gopakumar et al., 2021), and further improvement of image quality by using a Gaussian filter to remove noise that is difficult to optimize (Chen et al., 2022).
The above camera-in-the-loop holography needs to be reoptimized for each target image, which can take several minutes. To solve this problem, HoloNet, a combination of camera-in-the-loop holography and DNN, was proposed (Peng et al., 2020). Figure 6 shows a schematic of HoloNet. HoloNet consists of two DNNs and a physically-based calculation (diffraction calculation). The camera is required for the training stage of the DNN; however, it is not required for the inference stage. DNN1 outputs the optimal phase distribution of the target image. The phase distribution and target image are combined to form a complex amplitude. Then, a Zernike-compensated diffraction calculation is performed by considering the aberrations of the optical system. DNN2 transforms the complex amplitude obtained by the diffraction calculation into a phase-only hologram suitable for the SLM. HoloNet can generate full-color holograms with 2K resolution at 40 frames per second.
Chakravarthula et al. (2020) proposed an aberration approximator. The aberration approximator uses a U-Net-based DNN. The DNN infers the aberrations of an optical system to obtain holograms that are corrected for the aberrations. The conditional GAN (Isola et al., 2017) was used to train the DNN, and the training datasets were numerically reproduced images of holograms generated assuming an ideal optical system and reproduced images from the actual optical system captured by a camera.
The methods described above (Peng et al., 2020; Chakravarthula et al., 2020) involve complex processing. Kavaklı et al. (2022) obtained an optimized point spread function for diffraction calculation from the error between numerically reproduced images from holograms calculated with the ideal diffraction calculation and the actual reproduced images captured by a camera. It is worth noting that the optimized point spread function has an asymmetric distribution, unlike the point spread function in the ideal case. Additionally, the optimized point spread function reflects the aberrations of the optical system. We can obtain holograms that give an ideal reproduced image by calculating holograms with the optimized point spread function.

Image Quality Enhancement
A reproduced image of a hologram calculated using random phase will have speckle noise.Park and Park (2020) proposed a method for removing speckle noise from random phase holograms.In this method, the reproduced image (light-field data) is first numerically computed from a random phase hologram.Since the reproduced light-field data also contains speckle noise, this method employs a denoising convolutional neural network (Zhang et al., 2017a) to remove this noise.Furthermore, a speckle-free reproduction image can be observed by recalculating the hologram from the speckle-free light-field data.Ishii et al. (2022) proposed the image quality enhancement of zoomable holographic projections using DNNs.To obtain a reproduced image larger than the hologram size, it is necessary to use a random phase; however, this gives rise to speckle noise.The random phase-free method (Shimobaba and Ito, 2015), which applies virtual spherical waves to the original image and calculates the hologram using a scaled diffraction calculation (Shimobaba et al., 2013), can avoid this problem.However, it does not apply well to phase-only holograms.A DNN of (Ishii et al., 2022) converts a phase-only hologram computed using the random phase-free method in an optimized phase-only hologram.Two layers for computing the forward and inverse scaled diffraction (Shimobaba et al., 2013) are introduced before and after DNN.Then, the DNN is trained using unsupervised learning, as discussed in Section 2.2.In the inference, the two layers are removed, and a phase-only hologram is computed using the random phase-free method and a scaled diffraction calculation is input to the DNN to optimize a zoomable phaseonly hologram.

Hologram Compression
The amount of data in holograms is a major problem. Data compression is essential for real-time hologram transmission and wide-viewing-angle holographic displays, which require holograms with large spatial bandwidth products. Existing compression techniques [e.g., JPEG, JPEG 2000, and high-efficiency video coding (HEVC)] have been applied, and dedicated compression techniques that take the distinctive signal properties of digital holograms into account have been proposed (Blinder et al., 2014; Birnbaum et al., 2019; Stepien et al., 2020). Compression of hologram data is not easy because holograms have different statistical properties from general natural images, so standard image and video codecs achieve sub-optimal performance. Several DNN-based hologram data compression algorithms have been proposed to address this matter.
When JPEG or other compression algorithms targeted at natural image data are used for hologram compression, essential high-frequency components are lost, and block artefacts disturb the hologram viewing. In Jiao et al. (2018), a simple DNN with three convolution layers was used to restore the JPEG-compressed hologram close to the original one. The DNN learns the relationship between the JPEG-degraded hologram and the original hologram using end-to-end learning. Although it was tested on JPEG, it can easily be applied to other compression methods, making it highly versatile.
In Shimobaba et al. (2019a) and Shimobaba et al. (2021a), holograms were compressed through binarization using the error diffusion method (Floyd, 1976).The U-Net-based DNN restored binary holograms to the original grayscale.If the input hologram is 8 bits, the data compression ratio is 1/8.DNN can obtain better reproduction images than JPEG, JPEG2000, and HEVC at the same bit rate.
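The binarization step can be illustrated with the following sketch of error diffusion. We assume the standard Floyd-Steinberg weights (7/16, 3/16, 5/16, 1/16) and a grayscale hologram normalized to [0, 1]. The exact kernel and scan order used in the cited studies are not stated here, and the function name is hypothetical.

```python
import numpy as np

def error_diffusion_binarize(hologram):
    """Binarize a grayscale hologram in [0, 1] with Floyd-Steinberg error diffusion."""
    work = hologram.astype(float).copy()
    rows, cols = work.shape
    binary = np.zeros((rows, cols), dtype=np.uint8)
    for y in range(rows):
        for x in range(cols):
            old = work[y, x]
            new = 1.0 if old >= 0.5 else 0.0
            binary[y, x] = int(new)
            err = old - new
            # Distribute the quantization error to unprocessed neighbours.
            if x + 1 < cols:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < rows:
                if x > 0:
                    work[y + 1, x - 1] += err * 3 / 16
                work[y + 1, x] += err * 5 / 16
                if x + 1 < cols:
                    work[y + 1, x + 1] += err * 1 / 16
    return binary

# Hypothetical grayscale pattern: a horizontal ramp from 0 to 1.
gray = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))
binary = error_diffusion_binarize(gray)
```

Each pixel is stored with 1 bit instead of 8 bits, which gives the compression ratio of 1/8 mentioned above, while the local average of the binary pattern stays close to that of the grayscale input.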

Hiding of Information in Holograms
Steganography is a technique used to hide secret images in a host image (also called cover image).The hidden images must not be known to others.A closely related technique is watermarking: it embeds copyright information (e.g., a copyright image) in the host image.The copyright information can be known by others, but it must be impossible to remove.These techniques are collectively referred to as information hiding.Many holographic information hiding techniques have been proposed (Jiao et al., 2019).For example, the hologram of a host image can be superimposed on that of a hidden image to embed hidden information (Kishk and Javidi, 2003).The hidden information should be encrypted with double random phase encryption (Refregier and Javidi, 1995) to prevent it from being read.An important difference with digital information hiding is that holographic information hiding allows for optical encryption and decryption of the hidden image, and handling 3D host and hidden information.
The combination of holographic information hiding and DNN can improve the resistance to attacks and the quality of decoded images.In Wang et al. (2021), the holograms of host and hidden images were superimposed on a single hologram using a complementary mask image.Each hologram was converted into a phase-only hologram by patterned-phase-only holograms (Tsang et al., 2017).The hologram of the hidden image is encrypted with double random phase encryption (Refregier and Javidi, 1995).When the final hologram is reconstructed, we can observe only the host image.Since the mask image is the key, we can observe the hidden image when the mask image is multiplied with the hologram, but the image quality is considerably degraded.This degradation is recovered using a DNN; DenseNet (Huang et al., 2017) was used as the DNN.It is trained by end-to-end learning using the dataset of degraded and ground-truth hidden images.
In Shimobaba et al. (2021b), a final hologram u, which records a hologram $u_h$ of a host image and a hologram $u_e$ of a hidden image, was calculated as $u = \mathcal{P}_{z_1}\{u_h\} + \alpha\mathcal{P}_{z_2}\{u_e\}$. Here, $\mathcal{P}_z$ is the diffraction calculation for the propagation distance z; $z_1$ and $z_2$ are the distances between the hologram and each image; and α is the embedding strength of the hidden hologram. We can make the reproduced hidden image less noticeable by making α sufficiently small. Here, it was set to 4% of the amplitude of the host hologram. At this value, the reproduced hidden image is not easy to identify. Therefore, a DNN is used to recover the hidden image when it needs to be identified. The DNN was trained using reproduced hidden images and ground-truth hidden images. U-Net and ResNet were used as the network structures. Both networks could recover the hidden images.

DIGITAL HOLOGRAPHY USING DEEP LEARNING
In digital holography (Goodman and Lawrence, 1967; Kim, 2010; Liu et al., 2018; Tahara et al., 2018), image sensors are used to capture holograms of real macroscale objects and cells. A reproduced image can be obtained from the hologram using a diffraction calculation. Digital holography has been the subject of much research in 3D sensing and microscopy. Figure 7 shows the process of digital holography. We perform a diffraction calculation on a hologram captured by an image sensor to obtain a reproduced image in a computer. If the reconstructed position of the target object needs to be known accurately, autofocusing is required to find the focus position by repeating diffraction calculations. Autofocusing looks for a position where the reconstructed image is sharp. Aberrations caused by optical components and alignment errors are superimposed on the reproduced image and must be corrected. Since digital holography can obtain complex amplitudes, simultaneous measurement of amplitude and phase is possible. The phase can be obtained by calculating the argument of a complex value using the arctangent function, but its value range is wrapped into [−π, +π). Therefore, phase unwrapping is required to reproduce the thickness of an object from its phase. However, the above processes are time-consuming computations. In this section, we introduce digital holography using DNNs. We can speed up some (or all) of the time-consuming processing using DNNs. Furthermore, DNNs have successfully obtained reproduced images with better image quality than conventional methods. For a more comprehensive and detailed description of digital holography using DNNs, see the review papers by Rivenson et al. (2019), Javidi et al. (2021), and Zeng et al. (2021).

Depth Estimation
A general method for estimating the focus position is to obtain the most focused position by calculating reproduced images at different depths from the hologram.The focus position is determined using metrics, such as entropy, variance, and Tamura coefficient (Zhang et al., 2017b).This process requires an iterative diffraction calculation, which is computationally time-consuming.An early investigation of autofocusing using DNNs was to estimate the depth position of a target object from a hologram.The depth prediction can be divided into two categories: classification and regression problems.Pitkäaho et al. (2019) proposed the depth position prediction as a classification problem.They showed that DNNs for classification commonly used in the MNIST classification problem could classify the range of 260-272 mm, where the target object is located, into five depths at 3 mm intervals.
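The conventional metric-based search described at the beginning of this subsection can be sketched as follows. For simplicity, we assume that the complex field on the sensor plane is available, whereas an actual sensor records only intensity. We also assume the angular spectrum method for the diffraction calculation and use the Tamura coefficient, taken here as the square root of the standard deviation divided by the mean. All function names and numerical values are hypothetical.

```python
import numpy as np

def propagate(u, z, wavelength, pitch):
    """Angular spectrum propagation of the complex field u over a distance z."""
    ny, nx = u.shape
    FX, FY = np.meshgrid(np.fft.fftfreq(nx, d=pitch), np.fft.fftfreq(ny, d=pitch))
    arg = np.maximum(1.0 / wavelength**2 - FX**2 - FY**2, 0.0)  # guard against negatives
    return np.fft.ifft2(np.fft.fft2(u) * np.exp(2j * np.pi * z * np.sqrt(arg)))

def tamura(amplitude):
    """Tamura coefficient: larger values indicate a sharper (sparser) image."""
    return np.sqrt(amplitude.std() / amplitude.mean())

def autofocus(field, candidates, wavelength, pitch):
    """Back-propagate to every candidate depth and return the sharpest one."""
    scores = [tamura(np.abs(propagate(field, -z, wavelength, pitch))) for z in candidates]
    return float(candidates[int(np.argmax(scores))]), scores

# Hypothetical test: a point object located 266 mm from the sensor.
obj = np.zeros((64, 64))
obj[32, 32] = 1.0
sensor_field = propagate(obj, 0.266, 532e-9, 8e-6)
candidates = np.linspace(0.260, 0.272, 5)  # five depths at 3 mm intervals
best_z, scores = autofocus(sensor_field, candidates, 532e-9, 8e-6)
```

Every candidate depth costs one diffraction calculation, which is the iterative cost that DNN-based depth estimation aims to avoid.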
DNNs for estimating the depth location as a regression problem (Ren et al., 2018; Shimobaba et al., 2018) infer a depth value z directly from a hologram image (or its spectrum) H. This network is similar to that of the classification problem but with only one neuron in the output layer. The training is performed using end-to-end learning as $\underset{\Theta}{\mathrm{minimize}}\ \mathcal{L}(\mathcal{N}(H;\Theta), z)$, where z is the ground-truth depth value. The MSE and other metrics are usually used as loss functions. We can obtain a focused reproduced image through a diffraction calculation using the depth distance estimated from a hologram.

Phase Unwrapping
Phase unwrapping in physically-based calculation (Ghiglia and Pritt, 1998) connects wrapped phases to recover the thickness (or optical path length) of a target object. Phase unwrapping algorithms include global, region, path-following, and quality-guided algorithms. Additionally, a method that applies the transport of intensity equation has been proposed (Martinez-Carranza et al., 2017). These methods are computationally time-consuming.
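The wrapping problem and the simplest path-following solution can be illustrated in one dimension. The sketch below wraps a smooth phase profile and unwraps it by accumulating wrapped phase differences, which is valid only when the true phase changes by less than π between neighbouring samples. The helper names and the test profile are our assumptions. The 2D algorithms cited above are considerably more elaborate.

```python
import numpy as np

def wrap(phase):
    """Wrap a phase into the principal interval, as the arctangent would."""
    return np.angle(np.exp(1j * phase))

def unwrap_1d(wrapped):
    """Path-following unwrapping: integrate the wrapped phase differences."""
    diffs = wrap(np.diff(wrapped))
    return wrapped[0] + np.concatenate(([0.0], np.cumsum(diffs)))

# Hypothetical phase profile of a smooth object, several multiples of 2*pi high.
x = np.linspace(-3.0, 3.0, 400)
true_phase = 15.0 * np.exp(-x**2)
wrapped = wrap(true_phase)
recovered = unwrap_1d(wrapped)
reference = np.unwrap(wrapped)  # NumPy's built-in 1D unwrapping for comparison
```

Noise or undersampling breaks the assumption on neighbouring samples and lets errors propagate along the path, which motivates the more robust 2D algorithms and the DNN-based methods discussed in this subsection.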
Many methods have been proposed to perform phase unwrapping by training DNNs with end-to-end learning using a dataset of wrapped phase images and their unwrapped counterparts (Wang et al., 2019a; Qin et al., 2020). Once trained, DNNs can rapidly generate unwrapped phase images. Phase unwrapping using Pix2Pix (Isola et al., 2017), a type of generative adversarial network (GAN) (Goodfellow et al., 2014), has been proposed (Park et al., 2021). Pix2Pix can be thought of as a supervised GAN. This study prepared a dataset of wrapped phase images and their unwrapped phase images generated using the quality-guided algorithm (Herráez et al., 2002). Using this dataset, the U-Net-based generator is trained to generate a realistic unwrapped phase image from the wrapped phase image in order to fool the discriminator. The discriminator is trained to detect whether an unwrapped phase image is generated or real. Such adversarial learning can produce high-quality unwrapped images.

Direct Reconstruction Using the Deep Neural Network
As a further development, research has been conducted to obtain aberration-free, autofocused, and phase-unwrapped images directly by inputting holograms into DNNs.

Supervised Learning
A reproduced image can be obtained by propagating holograms captured by inline holography back to the object plane. However, since the reproduced image contains a twin image and direct light, it is necessary to remove the unwanted light using physically-based algorithms, e.g., phase recovery algorithms. This requires multiple hologram recordings and incurs computational costs for diffraction calculations. Rivenson et al. (2018) input a reproduced image, obtained using an inverse diffraction calculation ($P^{-1}$) to the object plane, into a DNN (N) to obtain a twin image-free reproduced image. The prepared dataset consists of holograms H and ground-truth complex amplitude fields Y. They trained the DNN using end-to-end learning as $\min_{\Theta} L(N(P^{-1}(H); \Theta), Y)$. They used MSE as the loss function L. The ground-truth complex amplitudes were obtained from eight holograms with different recording positions using the multiheight phase retrieval algorithm (Greenbaum and Ozcan, 2012). This study showed that the DNN could reproduce images comparable to those obtained using the multiheight phase retrieval algorithm without time-consuming phase recovery.
Although the study of Rivenson et al. (2018) required the results of propagation calculations from a hologram to be input to the DNN, eHoloNet (Wang et al., 2018) is a DNN that does not require propagation calculations and directly infers object light from a hologram. The authors created a dataset consisting of holograms H and their ground-truth object light Y. The DNN was trained with the following end-to-end learning: $\min_{\Theta} L(N(H; \Theta), Y)$. MSE was used as the loss function L. Instead of real objects, they employed phase distributions displayed on an SLM to collect the ground-truth object light.
Y-Net (Wang et al., 2019b) separates the upsampling path of U-Net (Ronneberger et al., 2015) into two branches that output the intensity and phase of a reproduced image. The dataset includes captured holograms and their ground-truth intensity and phase images. Y-Net is trained using end-to-end learning. Compared with a U-Net whose output layer has two channels, one for the intensity image and one for the phase image, Y-Net obtained better reproduced images.
The studies above concern digital holographic measurement of microorganisms and cells. In addition, 3D particle measurement is essential for understanding the spatial behavior of tiny particles, such as bubbles, aerosols, and droplets. It is applied to the flow path design of flow cytometers, environmental measurement, and 3D behavior measurement of microorganisms. Digital holographic particle measurement can measure 3D particle distributions in a single shot; however, it requires time-consuming post-processing using diffraction calculations and particle position detection. 3D particle measurement using holography and DNNs has therefore been proposed. Shimobaba et al. (2019b) prepared a dataset consisting of holograms and their particle position images, where a particle position image is a 2D image showing the 3D positions of the particles. The position of a pixel indicates the position of a particle in the plane, and its color indicates the depth position of the particle. U-Net was trained using end-to-end learning with the dataset. The trained DNN can transform holograms into particle position images. The effectiveness of the method was confirmed by simulation.
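The particle position image described above can be illustrated with a short sketch. It is not the encoding used by Shimobaba et al. (2019b), whose exact color mapping is not given here. As an assumption, a single-channel image is used in which the pixel index carries the lateral position and the pixel value carries the depth, mapped linearly to [0.2, 1.0] so that 0 can mean "no particle". The function names and parameters are hypothetical.

```python
import numpy as np

def encode_particles(particles, shape, pitch, z_min, z_max):
    """Render (x, y, z) particle positions [m] as a 2D particle position image."""
    image = np.zeros(shape)
    for x, y, z in particles:
        row, col = int(round(y / pitch)), int(round(x / pitch))
        # Pixel value encodes depth; the [0.2, 1.0] range is an assumption.
        image[row, col] = 0.2 + 0.8 * (z - z_min) / (z_max - z_min)
    return image

def decode_particles(image, pitch, z_min, z_max):
    """Recover (x, y, z) positions from a particle position image."""
    rows, cols = np.nonzero(image)
    depths = z_min + (image[rows, cols] - 0.2) / 0.8 * (z_max - z_min)
    return [(c * pitch, r * pitch, z) for r, c, z in zip(rows, cols, depths)]

# Two particles in a 16 x 16 image with 10 um pixels and depths of 50-150 mm.
particles = [(40e-6, 20e-6, 0.05), (100e-6, 80e-6, 0.10)]
image = encode_particles(particles, (16, 16), 10e-6, 0.05, 0.15)
recovered = decode_particles(image, 10e-6, 0.05, 0.15)
```

With such a target representation, 3D particle detection becomes an image-to-image translation task, which is why an architecture such as U-Net can be trained on hologram and position-image pairs.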
The study of Shimobaba et al. (2019b) was conducted using simple end-to-end learning. In contrast, Shao et al. (2020) input two more pieces of information (a depth map and a maximum phase projection, both obtained by preprocessing the hologram) to their U-Net in addition to holograms. Additionally, by developing a dedicated loss function, this study successfully obtained 3D particle images with a particle density 300 times higher than that of Shimobaba et al. (2019b). Chen et al. (2021) incorporated compressive sensing into a DNN and trained it using end-to-end learning. The input of the DNN was a 3D particle hologram, whereas the output was 3D volume data of the particles. Unlike Shimobaba et al. (2019b) and Shao et al. (2020), Zhang et al. (2022) used the YOLO network (Joseph et al., 2016). When a hologram is input to the DNN, it outputs a 6D vector containing a bounding box that indicates the location of the particle, its objectness confidence, and the depth position of the particle.

Unsupervised Learning
End-to-end learning requires a dataset consisting of a large amount of paired data (captured holograms and object light recovered using physically-based algorithms). Since the interference fringes of holograms vary significantly depending on the holographic recording conditions and target objects, there is no general-purpose hologram dataset. Therefore, it is necessary to create application-specific datasets, which requires much effort. Unsupervised learning is also used for DNNs in digital holography. Li et al. (2020) showed that, using a deep image prior (Ulyanov et al., 2018), a twin image-free reproduced image can be obtained from only a captured inline hologram without large datasets. An auto-encoder was used as the network structure. The deep image prior (Ulyanov et al., 2018) initializes the DNN with random values and inputs a fixed image to the DNN for training. For example, the deep image prior can be used to recover a denoised image from a noisy input. This technique works because DNNs are not good at representing noise. The deep image prior is also useful for super-resolution and inpainting. In Li et al. (2020), the DNN was trained using the following unsupervised learning: $\min_{\Theta} L(P(N(P^{-1}(H_{\mathrm{fix}}); \Theta)), H_{\mathrm{fix}})$, where $H_{\mathrm{fix}}$ is a captured inline hologram, and N is the DNN with the network parameter Θ. The reproduced image of an inline hologram ($P^{-1}(H_{\mathrm{fix}})$) includes a twin image, which can be considered noise. When the noisy reproduced image is input into the DNN, the DNN outputs the complex amplitude field of the target object with a reduced twin image, following the principle of the deep image prior. A diffraction calculation (P) is then applied to the DNN output to generate a hologram, and Θ is learned to minimize the error between the computed and captured holograms. Consequently, the study obtained reproduced images of better quality than those obtained using state-of-the-art compressed sensing (Zhang et al., 2018b).
PhysenNet (Wang et al., 2020b) was also inspired by the deep image prior. PhysenNet can infer the phase image of a phase object by inputting its hologram into a DNN. The network is a U-Net, trained using the following unsupervised learning: $\min_{\Theta} L(P(N(H_{\mathrm{fix}}; \Theta)), H_{\mathrm{fix}})$. A diffraction calculation is applied to the phase distribution output from the DNN to generate a hologram. The DNN is trained to minimize the error between the measured and generated holograms. The minimization formula differs slightly from that of Li et al. (2020) in that the hologram is input to the DNN directly, without the inverse diffraction calculation $P^{-1}$.
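The physics-consistency loss used in these unsupervised schemes can be sketched as follows, in the form with back-propagation before the network, i.e., L(P(N(P⁻¹(H_fix); Θ)), H_fix). This is only an evaluation of the loss under stated assumptions, not the training code of either study: the angular spectrum method stands in for P, MSE stands in for L, the hologram is modeled as the intensity of the propagated field, and `network` is a hypothetical callable standing in for the DNN. Actual training would compute this loss inside an automatic differentiation framework and update Θ by gradient descent.

```python
import numpy as np

def angular_spectrum(field, wavelength, pitch, z):
    """Diffraction calculation P (z > 0) or its inverse (z < 0)."""
    ny, nx = field.shape
    FX, FY = np.meshgrid(np.fft.fftfreq(nx, d=pitch), np.fft.fftfreq(ny, d=pitch))
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * z) * (arg > 0))

def hologram_consistency_loss(network, hologram, wavelength, pitch, z):
    """MSE between the captured hologram and the one simulated from the DNN output."""
    back = angular_spectrum(hologram.astype(complex), wavelength, pitch, -z)
    estimate = network(back)  # estimated object field
    simulated = np.abs(angular_spectrum(estimate, wavelength, pitch, z)) ** 2
    return float(np.mean((simulated - hologram) ** 2))

# Toy inline hologram of a weakly absorbing square (all values are assumptions).
wavelength, pitch, z = 532e-9, 4e-6, 0.01
obj = np.ones((64, 64), dtype=complex)
obj[24:40, 24:40] = 0.5
hologram = np.abs(angular_spectrum(obj, wavelength, pitch, z)) ** 2

# An untrained "network" that returns its input keeps the twin image and direct light.
loss_identity = hologram_consistency_loss(lambda u: u, hologram, wavelength, pitch, z)
# An ideal "network" that returns the true object field reproduces the hologram.
loss_oracle = hologram_consistency_loss(lambda u: obj, hologram, wavelength, pitch, z)
```

Because the loss compares holograms rather than object fields, no ground-truth object light is needed; the captured hologram itself supervises the network through the forward model P.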

Generative Adversarial Network
GANs (Goodfellow et al., 2014), one of the training methods for DNNs, have been widely used in computational holography because of their excellent image transformation capabilities. Liu et al. (2019a) used the conditional GAN for super-resolution in digital holographic microscopy. Conditional GAN is a method that adds ground-truth information to GAN; it is a supervised learning method. Figure 8 shows a schematic of Liu et al. (2019a). As shown in the figure, X is the low-resolution hologram; G is the generating network (using U-Net); G(X) is the high-resolution hologram output from the generating network; Z is a ground-truth high-resolution hologram; D is a discriminating network that distinguishes whether a high-resolution hologram is a generated or a ground-truth hologram. The datasets of low- and high-resolution holograms are taken with on-chip digital holographic microscopy. The high-resolution holograms are captured by changing objective lenses with different numerical apertures. Alternatively, the image sensor can be laterally shifted to capture multiple low-resolution holograms, which are super-resolved using the physically-based algorithm (Greenbaum et al., 2014) to generate high-resolution holograms.
Similar to Liu et al. (2019a), Liu et al. (2019b) employed the conditional GAN to generate accurate color images from holograms captured at three wavelengths, suitable for point-of-care pathology. Conditional GAN can produce holographic images with high accuracy. However, since it is a supervised learning method, a dataset must be prepared, which requires much effort. To overcome this problem, holographic microscopy using cycle GANs with unsupervised learning has been investigated (Yin et al., 2019; Zhang et al., 2021).

Interconversion Between Holographic and Other Microscopes
Many microscopes, such as bright-field, polarized light, and digital holographic microscopes, have been developed, each with its strengths and weaknesses. Interconversion between the reproduced image of a holographic microscope and that of another microscope has been investigated using deep learning, which makes it possible to compensate for the shortcomings of each microscope. In many cases, GANs, which are excellent at transforming images, are used to train the DNNs.
Bright-field microscopy allows simple observation of specimens using a white light source; however, transparent objects must be stained. Additionally, only 2D amplitude information of a target object can be obtained due to the shallow depth of focus. Wu et al. (2019) showed that digital holographic reproduced images could be converted to bright-field images using a GAN. In contrast, Go et al. (2020) converted an image taken by bright-field microscopy into a hologram. They showed that it is possible to recover the 3D positional information of particles from this hologram. Additionally, they developed a system that can capture bright-field and holographic images simultaneously to create a dataset. The GAN generator produces holograms from bright-field images, and the discriminator is trained to determine whether a hologram is generated or captured. Liu et al. (2020) converted the reproduced image of digital holographic microscopy into a polarized image of polarized light microscopy. Polarized light microscopy has problems, such as a narrow field of view and the need to capture several images with different polarization directions. Liu et al. (2020) showed that a DNN trained by a GAN could infer a polarization image from a single hologram. The dataset consists of data pairs of holograms taken using a holographic microscope and polarized light images taken with single-shot computational polarized light microscopy (Bai et al., 2020) of the same object.

Holographic Classification
Digital holographic microscopy can observe the phase of transparent objects, such as cells, allowing for label-free observation of cells. By using this feature, rapid and label-free screening of anthrax using a DNN and holographic microscopy has been proposed (Jo et al., 2017). The DNN consists of convolutional layers, max-pooling layers, and a classifier using fully-connected layers.
O'Connor et al. (2020) classified holographic time-series data. They employed a low-cost and compact shearing digital holographic microscope (Javidi et al., 2018) made with a 3D printer to capture and classify holographic time-series data of animal blood cells, as well as of human blood cells from healthy individuals and from those with sickle cell disease.
Figure 9 shows a schematic of O'Connor et al. (2020). In the second step of the off-axis phase reconstruction, only the object light component is Fourier filtered, as in a conventional off-axis hologram, to obtain the phase image in the object plane using a diffraction calculation (Takeda et al., 1982; Cuche et al., 2000). The feature extractor extracts features from the phase image. The manually extracted and automatically extracted features from DNNs, which are transfer-learned from DenseNet (Huang et al.,

FASTER DEEP NEURAL NETWORKS
Deep learning, as introduced above, entails a neural network running on semiconductors. Its speed is governed by the switching speed of transistors, and its power consumption is high. To solve this problem, optical neural networks have been proposed (Goodman and Goodman, 2005; Genty et al., 2021).
Research on optical computers has a long history. For example, pattern recognition by optical computing was reported in 1964 (Vander Lugt, 1964). This research used optical correlation to perform simple recognition. Optical computers use a passive hologram as a modulator of light. Therefore, they require little power and can perform the recognition process at the speed of light. Research has recently been conducted on optical deep learning (Genty et al., 2021). In this review, we introduce one such approach, the diffractive DNN (D²NN), which is closely related to holography (Lin et al., 2018).
Figure 10 shows the D²NN and a semiconductor-based DNN. A D²NN modulates input light, which carries the input information, using multiple diffractive layers (holograms). It learns the amplitude and phase of the diffractive layers to strengthen the light intensity at the desired detector. For example, in the case of classification, the input light of the classification target is modulated in each diffractive layer, and the diffractive layers are trained to strengthen the light intensity at the detector corresponding to the target class. Existing deep-learning frameworks, such as Keras, TensorFlow, or PyTorch, can be used to train the diffractive layers.
In Figure 10A, a light wave $U_i$ is modulated by the ith diffractive layer and then diffracted. The propagated light wave $U_{i+1}$ in front of the next diffractive layer is expressed as follows:

$$U_{i+1} = P_{i+1}(U_i \circ D_i), \tag{7}$$

where $D_i$ denotes the complex modulation (amplitude and phase) of the ith diffractive layer, $P_{i+1}$ is the diffraction calculation between layers i and i + 1, and $\circ$ is the Hadamard product. For $P_i$, general diffraction calculations, such as the angular spectrum method, can be used. The forward calculation of a D²NN is completed by iterating Eq. 7 as many times as the number of diffractive layers. Since the calculation of Eq. 7 consists entirely of differentiable operations, each diffractive layer can be optimized by automatic differentiation of the forward calculation. The D²NN is trained on a computer, and the trained diffractive layers are recorded on an optical modulator (photopolymer or SLM). These optical modulators correspond to the layers of the semiconductor-based DNN. A D²NN can be constructed by arranging these layers at equal intervals. The classification rate can be further improved by arranging the diffractive layers at unequal intervals (Watanabe et al., 2021). The spacing of diffractive layers is a hyperparameter, which is not easy to tune manually. Watanabe et al. (2021) employed a Bayesian optimization technique, the tree-structured Parzen estimator (James et al., 2011), for hyperparameter tuning.
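The forward calculation described above, in which the field is multiplied element-wise by each diffractive layer and then propagated to the next plane, can be sketched as follows. This is a hedged illustration rather than the code of Lin et al. (2018): it assumes phase-only layers with random (untrained) values, equal layer spacing, the angular spectrum method as the diffraction calculation, and four hypothetical quadrant detectors. All names and numerical parameters are assumptions.

```python
import numpy as np

def angular_spectrum(field, wavelength, pitch, z):
    """Diffraction calculation between two parallel planes separated by z."""
    ny, nx = field.shape
    FX, FY = np.meshgrid(np.fft.fftfreq(nx, d=pitch), np.fft.fftfreq(ny, d=pitch))
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    return np.fft.ifft2(np.fft.fft2(field) * np.exp(1j * kz * z) * (arg > 0))

def d2nn_forward(field, layers, wavelength, pitch, spacing):
    """Iterate 'modulate, then propagate' once per diffractive layer."""
    u = field
    for mask in layers:
        u = angular_spectrum(u * mask, wavelength, pitch, spacing)
    return u

def detector_scores(field, regions):
    """Sum the light intensity falling on each detector region."""
    intensity = np.abs(field) ** 2
    return np.array([intensity[region].sum() for region in regions])

rng = np.random.default_rng(0)
n, wavelength, pitch, spacing = 64, 532e-9, 8e-6, 0.02
# Five phase-only layers; in training these would be the learnable parameters.
layers = [np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (n, n))) for _ in range(5)]

field_in = np.zeros((n, n), dtype=complex)
field_in[24:40, 24:40] = 1.0  # toy input pattern

half = n // 2
regions = [(slice(0, half), slice(0, half)), (slice(0, half), slice(half, n)),
           (slice(half, n), slice(0, half)), (slice(half, n), slice(half, n))]

field_out = d2nn_forward(field_in, layers, wavelength, pitch, spacing)
scores = detector_scores(field_out, regions)
predicted_class = int(np.argmax(scores))  # detector receiving the most light
```

In an actual training setup, the same forward pass would be written in an automatic differentiation framework so that the layer phases can be updated to concentrate light on the detector of the correct class.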
Figure 10B shows a semiconductor-based DNN. The output $X_{i+1}$ for the input $X_i$ at the ith layer of this DNN can be expressed by

$$X_{i+1} = F_i(W_i \cdot X_i + B_i), \tag{8}$$

where $W_i$ is the weight parameters, $B_i$ is the bias (not shown in the figure), $\cdot$ is the matrix product, and $F_i$ is the activation function.
Semiconductor-based DNNs can represent arbitrary functions because Eq. 8 contains nonlinear activation functions. However, Eq. 7 of the D²NN has no activation function; therefore, it can only handle linear problems. Still, there are many applications where D²NNs work effectively for linear problems.
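The difference in linearity can be checked numerically with a small sketch. A diffractive layer (element-wise modulation followed by a linear propagation) satisfies the superposition principle for the complex field, whereas a dense layer with a nonlinear activation does not. The code below is illustrative only: a random unit-modulus transfer function stands in for the diffraction calculation, ReLU is assumed as the activation function, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
mask = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (n, n)))      # one diffractive layer
transfer = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, (n, n)))  # stand-in for P

def diffractive_layer(u):
    """Hadamard product with the layer, then a linear (FFT-based) propagation."""
    return np.fft.ifft2(np.fft.fft2(u * mask) * transfer)

def dense_relu_layer(x, w, bias):
    """Dense layer with a ReLU activation function."""
    return np.maximum(w @ x + bias, 0.0)

# Superposition test for the diffractive layer with complex coefficients.
u1 = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
u2 = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
c1, c2 = 0.7, -1.3j
optical_is_linear = np.allclose(
    diffractive_layer(c1 * u1 + c2 * u2),
    c1 * diffractive_layer(u1) + c2 * diffractive_layer(u2),
)

# The same test fails for the dense layer because of the activation function.
w, bias = np.eye(2), np.zeros(2)
x1, x2 = np.array([1.0, -1.0]), np.array([-1.0, 1.0])
relu_is_linear = np.allclose(
    dense_relu_layer(x1 + x2, w, bias),
    dense_relu_layer(x1, w, bias) + dense_relu_layer(x2, w, bias),
)
```

Stacking any number of such diffractive layers still yields a single linear operator on the field, so adding depth alone does not add nonlinearity.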
Lin et al. (2018) investigated the MNIST classification accuracy using a D²NN. When a five-layer D²NN was validated through simulation, it achieved a 91.75% classification rate. The classification rate improved to 93.39% for a seven-layer D²NN. For comparison, the state-of-the-art classification rate for electrical DNNs was 99.77%. When the layers were 3D printed and optically tested, a classification rate of 88% was achieved despite the manufacturing and alignment errors of the layers.
D²NNs are usually constructed with relatively shallow layers. Dou et al. (2020) applied the idea of ResNet (He et al., 2016) to the D²NN and reduced the gradient vanishing problem in deep diffractive layers. ResNet reduced gradient vanishing by introducing shortcuts, whereas Res-D²NN (Dou et al., 2020) introduces optical shortcut connections, as shown in Figure 10C. When a 20-layer D²NN and Res-D²NN were run through the MNIST classification problem, the identification rates were 96.0% and 98.4%, respectively, with the Res-D²NN showing superior performance.
Sakib Rahman and Ozcan (2021) showed through simulations that a twin image-free holographic reproduced image could be obtained using a D²NN. When holograms captured by inline digital holography are reproduced, blurry conjugate light is superimposed on the object light. Phase recovery algorithms, compressive sensing, and deep learning are used to remove this conjugate light, all of which operate on semiconductors. Sakib Rahman and Ozcan (2021) trained a D²NN so that light from a hologram is input into the D²NN and passes through several diffractive layers to yield a twin image-free reproduced image. The loss function L is defined as follows:

$$L = \| I - Y \|_2 + \alpha_1 \| \mathcal{F}(I) - \mathcal{F}(Y) \|_2 + \alpha_2 \left( 1 - \frac{P_I}{P_{\mathrm{illum}}} \right),$$

where $\| \cdot \|_2$ denotes the ℓ2 norm and $\mathcal{F}$ denotes the Fourier transform. The first term is the error between the inferred image I of the D²NN and the ground-truth image Y; the second term is the spectral error; the third term involves the diffraction efficiency, defined as the ratio of the power $P_I$ of the reproduced image to the total power $P_{\mathrm{illum}}$ of the illumination light. The third term subtracts the diffraction efficiency from 1 so that the loss function becomes smaller as the diffraction efficiency increases. $\alpha_1$ and $\alpha_2$ are hyperparameters. The amount of modulation of the diffractive layers is determined by minimizing this loss function.
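The three-term loss described above (image error, spectral error, and one minus the diffraction efficiency) can be evaluated as in the following sketch. It is an illustration under assumptions and not the authors' code: the spectral error is taken as the ℓ2 distance between the 2D Fourier transforms of the two images, the two powers are passed in as plain numbers, and the function name, weights, and test data are hypothetical.

```python
import numpy as np

def twin_image_free_loss(image, target, power_image, power_illum, alpha1, alpha2):
    """Image error + alpha1 * spectral error + alpha2 * (1 - diffraction efficiency)."""
    structural = np.linalg.norm(image - target)
    spectral = np.linalg.norm(np.fft.fft2(image) - np.fft.fft2(target))
    efficiency = power_image / power_illum
    return structural + alpha1 * spectral + alpha2 * (1.0 - efficiency)

rng = np.random.default_rng(0)
target = rng.random((32, 32))
noisy = target + 0.05 * rng.standard_normal((32, 32))

# Perfect image and full efficiency: every term vanishes.
loss_perfect = twin_image_free_loss(target, target, 1.0, 1.0, 0.1, 0.5)
# Perfect image but only 20% of the illumination power reaches the output.
loss_low_efficiency = twin_image_free_loss(target, target, 0.2, 1.0, 0.1, 0.5)
# Imperfect image at full efficiency.
loss_noisy = twin_image_free_loss(noisy, target, 1.0, 1.0, 0.1, 0.5)
```

The efficiency term matters for a passive optical network because an image that matches the ground truth in shape but carries very little light would be hard to detect in practice.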

OUR PERSONAL VIEW AND DISCUSSION
In previous sections, we introduced computational holography, including computer-generated holograms, holographic displays, digital holography, and D²NNs, using deep learning. Several studies have shown that deep learning outperforms existing physically-based calculations. In this section, we briefly discuss our personal view on deep learning. Algorithms for computer-generated holograms in holographic displays include point-cloud, polygon, layer, and light-field methods. Several physically-based algorithms have been proposed for layer methods (Okada et al., 2013; Chen et al., 2014; Chen and Chu, 2015; Zhao et al., 2015). The DNN-based method proposed by Shi et al. (2021) is a near-perfect layer method in terms of computational speed and image quality. Physically-based layer methods are inherently computationally expensive due to the repeated use of diffraction calculations. The DNN in Shi et al. (2021) skips this computational process and can map input RGBD images directly to holograms. This study showed that the DNN could generate holograms two orders of magnitude faster than sophisticated physically-based layer methods.
Holograms generated using the layer method are suitable for holographic near-eye displays because a good 3D image can be observed from the front of the holograms (Maimone et al., 2017). These holograms have a small number of hologram pixels. Additionally, since the holograms do not need to have a wide viewing angle, they contain only low-frequency interference fringes, indicating a low spatial bandwidth product (Blinder et al., 2019). These features are suitable for DNNs, which is why current hologram generation using DNNs is mainly for layer holograms. Holograms with a large spatial bandwidth product have a wide viewing angle, allowing a large 3D image to be observed by many people. However, this requires large-scale holograms with a pixel pitch of about a wavelength and billions to tens of billions of pixels (Matsushima and Sonobe, 2018; Matsushima, 2020). Such holograms are formed from high-frequency interference fringes, and their patterns appear noisy at first glance. Current DNNs have difficulty handling such large-scale holograms due to memory issues and computational complexity. Additionally, the deep image prior study (Ulyanov et al., 2018) points out that current DNNs based on convolutions are not good at generating noisy patterns. Therefore, hologram generation with large spatial bandwidth products using DNNs is a big challenge.
Since DNNs were developed from image identification, RGBD images used in the layer method are suitable for DNNs. However, it is not easy for DNNs to handle the coordinate data formats used in the point cloud and polygon methods. So far, few studies exist on how to handle the point cloud method using DNNs (Kang et al., 2021). The authors look forward to further progress in these studies.
Deep learning is a general-purpose optimization framework that can be used in any application involving signals. However, it is difficult to answer whether it can outperform existing methods in all applications and use cases. Using optical cryptography and single-pixel imaging as examples, Jiao et al. (2020) compared a well-known linear regression method (Seber and Lee, 2012) with deep learning. They concluded that the linear regression method is superior in both applications. DNNs require a lot of tuning: tuning the network structure and hyperparameters, selecting appropriate loss functions and optimizers, and preparing a large dataset. With proper tuning, which is not necessary in existing physically-based methods, we may obtain excellent results; however, it requires much effort. Ultimately, deep learning is a sophisticated fitting technique, so analytical models matching the ground-truth physics may be favorable whenever they are knowable and efficiently computable. Thus, it will be essential to choose appropriately between physically-based methods and deep learning.
Deep learning requires the preparation of large datasets, which generally requires much effort. Computer-generated holograms using DNNs also require the preparation of datasets; however, they can be generated on a computer. Therefore, there is no need to take holograms with an actual optical system, except for systems such as camera-in-the-loop holography. Digital holography is more problematic, as it requires a great deal of effort to acquire information about target objects and their holograms. Unsupervised learning, as discussed in Section 3.3, is ideal. However, unlike DNNs trained in supervised and unsupervised manners, phase recovery algorithms and compressed sensing can recover target object light using only a small amount of prior information about the target objects. Thus, they do not require a dataset. For supervised learning, DNNs should be trained by generating data pairs of holograms and their object light using phase recovery algorithms and compressed sensing, as in Rivenson et al. (2018).
The generalization performance of DNNs is also essential. For example, in the case of digital holography, there is no guarantee that a DNN trained on a dataset with a particular object and optical system will be able to accurately recover object light from holograms captured in other situations. Therefore, to improve the generalization performance of DNNs, we can use datasets that include various types of data, as well as techniques such as domain adaptation (Tzeng et al., 2017), which has been the subject of much research in recent years.
Furthermore, DNNs have outperformed physically-based calculations in many applications of computational holography. So, will there still be a need for physically-based calculations in the future? The answer is yes, because DNNs require large datasets, which need to be generated using sophisticated physically-based calculations. Additionally, the validity of the results generated using DNNs should be benchmarked against the results obtained using physically-based calculations. Meanwhile, several attempts have been made to introduce layers of physically-based calculations into DNNs (Rivenson et al., 2018; Wang et al., 2020b; Hossein Eybposh et al., 2020; Li et al., 2020; Chen et al., 2021; Horisaki et al., 2021; Shi et al., 2021; Wu et al., 2021; Ishii et al., 2022; Kavaklı et al., 2022). Therefore, it will be necessary to continue research on physically-based calculations in terms of speed and image quality to speed up these layers.

CONCLUSION
In this review, we comprehensively introduced computational holography, including computer-generated holography, holographic display, and digital holography using deep learning, as well as D²NNs using holographic technology. Computational holography using deep learning has outperformed conventional physically-based calculations in several applications. Additionally, we briefly discussed our personal view on the relationship between DNNs and physically-based calculations. Based on these discussions, we believe that we need to continue research on both deep learning and physically-based calculations. The combination of deep learning and physically-based calculations will lead to further groundbreaking research in computational holography.

FIGURE 1 | Data processing pipeline for holographic displays.

FIGURE 2 | Deep neural network-based hologram computation using supervised learning.

FIGURE 3 | Deep neural network-based hologram calculation using unsupervised learning.

FIGURE 7 | Process flow of digital holography.

FIGURE 8 | Architecture of a conditional generative adversarial network.
2017), are input to a long short-term memory (LSTM) network to classify the cells. LSTM is a type of recurrent neural network (RNN). RNNs suffer from a gradient vanishing problem as the time-series data become longer; however, LSTMs can mitigate this problem. O'Connor et al. (2020) showed that the LSTM significantly improved the classification rate of the cells compared to traditional machine learning methods, such as the random forest and support vector machine. The spatiotemporal classification of COVID-19 infected and healthy erythrocytes using this technique has also been reported (O'Connor et al., 2021).