Learning-Based Image Transport Through Disordered Optical Fibers With Transverse Anderson Localization

Fiber-optic imaging systems play a unique role in biomedical imaging and clinical practice because they can image deep into tissues and organs with minimal penetration damage. Their imaging performance is often limited by the waveguide mode properties of conventional optical fibers and by the image reconstruction method, which restricts improvements in imaging quality, transport robustness, system size, and illumination compatibility. The emerging disordered Anderson localizing optical fibers circumvent these difficulties through the intriguing properties of transverse Anderson localization of light, such as single-mode-like behavior, wavelength independence, and high mode density. To go beyond the performance limits of conventional systems, there is growing interest in integrating disordered Anderson localizing optical fibers with deep learning algorithms. Novel imaging platforms based on this concept have recently been explored to make the best of Anderson localization fibers. Here, we review recent developments of Anderson localizing optical fibers and focus on the latest progress in deep-learning-based imaging applications using these fibers.


INTRODUCTION
The integration of optical fiber devices and image processing algorithms enables the fiber-optic imaging system (FOIS) to image deep into organs or tissues in a minimally invasive way, which is a formidable task for other imaging techniques, such as conventional microscopy. The general layout (Figure 1A) of an FOIS consists of the following components: an optical fiber, a proximal-end illumination unit, a distal-end collection unit, and a data processing unit. Depending on the application and the optical fiber type used, the outer diameter and the optical fiber length can range from ∼125 to ∼1,000 µm and from a few centimeters to a few meters, respectively [1]. The miniature size, high flexibility, and long light delivery distance make the FOIS unique, opening new horizons and creating numerous opportunities for both basic biomedical research and clinical practice. In fundamental research, such as deep brain imaging, an FOIS can be easily implanted in the skulls of freely moving animals for long-term imaging studies [2-6]. In clinical practice, a handheld FOIS can go deep into human organs or tissues with minimal penetration damage, which significantly benefits clinical diagnostics and surgical procedures [7-9]. Different types of optical fibers have been explored to develop FOISs utilizing various mechanisms of image transport and recovery [1,4,10-29]. Among FOIS solutions, multicore optical fibers (MCFs) and multimode optical fibers (MMFs) are two widely deployed fiber types. Several state-of-the-art systems using MMFs or MCFs have demonstrated excellent imaging performance and made tremendous progress in different application scenarios [3-6,11,23,25,26,29-31]. Despite the success of these conventional optical fibers, some challenges remain, hindering further enhancement of FOIS imaging capabilities.
The main issues are the high sensitivity to environmental perturbations, low imaging quality and speed, complex and expensive systems, and poor compatibility with incoherent, spectrally broad illumination. These challenges originate from restrictions related to both the optical fiber device and the image reconstruction technique. The single large core of an MMF supports thousands of orthogonal modes. The imaging information is encoded in the multimode interference speckle patterns (Figure 1B). Benefiting from their small diameter (∼200 µm), MMF-based FOISs are reported to be the least invasive endoscopic imaging method, especially for deep brain imaging [3]. Unfortunately, the multimode interference of MMFs is extremely sensitive to any tiny variations, such as mechanical bending or thermal perturbations, that affect the fiber refractive index distribution [4,12]. Some attempts have been made to tackle this difficulty, such as an insightful but complex theoretical framework and the application of graded-index MMFs [11,12,32-34]. Nevertheless, this issue is far from truly resolved. For most demonstrated biomedical imaging applications, the MMF still has to be held rigidly in shape and its length is limited to a few centimeters, which severely limits the implementation of MMF-based FOISs in many scenarios [3-6,33]. Despite these robustness issues, powerful techniques have been developed to unscramble imaging information embedded in the speckle patterns recorded with MMF-based FOISs [11,12,32,35]. As indicated in Figure 1B, the wave propagation behavior through the MMF is calibrated by measuring the transmission matrix (TM) with interferometry and a wavefront shaping device, such as a spatial light modulator (SLM) or a digital micromirror device (DMD) [3,35]. The TM method has been successfully demonstrated in practical biomedical imaging [3,4,6].
Yet, due to the restrictions of the MMF multimode interference, the TM-based imaging process is also vulnerable to external perturbations. Minor thermal fluctuations (a few degrees Celsius) or tiny mechanical twisting (a few hundred micrometers) can change mode coupling and scramble the pre-calibrated TM [4]. In addition, the experimental realization and the imaging algorithm of the TM-based method require relatively complex and high-cost systems while being limited in imaging speed and illumination coherence [17,35-37]. Meanwhile, the imaging quality is often impaired by evident artifacts, such as defective background and ghost images [3,4].
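The TM calibration and its sensitivity to perturbations described above can be illustrated with a minimal numerical sketch. It is not the procedure of any cited work: the matrix `T` is a random complex matrix standing in for a real fiber, the mode counts and the perturbation strength are arbitrary assumptions, and reconstruction uses a simple pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 16, 64  # hypothetical numbers of input and output channels


def random_complex(shape, scale):
    """Complex Gaussian matrix, used here only as a stand-in for fiber physics."""
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def relative_error(estimate, reference):
    return float(np.linalg.norm(estimate - reference) / np.linalg.norm(reference))


# Assumed "true" transmission matrix of the fiber at calibration time.
T = random_complex((n_out, n_in), 1.0 / np.sqrt(2 * n_in))

# Calibration: probe the fiber with one input channel at a time (identity basis)
# and record the complex output fields; the recorded columns form the measured TM.
T_measured = T @ np.eye(n_in, dtype=complex)
T_inverse = np.linalg.pinv(T_measured)

# Imaging: an unknown input field is recovered from the output field.
x_true = random_complex((n_in,), 1.0)
err_calibrated = relative_error(T_inverse @ (T @ x_true), x_true)

# Perturbation: the fiber changes after calibration (bending, temperature),
# modeled here as a random additive change of the TM (assumed strength 20%).
T_perturbed = T + 0.2 * random_complex((n_out, n_in), 1.0 / np.sqrt(2 * n_in))
err_perturbed = relative_error(T_inverse @ (T_perturbed @ x_true), x_true)

print(f"relative error, fiber unchanged: {err_calibrated:.2e}")
print(f"relative error, fiber perturbed: {err_perturbed:.2e}")
```

With the fiber unchanged, the pseudo-inverse recovers the input to numerical precision; once the matrix drifts away from its calibrated value, the same inverse returns a visibly wrong field, which is the failure mode the text attributes to bending and thermal drift.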
MCFs have a much larger diameter than MMFs, ranging from a few hundred micrometers up to 1 mm. MCFs are widely utilized imaging fibers and have been applied with great success in practical applications [1,2,13,14,20,22,23,26,30]. Often referred to as "coherent fiber bundles", they consist of thousands of individual cores [1]. Each core in an MCF can work as a pixel to sample and transport the intensity image (Figure 1C). Although the image sampling is straightforward, the densely packed core patterns of MCFs produce pixelated artifacts in transported images (Figure 1C) [14,38-40]. The compact structure further limits the imaging robustness, imaging quality, and illumination choice. For example, the coherent core-to-core coupling is sensitive to wavelength tuning and perturbations [15-17,38,39,41]. Severely blurred images are obtained away from the optimal wavelength or under fiber deformations. Especially for techniques using a wavefront-shaping method to mitigate pixelated artifacts, the strong core-to-core coupling in MCFs makes the imaging rather intolerant to perturbations. To mitigate the influence of the core-to-core coupling, MCF-based FOISs resort to narrowband illumination and deploy a low mode density design. Besides the cross-talk issue, conventional MCF-based FOISs usually require bulky and complex distal optics or mechanical actuators, which limit the extent of miniaturization and can induce severe penetration damage [10,13].
The physical properties of the optical fiber fundamentally constrain the system performance, whereas the image reconstruction algorithm deployed in the FOIS is equally important to (1) recover the object from raw data and (2) simplify the hardware realization. In practice, the raw imaging data from the proximal end of the optical fiber are not readily interpretable; they are either speckle patterns or feature severe artifacts. The imaging information is most likely hidden or incomplete in these sparse and noisy patterns. Reconstructing the object from the raw fiber-delivered data is an ill-posed inverse imaging problem: it lacks a unique solution, or its solution is unstable with respect to the raw data. There are different approaches to obtain an estimate of the object. The process of transforming the object into the raw image through the imaging system can be modeled by a forward operator. Based on the forward model as well as prior knowledge about the objects, most conventional methods tackle this issue by solving a regularized optimization problem with carefully designed regularizers and minimization algorithms. Such model-based methods have been widely implemented and have achieved great success. Yet, some difficulties remain to be overcome: (1) significant artifacts arise with noisier or lower-quality raw imaging data; (2) the handcrafted model limits the universality and might be unavailable for some complex physical systems. On the other hand, the choice of the image reconstruction method also affects the complexity of the experimental system. For methods requiring a wavefront shaping process, the configurations and workflow tend to be complicated and induce high costs.
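As an illustration of the model-based approach described above, the regularized reconstruction is commonly written in the following generic form. The symbols are notational assumptions made here, not taken from the cited works: $x$ is the object, $y$ the raw fiber-delivered image, $A$ the forward operator, $n$ the noise, $R$ the regularizer encoding prior knowledge, and $\lambda$ a weight balancing data fidelity against the prior.

```latex
y = A(x) + n, \qquad
\hat{x} = \arg\min_{x} \; \tfrac{1}{2}\,\lVert A(x) - y \rVert_2^2 + \lambda\, R(x)
```

Both difficulties named in the text are visible in this form: noisy $y$ makes the data-fidelity term unreliable, and the whole approach presupposes that a usable $A$ can be written down for the fiber.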
Instead of relying on conventional optical fibers and algorithms, another avenue to go beyond the barrier is to explore new waveguiding physics and to resort to a learning-based approach to tackle the inverse imaging problem. Recently, the emerging transverse Anderson localizing optical fiber (ALOF) has provided ample evidence that transversely random fiber structures can be utilized as astonishingly robust and high-quality imaging carriers [42-45]. ALOFs can potentially supersede conventional optical fibers based on their counterintuitive but intriguing properties: they are highly multimode systems with single-mode-like behavior and wavelength-independent point spread functions [46-49]. They enable image encoding through densely distributed localized fiber modes that are highly robust to perturbations. They can also transport images under broadband illumination without the wavelength-dependent blurring suffered by MCFs. The underlying physics originates from the transverse Anderson localization effect, which guarantees robust, broadband, and high-quality image transport [42,43]. While the novel waveguiding physics can relieve the device restrictions, a well-designed image reconstruction algorithm is still required to make the most of the fiber. Due to their outstanding performance in image classification, segmentation, and reconstruction, deep-learning (DL) methods have motivated researchers to deploy these algorithms in the fiber-optic imaging area [18,19,27,50-54]. DL-based research in optics and photonics is growing fast. It has achieved great success in various applications and has been shown to outperform conventional model-based algorithms in many imaging problems [55,56]. In particular, DL methods can be well adapted to the inverse problem of FOISs. One important reason is that accurate physical modeling of the complex wave propagation through special optical fibers is often a formidable task.
For example, due to the absence of analytical solutions, numerical simulations require substantial computational resources even for simplified wave propagation within an ALOF [57]. On the other hand, DL-based solutions boosted by big data offer a universal approach without the need for a handcrafted forward model [58-60]. They are able to directly "learn" the underlying physics of a complex waveguiding system merely relying on a set of training data without any prior knowledge [19]. Boosted by new-generation graphics processing units (GPUs), many DL-based tasks can be processed with a personal computer, and a trained DL neural network can reach imaging speeds of milliseconds per frame. Unlike conventional solutions, DL-based methods directly utilize raw intensity images and impose no particular requirements on the coherence or polarization properties of the illumination. They can, therefore, bypass the constraints of complex and high-cost optical systems, such as interferometry and wavefront shaping devices, leading to cost-effective configurations with low complexity.
In this review, we focus on learning-based ALOF imaging systems. Anderson-localization-related research is an active and extremely broad area. We mainly focus on the discoveries about ALOFs that are related to imaging applications. We will present recent progress on the Anderson localization of light in waveguide-like structures in the first section. Following the discussion of ALOFs, we will give a brief introduction to the basics of DL and the convolutional neural network (CNN). Finally, we will summarize recent progress in imaging through integrating ALOFs with CNNs. Because the applications considered here are oriented toward fiber-optic imaging, the discussion of the algorithm will be limited to the deep convolutional neural network (DCNN).

Historical Review on the Origin of ALOF
We summarize the historical development of ALOFs in Figure 2. We mainly select the milestones that had an important impact on the imaging applications of ALOFs. As shown in Figure 2, Anderson localization was first introduced by P.W. Anderson to describe the motion of electrons in a highly disordered medium within the framework of quantum mechanics [61]. In his seminal paper, disordered defects in the potential landscape cause multiple scattering of electron waves, resulting in spatially localized electronic states. Since it is a consequence of the wave nature, Anderson localization is broadly applicable to both quantum mechanical waves defined by the Schrödinger equation and classical wave systems, such as acoustics, elastics, electromagnetics, and optics [62-68]. Among various classical wave realizations, Anderson localization of light has attracted tremendous attention due to mature experimental tools to probe the localization phenomena and diverse possibilities to construct disordered "optical potentials" using disordered refractive index distributions [69-72]. Many efforts have been made to observe and apply Anderson localization of light in various systems [44-46,73-82]. For Anderson localization of light to occur, the wave scattering must be strong enough that the wavelength in the medium is comparable to the scattering mean free path, the so-called Ioffe-Regel criterion [83]. In 3D systems, it remains quite challenging to satisfy this criterion. Even if large refractive index variations meet the needs for observing 3D localization of light, the optical system usually introduces considerable losses, making it difficult to differentiate Anderson-localization-induced exponential decay from optical-loss-induced exponential decay. This restriction, however, is considerably relieved in quasi-2D optical systems [84,85].
One realization of such a system is a waveguide-like structure (Figure 3A1) where the refractive index distribution is disordered in the transverse plane but uniform along the optical wave propagation direction [42,84]. For the optical waves to be localized in the 2D transverse plane, only the transverse components of the wavevector need to be taken into consideration. The transverse component can be 10 to 100 times smaller than the full wavevector [85,86]. Even if the mean free path is much larger than the wavelength, localization of light can still occur transversely in quasi-2D optical systems. As the simulation in Figure 3A2 demonstrates, when a Gaussian beam is coupled into the disordered waveguide, the beam first goes through an initial expansion and eventually localizes to a stable state whose radius fluctuates around a stable value [42]. This optical wave behavior in the quasi-2D system is often referred to as transverse Anderson localization (TAL). The 2D TAL was proposed independently by Abdullaev et al. and De Raedt et al. based on different mechanisms [84,87].
[Figure 3 caption, partially recovered] The sample is exposed to a solvent to differentiate between polymethyl methacrylate (PMMA) and polystyrene (PS) polymer. The darker regions are PMMA; the material filling fraction is ∼50%; the feature size is ∼0.9 µm. (B3) Transported images of numbers "4" and "6" through a 5-cm-long PALOF sample. The numbers are elements from group 3 in the 1951 USAF resolution test chart. (C1) SEM image of GALOF cross section: white areas are fused silica and black areas are air holes; the material filling fraction is ∼28.5%; the feature size is ∼1.6 µm. (C2) Transported images of numbers "3" and "5" through a 4.5-cm-long GALOF sample. The numbers are elements from group 3 in the 1951 USAF resolution test chart. (C3) Elements in the 1951 USAF resolution test chart. (D) GALOF fabrication process. (A1, A2), (B1, B2), and (C1-C3) are adapted with permission from [42,86], and [...].
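The contrast between the 3D criterion and the relaxed quasi-2D condition can be written compactly as follows. This is a heuristic sketch in notation assumed here rather than a formula quoted from the cited works: $k$ is the wavenumber in the medium, $\ell^{*}$ the scattering mean free path, $n$ the refractive index, $\lambda_0$ the free-space wavelength, and $k_\perp$ the transverse component of the wavevector.

```latex
\text{3D (Ioffe-Regel):}\quad k\,\ell^{*} \lesssim 1, \qquad k = \frac{2\pi n}{\lambda_0}
\qquad\qquad
\text{quasi-2D:}\quad k_{\perp}\,\ell^{*} \lesssim 1, \qquad k_{\perp} \approx \frac{k}{10}\ \text{to}\ \frac{k}{100}
```

Because $k_\perp$ is one to two orders of magnitude smaller than $k$, the transverse condition can be met with a mean free path well above the wavelength, which is the argument made in the text.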
Abdullaev's scheme is to impose the disorder on top of an existing ordered lattice with periodic potential. Without the imposed randomness, the wave solutions would be Bloch-periodic and extend over the whole lattice. By introducing sufficient disorder into the effective index or the coupling coefficients, the wave propagation collapses into localized states in certain regions of the lattice. In De Raedt's scheme, instead of introducing disorder into an existing lattice, a completely random underlying potential (e.g., a 2D refractive index distribution in an optical fiber) is created. De Raedt proposed an optical-fiber-like waveguide structure as shown in Figure 3A1. Each pixel in the transverse plane has a refractive index of n1 or n2, chosen randomly with equal probability. The longitudinal refractive index distribution is uniform. The size of the pixel is assumed to be comparable to the wavelength. Based on intensive numerical simulations, De Raedt demonstrated that optical waves decay exponentially in the transverse plane and remain localized transversely as the beam propagates longitudinally. Such TAL behavior is caused by multiple scattering in the transverse plane and is similar to the simulation results shown in Figure 3A2. Abdullaev's and De Raedt's results were purely theoretical investigations. The first experimental observation of TAL of light was demonstrated by Segev's team in 2007 [85]. Their optical system is based on a scheme similar to Abdullaev's proposition: disorder is imposed on an existing ordered triangular lattice of waveguides using photorefractive crystals. In this pioneering work, they used an intense laser to write a transversely disordered but longitudinally uniform refractive index distribution into the photorefractive crystal and probed the TAL of light with another laser beam. The photoinduced refractive index variation is on the order of 10⁻⁴.
The small variations of the refractive index result in large localization beam radii with significant standard deviations among different realizations of the random refractive index profile. In this case, the beam radius of TAL is meaningful only in a statistically averaged sense. Yet, by introducing sufficiently large refractive index variations, the sample-to-sample variations of the TAL beam radius can be significantly suppressed so that the localized beam radius of one realization resembles the ensemble average [57,86,88-90]. This self-averaging behavior guarantees similar localization lengths for different realizations of the disordered refractive index profile, which is highly desirable for pushing TAL optical waveguides into practical applications. After the pioneering work of Segev's team, many efforts have been made to explore TAL phenomena, which finally led to the development of ALOFs [91-94].
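De Raedt's binary random-index structure described above can be sketched in a few lines. This is only an illustration of the geometry, not a propagation simulation: the function name, the grid size, and the index values (1.49 and 1.59, assumed here as typical textbook values for PMMA and polystyrene) are illustrative choices, not parameters from the cited works.

```python
import numpy as np


def random_binary_index_map(n_pixels, n1, n2, fill_fraction=0.5, seed=0):
    """Transverse index map: each pixel is n2 with probability fill_fraction, else n1."""
    rng = np.random.default_rng(seed)
    mask = rng.random((n_pixels, n_pixels)) < fill_fraction
    return np.where(mask, n2, n1)


# Assumed illustrative indices; equal probabilities as in De Raedt's scheme.
index_map = random_binary_index_map(200, n1=1.49, n2=1.59)

# The structure is uniform along the propagation direction: every
# longitudinal slice is the same transverse map (a read-only view here).
n_slices = 5
index_volume = np.broadcast_to(index_map, (n_slices, *index_map.shape))

measured_fill = float(np.mean(index_map == 1.59))
index_contrast = float(index_map.max() - index_map.min())
print(f"filling fraction of the n2 material: {measured_fill:.3f}")
print(f"refractive index contrast: {index_contrast:.2f}")
```

A beam-propagation solver would take `index_volume` as its input; the map alone already exposes the design parameters discussed in the text (feature size as the pixel width, filling fraction, and index contrast).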
In the following discussions, we mainly take polymer ALOFs (PALOFs) and glass-air ALOFs (GALOFs) as examples. For device-level applications, one would expect the localization radius of the beam to be sufficiently stable owing to the self-averaging behavior. Otherwise, the transported beam radius could vary unpredictably with the transverse location in the disordered structure. Stable localized beam radii require a proper ALOF design with enhanced TAL. Previous studies of ALOF design suggest that the most relevant transverse structure parameters are the transverse size of the cross section, the feature size (width of each refractive index pixel), the material filling fraction (ratio of low-index materials to the high-index host medium), and the refractive index contrast [42,57,90,98]. First, the transverse size of the ALOF should be large enough that the localized modes in the interior area are not affected by boundary effects and are determined solely by the TAL mechanism. For example, the diameters of most recently reported ALOFs range from ∼125 to ∼400 µm. Second, the optimum feature size was speculated to be around half of the free-space wavelength for fiber materials with a refractive index around 1.5. This design rule is just an educated guess, and the truly optimal feature size relative to the wavelength is still in dispute [98]. The recent observation that the localization radius is independent of the wavelength within a rather wide spectral range has cast further doubt on the optimal feature size speculation [47,48]. While this issue needs more investigation and more experimental evidence, our empirical observations based on GALOF fabrication and tests show that strong TAL requires feature sizes on the order of the free-space wavelength [44]. Third, intensive numerical simulations suggest that a material filling fraction of 50% is the ideal design [42,57].
This value follows from the observation that a higher material filling fraction results in a smaller localized beam radius. It should be noted that the above conclusions apply to refractive index contrasts below 0.5. For even larger refractive index contrasts, the question is still open [98]. While a 50% material filling fraction is relatively easy to achieve for PALOFs, a similar material filling fraction is still quite challenging for GALOF fabrication. The highest reported material filling fraction of a GALOF is still below 30%. Finally, increasing the refractive index contrast between the two filling materials can generally enhance TAL and reduce the localized beam radius. Previous investigations have confirmed that an index difference of ∼0.5 for GALOF can considerably reduce the localized beam radius compared to the index difference of ∼0.1 for PALOF. Nevertheless, one should be cautious, because the reduction of the localization length with increasing index difference tends to saturate asymptotically [42]. Therefore, increasing the index difference beyond the threshold value may yield only small further reductions in localization length.
Frontiers in Physics | www.frontiersin.org November 2021 | Volume 9 | Article 710351
Besides the design parameters discussed above, the optical losses of the fiber materials need to be taken into consideration for specific applications. For imaging applications, PALOFs suffer from huge signal attenuation in the visible band, limiting their image transport distance to less than 20 cm [74]. In comparison, the fused-silica-based GALOFs have much smaller material losses and have been shown to support image transport over distances of at least the 1 m level [44]. Fused silica is also the mature and widely deployed industrial-grade optical fiber material, making it cost-effective and easy to implement. Based on the above roadmap of ALOF design, both PALOFs and GALOFs have demonstrated superior imaging capabilities. As shown in Figures 3B,C, they can directly transport high-quality intensity images from a resolution test chart. Different observations have further confirmed that the quality of the images transported by ALOFs is comparable to or even higher than that of some of the best commercially available coherent fiber bundles. In fiber bundles, a special case of MCFs, the core-to-core coupling degrades the point spread function with increasing transmission distance [101]. In order to suppress the cross talk between individual cores, MCFs usually have to randomize the core size and control the core density, which results in low mode density. But low mode density brings more severe pixelated artifacts. ALOFs resolve the contradiction by strongly coupling all the neighboring sites but preventing the cross talk through extreme randomness. Therefore, ALOFs feature mode densities about two orders of magnitude higher than those of MCFs and can outperform them in terms of transmitted image quality. In particular, the point spread function of ALOFs is determined by the localization length, which is independent of the transmission distance.

Fabrication Techniques of ALOF
Finally, the fabrication of ALOFs is also an important topic to explore. PALOF fabrication has been reported by Mafi's team [99,102]. To fabricate the PALOF, 40,000 polymethyl methacrylate (PMMA) strands are first randomly mixed with 40,000 polystyrene (PS) strands. The randomly mixed strands are then assembled into a preform with a square cross section with a side length of ∼2.5 inches. The preform is further drawn into the final PALOF with a diameter of ∼250 µm. The optimal GALOF fabrication recipe is still under investigation. The first GALOF was reported by Karbasi et al. in 2012 and was drawn from "satin quartz" (Heraeus Quartz) at Clemson University [95,99]. The "satin quartz" is a type of porous artisan glass. This type of GALOF has a diameter of 250 µm and is drawn from a rod preform with a diameter of 8 mm. The average air-filling fraction is ∼5%. The feature size of the air holes varies from 0.2 to 5.5 µm. Chen and Li at Corning Inc. also reported random air-line GALOFs fabricated using the outside vapor deposition process [96,99]. They first create a silica soot blank by soot deposition in the laydown process. The silica soot is chlorine dried in a consolidation furnace, then further consolidated in the presence of 100% N₂. The N₂ is trapped in the blank to form glass with randomly distributed air bubbles. Finally, the preform with random air bubbles is drawn into fibers with random air lines. The air-filling fraction of the air-line GALOFs is lower than 2%. The air hole size is around 0.2-0.4 µm. Limited by their low air-filling fractions, these early GALOFs can only support TAL in some local areas of the transverse plane and are not suited for image transport. In 2017, Zhao et al. developed GALOFs with ∼28.5% air-filling fraction using the well-established stack-and-draw fabrication technique [44,79]. The feature sizes are around 1.6 µm.
Due to the high air-filling fraction, TAL can be observed across the whole disordered area. High-quality image transport has been demonstrated through a meter-long GALOF sample [44]. The fabrication workflow of Zhao's GALOF is shown in Figure 3D. In the preform fabrication phase, hundreds of silica capillary tubes are first drawn with various outer diameters (ODs) and inner diameters (IDs). The ODs of the capillaries vary from ∼100 to 180 µm. The ratio of ID to OD ranges from 0.5 to 0.8. These capillaries are cut to the same length of 1 m and randomly mixed. Then they are assembled and fed into a silica jacket with an inner diameter of ∼15 mm to create the preform. In the fiber fabrication phase, the first step is to draw the preform into a cane with ∼3 mm OD. The second step is to draw the cane to the desired fiber size (OD: ∼400 µm, ID: ∼280 µm). During the fiber fabrication process, it is important to monitor the variations of the cross section with a bright-field microscope. The finished GALOF samples feature characteristic distributions of air-hole areas that typically range from 0.64 µm² to over 100 µm². Statistically, air holes with an area of 2.5 µm² cover the largest disordered area.
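As a rough plausibility check of the stack-and-draw dimensions quoted above, the capillary bores can be scaled down geometrically from preform to fiber. This is back-of-the-envelope arithmetic, not the authors' design calculation: it assumes uniform scaling of the cross section (ignoring any collapse or expansion of the holes during drawing) and interprets the fiber "ID" of ∼280 µm as the diameter into which the ∼15 mm jacket bore is drawn. The function name is hypothetical.

```python
def scaled_bore_diameter_um(capillary_od_um, id_to_od_ratio,
                            jacket_id_mm=15.0, fiber_inner_diameter_um=280.0):
    """Capillary bore diameter after draw-down, assuming uniform geometric scaling."""
    scale = fiber_inner_diameter_um / (jacket_id_mm * 1000.0)  # preform -> fiber
    return capillary_od_um * id_to_od_ratio * scale


# Extremes of the quoted ranges: OD from 100 to 180 um, ID/OD from 0.5 to 0.8.
smallest = scaled_bore_diameter_um(100.0, 0.5)
largest = scaled_bore_diameter_um(180.0, 0.8)
print(f"estimated air-hole diameters: {smallest:.2f} um to {largest:.2f} um")
# prints "estimated air-hole diameters: 0.93 um to 2.69 um"
```

Under these assumptions, the estimated hole diameters bracket the reported feature size of about 1.6 µm. The much larger hole areas reported for the finished fibers are not captured by this simple scaling.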

Wavelength Dependence and Wavefront Qualities of TAL in ALOF
ALOFs' intriguing mode properties lay the foundation for developing robust FOISs with high imaging quality. Different types of ALOFs share similar properties regardless of the specific materials. Previous investigations focused more on PALOF-based platforms since PALOFs appeared earlier than GALOFs. The early investigations of the beam multiplexing properties of ALOFs proved that multiple-beam propagation is feasible in PALOFs [42,88]. The spatially multiplexed beams are also highly robust: the TAL beam propagation channels can withstand substantial bending, to a degree beyond the limit of conventional optical fibers [88]. In later research, Giancarlo et al. discovered that the TAL transmission channels in the PALOF demonstrate a high degree of resilience to mechanical perturbations and to variations of the beam coupling position, which is strong evidence of single-mode channels [46]. This explains the high stability of PALOF beam multiplexing against macro bending. The emerging GALOFs further confirm the high robustness of ALOFs against strong mechanical fiber bending through image transport tests [18,44]. As shown in Figure 4A, for the same meter-long GALOF sample, the transmitted pattern under a 180-degree mechanical bend is almost the same as the one delivered through the straight fiber [44]. Recent research also showed that a considerable number of localized transmission modes in GALOFs have low M² values (close to 1) based on both numerical simulations and experimental measurements (Figure 4B-D) [49]. Here, the M² value is a widely used metric to evaluate laser beam quality and is equal to one for an ideal diffraction-limited beam. For more details regarding the calculation of M² values, see Ref. [104]. A double-slit interference experiment in this research proves the high spatial coherence of these localized GALOF modes.
The above observations demonstrate that the localized modes in GALOF exhibit nearly diffraction-limited wavefront quality, making the localized transmission channels comparable to single-mode optical fibers. Significantly, these high-quality modes can be excited easily without any expensive and sophisticated devices, such as spatial light modulators. What further distinguishes ALOFs from other imaging fibers is the wavelength independence of the localization lengths over a reasonably broad spectral range (∼1 µm bandwidth). Since the width of the point spread function of the ALOFs is determined by the localization lengths, the imaging capabilities of the ALOFs should not be degraded by broadband illumination. This phenomenon has been observed experimentally for both PALOFs and GALOFs by different research groups [47,48]. As shown in Figure 4F, two different localized spots in the GALOF were selected to investigate the dependence of the localization length on wavelength tuning. It appears that the localization length fluctuates around a stable average value for wavelengths varying from 540 to 1,600 nm. With the same GALOF sample, the multicolor light-emitting pixel array of a smartphone screen was coupled into the input facet of the fiber. The pattern obtained after transmission through the fiber, which reflects the wavelength-independence properties, is shown in Figure 4E. All the above-mentioned unique properties provide the device foundation for robust, high-quality color image transport through ALOFs, which breaks the bottlenecks imposed by conventional MMFs or MCFs and opens more possibilities for fiber-optic imaging. We summarize some of the important imaging-related parameters of three different types of fiber (MMF, MCF, and GALOF) in Table 1. Since the GALOF developed at CREOL is the first glass-air ALOF that supports high-quality long-distance imaging, in the following discussions, we mainly focus on this type of ALOF.
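For reference, the M² beam quality factor used in this section is commonly defined as below. This is the standard textbook definition, stated here as background rather than quoted from Ref. [104]; $w_0$ is the beam waist radius, $\theta$ the far-field half-angle divergence, and $\lambda$ the wavelength.

```latex
M^{2} = \frac{\pi\, w_0\, \theta}{\lambda}, \qquad M^{2} = 1 \ \text{for an ideal diffraction-limited Gaussian beam}
```

A localized mode with $M^2$ close to 1 therefore diverges almost as slowly as a Gaussian beam of the same waist, which is the sense in which the text compares these channels to single-mode fibers.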

Fundamentals of CNN
The learning-based approach has attracted a lot of attention recently, mainly focusing on DCNNs to tackle the inverse imaging problem of the fiber-optic system [18,19,27,50-52,105]. The DCNN is a specific class of machine learning techniques. Nowadays, the deep neural network is often praised by the mass media as one of the most fascinating techniques. From a historical perspective, the neural network is more than 50 years old: Frank Rosenblatt first developed the perceptron in 1957 [106,107]. The concept of CNNs has also been known for more than 30 years. Neural networks with backpropagation were proposed and deployed to solve imaging problems as early as the 1980s [108-110]. In the past half century, neural network research has experienced several setbacks and advances. The resurgence was spurred by the rise of the graphics processing unit (GPU), breakthroughs in the algorithms for large-scale deep neural networks, and the huge amount of data for various applications brought by the digital era [58,111-116]. In the following discussions, we focus on some basics of neural networks [117]. Neural network architectures are mostly based on a multilayer computational structure in which each layer receives the output of the previous layer. Every single layer consists of many processing units, which can be called "neurons". We can take a simple two-layer fully-connected neural network as an example to illustrate the above framework (Figure 5A). In this two-layer neural network, the neurons in each layer are connected to all the outputs of the previous layer (or to the input data), which is the so-called fully-connected or densely-connected architecture. Each neuron can take numerous inputs and generate a new output. The output of the neurons further serves as the input for the neurons in the next layer. The processing of each neuron consists of a linear operation and a nonlinear operation.
As shown in Figure 5B, the linear operation computes a weighted sum of the inputs plus an additional bias. This step introduces the weight parameters and bias parameters into the neural network. These parameters are trainable: their values are iteratively updated during the training process. Following the linear operation, an activation function acts on its output to impose nonlinearity. There are several widely used activation functions, such as sigmoid, tanh, and ReLU. Among these options, ReLU is the most popular activation function in practice and plays a central role in DCNNs (Figure 5C). The first step to solve a specific problem is to "teach" the neural network using the data. In this training process, the parameters are iteratively adjusted by optimizing the cost function so that the neural network "learns" the underlying model from the data. The training process can be supervised or unsupervised. We mainly take supervised learning as an example here. Generally, the data are divided into three datasets: the training dataset, the validation dataset, and the test dataset. The training dataset is what the neural network sees and learns from. The validation dataset accompanies the training process and provides interim error estimates for real-time evaluation. The validation data are used only to evaluate the training and to tune the hyperparameters; the network never learns from the validation dataset directly. Unlike the validation dataset, the test dataset is used only once, after the training process. It serves to evaluate how accurately the trained model generalizes to unseen data. A schematic of the training workflow is shown in Figure 5A. The training data are loaded into the network after a randomized initialization of the parameters. The quality of the parameters is then evaluated by calculating the loss through the cost function based on a certain metric.
For imaging problems, mean squared error (MSE), mean absolute error (MAE), and structural similarity index measure (SSIM) are widely accepted metrics [118,119]. The training process aims to minimize the cost function built on these metrics by iteratively updating each trainable layer's parameters. In Figure 5D, we visualize the basic concept of the gradient descent process and exemplify the equations to update the weights and the bias. In practice, popular optimization methods, such as the Adam optimization algorithm, are mainly based on variants of stochastic gradient descent (SGD) [58,120]. SGD uses a subset of the data for each iterative update of the model, whereas gradient descent requires running through all the data for each update. In order to apply SGD to train the network, the backpropagation procedure was developed (Figures 5A,B,D) [108,109]. The procedure computes the gradient of the cost function with respect to each layer's weights based on the chain rule. The gradient propagates backward from the output to the input such that the gradients with respect to the weights of each layer can be calculated.
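As a minimal sketch of the training loop described above, the following pure-Python example trains a two-layer fully-connected network (one ReLU hidden layer and a linear output) with mini-batch SGD, using MSE as the cost function and backpropagation based on the chain rule. The toy regression target, the layer sizes, the learning rate, and all function names are illustrative assumptions, not the configuration used in the reviewed works.

```python
import random


def forward(x, params):
    """Forward pass: z = W1 x + b1, h = ReLU(z), y = W2 . h + b2."""
    W1, b1, W2, b2 = params
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    h = [max(0.0, v) for v in z]
    y = sum(w * hi for w, hi in zip(W2, h)) + b2
    return z, h, y


def mse(data, params):
    """Mean squared error of the network over a dataset of (x, target) pairs."""
    return sum((forward(x, params)[2] - t) ** 2 for x, t in data) / len(data)


def sgd_step(batch, params, lr):
    """One SGD update: backpropagate the batch-mean MSE, then step downhill."""
    W1, b1, W2, b2 = params
    n_hidden, n_in = len(W1), len(W1[0])
    gW1 = [[0.0] * n_in for _ in range(n_hidden)]
    gb1 = [0.0] * n_hidden
    gW2 = [0.0] * n_hidden
    gb2 = 0.0
    for x, t in batch:
        z, h, y = forward(x, params)
        dy = 2.0 * (y - t) / len(batch)          # d(loss)/d(output)
        gb2 += dy
        for j in range(n_hidden):
            gW2[j] += dy * h[j]                  # gradient for output weights
            # Chain rule through the ReLU: gradient flows only where z > 0.
            dz = dy * W2[j] * (1.0 if z[j] > 0.0 else 0.0)
            gb1[j] += dz
            for i in range(n_in):
                gW1[j][i] += dz * x[i]           # gradient for hidden weights
    new_W1 = [[W1[j][i] - lr * gW1[j][i] for i in range(n_in)]
              for j in range(n_hidden)]
    new_b1 = [b1[j] - lr * gb1[j] for j in range(n_hidden)]
    new_W2 = [W2[j] - lr * gW2[j] for j in range(n_hidden)]
    return new_W1, new_b1, new_W2, b2 - lr * gb2


def train(data, n_hidden=8, lr=0.02, epochs=300, batch_size=8, seed=0):
    """Train on `data`; return final parameters and the per-epoch MSE history."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    params = (
        [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        [0.0] * n_hidden,
        [rng.gauss(0.0, 0.5) for _ in range(n_hidden)],
        0.0,
    )
    data = list(data)
    history = [mse(data, params)]                # loss after random initialization
    for _ in range(epochs):
        rng.shuffle(data)                        # new random mini-batches each epoch
        for k in range(0, len(data), batch_size):
            params = sgd_step(data[k:k + batch_size], params, lr)
        history.append(mse(data, params))
    return params, history


# Hypothetical toy task: learn y = 2*x1 - x2 + 0.5 from 64 random samples.
_rng = random.Random(1)
toy_data = []
for _ in range(64):
    x1, x2 = _rng.uniform(-1, 1), _rng.uniform(-1, 1)
    toy_data.append(([x1, x2], 2.0 * x1 - x2 + 0.5))

params, history = train(toy_data)
print(f"MSE before training: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Each call to `sgd_step` sees only one mini-batch, which is what distinguishes SGD from full-batch gradient descent; optimizers such as Adam add adaptive step sizes on top of the same gradient computation.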
The above neural network basics can be directly applied to the understanding of CNNs. A CNN is designed to handle data in the form of multiple arrays, which fits imaging data such as 3D arrays for color images or 2D arrays for gray-scale images [58]. The general purpose of the CNN is to extract feature maps from the imaging data through convolutional layers and pooling layers. It usually applies many alternating convolutional and pooling layers to gradually extract high-level features from low-level features. As shown in Figures 5E,F, each unit of a feature map is connected to a local area of the previous feature maps through a kernel (or filter) that carries the weight parameters and is shared across the same feature map. Specifically, the kernel moves across the feature maps with a certain stride to calculate the weighted sum, to which the bias map is added (Figure 5F). Then, the results of this linear operation are passed to a nonlinear activation operation, such as the ReLU. It should be noted that different feature maps use different kernels. The choice of the kernel size depends on the specific application (a 3 × 3 kernel is used in Figure 5). The primary purpose of the above operation is to detect local conjunctions of imaging features in each layer. The reason is that local groups of values in imaging data are often highly correlated, and the local statistics of imaging data are invariant to location [58]. In a regular CNN, a pooling layer follows the convolutional layer. The pooling operation extracts the maximum or the average of each local patch in the feature maps (Figure 5G). The pooling layer merges semantically similar imaging features into one and extracts dominant features that are invariant to location [58]. After going through the convolutional layers and pooling layers, the feature maps are flattened into a vector. This vector is subsequently fed into densely connected layers.
The final layer outputs a vector whose size matches the desired parameters; for a classification problem, it contains the probability distribution over the classes, as shown in Figure 5E. The sample architecture shown in Figure 5E is an oversimplified model. In practice, very deep architectures containing dozens or even hundreds of layers can be deployed to tackle various imaging problems. In particular, the DCNN model used to solve the inverse imaging problem is mostly based on an encoder-decoder architecture in which the input imaging data are first down-sampled (similar to Figure 5E) and then up-sampled to output 2D or 3D array data with the same size as the image of the object. More details about a few popular DCNN architectures are given in the next section.
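The convolution, activation, pooling, and flattening steps described above can be sketched in a few lines of plain Python. This is a minimal single-channel illustration: the 6 × 6 test image, the hand-picked 3 × 3 edge-detecting kernel, and the function names are assumptions made for this example, whereas in a real CNN the kernel weights and the bias are learned during training.

```python
def conv2d(image, kernel, bias=0.0, stride=1):
    """Slide the kernel over the image and return the weighted sums plus bias."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            s = sum(kernel[i][j] * image[r + i][c + j]
                    for i in range(kh) for j in range(kw))
            row.append(s + bias)
        out.append(row)
    return out


def relu(fmap):
    """Apply the ReLU activation to every unit of a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size patch."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def flatten(fmap):
    """Turn a 2D feature map into a vector for the dense layers."""
    return [v for row in fmap for v in row]


# Test image: dark left half, bright right half (a vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1]] * 3

feature_map = relu(conv2d(image, kernel))   # 4 x 4 feature map
pooled = max_pool(feature_map)              # 2 x 2 map after pooling
vector = flatten(pooled)                    # input to densely connected layers
print(feature_map[0], pooled, vector)
```

The same kernel is reused at every position of the image, which is the weight sharing mentioned in the text; the pooled map still reports the edge even though its exact column is no longer resolved.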

The Advantages of the Learning-Based Approach
The DCNN-based learning approach has great advantages over model-based approaches. As we can see from Figures 6A,B, a general fiber-optic imaging problem can be formulated as recovering the image of the object from noisy and distorted raw imaging data. In real-world imaging configurations, this inverse imaging problem is ill-posed. Some conventional methods aiming to develop a direct inverse operator of the forward operator H would suffer from significant artifacts for the ill-posed inverse problems. Model-based conventional methods develop the regularized formulation to overcome these difficulties. Assuming that $Y = HX$ (Y: raw image, H: forward operator, X: object), the estimate of the object, $\hat{X}$, can be reconstructed by solving a regularized optimization problem:

$$\hat{X} = \underset{X}{\arg\min}\; f(HX, Y) + c\,\varphi(X) \qquad (1)$$

In Eq. 1, f is the cost function to measure the error between HX and Y. φ is the regularizer that encodes the prior knowledge of the object. It promotes the solutions matching with the prior knowledge and reduces ill-posedness. c is the regularization parameter that tunes the relative strength of the two terms.
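To make the role of the regularizer concrete, the sketch below solves a toy instance of the regularized problem in plain Python. It assumes a diagonal forward operator H with entries h_i, a quadratic cost f = ‖HX − Y‖², and a quadratic (Tikhonov) regularizer φ(X) = ‖X‖², for which each component of the minimizer has the closed form x_i = h_i y_i / (h_i² + c). These choices, the numerical values, and the function names are illustrative assumptions; practical model-based methods use far more elaborate operators and priors.

```python
def direct_inverse(h, y):
    """Naive inversion x_i = y_i / h_i of a diagonal forward operator."""
    return [yi / hi for hi, yi in zip(h, y)]


def tikhonov_inverse(h, y, c):
    """Minimizer of sum((h_i*x_i - y_i)**2) + c * sum(x_i**2), per component."""
    return [hi * yi / (hi * hi + c) for hi, yi in zip(h, y)]


def squared_error(x_est, x_true):
    """Sum of squared differences between an estimate and the true object."""
    return sum((a - b) ** 2 for a, b in zip(x_est, x_true))


h = [1.0, 0.5, 0.01]            # the last channel is strongly attenuated
x_true = [1.0, 1.0, 1.0]        # hypothetical object
noise = [0.01, -0.01, 0.05]     # hypothetical measurement noise
y = [hi * xi + ni for hi, xi, ni in zip(h, x_true, noise)]

x_direct = direct_inverse(h, y)           # noise in the weak channel is amplified
x_reg = tikhonov_inverse(h, y, c=0.01)    # regularization suppresses that channel

print("direct error:     ", squared_error(x_direct, x_true))
print("regularized error:", squared_error(x_reg, x_true))
```

In this example the direct inverse returns a value close to 6 instead of 1 for the weakest channel, whereas the regularized estimate stays bounded at the price of a bias toward zero, which is the trade-off controlled by c.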
Although the model-based approach plays an undisputed central role in dealing with the inverse problem, it imposes some demanding requirements that limit its application and performance. As Eq. 1 shows, model-based methods require modeling the forward operator accurately and handcrafting the cost function, the regularizer, and the optimization algorithm for each new application. It is therefore challenging to develop a general design that can handle a large class of problems. For some complex physical systems, such as wave propagation in the ALOF, even an accurate model of the forward operator is not readily available. Besides, the optimization procedure has to be performed for each imaging operation, which may take from a few minutes up to a few hours per frame for a typical optimization process [60]. The DCNN-based learning approach can overcome these limitations of the conventional model-based approach. As a data-driven solution, the DCNN "learns" the parametric function for the inverse problem [59] directly from the training data. There is no need to handcraft the forward model, the regularizer, the cost function, or the optimization algorithm. This unique feature circumvents the unavailability of the physics model for complicated multimode fiber-optic imaging systems. Moreover, the data-driven learning capability simultaneously simplifies the experimental hardware. In previous FOIS solutions, the experimental configurations often feature a complicated design because cracking the underlying physics model requires multi-dimensional measurements [3,4,34]. In contrast, intensity images recorded by a digital camera can be used directly to uncover complex underlying models through the neural network's self-learning. The DCNN represents a more general framework that includes the model-based approach as one of its special cases. It is, therefore, not surprising that the DCNN architecture can be decoupled from a specific problem and transferred between various applications. Moreover, the DCNN is much more computationally efficient than the model-based approach: a trained DCNN needs calculation times of only milliseconds per frame. In addition to generality and speed, recent research has also demonstrated that the DCNN yields high-quality solutions to inverse imaging problems [19,[121][122][123][124][125][126][127][128][129]. In general, the deployment of the DCNN in fiber-optic imaging opens a new avenue [130][131][132]. In the following sections, we mainly show our recent progress for this type of FOIS by integrating the ALOF-based FOIS with the DCNN.

FIGURE 6 | (A) A general fiber-optic imaging system. The physics model of the fiber-optic imaging system can be treated as a forward operator H. H acts on the object X to generate the raw image Y, which is collected by a digital camera. (B) Learning-based approach (convolutional encoder-decoder architecture) to solve the inverse problem of reconstructing X from the raw image. R stands for the inverse operator that maps the raw image into $\hat{X}$, the estimate of the object.

FIGURE 7 | (A) The image of the sample is relayed through the 4f system consisting of a 10× objective and a tube lens. The relayed image is split into two copies: one is recorded by camera 1 as the reference image (ground truth); the other is sampled by the GALOF to generate the raw image. The fiber-sampled image is transported through a meter-long GALOF sample (∼80 cm) and finally recorded by camera 2. The data collected by camera 2 are the raw images. A heater is attached at the center of the GALOF sample to increase the fiber temperature. The imaging process is tested under both straight and bent fiber status. As shown in the inset, the input end of the GALOF is fixed while the output end is bent by a bending shift distance d. The relation between the bending angle θ and the bending shift distance d is given by $d = L[1 - \cos(\theta)]/\theta$. (B) The architecture of the DCNN for gray-scale cell image reconstruction, based on a U-Net framework. The image is adapted with permission from [19]. For more details, refer to [19]. (C) The architecture of the DCNN for color cell image reconstruction, based on an inception CNN framework. The image is adapted with permission from [134] © The Optical Society. (D) Schematic of the cell recognition setup. The design of the setup in (D) is similar to the setup in (A), except that the beamline for reference image collection is removed due to its imaging-free capabilities. The setup collects only the raw speckle images for classification purposes. Similar to the inset shown in (A), the recognition process is also tested under bent fiber status with the same bending mechanism. For more details, refer to [52]. (E) The architecture of the DCNN for cell recognition, based on a VGG framework. For more details, refer to [52].

Frontiers in Physics | www.frontiersin.org November 2021 | Volume 9 | Article 710351

RECENT PROGRESS OF LEARNING-BASED FIBER-OPTIC IMAGING WITH GALOFS
Based on the previous discussions of ALOFs and the DCNN, we proceed to review our recent progress on the integration of GALOFs and DCNNs. In the introduction section, we listed the issues encountered by conventional FOISs: high sensitivity to perturbations, low imaging quality and speed, complex and expensive systems, and poor compatibility with incoherent broadband illumination. Our lab has recently focused on developing the GALOF-DCNN solution to mitigate these barriers and raise the FOIS's performance to the next level [18,19,52,133,134]. Our systems can be divided into two categories: image recovery and image recognition. For image recovery, the GALOF-DCNN system achieves highly robust image transport and nearly artifact-free imaging quality as well as the capability of imaging objects at varying depths. For image recognition, the GALOF-DCNN system demonstrates highly accurate and very robust (up to a ∼74° bending angle) classification without image reconstruction. The image reconstruction system designs are illustrated in Figures 7A-C. The learning approach for GALOF-DCNN systems is based on supervised learning. Therefore, the experimental systems have to collect the raw images and the reference images for the same area of interest simultaneously. The setup (Figure 7A) contains two beamlines: one for ground truth data and the other for raw imaging data. The setup also incorporates a temperature control module and a mechanical bending function to investigate the robustness of the GALOF-DCNN system under various thermal and mechanical perturbations. Depending on the light source's properties, two different DCNN architectures are adopted for image reconstruction. Referring to Figures 7B,C, the U-Net model is utilized for gray-scale image reconstruction [135], while the inception model is applied to color image reconstruction [136]. The U-Net model or the inception model merely defines the general framework.
Each sampling block's specific layer design is carefully customized to fit the GALOF-DCNN system and a particular illumination. For example, within the U-Net model, the ResNet framework [137] is applied in each individual down-sampling block as well as in the up-sampling blocks (Figure 7B), which dramatically improves the quality of image reconstruction (more details can be found in Ref. [19]). Although the U-Net model works well for gray-scale fiber imaging, the inception model proves to be more effective for color fiber imaging based on imaging performance evaluation. This is a consequence of the inception network's ability to extract image features at varying scales through the simultaneous use of different kernel sizes. It should be noted that the inception model is also optimized to fit the GALOF-DCNN system: each parallel branch contains a customized U-Net module (more details are given in Ref. [134]).
The image reconstruction results are demonstrated in Figures 8A-D. First, the GALOF-DCNN system is able to deliver nearly artifact-free single-color or full-color cell images in real time through a meter-long fiber (Figure 8A), which is a very difficult task for a conventional FOIS. The GALOF's unique properties, such as high-quality wavefront, high mode density, and wavelength-independent localization length, remove the device-level limitations for high-quality imaging. Building on the GALOF transport, the DCNN accurately simulates the underlying inverse operator of the imaging system, which finally recovers the fine features of the cell sample with high accuracy. The training time for the GALOF-DCNN system is ∼6.4 h for 15,000 image pairs. Once trained, the reconstruction takes 0.05 s per frame with nearly artifact-free quality (more details in Ref. [19]). Second, the GALOF-DCNN demonstrates superior imaging robustness compared to existing FOISs using other types of optical fibers. Referring to Figure 8B, the quality of cell imaging is almost unaffected even under a 2-cm bending shift and a 30°C temperature increase. The high robustness is mainly attributed to the single-mode-like behavior of the localized modes embedded in the GALOF's disordered structure. Similar behaviors can hardly be observed in other multimode waveguide systems, which distinguishes the GALOF-DCNN system from FOISs based on conventional fibers. It should be noted that the DCNN model deployed in these robustness tests is trained only once, using data obtained from a straight fiber operating at room temperature. The test time for any bent or heated fiber is just tens of milliseconds. This one-time-only training method is fundamentally different from other learning-based FOISs that perform several time-consuming re-trainings for each individual bending angle or thermal status.
The GALOF-DCNN truly provides a practical one-shot solution for high-speed, robust, and artifact-free cell imaging based on its superior robustness. Third, the GALOF-DCNN is highly adaptable to variations of object depth. For objects located at various distances from the fiber input facet (0-4 mm), the system successfully transports high-quality cell images without the assistance of extra distal-end optics (Figure 8C). This shows that the GALOF-DCNN system can tolerate defocus of up to a few millimeters. This capability enables a simplified distal-end design that could minimize the penetration damage to the living object. Fourth, the GALOF-DCNN proves to be a general system in a transfer-learning test, which is also strong evidence that it captures the underlying physics model well. As shown in Figure 8D, the DCNN is trained using a mixture of different objects: human red blood cells, frog blood cells, and polymer microspheres. Then, the trained model is directly deployed to test bird blood cells, which represent a different cell type. It still maps the raw image into a reconstructed cell image with reasonably high imaging quality. It should be noted that learning-based research in the area of optics and photonics often trains and tests the DCNN using the same type of objects that share similar image features. It is not trivial to train and test the DCNN using objects carrying significantly different image features.
It is well known that a high-quality image is vitally important for recognizing cell types. However, we recently showed that high-quality image recovery is not always necessary for cell recognition. We developed the system shown in Figures 7D,E to realize imaging-free object recognition for various cell types. The configuration shares a similar layout with the image reconstruction task, except that the beamline for the reference data is removed.
The reason is that the label of the raw data switches from ground-truth images to cell type identification numbers. The cell type identification is prior knowledge, so it can be processed separately on the computer. Correspondingly, the DCNN model is based on a VGG architecture (Figure 7E) [138], which is designed for image classification tasks (more details are given in Ref. [52]). Based on the above design, the raw speckle image data are loaded into the trained VGG model. The VGG model outputs the probability distribution used to identify the specific cell type (Figure 7E). Similar to the image reconstruction system, the robustness and the depth variation adaptability of the object recognition system are also evaluated based on the prediction accuracy. Related results are demonstrated in Figure 8E. The accuracy of the cell recognition is ∼92% for a straight GALOF sample with a 0 mm object depth. When fixing the object depth at 0 mm, a nearly bending-independent classification accuracy is observed for strong mechanical deformation up to ∼74° (corresponding to a 45 cm bending shift distance). The bending tolerance of the cell classification is apparently much larger than that of the image reconstruction task (∼3°). The mechanism may be attributed to the fault tolerance of VGG-based designs in conjunction with the single-mode nature of the localized modes. It should be noted that the current GALOF is far from perfect in that many extended modes co-exist with the localized modes. Such extended modes do not share the stability with respect to perturbations. Despite that, a 74-degree bending angle with a meter-long fiber sample would be sufficient to enable most practical medical endoscopy applications. In addition to the high robustness, the depth variation adaptability tests also show that the GALOF-DCNN system has a superior capability of handling the defocusing issue. For an object depth of 0.5 mm, the accuracy is as high as ∼92%, while for depths smaller than 1 mm, the accuracy remains higher than 80%.

FIGURE 8 | ... The DCNN is trained and tested using the same cell type. For more details, refer to [19]. (D) Transfer learning test for cell image transport under straight fiber status and 0 mm object depth. The DCNN model is trained (human red blood cells, frog blood cells, and polymer microspheres) and tested (bird blood cells) using different sample types. For more details, refer to [19]. Images in (B-D) are adapted with permission from [19]. (E) Cell recognition test results under different bending (0 mm object depth) and object depth (straight fiber) status. The images are adapted with permission from [52]. For more details, refer to [52].
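The bending angles and shift distances quoted above can be cross-checked with the relation between the bending angle θ and the bending shift distance d given earlier, d = L[1 − cos(θ)]/θ, which is the lateral displacement of the free end of a circular arc of length L that turns through an angle θ. The sketch below assumes that L is the fiber length, taken here as the ∼80 cm sample length, and that θ is expressed in radians inside the formula; the function name is hypothetical.

```python
import math


def bending_shift(fiber_length_m, bending_angle_deg):
    """Bending shift d = L * (1 - cos(theta)) / theta, with theta in radians."""
    theta = math.radians(bending_angle_deg)
    if theta == 0.0:
        return 0.0  # straight fiber: limit of the expression as theta -> 0
    return fiber_length_m * (1.0 - math.cos(theta)) / theta


L = 0.8  # assumed fiber length in meters (the ~80 cm sample)
for angle in (3, 74):
    shift_cm = 100 * bending_shift(L, angle)
    print(f"theta = {angle:2d} deg -> d = {shift_cm:.1f} cm")
```

Under these assumptions, the formula gives roughly 45 cm for 74° and roughly 2 cm for 3°, in line with the values quoted in the text.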

OUTLOOK
The disordered structure of the current GALOF is far from the optimal design. To fully exploit the potential of transverse Anderson localization, further investigations of the fabrication technique are required to improve the fiber quality and its random structure parameters, such as the air-filling fraction and the air-hole uniformity. The current stack-and-draw technique for GALOF fabrication faces some challenges. It requires hundreds of manually assembled capillaries to create the preform and a recursive manufacturing process to obtain the final product, which is time-consuming and labor-intensive. More importantly, it is hard to bring the random structure's parameters close to the optimal design due to the limitations of the packed-tube geometry. For similar reasons, it is also difficult to maintain a high degree of repeatability and accuracy of the random structure. Potential solutions to these challenges might lie in additive manufacturing or 3D printing techniques. It has been shown that 3D printing can be applied to create complex cross-sections for optical fiber preforms with high accuracy and repeatability [139][140][141][142]. In particular, single-step fiber fabrication can be realized by printing the preform in just a few hours. A similar 3D printing technique could be deployed to make the preform for GALOF fabrication and simplify the recursive manufacturing process. In addition to the fabrication techniques, the glass material is another important factor that determines the imaging performance of GALOFs. The current GALOF mainly works in the visible band. For biomedical applications, it is desirable to extend the spectral range from the visible band to ∼1,500 nm. This spectral range is the therapeutic window that enables optical detection and treatment in a living body [1]. Being limited by the transmission window of silica, it is difficult to go beyond ∼3 µm.
Instead of silica, tellurite glass has a broad transmission window up to 7 μm, a larger refractive index, and high thermal and chemical stability [143,144]. It could be an ideal candidate for developing a novel GALOF reaching the near-infrared range. A recently reported all-solid-state ALOF has already shown the great potential of tellurite glasses by transporting near-infrared (∼1.55 µm) images [100]. For the algorithms, the current learning approach is based on supervised learning, which requires a large amount of high-quality labeled imaging data. However, it is often quite challenging to meet this requirement in practice. For example, one important application scenario of a fiber-imaging system is the endoscopic imaging of organs or tissues. Limited by the unique imaging objects and environments, it is quite challenging to access the distal end of the imaging unit and acquire labeled training data. To resolve these issues, unsupervised or semi-supervised learning approaches might provide a new avenue for future systems in that they do not need strictly labeled data [127,[145][146][147]. This would relax the demanding requirements on the amount of training data and time as well as reduce the heavy burden of system calibration. To implement unsupervised or semi-supervised learning, integrating physics modeling with the DCNN architecture has been shown to be a wise choice [127,146]. Besides, the recently fast-growing generative adversarial networks may provide another potential solution for fiber imaging systems as well [148]. Overall, learning-based GALOF FOISs appear to make the best use of both the GALOF's unique properties based on TAL and the learning approach's high performance in solving imaging problems. With more improvements to come, the interplay between the GALOF and deep learning algorithms has great potential to dramatically enhance the performance of future FOISs.
We are very optimistic that our findings will contribute to the development of next-generation high-fidelity fiber-optic imaging systems for basic biomedical research and clinical practice.

AUTHOR CONTRIBUTIONS
JZ conceived the review topic, prepared the data, made the figures, and wrote the first draft. XH, SG, JA-L, RC, and AS provided assistance in data and figure preparations. All authors discussed, revised, and approved the manuscript.

ACKNOWLEDGMENTS
We acknowledge Professor Arash Mafi's assistance in preparing the figures.