Transient Motion Classification Through Turbid Volumes via Parallelized Single-Photon Detection and Deep Contrastive Embedding

Fast noninvasive probing of spatially varying decorrelating events, such as cerebral blood flow beneath the human skull, is an essential task in various scientific and clinical settings. One of the primary optical techniques used is diffuse correlation spectroscopy (DCS), whose classical implementation uses a single or few single-photon detectors, resulting in poor spatial localization accuracy and relatively low temporal resolution. Here, we propose a technique termed Classifying Rapid decorrelation Events via Parallelized single photon dEtection (CREPE), a new form of DCS that can probe and classify different decorrelating movements hidden underneath turbid volume with high sensitivity using parallelized speckle detection from a 32 × 32 pixel SPAD array. We evaluate our setup by classifying different spatiotemporal-decorrelating patterns hidden beneath a 5 mm tissue-like phantom made with rapidly decorrelating dynamic scattering media. Twelve multi-mode fibers are used to collect scattered light from different positions on the surface of the tissue phantom. To validate our setup, we generate perturbed decorrelation patterns by both a digital micromirror device (DMD) modulated at multi-kilo-hertz rates, as well as a vessel phantom containing flowing fluid. Along with a deep contrastive learning algorithm that outperforms classic unsupervised learning methods, we demonstrate our approach can accurately detect and classify different transient decorrelation events (happening in 0.1–0.4 s) underneath turbid scattering media, without any data labeling. This has the potential to be applied to non-invasively monitor deep tissue motion patterns, for example identifying normal or abnormal cerebral blood flow events, at multi-Hertz rates within a compact and static detection probe.

Fast noninvasive probing of spatially varying decorrelating events, such as cerebral blood flow beneath the human skull, is an essential task in various scientific and clinical settings. One of the primary optical techniques used is diffuse correlation spectroscopy (DCS), whose classical implementation uses a single or few single-photon detectors, resulting in poor spatial localization accuracy and relatively low temporal resolution. Here, we propose a technique termed Classifying Rapid decorrelation Events via Parallelized single photon dEtection (CREPE), a new form of DCS that can probe and classify different decorrelating movements hidden underneath turbid volume with high sensitivity using parallelized speckle detection from a 32 × 32 pixel SPAD array. We evaluate our setup by classifying different spatiotemporal-decorrelating patterns hidden beneath a 5 mm tissue-like phantom made with rapidly decorrelating dynamic scattering media. Twelve multi-mode fibers are used to collect scattered light from different positions on the surface of the tissue phantom. To validate our setup, we generate perturbed decorrelation patterns by both a digital micromirror device (DMD) modulated at multi-kilo-hertz rates, as well as a vessel phantom containing flowing fluid. Along with a deep contrastive learning algorithm that outperforms classic unsupervised learning methods, we demonstrate our approach can accurately detect and classify different transient decorrelation events (happening in 0.1-0.4 s) underneath turbid scattering media, without any data labeling. This has the potential to be applied to non-invasively monitor deep tissue motion patterns, for example identifying normal or abnormal cerebral blood flow events, at multi-Hertz rates within a compact and static detection probe.

INTRODUCTION
Non-invasive probing and identification of hemodynamic events deep inside tissue, such as cerebral blood flow (CBF), is essential for both clinical and scientific studies. In the past, numerous optical methods have been developed to detect and monitor CBF, such as diffuse optical spectroscopy (DOS; Gibson and Dehghani, 2009), diffuse optical tomography (DOT; Durduran et al., 2010), functional near-infrared spectroscopy (fNIRS; Ferrari and Quaresima, 2012), and photoacoustic tomography (PAT; Wang and Yao, 2016). These methods typically measure the absorption change caused by blood oxygenation, which is correlated with blood flow change. Recent extension of these methods can probe even deeper into tissue by time-gating multi-scattered light from non-superficial layers (Torricelli et al., 2014), which can also be implemented in the frequency domain using polychromatic measurements (Kholiqov et al., 2020).
Instead of looking at the absorption change, another class of techniques attempt to measure the dynamics directly by recording the temporal fluctuations of scattered light, among which established techniques are optical coherence tomography angiography (OCTA; Spaide et al., 2018) and laser speckle contrast imaging (LASCI; Briers et al., 2013). While there are impressive demonstrations using these methods to create microscopic vascular images close to surface, OCTA and LASCI are not ideal for detecting hemodynamics hidden underneath densely scattering tissue. A primary all-optical technique to noninvasively detect dynamic events deep inside tissue is diffuse correlation spectroscopy (DCS; Durduran and Yodh, 2014). DCS detects hemodynamic events by recording the decorrelation of the light: when coherent light enters thick turbid media, such as tissue, it randomly scatters and produces a speckle pattern. Living tissue is full of microscopic movements, which causes the light to fluctuate, or decorrelate (Brake et al., 2016). Different phenomena (e.g., tissue movement or blood flow) occur at different speeds, which causes the rate of light decorrelation to differ. In the past, DCS has been widely applied to study brain activity and cerebral health by monitoring cerebral blood flow (Buckley et al., 2014). To probe deep inside tissue, DCS needs to sample the fluctuations of a few speckle modes at a very high speed (microsecond sampling periods). Thus, traditional implementations usually use only one or very few fibers to collect light from the surface, with the light from each fiber detected by one or few singlepixel single photon sensitive detectors, such as single photon avalanche detectors (SPADs), or photomultipler tubes (PMTs). However, detecting light from only one surface location limits localization accuracy. Moreover, few photons per speckle mode reach the surface after traveling through highly turbid media. To achieve a sufficient signal-to-noise ratio (SNR), long integration times are thus required to achieve a useful estimation of the light decorrelation, which limits the ability to detect transient biological events. While the previous methods can mechanically translate the DCS probe to measure speckles from different surface locations to improve spatial localization (Han et al., 2015;He et al., 2015), this further increases the data acquisition time, the risk of motion-induced artifacts, and setup complexity.
Recently developed highly parallelized DCS (PaDS) demonstrates that detecting multiple speckles across many optical sensor pixels results in significantly faster correlation sampling rate (Johansson et al., 2019;Sie et al., 2020;Liu W. et al., 2021;Xu S. et al., 2021;Zhou et al., 2021). Further, advances in contrastive representation learning  facilitates the use of deep artificial neural networks to create an embedding space where similar inputs of unique sub-types are clustered together without any data labeling required. As training ground truth labels are usually expensive to acquire in experiments, it is strongly desired to adopt a deep contrastive learning method that works well with unsupervised data (Yang et al., 2017). Building upon these insights, we propose a new technique here, termed Classifying Rapid decorrelation Events via Parallelized single photon dEtection (CREPE), which uses a novel multi-fiber PaDS system based on massive parallel detection using a 32× 32 SPAD array. Figure 1 provides a conceptual illustration of the proposed method. The key features are • The highly parallelized light detection improves the SNR and sensitivity of the DCS, and detecting speckles from multiple surface positions allows localizing and classifying spatiotemporally varying decorrelating patterns. • CREPE is a zero-shot method, meaning it does not require training with labels or external datasets (Xian et al., 2017;Hospedales et al., 2020).
We validate this novel methodology by accurately classifying spatiotemporally varying patterns hidden beneath a 5 mm tissuelike phantom made with rapidly decorrelating scattering media.

Tissue Phantom Design
Figures 2A-C illustrates our speckle sensing probe design. The light source is a 670 nm long coherence length diode-pumped solid-state laser (MSL-FN-671, Opto Engine LLC, USA). We attenuate illumination radiance to below 200 mW/cm 2 to satisfy the safety limit (ANSI, 2014). Light is guided to the phantom surface using a 50µm, 0.22 numerical aperture (NA) multi-mode fiber (MMF). Twelve MMFs are placed circularly around the source at source-detector separations of 9 mm to collect reflected light, then bundled, and imaged to the SPAD array using a single lens system. Each MMF has a 250µm core diameter. An iris diaphragm placed immediately after the lens is used to reduce the numerical aperture of the imaging system to map one speckle to one SPAD pixel on average. A schematic of the speckle imaging system, as well as the sourcedetector positions, can be found in Supplementary Figure 1. Figure 3A plots a typical frame of the speckle patterns captured on the camera, with 12 roughly perceptible circular spots. Figures 2B,C summarize our phantom setup. To create dynamic scattering phantoms that mimic movements within living tissue, we used polysterene microsphere solutions at two different concentrations(4.55 × 10 6 #/mm 3 and 7.58 × 10 6 #/mm 3 ) enclosed in a thin-walled 5-mm thick cuvette. We termed these two scattering volumes as Tissue I and Tissue II, which results in an estimated reduced scattering coefficient of µ ′ s = 0.7mm −1 and experimentally measured absorption coefficient of µ a = 0.01mm −1 for Tissue I, and µ ′ s = 1.2mm −1 , µ ′ s = 0.02mm −1 for Tissue II . These optical properties  closely resemble the optical properties of tissue from human and model organisms, respectively (Durduran et al., 2010;Jacques, 2013). Underneath the tissue phantom, we placed dynamically fluctuating objects that perturb the decorrelation measured at the surface. We considered two different decorrelation perturbation mechanisms. First, we used a fast changing DMD display flipping at multi-kilo-hertz. We used such display as it's easily reconfigurable and can generate various spatial-temporal varying dynamic scattering patterns that induce additional decorrelation similar to biological phenomena, such as blood flow . Second, we placed two plastic tubes containing the same solution flowing at constant rates. The speed of the Shows the data processing method, where the autocorrelation from each SPAD pixel were computed, and averaged across each fiber position to generate a set of curves for each decorrelating event.
FIGURE 4 | Proposed deep clustering method for zero-shot decorrelation event classification. The network contains a stacked auto-encoder that transfers the input data into a latent low-dimension space, then reconstructs the input data from the latent features. A clustering module is used to impact the network weights update to form a classification friendly low-dimension space. Overall, the network is trained with the loss function at the bottom of the figure, with all the variables explained at the end of the Section 2.2.
flowing liquid inside the tube was controlled with two syringe drivers (New Era, US1010). While this is not as versatile as the DMD, in this way we were able to create more biologically realistic events by mimicking blood vessels. To measure the light fluctuation from different surface locations, we used a 12-fiber-detector PaDS system carefully described in . Figure 2D shows a picture of the PaDS probe we used. Figure 2E plots the photon sensitive region simulated using a Monte Carlo method (Jönsson and Berrocal, 2020). Source and detector geometry is labeled.

Data Processing
To generate a data point per decorrelation event, the temporal autocorrelation for each fiber location was estimated. Although there are other ways to compute temporal statistics across a SPAD array Jazani et al., 2019), this perpixel method is robust and widely used (Johansson et al., 2019;Sie et al., 2020;Liu W. et al., 2021). Figure 3A illustrates several representative frames captured by the SPAD camera, sampling at 667 kHz (1.5 µs sampling period), in which the speckles in each pixel fluctuated rapidly. We first computed the normalized temporal intensity autocorrelation (Durduran and Yodh, 2014) of each pixel as g p,q where I p,q (t) is the number of photons detected by the q-th SPAD for p-th fiber at time t; τ is the time delay, and · T int computes time-average estimated by integrating over T int . After calculating g p,q 2 (τ ) for every single SPAD, we can obtain an Frontiers in Neuroscience | www.frontiersin.org ensemble-averaged, noise-reduced autocorrelation g p 2 (τ ) for each fiber position by averaging g p,q 2 (τ ) that are collected by the Q p unique SPADs detecting light emitted by the same multi-mode detection fiber, for the pth multi-mode fiber (MMF). We use a look-up table to identify the Q p SPADs within the array that receives light from the pth MMF. Next, we compile the g p 2 (τ ) from each fiber into a set of 12 average intensity autocorrelation curves per decorrelation event, {x i } i=1,2,..,N , for N events of interest, and aim to classify these event measurements into K categories. While one could use a simple clustering method such as k-means, the high dimensionality inherent to PaDS data benefits from dimensionality reduction. Recent advances in deep unsupervised learning demonstrate that a non-linear transform, such as an artificial neural network, can generate clustering-friendly embedding for state-of-the-art classification results when jointly trained with the cluster module (Aljalbout et al., 2018). Therefore, we proposed to use a deep clustering network (DCN; Yang et al., 2017) to learn a low-dimension representation of the PaDS data for classification, as detailed in Figure 4. The DCN contains a stacked autoencoder, consisting of an encoder f θ (·) that embeds the PaDS data into a low-dimension manifold before a decoder g θ (·) maps the embedding back to the original space of the data point. A k-means++ clustering module (Arthur and Vassilvitskii, 2006) is connected to the dimension-reduced latent features of the network, aiming to help weights update to separate the data points in the low-dimension space. Mathematically, the problem can be formulated by the cost function where s i is the one-hot assignment vector for x i , picking up one-column from M. The k-th column of M represents the centroid of the k-th cluster. s i,j stands for the j-th element of s i . The first ℓ 2 loss here is the data fidelity term, which ensures the "bottleneck" contains information to reconstruct the high-dimension autocorrelation curves. The contrastive kmeans clustering-specific loss help separate the data points in the embedding space. To jointly optimize the two parts of loss, we alternate between updating the autoencoder weights using stochastic gradient, and finding new centroids for clusters.

RESULTS
We created three datasets as a first validation of our new method, to evaluate the performance in separating spatial, temporal, and spatio-temporal varying decorrelating events. We first displayed 800 spatially different patterns, in this case, handwritten letters from the EMNIST dataset (four classes: "D, " "U, " "K, " "E"; 200 examples of each) onto the 10.6 × 13.9mm 2 fixed DMD area. Some representative patterns are shown in Figure 5A. We attempted to separate these decorrelation patterns into their categories using both proposed DCN method and t-distributed stochastic neighbor embedding (TSNE) (Van der Maaten and Hinton, 2008), a widely used classic dimension reduction method. The decorrelation patterns were placed underneath 5 mm turbid volume described in Section 2.1. Figure 5B plots two of the eight reduced-dimensions from the 800 events using proposed method. These data points were generated by decorrelation events hidden under 5 mm turbid volume and the autocorrelations were computed using a 0.4 s integration time. Figure 5C summarizes the classification accuracy of both methods at two different integration times. We see that both methods (TSNE and proposed) can classify the decorrelation events with accuracy higher than chance (25% accuracy for quaternary classifications), but the proposed method performs better. We note that the classification accuracy for events hidden beneath Tissue I (µ ′ s = 1.2mm −1 , µ a = 0.02mm −1 , close to human tissue optical property) are lower than for Tissue II (µ ′ s = 0.7mm −1 , µ a = 0.01mm −1 , close to model organisms tissue properties). This is because the sensitivity of our PaDS method in detecting fast, small decorrelation events decreases as the scattering scene becomes more turbid . Additionally, while reduced integration allows identification of more transient events, the accuracy when using 0.2 s integration time is less than when using 0.4 s.
Next, we presented 800 spatio-temporally varying patterns containing two differently sized circles onto the DMD display (as shown in Figure 6A). Similarly, we plotted two of the eight reduced dimensions using both TSNE and proposed method (Figures 6B,C). Again, these data points were generated by computing the autocorrelations using 0.4 s integration time.
We see the method performs better at classifying two circles of different sizes and speeds than classifying the letters, due to the fact that the perturbed decorrelation areas covered by the two circles are larger than the those of the letters.
We then we applied our method to classify temporally varying patterns generated using two 3 mm tubes ( Figure 7A). The dynamic scattering fluid in the tubes either did not flow, or flowed at 1.4 and 0.7 mm/s (as reference, human arterial blood flow at 4.9-19 cm/s, while venous blood flow at 1.5-7.1 cm/s; Klarhöfer et al., 2001), driven by two syringe pumps. This resulted in nine different possible combinations ( Figure 7A). We generated 100 decorrelation events for each category, resulting in 900 data points. As the perturbations generated using fluid dynamics were more noticeable than the DMD, we only show results using Tissue II. Figure 7B plots two of the eight reduced dimensions of the 900 data points using both methods at 0.2 s integration time. Figure 7C summarizes the accuracy of both methods using 0.1 and 0.2 s integration time.
Finally, we conduct a study attempt to evaluate the performance of the proposed method for separating flowing scatter at different speed contrast embedded in different depths underneath tissue phantom. Figure 8A shows the decorrelation events we attempted to separate. Those events were generated by placing two 3 mm diameter tubes filled with scattering volume placed underneath 5-8 liquid phantom. The scattering liquid in the tube either did not flow, or flowed at 1.0 and 2.0 mm/s, driven by two syringe pumps. Figures 8B-D plots two of the eight dimensions of the embedding using proposed method, when the perturbation is underneath 5-8 mm. Figure 8E Barplots of the classification accuracy of proposed method using 0.2 s integration time. This suggests the method can separate decorrelating events generated with high flow rate better, and the performance degrades when the perturbation is placed 8 mm underneath. This is due to the fact that the source-detector separation we use is less sensitive to deeper tissue regions.

DISCUSSION
In summary, we developed CREPE, a parallelized, fast, sensitive photon sensing method that records the speckle fluctuations from 12 unique tissue surface positions, along with a deep embedding processing software that can separate the decorrelation events occurring underneath turbid volumes. As a first demonstration, we showed that our approach can detect and categorize various transient movement perturbations through rapidly decorrelating dynamic scattering tissue phantoms. Our method does not require expensive data labels to train the network, and therefore has a great potential to be applied in clinical in vivo studies. To ensure effective clinical translation, there are several improvements that can be made to both the system design and processing algorithm. First, as shown in camera images in Figure 3, the detection fiber bundle we use did not map surface speckles to all 32 × 32 SPAD pixels to maximize the speckle detection efficiency. Future work should strive to custom-design a fiber bundle that provides better array coverage. In addition, while small source-detector separations used in this work give a better spatial resolution for close to surface regions, it prevents the proposed method from being applied to deeper tissue monitoring applications. Integrating recently developed time-of-flight methods (Sutin et al., 2016;Kholiqov et al., 2020) should be considered to improve CREPE for detecting deeper tissue signals without compromising spatial resolution. In addition, we hope the proposed fast parallelized speckle sensing method can be easily adapted to existing established, such as diffuse correlation tomography and speckle contrast optical tomography, to form three-dimension quantitative blood flow images in high speed (Zhou et al., 2006;Varma et al., 2014;Mazdeyasna et al., 2018;Ren et al., 2020). Moreover, as the proposed method relies on the entire set of autocorrelation curves rather than fitted values assuming relatively simple geometry of the heads, we expect the method can be used to separate different deep tissue dynamic events from both semi-infinite and non-semiinfinite geometries. However, these autocorrelation curves might change across different subjects. Therefore, developing a multidistance multi-wavelength DCS system (Tamborini et al., 2017) that can simultaneously measure baseline optical properties can potentially improve the robustness of the current method for cross-subject studies. Further, while it is difficult to further increase the SPAD array sampling rate, which is required to record light traveling longer distances, we expect pixel-count for monolithic CMOS SPAD arrays to continue to rise (e.g., one megapixel SPAD arrays are now available; Canon, 2021). This provides promising opportunities to utilize spatial speckle statistics to help understand decorrelation events occurring deep in tissue Xu et al., 2022). Integrating CREPE with these speckle contrast methods on a SPAD array with higher pixel counts should be investigated to ensure reliable translation into clinical use.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
SX, WL, XY, RQ, KK, and PK constructed the hardware setup. SX, WL, and JJ designed the software. SX, KZ, LK, EB, PM, and RH wrote the manuscript. HW, EB, SH, and RH supervised the project. All authors contributed to the article and approved the submitted version.